“Suno’s training data includes essentially all music files of reasonable quality that are accessible on the open internet.”

“Rather than trying to argue that Suno was not trained on copyrighted songs, the company is instead making a Fair Use argument to say that the law should allow for AI training on copyrighted works without permission or compensation.”

Archived (also bypasses the paywall): https://archive.ph/ivTGs

  • Even_Adder@lemmy.dbzer0.com
    3 months ago

    It should be fully legal because it’s still a person doing it. Like Cory Doctorow said in this article:

    Break down the steps of training a model and it quickly becomes apparent why it’s technically wrong to call this a copyright infringement. First, the act of making transient copies of works – even billions of works – is unequivocally fair use. Unless you think search engines and the Internet Archive shouldn’t exist, then you should support scraping at scale: https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

    Making quantitative observations about works is a longstanding, respected and important tool for criticism, analysis, archiving and new acts of creation. Measuring the steady contraction of the vocabulary in successive Agatha Christie novels turns out to offer a fascinating window into her dementia: https://www.theguardian.com/books/2009/apr/03/agatha-christie-alzheimers-research

    The final step in training a model is publishing the conclusions of the quantitative analysis of the temporarily copied documents as software code. Code itself is a form of expressive speech – and that expressivity is key to the fight for privacy, because the fact that code is speech limits how governments can censor software: https://www.eff.org/deeplinks/2015/04/remembering-case-established-code-speech/

    That’s all these models are: someone’s analysis of the training data and how its pieces relate to each other, not the data itself. I feel like this is where most people get tripped up. Understanding how these things work makes it all obvious.

    • Petter1@lemm.ee
      3 months ago

      I think AI should be allowed to use any available data, but it has to be made freely available, e.g. by making it downloadable on Hugging Face.