One of the biggest controversies about the growing use of generative AI is that the companies that train these AI models may be using web data that is copyrighted by content makers. That has led to a number of lawsuits being filed against generative AI companies, including Microsoft, by newspapers, authors, and other media companies who claim Copilot, ChatGPT, and other LLMs are being trained on data they don't have the right to access.
This week, Mustafa Suleyman, the recently named CEO of Microsoft's new AI division, was interviewed during the Aspen Ideas Festival. The interview, which was conducted by CNBC's Andrew Ross Sorkin, was posted on the NBC News YouTube page.
At the 13:34 mark of the video, Sorkin asked Suleyman about generative AI models taking data from the web and, in Sorkin's words, "whether the AI companies have effectively stolen the world's IP."
Suleyman's response may not be what web-based content creators want to hear. He stated:
With respect to content that’s already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” that’s been the understanding.
There’s a separate category where a website, a publisher, or a news organization has explicitly said do not crawl or scrape me for any other reason than indexing me so that other people can find this content. That’s a grey area, and I think it’s going to work its way through the courts.
Suleyman's answer seems to suggest that Microsoft and perhaps other generative AI companies believe nearly everything on the open web can be used to train their models, and that they don't have to compensate the creators of that content. That contention will likely prompt even more legal challenges in the months and years to come.