In January, a lawsuit accused Meta of training its AI models on a dataset of pirated ebooks and articles. Newly unsealed emails now provide further evidence in the copyright case brought by book authors, who allege that Meta illegally trained its AI models on pirated books.
The emails show that Meta admitted to torrenting LibGen, a large and controversial dataset containing tens of millions of pirated books. According to the authors' court filing, Meta torrented at least 81.7 terabytes of data from multiple shadow libraries through the site Anna's Archive, including at least 35.7 terabytes from Z-Library and LibGen. The company had also previously torrented 80.6 terabytes of data from LibGen.
The authors described the magnitude of Meta's unlawful torrenting scheme as astonishing, noting that "vastly smaller acts of data piracy—just .008 percent of the amount of copyrighted works Meta pirated—have resulted in Judges referring the conduct to the US Attorneys' office for criminal investigation."
The emails also show that Meta employees were aware of the legal risks. In April 2023, Nikolay Bashlykov, a research engineer at Meta, raised concerns in an internal message about using Meta IP addresses "to load through torrents pirate content," adding that "torrenting from a corporate laptop doesn't feel right."
By September 2023, Bashlykov had stepped up his protests and consulted with the legal team. "Using torrents would mean 'seeding' the files—i.e., sharing the content outside. This could be legally not OK," he wrote.
Despite these warnings, the authors argue, Meta chose to conceal its seeding activity, adjusting its torrent settings to keep seeding to a minimum. The company also allegedly downloaded the dataset to non-Meta servers to reduce the risk that anyone could "trace back the seeder/downloader" from Meta servers.
Source: Ars Technica