AI & ML News

GeForce GPU giant has been data scraping 80 years' worth of videos every day for AI training to 'unlock various downstream applications critical to Nvidia'

Leaked documents reveal that Nvidia has used videos from YouTube, Netflix, and other sources to train an AI model for its Omniverse platform, autonomous-vehicle systems, and digital-avatar products. According to 404 Media, the scraping ran under an internal project named Cosmos, in which Nvidia used virtual machines on AWS to download more than 30 million URLs in a single month. Employees discussed copyright concerns and looked for ways to avoid direct violations, such as using Google's cloud service to download the YouTube-8M dataset. Nvidia says it complies with copyright law, even though some of the datasets it used were intended only for academic research, not commercial use. Nvidia also reportedly struggled to use gameplay footage from its own GeForce Now service because of engineering and regulatory obstacles.

Nvidia is not alone: OpenAI and Runway have also been accused of training AI models on protected material. Because AI models need vast amounts of data, the practice raises questions about the legality of using copyrighted works and personal data. In the EU, the GDPR strictly regulates the use of personal data, which poses potential legal risks for companies like Nvidia. The case points to a growing need for transparency in how AI models are trained, so that companies can be held accountable for following the law.
www.pcgamer.com