YouTube CEO Neal Mohan has said that using videos on the platform to train an artificial intelligence (AI) model would be a “clear violation” of YouTube’s terms of service. His comments come after OpenAI’s CTO said she “didn’t know” whether the company’s Sora video generator was trained on YouTube videos.
In an interview with Bloomberg, Mohan made his first public statements on OpenAI’s Sora, which was announced earlier this year to much fanfare and excitement.
Mohan said: “From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”
In March, OpenAI CTO Mira Murati spoke to The Wall Street Journal about the new generative AI tool, which can create videos up to a minute long. When asked about the training data for Sora, Murati said, “We used publicly available data and licensed data,” but she did not know whether that included content from YouTube, Instagram, and Facebook.
OpenAI has already faced questions about its training data
OpenAI has been opaque when it comes to the training data it uses to create its large language models (LLMs) and other generative AI tools. This has resulted in several lawsuits.
Comedian Sarah Silverman and a group of other authors are suing OpenAI under California’s unfair competition law, accusing the firm of using copyrighted materials in its training data.
The New York Times has also filed a lawsuit against the AI company for copyright infringement, asserting that OpenAI should be held responsible for damages caused by its allegedly unlawful use of copyrighted material.
The Wall Street Journal recently reported that OpenAI plans to use YouTube video transcripts to train GPT-5, suggesting that the company is not perturbed by the lawsuits.
Featured image credit: Ideogram