Meta recently made headlines for using public data from Facebook and Instagram to train its cutting-edge Meta AI virtual assistant. But what exactly does this mean, and how does it impact user privacy and the ongoing debate about copyright infringement? Let me break it down for you.
A few weeks back, Meta's CEO, Mark Zuckerberg, announced a new AI assistant tool designed to help users with everyday tasks, much like other mainstream consumer AI assistants.
Meta Platforms took a clear stance on privacy when developing Meta AI: the company consciously decided to exclude private posts shared only among family and friends from its training data.
In an interview with Reuters, Meta's President of Global Affairs, Nick Clegg, emphasized their commitment to respecting consumers' privacy. This privacy-first approach extends to the avoidance of private chats on their messaging services as well.
Clegg highlighted their efforts to filter out private details from the public datasets used for training. He stated, "We've tried to exclude datasets that have a heavy preponderance of personal information."
According to Clegg, the "vast majority" of the data used by Meta for training was publicly available. Notably, they chose to steer clear of LinkedIn, citing privacy concerns, as an example of a website whose content they deliberately avoided.
Meta Platforms, along with tech giants like OpenAI and Alphabet's Google, has faced criticism for using data scraped from the internet without permission to train its AI models.
One of the key challenges these companies face is how to navigate the territory of private or copyrighted materials inadvertently included in their AI systems' training data. They are also confronted with legal battles as authors accuse them of infringing copyrights.
Among Meta Platforms' array of consumer-facing AI tools, Meta AI stands out as the most significant. CEO Mark Zuckerberg introduced this groundbreaking technology at the Meta annual products conference, Connect. Unlike previous conferences that focused on augmented and virtual reality, this year's event was dominated by discussions about artificial intelligence.
Meta AI was created using a custom model based on the powerful Llama 2 large language model, which the company made available for public commercial use in July.
Meta also developed a new model called Emu to generate images in response to text prompts. Meta AI's capabilities extend to generating text, audio, and imagery, along with real-time information access through a partnership with Microsoft's Bing search engine.
The data used to train Meta AI primarily consisted of public Facebook and Instagram posts encompassing text and photos. The image generation aspect of the product, powered by Emu, relied on this dataset. On the other hand, the chat functions were built using Llama 2, supplemented by publicly available and annotated datasets, as confirmed by a Meta spokesperson in a conversation with Reuters.
To ensure the responsible use of their technology, Meta imposed safety restrictions on Meta AI, including a ban on generating photo-realistic images of public figures. However, the issue of copyrighted materials remains a complex and contentious matter.
Clegg acknowledged the potential for a "fair amount of litigation" surrounding the question of whether creative content falls under the existing fair use doctrine, which permits limited use of protected works for purposes such as commentary, research, and parody. He expressed Meta's belief in its stance but anticipates legal challenges in the future.
As the debate around AI, privacy, and copyright continues, numerous authors, artists, and developers have raised concerns about their work being used without their consent to train technologies that could potentially undermine their careers.
Remember: according to Meta's policies, Facebook and Instagram users retain ownership of their content, provided it doesn't infringe on someone else's intellectual property rights.