Open-source artificial intelligence (AI), not to be confused with the company OpenAI, is all about applying open-source practices to create AI technologies. It's a way of developing AI resources by sharing and collaborating.
Here's a simple analogy to understand this concept better. Think of the different flavors of your favorite ice cream: the AI tools and tech that companies share openly come in just as many variations.
Usually, big companies keep some AI technologies and innovations to themselves as closed products for a competitive advantage. This may not be the best strategy. Open-source AI tools could outpace the closed ones in the long run.
Large language models, machine translation tools, and those ever-helpful chatbots are the most famous kinds of open-source AI projects. Developers of open-source AI have to place some serious trust in the various other open-source software bits and bobs that play a role in the development process. Let's talk a bit more about this topic.
The Origin of Artificial Intelligence: Open Source Community Influence
Open source and artificial intelligence have grown up side by side. Richard M. Stallman, who joined MIT's Artificial Intelligence Lab in 1971, laid the groundwork there for what would become the transformative idea of Free Software, which eventually evolved into the powerhouse we know as open source.
Meanwhile, Alan Turing, the father of modern AI, set the stage in 1950 with his groundbreaking paper introducing the Turing Test. Challenges in AI development persisted through the 1980s, but the later emergence of crucial elements like Big Data brought new possibilities.
Despite his initial reservations about open source, Bill Gates now recognizes its significance in AI. Hugging Face's Transformers is a go-to open-source library for running and fine-tuning generative AI models like Llama 2.
Two noteworthy frameworks, TensorFlow and PyTorch, born at Google and Facebook, respectively, fuel the engines of AI, supporting creations like ChatGPT.
OpenAI, once an open-source advocate, shifted gears, sparking Elon Musk's criticism. Openly released LLMs like Falcon 180B exist, but many commercial giants, despite training on open data sources like Common Crawl, aren't fully embracing open source. A leaked Google document hints at a rising force, the open-source community, quietly challenging the status quo in the Generative AI arms race.
Why AI Should Be Open Source
President Joe Biden's executive order ushered in a wave of new rules for using AI, acknowledging its transformative impact spanning healthcare, education, and entertainment. The big debate in the AI community centers around whether the core principles, algorithms, and datasets should be kept under wraps or adopted as open source.
Advocates for the latter point to significant advantages. When it comes to reducing bias, open-source AI takes the lead, using transparency, audits, and community involvement as powerful tools.
Making AI models openly available for inspection provides a check against bias in both training data and design. On the flip side, closed-source AI, often criticized for biased outcomes, lacks this level of scrutiny.
The advancement of science is another arena where open-source AI shines, fostering collaboration through platforms like Google's TensorFlow and Meta's PyTorch. The influential role of open-source projects like Hugging Face in diverse scientific disciplines further underscores the power of shared knowledge.
Lastly, open-source AI succeeds by creating new standards. It has focused discussions on ethics, bias, fairness, and transparency, while closed-source AI struggles to keep up with both innovation and ethical responsibility.
As an interesting twist, the recent executive order, with its stringent reporting and approval requirements, contrasts starkly with the fluid and democratizing nature of open-source AI, which seeks to break a status quo that favors a select few corporations.
Challenging the Notion of 'Open Source' in AI
ChatGPT has basically revolutionized accessibility, bringing powerful AI capabilities within reach for everyone. Notably, Meta did the same thing with the Llama model, which opened doors for modification and reuse.
They released the Llama 2 model, a more robust model available for download, modification, and reuse, while also introducing Code Llama, specifically tailored for coding tasks. This shift aligns with an overarching open-source approach, promoting democratic access, transparency, and enhanced security in AI.
However, as researchers from Carnegie Mellon University, the AI Now Institute, and the Signal Foundation point out, not all models labeled as "open" truly embody openness.
For instance, while Llama 2 is free to modify, its license carries restrictions on commercial use and on certain purposes, a reminder of how nuanced licensing is in the AI landscape.
The discourse extends to broader challenges in open-source AI, including the secrecy of training data, corporate control of software frameworks like TensorFlow and PyTorch, the high cost of computing power, and the limited availability of human resources for model refinement.
These complexities raise concerns about potential centralized power dynamics and prompt calls for meaningful alternatives and regulatory measures, such as antitrust reforms, to ensure a more inclusive and accessible future for AI, unlocking its full potential while addressing inherent risks.
What characterizes the 'open' in 'OpenAI'?
The tech world is all about openness, but things get a bit hazy when it comes to AI. We've got clear definitions for open source, open research, and open data, but 'open AI' is still a puzzle.
For researchers, it might mean teaming up for collaborative science, while for others, it could be about freely using and sharing AI models. As more jump on the 'open' bandwagon, we realize we need a solid definition for everyone to follow.
Back in 2015, OpenAI set out to create value for everyone, sharing patents as they went. Fast forward to 2019, and they flipped from a non-profit to a "capped" for-profit. GPT-3, their language model, was locked behind a commercial API gate in 2020, with the code exclusively licensed to their big investor, Microsoft.
And now, their DALL·E is taking a commercial turn, waving goodbye to some freebies and welcoming a 'freemium' business model.
What is LLaMA (Large Language Model Meta AI)?
LLaMA (Large Language Model Meta AI) is a family of large language models (LLMs) that Meta AI introduced in February 2023.
The Open Source Initiative (OSI) challenges the assertion of LLaMA 2 being Open Source, contending that "Meta's licensing terms for the LLaMa models and code fail to meet standard open-source criteria.
Specifically, it imposes limitations on certain users' commercial use and restricts the model and software usage for specific purposes, as outlined in the Acceptable Use Policy."
Over at Google, the Ethical AI team led by Timnit Gebru and Margaret Mitchell changed things in 2021. They dropped a paper highlighting the not-so-great aspects of large language models (LLMs), sparking a fiery debate on ethics in big tech.
Things got messy. Gebru went on to found the Distributed AI Research Institute (DAIR), while Mitchell became Chief Ethics Scientist at Hugging Face. Hugging Face's BigScience workshop, with over 1,000 researchers, birthed BLOOM, a multilingual LLM, now open for download on Hugging Face.
What makes BLOOM shine? It's trained on a mind-boggling 46 human languages and 13 programming languages using a French supercomputer with 28 petaflops. This LLM is a pioneer in building models openly, revealing the magic behind the model creation process.
It's not just for big tech; BLOOM shares the power with communities outside, incorporating their needs through regional working groups.
10 Leading Open Source AI Frameworks and Tools for Developers
Open-source AI puts a myriad of frameworks and tools at your disposal. So let me suggest 10 game-changing tools for intelligent system development.
- TensorFlow: A comprehensive neural network framework developed by Google, supporting multiple programming languages like Python and JavaScript. Its robust support structure and diverse training resources make it an excellent starting point for AI development.
- PyTorch: Developed by Facebook's AI Research lab, PyTorch is a user-friendly platform preferred for various AI projects, especially in machine learning. Its object-oriented approach streamlines code development for enhanced collaboration.
- Keras: A user-friendly, high-level Application Programming Interface (API) for building and training deep learning models quickly and easily. Ideal for programmers who value a streamlined interface.
- OpenAI: A pioneer in Natural Language Processing (NLP), OpenAI's Codex model transforms natural language into code in your specified programming language. The platform makes its models available to the public through an API, although, as noted above, it is no longer open source in the strict sense.
- OpenCV: A renowned library specializing in computer vision, written primarily in C++ and portable to various platforms. Perfect for AI developers focusing on computer vision applications.
- H2O.ai: Claiming to be the fastest and most accurate AI platform globally, H2O.ai aims to democratize AI accessibility—a solid choice for companies prioritizing development speed to impact the world positively.
- Rasa: An excellent tool for building conversational AI, designed to be future-proof. Rasa allows integration with any NLP or ML model, ensuring increasingly accurate results as technology advances.
- Amazon Web Services (AWS): A familiar cloud platform for developers, offering the ability to run code, store results, and add value-added features for business marketing. It is a commercial service rather than an open-source project, but it remains a comprehensive tool for AI development.
- GitHub: The go-to platform for organized collaborative work, essential for individuals or teams working on projects where many hands touch the same code.
- Scikit-Learn: A powerful open-source tool for machine learning and predictive data analysis, built on Python libraries like NumPy and SciPy. Offers a variety of algorithms for tasks such as classification, regression, clustering, and more.
BigScience, in the mix, blurs the lines between AI creators, funders, users, regulators, and the communities feeling the impact. They birthed a next-gen LLM not just with tech wizards but a mix of librarians, lawyers, engineers, and public servants. It's a shift from the old 'move fast, break things' mantra to a new one: 'move together, build the right thing.'
The Alan Turing Institute, the UK's national institute for data science and AI, is on a mission. Their focus? Best practices for open and responsible AI innovation. They've been in the room with BigScience's Data Governance team, discussing how to ensure tech initiatives connect with existing communities and how to bridge the gap between ethics principles and ethical AI development.