Schneier on Security: AI Industry is Trying to Subvert the Definition of “Open Source AI”

Source URL: https://www.schneier.com/blog/archives/2024/11/ai-industry-is-trying-to-subvert-the-definition-of-open-source-ai.html
Source: Schneier on Security
Title: AI Industry is Trying to Subvert the Definition of “Open Source AI”

Feedly Summary: The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since for a neural network, the training data is the source code—it’s how the model gets programmed—the definition makes no sense.
And it’s confusing; most “open source” AI models—like LLAMA—are open source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one …

AI Summary and Description: Yes

Summary: The text critiques the Open Source Initiative’s definition of “open source AI,” arguing that it compromises true open source principles by allowing for secrecy in training data and model development. It highlights the need for a clear distinction between genuine open source and models that only claim to be open source, advocating for real public AI options that respect privacy and legal restrictions on data sharing.

Detailed Description: The article discusses significant concerns regarding the Open Source Initiative’s (OSI) definition of “open source AI,” emphasizing the following key points:

– **Critique of OSI Definition**: The definition is viewed as inadequate because it permits secret training data and training mechanisms. Since, for a neural network, the training data is effectively the source code (it is how the model gets programmed), this allowance contradicts the essence of open source principles.

– **Misleading Open Source Models**: Many AI models labeled as “open source,” such as Meta’s Llama, are argued to be open source in name only. The OSI is perceived as having been co-opted by industry players that want both corporate secrecy and the “open source” label.

– **Advocacy for Public AI**: The author pushes for a genuine public option in AI that is rooted in true open source methodologies.

– **Need for Definitions**: The text acknowledges the existence of partially open models and argues for the establishment of clear definitions to categorize them appropriately.

– **Privacy-Preserving Techniques**: Privacy-preserving, federated learning approaches are highlighted as a welcome and necessary evolution in AI development practice (a minimal sketch follows this list).

– **Legal and Ethical Considerations**: The text points out that legal frameworks often restrict data sharing, particularly in sensitive areas like healthcare. It emphasizes the need to protect individual rights and indigenous knowledge from exploitation under vague open source definitions.

– **Proposed Terminology**: The author suggests the term “open weights” for models that publish only their trained parameters while withholding training data and code, reserving the “open source” label for models that fully embrace open source principles.
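
To make the federated learning point concrete, here is a minimal sketch of federated averaging (FedAvg), the canonical scheme in which clients train on data that never leaves their premises and a server aggregates only the resulting model parameters. This is an illustrative toy in plain NumPy, not code from the article or any particular framework; the function names (`local_step`, `fed_avg`) and the linear-regression setup are assumptions made for the example.

```python
import numpy as np

def local_step(weights, X, y, lr=0.1):
    """One full-batch gradient step of linear regression on a client's
    private data. The raw records (X, y) never leave the client."""
    preds = X @ weights
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def fed_avg(global_weights, client_datasets, rounds=50):
    """Each round: every client trains locally, then the server averages
    the returned weights. Only parameters cross the wire, never data."""
    w = global_weights
    for _ in range(rounds):
        client_updates = [local_step(w, X, y) for X, y in client_datasets]
        w = np.mean(client_updates, axis=0)  # server-side aggregation
    return w

# Toy usage: three "hospitals" with private data jointly fit one model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

shared_w = fed_avg(np.zeros(2), clients)
print(shared_w)  # converges toward [2.0, -1.0] without pooling raw data
```

Even this scheme shares model parameters, which can themselves leak information about the training data, which is why federated learning is usually paired with techniques such as differential privacy in practice; the sketch only illustrates the data-stays-with-the-client structure the article endorses.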

Overall, the text serves as a call to action for better definitions within the open source community, particularly in the realm of AI, urging vigilance against practices that conflate secrecy with openness. This is particularly relevant for professionals in AI, cloud computing, and compliance who navigate the complexities of data sharing, privacy regulations, and the ethical implications of AI development.