Wired: A New Group Is Trying to Make AI Data Licensing Ethical

Source URL: https://www.wired.com/story/dataset-providers-alliance-ethical-generative-ai-licensing/
Source: Wired
Title: A New Group Is Trying to Make AI Data Licensing Ethical

Feedly Summary: The Dataset Providers Alliance calls for creators and rights holders to be able to opt in to having their material used for training purposes.

AI Summary and Description: Yes

Summary: The text discusses the evolving landscape of data sourcing for generative AI, emphasizing the shift from publicly available data to licensed data and the emergence of the Dataset Providers Alliance (DPA), which advocates for an ethical opt-in approach over existing opt-out mechanisms. This development has significant implications for the intersection of AI, copyright, and ethical compliance in the industry.

Detailed Description:
The text outlines a pivotal moment in the generative AI domain, highlighting a shift in how data is sourced and licensed for AI training. Key points include:

– **Transition in Data Sourcing**: Generative AI tools were initially trained on broadly scraped public data, but more data sources now restrict access and demand licensing agreements. This has created a role for new licensing startups that broker access to data.

– **Formation of the Dataset Providers Alliance (DPA)**: The alliance was established to create a more standardized and fair system for data usage in AI. Its members are companies focused on licensing, including Rightsify and Calliope Networks.

– **Advocacy for Opt-In System**:
– The DPA recommends an opt-in approach, where data can only be utilized after explicit consent from creators and rights holders.
– This contrasts starkly with the practice of many AI companies, which either operate on an opt-out basis or offer no opt-out at all, placing the onus on creators to manage how their data is used.
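
The difference between the two approaches comes down to how silence is treated. The following is a minimal sketch, not anything specified by the DPA: the `Work` record, its `consent` field, and the sample catalog are all hypothetical, and a real system would also need to track who holds the rights and how consent was recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical record for one creative work (illustration only)."""
    title: str
    # True = explicit consent, False = explicit refusal,
    # None = the rights holder has never stated a preference.
    consent: Optional[bool] = None


def usable_opt_in(work: Work) -> bool:
    # Opt-in: silence counts as "no"; only explicit consent allows use.
    return work.consent is True


def usable_opt_out(work: Work) -> bool:
    # Opt-out: silence counts as "yes"; only an explicit refusal blocks use.
    return work.consent is not False


# Hypothetical catalog with one work in each consent state.
catalog = [
    Work("song_a", consent=True),
    Work("song_b", consent=False),
    Work("song_c"),  # no preference recorded
]

opt_in_set = [w.title for w in catalog if usable_opt_in(w)]
opt_out_set = [w.title for w in catalog if usable_opt_out(w)]

print(opt_in_set)   # ['song_a']
print(opt_out_set)  # ['song_a', 'song_c']
```

The two rules disagree only on `song_c`, the work whose rights holder never responded. That is the case the opt-in and opt-out camps are arguing over.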

– **Industry Perspectives**:
– Alex Bestall, CEO of Rightsify, argues that moving to an opt-in framework enhances credibility and reduces legal risk, stating that reliance on publicly available datasets could lead to substantial lawsuits.
– Ed Newton-Rex emphasizes that the opt-out approach is unfair to creators, who may not know that they can opt out.

– **Challenges Faced**:
– Shayne Longpre questions whether an opt-in strategy is feasible given the vast amounts of data that modern AI models require. He suggests it might limit data availability and disadvantage smaller companies relative to large tech companies that can afford to license data comprehensively.

– **DPA’s Position on Licensing Models**: The alliance proposes various compensation structures, such as:
– **Subscription-based licensing**
– **Usage-based licensing** (fee per use)
– **Outcome-based licensing** (royalties tied to profits)

Each of these models aims to ensure fair compensation for data providers across multiple creative fields, including music and visual media.

Overall, this change in licensing practices marks a shift toward ethical data sourcing in AI and raises questions about compliance, creator rights, and market dynamics. For security and compliance professionals, these developments underscore the need to track data governance, copyright compliance, and ethical AI practices.