Cloud Security Alliance News Clipping Site

Tag: data licensing

Simon Willison’s Weblog: Releasing the largest multilingual open pretraining dataset

Nov 14, 2024, by system automation in Uncategorized

Source URL: https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/#atom-everything
Source: Simon Willison’s Weblog
Title: Releasing the largest multilingual open pretraining dataset
Feedly Summary: Common Corpus is a new “open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)” released by French AI lab PleIAs. This appears to be the largest available…

Wired: This Startup Wants YouTube Creators to Get Paid for AI Training Data

Sep 30, 2024, by system automation in Uncategorized

Source URL: https://www.wired.com/story/license-to-scrape-youtube-ai-data-license-creators/
Source: Wired
Title: This Startup Wants YouTube Creators to Get Paid for AI Training Data
Feedly Summary: While big platforms like Reddit have signed deals with the AI giants, YouTube leaves licensing in the hands of individual creators. The “License to Scrape” program aims to give those streaming stars proper leverage. AI…