Tag: training practices
-
Wired: New York Times Says OpenAI Erased Potential Lawsuit Evidence
Source URL: https://www.wired.com/story/new-york-times-openai-erased-potential-lawsuit-evidence/ Source: Wired Title: New York Times Says OpenAI Erased Potential Lawsuit Evidence Feedly Summary: As part of an ongoing copyright lawsuit, The New York Times says it spent 150 hours sifting through OpenAI’s training data looking for potential evidence—only for OpenAI to delete all of its work. AI Summary and Description: Yes…
-
Slashdot: AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models
Source URL: https://news.slashdot.org/story/24/11/16/0326222/ai-lab-pleias-releases-fully-open-dataset-as-amd-ai2-release-open-ai-models?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models Feedly Summary: AI Summary and Description: Yes Summary: The text outlines PleIAs’ commitment to open training for large language models (LLMs) through the release of Common Corpus, highlighting the significance of open data for LLM…
-
Simon Willison’s Weblog: Releasing the largest multilingual open pretraining dataset
Source URL: https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/#atom-everything Source: Simon Willison’s Weblog Title: Releasing the largest multilingual open pretraining dataset Feedly Summary: Common Corpus is a new “open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)” released by French AI Lab PleIAs. This appears to be the largest available…
-
Hacker News: Data movement bottlenecks to large-scale model training: Scaling past 1e28 FLOP
Source URL: https://epochai.org/blog/data-movement-bottlenecks-scaling-past-1e28-flop Source: Hacker News Title: Data movement bottlenecks to large-scale model training: Scaling past 1e28 FLOP Feedly Summary: AI Summary and Description: Yes Summary: The provided text explores the limitations and challenges of scaling large language models (LLMs) in distributed training environments. It highlights critical technological constraints related to data movement both…
-
Wired: OpenAI Scored a Legal Win Over Progressive Publishers—but the Fight’s Not Finished
Source URL: https://www.wired.com/story/opena-alternet-raw-story-copyright-lawsuit-dmca-standing/ Source: Wired Title: OpenAI Scored a Legal Win Over Progressive Publishers—but the Fight’s Not Finished Feedly Summary: A judge tossed out a case against OpenAI brought by Alternet and Raw Story, in what could be a significant ruling in the larger battle between AI companies and publishers. AI Summary and Description: Yes…