memory requirements - Cloud Security Alliance News Clipping Site

Cloud Blog: How to deploy and serve multi-host gen AI large open models over GKE

Nov 8, 2024


Source URL: https://cloud.google.com/blog/products/ai-machine-learning/deploy-and-serve-open-models-over-google-kubernetes-engine/
Source: Cloud Blog
Title: How to deploy and serve multi-host gen AI large open models over GKE
Feedly Summary: Context: As generative AI experiences explosive growth fueled by advancements in LLMs (Large Language Models), access to open models is more critical than ever for developers. Open models are publicly available pre-trained foundational…

Hacker News: VPTQ: Extreme low-bit Quantization for real LLMs

Oct 21, 2024, by system automation, in Uncategorized

Source URL: https://github.com/microsoft/VPTQ
Source: Hacker News
Title: VPTQ: Extreme low-bit Quantization for real LLMs
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: The text discusses a novel technique called Vector Post-Training Quantization (VPTQ) designed for compressing Large Language Models (LLMs) to extremely low bit-widths (under 2 bits) without compromising accuracy. This innovative method can…

Hacker News: NanoGPT (124M) quality in 3.25B training tokens (vs. 10B)

Oct 12, 2024, by system automation, in Uncategorized

Source URL: https://github.com/KellerJordan/modded-nanogpt
Source: Hacker News
Title: NanoGPT (124M) quality in 3.25B training tokens (vs. 10B)
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: The provided text outlines a modified PyTorch trainer for GPT-2 that achieves training efficiency improvements through architectural updates and a novel optimizer. This is relevant for professionals in AI and…

Simon Willison’s Weblog: Quoting Magic AI

Aug 30, 2024, by system automation, in Uncategorized

Source URL: https://simonwillison.net/2024/Aug/30/magic-ai/#atom-everything
Source: Simon Willison’s Weblog
Title: Quoting Magic AI
Feedly Summary: We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels. For each decoded token, LTM-2-mini’s sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for…
