quantization - Cloud Security Alliance News Clipping Site

Simon Willison’s Weblog: llm-gguf 0.2, now with embeddings

Nov 21, 2024

—

by

Source URL: https://simonwillison.net/2024/Nov/21/llm-gguf-embeddings/#atom-everything Source: Simon Willison’s Weblog Title: llm-gguf 0.2, now with embeddings Feedly Summary: llm-gguf 0.2, now with embeddings This new release of my llm-gguf plugin – which adds support for locally hosted GGUF LLMs – adds a new feature: it now supports embedding models distributed as GGUFs as well. This means you can…

Hacker News: Binary vector embeddings are so cool

Nov 11, 2024

—

by

system automation

in Uncategorized

Source URL: https://emschwartz.me/binary-vector-embeddings-are-so-cool/ Source: Hacker News Title: Binary vector embeddings are so cool Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses binary quantized vector embeddings, emphasizing their ability to retain high accuracy while dramatically reducing storage size for machine learning applications. This topic is particularly relevant for AI and infrastructure security…

Hacker News: SVDQuant: 4-Bit Quantization Powers 12B Flux on a 16GB 4090 GPU with 3x Speedup

Nov 9, 2024

—

by

system automation

in Uncategorized

Source URL: https://hanlab.mit.edu/blog/svdquant Source: Hacker News Title: SVDQuant: 4-Bit Quantization Powers 12B Flux on a 16GB 4090 GPU with 3x Speedup Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The provided text discusses the innovative SVDQuant paradigm for post-training quantization of diffusion models, which enhances computational efficiency by quantizing both weights and activations to…

Hacker News: Tencent drops a 389B MoE model(Open-source and free for commercial use))

Nov 5, 2024

—

by

system automation

in Uncategorized

Source URL: https://github.com/Tencent/Tencent-Hunyuan-Large Source: Hacker News Title: Tencent drops a 389B MoE model(Open-source and free for commercial use)) Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text introduces the Hunyuan-Large model, the largest open-source Transformer-based Mixture of Experts (MoE) model, developed by Tencent, which boasts 389 billion parameters, optimizing performance while managing resource…

Hacker News: An embarrassingly simple approach to recover unlearned knowledge for LLMs

Nov 4, 2024

—

by

system automation

in Uncategorized

Source URL: https://arxiv.org/abs/2410.16454 Source: Hacker News Title: An embarrassingly simple approach to recover unlearned knowledge for LLMs Feedly Summary: Comments AI Summary and Description: Yes Summary: The text focuses on the challenge of “unlearning” in large language models (LLMs), specifically addressing the effectiveness of current unlearning methods in truly erasing unwanted knowledge. It highlights a…

Simon Willison’s Weblog: SmolLM2

Nov 2, 2024

—

by

system automation

in Uncategorized

Source URL: https://simonwillison.net/2024/Nov/2/smollm2/#atom-everything Source: Simon Willison’s Weblog Title: SmolLM2 Feedly Summary: SmolLM2 New from Loubna Ben Allal and her research team at Hugging Face: SmolLM2 is a family of compact language models available in three size: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough…

The Register: Apple throws shade on pokey AI PCs, claims its maxed out M4 chips are 4x faster

Oct 31, 2024

—

by

system automation

in Uncategorized

Source URL: https://www.theregister.com/2024/10/31/apple_m4_ai_chip/ Source: The Register Title: Apple throws shade on pokey AI PCs, claims its maxed out M4 chips are 4x faster Feedly Summary: Busy week for Cupertino sees shrunken Mac minis, updated lappies, and new SoCs With the arrival of its M4 silicon on the Mac this week, Apple wants the world to…

The Cloudflare Blog: Building Vectorize, a distributed vector database, on Cloudflare’s Developer Platform

Oct 28, 2024

—

by

system automation

in Uncategorized

Source URL: https://blog.cloudflare.com/building-vectorize-a-distributed-vector-database-on-cloudflare-developer-platform Source: The Cloudflare Blog Title: Building Vectorize, a distributed vector database, on Cloudflare’s Developer Platform Feedly Summary: Vectorize was recently upgraded and made generally available, now supporting indexes of up to 5 million vectors, delivering faster responses, with lower pricing and a free tier. This post dives deep into how we built…

Cloud Blog: AI Hypercomputer software updates: Faster training and inference, a new resource hub, and more

Oct 25, 2024

—

by

system automation

in Uncategorized

Source URL: https://cloud.google.com/blog/products/compute/updates-to-ai-hypercomputer-software-stack/ Source: Cloud Blog Title: AI Hypercomputer software updates: Faster training and inference, a new resource hub, and more Feedly Summary: The potential of AI has never been greater, and infrastructure plays a foundational role in driving it forward. AI Hypercomputer is our supercomputing architecture based on performance-optimized hardware, open software, and flexible…

Hacker News: VPTQ: Extreme low-bit Quantization for real LLMs

Oct 21, 2024

—

by

system automation

in Uncategorized

Source URL: https://github.com/microsoft/VPTQ Source: Hacker News Title: VPTQ: Extreme low-bit Quantization for real LLMs Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text discusses a novel technique called Vector Post-Training Quantization (VPTQ) designed for compressing Large Language Models (LLMs) to extremely low bit-widths (under 2 bits) without compromising accuracy. This innovative method can…

Tag: quantization