Tag: large language models

  • Simon Willison’s Weblog: Project: VERDAD – tracking misinformation in radio broadcasts using Gemini 1.5

    Source URL: https://simonwillison.net/2024/Nov/7/project-verdad/#atom-everything Source: Simon Willison’s Weblog Title: Project: VERDAD – tracking misinformation in radio broadcasts using Gemini 1.5 Feedly Summary: I’m starting a new interview series called Project. The idea is to interview people who are building interesting data projects and talk about what they’ve built, how they built it, and what they learned…

  • Schneier on Security: Prompt Injection Defenses Against LLM Cyberattacks

    Source URL: https://www.schneier.com/blog/archives/2024/11/prompt-injection-defenses-against-llm-cyberattacks.html Source: Schneier on Security Title: Prompt Injection Defenses Against LLM Cyberattacks Feedly Summary: Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks“: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense…

  • Schneier on Security: Subverting LLM Coders

    Source URL: https://www.schneier.com/blog/archives/2024/11/subverting-llm-coders.html Source: Schneier on Security Title: Subverting LLM Coders Feedly Summary: Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection“: Abstract: Large Language Models (LLMs) have transformed code com- pletion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often…

  • Hacker News: Evaluating the World Model Implicit in a Generative Model

    Source URL: https://arxiv.org/abs/2406.03689 Source: Hacker News Title: Evaluating the World Model Implicit in a Generative Model Feedly Summary: Comments AI Summary and Description: Yes Summary: This paper delves into the evaluation of world models implicitly learned by generative models, particularly large language models (LLMs). It highlights the potential limitations and fragilities of these models in…

  • Simon Willison’s Weblog: yet-another-applied-llm-benchmark

    Source URL: https://simonwillison.net/2024/Nov/6/yet-another-applied-llm-benchmark/#atom-everything Source: Simon Willison’s Weblog Title: yet-another-applied-llm-benchmark Feedly Summary: yet-another-applied-llm-benchmark Nicholas Carlini introduced this personal LLM benchmark suite back in February as a collection of over 100 automated tests he runs against new LLM models to evaluate their performance against the kinds of tasks he uses them for. There are two defining features…

  • Hacker News: WebRL: Training LLM Web Agents via Self-Evolving Online Reinforcement Learning

    Source URL: https://arxiv.org/abs/2411.02337 Source: Hacker News Title: WebRL: Training LLM Web Agents via Self-Evolving Online Reinforcement Learning Feedly Summary: Comments AI Summary and Description: Yes Summary: The paper introduces WebRL, a novel framework that employs self-evolving online curriculum reinforcement learning to enhance the training of large language models (LLMs) as web agents. This development is…

  • Hacker News: Generative AI Has an E-Waste Problem

    Source URL: https://spectrum.ieee.org/e-waste Source: Hacker News Title: Generative AI Has an E-Waste Problem Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses a significant increase in private investment in generative AI and its substantial impact on the production of electronic waste (e-waste), particularly focusing on large language models (LLMs). It highlights the…

  • Simon Willison’s Weblog: Claude 3.5 Haiku

    Source URL: https://simonwillison.net/2024/Nov/4/haiku/#atom-everything Source: Simon Willison’s Weblog Title: Claude 3.5 Haiku Feedly Summary: Anthropic released Claude 3.5 Haiku today, a few days later than expected (they said it would be out by the end of October). I was expecting this to be a complete replacement for their existing Claude 3 Haiku model, in the same…