Tag: dsl

  • Simon Willison’s Weblog: yet-another-applied-llm-benchmark

    Source URL: https://simonwillison.net/2024/Nov/6/yet-another-applied-llm-benchmark/#atom-everything Source: Simon Willison’s Weblog Title: yet-another-applied-llm-benchmark Feedly Summary: yet-another-applied-llm-benchmark Nicholas Carlini introduced this personal LLM benchmark suite back in February as a collection of over 100 automated tests he runs against new LLM models to evaluate their performance against the kinds of tasks he uses them for. There are two defining features…