Tag: Frontier Models

  • METR Blog – METR: An update on our general capability evaluations

    Source URL: https://metr.org/blog/2024-08-06-update-on-evaluations/
    Source: METR Blog – METR
    Title: An update on our general capability evaluations
    Summary: The provided text discusses the development of evaluation metrics for AI capabilities, particularly focusing on autonomous systems. It aims to create measures that can assess general autonomy rather than solely relying…

  • Hacker News: IBM’s new SWE agents for developers

    Source URL: https://research.ibm.com/blog/ibm-swe-agents
    Source: Hacker News
    Title: IBM’s new SWE agents for developers
    Summary: IBM has introduced a novel set of AI agents called SWE Agents designed to streamline the bug-fixing process for software developers using GitHub. These agents leverage open LLMs to automate the localization of…

  • Hacker News: Sabotage Evaluations for Frontier Models

    Source URL: https://www.anthropic.com/research/sabotage-evaluations
    Source: Hacker News
    Title: Sabotage Evaluations for Frontier Models
    Summary: The text outlines a comprehensive series of evaluation techniques developed by the Anthropic Alignment Science team to assess potential sabotage capabilities in AI models. These evaluations are crucial for ensuring the safety and integrity…