Tag: model safety
-
The Register: Voice-enabled AI agents can automate everything, even your phone scams
Source URL: https://www.theregister.com/2024/10/24/openai_realtime_api_phone_scam/
Source: The Register
Title: Voice-enabled AI agents can automate everything, even your phone scams
Feedly Summary: All for the low, low price of a mere dollar. Scammers, rejoice. OpenAI's real-time voice API can be used to build AI agents capable of conducting successful phone call scams for less than a dollar.…
-
METR Blog – METR: BIS Comment Regarding "Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters"
Source URL: https://downloads.regulations.gov/BIS-2024-0047-0048/attachment_1.pdf
Source: METR Blog – METR
Title: BIS Comment Regarding "Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters"
Feedly Summary: AI Summary and Description: Yes
Summary: The text discusses the Bureau of Industry and Security's proposed reporting requirements for advanced AI models and computing clusters, emphasizing…
-
Hacker News: Sabotage Evaluations for Frontier Models
Source URL: https://www.anthropic.com/research/sabotage-evaluations
Source: Hacker News
Title: Sabotage Evaluations for Frontier Models
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: The text outlines a comprehensive series of evaluation techniques developed by the Anthropic Alignment Science team to assess potential sabotage capabilities in AI models. These evaluations are crucial for ensuring the safety and integrity…
-
The Register: Anthropic's Claude vulnerable to 'emotional manipulation'
Source URL: https://www.theregister.com/2024/10/12/anthropics_claude_vulnerable_to_emotional/
Source: The Register
Title: Anthropic's Claude vulnerable to 'emotional manipulation'
Feedly Summary: AI model safety only goes so far. Anthropic's Claude 3.5 Sonnet, despite its reputation as one of the better behaved generative AI models, can still be convinced to emit racist hate speech and malware.…
AI Summary and Description: Yes
Summary:…
-
Hacker News: OpenAI and Anthropic agree to send models to US Government for safety evaluation
Source URL: https://venturebeat.com/ai/openai-and-anthropic-agree-to-send-models-to-us-government-for-safety-evaluations/
Source: Hacker News
Title: OpenAI and Anthropic agree to send models to US Government for safety evaluation
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: The text discusses a collaboration between OpenAI, Anthropic, and the AI Safety Institute under NIST focused on AI model safety research and evaluation. This agreement aims…