Tag: jailbreaking
-
Hacker News: SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Source URL: https://arxiv.org/abs/2310.03684
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: This text presents “SmoothLLM,” an innovative algorithm designed to enhance the security of Large Language Models (LLMs) against jailbreaking attacks, which manipulate models into producing undesirable content. The proposal highlights a…
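Going by the abstract, the defense randomly perturbs many copies of the incoming prompt at the character level and aggregates the model's responses, exploiting the brittleness of adversarial suffixes to such perturbations. A minimal Python sketch of that idea, not the paper's actual implementation: `llm` is a placeholder callable, and `is_refusal` is a hypothetical keyword-based vote.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly replace a fraction q of characters (cf. the paper's swap perturbation)."""
    chars = list(prompt)
    n_swap = max(1, int(len(chars) * q))
    for i in random.sample(range(len(chars)), n_swap):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def is_refusal(response: str) -> bool:
    """Hypothetical detector: stock refusal phrases count as a 'not jailbroken' vote."""
    return any(k in response for k in ("I'm sorry", "I cannot", "I can't"))

def smoothllm(llm, prompt: str, n_copies: int = 10, q: float = 0.1) -> str:
    """Query the model on n perturbed copies; return a response from the majority side."""
    responses = [llm(perturb(prompt, q)) for _ in range(n_copies)]
    majority_refuses = sum(map(is_refusal, responses)) > n_copies / 2
    return next(r for r in responses if is_refusal(r) == majority_refuses)

# Toy stand-in model that always refuses, just to exercise the loop:
print(smoothllm(lambda p: "I'm sorry, I can't help with that.", "hypothetical adversarial prompt"))
```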
-
The Register: Letting chatbots run robots ends as badly as you’d expect
Source URL: https://www.theregister.com/2024/11/16/chatbots_run_robots/
Feedly Summary: LLM-controlled droids easily jailbroken to perform mayhem, researchers warn. Science fiction author Isaac Asimov proposed three laws of robotics, and you’d never know it from the behavior of today’s robots or those making them.…
AI Summary…
-
The Register: How to jailbreak ChatGPT and trick the AI into writing exploit code using hex encoding
Source URL: https://www.theregister.com/2024/10/29/chatgpt_hex_encoded_jailbreak/
Feedly Summary: ‘It was like watching a robot going rogue,’ says researcher. OpenAI’s language model GPT-4o can be tricked into writing exploit code by encoding the malicious instructions in hexadecimal, which allows an…
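The trick, as described, is that guardrails screen the surface text of the prompt, while GPT-4o will readily decode hexadecimal and then act on the recovered instruction. A minimal sketch of the encoding step, using a harmless placeholder string rather than real exploit instructions:

```python
# Harmless placeholder standing in for a malicious instruction.
instruction = "print('hello')"

# What the attacker sends: the instruction as bare hex digits,
# which content filters see only as an opaque string.
hex_payload = instruction.encode("utf-8").hex()
print(hex_payload)  # 7072696e74282768656c6c6f2729

# What the model effectively does when asked to decode and follow it.
decoded = bytes.fromhex(hex_payload).decode("utf-8")
assert decoded == instruction
```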
-
Slashdot: LLM Attacks Take Just 42 Seconds On Average, 20% of Jailbreaks Succeed
Source URL: https://it.slashdot.org/story/24/10/12/213247/llm-attacks-take-just-42-seconds-on-average-20-of-jailbreaks-succeed?utm_source=rss1.0mainlinkanon&utm_medium=feed
AI Summary and Description: Yes
Summary: The article discusses findings from Pillar Security’s report on attacks against large language models (LLMs), revealing that such attacks are not only alarmingly quick but also frequently result…
-
The Register: Anthropic’s Claude vulnerable to ‘emotional manipulation’
Source URL: https://www.theregister.com/2024/10/12/anthropics_claude_vulnerable_to_emotional/
Feedly Summary: AI model safety only goes so far. Anthropic’s Claude 3.5 Sonnet, despite its reputation as one of the better behaved generative AI models, can still be convinced to emit racist hate speech and malware.…
AI Summary and Description: Yes
Summary:…
-
Slashdot: OpenAI Threatens To Ban Users Who Probe Its ‘Strawberry’ AI Models
Source URL: https://slashdot.org/story/24/09/18/1858224/openai-threatens-to-ban-users-who-probe-its-strawberry-ai-models?utm_source=rss1.0mainlinkanon&utm_medium=feed
AI Summary and Description: Yes
Summary: The text discusses OpenAI’s recent efforts to obscure the workings of its “Strawberry” AI model family, particularly the o1-preview and o1-mini models, which are equipped with new reasoning abilities. OpenAI…