Tag: jailbreaking
-
Hacker News: SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Source URL: https://arxiv.org/abs/2310.03684
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: This text presents “SmoothLLM,” an innovative algorithm designed to enhance the security of Large Language Models (LLMs) against jailbreaking attacks, which manipulate models into producing undesirable content. The proposal highlights a…
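Going by the abstract, the defense randomly perturbs many copies of the incoming prompt at the character level and aggregates the model's responses, exploiting the brittleness of adversarial suffixes to such perturbations. A minimal Python sketch of that idea, not the paper's actual implementation: `llm` is a placeholder callable, and `is_refusal` is a hypothetical keyword-based vote.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly replace a fraction q of characters (cf. the paper's swap perturbation)."""
    chars = list(prompt)
    n_swap = max(1, int(len(chars) * q))
    for i in random.sample(range(len(chars)), n_swap):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def is_refusal(response: str) -> bool:
    """Hypothetical detector: stock refusal phrases count as a 'not jailbroken' vote."""
    return any(k in response for k in ("I'm sorry", "I cannot", "I can't"))

def smoothllm(llm, prompt: str, n_copies: int = 10, q: float = 0.1) -> str:
    """Query the model on n perturbed copies; return a response from the majority side."""
    responses = [llm(perturb(prompt, q)) for _ in range(n_copies)]
    majority_refuses = sum(map(is_refusal, responses)) > n_copies / 2
    return next(r for r in responses if is_refusal(r) == majority_refuses)

# Toy stand-in model that always refuses, just to exercise the loop:
print(smoothllm(lambda p: "I'm sorry, I can't help with that.", "hypothetical adversarial prompt"))
```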
-
The Register: Letting chatbots run robots ends as badly as you’d expect
Source URL: https://www.theregister.com/2024/11/16/chatbots_run_robots/
Feedly Summary: LLM-controlled droids easily jailbroken to perform mayhem, researchers warn. Science fiction author Isaac Asimov proposed three laws of robotics, and you’d never know it from the behavior of today’s robots or those making them.…
AI Summary…
-
The Register: How to jailbreak ChatGPT and trick the AI into writing exploit code using hex encoding
Source URL: https://www.theregister.com/2024/10/29/chatgpt_hex_encoded_jailbreak/
Feedly Summary: ‘It was like watching a robot going rogue,’ says researcher. OpenAI’s language model GPT-4o can be tricked into writing exploit code by encoding the malicious instructions in hexadecimal, which allows an…
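The trick, as described, is that guardrails screen the surface text of the prompt, while GPT-4o will readily decode hexadecimal and then act on the recovered instruction. A minimal sketch of the encoding step, using a harmless placeholder string rather than real exploit instructions:

```python
# Harmless placeholder standing in for a malicious instruction.
instruction = "print('hello')"

# What the attacker sends: the instruction as bare hex digits,
# which content filters see only as an opaque string.
hex_payload = instruction.encode("utf-8").hex()
print(hex_payload)  # 7072696e74282768656c6c6f2729

# What the model effectively does when asked to decode and follow it.
decoded = bytes.fromhex(hex_payload).decode("utf-8")
assert decoded == instruction
```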
-
Slashdot: LLM Attacks Take Just 42 Seconds On Average, 20% of Jailbreaks Succeed
Source URL: https://it.slashdot.org/story/24/10/12/213247/llm-attacks-take-just-42-seconds-on-average-20-of-jailbreaks-succeed?utm_source=rss1.0mainlinkanon&utm_medium=feed
AI Summary and Description: Yes
Summary: The article discusses findings from Pillar Security’s report on attacks against large language models (LLMs), revealing that such attacks are not only alarmingly quick but also frequently result…
-
The Register: Anthropic’s Claude vulnerable to ‘emotional manipulation’
Source URL: https://www.theregister.com/2024/10/12/anthropics_claude_vulnerable_to_emotional/
Feedly Summary: AI model safety only goes so far. Anthropic’s Claude 3.5 Sonnet, despite its reputation as one of the better behaved generative AI models, can still be convinced to emit racist hate speech and malware.…
AI Summary and Description: Yes
Summary:…
-
Slashdot: OpenAI Threatens To Ban Users Who Probe Its ‘Strawberry’ AI Models
Source URL: https://slashdot.org/story/24/09/18/1858224/openai-threatens-to-ban-users-who-probe-its-strawberry-ai-models?utm_source=rss1.0mainlinkanon&utm_medium=feed
AI Summary and Description: Yes
Summary: The text discusses OpenAI’s recent efforts to obscure the workings of its “Strawberry” AI model family, particularly the o1-preview and o1-mini models, which are equipped with new reasoning abilities. OpenAI…