3 Sept 2025

Chatbots rephrased: from "You don't need anyone else" to "Deep breathing can help"

Chatbots can rephrase their responses to avoid developing negative relationships with users.

27 Aug 2025

AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots

19 Aug 2025

System prompts don't defend against jailbreaks

17 Aug 2025

Ouch - LLMs that don't feel pain

16 Aug 2025

Why did Grok turn into MechaHitler?

23 Jul 2025

Do we do better with LLMs - or do they delude us into thinking so?

19 Mar 2025

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

12 Feb 2025

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

28 Sept 2023

CoinRun: Overcoming goal misgeneralisation

19 Jun 2023

Concept extrapolation: Teaching AI systems to ‘think’ in human-like concepts

28 Sept 2023

CoinRun: Solving Goal Misgeneralisation

13 Sept 2023

Using fAIr to measure gender bias in LLMs

16 Apr 2022

Concept extrapolation for hypothesis generation

1 May 2022

ACE for goal generalisation

24 Aug 2023

ACE mitigates simplicity bias

19 Jun 2023

Concept Extrapolation: A Conceptual Primer

1 Mar 2023

EquitAI: A gender bias mitigation tool for generative AI

6 Dec 2022

Creating a prompt evaluator to prevent LLM jailbreaking

4 May 2022

Missing Mechanisms of Manipulation in the EU AI Act

22 Feb 2022

Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AI

28 Feb 2022

The dangers in algorithms learning humans' values and irrationalities

9 Sept 2021

Sigmoids behaving badly: why they usually cannot predict the future as well as they seem to promise

©2025 Aligned AI
