16 Oct 2025

A letter on the state of AI

3 Sept 2025

Chatbots rephrased: from "You don't need anyone else" to "Deep breathing can help"

27 Aug 2025

Publication: AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots

19 Aug 2025

System prompts don't defend against jailbreaks

17 Aug 2025

Ouch - LLMs that don't feel pain

16 Aug 2025

Why did Grok turn into MechaHitler?

23 Jul 2025

Do we do better with LLMs - or do they delude us into thinking so?

19 Mar 2025

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

12 Feb 2025

Publication: Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

28 Sept 2023

Publication: CoinRun: Overcoming goal misgeneralisation

19 Jun 2023

Concept extrapolation: Teaching AI systems to ‘think’ in human-like concepts

13 Sept 2023

Using fAIr to measure gender bias in LLMs

16 Apr 2022

Concept extrapolation for hypothesis generation

1 May 2022

ACE for goal generalisation

24 Aug 2023

ACE mitigates simplicity bias

19 Jun 2023

Concept Extrapolation: A Conceptual Primer

1 Mar 2023

EquitAI: A gender bias mitigation tool for generative AI

6 Dec 2022

Creating a prompt evaluator to prevent LLM jailbreaking

4 May 2022

Publication: Missing Mechanisms of Manipulation in the EU AI Act

22 Feb 2022

Publication: The importance of preference change: A call for a coordinated multidisciplinary AI research

28 Feb 2022

Publication: The dangers in algorithms learning humans' values and irrationalities

9 Sept 2021

Publication: Sigmoids behaving badly: why they usually cannot predict the future as well as they seem to promise

©2025 Aligned AI