18 Feb 2026

Aligned AI comes out of stealth with powerful calm AI: the Canvas digital ecosystem

16 Feb 2026

Research Paper: *One* Good Game in 400: LLMs Can Describe Chess Rules But Just Can't Follow Them

16 Oct 2025

A letter on the state of AI

3 Sept 2025

Chatbots rephrased: from "You don't need anyone else" to "Deep breathing can help"

27 Aug 2025

Research Paper: AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots

19 Aug 2025

System prompts don't defend against jailbreaks

17 Aug 2025

Ouch - LLMs that don't feel pain

16 Aug 2025

Why did Grok turn into MechaHitler?

23 Jul 2025

Do we do better with LLMs - or do they delude us into thinking so?

19 Mar 2025

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

12 Feb 2025

Research Paper: Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

28 Sept 2023

Research Paper: CoinRun: Overcoming goal misgeneralisation

19 Jun 2023

Concept extrapolation: Teaching AI systems to ‘think’ in human-like concepts

13 Sept 2023

Using fAIr to measure gender bias in LLMs

16 Apr 2022

Concept extrapolation for hypothesis generation

1 May 2022

ACE for goal generalisation

24 Aug 2023

ACE mitigates simplicity bias

19 Jun 2023

Concept Extrapolation: A Conceptual Primer

1 Mar 2023

EquitAI: A gender bias mitigation tool for generative AI

6 Dec 2022

Creating a prompt evaluator to prevent LLM jailbreaking

4 May 2022

Research Paper: Missing Mechanisms of Manipulation in the EU AI Act

22 Feb 2022

Research Paper: The importance of preference change: A call for a coordinated multidisciplinary AI research

28 Feb 2022

Research Paper: The dangers in algorithms learning humans' values and irrationalities

9 Sept 2021

Research Paper: Sigmoids behaving badly: why they usually cannot predict the future as well as they seem to promise

©2025 Aligned AI
