/ ABOUT US
/ BLOG
16 Oct 2025
A letter on the state of AI
3 Sept 2025
Chatbots rephrased: from "You don't need anyone else" to "Deep breathing can help"
27 Aug 2025
AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots
19 Aug 2025
System prompts don't defend against jailbreaks
17 Aug 2025
Ouch - LLMs that don't feel pain
16 Aug 2025
Why did Grok turn into MechaHitler?
23 Jul 2025
Do we do better with LLMs - or do they delude us into thinking so?
19 Mar 2025
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
12 Feb 2025
Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
28 Sept 2023
CoinRun: Overcoming goal misgeneralisation
19 Jun 2023
Concept extrapolation: Teaching AI systems to ‘think’ in human-like concepts
28 Sept 2023
CoinRun: Solving Goal Misgeneralisation
13 Sept 2023
Using fAIr to measure gender bias in LLMs
16 Apr 2022
Concept extrapolation for hypothesis generation
1 May 2022
ACE for goal generalisation
24 Aug 2023
ACE mitigates simplicity bias
19 Jun 2023
Concept Extrapolation: A Conceptual Primer
1 Mar 2023
EquitAI: A gender bias mitigation tool for generative AI
6 Dec 2022
Creating a prompt evaluator to prevent LLM jailbreaking
4 May 2022
Missing Mechanisms of Manipulation in the EU AI Act
22 Feb 2022
Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AI
28 Feb 2022
The dangers in algorithms learning humans' values and irrationalities
9 Sept 2021
Sigmoids behaving badly: why they usually cannot predict the future as well as they seem to promise
Join the Mailing List
Name
Email
Submit
©2025 Aligned AI