/ ABOUT US
/ BLOG
16 Oct 2025
A letter on the state of AI
3 Sept 2025
Chatbots rephrased: from "You don't need anyone else" to "Deep breathing can help"
27 Aug 2025
Publication: AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots
19 Aug 2025
System prompts don't defend against jailbreaks
17 Aug 2025
Ouch - LLMs that don't feel pain
16 Aug 2025
Why did Grok turn into MechaHitler?
23 Jul 2025
Do we do better with LLMs - or do they delude us into thinking so?
19 Mar 2025
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
12 Feb 2025
Publication: Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
28 Sept 2023
Publication: CoinRun: Overcoming goal misgeneralisation
19 Jun 2023
Concept extrapolation: Teaching AI systems to ‘think’ in human-like concepts
13 Sept 2023
Using fAIr to measure gender bias in LLMs
16 Apr 2022
Concept extrapolation for hypothesis generation
1 May 2022
ACE for goal generalisation
24 Aug 2023
ACE mitigates simplicity bias
19 Jun 2023
Concept Extrapolation: A Conceptual Primer
1 Mar 2023
EquitAI: A gender bias mitigation tool for generative AI
6 Dec 2022
Creating a prompt evaluator to prevent LLM jailbreaking
4 May 2022
Publication: Missing Mechanisms of Manipulation in the EU AI Act
22 Feb 2022
Publication: The importance of preference change: A call for a coordinated multidisciplinary AI research
28 Feb 2022
Publication: The dangers in algorithms learning humans' values and irrationalities
9 Sept 2021
Publication: Sigmoids behaving badly: why they usually cannot predict the future as well as they seem to promise
©2025 Aligned AI