3 Sept 2025
Chatbots rephrased: from "You don't need anyone else" to "Deep breathing can help"
Chatbots can rephrase their responses to avoid developing negative relationships with users.









27 Aug 2025
AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbots



19 Aug 2025
System prompts don't defend against jailbreaks



17 Aug 2025
Ouch - LLMs that don't feel pain



16 Aug 2025
Why did Grok turn into MechaHitler?



23 Jul 2025
Do we do better with LLMs - or do they delude us into thinking so?



19 Mar 2025
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions



12 Feb 2025
Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation



28 Sept 2023
CoinRun: Overcoming goal misgeneralisation



19 Jun 2023
Concept extrapolation: Teaching AI systems to ‘think’ in human-like concepts



28 Sept 2023
CoinRun: Solving Goal Misgeneralisation



13 Sept 2023
Using fAIr to measure gender bias in LLMs



16 Apr 2022
Concept extrapolation for hypothesis generation



1 May 2022
ACE for goal generalisation



24 Aug 2023
ACE mitigates simplicity bias



19 Jun 2023
Concept Extrapolation: A Conceptual Primer



1 Mar 2023
EquitAI: A gender bias mitigation tool for generative AI



6 Dec 2022
Creating a prompt evaluator to prevent LLM jailbreaking



4 May 2022
Missing Mechanisms of Manipulation in the EU AI Act



22 Feb 2022
Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AI



28 Feb 2022
The dangers in algorithms learning humans' values and irrationalities


