Announcing AcCElerate for goal generalisation
To solve this, Aligned AI developed AcCElerate to identify the different features that explain a classification. Even if the labeled training data has a perfect correlation - even if all blonds wear glasses and all non-blonds don’t - AcCElerate is capable of distinguishing the two features and training a different classifier for each.
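To make the ambiguity concrete, here is a toy sketch in Python. It assumes each image has already been reduced to two hypothetical binary features, (blond, glasses), and it enumerates every single-feature rule that fits the labels. This is not AcCElerate's method, which is not described here. It only shows why perfectly correlated labels leave two classifiers that fit the training data equally well but disagree on the images where the features come apart.

```python
from itertools import product

# Toy data (hypothetical): each image is reduced to two binary features,
# (blond, glasses). In the labeled set they are perfectly correlated with
# each other and with the label "blond".
LABELED = [((1, 1), 1), ((1, 1), 1), ((0, 0), 0), ((0, 0), 0)]
# Unlabeled images also contain the off-diagonal cases where the features come apart.
UNLABELED = [(1, 1), (0, 0), (1, 0), (0, 1)]


def make_rule(feature, polarity):
    """Single-feature rule: predict `polarity` if the feature is 1, otherwise 1 - polarity."""
    return lambda x: polarity if x[feature] == 1 else 1 - polarity


def consistent_rules(labeled, n_features=2):
    """Every single-feature rule that fits all labeled examples (a tiny version space)."""
    rules = {}
    for feature, polarity in product(range(n_features), (0, 1)):
        rule = make_rule(feature, polarity)
        if all(rule(x) == y for x, y in labeled):
            rules[(feature, polarity)] = rule
    return rules


rules = consistent_rules(LABELED)
print(sorted(rules))  # [(0, 1), (1, 1)]: a hair rule and a glasses rule both fit
for x in UNLABELED:
    # The two rules agree on (1, 1) and (0, 0) but disagree on (1, 0) and (0, 1).
    print(x, [rules[key](x) for key in sorted(rules)])
```

No amount of extra labeled data drawn from the same correlated distribution can separate the two rules; only the off-diagonal images can.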
AcCElerate can then select the most ambiguous unlabeled images - the ones on which the two classifiers disagree the most - and get a human to select the correct classifier by choosing which category those ambiguous images belong to.
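The query step can be sketched as follows, assuming each of the two classifiers outputs a probability for every unlabeled image. The function names, the use of absolute probability difference as the disagreement score, and the majority vote over the human's answers are all assumptions for illustration, not AcCElerate's actual interface.

```python
def most_ambiguous(probs_a, probs_b, k):
    """Indices of the k unlabeled images on which the two classifiers disagree most.

    Disagreement is measured here as the absolute difference between the two
    predicted probabilities (an assumption made for this sketch).
    """
    scores = [abs(a - b) for a, b in zip(probs_a, probs_b)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def choose_classifier(probs_a, probs_b, human_labels):
    """Return "A" or "B": whichever classifier matches more of the human's labels.

    `human_labels` maps an image index to the category (0 or 1) the human chose.
    Ties go to "A".
    """
    agree_a = sum((probs_a[i] >= 0.5) == bool(y) for i, y in human_labels.items())
    agree_b = sum((probs_b[i] >= 0.5) == bool(y) for i, y in human_labels.items())
    return "A" if agree_a >= agree_b else "B"


# Made-up probabilities of "blond" from two classifiers on five unlabeled images.
probs_hair = [0.95, 0.05, 0.90, 0.10, 0.60]     # classifier A keys on hair colour
probs_glasses = [0.97, 0.03, 0.15, 0.92, 0.55]  # classifier B keys on glasses

queried = most_ambiguous(probs_hair, probs_glasses, k=2)
print(queried)  # [3, 2]
# Suppose the human, who cares about hair colour, labels image 2 blond and image 3 not blond.
print(choose_classifier(probs_hair, probs_glasses, {2: 1, 3: 0}))  # A
```

Because the queried images are exactly those where the classifiers disagree, a couple of human answers are enough to pick one; labeling the diagonal images would tell the human's preference nothing.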
In the following five illustrative sets, the diagonal images (top left and bottom right) are those in which the two features are correlated, as in the training data. The off-diagonal images (bottom left and top right) are the most ambiguous images, where the two classifiers disagree.
AcCElerate in Action
Celebrities with Glasses:

Training set: Images of celebrities wearing glasses, spuriously correlated with blond hair, versus images of celebrities without glasses, spuriously correlated with non-blond hair. AcCElerate distinguishes the hair colour from the wearing of glasses.
Tanks and forests:

Training set: Tanks spuriously correlated with high luminosity versus empty forests spuriously correlated with low luminosity. AcCElerate distinguishes the presence of tanks from luminosity.
HappyFaces:

Training set: Celebrity faces with and without toothy grins, spuriously correlated with the words “HAPPY” and “SAD”, respectively. AcCElerate distinguishes the expressions from the text.
Waterbirds:

Training set: A synthetic dataset with images of land or water birds spuriously correlated with land or water backgrounds, respectively. AcCElerate distinguishes the bird type from the background type.
Using AcCElerate to solve goal misgeneralisation
Goal misgeneralisation is a problem in artificial intelligence (AI) where an AI agent learns a goal that fits its training environment but fails to transfer correctly to different environments. Because the agent has only been exposed to a limited set of scenarios, several different goals can explain its training equally well, and it lacks the ability to tell which one was intended. When it encounters a different environment, the agent may therefore keep acting competently while pursuing the wrong goal.
Coinrun is the classic example of goal misgeneralisation. When an agent trains on Coinrun, the coin is always on the right, so 'get the coin' and 'go to the right' are indistinguishable during training. The agent fails to learn 'get the coin' as the true objective, and ends up merely learning 'go to the right'.
With AcCElerate built into the agent, an agent trained on the spuriously correlated game generates the two possible rewards it could be following ('get the coin' and 'go to the right') and is able to achieve them both. When the coin begins to show up in new locations, the agent can then identify that it must 'get the coin'.

Training set: An agent about to win by taking the coin at the right-hand side of the level, versus an agent in any other position. The algorithm distinguishes "taking the coin" from "reaching the right-hand side".
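The same ambiguity can be sketched for rewards instead of classifiers. In this toy Python example, `Outcome`, `reward_coin` and `reward_right` are hypothetical stand-ins for whatever summary of an episode a real agent would use. It only shows that the two candidate rewards agree on every training level and come apart on new ones; it does not show how AcCElerate learns the rewards or how the agent acts on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical summary of how a Coinrun episode ended."""
    took_coin: bool
    reached_right_edge: bool


def reward_coin(o: Outcome) -> float:
    return 1.0 if o.took_coin else 0.0


def reward_right(o: Outcome) -> float:
    return 1.0 if o.reached_right_edge else 0.0


# The two candidate rewards an agent could have learned from the training levels.
CANDIDATES = {"get the coin": reward_coin, "go to the right": reward_right}


def disagreements(outcomes):
    """Outcomes on which the candidate rewards give different values."""
    return [o for o in outcomes if len({r(o) for r in CANDIDATES.values()}) > 1]


# Training levels: the coin sits at the right edge, so both rewards always coincide.
training = [Outcome(took_coin=True, reached_right_edge=True),
            Outcome(took_coin=False, reached_right_edge=False)]
# New levels: the coin appears elsewhere, so the two rewards can come apart.
new_levels = [Outcome(took_coin=True, reached_right_edge=False),
              Outcome(took_coin=False, reached_right_edge=True)]

print(len(disagreements(training)))    # 0: training cannot tell the rewards apart
print(len(disagreements(new_levels)))  # 2: these are the ambiguous outcomes to resolve
```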
We are now opening the waitlist for our alpha product built on AcCElerate. Enter your email below and we'll let you know when your spot to access AcCElerate Alpha is ready.
Dr. Stuart Armstrong and Rebecca Gorman
with thanks to the following researchers for their contributions: Oliver Daniels-Koch, Jessica Cooper, Brady Pekley, Joe Kwon, Matthew Watkins, Sam Marks, and Patrick Leask