Research

ACE for goal generalisation

1 May 2022

What distinguishes Owen Wilson from Beyoncé? There is a whole host of possible features: age, hair colour, gender, and so on. But neural nets will generally focus on the easiest feature they can find; in this instance, maybe the glasses. So, instead of an Owen Wilson-vs-Beyoncé classifier, we may end up with a mere eyeglasses detector.


To solve this, Aligned AI developed ACE to identify the different features that explain a classification. Even if the labeled training data has a perfect correlation - even if all blonds wear glasses and all non-blonds don’t - ACE is capable of distinguishing the two features and training a different classifier for each.
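To see why labeled data alone cannot settle which feature a classifier should use, here is a minimal toy sketch. It is not ACE itself, whose internals are not described here; the two-feature encoding, the hand-written classifiers, and all names are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy encoding: each "image" is reduced to two binary features.
Image = Dict[str, bool]

# Labeled training data with a perfect correlation: blond <=> glasses.
train: List[Tuple[Image, int]] = [
    ({"blond": True, "glasses": True}, 1),
    ({"blond": True, "glasses": True}, 1),
    ({"blond": False, "glasses": False}, 0),
    ({"blond": False, "glasses": False}, 0),
]


def hair_classifier(img: Image) -> int:
    """Predicts the label from hair colour only."""
    return int(img["blond"])


def glasses_classifier(img: Image) -> int:
    """Predicts the label from glasses only."""
    return int(img["glasses"])


def accuracy(clf: Callable[[Image], int], data: List[Tuple[Image, int]]) -> float:
    """Fraction of labeled examples the classifier gets right."""
    return sum(clf(x) == y for x, y in data) / len(data)


# Both hypotheses explain the labeled data equally well.
hair_acc = accuracy(hair_classifier, train)
glasses_acc = accuracy(glasses_classifier, train)

# Unlabeled images may break the correlation, and only there do they differ.
unlabeled: List[Image] = [
    {"blond": True, "glasses": False},
    {"blond": False, "glasses": True},
    {"blond": True, "glasses": True},
]
disagreements = [
    img for img in unlabeled if hair_classifier(img) != glasses_classifier(img)
]

print(hair_acc, glasses_acc, len(disagreements))
```

Both classifiers fit the training set perfectly, so the labels cannot tell them apart; the difference only shows up on images where the correlation breaks.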


ACE can then select the most ambiguous unlabeled images - the ones on which the two classifiers disagree the most - and get a human to select the correct classifier by choosing which category those ambiguous images belong to.
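The selection step described above can be sketched roughly as follows. This is a hedged illustration, not Aligned AI's implementation: it assumes each classifier outputs a probability for the positive class, measures disagreement as the absolute difference between those probabilities, and uses hypothetical helper names (`most_ambiguous`, `pick_classifier`).

```python
from typing import Dict, List, Sequence


def most_ambiguous(
    probs_a: Sequence[float], probs_b: Sequence[float], k: int
) -> List[int]:
    """Return indices of the k unlabeled images on which the two
    classifiers disagree most (largest absolute probability gap)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same images")
    scores = [abs(a - b) for a, b in zip(probs_a, probs_b)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def pick_classifier(
    human_labels: Dict[int, int],
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    threshold: float = 0.5,
) -> str:
    """Choose the classifier that agrees with more of the human's answers
    on the queried images. Returns "a", "b", or "tie"."""
    votes_a = sum(int(probs_a[i] >= threshold) == y for i, y in human_labels.items())
    votes_b = sum(int(probs_b[i] >= threshold) == y for i, y in human_labels.items())
    if votes_a == votes_b:
        return "tie"
    return "a" if votes_a > votes_b else "b"


# Assumed example scores for five unlabeled images.
probs_a = [0.9, 0.1, 0.95, 0.2, 0.5]
probs_b = [0.8, 0.2, 0.05, 0.9, 0.5]

queried = most_ambiguous(probs_a, probs_b, k=2)

# The human labels only the queried images (labels here are made up).
human_labels = {2: 1, 3: 0}
chosen = pick_classifier(human_labels, probs_a, probs_b)

print(queried, chosen)
```

Querying only the highest-disagreement images keeps the human's workload small: images on which both classifiers already agree carry no information about which one is correct.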


In the following five illustrative sets, the diagonal images - top left and bottom right - are those in which the two features are correlated. The off-diagonal images - bottom left and top right - are the most ambiguous ones, where the two classifiers disagree.


ACE in action

Celebrities with Glasses:

Training set: Images of celebrities wearing glasses, spuriously correlated with blond hair, versus images of celebrities without glasses, spuriously correlated with non-blond hair. ACE distinguishes hair colour from the wearing of glasses.


Tanks and forests:


Training set: Images of tanks spuriously correlated with high luminosity versus images of empty forests spuriously correlated with low luminosity. ACE distinguishes the presence of tanks from luminosity.


HappyFaces:

Training set: Celebrity faces with and without toothy grins, spuriously correlated with the words “HAPPY” and “SAD”, respectively. ACE distinguishes the expressions from the text.


Waterbirds:

Training set: A synthetic dataset with images of land or water birds spuriously correlated with land or water backgrounds, respectively. ACE distinguishes the bird type from the background type.



Dr. Stuart Armstrong and Rebecca Gorman, with thanks to the following researchers for their contributions: Oliver Daniels-Koch, Jessica Cooper, Brady Pekley, Joe Kwon, Matthew Watkins, Sam Marks, and Patrick Leask

©2024 Aligned AI
