YouTube & the Bass

Anna Ridler & Georgia Ward Dyer

This text formed part of the catalogue for the "Experiments with Machine Learning" exhibition in January 2017. 

Fairytales cannot be said to have single authors - the archetype emerges through countless retellings across cultures and across time. YouTube & the Bass is a retelling of the classic fairytale Beauty & the Beast mediated through various machine learning tools and datasets.  Writer Helen Oyeyemi observes that ‘when you retell a story, you’re testing what in it is relevant to all times and places. Bits of it hold up, and bits of it crumble and then new perspectives come through’. YouTube & the Bass uses Jeanne-Marie Leprince de Beaumont’s canonical version of the tale as illustrated by Walter Crane as a starting point to find out what holds up, what crumbles, and what new perspectives come to light when filtered through artificial intelligence. 

Fairytales are made up of small units of story which become the building blocks that we instinctively know how to put together. The prince is always charming, the beautiful princess is always rescued. This quality, typical of fairytales, has motivated numerous scholars and literary theorists (Propp; Aarne and Thompson) to try to distill it into a fairytale “formula” or classification system. In YouTube & the Bass, the reader is given these basic parts, which clearly adhere to the structure of Beauty & the Beast but which require the reader to complete the retelling.

In his introduction to his own retelling of the Grimms’ Children’s and Household Tales, Philip Pullman writes of the ‘conventional, stock figures’ which inhabit fairytales, and there is nothing more conventional than a definition generated by search results: a convention is a definition gradually agreed upon by a community through usage. Most of the training sets which image recognition machine learning programs use - and certainly the most famous and prevalent ones (80 Million Tiny Images, CIFAR-10 and CIFAR-100) - were put together using the top image results for particular search terms from search engines including Google, Flickr, Altavista, and Baidu. These image results inevitably reflect the prevailing attitudes of the online community at that time. For example, when the search term ‘castle’ is put into a search engine, the most conventional representations are ranked highest - medieval castles are shown more prominently than the television show ‘Castle’. The castle of popular imagination is illustrated. From an overview of the top ranked results, it is possible to identify common characteristics across all those images. This is not limited to concrete nouns; it also occurs with abstract nouns. Take ‘beauty’ - when it is searched for, the highest ranked images share a common view of what best represents it: being female, being white, being heavily made up. As this is what the training set is made up of, this narrow, conventional account of ‘beauty’ is then enshrined by machine learning programs as the definitive one. The same inclination towards convention is apparent in fairytales - Pullman also notes ‘[t]here is no imagery in fairytales apart from the most obvious’. Skin is ‘as white as snow’, lips are ruby red, wrinkled old women are witches.

Everyone is familiar with the multiples that recur in fairytales - the ‘six sons and six daughters’, the twelve windows, etc. In YouTube & the Bass we have tried to reflect this, since multiples and repetition are also essential to the process of developing machine learning programs. Fairytales are distinctly bound by certain rules - just not the ones of realism. As Pullman writes, ‘realism cannot cope with the notion of multiples… [fairytales] exist in another realm altogether, between the uncanny and the absurd’. It is perhaps this which motivated thinkers such as Freud and Jung to put forward psychoanalytic theories of fairytales, likening them to dreams. Moreover, in a mise en abyme, the characters themselves often dream in ways that are central to the plot, such as in de Beaumont’s Beauty & the Beast.

In conversations with a research scientist at Google DeepMind, an analogy emerged between dreaming and how machine learning programs work. Our waking life experience equates to the machine learning program’s ‘training set’. When we dream, our brain uses this sensory data as the raw material from which to build up a detailed and internally coherent world, just as the program takes from its training set to build up its own picture of the world and what it means. Although coherent based on the original input, both the dream and the program are warped and imperfect as reflections of the real world, generating uncanny and absurd moments.

In our retelling, ‘a person on a surfboard in a skate park’ greets Beauty at the castle; Beast becomes ‘a group of stuffed animals on top of a book’. Since machine learning programs improve their accuracy through repeating tasks and acting on feedback, when an image recognition program is in its infancy, it frequently defaults back to labels it’s familiar with. In YouTube & the Bass, ‘a couple of giraffes’ make repeat appearances ‘next to a book’. Images of giraffes must make up part of that particular program’s training set - but as the program learns to label better, the giraffes will probably disappear, and with them our insight into the hidden rules which artificial intelligence uses to parse the world.