Laws of Ordered Form (2020) is a two-part video work and a downloadable handmade dataset, created by first taking thousands of photographs of images found in Victorian and Edwardian-era encyclopaedias and then manually reclassifying them. The work calls attention to how echoes of historic taxonomies and beliefs can still be heard in modern implementations of machine learning. By collapsing this moment of history with today’s concerns around dataset bias, the piece emphasises the problems of classification without thought, and considers the histories that remain in our present, even within the latest technologies.
Commissioned by the Photographers’ Gallery as part of their year-long ‘Data / Set / Match’ digital programme exploring the technical, cultural and social significance of image datasets.
Process & Research
Our lives are built around systems of classification: it is how we make sense of the world. Without classification we have no memory, no way to retrieve the information that we know; language is necessary for us to pull things back. But classification can be harmful: it can lead to the illustration and confirmation of beliefs that are racist, sexist or otherwise prejudiced, and it is always, at some level, subjective. Borges writes that “there is no classification of the Universe that is not arbitrary and full of conjectures”, and this subjectivity is apparent in the categories and choices made both by those making encyclopaedias and by those making training sets.
In many ways a training set can be seen as a contemporary encyclopaedia - both try to describe everything in the world and make decisions about what is important enough to record - and these decisions will inevitably reflect the cultural and social attitudes of their time. Laws of Ordered Form (2020) is an ongoing project that explores this, and how the echoes of historic taxonomies and beliefs can still be heard in modern implementations of machine learning. Its current formats include a downloadable dataset of hundreds of images scanned from Victorian and Edwardian-era encyclopaedias and a video work documenting this process. By collapsing this moment of history with today’s concerns around dataset bias, the project emphasises the problems of classification without thought, and considers the histories that remain in our present, even within the latest technologies.
Part of Laws of Ordered Form looks at the photographs from the books, alongside the captions and fragments of text about the images, showing how classifications have historically manifested in books. It explores the repetitive, mundane and laborious process of making a dataset - in this case scanning each image and definition from an encyclopaedia and building a database - and how much time it takes to create these things, a duration referenced in the video piece. The way that I work is very much in opposition to how datasets are normally made, where the work has to be completed as quickly as possible. Instead I am taking the time to consider and look at each picture and definition, and then to create my own fairly arbitrary categories. Increasingly it is important for the work of making to be surfaced in the final result - not just for the artwork to be the dataset, but the making of the dataset also.

A further part of the project recontextualises the images with the new categories that I have created; the specifics of the encyclopaedic entries have changed to new and abstracted terms. Part of the dataset that I made is available to download as a .zip archive, allowing people to explore and reclassify the images, taking them off the fixed page and allowing the definitions to be more fluid and changing. It plays into what people have always done with encyclopaedias, particularly in the nineteenth century, when, alongside the movement towards these systems of knowledge, people cut, rearranged and pasted words and pictures to tell new and more relevant stories. If encyclopaedias are designed to record and explain the world around us in some sort of totality, people have always used them to work against that narrative, literally cutting out individual entries and reconfiguring them to be what they want. Ordinarily this option is not readily available for datasets: they are specialist tools that require knowledge, technologies and some level of skill, locked inside interlocking systems that are not easily accessed. By opening this up as part of the project, I offer a way of challenging that.
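As an illustration of how a reader might open the archive and impose categories of their own, here is a minimal Python sketch. The archive name, folder layout and captions.csv metadata file are assumptions made for the purposes of the example, not the project’s actual structure; any reclassification rule, however arbitrary, would do.

```python
# A minimal sketch of exploring and reclassifying the downloadable archive.
# The layout (an images/ folder plus a captions.csv of file, caption,
# category rows) is an assumption for illustration only.
import csv
import shutil
import zipfile
from pathlib import Path

ARCHIVE = Path("laws_of_ordered_form.zip")  # hypothetical filename
WORKDIR = Path("dataset")

# Unpack the archive so the images leave the "fixed page" of the zip.
with zipfile.ZipFile(ARCHIVE) as zf:
    zf.extractall(WORKDIR)

# Read the existing classifications.
with open(WORKDIR / "captions.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))  # e.g. {"file": ..., "caption": ..., "category": ...}

def reclassify(row: dict) -> str:
    """Assign a new, deliberately personal category; any rule will do."""
    return "ordered" if "diagram" in row["caption"].lower() else "unordered"

# Regroup the images under the reader's own categories, mirroring the
# cut-and-paste reworking of encyclopaedias described above.
for row in rows:
    target = WORKDIR / reclassify(row)
    target.mkdir(exist_ok=True)
    shutil.copy(WORKDIR / "images" / row["file"], target / row["file"])
```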
There are a number of famous datasets, frequently mined for information in research papers or used to run and test code, that are ten, sometimes fifteen, years old. They do not reflect the world that we live in now. Once produced, they are very rarely reviewed or updated. There is an assumption that, because algorithms and models use these datasets as benchmarks, they will be constantly refreshed, but this is often not the case. Like an encyclopaedia - a physical, material object - a dataset is static, a snapshot of a moment in time; it needs to be updated to reflect a changing world. Some of these canonical datasets are also now very hard to find in their entirety. Despite their importance - both to the machine learning community and to broader society, as cultural artefacts - no one is really looking after them or archiving them. ImageNet is now almost impossible to find in its entirety and has been offline for all but 1,000 of its categories for over a year now. The ability to trace back its fifteen years as a dataset - what it might have been used in and how it might have impacted other systems - has been lost. These datasets are working objects that degrade and fall apart over time; they need to be cared for, otherwise they will disintegrate.
The construction of datasets mirrors that of encyclopaedias - it is anonymous and hidden; all of the labour disappears and becomes invisible - but it has a key difference. The people doing the work of labelling, or of finding the imagery for the datasets, tend not to be experts in the field; the work is done for the most part by mechanical turkers who are paid very small amounts of money per task and who want to work as rapidly as possible. Imagery tends to come from the Internet, which is one layer of standardisation. Then the images are returned to individuals who decide which image “fits” the term in question. Each time that decision is made, it falls more and more towards the conventional, and the options are skewed. This causes datasets to reflect society in the same way that encyclopaedias reflect the people who construct them, and datasets can reinforce cultural stereotypes. People assume that with technology and the Internet it is easier to build a more inclusive database with a nuanced approach to representation, but all of the choices that occur when constructing an encyclopaedia or a dataset are still there. Data can still be warped, manipulated, ignored or lost, whether it was generated five, fifteen or five hundred years ago.
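To make that drift towards the conventional concrete, here is a small, purely illustrative Python simulation - not part of the project, and with the worker behaviour reduced to a single assumed probability. When each image’s final label is the majority vote of several hurried judgements, minority readings are systematically voted away.

```python
# An illustrative sketch of why crowdsourced labelling skews conventional:
# if most rushed workers default to the conventional reading, majority
# voting erases the unconventional ones.
import random
from collections import Counter

random.seed(0)  # reproducible illustration

def annotate(true_reading: str, p_conventional: float = 0.7) -> str:
    """One rushed worker: usually picks the conventional reading."""
    return "conventional" if random.random() < p_conventional else true_reading

def majority_label(true_reading: str, n_workers: int = 5) -> str:
    """The dataset keeps whichever reading most workers chose."""
    votes = Counter(annotate(true_reading) for _ in range(n_workers))
    return votes.most_common(1)[0][0]

# Even when every image genuinely warrants an unconventional reading,
# the majority vote discards most of them.
labels = [majority_label("unconventional") for _ in range(1000)]
print(Counter(labels))  # the "conventional" label dominates
```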