A video work and downloadable dataset built from thousands of photographs of images found in Victorian and Edwardian encyclopaedias, manually reclassified by the artist. The classification structures built in the nineteenth century to organise knowledge are still audible in contemporary machine learning: when a model learns to identify a plant, a person or a landscape, it inherits the ideological assumptions of the datasets it was trained on, which inherit the assumptions of the datasets before them. Laws of Ordered Form traces that inheritance, collapsing Victorian natural history and contemporary computer vision into a single system running on the same logic. The downloadable dataset allows others to reclassify the images, opening up the usually locked structure of a training set to the same interventions people have always made in encyclopaedias: cutting, rearranging, telling different stories.



Our lives are built around systems of classification: it is how we make sense of the world. Without classification we have no memory, no way to retrieve information that we know. Language is necessary for us to pull things back. But classification can also do harm: it can illustrate and confirm beliefs that are racist, sexist or otherwise problematic, and it is always, at some level, subjective. Borges writes “there is no classification of the Universe not being arbitrary and full of conjectures”, and this subjectivity is apparent in the categories and choices of those who make encyclopaedias and those who make training sets alike. In many ways a training set can be seen as a contemporary encyclopaedia: both try to describe everything in the world and make decisions about what is important enough to record, and these decisions will inevitably reflect the cultural and social attitudes of the time.
A number of famous datasets, frequently mined for information in research papers or used to run and test code, are ten, sometimes fifteen years old. They do not reflect the world that we live in now. Once produced, they are very rarely reviewed or updated. There is an assumption that because algorithms and models use these datasets as benchmarks, they will be constantly refreshed, but this is often not the case. Like an encyclopaedia (a physical, material object), a dataset is static and exists as a snapshot of a moment in time. It needs to be updated to reflect what has changed since. Some of these canonical datasets are also now very hard to find in their entirety. Despite their importance, both to the machine learning community and to broader society as cultural artefacts, no one is really looking after them or archiving them. ImageNet is almost impossible to find in its entirety and has been offline, for all but 1000 categories, for over a year now.
The ability to trace back ImageNet's fifteen years as a dataset (what it might have been used in and how it might have impacted other systems) has been lost. These datasets are working objects that degrade and fall apart over time; they need to be cared for, otherwise they will disintegrate. The construction of datasets mirrors that of encyclopaedias: it is anonymous and hidden, and all of the labour disappears and becomes invisible. But there is a key difference. People doing the work of labelling, or finding the imagery for the datasets, tend not to be experts in the field; it is done for the most part by Mechanical Turk workers who are paid very small amounts of money per task and who want to work as rapidly as possible. Imagery tends to come from the Internet, which is already one layer of standardisation. The images are then returned to individuals who decide which image “fits” the term in question. Each time that decision is made, it falls further towards the conventional, and the options are skewed. This causes datasets to reflect society in the same way that encyclopaedias reflect the people who construct them, and datasets can reinforce cultural stereotypes. People assume that with technology and the Internet it is easier to have a more inclusive database with a nuanced approach to representation, but all of the choices that occur when constructing the encyclopaedia or the dataset are still there. Data can still be warped, manipulated, ignored or lost, whether it was generated five, fifteen or five hundred years ago.
Commissioned by the Photographers’ Gallery as part of their year-long ‘Data / Set / Match’ digital programme exploring the technical, cultural and social significance of image datasets.
The dataset can be downloaded here: https://thephotographersgallery.org.uk/whats-on/anna-ridler-laws-ordered-form