The challenges of recording our past through digitisation

The pace of technological change has never been faster – but it will never be this slow again.

Caption: The National Library of Scotland is digitising about three million items every year, but the process inevitably begets biases. Picture: Sam Wood

As we face up to rapid-fire innovation, there are major implications for data science – but Melissa Terras thinks we must never forget the social and cultural impact of change and innovation.

Terras, Professor of Digital Cultural Heritage at the University of Edinburgh, is co-director of the £10 million Creative Informatics project, which aims to ensure that culture and heritage are not forgotten in these days of turbo-charged change.

“The crucial models that have driven data innovation in the last five to ten years in particular are not doing society good,” says Terras.

“They are not democratic and they are not encouraging good mental health and well-being. They are designed to disrupt our relationships and to get us to spend more money, in ways that can be controlled by major corporations.

“It can’t be all about just using data and technology for surveillance or for buying material things. We have used technology and data in a very narrow way in the last five to ten years and it has become very difficult to build technology for social good. How can the GLAM sector (Galleries, Libraries, Archives & Museums) ensure it is represented in that digital space? It is the custodian of things that people care about and those things have digital counterparts so that people can research, analyse and explore.

“But how do we make them accessible for people? And how can we facilitate and resource digital services that let people explore the past?”

Terras has a particular interest in data and heritage: she started out in higher education studying Ancient Art History before moving into computer science and then engineering.

“My work has been at the juncture of engineering science and old stuff you want to find more about,” she says. “By pointing computation at heritage collections, you can often learn more about them than the human mind or eye can alone, and that increases our understanding of past histories and cultures. There is a lot of work going on in terms of cultural digitisation – for example, the National Library of Scotland is digitising about three million items every year.”

So how do we use our rapidly developing digital know-how to understand collections and mine data in ways that humans alone cannot? How does digitisation change our relationship with the past, and the scope and scale of what we are doing?

“There are so many interesting questions about data in terms of cultural heritage,” says Terras.

“If we create new products and services, it can’t just be about the future – how can those new things help us understand more about our past? What can we do that is useful? There are many different ways to come at it and the Edinburgh Futures Institute is doing lots of work about social good and re-forming conversations about how we engage with questions that are beneficial to society.”

Terras is fascinated by the way that we use and analyse our digital heritage – and how we understand and try to overcome the biases that inevitably come with it.

She says: “We are starting to see that the choices made 10 to 15 years ago in digitising newspapers, for example, are affecting what researchers can do today and therefore affecting their outputs. There are unintended ethical consequences if we choose to digitise one newspaper and not another.

“Let’s say that The Times in the Victorian era was the only newspaper digitised for that period and you had no other written sources. About 90 per cent of the source material would relate to men, mainly rich, white Christian men – a very narrow, but powerful slice of society.

“If you use that to train artificial intelligence, it will be biased because it will think that only stories including these men are important.”

Taking that further, Terras points out that some bots have been sexist and misogynistic – because the material they draw on is sexist and misogynistic.

“These are questions we have to grapple with if we want a fair and egalitarian society. We cannot base our artificial intelligence on historical (or contemporary) records if they are sexist and/or racist,” she says.

So is it possible to select source material that gives a broader and more accurate reflection of history? “The University of Edinburgh has begun digitising historic Court of Session papers, which is where you get the real-life events – divorces, people suing each other and so on – and that helps us think about how we get that more accurate reflection of history,” says Terras.

“The National Library of Scotland is looking at what it digitises in a way that adds to the range of voices available rather than keeping a narrow, prescriptive view. It’s about looking for gaps in gender, race and language, and adding diversity. Yet this in turn involves its own value judgments and biases.”

Digitising the past is also a major technological challenge, Terras says. “The big shift is away from laptops and desktops to high-performance computing. Dealing with a whole server full of data is a new skill and most people have no idea how to do it. How can you tell a computer that you want to search and visualise 100,000 newspaper articles at once?

“And it’s still not easy to transfer big data around. Anything above one terabyte is very hard to move around on the internet.”

Terras is also keen to ensure a more equal distribution of access to technology and data – and to information that is meaningful and that people want. She explains: “We are trying to use the Data-Driven Innovation initiative and the City Deal to build a repository that gives access to a lot of data sets, which should include historical data.

“History, literature and culture matter to people – and they should matter in the digital space too.

“How can we build a technological environment around play, creativity and human culture? We must make a space for that. Success would be showing how technology matters in heritage.

“What we can achieve is amazing but we have to make a space for knowledge and culture in a world that just wants us to buy more shoes.”

Will we always be scratching the surface in terms of what we digitise because there is so much material out there?

Melissa Terras agrees. “Usually, the data we look at is there because of a partnership, a range of organisations that want to look at that data for a specific reason. Sometimes it is driven by funding, sometimes it’s a really good opportunity. Often it’s a question someone wants answering and we look at how technology can help us to answer it. Everything should be research-led,” she says. “We have some very hi-tech kit and our museums have specific objects – like historic medical devices – that they want to see inside.

“We have billions of items of newspaper content from the Victorian era – digitised so that family historians would pay subscription fees to use it – but what do we do with it? Sometimes it is about scale and scope – if you are working with a larger amount of material, you might find more needles in the haystack.

“We are really in the infancy of applying computing to large-scale digitised historical data. Here in Edinburgh we are starting to build the text-mining infrastructure that will allow us to look at history at scale. This is just one part of the computing-for-social-good activity that the Edinburgh Futures Institute will embrace.”