Data Capital: Flying the flag for diversity

Why did you write Queer Data?

The book was published [in January 2022] at a point where people were starting to ask more critical questions about data, the benefits of diversity data and the operation of data systems – and particularly who was counted.

It also coincided closely with the UK census, with new questions on sexual orientation and transgender identity. I had an eye towards the future, thinking hopefully this will be of interest to those who collect, analyse and use identity data.

Were you happy with the response?

I’m thrilled with how the book was received by a wide range of audiences. I was very keen for it to speak not just to academics, but also to people working in different fields and disciplines, not only those related to data. It’s been exciting to have the opportunity to speak with people in tech, academia, the voluntary sector, public sector, arts and culture, and more.

Also, although the book primarily focuses on the UK, questions about who we count and how we count are relevant across the world.

What are the challenges in terms of historical data collection relating to LGBTQ communities?

My academic training and background are in 20th-century gender history, so the book has a strong historical thread. I find it strange when people speak about data and don’t consider the history of how data came to be and people’s historical relationships to data collection.

So in the book, and articles since, I’ve tried to highlight how data about LGBTQ communities has a particular history that is often quite negative, quite toxic. When data was collected about people we might now consider LGBTQ, it was often as evidence of criminality, deviance, or psychological maladjustment, really negative topics – if data was collected at all.

I don’t think that historical baggage is factored into how people think about contemporary engagement with data collection or sharing your data in a census, for example. But by overlooking that negative history, we miss a lot.

So a big thread in the book is that data isn’t necessarily objective. Data has a history, a certain politics to it. The book and my broader work show that when we consider tricky questions of data politics, power and history, our engagement with data is far richer.

How do data labels attached to identity characteristics change over time? Why is it important?

This issue comes up a lot, particularly around the question of how we design diversity questions in a census, or in monitoring forms for businesses, for example.

Whether it’s gender, sex, sexuality, race, religion, or nationality, things are always fluid, always changing and evolving. These categories aren’t fixed in time and space.

In recent years, we’ve seen an increasing move to recognise and count these identities in data systems, by employers and governments. These data systems often require things that are fixed and permanent, categorical items.

So a tension is emerging between an increasing need by data systems to count – requiring something binary and categorical – versus the reality of people’s lives and experiences, which are far more messy, exciting, diverse and fluid.

How does LGBTQ data collection in the UK compare to elsewhere?

The UK and Scotland are international trailblazers in the amount of data collected on gender, sex and sexuality and other identity characteristics, partly the product of reporting requirements for the 2010 Equality Act.

In many parts of the world, the idea of collecting any data about LGBTQ communities, the idea of governments asking about your sexual orientation or whether you’re trans, is beyond comprehension.

What about that question of asking people to describe their own identity?

Historically, studies wouldn’t ask somebody to identify their gender. They might just discern that information from somebody’s voice, appearance, hairstyle or name. Increasingly, researchers became aware of the limitations of that approach. Self-identification puts the power back with the individual best-placed to decide how they identify.

So when we share information about ourselves in a staff survey or a census, the assumption is that the person answering the questions is best-placed to answer, rather than some external agent.

However, to exist in the modern world, we share so much data through what we look at, what we turn on, the GPS on our mobile phones.

Many websites, platforms, apps that we use can collect information about our preferences, and package that to make a fairly good guess at our sexuality or gender or race.

These interactions are commercial. When we’re using something like Instagram, Netflix or Amazon, there is the intention of selling us something. Our identity is being captured and categorised in a way that is designed to sell us adverts or products. LGBTQ identities are being understood through this commercial, capitalist prism too.

These companies will say it’s probably more accurate to discern information through behavioural clicks and likes than to ask users to complete a form, because some would try to subvert the process in some way.

In terms of whether or not it improves the material lives of minority communities, whether LGBTQ or others, I don’t think these companies are necessarily designed to advance the interests of minoritised groups.

You recently wrote “data is a battlefield” – what did you mean?

A lot of my work around data is looking at the questions of politics, power and history. It’s a fundamental question of who is counted, who is recognised, and who isn’t – and who does the counting.

When we think about data and administrative practices, we might wrongly assume they’re mundane or bureaucratic. But look at campaigns around social justice, whether LGBTQ rights or other equality work – fights over administrative practices are often a battleground.

In Scotland, there’s been huge interest around the Gender Recognition Reform Bill. I was invited to provide evidence on the bill at the Scottish Parliament around data implications – whether altering the process by which trans people can change the sex marker on their birth certificate has any data ramifications.

My argument was it doesn’t. Fundamentally, the bill is a fairly minor administrative change. But, at the same time, data was being weaponised by critics of the bill to oppose its implementation.

Data has this sheen of being objective, scholarly and above politics, but actually, whether it’s the publication of census results, data on hate crimes or about the experiences of LGBTQ people, it can be weaponised by those who are for and against different arguments. It can serve a lot of political purposes.

Was it right for the latest census to retain a binary question about sex?

In my view, doing so was exclusionary as some people didn’t feel they could answer the question meaningfully. At the same time, there were some inclusive elements to the census – new questions on sexual orientation and transgender identity.

More broadly, things which seem inclusive are actually often only inclusive for some people in minority groups. If you’re a cisgender, gay, white man, there are benefits and you were counted in the most recent census. But what about those who don’t fit neat, narrow boxes? This apparently “inclusive” exercise can actually sometimes push marginalised groups further into the shadows.

A census or a survey is only going to convey one representation of the world. So the question is whether you want that to be a narrow, exclusionary representation, or something that tries to fully capture the messiness and diversity of the society we live in.

My view is that a census should try as best as possible to represent the society we live in, not try to shoehorn people’s lives and experiences into boxes that they don’t quite fit.

You have said certain lives have been “designed out” by the process of how we collect data. What specifically do you mean by that?

It’s about the design of the questions in the census, and the design process more broadly. We don’t go into the back garden and dig up questions for a census in a neatly-packaged box. The questions are designed, tweaked and managed by certain individuals, organisations, and ways of thinking.

A census shines particular lights on certain lives and experiences, while at the same time excluding others. Many identities don’t quite fit the pre-existing rules or expectations of a census, so they are designed out of the process.

Who were the individuals in positions of power to make these calls, to decide who’s in and who’s out? In terms of the census in Scotland, predominantly people who were cisgender and heterosexual. I think it’s important to think through what was omitted or removed as part of that design process.

Can we collect too much data?

Yes, I think it’s a common problem. A few years ago, I was a big advocate for the benefits of data and its ability to change the world. Now I see limitations.

Even fantastic datasets and robust evidence don’t always translate to shifts in power or changing people’s lives and material conditions for the better. Increasingly, I see data doing things which might “talk the diversity talk”, but which actually just scratch the surface of the problem.

Often, when someone is considering how to address a diversity or inclusion issue, they think they need to first diagnose the problem and go out and collect data about it.

After a few months or years, they will have amassed a lot of data. Moving to the next stage, using that data as an evidence base, is the tricky bit. How do you translate the data into meaningful actions that are actually going to resolve the problem? Lots of organisations and institutions get stuck at the data collection stage.

Organisations and employers should be moving on to that next stage – using data for action.

How can we address that?

One thing I’m trying to do in my work is show how data collection can be a stalling technique and actually cause harm to groups who should be the focus of these interventions.

It’s far easier to say we’ll commission a study or spend X conducting more research. But could this time, money, energy and resources be dedicated to actions required to resolve the problem?

There’s no easy solution but I hope my work can expose how data can sometimes function as a jam in the system. Collecting data isn’t, on its own, doing anything to improve the lives and material conditions of LGBTQ people.

Where are you going from here with your work?

I’m interested in the idea that many LGBTQ people are included and recognised by data systems, but in this moment of high visibility, who is still being excluded?

An area which I think also requires further investigation is whether data systems and practices can always be fixed – can we always remedy problems? Are we chasing an impossible dream?

Are these data systems broken or are they operating exactly as designed? Were they designed to exploit, to differentiate, to inflict harm on some groups and communities and to advantage others? Do we need to adopt a more radical approach: to abandon data systems and build something new?

Should people who do diversity and inclusion work, myself included, advocate for reform and repair? Or are we wasting our time and energy perpetuating something that’s ultimately broken? This is an important idea I hope more people engaged in data projects in Scotland will consider.