Tiffany Jenkins: Don’t count on big data for answers

“I have been struck by how important measurement is to improving the human condition”. So wrote Bill Gates in his annual letter at the start of this year. Gates is a signed-up supporter of “big data” – the next big thing – about which many claims are being made, but about which we should be more than a little sceptical.

Big data refers to the massive amounts of data that it is now possible to assemble in our digital world. Advances in technology and improved algorithms mean that what would have taken a roomful of computers more than a week to process now takes one machine one day, if that, and it can be constantly updated.

In his book The Signal and the Noise, about the data explosion, statistical analyst Nate Silver – who is renowned for his ability to read opinion polls and anticipate elections – imparts a striking estimate from IBM: 2.5 quintillion (17 zeros) new bytes (sequences of eight binary digits that each encode a single character of text in a computer) of data are being created every day. That’s a lot of new data.

The first question is: so what if there are so many quintillions of data? There is no doubt that the input of greater amounts of information and its processing, in certain areas, will yield informative results, and results that could not have been predicted. Particular fields – science, economics, business and health – have embraced big data, making use of it daily. Facebook and Google take the mass of information that we provide, to suggest friends, purchases and websites back to us.

Elsewhere, the government is looking to use it to monitor tax fraud. In medicine, researchers can better anticipate patients’ susceptibility to diseases. During the American election campaign, Obama’s technology people were able to look at the figures and, in no time at all, identify voters who needed a little encouragement to go to the polls. So far, so good.

But the point about data is that only we – human beings, not machines – can make something of it. Only we can draw conclusions, form hypotheses and work out why it matters, or if it matters. Only we can turn data into information that is more than just ones and zeros. It requires human input to analyse it, as well as to work out its relevance and application.

The trouble is this human element is often factored out in the claims that are made for big data. Kenneth Cukier, data editor of the Economist and co-author of the forthcoming Big Data: A Revolution That Will Transform How We Live, Work And Think, proclaims that big data heralds “a shift in mindset” and that something new and special is taking place. He argues that we should “let the data speak”, and that human judgment and expertise will be sidelined. Understanding what the data says is important, but understanding why it says what it says is less so, Cukier asserts.

That is the wrong conclusion to draw. There is a creeping data determinism in the arguments made for big data, a deification of data. The data blog of the Guardian newspaper states that “facts are sacred”, but things are not that simple. Facts are not easily established. Conclusions drawn are not always neutral. Interpretations need to be discussed, subjected to analysis and, crucially, debate. It is the latter – wide debate and discussion – that people who use data as their trump card, as incontrovertible evidence to be obeyed, seek to avoid. Data fills the void where there should be an argument about what could and should be done.

You can see this in the rise of polling and the predicting of elections. Politicians spend more time crunching numbers, looking at graphs and figures, than they do either leading the public or engaging with them. Data isn’t to blame for this; it is used as an avoidance strategy. Monitoring the public replaces winning them over.

The embrace of big data for certain aspects of our lives has authoritarian implications. Cukier highlights the use of big data in fighting crime. Police departments in the US are using records of past criminal activity to predict future incidents. This could be dangerous. For a start, the police should have an idea of where criminal activity occurs. They shouldn’t rely on the computer to tell them. They need also to understand why. This is crucial if anyone is to make a positive difference with the information. Finally, letting the data predict crime that hasn’t yet happened could be dangerous in the hands of the powerful. Are we far away from using such predictions to convict before anything wrong has taken place?

It is important to remember that the state and corporations don’t always have the same interests as you and me. There needs to be informed and active consent when it comes to gathering all this information about us. There are serious threats to our privacy as the state and companies go about tracking our every movement.

The rise of big data is accompanied by the assumption that everything can and should be measured. Even in my field – the arts – arts organisations and artists are increasingly asked to account for their work in numbers. If they don’t, they are not funded. But the value of the arts, the quality of a play or a painting, is not measurable. You could put all sorts of data into a machine: dates, colours, images, box office receipts, and none of it could explain what the artwork is, what it means, and why it is powerful. That requires man, not machine.

Whilst I am sure there is a place for big data, it doesn’t apply to many aspects of human life. It won’t improve the human condition. Love, art, culture and politics are all essential to our lives but big data will tell us little about them. To put it concisely: don’t believe all the data hype. It’s not the answer to the big questions.