Doing Data Better conference: Be wise when using data to predict the future

Schrager, an alumna of the University of Edinburgh and one of the DDI’s first data innovation ambassadors, wrote An Economist Walks Into A Brothel: And Other Unexpected Places To Understand Risk.

“When you are measuring risk, you are making a projection of the future using formulas and a lot of data – to make an estimate of what will happen,” she told the conference. “Risk is looking at all the things that could happen and how probable they are. What informs the probability distribution is data, so we use data to predict the future.

“This raises a lot of problems and risks, and gets us into trouble. Data is always from the past, and always inherently flawed if you try to use it to predict the future. New factors come along and data from the past is less useful.”

She used the example of Ryan Kavanaugh, a financier who seduced movie moguls with “a magic algorithm that can predict what movie is going to make money and which one isn’t”. Kavanaugh was given the “Holy Grail” of Hollywood: the data on marketing, sales and other income and expenditure that studios often kept private.

“The studios gave him all this glorious data, and for a while it looked like he could predict winners,” said Schrager. “Everyone was making money, and he got a piece of it by running the numbers. But in true Hollywood form, he ended up flying too close to the Sun. After a while, his algorithm stopped working so well, he was picking total duds and eventually went bankrupt.”

Looking back, Schrager said, the problem wasn’t so much the algorithm as the data, because the world changes. Kavanaugh started in an era when DVD sales were a huge part of profitability, before streaming took hold and before China became a huge market.

“When you’re using data from the past, it stops becoming very useful or interesting at a certain point,” she said. “You need judgement to know when to adjust your algorithm, or accept it’s a whole new world.”

So how do we make sense of this, she asked, when without data we have nothing? “It’s all we have to predict the future. We can’t assess risk if we can’t measure it; if we can’t measure it, we can’t manage it. So it’s definitely better than nothing. But there’s also huge costs to relying too much on your data.

“No matter how big, no matter how good your data is, no matter how great your algorithm is, you’re still making projections from the past about the future.”

She concluded with a call to use data wisely: “We have so much great data, and it poses lots of risks if we don’t use it properly. You can’t think, ‘I have so much data, I can predict anything’. There’s always going to be a level of human judgement you can never get away from. Be thoughtful about what data you’re using, and humble about the limits of its predictive power.”

A digital version of the full conference report can be found here.