Synthetic data: the way ahead - Bryn Coulthard

When it comes to data and innovation, the potential is exciting and unlimited. Synthetic data in particular is gaining more attention, as rapid progress in machine learning and AI, combined with ever-increasing computing power, means it’s possible to create higher-quality artificial data than ever before.

The rise of synthetic data is evidenced by the growth in market size: Gartner estimates that by 2024, 60 per cent of data for AI applications will be synthetic, and total publicly known funding for synthetic data companies reached $328 million in October 2022, $275 million more than in 2020.

The synthetic data market is at an interesting inflection point as it moves from ‘hype’ towards mainstream acceptance and greater market understanding of what synthetic data is and how it can ignite and accelerate innovation.

At Smart Data Foundry, creating synthetic data designed to solve the problems holding innovators back has been a core focus, aligned to our mission to inspire and accelerate financial innovation, but we know driving awareness of its capabilities – and successes – is key.

Bryn Coulthard, Chief Product and Technology Officer, Smart Data Foundry

So what is synthetic data and why is it useful? Whilst real-world data contains information about real people, entities, events and interactions, synthetic data provides the same accurate and reliable insight, except that some or all of the people, entities, events or interactions are artificial.

This means synthetic data can facilitate cases that would be problematic using real-world data, enable applications that would otherwise be impossible, and augment real-world data to enable analyses that would otherwise be more difficult.

This could include, for example, developing a rapid prototype, training an AI model, or running scenarios on the strategic impact of a new initiative.

We think of three factors when assessing how ‘good’ synthetic data is: fidelity (how similar the synthetic dataset is to the ‘real-world’ data), privacy (the risk of identifying real people) and utility (the ‘usefulness’ of the synthetic data).

We have invested time in comparing approaches to synthetic data generation against the three factors above. Our whitepaper on the topic focuses on two primary methods of generation: agent-based modelling and learning-based synthesis, the latter often referred to as synthetic doubles.

Each method has strengths and weaknesses dependent on the problem that needs to be solved.

Agent-based modelling is the more suitable approach if real data doesn’t exist, is biased or incomplete, or you are looking to innovate through collaboration and don’t want to share confidential data.
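
For readers who want to see the idea in concrete terms, here is a minimal Python sketch of agent-based generation. It is an illustration only, not Smart Data Foundry's method: the `Agent` class, the `simulate_transactions` function, the income range and the spending rules are all hypothetical assumptions chosen for the example. The point it shows is that every record comes from simulated behaviour, so no real customer data is needed as input.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """A made-up bank customer. All fields are illustrative assumptions."""
    agent_id: int
    monthly_income: float  # hypothetical salary paid once a month
    spend_rate: float      # hypothetical fraction of income spent each month


def simulate_transactions(n_agents=5, n_months=3, seed=42):
    """Generate synthetic transactions purely from simulated agent behaviour."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    agents = [
        Agent(i, rng.uniform(1500, 4000), rng.uniform(0.5, 0.9))
        for i in range(n_agents)
    ]
    rows = []
    for month in range(1, n_months + 1):
        for agent in agents:
            # Each agent receives one salary payment per month.
            rows.append({
                "agent_id": agent.agent_id,
                "month": month,
                "type": "salary",
                "amount": round(agent.monthly_income, 2),
            })
            # The agent then splits its monthly budget across 1 to 4 purchases.
            budget = agent.monthly_income * agent.spend_rate
            weights = [rng.random() + 0.1 for _ in range(rng.randint(1, 4))]
            total = sum(weights)
            for weight in weights:
                rows.append({
                    "agent_id": agent.agent_id,
                    "month": month,
                    "type": "purchase",
                    "amount": -round(budget * weight / total, 2),
                })
    return rows


transactions = simulate_transactions()
```

Because the rules are written by hand, the realism of the output depends entirely on how well those rules reflect the behaviour being modelled.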

Learning-based approaches use machine learning to make a safe-to-use version of data that an organisation already has, creating a synthetic double of that data.
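
As a rough illustration of the learning-based idea, the Python sketch below fits a very simple statistical model to a small dataset and then samples new values from it. This is a deliberately simplified stand-in: a single normal distribution takes the place of the machine learning models used in practice, the `original` values are invented placeholders rather than real data, and the function names are hypothetical. Sampling from a fitted model does not by itself guarantee privacy, so the sketch should not be read as a safe-to-use recipe.

```python
import random
import statistics


def fit_model(values):
    """'Learn' the data by estimating its mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(model, n, seed=0):
    """Draw n artificial values from the fitted normal distribution."""
    mu, sigma = model
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Invented placeholder values standing in for data an organisation already holds.
original = [1200.0, 1850.0, 2100.0, 2400.0, 2750.0, 3100.0, 3600.0, 4200.0]

model = fit_model(original)
synthetic = sample_synthetic(model, n=1000)

# A crude fidelity check: how far apart are the two averages?
fidelity_gap = abs(statistics.mean(synthetic) - statistics.mean(original))
```

A real assessment would compare far more than averages, and would measure privacy and utility alongside fidelity.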

So what difference can synthetic data actually make? We firmly believe it can unlock the power of data to improve people’s lives, as demonstrated most recently with our work with the Financial Conduct Authority to innovate in the area of Authorised Push Payment Fraud.

And we are passionate about sharing insights gleaned through its application to help people understand its potential, and ultimately help change outcomes for people and organisations for the better.

Bryn Coulthard, Chief Product and Technology Officer, Smart Data Foundry

Smart Data Foundry will be at the Innovate Finance Global Summit on 17 and 18 April. To learn more about synthetic data, visit


