On 30 March 2022, the UK’s Financial Conduct Authority (FCA) issued a Call For Input regarding the use of ‘synthetic’ data in supporting and promoting financial services innovation. The deadline for responses is 22 June 2022.
What is synthetic data?
Synthetic data is artificially generated data, as opposed to data that has been generated by real-world events. It simulates real data without identifying specific individuals – thus materially reducing data privacy risks – is often more cost efficient than using real data, and can plug gaps where real data is limited or does not exist. Synthetic data is created algorithmically by observing patterns and the statistical properties of real data, and then replicating these patterns within the synthetic dataset. The FCA has observed an increase in experimentation with synthetic data across sectors such as automotive and robotics, healthcare and medicine, logistics, and government national statistics offices for policy making and research. In the financial services sector, the FCA itself has explored the use of synthetic datasets to test financial crime controls (as noted in its Business Plan 2022/23).
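To make the idea of "observing statistical properties and replicating them" concrete, the following is a minimal sketch in Python. It is purely illustrative and is not a method described by the FCA: it assumes a two-column dataset (hypothetical account balance and monthly spend figures, themselves simulated here as a stand-in for real records), fits a simple Gaussian model to it, and draws new rows from that model. The function names `fit_gaussian` and `sample_synthetic` are invented for this example, and a model this simple offers no formal privacy guarantee.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate the means, standard deviations and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mx, my), "std": (sx, sy), "corr": cov / (sx * sy)}


def sample_synthetic(params, n, rng):
    """Draw n new rows that follow the fitted means, spreads and correlation."""
    mx, my = params["mean"]
    sx, sy = params["std"]
    rho = params["corr"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(42)

# Stand-in for "real" data (an assumption for this sketch): balance and spend.
real_rows = []
for _ in range(5000):
    balance = rng.gauss(5000, 1500)
    spend = 0.2 * balance + rng.gauss(0, 200)
    real_rows.append((balance, spend))

params = fit_gaussian(real_rows)
synthetic_rows = sample_synthetic(params, 5000, rng)
synthetic_params = fit_gaussian(synthetic_rows)

print("real corr:     ", round(params["corr"], 3))
print("synthetic corr:", round(synthetic_params["corr"], 3))
```

The synthetic rows are fresh draws from the fitted model rather than copies of input rows, yet their summary statistics should track those of the input. Generators used in practice are typically far more sophisticated, and, as the FCA's questions on risk suggest, whether a given approach adequately protects privacy has to be assessed rather than assumed.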
Why is the FCA interested in synthetic data?
The FCA is of the view that access to large volumes of reliable financial data plays an important role in driving forward innovation in financial services which, in turn, is important in driving competition. The FCA points in particular to advances in AI and machine learning, which it considers underpin new innovations in areas such as financial crime and fraud prevention, customer engagement, credit scoring, sales and trading, insurance pricing and claims management, and asset management and portfolio optimisation.
The FCA says that new market entrants face inherent challenges in obtaining access to such crucial but highly sensitive financial datasets, which (given the nature of the data) are subject to strict data privacy laws, whereas market incumbents with “large data harvesting capabilities” have much greater access to such data. The FCA is concerned that this is resulting in a “widening data gap” that is preventing new market entrants from developing technology and strategies to compete with incumbents. Identifying ways in which market entrants can access a wider pool of data is very much in line with the direction of travel from the FCA, following the perceived policy success of open banking.
What input is the FCA seeking?
The FCA is seeking industry views on the extent to which the use of ‘synthetic’ data could represent a solution to these issues, with queries broadly focusing on the following areas:
- The benefits of synthetic data, which the FCA sees as data privacy, cost efficiency and the ability to plug gaps where real-world data is limited or does not exist.
- The approach to generating synthetic data, with a particular focus on the privacy considerations arising from specific generation techniques.
- The use cases of synthetic data in financial services innovation and the type and quality of data needed for those use cases. By way of example, the FCA observes that, during its Digital Sandbox Sustainability cohort, certain firms requested synthetic ESG-related data in order to train and develop algorithms designed to identify cases of ‘greenwashing’.
- The potential risks and limitations of synthetic data, including the possibility that it could be reverse engineered to reveal ‘real-world’ information, the risk of bias, and risks arising from the processing of real-world data to generate synthetic datasets.
The role of the regulator
The FCA recognises that it, and other regulators and public bodies, could also benefit from synthetic data, for example, by creating quality, shareable datasets that will allow the co-creation of SupTech (Supervisory Technology) between regulators.
The FCA also considers that it could have a role in generating, hosting and sharing synthetic data and therefore seeks to understand:
- the appetite of firms to co-operate with the FCA and other organisations, for example by providing sample real data as an input into the generation process, as well as synthetic data generation expertise to ensure that the synthetic data produced is of optimal quality; and
- the extent to which organisations would be interested in using synthetic data, how often they would use it, and the volume of datasets they would require to realise significant benefits from synthetic data sharing.
Key takeaways
Adoption of synthetic data is currently very low: techniques for generating it are still being developed, and research is ongoing into the level of privacy it actually affords (for example, owing to the possibility that such data could be reverse engineered to reveal real-world information). However, a key feature of the FCA’s recent communications is the pursuit of its competition objective. Having observed the utility of synthetic data through its Digital Sandbox initiative, the FCA has identified synthetic data as a possible route to improving innovation in the financial services sector and, as the FCA observes, “We see innovation as a vital component of effective competition”.
While regulatory support in generating, hosting and sharing synthetic data is therefore likely to be welcomed by market entrants, it may also benefit incumbents – for example, by allowing them to access a wider pool of data beyond that generated by their own customers, and also to use datasets with a lower personal data-related compliance burden and reduced fear of data breaches (and the inevitable fines and follow-on litigation).
What now remains to be seen is the extent to which synthetic data will ultimately be able to represent a high-quality substitute for real-world data and, to the extent it falls short, what the potential impact of this will be on consumer outcomes and how the FCA might then intervene. The perceived benefits of synthetic data, namely greater competition between market incumbents and new entrants and increased data privacy, must be carefully balanced against the potential risks that arise from its creation and use, for example, should the data prove to be of inadequate quality or lead to confirmation bias.