Machine learning continues to play an important role in the insurance industry, and nowhere more visibly than in insurance risk pricing.
Richard Wilson is the Chief Commercial Officer for Avantia, with responsibility for all its pricing, data science and marketing operations. Avantia is a home insurer that trades across price comparison and direct channels through its consumer brand, ‘HomeProtect’, using its own state-of-the-art machine learning platform. Prior to Avantia, Richard worked in a number of pricing and commercial strategy roles across telecoms, ecommerce and payments.
Here Richard discusses machine learning and how it can be effectively used in insurance risk pricing.

There is no doubt that insurance risk pricing is one of the most complex domains in finance. It presents a uniquely difficult set of challenges, and the number of permutations has already reached the point where only those using the most sophisticated machine learning and AI techniques will prevail.
Even the sharpest risk modellers have to constantly be on the lookout for new techniques. This is exactly why machine learning is now indispensable in insurance risk pricing. To paraphrase a recently deceased screen icon, you really don’t want to bring a knife to this gunfight.
Complexity within insurance claims
Insurance is about claims, and by and large insurers don’t know when they are going to occur or what they will cost when they do. Most personal lines insurers will have a claims frequency of around 5%, so risk pricing analysts start from the position that around 95% of the customers they price for will never use the core product.
In theory, if you were able to identify that 5% and price them in such a way as to discourage them from buying from you, then hoover up as much of the remaining 95% as you can, you’d be performing the insurance equivalent of magic.
While this might only be a theoretical possibility, machine learning gives us the best opportunity to move towards that goal.
But machine learning starts with data. Most machine learning and AI techniques are ineffective without a sufficient depth of data, and more is almost always better.
Data-powered insurance pricing
Data is critical to all businesses, but few literally manufacture their prices out of data in the way insurance businesses do. Clean, accurate data fed by highly robust pipelines is non-negotiable for risk pricing. Even once the data is in place, data scientists may use machine learning techniques to enforce important relationships in otherwise noisy data.
Inexperienced data scientists can be seduced by the idea that “we just throw all the variables into the model and we get the answer”. Nothing could be further from the truth. Machine learning and AI in insurance is 70-80% about data, and the hallmark of great data science is how clean and robust your data pipelines are.
If you’ve got clean data, then a good place to start is the average. If 5% of customers make a claim and their claims cost £2,000 on average, then in theory you could charge everyone £100 and cover your claims costs. The problem here is what we in insurance call the ‘anti-selection problem’. If you just charge everyone the average, you will most likely be too cheap for those who tend to claim and too expensive for those who don’t. The customers who don’t claim drift to cheaper competitors, leaving you with a book weighted towards those who do.
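The arithmetic behind the average, and the way anti-selection erodes it, can be sketched in a few lines of Python. The 5% frequency and £2,000 average claim cost come from the example above; the split of the book into a low-risk and a high-risk group, and their 2% and 8% frequencies, are purely illustrative assumptions.

```python
# Figures from the text: 5% of customers claim, average claim cost £2,000.
CLAIM_FREQUENCY = 0.05
AVERAGE_CLAIM_COST = 2000.0


def break_even_premium(frequency, claim_cost):
    """Expected claims cost per policy: frequency multiplied by average cost."""
    return frequency * claim_cost


flat_premium = break_even_premium(CLAIM_FREQUENCY, AVERAGE_CLAIM_COST)

# ASSUMPTION (illustrative only): the book is really two equally sized groups
# whose claim frequencies average out to 5%.
SEGMENT_FREQUENCIES = {"low_risk": 0.02, "high_risk": 0.08}

segment_cost = {
    name: break_even_premium(frequency, AVERAGE_CLAIM_COST)
    for name, frequency in SEGMENT_FREQUENCIES.items()
}

# Margin per policy in each group when everyone pays the flat premium.
margin_at_flat_price = {
    name: flat_premium - cost for name, cost in segment_cost.items()
}

# If a competitor wins all the low-risk customers, only the high-risk
# policies remain, so the margin per policy is the high-risk margin.
margin_after_anti_selection = margin_at_flat_price["high_risk"]

print(f"Flat premium: £{flat_premium:.2f}")
for name in SEGMENT_FREQUENCIES:
    print(f"{name}: expected cost £{segment_cost[name]:.2f}, "
          f"margin at flat price £{margin_at_flat_price[name]:.2f}")
print(f"Margin per policy once low-risk customers leave: "
      f"£{margin_after_anti_selection:.2f}")
```

Under these assumed figures the flat price overcharges the low-risk group and undercharges the high-risk group, so the book only breaks even while both groups stay in equal proportion.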
The centrality of machine learning to the pricing process
Over the past 15 years, risk pricing analysts have increasingly tackled this problem using Generalized Linear Models (GLMs), statistical techniques that enable them to describe the characteristics that explain how likely someone is to claim and for how much. This helps them create more segmentation, shift prices away from the average, and avoid the risk of anti-selection.
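As a rough illustration of what a claims-frequency GLM does, the sketch below fits a Poisson model with a log link and a single yes/no rating factor using Newton's method. It is a teaching sketch only, not a description of any insurer's production model: the data, the `is_bungalow` factor and the `fit_poisson_glm` helper are all hypothetical, and real pricing work would normally rely on an established statistics library.

```python
import math

# Hypothetical aggregated data: (is_bungalow, policy_years, claim_count)
rows = [
    (0, 1000.0, 40.0),  # flats
    (1, 1000.0, 60.0),  # bungalows
]


def fit_poisson_glm(rows, iterations=25):
    """Fit expected claims = exposure * exp(b0 + b1 * x) by Newton's method."""
    total_claims = sum(claims for _, _, claims in rows)
    total_exposure = sum(exposure for _, exposure, _ in rows)
    # Start from the portfolio average frequency with no factor effect.
    b0, b1 = math.log(total_claims / total_exposure), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, exposure, claims in rows:
            mu = exposure * math.exp(b0 + b1 * x)  # expected claim count
            g0 += claims - mu                      # gradient, intercept
            g1 += x * (claims - mu)                # gradient, factor
            h00 += mu                              # information matrix entries
            h01 += x * mu
            h11 += x * x * mu
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_poisson_glm(rows)
base_frequency = math.exp(b0)       # fitted frequency for flats
bungalow_relativity = math.exp(b1)  # multiplier applied for bungalows

print(f"Base frequency: {base_frequency:.4f}")
print(f"Bungalow relativity: {bungalow_relativity:.4f}")
```

With only one two-level factor, the fitted values simply reproduce each group's observed frequency. The value of a GLM comes when many factors are fitted together and each relativity is estimated net of the others.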
In this endeavour, risk pricing analysts rigorously work through all the variables, and the creative ones may begin to combine variables, creating so-called ‘interactions’. For example, you might suspect that two people living in a bungalow present a different risk to two people living in a flat. You might then add their age information to explore that hypothesis further and create a new hybrid variable that combines the information available. Each time, you are creating narrower segments and shifting the anti-selection problem onto your competitors.
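A small Python sketch shows why such a hybrid variable can matter. The portfolio below is invented for illustration and deliberately constructed so that neither property type nor age band separates risk on its own, while the combination of the two does. The `claim_frequency` helper and all figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical aggregated portfolio: (property_type, age_band, policies, claims)
portfolio = [
    ("bungalow", "under_40", 1000, 30),
    ("bungalow", "40_plus", 1000, 70),
    ("flat", "under_40", 1000, 70),
    ("flat", "40_plus", 1000, 30),
]


def claim_frequency(rows, key):
    """Claims per policy for each segment, where key(row) names the segment."""
    policies = defaultdict(int)
    claims = defaultdict(int)
    for row in rows:
        segment = key(row)
        policies[segment] += row[2]
        claims[segment] += row[3]
    return {segment: claims[segment] / policies[segment] for segment in policies}


# Each variable on its own.
by_property = claim_frequency(portfolio, key=lambda r: r[0])
by_age = claim_frequency(portfolio, key=lambda r: r[1])

# The hybrid 'interaction' variable: property type combined with age band.
by_interaction = claim_frequency(portfolio, key=lambda r: f"{r[0]}|{r[1]}")

for label, table in [("property", by_property), ("age", by_age),
                     ("property x age", by_interaction)]:
    print(label, {segment: round(freq, 3) for segment, freq in table.items()})
```

In this contrived case, a price built on either variable alone would charge every segment the portfolio average, while the interaction exposes segments that claim at well below and well above that average.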
The variables available to explore those relationships have exploded in the past few years and are now well beyond the point where analysts can manually explore all of them. Some insurers will have hundreds, if not thousands, of columns of data on a given risk, and working through every combination of them becomes impractical without machine learning.
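To give a sense of scale, the number of candidate interactions grows combinatorially with the number of columns. The sketch below counts the two-way and three-way combinations for 100 and 1,000 variables; those two column counts are illustrative stand-ins for "hundreds, if not thousands".

```python
import math


def interaction_count(n_variables, order):
    """Number of distinct combinations of `order` variables from n_variables."""
    return math.comb(n_variables, order)


for n in (100, 1000):
    pairs = interaction_count(n, 2)
    triples = interaction_count(n, 3)
    print(f"{n} variables: {pairs:,} two-way and {triples:,} three-way combinations")
```

Even before considering how each variable might be banded or transformed, 1,000 columns give roughly half a million pairs to test, which is far more than any team could review by hand.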
Insurance risk pricing increasingly uses machine learning algorithms to detect these relationships, create ever-smaller segments and, by doing so, select risk more accurately.
Bringing together the many variables
The best risk pricing minds are using hundreds, if not thousands, of variables, which are updated and controlled constantly by robust data pipelines. These individuals are mixing and matching models, whether neural networks, gradient boosted machines, random forests or GLMs.
Machine learning models all have different blind spots, and combining them can produce ever more powerful ‘ensembles’ of models. By combining models, analysts can create novel solutions to pricing risks where data is sparse. All of this runs on a platform that enables them to deploy prices into live trading environments.
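The simplest form of ensemble is an unweighted average of several models' predictions. The toy example below uses invented claim frequencies for four risks and two hypothetical models whose errors happen to point in opposite directions, which is the situation in which blending helps most. Real models will not offset each other this neatly, and practical ensembles often use weights or a further model to combine predictions.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def blend(*model_predictions):
    """Unweighted average of several models' predictions, risk by risk."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical figures: observed claim frequency for four risks, plus the
# predictions of two made-up models with different blind spots.
actual_frequency = [0.04, 0.06, 0.05, 0.08]
model_a = [0.05, 0.05, 0.06, 0.07]
model_b = [0.03, 0.08, 0.04, 0.10]

ensemble = blend(model_a, model_b)

mae_a = mean_absolute_error(model_a, actual_frequency)
mae_b = mean_absolute_error(model_b, actual_frequency)
mae_ensemble = mean_absolute_error(ensemble, actual_frequency)

print(f"Model A error:  {mae_a:.4f}")
print(f"Model B error:  {mae_b:.4f}")
print(f"Ensemble error: {mae_ensemble:.4f}")
```

On these invented numbers the blended prediction has a lower average error than either model alone, because one model's overestimates partly cancel the other's underestimates.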
If those sharp minds have all these resources to hand, then they truly will be bringing a gun to the knife fight.