Maximising the value of text-based insurance data

Text-mining is developing into an increasingly useful tool as insurers and other businesses seek to squeeze maximum value from their non-numeric data.

Handling non-numerical data, such as free-form text in claims notifications, is an interesting and challenging part of maximising the value of the information available to insurers. The underlying algorithms combine context-relevant lexicons, clustering techniques and various approaches to measuring semantic similarity and meaning.

Our Actuarial team has worked on various approaches to incorporating unstructured text information to generate insights and identify portfolio hot-spots. Clustering procedures group sets of data according to an assessment of their semantic similarity. Designing the ‘similarity criteria’ requires judgement and tends to be specific to industry groups – much like regional dialects. We wouldn’t call it AI – it’s more of a machine learning algorithm supervised by expert judgement.

Text-based analysis can, of course, be time-consuming and is an unfamiliar area to many. However, the field is growing and techniques are improving. The clustering algorithm, coupled with the associated ‘similarity criteria’, significantly expands the range of insurance data that lends itself to sophisticated statistical analysis. The primary focus has been on the narrative data relating to claims, with policyholder and proposal form data a distant second. Even so, it is important to include in any analysis the connection between the exposure (policy data) and the response variable (claim).
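
To make the idea of clustering by semantic similarity more concrete, here is a minimal Python sketch. It is not our production algorithm: it assumes a simple bag-of-words representation, cosine similarity as the ‘similarity criterion’ and a greedy threshold rule for forming clusters. The claim narratives, the stop-word list and the 0.4 threshold are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical claim narratives, invented purely for illustration.
CLAIMS = [
    "water damage from burst pipe in kitchen",
    "burst pipe caused water damage to kitchen floor",
    "rear-end collision at traffic lights",
    "vehicle collision at traffic lights, rear bumper damaged",
    "theft of laptop from office",
]

# Hypothetical stop-word list; a real lexicon would be tuned to the line of business.
STOP_WORDS = {"a", "at", "caused", "from", "in", "of", "the", "to"}


def tokenise(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors held as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.4):
    """Greedy clustering: each text joins the first cluster whose seed is similar enough."""
    vectors = [Counter(tokenise(t)) for t in texts]
    clusters = []  # each cluster is a list of indices into texts; the first index is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


clusters = cluster(CLAIMS)
for members in clusters:
    print([CLAIMS[i] for i in members])
```

With these invented narratives the two burst-pipe claims fall into one group, the two collision claims into another, and the theft claim stands alone. Note that ‘damage’ and ‘damaged’ are treated as different words here, which is exactly the kind of gap a context-relevant lexicon or stemming step would be expected to close.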

We have found many applications for our algorithms in insurance, including:
  • identifying common themes and trends in the descriptions of claim circumstances and using this information to better evaluate the most likely settlement durations;
  • targeting deep-dive reviews based on risk concentrations revealed by text-mining;
  • grouping claims from policies with similar risk characteristics which, when applied properly, helps overcome data deficiencies where exact risk-mappings are not available;
  • producing more accurate reserve estimates and improving risk capital estimates by better understanding the inherent variability;
  • improving the pricing of new business by mining existing data for risks similar to the new business, which aids the underwriter at the quoting stage.
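
The last application above, finding existing risks similar to a new piece of business, can be sketched in a few lines of Python. This is an illustrative sketch only, assuming Jaccard similarity on word sets as a stand-in for a properly designed similarity criterion. The policy identifiers, descriptions and loss ratios are hypothetical and do not come from any real portfolio.

```python
import re

# Hypothetical existing risks with invented descriptions and loss ratios.
EXISTING_RISKS = [
    {"id": "P001", "description": "commercial bakery with wood-fired ovens", "loss_ratio": 0.82},
    {"id": "P002", "description": "office-based software consultancy", "loss_ratio": 0.35},
    {"id": "P003", "description": "restaurant kitchen with deep fat fryers and ovens", "loss_ratio": 0.74},
]


def token_set(text):
    """Return the set of lower-cased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(new_description, risks, top_n=2):
    """Rank existing risks by similarity to the new description, highest first."""
    new_tokens = token_set(new_description)
    scored = [(jaccard(new_tokens, token_set(r["description"])), r) for r in risks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


matches = most_similar("artisan bakery with commercial ovens", EXISTING_RISKS)
for score, risk in matches:
    print(f"{risk['id']}  similarity={score:.2f}  loss_ratio={risk['loss_ratio']:.2f}")
```

In this toy example the bakery policy ranks first and the restaurant kitchen second, giving the underwriter two comparable risks to look at when quoting. A word-overlap measure like this is deliberately crude; in practice the similarity criteria would need to reflect the vocabulary of the industry group concerned.
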

For more information on how we can assist you with data mining, please contact Santiago Restrepo, David Edison or Dewi James.
