PRISM Data Scientist Yi Li Wins Data Modeling Award
PRISM’s Data Scientist, Yi Li, won the Actuarial Loss Prediction Competition on Kaggle, a well-known data science competition platform. The competition’s goal was to use machine learning, a branch of AI, to predict the ultimate cost of individual workers’ compensation claims. Out of 3,633 submissions to the competition, Li’s data model had the best accuracy rate, 91.3%.
“First I was absolutely excited,” Li said. “Competing with all these talented data scientists across the world.” For each competition, data scientists can submit as many models as they want. Approximately 3,000 models were submitted to this competition.
The competition, which sought to predict the ultimate loss in workers’ compensation, ran from mid-December 2020 to April 11, 2021, after being extended by a month because of high participant interest. As Li put it, ultimate loss and case reserves are a hot topic right now for data scientists. The ability to produce cost predictions can help us determine claim allocations, even for claims that aren’t in the payout stages.
An interesting feature of Li’s model submission was Negation Detection, an aspect of Natural Language Processing (NLP), the field that teaches computers to better understand human language. The program looked for words of negation (e.g., not, never, no longer, denied) in claims descriptions to predict the correct body parts involved in a claim. Li’s model was a type of “sliding window” algorithm, which she humbly said was not a complicated one.
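For readers curious how a sliding-window negation check might work, here is a minimal Python sketch of the general idea. It is not Li’s actual model: the cue words beyond those quoted above, the body-part list, the window size of three words, and the function names are all assumptions made for illustration.

```python
import re

# Assumed cue words: the article cites "not, never, no longer, denied";
# "no" also covers "no longer". The rest are illustrative additions.
NEGATION_CUES = {"not", "never", "no", "denied", "denies", "without"}

# Hypothetical, deliberately tiny list of body parts.
BODY_PARTS = {"back", "knee", "shoulder", "wrist", "neck", "ankle"}

# Assumed window size: how many words after a cue count as negated.
WINDOW = 3


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def body_parts_mentioned(description, window=WINDOW):
    """Return (affirmed, negated) sets of body parts in a claim description.

    A body part counts as negated when it appears within `window`
    words after a negation cue; otherwise it counts as affirmed.
    """
    affirmed, negated = set(), set()
    negated_until = -1  # index of the last word covered by a cue's window
    for i, word in enumerate(tokenize(description)):
        if word in NEGATION_CUES:
            negated_until = i + window  # slide the window forward
        elif word in BODY_PARTS:
            if i <= negated_until:
                negated.add(word)
            else:
                affirmed.add(word)
    return affirmed, negated


affirmed, negated = body_parts_mentioned(
    "Strained lower back lifting boxes, no injury to knee"
)
print(affirmed, negated)
```

In this sketch, "back" lands in the affirmed set and "knee" in the negated set, so only the back would be treated as the injured body part. A production system would likely need a richer vocabulary and a way to end the window at clause boundaries.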
Outside of the contest, Li’s work for PRISM includes many data science practices to expedite data preparation and improve data quality, leading to future automation.
One such example is an algorithm she created to scour workers’ compensation claims descriptions for job titles and find matching terminology. Dubbed Fuzzy Matching, the approach can be illustrated with a simple example: looking for the job title Risk Manager and finding “Risk Mnger” written in shorthand. Li’s algorithm was designed to search for similar words and terms, ranked by degree of variation, misspellings, and abbreviations, enabling data extraction directly from these claims reports.
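The fuzzy matching idea can be sketched in a few lines of Python using the standard library’s `difflib.SequenceMatcher`. This is only an illustration of the general technique, not Li’s algorithm: the job title list, the 0.8 similarity cutoff, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical reference list of job titles.
JOB_TITLES = ["Risk Manager", "Claims Adjuster", "Safety Officer"]


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_titles(description, titles=JOB_TITLES, cutoff=0.8):
    """Find phrases in a description that closely resemble known job titles.

    Slides a window of the same word count as each title across the
    description and keeps the best-scoring phrase per title. Returns
    (title, matched_phrase, score) tuples, best match first.
    The 0.8 cutoff is an assumed value, not a recommended setting.
    """
    words = description.split()
    best = {}  # title -> (score, phrase)
    for title in titles:
        n = len(title.split())
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            score = similarity(phrase, title)
            if score >= cutoff and score > best.get(title, (0.0, ""))[0]:
                best[title] = (score, phrase)
    ranked = [(title, phrase, score) for title, (score, phrase) in best.items()]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


for title, phrase, score in rank_titles("Injured worker is a Risk Mnger at the plant"):
    print(f"{phrase!r} looks like {title!r} (score {score:.2f})")
```

Here the shorthand "Risk Mnger" is matched to "Risk Manager" because most of its characters line up with the full title. Sorting by score is one simple way to rank candidates by how far they vary from the reference term.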
“I sent this actuarial contest to Yi thinking she might be interested in reviewing how the teams built their development models when the contest was finished,” said John Alltop, PRISM’s Chief Actuary. “Instead, she joined the contest, spent her personal time building almost 40 models over a 3 month period and won against 140 teams! An incredible accomplishment. When Yi first started at PRISM, we discussed producing individual claims development models in 3-5 years. That timeline has now shortened.”
“It’s not so much about winning the competitions, but feeling the passion about what you’re doing,” Li said. “What’s important was the entire journey.”
PRISM is proud of the work Li is accomplishing and looks forward to incorporating her designs into the upcoming Member Dashboard improvements being created by the Data and Analytics team.