In this project I implement recommendations from the literature for building an optimally precise classifier of healthcare providers' potential fraudulence. The datasets come from a Kaggle post, and their labels resemble one of the few publicly available sources of labelled provider fraud data, the List of Excluded Individuals/Entities (LEIE).
Please see my full analysis in a Google Colab notebook.
If the notebook of my full analysis is too large to load, please see the segments into which I've broken it up, linked in the table below alongside the topics they highlight:
| Topic of Highlight | Resident Notebook(s) |
|---|---|
| data integration and feature engineering | Integration I, Feature Engineering, Integration II |
| exploratory visualization | EDA |
| correlation-based feature selection | EDA |
| resampling to mitigate class imbalance | Sampling |
| hypertuning, testing, and logging numerous models through automated means | Modelling |
The recommendations I implement from the consulted literature, which mostly concerns approaches to fraud detection in healthcare claim data, include the following:
- Feature engineering to aggregate claim and patient data to the provider level (sketched below). [1]
- Correlation-based feature selection to interpret variable importances and drastically reduce training times (sketched below). [2]
- Alleviating class imbalance by resampling to a 75-25 ratio (sketched below). [3]
- Classifying with ensemble learning methods, which were described as particularly effective both on small samples and on samples with rebalanced class ratios (sketched at the end of this post). [3]
1. Kumaraswamy, Nishamathi, et al. "Healthcare fraud data mining methods: A look back and look ahead." *Perspectives in Health Information Management* 19.1 (2022).
2. Bolón-Canedo, Verónica, et al. "A review of microarray datasets and applied feature selection methods." *Information Sciences* 282 (2014): 111-135.
3. Herland, Matthew, Richard A. Bauder, and Taghi M. Khoshgoftaar. "Approaches for identifying US Medicare fraud in provider claims data." *Health Care Management Science* 23 (2020): 2-19.
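To illustrate the first recommendation, here is a minimal sketch of aggregating claim-level records to the provider level with pandas. The file path and column names (`Provider`, `ClaimID`, `InscClaimAmtReimbursed`, `BeneID`) are assumptions modelled on the Kaggle data's conventions, not excerpts from my notebooks:

```python
import pandas as pd

# Hypothetical path; the column names below are assumed stand-ins
# for the Kaggle claim data's schema.
claims = pd.read_csv("claims.csv")

provider_features = claims.groupby("Provider").agg(
    claim_count=("ClaimID", "count"),                    # claim volume per provider
    total_reimbursed=("InscClaimAmtReimbursed", "sum"),  # total payout
    mean_reimbursed=("InscClaimAmtReimbursed", "mean"),  # typical claim size
    unique_patients=("BeneID", "nunique"),               # breadth of patient base
)
```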
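For the second recommendation, the sketch below shows one simple correlation-based filter: from each highly correlated pair of features, drop whichever correlates less with the label. The function and threshold are illustrative assumptions; the exact criterion in my notebooks may differ:

```python
import pandas as pd

def correlation_filter(X: pd.DataFrame, y: pd.Series, threshold: float = 0.9) -> list:
    """Return the feature names kept after a pairwise correlation filter."""
    corr = X.corr().abs()              # feature-feature correlations
    target_corr = X.corrwith(y).abs()  # feature-label correlations
    keep = list(X.columns)
    for i, a in enumerate(X.columns):
        for b in X.columns[i + 1:]:
            if a in keep and b in keep and corr.loc[a, b] > threshold:
                # Of the redundant pair, drop the feature that tells us
                # less about the label.
                keep.remove(a if target_corr[a] < target_corr[b] else b)
    return keep

# Usage: X_reduced = X[correlation_filter(X, y)]
```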
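For the third recommendation, a sketch with imbalanced-learn's `RandomUnderSampler` shows how a 75-25 ratio can be reached; the synthetic data here is only a stand-in for the real provider-level table:

```python
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# Synthetic stand-in for the provider table: roughly 5% positive class.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)

# sampling_strategy is the desired minority:majority ratio after resampling;
# 25/75 yields the 75-25 split recommended in [3].
sampler = RandomUnderSampler(sampling_strategy=25 / 75, random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print(y_res.mean())  # ~0.25
```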
Ultimately, I found these recommendations beneficial overall. Below I briefly comment on my findings for each:
| Recommended Technique | My Assessment in this Application |
|---|---|
| Feature engineering to aggregate to the provider level | So plainly necessary for my application that I can hardly count it as a recommendation taken. |
| Correlation-based feature selection | Beneficial for interpreting variable importances and drastically reducing training times, usually at only a slight cost to the models' precision scores. |
| Alleviating class imbalance | Beneficial, as models trained on rebalanced samples were nearly always more precise than their counterparts. |
| Classifying with ensemble learning methods | Mixed findings. The best models did come from two ensemble methods, but the two non-ensemble methods, Naive Bayes and SVM, outperformed Gradient Boosting (an ensemble method) at their best. Also counter to the notion of an ensemble advantage, a Naive Bayes model handily outperformed the others on the more class-imbalanced datasets. |
The optimally precise classifier came from applying AdaBoost to the rebalanced, CFS-reduced sample; it scored a precision of 0.864.
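As a closing sketch of that winning configuration, the snippet below trains AdaBoost on a synthetic stand-in for the rebalanced, CFS-reduced sample and scores its precision. The hyperparameters are illustrative assumptions, not my tuned values, so the printed score will not match the 0.864 reported above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the rebalanced (75-25), CFS-reduced provider sample.
X, y = make_classification(n_samples=1000, weights=[0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = AdaBoostClassifier(n_estimators=200, random_state=0)  # illustrative size
model.fit(X_train, y_train)
print(precision_score(y_test, model.predict(X_test)))
```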