Validation of Clinical Data Mining Algorithms for the Extraction of Diabetes and Other Metabolic Disease Diagnoses from Electronic Medical Records

Semester: Summer 2023


Presentation description

In this study, we are validating a clinical data mining algorithm for extracting patient clinicodemographic data related to cancer and metabolic diseases. We randomly selected 100 patient records for data extraction using the random sample function in R. From these records, we extracted select data elements related to diagnoses of diabetes, prediabetes, hypertension, hypertriglyceridemia, and low high-density lipoprotein (HDL). The data were collected in REDCap, a self-service, web-based tool for creating and managing databases that is subsidized for University of Utah research needs. For each diagnosis, the data elements included ICD codes, biomarker tests, medication usage, and medical history from clinician notes. Two student researchers independently extracted the data elements for all 100 patients, and we computed percent agreement (p0), expected agreement (pe), and Cohen's Kappa (k). Across all variables extracted, the two abstractors agreed 98.16% of the time, with a Cohen's Kappa of 0.962 (p0 = 0.9816, pe = 0.51, k = 0.962), suggesting near-perfect agreement. Overall, 92% of patients had at least one element related to diabetes, the most common being an elevated fasting blood glucose or HbA1c (88%). 47% of patients had at least one element related to prediabetes, the most common being a history recorded in provider clinical notes (43%). 98% of patients had at least one element related to hypertension, the most common being elevated blood pressure measurements (94%). Lastly, 88% of patients had at least one element related to dyslipidemia, the most common being a history of high triglycerides (80%). So far, we have observed high agreement in manual data extraction, as well as a high prevalence of data elements related to diabetes, hypertension, and dyslipidemia in our dataset. The next stage of this investigation is to calculate the agreement between manual data extraction and clinical data mining methods.
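The agreement statistics reported above follow the standard definition of Cohen's Kappa, k = (p0 - pe) / (1 - pe). The Python sketch below is illustrative only and is not the study's own analysis code (the abstract names R only for sampling). The function names `kappa_from_agreement` and `agreement_stats` and the toy ratings are hypothetical; only the reported values p0 = 0.9816 and pe = 0.51 come from the abstract.

```python
from collections import Counter


def kappa_from_agreement(p_o: float, p_e: float) -> float:
    """Cohen's Kappa from observed (p_o) and expected (p_e) agreement."""
    if p_e >= 1.0:
        raise ValueError("expected agreement must be below 1")
    return (p_o - p_e) / (1.0 - p_e)


def agreement_stats(rater_a, rater_b):
    """Return (p_o, p_e, kappa) for two equal-length lists of labels.

    p_o is the share of items on which both raters agree; p_e is the
    agreement expected by chance from each rater's label frequencies.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / n**2
    return p_o, p_e, kappa_from_agreement(p_o, p_e)


# Values reported in the abstract: p0 = 98.16%, pe = 0.51
reported_kappa = kappa_from_agreement(0.9816, 0.51)
print(round(reported_kappa, 3))

# Hypothetical toy ratings (1 = element present, 0 = absent), not study data
toy_a = [1, 1, 0, 0]
toy_b = [1, 1, 0, 1]
print(agreement_stats(toy_a, toy_b))
```

Plugging the reported p0 and pe into the formula reproduces the stated Kappa of 0.962 after rounding to three decimals, so the three reported figures are mutually consistent.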

Presenter Name: Emon Parry

Presentation Type: Poster
Presentation Format: In Person
Presentation #95
College: Medicine
School / Department: Population Health Sciences
Research Mentor: Maci Winn
Date | Time: Thursday, Aug 3rd | 10:30 AM