A novel machine-learning model for identification of significant coronary artery disease
Gang Wang1, Ya Gao1, Feng Xu2, Jiali Wang2, Benteng Ma3, Genshan Ma4, Yong Xia3,5, Yuguo Chen2
1. Department of Emergency Medicine, the Second Affiliated Hospital, Xi’an Jiaotong University, Xi’an, China
2. Department of Emergency Medicine, Qilu Hospital, Shandong University, Jinan, China
3. Shaanxi Key Lab of Speech & Image Information Processing (SAIIP), School of Computer Science, Northwestern Polytechnical University, Xi’an, China
4. Department of Cardiology, Zhongda Hospital, Southeast University, Nanjing, China
5. Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science, Northwestern Polytechnical University, Xi’an, China
Objectives: Significant coronary artery disease (sCAD) is defined as the presence of ≥50% luminal narrowing in any major epicardial coronary artery. Previous studies have shown that patients with sCAD have higher all-cause mortality than patients without sCAD. In addition, coronary angiography (CAG) shows that up to half of patients with suspected CAD have no significant stenosis in the coronary arteries. It is therefore important to identify which patients with suspected CAD are likely to have sCAD before cardiac catheterization is carried out. The goal of our study was to establish a bedside prediction model for the early identification of sCAD using machine learning models based only on common laboratory test results.
Methods: Patients with suspected CAD who underwent CAG in the cardiac catheterization lab according to international guidelines were recruited from the Second Affiliated Hospital, Xi’an Jiaotong University, from 2010 to 2014. Those with disqualified CAG reports were excluded. Laboratory results and CAG reports were then collected for database setup, feature selection, model training and statistical assessment. Using genetic algorithms, lab marker combinations of different sizes were tested for predictive power in four classical classifiers: k-Nearest Neighbor, Decision Tree, Random Forest and Support Vector Machine (SVM). The area under the receiver operating characteristic curve (AUROC) was used to measure the performance of the models.
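As an illustration of the performance measure named above, the following is a minimal sketch of how an AUROC can be computed from binary labels and classifier scores, using its interpretation as the probability that a randomly chosen positive case (here, sCAD) receives a higher score than a randomly chosen negative case. The function name `auroc` and the toy inputs are assumptions for illustration only; they are not the study's code or data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half a win.

    labels: iterable of 0/1 (1 = positive class, e.g. sCAD)
    scores: iterable of classifier scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores, not patient data.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a rank-based computation gives the same value more efficiently on large cohorts.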
Results: Based on whether any major epicardial coronary artery had ≥50% stenosis, 1957 patients were divided into an sCAD group (n=1442) and a non-sCAD group (n=515). A total of 87 laboratory markers were input into the prediction model. Six types of optimal combinations (T1, T2, …, T6) were obtained, in which the number of selected lab markers ranged from one to six. From T1 to T6, the highest accuracy in each subset was 77.47%, 85.21%, 85.63%, 85.21%, 85.21% and 84.65%, respectively. Two combinations of three clinical laboratory markers in the SVM model reached the highest accuracy (85.63%). ROC analysis then showed that, of these two, the combination of platelet large cell ratio, prealbumin and cholinesterase performed better in predicting sCAD, with the higher AUROC (0.76).
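To put the figures above in context, the short sketch below derives two quantities from the reported counts alone: the accuracy of a trivial rule that labels every patient as sCAD, and the number of possible marker subsets of size one to six drawn from 87 markers. Neither quantity is reported in the abstract; both are simple arithmetic on its numbers, and the variable names are hypothetical. The baseline assumes the evaluation set has the same class proportions as the full cohort, which the abstract does not state.

```python
from math import comb

# Group sizes as reported in the Results.
n_scad, n_non_scad = 1442, 515
n_total = n_scad + n_non_scad

# Accuracy of always predicting the majority class (sCAD).
# Assumption: class proportions in the evaluation set match the full cohort.
majority_baseline = n_scad / n_total

# Number of distinct marker subsets of each size k = 1..6 out of 87 markers.
n_markers = 87
subset_counts = {k: comb(n_markers, k) for k in range(1, 7)}
total_subsets = sum(subset_counts.values())

print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
for k, count in subset_counts.items():
    print(f"subsets of size {k}: {count}")
print(f"total candidate subsets: {total_subsets}")
```

Under that assumption the baseline is about 73.68%, so the reported accuracies are best read relative to it rather than to 50%. The subset counts grow quickly with size (87, 3741 and 105995 for sizes one to three), which is one plausible reason to prefer a heuristic search such as a genetic algorithm over exhaustive enumeration.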
Conclusions: This is the first study to report that the combination of platelet large cell ratio, prealbumin and cholinesterase in an SVM model can distinguish sCAD patients in the clinic. It may serve as an early screening approach for evaluating the value of CAG. The machine learning model and our results should be tested in a prospective observational study.