J Thorac Cardiovasc Surg 2006;132:12-19
© 2006 The American Association for Thoracic Surgery
Surgery for Acquired Cardiovascular Disease
a Department of Cardiothoracic Surgery, Lund University, Lund, Sweden
b Department of Theoretical Physics, Lund University, Lund, Sweden
c Competence Centre for Clinical Research, Lund University Hospital, Lund, Sweden
d Papworth Hospital, Cambridge, UK
Received for publication September 1, 2005; revisions received December 19, 2005; accepted for publication December 29, 2005. * Address for reprints: Dr Johan Nilsson, Department of Cardiothoracic Surgery, Heart and Lung Center, Lund University Hospital, SE 221 85 LUND, Sweden. (Email: johan.nilsson{at}thorax.lu.se).
| Abstract |
|---|
METHODS: Prospectively collected data from 18,362 patients undergoing cardiac surgery at 128 European institutions in 1995 (the European System for Cardiac Operative Risk Evaluation database) were used. Models to predict the operative mortality were constructed using artificial neural networks. For calibration a sixfold cross-validation technique was used, and for testing a fourfold cross-testing was performed. Risk variables were ranked and minimized in number by calibrated artificial neural networks. Mortality prediction with 95% confidence limits for each patient was obtained by the bootstrap technique. The area under the receiver operating characteristics curve was used as a quantitative measure of the ability to distinguish between survivors and nonsurvivors. Subgroup analysis of surgical operation categories was performed. The results were compared with those from logistic European System for Cardiac Operative Risk Evaluation analysis.
RESULTS: The operative mortality was 4.9%. Artificial neural networks selected 34 of the total 72 risk variables as relevant for mortality prediction. The receiver operating characteristics area for artificial neural networks (0.81) was larger than that for the logistic European System for Cardiac Operative Risk Evaluation model (0.79; P = .0001). Across surgical operation categories, the discriminatory power did not differ for the artificial neural networks (P = .15), but it differed significantly for the logistic European System for Cardiac Operative Risk Evaluation (P = .0072).
CONCLUSIONS: Risk factors in a ranked order contributing to the mortality prediction were identified. A minimal set of risk variables achieving a superior mortality prediction was defined. The artificial neural network model is applicable independent of the cardiac surgical procedure.
| Introduction |
|---|
Most risk scoring systems have been created using a biostatistical method based on a generalized linear model with assumptions of linear relationship. Artificial neural networks (ANNs) work in a nonlinear fashion, which may better describe the interaction between health risk factors. ANNs have been used in classification and diagnostic prediction of cancer [6] and electrocardiogram interpretation [7], among others. Some studies in clinical medicine have demonstrated superiority of the classification or prediction by ANNs compared with other statistical models [8]. In the field of cardiac surgery, only a few studies using ANNs have been published, and the results have been ambiguous [9-14].
To select risk variables for a model, significance testing (P values) is the most common methodology, but this does not assess the importance of the individual variable [15]. On the other hand, ANNs may be used for both variable selection and ranking of individual variables in order of importance [15]. For example, this methodology has been employed to select and minimize a large number of gene expression levels used in cancer classification, with excellent results [16].
This study aimed to systematically evaluate the accuracy and performance of ANNs to select and rank the most important risk factors for operative mortality in cardiac surgery by using high-performance computer clusters.
| Methods |
|---|
Patients and Study Design
From the 97 original EuroSCORE variables, a subset of 72 variables was selected (Tables 1 and 2). This was done by excluding variables closely linked to other variables and data collected intraoperatively (ie, number of conduits and number of distal coronary anastomoses). Patients with a missing value in any mandatory variable (age, gender, or surgical procedure) or outcome (operative mortality) were excluded from analysis. Imputation was used to substitute missing values in the other variables with the statistical mode for categorical variables and the mean for continuous variables [11].
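The imputation rule described above can be illustrated with a minimal sketch. The field names (`diabetes`, `creatinine`), the toy records, and the `impute` helper are hypothetical and are not taken from the EuroSCORE database; the sketch only shows mode substitution for categorical variables and mean substitution for continuous ones.

```python
from statistics import mean, mode


def impute(rows, categorical, continuous):
    """Return copies of rows with None replaced by the column mode or mean."""
    fill = {}
    for col in categorical:
        # Most frequent observed value for categorical variables.
        fill[col] = mode([r[col] for r in rows if r[col] is not None])
    for col in continuous:
        # Arithmetic mean of observed values for continuous variables.
        fill[col] = mean([r[col] for r in rows if r[col] is not None])
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical toy records; None marks a missing value.
patients = [
    {"age": 60, "diabetes": 0, "creatinine": 90.0},
    {"age": 70, "diabetes": None, "creatinine": None},
    {"age": 65, "diabetes": 0, "creatinine": 110.0},
    {"age": 72, "diabetes": 1, "creatinine": 100.0},
]
completed = impute(patients, categorical=["diabetes"], continuous=["creatinine"])
```

Records lacking a mandatory variable or the outcome would be dropped before this step rather than imputed, in line with the exclusion rule stated above.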
Statistical Analysis
Mean values (± standard deviation [SD]) were used to describe continuous variables, and frequencies were calculated for categorical variables. Logistic regression analysis was performed to obtain the coefficients for the risk variables included in the logistic model as described by Hosmer and Lemeshow [20]. To compare the number of correctly classified patients by ANNs versus the logistic EuroSCORE, a proportion test was used. Effective odds ratios for the risk variables were determined as described by Lippmann and Shahian [11]. The 95% confidence intervals (CIs) for both the odds ratios and the output from the ANNs were calculated using the bootstrap technique [11,21]. Receiver operating characteristics (ROC) curves were used to describe the performance and predictive accuracy of the models [22]. The area (with 95% CI) under the curve was used as a quantitative measure of the ability of the risk predictor models to distinguish between survivors and nonsurvivors. To compare the areas under the resulting ROC curves, the nonparametric approach described by DeLong and coworkers [23] was used.
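As an illustration of the ROC area used here, the following sketch computes it as the probability that a randomly chosen nonsurvivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `roc_area` and the toy data are assumptions for illustration; the sketch does not implement the DeLong comparison of two areas.

```python
def roc_area(risks, died):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [r for r, d in zip(risks, died) if d]       # nonsurvivors
    neg = [r for r, d in zip(risks, died) if not d]   # survivors
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes (1 = died, 0 = survived).
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
died = [1, 0, 1, 0, 0]
area = roc_area(risks, died)
```

The pairwise loop is quadratic in the number of patients; for a cohort of this size a rank-based formulation would be preferable, but the definition is the same.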
Computer Cluster and Software
High-performance computing clusters were used to train and evaluate the ANNs. The ANN calibration and analyses were performed with MATLAB 7 (2005) and its Neural Network Toolbox (MathWorks, Natick, Mass). Graphs and statistical analyses were produced with the Intercooled Stata version 9.0 (2005) statistical package (StataCorp LP, College Station, Tex).
| Results |
|---|
The average age was 62.6 ± 10.7 years (range 17-89). The majority of patients were men (72%). Isolated coronary artery bypass grafting (CABG) was performed in 11,628 patients (63%), 4907 (27%) patients had a valve procedure with or without CABG surgery, and 1827 (10%) had miscellaneous procedures. The patient details are described in Table 1. The actual operative mortality was 4.9% (n = 891).
Architecture of Artificial Neural Networks
Approximately 42,500 different ANN models were validated. The architecture for the final validation ANN included 1 hidden layer with 14 nodes, 1 output node, and 6 individual members of the ensemble. This ANN architecture was used in the selection of risk factors utilized for the mortality prediction.
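The stated architecture (1 hidden layer with 14 nodes, 1 output node, 6 ensemble members) can be sketched as a forward pass. Several details below are assumptions not given in the text: the tanh hidden activation, the logistic output, the random untrained weights, and the averaging of member outputs into one risk. All names are hypothetical.

```python
import math
import random

N_HIDDEN = 14   # hidden nodes, as stated in the text
N_MEMBERS = 6   # ensemble members, as stated in the text


def init_member(rng, n_inputs):
    """Create one untrained network with small random weights (assumption)."""
    return {
        "w1": [[rng.gauss(0, 0.1) for _ in range(n_inputs)] for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "w2": [rng.gauss(0, 0.1) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }


def member_output(member, x):
    """Forward pass: tanh hidden layer, logistic output (both assumed)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(member["w1"], member["b1"])
    ]
    z = sum(w * h for w, h in zip(member["w2"], hidden)) + member["b2"]
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_risk(members, x):
    """Average the member outputs (the combination rule is an assumption)."""
    return sum(member_output(m, x) for m in members) / len(members)


rng = random.Random(0)
members = [init_member(rng, n_inputs=72) for _ in range(N_MEMBERS)]
risk = ensemble_risk(members, [1.0] * 72)
```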
Selection of Risk Factors Utilized for Mortality Prediction
The importance ranking order of the risk variables for the ANN model is presented in Tables 1 and 2 and Figure 2, A. To optimize the model, an increasing number of the ranked variables was included in the model, starting with the top-ranked variable. The largest validation ROC area, 0.82 (95% CI: 0.80-0.83), was achieved when the 34 top-ranked risk variables were included (Figure 2, B).
Performance and Predictive Accuracy for the Algorithms
The discriminatory power (ie, the area under the ROC curve) for operative mortality was significantly larger for the final ANNs, 0.81 (95% CI: 0.79-0.82), compared with the logistic EuroSCORE model, 0.79 (95% CI: 0.78-0.81; χ² = 15.7; P = .0001; Figure 3). The final ANN ROC area was also significantly larger than the ROC area for a logistic model with the same 34 top-ranked risk variables, 0.80 (95% CI: 0.78-0.81; χ² = 17.5; P < .0001).
Effective Odds Ratio and Classification Confidence Limits for ANNs
The effective odds ratio for the 34 top-ranked risk variables is presented in Table 1. Bootstrap sampling was used to generate CIs for both the effective odds ratio (Table 1) and the ANN classification (Figure 4). Thus, an individual patient with a calculated mortality risk of 0.64 belongs with 95% certainty to the group of patients with a greater likelihood of not surviving the operation than surviving the surgery. For a patient with a calculated risk of 0.31, the opposite holds true.
| Discussion |
|---|
The search for an effective method for mortality prediction in cardiac surgery started in the 1980s [24]. During the last decades, several risk score algorithms for cardiac surgery have been published [2,4,5,19], but it still remains difficult to risk stratify individual patients [25]. Most risk scoring systems have been developed using a biostatistical method based on a generalized linear model. Different methods to improve the accuracy of risk algorithms have been suggested (eg, including more patients with higher risk, selecting and identifying the most important risk factors, and using new algorithmic models such as machine learning techniques, of which ANNs are an example) [26-28].
Only a few studies have investigated ANNs in the prediction of survival after cardiac surgery [9-14]. Most of these are based on CABG-only patients [9,11-14], and only 1 included all cardiac surgical procedures [10]. None of these studies found any considerable improvement over the traditional biostatistical methods. Orr [10] and Tu and colleagues [9] showed that an ANN could be used to estimate cardiac surgical mortality, but the performance was equivalent to that of logistic regression. These studies were made on smaller cohorts than the present one (1477 and 4782 patients) and used few risk variables (7 and 11). Lippmann and Shahian [11] obtained a similar result when ANNs were used on patients from the Society of Thoracic Surgeons database. Although 80,000 patients with 32 risk variables were included in that study, the ANNs showed a performance equivalent to the other prediction models. The authors concluded that no complex nonlinear relationship exists, at least not among the presented risk variables. Similar to the other studies, almost all variables were categorical, and the variable selection was performed in a classic way, by significance testing (P value). However, a nonlinear relationship is more likely to be identified in continuous than in categorical variables. Important risk variables for ANNs may also go unrecognized if traditional statistical variable selection is used.
One fundamental and controversial question is the number of variables optimally included in a risk model. In the present study a total of 72 variables (11 continuous) were used. No prior variable selection such as significance testing was used; instead, the ANNs ranked every variable in order of its importance for the mortality prediction. In a second step, the total number of variables was minimized to include only variables with a positive contribution to the outcome prediction. The largest ROC area was achieved when the 34 top-ranked variables were included in the model. If more variables were included, the discriminatory power decreased.
Five of the studies of ANNs in cardiac surgery used ROC analysis to describe the accuracy and the discrimination for the different models [9,11-14], and 1 compared the number of correctly classified patients [10]. Even if comparison of ROC curves in a statistically valid fashion to evaluate models remains controversial, the ROC curve is currently the best-developed statistical tool for describing performance [22]. Importantly, the ROC curve for the ANN model is consistently above the logistic EuroSCORE ROC curve, making direct comparison possible. When applying a statistical model to clinical practice, cutoff values for sensitivity and specificity are valuable. ANNs performed significantly better than the logistic EuroSCORE at sensitivity cutoffs of 25%, 50%, and 75%. At a sensitivity of 75%, the ANNs classified 720 more survivors correctly than the logistic EuroSCORE model did.
The predictive accuracy of different risk scoring systems may be influenced by numerous factors, such as differences in variable definitions, management of incomplete data fields, surgical procedure selection criteria, and geographical differences in patient risk factors. The prevalence of risk factors in patients referred for heart surgery may also change over time.
The advantages of ANNs are that they do not require any a priori assumptions or knowledge of underlying frequency distributions, have the capacity to model complex nonlinear relationships, and are robust and tolerant of missing data and input errors [11]. Earlier studies on risk analysis in cardiac surgery have mostly been developed and validated on isolated CABG-only patients [2,4] or all cardiac surgery [17]. Recently the Northern New England Cardiovascular Disease Study Group presented a risk model for aortic valve surgery and another for mitral valve surgery [29]. Analyses comparing risk score performance in different surgical procedures have been lacking. In the present study, the ANNs show a similar performance independent of the surgical procedure, unlike the logistic EuroSCORE model. This may be explained not only by a better risk factor selection but also by the capacity of ANNs to recognize complex nonlinear relationships.
Strengths of the present study are that the ANNs were developed on a large multi-institutional database from 8 European countries, that the patient data were quality-checked and validated by 2 independent operators before they were entered into the database [17], that a large number (42,500) of combinations of parameters included in the ANN architecture could be evaluated by using high-performance computer clusters, and that an independent blind test was performed on a second, external database. A limitation of the present study is that it was performed on data collected 10 years ago. However, a similar result was obtained in the blind test, where the surgical procedures were performed between 1996 and 2001. Hierarchical generalized linear modeling, which accounts for clustering of observations within providers, may improve the results of the logistic regression [30]. This method may be particularly useful to rate provider performance.
The additive EuroSCORE algorithm [5] can be used at the bedside without a computer, and the logistic EuroSCORE [19] is available on the Internet (http://www.euroscore.org). The ANN model cannot compete with the additive model in simplicity, but it is feasible to make it available on a website.
| Appendix E1 |
|---|
The final prediction models were tested on patients not previously exposed to the ANNs or the logistic EuroSCORE, by using a fourfold cross-testing technique. Thus, the patient material was randomly split into 4 groups. One of these groups was selected as a test set and excluded from further analysis. The remaining groups were used for calibration and validation. This procedure was performed 4 times with a new group selected each time as a test set (see Figure 1).
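The fourfold cross-testing split described above can be sketched as follows. The function name `cross_test_splits`, the fixed seed, and the toy patient identifiers are assumptions for illustration; the further division of the remaining groups into calibration and validation sets is not shown.

```python
import random


def cross_test_splits(patient_ids, n_folds=4, seed=0):
    """Yield (development, test) lists; each group serves once as the test set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)               # random assignment to groups
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        # The remaining groups are kept for calibration and validation.
        develop = [pid for j, fold in enumerate(folds) if j != k for pid in fold]
        yield develop, test


# Hypothetical identifiers for 20 patients.
splits = list(cross_test_splits(range(20)))
```

Each patient appears in exactly one test set, so every prediction reported on a test group comes from a model that never saw that patient during calibration.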
Selection of Risk Factors Utilized for Mortality Prediction
To select the most important risk variables and to minimize the number of variables included in the final model, a risk variable ranking was performed. A baseline receiver operating characteristics (ROC) area (see below) was created using all 72 variables. The ranking list was then obtained by measuring the change of the ROC area, as compared with the baseline, when a risk variable was excluded from the model. The highest-ranked variable corresponded to the largest decrease of the ROC area when it was excluded from the model. Each of the models lacking one of the risk variables was recalibrated prior to the ROC area assessment. To optimize the model an increasing number of the ranked variables was included in the model, starting with the top-ranked variable. In this procedure the ANNs were recalibrated after every second variable inclusion.
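The ranking and minimization steps can be sketched with a placeholder scoring function. Here `fit_and_score` is a hypothetical callable that recalibrates a model on the given variables and returns its ROC area; the toy scorer and variable names are invented for illustration. As a simplification, the sketch rescores after every added variable, whereas the text states that the ANNs were recalibrated after every second inclusion.

```python
def rank_variables(variables, fit_and_score):
    """Rank variables by the drop in ROC area when each one is left out."""
    baseline = fit_and_score(list(variables))
    drops = {}
    for v in variables:
        reduced = [u for u in variables if u != v]
        drops[v] = baseline - fit_and_score(reduced)   # larger drop = more important
    return sorted(variables, key=lambda v: drops[v], reverse=True)


def best_prefix(ranked, fit_and_score):
    """Add ranked variables one at a time and keep the best-scoring prefix."""
    best_n, best_area = 0, float("-inf")
    for n in range(1, len(ranked) + 1):
        area = fit_and_score(ranked[:n])
        if area > best_area:
            best_n, best_area = n, area
    return ranked[:best_n], best_area


# Toy scorer standing in for ANN recalibration (hypothetical contributions).
CONTRIBUTION = {"age": 0.15, "lvef": 0.10, "creatinine": 0.05, "noise": -0.02}


def toy_score(variables):
    return 0.5 + sum(CONTRIBUTION[v] for v in variables)


ranked = rank_variables(["noise", "creatinine", "age", "lvef"], toy_score)
selected, area = best_prefix(ranked, toy_score)
```

In the toy example the variable with a negative contribution is ranked last and excluded, mirroring the observation that discriminatory power decreased when more than the top-ranked variables were included.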
Effective Odds Ratio and Confidence Intervals for the ANN Output
The odds ratio for a specific risk variable in each patient was determined by changing the risk variable in the patient from "absent" to "present" and calculating the odds for the two conditions. By computing the geometric mean for the odds ratio from all patients, an effective odds ratio for the specific variable was obtained [2].
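The effective odds ratio can be sketched as the geometric mean of per-patient odds ratios. The `predict` callable, the toy logistic model, and the field names are hypothetical stand-ins for the calibrated ANN; the toy model is chosen so that the odds ratio for the toggled variable is 2 for every patient.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def effective_odds_ratio(predict, patients, variable):
    """Geometric mean over patients of odds(present) / odds(absent)."""
    log_ratios = []
    for patient in patients:
        absent = dict(patient, **{variable: 0})
        present = dict(patient, **{variable: 1})
        log_ratios.append(math.log(odds(predict(present)) / odds(predict(absent))))
    # The geometric mean is the exponential of the mean log ratio.
    return math.exp(sum(log_ratios) / len(log_ratios))


def toy_predict(patient):
    """Hypothetical logistic model; diabetes carries an odds ratio of exactly 2."""
    z = -3.0 + math.log(2.0) * patient["diabetes"] + 0.03 * (patient["age"] - 60)
    return 1.0 / (1.0 + math.exp(-z))


cohort = [{"age": a, "diabetes": 0} for a in (55, 62, 70, 81)]
eor = effective_odds_ratio(toy_predict, cohort, "diabetes")
```

For a purely logistic model the per-patient ratios are identical, so the geometric mean recovers the ordinary odds ratio; for a nonlinear model such as an ANN the ratios vary between patients, which is why an averaged quantity is needed.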
The 95% confidence limits for both the odds ratio and the output from the ANNs were calculated using the bootstrap technique [2,3]. From the original database, 1750 bootstrap training data sets were created by resampling with replacement. These bootstrap training sets were then used to calibrate new ANN models with the same architecture and parameter settings as for the final ANN risk prediction model. Each ANN model generated an odds ratio for each variable, resulting in 1750 odds ratios for each variable. Standard techniques [2,3] were then used to extract the confidence limits from these sets of odds ratios. The confidence limits for the mortality risk of individual patients were calculated in the same way.
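The bootstrap procedure can be sketched as below. The text does not specify which standard technique extracted the limits, so the percentile method used here is an assumption, as are the function name `bootstrap_ci`, the seed, and the toy outcome data. In the study each resample was used to recalibrate an ANN; the `statistic` callable is a placeholder for that step.

```python
import random
from statistics import mean


def bootstrap_ci(data, statistic, n_boot=1750, alpha=0.05, seed=0):
    """Percentile confidence limits from resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical outcomes: 5 deaths among 100 patients.
outcomes = [1] * 5 + [0] * 95
lower, upper = bootstrap_ci(outcomes, mean)
```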
Computer Cluster
Three clusters for high-performance computing were used to train and evaluate the ANNs. Two Linux clusters hosted by Lunarc at Lund University, one with 210 AMD Opteron nodes and one with 184 Intel P4 nodes, and one Mac OS X cluster with 7 nodes were employed. The latter was also used for the statistical analysis.
See related editorial on page 8.