Comparison of 19 pre-operative risk stratification models in open-heart surgery

Johan Nilsson, Lars Algotsson, Peter Höglund, Carsten Lührs, Johan Brandt
DOI: http://dx.doi.org/10.1093/eurheartj/ehi720, pp. 867–874. First published online: 18 January 2006

Abstract

Aims To compare 19 risk score algorithms with regard to their validity to predict 30-day and 1-year mortality after cardiac surgery.

Methods and results Risk factors for patients undergoing heart surgery between 1996 and 2001 at a single centre were prospectively collected. Receiver operating characteristics (ROC) curves were used to describe the performance and accuracy. Survival at 1 year and cause of death were obtained in all cases. The study included 6222 cardiac surgical procedures. Actual mortality was 2.9% at 30 days and 6.1% at 1 year. Discriminatory power for 30-day and 1-year mortality in cardiac surgery was highest for logistic (0.84 and 0.77) and additive (0.84 and 0.77) European System for Cardiac Operative Risk Evaluation (EuroSCORE) algorithms, followed by Cleveland Clinic (0.82 and 0.76) and Magovern (0.82 and 0.76) scoring systems. None of the other 15 risk algorithms had a significantly better discriminatory power than these four. In coronary artery bypass grafting (CABG)-only surgery, EuroSCORE followed by New York State (NYS) and Cleveland Clinic risk score showed the highest discriminatory power for 30-day and 1-year mortality.

Conclusion EuroSCORE, Cleveland Clinic, and Magovern risk algorithms showed superior performance and accuracy in open-heart surgery, and EuroSCORE, NYS, and Cleveland Clinic in CABG-only surgery. Although the models were originally designed to predict early mortality, the 1-year mortality prediction was also reasonably accurate.

  • Mortality
  • Risk factors
  • Statistics
  • Surgery
  • Survival

See page 768 for the editorial comment on this article (doi:10.1093/eurheartj/ehi792)

Introduction

Despite technological advancements, open-heart operations still carry a risk of mortality and morbidity. To aid in the selection of patients for cardiac surgery, several risk-scoring systems have been developed during the last decades. These aim to estimate the risk of peri-operative death, based on the occurrence of different risk factors. Operative mortality is also increasingly used as an indicator of the quality of cardiac surgery.1

To make an accurate comparison between different institutions or surgeons, mortality data must be adjusted to the risk profiles of the patients.2,3 Differences between the available risk algorithms regarding score design and the patient population on which the score development was based could influence their accuracy and performance. Ideally, a risk model should be useful for outcome prediction at different surgical centres, both at the institutional level and for individual patients.4 Operative mortality is the outcome variable most commonly used as a quality indicator, but long-term mortality may be more relevant from a patient perspective.

A few comparative studies of different risk algorithms exist.4–8 However, the relative performance of the risk-scoring systems currently in use remains unclear. The purpose of this study was to compare 19 open-source risk score algorithms with regard to their validity in predicting 30-day and 1-year mortality after cardiac surgery in a large single-institution patient population.

Methods

Study design and patients

The study was approved by the Ethics Committee of the Medical Faculty at Lund University. Risk factors for all adult patients undergoing heart surgery at the University Hospital of Lund between January 1996 and February 2001 were prospectively collected when the patients were admitted to the Department of Cardiothoracic Surgery. The patient record form contained a total of 248 variables (80 pre-, 106 intra-, and 62 post-operative) based on the Society of Thoracic Surgeons (STS)9 patient record form. The data was stored in a local adult cardiac surgery database.

Data collection and risk-score calculation

From the total of 248 variables, those corresponding to the risk factors in the different risk models were selected. Thus, a subset of 104 pre- and intra-operative variables was imported into the statistical software package, together with 30-day and 1-year mortality for the population. Missing values were replaced using the probability imputation technique10 before the risk score was calculated. This technique substitutes conditional probabilities for missing covariate values when the covariate is qualitative. The risk score for each algorithm was calculated for every patient according to the published definitions (Table 1).
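As a rough illustration of the imputation idea, the sketch below fills in a missing binary covariate with the observed proportion of positives among comparable patients. The conditioning variable (`strata`), the function name `probability_impute`, and the toy data are assumptions made for this example; the text does not state what the probabilities were conditioned on.

```python
from collections import defaultdict


def probability_impute(values, strata):
    """Replace missing binary values (None) with a conditional probability.

    values: list of 0, 1 or None (None marks a missing value)
    strata: list of hashable labels; the probability is estimated within
            each label (an assumed conditioning variable, for illustration)
    """
    # stratum -> [number of observed 1s, number of observed values]
    observed = defaultdict(lambda: [0, 0])
    for value, stratum in zip(values, strata):
        if value is not None:
            observed[stratum][0] += value
            observed[stratum][1] += 1

    imputed = []
    for value, stratum in zip(values, strata):
        if value is not None:
            imputed.append(float(value))
            continue
        ones, count = observed[stratum]
        if count == 0:
            raise ValueError(f"no observed values in stratum {stratum!r}")
        imputed.append(ones / count)  # probability substituted for the gap
    return imputed


# Toy data (hypothetical): diabetes status with two missing entries.
diabetes = [1, 0, 0, None, 1, 1, None, 0]
urgency = ["elective"] * 4 + ["emergency"] * 4
imputed = probability_impute(diabetes, urgency)
```

The imputed entries are fractions between 0 and 1 rather than hard 0/1 labels, so a risk score computed from them is weighted by how likely the factor is to be present.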

Follow-up

The vital status at 1 year after the operation was obtained for all patients from the Population and Welfare Statistics Sweden, Statistiska Centralbyrån, Stockholm, Sweden, as were the date and cause of death.

Statistical analysis

Means (±SD) were used to describe the continuous variables, and frequencies were calculated for categorical variables. Score-predicted operative mortality (death within 30 days of operation) was calculated using the mean score from the different risk models, except for the Northern New England algorithm where the published score-mortality table11 was used. Receiver operating characteristics (ROC) curves were used to describe the performance and predictive accuracy for the different algorithms.12 The discriminatory power, i.e. the c-index, was evaluated by calculating the areas under ROC curves.13 The areas under curves are presented with 95% confidence limits. An area of 1.0 under the ROC curve indicates perfect discrimination, whereas an area of 0.50 indicates complete absence of discrimination. Any intermediate value is a quantitative measure of the ability of the risk predictor model to distinguish between survivors and non-survivors.
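The area under the ROC curve can be read as the probability that a randomly chosen non-survivor received a higher risk score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it that way on toy data; the function name `roc_auc` and the data are illustrative assumptions, not the Stata routines used in the study.

```python
def roc_auc(scores, died):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: risk scores, higher meaning higher predicted risk
    died:   booleans, True for non-survivors
    """
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for x in deaths:
        for y in survivors:
            if x > y:
                concordant += 1.0   # non-survivor ranked above survivor
            elif x == y:
                concordant += 0.5   # tie counts as half
    return concordant / (len(deaths) * len(survivors))


# Toy data (hypothetical): eight patients, three deaths.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
died = [False, False, False, True, False, True, False, True]
auc = roc_auc(scores, died)  # 12 concordant pairs out of 15, i.e. 0.8
```

The pairwise loop is quadratic in the worst case, which is acceptable for a cohort of this size; a rank-based formulation would be the usual choice for much larger data sets.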

To compare the areas under the resulting ROC curves (used as an index for the predicted value), the non-parametric approach described by DeLong et al.14 was used. The ROC area for each risk algorithm was systematically compared with the ROC areas of the other 18 algorithms. The number of algorithms with a significantly larger or smaller ROC area was then computed. The significance level was adjusted for the effect of multiple comparisons using Sidak's method.
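A minimal sketch of this comparison, under stated assumptions, is given below: a pure-Python version of the DeLong variance estimate for two scores measured on the same patients, followed by the Sidak-adjusted significance level. The function names, the toy data, and the choice of 18 comparisons per algorithm are assumptions for illustration; the text does not state how many comparisons entered the adjustment.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, otherwise 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _components(scores, died):
    """Placement values per non-survivor (v10) and per survivor (v01)."""
    cases = [s for s, d in zip(scores, died) if d]
    controls = [s for s, d in zip(scores, died) if not d]
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, died):
    """Return (auc_a, auc_b, z, two-sided p) for two correlated ROC areas."""
    n_cases = sum(1 for d in died if d)
    if n_cases < 2 or len(died) - n_cases < 2:
        raise ValueError("need at least two deaths and two survivors")
    v10_a, v01_a = _components(scores_a, died)
    v10_b, v01_b = _components(scores_b, died)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the difference between the two areas.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b)
            - 2 * _cov(v10_a, v10_b)) / len(v10_a)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b)
              - 2 * _cov(v01_a, v01_b)) / len(v01_a))
    diff = auc_a - auc_b
    if var <= 0.0:  # degenerate case, e.g. identical scores
        z = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance level under Sidak's adjustment."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)


# Toy data (hypothetical): two scores for the same eight patients.
died = [False, False, False, True, False, True, False, True]
score_a = [1, 2, 3, 4, 5, 6, 7, 8]
score_b = [2, 1, 4, 3, 6, 5, 8, 7]
auc_a, auc_b, z, p = delong_test(score_a, score_b, died)
alpha = sidak_alpha(0.05, 18)  # 18 comparisons is an assumption
significant = p < alpha
```

Because both scores are evaluated on the same patients, the covariance terms matter: ignoring them would treat the two ROC areas as independent and typically overstate the variance of their difference.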

Graphs and statistical analyses were performed using the Intercooled Stata version 9.0 (2005) statistical package (StataCorp LP, College Station, TX, USA) and GraphPad Prism 4b, 2004 for Mac OS X, GraphPad Software, Inc., USA.

Results

Patient population

Between January 1996 and February 2001, 6499 consecutive heart operations were performed on 6414 patients. During the period January–March 1998, database service and upgrade resulted in missing values in 30% of the data points. All operations (n=277) from this period were excluded from the study. Thus, 6153 patients, undergoing 6222 operations, were included in the analysis. In 2% of the total data points, missing values were replaced using the probability imputation technique.10 There was accurate documentation of data including mortality and cause of death in all cases, and no patient was lost to follow-up.

The average age was 66.3±10.6 years (range 18–95). The majority of patients were men (72%). A coronary artery bypass grafting (CABG)-only operation was performed in 4351 cases (70%), 1340 (22%) cases had a valve procedure with or without CABG surgery, and 531 (8%) were miscellaneous procedures, e.g. post-infarction septal rupture (37 cases), aortic aneurysm or dissection (209 cases), and cardiac transplantation (78 cases). Previous cardiac surgery had been performed in 457 cases (7.3%). Seventy-eight patients (1.3%) were in cardiogenic shock at the start of the operation and 628 (10%) were operated on within 24 h after acceptance for surgery (emergency surgery). The actual 30-day mortality was 2.9% (n=180) and the 1-year mortality was 6.1% (n=377).
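The headline figures can be reproduced from the counts reported above. The sketch below recomputes the number of included operations and the two mortality rates, and attaches a 95% interval to the 30-day rate. The Wilson score interval is an assumption chosen for illustration; the text does not say which interval method was used for observed mortality.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


operations = 6499 - 277            # operations left after the exclusion
mortality_30d = 180 / operations   # deaths within 30 days
mortality_1y = 377 / operations    # deaths within 1 year
ci_30d = wilson_interval(180, operations)
```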

Performance and predictive accuracy for the algorithms

The discriminatory power (i.e. the area under the ROC curve) for 30-day mortality and 1-year mortality was highest for the logistic (0.84 and 0.77) and additive (0.84 and 0.77) European System for Cardiac Operative Risk Evaluation (EuroSCORE) algorithms, followed by the Cleveland Clinic (0.82 and 0.76) and the Magovern (0.82 and 0.76) scoring systems (Figures 1 and 2). None of the other risk algorithms had a significantly better discriminatory power (larger ROC area) than these four (Figure 3). In the subanalysis with CABG-only patients, the discriminatory power of the two EuroSCORE algorithms was highest, followed by the New York State (NYS) and Cleveland Clinic risk algorithms (Table 2).

The mortality predictions of the different scoring systems are shown in Figure 4.

Follow-up

The most common cause of death within 30 days was cardiovascular disease (n=163, 91%), followed by cerebrovascular disease (n=3, 1.7%), malignant neoplasm (n=3, 1.7%), and chronic lower respiratory disease (n=2, 1.1%). Cardiovascular disease was also the most common cause of death within 1 year (n=280, 74%), followed by malignant neoplasm (n=22, 5.8%), cerebrovascular disease (n=16, 4.2%), chronic lower respiratory disease (n=10, 2.7%), and septicaemia (n=10, 2.7%). For each risk algorithm, the ROC areas for cardiovascular-related (n=163) and total 30-day mortality (n=180) were almost identical (difference 0.005 or less). The discriminatory power for cardiovascular-related 1-year mortality (n=280) increased by approximately 0.03 for all 19 algorithms compared with the discriminatory power for total 1-year mortality (n=377) (logistic EuroSCORE 0.80, additive EuroSCORE 0.80, Cleveland Clinic 0.79, and Magovern 0.78). However, it did not change their relative order of discriminatory power.

Discussion

The purpose of this study was to compare 19 commonly used cardiac surgical risk scores with regard to their validity in a large single-institution patient population. The results show that four of the algorithms showed superior performance and accuracy in predicting 30-day and 1-year mortality, expressed as discriminatory power, compared with the other 15 algorithms. Although all of the algorithms were designed to predict early mortality, they also predicted 1-year mortality well, especially when the cause of death was cardiovascular disease.

Most algorithms overestimated the 30-day mortality in this patient population. The same finding has been reported in other studies.4,6 Rather than reflecting weaknesses in the risk score algorithms, these findings are probably explained by differences in patient mix and time period compared with the original databases used to develop the algorithms.6 Prediction of the mortality rate in the CABG-only subgroup was almost perfect using the Northern New England and NYS algorithms, both of which were developed for CABG surgery and are relatively recent.

The potential of ROC curves in medical diagnostic testing was recognized as early as 1960.15 Although the comparison of ROC curves in a statistically valid fashion to evaluate models remains controversial, the ROC curve is currently the best developed statistical tool for describing performance.12 The EuroSCORE model, which had the highest discriminatory power, has been shown to work well in predicting 30-day mortality in many European countries16 and in the United States.17 It compared favourably with the STS risk stratification algorithm7 (which is not open source and was therefore not included in the present analysis). Recently, it was demonstrated that EuroSCORE could predict intensive care unit stay and costs of open-heart surgery.18 The Cleveland Clinic model has also shown high discrimination in predicting early mortality.8 An important finding in the present study is that these algorithms could also be used to predict long-term mortality (1 year), especially for cardiovascular deaths.

Earlier studies have compared the performance of different risk algorithms in predicting 30-day mortality,4,6,8 but have not shown significant differences in performance and accuracy. This may be explained by their smaller patient populations.6,8

The predictive accuracy of different risk scoring systems may be influenced by numerous factors, such as differences in variable definitions, management of incomplete data fields, surgical procedure selection criteria, and geographical differences in patient risk factors. The prevalence of risk factors in patients referred for heart surgery may also change over time. Difficulties thus arise when a comparison of the accuracy and predictive power of large databases is attempted. However, ROC analysis is a robust technique for such comparisons. Importantly, the shapes of the ROC curves were similar among the compared risk models (Figure 2), making direct comparison possible.12 Murphy-Filkins et al.19 showed that an up to five-fold increase in the frequency of a low-frequency variable (for example, due to a difference in a variable definition) did not appreciably change the model fit.

All surgical procedures were included in the study, irrespective of the number of operations the patients underwent. Thus, a patient could participate two or more times in the analysis. This could be debated, as multiple procedures performed on the same patient may introduce dependence in the data. An alternative would be to include only the first procedure for each patient. A subanalysis using this approach (n=6153) showed only very small differences in the ROC area for the different risk algorithms (on average 0.001). A drawback of excluding patients having a second procedure during the study period is that some high-risk cases will be eliminated from the analysis. Regardless of which method was used, the differences caused by this dependence were negligible, most likely due to the small number of patients (1%) who had more than one procedure.

The probability imputation technique, used in this study, has been shown to work well in prognostic factor studies.20 Another strategy to handle incomplete data is to exclude the patients with missing values from analysis, but because missing values are more likely in emergent high-risk patients, this could result in bias.

Geographical differences in the occurrence of patient risk factors may have influenced the design of different risk-scoring systems, but do not seem to influence the present results. The best-performing risk scores in this study were developed in two different geographical areas: Europe and the USA.

Eight of the included risk algorithms (Cabdeal, NYS, Northern New England, Magovern, Toronto, Toronto (modified), UK national score, and Veterans Affairs) were originally designed to predict early mortality in CABG-only patients, which also could affect the predictive accuracy. A subanalysis of CABG-only patients in this material identified the same two risk-scoring systems with the largest ROC areas (EuroSCORE additive and logistic), followed by the NYS and the Cleveland Clinic risk-scoring systems.

The smaller ROC area for the 1-year than for the 30-day mortality prediction was expected. Risk models originally designed to predict 30-day mortality will mainly predict cardiovascular death, which was the most common cause of early post-operative mortality (91%). At 1 year, the causes of death will be more diverse and the proportion of cardiovascular-related death will decrease (74%).

The strength of the present study is that the algorithms could be compared using a relatively large patient material, where the patient data were collected on a regular basis in the daily clinical work. The data was pre-operatively entered into the database, generally by residents, and not by the surgeon performing the operation.

During the last decades, several different risk score algorithms for cardiac surgery have been published, but it still remains difficult to risk stratify individual patients.4,8 One method to improve risk algorithm development could be to include more patients with higher risk scores as suggested by Wyse and Taylor.21 However, we found that the Cleveland Clinic score, which was developed on 5051 patients, performed almost as well as the EuroSCORE, developed on 13 302 patients.

Most risk algorithms are based on logistic regression analysis with a priori assumptions of linear relationships. Another method to improve risk prediction could be to use a more complex risk model, such as the artificial neural network, which has the capacity to model complex, non-linear relationships and is relatively robust and tolerant of missing data.22 Only a few studies have been done in this area, which merits further investigation.
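To make the linearity assumption concrete, the sketch below shows the general form of a logistic risk model: each risk factor adds a fixed amount to the log-odds of death, regardless of the values of the other factors. The intercept, the coefficient values, and the variable names are hypothetical and are not taken from any of the 19 published algorithms.

```python
import math


def logistic_risk(intercept, coefficients, patient):
    """Predicted probability of death from a logistic regression model.

    coefficients: mapping of risk factor name -> regression coefficient
    patient:      mapping of risk factor name -> value (absent means 0)
    """
    # The linear predictor is a weighted sum: this is the linearity assumption.
    log_odds = intercept + sum(
        beta * patient.get(name, 0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical model, for illustration only.
INTERCEPT = -4.0
COEFFICIENTS = {
    "decades_over_60": 0.3,
    "emergency": 1.0,
    "previous_cardiac_surgery": 0.8,
}

low_risk = logistic_risk(INTERCEPT, COEFFICIENTS, {})
high_risk = logistic_risk(
    INTERCEPT,
    COEFFICIENTS,
    {"decades_over_60": 2, "emergency": 1, "previous_cardiac_surgery": 1},
)
```

A neural network replaces the single weighted sum with layered non-linear transformations, which is what allows it to capture interactions that a model of this form cannot represent without explicitly added terms.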

Even if a perfect risk prediction algorithm in cardiac surgery is never achieved, identification of the best-performing risk algorithms is important. Pre-operative risk stratification may aid in the selection between cardiac surgery and other therapeutic modalities currently available, facilitate the planning of hospital resource utilization, and enable accurate comparison between different institutions or surgeons.

Conflict of interest: none declared.

Appendix

Table A1

Pre-operative general risk factors in 6222 open-heart operations

Pre-operative risk factor | Mean (±SD) or n (%)
Age^b (years) | 66.3 (10.6)
Female gender | 1765 (28.4)
Height^b (centimetres) | 171.4 (8.0)
Weight^b (kilograms) | 78.7 (13.8)
Hb^b (g/L) | 134.1 (16.3)
Serum creatinine^b (µmol/L) | 95.2 (40.5)
Hypertension (sys >140 mmHg) | 2458 (40.0)
Diabetes | 1106 (17.9)
Hypercholesterolemia (treated) | 2274 (37.0)
Chronic pulmonary disease | 477 (7.7)
Active smoker | 539 (8.8)
Cerebrovascular disease | 448 (7.2)
Peripheral vascular disease | 636 (10.3)
Kidney disease by history | 248 (4.0)
Dialysis | 28 (0.5)
Adult congenital heart disease | 11 (0.2)
ASA medication | 4346 (69.9)
Diuretic medication | 2203 (35.4)
Immunosuppressive medication | 71 (1.2)

Algorithm columns (cell entries not preserved in this copy): Amphiascore, Cabdeal, Cleveland Clinic, EuroSCORE^a, French score, Magovern, NYS, Northern New England, Ontario, Parsonnet, Parsonnet (modified), Pons, Toronto, Toronto (modified), Tremblay, Tuman, UK national score, Veterans Affairs.

ASA indicates acetylsalicylic acid; Hb, hemoglobin; sys, systolic arterial blood pressure.

^a Additive and logistic.

^b Continuous variables are presented as mean (±SD). The analysis is based on operations where the risk factor data were available.

Table A2

Pre-operative cardiac risk factors in 6222 open-heart operations

Pre-operative risk factor | Mean (±SD) or n (%)
Previous cardiac surgery | 457 (7.3)
Active endocarditis | 55 (0.9)
Heart failure | 1156 (18.6)
Cardiomegaly | 327 (5.3)
Unstable angina | 744 (12.0)
CCS^b | 2.6 (1.0)
NYHA^b | 2.4 (1.0)
Recent MI (within 24 h) | 144 (2.3)
Recent MI (within 48 h) | 207 (3.3)
Recent MI (within 21 days) | 793 (12.9)
Ventricular arrhythmia (acute) | 64 (1.0)
Atrial fibrillation | 508 (8.3)
Pacemaker | 33 (1.0)
Left main stenosis | 964 (17.9)
Triple vessel disease | 2690 (50.7)
LVEF^b | 49.7 (11.6)
Aortic gradient >120 mmHg | 278 (4.5)
Pulmonary hypertension | 191 (3.1)

Algorithm columns (cell entries not preserved in this copy): Amphiascore, Cabdeal, Cleveland Clinic, EuroSCORE^a, French score, Magovern, NYS, Northern New England, Ontario, Parsonnet, Parsonnet (modified), Pons, Toronto, Toronto (modified), Tremblay, Tuman, UK national score, Veterans Affairs.

CCS, Canadian Cardiovascular Society; LVEF, left ventricular ejection fraction; NYHA, New York Heart Association; MI, myocardial infarction.

^a Additive and logistic.

^b Continuous variables are presented as mean (±SD). The analysis is based on operations where the risk factor data were available.

Table A3

Critical pre-operative situations in 6222 open-heart operations

Pre-operative risk factor | n (%)
Urgent surgery | 1376 (22.2)
Emergency surgery | 628 (10.1)
PTCA failure/complication | 138 (2.2)
Intubated | 71 (1.1)
IABP | 134 (2.2)
Uncontrolled systemic disturbance^b | 1135 (18.2)
Cardiogenic shock | 78 (1.3)
Hemodynamically unstable | 286 (4.6)
Critical state^c | 308 (5.0)
Catastrophic states^d | 206 (3.3)

Algorithm columns (cell entries not preserved in this copy): Amphiascore, Cabdeal, Cleveland Clinic, EuroSCORE^a, French score, Magovern, NYS, Northern New England, Ontario, Parsonnet, Parsonnet (modified), Pons, Toronto, Toronto (modified), Tremblay, Tuman, UK national score, Veterans Affairs.

IABP, intra-aortic balloon pump; PTCA, percutaneous transluminal coronary angioplasty.

^a Additive and logistic.

^b Any one or more of the following: systolic pulmonary arterial pressure >50 mmHg; uncontrolled systemic arterial hypertension; renal insufficiency; chronic lung disease; poor hepatic function; cerebrovascular insufficiency; severe arrhythmias; active endocarditis; cachexia.

^c Any one or more of the following: ventricular tachycardia or fibrillation or aborted sudden death; pre-operative cardiac massage; pre-operative ventilation before arrival in the anaesthetic room; pre-operative inotropic support; intra-aortic balloon counterpulsation; or pre-operative acute renal failure (anuria or oliguria <10 mL/h).

^d Any one or more of the following: acute structural defect (acute ventricular septal defect or acute mitral valve regurgitation); cardiogenic shock; acute renal failure.

Table A4

Surgical information in 6222 open-heart operations

Operation | n (%)
Venous graft alone | 572 (9.2)
Single valve surgery only | 657 (10.6)
Valve surgery only | 721 (11.6)
Aortic valve surgery^b | 1106 (17.9)
Mitral valve surgery^c | 449 (7.3)
Tricuspid valve surgery^b | 40 (0.6)
Valve surgery and CABG | 619 (9.9)
Other^d than isolated CABG | 1871 (30.1)
Heart transplantation | 78 (1.3)
Post-infarction septal rupture | 37 (0.6)
Left ventricular aneurysm | 16 (0.3)
Surgery on thoracic aorta | 209 (3.4)
Aortic dissection (acute) | 79 (1.3)

Algorithm columns (cell entries not preserved in this copy): Amphiascore, Cabdeal, Cleveland Clinic, EuroSCORE^a, French score, Magovern, NYS, Northern New England, Ontario, Parsonnet, Parsonnet (modified), Pons, Toronto, Toronto (modified), Tremblay, Tuman, UK national score, Veterans Affairs.

^a Additive and logistic.

^b With or without CABG surgery.

^c With or without CABG surgery, except for Amphiascore where the definition is mitral valve surgery with CABG surgery.

^d Total number of valve or miscellaneous procedures with or without CABG surgery.

Figure 1 The ROC area (diamonds) with 95% confidence intervals (horizontal bars) for (A) 30-day mortality and (B) 1-year mortality. Open-heart surgery (n=6222). See Table 1 for abbreviations.

Figure 2 The ROC curves. The sensitivity of prediction of 30-day mortality vs. 1-specificity for the 19 risk algorithms is plotted. The solid line represents the absence of discrimination. Open-heart surgery (n=6222).

Figure 3 Comparison of the ROC area for different risk algorithms. For each risk scoring system (left y-axis), the number of risk algorithms with a significantly (P<0.05) larger (black bar) or smaller (grey bar) ROC area are shown. (A) 30-day mortality and (B) 1-year mortality. Open-heart surgery (n=6222). See Table 1 for abbreviations.

Figure 4 Observed 30-day mortality with 95% confidence intervals (vertical lines) in comparison to score-predicted 30-day mortality (diamonds) with 95% confidence intervals (horizontal bars). (A) All open-heart surgery and (B) CABG-only surgery. Asterisk denotes the predicted mortality calculated from ACC/AHA score mortality table11 specified for CABG-only surgery. See Table 1 for abbreviations.

Table 1

Synopsis of original data of 19 risk score algorithms

Algorithm | Region | Year of data collection | Year of publication | Number of patients (centers) | Risk variables | ROC area
Amphiascore^23 | Netherlands | 1997–2001 | 2003 | 7282 (1) | 8 | 0.84
Cabdeal^a,24 | Finland | 1990–1991 | 1996 | 386 (1) | 7 | 0.71
Cleveland Clinic^25 | USA | 1986–1988 | 1992 | 5051 (1) | 13 | N/A
EuroSCORE (add.)^26 | Europe | 1995 | 1999 | 13 302 (128) | 17 | 0.79
EuroSCORE (log.)^27 | Europe | 1995 | 2003 | 13 302 (128) | 17 | 0.79
French score^28 | France | 1993 | 1995 | 7181 (42) | 13 | 0.75
Magovern^a,29 | USA | 1991–1992 | 1996 | 1567 (1) | 18 | 0.86
NYS^a,3,30 | USA | 1998 | 2001 | 18 814 (33) | 14 | 0.79
NNE^a,11 | USA | 1996–1998 | 1999 | 7290 (N/A) | 8 | N/A
Ontario^31 | Canada | 1991–1993 | 1995 | 6213 (9) | 6 | 0.75
Parsonnet^32 | USA | 1982–1987 | 1989 | 3500 (1) | 16 | N/A
Parsonnet (mod.)^33 | France | 1992–1993 | 1997 | 6649 (42) | 41 | 0.70
Pons^34 | Spain | 1994 | 1997 | 1309 (7) | 11 | N/A
Toronto^a,35 | Canada | 1993–1996 | 1999 | 7491 (2) | 9 | 0.78
Toronto (mod.)^a,36 | Canada | 1996–1997 | 2000 | 1904 (1) | 9 | N/A
Tremblay^37 | Canada | 1989–1990 | 1993 | 2029 (1) | 8 | N/A
Tuman^38 | USA | N/A | 1992 | 3156 (1) | 10 | N/A
UK national score^a,5 | UK | 1995–1996 | 1998 | 1774 (2) | 19 | 0.75
Veterans Affairs^a,39 | USA | 1987–1990 | 1993 | 12 712 (43) | 10 | N/A

Add, additive; log, logistic; mod, modified; NNE, Northern New England; N/A, not available. The Cleveland Clinic risk score algorithm is also known as the Higgins score, NNE as the American College of Cardiology/American Heart Association (ACC/AHA) score, and Ontario as the Provincial Adult Cardiac Care Network (PACCN) score.

^a Algorithms developed for CABG-only surgery.

Table 2

ROC area for the five risk algorithms with best performance and accuracy in CABG-only surgery (n=4351)

Risk algorithm | 30-day mortality ROC area (95% CI) | 1-year mortality ROC area (95% CI)
EuroSCORE (logistic) | 0.86 (0.82–0.90) | 0.75 (0.72–0.79)
EuroSCORE (additive) | 0.85 (0.81–0.89) | 0.75 (0.71–0.78)
NYS | 0.84 (0.80–0.88) | 0.75 (0.72–0.79)
Cleveland Clinic | 0.84 (0.80–0.88) | 0.75 (0.71–0.78)
Parsonnet (modified) | 0.84 (0.80–0.88) | 0.73 (0.69–0.77)

Cleveland Clinic risk score algorithm is also known as Higgins score.
