J Thorac Cardiovasc Surg 2007;133:88-96
© 2007 The American Association for Thoracic Surgery
General Thoracic Surgery
a Unit of Thoracic Surgery, Umberto I° Regional Hospital, Ancona, Italy
b Department of Anaesthesiology, Sheffield Teaching Hospital, Sheffield, United Kingdom
c Division of Thoracic Surgery, National Cancer Institute, Pascale Foundation, Naples, Italy.
Read at the Eighty-sixth Annual Meeting of The American Association for Thoracic Surgery, Philadelphia, Pa, April 29-May 3, 2006.
Received for publication April 22, 2006; revisions received August 13, 2006; accepted for publication August 31, 2006. * Address for reprints: Alessandro Brunelli, MD, Via S Margherita 23, Ancona 60129, Italy. (Email: alexit_2000{at}yahoo.com).
| Abstract |
|---|
METHODS: Seven hundred forty-three patients (551 male and 192 female patients) who underwent lobectomy (n = 611) or pneumonectomy (n = 132) from January 2000 through August 2004 at 2 European thoracic units (519 patients in unit A and 224 patients in unit B) were analyzed. Risk-adjusted models of 30-day or in-hospital cardiopulmonary morbidity and mortality were developed by using stepwise logistic regression analyses and validated by means of bootstrap analysis. Preoperative and operative variables were initially screened by using univariate analysis. Those with a P value of less than .10 were used as independent variables in the regression analyses. The regression equations were then used to estimate the risk of outcome, and the observed and predicted outcome rates of the 2 units were compared by using the z test for comparison of proportions.
RESULTS: The following regression models were developed. Predicted morbidity:
(Hosmer-Lemeshow statistic = 6.1 [P = .6], c index = 0.65). Predicted mortality:
(Hosmer-Lemeshow statistic = 2.99 [P = .9], c index = 0.77). The models proved to be stable at bootstrap analyses. No differences were noted between observed and predicted outcome rates within each unit, despite an apparent unadjusted better performance of unit B.
CONCLUSIONS: The use of risk-adjusted outcome models avoided misleading information derived from the unadjusted analysis of performance. Risk modeling is essential for the evaluation of the quality of care.
| Introduction |
|---|
The objective of the present study was to develop risk-adjusted models of early morbidity and mortality after major lung resection and to compare the performance of 2 different European thoracic surgery units with the intent to provide a methodological and practical example of model building and application for multicentric comparative audit purposes.
| Patients and Methods |
|---|
This is an observational study performed on prospective, periodically audited electronic databases of two dedicated thoracic surgery units located in two different European countries. Data were entered prospectively in each database by a trained staff physician and were periodically audited by a designated audit lead, who was responsible for the accuracy and completeness of the database. Both databases are used as continuous quality-improvement instruments at the two participating units. The study was approved by the local institutional review board of each center, and informed consent to be entered in the databases was obtained from all patients.
The patients and datasets used for this study were the same used for a recently published analysis.6
Although the subject was similar (risk modeling), the focuses of the two analyses were different in that the first study aimed at demonstrating the superiority of bootstrap analysis over the traditional training and test method for developing risk models by constructing multiple models from the population from unit A and then assessing their validity on the bootstrapped population of unit B.6
No comparison between the two units was performed because this was not the objective of that report. On the other hand, in this analysis the main purpose was to compare the performance of the two units by developing risk models from the entire dataset of patients (unit A plus unit B) after validation by means of bootstrap analysis. For this reason, we consider the two studies unique and independent.
In both centers surgical intervention was contraindicated in those patients with a predicted postoperative forced expiratory volume in 1 second (ppoFEV1) and predicted postoperative carbon monoxide lung diffusion capacity (ppoDLCO) of less than 30% of predicted value in association with a poor exercise capacity (height at maximal stair-climbing test <12 m or maximum oxygen consumption [VO2max] at cycle ergospirometry <10 mL · kg⁻¹ · min⁻¹) or in the presence of hemodynamic instability, despite optimization of treatment. As a rule, lung resections were performed through a muscle-sparing thoracotomy by certified thoracic surgeons for benign (56 lobectomies and 2 pneumonectomies) or malignant (555 lobectomies and 130 pneumonectomies; 667 primary and 18 metastatic diseases) diseases. The postoperative management policies were the same in both centers. All patients were admitted to a dedicated general thoracic surgery ward immediately after the operation, resorting to the intensive care unit only in case of complications requiring invasive assisted ventilation or invasive continuous monitoring. Postoperative treatment was standardized in both units and focused on early mobilization, chest physiotherapy and physical rehabilitation, thoracotomy pain control, and antibiotic and antithrombotic prophylaxis. Postoperative chest pain was controlled by means of epidural or continuous intravenous analgesia, which was titrated to keep the pain visual analogue score at less than 4 (on a scale ranging from 0 to 10) for the first 48 to 72 postoperative hours (pain score was assessed twice daily during the morning and afternoon rounds).
Postoperative morbidity and mortality were considered as those occurring within 30 days postoperatively or for a longer period if the patient was still in the hospital.
A number of preoperative and operative variables were tested for possible association with outcome variables (see Appendix 1 for explanation of variables).4,5,7-9
Data were initially scrutinized for assessing the quality of variables and their consistency in definition and recording between the 2 units. To this purpose, the 2 databases were reciprocally and independently audited by the principal investigator of the other unit (AB and GR). Only those variables and end points that were deemed of high quality and consistent across the 2 units were included in this analysis. The databases were made anonymous for both patients and surgeons and were merged for analysis. All patients were initially used to develop the predictive logistic models. For each measure of outcome (morbidity and mortality rates), variables were initially screened by using univariate analyses. The univariate comparisons of outcomes were performed by means of the unpaired Student t test for numeric variables with normal distribution and by means of the Mann-Whitney U test for numeric variables without normal distribution. The Shapiro-Wilk normality test was used to assess normal distribution. Categoric variables were compared by means of the χ² test or the Fisher exact test, as appropriate.
Variables with a P value of less than .10 at univariate analysis were then used as independent variables in the stepwise logistic regression analyses. The presence or absence of 1 or more complications or of mortality was used as a dependent variable in each respective model. All data were complete, with the exception of data on carbon monoxide lung diffusion capacity (DLCO), which were 95% complete. Missing data were imputed by averaging the nonmissing values. Potential explanatory variables more than 5% incomplete were excluded from this analysis (ergometric parameters [VO2max], blood gas analysis measures, and albumin concentration). To avoid multicollinearity, only 1 variable in a set of variables with a correlation coefficient of greater than 0.5 was selected (by using the bootstrap procedure) and used in the regression model.
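The two data-preparation steps described above (mean imputation of missing values and flagging of correlated variables) can be sketched as follows. This is a minimal illustration in plain Python, not the STATA procedure used in the study: the function names and the toy data are hypothetical, and applying the 0.5 threshold to the absolute value of the coefficient is an assumption.

```python
from math import sqrt

def mean_impute(values):
    """Replace None entries with the mean of the observed (nonmissing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlated_pairs(columns, threshold=0.5):
    """Return pairs of variable names whose |r| exceeds the threshold.

    Assumption: the threshold is applied to the absolute correlation.
    """
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson_r(columns[a], columns[b])) > threshold
    ]

# Hypothetical toy data: one missing DLCO value is replaced by the mean.
dlco = mean_impute([70.0, None, 90.0])
pairs = correlated_pairs({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1]})
```

Only one variable from each flagged pair would then be carried into the regression; the sketch does not reproduce the bootstrap-based choice between them.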
A P value of less than .05 was selected for retention of variables in the final model. The area under the receiver operating characteristic curve or c index was used to study the discrimination ability of each model. Hosmer-Lemeshow goodness-of-fit statistics were used to assess the calibration of the models. Furthermore, the multivariate procedures were validated by means of bootstrap bagging with 1000 samples. In the bootstrap procedure repeated samples of the same number of observations as the original database (n = 743) were selected with replacement from the original set of observations. For each sample, stepwise logistic regression was performed, entering the variables with a P value of less than .1 at univariate analysis. The stability of the final model can be assessed by identifying the variables that enter most frequently in the repeated bootstrap models and comparing those variables with the variables in the final model. If the final stepwise model variables occur in a majority (>50%) of the bootstrap models, the original final stepwise regression model can be judged to be stable.6,10,11
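For readers less familiar with the discrimination and calibration measures named above, the following plain-Python sketch shows how a c index can be computed from predicted risks and how a Hosmer-Lemeshow statistic is converted to a P value. The function names are hypothetical. The use of 8 degrees of freedom assumes the conventional 10 risk groups, which is not stated in the text, although the reported P values (.6 and .9) are consistent with it.

```python
from math import exp

def c_index(probabilities, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as one half."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable, closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total

# Assumption: 10 risk groups, hence 8 degrees of freedom.
p_morbidity = chi2_upper_tail_even_df(6.1, 8)
p_mortality = chi2_upper_tail_even_df(2.99, 8)
```

A c index of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; a large Hosmer-Lemeshow P value indicates no evidence of poor calibration.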
We have previously shown that bootstrap analysis might be particularly advantageous in moderate-sized samples inasmuch as it allows the use of the entire dataset for model developing without the need to split and further reduce the sample size (and the number of outcome cases).6
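The bootstrap stability check described above can be sketched as a resampling loop that records how often each variable is selected. In this minimal illustration `select_variables` is a hypothetical stand-in for the stepwise logistic regression, which is not reimplemented here; the function names, the default seed, and the toy selector are assumptions.

```python
import random
from collections import Counter

def bootstrap_inclusion_frequency(rows, select_variables, n_boot=1000, seed=0):
    """Draw n_boot samples of len(rows) observations with replacement and
    return, for each variable, the fraction of samples in which the
    user-supplied selection procedure retained it."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select_variables(sample)))
    return {name: counts[name] / n_boot for name in counts}

def is_stable(final_variables, frequencies, threshold=0.5):
    """True if every variable of the final model entered more than
    `threshold` of the bootstrap models."""
    return all(frequencies.get(v, 0.0) > threshold for v in final_variables)

# Hypothetical selector that always retains the same two variables.
def toy_selector(sample):
    return ["age", "ppoFEV1"]

freq = bootstrap_inclusion_frequency(list(range(20)), toy_selector, n_boot=50)
stable = is_stable(["age", "ppoFEV1"], freq)
```

Because every bootstrap sample has the same size as the original dataset, no observations are set aside for testing, which is the advantage over the training and test split noted above.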
The logistic models were then used to predict morbidity and mortality in the patients undergoing operations in the 2 different units. Predicted and observed outcome rates in each unit were then compared, and P values were calculated from the z test statistic for the difference between two proportions, which uses the sampling distribution of the statistic to guess population parameters. In the test statistic the numerator is the difference between the proportion in the two samples, and the denominator is the standard error of the difference in the two proportions.
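The comparison described above can be sketched in two steps: a logistic equation turns each patient's covariates into a predicted probability, and a z test compares two proportions. This is a minimal plain-Python illustration with hypothetical function names; the coefficients passed in are placeholders rather than the published ones, and the use of the pooled proportion in the standard error is an assumption about the exact form of the test.

```python
from math import erfc, exp, sqrt

def predicted_probability(intercept, coefficients, patient):
    """Logistic model: probability = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = intercept + sum(coefficients[k] * patient[k] for k in coefficients)
    return 1.0 / (1.0 + exp(-logit))

def two_proportion_z_test(events1, n1, events2, n2):
    """Two-tailed z test for the difference between two proportions.

    The numerator is the difference between the two proportions and the
    denominator is its standard error (pooled form, an assumption).
    """
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-tailed P from the normal distribution
    return z, p_value

# Hypothetical counts: 30 observed vs 20 predicted events among 100 patients.
z, p = two_proportion_z_test(30, 100, 20, 100)
```

The predicted number of events in a unit is the sum of the individual predicted probabilities, so it need not be an integer; the function accepts fractional counts for that reason.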
All the statistical tests were two-tailed. The analysis was performed with STATA 8.2 (Stata Corp, College Station, Tex) statistical software.
| Results |
|---|
Stepwise logistic regression analysis showed that significant and reliable predictors of morbidity were age (P = .005), ppoFEV1 (P = .003), and cardiac comorbidity (P = .002, Table 3).
The following equation predicting mortality was developed:
We observed differences in baseline and operative characteristics between patients in unit A and those in unit B. In particular, compared with patients in unit B, those operated on in unit A were older (P < .0001), had lower DLCO values (P = .005), and had a greater frequency of cardiac comorbidities but had higher ppoFEV1 values (P = .001, Table 1).
Figure 2 depicts the distributions of the patients according to increasing quartiles of expected risk of morbidity and mortality in the 2 units, respectively. Unit A had a greater frequency of patients at higher risk of morbidity (P < .0001) and mortality (P = .001) compared with unit B.
The comparison of predicted and observed outcomes is shown in Table 4. Despite a higher observed morbidity rate in unit A (P = .07), no differences were noted between observed and predicted outcome rates in each unit.
| Discussion |
|---|
Two recent, important multi-institutional articles from the American College of Surgeons Oncology Group and the American College of Surgeons Commission on Cancer were aimed at setting modern outcome benchmarks and patterns of care in our specialty. Yet by reporting very different crude outcome figures after major pulmonary resections, likely reflecting different eligibility criteria, they emphasize the need for risk adjustment.12,13
Quality end points must necessarily be risk adjusted to account for differences in patients' baseline and operative characteristics. In fact, crude outcome rates might lead to inappropriate clinical and administrative decisions and cause unethical risk-averse behaviors.
As a consequence, risk modeling should become an integral part of any quality-monitoring and quality-improvement program. In our specialty risk stratification for audit purposes is still in its embryonic phase, and only a few experiences have been published on this issue.3-5
In particular, examples of risk-adjusted multicentric comparative analysis of performance are lacking.
Recently, the European Association for Cardiothoracic Surgery/European Society of Thoracic Surgeons European Thoracic Database project produced a model of in-hospital mortality from a dataset of more than 3400 lung resections collected voluntarily from 27 units in 14 countries over a period of 3 years.7
This work represented the first multi-institutional multinational effort to develop an objective instrument to analyze the performance of different units across Europe. Future developments of the project are expected to refine the preliminary model and set performance benchmarks in Europe.
The present study must be interpreted as an example of methods and application of risk models for provider-initiated multicentric comparative audit purposes. In fact, it was conceived to develop risk-adjusted morbidity and mortality models to compare the performance of two different thoracic surgery units. Because no model is better than the one derived from the data at hand and because it has been shown that ready-made models applied to external populations perform less well than internally derived models,2,14-16 we elected not to use existing external models. Furthermore, it is known that regression models perform better when applied retrospectively to evaluate the past performance.17
Under a total quality-management perspective, they are not meant to foretell the future but to analyze past data to avoid repeating problems encountered in the past.17
In this regard the retrospective application of a model as a diagnostic quality instrument to the data from which it was developed seems justified, provided a cross-sample validation (bootstrap) had been performed to measure its reliability.6,10,11
In surgical practice morbidity and mortality are the most commonly used clinical indicators of quality. When used as an outcome variable, however, complications have inherent problems: their definition might be complex and subjective, and their recording might vary among different institutions and even within the same unit during successive periods of time. In this work complications were prospectively and independently recorded at two different centers after strict criteria were preliminarily defined. It was our priority to assess the consistency of these definitions between the two centers, and only those complications that were judged to be reliably consistent in definition and recording were used for the analysis (see Appendix 1 for definition of variables). Furthermore, the two main investigators (AB and GR) are members of the European Thoracic Database Committee7 and share the same methods and purposes in variable definition and database quality control. They were designated as the clinical audit leads responsible for periodically verifying the quality of the databases.
The same issue applied for the selection of the variables tested for a possible association with outcomes. Only those high-quality variables that were at least 95% complete in each database and were deemed to be consistent between the two units were selected and used for the analysis. Variables that were more than 5% incomplete, such as ergometric parameters (VO2max), blood gas analysis measures, and albumin concentration, were not included in the analysis. We are aware that our models might be imperfect and subject to improvements in terms of individual discrimination by the addition of other important factors associated with postoperative outcome (ie, ergometric parameters), but they are presumably the most reliable ones that we could derive in the context of this analysis. They showed a good face value and content validity and had a good predictive validity, as assessed by using the bootstrap bagging simulation. Our models are parsimonious enough to obviate overfitting problems when applied to medium-sized populations (particularly when the events are rare, such as mortality). It has recently been shown that increasing the number of predictors does not necessarily improve the discrimination of the models.18
Conversely, keeping the models as parsimonious as possible might be attractive because it can prevent many problems: cost of data collection, errors and imprecision in data recording, missing values, and instability of the model. The ideal model should be based on clinical, high-quality, prospectively compiled, periodically audited, specialty-specific, and procedure-specific (lung resection) databases. We think our models met these criteria.
Bootstrapping was used to validate the models once they were developed from the entire cohort of patients. We and others have shown that this method is superior to the traditional training and test splitting of the dataset, inasmuch as more reliable and reproducible predictive equations are generated.6,10
Each model was tested on 1000 bootstrap samples of the same number of patients as the original dataset, and only reliable predictors were selected and factored into the final regression equations.
As ever, caution is required in interpreting the prediction of a risk model in an individual patient. The individual discrimination of the models (c statistics) was moderate but in line with the ones reported in other studies.5,8
This common finding in surgical models might be partly due to yet unknown predictors, to the difficulty to represent complex clinical conditions or pathways of care with one or more variables, and to catastrophic random events that are rare in the population but important for the single patients.19
However, the models could be reliably applied to the whole population of lung resection candidates as audit instruments.
Another important issue central to every audit analysis is the definition of quality of care, which most likely is the reflection of the entire process of care rather than of a single outcome end point. However, in the absence of more precise instruments to evaluate the quality of care in its wholeness, multiple end points should be analyzed as a surrogate20 because each end point might be associated with a different aspect of the quality of care. In this regard our selected indicators (morbidity and mortality) are only a few of the multiple end points that could be risk adjusted and used for audit purposes (eg, postoperative stay, intensive care unit admission, technical complications, readmission rate, long-term survival, quality of life, and residual functional state).
After development and validation, our models were applied for predicting the outcomes in the two units, and they were able to prevent misleading information derived from the unadjusted analysis of the performance. In fact, despite unit A having a higher observed morbidity rate compared with that of unit B, the observed outcome rates were in line with the predicted ones in each unit. The increased observed morbidity in unit A could be explained by a worse physiologic state of the patients at the time of operation rather than by a poorer performance, as shown also by the higher frequency of patients with higher predicted morbidity and mortality risks in unit A compared with those in unit B. Without the use of risk adjustment, unit A would have been erroneously regarded as underperforming unit B.
It must be noted that our models were designed for audit purposes only and were not meant to be used for patient selection, a process that should be based more on individual clinical evaluation rather than on a population-based risk model.21
This work confirmed that risk modeling is essential for provider profiling and can be easily used for a fair comparison of the performance between different centers, with the ultimate goal of improving the quality of surgical care. The costs for implementing and managing international multicentric databases (eg, the European Thoracic Surgery Database or the Society of Thoracic Surgeons thoracic database) seem therefore justified by the benefits that could derive from the quality-improvement processes that will be based on them. Even though start-up costs might be daunting, ultimately, improved quality will be cost-efficient, and part of any cost savings realized by improved quality can be even factored into the total costs of gathering and maintaining risk-adjusted data. We think that important international cooperative processes for monitoring and standardization of the pathways of surgical care and for the accreditation of structures cannot leave out of consideration the use of reliable risk-adjustment models. As physicians, we should assume complete responsibility in the evaluation of our performance. We should not let managers and administrators judge our practice through imprecise and improper instruments. At a minimum, it would be in our best interest to provide them with the right evaluation tools, which must necessarily take into account our proficiency in clinical risk adjustment.
| Appendix 1 |
|---|
Pulmonary function tests were performed according to the American Thoracic Society criteria. Results of spirometry were collected after bronchodilator administration. DLCO measurement was performed by using the single-breath method.
FEV1, ppoFEV1, DLCO, and ppoDLCO values were expressed as percentages of predicted value for age, sex, and height. ppoFEV1 and ppoDLCO values were calculated by estimating the amount of functioning parenchyma removed during operation by means of bronchoscopy, computed tomography, and quantitative lung perfusion.
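The calculation of a predicted postoperative value can be illustrated with a simple proportional formula. This is a sketch under a stated assumption: the text says only that the amount of functioning parenchyma removed was estimated, so the proportional relationship, the function name, and the example figures below are illustrative rather than the exact computation used at either unit.

```python
def predicted_postoperative(preoperative_percent, fraction_removed):
    """Scale a preoperative value (percentage of predicted) by the share of
    functioning parenchyma expected to remain after resection.

    Assumption: ppo = preoperative value * (1 - fraction removed).
    """
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must lie between 0 and 1")
    return preoperative_percent * (1.0 - fraction_removed)

# Hypothetical example: FEV1 of 80% of predicted, 25% of functioning
# parenchyma removed.
ppo_fev1 = predicted_postoperative(80.0, 0.25)
```

The same function applies to DLCO; the fraction removed would come from the bronchoscopic, tomographic, and perfusion estimates described above.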
For the purpose of the present study and in accordance with previous investigations,4,5 a concomitant cardiac disease was defined as follows: previous cardiac surgery, previous myocardial infarction, history of coronary artery disease, and current treatment for arrhythmia, cardiac failure, or hypertension. We chose to use this definition of cardiac comorbidity for the sake of comparison with previous studies and for numeric reasons. In fact, breaking down the variable in the single cardiac diseases would have resulted in too many cofactors with limited representation. Although not weighed, all cardiac conditions included in the variable are widely recognized cardiac risk factors for noncardiac surgery.
Outcome Variables
For the purpose of this study, according to previous studies5,8,9 and to the European Association for Cardiothoracic Surgery/European Society of Thoracic Surgeons thoracic surgery database,7 the following complications were included: respiratory failure requiring mechanical ventilation for more than 48 hours, pneumonia (chest radiographic infiltrates, increased white blood cell count, and fever), atelectasis requiring bronchoscopy, adult respiratory distress syndrome, pulmonary edema, pulmonary embolism, myocardial infarction (suggestive electrocardiographic findings and increased myocardial enzymes), hemodynamically unstable arrhythmia requiring medical treatment, cardiac failure (suggestive chest radiographs, physical examination, and symptoms), acute renal failure (change in serum creatinine level >2 mg/dL compared with preoperative values), and stroke. For numeric reasons, we did not separate cardiac and pulmonary complications. We also did not weigh complications, in keeping with most of the work done on morbidity; however, we included only those complications that increased the complexity of postoperative management, requiring new treatments or a change of treatment, therefore adding to hospital costs and stay.