J Thorac Cardiovasc Surg 2004;128:823-825
© 2004 The American Association for Thoracic Surgery
Statistics for the Rest of Us
a Department of Thoracic Surgery, Guy's Hospital, London, UK
b Clinical Operational Research Unit, Department of Mathematics, University College, London, United Kingdom
Received for publication February 23, 2004; accepted for publication March 4, 2004.
* Address for reprints: Tom Treasure, MS, MD, FRCS, Consultant in Thoracic Surgery, Guy's Hospital, St Thomas's St, London SE1 9RT, United Kingdom
Tom.Treasure{at}ukgateway.net
See related articles on pages 807, 811, 820, and 907.
Marc de Leval1 presented a cumulative sum (CUSUM) chart to The American Association for Thoracic Surgery in Chicago 10 years ago. His graph (Figure 1) showed an outstanding series of 52 cases performed with just 1 death early in the adoption of the arterial switch operation for transposition of the great arteries. The CUSUM graph concluded with a similarly excellent, nearly flat line with 1 death in the most recent 39 cases. Sandwiched between them was a cluster of deaths. de Leval's CUSUM chart was simple, explicit, and intuitive; each operation moved the graph 1 unit along the horizontal axis, and each death moved it up by 1 unit on the vertical axis. It enabled those of us fortunate to be present at what proved to be a landmark presentation to follow the story with absolute clarity. de Leval charted the results for a single procedure. Prompted by his presentation, and convinced that this method would help us display and understand our outcomes better, we worked toward a method of displaying data sequentially that would also allow for variable risk in series of different case mix. We dubbed the method variable life-adjusted display (VLAD),2 but other terms have also been used.3
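The construction of the two charts can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the outcomes and predicted risks are invented, the function names are hypothetical, and the VLAD step (predicted risk minus observed outcome, accumulated over the series) follows the usual published description of the method rather than anything spelled out in this text.

```python
from itertools import accumulate

def cusum_deaths(outcomes):
    """Unadjusted CUSUM: running count of deaths (1 = death, 0 = survival)."""
    return list(accumulate(outcomes))

def vlad(outcomes, predicted_risks):
    """Risk-adjusted display: running total of expected minus observed deaths.

    Each survivor moves the line up by the predicted risk of death;
    each death moves it down by (1 - predicted risk).
    """
    steps = [risk - died for died, risk in zip(outcomes, predicted_risks)]
    return list(accumulate(steps))

# Hypothetical series of 8 operations (illustrative values only).
outcomes = [0, 0, 1, 0, 0, 0, 1, 0]
predicted_risks = [0.05, 0.10, 0.20, 0.05, 0.02, 0.10, 0.50, 0.05]

cusum_series = cusum_deaths(outcomes)
vlad_series = vlad(outcomes, predicted_risks)

print(cusum_series)  # [0, 0, 1, 1, 1, 1, 2, 2]
print([round(v, 2) for v in vlad_series])
```

On the unadjusted chart every death counts the same. On the risk-adjusted line, the death of a patient with a predicted risk of 0.50 pulls the line down only half as far as the death of a patient with negligible predicted risk.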
Our second example deals with results of a single surgeon during 1990 to 1994, well before the issues surrounding performance monitoring heated up (Figure 4). The surgeon was believed by colleagues to be underperforming, but believed himself to have acceptable results given the case mix undertaken, and indeed he was not a man to shy away from risk. He defended himself vigorously against what he regarded as an attempt at unfair dismissal and took his employing hospital to court. His colleagues had the sorry task of explaining Figure 4 to lawyers. Earlier explicit display of performance in an intuitively obvious form might have saved some lives, as well as the surgeon's reputation.
VLAD charts are gaining in popularity because they are intuitive and can reveal trends meriting inspection and discussion. We have given some examples where display of the results is itself enough to lead to an explanation and a solution. We are left, however, with the problem of how to decide what is acceptable, what is questionably acceptable, and how to determine at what point results have become unacceptably bad. The debate about whether we can or should construct control limits on the charts is an important one,3,4,6 but it is also important to remember that displaying sequential risk-adjusted data does not create a problem, although it may reveal one. We have always had to decide how to declare what is "significantly" worse than an acceptable standard. It is inherent in the problem that an alert (to use de Leval's term1) must signal before the conventional level of scientific proof required to test a scientific hypothesis; if we allow events to run their course until a conventional level of significance is reached (such as P = .05), many lives will have been lost.
Whenever results of treatment are collected, it would be sensible to ask, "For what purpose?" Current pressure in the United Kingdom to collect and make available mortality data is for the early recognition of episodes of underperformance, whether from the surgeon's skill, the institution's systems, or any other cause. The SCTS has decided that it will present nonrisk-adjusted mortality data with 99.99% confidence intervals. This means that only once in 10,000 instances would a run of deaths occurring by chance (that is to say, common-cause variation4) be unfairly attributed to a surgeon experiencing a run of "bad luck." Of course, the corollary is that true problems may not be detected in time to avert disaster, and yet one might have thought that was the original purpose of monitoring. In defense of this decision, the SCTS expresses the view that those on the inside will already have discovered that performance is slipping, which is a way of saying that the open publication of results is window dressing for public consumption, and meanwhile, surgeons will look after their own affairs. The SCTS is of the view that real problems will be picked up long before their proposed test becomes positive, so the public will never see a surgeon's results cross the P = .0001 threshold. Let us hope so for the patients' sake (for that represents a lot of deaths), the surgeon's sake (for that will be a mighty fall), and for the reputation of the surgical profession, which in the United Kingdom has suffered enough suspicion and criticism in recent years. "Trust me, I'm a doctor" is no longer sufficient reassurance.
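To give a sense of scale for the gap between P = .05 and P = .0001, the following sketch counts how many deaths a series would need before each threshold is crossed. It assumes a one-sided exact binomial test against a fixed acceptable mortality rate, which may differ from the interval the SCTS actually computes; the 3% rate and the series length of 200 operations are hypothetical, as are the function names.

```python
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def deaths_to_signal(n, p, alpha):
    """Smallest death count k with P(X >= k) < alpha, or None if no count qualifies."""
    for k in range(n + 1):
        if upper_tail(n, k, p) < alpha:
            return k
    return None

n_operations = 200       # hypothetical series length
acceptable_rate = 0.03   # hypothetical acceptable mortality

conventional = deaths_to_signal(n_operations, acceptable_rate, 0.05)
stringent = deaths_to_signal(n_operations, acceptable_rate, 0.0001)

print(f"Expected deaths at the acceptable rate: {n_operations * acceptable_rate:.0f}")
print(f"Deaths needed to cross P = .05:   {conventional}")
print(f"Deaths needed to cross P = .0001: {stringent}")
```

The difference between the two printed counts is the number of additional deaths that would have to accumulate, under these assumptions, before the more stringent threshold signalled.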
For those who remain uncomfortable without P values, formal statistical inference derived from sequential mortality data requires great care.6 Certainly VLAD charts should not be used as a method for formal hypothesis testing. Whatever methods are used for such a purpose, they must take into account repeated testing and the fact that successive hypothesis tests are carried out on data sets that overlap. Case-mix correction considerably complicates hypothesis testing. For example, it may be mathematically convenient to test the hypothesis that there has been a uniform inflation of all risks by the same factor, the assumption in the control limit methods described by Rogers and colleagues.4 Whether this reflects the realities of what happens if a surgeon's performance declines is dubious, because one might expect a disproportionately high increase in mortality for technically challenging rather than routine procedures. Further, we have noted that surgeons with overall excellent results have had runs of as many as 50 to 100 operations in which their results fell below those expected according to a risk model, even for the now too-forgiving Parsonnet model.
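The point about repeated testing on overlapping data can be illustrated with a small simulation. In this sketch, which rests on assumptions not taken from the article, a surgeon's true mortality is exactly the acceptable rate, and a one-sided exact binomial test at a nominal P = .05 is re-run on the cumulative series after every operation. The 5% rate, the 100-operation series, the number of simulated series, and all function names are hypothetical.

```python
import random
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def signal_thresholds(max_n, p, alpha):
    """For each series length n, the smallest death count with P(X >= k) < alpha."""
    thresholds = [None]  # index 0 is unused so that thresholds[n] matches n operations
    for n in range(1, max_n + 1):
        k = next((k for k in range(n + 1) if upper_tail(n, k, p) < alpha), None)
        thresholds.append(k)
    return thresholds

def false_alarm_rate(n_ops, p, alpha, n_series, seed=0):
    """Fraction of in-control series that signal at least once when tested after every case."""
    rng = random.Random(seed)
    thresholds = signal_thresholds(n_ops, p, alpha)
    alarms = 0
    for _ in range(n_series):
        deaths = 0
        for n in range(1, n_ops + 1):
            if rng.random() < p:  # true risk equals the acceptable rate
                deaths += 1
            k = thresholds[n]
            if k is not None and deaths >= k:
                alarms += 1
                break
    return alarms / n_series

rate = false_alarm_rate(n_ops=100, p=0.05, alpha=0.05, n_series=1000)
print(f"Series with at least one false alarm: {rate:.1%} (nominal level per test: 5.0%)")
```

Because each test reuses the cases already examined by the previous one, the chance of at least one false signal over the whole series exceeds the nominal level of any single test, which is why an uncorrected P value read off a sequential chart is not a valid hypothesis test.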
We reiterate all the caveats of Rogers and colleagues,4 in particular that VLAD was intended as a means of data display and with appropriate control limits might serve as a statistical "ready reckoner," but it should not be used as a hypothesis testing method.