J Thorac Cardiovasc Surg 2004;128:341-344
© 2004 The American Association for Thoracic Surgery
Editorial
a Department of Thoracic and Cardiovascular Surgery, The Cleveland Clinic Foundation, Cleveland, Ohio, USA
b Department of Biostatistics and Epidemiology, The Cleveland Clinic Foundation, Cleveland, Ohio, USA
c Department of Cardiovascular Medicine, The Cleveland Clinic Foundation, Cleveland, Ohio, USA
Received for publication March 22, 2004; accepted for publication March 26, 2004.
* Address for reprints: Eugene H. Blackstone, MD, The Cleveland Clinic Foundation, 9500 Euclid Ave, Desk F24, Cleveland, OH 44195, USA
blackse{at}ccf.org
See related editorial on page 396.
Nomori and colleagues1 demonstrate the relationship between contrast ratio derived from F-18 fluorodeoxyglucose positron emission tomography (FDG-PET) in cT1 N0 M0 lung adenocarcinoma and pathologic TNM classification, carcinoembryonic antigen levels, lymphatic and vascular invasion, pleural involvement, and tumor differentiation. These observations constitute the scientific merit of the study. Quite properly, the authors go on to ask what the findings mean and, in particular, what clinical inferences they suggest. Based on what appears to be 100% sensitivity of the imaging test, they conclude that if the contrast ratio is less than 0.5, "limited lung resection could be indicated, lymph node dissection or mediastinoscopy could be reduced, or both."
At the heart of these seemingly logically derived clinical inferences lies treachery. We must be quick to state that these same or similar inferences would be drawn by more than 90% of the readership, not just in this context but also in the general context of interpreting the accuracy of any diagnostic test; the authors are well within the mainstream. It is the rare reader who knows that the lid has been blown off many diagnostic tests, particularly the ones cardiologists and cardiac surgeons have come to rely on in ischemic heart disease. Heretofore, our training and backgrounds have been deficient in interpreting the accuracy of diagnostic testing. We have been misled by our ignorance. The data have not been false, but the interpretation and inferences have been.
What went wrong
Nomori and colleagues1 provide important details that give us not only insight into the value of their study but also a clue about the trap they have innocently set for the unsuspecting. The 44 patients presented are a highly selected subset of patients who had (1) major lung resection with mediastinal lymph node dissection and pathologic classification of disease (gold standard or reference standard), (2) tumors of specified size (large enough to be resolved by FDG-PET scanning) and characteristics (<3 cm, noncalcified nodule), and (3) a specific clinical diagnosis of cancer stage based at least in part on the very test they evaluated. Figure 1 shows a patient flow diagram formatted as suggested by the recently published Standards for Reporting of Diagnostic Accuracy (STARD) Initiative.2 Note the many question marks accompanying various n values. What is apparent is that the 44 cases belong to a large group of noncalcified malignant tumors less than 3 cm in diameter on computed tomography, and that these were themselves a subset of 223 patients, probably most of whom did not have a gold standard (reference) diagnosis. A diagram like this shows the many ways bias can be introduced and lead to unjustified inferences.
The particular problem here, and the only one we dwell on in this editorial, is known as work-up bias.
Work-up bias
Ransohoff and Feinstein4 coined the term work-up bias for their 1978 New England Journal of Medicine exposé of bias in diagnostic testing. Work-up bias occurs whenever a test is performed and a gold standard (reference) validation is not performed for each patient, and accuracy of the test is reported for only patients with reference validation. This is particularly apt to occur when the gold standard involves an invasive procedure, such as obtaining pathologic tissue in lung cancer. It also occurs when patients with a positive result go on to further testing (sequential-ordering bias2). Work-up bias, or slight variants of it, has been called verification bias,5,6 validation bias,7 referral bias,8,9 sampling bias,10 and selection bias.10-13
The effect of work-up bias on the purported accuracy of a diagnostic test is illustrated in Figure 2 (reference 14). Patients with a positive test result are likely to undergo a procedure for tissue pathologic verification, so a disproportionately large share of the patients undergoing verification have a positive test. Sensitivity (positive test when disease is present) therefore appears to be high. As a corollary, because few patients undergoing an invasive procedure will have had a negative test result, few of the patients found not to have pathologic disease will have had a negative test. Thus, specificity (negative test when disease is absent) will appear poor.
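The arithmetic behind this distortion can be sketched with a hypothetical cohort. The numbers below (1000 patients, 20% prevalence, true sensitivity 0.60, true specificity 0.90, verification of 90% of test-positive and 10% of test-negative patients) are assumptions chosen for illustration, not data from any study cited here, and the function name is likewise hypothetical. The sketch assumes that the decision to verify depends only on the test result.

```python
def apparent_accuracy(n, prevalence, sensitivity, specificity,
                      p_verify_pos, p_verify_neg):
    """Expected sensitivity and specificity among verified patients only.

    Assumes the chance of gold standard verification depends solely on
    whether the index test was positive or negative.
    """
    diseased = n * prevalence
    healthy = n - diseased

    # Expected test results in the full cohort (mostly unobserved in practice).
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos
    true_neg = healthy * specificity
    false_pos = healthy - true_neg

    # Only a fraction of each test-result group reaches the gold standard.
    v_true_pos = true_pos * p_verify_pos
    v_false_pos = false_pos * p_verify_pos
    v_false_neg = false_neg * p_verify_neg
    v_true_neg = true_neg * p_verify_neg

    apparent_sens = v_true_pos / (v_true_pos + v_false_neg)
    apparent_spec = v_true_neg / (v_true_neg + v_false_pos)
    return apparent_sens, apparent_spec


# Hypothetical cohort: illustrative values only.
sens, spec = apparent_accuracy(
    n=1000, prevalence=0.2, sensitivity=0.6, specificity=0.9,
    p_verify_pos=0.9, p_verify_neg=0.1,
)
print(f"apparent sensitivity: {sens:.3f}")
print(f"apparent specificity: {spec:.3f}")
```

Under these assumed values, a test whose true sensitivity is 0.60 appears to have a sensitivity of about 0.93 among verified patients, and its true specificity of 0.90 appears to be 0.50. The counts themselves are not false; only the subset in which they are tallied is.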
Why are we misled?
Of all diagnostic testing biases, work-up bias is the most counterintuitive.19 Logically, a test's reference values, such as sensitivity and specificity, should be computed by using the subgroup of patients for whom a gold standard test has been made. However, we fail to appreciate that the results of the diagnostic test have themselves determined which patients will receive a gold standard test and which will not. Thus, we have observed lack of work-up bias only in settings in which a surgeon does not believe in the test or ignores it for purposes of decision making, always gets the test results "after the fact," or follows a protocol that requires gold standard testing no matter what is found in diagnostic testing. Otherwise, there is a strong correlation between the test results and performance of gold standard testing,20-24 hence bias.
What to do
Faith in diagnostic tests is being shattered just as "shopping mall diagnostics" are taking off! Although shoppers who submit to such testing are probably a somewhat biased group, they are more likely than known ill patients to represent the general population. Therefore, without work-up bias, one will find these tests rather insensitive in picking up existing disease, but considerably more specific (fewer false-positive results) than we are accustomed to thinking.
So alarming is the present state of diagnostic testing reporting that journals are adopting the STARD checklist.2,25 The STARD Initiative was an international effort stimulated by growing recognition of biases that have fooled us all. Group members developed a 25-item checklist with cryptic explanation. Work-up bias is included in item 16: "The number of participants satisfying the criteria for inclusion that did or did not undergo the index test and/or the reference standard; describe why participants failed to receive either test (a flow diagram is strongly recommended)." This deceptively simple statement hardly seems to address biases, but it is absolutely fundamental because it is the nature of the patients tested and the influence of the test on whether the diagnosis is verified that introduce bias.
With respect to the article by Nomori and colleagues,1 the STARD statement seems not to preclude publishing such articles.26 Rather, it encourages authors to state carefully all subsets of their population and to consider the many sources of bias. It is presumed that authors (and readers) will use that information in interpreting their data, being particularly careful not to extrapolate conclusions to patients with yet unknown extent of disease.
Is warning, awareness, or even a 25-point checklist sufficient? We would suggest that as a minimum, such articles acknowledge that accuracy of testing has not been corrected for bias. Perhaps in the face of the rampant misinterpretation of test accuracy, whenever it is possible to estimate magnitude of the bias, correction of referent values for bias should be required.12,13
All is not lost
If the reader's appropriate profound disillusion with diagnostic testing has now reached the level of despair, we suggest that just because a test performs poorly diagnostically (once work-up bias is accounted for) does not necessarily mean it is useless clinically. It may be that the test still has substantial prognostic value. This has been found to be the case, for example, with stress testing.27 Schröder and Kranse28 suggest that new recommendations for prostate cancer screening should arise from the European Screening for Prostate Cancer trial and the Prostate, Lung, Colorectal and Ovary trial, which focus on whether screening reduces mortality. That is, they seem to be suggesting that screening tests should focus on long-term results rather than accuracy of diagnosis. Screening tests may also be of value for identifying patients most likely to respond to therapy, particularly those therapies that carry important morbidity, such as chemoradiotherapy. Of course, study of prognostic importance requires long-term clinical studies and well-designed clinical trials, which are clearly more difficult and expensive to perform than studies of diagnostic accuracy.
Further reading
For cardiothoracic surgeons, we highly recommend the article by Kelly and associates,8 who review a large number of sources of bias in diagnostic imaging for esophageal cancer. The Mayo Clinic group provides an appendix that illustrates Begg and Greenes' method for correcting work-up bias.9
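For readers who want to see the shape of the correction, the following is a minimal sketch of Begg and Greenes' adjustment as it is commonly stated. It is not a reproduction of the Mayo Clinic appendix. It assumes that verification depends only on the test result, so that the proportion of disease found among verified patients in each test-result group can be applied to everyone in that group. The function name and all counts are hypothetical.

```python
def begg_greenes(n_test_pos, n_test_neg,
                 verified_pos_diseased, verified_pos_healthy,
                 verified_neg_diseased, verified_neg_healthy):
    """Work-up-bias-corrected sensitivity and specificity.

    n_test_pos, n_test_neg: all tested patients by test result,
        whether or not they were verified.
    verified_*: gold standard findings among verified patients only.
    Assumes verification depends only on the test result.
    """
    # Disease rates within each test-result group, estimated from the
    # verified patients; unbiased under the stated assumption.
    p_dis_given_pos = verified_pos_diseased / (
        verified_pos_diseased + verified_pos_healthy)
    p_dis_given_neg = verified_neg_diseased / (
        verified_neg_diseased + verified_neg_healthy)

    # Project those rates onto every tested patient.
    est_true_pos = n_test_pos * p_dis_given_pos
    est_false_pos = n_test_pos - est_true_pos
    est_false_neg = n_test_neg * p_dis_given_neg
    est_true_neg = n_test_neg - est_false_neg

    sensitivity = est_true_pos / (est_true_pos + est_false_neg)
    specificity = est_true_neg / (est_true_neg + est_false_pos)
    return sensitivity, specificity


# Hypothetical counts: 200 test-positive and 800 test-negative patients,
# of whom 180 and 80, respectively, reached the gold standard.
sens, spec = begg_greenes(
    n_test_pos=200, n_test_neg=800,
    verified_pos_diseased=108, verified_pos_healthy=72,
    verified_neg_diseased=8, verified_neg_healthy=72,
)
print(f"corrected sensitivity: {sens:.2f}")
print(f"corrected specificity: {spec:.2f}")
```

With these hypothetical counts, the uncorrected figures from verified patients alone would be 108/116 for sensitivity and 72/144 for specificity. The corrected estimates are 0.60 and 0.90. The correction is only as good as the assumption that nothing other than the test result influenced who was verified.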
References