J Thorac Cardiovasc Surg 2006;132:621-627
© 2006 The American Association for Thoracic Surgery
Evolving Technology
a Thoracic Surgery Oncology Laboratory and Division of Thoracic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass
b Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Mass
Received for publication December 15, 2005; revisions received March 16, 2006; accepted for publication March 23, 2006. * Address for reprints: Gavin J. Gordon, PhD, or Raphael Bueno, MD, Brigham and Women's Hospital, Division of Thoracic Surgery, 75 Francis St, Boston, MA 02115 (Email: ggordon{at}partners.org; rbueno{at}partners.org).
Abstract
METHODS: We used gene expression profiling data to train a ratio-based predictor model to discriminate among a set of samples (n = 145 total) composed of normal lung, small cell lung cancer, adenocarcinoma, squamous cell carcinoma, and pulmonary carcinoid (the training set). We then examined the optimal test in an independent set of samples (the test set, n = 122). Finally, we used one aspect of the test to determine whether the gene ratio technique was capable of detecting cancer in specimens from fine-needle aspirations performed ex vivo with normal lung (n = 14) and suspected tumor nodules (n = 15) acquired at our institution.
RESULTS: We found that a ratio-based test with 23 genes could classify training set samples with 90% accuracy. The same test was similarly accurate (88%) when applied to the test set. We also found that this test was 87% accurate in classifying normal lung fine-needle aspiration specimens and 100% accurate in detecting cancer in tumor fine-needle aspiration specimens.
CONCLUSION: The gene expression ratio diagnostic technique is likely to aid in the differential diagnosis of solitary lung nodules in patients with suspected cancer and may also prove useful in developing lung cancer screening strategies that incorporate analysis of fine-needle aspiration specimens.
Introduction
Most patients with lung cancer are seen with advanced disease not amenable to surgical therapy. However, spiral computed tomography (CT) screening for lung cancer is rapidly gaining popularity in the United States, with the goal of identifying lung cancer at early stages, when it is far more likely to be curable with surgery.3
Initial studies of this new screening technology have demonstrated a high incidence of nonmalignant nodules in the lungs of former smokers. The preliminary recommendations are to measure radiographic volume change of all subcentimeter nodules at 3-month intervals and to obtain biopsy samples of any growing nodule. Biopsy to obtain definitive diagnosis of any noncalcified nodule greater than 1 cm is also advised.3-6
Biopsies can be accomplished surgically with video-assisted thoracoscopic surgery or by transthoracic fine-needle aspiration (FNA).
Percutaneous CT-guided transthoracic FNA of lung nodules is a safe and well-accepted cytopathologic diagnostic technique that has been applied to lesions as small as 5 mm. There are very few false-positive cytologic diagnoses, but the false-negative rate has been reported to approach 30%.7
The ability of a cytologist to make a correct diagnosis depends on the quality of cells obtained and the preservation of tissue architecture. Cytologic diagnosis by FNA is also hindered by the frequent inability of the cytologist to determine the type of cancer found in the pulmonary nodule and to differentiate metastatic cancer to the lung from primary lung cancer.8
As a consequence, the clinical diagnostic strategy in the management of many newly discovered pulmonary nodules is to surgically remove those nodules for which a definitive benign histologic typing has not been obtained or to monitor all subcentimeter nodules with interval CT scans and remove them surgically if they grow.9
Gene expression profiling with microarrays and complex bioinformatics tools has been used successfully to diagnose cancer and predict disease-related outcome for multiple neoplasms, including lung cancer.10-12
Unfortunately, these models are difficult to apply clinically because they rely on measuring the expression levels of relatively large numbers of genes with costly data-acquisition platforms and sophisticated algorithms and software. We recently described a method for translating gene expression profiling data into clinically relevant tests, based on ratios of gene expression, for multiple cancers.13-17
Here we report the discovery of genes differentially expressed among normal lung and multiple types of lung cancer. We then used these genes to develop a gene ratio method for the differential diagnosis of lung cancer and pulmonary nodules. Finally, we provide evidence suggesting that this technique may complement ongoing lung cancer screening strategies through the analysis of FNA samples.
Methods
Gene Expression Profiling Data
Microarray data for normal and tumorous tissues were obtained from two sources. Gene expression data for the training set (n = 145 total), consisting of normal lung (n = 13) and the following primary tumors: small cell lung cancer (SCLC; n = 7), lung adenocarcinoma (n = 89), lung squamous cell carcinoma (SCC; n = 24), and pulmonary carcinoid (n = 12), were obtained with Affymetrix high-density oligonucleotide microarrays (U95A chip; Affymetrix, Santa Clara, Calif) containing probe sets that represent approximately 12,000 genes.18
Gene expression data for all additional primary and metastatic tumor samples (the test set) were acquired from a single source with the same Affymetrix U95A microarray.19
Primary tumors of the test set consisted of lung SCC (n = 13) and the following adenocarcinomas: prostate (n = 24), colon (n = 20), breast (n = 25), gastroesophageal (n = 12), pancreatic (n = 6), and lung (n = 13). Metastatic tumors in the test set (n = 9) included those arising from breast, colon, prostate, lung, and kidney tumors.
Data and Statistical Analysis
To train an expression ratio-based predictor model, we used an approach similar to that of previously published studies.13-17
We performed five separate analyses to determine differences in gene expression patterns between two groups composed of multiple combinations of tissues chosen from the 145 training set samples. In each of the five training subsets, one group was composed of all available samples of a single tissue type, whereas the other group consisted of a random sampling of all remaining tissue types, with equal representation according to the remaining tissue type with the smallest number of samples. For example, the lung adenocarcinoma training subset (n = 117 total) examined differences in gene expression between two groups, lung adenocarcinoma (n = 89) and not lung adenocarcinoma (n = 28, consisting of 7 samples each of the other four tissue types according to the total number of SCLC tissues). This process was repeated sequentially for the remaining training subsets: SCLC (n = 55 total), normal lung (n = 41 total), lung SCC (n = 72 total), and pulmonary carcinoid (n = 64 total). This experimental design resulted in five training sets with unique sample numbers (and membership) and was used to discover optimal discriminating genes in an unbiased fashion while ensuring equal representation among multiple tissue types.
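The balanced one-vs-rest sampling rule can be sketched as follows. The tissue counts come from the Methods; the sample identifiers and the fixed random seed are hypothetical, and the per-type draw size follows the rule as worded (each remaining tissue type contributes as many samples as the smallest remaining type), which reproduces the lung adenocarcinoma example (89 + 28 = 117).

```python
import random

# Training-set composition as reported in the Methods (n = 145 total).
TRAINING_COUNTS = {
    "normal lung": 13,
    "SCLC": 7,
    "lung adenocarcinoma": 89,
    "lung SCC": 24,
    "pulmonary carcinoid": 12,
}


def build_training_subset(samples, target, rng):
    """Return (target_group, other_group) for one one-vs-rest training subset.

    The target group keeps every sample of `target`. The other group draws the
    same number of samples at random from each remaining tissue type, that
    number being the size of the smallest remaining tissue type.
    """
    target_group = list(samples[target])
    others = [tissue for tissue in samples if tissue != target]
    per_type = min(len(samples[tissue]) for tissue in others)
    other_group = []
    for tissue in others:
        other_group.extend(rng.sample(samples[tissue], per_type))
    return target_group, other_group


# Hypothetical sample identifiers standing in for the real arrays.
samples = {
    tissue: [f"{tissue}-{i}" for i in range(n)]
    for tissue, n in TRAINING_COUNTS.items()
}
rng = random.Random(0)
adeno, not_adeno = build_training_subset(samples, "lung adenocarcinoma", rng)
print(len(adeno), len(not_adeno), len(adeno) + len(not_adeno))  # 89 28 117
```

Drawing a fixed number per tissue type keeps any single abundant type (here, adenocarcinoma) from dominating the comparison group.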
The selection of predictor genes for use in expression ratio-based diagnosis was performed essentially as described,15,16 with minor modifications. With a 2-sided Student (parametric) t test, we identified genes with statistically significant (see Table 1 for exact P values), inversely correlated average expression levels between the two groups in each of the five training subsets. We then filtered the resulting gene lists to retain genes with at least a 2-fold difference in average expression levels between groups. To minimize the effects of background noise, the list of distinguishing genes was further refined by requiring that the mean expression level (Affymetrix average difference) be greater than 500 in at least one of the two groups, similar to previous studies.15,16
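The three filtering criteria above can be sketched with only the Python standard library. The Student t test P value is obtained from the regularized incomplete beta function (a standard identity, evaluated here with a Numerical Recipes-style continued fraction). The `p_max = 0.05` default is a hypothetical placeholder, because this section defers exact P values to Table 1, and the fold-change check assumes positive average-difference values.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def student_t_test(x, y):
    """Two-sided Student t test with pooled variance; returns (t, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, betainc(df / 2.0, 0.5, df / (df + t * t))


def passes_filters(group_a, group_b, p_max=0.05, min_fold=2.0, min_mean=500.0):
    """Significance, >= 2-fold difference, and mean > 500 in at least one group."""
    _, p = student_t_test(group_a, group_b)
    hi = max(mean(group_a), mean(group_b))
    lo = min(mean(group_a), mean(group_b))
    # Zero or negative average-difference values make the fold change
    # undefined; this sketch simply rejects such probe sets.
    fold_ok = lo > 0 and hi / lo >= min_fold
    return p < p_max and fold_ok and hi > min_mean


# Made-up expression values for one probe set in two groups.
tumor = [1200.0, 1350.0, 1100.0, 1280.0, 1190.0]
normal = [310.0, 280.0, 350.0, 295.0, 330.0]
print(passes_filters(tumor, normal))  # True
```

Applying the cheap fold-change and expression checks after the t test, as the text describes, mirrors the order of the filters; the order does not change which probe sets survive.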
A large number of genes were found to fit the filtering criteria in each of the training subsets. To further reduce the number of genes, we randomly chose for additional study a total of 8 genes from among the most statistically significant differentially expressed genes in each training subset. Four of these genes were expressed at relatively higher levels in a single tissue type, and 4 were expressed at relatively higher levels in the remaining tissue types combined. There was a single exception: in the normal lung training subset, only 3 genes were expressed at relatively higher levels in all abnormal tissues. In one training subset (lung SCC), the same gene was chosen twice for further analysis. This single duplication was possible because (1) we chose additional genes for study at random, (2) we initially identified genes strictly on the basis of their unique Affymetrix probe set identifiers (and not gene name), and (3) the same gene can be represented by multiple Affymetrix probe sets.
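The genes chosen for each subset are the raw material for ratio tests. This section does not spell out the ratio rule, so the sketch below assumes the general form used in the group's earlier gene-ratio studies (refs 13-17): each ratio divides the expression of a gene higher in the target tissue by that of a gene higher in the remaining tissues, a 3-ratio test takes the geometric mean of its ratios, and a value above 1 calls the target tissue. It enumerates the 4 × 4 candidate ratios and ranks every 3-ratio combination by training accuracy. Gene labels and expression values are hypothetical.

```python
from itertools import combinations, product
from math import prod


def ratio_score(sample, ratio_set):
    """Geometric mean of expression ratios (target-high gene / rest-high gene).

    Assumes strictly positive expression values.
    """
    ratios = [sample[num] / sample[den] for num, den in ratio_set]
    return prod(ratios) ** (1.0 / len(ratios))


def accuracy(samples, labels, ratio_set):
    """Fraction of samples called correctly; a score above 1 calls 'target'."""
    calls = [ratio_score(s, ratio_set) > 1.0 for s in samples]
    return sum(call == label for call, label in zip(calls, labels)) / len(labels)


def rank_ratio_combinations(samples, labels, up_in_target, up_in_rest, k=3):
    """Score every k-ratio combination built from the candidate gene pairs."""
    pairs = list(product(up_in_target, up_in_rest))  # 4 x 4 = 16 candidate ratios
    scored = [(accuracy(samples, labels, combo), combo)
              for combo in combinations(pairs, k)]    # C(16, 3) = 560 combinations
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical genes: "A*" higher in the target tissue, "B*" higher elsewhere.
target_sample = {"A1": 900, "A2": 800, "A3": 700, "A4": 650,
                 "B1": 200, "B2": 150, "B3": 300, "B4": 250}
rest_sample = {"A1": 150, "A2": 120, "A3": 200, "A4": 180,
               "B1": 800, "B2": 700, "B3": 900, "B4": 850}
ranked = rank_ratio_combinations([target_sample, rest_sample], [True, False],
                                 ["A1", "A2", "A3", "A4"],
                                 ["B1", "B2", "B3", "B4"])
print(ranked[0][0])  # 1.0
```

For the normal lung subset, which had only 3 genes higher in abnormal tissues, the same code would enumerate 4 × 3 = 12 candidate ratios. Because each ratio compares two genes measured in the same sample, the score does not depend on the overall signal scale of the array.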
Real-time Quantitative Reverse Transcriptase-Polymerase Chain Reaction
Real-time quantitative reverse transcriptase-polymerase chain reaction (RT-PCR) was performed as described with 2 µg of total RNA.16
Primer sequences (synthesized by Invitrogen) used for RT-PCR were as follows (forward and reverse, respectively): MFAP4 (5'-ACTTCTCCATCTCCCCGAAC-3' and 5'-TGGTAGGACAGGGAGTCACC-3'), PRDX2 (5'-AGACAATGGAATGGCAGCTT-3' and 5'-TGCCCAGAAGTGGCATTAGT-3'), AGER (5'-TCCACTGGATGAAGGATGGT-3' and 5'-CAGCTGTAGGTTCCCTGGTC-3') and SSR4 (5'-GGAGCAGGATGCGTATAGGA-3' and 5'-TCTGACTGCACAGATTCTTGG-3').
Results
Verification of Expression Level Ratios As a Diagnostic Tool
Next we tested the ability of these five highly accurate expression ratio combinations to diagnose cancer in a separate cohort of 113 primary tumors and 9 metastatic tumors (the test set) for which expression profiling data were available.19
A total of 26 samples (n = 13 each of primary lung adenocarcinoma and lung SCC) were directly relevant to the validation of the model developed here because they were obtained from primary lung lesions. The remaining tumors were adenocarcinomas originating from tissues other than lung or represented metastatic disease, and we used these samples to test two hypotheses. First, we hypothesized that, with respect to global gene expression patterns, and specifically the 23 genes used in the expression ratio diagnostic model, adenocarcinomas of diverse origin are more similar to one another than to any of the other four tissue types examined in this study. Second, we hypothesized that the diagnostic model developed here would be equally applicable to metastatic tumors. To perform this analysis, we used the expression values for all 23 diagnostic genes to calculate the five most accurate 3-ratio combinations and predicted the identity of all 122 samples with exactly the same criteria as before. In this analysis, the classification accuracy for all adenocarcinomas was evaluated without regard to tissue of origin. The results for the classification of primary tumors (n = 113) are presented in Table 3. Overall, our model was 88% accurate (107/122; 95% CI 81%-93%; P < 10⁻⁶) in identifying the tumor type of test set samples, and it was 88% (100/113; 95% CI 81%-94%; P < 10⁻⁶) and 78% (7/9; 95% CI 40%-97%; P = .090) accurate within the subsets of primary and metastatic tumors, respectively. Specifically, we accurately (26/28, or 93%; 95% CI 76%-99%) and significantly (P = 2 × 10⁻⁶) predicted the identity of the 28 lung tumors (26 primary and 2 metastatic), correctly diagnosing both metastatic lung tumors.
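This section does not name the statistical methods behind these figures. The reported values appear consistent with exact (Clopper-Pearson) binomial confidence intervals and a one-sided exact binomial test against a 50% chance rate: for example, 7/9 correct gives P = 46/512 ≈ .090 and an interval of about 40%-97%, and 26/28 gives P ≈ 1.5 × 10⁻⁶. The sketch below reproduces that reading with the standard library; treat it as an inference from the numbers, not as the authors' stated procedure.

```python
from math import comb


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def _solve(f, target, increasing, iters=100):
    """Bisection on [0, 1] for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion k/n."""
    lower = 0.0 if k == 0 else _solve(lambda p: tail_ge(k, n, p), alpha / 2, True)
    upper = 1.0 if k == n else _solve(lambda p: tail_le(k, n, p), alpha / 2, False)
    return lower, upper


def summarize(correct, total, chance=0.5):
    """Accuracy, exact 95% CI, and one-sided binomial P value versus chance."""
    lower, upper = clopper_pearson(correct, total)
    return correct / total, lower, upper, tail_ge(correct, total, chance)


for label, k, n in [("test set", 107, 122), ("primary", 100, 113),
                    ("metastatic", 7, 9), ("lung tumors", 26, 28)]:
    acc, lo, hi, p = summarize(k, n)
    print(f"{label}: {acc:.0%} (95% CI {lo:.0%}-{hi:.0%}), P = {p:.2g}")
```

Exact intervals are a natural choice here because normal approximations behave poorly for small subsets such as the 9 metastatic tumors.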
Discussion
Strategies to reduce mortality from lung cancer include the development and implementation of an effective screening system such as spiral CT for at-risk populations. Even as this technique is being studied, it is also being rapidly implemented by physicians, in many cases at the request of patients willing to bear the cost. Spiral CT of the chest can be excessively sensitive, and it is generally estimated that only 10% of nodules detected in the lungs of smokers are actually cancerous.3,4,25
FNA of newly discovered pulmonary nodules is an attractive technique, but unfortunately is currently limited by the size of the nodule and the accuracy of cytopathologic examination, specifically in distinguishing between false- and true-negative results, which may account for up to a third of all biopsy specimens.7
The major problem is that a negative cytologic result is simply a negative result in the majority of cases. This is often due to inadequate sampling or lack of sufficient cytologic features to call the sample a tumor.8
The gene ratio method can potentially address several of these clinical insufficiencies. For example, it could add a genomic component to the diagnosis that requires only the extraction of very small quantities of tumor RNA (tissue), thereby allowing the use of samples that would otherwise not demonstrate cytologically diagnostic tumor cells. Also, in gene ratio-based analysis, a diagnosis of nonmalignant is actually a positive diagnosis of benign tissue and not merely a negative result. In our studies, both of the misclassified FNA samples proved to be normal lung, a disappointing finding considering that the virtue of FNA cytopathologic examination is its low false-positive rate. Because samples of normal lung tissue were harvested from the same patient in an area proximal to the suspected tumor, the margin could have been contaminated with histologically undetectable tumor cells or with transformed epithelial cells that had not yet formed a tumor. Alternatively, the misclassification could have resulted from inherent biologic variability reflected in gene expression. Unfortunately, as in similar pilot studies,12 sufficient material was not available to conduct cytologic analyses, which might have addressed some of these possibilities. To examine all these hypotheses systematically, we are conducting additional studies to refine the list of discriminating genes, and we are prospectively obtaining consent and collecting FNA material linked to cytologic findings for use in follow-up studies.
Our experiments used an ideal scenario (an ex vivo FNA) to test the ability of multiple distinguishing genes to classify normal and malignant tissues accurately in the context of a gene expression ratio-based model. Even though the syringe, needle gauge, and biopsy technique were all similar to those typically used by cytopathologists at our institution, before implementation this technique will require rigorous testing to take into account additional clinical parameters, such as patient movement. Considering that the ex vivo FNA specimen was acquired through the surrounding pulmonary parenchyma and was still accurate at detecting tumor, we believe that the genes as reported will be suitable for use in actual FNA specimens. Recent work by other investigators has demonstrated the general feasibility of using transthoracic CT-guided FNA biopsy to obtain material with RNA suitable for even stringent applications, such as gene expression profiling with microarrays.12
The genes used in this study have also been partially validated by another group of investigators who used a single gene pair ratio (RAGE/cyclin-B2) to detect lung cancer.26
We independently found that RAGE (also known as AGER) is overexpressed in normal lung relative to tumor, and it is therefore part of our normal lung test (Table 2). We also found that cyclin-B2 was statistically significantly (P = .013) downregulated in normal lung relative to tumors. However, cyclin-B2 was not among our final list of discriminating genes, probably because of fundamental differences in experimental design: we examined a broader range of tumor types and used multiple genes and ratios.
In conclusion, we have produced evidence strongly suggesting that FNA specimens are suitable for gene ratio-based detection and diagnosis of lung cancer, and we are now conducting prospective studies to validate these initial proof-of-principle experiments. We ultimately view this technique as an adjunct to and extension of current cytopathologic techniques in the evaluation of suspect lung nodules. Whereas cytopathologists require the preservation of tissue architecture and intact cells for definitive diagnosis, our proposed analysis requires only intact tumor RNA. Furthermore, other gene ratio-based tests, such as those for the prognosis of lung cancer,14 may also be applicable to the analysis of FNA specimens, which may ultimately allow clinicians to tailor therapy to the individual patient in whom cancer is detected and diagnosed.20