J Thorac Cardiovasc Surg 2008;135:627-634
© 2008 The American Association for Thoracic Surgery
General Thoracic Surgery
a Department of Surgery, Medical University of South Carolina, Charleston, SC
b Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC
c Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, Charleston, SC
d Department of Laboratory Medicine & Pathology, Mayo Clinic, Jacksonville, Fla
e Division of Gastroenterology & Hepatology, Mayo Clinic, Jacksonville, Fla
Received for publication April 23, 2007; revisions received September 3, 2007; accepted for publication October 26, 2007. * Address for reprints: Carolyn E. Reed, MD, Medical University of South Carolina, 96 Jonathan Lucas St, 418 CSB, Charleston, SC 29425. (Email: reedce{at}musc.edu).
| Abstract |
|---|
Methods: Twenty-two prognostic genes for the metastatic phenotype were identified through complementary DNA microarray analysis of 4 cancer cell lines and bioinformatics analysis. Expression levels of a subset of these genes (n = 13) were measured by real-time reverse-transcriptase polymerase chain reaction in formalin-fixed paraffin-embedded primary adenocarcinoma from patients whose disease recurred within 2 years (n = 9) and from patients who did not have a recurrence (n = 11). Receiver operating characteristic curves were analyzed to establish prognostic values of single genes. The most informative gene was combined with the remaining genes to determine whether there was a particular pair(s) that yielded high diagnostic accuracy. A small validation study was performed.
Results: Receiver operating characteristic curve analysis of the single genes revealed that high expression of CK19 was associated with nonrecurrence (area under the curve = 0.859, confidence interval = 0.651–0.970). The CK19/EpCAM2 gene ratio had the most reproducible prognostic accuracy, followed by the CK19/P-cadherin ratio. A Kaplan–Meier survival analysis generated from the CK19/EpCAM2 ratio resulted in highly significant curves as a function of marker positivity (P = .0007; hazard ratio = 10.7). Significance declined but was maintained in the validation study.
Conclusions: This preliminary study provides evidence that the CK19/EpCAM2 and/or CK19/P-cadherin ratio(s) may be a simple and accurate prognostic indicator of clinical outcome in early-stage adenocarcinoma of the lung. If further validation studies from large patient cohorts confirm the results, adjuvant therapy could be targeted to this high-risk group.
| Introduction |
|---|
The development of metastatic disease is the most common cause of death among patients with NSCLC and results from dissemination of malignant cells. It is now recognized that the ability of cells to gain metastatic potential is an intrinsic property of the primary tumor, which is substantiated by the high correlations between clinical outcome and gene expression profiles of a variety of primary tumors.6,7
The ability to predict clinical outcome on the basis of analysis of primary tumors would allow patients with cancer to be treated more effectively. However, the problem with many of these expression studies is that they require measurements of large sets of predictive genes using a platform (complementary DNA [cDNA] microarray analysis) that is not well suited to clinical application.
In this pilot study, we hypothesized that clinical outcome of patients with resected early-stage adenocarcinoma of the lung could be predicted by the expression of relatively few, but critically important, genes measured by quantitative real-time reverse-transcription polymerase chain reaction (RT-PCR) in formalin-fixed paraffin-embedded primary tumors. Specifically, we hypothesized that there exists a "good gene" and a "bad gene" such that the ratio of the two is a strong prognostic indicator of clinical outcome.
| Materials and Methods |
|---|
The correlation map obtained by this bioinformatics data mining approach contained a total of 22 genes (Figure 1). Seven of the 22 genes (AGR2, Map7, S100P, CK19, EpCAM1, EpCAM2, and P-cadherin) were derived from the list of 15 most highly expressed genes and are referred to as the primary prognostic genes (underlined in Figure 1). The remaining 15 genes identified from this bioinformatics approach are referred to as the secondary prognostic genes (italicized in Figure 1).
A small validation study was performed using paraffin sections from patients with early-stage adenocarcinoma who underwent resection at the Mayo Clinic, Jacksonville, Florida, and who either had an early recurrence (n = 10) or survived longer than 2 years (n = 12).
Real-time RT-PCR of formalin-fixed paraffin-embedded samples was performed according to the method of Sprecht and associates.9
A 50-µm section was cut from tissue blocks of primary tumor for messenger RNA extraction. For isolation of RNA, paraffin-embedded tissue sections were deparaffinized twice with 1 mL of xylene at 37°C or room temperature for 10 minutes. The pellet was subsequently washed with 1 mL of 100%, 90%, and 70% ethanol and air-dried at room temperature for 2 hours. The pellet was resuspended in 200 µL of RNA lysis buffer (2% lauryl sulfate, 10 mmol/L Tris-HCl [pH 8.0], and 0.1 mmol/L ethylenediaminetetraacetic acid) and 100 µg of proteinase K and incubated at 60°C for 16 hours. RNA was extracted by 1 mL of phenol/chloroform (5:1) solution (Sigma Chemical Company, St Louis, Mo). The aqueous layer containing RNA was transferred to a new 1.5-mL tube. Phenol/chloroform extraction was done a total of 3 times. RNA was precipitated with an equal volume of isopropanol, 0.1 volume of 3 mol/L sodium acetate, and 100 µg of glycogen at –20°C for 16 hours. After centrifugation at 12,000 rpm for 15 minutes (4°C), the RNA pellet was washed with 70% ethanol and air-dried at room temperature for 2 hours. Finally, the pellet was dissolved in 12 µL of diethyl pyrocarbonate water. cDNA synthesis was performed with a panel of truncated gene-specific primers. Real-time RT-PCR was performed on a PE Biosystems Gene Amp 7300 or 7500 Sequence Detection System (PE Biosystems, Foster City, Calif). With the exception of the SYBR Green I master mix (purchased from Qiagen, Valencia, Calif), all reaction components were purchased from PE Biosystems. Standard reaction volume was 10 µL and contained 1X SYBR RT-PCR buffer, 3 mmol/L MgCl2, 0.2 mmol/L each of deoxyadenosine triphosphate, deoxycytosine triphosphate, deoxyguanosine triphosphate, 0.4 mmol/L deoxyuridine triphosphate, 0.1 U UngErase enzyme, 0.25 U AmpliTaq Gold, 0.35 µL cDNA template, and 50 nmol/L of oligonucleotide primer. Initial steps of RT-PCR were 2 minutes at 50°C for UngErase activation, followed by a 10-minute hold at 95°C. 
Cycles (n = 40) consisted of a 15-second melt at 95°C followed by a 1-minute annealing/extension at 60°C. The final step was a 60°C incubation for 1 minute. All reactions were performed in triplicate. The threshold for cycle threshold (Ct) analysis of all samples was set at 0.5 relative fluorescence units.
Gene expression values were quantified as ΔCt values, which were obtained by subtracting the Ct value of an internal reference control gene (β2-microglobulin, B2M) from the Ct value of the gene of interest. ΔCt values are inversely proportional to gene expression levels and are on a log2 scale.
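The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the Ct readings are invented rather than taken from the study, and averaging the triplicate reactions before subtraction is an assumption, since the text does not say how triplicates were combined.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Return dCt = mean Ct(gene of interest) - mean Ct(reference gene).

    Averaging replicate reactions is an assumption made for this sketch.
    """
    return mean(target_cts) - mean(reference_cts)


def relative_expression(dct):
    """Expression relative to the reference gene, 2 ** (-dCt).

    A larger dCt therefore means lower expression (log2 scale).
    """
    return 2.0 ** (-dct)


# Hypothetical triplicate Ct readings for one tumor sample (not study data).
ck19_cts = [30.0, 30.5, 29.5]
b2m_cts = [20.0, 20.0, 20.0]

dct_ck19 = delta_ct(ck19_cts, b2m_cts)
print(dct_ck19, relative_expression(dct_ck19))
```

Because the scale is log2, a difference of 1 in ΔCt corresponds to a 2-fold difference in expression relative to B2M.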
The results were internally validated by repeating the real-time RT-PCR process using a new section cut from tissue blocks of the primary tumor. Variability of tumor quantity on the sections was minimized by hematoxylin and eosin comparison performed by a pathologist. A cross-validation procedure was used to determine whether the results were sensitive to the samples included. A leave-one-out procedure was used whereby each sample was systematically removed and the data reanalyzed.
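A leave-one-out check of this kind can be sketched as follows. The helper names are hypothetical, and the statistic being recomputed (a simple mean here) is a stand-in chosen for illustration; the text does not specify which quantity was reanalyzed on each reduced data set.

```python
from statistics import mean


def leave_one_out(samples):
    """Yield (held_out, remaining) with each sample removed in turn."""
    for i, held_out in enumerate(samples):
        yield held_out, samples[:i] + samples[i + 1:]


def loo_range(samples, statistic):
    """Recompute `statistic` on every leave-one-out subset.

    Returns (min, max) of the recomputed values; a narrow range suggests
    the result does not hinge on any single sample.
    """
    values = [statistic(rest) for _, rest in leave_one_out(samples)]
    return min(values), max(values)


# Hypothetical values (not study data); the mean stands in for the
# statistic of interest.
values = [1.0, 2.0, 3.0, 4.0]
print(loo_range(values, mean))
```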
Statistical Analysis
To assess prognostic accuracy, we analyzed receiver operating characteristic curves on the individual genes normalized to B2M (MedCalc Software, Mariakerke, Belgium). Prognostic gene combinations were tested by subtracting ΔCt values generated by RT-PCR analysis. Subtraction of ΔCt values (ΔΔCt) is equivalent to the log of the ratio of values. In the text, the ΔCt(gene A) – ΔCt(gene B) calculation is abbreviated as a gene expression ratio. The value of the 2-gene prognostic assay was further assessed by Kaplan–Meier survival analysis.
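The link between subtracting ΔCt values and a log ratio, and the use of such a score in a receiver operating characteristic analysis, can be illustrated with a short sketch. The function names and numbers are hypothetical. The AUC below is the generic Mann-Whitney (rank-based) estimate, not the MedCalc procedure used in the study, and it produces no confidence interval.

```python
def delta_delta_ct(dct_a, dct_b):
    """Difference of two dCt values for genes A and B in the same sample."""
    return dct_a - dct_b


def expression_ratio(dct_a, dct_b):
    """Expression of gene A relative to gene B.

    2**(-dCt_A) / 2**(-dCt_B) = 2**(-(dCt_A - dCt_B)), so the dCt
    difference equals minus log2 of the A/B expression ratio.
    """
    return 2.0 ** (-delta_delta_ct(dct_a, dct_b))


def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the area under the ROC curve.

    Fraction of (positive, negative) pairs in which the positive case has
    the higher score; ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scores (not study data).
print(expression_ratio(10.0, 8.0))
print(auc([8.0, 9.0, 6.5], [6.0, 7.0, 5.0]))
```

An AUC of 0.5 corresponds to no discrimination between the two outcome groups and 1.0 to perfect separation.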
| Results |
|---|
The correlation map illustrated in Figure 1 resulted from a unique bioinformatics analysis that led to a set of genes that had specific structured connections based on a query of 15 genes overexpressed in 4 lung cancer cell lines. Of the 22 identified genes, 7 were in the original query set and were labeled primary prognostic genes. These genes combined with 6 of the most frequently expressed remaining 15 secondary genes constituted the study's test gene set in patients with adenocarcinoma of the lung. This unique approach is somewhat similar to the description of expression profiles in different tumors in terms of behavior modules, sets of genes that act in concert to carry out a specific function.10
In fact, many of the genes in this study test set were contained in one of the modules (module 180) described by Segal and colleagues.10
Area under the curve (AUC) values for the primary and secondary genes are shown in Table 2. Receiver operating characteristic curve analysis of the individual genes revealed that high expression of CK19 was associated with nonrecurrence (
4 years) (AUC = 0.859; 95% confidence interval [CI] = 0.651–0.970); whereas high expression of EpCAM2 was associated with disease recurrence within 2 years (AUC = 0.606; 95% CI = 0.366–0.813).
ΔCt values of individual genes as determined by real-time RT-PCR analysis from ΔCt(CK19). For all potential CK19/gene X combinations, the ratio of CK19/EpCAM2 yielded the highest prognostic accuracy as determined by AUC measurements (
To further assess the value of CK19 unpaired and paired with EpCAM2, we performed a Kaplan–Meier survival analysis using data generated from single marker and CK19/gene X analyses. For the single CK19 marker, a ΔCt cutoff of 11.4 was used, which separated the 20 patients into high-expressing (ΔCt < 11.4; n = 13) and low-expressing (ΔCt > 11.4; n = 7) tumors. A log-rank test indicated that the two curves generated as a function of marker positivity were different at a P value of .0021 with a hazard ratio of 6.2 (Figure 2, A). For the CK19/EpCAM2 ratio, a ΔΔCt cutoff of 7.2 was used, which separated the 20 patients into high (ΔΔCt ≤ 7.2; n = 13) and low (ΔΔCt > 7.2; n = 7) groups that correlated with survival. A log-rank test indicated that the two curves generated as a function of marker positivity were different at a P value of .0001 with an associated hazard ratio of 10.7 (Figure 2, B). Kaplan–Meier survival analysis of other CK19/gene X pairs is shown in Table 3. The gene pair that yielded the second most highly significant curves was CK19/P-cadherin, with an associated hazard ratio of 8.1.
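The two steps in this analysis, splitting patients at a cutoff and estimating a survival curve for a group, can be sketched as below. This is a minimal illustration with hypothetical function names and invented follow-up data. It assumes that values at or below the cutoff form one group and values above it form the other. The log-rank comparison and hazard ratio reported in the text are not reproduced here.

```python
def split_by_cutoff(values, cutoff):
    """Return (at_or_below, above): patient indices on each side of the cutoff.

    Treating 'equal to the cutoff' as the lower side is an assumption.
    """
    at_or_below = [i for i, v in enumerate(values) if v <= cutoff]
    above = [i for i, v in enumerate(values) if v > cutoff]
    return at_or_below, above


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate for one group.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Everyone with this follow-up time (event or censored) leaves the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical marker values and follow-up in months (not study data).
print(split_by_cutoff([6.0, 7.2, 7.5, 9.1], 7.2))
print(kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0]))
```

In practice each of the two groups returned by `split_by_cutoff` would get its own curve, and the curves would then be compared with a log-rank test.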
| Discussion |
|---|
In the present study, we measured the expression of 14 different test genes and one internal reference control gene in primary tumors resected from patients with early-stage NSCLC. Using the B2M gene as an internal reference, we observed that high expression of CK19 was correlated with good clinical outcome (no disease recurrence), whereas high expression of EpCAM2 was correlated with poor clinical outcome (disease recurrence within 2 years). Of all possible 2-gene combinations (n = 105), we further observed that the ratio of CK19/EpCAM2 had the highest accuracy for predicting disease recurrence. The concept of using a 2-gene ratio was previously applied to NSCLC by Gordon and coworkers,19,22 who identified S100P as 1 of 7 prognostic markers. It should be noted that in the Mayo data set, the marker combination of CK19/S100P yielded results similar to CK19/P-cadherin (data not shown). However, the current study is the first to analyze the expression of genes in paraffin samples. In colon cancer, high expression of EpCAM2 (also known as TROP2) has been shown to be associated with a higher frequency of liver metastasis (P = .005) and more cancer-related deaths (P = .046),23 a finding that further supports the concept that for early-stage NSCLC, EpCAM2 is a "bad gene."
The gene pair with the second highest prognostic accuracy for disease recurrence was CK19/P-cadherin. Previous studies have shown that expression levels of P-cadherin in primary tumors correlate with tumor grade in ovarian cancer24 and metastases to the lung in thyroid cancer.25 Further, overexpression of P-cadherin in vitro results in increased cell motility in pancreatic cancer,26 a necessary requirement for establishment of distant metastases. Taken together, these results provide evidence that P-cadherin may also serve as a candidate "bad gene" in NSCLC. Regarding CK19, antibodies to the protein encoded by this gene (and/or a combination of other cytokeratin genes) have been used for the detection of circulating tumor cells in breast, lung, colon, and other cancers.27,28 In the current study, we suspect that CK19 expression levels serve as a reliable indicator of the epithelial content of the primary tumor.
Although there was a recent report of the use of real-time RT-PCR for prognosis of patients with early-stage NSCLC, the current study differs significantly from the approach taken by Chen and colleagues.20 In the present report, patient prognosis was based on a simple calculation of a 2-gene ratio, an approach that contains only one "decision node." In the study of Chen and associates, a 5-gene marker panel was used that required a relatively high number of decision nodes (n = 19). An algorithm that uses such a high number of decision nodes for a low number of genes is less likely to be clinically applicable because of its cumbersome nature. In contrast, the microarray study of Potti and coworkers6 required only 5 decision nodes, even though 289 genes were involved.
There are several advantages to the technique used in this preliminary study. It is a simple 2-gene model and uses a technology that is relatively inexpensive and is quickly performed once RNA is extracted. Paraffin-embedded tumor tissue can be screened and an appropriate slide(s) could be sent to a reference laboratory. The technique is amenable to small tissue samples, which may be important if preoperative biopsy directs neoadjuvant therapy.
Several limitations of this pilot analysis need to be acknowledged. First, given the small numbers used for the preliminary study, external verification must be performed on a larger data set before definitive statements are made concerning its application as a prognostic tool. Second, given the number of putative genes that could display either a direct or inverse relationship between expression and prognosis, it is possible that another gene ratio or a combination of two ratio sets will be more informative as patients are added. Correlative experiments looking at protein levels in tumor tissues should be a future goal.
In summary, a simple 2-gene molecular model has been developed to predict recurrence in patients having resection of early-stage adenocarcinoma of the lung. The model will require further validation and refinement. It is hoped that in the future a relatively easy, cost-effective, clinically relevant molecular model will be used to individualize therapy in early-stage NSCLC.
| Footnotes |
|---|
| References |
|---|