Corresponding author. Address for correspondence: Shinji Kohsaka, MD, PhD, Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan.
Division of Cellular Signaling, National Cancer Center Research Institute, Tsukiji, Chuo-ku, Tokyo, Japan; Department of Thoracic Surgery, Graduate School of Medicine, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
Studies are yet to characterize the differences in molecular profiles of lung adenocarcinoma (LUAD) among divergent ethnic groups. Herein, we conducted comprehensive molecular profiling of LUAD in never or light smokers from Asia to discover novel targetable mutations and prognostic biomarkers of this distinct disease entity.
Methods
We analyzed 996 cases of Japanese LUAD and performed whole-exome sequencing and RNA-seq in 125 cases of Japanese LUAD negative for the driver oncogenes defined by conventional laboratory testing. We also investigated the clinical and pathologic characteristics among the 996 cases.
Results
Driver oncogenes were identified in 88 cases (70.4%) with specific hotspot mutations differing from those in The Cancer Genome Atlas study. Two actionable novel fusions of FGFR2 and NRG2α were also identified. Clustering on the basis of mRNA expression profiles, but not genetic mutational ones, could predict patient prognosis. The risk score generated by the expression of a three-gene set was a strong prognostic marker for overall survival and progression-free survival in our cohort, and was further validated using The Cancer Genome Atlas cohort. Among the 996 cases, each driver alteration is distributed across all histologic subtypes. Adenocarcinoma in situ was identified to harbor driver mutations, suggesting that these alterations are early events in the pathogenesis of LUAD. ERBB2 mutations were over-represented in young adults.
Conclusions
This study indicates the value of applying gene expression profiling for predicting the prognosis after a surgical operation, and that the identification of actionable mutations is important for optimizing targeted drugs in Japanese LUAD.
Globally, it has been reported that lung cancer is the most common cause of cancer-related mortality, being associated with over a million deaths annually; the most common histologic type of lung cancer is lung adenocarcinoma (LUAD).
have been achieved by molecularly targeted therapies directed against receptor tyrosine kinases (RTKs). Studies are also under way on other targeted therapies directed against activating alterations in KRAS, ERBB2, BRAF, MET, RET, NTRK1, and NTRK2.
In recent studies, researchers have focused on achieving comprehensive characterization of the changes found in the genome, epigenome, transcriptome, and proteome of cancer specimens to discover new driver genes against which clinically appropriate measures could be implemented.
However, to the best of our knowledge, no studies have yet comprehensively determined the difference in mutational distribution or transcriptional profile among different ethnic groups. So far, studies have only analyzed the impact of ethnicity on genomic alterations in a few driver oncogenes and tumor suppressors.
As lung cancer in smokers is relatively common in Caucasians compared with Asians,
in The Cancer Genome Atlas (TCGA) study, only limited data about smoking-unrelated lung cancer were obtained. Smoking is known to be the major cause of LUAD but, as smoking rates decrease, proportionally more cases are arising in never or light smokers.
highlighting the need to investigate and discover novel genetic factors influencing survival in this population. Studies reported to date have shown that never smokers have lower rates of mutation in the KRAS and TP53 genes than smokers
has led to the commercialization of two biomarkers in LUAD (myPlan Lung Cancer and Pervenio Lung Risk Score), but their accuracy for survival estimation remains limited. Although there are certain biases and limitations associated with microarray data, these can be ameliorated through RNA-seq, particularly in the detection of transcripts present at low levels.
further large-scale prospective validation is needed.
In this study, we conducted comprehensive molecular profiling of LUAD in never or light smokers from Asia to discover novel targetable mutations and prognostic biomarkers of this distinct disease entity.
Materials and Methods
Study Design and Patient Specimens
The study cohort consisted of 996 primary LUADs from 920 patients who underwent surgical resection at Juntendo University between 2010 and 2014. Two board-certified pathologists (TH and TS) reviewed the histologic features of the LUADs according to the criteria of the current WHO classification of lung carcinomas.
Travis WD, Brambilla E, Nicholson AG, et al. The 2015 World Health Organization classification of lung tumors: impact of genetic, clinical and radiologic advances since the 2004 classification. J Thorac Oncol. 10:1243–1260.
Fresh frozen tumor samples and formalin-fixed and paraffin-embedded tissue blocks were obtained from all patients. Approval for this study was obtained from the Ethics Committee of The University of Tokyo (No. G3546) and Juntendo University School of Medicine (No. 2014176). Written informed consent was obtained from all patients involved in the present study.
Analyses of Mitogenic Driver Alterations
First, analyses of driver oncogene alterations in a total of 996 primary LUADs were performed in accordance with previously reported methods. Briefly, EGFR mutations were analyzed using the peptide nucleic acid-locked nucleic acid polymerase chain reaction (PCR) clamp method; KRAS mutations using the peptide nucleic acid-mediated PCR clamping method; ALK fusions using break-apart fluorescence in situ hybridization (FISH) and the intercalated antibody-enhanced polymer method; and ROS1 and RET fusions using break-apart FISH.
Samples without mitogenic driver alterations were subsequently analyzed by whole-exome sequencing (WES) and whole-transcriptome sequencing.
WES Including Mutation Call, Copy Number Analysis, and Signature Analysis
Genomic DNA was isolated from fresh frozen samples using QIAamp DNA Mini Kit (Qiagen, Hilden, Germany), and 500 ng of each sample was subjected to target fragment enrichment using an Agilent Exome Kit (v6) (Agilent Technologies, Santa Clara, CA). Massively parallel sequencing of isolated fragments was performed with a HiSeq2500 (Illumina) using the paired-end option. Paired-end WES reads were independently aligned to the human reference genome (hg38) using BWA,
Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml), and NovoAlign (http://www.novocraft.com/products/novoalign/). Somatic mutations were called using MuTect (http://www.broadinstitute.org/cancer/cga/mutect), SomaticIndelDetector (http://www.broadinstitute.org/cancer/cga/node/87), and VarScan (http://varscan.sourceforge.net). Mutations were discarded if any of the following occurred: (1) the read depth was less than 20 or the variant allele frequency (VAF) was less than 0.1, (2) they were supported by only one strand of the genome, or (3) they were present in normal human genomes in either the 1000 Genomes Project dataset (http://www.internationalgenome.org/) or our in-house database. Gene mutations were annotated by SnpEff (http://snpeff.sourceforge.net). Copy number status was analyzed using our in-house pipeline, which determines the logR ratio (LRR) as follows: (1) we selected single nucleotide polymorphism positions in the 1000 Genomes Project database that were in a homozygous state (VAF ≤ 0.05 or ≥ 0.95) or a heterozygous state (VAF = 0.4–0.6) in the genomes of respective normal samples, (2) normal and tumor read depths at the selected position were adjusted on the basis of G+C percentage of a 100-base pair window flanking the position,
(3) we calculated the LRR as $\log_2(t_i/n_i)$, where $n_i$ and $t_i$ are the adjusted normal and tumor depths at position $i$, and (4) each representative LRR was determined by the median of a moving window (1 megabase) centered at position $i$. The values of LRR of the copy number of both alleles, that of the major allele, and that of the minor allele were determined for every region of the genome. The p values for gain or loss of respective genomic regions were determined from the LRRs with a permutation test (100,000 iterations) following the algorithm used in Genomic Identification of Significant Targets in Cancer (GISTIC).
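The discard rules and the LRR steps described above can be sketched in Python as follows. This is a minimal illustration, not the in-house pipeline: the function and argument names are hypothetical, strand support is assumed to be available as forward and reverse read counts, and the depths passed to `log_r_ratios` are assumed to be already G+C-adjusted and positive.

```python
import math
from statistics import median


def passes_filters(depth, vaf, fwd_reads, rev_reads, in_normal_db):
    """Return True if a candidate somatic mutation survives the three discard rules."""
    if depth < 20 or vaf < 0.1:           # rule 1: low read depth or low VAF
        return False
    if fwd_reads == 0 or rev_reads == 0:  # rule 2: supported by one strand only
        return False
    if in_normal_db:                      # rule 3: present in normal genomes
        return False
    return True


def log_r_ratios(positions, tumor_depth, normal_depth, window=1_000_000):
    """Compute log2(t_i / n_i) per position, then take the median of the
    values falling in a window of the given width centered at each position."""
    raw = [math.log2(t / n) for t, n in zip(tumor_depth, normal_depth)]
    half = window / 2
    smoothed = []
    for p in positions:
        # Quadratic scan kept for clarity; a real pipeline would use a sliding window.
        in_window = [r for q, r in zip(positions, raw) if abs(q - p) <= half]
        smoothed.append(median(in_window))
    return smoothed
```

For example, three positions within one megabase of each other share a single median, whereas a very small window leaves each raw ratio unchanged.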
Transcriptome Sequencing, Expression Analysis, and Detection of Fusion Genes and Exon Skipping
Total RNA was extracted from fresh frozen samples using RNA-Bee (Tel-Test Inc., Gainesville, FL), followed by treatment with DNase I (Thermo Fisher Scientific, Waltham, MA) and then poly(A)-RNA selection before cDNA synthesis. The library used for RNA-seq was prepared with a NEBNext Ultra Directional RNA Library Prep Kit (NEB, Ipswich, MA), in accordance with the manufacturer’s protocol. Sequencing was conducted from both ends of each cluster using a HiSeq 2500 or NextSeq platform (Illumina, San Diego, CA). RNA-seq reads were aligned to hg19 using TopHat (v2.0.9; https://ccb.jhu.edu/software/tophat/index.shtml). The expression level of each gene was calculated using Cufflinks (v2.1.1; http://cole-trapnell-lab.github.io/cufflinks), and gene fusions were detected using the deFuse pipeline (https://bitbucket.org/dranew/defuse). Exon skipping was analyzed using an in-house pipeline as follows: (1) RNA-seq reads were aligned to hg38 and the National Center for Biotechnology Information reference sequence (RefSeq) using Burrows–Wheeler Aligner and Bowtie2, (2) skipped exons were detected from the mapped RefSeq data, (3) virtual transcriptome sequences were created dynamically, (4) RNA-seq reads were aligned to the candidate transcriptome sequences, and (5) exon skipping candidates were identified on the basis of reads with a breakpoint.
Signature Generation and Statistical Analysis
Survival data for the cohort were collected at Juntendo University. TCGA clinical data were downloaded from the TCGA data portal, and manually curated. The duration of overall survival (OS) was defined as the time between the date of surgical intervention and the date of either death or the last follow-up. The duration of recurrence-free survival (RFS) was defined as the time between the date of surgical intervention and the date of either recurrence, death from any cause, or the last follow-up. Univariate Cox regression analysis was used to evaluate the correlation between the expression level of each gene and OS in our cohort. Only genes with q values less than 0.003 and SD greater than 1 were considered as candidate genes for the correlation analysis, and those genes were used to construct the predictive model. The candidate genes were then fitted in stepwise multivariate Cox regression analysis to assess the relative contribution of each gene to survival prediction in our cohort. The genes that correlated with survival were included in the prognostic signature. According to the estimated regression coefficients in multivariate Cox regression analysis, a prognostic risk score for predicting OS was then calculated as follows:
$$\text{Risk score} = \sum_{i=1}^{n} \mathrm{exp}_i \times \beta_i$$

where $n$ is the number of prognostic genes, $\mathrm{exp}_i$ is the expression level of prognostic gene $i$, and $\beta_i$ is the regression coefficient of gene $i$.
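As a worked illustration of the risk score defined above, the following Python sketch sums the product of each prognostic gene's expression level and its regression coefficient. The gene names, coefficients, and expression values are hypothetical placeholders, not the signature reported in this study.

```python
def risk_score(expression, coefficients):
    """Sum of (expression level x Cox regression coefficient) over the signature genes."""
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Hypothetical coefficients and expression values, for illustration only.
betas = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.5}
score = risk_score(sample, betas)  # 0.5*2.0 + (-0.25)*4.0 + 1.0*1.5 = 1.5
```

A gene with a positive coefficient raises the score as its expression increases, whereas a gene with a negative coefficient lowers it.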
In this study, all statistical analyses were performed with R (version 3.5.1; https://www.r-project.org/) and its contained packages. Survival analysis and Cox regression analyses were performed using the “survival” (v2.44.1.1) package. OS and RFS were analyzed using the Kaplan-Meier method, and curve differences were evaluated using the log-rank test according to either the risk score or the driver mutation subtypes. The gene set enrichment analysis (GSEA) was performed using Java GSEA software (http://software.broadinstitute.org/gsea/index.jsp) (v2.2.4).
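The Kaplan-Meier (product-limit) estimate used for the survival curves can be illustrated with a short Python sketch. The analyses in this study were run with the R "survival" package; the function below is only a minimal stand-in written for illustration, with a hypothetical name and input layout, and it does not compute confidence intervals or the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up durations (any consistent unit)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve
```

Censored subjects contribute to the number at risk up to their last follow-up but never trigger a drop in the curve.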
Cell Lines
Human embryonic kidney (HEK) 293T cells and mouse 3T3 fibroblasts were obtained from the American Type Culture Collection and maintained in Dulbecco’s modified Eagle’s medium-F12 (DMEM-F12) supplemented with 10% fetal bovine serum (both from Thermo Fisher Scientific). Ba/F3 cells were cultured in RPMI 1640 medium (Thermo Fisher Scientific) supplemented with 10% fetal bovine serum and mouse IL-3 (20 U/mL; Sigma-Aldrich).
Preparation of Retrovirus and Transduction of Cell Lines
The recombinant plasmids were introduced together with packaging plasmids (Takara Bio) into human embryonic kidney 293T cells to obtain recombinant retroviral particles. For the focus formation assay, 3T3 cells were infected with ecotropic recombinant retroviruses using 4 μg/mL polybrene (Sigma-Aldrich) for 24 hours. They were then subjected to further culture for up to 2 weeks in Dulbecco's Modified Eagle Medium-F12 supplemented with 5% calf serum. Cell transformation was assessed through either phase-contrast microscopy or staining with Giemsa solution.
Alamar Blue Cell Viability Assay
After the cells had been incubated in 96-well plates (with 100 μL of culture medium per well), 10 μL of Alamar Blue (Thermo Fisher Scientific) was added, and the fluorescence was measured using a microplate reader (2030 ARVO X3; PerkinElmer, Waltham, MA) (excitation 530 nm, emission 590 nm) at the indicated times. Wells without cells were assayed as negative controls. The fluorescence gain for every well was adjusted against the well with the maximum fluorescence intensity.
Xenograft Tumor Assays
All animal studies were conducted in accordance with the protocols approved by the Animal Ethics Committee of the National Cancer Research Center, Tokyo, Japan. Before injection, 3T3 cells (1.0 × 10⁶) were mixed in PBS with Matrigel (BD Biosciences, Franklin Lakes, NJ) at a 1:1 ratio. Subcutaneous injection of the cell suspension was performed (at 200 μL/mouse) into 6-week-old female BALB/c nude mice (CLEA Japan, Tokyo, Japan). The mice were treated twice a week with an intraperitoneal injection of either trametinib (10 mg/kg body weight) or vehicle control, which was initiated once the tumors had reached a size of approximately 100 to 150 mm³. The average tumor volume in each group is expressed in cubic millimeters and was calculated using the following formula: π/6 × (largest diameter) × (smallest diameter)². Tumor injections and volume measurements were performed in a manner blinded to the constructs expressed by the cells used for injection. The mice were killed after 6 weeks of treatment, and their solid tumors were resected.
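The tumor volume calculation can be written as a small Python helper. This sketch assumes the conventional ellipsoid approximation in which the smallest diameter is squared and that both diameters are measured in millimeters. The function name is hypothetical.

```python
import math


def tumor_volume_mm3(largest_mm, smallest_mm):
    """Ellipsoid approximation: pi/6 x largest diameter x (smallest diameter)^2."""
    return math.pi / 6 * largest_mm * smallest_mm ** 2


# A tumor measuring 8 mm x 5 mm gives roughly 105 mm^3, which falls within
# the approximate 100 to 150 mm^3 range at which treatment was started.
example_volume = tumor_volume_mm3(8, 5)
```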
Sanger Sequencing
For capillary sequencing with a 3130xl Genetic Analyzer (Thermo Fisher Scientific), CD74-NRG2α and FGFR2-MBIP were amplified from 10 ng of template cDNA using GoTaq G2 Hot Start Master Mix Green (Promega, Madison, WI), in accordance with the manufacturer’s instructions, with the following primers: 5′-CACCTTAAGAACACCATGGAGACC-3′ and 5′-ATTTGATGCGAATGTCTCGGCTGC-3′ for CD74-NRG2α, and 5′-ACATGATGATGAGGGACTGTTGGC-3′ and 5′-GCTTTTCTTCCTCTTGTAGGTCGC-3′ for FGFR2-MBIP.
TaqMan Real-Time PCR Assay
Quantitative real-time PCR was performed using TaqMan assays (20× Primer Probe mix; Thermo Fisher Scientific) corresponding to MAP2K1 (Assay ID AHI16ER for p.E102_I103del, AHKA4KZ for P105_A106del, AHLJ2Q7 for K57N) and GAPDH (Assay ID Hs02758991_g1). All PCR reactions were performed with TaqMan Genotyping Master Mix (Thermo Fisher Scientific) on an Applied Biosystems 7900HT Fast Real-Time PCR System, in accordance with the standard protocols. Cycle threshold values were calculated using the built-in data collection software, and samples with Cycle threshold less than or equal to 37 were considered to be positive. All assays were performed in triplicate.
Immunohistochemical Analysis
Formalin-fixed paraffin-embedded tissues were sectioned and stained with hematoxylin and eosin. Immunohistochemistry was performed on the sections using anti–phospho-EGFR (Tyr1068) antibody (Cell Signaling Technology, Danvers, MA), anti–phospho-HER2 (Tyr1221/1222) antibody (Cell Signaling Technology), anti–phospho-HER3 (Tyr1289) antibody (Cell Signaling Technology), and anti–phospho-HER4 (Tyr1162) antibody (Abcam, Cambridge, United Kingdom) following the manufacturer’s recommendations.
Data Availability
We have deposited the raw sequencing data in the Japanese Genotype-Phenotype Archive (http://trace.ddbj.nig.ac.jp/jga), which is hosted by the DNA Data Bank of Japan, under accession number JGAS00000000215.
Results
Whole-Transcriptome Sequencing and WES on LUAD of Never or Light Smokers With Unknown Driver Oncogenes
The study cohort comprised 996 primary LUADs from 920 patients who underwent surgical resection at Juntendo University between 2010 and 2014. Of these cases, 373 were in heavy smokers (smoking index ≥ 401, where smoking index = cigarettes smoked per day × years of cigarette use), 104 were in light smokers (101 ≤ smoking index ≤ 400), and 510 were in never smokers (smoking index ≤ 100); smoking history was unknown for the remaining nine. The LUAD subtypes included 280 lepidic adenocarcinomas, 268 acinar adenocarcinomas, 110 papillary adenocarcinomas, 103 adenocarcinomas in situ (AIS), 94 minimally invasive adenocarcinomas, 91 solid adenocarcinomas, 42 invasive mucinous adenocarcinomas, five micropapillary adenocarcinomas, two enteric adenocarcinomas, and one fetal adenocarcinoma.
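The smoking categories defined above can be expressed as a short Python sketch. The function names are hypothetical, and the handling of non-integer index values between the stated integer cutoffs is an assumption made here for completeness.

```python
def smoking_index(cigarettes_per_day, years):
    """Smoking index = cigarettes smoked per day x years of cigarette use."""
    return cigarettes_per_day * years


def smoking_category(index):
    """Map a smoking index to the categories used in this cohort."""
    if index is None:
        return "unknown"
    if index <= 100:
        return "never"   # smoking index <= 100
    if index <= 400:
        return "light"   # 101 <= smoking index <= 400
    return "heavy"       # smoking index >= 401


# Example: 20 cigarettes per day for 30 years gives an index of 600 (heavy).
category = smoking_category(smoking_index(20, 30))
```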
Among 987 LUADs with known smoking history, EGFR mutations, KRAS mutations, ALK fusions, RET fusions, or ROS1 fusions were identified in 435, 121, 22, 10, and 10 patients, respectively, by conventional methods. KRAS G12C, a variant for which several covalent inhibitors have been recently developed, was found in 47 cases (4.8% of the total), and it was more common in heavy smokers (9.7%) compared with never or light smokers (1.8%) (Supplementary Fig. 1). ALK, RET, and ROS1 fusions were determined using either FISH or immunohistochemistry, and fusion partners were confirmed by either Sanger sequencing or next-generation sequencing (NGS) (Supplementary Table 1). A total of 389 patients were negative for all of these driver mutations, among whom 201 had never or only lightly smoked (Fig. 1A).
Figure 1. Summary of mutations in lung adenocarcinoma (LUAD) of Japanese who have never or only lightly smoked. (A) Summary of driver oncogenic mutations in LUAD. This shows the number of cases identified as being positive for driver mutations before this study; (B) here, the 13 frequently mutated genes with color coding of their alteration status for each tumor are indicated. The sex and smoking status are shown at the top; (C) schematic diagram depicting the NRG1/2 fusions. The CD74 gene (NM_001025159 at 5q32) is disrupted downstream of exon 6 and is subsequently ligated to a position upstream of either exon 2 of NRG2α (NM_004883 at 5q31) or exon 6 of NRG1 (NM_013956 at 8q12). NRG1/2 fusions identified by RNA-seq are shown with their functional domains. The EGF-like domain is maintained in all fusions identified; (D) transcript variants in NRG1 fusions. The exon junction reads of NRG1 variants supporting specific exons for NRG1α (TMc_α) or NRG1β (TMc_β and En_β) were counted. The ratio of junction reads which corresponds to the ratio of NRG1β/NRG1α was calculated by the formula shown. Exon α and β represent the specific exon of NRG1α and NRG1β, respectively. The exon En represents the downstream exons of NRG1β. The exon TMc represents the downstream exons of exon α and exon En; (E) representative photographs of NRG1/2-positive LUAD. The histology of LUAD with CD74-NRG2α (left), CD74-NRG1 (middle), and SDC4-NRG1 (right) is shown with hematoxylin-eosin staining (top panels). Immunohistochemical stainings for p-EGFR and p-HER2/3/4 are shown in the lower panels. ND, not determined. TM, transmembrane domain; EGF, epidermal growth factor-like domain.
We performed whole-transcriptome sequencing on 126 NSCLC samples with no identified driver oncogene (124 LUADs from never or light smokers, plus one heavy-smoker LUAD and one heavy-smoker squamous cell carcinoma as controls), 83 of which also underwent WES. We identified 26 cases with uncommon EGFR mutations, 16 cases with ERBB2 mutations, 17 cases with BRAF mutations, eight cases with MAP2K1 mutations, and three cases with NRG1 fusions (Fig. 1B). A total of 13 cases of MET exon 14 skipping were identified by RNA-seq, supported by split reads joining MET exon 13 to exon 15. Overexpression of EGFR, ERBB2, MET, and MAP2K1 was identified in one case each. One case of CD74-NRG2 fusion and one case of FGFR2-MBIP fusion were identified, neither of which had been identified before (Fig. 1C and Supplementary Fig. 2). The RNA-seq data suggested that the transcript variant of NRG2 constituting CD74-NRG2 was NRG2α (NM_004883), whereas NRG1 fusions were composed of different transcript variants including NRG1α and NRG1β (Fig. 1D and Supplementary Fig. 3). The EGF motif is encoded by exon 4 and exon 5 of NRG2α and was included in the CD74-NRG2α fusion. In total, driver oncogenes were identified in 88 cases (70.4%) among 125 LUADs.
Clinicopathologic Characteristics of NRG1/2-Fusion-Positive LUAD
All four patients with NRG1/2-fusion-positive LUAD were women in their 60s or 70s, and they all underwent surgical resection (Table 1). The patient with CD74-NRG2α was treated with erlotinib after tumor recurrence. However, as the patient had a severe skin rash, the physician discontinued erlotinib after 1 month. The two patients with CD74-NRG1 are still alive, with no recurrence, whereas the patient with SDC4-NRG1 died of empyema after operation. The histologic diagnosis of CD74-NRG2α-positive LUAD was acinar adenocarcinoma consisting of round- to oval-shaped atypical glands, whereas CD74-NRG1-positive cases of LUAD were invasive mucinous adenocarcinomas consisting of columnar cells with abundant intracytoplasmic mucin and basally oriented nuclei. The SDC4-NRG1-positive tumor was a solid adenocarcinoma composed mainly of polygonal tumor cells forming sheets (Fig. 1E).
Table 1. Clinicopathologic Characteristics of Patients With NRG1 or NRG2 Rearrangements

| Sample ID | Fusion | Age | Sex | Smoking index | Pathologic stage | Stage | Histology | Treatment modality | Outcome (weeks) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LUAD_085 | CD74-NRG1 | 70 | Female | 100 | pT1aN0M0 | IA | Invasive mucinous adenocarcinoma | Surgical resection | Alive with no recurrence (130) |
| LUAD_086 | CD74-NRG1 | 63 | Female | 0 | pT1aN0M0 | IA | Invasive mucinous adenocarcinoma | Surgical resection | Alive with no recurrence (110) |
| LUAD_087 | SDC4-NRG1 | 71 | Female | 0 | pT3N0M0 | IIB | Solid adenocarcinoma | Surgical resection | Dead of empyema (22) |
| LUAD_088 | CD74-NRG2 | 70 | Female | 0 | pT2aN2M0 | IIIA | Acinar adenocarcinoma | Surgical resection and erlotinib for recurrent tumor | |
As phosphorylation of ERBB2/3/4 can be a surrogate marker for pathway activation, immunohistochemical analysis of p-EGFR and p-HER2/3/4 was performed. Tumor cells in the CD74-NRG2α-positive case were moderately positive for p-HER4 but negative for p-EGFR, p-HER2, and p-HER3, whereas tumor cells of the NRG1 fusion-positive cases were positive for phosphorylation of all HER family members (Fig. 1E).
Mutational Signatures in LUAD of Never or Light Smokers
Various carcinogenic and cancer-related processes contribute to the mutational patterns observed in tumor cells.
Using the Wellcome Trust Sanger Institute Mutational Signature Framework, we identified four mutational signatures in this cohort, many of which are strongly correlated with previously defined signatures in the Catalogue of Somatic Mutations in Cancer database (COSMIC, https://cancer.sanger.ac.uk/cosmic) (Supplementary Fig. 4A and B). These include an apolipoprotein B mRNA-editing enzyme-catalytic polypeptide-like-related signature of a C to G or C to T change at a TCT or TCA site (COSMIC signature 13, abbreviated SI13), a mismatch-repair signature of a C to T change at a GCG site (SI6), a smoking-related signature of a C to A transversion (SI4), and a signature with a moderate correlation to COSMIC signature 5 (SI5) with unknown cause (Supplementary Fig. 4C).
Copy Number Analysis on LUAD of Never or Light Smokers
Chromosomal copy number amplification was observed in chr1q, chr5p (encompassing the TERT locus), chr7p (EGFR), chr8q (MYC), chr12 (MDM2), chr16p, chr17q (ERBB2), and chr20q. Losses of copy number were observed in chr9 and chr17, including in CDKN2A, CDKN2B, and TP53 (Supplementary Fig. 4D). The copy number profile in our cohort is similar to that found in TCGA study.
Somatic Mutations of Driver Oncogenes Detected by NGS Analyses
The mutational profile of our cohort was compared with that of the TCGA cohort, which includes only a limited number of Asian patients. The mean numbers of somatic mutations identified in each tumor of both never and light smokers of our cohort were markedly smaller than those of the TCGA study (average number of mutations in never smokers, 21 versus 157; average number of mutations in light smokers, 96 versus 299) (Supplementary Fig. 5A). There were also differences between our study and the TCGA study in the frequency of common mutations, such as TP53 (51.3% versus 13.6%), KEAP1 (24.3% versus 3.7%), KMT2C (21.6% versus 3.7%), FAT3 (20.3% versus 1.2%), and STK11 (17.6% versus 1.2%) (Supplementary Fig. 5B). In contrast, there were similarities between the two studies in the frequencies of growth-promoting driver mutations such as in EGFR, BRAF, ERBB2, and MAP2K1.
The somatic mutation profiles of EGFR, BRAF, ERBB2, and MAP2K1 in our study are shown in Supplementary Figure 5C. Here, the BRAF mutation hotspot was K601E, and the MAP2K1 mutation hotspot was E102_I103del, whereas those of the TCGA study or the Genomics Evidence Neoplasia Information Exchange project were G469A/L/R/S/V or V600E in BRAF and K57N/T in MAP2K1 (Supplementary Fig. 6). Even in a cohort of 302 never smokers with LUAD from the Memorial Sloan Kettering Cancer Center, the BRAF and MAP2K1 mutation hotspots differed from those in our cohort.
We identified seven cases with MAP2K1 mutations, six of which were MAP2K1 p.Glu102_Ile103del and the other was p.P105_A106del (Supplementary Fig. 7A). Another 67 never or light-smoker samples and 181 heavy-smoker samples of this LUAD cohort without known driver oncogenes were tested for MAP2K1 mutations by TaqMan single nucleotide polymorphism Genotyping Assays. The findings revealed p.P105_A106del in two never or light smoker samples (Supplementary Fig. 7B). Notably, no MAP2K1 mutation was identified among 181 heavy smokers, suggesting that these MAP2K1 exon 3 deletions are specific for tumors in those who have never or only lightly smoked.
To investigate the transforming potential of these MAP2K1 mutations, focus formation assays were performed; for this purpose, each mutant was transduced into cells of the mouse fibroblast cell line NIH/3T3 (3T3). Transformed foci were observed in the cells expressing E102_I103del, P105_A106del, or the K57N mutant, but not in mock-transfected cells, or those expressing wild-type MAP2K1 or MAP2K1(K97M) (kinase-dead) (Supplementary Fig. 8A). To determine the effects of MAP2K1 mutations on MAPK signaling, we investigated the ability of the mutants to induce ERK phosphorylation in 293T cells. Western blot analyses revealed increased kinase activity in the E102_I103del, P105_A106del, and K57N mutants (Supplementary Fig. 8A).
MAP2K1 mutants were transduced into Ba/F3, a murine interleukin-3 (IL-3)–dependent pro–B-cell line, to assess whether they enabled Ba/F3 cells to grow independently of IL-3. Cells expressing the E102_I103del, P105_A106del, or K57N mutant could grow even without IL-3; however, this was not the case for the cells expressing the wild-type. Treatment with an MEK1/2 inhibitor inhibited the growth of Ba/F3 cells expressing the MAP2K1 mutants; however, this was not observed for the parental Ba/F3 cells supplemented with IL-3 (Supplementary Fig. 8B). Furthermore, the 3T3 cells expressing the E102_I103del mutant formed subcutaneous tumors in nude mice, and the tumor growth was significantly (p < 0.01) inhibited in vivo by the treatment with trametinib (Supplementary Fig. 8C).
Whole-Transcriptome Analysis of LUAD
To identify the gene expression profiles associated with clinicopathologic features or gene mutational profiles, we conducted k-means clustering analysis using RNA-seq data. We used the top 100 genes with the most variation among the samples to divide the cohort into two groups by this clustering approach (Fig. 2A).
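The clustering step described above can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: "most variation" is taken here to mean the highest population variance, the k-means routine is a plain Lloyd's algorithm seeded with the first k samples (real analyses typically use random restarts), and the gene names and expression values are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expr, n_genes=100):
    """expr maps gene -> per-sample values; return the n most variable genes."""
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:n_genes]


def kmeans(points, k=2, n_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its assigned samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical expression values for four samples and three genes.
expr = {
    "G1": [0.0, 0.1, 5.0, 5.2],
    "G2": [1.0, 1.0, 1.0, 1.0],
    "G3": [2.0, 2.1, 9.0, 8.8],
}
genes = top_variable_genes(expr, n_genes=2)
samples = [[expr[g][s] for g in genes] for s in range(4)]
labels = kmeans(samples, k=2)
```

In this toy input the invariant gene is dropped, and the first two samples end up in one cluster and the last two in the other.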
Figure 2. Gene expression profile of lung adenocarcinoma in Japanese who have never or only lightly smoked. (A) The k-means clustering analysis was conducted using RNA-seq data. The clinical information (sex, age, smoking index, pathologic stage) and driver mutation profile are shown in the upper part; (B) Fisher’s exact test was performed to identify the factors associated with either group stratified by k-means clustering. Indicated factors were compared between the left major cluster (cluster 1) and right major cluster (cluster 2) in the heat map of (A); (C) Kaplan-Meier curves of OS and progression-free survival in our cohort stratified by k-means clustering as clusters 1 and 2. Univariate Cox analysis was used to calculate hazard ratio (HR). HR, 95% confidence interval, and p value are shown.
We merged the clinical information and mutational profile with the gene expression data and performed Fisher’s exact test to determine the factors associated with either group: the left cluster (cluster 1) and the right cluster (cluster 2). The results showed that the proportion of higher-stage (stage ≥IIA) patients in cluster 2 was significantly greater than that in cluster 1 (p = 0.003, Fig. 2B).
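As an illustration of the test used for this comparison, the following sketch computes a two-sided Fisher's exact p value for a 2 × 2 table from the hypergeometric distribution. The counts are hypothetical and are not the cohort's actual stage-by-cluster numbers; the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about the convention applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = cluster 1 / cluster 2,
# columns = stage I / stage >= IIA.
p_value = fisher_exact_two_sided(30, 5, 10, 12)
print(f"two-sided p = {p_value:.4g}")
```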
OS and RFS of cluster 2 were significantly worse than those of cluster 1 (OS hazard ratio [HR] = 6.82, 95% confidence interval [CI]: 2.50–18.7, p < 1 × 10⁻⁵; RFS HR = 8.97, 95% CI: 3.36–23.9, p < 1 × 10⁻⁷). Because higher-stage cancer is enriched in cluster 2, the gene expression profile of cluster 2 appears to be associated with advanced cancer (Fig. 2C).
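The survival curves compared in Fig. 2C are Kaplan-Meier estimates. A minimal sketch of that estimator is given below for one group of patients; the follow-up times and event indicators are made up for illustration, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        # Product-limit update: multiply by the fraction surviving past t.
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up data (months); the second patient is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate steps down only at observed event times.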
GSEA revealed that the gene set of “E2F_TARGETS” was enriched in cluster 2 and in stage II or higher cancers (Supplementary Fig. 9 and Supplementary Table 2). There was no significant difference in OS and RFS between the driver mutation-positive and -negative cancer groups (OS HR = 1.31, 95% CI = 0.54–3.17, p = 0.5; RFS HR = 1.27, 95% CI = 0.56–2.83, p = 0.6) (Supplementary Fig. 10).
There is a substantial risk for recurrence and death in patients with early-stage LUAD, even after complete surgical resection. The use of adjuvant therapy in LUAD at early stages, particularly stage I, remains controversial because no consistent survival benefit was reported in previous randomized trials. Reliable prognostic biomarkers are critically needed to select patients who are at high-risk for recurrence and who might benefit from additional systemic therapies.
We analyzed approximately 13,000 genes whose fragments per kilobase of exon per million reads mapped (FPKM) values had an SD greater than 1.0, to ensure adequate variance. Univariate Cox proportional hazards regression analysis showed that 192 genes were significantly correlated with OS (p ≤ 1 × 10⁻⁴), although genes with lower statistical significance may also be important.
The 14 genes with a false discovery rate of less than or equal to 0.003 were used for prognostic signature building using the forward conditional stepwise regression with multivariable Cox analysis in our cohort. This procedure selected a prognostic model with three genes: CCL8, MIS18A, and C1orf131.
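The text does not state which false discovery rate procedure was applied. The sketch below assumes the Benjamini-Hochberg adjustment, which is only one common choice, and applies the 0.003 cutoff mentioned above to hypothetical genes and p values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical univariate Cox p values (not taken from the study).
genes = {"GENE_1": 1e-6, "GENE_2": 5e-4, "GENE_3": 0.2, "GENE_4": 2e-5}
q_values = benjamini_hochberg(list(genes.values()))
selected = [g for g, q in zip(genes, q_values) if q <= 0.003]
print(selected)
```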
We constructed a risk score from the regression coefficients of this model and manually selected a threshold at the 75th percentile (Fig. 3A). High-risk patients, as defined by the three-gene signature-based risk score, had significantly worse OS for all stages (HR = 14.3, 95% CI: 5.03–40.7, p = 1 × 10⁻¹⁰) and for stage I patients (HR = 9.83, 95% CI: 2.45–39.4, p = 7 × 10⁻⁵) in our cohort, independent of age, sex, smoking index, stage, and gene mutations (Fig. 3B).
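A risk score of this kind is typically the linear predictor of the Cox model, that is, the sum of each gene's expression multiplied by its regression coefficient. The sketch below illustrates that construction and a 75th-percentile cutoff under stated assumptions: the coefficients and expression values are placeholders rather than the fitted values from this study, the percentile uses linear interpolation, and patients strictly above the cutoff are labeled high risk.

```python
def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


# HYPOTHETICAL coefficients: placeholders, not the model fitted in the study.
coefficients = {"CCL8": 0.5, "MIS18A": 0.8, "C1orf131": 0.6}

# Made-up expression values for four illustrative patients.
cohort = {
    "P1": {"CCL8": 1.0, "MIS18A": 2.0, "C1orf131": 1.0},
    "P2": {"CCL8": 3.0, "MIS18A": 4.0, "C1orf131": 2.0},
    "P3": {"CCL8": 0.5, "MIS18A": 1.0, "C1orf131": 0.5},
    "P4": {"CCL8": 6.0, "MIS18A": 5.0, "C1orf131": 4.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in cohort.items()}
threshold = percentile(list(scores.values()), 75)
high_risk = sorted(pid for pid, s in scores.items() if s > threshold)
print(threshold, high_risk)
```

Fixing the threshold in the discovery cohort and reusing it unchanged in a second cohort, as described later for TCGA, keeps the validation independent of the validation data.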
Figure 3. Three-gene prognostic signature in lung adenocarcinoma in Japanese patients who have never or only lightly smoked. (A) Three-gene expression and risk score distribution in our cohort by z-score. Here, red indicates higher expression, whereas light blue indicates lower expression. The risk scores for all patients are plotted in ascending order and marked as low risk (blue) or high risk (red), as divided by the threshold (vertical black line). The risk score threshold is 4.86; (B) Kaplan-Meier curves of OS and progression-free survival for all stages (upper) or for stage I (lower) in our cohort stratified by three-gene prognostic signature into those at high and low risk. Univariate Cox analysis was used to calculate the hazard ratio (HR). HR, 95% confidence interval, p value, and median survival are shown; (C) heatmap of the top 200 genes differentially expressed between those at high and low risk, with red indicating higher expression and blue indicating lower expression; (D) statistically significant gene sets identified by GSEA to be differentially overexpressed in high-risk tumors. Supplementary Table 2 presents the full GSEA results.
To understand the biology underpinning high-risk tumors, we identified the top 100 genes overexpressed and the top 100 genes underexpressed in high-risk tumors (Fig. 3C). Our findings revealed significant (p < 0.05) enrichment of fusion-positive cases in the high-risk group, whereas in the cases with other driver genes, no difference was identified between the low- and high-risk groups (Supplementary Table 3). We found significant (p < 0.01) enrichment for the high-risk tumors for gene sets related to cancer biology, including E2F targets, MYC targets, and G2M checkpoint (Fig. 3D and Supplementary Table 2).
EGFR status could be a strong confounder of the three-gene prognostic signature when patients were treated with EGFR tyrosine kinase inhibitors (EGFR TKIs). To exclude this possibility, we analyzed the performance of the signature in patient subsets with wild-type or mutant EGFR. The three-gene prognostic signature risk group provided significant OS stratification in both the EGFR wild-type patients (103/125, 82.4%) (HR = 14.1, 95% CI: 4.38–45.7, p = 2 × 10⁻⁸) and the EGFR-mutant patients (22/125, 17.6%) (HR = 10.5, 95% CI: 1.08–101, p = 0.01) (Supplementary Fig. 11).
Finally, in a multivariable Cox analysis that included EGFR and ALK alteration status, we found that the risk score was still statistically significant (HR = 12.61, p = 1.81 × 10⁻⁵) (Table 2). None of the mutation statuses was statistically significant in the multivariable analysis.
Table 2. Cox Proportional Hazard Models in Japanese Lung Adenocarcinoma Cohort

Variable         Univariate HR    p value∗     Multivariable HR    p value∗
Risk score       14.31            2.05E–08     12.61               1.81E–05
Sex              0.82             0.72         1.15                0.82
Age              1.01             0.99         1.45                0.64
Smoking index    2.38             0.1          1                   1
Stage            6.6              5.38E–05     4.14                0.01
Fusion           4.98             0.08         0.59                0.54
MET              0.41             0.32         0.17                0.1
BRAF             0.32             0.18         0.59                0.63
EGFR             1.04             0.95         0.76                0.65
ERBB2            1.21             0.76         1.44                0.6
MAP2K1           3.76E–08         0.11         9.88E–08            0.99

HR, hazard ratio.
∗ Two-sided likelihood ratio test. Age, <50 versus ≥50; Smoking index, never smoker versus light smoker + heavy smoker; stage, stage I versus stage ≥II.
Using the same risk score threshold as selected in our cohort, we found that the three-gene prognostic signature risk group significantly stratified the TCGA cohort for OS (HR = 1.58, 95% CI: 1.18–2.10, p = 2 × 10⁻³), which was independent of age, sex, and stage (Supplementary Fig. 12 and Supplementary Tables 4 and 5).
Clinicopathologic and Genomic Characterization of LUAD
Figure 4A presents the clinical and pathologic characteristics of the patients with oncogenic drivers. A total of 679 patients were shown to have driver oncogenes (68.2%). Overall, each activating alteration was found to be distributed throughout the histologic subtypes such as preinvasive, minimally invasive, and invasive adenocarcinomas.
Figure 4. Clinicopathologic characterization and genomic status of lung adenocarcinoma. (A) The types of driver alterations are indicated for each pathologic classification of lung adenocarcinoma; (B) below, the types of driver alterations are indicated for each age group. The pie chart on the right indicates the types of driver mutations found in patients aged 40 years and below (young adults).
We also identified mutant EGFR, KRAS, ERBB2, BRAF, and MAP2K1, as well as rearranged ALK and ROS1, in adenocarcinoma in situ (AIS); this supports the idea that these alterations are early pathogenic events in LUAD. Interestingly, BRAF was the second most frequently mutated gene (9%) in AIS, followed by MAP2K1 (4%) and KRAS (4%).
We next compared the frequency of oncogenic alterations in patients aged 40 years or younger (young adults) with that observed in patients aged 41 years or older. Among the young adults, there was a high proportion of patients with activating EGFR alterations (46.7% versus 25.0%; p = 0.0538), whereas the frequency of ERBB2 alterations differed significantly between the groups (20.0% versus 1.12%; p < 0.0001). In addition, in nine young adults (45%), no apparent oncogenic drivers were identified (Fig. 4B and Supplementary Table 6). The patients with rearranged ALK ranged in age from 41 to 85 years (median 65.5 years).
Discussion
In this study, we performed mutational profiling on LUAD in Japanese patients who had never smoked or only smoked lightly. We identified a distinct mutational profile compared with that in the TCGA study, the data of which were mainly from a non-Asian population. Compared with never smokers or light smokers in the TCGA cohort, there were fewer mutations per tumor in total in our cohort. This suggests that the mechanism underlying the generation of driver mutations might differ between the two cohorts. This difference could be explained by inherited genetic variations or environmental stress such as second-hand tobacco smoke, viral infections, or hazardous chemicals.
In this study, it was revealed that more than half of the patients previously diagnosed as driver oncogene-negative (i.e., negative for EGFR major mutations, ALK fusions, RET fusions, and ROS1 fusions) still had actionable mutations within EGFR, BRAF, MAP2K1, ERBB2, or MET or had fusion oncogenes of NRG1/2 or FGFR2. This reveals the importance of NGS-based clinical sequencing for identifying driver mutations in individual cancers. In particular, RNA sequencing identified CD74-NRG2α, FGFR2-MBIP, and CD74-NRG1, suggesting the clinical utility of RNA-seq that may offer patients the opportunity to enroll in clinical trials that are specific for genomic alterations.
A key finding in this study is the identification of the novel CD74-NRG2α fusion. There are now several global trials targeting NRG1 fusions in solid malignancies (e.g., Merus’ bispecific HER2/HER3 antibody MCLA-128). Thus, it is possible that the treatment could be extended to NRG2α fusions and theoretically to NRG3 and NRG4 fusions, although a HER4 antibody would have to be employed in certain cases.
NRG1 fusions have recently been shown to be involved in various solid tumors, including breast, head and neck, renal, lung, ovarian, pancreatic, prostate, and uterine cancers, so it would also be interesting to investigate the prevalence of NRG2α fusions among solid tumors. The fact that the CD74-NRG2α fusion could not be identified in TCGA RNA-seq data of various cancer types suggests that it may be very rare or more specific to LUAD in Asians. In contrast, no NTRK1/2 fusion was identified in the 996 cases by immunohistochemistry for pan-TRK, FISH, or RNA-seq, suggesting that NTRK fusions in LUAD may be less common in Asians than in Caucasians.
Immunohistochemically, tumor cells of the CD74-NRG2α-positive case were negative for p-HER3, whereas tumor cells of NRG1 fusion-positive cases were focally positive for p-HER3. These data suggest that the NRG2α fusion may exert its oncogenic potential through activation of other HER family members.
Our cohort revealed a difference in the distribution of driver oncogenes compared with those in TCGA, the Genomics Evidence Neoplasia Information Exchange project, or the MSK-IMPACT study. For example, in our study, the mutation hotspots for BRAF and MAP2K1 are p.K601E and p.E102_I103del, respectively, but those of the TCGA study were p.G469A/L/R/S/V or p.V600E for BRAF and p.K57N/T for MAP2K1. This might be explained by the difference in smoking status between the two cohorts. The MAP2K1 exon 3 deletions found in this study were only identified in nonsmokers, whereas K57N/T has been reported to be associated with smoking in a previous study.
In contrast to the oncogenic mutations, the k-means clustering of our expression data separated patients into groups with favorable and poor outcomes. Even in stage I cancers, the prognosis of which is generally favorable, the k-means clustering could still identify patients with a poor prognosis. This motivated us to identify the optimal gene set associated with prognosis, especially in stage I.
We defined a three-gene set for predicting the aggressive type of LUAD. This gene set might be used as a biomarker for high-risk patients for whom careful follow-up should be performed after the surgical resection of tumors. The three-gene set was further confirmed to be useful in predicting the prognosis of LUAD in the TCGA cohort. However, its clinical utility was less clear in the TCGA cohort, in which the HR for OS/RFS was 1.6, than in the Japanese cohort. This discrepancy in the robustness of the three-gene set between the two cohorts might arise because the TCGA study includes patients with known driver mutations who had probably been treated with the corresponding TKIs. In that case, genomic status could be a confounding factor regarding the outcome of the patients. Therefore, a large-scale cohort study should be performed to evaluate whether the utility of this gene set as a prognostic biomarker for stage I LUAD is limited to Japanese patients or applicable to other populations. Although current standard clinical practice does not support the routine use of genomic/transcriptomic testing in early-stage disease, reliable prognostic biomarkers may be helpful to select patients who are at high risk of recurrence and who might benefit from additional systemic therapies. We have included the three genes in the clinical RNA sequencing panel that we designed and plan to investigate their utility to guide decisions on adjuvant systemic therapy in a prospective LUAD cohort.
One limitation of this study is that patients who may have had co-mutations or co-alterations could have been missed because we did not perform WES if RNA-seq identified any driver oncogenes such as ERBB2 mutation or MET exon 14 skipping variants. Another limitation is that the three-gene set was evaluated only in the tumors at the initial surgical resection. The prognostic value of the marker for patients with recurrent cancer should be investigated in future studies.
The discoveries made here can be easily applied in a clinical setting. Our data indicate the value of applying gene expression profiling for predicting the prognosis after a surgical operation, and that the identification of actionable mutations is important for optimizing targeted drugs. CD74-NRG2α and FGFR2-MBIP are novel actionable fusions and could be targeted by several TKIs currently under development. We believe that our genomic and transcriptomic analyses highlight the importance of precise tumor profiling to provide the best possible care to patients.
Acknowledgments
This study was financially supported in part through grants from the Program for Integrated Database of Clinical and Genomic Information under grant number JP18kk0205003, the Leading Advanced Projects for Medical Innovation under grant number JP18am0001001, the Practical Research for Innovative Cancer Control under grant number JP18ck0106252, and the Project for Cancer Research And Therapeutic Evolution under grant number JP18cm0106502 from the Japan Agency for Medical Research and Development. This work was also supported in part by a grant from Eisai Co., Ltd. The authors would like to thank A. Maruyama and H. Tomita for technical assistance.
Neuregulins (NRGs) are cellular signaling proteins that contain epidermal growth factor (EGF)–like domains and play important roles in the development of the nervous and cardiovascular systems. NRG1 and NRG2 are two of six distinct NRG genes that share an EGF-like domain: NRG1, NRG2, NRG3, NRG4, NRG5 (tomoregulin-1; transmembrane protein with EGF-like and two follistatin-like domains 1 [TMEFF1]), and NRG6 (neuroglycan-C; chondroitin sulfate proteoglycan 5 [CSPG5]; chicken acidic leucine-rich EGF-like domain-containing brain protein [CALEB]).