Division of Intramural Research, National Institute on Minority Health and Health Disparities, Bethesda, Maryland; Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland
CCR Collaborative Bioinformatics Resource (CCBR), Center for Cancer Research, National Cancer Institute, Bethesda, Maryland; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, Maryland
Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland; Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland
CCR Collaborative Bioinformatics Resource (CCBR), Center for Cancer Research, National Cancer Institute, Bethesda, Maryland; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, Maryland
Departments of Pharmacology and Medicine, Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, New Jersey
National Institute of Minority Health and Health Disparities, Bethesda, Maryland; Department of Pathology and Cell Biology, Columbia University Medical Center, Columbia University, New York, New York
Corresponding author. Address for correspondence: Bríd M Ryan, PhD, MPH, Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Building 37, Room 3060C, Bethesda, MD 20892.
Lung cancer incidence is higher among African Americans (AAs) compared with European Americans (EAs) in the United States, especially among men. Although significant progress has been made profiling the genomic makeup of lung cancer in EAs, AAs continue to be underrepresented. Our objective was to chart the genome-wide landscape of somatic mutations in lung cancer tumors from AAs.
Methods
In this study, we performed whole-exome sequencing of 82 tumor and noninvolved tissue pairs from AAs. Patients were selected from an ongoing case-control study conducted by the National Cancer Institute and the University of Maryland.
Results
Among all samples, we identified 178 significantly mutated genes (p < 0.05), five of which passed the threshold for false discovery rate (p < 0.1). In lung adenocarcinoma (LUAD) tumors, mutation rates in STK11 (p = 0.05) and RB1 (p = 0.008) were significantly higher in AA LUAD tumors (25% and 14%, respectively) compared with The Cancer Genome Atlas EA samples (13% and 4%, respectively). In squamous cell carcinomas, mutation rates in STK11 (p = 0.002) were significantly higher among AA (8%) than EA tumors from The Cancer Genome Atlas (1%). Integrating somatic mutation data with CIBERSORT (Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts) data revealed that LUAD tumors from AAs carrying STK11 mutations have decreased interferon signaling.
Conclusions
Although a considerable degree of the somatic mutation landscape is shared between EAs and AAs, discrete differences in mutation frequency in potentially important oncogenes and tumor suppressors exist. A better understanding of the molecular basis of lung cancer in AA patients and leveraging this information to guide clinical interventions may help reduce disparities.
Of all racial and ethnic groups in the United States, African American (AA) men have the highest age-adjusted NSCLC incidence rates and the highest age-adjusted mortality rates.
These trends persist despite a lower tobacco exposure among AAs in terms of the number of cigarettes smoked per day compared with European Americans (EAs).
Potential factors associated with these disparities include smoking behaviors—for example, disparities lessen when nicotine intake per cigarette is taken into account
These studies reveal that although most tumor biology is shared between EAs and AAs, specific and potentially actionable differences exist.
Recent improvements in cancer survival have been largely owing to advances in our understanding of cancer genomics and its translation into targeted therapies. However, these studies have been dominated by research focused on populations of European or Asian descent.
Cancer Genome Atlas Research Network Comprehensive genomic characterization of squamous cell lung cancers [published correction appears in Nature. 2012;491:288. Rogers, Kristen [corrected to Rodgers, Kristen]].
Cancer Genome Atlas Research Network Comprehensive molecular profiling of lung adenocarcinoma [published correction appears in Nature. 2014;514:262. Rogers, K [corrected to Rodgers, K]] [published correction appears in Nature. 2018;559:E12].
Indeed, genomic studies of LUAD in European and Asian populations highlight population heterogeneity. For example, EGFR mutations are found more frequently among Asian populations, and several driver genes with low mutation frequency and specific mutational signatures have been identified.
In recent years, studies using targeted exome sequencing approaches to analyze AA populations have been published. These studies reveal that much of the somatic mutation landscape of NSCLC is shared between EAs and AAs,
Despite these observations, such ancestry differences in NSCLC genomics have yet to be systematically evaluated with whole-exome sequencing (WES) in an AA cohort. A recent study by Lusk et al. conducted WES on a subset of lung tumors without known driver mutations, the results of which suggested that an unbiased WES assessment in an unselected sample set was warranted. To fill this knowledge gap, we used WES on matched tumor and noninvolved tissue pairs.
Materials and Methods
Patient Samples and DNA Extraction
Patients were selected from an ongoing case-control study conducted by the National Cancer Institute (NCI) and the University of Maryland. Patients for this study were recruited between 1984 and 2013. At the time of the surgical procedure, a portion of the tumor specimen and noninvolved adjacent lung tissue was flash frozen and stored at −80°C until needed. Clinical and pathologic information was obtained from medical records, tumor boards, and pathology reports (Table 1). Never-smokers were defined as having smoked fewer than 100 cigarettes in their lifetime, former smokers were defined as individuals who had quit smoking more than 1 year before the interview, and current smokers included individuals who continued to smoke and individuals who had quit smoking within 1 year of the interview. A participant's sample was included if the patient was a candidate for surgery, gave informed consent, and, after pathologic assessment, had enough fresh frozen tissue for research analyses. Furthermore, the participant needed to have matched genomic DNA available for comparison, and the extracted DNA needed to meet sufficient quality control criteria to be included in the WES analysis.
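The smoking-status definitions above can be expressed as a small classification rule. The following Python sketch is illustrative only: the function name and argument names are hypothetical and are not taken from the study's questionnaire or code, and a quit time of exactly 1 year is treated as current smoking, which is an assumption about the boundary.

```python
def smoking_status(lifetime_cigarettes, currently_smokes, years_since_quit=None):
    """Classify smoking status from the definitions stated in the text.

    All argument names are hypothetical; they are not from the study's data dictionary.
    """
    # Never-smoker: fewer than 100 cigarettes smoked in a lifetime
    if lifetime_cigarettes < 100:
        return "never"
    # Anyone still smoking is a current smoker
    if currently_smokes:
        return "current"
    if years_since_quit is None:
        raise ValueError("years_since_quit is required for people who have quit")
    # Former smoker: quit more than 1 year before the interview
    if years_since_quit > 1:
        return "former"
    # Quit within 1 year of the interview: still counted as current
    return "current"
```

Under this sketch, a participant who quit 6 months before the interview would be grouped with current smokers, whereas one who quit 3 years earlier would be a former smoker.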
| Characteristic | n (%) |
| --- | --- |
| Low | 26 (31.7) |
| Medium | 23 (28.0) |
| High | 25 (30.5) |
| Unknown | 8 (9.8) |
| Menthol use | |
| Yes | 26 (31.7) |
| No | 12 (14.6) |
| Unknown | 44 (53.7) |
| Histologic subtype | |
| LUAD | 36 (43.9) |
| LUSC | 39 (47.6) |
| BAC | 4 (4.9) |
| Other | 3 (3.7) |
| Stage | |
| I | 49 (59.8) |
| II | 23 (28.0) |
| III | 5 (6.1) |
| Unknown | 5 (6.1) |
AA, African American; BAC, bronchioalveolar carcinoma; BMI, body mass index; GED, general educational development; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma.
a BMI categories: underweight (under 18.5 kg/m²); normal (18.5–24.9 kg/m²); overweight (25.0–29.9 kg/m²); obese (>30 kg/m²).
b Education categories: low (elementary school, middle school, and 10th or 11th grade); medium (high school or GED, some college, technical school); and high (college, professional school).
c Income levels (in US dollars): low (<$15,000); medium ($15,000–60,000); and high (>$60,000).
d Pack-years of smoking: low (<19 pack-years smoked); medium (19.0–36.7 pack-years smoked); and high (>36.7 pack-years smoked).
DNA was extracted from fresh, frozen macro-dissected primary lung tumor tissues using the Qiagen DNeasy Blood and Tissue kit (Qiagen) spin-column procedure, according to the manufacturer’s protocol, as previously described.
Isolated primary lung tumor DNA was initially quantified using a DS-11 spectrophotometer (DeNovix). Subsequent Qubit fluorometer (Invitrogen) analyses were performed to assess DNA integrity and ensure the presence of intact double-stranded DNA in all samples. DNA with an A260-to-A280 ratio between 1.8 and 2.0, a minimum concentration of 12 ng/μL, and a total amount of 100 ng was used for further analysis.
WES and Data Processing
WES was performed at the Cancer Genomics Research Laboratory, NCI Division of Cancer Epidemiology and Genetics (Gaithersburg, MD). Extracted DNA samples were used for library preparation with the NimbleGen SeqCap EZ Exome capture system (Roche NimbleGen), which targets 64 megabases (Mb) of exonic sequence. The resulting postcapture enriched multiplexed sequencing libraries were used in cluster formation on an Illumina cBOT (Illumina, San Diego, CA). Paired-end sequencing was performed across the AHM5YYBBXX, BHMNLHBBXX, AHMCMTBBXX, BHM7M3BBXX, AHMCJHBBXX, and AHMCFYBBXX flowcells on Illumina HiSeq instruments (Illumina) following Illumina-provided protocols for 2 × 125 base pair (HiSeq 2500) or 2 × 150 base pair (HiSeq 4000) paired-end sequencing.
Sequence reads were trimmed for adapters and low-quality bases using the Trimmomatic software (version 0.33; Usadel Lab) and then aligned to the human hg19 reference genome using BWA mapping software (version 0.7.15; Github).
Duplicate reads were marked using Picard Tools, followed by realignment and base quality score recalibration using the Genome Analysis Toolkit version 3.8.0.
in joint genotyping mode. Variants were then filtered for quality with the following criteria: (1) for single nucleotide polymorphisms, quality by depth less than 2.0, Fisher's exact test to detect strand bias greater than 60.0, mapping quality less than 40.0, mapping quality RankSum less than −12.5, and ReadPosRankSum less than −8.0; and (2) for indels, quality by depth less than 2.0, Fisher's exact test to detect strand bias greater than 200.0, and ReadPosRankSum less than −20.0. For admixture analysis, indels and any single nucleotide polymorphisms that were not biallelic were removed, and the 1000 genomes phase 3 superpopulations were used as a reference. We also excluded rare variants (≤0.05 frequency across all phase 3 1000 genomes). To maximize the genetic comparisons by race, we examined each patient for African genetic ancestry (Methods) (Supplementary Table 1). We then used the tool Admixture version 1.3.0 to estimate ancestry proportions for each of the 1000 genomes superpopulations (Supplementary Table 1). One patient who self-reported as AA had greater than 60% European ancestry and was therefore excluded from downstream analyses. Four more patients were also excluded from downstream analyses: two had mismatched tumor/nontumor tissue pairs, and the remaining two had only tumor tissue available. Thus, the final study cohort consisted of 82 matched tumor-nontumor tissue pairs.
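To make the germline quality thresholds above concrete, the following Python sketch applies them to a single variant record. It assumes that each listed threshold is an exclusion condition and that a variant is removed if it meets any one of them; the text does not state this explicitly. The annotation keys (QD, FS, MQ, MQRankSum, ReadPosRankSum) are the conventional short names for the annotations spelled out in the text, and the function name and the dictionary layout are hypothetical.

```python
# Exclusion thresholds transcribed from the text.
# "lt" means exclude when the value is below the threshold, "gt" when above.
HARD_FILTERS = {
    "SNP": [("QD", "lt", 2.0), ("FS", "gt", 60.0), ("MQ", "lt", 40.0),
            ("MQRankSum", "lt", -12.5), ("ReadPosRankSum", "lt", -8.0)],
    "INDEL": [("QD", "lt", 2.0), ("FS", "gt", 200.0),
              ("ReadPosRankSum", "lt", -20.0)],
}


def fails_hard_filters(variant_type, annotations):
    """Return True if the variant meets any exclusion condition.

    Assumption: a missing annotation does not trigger a filter.
    """
    if variant_type not in HARD_FILTERS:
        raise ValueError(f"unknown variant type: {variant_type}")
    for key, direction, threshold in HARD_FILTERS[variant_type]:
        value = annotations.get(key)
        if value is None:
            continue
        if direction == "lt" and value < threshold:
            return True
        if direction == "gt" and value > threshold:
            return True
    return False
```

For example, a single nucleotide polymorphism with a strand-bias value of 75.0 would be excluded, whereas an indel with the same value would be retained because its threshold is 200.0.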
Somatic Variant Analysis
Somatic variant calling was performed using muTect (version 1.1.7; Broad Institute),
in tumor-normal mode. Mutations called with at least two of these programs were considered in our study. Annotation of variants was performed using Ensembl’s variant effect predictor version 92
A final set of somatic variants was generated using the following stringent filtering criteria. Commonly occurring variants annotated with a frequency of greater than 0.001 in the Exome Aggregation Consortium database, Genome Aggregation Database, or 1000 Genomes databases were excluded. Additional filtering steps to keep high-quality variants included the following: (1) a mutant allele frequency greater than 0.05 in the tumor sample; (2) a count/depth of the mutant allele in the nontumor sample of less than two; (3) a count/depth of the mutant allele in the tumor sample greater than four; and (4) a total tumor sequencing depth greater than 100×. Finally, before all downstream analyses, variants in genes that are frequently mutated in exome data and are likely false positives were also removed.
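The somatic filtering criteria above amount to a conjunction of simple thresholds. The sketch below, in Python, shows one way they could be combined; it is not the authors' pipeline. The class and field names are hypothetical, and the population frequency is assumed to be summarized as the highest value seen across the three reference databases.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    """Hypothetical record for one candidate somatic variant."""
    tumor_alt_count: int      # reads supporting the mutant allele in the tumor
    tumor_depth: int          # total reads at the site in the tumor
    normal_alt_count: int     # reads supporting the mutant allele in nontumor tissue
    max_population_af: float  # highest frequency across ExAC, gnomAD, and 1000 Genomes


def passes_somatic_filters(call):
    """Apply the thresholds listed in the text; all must hold to keep the variant."""
    if call.tumor_depth <= 100:            # criterion 4: depth greater than 100x
        return False
    tumor_vaf = call.tumor_alt_count / call.tumor_depth
    return (
        call.max_population_af <= 0.001    # drop commonly occurring variants
        and tumor_vaf > 0.05               # criterion 1: mutant allele frequency
        and call.normal_alt_count < 2      # criterion 2: nontumor mutant reads
        and call.tumor_alt_count > 4       # criterion 3: tumor mutant reads
    )
```

A call with 12 mutant reads at a depth of 150 (allele frequency 0.08) and no mutant reads in the nontumor sample would pass, whereas the same call at a depth of 90 would be dropped on depth alone.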
Tumor mutation burden was defined as the number of somatic mutations in the coding region (Supplementary Table 2) and calculated as the total mutation count divided by the size of the coding sequence region (64 Mb) targeted by the NimbleGen SeqCap EZ Exome capture system. Mutation significance analysis was performed using the MutSigCV algorithm (Broad Institute).
The current version improves the background mutation rate estimation by pooling data from neighbor genes in the covariate space and substantially reduces the number of false-positive findings. Tables with mutation data, per sample coverage, gene covariables, and mutation type were imported to the software. Genes with a Bonferroni-corrected p value of less than 0.1 were considered statistically significant (Supplementary Table 3).
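The tumor mutation burden calculation described above is a single division, sketched here in Python. The 64 Mb denominator is taken from the text; the function and constant names are hypothetical.

```python
CAPTURE_SIZE_MB = 64.0  # coding region targeted by the capture system, per the text


def tumor_mutation_burden(n_coding_mutations, capture_size_mb=CAPTURE_SIZE_MB):
    """Return somatic mutations per megabase of targeted coding sequence."""
    if capture_size_mb <= 0:
        raise ValueError("capture size must be positive")
    return n_coding_mutations / capture_size_mb
```

As an illustration with a made-up count, a tumor carrying 2,560 coding mutations would have a burden of 40 mutations per Mb under this definition.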
The Cancer Genome Atlas Data
Somatic mutation data for The Cancer Genome Atlas (TCGA)–LUAD and lung squamous cell carcinoma (LUSC) data set were retrieved using the TCGA mutations R package (R Foundation),
CIBERSORT (Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts) data for LUAD and LUSC were extracted from the article by Thorsson et al.
Mutational signatures in the targeted sequencing data were analyzed using the R/Bioconductor package MutationalPatterns (Bioconductor). The package covers a wide range of tools, including mutational signatures, transcriptional and replicative strand bias, genomic distribution, and association with genomic features. The 65 reference mutation signatures were obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC) website (https://cancer.sanger.ac.uk/cosmic/signatures). These signatures were compared with the pattern of all possible single-base substitutions (SBS) in each sample independently. Etiologies were parsed programmatically from the hypertext markup language pages, the text was clustered, and a source or origin was then manually assigned on the basis of the clustered descriptions. Cosine similarity was used as the comparison metric in Figure 3. In addition, the contribution of each known signature was computed as the optimal linear combination of mutational signatures that most closely reconstructs the mutation matrix for each sample (Supplementary Fig. 4).
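Cosine similarity, the comparison metric named above, measures how closely a sample's substitution profile matches a reference signature regardless of the total mutation count. The Python sketch below illustrates the metric only; it is not the MutationalPatterns implementation, and the function names and the toy signature names in the usage note are hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two mutation-count or probability vectors."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(x * x for x in profile_a))
    norm_b = math.sqrt(sum(y * y for y in profile_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty profile
    return dot / (norm_a * norm_b)


def best_matching_signature(sample_profile, reference_signatures):
    """Return the name of the reference signature most similar to the sample."""
    return max(
        reference_signatures,
        key=lambda name: cosine_similarity(sample_profile, reference_signatures[name]),
    )
```

In practice each profile would have 96 channels (six substitution classes in each trinucleotide context); with toy three-channel vectors, a sample profile of [8, 1, 1] would match a reference named "sig_A" with values [0.8, 0.1, 0.1] more closely than one named "sig_B" with values [0.1, 0.1, 0.8].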
The datasets generated during this study have been uploaded to The Database of Genotypes and Phenotypes repository in compliance with the National Institutes of Health Genomic Data Sharing Policy. The data can be accessed under The Database of Genotypes and Phenotypes study identifier phs001895.
Results
Study Cohort and Estimation of Genetic Ancestry
We conducted WES on genomic DNA from 82 AA patients using matched tumor and nontumor pairs. Most of these samples were LUSC (47.6%) or LUAD (43.9%). The median level of African ancestry was 85% (Supplementary Table 1). There were 64 men and 18 women with a mean age of 63.5 years. Over 58% of the patients were current smokers and 3.7% were never-smokers (Table 1).
Somatic Mutation Landscape in NSCLC From AAs
In total, we detected 141,209 single nucleotide variants and insertion and deletion events (Supplementary Table 1). The most frequent mutation type was a nonsynonymous base change, consistent with previous reports
There were 11 samples classified as hypermutated. One patient, a current smoker, had over 40 mutations per Mb. Known DNA repair genes, such as MSH3 and ERCC4, were among the mutated genes in this sample (Supplementary Table 2). The transition-to-transversion (Ti/Tv) ratios were as expected, with the most frequent base substitution bias toward cytosine (C) > adenine (A) transversions followed by cytosine (C) > thymine (T) transitions (Supplementary Fig. 1), both of which are associated with exposure to cigarette smoking.
Identification and Characterization of Somatic Mutations in LUAD and LUSC
We initially used the MutSigCV algorithm to identify significantly mutated genes, given the higher background mutation rate in NSCLC. We found 178 significantly mutated genes (p < 0.05), five of which (TP53, STK11, RB1, CDKN2A, PIK3CG) passed the threshold for false discovery rate (p < 0.1) (Figs. 1A and B, Table 2, and Supplementary Table 3). TP53 was the most frequently mutated gene, with a similar frequency in AAs and EAs (using TCGA data as reference) (Table 2). Given the evidence for ethnicity-related EGFR mutations and that patients with an exon 19 deletion or L858R mutation have a lower response rate to immunotherapy and a greater response to receptor tyrosine kinase inhibitors, we looked for the presence of EGFR mutations in the AA population and found one patient, a former smoker, with an E746-A750 deletion in exon 19. All KRAS mutations targeted codons 12 and 13 (Supplementary Table 4). Consistent with our previous findings, the frequency of mutations in PTPRT and JAK2 genes was higher in AA LUAD tumors (17% and 14%, respectively) compared with TCGA EA patients (8% and 2%, respectively) (Table 2 and Supplementary Table 6).
Figure 1. Somatic mutation landscape of lung cancer derived from WES of 82 lung cancers from AAs. (A) Illustrated are genes with nonsynonymous and indel mutations of greater than 10% frequencies. The mutant frequencies in the cohort are illustrated on the right. Demographic and lifestyle exposures are superimposed at the bottom of the oncoplot. (B) Enhanced mutation frequency of STK11 and RB1 in AAs compared with EAs. (C) Dysregulated interferon signaling among STK11 mutant tumors and infiltration of Th1 and Th2 cells. p Values determined using two-sided Student's t tests. AA, African American; EA, European American; NCI-MD, National Cancer Institute–Maryland; TCGA, The Cancer Genome Atlas; Th, helper T-cell; WES, whole-exome sequencing.
Table 2. Significant (Top Panel) and Most Frequently (Bottom Panel) Mutated Genes in Lung Cancer From AA

| Gene | NCI-MD (n = 82) (AA) | AA LUAD: NCI-MD WES FF (n = 36) | AA LUAD: NCI-MD targeted FF (n = 54) | AA LUAD: TCGA WES FF (n = 52) | EA LUAD: TCGA WES FF (n = 381) | AA LUSC: NCI-MD WES FF (n = 39) | AA LUSC: NCI-MD targeted FF (n = 65) | AA LUSC: TCGA WES FF (n = 27) | EA LUSC: TCGA WES FF (n = 331) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TP53 | 0.52 | 0.42 | 0.46 | 0.65 | 0.49 | 0.56 | 0.68 | 0.93 | 0.83 |
| STK11 | 0.15 | 0.25 | 0.19 | 0.21 | 0.13 | 0.08 | 0.03 | 0.07 | 0.01 |
| PIK3CG | 0.13 | 0.11 | 0.06 | 0.10 | 0.06 | 0.13 | 0.14 | 0.07 | 0.09 |
| RB1 | 0.11 | 0.14 | 0.09 | 0.13 | 0.04 | 0.10 | 0.09 | 0.07 | 0.09 |
| CDKN2A | 0.10 | 0.06 | NA | 0.08 | 0.03 | 0.15 | 0.12 | 0.07 | 0.17 |
| LRP1B | 0.44 | 0.44 | 0.31 | 0.40 | 0.32 | 0.41 | 0.34 | 0.19 | 0.36 |
| CSMD3 | 0.43 | 0.39 | 0.35 | 0.52 | 0.38 | 0.46 | 0.42 | 0.56 | 0.43 |
| SPTA1 | 0.29 | 0.39 | NA | 0.35 | 0.23 | 0.18 | NA | 0.15 | 0.22 |
| NAV3 | 0.28 | 0.36 | NA | 0.31 | 0.19 | 0.13 | NA | 0.11 | 0.22 |
| PCLO | 0.28 | 0.22 | NA | 0.27 | 0.16 | 0.31 | NA | 0.11 | 0.18 |
| ZFHX4 | 0.27 | 0.22 | NA | 0.48 | 0.30 | 0.28 | NA | 0.22 | 0.28 |
| COL11A1 | 0.24 | 0.33 | NA | 0.31 | 0.19 | 0.15 | NA | 0.11 | 0.19 |
| SI | 0.24 | 0.28 | NA | 0.23 | 0.15 | 0.15 | NA | 0.11 | 0.18 |
| ZNF536 | 0.24 | 0.19 | NA | 0.29 | 0.20 | 0.23 | NA | 0.26 | 0.12 |
| FSIP2 | 0.23 | 0.22 | NA | NA | NA | 0.18 | NA | NA | NA |
| RELN | 0.23 | 0.19 | NA | 0.19 | 0.15 | 0.23 | NA | 0.15 | 0.18 |
| ANK2 | 0.22 | 0.22 | NA | 0.25 | 0.19 | 0.18 | NA | 0.15 | 0.14 |
| PTPRB | 0.22 | 0.25 | NA | 0.12 | 0.07 | 0.10 | NA | 0.07 | 0.13 |
| ASXL3 | 0.21 | 0.25 | NA | 0.19 | 0.13 | 0.13 | NA | 0.04 | 0.08 |
| NLRP3 | 0.21 | 0.31 | NA | 0.15 | 0.10 | 0.10 | NA | 0.11 | 0.07 |
| PAPPA2 | 0.21 | 0.22 | NA | 0.27 | 0.16 | 0.10 | NA | 0.19 | 0.18 |
| PTPRT | 0.12 | 0.17 | 0.20 | 0.21 | 0.08 | 0.13 | 0.06 | 0.07 | 0.07 |
| JAK2 | 0.11 | 0.14 | 0.09 | 0.06 | 0.02 | 0.10 | 0.09 | 0.04 | 0.03 |

AA, African American; EA, European American; FF, fresh frozen; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; NA, not available; NCI-MD, National Cancer Institute–Maryland; TCGA, The Cancer Genome Atlas; WES, whole-exome sequencing.
In LUAD, STK11 was mutated in 25% of LUAD tumors from AA patients in the NCI-Maryland (NCI-MD) samples compared with just 13% in TCGA EA samples (two-sample test of proportions, p = 0.05) (Fig. 1B, Table 2, and Supplementary Table 6). This increased mutation frequency was replicated in another data set and in TCGA (Fig. 1B). Similarly, LUSC samples from AAs had a higher STK11 mutation frequency in the NCI-MD tumors (8%) compared with EA tumors from TCGA (1%) (two-sample test of proportions, p = 0.002) (Fig. 1B, Table 2, and Supplementary Table 6). Again, this observation was validated in our previous data set based on targeted sequencing and in TCGA (Table 2 and Fig. 1B). Consistent with TCGA data for EAs, the spread of mutations was similar, with no evidence of clear hot spots.
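The two-sample test of proportions used for these comparisons can be sketched as a pooled z-test. The Python example below is an illustration under stated assumptions, not a reproduction of the authors' analysis: the exact variant of the test (for example, whether a continuity correction was applied) is not given in the text, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for a difference between two proportions.

    x1, x2 are the numbers of mutated tumors; n1, n2 are the group sizes.
    Returns (z statistic, two-sided p value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("test is undefined when the pooled proportion is 0 or 1")
    z = (p1 - p2) / standard_error
    # Two-sided p value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The mutated-tumor counts are not reported directly, so any inputs must be back-calculated from the percentages. Taking 9 of 36 AA LUAD tumors (25%) and approximately 50 of 381 EA LUAD tumors (13%) as assumed counts, `two_proportion_z_test(9, 36, 50, 381)` gives a p value close to the reported 0.05.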
STK11 alterations have been identified as the most prevalent genomic driver of primary resistance to programmed cell death protein 1 axis inhibitors in the context of KRAS-mutant LUAD through the generation of an immune cold tumor microenvironment.
Molecular determinants of response to anti-programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing [published correction appears in J Clin Oncol. 2018;36:1645].
Although the co-occurrence of KRAS and STK11 alterations (mutation or amplification) is similar in both EAs and AAs (approximately 25%), we found that most AA tumors with an STK11 alteration also carry KRAS mutations or amplifications (six of seven). The co-occurrence of STK11 and KRAS variants in AAs was detected in both LUAD and LUSC (Supplementary Table 4). We integrated TCGA LUAD somatic mutation data with CIBERSORT data from the supplementary data of this article.
As illustrated in Figure 1C, lung tumors from AAs carrying STK11 mutations have decreased interferon gamma response signatures, a feature also indicative of a cold tumor microenvironment.
In accordance with decreased interferon gamma signaling, we observed a decreased helper T-cell (Th) 1 cell infiltration and increased Th2 cell infiltration, with the strongest effects seen in STK11/KRAS-altered cells (Supplementary Fig. 3). These data suggest that, in AAs, the somatic alterations in STK11 and KRAS are associated with an immune cold tumor microenvironment through decreased interferon gamma–dependent signaling, as opposed to reduced cytotoxic T-cell infiltration. This may be indicative of divergent STK11/KRAS-dependent inflammatory signaling in AAs. The concurrent reduction of interferon gamma–dependent signaling and a Th2–skewed Th1/Th2 balance suggests not only a reduction in the antitumor interferon gamma-Th1–dependent signaling, but also an increase in protumor signaling by Th2 cells.
RB1 and CDKN2A are also mutated at a higher frequency in LUAD among AAs compared with EAs (Fig. 1B, Table 2, and Supplementary Table 6). In AA patients with LUAD, RB1 was mutated at a frequency of 14% compared with 4% in EAs (p = 0.008) (Table 2 and Fig. 1B). This higher frequency was again confirmed in our other data sets.
CDKN2A mutations were also higher among AAs in this previous data set, though these differences did not reach statistical significance. As observed in EAs, CDKN2A and RB1 mutations are primarily mutually exclusive in AAs, with 11 of 36 (31%) LUAD tumors carrying a mutation in either RB1 or CDKN2A. Among AAs with LUAD, PIK3CG was also significantly mutated in our data set and in TCGA at a frequency of 11% and 10%, respectively, which is higher than the 6% observed among EAs (using TCGA as reference) (Table 2).
Additional recurrently mutated genes that did not reach statistical significance by MutSigCV (but may functionally impact carcinogenesis) were also identified (Table 2). For example, genes with higher mutation frequencies in AAs with LUAD, supported in our other data sets, included the following: (1) SPTA1 (NCI-MD AA 39% and TCGA AA 35% versus TCGA EA 23% [p = 0.03]); (2) NAV3 (NCI-MD AA 36% and TCGA AA 31% versus TCGA EA 19% [p = 0.02]); (3) COL11A1 (NCI-MD AA 33% and TCGA AA 31% versus TCGA EA 19% [p = 0.04]); (4) SI (NCI-MD AA 28% and TCGA AA 23% versus TCGA EA 15% [p = 0.04]); and (5) ASXL3 (NCI-MD AA 25% and TCGA AA 19% versus TCGA EA 13% [p = 0.05]). The mutation status of these genes was not associated with survival (Table 3). In AA LUSC tumors in the NCI-MD and TCGA data, the mutation frequency of ZNF536 (NCI-MD AA 23% and TCGA AA 26% versus TCGA EA 12% [p = 0.05]) was significantly higher compared with EA patients in TCGA.
Table 3. Univariable Relationship Between Somatic Mutations and 5-Year Lung Cancer-Specific Survival
We, therefore, looked for genes with a mutation frequency of at least 5% in AAs in both our data set and TCGA and with 0% mutation prevalence in EAs, again using TCGA (Supplementary Table 6) and identified KRT9 as mutated in approximately 5% of LUAD tumors among AAs and 0% of LUADs among EAs. The gene product, keratin-9, is a type I cytokeratin expressed in terminally differentiated epidermal cells. Apart from melanoma, KRT9 is not frequently mutated in cancer.
Integration of Copy Number Variation and Somatic Mutations
Oncogene and tumor suppressor gene function can be perturbed by both somatic mutation and somatic copy number change. We previously profiled somatic copy number alterations (SCNAs) in 62 of the samples for which we had WES data and integrated both data sets to get a complete view of driver gene alterations across NSCLC in AAs. As illustrated in Supplementary Figure 4, TP53 is the most frequently somatically altered gene in NSCLC among AAs in both LUAD and LUSC. In LUAD, LRP1B, CDKN2A, and STK11 are altered at frequencies of 46%, 41%, and 38%, respectively. In LUSC, TP53 is again the most altered gene in AAs (50%), followed by LRP1B and STK11 (44% for both). Interestingly, this combined analysis reveals that the overall frequency of oncogenic KRAS activation in LUAD is similar in EAs and AAs, at approximately 33%. However, the mechanism of alteration, that is, somatic mutation versus SCNA, differs by population, with KRAS amplification more frequent in AAs than in EAs.
Integration of Mutation Status, Etiologic Exposures, and Mutational Signatures
In an effort to further understand why some genes are mutated at a higher frequency among AAs, mutation frequencies were correlated with demographic and etiologic features such as sex, body mass index, education, income, and menthol cigarette use. No significant correlation was observed between the mutation status of STK11, RB1, CDKN2A, PIK3CG, PTPRT, or JAK2 and these factors (Fig. 2A and Supplementary Table 7), but this may be partly owing to the limited sample size when stratified comparisons were made.
Figure 2. Contribution of mutational signatures to the somatic landscape of lung cancer in AAs. The clustering of patients is presented on the basis of the proportion of mutational signatures in each tumor. Mutational signatures are grouped as per their etiologic origin (left). Demographic and lifestyle exposures are superimposed at the bottom of the oncoplot. AA, African American; BMI, body mass index; LUAD, lung adenocarcinoma.
and questioned whether these signatures were enriched in tumors with specific somatic mutations. As expected for an NSCLC cohort of ever-smokers, SBS4, which is tobacco-associated, was the predominant signature among these tumors (Fig. 2B and Supplementary Table 8). SBS2 and SBS13, signatures attributed to increased activity of apolipoprotein B mRNA editing catalytic polypeptide-like (APOBEC) family members, were also found. None of the SBS signatures had a clear or significant association with body mass index, education, income, or menthol cigarette use (Supplementary Fig. 5A and B).
The likelihood of acquiring a cancer-causing mutation is dependent on the underlying mutational processes.
We, therefore, integrated the SBS mutational signatures with the mutation status of genes enriched among AAs. We hypothesized that the origin of these SBS mutational signatures could indicate why certain mutational events occurred at a higher frequency in AAs. The APOBEC signatures, SBS2 and SBS13, were significantly higher in JAK2, RB1, and PTPRT mutant tumors compared with wild-type tumors, with a stronger trend in LUAD (Supplementary Fig. 6A and C). A previous article highlighted links between APOBEC-induced mutagenesis and specific driver PIK3CA mutations across cancer types, and we found significant and consistent evidence of this association in our population also (Supplementary Fig. 6B and D). Although our data could be supporting evidence for a causative relationship between APOBEC mutational activity and the acquisition of these driver mutations in AAs, there is no statistical difference in the expression of APOBEC genes in tumor tissues from EAs and AAs (data not published). There is no evidence of increased APOBEC signatures in AAs versus EAs overall, and several other genes in which the mutation frequency is not enhanced in AAs compared with EAs also had evidence for increased APOBEC activity (Supplementary Fig. 6B and D). SBS4 was enriched in STK11 mutant tumors, with some evidence that SBS29, which is associated with smokeless tobacco, was also enriched among these tumors (Supplementary Fig. 6A and C).
Discussion
The overall goal of this research is to explore possible genetic differences in NSCLC mutations by race, given that AA men have the highest incidence rate, which is not fully explained by smoking behavior.
Here, we conducted an in-depth analysis of somatic mutations in AAs using WES. We replicated our previous targeted exome sequencing findings of increased JAK2 and PTPRT mutations among AAs with LUAD. Furthermore, we present evidence that mutations in the tumor suppressor genes STK11, RB1, and CDKN2A are more frequent among AAs, especially in LUAD, and replicated these observations in two independent data sets. Interestingly, we found that STK11 mutations, including in the context of somatically altered KRAS, were associated with decreased interferon gamma signaling, consistent with previous data in EAs.
Furthermore, we found a high co-occurrence of STK11 with KRAS alterations. This may be relevant in the context of drugs targeting the immune system—in particular, immune checkpoint inhibitors. However, given the fact that STK11 loss seems to increase response to platinum compounds, how mutations of this gene modulate response to combined immune checkpoint inhibitor/chemotherapy treatment, which is increasingly offered to patients with NSCLC,
with possible consequences for chemotherapy and immunotherapy response, we did not detect co-occurring mutations in KEAP1 and STK11 in our cohort of AAs.
RB1 and CDKN2A mutations were generally mutually exclusive, suggesting that perturbation of this tumor-suppressive pathway is important in LUAD among AAs. Previous somatic mutation studies of NSCLC primarily used targeted gene or mutation-specific panels; as such, they did not cover the genes identified above.
which could be owing to different exposures related to geography, rates of admixture, or methodologies.
The mechanism by which a gene is perturbed in cancer can be important. As noted earlier, an integrated analysis of somatic copy number and mutation data indicated that although, overall, oncogenic activation of KRAS is somewhat similar in EAs and AAs with LUAD (at approximately 33%), the mechanism of alteration (i.e., somatic mutation versus somatic copy number alteration [SCNA]) differs by population, with KRAS amplification more frequent in AAs than in EAs. Similarly, we found that mutations in CDKN2A and RB1 were more frequent among AAs, but that copy number deletions were less frequent.
underlie the population divergence between somatic mutations and copy number changes. Although inactivation of a pathway, independent of the underlying mechanism, manifests in a similar manner on tumor biology, one implication of these differences relates to genes that are either codeleted with a tumor suppressor or not. For example, the codeletion of MTAP with CDKN2A creates a synthetic lethal dependence on the MAT2A/PRMT5/RIOK1 axis and thus a potential novel therapeutic vulnerability.
Thus, patients with CDKN2A mutations, as opposed to CDKN2A deletion, might not respond to such a therapeutic approach. Furthermore, as guidance is developed for immune checkpoint inhibitor treatment of tumors with co-occurring KRAS and STK11 alterations, it will be important to consider the therapeutic implications of both somatic copy number and mutation changes.
Our analysis also identified PIK3CG as a gene that is significantly mutated in NSCLC in AAs. The data from TCGA support the increased mutation frequency of this gene in LUAD among AAs, but a recent analysis of adenocarcinomas using targeted sequencing did not.
Interestingly, population differences in the mutation frequency of this gene in Asian populations have been reported previously, in which PIK3CG is mutated in approximately 30% of LUSCs.
PIK3CG encodes a protein that belongs to the phosphoinositide 3-kinase/phosphoinositide 4-kinase (PI3/PI4-kinase) family. It modulates extracellular signals, including those elicited by E-cadherin-mediated cell-to-cell adhesion, and has been implicated in Notch signaling, stemness, and migration in claudin-low breast cancer cells.
We also identified several genes that have a higher mutation prevalence in NSCLC among AA patients compared with EA patients. For example, we found COL11A1 mutated at a higher frequency in AAs with LUAD. COL11A1 encodes the collagen type XI α1 chain, one of the two α chains of type XI collagen, a minor fibrillar collagen.
As a major component of the extracellular matrix, collagens are involved in the regulation of multiple biological processes, including cell proliferation, differentiation, and migration
; it is a plasma membrane to the actin cytoskeleton and functions in the determination of cell shape, the arrangement of transmembrane proteins, and organization of organelles. We found LRP1B mutated in approximately 40% of LUAD samples. Mutations in this gene were previously reported in NSCLC, with some indications that it may be associated with response to immune checkpoint drugs because of a higher mutational burden observed among LRP1B mutant tumors.
We, therefore, assessed the mutational patterns of these tumors from AAs and identified the same key mutational processes as seen in EAs—that is, SBS4 (tobacco-associated) and SBS2/13 (aberrant APOBEC activity). There were no significant associations between African ancestry and SBS signatures after correction for multiple testing, consistent with previous observations using targeted gene panels.
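To make the phrase "correction for multiple testing" concrete, the following sketch applies the Benjamini-Hochberg false discovery rate procedure to a list of p values. The choice of Benjamini-Hochberg is an assumption made for illustration, as the correction method is not specified in this section; the function name and the example p values are likewise hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumption: this procedure is used for illustration only and is not
    necessarily the correction applied in the study.
    """
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values, one per signature-ancestry association tested.
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.2f}, adjusted p = {q:.3f}")
```

An association that is nominally significant before adjustment (for example, raw p = 0.04 in the hypothetical list) can exceed the 0.05 threshold afterward, which is the situation described above for ancestry and SBS signatures.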
Although we had a limited sample size, we leveraged the available demographic and exposure data on our participants but did not find significant associations between metrics of socioeconomic status, such as income and education, and the genes we found enriched among AAs. This analysis was likely limited by statistical power.
To uncover the potential etiologic processes driving the occurrence of mutations in JAK2, PTPRT, RB1, STK11, PIK3CG, and CDKN2A, we also integrated somatic mutational signatures with the mutation status of these genes. Interestingly, SBS2 and SBS13, the signatures associated with aberrant APOBEC activity, were enriched in tumors with mutated RB1, JAK2, and PTPRT. This suggests that aberrant APOBEC activity in tumors from AAs could drive the increased mutation of these genes. However, an analysis of APOBEC1 and APOBEC3 expression in data we had previously collected did not reveal consistent, significant differences in mRNA expression. A germline variant in cytidine deaminase at codon 70 (208G>A), which substitutes threonine for alanine, is found mostly in African and Asian populations, so it is possible that structural differences at the protein level could make a difference. However, we found several genes in which mutation frequencies are not enriched in AAs, such as PIK3CA, RET, ARID1A, EGFR, NFE2L2, and SMARCA4, that also had increased SBS2 and SBS13 representation. Thus, additional studies with larger sample sizes and more demographic variables should be conducted to further understand the potential etiologic forces that contribute to these mutation differences in AAs. Of note, although our data do not point toward a role for APOBEC in disparities per se, the finding that these signatures are associated with the mutation profile of these genes in NSCLC is novel.
In summary, using WES, we extended our previous targeted exome analysis of NSCLC in AAs, validated our previous observations, and identified increased mutation frequency of several tumor suppressor genes in NSCLC. As before, most somatic mutation differences that we observed occur in LUAD. In the future, larger studies with demographic, socioeconomic, and etiologic data will be needed to assess the factors that contribute to these population differences in tumor biology and whether these differences relate, at a gene level, to therapy outcomes, including response to immune checkpoint inhibitors. Our data highlight the continuing importance of not only including minority populations in genomics research but also leveraging the inherent differences represented by race to contribute toward understanding observed differences in incidence, severity, and treatment response. This will become especially useful as more is learned about how somatic mutations influence the tumor microenvironment, the immune response to carcinogenesis, and treatment efficacy.
Acknowledgments
This work was supported by the Intramural Research Programs of the National Institute for Minority Health and Health Disparities and the Center for Cancer Research, National Cancer Institute. Dr. Pine was supported by 1R01CA239093 and a Rutgers Cancer Institute of New Jersey Health Equity Pilot Award.
Comprehensive genomic characterization of squamous cell lung cancers [published correction appears in Nature. 2012;491:288. Rogers, Kristen [corrected to Rodgers, Kristen]].
Comprehensive molecular profiling of lung adenocarcinoma [published correction appears in Nature. 2014;514:262. Rogers, K [corrected to Rodgers, K]] [published correction appears in Nature. 2018;559:E12].
Molecular determinants of response to anti-programmed cell death (PD)-1 and anti-programmed death-ligand 1 (PD-L1) blockade in patients with non-small-cell lung cancer profiled with targeted next-generation sequencing [published correction appears in J Clin Oncol. 2018;36:1645].
Mutational landscapes of smoking-related cancers in Caucasians and African Americans: precision oncology perspectives at Wake Forest Baptist Comprehensive Cancer Center.
Proteomic profiling of lung adenocarcinoma indicates heightened DNA repair, antioxidant mechanisms and identifies LASP1 as a potential negative predictor of survival.
Association of LRP1B mutation with tumor mutation burden and outcomes in melanoma and non-small cell lung cancer patients treated with immune checkpoint blockades [published correction appears in Front Immunol. 2019;10:1523].
Signatures of mutational processes in human cancer [published correction appears in Nature. 2013;502:258. Imielinsk, Marcin [corrected to Imielinski, Marcin]].
We read with great interest the article by Arauz et al.1 focusing on mutation status in the African American population with NSCLC. The authors conducted whole-exome sequencing on a minority population and identified increased mutation frequency of several tumor suppressor genes in NSCLC. Because genomic studies on African Americans are lacking, their work contributes to a better understanding of the molecular basis of lung cancer and may help clinicians worldwide identify optimal interventions for patients with NSCLC.
Lung cancer continues to be the second leading cancer diagnosis in both men and women in the United States even though it is one of the few cancers whose main cause, cigarette smoking, is known. Racial differences in incidence exist despite African Americans reporting lower levels of cigarettes smoked per day than whites.1 Furthermore, it is the leading cause of cancer-related deaths in the United States, with 5-year survival rates hovering at approximately 20%.2 Although remaining low, overall survival in advanced NSCLC has improved in the small subset of cases with actionable molecular profiles.