The association between smoking-induced chronic obstructive pulmonary disease (COPD) and lung cancer (LC) is well documented. Recent genome-wide association studies (GWAS) have identified 28 susceptibility loci for LC, 10 for COPD, 32 for smoking behavior, and 63 for pulmonary function, totaling 107 nonoverlapping loci. Given that common variants have been found to be associated with LC in genome-wide association studies, exome sequencing of these high-priority regions has great potential to identify novel rare causal variants.
To search for disease-causing rare germline mutations, we used a variation of the extreme phenotype approach to select 48 patients with sporadic LC who reported histories of heavy smoking—37 of whom also exhibited carefully documented severe COPD (in whom smoking is considered the overwhelming determinant)—and 54 unique familial LC cases from families with at least three first-degree relatives with LC (who are likely enriched for genomic effects).
By focusing on exome profiles of the 107 target loci, we identified two key rare mutations. A heterozygous p.Arg696Cys variant in the coiled-coil domain containing 147 (CCDC147) gene at 10q25.1 was identified in one sporadic and two familial cases. The minor allele frequency (MAF) of this variant in the 1000 Genomes database is 0.0026. The p.Val26Met variant in the dopamine β-hydroxylase (DBH) gene at 9q34.2 was identified in two sporadic cases; the minor allele frequency of this mutation is 0.0034 according to the 1000 Genomes database. We also observed three suggestive rare mutations on 15q25.1: iron-responsive element binding protein neuronal 2 (IREB2); cholinergic receptor, nicotinic, alpha 5 (neuronal) (CHRNA5); and cholinergic receptor, nicotinic, beta 4 (CHRNB4).
Our results demonstrated highly disruptive risk-conferring CCDC147 and DBH mutations.
Chronic tobacco-induced airway inflammation provokes a milieu conducive to pulmonary carcinogenesis. We and others have previously shown that tobacco-induced chronic obstructive pulmonary disease (COPD), which is also characterized by a sustained inflammatory reaction in the airways and lung parenchyma, is a significant contributor to risk for the development of lung cancer (LC) in smokers.
Recent genome-wide association studies (GWAS) have identified 28 susceptibility loci for LC, 10 loci for COPD, 32 loci for smoking behavior (SM), and 63 loci for abnormal pulmonary function (PF) and related phenotypes, totaling 107 unique GWAS susceptibility loci (as of November 2014, Supplemental Table 1). Interestingly, there is considerable overlap among the susceptibility loci for these phenotypes. For example, 6p21.32 major histocompatibility class III region (advanced glycosylation end product–specific receptor [AGER]/mut S homolog 5 [MSH5]), 15q24-25.1 cholinergic receptor, nicotinic, alpha 3 (neuronal) (CHRNA3)/cholinergic receptor, nicotinic, alpha 5 (neuronal) (CHRNA5)/iron-responsive element binding protein neuronal 2 (IREB2), and 19q13.2 cytochrome P450, family 2, subfamily A, polypeptide 6 (CYP2A6) are shared by all four phenotypes; 5p15.33 telomerase reverse transcriptase (TERT)/CLPTM1-like (CLPTM1L)/aryl hydrocarbon receptor repressor (AHRR), 10q25.1 glutathione S-transferase omega 2 (GSTO2)/vesicle transport through interaction with t-SNAREs 1A (VTI1A), and 10q23.31 actin, alpha 2, smooth muscle, aorta (ACTA2)/phospholipase C, epsilon 1 (PLCE1) are shared by three of these phenotypes; and more than 15 loci are shared by two of the four phenotypes (see Supplemental Table 1). Therefore, LC and COPD are not discrete diseases related only through smoking exposure; they may also share genetic predisposition mechanisms.
Given the common variants that have been found to be associated with LC in GWAS, exome sequencing with a focused analysis provides a cost-effective approach for further investigation of high-priority regions of the genome and has great potential to identify rare causal variants in GWAS loci, as targeted studies of inflammatory bowel disease
may play a crucial role in the etiology of complex traits and could account for missing heritability that is unexplained by common variants.
Our approach to unveiling these hidden rare variants was to sequence selected cases of LC by adopting a modified extreme phenotype approach. Only approximately 13% of cases of LC are reported as familial.
Therefore, it could be assumed that patients with LC who are from high-risk families would tend to reflect the genetic component of the etiology of LC more clearly than those who are not from high-risk families. In the present study, to search for disease-causing rare germline mutations within the 107 target GWAS loci, we selected (1) 48 patients with sporadic LC who reported histories of heavy smoking, 37 of whom exhibited carefully documented severe COPD (in which the environmental factor of smoking is considered overwhelming), and (2) 54 unrelated patients with familial LC who were from families with at least three first-degree relatives with LC (and who are likely enriched for genomic signal).
Study subjects with familial LC
Phenotype data and biological specimens for 54 patients with LC who had three or more first-degree relatives affected with histologically confirmed LC were provided by the Genetic Epidemiology of Lung Cancer Consortium (GELCC) collection. Only one patient with LC per family was included in the current study. Probands were selected if adequate amounts of good-quality genomic DNA were stored at the GELCC biorepository and no DNA samples from other affected family members were available. Samples and data were collected by the familial LC recruitment sites of the GELCC, which included the University of Cincinnati, University of Colorado Health Science Center, Karmanos Cancer Institute at Wayne State University, Louisiana State University Health Sciences Center-New Orleans, Mayo Clinic, University of Toledo, Johns Hopkins University, and Saccomanno Research Institute. The GELCC study population and recruitment scheme have been described in detail previously.
Ever-smokers older than 40 years were enrolled from three clinics within the Texas Medical Center in Houston, Texas: Ben Taub General Hospital, Houston Methodist Hospital, and Michael E. DeBakey Veterans Affairs Medical Center. The COPD phenotype was carefully defined by irreversible airflow limitation (reduced forced expiratory volume in 1 second <50% predicted and forced expiratory volume in 1 second/forced vital capacity <0.7) assessed by postbronchodilator spirometry. For this analysis, we selected smokers enrolled in this study who had histologically confirmed LC. Information on family history of LC was not available for these patients with sporadic LC.
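The spirometric definition above can be expressed as a simple rule. The following is a minimal sketch, not the study's own code; the function name `meets_copd_criteria` and its parameter names are hypothetical, and both inputs are assumed to come from postbronchodilator spirometry.

```python
def meets_copd_criteria(fev1_percent_predicted: float, fev1_fvc_ratio: float) -> bool:
    """Apply the two thresholds stated in the text.

    fev1_percent_predicted: postbronchodilator FEV1 as a percentage of predicted
    fev1_fvc_ratio: postbronchodilator FEV1/FVC as a fraction (for example 0.65)
    """
    # Both conditions must hold: FEV1 < 50% predicted and FEV1/FVC < 0.7.
    return fev1_percent_predicted < 50.0 and fev1_fvc_ratio < 0.7


if __name__ == "__main__":
    # Illustrative values only, not patient data.
    print(meets_copd_criteria(45.0, 0.60))
    print(meets_copd_criteria(55.0, 0.60))
```

Requiring both thresholds together mirrors the "and" in the definition; a patient with a low ratio but FEV1 at or above 50% of predicted would not qualify under this rule.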
DNA was isolated from the peripheral blood of the patients with familial LC and those with sporadic LC. The study was approved by the institutional review board of all sites accruing participants and by the institutional review board at the Baylor College of Medicine (BCM) for exome sequencing conducted at the BCM Human Genome Sequencing Center (HGSC).
Library preparation and capture enrichment
DNA samples were constructed into Illumina paired-end precapture libraries according to the manufacturer’s protocol (Illumina Multiplexing_SamplePrep_Guide_1005361_D). The complete library and capture protocol, as well as the oligonucleotide sequences, have been described in detail previously.
Read qualities were recalibrated with Genome Analysis Toolkit; a minimum quality score of 30 was required, and the variant had to have been present in more than 15% of the reads covering the position.
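The two calling thresholds described above can be sketched as a small filter. This is an illustration under stated assumptions rather than the pipeline actually used: the `VariantCall` class and its field names are hypothetical, and the text does not specify whether the quality score refers to base or variant quality, so it is treated here as a single per-call value.

```python
from dataclasses import dataclass

MIN_QUALITY = 30          # minimum quality score stated in the text
MIN_ALT_PERCENT = 15      # variant must be in MORE than 15% of covering reads


@dataclass
class VariantCall:
    quality: float        # assumed: one quality value per call
    alt_reads: int        # reads supporting the variant allele
    total_reads: int      # all reads covering the position


def passes_call_filter(call: VariantCall) -> bool:
    """Return True when the call meets both thresholds from the text."""
    if call.total_reads <= 0:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 15%.
    enough_support = call.alt_reads * 100 > MIN_ALT_PERCENT * call.total_reads
    return call.quality >= MIN_QUALITY and enough_support


if __name__ == "__main__":
    # Illustrative calls only.
    print(passes_call_filter(VariantCall(quality=35, alt_reads=4, total_reads=20)))
    print(passes_call_filter(VariantCall(quality=35, alt_reads=3, total_reads=20)))
```

The strict inequality on read support follows the wording "more than 15%", so a call at exactly 15% is rejected.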
Variant annotation and filtering
This analysis was restricted to rare mutations mapping to the exons within the 107 selected regions described earlier (see Supplemental Table 1 for genomic coordinates). Variants were annotated for effect on the protein and predicted function using the Single-Nucleotide Polymorphism (SNP) & Variation Suite (SVS) software (Golden Helix, Inc.). This suite integrates more than 378 databases for variant information including the following: (1) MAF in the European American population in the reference database (1000 Genomes [TG], Exome Sequencing Project [ESP] 6500) and the University of California, Santa Cruz Common SNPs 135/137/141 tracks, which include all variants with a MAF of at least 0.01 in the general population; (2) experimental evidence from disease variant databases (such as the Catalogue of Somatic Mutations in Cancer [COSMIC] and ClinVar); and (3) deleterious prediction of variant function determined either by mutation type (truncating, splicing, frame shift, stop gain/loss, or exonic Indels) or mutation effects predicted by dbNSFP Functional Predictions.
To generate a list of disease-causing candidate variants, we focused on identifying genes with rare and novel variants (never reported in a publicly available database or University of California, Santa Cruz All SNPs 135/137/141 tracks) (Fig. 1). We used scaled C-scores from the combined annotation-dependent depletion (CADD) method
After implementing the aforementioned filtering schema, we used GenomeBrowse (Golden Helix, Inc.) to visually confirm the potential candidate variants by rechecking the raw binary alignment/map file data. We then tabulated the number of candidate deleterious mutations per gene and within our two study subgroups (familial versus sporadic) and created a Venn diagram for the list of candidate variants that were significantly associated with the four different phenotypes (LC, COPD, PF, and SM) in previous GWAS.
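The filtering and tabulation steps described in this section can be sketched roughly as follows. This is a simplified illustration, not the SVS workflow itself: the record layout, the `predicted_damaging` flag, and the effect labels are assumptions, and the only numeric cutoff used is the MAF of 0.01 that the text gives for common variants. The example records are toy inputs; the two MAF values shown are those reported for the 1000 Genomes database in this article, and `GENE_X` is a hypothetical placeholder.

```python
from collections import Counter

COMMON_MAF = 0.01  # variants with MAF >= 0.01 are treated as common

# Mutation types the text lists as deleterious by type alone (labels are assumed).
DISRUPTIVE_TYPES = {
    "truncating", "splicing", "frameshift",
    "stop_gain", "stop_loss", "exonic_indel",
}


def is_candidate(variant: dict) -> bool:
    """Keep rare or novel variants that are deleterious by type or by prediction."""
    maf = variant.get("maf")                      # None means not found in any database
    rare_or_novel = maf is None or maf < COMMON_MAF
    deleterious = (
        variant["effect"] in DISRUPTIVE_TYPES
        or variant.get("predicted_damaging", False)
    )
    return rare_or_novel and deleterious


def tabulate_candidates(variants: list) -> Counter:
    """Count candidate variants per (gene, study subgroup)."""
    counts = Counter()
    for variant in variants:
        if is_candidate(variant):
            counts[(variant["gene"], variant["group"])] += 1
    return counts


# Toy records for illustration only.
EXAMPLE_VARIANTS = [
    {"gene": "CCDC147", "group": "familial", "maf": 0.0026,
     "effect": "missense", "predicted_damaging": True},
    {"gene": "DBH", "group": "sporadic", "maf": 0.0034,
     "effect": "missense", "predicted_damaging": True},
    {"gene": "GENE_X", "group": "sporadic", "maf": 0.05,
     "effect": "missense", "predicted_damaging": True},
    {"gene": "GENE_X", "group": "familial", "maf": None,
     "effect": "synonymous", "predicted_damaging": False},
]

if __name__ == "__main__":
    for key, count in sorted(tabulate_candidates(EXAMPLE_VARIANTS).items()):
        print(key, count)
```

Counting by (gene, subgroup) pairs makes it straightforward to see which genes carry candidates in both the familial and the sporadic group.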
The potential candidate variants were verified, and segregation was examined by using Sanger capillary bidirectional sequencing in the selected sample sites. Primers specific to the region containing the variant to be tested were designed, polymerase chain reactions (PCRs) were prepared according to the Qiagen Multiplex PCR Kit protocol (Qiagen), and touchdown PCR was performed (all PCR primers and conditions are available upon request). SNVs were identified using SNP Detector and visually displayed in Sequence Scanner v1.0 (Applied Biosystems).
Candidate variant protein annotation, structure modeling, and protein-protein interaction
for modeling the 3D structure of the candidate variant gene encoded-protein. These resources use sequence-, structure-, and systems biology–based features to predict whether the mutation in the protein is likely to have a functional or phenotypic effect.
Demographic information, including age, sex, smoking history, and histologic diagnosis, is summarized in Table 1. All 54 unrelated patients with familial LC and 48 with sporadic LC were adult non-Hispanic whites. The mean ages of onset of LC in the patients with familial and sporadic LC were 56.0 and 60.9 years, respectively. More than 85% of those with familial LC and all those with sporadic LC (because of the study design criteria) reported being ever-smokers, with mean pack-years of 52.3 and 60.3, respectively. Overall, non–small cell LC had been diagnosed in 86.0% of those with familial LC and 90.5% of those with sporadic LC. Adenocarcinoma was diagnosed in 40.5% of those in the sporadic group and 30.2% of those in the familial group for whom histologic data were available.
Table 1. Demographic and histologic characteristics of patients with familial and sporadic lung cancer
Of 99,489 SNVs and 1206 Indels located in the exons of the target 107 loci, our stepwise filtering strategy identified 39 potential candidate variants (see Fig. 1). Of these 39 variants interrogated by Sanger sequencing, nine mutations failed, and 30 variants (77%) were verified in the original LC samples (Table 2). All the failed mutations were singletons. Of the 30 verified candidate variants, five variants were present in two or more patients, three variants were located in highly likely functional sites (CHRNA5 g.78880766 splice donor, myozenin 3 [MYOZ3] g.150051315 splice acceptor, and chromosome 10 open reading frame 11 [C10orf11] p.Ser8 frameshift), and three SNVs were novel (patatin-like phospholipase domain containing 8 [PNPLA8] p.Ile479Ser, pantothenate kinase 1 [PANK1] p.Phe163Ser, and insulin-degrading enzyme [IDE] p.Asp9Asn) (see Table 2).
Table 2. List of 30 candidate deleterious germline mutations in familial and sporadic cases of lung cancer
Overall, the total number and proportion of patients with LC (N = 32) who carried these 30 candidate variants were only slightly higher in the group with familial cases (18 cases, 18/54 = 33.3%) than in the group with sporadic cases (14 cases, 14/48 = 29.2%, with 11 of these 14 patients also having severe COPD). The mean ages of the familial and sporadic candidate mutation carriers were not different from the overall means. In terms of smoking intensity, however, carriers of familial mutations reported fewer pack-years than their mean (43 versus 52), whereas there was no difference in smoking intensity among the carriers of sporadic mutations.
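As a quick arithmetic check, the carrier proportions quoted above can be reproduced from the counts given in the text. This is only a worked example; the helper name `percent` is hypothetical.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Counts taken from the text: carriers / patients in each group.
familial_carriers = percent(18, 54)
sporadic_carriers = percent(14, 48)
# Sporadic carriers who also had severe COPD: 11 of 14.
sporadic_carriers_with_copd = percent(11, 14)

if __name__ == "__main__":
    print(familial_carriers, sporadic_carriers, sporadic_carriers_with_copd)
```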
We identified highly deleterious mutations in two genes, each observed in three patients with LC (see Table 2 and Supplemental Fig. 1). The first was a heterozygous c.2086C>T variant in the coiled-coil domain-containing 147 gene (CCDC147, also called CFAP58), resulting in a p.Arg696Cys substitution. This variant was identified in two patients with familial LC (both women, mean age 54.5, mean pack-years 40, and squamous histologic features) and one patient with sporadic LC (a 57-year-old man, pack-years 88, with adenocarcinoma but without COPD). Notably, the MAF of this variant is 0.0026 in the TG database and 0.0072 in the ESP6500 database. The mutation is predicted to be protein damaging by PolyPhen-2 (score: 1.0) and highly functional by MutationTaster. This variant has a high scaled CADD C-score of 16.2, which indicates that the Arg696 substitution is predicted to be among the top 10% of possible deleterious substitutions in the human genome. The CCDC147 gene spans 101 kb, contains 18 exons, and encodes 872 amino acids (AAs) (Fig. 2); the p.Arg696Cys variant is located in exon 14 and affects a strictly evolutionarily conserved AA residue in the crystal structure of tropomyosin (protein ID: Q5T655). The p.Arg696Cys mutation is predicted to perturb the tertiary structure (folding of the domain and stability of the three-dimensional shape) of the protein because Cys696 forms a covalent disulfide bridge with Cys697. It is also very close to p.Arg698Gln, a confirmed somatic mutation in patients with cutaneous melanoma according to the COSMIC database, and to the acetylation modification site Lys692.
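To make the interpretation of scaled C-scores concrete, the sketch below converts a scaled score into the fraction of possible substitutions ranked at least as deleterious. It assumes the PHRED-like scaling described in the CADD documentation, in which a scaled score of 10 corresponds to the top 10%, 20 to the top 1%, and 30 to the top 0.1%; the function name is hypothetical.

```python
def top_fraction_from_scaled_cadd(scaled_score: float) -> float:
    """Fraction of substitutions ranked at least this deleterious.

    Assumes PHRED-like scaling: scaled = -10 * log10(rank / total).
    """
    return 10 ** (-scaled_score / 10)


if __name__ == "__main__":
    # Scaled C-scores reported in this article for the three highlighted SNVs.
    for label, score in [("CCDC147 p.Arg696Cys", 16.2),
                         ("DBH p.Val26Met", 17.3),
                         ("DBH p.Met563Thr", 20.3)]:
        print(f"{label}: top {100 * top_fraction_from_scaled_cadd(score):.2f}%")
```

Under this scaling, any score above 10 falls within the top 10%, and a score of 16.2 corresponds to roughly the top 2.4%.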
The second candidates were two missense SNVs, p.Val26Met and p.Met563Thr, in the dopamine beta-hydroxylase (DBH) gene (see Table 2 and Supplemental Fig. 1). The p.Val26Met variant was found in two patients with sporadic LC (both men, ages 64 and 65, pack-years 40, histologic diagnosis of adenocarcinoma with severe COPD), and p.Met563Thr was found in one patient with familial LC (a man, age 30, pack-years 52, histologic diagnosis not specified). The MAFs of these two variants were 0.0034/0.0045 and 0.0002/0.0002 from the TG/ESP6500 databases, respectively. The two mutations were predicted to be protein damaging by PolyPhen-2 (scores 0.93 and 0.87), with CADD C-scores of 17.3 and 20.3, and they exhibited extremely high degrees of sequence conservation (0.96 and 0.99, respectively). The DBH gene contains 12 exons, spans 23 kb, and encodes 617 AAs (protein ID: P09172; see Fig. 2). The p.Val26Met variant is located within exon 1, lies in the hydrophobic transmembrane region, and possesses a helical structure. The p.Met563Thr variant is located in exon 11 and lies in a highly conserved region of the α-helix and the dopamine β-monooxygenase (DBM) motif IX that may influence the stability of the enzyme. This somatic mutation is also reported in patients with acute myeloid leukemia in the COSMIC database. These observations suggest that the two DBH mutations are likely to have a detrimental effect on the protein.
The other interesting candidates were three SNVs located in the 15q25.1 locus: IREB2 p.Gly747Glu; CHRNA5 g.78880766 splice donor; and cholinergic receptor, nicotinic, beta 4 (CHRNB4) p.Ala435Val. The CHRNA5 splicing variant was found in a patient with sporadic LC (a 63-year-old man, 48 pack-years of smoking, non–small cell LC, with severe COPD), who was also a carrier of two other candidate mutations (nidogen 2 (osteonidogen) [NID2] p.Thr567Met and lysyl-tRNA synthetase [KARS] p.Arg448Cys); the IREB2 and CHRNB4 SNVs were found in two patients with familial LC (both women, ages 45 and 64, pack-years 45 and 80, with small cell LC and unknown histologic diagnosis, respectively).
There were three additional candidate variants—NID2 p.Thr567Met, mitochondrial intermediate peptidase (MIPEP) p.Leu197Pro, and chromosome 1 open reading frame 100 (C1orf100) p.Asp71His—that were present in multiple LC cases. Other genes that harbored multiple different mutations in different patients included tensin 1 (TNS1), F-box protein 38 (FBXO38), PNPLA8, KARS, and bromodomain PHD transcription factor (BPTF). In addition, a patient with sporadic LC and a history of extremely heavy smoking (a 65-year-old man, 150 pack-years of smoking, adenocarcinoma, and severe COPD) was a carrier of two novel mutations with a CADD C-score higher than 30 (IDE p.Asp9Asn and neuron navigator 3 [NAV3] p.Ser278Ile) (see Table 2).
Of the 30 candidate variants belonging to 20 loci and 24 genes (Fig. 3A), seven genes (including CCDC147 and DBH) had candidate variants observed both in those with familial LC and in those with sporadic LC. Also (as shown in Fig. 3B), among the candidate genes examined in the current study, the CCDC147, IREB2/CHRNA5/CHRNB4, PANK1/IDE, and egl-9 family hypoxia-inducible factor 2 (EGLN2) genes were shared by three or more phenotypes (LC, COPD, PF, and SM) from the previously published GWAS (see Table 2 and Fig. 3B).
Despite previous family-based linkage studies, intensive population-based GWAS analyses, and candidate gene screening, a large proportion of the heritability of LC remains unexplained. Using an extreme phenotype design, this report describes the first exome sequencing approach comparing heavy smokers with familial and sporadic LC and evaluating the effects of rare coding variation in the GWAS loci associated with LC, COPD, SM, and PF. Our results showed that the familial mutation carriers reported fewer pack-years than their group’s mean (43 versus 52), whereas there was no difference in smoking intensity among the sporadic carriers. Furthermore, we identified two disease-causing rare mutations on 10q25.1 (CCDC147 p.Arg696Cys) and 9q34.2 (DBH p.Val26Met and p.Met563Thr), and three suggestive rare mutations on 15q25.1 (IREB2 p.Gly747Glu, CHRNA5 g.78880766 splice donor, and CHRNB4 p.Ala435Val), although the findings require replication. Patients with familial LC and patients with sporadic LC are indistinguishable at initial clinical examination, and our results demonstrated that the two forms of LC may have both shared determinants and distinct components.
Strong evidence for an LC-conferring deleterious mutation was observed at CCDC147 p.Arg696Cys in three patients with LC (two with familial LC and one with sporadic LC). Interestingly, the two familial carriers were lighter smokers and had an earlier age of onset than the overall mean for familial cases. The sporadic carrier was a heavier smoker with a history of 88 pack-years and no documented COPD. Although several genes at the 10q25.1 locus have been implicated in susceptibility to LC in GWAS, very little is known about the function of the CCDC147 gene in humans or mice, although it is thought to produce a functional protein as described in the Proteomics database. CCDC147 protein, which is also known as cilia- and flagella-associated protein 58 (CFAP58), demonstrates high expression in T cells, nasal epithelium, lungs, and alveolar fluids (http://www.genecards.org/cgi-bin/carddisp.pl?gene=CFAP58). It is believed to interact with members of the shelterin complex, the human telomere repeat binding factor 1 (TRF1) and protection of telomeres 1 (POT1), as reported in the BioGRID database and STRING Interaction Network. Interestingly, recent studies have shown that rare mutations in the gene POT1 are associated with chronic lymphocytic leukemia, in which they are thought to result in telomere deprotection and length extension associated with cancer. Furthermore, one of the most important functions of shelterin is modulation of telomerase activity, which has been detected in approximately 85% of cancers and is linked to genomic instability and tumorigenesis. Although direct evidence regarding the biological function of CCDC147 is lacking, our finding of CCDC147 as a novel telomere-interacting protein underscores the need for future work that could elucidate the role of this gene in LC pathogenesis.
Another main finding was the highly disruptive and deleterious rare mutations on 9q34.2 DBH, p.Val26Met and p.Met563Thr, in three patients with LC (one with familial LC and two with sporadic LC). The familial carrier was very young (age 30). Both sporadic carriers had adenocarcinoma and severe COPD. Previous GWAS identified DBH rs3025343 as a locus associated with SM.
The DBH (OMIM 609312) gene contributes primarily to conversion of dopamine to noradrenaline. Dopamine is known to be released from neurons in response to nicotine and plays a well-documented role in determining an individual's predisposition to nicotine dependence through its role in mediating drug reward in the brain.
The contribution of cigarette smoking to both LC and COPD could invoke a variety of underlying biological processes, including inflammation, epithelial-mesenchymal transition, oxidative stress, DNA repair, and abnormal cellular proliferation.
NID2 (OMIM 605399) encodes a member of the highly conserved nidogen family of basement membrane proteins. This protein binds collagens I and IV and laminin, is involved in stabilizing and maintaining the structure of the basement membrane, and plays a key role in the cell-extracellular matrix. Unbalanced proteolysis in the extracellular matrix is a potential mechanism to explain inflammatory processes within the emphysematous lung. NID2 mutation in patients with LC may be conducive to invasion and metastasis of tumor cells by loosening cell interaction with basal membrane and weakening the strength of the basement membrane itself, and it could be a marker of progression as well.
A main strength of this study is its focus on patients with extreme phenotypes, who are the most likely to be informative. For quantitative traits, one can select individuals with extreme trait values after adjusting for known covariates. Alternatively, in disease-focused studies, selection of individuals with extreme phenotypes can be conducted on the basis of known risk factors. Smoking, family history of LC, and COPD are all well-documented risk factors for LC. Because the frequencies of alleles that contribute to the trait/disease are enriched in phenotype extremes (such as familial LC or patients with both LC and COPD), studying extremes has been shown to provide more than five times the power of traditional designs, requiring only 20% of the subjects.
In the present study, the recurrent rare mutations described herein suggest that it may be possible to identify susceptibility genes in a relatively small sample size, although we cannot rule out the possibility that the results have been observed by chance. The small sample size and lack of validation of the identified mutations in a separate large-scale cohort limit the relevance of our findings. Another limitation of this analysis is phenotype misclassification between familial and sporadic LC. For the patients with familial LC, we lacked COPD phenotype data, and for those with sporadic LC, family history of LC was not available. Also, we acknowledge the existence of a sex imbalance between the familial and sporadic cases that could cause bias and limit applicability of the findings to the general population.
In summary, our results demonstrated highly disruptive germline mutations in the genes CCDC147 and DBH in patients with LC that are interesting candidates for LC risk alleles. The overlap in risk loci between familial and sporadic LC, and that between COPD and LC, may be due to genes and mutations involving telomere maintenance, to inflammation, or to the lack of family history in the sporadic cases being the result of no smoking exposure in other carriers of the mutation in their families. Therefore, going forward, comprehensive genomic analyses of whole genomes (from point mutations to large structural variants) and a large number of LC samples from diverse racial/ethnic groups for validation, as well as further functional work on the top two candidate genes, will be needed to better understand the underlying molecular genetics and guide screening for mutations in this unique subset of patients to assess their potential risk for LC.
This work was supported by grants from the National Institutes of Health (R01 CA127219, R01 HL082487, R01 HL110883, K07CA181480, R01 CA060691, R01 CA87895, R01 CA80127, R01 CA84354, R01 CA134682, R01 CA134433, R03 CA77118, P20GM103534, P30CA125123, P30CA023108, P30-ES006096, P30CA022453, N01-HG-65404, U01CA076293, U19CA148127, and HHSN268201 200007C). Dr. Bailey-Wilson was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. Additional support was provided by the National Library of Medicine T15LM007093 (Davis) and the Population Sciences Biorepository at BCM. We would like to thank the patients and their families for participating in this research. We thank Dr. Richard Gibbs, Donna Muzny, Xiaoyun Liao, Van Le, Sandra Lee, and Margi Sheth from the HGSC-BCM for performing the exome sequencing for all the samples in this study.
Disclosure: The authors declare no conflict of interest.