Biomarker Discovery and Validation: Statistical Considerations

Open Access. Published: February 01, 2021. DOI: https://doi.org/10.1016/j.jtho.2021.01.1616

      Abstract

      Biomarkers have various applications, including disease detection, diagnosis, prognosis, prediction of response to intervention, and disease monitoring. In this era of precision medicine, having validated biomarkers to inform clinical decision making is more important than ever. In this article, we discuss the best practices and potential issues in biomarker discovery and validation. We encourage team science partnerships to bring cutting-edge discovery from bench to bedside, leading to improved patient care and outcomes.

      Introduction

      A biological marker (biomarker) is “a defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or biological responses to an exposure or intervention, including therapeutic interventions.”
      FDA-NIH Biomarker Working Group
      BEST (Biomarkers, EndpointS, and Other Tools). Resource.
      Biomarkers have various applications, such as risk estimation, disease screening and detection, diagnosis, estimation of prognosis, prediction of benefit from therapy, and disease monitoring (Fig. 1). In oncology, biomarker candidates often consist of biological molecules found in cancer cells. The most common biomarkers are cancer-associated proteins, gene mutations, deletions, rearrangements, and extra copy numbers of genes. Some of these molecules are secreted into the circulation and so may be detected by a blood-based assay, whereas others are present only in cancer cells and so require a biopsy to obtain tissue for testing. An ideal biomarker satisfies the following properties: it should be either binary (i.e., present or absent) or quantifiable without subjective assessments; the result should be generated by an assay that is adaptable to routine clinical practice and has a timely turnaround (i.e., in a matter of days rather than weeks); the biomarker assay should be sensitive and specific; and most importantly, the biomarker should be detectable using easily accessible specimens.
      Figure 1. Use of biomarkers in relation to the course of disease.
      Molecular biomarkers are used together with clinical information to achieve precision medicine, that is, to customize prevention, screening, and treatment strategies for groups of patients with similar characteristics (Fig. 1). Risk stratification biomarkers may identify patients at higher than usual risk of disease who should be monitored more closely than the general population; for example, smoking increases the risk of lung cancer.
      National Comprehensive Cancer Network
      Lung cancer screening. 2020 version.
      Disease screening and detection biomarkers are used to detect diseases before symptoms manifest, when therapy has a greater likelihood of success; for example, low-dose computed tomography screening is recommended for patients at high risk of lung cancer.
      National Comprehensive Cancer Network
      Lung cancer screening. 2020 version.
      Diagnostic biomarkers detect the presence of disease; for example, biopsies can be used in the diagnosis of lung cancer.
      National Comprehensive Cancer Network
      Lung cancer screening. 2020 version.
      Prognostic biomarkers provide information on the overall expected clinical outcome of a patient, regardless of therapy or treatment selection; for example, sarcomatoid mesothelioma has a poor outcome regardless of therapy.
      • Nicholson A.G.
      • Sauter J.L.
      • Nowak A.K.
      • et al.
      EURACAN/IASLC proposals for updating the histologic classification of pleural mesothelioma: towards a more multidisciplinary approach.
      Predictive biomarkers inform the overall expected clinical outcome on the basis of treatment decisions in biomarker-defined patients only. The most important predictive biomarkers found for NSCLC, for example, are mutations in the EGFR, BRAF, or MET genes and rearrangements involving the ALK, ROS1, RET, and NTRK family genes
      • Remon J.
      • Ahn M.J.
      • Girard N.
      • et al.
      Advanced-stage non-small cell lung cancer: advances in thoracic oncology 2018.
      ; various targeted therapies are available for patients identified by most of these biomarkers.
      A biomarker’s journey from discovery to clinical use is long and arduous, but it can be broken into phases or steps.
      • Pepe M.S.
      • Etzioni R.
      • Feng Z.
      • et al.
      Phases of biomarker development for early detection of cancer.
      Institute of Medicine (US)
      Committee on Qualification of Biomarkers and Surrogate Endpoints in Chronic Disease.
      Food and Drug Administration
      Biomarker qualification: evidentiary framework guidance for industry and FDA staff.
      • Rudin M.
      Imaging readouts as biomarkers or surrogate parameters for the assessment of therapeutic interventions.
      Biomarker discovery efforts have increased with the emergence of technologies for gathering relevant data; for example, single-cell next-generation sequencing, liquid biopsy (blood sample) for circulating tumor DNA, microbiomics, radiomics, and other types of high-throughput technologies have exploded in popularity in recent years, owing to their ability to produce an enormous volume of data quickly and at relatively low cost. Across the continuum of biomarker data capture and utilization, however, many more challenges lie ahead—from analysis of high-throughput biomarker data to maximum exploitation of the electronic health record, and to the ultimate goal of biomarker-driven clinical practice. Biomarker discovery and validation are essential steps in establishing biomarkers in all applications across the disease course. In this article, we discuss the best practices for biomarker discovery and validation from a statistical perspective (Fig. 2).
      Figure 2. Simplified schematic of biomarker development. PRoBE: prospective-specimen-collection, retrospective-blinded-evaluation.

      Biomarker Discovery

      The intended use of a biomarker (e.g., risk stratification and screening)
      FDA-NIH Biomarker Working Group
      BEST (Biomarkers, EndpointS, and Other Tools). Resource.
      and the target population to be tested need to be defined early in the development process. The use of a biomarker in relation to the course of a disease and specific clinical contexts should also be prespecified (Fig. 1). The patients and specimens should both directly reflect the target population and intended use.

      Key Considerations for Biomarker Discovery

      Key considerations for conducting discovery studies using archived specimens are the patient population represented by the specimen archive, the power of the study (through the number of samples and number of events), the prevalence of the disease, the analytical validity of the biomarker test, and a prespecified analysis plan.
      • Simon R.M.
      • Paik S.
      • Hayes D.F.
      Use of archived specimens in evaluation of prognostic and predictive biomarkers.
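      To make the link between power, number of events, and prevalence concrete, the sketch below uses the Schoenfeld approximation for a two-group comparison of survival outcomes. This formula is not named in the article and is swapped in here as one common choice; the hazard ratio, power, and prevalence values are hypothetical design inputs, and `required_events` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, prevalence=0.5):
    """Approximate number of events needed to detect a hazard ratio between
    biomarker-positive and biomarker-negative patients (two-sided test,
    Schoenfeld approximation).

    prevalence is the fraction of patients who are biomarker positive.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    denom = prevalence * (1 - prevalence) * math.log(hazard_ratio) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 / denom)


# Hypothetical design values, chosen only for illustration.
for prevalence in (0.5, 0.2, 0.1):
    print(prevalence, required_events(hazard_ratio=0.5, prevalence=prevalence))
```

      Under these assumptions, the required number of events grows as the biomarker becomes rarer, which is one reason the composition of the specimen archive matters when planning a discovery study.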
      The most reliable setting for such retrospective studies uses specimens and data collected during prospective trials, and the results of one study need to be reproduced in another. Definitions for levels of evidence have been developed to evaluate the clinical use of biomarkers in oncology and medicine.
      • Simon R.M.
      • Paik S.
      • Hayes D.F.
      Use of archived specimens in evaluation of prognostic and predictive biomarkers.
      ,
      • Teutsch S.M.
      • Bradley L.A.
      • Palomaki G.E.
      • et al.
      The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: methods of the EGAPP working group.
      Bias, a systematic shift from truth, is one of the greatest causes of failure in biomarker validation studies.
      • Ransohoff D.F.
      • Gourlay M.L.
      Sources of bias in specimens for research about molecular markers for cancer.
      Bias can enter a study during patient selection, specimen collection, specimen analysis, and patient evaluation. Randomization and blinding are two of the most important tools for avoiding bias. Randomization in biomarker discovery should be carried out to control for nonbiological experimental effects owing to changes in reagents, technicians, machine drift, etc., that can result in batch effects.
      • Leek J.T.
      • Scharpf R.B.
      • Bravo H.C.
      • et al.
      Tackling the widespread and critical impact of batch effects in high-throughput data.
      Specimens from controls and cases should be assigned to arrays, testing plates, or batches at random, ensuring that cases, controls, and specimen ages are evenly distributed across them.
      • Qin L.X.
      • Zhou Q.
      • Bogomolniy F.
      • et al.
      Blocking and randomization to improve molecular biomarker discovery.
      Blinding can be carried out by keeping the individuals who generate the biomarker data from knowing the clinical outcomes; it prevents the bias induced by unequal assessment of biomarker results.
      • Ransohoff D.F.
      Bias as a threat to the validity of cancer molecular-marker research.
      Randomization and blinding should be used in the process of biomarker data generation and should be incorporated at every stage of a study when possible.
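      As a minimal sketch of randomized specimen allocation, the code below shuffles cases and controls separately and deals them round-robin onto plates so that each plate receives a similar mix. The function name, specimen IDs, and plate count are hypothetical; balancing on specimen age or other factors would need additional blocking that is not shown.

```python
import random
from collections import Counter


def assign_to_plates(specimens, n_plates, seed=2021):
    """Randomly assign specimens to plates, balancing case/control status.

    specimens: list of (specimen_id, status) tuples, status "case" or "control".
    Returns a dict mapping specimen_id -> plate number (0-based).
    """
    rng = random.Random(seed)
    assignment = {}
    for status in ("case", "control"):
        ids = [sid for sid, s in specimens if s == status]
        rng.shuffle(ids)  # random order within each status group
        for position, sid in enumerate(ids):
            # Dealing the shuffled ids round-robin keeps each plate's count
            # of cases (and of controls) within one of every other plate.
            assignment[sid] = position % n_plates
    return assignment


# Hypothetical archive: 24 cases and 48 controls spread over 3 plates.
specimens = [(f"S{i:03d}", "case" if i < 24 else "control") for i in range(72)]
plates = assign_to_plates(specimens, n_plates=3)
status_of = dict(specimens)
layout = Counter((plate, status_of[sid]) for sid, plate in plates.items())
print(sorted(layout.items()))
```

      Fixing the seed makes the allocation reproducible and auditable; the allocation list can then be handed to the laboratory without outcome labels to support blinding.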

      Prognostic and Predictive Biomarker Identification

      A prognostic biomarker can be identified in properly conducted retrospective studies that do not rely solely on convenience samples but use biospecimens prospectively collected from a cohort that represents the target screening population, case-control studies, and single-arm trials. A prognostic biomarker is identified through a main effect test of association between the biomarker and the outcome in a statistical model. An example of a prognostic biomarker is the STK11 mutation that is associated with poorer outcome in nonsquamous NSCLC.
      • Pécuchet N.
      • Laurent-Puig P.
      • Mansuet-Lupo A.
      • et al.
      Different prognostic impact of STK11 mutations in non-squamous non-small-cell lung cancer.
      Tissue samples were collected from a consecutive series of patients with nonsquamous NSCLC who underwent curative-intent surgical resection in 2001 to 2006 at two hospitals. An a priori power calculation was performed to ensure a sufficient number of overall survival events to provide adequate statistical power to assess five candidate biomarkers. Even though convenience samples were used, the prognostic effect was validated in two external datasets which strengthened the validity of the discovery.
      A predictive biomarker needs to be identified in secondary analyses using data from a randomized clinical trial, through an interaction test between the treatment and the biomarker in a statistical model. Secondary analyses refer to subsequent correlative studies that may or may not be predefined as a protocol objective. An example of predictive biomarker identification is the IPASS study.
      • Mok T.S.
      • Wu Y.L.
      • Thongprasert S.
      • et al.
      Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma.
      The IPASS study enrolled patients with advanced pulmonary adenocarcinoma who were nonsmokers or former light smokers and randomly assigned patients to receive gefitinib or carboplatin plus paclitaxel (CP). Patients’ EGFR mutation status was not known at the time of enrollment and was determined retrospectively. The interaction between treatment and EGFR mutation was statistically significant (p < 0.001) and indicated that among patients who have EGFR-mutated tumors, progression-free survival (PFS) was significantly longer (hazard ratio = 0.48; 95% confidence interval [CI]: 0.36–0.64) for those receiving gefitinib compared with those receiving CP. In contrast, among patients who have EGFR wildtype tumors, PFS was significantly shorter (hazard ratio = 2.85; 95% CI: 2.05–3.98) for those receiving gefitinib compared with those receiving CP.
      • Mok T.S.
      • Wu Y.L.
      • Thongprasert S.
      • et al.
      Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma.
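      The size of the IPASS interaction can be approximated from the two subgroup hazard ratios quoted above. The sketch below recovers each log hazard ratio and its standard error from the reported 95% CI and compares them with a z-test. This is a back-of-the-envelope check that assumes the two subgroups are independent; it is not the interaction model fitted in the trial, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def log_hr_and_se(hr, ci_low, ci_high, level=0.95):
    """Recover log(HR) and its standard error from a reported confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def interaction_test(subgroup_a, subgroup_b):
    """z-test for the difference between two independent log hazard ratios."""
    log_a, se_a = log_hr_and_se(*subgroup_a)
    log_b, se_b = log_hr_and_se(*subgroup_b)
    diff = log_a - log_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p


# PFS hazard ratios (gefitinib vs. CP) and 95% CIs quoted above for IPASS.
egfr_mutated = (0.48, 0.36, 0.64)
egfr_wildtype = (2.85, 2.05, 3.98)

ratio_of_hrs, z, p = interaction_test(egfr_mutated, egfr_wildtype)
print(f"ratio of HRs = {ratio_of_hrs:.2f}, z = {z:.2f}, p = {p:.2g}")
```

      The approximate p-value is far below 0.001, in line with the reported interaction. The treatment effect reverses direction between the two subgroups, which is what distinguishes a predictive biomarker from a purely prognostic one.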

      Analytical Methods

      Analytical methods should be chosen to address study-specific goals and hypotheses. Data-driven analyses and the resulting findings are less likely to be reproducible in an independent set of data. Thus, the analytical plan should be written and agreed on by all members of the research team before receiving data to avoid the data influencing the analysis. This includes defining the outcomes of interest, the hypotheses that will be tested, and the criteria for success. Control of multiple comparisons should be implemented when multiple biomarkers are evaluated; a measure of false discovery rate is especially useful when using large-scale genomic or other high-dimensional data for biomarker discovery.
      • Storey J.D.
      • Tibshirani R.
      Statistical significance for genomewide studies.
      During biomarker discovery, evaluation of associations between a biomarker and disease status, demographic or clinical characteristics, such as age, sex, and body mass index, or in diseased patients, stage or other disease characteristics, can inform design of future validation studies. Metrics useful for evaluating biomarkers (Table 1) include differences between groups, sensitivity, specificity, positive and negative predictive values, discrimination (i.e., receiver operating characteristic area under the curve), calibration, and clinical validity and use.
      • Teutsch S.M.
      • Bradley L.A.
      • Palomaki G.E.
      • et al.
      The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: methods of the EGAPP working group.
      ,
      • Harrell Jr., F.E.
      Regression Modeling Strategies With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis.
      • Steyerberg E.W.
      Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating.
      • Pepe M.
      The Statistical Evaluation of Medical Tests for Classification and Prediction.
      • Mahar A.L.
      • Compton C.
      • McShane L.M.
      • et al.
      Refining prognosis in lung cancer: a report on the quality and relevance of clinical prognostic tools.
      The appropriate metric depends on the study goals and should be determined by a study team, including clinicians, scientists, statisticians, and epidemiologists.
      Table 1. Metrics Useful for Evaluating Biomarker Performance
      Metric | Description
      Sensitivity | The proportion of cases that test positive
      Specificity | The proportion of controls that test negative
      Positive predictive value | Proportion of test-positive patients who actually have the disease; is a function of disease prevalence
      Negative predictive value | Proportion of test-negative patients who truly do not have the disease; is a function of disease prevalence
      ROC curve | Plot of sensitivity (true-positive rate) versus 1 − specificity (false-positive rate), with a data point calculated for every value of the marker in the data set
      Discrimination | How well the marker distinguishes cases from controls; often measured by the area under the ROC curve, which ranges from 0 to 1, with 0.5 indicating performance equivalent to a coin flip and 1 corresponding to perfect ability to distinguish
      Calibration | How well a marker estimates the risk of disease or of the event of interest
      ROC, receiver operating characteristic.
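      Several of the metrics in Table 1 can be computed directly from marker values in cases and controls. The sketch below uses hypothetical marker values and a hypothetical cutoff; the helper names are illustrative. It also shows how the predictive values shift with disease prevalence even when sensitivity and specificity are fixed.

```python
def sensitivity_specificity(cases, controls, cutoff):
    """Marker values at or above the cutoff count as test positive."""
    sens = sum(v >= cutoff for v in cases) / len(cases)
    spec = sum(v < cutoff for v in controls) / len(controls)
    return sens, spec


def predictive_values(sens, spec, prevalence):
    """PPV and NPV from Bayes' rule for a given disease prevalence."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def roc_auc(cases, controls):
    """Probability that a random case scores higher than a random control
    (ties count one half); equals the area under the empirical ROC curve."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical marker values, for illustration only.
cases = [2.9, 3.4, 3.8, 4.1, 4.6, 5.0, 5.3, 6.2]
controls = [1.1, 1.8, 2.2, 2.5, 2.7, 3.0, 3.1, 3.6, 3.9, 4.4]

sens, spec = sensitivity_specificity(cases, controls, cutoff=3.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={roc_auc(cases, controls):.3f}")
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence={prevalence:.2f} PPV={ppv:.3f} NPV={npv:.3f}")
```

      With sensitivity and specificity held constant, the positive predictive value falls sharply at low prevalence, which is why the target population must be defined before a marker's clinical value is judged.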
      It is often the case that information from a panel of multiple biomarkers will be required to achieve better performance than a single biomarker, despite the added potential measurement errors that come from multiple assays. Using each biomarker in its continuous state instead of a dichotomized version retains maximal information for model development, and in turn, greater improvement in panel performance; dichotomization for clinical decision making is best left for later studies. The optimal analytical strategy for combining multiple biomarkers and for choosing which biomarkers to combine depends on both sample size and clinical context. Incorporation of some form of variable selection, such as shrinkage, during model estimation generally minimizes overfitting and maximizes the likelihood of validation; hundreds to thousands of patients are generally required to incorporate nonlinear functions, such as interactions, smoothing splines, or machine learning and artificial intelligence algorithms, without overfitting. It is useful to generate pilot data for use in simulations to inform sample size calculations and plan the appropriate analytical strategy.
      • Harrell Jr., F.E.
      Regression Modeling Strategies With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis.
      • Steyerberg E.W.
      Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating.
      • Pepe M.
      The Statistical Evaluation of Medical Tests for Classification and Prediction.
      ,
      • Hastie T.
      • Tibshirani R.
      • Friedman J.
      The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
      ,
      • James G.
      • Witten D.
      • Hastie T.
      • et al.
      An Introduction to Statistical Learning: With Applications in R.
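      The effect of shrinkage on a multi-marker panel can be illustrated with a small penalized logistic regression. The sketch below uses a ridge (L2) penalty, which shrinks coefficients toward zero but does not drop markers (a lasso penalty would be needed for that). The simulated data, penalty strength, learning rate, and function names are all assumptions made for illustration, and markers are kept continuous rather than dichotomized.

```python
import math
import random


def fit_logistic(X, y, ridge=0.0, steps=1000, lr=0.1):
    """Gradient-descent logistic regression with an L2 (ridge) penalty.

    Returns the marker coefficients followed by the (unpenalized) intercept.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[p] + sum(w[j] * xi[j] for j in range(p))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(p):
                grad[j] += err * xi[j]
            grad[p] += err
        for j in range(p):
            w[j] -= lr * (grad[j] / n + ridge * w[j])
        w[p] -= lr * grad[p] / n
    return w


def coefficient_norm(w):
    """Euclidean norm of the marker coefficients, excluding the intercept."""
    return math.sqrt(sum(v * v for v in w[:-1]))


# Simulated panel: five continuous markers, only the first carries signal.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(1.0 * yi, 1.0)] + [rng.gauss(0, 1) for _ in range(4)] for yi in y]

unpenalized = fit_logistic(X, y, ridge=0.0)
penalized = fit_logistic(X, y, ridge=0.5)
print(f"coefficient norm: unpenalized={coefficient_norm(unpenalized):.2f}, "
      f"ridge={coefficient_norm(penalized):.2f}")
```

      In practice the penalty strength would be chosen by cross-validation within the development data rather than fixed in advance as it is here.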
      Missing data can lead to biased results. Thus, the analysis plan should include an assessment of the mechanism responsible for the missingness and an approach to handling it that minimizes the bias introduced into the analysis.
      • Little R.J.
      • Rubin D.B.
      Statistical Analysis With Missing Data.
      The EQUATOR network assembles an important collection of guidelines for the design and reporting of diagnostic and prognostic modeling studies (https://www.equator-network.org/).

      Biomarker Validation

      Validation is “a process to establish that the performance of a test, tool, or instrument is acceptable for its intended purpose.”
      FDA-NIH Biomarker Working Group
      BEST (Biomarkers, EndpointS, and Other Tools). Resource.
      Internal validation establishes a biomarker’s performance in the data in which the biomarker was developed and should be assessed by means of resampling methods, such as bootstrapping or cross-validation, to provide realistic expectations.
      • Harrell Jr., F.E.
      Regression Modeling Strategies With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis.
      External validation establishes a biomarker’s performance in a completely independent data set not used during development; it must be established using data from different time frames, institutions, or geographic regions, which we discuss in subsequent paragraphs. Analytical validation and clinical validation are two distinct aspects of biomarker validation. Use of specimens collected prospectively from the target population before knowing patient outcomes is a critical design feature of all validation studies, one that minimizes the influence of bias.
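      For the internal validation step, one resampling approach is the bootstrap optimism correction. The sketch below applies it to a deliberately naive "model" that picks the best of ten pure-noise markers, so any apparent discrimination is overfitting. The data are simulated and the function names are illustrative; the key point is that the whole selection step is repeated inside every bootstrap resample.

```python
import random


def auc(scores, labels):
    """Empirical AUC: P(case score > control score), ties count one half."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def select_marker(X, y):
    """'Model development': pick the marker and direction with the best AUC."""
    best = None
    for j in range(len(X[0])):
        a = auc([row[j] for row in X], y)
        sign = 1 if a >= 0.5 else -1
        a = max(a, 1 - a)
        if best is None or a > best[0]:
            best = (a, j, sign)
    return best


def optimism_corrected_auc(X, y, n_boot=100, seed=1):
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    apparent, _, _ = select_marker(X, y)
    n = len(y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip degenerate resamples containing a single class
        auc_boot, j, sign = select_marker(Xb, yb)  # repeat the whole selection
        auc_orig = auc([sign * row[j] for row in X], y)
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimism) / len(optimism)


# Simulated data: 40 patients, 10 markers that are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in y]

apparent, corrected = optimism_corrected_auc(X, y)
print(f"apparent AUC={apparent:.3f}, optimism-corrected AUC={corrected:.3f}")
```

      The corrected estimate is lower than the apparent one, reflecting the optimism introduced by selecting the best marker in the same data used to evaluate it.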

      Analytical Validation

      Analytical validation aims to establish the performance characteristics of a biomarker, including sensitivity, specificity, accuracy, precision, interlaboratory reproducibility, and other relevant performance characteristics, following a prespecified protocol. The statistical analysis methods used for analytical validation are similar to the methods mentioned in biomarker discovery (Table 1). The goal of analytical validation is to reveal a biomarker’s technical performance (i.e., whether the biomarker provides measurements that are consistent with the unknown true values) and not its usefulness.

      Clinical Validation

      Clinical validation aims to establish an association between the biomarker and the end point of interest (i.e., clinical validity per Teutsch et al.
      • Teutsch S.M.
      • Bradley L.A.
      • Palomaki G.E.
      • et al.
      The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: methods of the EGAPP working group.
      ) and to reveal the usefulness of the biomarker (i.e., clinical use per Teutsch et al.
      • Teutsch S.M.
      • Bradley L.A.
      • Palomaki G.E.
      • et al.
      The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: methods of the EGAPP working group.
      ). Clinical validation relies on external validation and can be done by retrospective use of clinical trial data or by prospective clinical trials. Retrospective use of clinical trial data is a form of external clinical validation in which the biomarker evaluation is not part of the original study design.
      Establishing clinical utility or usefulness generally requires a prospective clinical trial, a form of external validation, to reveal that use of the biomarker to guide patient care translates into improved health outcomes. An example is the approval of pembrolizumab as the first tissue-agnostic approval granted by the United States Food and Drug Administration (FDA).
      • Boyiadzis M.M.
      • Kirkwood J.M.
      • Marshall J.L.
      • Pritchard C.C.
      • Azad N.S.
      • Gulley J.L.
      Significance and implications of FDA approval of pembrolizumab for biomarker-defined disease.
      Patients with microsatellite instability-high (MSI-H) tumors treated with pembrolizumab had higher overall response rates compared with those with microsatellite stable tumors regardless of the tumor origin in the KEYNOTE-016 study. The regulatory approval was based on data from five different trials (N = 149) in which patients with MSI-H were retrospectively identified from two prospective studies (N = 14) and prospectively identified from three studies (N = 135). The objective response rate was 39.6% (7% with complete response) among 149 patients with MSI-H tumors spanning 15 different tumor types, which was considered clinically meaningful (compared with an objective response rate of 0% among patients with colorectal cancer with microsatellite stable tumors in KEYNOTE-016
      • Le D.T.
      • Uram J.N.
      • Wang H.
      • et al.
      Programmed death-1 blockade in mismatch repair deficient colorectal cancer.
      ). At the time of the approval, no companion in vitro diagnostic device was available. Patients were enrolled predominantly on the basis of PCR-based tests for MSI-H and immunohistochemistry-based tests for deficient mismatch repair available in the community as laboratory-developed tests. The FDA determined that the risk to patients with “false positive” tumors is low in this setting and, given the efficacy observed, approved pembrolizumab for this use.
      • Marcus L.
      • Lemery S.J.
      • Keegan P.
      • Pazdur R.
      FDA approval summary: pembrolizumab for the treatment of microsatellite instability-high solid tumors.
      There was commitment from Merck to develop a companion diagnostic test for detection of MSI-H and deficient mismatch repair across all cancers postmarketing.
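      The precision of a pooled single-arm response rate such as the 39.6% quoted above can be gauged with a binomial confidence interval. The sketch below uses the Wilson score interval, chosen here for simplicity and not necessarily the method used in the regulatory review. The responder count of 59 is inferred from 39.6% of 149 and is not stated in the text.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# 59 responders is inferred from 39.6% of 149; it is not stated in the text.
responders, n = 59, 149
low, high = wilson_interval(responders, n)
print(f"ORR = {responders / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```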

      Study Designs for Biomarker Validation

      Though costly, biomarker evaluation efforts are enhanced by biobanks of specimens collected prospectively from an observational cohort that represents the target population intended for the biomarker.
      • Andre F.
      • McShane L.M.
      • Michiels S.
      • et al.
      Biomarker studies: a call for a comprehensive biomarker study registry.
      A prospective-specimen-collection, retrospective-blinded-evaluation design
      • Pepe M.S.
      • Feng Z.
      • Janes H.
      • Bossuyt P.M.
      • Potter J.D.
      Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design.
      can be performed in such a setting to validate screening, diagnostic, and prognostic biomarkers. Specimens and clinical data are collected without knowing the patient outcome. Case patients and control patients would be randomly selected on the basis of their outcome status. The biomarker data are then generated for the patients selected, blinded to clinical and outcome information. An example of such design is the MILD study.
      • Sozzi G.
      • Boeri M.
      • Rossi M.
      • et al.
      Clinical utility of a plasma-based miRNA signature classifier within computed tomography lung cancer screening: a correlative MILD trial study [published correction appears in J Clin Oncol. 2014;32:1520].
      The MILD trial, a randomized prospective clinical trial, enrolled 4099 current or former smokers without history of cancer and randomized them to low-dose computed tomography versus observation. Whole blood was collected at enrollment and subsequent follow-up. Retrospectively, 1000 consecutive plasma samples collected from June 2009 to July 2010 among lung cancer-free individuals enrolled onto the trial were used for validation of a microRNA signature classifier. The classifier was prespecified with predefined cut points, and risk scores were generated blinded to clinical outcome for individual participants and submitted to an independent research center. Data analysis was completed according to a prespecified statistical analysis plan by the independent research center. This validation study intentionally used the full cohort rather than a random subset of patients to maximize the study power.
      Several prospective clinical trial designs are intended to validate the clinical use of a predictive biomarker in a clinical setting. Enrichment designs screen all patients for the biomarker but enroll and randomize only those with the desired molecular features. A treatment will be evaluated within the biomarker-defined subgroup only. Enrichment designs are advantageous when the biomarker prevalence is low (<15%–20%). An example of such a design is the EURTAC trial
      • Rosell R.
      • Carcereny E.
      • Gervais R.
      • et al.
      Erlotinib versus standard chemotherapy as first-line treatment for European patients with advanced EGFR mutation-positive non-small-cell lung cancer (EURTAC): a multicentre, open-label, randomised phase 3 trial.
      which led to the FDA’s approval of erlotinib for the first-line treatment of patients with metastatic NSCLC harboring EGFR mutations. The EURTAC trial screened 1227 patients and then randomized 174 patients with EGFR mutations to receive erlotinib or standard chemotherapy (Fig. 3A).
      Figure 3. Trial design schema. (A) Enrichment design. (B) All-comer (stratified by biomarker status) design. (C) Subgroup design.
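      The screening burden of an enrichment design follows from simple arithmetic, sketched below. The EURTAC counts are those quoted above; note that the ratio of randomized to screened patients reflects eligibility and consent as well as biomarker prevalence. The function name and the prevalence values in the loop are hypothetical.

```python
import math


def patients_to_screen(n_randomized, prevalence, eligible_fraction=1.0):
    """Expected number of patients screened in order to randomize
    n_randomized biomarker-positive patients in an enrichment design."""
    return math.ceil(n_randomized / (prevalence * eligible_fraction))


# EURTAC figures quoted above: 1227 screened, 174 randomized.
observed_yield = 174 / 1227
print(f"randomized per patient screened: {observed_yield:.1%}")

# Hypothetical prevalences, for illustration.
for prevalence in (0.05, 0.15, 0.30):
    print(prevalence, patients_to_screen(174, prevalence))
```

      The trade-off is that a rarer biomarker requires more screening per randomized patient, while the randomized comparison itself is confined to the patients in whom the treatment is expected to work.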
      All-comer (stratified by biomarker status) designs screen all patients for the biomarker and then enroll and randomize patients with a valid biomarker result. The randomization can be stratified by the biomarker status (if the turnaround time of biomarker testing is short), and the test of treatment by biomarker interaction is included in the prespecified analysis plans. All-comer designs are appropriate when the treatment benefit needs to be better understood in both patients who test positive and in those who test negative. An example of such a design is the MARVEL trial (N0723, NCT00738881). The MARVEL trial planned to enroll 1196 patients with advanced NSCLC after first-line therapy and patients’ EGFR expression by means of fluorescence in situ hybridization (FISH) was evaluated by central pathology review. After the FISH result was available, patients were randomized to receive pemetrexed versus erlotinib, stratified by the FISH status and other factors. The goal was to identify 287 FISH-positive patients and 670 FISH-negative patients (70%) to evaluate whether there are differences in PFS owing to treatment with erlotinib compared with pemetrexed for subsets defined by FISH positivity versus negativity (Fig. 3B).
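      Stratified randomization of the kind described for an all-comer design can be sketched with permuted blocks generated separately within each biomarker stratum. This is a generic illustration, not the actual MARVEL randomization procedure: the class name, block size, seed, and the simulated 30% marker-positive rate are assumptions, and only one stratification factor is used.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomizer:
    """Permuted-block randomization carried out separately within each stratum."""

    def __init__(self, arms=("A", "B"), block_size=4, seed=7):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = arms
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = defaultdict(list)  # stratum -> unused slots in current block

    def assign(self, stratum):
        if not self.pending[stratum]:
            # Start a new block with equal numbers of each arm, in random order.
            block = list(self.arms) * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            self.pending[stratum] = block
        return self.pending[stratum].pop()


randomizer = StratifiedBlockRandomizer(arms=("erlotinib", "pemetrexed"))
rng = random.Random(3)
counts = defaultdict(int)
for _ in range(200):
    # Hypothetical 30% marker-positive rate among simulated patients.
    stratum = "FISH-positive" if rng.random() < 0.3 else "FISH-negative"
    counts[(stratum, randomizer.assign(stratum))] += 1
print(dict(counts))
```

      Because each stratum fills its own blocks, the arms stay balanced within both biomarker-positive and biomarker-negative patients, which supports the prespecified test of treatment-by-biomarker interaction.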
      Subgroup designs validate a predictive biomarker in a specific subgroup of patients and in the overall population using a multiple-hypothesis design.
      • Hoering A.
      • Leblanc M.
      • Crowley J.J.
      Randomized phase III clinical trial designs for targeted agents.
      In this design, all patients with a particular disease are randomized to experimental therapy versus standard of care, but coprimary objectives are defined to test the superiority of the experimental therapy in the subgroup of patients selected by the biomarker, and for all enrolled patients. This design is advantageous when there is evidence that the experimental therapy will be most effective in patients with the biomarker of interest, but could also have a broad effect in the general disease population. An example is Southwest Oncology Group S0819,
      • Herbst R.S.
      • Redman M.W.
      • Kim E.S.
      • et al.
      Cetuximab plus carboplatin and paclitaxel with or without bevacizumab versus carboplatin and paclitaxel with or without bevacizumab in advanced NSCLC (SWOG S0819): a randomised, phase 3 study.
      which was designed to test the hypothesis that EGFR amplification can identify patients most likely to benefit from EGFR antibodies in combination with chemotherapy in patients with advanced NSCLC. S0819 randomized 1313 eligible patients to chemotherapy with cetuximab versus chemotherapy alone. EGFR-FISH status was not required to be known at trial enrollment and was evaluated at each interim analysis. Coprimary end points were PFS in patients with EGFR-FISH–positive cancer and overall survival in the entire population (Fig. 3C).
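      A multiple-hypothesis design with coprimary objectives has to divide the overall type I error between the subgroup and the overall comparison. The sketch below shows a simple Bonferroni-style split. The 20%/80% allocation and the p-values are hypothetical and are not the allocation used in S0819; the function name is illustrative.

```python
def coprimary_decisions(p_subgroup, p_overall, alpha=0.05, subgroup_share=0.2):
    """Split the overall type I error between two coprimary hypotheses
    (Bonferroni-style) and test each at its own significance level."""
    alpha_subgroup = alpha * subgroup_share
    alpha_overall = alpha - alpha_subgroup
    return {
        "subgroup": (alpha_subgroup, p_subgroup <= alpha_subgroup),
        "overall": (alpha_overall, p_overall <= alpha_overall),
    }


# Hypothetical p-values for the two coprimary comparisons.
result = coprimary_decisions(p_subgroup=0.03, p_overall=0.03)
for name, (level, rejected) in result.items():
    print(f"{name}: tested at {level:.3f}, reject null = {rejected}")
```

      The same p-value can lead to different conclusions for the two hypotheses, so the allocation needs to be fixed in the protocol before any data are examined.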
      Platform-type trial designs, such as umbrella trials (histology specific) and basket trials (biomarker specific and agnostic to histology), can be advantageous in biomarker validation as well.
      • Ou F.S.
      • An M.W.
      • Ruppert A.S.
      • Mandrekar S.J.
      Discussion of trial designs for biomarker identification and validation through the use of case studies.
      There are common features for establishing analytical validity, clinical validity, and clinical use, that is, there should be a prespecified protocol dealing with the specifics of the validation process, such as specimen collection, specimen handling and storage procedures, biomarker and clinical outcomes of interest, the purposes of the biomarker, and the potential benefits and risks associated with the use of the biomarker.

      Conclusions

      In this article, we discussed the statistical perspectives on the best practices for biomarker discovery and validation. One aspect that we omitted was the biomarker qualification process with the regulatory agencies.
      Food and Drug Administration
      Biomarker qualification: evidentiary framework guidance for industry and FDA staff.
      Readers should note that the FDA requires biomarker candidates to undergo clinical validation and be assessed as a companion diagnostic before receiving regulatory approval. The biomarkers used to direct therapies need to be generated by an assay that is performed in a Clinical Laboratory Improvement Amendments–certified laboratory, which will be the first step toward clinical validation. We encourage investigators to reach out to health authorities early to discuss potential biomarkers of interest.
      We would also like to take this opportunity to urge oncologists to resist the temptation of adopting unvalidated biomarker findings into practice. Attempts to discover biomarkers have accelerated through advanced technology for generating relevant data. The potential biomarkers discovered should be considered hypothesis generating, and they need to be validated (both analytically and clinically) before adoption. An example is the STK11 and KEAP1 mutations, which seemed to be predictive when emerging data suggested that patients with these mutations do not respond to immunotherapy. However, an exploratory analysis using clinical trial data revealed that pembrolizumab monotherapy was associated with improved overall response rates compared with chemotherapy regardless of STK11 and KEAP1 mutational status; that is, these mutations were prognostic rather than predictive.

      Collingridge D. The Lancet Oncology. Paper presented at: ASCO Virtual Annual Meeting. June 5, 2020.

      In addition, an analysis using real-world evidence also revealed that STK11 and KEAP1 mutations are prognostic biomarkers and unlikely to be predictive biomarkers for anti–programmed cell death protein-1 and anti–programmed death-ligand 1 therapy.
      • Papillon-Cavanagh S.
      • Doshi P.
      • Dobrin R.
      • Szustakowski J.
      • Walsh A.M.
      STK11 and KEAP1 mutations as prognostic biomarkers in an observational real-world lung adenocarcinoma cohort.
      STK11 and KEAP1 remain unvalidated predictive biomarkers, and clinicians’ treatment decisions should not be swayed by the mutation status of these two genes.
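      The distinction between a prognostic and a predictive biomarker can be illustrated with a treatment-by-biomarker interaction test. The sketch below is a minimal illustration, not the analysis used in the studies cited above: it assumes a binary response, two treatment arms, and a binary mutation status, and it applies a Wald test to the difference in log odds ratios between biomarker subgroups. The function name, the data layout, and all counts are hypothetical and were invented for illustration only. The calculation also assumes that no cell count is zero.

```python
import math


def interaction_test(counts):
    """Wald test for a treatment-by-biomarker interaction on the log-odds scale.

    counts maps (marker_status, arm) to (responders, non_responders).
    The names and the data layout are hypothetical.
    """

    def log_odds_ratio(marker):
        a, b = counts[(marker, "immunotherapy")]
        c, d = counts[(marker, "chemotherapy")]
        estimate = math.log((a * d) / (b * c))
        # Large-sample (Woolf) variance of a log odds ratio.
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        return estimate, variance

    lor_mut, var_mut = log_odds_ratio("mutant")
    lor_wt, var_wt = log_odds_ratio("wild_type")

    # The interaction is the difference in treatment effect between subgroups.
    diff = lor_mut - lor_wt
    z = diff / math.sqrt(var_mut + var_wt)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "or_mutant": math.exp(lor_mut),
        "or_wild_type": math.exp(lor_wt),
        "ratio_of_odds_ratios": math.exp(diff),
        "z": z,
        "p_value": p_value,
    }


# Invented counts: mutation carriers respond less often in BOTH arms,
# but the treatment effect (odds ratio) is the same in each subgroup.
prognostic_only = interaction_test({
    ("mutant", "immunotherapy"): (25, 75),
    ("mutant", "chemotherapy"): (15, 90),
    ("wild_type", "immunotherapy"): (40, 60),
    ("wild_type", "chemotherapy"): (25, 75),
})

# Invented counts: the treatment effect reverses in mutation carriers,
# so the treatment effect differs by biomarker status.
predictive = interaction_test({
    ("mutant", "immunotherapy"): (10, 90),
    ("mutant", "chemotherapy"): (20, 80),
    ("wild_type", "immunotherapy"): (40, 60),
    ("wild_type", "chemotherapy"): (25, 75),
})

print(prognostic_only["ratio_of_odds_ratios"], prognostic_only["p_value"])
print(predictive["ratio_of_odds_ratios"], predictive["p_value"])
```

      In the first scenario, the biomarker is associated with outcome but the ratio of odds ratios is close to 1, which is the pattern expected of a purely prognostic marker. In the second, the treatment effect differs by biomarker status, which is the pattern a predictive marker would produce. In practice, such an analysis would typically use a regression model with an interaction term, adjust for covariates, and be prespecified in a randomized setting.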
      The discovery and validation of biomarkers require thorough planning and collaboration among clinicians, scientists, statisticians, and epidemiologists. A cohesive and effective cross-disciplinary team is crucial for biomarker development, and we promote such partnerships to accelerate the translation of cutting-edge scientific discoveries from bench to bedside, thus leading to improved patient care and outcomes.

      Acknowledgments

      This work was partially supported by the National Institutes of Health (NIH) grant P30CA15083 (Mayo Comprehensive Cancer Center grant; Drs. Ou, Oberg, and Adjei), National Cancer Institute (NCI) grant P50CA136393 (Mayo Clinic Specialized Programs of Research Excellence in Ovarian Cancer grant; Dr. Oberg), NCI grant P50CA102701 (Mayo Clinic Specialized Programs of Research Excellence in Pancreatic Cancer; Dr. Oberg), NCI grant U10CA180882 (Alliance Statistics and Data Management Center; Drs. Ou and Oberg), NIH grant U24CA213274 (Dr. Shyr), NIH grant U54TR002243 (Dr. Shyr), and NIH grant P30CA068485 (Dr. Shyr).

      References

        • FDA-NIH Biomarker Working Group
        BEST (Biomarkers, EndpointS, and Other Tools). Resource.
        Food and Drug Administration, Silver Spring, MD 2016
        • National Comprehensive Cancer Network
        Lung cancer screening. 2020 version.
        • Nicholson A.G.
        • Sauter J.L.
        • Nowak A.K.
        • et al.
        EURACAN/IASLC proposals for updating the histologic classification of pleural mesothelioma: towards a more multidisciplinary approach.
        J Thorac Oncol. 2020; 15: 29-49
        • Remon J.
        • Ahn M.J.
        • Girard N.
        • et al.
        Advanced-stage non-small cell lung cancer: advances in thoracic oncology 2018.
        J Thorac Oncol. 2019; 14: 1134-1155
        • Pepe M.S.
        • Etzioni R.
        • Feng Z.
        • et al.
        Phases of biomarker development for early detection of cancer.
        J Natl Cancer Inst. 2001; 93: 1054-1061
        • Institute of Medicine (US)
        Committee on Qualification of Biomarkers and Surrogate Endpoints in Chronic Disease.
        in: Micheel C.M. Ball J.R. Evaluation of Biomarkers and Surrogate Endpoints in Chronic Disease. National Academies Press (US), Washington, DC 2010
        • Food and Drug Administration
        Biomarker qualification: evidentiary framework guidance for industry and FDA staff.
        (Accessed January 4, 2021)
        • Rudin M.
        Imaging readouts as biomarkers or surrogate parameters for the assessment of therapeutic interventions.
        Eur Radiol. 2007; 17: 2441-2457
        • Simon R.M.
        • Paik S.
        • Hayes D.F.
        Use of archived specimens in evaluation of prognostic and predictive biomarkers.
        J Natl Cancer Inst. 2009; 101: 1446-1452
        • Teutsch S.M.
        • Bradley L.A.
        • Palomaki G.E.
        • et al.
        The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) initiative: methods of the EGAPP working group.
        Genet Med. 2009; 11: 3-14
        • Ransohoff D.F.
        • Gourlay M.L.
        Sources of bias in specimens for research about molecular markers for cancer.
        J Clin Oncol. 2010; 28: 698-704
        • Leek J.T.
        • Scharpf R.B.
        • Bravo H.C.
        • et al.
        Tackling the widespread and critical impact of batch effects in high-throughput data.
        Nat Rev Genet. 2010; 11: 733-739
        • Qin L.X.
        • Zhou Q.
        • Bogomolniy F.
        • et al.
        Blocking and randomization to improve molecular biomarker discovery.
        Clin Cancer Res. 2014; 20: 3371-3378
        • Ransohoff D.F.
        Bias as a threat to the validity of cancer molecular-marker research.
        Nat Rev Cancer. 2005; 5: 142-149
        • Pécuchet N.
        • Laurent-Puig P.
        • Mansuet-Lupo A.
        • et al.
        Different prognostic impact of STK11 mutations in non-squamous non-small-cell lung cancer.
        Oncotarget. 2017; 8: 23831-23840
        • Mok T.S.
        • Wu Y.L.
        • Thongprasert S.
        • et al.
        Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma.
        N Engl J Med. 2009; 361: 947-957
        • Storey J.D.
        • Tibshirani R.
        Statistical significance for genomewide studies.
        Proc Natl Acad Sci U S A. 2003; 100: 9440-9445
        • Harrell Jr., F.E.
        Regression Modeling Strategies With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis.
        2nd ed. Springer, Berlin, Switzerland 2015
        • Steyerberg E.W.
        Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating.
        2nd ed. Springer International Publishing, New York, NY 2019
        • Pepe M.
        The Statistical Evaluation of Medical Tests for Classification and Prediction.
        Oxford University Press, New York, NY 2003
        • Mahar A.L.
        • Compton C.
        • McShane L.M.
        • et al.
        Refining prognosis in lung cancer: a report on the quality and relevance of clinical prognostic tools.
        J Thorac Oncol. 2015; 10: 1576-1589
        • Hastie T.
        • Tibshirani R.
        • Friedman J.
        The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
        2nd ed. Springer Science & Business Media, New York, NY 2009
        • James G.
        • Witten D.
        • Hastie T.
        • et al.
        An Introduction to Statistical Learning: With Applications in R.
        Springer, New York, NY 2013
        • Little R.J.
        • Rubin D.B.
        Statistical Analysis With Missing Data.
        John Wiley & Sons, Chichester, United Kingdom 2019
        • Boyiadzis M.M.
        • Kirkwood J.M.
        • Marshall J.L.
        • Pritchard C.C.
        • Azad N.S.
        • Gulley J.L.
        Significance and implications of FDA approval of pembrolizumab for biomarker-defined disease.
        J Immunother Cancer. 2018; 6: 35
        • Le D.T.
        • Uram J.N.
        • Wang H.
        • et al.
        Programmed death-1 blockade in mismatch repair deficient colorectal cancer.
        J Clin Oncol. 2016; 34 (103–103)
        • Marcus L.
        • Lemery S.J.
        • Keegan P.
        • Pazdur R.
        FDA approval summary: pembrolizumab for the treatment of microsatellite instability-high solid tumors.
        Clin Cancer Res. 2019; 25: 3753-3758
        • Andre F.
        • McShane L.M.
        • Michiels S.
        • et al.
        Biomarker studies: a call for a comprehensive biomarker study registry.
        Nat Rev Clin Oncol. 2011; 8: 171-176
        • Pepe M.S.
        • Feng Z.
        • Janes H.
        • Bossuyt P.M.
        • Potter J.D.
        Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design.
        J Natl Cancer Inst. 2008; 100: 1432-1438
        • Sozzi G.
        • Boeri M.
        • Rossi M.
        • et al.
        Clinical utility of a plasma-based miRNA signature classifier within computed tomography lung cancer screening: a correlative MILD trial study [published correction appears in J Clin Oncol. 2014;32:1520].
        J Clin Oncol. 2014; 32: 768-773
        • Rosell R.
        • Carcereny E.
        • Gervais R.
        • et al.
        Erlotinib versus standard chemotherapy as first-line treatment for European patients with advanced EGFR mutation-positive non-small-cell lung cancer (EURTAC): a multicentre, open-label, randomised phase 3 trial.
        Lancet Oncol. 2012; 13: 239-246
        • Hoering A.
        • Leblanc M.
        • Crowley J.J.
        Randomized phase III clinical trial designs for targeted agents.
        Clin Cancer Res. 2008; 14: 4358-4367
        • Herbst R.S.
        • Redman M.W.
        • Kim E.S.
        • et al.
        Cetuximab plus carboplatin and paclitaxel with or without bevacizumab versus carboplatin and paclitaxel with or without bevacizumab in advanced NSCLC (SWOG S0819): a randomised, phase 3 study.
        Lancet Oncol. 2018; 19: 101-114
        • Ou F.S.
        • An M.W.
        • Ruppert A.S.
        • Mandrekar S.J.
        Discussion of trial designs for biomarker identification and validation through the use of case studies.
        JCO Precis Oncol. 2019; 3 (PO.19.00051)
        • Collingridge D.
        The Lancet Oncology. Paper presented at: ASCO Virtual Annual Meeting. June 5, 2020.

        • Papillon-Cavanagh S.
        • Doshi P.
        • Dobrin R.
        • Szustakowski J.
        • Walsh A.M.
        STK11 and KEAP1 mutations as prognostic biomarkers in an observational real-world lung adenocarcinoma cohort.
        ESMO Open. 2020; 5: e000706