P2.11-01 Novel Flexible Longitudinal Machine Learning Coupled with Patient Demographics Improves Lung Cancer Risk Prediction Using Whole Screening CTs


      The inherent variability of screening scans in clinical practice limits existing machine and deep learning techniques, which have difficulty accessing all available information. As a result, cohorts are less generalizable or require human intervention to initiate data definition. We created a deep learning prediction model for lung cancer screening that could define temporally static and time-varying imaging features and combine them with common epidemiological factors collected at the time of shared decision-making. We propose a new longitudinal lung cancer detection method, called longitudinal imaging and clinical data co-learning (LICDC), which integrates temporally flexible deep imaging features with clinical features from the PLCO model.


      A total of 722 individuals with cancer and 1072 randomly selected participants without cancer, each with more than one scan, were selected from the NLST. We initialized the deep learning model with cancer features discovered by a Kaggle contest algorithm, treating each CT scan (n=4781) in its entirety as an independent observation (temporally varying features were not created). We then applied our flexible long and short-term temporal memory methodology to extract additional longitudinal features and generate a neural network model. Training of the LICDC model used NLST participants (n=829) and scans (n=3588) together with 826 participants (367 with multiple scans) and 1193 scans from our local screening program. Ten-fold cross-validation was performed on a participant basis. Cancer risk probabilities from the imaging-only deep learning model (DLSTM) were combined with PLCO model predictions and regressed together (LICDC-Logistic Regression). Additionally, individual variables from the PLCO model and the cancer probability from DLSTM were fitted with a support vector machine (SVM) using a linear kernel (LICDC-SVM), and the area under the receiver operating characteristic curve (AUC) was estimated. All AUCs were calculated from the same 1655 patients (829 from NLST and 826 from our local screening program).
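
      Cross-validation "on a participant basis" implies that all scans from one participant stay in the same fold, so that a participant never contributes to both training and evaluation. The following is a minimal sketch of such a split, not the authors' code; the function name `participant_folds`, the scan identifiers, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def participant_folds(scan_to_participant, n_folds=10, seed=0):
    """Assign scans to folds so that all scans of a participant share one fold.

    scan_to_participant maps a scan ID to a participant ID (both hypothetical).
    Returns a list of n_folds lists of scan IDs.
    """
    participants = sorted(set(scan_to_participant.values()))
    rng = random.Random(seed)
    rng.shuffle(participants)
    # Folds are assigned per participant, never per scan.
    fold_of = {p: i % n_folds for i, p in enumerate(participants)}
    folds = defaultdict(list)
    for scan, participant in scan_to_participant.items():
        folds[fold_of[participant]].append(scan)
    return [sorted(folds[i]) for i in range(n_folds)]


# Toy data (an assumption for illustration): 25 participants, 1 to 3 scans each.
scan_to_participant = {
    f"p{p:02d}_scan{s}": f"p{p:02d}"
    for p in range(25)
    for s in range(1 + p % 3)
}
folds = participant_folds(scan_to_participant)

# Each held-out fold serves once as the test set; the rest form the training set.
for i, test_scans in enumerate(folds):
    train_scans = [s for j, f in enumerate(folds) if j != i for s in f]
```

      Splitting by scan instead would let earlier and later scans of the same person fall on both sides of the split, which tends to inflate the estimated AUC.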


      PLCO’s predicted risk and basic imaging features (Kaggle winner) had similar accuracy, with AUCs of 0.815 and 0.781, respectively. Combining risk estimates from the full longitudinal deep learning and PLCO model in a simple logistic regression increased the AUC to 0.861. Re-estimation of PLCO and imaging features with LICDC-SVM further increased the AUC to 0.918.
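
      For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen participant with cancer receives a higher predicted risk than a randomly chosen participant without cancer. The sketch below computes it directly from that definition; the function name `roc_auc` and the labels and scores are made up for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for cancer, 0 for no cancer. scores: predicted risk.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels and predicted risks (not from the study).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)  # 7 of the 9 pairs are ordered correctly
```

      The pairwise loop is quadratic in the number of participants, which is acceptable at this scale; a rank-based formulation gives the same value faster on larger cohorts.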


      Risk prediction for lung cancer in patients who are eligible for screening is improved by combining longitudinal machine learning CT imaging data with demographic information. These methods allow for risk prediction based on a complete CT imaging data set with varying time between scans and across participants without relying on a segmented nodule.


      Machine learning, lung screening, risk prediction