Predicting depression onset in young people based on clinical, cognitive, environmental and neurobiological data

Abstract

Background: Adolescent onset of depression is associated with long-lasting negative consequences. Identifying adolescents at risk for developing depression would enable the monitoring of risk factors and the development of early intervention strategies. Using machine learning to combine several risk factors from multiple modalities might allow prediction of depression onset at the individual level.

Methods: A subsample of a multi-site longitudinal study in adolescents, the IMAGEN study, was used to predict future onset of major depressive disorder (MDD) or subthreshold MDD in healthy adolescents. Based on 2-year and 5-year follow-up data, participants were grouped into: 1) those who developed an MDD diagnosis or subthreshold MDD and 2) healthy controls. Baseline measurements of 145 variables from different modalities (clinical, cognitive, environmental and structural magnetic resonance imaging [MRI]) at age 14 were used as input to penalized logistic regression (with different levels of penalization) to predict depression onset in a training dataset (N=407). The features contributing most to the prediction were validated in an independent hold-out sample (3 independent IMAGEN sites; N=137).
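
The general approach described above (fit penalized logistic regression at several penalty levels on a training set, then score an independent hold-out set by AUROC) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, only 10 features are simulated rather than 145, an L2 (ridge) penalty fitted by plain gradient descent stands in for whatever penalty and solver the study used, and all function names are hypothetical. Only the sample sizes (407 and 137) come from the text.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_penalized_logreg(X, y, lam=0.1, lr=0.1, epochs=300):
    """Fit logistic regression with an L2 penalty by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty term lam * w shrinks the weights; the intercept is unpenalized.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_scores(X, w, b):
    # Predicted probability of onset for each participant.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auroc(y_true, scores):
    """AUROC: probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n, p, rng):
    # Synthetic stand-in data: only the first two features carry signal.
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(p)]
        prob = sigmoid(1.0 * xi[0] + 0.8 * xi[1] - 0.5)
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y

rng = random.Random(0)
X_train, y_train = make_synthetic(407, 10, rng)  # training set size from the text
X_hold, y_hold = make_synthetic(137, 10, rng)    # hold-out set size from the text

results = {}
for lam in (0.01, 0.1, 1.0):  # illustrative penalty levels, not the study's
    w, b = fit_penalized_logreg(X_train, y_train, lam=lam)
    results[lam] = (w, auroc(y_train, predict_scores(X_train, w, b)),
                    auroc(y_hold, predict_scores(X_hold, w, b)))
    print(f"lambda={lam}: train AUROC={results[lam][1]:.3f}, "
          f"hold-out AUROC={results[lam][2]:.3f}")
```

Stronger penalization shrinks the coefficients toward zero, which is what makes it possible to rank features by how much they contribute; the hold-out AUROC is the figure that speaks to generalization, since the training AUROC is computed on the data the model was fitted to.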

Results: The area under the receiver operating characteristic curve (AUROC) for predicting depression onset ranged from 0.70 to 0.72 in the training dataset. Baseline severity of depressive symptoms, female sex, neuroticism, stressful life events and surface area of the supramarginal gyrus contributed most to the predictive model and predicted depression onset with an AUROC of 0.68 to 0.72 in the independent validation sample.

Conclusions: This study showed that depression onset in adolescents can be predicted from a combination of multimodal data, including clinical characteristics, life events, personality traits and brain structure variables.

Keywords: Depression; adolescents; machine learning; major depressive disorder; penalized logistic regression; prediction.