Automated Speech Markers of Alzheimer Dementia: Test of Cross-Linguistic Generalizability

Journal of Medical Internet Research

J Med Internet Res. 2025 Oct 15;27:e74200. doi: 10.2196/74200.

ABSTRACT

BACKGROUND: Automated speech and language analysis (ASLA) is gaining momentum as a noninvasive, affordable, and scalable approach for the early detection of Alzheimer disease (AD). Nevertheless, the literature presents 2 notable limitations. First, many studies use computationally derived features that lack clinical interpretability. Second, a significant proportion of ASLA studies have been conducted exclusively in English speakers. These shortcomings reduce the utility and generalizability of existing findings.

OBJECTIVE: To address these gaps, we investigated whether interpretable linguistic features can reliably identify AD both within and across language boundaries, focusing on English- and Spanish-speaking patients and healthy controls (HCs).

METHODS: We analyzed speech recordings from 211 participants, encompassing 117 English speakers (58 patients with AD and 59 HCs) and 94 Spanish speakers (47 patients with AD and 47 HCs). Participants completed a validated picture description task from the Boston Diagnostic Aphasia Examination, eliciting natural speech under controlled conditions. Recordings were preprocessed and transcribed before extracting (1) speech timing features (eg, pause duration, speech segment ratios, and voice rate) and (2) lexico-semantic features (lexical category ratios, semantic granularity, and semantic variability). Machine learning classifiers were trained with data from English-speaking patients and HCs, and then tested (1) in a within-language setting (with English-speaking patients and HCs) and (2) in a between-language setting (with Spanish-speaking patients and HCs). Additionally, the features were used to predict cognitive functioning as measured by the Mini-Mental State Examination (MMSE).
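The train-on-English, test-within-and-across-languages protocol lends itself to a compact sketch. The Python snippet below (using scikit-learn and SciPy) illustrates that design; the classifier type, cross-validation scheme, and randomly generated placeholder features are assumptions for illustration, not details reported in the abstract.

# Minimal sketch of the evaluation protocol (illustrative assumptions:
# classifier choice, CV scheme, and synthetic placeholder features).
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Placeholder feature matrices: rows = participants, columns = interpretable
# features (e.g., pause duration, lexical category ratios). Real inputs would
# come from the transcribed picture-description recordings.
X_en = rng.normal(size=(117, 10))   # English speakers (58 AD, 59 HC)
y_en = np.array([1] * 58 + [0] * 59)
X_es = rng.normal(size=(94, 10))    # Spanish speakers (47 AD, 47 HC)
y_es = np.array([1] * 47 + [0] * 47)

clf = LogisticRegression(max_iter=1000)

# Within-language setting: cross-validated predictions on English speakers.
en_scores = cross_val_predict(clf, X_en, y_en, cv=5,
                              method="predict_proba")[:, 1]
print("within-language AUC:", roc_auc_score(y_en, en_scores))

# Between-language setting: train on all English data, test on Spanish data.
clf.fit(X_en, y_en)
es_scores = clf.predict_proba(X_es)[:, 1]
print("between-language AUC:", roc_auc_score(y_es, es_scores))

# MMSE prediction: fit a regressor on English speakers, then correlate
# predicted with observed MMSE scores in the Spanish-speaking sample.
mmse_en = rng.uniform(10, 30, size=117)   # placeholder MMSE values
mmse_es = rng.uniform(10, 30, size=94)
reg = Ridge().fit(X_en, mmse_en)
r, p = pearsonr(reg.predict(X_es), mmse_es)
print(f"cross-language MMSE prediction: R={r:.2f}, P={p:.3f}")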

RESULTS: In the within-language condition, combined speech timing and lexico-semantic features yielded maximal classification (area under the receiver operating characteristic curve [AUC]=0.88), outperforming single-feature models (AUC=0.79 for timing features; AUC=0.80 for lexico-semantic features). Timing features showed the strongest MMSE prediction (R=0.43, P<.001). In the between-language condition, speech timing features generalized well to Spanish speakers (AUC=0.75) and predicted Spanish-speaking patients' MMSE scores (R=0.39, P<.001). Lexico-semantic features showed lower performance (AUC=0.64) and no significant MMSE prediction (R=-0.31, P=.05). The combined model did not improve results (AUC=0.65; R=0.04, P=.79).
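The cross-language robustness of the timing features plausibly reflects their purely acoustic, language-independent nature. As a minimal illustration, the sketch below computes one such feature, a pause ratio, using a simple energy-threshold silence detector; the detector and its parameters are assumptions made for illustration, not the paper's extraction pipeline.

# Sketch of one speech timing feature (pause ratio), assuming a simple
# energy-threshold silence detector (not the paper's actual method).
import numpy as np

def pause_ratio(signal: np.ndarray, sr: int,
                frame_ms: float = 25.0, rel_threshold: float = 0.05) -> float:
    """Fraction of frames whose RMS energy falls below a relative threshold."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    threshold = rel_threshold * rms.max()
    return float((rms < threshold).mean())

# Example on a synthetic signal: 1 s of noise ("speech") + 1 s of silence.
sr = 16_000
speech = np.random.default_rng(0).normal(0, 0.3, sr)
silence = np.zeros(sr)
print(pause_ratio(np.concatenate([speech, silence]), sr))  # ~0.5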

CONCLUSIONS: These results suggest that while both timing and lexico-semantic features are informative within the same language, only speech timing features demonstrate consistent performance across languages. By focusing on clinically interpretable features, this approach supports the development of clinically usable ASLA tools.

PMID:41091545 | DOI:10.2196/74200