Paper Details

Author: 柴静 (Chai Jing)

Affiliation: Peking University Third Hospital (北京大学第三医院)

Abstract

Multiple large language models (LLMs) have shown promise in the differential diagnosis of disease. However, real-world evidence on how localized DeepSeek deployment affects diagnostic performance and related costs remains limited. This study aimed to assess the effect of hospital-localized DeepSeek R1 deployment on differential diagnostic efficacy and diagnosis-related expenditures, using the differential diagnosis records of first-admitted inpatients.

This single-center comparative study was conducted in the Department of Rheumatology and Immunology, Peking University Third Hospital. Eligible participants were first-admitted inpatients with complete differential diagnosis records. The study included two cohorts: patients admitted between April and July 2024 (without DeepSeek access) and patients admitted between April and July 2025 (with real-time DeepSeek support). Of 210 screened admission records (105 per cohort), 14 were excluded (1 with incomplete data and 13 with unconfirmed diagnoses), leaving a final sample of 196 records (98 per cohort). The primary outcome was diagnostic comprehensiveness, rated on a 5-point Likert scale. Exploratory efficacy outcomes were diagnostic accuracy, clarity, and effectiveness. Secondary outcomes were diagnostic ranking and diagnosis-related expenditures. Correction for multiple comparisons was applied to the exploratory analyses. Disease composition was compared across cohorts with Fisher's exact test, and differences in diagnostic efficacy scores and diagnosis-related expenditures were analyzed with the Mann-Whitney U test.
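The Mann-Whitney U comparison named above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name `mann_whitney_u`, the normal approximation with tie correction (and no continuity correction), and the two small score lists are all assumptions made for demonstration. The lists are hypothetical and are not study data.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test.

    Uses the normal approximation with a tie correction and no
    continuity correction (an assumption; the abstract does not say
    which variant was used). Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    combined = sorted(list(x) + list(y))

    # Assign midranks: tied values share the average of their ranks.
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        # All observations are identical, so there is no evidence of a shift.
        return u_x, 1.0

    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return u_x, p_value


# Hypothetical 5-point Likert ratings, for illustration only.
hypothetical_without_llm = [4, 4, 5, 3, 4, 4, 5, 4]
hypothetical_with_llm = [5, 5, 4, 5, 5, 4, 5, 5]

u_stat, p = mann_whitney_u(hypothetical_without_llm, hypothetical_with_llm)
print(f"U = {u_stat:.1f}, two-sided p = {p:.3f}")
```

A rank-based test suits these outcomes because Likert ratings are ordinal and heavily tied, and expenditures are typically skewed. For small samples an exact test or an established statistics package would normally be preferred over this approximation.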

The DeepSeek cohort had significantly improved diagnostic comprehensiveness (median, 4 [IQR, 4.0–5.0] vs. 5 [IQR, 4.0–5.0]; P=0.005) compared with the non-DeepSeek cohort. No significant differences were observed in accuracy (median, 5 [IQR, 4.0–5.0] vs. 5 [IQR, 4.0–5.0]; P=0.995), clarity (median, 5 [IQR, 4.0–5.0] vs. 5 [IQR, 3.2–5.0]; P=0.09), effectiveness (median, 5 [IQR, 4.0–5.0] vs. 5 [IQR, 4.0–5.0]; P=0.06), top-1 diagnosis correctness (P=0.15), or diagnosis-related expenditures (median, 1065 [IQR, 727.00–1611.00] vs. 1003 [IQR, 826.00–1457.00]; P=0.98).

Localized deployment of DeepSeek R1 enhances the comprehensiveness of differential diagnoses for first-admitted inpatients with rheumatic diseases without increasing costs. These findings support the integration of localized medical LLMs into clinical workflows as a replicable paradigm for improving diagnostic scope while maintaining cost neutrality.

Keywords: Large language models; DeepSeek; Local deployment; Rheumatology; Diagnostic efficacy; Clinical implementation

Source: The 28th Rheumatology Academic Conference of the Chinese Medical Association (中华医学会第二十八次风湿病学学术会议)