Abstract
Cardiovascular disease (CVD) remains the leading cause of mortality in patients with rheumatoid arthritis (RA), yet accurate risk stratification in this population remains challenging. Our previous cohort study showed that both standard models for the general population and RA-specific models (e.g., the ERS-RA score) had suboptimal accuracy in predicting 10-year CVD risk among Chinese RA patients, with areas under the curve (AUC) of 0.756~0.806. In real-world rheumatology practice, chest computed tomography (CT) is routinely performed for lung assessment in RA patients, offering a window of opportunity to screen for coronary and aortic calcification. The coronary artery calcium score has been shown to improve CVD risk prediction in the general population. However, it remains unclear whether incorporating these images directly into standard clinical assessment provides significant incremental value for CVD risk prediction in RA.
To develop and validate a multimodal deep learning (DL) model integrating routine non-contrast chest CT images and clinical data for the prediction of CVD risk in RA patients.
Study Population: RA patients were enrolled from our parent real-world RA cohort with longitudinal follow-up from 2010 to 2025. Patients were excluded if they had other autoimmune diseases, malignancy, follow-up of less than 3 months, or existing CVD at baseline.
Clinical Data Collection: A comprehensive dataset comprising 58 multidimensional clinical indicators was collected at baseline, including demographics, lifestyle factors, previous medication, comorbidities, clinical assessments, laboratory parameters, functional status assessed with the Stanford Health Assessment Questionnaire Disability Index (HAQ-DI), and radiographic indicators including the modified total Sharp score (mTSS).
Imaging Data Collection: For the imaging component, non-contrast chest CT scans were used for model training. Regions of interest (ROIs) covering the major coronary arteries were manually annotated by an experienced radiologist, including the left main, left anterior descending (LAD), left circumflex (LCX), and right coronary (RCA) arteries.
Outcome Definition: The primary outcome was the incidence of composite CVD events during follow-up, defined as coronary heart disease, stroke, peripheral arterial disease, heart failure, and CVD-related death.
Model Development: We developed a deep learning framework comparing two settings for CVD prediction: single-modality modeling and multi-modality fusion (Figure 1A). In the single-modality setting, we independently trained a Tabular Transformer (for 58-dimensional clinical variables) and a Vision Transformer (for 3D CT slices) to predict CVD separately. In the multi-modality setting, these transformers served as feature extractors, followed by a cross-attention module designed to align and integrate CT-derived visual features with clinical representations. This fused representation captures complementary information across modalities to generate the final CVD risk stratification.
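The fusion step described above can be illustrated with a minimal sketch of single-head cross-attention, in which clinical tokens act as queries over CT patch tokens. This is an illustration under stated assumptions, not the study's implementation: the direction of attention, the embedding size, the token counts, the random weights, the residual connection, the mean pooling, and the function name `cross_attention` are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(clin_tokens, ct_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    clin_tokens: (n_clin, d) clinical embeddings used as queries.
    ct_tokens:   (n_ct, d) CT patch embeddings used as keys and values.
    Returns fused tokens (n_clin, d) and attention weights (n_clin, n_ct).
    """
    q = clin_tokens @ w_q
    k = ct_tokens @ w_k
    v = ct_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each query row sums to 1
    fused = clin_tokens + attn @ v           # residual connection (assumed)
    return fused, attn

rng = np.random.default_rng(0)
d = 16                                       # embedding size (assumed)
clin = rng.normal(size=(58, d))              # one token per clinical variable (assumed)
ct = rng.normal(size=(64, d))                # 64 CT patch tokens (assumed)
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

fused, attn = cross_attention(clin, ct, w_q, w_k, w_v)
pooled = fused.mean(axis=0)                  # pooled vector for a classifier head
```

In this sketch each clinical token gathers a weighted summary of the CT tokens, so the pooled vector carries information from both modalities; in a trained model the projection matrices would be learned rather than random.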
Model Validation: The dataset was randomly partitioned into a training set and an independent testing set (7:3). To ensure model robustness and optimize hyperparameters, we employed 5-fold cross-validation within the training set. The final model performance was evaluated on the held-out test set using the AUC, accuracy (ACC), and F1 score. To evaluate the incremental predictive value of the multimodal DL model, we benchmarked it against four traditional models: the Framingham Risk Score (FRS), SCORE2, China-PAR, and ERS-RA.
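The partitioning scheme can be sketched in plain Python as follows. This is a minimal sketch, assuming a simple shuffled split and interleaved folds; the seed, the helper names `train_test_split_ids` and `k_fold`, and the resulting set sizes are illustrative assumptions, since the abstract does not report the exact procedure or the per-set counts.

```python
import random

def train_test_split_ids(ids, train_frac=0.7, seed=42):
    """Randomly partition ids into training and held-out test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # seed is an arbitrary assumption
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def k_fold(ids, k=5):
    """Yield (fit_ids, val_ids) pairs; each id is validated exactly once."""
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit, val

patient_ids = list(range(822))              # cohort size taken from the abstract
train_ids, test_ids = train_test_split_ids(patient_ids)
cv_splits = list(k_fold(train_ids, k=5))    # tuning happens only inside train_ids
```

The test ids never enter the cross-validation loop, which is what keeps the final evaluation independent of hyperparameter selection.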
(1) A total of 822 RA patients were enrolled, with a mean age of 54.5±12 years; 79.8% were female. The median disease duration was 6.3 [inter-quartile range (IQR): 2.0~10.9] years, and 23.4% of patients were treatment-naïve. Baseline clinical profiling revealed a median CRP of 5.6 mg/L (IQR: 3.2~24.3), CDAI of 10.0 (IQR: 3.0~23.0), HAQ-DI of 0.25 (IQR: 0.0~1.0), and mTSS of 13.7 (IQR: 4.0~38.0). Seropositivity rates were 70.1% for rheumatoid factor and 67.2% for anti-cyclic citrullinated peptide antibody.
(2) There were 145 (17.6%) smokers among all RA patients. The burden of comorbidities was characterized by hyperlipidemia (30.9%), hypertension (30.7%), type 2 diabetes mellitus (15.1%), chronic kidney disease (4.5%), and obesity (3.9%). During a total follow-up of 2934.5 person-years (mean 3.9 years), 81 (9.8%) incident CVD events were recorded among all RA patients, including 38 (4.6%) coronary heart disease, 11 (1.3%) heart failure, 28 (3.4%) stroke, 1 (0.1%) peripheral arterial disease and 3 (0.4%) CVD-related death.
(3) The multimodal DL model demonstrated superior predictive performance on the independent test set, achieving an AUC of 0.932, an ACC of 0.908, and an F1 score of 0.426 (Table 1). Its overall performance significantly exceeded that of the traditional models (AUC: 0.756~0.803, Figure 1B), as well as the unimodal models based on clinical data or chest CT images alone (AUC: 0.882~0.885). Notably, the F1 score of the multimodal DL model was substantially higher than those of the single-modality models (0.244~0.323) and the traditional models (0.148~0.295), indicating a better balance of precision and sensitivity in identifying high-risk patients.
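For readers less familiar with these metrics, the sketch below shows how AUC, accuracy, and F1 can be computed, and why accuracy can be high while F1 stays modest when events are uncommon. All labels, scores, and predictions here are hypothetical toy values, not study data, and the decision threshold used in the study is not reported in the abstract.

```python
def auc(labels, scores):
    """AUC as the probability that a random event case is scored above a
    random non-event case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_f1(labels, preds):
    """Accuracy and F1 from binary labels and binary predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(y == 1 and p == 1 for y, p in pairs)
    fp = sum(y == 0 and p == 1 for y, p in pairs)
    fn = sum(y == 1 and p == 0 for y, p in pairs)
    tn = sum(y == 0 and p == 0 for y, p in pairs)
    acc = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1

# Toy ranking example (hypothetical): 8 of 9 event/non-event pairs are ordered correctly.
toy_auc = auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.1])

# Toy imbalanced example (hypothetical): 10 events among 100 patients,
# with 4 true positives, 6 false negatives, 4 false positives, 86 true negatives.
labels = [1] * 10 + [0] * 90
preds = [1] * 4 + [0] * 6 + [1] * 4 + [0] * 86
acc, f1 = accuracy_and_f1(labels, preds)    # acc = 0.90, f1 = 8/18, about 0.444
```

Because true negatives dominate when events are rare, accuracy is driven largely by the majority class, whereas F1 depends only on how the event cases are handled.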
This study presents a novel vision transformer-based multimodal system for CVD prediction in RA utilizing routine, non-gated chest CT images with clinical data, which provides a practical and effective way to improve CVD risk screening in daily practice.
