AI Outperforms Radiologists in TB Screening for HIV Patients, Study Finds

In a study conducted in one of China’s most tuberculosis- and HIV-affected regions, artificial intelligence (AI) showed higher sensitivity than experienced human readers in detecting active pulmonary tuberculosis among people living with HIV/AIDS. The findings, published in the Chinese Journal of Antituberculosis, underscore AI’s potential to improve early TB detection in high-risk, resource-limited settings, particularly where diagnostic expertise is scarce and disease presentation is atypical.

The research, led by Qian Wang of the National Center for Tuberculosis Control and Prevention at the Chinese Center for Disease Control and Prevention (China CDC) in Beijing and Jin-ge He of the Tuberculosis Prevention and Control Institute at the Sichuan Provincial Center for Disease Control and Prevention, evaluated three commercially available AI-powered chest X-ray interpretation systems against the judgments of three senior physicians. The study focused on 633 individuals living with HIV/AIDS who underwent active TB screening in 2019 in Butuo County, Liangshan Yi Autonomous Prefecture, Sichuan Province, a region historically burdened by co-epidemics of HIV and tuberculosis.

Tuberculosis remains the leading cause of death among people with HIV globally. In the absence of effective interventions, TB accounts for up to one-third of all HIV-related fatalities. The immunosuppressed state of HIV-positive individuals not only increases their susceptibility to Mycobacterium tuberculosis infection—by as much as 19-fold compared to the general population—but also complicates clinical diagnosis. Traditional bacteriological tests, such as sputum smear microscopy or culture, often yield false negatives in this population due to low bacillary loads or difficulties in obtaining adequate specimens. Meanwhile, chest radiographs, though widely used, present unique challenges: the radiological manifestations of TB in advanced HIV are frequently atypical, lacking the classic upper-lobe cavitation seen in immunocompetent patients, and instead showing diffuse infiltrates, mediastinal lymphadenopathy, or even normal-appearing lungs.

This diagnostic ambiguity creates a critical gap in care—especially in rural China, where access to specialized radiologists is limited. It is precisely in this context that AI-driven solutions may offer a scalable, consistent, and highly sensitive alternative.

The study employed a retrospective design, analyzing digital radiography (DR) images from all 633 participants. Among them, 47 (7.4%) were confirmed to have bacteriologically positive pulmonary TB through at least one of three gold-standard methods: sputum smear, mycobacterial culture, or molecular detection (e.g., Xpert MTB/RIF). These 47 cases served as the reference standard for evaluating diagnostic accuracy.

Three senior physicians, two from radiology departments of tertiary tuberculosis specialty hospitals in Beijing and one from a clinical TB department, each independently reviewed all chest X-rays. Collectively, they flagged 198 individuals (31.3%) as having suspected active TB. However, their individual assessments varied widely: one expert flagged 139 cases, another 100, and the third only 90. More alarmingly, 14 of the 47 confirmed TB cases (29.8%) were flagged by none of the three experts. The sensitivity of manual reading, calculated as the proportion of confirmed TB cases flagged by at least one expert, was therefore 70.2% (33 of 47). Specificity, the proportion of people without confirmed TB whom no expert flagged, stood at 71.8%.

In stark contrast, the AI systems exhibited markedly higher sensitivity. Three distinct algorithms, developed by Jiangxi Zhongke Jiufeng Smart Healthcare Technology Co., Ltd., Beijing Infervision Technology Co., Ltd., and Beijing Zhangyin Medical Technology Co., Ltd., were applied independently to the same dataset. Combined, these AI tools flagged 434 individuals (68.6%) as suspicious for active TB, a much higher rate than the human readers, reflecting a lower threshold for calling an image abnormal. Crucially, only 5 of the 47 confirmed TB cases (10.6%) were missed by all three AI platforms. The aggregate sensitivity of AI, counting a case as detected if at least one system flagged it, reached 89.4% (42 of 47), substantially outperforming human interpretation.

However, this gain in sensitivity came at the cost of specificity. The AI systems’ combined specificity was only 33.1%, meaning they flagged roughly two-thirds of the individuals without confirmed TB as suspicious. This resulted in lower overall agreement with the bacteriological reference standard (37.3% vs. 71.7% for the human readers). Yet, in the context of active case-finding among a high-risk population, public health experts often prioritize sensitivity over specificity. A highly sensitive screening tool ensures that few true cases are missed, even if it means more individuals require follow-up testing. In settings like Butuo County, where every undetected TB case risks further transmission and poor outcomes, minimizing false negatives is paramount.
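The percentages reported above can be reconstructed from the counts given in the article. The following is a minimal sketch, assuming that "flagged" means flagged by at least one reader in the group and that "agreement" means the share of all 633 participants classified in line with the bacteriological reference. The class and field names are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningResult:
    """Counts for one reader group (names are illustrative, not from the study)."""
    total: int      # people screened
    confirmed: int  # bacteriologically confirmed TB cases (reference standard)
    flagged: int    # people flagged as suspected TB by at least one reader
    missed: int     # confirmed cases flagged by no reader

    @property
    def true_pos(self) -> int:
        return self.confirmed - self.missed

    @property
    def false_pos(self) -> int:
        return self.flagged - self.true_pos

    @property
    def true_neg(self) -> int:
        return (self.total - self.confirmed) - self.false_pos

    def sensitivity(self) -> float:
        return self.true_pos / self.confirmed

    def specificity(self) -> float:
        return self.true_neg / (self.total - self.confirmed)

    def agreement(self) -> float:
        # Share of all participants classified in line with the reference.
        return (self.true_pos + self.true_neg) / self.total


# Counts as reported in the article.
human = ScreeningResult(total=633, confirmed=47, flagged=198, missed=14)
ai = ScreeningResult(total=633, confirmed=47, flagged=434, missed=5)

for name, r in (("human", human), ("AI", ai)):
    print(f"{name}: sensitivity={r.sensitivity():.1%} "
          f"specificity={r.specificity():.1%} agreement={r.agreement():.1%}")
# human: sensitivity=70.2% specificity=71.8% agreement=71.7%
# AI: sensitivity=89.4% specificity=33.1% agreement=37.3%
```

Under these assumptions the reported figures are mutually consistent: the human readers produce 33 true positives and 165 false positives, while the AI systems produce 42 true positives and 392 false positives among the 586 participants without confirmed TB.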

The study’s authors emphasize that AI should not replace clinicians but serve as a triage tool. “In resource-constrained areas with a shortage of trained radiologists, AI can act as a first-line screener,” explains Lin Zhou, the study’s corresponding author and a senior official at China CDC’s National Center for Tuberculosis Control and Prevention. “It can rapidly process hundreds of images, highlight potential abnormalities, and prioritize cases for human review, effectively extending the reach of limited medical expertise.”

This approach aligns with World Health Organization (WHO) recommendations, which, since 2020, have conditionally endorsed the use of computer-aided detection (CAD) software for TB screening in high-burden settings. WHO acknowledges that while AI systems may have lower specificity, their high sensitivity makes them suitable for initial screening, particularly when integrated into digital health platforms.

Notably, the three AI systems evaluated in the study showed considerable variation in their outputs. While each detected between 247 and 299 suspicious cases, the overlap among them was partial—leading to a combined total of 434 flagged individuals. This heterogeneity reflects the current diversity in AI training data, algorithmic architectures, and decision thresholds across vendors. It also highlights a key challenge in real-world deployment: choosing the right AI tool or potentially combining multiple systems for optimal performance.
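To illustrate why partially overlapping outputs add up to a larger combined total, here is a minimal sketch of an "any-positive" combination rule, in which a person is flagged if at least one system flags them. The patient IDs and per-system sets below are invented toy data, not values from the study, and the function names are hypothetical.

```python
def any_positive(*flag_sets: set[int]) -> set[int]:
    """Union rule: a person is flagged if at least one reader flags them."""
    combined: set[int] = set()
    for flagged in flag_sets:
        combined |= flagged
    return combined


def sensitivity(flagged: set[int], cases: set[int]) -> float:
    """Share of confirmed cases that appear in the flagged set."""
    return len(flagged & cases) / len(cases)


# Hypothetical toy data: IDs 1-4 are confirmed cases, 7-9 are non-cases.
cases = {1, 2, 3, 4}
system_a = {1, 2, 7}
system_b = {2, 3, 8}
system_c = {3, 9}

combined = any_positive(system_a, system_b, system_c)

print(sorted(combined))                # [1, 2, 3, 7, 8, 9]
print(sensitivity(system_a, cases))    # 0.5
print(sensitivity(combined, cases))    # 0.75
print(len(combined - cases))           # 3 false positives, versus 1 per system
```

In this toy example the union catches more confirmed cases than any single system, but it also accumulates every system's false positives. That is the same trade-off seen in the study's combined AI figures.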

Despite this variability, each AI platform on its own outperformed every individual human reader in sensitivity. One system detected 33 of 47 TB cases (70.2%), another 32 (68.1%), and the third 30 (63.8%), each surpassing the best-performing human expert, who identified 29 cases (61.7%). These single-reader figures are lower than the combined sensitivities of 89.4% for AI and 70.2% for the human readers because the combined figures count a case as detected when any one of the three readers flags it. AI output is also unaffected by fatigue, workload, or subjective interpretation, a major advantage in large-scale screening campaigns.

The implications extend beyond TB-HIV co-infection. As China pushes forward its “Internet + Healthcare” initiative and national strategy to end TB by 2035, AI-assisted diagnostics are gaining traction in provincial programs. In Ningxia, for example, AI chest X-ray screening has already been integrated into routine TB control activities, demonstrating feasibility and acceptance among frontline health workers.

Still, the study acknowledges limitations. The relatively small number of bacteriologically confirmed TB cases (n=47) restricts the statistical power of subgroup analyses. Additionally, the researchers did not conduct longitudinal follow-up of individuals flagged by AI but not confirmed by bacteriology—leaving open the possibility that some “false positives” might represent early or paucibacillary TB cases missed by current lab methods. Future studies with larger cohorts and clinical outcome tracking are needed to refine AI thresholds and validate long-term impact.

Moreover, ethical and operational considerations remain. Deploying AI requires robust digital infrastructure, data privacy safeguards, and continuous model validation to prevent algorithmic drift or bias. Training local staff to interpret AI outputs and integrate them into clinical workflows is equally critical.

Nevertheless, the findings represent a significant step toward precision public health. By harnessing machine learning to address a decades-old diagnostic dilemma in a vulnerable population, this research exemplifies how innovation can bridge gaps in global health equity.

As Qian Wang, co-first author and epidemiologist at China CDC, notes, “For people living with HIV in remote villages, timely TB diagnosis can mean the difference between life and death. AI doesn’t get tired, doesn’t skip shifts, and can be deployed at scale. In the fight against co-epidemics, that’s a game-changer.”

The study, titled “A study on the effect of artificial intelligence automatic film reading technology in active tuberculosis screening of HIV/AIDS population,” appears in the June 2021 issue of the Chinese Journal of Antituberculosis (DOI: 10.3969/j.issn.1000-6621.2021.06.007). The research team includes Qian Wang and Yu-hong Li from the National Center for Tuberculosis Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing; Jin-ge He from the Tuberculosis Prevention and Control Institute, Sichuan Provincial Center for Disease Control and Prevention; and Ming-ting Chen and Lin Zhou from the National Center for Tuberculosis Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing. Lin Zhou is the corresponding author.