Artificial Intelligence Accuracy in Lung Nodule Detection Varies by Location, Study Finds
A new study reveals that the accuracy of artificial intelligence (AI) in detecting pulmonary nodules on chest CT scans is significantly influenced by the anatomical location of the nodules within the lungs, while nodule size and density have less impact than previously assumed. The findings, published in the peer-reviewed Sichuan Medical Journal, challenge the notion that AI performs uniformly across all types of lung nodules and emphasize the importance of radiologist oversight, particularly in complex anatomical regions.
The research, led by Luo Yi from the Department of Radiology at The First People’s Hospital of Longquanyi District, Chengdu, and conducted in collaboration with senior radiologists Yu Jianqun, Peng Liqing, and Zhang Wenzhao from the Department of Radiology at West China Hospital, Sichuan University, analyzed 220 consecutive chest CT scans. The team systematically evaluated over 1,500 confirmed lung nodules to assess the performance of an AI-assisted detection system, specifically the Infervision InferRead® CT Lung software, against the gold standard of expert human interpretation.
The primary objective was to determine whether the lung’s internal geography—defined by specific zones such as the hilar region, central zone, subpleural area, and areas of pleural adhesion—affected the AI’s ability to correctly identify nodules. The results were clear: location matters profoundly. The AI demonstrated a true positive nodule detection rate (TPNDR) of 97.1% in the subpleural region, the area immediately beneath the lung’s outer lining. This high accuracy is likely due to the relative simplicity of this region, which is less cluttered with blood vessels, bronchi, and other complex structures that can mimic nodules on imaging.
Conversely, the AI struggled in the hilar region, the central core of the lung where the main bronchi and pulmonary arteries enter. Here, the false negative nodule missed rate (FNNMR) was a significant 19.8%, the highest of any zone. This means that nearly one in five true nodules located near the lung’s center were missed by the AI. The dense network of vessels and airways in this area creates a complex background where subtle nodules can be easily obscured or misclassified by an algorithm looking for distinct, round shapes.
The study also identified the region of pleural adhesion—areas where the lung’s outer lining has scarred and stuck to the chest wall—as a major source of false positives. The false positive nodule detection rate (FPNDR) in this region was 31.6%, far exceeding the overall average of 17.3%. The AI frequently mistook the irregular, thickened tissue of a pleural adhesion or a nearby focal infection for a true pulmonary nodule. This highlights a critical limitation: AI systems, trained on vast datasets, can develop a bias toward flagging any abnormal density or irregularity, especially in areas with pre-existing scarring or inflammation.
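The three metrics reported above are simple ratios of counts. The article does not spell out the exact denominators, so the definitions below are plausible assumptions inferred from the metric names, and the counts are illustrative, not the study's data. A minimal sketch:

```python
# Hedged sketch: the study's exact denominators are not stated in this
# article; the definitions below are assumptions based on the metric names.

def detection_rates(true_nodules, ai_detections, true_positives):
    """Compute the three rates discussed in the article.

    true_nodules   -- nodules confirmed by the radiologist consensus
    ai_detections  -- nodules flagged by the AI system
    true_positives -- AI detections that match a confirmed nodule
    """
    false_positives = ai_detections - true_positives
    false_negatives = true_nodules - true_positives

    tpndr = true_positives / true_nodules    # true positive nodule detection rate
    fpndr = false_positives / ai_detections  # false positive nodule detection rate
    fnnmr = false_negatives / true_nodules   # false negative nodule missed rate
    return tpndr, fpndr, fnnmr

# Hypothetical zone-level counts, for illustration only:
tpndr, fpndr, fnnmr = detection_rates(true_nodules=400,
                                      ai_detections=420,
                                      true_positives=388)
print(f"TPNDR={tpndr:.1%}  FPNDR={fpndr:.1%}  FNNMR={fnnmr:.1%}")
```

With these hypothetical counts, the AI detects 97% of confirmed nodules while 7.6% of its flags are false alarms, the kind of zone-by-zone breakdown the study tabulates.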
The methodology was rigorous. Two senior radiologists independently reviewed the thin-slice CT images, using advanced post-processing techniques like maximum intensity projection (MIP), multiplanar reconstruction (MPR), and volume rendering (VR) to reach a consensus on the presence, location, size, and density of every true nodule. This human-verified dataset served as the benchmark against which the AI’s automated findings were compared. The lungs were divided into four distinct zones to facilitate a granular analysis of location-based performance.
The researchers then categorized the nodules not only by location but also by size (less than 5 mm, 5-10 mm, and 10-30 mm) and density (pure ground-glass, part-solid, and solid). A statistical analysis using SPSS software was performed to determine if differences in AI performance across these categories were significant.
The most striking finding was the profound impact of location. When comparing the AI’s performance across the four lung zones, the differences in TPNDR, FPNDR, and FNNMR were all statistically significant (P < 0.05). This confirms that the lung's internal architecture is a dominant factor in AI detection accuracy. The subpleural zone emerged as the AI's "sweet spot," while the hilar and pleural adhesion zones were its primary blind spots and trouble spots, respectively.
In contrast, when the researchers analyzed the data by nodule size alone, they found no statistically significant difference in the AI’s overall TPNDR, FPNDR, or FNNMR across the three size groups. This suggests that, on a broad scale, the AI is equally capable of detecting small, medium, and larger nodules. However, a deeper dive into the data revealed a crucial nuance: when size and location were analyzed together, the location effect persisted. For instance, the AI’s TPNDR for nodules of all sizes was consistently highest in the subpleural zone and lowest in the hilar zone. This means that a 3 mm nodule in the subpleural area is more likely to be detected than a 20 mm nodule in the hilar region, a counterintuitive result that underscores the overwhelming influence of anatomical context.
The analysis of nodules smaller than 5 mm revealed another important insight. While the AI’s overall FPNDR was 17.3%, a substantial portion of these false positives were very small, sub-5 mm findings. When these tiny false alarms were excluded from the calculation, the FPNDR dropped dramatically to 8.0%. This finding is critical for clinical workflow. It suggests that a significant number of AI-generated alerts for tiny nodules are likely benign artifacts, such as small vessels or minor tissue irregularities. Radiologists can use this knowledge to triage AI results, potentially spending less time investigating these numerous but low-risk findings, thereby improving efficiency without sacrificing safety.
The story with nodule density was similar to that of size. When analyzed in isolation, the AI’s performance on pure ground-glass, part-solid, and solid nodules showed no statistically significant differences. This indicates that the algorithm’s core detection mechanism is not fundamentally biased toward one density type over another. However, once again, the interaction with location revealed significant variations.
For pure ground-glass nodules (GGNs), which are often subtle and challenging to detect, the AI performed best in the hilar region, with a TPNDR of 96.3%. This was a surprising result, as the hilar region is typically difficult for both humans and machines. The reason for this high performance is not fully understood but may relate to the specific algorithm’s training data or its ability to detect certain textural patterns in that zone. However, the trade-off was a high FPNDR and FNNMR in the pleural adhesion zone, meaning GGNs were both missed and falsely identified there.
For part-solid and solid nodules, the AI’s TPNDR was highest in the subpleural zone (94.0% and 99.7%, respectively), reinforcing the idea that this is the most favorable environment for detection. The highest FNNMR for these nodules was again in the hilar region, confirming it as a high-risk zone for missed diagnoses. The false positive rates were highest in different areas: for part-solid nodules, the central zone had the highest FPNDR (28.8%), likely due to confusion with small vessels or areas of uneven perfusion; for solid nodules, the pleural adhesion zone had the highest FPNDR (27.4%), consistent with the general trend of mistaking scar tissue for nodules.
These detailed, location-specific findings have profound implications for the integration of AI into clinical radiology practice. The study’s conclusion is not that AI is unreliable, but that its reliability is context-dependent. It is an exceptionally powerful tool, capable of processing vast amounts of data and flagging potential abnormalities with high sensitivity. However, it is not a replacement for the radiologist’s expertise.
Instead, the research paints a picture of AI as a sophisticated first-line screener, a digital assistant that can handle the bulk of the work in favorable conditions. Its greatest value lies in its ability to ensure that no nodule in the easily accessible subpleural regions is overlooked. In these areas, its near-perfect detection rate can provide a safety net, especially for less experienced readers.
The real value of the human radiologist, according to this study, is in managing the AI’s known weaknesses. Radiologists must be acutely aware that the AI is most likely to miss a nodule near the lung’s center and most likely to generate a false alarm in areas of scarring. This knowledge should directly inform their reading strategy. When an AI report comes in, the radiologist should not treat all flagged nodules with equal suspicion. They should pay extra attention to the hilar region, actively searching for nodules the AI might have missed, even if the AI report is silent on that area. Conversely, they should be prepared to confidently dismiss many of the AI’s alerts in the pleural adhesion zones, knowing that a high percentage are likely to be false.
This represents a shift from a model of AI as a passive detector to a model of AI as an active partner in a diagnostic dialogue. The radiologist is no longer just reviewing an image; they are interpreting the AI’s interpretation, understanding its biases, and compensating for its limitations. This requires a new kind of expertise—a “meta-cognition” of the AI system itself.
The study also highlights the importance of continued algorithm development. The high false positive rate in pleural adhesion zones suggests that AI models need to be better trained to recognize and differentiate between benign scar tissue and malignant nodules. This could involve feeding the algorithms more diverse datasets that include a wide range of pleural pathologies, teaching them to recognize the characteristic linear, band-like appearance of adhesions versus the more rounded, spherical form of a nodule.
The research, while robust, comes with acknowledged limitations. The study population excluded patients with chronic lung diseases such as COPD and pulmonary fibrosis. These conditions create a much noisier, more complex CT background that could further challenge AI performance. Future studies will need to evaluate AI in these more difficult, real-world scenarios to gauge its true clinical utility.
Furthermore, the study did not deeply analyze the specific characteristics of the false positive findings beyond their general categories (e.g., vessel, infection, pleural thickening). A more granular analysis could pinpoint the exact visual features that are most confusing to the AI, providing direct feedback for engineers to refine the underlying algorithms.
In conclusion, this research provides a crucial, evidence-based map of AI’s performance landscape in lung nodule detection. It moves the conversation beyond simplistic metrics of overall accuracy and into the nuanced reality of clinical practice. It demonstrates that the most important factor for an AI’s success is not the nodule itself, but where that nodule lives inside the lung. By understanding these geographical biases, radiologists can harness the power of AI more effectively, using it to enhance, rather than replace, their own diagnostic acumen. The future of radiology is not human versus machine, but a sophisticated collaboration where each partner’s strengths are leveraged, and their weaknesses are compensated for, to deliver the best possible patient care.
Luo Yi, Yu Jianqun, Peng Liqing, Zhang Wenzhao. Impact of the Size and Density of Pulmonary Nodules in Different Areas on Diagnostic Accuracy of Artificial Intelligence. Sichuan Medical Journal. doi:10.16252/j.cnki.issn1004-0501-2021.09.016