AI in Cancer Research: Global Trends and China’s Path Forward
Over the past decade, artificial intelligence (AI) has transitioned from a speculative concept to a transformative force across numerous scientific domains. In oncology, where early detection, accurate diagnosis, and personalized treatment remain critical challenges, AI has emerged as a pivotal tool. A comprehensive bibliometric analysis published in Cancer Research and Prevention and Treatment in 2021 sheds light on the global landscape of AI applications in cancer research between 2010 and 2019. The study, conducted by Yang Wenjing, Lü Zhangyan, Feng Xiaoshuang, Wang Wei, Ren Jiansong, Chi Hui, and Du Ranran from institutions including the National Cancer Center and the Chinese Academy of Medical Sciences, offers a detailed mapping of research output, international collaboration, thematic evolution, and emerging frontiers.
The analysis, based on 6,131 peer-reviewed articles retrieved from the Web of Science Core Collection, reveals a consistent upward trajectory in scholarly output. The volume of publications nearly doubled in the latter half of the decade, with 3,625 articles—representing 59% of the total—published between 2017 and 2019. This surge underscores a growing recognition of AI’s potential to address persistent challenges in oncology, from deciphering complex molecular mechanisms to improving clinical decision-making.
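The 59% figure follows directly from the two counts quoted above. As a quick check, the short sketch below recomputes it and also derives average annual output for each period; the per-year averages are derived here for illustration and are not figures reported in the text.

```python
# Figures quoted in the text above.
total_articles = 6131      # 2010-2019, Web of Science Core Collection
recent_articles = 3625     # published 2017-2019

recent_share = recent_articles / total_articles
earlier_articles = total_articles - recent_articles   # published 2010-2016

# Average annual output in each period (7 years vs. 3 years).
earlier_per_year = earlier_articles / 7
recent_per_year = recent_articles / 3

print(f"share published 2017-2019: {recent_share:.1%}")
print(f"articles per year, 2010-2016: {earlier_per_year:.0f}")
print(f"articles per year, 2017-2019: {recent_per_year:.0f}")
```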
At the forefront of this global research effort stands the United States. American institutions led in both quantity and quality of output, contributing 2,151 articles—the highest among all nations. More significantly, the citation frequency of U.S.-affiliated publications consistently exceeded the global average, indicating not only high productivity but also substantial academic influence. This leadership is further reinforced by the prominence of U.S. research institutions in international collaboration networks. Harvard University, the National Cancer Institute, and Yale University rank among the top ten most central institutions globally, serving as key nodes in a vast web of scientific exchange.
China, while trailing the U.S. in citation impact, has rapidly ascended as a major player in the field. With 1,341 publications, China ranks second in total output, reflecting a robust domestic research ecosystem and strong governmental support for AI development. However, the analysis highlights a critical gap: despite high volume, Chinese articles exhibit lower citation rates compared to the global benchmark. This disparity suggests that while China is producing a large body of work, the international scientific community may perceive a relative lack of groundbreaking or highly influential contributions. The study attributes this to several factors, including fragmented institutional collaboration, a focus on applied rather than foundational research, and underinvestment in core algorithmic innovation compared to the U.S.
The spatial distribution of international collaboration reveals a highly interconnected yet unevenly structured global network. The United States maintains the highest cooperation frequency, with over 2,000 recorded collaborative links, followed by China with 1,328. Yet, when measuring centrality—a metric that reflects a node’s strategic importance in connecting disparate parts of a network—U.S. institutions dominate. In contrast, Chinese research entities, despite their high volume of partnerships, occupy less central positions. The Chinese Academy of Sciences, for instance, ranks 20th in international cooperation centrality, indicating that while it engages in numerous collaborations, it does not serve as a primary bridge between different global research clusters. This structural difference suggests that U.S. institutions are more deeply embedded in the core of global knowledge production, whereas Chinese institutions, though active, often operate on the periphery of these influential networks.
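The difference between having many links and being central can be made concrete with a small sketch. It assumes the centrality in question is betweenness centrality (the text does not name the exact metric), and the network and node names below are invented purely for illustration. The function follows Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

def betweenness_centrality(graph):
    """Unnormalised betweenness for an unweighted, undirected graph
    given as {node: set_of_neighbours} (Brandes' algorithm)."""
    scores = {v: 0.0 for v in graph}
    for source in graph:
        order = []                              # nodes in order of discovery
        preds = {v: [] for v in graph}          # predecessors on shortest paths
        sigma = {v: 0 for v in graph}           # number of shortest paths
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                            # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):               # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical network: "Busy" sits inside a tightly knit cluster,
# while "Bridge" is the only route between two clusters.
edges = [("Busy", "A1"), ("Busy", "A2"), ("Busy", "A3"),
         ("A1", "A2"), ("A1", "A3"), ("A2", "A3"),
         ("A1", "Bridge"), ("Bridge", "B1"), ("B1", "B2")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

degree = {v: len(neighbours) for v, neighbours in graph.items()}
centrality = betweenness_centrality(graph)
for node in ("Busy", "Bridge"):
    print(node, "links:", degree[node], "betweenness:", centrality[node])
```

In this toy network "Busy" has more links than "Bridge" but lies on no shortest path between other nodes, which mirrors the pattern described above: many partnerships do not by themselves make an institution a bridge between clusters.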
A closer examination of research content reveals distinct thematic concentrations. Breast cancer and lung cancer emerge as the dominant foci of AI applications in oncology. These two malignancies, among the most prevalent and deadly worldwide, present rich datasets and clear clinical pathways where AI can deliver measurable impact. In breast cancer research, the evolution of AI use is particularly pronounced. Studies from the early 2010s primarily employed neural networks for tumor classification. By the mid-2010s, the focus had shifted toward enhancing radiotherapy precision through texture analysis of tumors. From 2015 onward, machine learning and deep convolutional neural networks were increasingly applied to interpret clinical imaging modalities such as CT, MRI, and ultrasound, enabling more accurate detection and characterization of lesions.
Lung cancer research followed a parallel trajectory. Initial efforts centered on identifying biomarkers using artificial neural networks. In 2013–2014, diagnostic and detection methodologies moved to the center of attention, especially in the context of squamous cell carcinoma. By 2019, machine learning had become a dominant theme, particularly in the analysis of radiological images for early-stage detection. The convergence of AI with advanced imaging technologies has proven especially valuable in lung cancer, where subtle nodules on CT scans can be difficult for human radiologists to distinguish from benign structures. AI models, trained on vast datasets, have demonstrated the ability to detect these early signs with high sensitivity, potentially reducing false negatives and enabling earlier intervention.
Beyond these two prominent cancer types, AI’s utility extends across the entire spectrum of oncological inquiry. In basic research, AI has been instrumental in analyzing gene expression patterns, modeling cellular signaling pathways, and predicting tumor growth and metastasis. The integration of AI with high-throughput technologies such as microarrays has enabled researchers to sift through thousands of genetic variables to identify those most strongly associated with cancer progression. In clinical settings, AI supports risk stratification, treatment planning, and survival prediction. Predictive models built using machine learning algorithms can integrate diverse data—genomic profiles, clinical histories, imaging features—to forecast patient outcomes and optimize therapeutic strategies.
The methodological foundations of these applications are themselves a major area of innovation. The co-citation analysis in the study identifies seminal works that have shaped the technical backbone of AI in cancer research. Key among them are foundational papers on deep learning, such as Alex Krizhevsky’s 2012 work on ImageNet classification using deep convolutional neural networks, and the 2015 paper by LeCun, Bengio, and Hinton that consolidated deep learning as a dominant paradigm. Other highly cited references include tools like LIBSVM, a library for support vector machines, and Scikit-learn, a Python-based machine learning framework widely used for data analysis and model development. These resources have democratized access to AI techniques, allowing biomedical researchers without extensive programming expertise to apply sophisticated algorithms to their data.
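For readers unfamiliar with co-citation analysis, the basic counting step can be sketched in a few lines: two references are co-cited each time a single article cites both. The reference lists below are invented for illustration and the labels are shorthand, not actual records from the study.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references appears together
    in the reference list of the same citing article."""
    counts = Counter()
    for refs in reference_lists:
        # set() guards against a reference listed twice in one article;
        # sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

# Hypothetical citing articles and the references each one lists.
citing_articles = [
    ["Krizhevsky2012", "LeCun2015", "scikit-learn"],
    ["Krizhevsky2012", "LeCun2015"],
    ["LIBSVM", "scikit-learn"],
]

counts = cocitation_counts(citing_articles)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are the ones a co-citation map draws close together, which is how clusters of methodologically related work become visible.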
The concept of “burst” keywords, terms whose usage rises sharply over a short period, provides insight into the shifting frontiers of the field. Among the most prominent burst terms are “artificial neural network,” “protein,” “model,” “discovery,” “classification,” “genetic algorithm,” and “microarray.” The rise of “artificial neural network” reflects the growing adoption of deep learning architectures capable of handling complex, non-linear relationships in biological data. The prominence of “protein” signals a shift toward proteomics, where AI is used to analyze protein expression, interaction networks, and post-translational modifications, all critical factors in cancer biology that are not always predictable from genomic data alone.
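The idea of a burst can be illustrated with a deliberately crude threshold rule: flag any year in which a keyword's share of all articles is at least twice its average share over the preceding years. This is a simplification for illustration only, not the burst-detection method used in the study (which the text does not specify), and the counts below are invented.

```python
def burst_years(keyword_counts, total_counts, ratio=2.0):
    """Return the years in which a keyword's share of all articles is at
    least `ratio` times its mean share over all earlier years."""
    years = sorted(keyword_counts)
    shares = {y: keyword_counts[y] / total_counts[y] for y in years}
    bursts = []
    for i in range(1, len(years)):
        baseline = sum(shares[y] for y in years[:i]) / i
        if baseline > 0 and shares[years[i]] >= ratio * baseline:
            bursts.append(years[i])
    return bursts

# Hypothetical yearly counts: articles using the keyword vs. all articles.
keyword_counts = {2010: 2, 2011: 2, 2012: 3, 2013: 12, 2014: 14}
total_counts = {2010: 100, 2011: 100, 2012: 100, 2013: 100, 2014: 100}

flagged = burst_years(keyword_counts, total_counts)
print(flagged)
```

Normalizing by total output matters: in a field whose publication volume grows every year, raw keyword counts rise for almost every term, so only a rising share signals a genuine shift in attention.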
The emphasis on “model” and “discovery” underscores a broader trend: the use of AI not just as a diagnostic tool but as a discovery engine. Researchers are increasingly leveraging AI to uncover novel biomarkers, identify new cancer subtypes, and propose previously unrecognized therapeutic targets. For example, unsupervised learning algorithms can cluster patient data without predefined labels, revealing hidden patterns that may correspond to distinct molecular subtypes of cancer. This capacity for hypothesis generation positions AI as a partner in scientific exploration, rather than merely a tool for validation.
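As one illustration of clustering without predefined labels, the sketch below runs a basic k-means procedure on a handful of two-feature patient profiles. k-means is chosen here only as a familiar example of unsupervised learning; the text does not say which algorithms the surveyed studies used, and the data points and starting centroids are invented.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_of(points):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points]

def kmeans(points, initial_centroids, iterations=10):
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:                  # keep the old centroid if a cluster empties
                centroids[i] = centroid_of(members)
    return assign(points, centroids), centroids

# Hypothetical profiles: two expression features per patient, no labels.
profiles = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
            (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

labels, centroids = kmeans(profiles, initial_centroids=[profiles[0], profiles[3]])
print(labels)
```

The algorithm is never told which patients belong together; the grouping emerges from the data, and whether such groups correspond to real molecular subtypes is a question for downstream biological validation.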
Despite these advances, significant challenges remain. One of the most pressing is the issue of model generalizability and reproducibility. Many AI models perform exceptionally well on the datasets on which they are trained but fail when applied to data from different institutions or populations. This limitation stems from factors such as data heterogeneity, biases in training data, and insufficient validation protocols. The study notes that concerns about overfitting, model configuration, and evaluation rigor are central to ongoing methodological research. Techniques such as dropout regularization, which randomly deactivates neurons during training to prevent overfitting, and residual learning, which enables the training of deeper networks, are being actively refined to improve model robustness.
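The two techniques named above are simple at their core. The sketch below shows them on plain Python lists, assuming the common "inverted" form of dropout in which surviving activations are rescaled during training; the function names are illustrative and do not come from any particular library.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and scale survivors by 1 / (1 - rate) so the
    expected value is unchanged. At inference time, pass values through."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def residual_block(x, transform):
    """Residual learning: the block outputs x + F(x), so the layers inside
    `transform` only need to learn a correction to the identity mapping."""
    return [xi + fi for xi, fi in zip(x, transform(x))]

rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]

print(dropout(hidden, rate=0.5, rng=rng))                    # some units zeroed
print(dropout(hidden, rate=0.5, rng=rng, training=False))    # unchanged
print(residual_block(hidden, lambda v: [0.1 * vi for vi in v]))
```

Because a different random subset of units is silenced on each training step, the network cannot rely on any single unit, which is the intuition behind dropout's regularizing effect. The skip connection in a residual block gives signals a direct path through the network, which is what makes very deep networks easier to train.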
Another challenge lies in the interpretability of AI systems. Deep learning models, often described as “black boxes,” make predictions based on complex internal representations that are difficult for humans to understand. In clinical settings, where decisions can have life-or-death consequences, the lack of transparency raises ethical and practical concerns. Clinicians may be reluctant to trust a model whose reasoning they cannot follow, and regulatory agencies require clear explanations of how decisions are made. Efforts to develop explainable AI (XAI) methods—such as attention mechanisms that highlight which parts of an image influenced a diagnosis—are gaining traction, but widespread clinical adoption remains limited.
The role of data is paramount. High-performing AI models require large, diverse, and well-annotated datasets. In oncology, access to such data is often constrained by privacy regulations, data silos within healthcare institutions, and the high cost of annotation by expert pathologists or radiologists. International data-sharing initiatives and federated learning approaches—where models are trained across decentralized data sources without transferring raw data—offer potential solutions, but they require robust governance frameworks and technical infrastructure.
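The aggregation step at the heart of federated learning can be sketched briefly. The example assumes a simple weighted-averaging scheme in which each site trains locally and sends back only its model parameters and sample count; local training, secure communication, and governance are all omitted, and the hospitals and numbers are hypothetical.

```python
def federated_average(client_updates):
    """Combine locally trained parameter vectors into one global vector,
    weighting each client by the number of samples it trained on.

    client_updates: list of (parameters, n_samples) tuples."""
    total_samples = sum(n for _, n in client_updates)
    n_params = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total_samples
        for i in range(n_params)
    ]

# Hypothetical round: two hospitals report parameters, never raw records.
updates = [
    ([1.0, 2.0], 100),   # hospital A, 100 local cases
    ([3.0, 4.0], 300),   # hospital B, 300 local cases
]

global_params = federated_average(updates)
print(global_params)
```

Only parameter vectors and counts cross institutional boundaries in this scheme, which is why it is attractive where patient records cannot be pooled. Parameter updates can still leak information about the underlying data, so this is a sketch of the idea rather than a privacy guarantee.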
For China, the study offers a clear roadmap for strengthening its position in the global AI-oncology landscape. While the country has made impressive gains in research output, elevating the impact and originality of its contributions will require strategic investments in several areas. First, fostering deeper collaboration between academic institutions, hospitals, and private technology companies can create synergies where technical expertise meets clinical data and real-world application needs. Second, increasing funding for fundamental research in AI algorithms and computational biology can help close the gap with U.S. innovation in core technologies. Third, promoting interdisciplinary training programs that produce researchers fluent in both biomedical science and computer science can cultivate the next generation of leaders in the field.
Moreover, China has a unique opportunity to leverage its vast patient population and growing digital health infrastructure to generate high-quality datasets for AI training. By establishing national data repositories with standardized formats and ethical safeguards, Chinese researchers could contribute to global efforts while advancing domestic innovation. Participation in international consortia and adherence to open science principles would further enhance the visibility and credibility of Chinese research.
The convergence of AI and oncology represents one of the most promising frontiers in modern medicine. From improving the accuracy of cancer screening to enabling personalized treatment regimens, AI has the potential to transform patient outcomes on a global scale. However, realizing this potential requires more than technological advancement; it demands a coordinated effort to address issues of data access, model transparency, clinical integration, and equitable collaboration.
As the field continues to evolve, the insights from this bibliometric analysis serve as both a benchmark and a call to action. For researchers, it highlights the importance of methodological rigor and interdisciplinary engagement. For policymakers, it underscores the need for sustained investment in foundational research and digital health infrastructure. For clinicians, it offers a vision of a future where AI augments human expertise, leading to earlier diagnoses, more effective treatments, and ultimately, a reduction in the global burden of cancer.
The trajectory of AI in cancer research is clearly upward, but the path forward must be navigated with care, collaboration, and a commitment to scientific excellence. As the authors conclude, the future of AI in oncology will be shaped not just by technological breakthroughs but by the strength of international partnerships, the depth of institutional cooperation, and the willingness to embrace cross-disciplinary innovation. The next decade promises not only more powerful algorithms but also a more integrated, ethical, and impactful application of AI in the fight against cancer.
Artificial Intelligence in Cancer Research: A Decade in Review
Yang Wenjing, Lü Zhangyan, Feng Xiaoshuang, Wang Wei, Ren Jiansong, Chi Hui, Du Ranran, National Cancer Center/Chinese Academy of Medical Sciences, Cancer Research and Prevention and Treatment, DOI: 10.3971/j.issn.1000-8578.2020.20.0657