Airdoc Secures China’s First Class III AI Diagnostic Software Approval for Diabetic Retinopathy
In a landmark move for artificial intelligence in healthcare, Shanghai EagleVision Medical Technology Co., Ltd. — widely known by its brand name Airdoc — has become the first company in China to receive a Class III medical device certification from the National Medical Products Administration (NMPA) for AI-powered diagnostic software designed specifically to detect diabetic retinopathy from retinal fundus images. The approval, granted in early August 2020, marks a watershed moment not only for Airdoc but for the broader ecosystem of AI-enabled clinical decision support tools in China — and potentially sets a global precedent for how regulators can pragmatically evaluate and approve autonomous diagnostic algorithms.
What makes this certification especially significant is not just the technological sophistication of the product itself, but the sheer complexity of the regulatory journey it undertook: over two and a half years of iterative refinement, cross-institutional collaboration, clinical validation, and dialogue with regulators — all without a clear precedent to follow. At every step, Airdoc’s team had to invent new protocols, bridge institutional silos, and respond dynamically to evolving expectations from both clinical and regulatory stakeholders. Their experience offers a rare, behind-the-scenes window into how AI-driven medical devices can — and perhaps must — be brought to market in a way that balances innovation with safety, accuracy with accessibility, and ambition with humility.
From Algorithm to Application: The Birth of a Diagnostic Assistant
The story begins in early 2017. At that time, deep learning had already demonstrated revolutionary potential in image recognition tasks — from facial identification to autonomous driving. Within medical imaging, researchers were publishing increasingly convincing studies showing convolutional neural networks (CNNs) could match or even outperform human experts in detecting pathologies in radiological and ophthalmic images.
But demonstration is not deployment. For Airdoc, the challenge wasn’t just training a model that worked well in a lab setting — it was building a system that could function reliably, ethically, and legally in the chaotic, high-stakes environment of real-world clinical care.
The core product — software that analyzes single-field, non-mydriatic retinal fundus photographs to detect referable diabetic retinopathy (DR) — was designed not to replace physicians, but to augment frontline screening. In China, where over 140 million people live with diabetes and only a limited number of retina specialists are available, especially in rural areas, early detection of DR remains a public health priority. Delays in diagnosis often result in irreversible vision loss. A scalable, automated screening tool, if built responsibly, could dramatically expand access.
The foundational technology leveraged state-of-the-art architectures like Inception-ResNet and ResNet, refined through years of internal R&D. But the real innovation lay not in the network depth or parameter count — it lay in the system surrounding the model: the data governance, the validation scaffolding, the user-interface safeguards, and — crucially — the interpretability and failure-mode transparency embedded into the clinical workflow.
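The article does not describe Airdoc's training pipeline, but the general technique it names, fine-tuning established CNN backbones such as ResNet for retinal image classification, can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration in PyTorch; the framework, the ResNet-50 variant, the binary label scheme, and all hyperparameters are assumptions rather than Airdoc's actual configuration.

```python
# Illustrative sketch only: fine-tuning a pretrained ResNet for binary
# referable-DR classification. Architecture choice, class count, and
# hyperparameters are assumptions, not Airdoc's actual configuration.
import torch
import torch.nn as nn
from torchvision import models

def build_dr_classifier(num_classes: int = 2) -> nn.Module:
    # Start from an ImageNet-pretrained backbone and replace the classification head.
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone

model = build_dr_classifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on a batch of preprocessed fundus images.
images = torch.randn(8, 3, 512, 512)   # stand-in tensor; real inputs would be fundus photos
labels = torch.randint(0, 2, (8,))     # 1 = referable DR, 0 = non-referable
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```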
Building the Foundation: Data, Annotation, and Trust
One of the earliest bottlenecks Airdoc faced was the absence of nationally recognized standards or reference datasets for AI validation in ophthalmology. Unlike radiology — where institutions like the U.S. NIH had already launched large-scale public imaging repositories — China lacked a regulatory-grade fundus image database suitable for benchmarking AI performance.
Rather than wait for such infrastructure to appear, Airdoc proactively partnered with the China National Institutes for Food and Drug Control (NIFDC), commonly referred to as the Zhongjianyuan or “Central Institute.” Together, they initiated the creation of a multi-center, geographically diverse fundus image bank — ultimately sourcing images from 11 hospitals across 10 provinces. To ensure annotation quality, dozens of ophthalmologists with at least five years’ clinical experience were recruited to label the images using standardized DR grading criteria derived from the Chinese Guidelines for Clinical Diagnosis and Treatment of Diabetic Retinopathy.
This joint effort did more than just produce a dataset — it established a collaborative precedent. For the first time, a private AI developer and a national regulatory science institution were co-developing the very benchmarks that would later be used to evaluate the developer’s own product. This unusual alignment of interests — innovation and oversight converging — became a cornerstone of the eventual regulatory success.
Once the reference database was in place, Airdoc commissioned formal type testing, working side-by-side with NIFDC engineers to define test methodologies for an entirely new class of device. Unlike traditional hardware, where failure modes are mechanical and measurable, AI software failures are often silent, statistical, and context-dependent. How do you “stress-test” an algorithm? What constitutes an acceptable false-negative rate when human blindness is at stake?
These questions had no textbook answers. So the teams iterated — defining edge cases, simulating poor image quality, injecting synthetic artifacts, and probing the system’s behavior under degraded inputs. The resulting test report didn’t just assert performance; it narrated how the system behaved across a spectrum of real-world variability — a narrative that would later prove invaluable during regulatory review.
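No single script captures that iterative process, but the general shape of such a robustness probe can be sketched. The snippet below is a hypothetical illustration: it degrades fundus images with synthetic blur, noise, and brightness shifts and records how a model's referable-DR probability drifts. The degradation grid, severity levels, and the `model_predict` interface are all assumptions introduced for the sketch.

```python
# Illustrative robustness probe: apply synthetic degradations to fundus images
# and record how the model's output probability shifts under each condition.
import numpy as np
import cv2

def degrade(img: np.ndarray, blur_ksize: int = 0, noise_sigma: float = 0.0,
            brightness: float = 1.0) -> np.ndarray:
    out = img.astype(np.float32) * brightness
    if blur_ksize > 1:
        k = blur_ksize | 1                      # GaussianBlur requires an odd kernel size
        out = cv2.GaussianBlur(out, (k, k), 0)
    if noise_sigma > 0:
        out = out + np.random.normal(0, noise_sigma, out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

def robustness_sweep(model_predict, images):
    """Return mean referable-DR probability across a grid of degradation levels."""
    results = []
    for blur in (0, 5, 11, 21):
        for sigma in (0.0, 10.0, 25.0):
            probs = [model_predict(degrade(img, blur, sigma)) for img in images]
            results.append({"blur": blur, "noise": sigma,
                            "mean_prob": float(np.mean(probs))})
    return results
```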
Clinical Validation: Borrowing Wisdom, Adapting Locally
With technical validation complete, Airdoc turned to the clinical stage. Here, the team looked outward — specifically, to the United States, where IDx-DR (developed by IDx, which has since rebranded as Digital Diagnostics) had become the first autonomous AI diagnostic system cleared by the FDA in 2018.
Airdoc’s clinical trial design drew heavily from IDx-DR’s pivotal study: multi-center, prospective, real-world deployment in primary care settings, with ophthalmologists serving as the reference standard. But replication wasn’t enough. To satisfy China’s regulatory and clinical expectations, the protocol was carefully localized.
For instance, the grading criteria for DR severity were adapted to align precisely with Chinese clinical guidelines — not just in terminology, but in the interpretation of lesions like microaneurysms, hemorrhages, and exudates, which can vary in presentation across populations. Moreover, the qualifications for the “gold standard” graders were elevated: only board-certified retinal specialists with documented experience in DR management were permitted to adjudicate ambiguous cases.
The trial was conducted across three Grade III-A tertiary hospitals — including the renowned Beijing Tongren Hospital — enrolling hundreds of patients with varying stages of diabetes and retinal health. Importantly, images were captured using multiple makes and models of commercially available fundus cameras, testing the software’s generalizability across hardware — a frequent concern raised by regulators wary of overfitting to proprietary devices.
Statistical analysis focused not only on overall sensitivity and specificity — which exceeded pre-specified targets — but also on subgroup performance: How did the algorithm perform in patients with cataracts? In those with small pupils? In darker fundi? These granular analyses helped dispel the common misconception that “AI is a black box”: here was a system whose behavior could be interrogated, mapped, and bounded.
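For readers who want to reproduce this style of subgroup reporting on their own data, a minimal sketch follows. It computes sensitivity and specificity with Wilson 95% confidence intervals for each subgroup; the data layout, subgroup names, and example values are illustrative assumptions and do not reproduce the trial's actual results.

```python
# Illustrative subgroup analysis: sensitivity and specificity with Wilson
# 95% confidence intervals, computed separately per subgroup.
import math

def wilson_ci(k: int, n: int, z: float = 1.96):
    if n == 0:
        return (float("nan"), float("nan"))
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)

def sens_spec(records):
    """records: iterable of (ai_positive: bool, truth_positive: bool) pairs."""
    tp = sum(1 for a, t in records if a and t)
    fn = sum(1 for a, t in records if not a and t)
    tn = sum(1 for a, t in records if not a and not t)
    fp = sum(1 for a, t in records if a and not t)
    sens = tp / max(tp + fn, 1)
    spec = tn / max(tn + fp, 1)
    return sens, wilson_ci(tp, tp + fn), spec, wilson_ci(tn, tn + fp)

# Hypothetical usage: group adjudicated results by a subgroup label before scoring.
by_subgroup = {"cataract": [(True, True), (False, True), (False, False)],
               "no_cataract": [(True, True), (False, False), (True, False)]}
for name, recs in by_subgroup.items():
    print(name, sens_spec(recs))
```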
Navigating the Regulatory Labyrinth: Innovation Status and the Power of Persistence
No amount of technical or clinical rigor guarantees regulatory approval — especially in a nascent field. In China, Airdoc faced stiff competition: several other domestic startups were racing toward the same finish line. Knowing that the standard review pathway for Class III devices could take years, Airdoc opted for a bold strategy: apply for Innovative Medical Device designation, which offers priority review and direct expert consultation.
The first attempt in early 2019 was rejected — not due to technical flaws, but because key clinical evidence (including final trial results and peer-reviewed publications) hadn’t yet been completed. Rather than protest, Airdoc treated the rejection as feedback. Within months, the team finalized the clinical trial report, submitted a manuscript to Yan Ke Xue Bao (the Journal of Ophthalmology), and supplemented the application with additional real-world pilot data from community health centers.
The revised submission was approved within a month.
This pivot — from setback to structured response — exemplifies a pattern that ran throughout the entire registration process: anticipate, adapt, document. Every concern raised by NMPA reviewers — whether about image quality control, operator training, or the ethical boundary between “assistive” and “autonomous” — was met not with defensiveness, but with expanded documentation, updated user interface warnings, and even workflow modifications embedded directly into the software.
For example, the system includes an explicit image quality assessment module. If an uploaded image is too blurry, poorly centered, or obscured by cataract, the software doesn’t guess — it refuses to analyze and displays a plain-language message: “Image quality is low — possible causes include small pupil, lens opacity, or improper capture. Please retake the photo or consult a physician.” This “fail-safe” design principle — graceful refusal over confident error — became a major point of trust with regulators.
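The article describes this fail-safe behavior without disclosing the underlying checks, so the sketch below is purely illustrative: a simple pre-inference quality gate built from generic blur and brightness heuristics. The specific heuristics, thresholds, and return structure are assumptions, not Airdoc's implementation.

```python
# Illustrative "fail-safe" quality gate: analysis is refused when simple
# quality heuristics fail, mirroring the graceful-refusal behavior described
# in the text. Thresholds and heuristics are assumptions.
import cv2
import numpy as np

REFUSAL_MESSAGE = ("Image quality is low; possible causes include small pupil, "
                   "lens opacity, or improper capture. Please retake the photo "
                   "or consult a physician.")

def assess_quality(img_bgr: np.ndarray) -> bool:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # low variance suggests blur
    brightness = float(gray.mean())                     # very dark frames suggest small pupil or media opacity
    return sharpness > 50.0 and 20.0 < brightness < 235.0

def screen(img_bgr: np.ndarray, model_predict):
    if not assess_quality(img_bgr):
        return {"status": "refused", "message": REFUSAL_MESSAGE}
    return {"status": "ok", "referable_dr_probability": model_predict(img_bgr)}
```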
Similarly, the output screen never states a diagnosis outright. Instead, it presents a recommendation: “Referable diabetic retinopathy detected — patient should be referred to an ophthalmologist within 30 days.” Crucially, the interface emphasizes that the final clinical decision rests solely with the physician. Legal disclaimers, workflow diagrams, and training materials all reinforce this chain of responsibility — a non-negotiable requirement for any AI system positioned near the diagnostic endpoint.
Beyond the Certificate: A New Infrastructure for AI in Medicine
Perhaps the most enduring legacy of Airdoc’s registration journey is not the certificate itself, but the institutional pathways it helped forge.
During the review phase, the NMPA’s Center for Medical Device Evaluation (CMDE), in partnership with the China Academy of Information and Communications Technology (CAICT), launched the Artificial Intelligence Medical Device Innovation Cooperation Platform — a consortium of over ten hospitals, research institutes, and companies aimed at standardizing evaluation frameworks, sharing best practices, and accelerating responsible translation.
Airdoc’s case became one of the platform’s first major use cases — not as a finished product to be copied, but as a living case study in how to build, validate, and regulate AI in medicine.
Key regulatory guidance documents published during this period — including the Review Points for Deep Learning-Based Decision Support Medical Devices and the Technical Guidelines for Medical Device Software Registration — were shaped in part by the questions and challenges surfaced during Airdoc’s review. In effect, the company didn’t just comply with the rules — it helped write them.
This symbiotic relationship between innovator and regulator challenges the outdated narrative of tech companies “disrupting” healthcare from the outside. Instead, Airdoc’s path reveals a more sustainable model of co-evolution, in which developers engage early, listen deeply, document meticulously, and design with clinical humility, and regulators provide clear, iterative feedback, invest in technical capacity, and remain open to revising frameworks as evidence accumulates.
Looking Ahead: From DR to Multi-Modal, Multi-Disease Screening
With regulatory clearance in hand, Airdoc has begun rolling out its software in community health stations, physical examination centers, and primary care clinics — often integrated into existing ophthalmic screening workflows. Early reports suggest significant increases in DR detection rates, especially among asymptomatic patients who might otherwise skip specialist visits.
But this is just the beginning. The underlying platform is being extended to detect other retinal conditions — glaucomatous optic neuropathy, age-related macular degeneration, even systemic markers like hypertension and anemia inferred from vascular patterns. Future versions may fuse retinal data with electronic health records, wearable sensor streams, and even speech or gait analysis — not to replace clinicians, but to surface subtle correlations invisible to the human eye.
Critically, Airdoc has committed to post-market surveillance as rigorously as pre-market validation. Real-world performance data — including false positives, user feedback, and diagnostic concordance over time — is being collected and analyzed continuously. This isn’t compliance theater; it’s recognition that AI models, like clinicians, must keep learning.
A Blueprint for the Future
Airdoc’s journey offers more than a success story — it offers a blueprint.
For developers: Start with clinical need, not algorithm novelty. Partner with regulators early — not as gatekeepers, but as collaborators. Invest as much in data governance and user experience as in model architecture. And design for failure — because in medicine, how a system breaks matters more than how it shines.
For regulators: Create sandboxed pathways for high-potential innovations, but couple them with stringent documentation expectations. Publish clear, evolving guidance — and revise it publicly as learning accumulates. Most importantly, build in-house AI literacy; without technical fluency, oversight becomes either overly permissive or stiflingly risk-averse.
And for clinicians and patients: Demand transparency. Ask how the AI was trained, how it was tested, and how uncertainty is communicated. Support tools that extend human judgment — not obscure or automate it.
China’s first Class III AI diagnostic approval is not an endpoint. It’s a foundation — one built not on hype, but on collaboration, humility, and an unwavering commitment to the patient at the center of the image.
CAO Xiaoli, CHEN Yuzhong
Shanghai EagleVision Medical Technology Co., Ltd. (Airdoc), Shanghai 200030, China
Yan Ke Xue Bao (Journal of Ophthalmology), 2021, 36(1): 111–114
doi:10.3978/j.issn.1000-4432.2021.01.17