China Develops Indigenous AI Acceleration for Aerospace and Defense
In the rapidly evolving landscape of artificial intelligence and embedded computing, China is making significant strides in developing autonomous, high-performance computing solutions tailored for its aerospace and defense sectors. A recent breakthrough by researchers from Xi’an Microelectronics Technology Institute has introduced a new direction in accelerating inference for embedded intelligent algorithms, designed specifically to meet the rigorous demands of space missions and advanced weapon systems.
As global powers race to integrate artificial intelligence into next-generation military and space platforms, the need for domestically developed, secure, and reliable computing hardware has become paramount. The research team—comprising Zhou Fan, Ma Zhong, Ma Yao, and Li Shen—has published a comprehensive study outlining two innovative, fully indigenous solutions aimed at overcoming the current reliance on foreign neural network processors. Their work, featured in Ship Electronic Engineering, presents a strategic blueprint for achieving technological self-reliance in one of the most critical domains of modern defense and space exploration.
The paper addresses a pressing challenge: despite rapid advancements in AI algorithms, Chinese aerospace and weapons platforms have historically lacked native, high-efficiency inference capabilities. Most existing commercial-grade neural network accelerators originate from Western companies such as NVIDIA and Intel, which, while powerful and energy-efficient, are neither autonomous nor controllable within China’s national security framework. This dependency poses significant risks, particularly for applications where data sovereignty, anti-tampering measures, and long-term sustainability are non-negotiable.
To bridge this gap, the authors propose a dual-path strategy centered around Field-Programmable Gate Arrays (FPGAs), leveraging China’s growing semiconductor independence. Unlike general-purpose GPUs or fixed-function ASICs, FPGAs offer a unique balance between flexibility and performance, allowing real-time reconfiguration to adapt to evolving neural network models—an essential feature for dynamic mission profiles in deep space or combat environments.
The first solution revolves around System-on-Chip (SoC) architectures using domestically produced FMQL45T900 chips developed by Shanghai Fudan Microelectronics. This SoC integrates four ARM processing cores with programmable logic fabric, enabling tight coupling between control software and hardware acceleration units. By utilizing High-Level Synthesis (HLS), the team demonstrated that complex neural networks like GoogLeNet, VGG-16, ResNet-50, and YOLO can be efficiently mapped onto the FPGA fabric without requiring low-level hardware description languages. This approach significantly reduces development time and lowers the barrier for software engineers to deploy AI models directly into flight-ready systems.
What sets this SoC-based design apart is its unified programming model. Using C++ as the primary development language, both the runtime scheduler running on the ARM cores and the custom IP blocks executing convolution, pooling, and activation functions are built within the same toolchain. This “software-defined accelerator” paradigm enables rapid prototyping and iterative optimization, a crucial advantage during mission-critical development cycles. In benchmark tests, the platform achieved up to 143.85 FPS on GoogLeNet and maintained stable performance across diverse models, including lightweight architectures like SqueezeNet and SuperPoint, which are ideal for resource-constrained satellites and guided munitions.
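The kind of kernel an HLS tool maps onto the fabric is, at its core, a regular loop nest. The sketch below shows a toy convolution-plus-ReLU loop nest in Python; it illustrates the operation only, and its shapes and values are invented for this article, not taken from the paper. In HLS, the inner kernel loops are what the tool would typically unroll and pipeline into parallel hardware.

```python
# Illustrative loop nest for one convolution layer followed by ReLU.
# Shapes and values are toy examples; an HLS flow would unroll and
# pipeline the inner loops into parallel multiply-accumulate hardware.

def conv2d_relu(image, kernel):
    """Naive valid-mode 2D convolution with a ReLU activation."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):                  # output rows
        for x in range(ow):              # output cols
            acc = 0.0
            for dy in range(kh):         # kernel rows (unrolled in hardware)
                for dx in range(kw):     # kernel cols (unrolled in hardware)
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            out[y][x] = max(0.0, acc)    # ReLU activation
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]
print(conv2d_relu(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

Expressing the layer this way, rather than in a hardware description language, is precisely what lets software engineers iterate on the accelerator with ordinary compiler-style tooling.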
Moreover, the SoC solution supports a wide range of neural operations beyond standard convolutions, including deconvolution, fully connected layers, and normalization techniques. Its ability to handle both classification and detection tasks makes it suitable for multifunctional payloads—from planetary landing navigation to onboard threat identification. With a compact form factor and moderate power consumption, the system aligns well with the size, weight, and power (SWaP) constraints typical of spacecraft and missile-borne electronics.
However, recognizing that some future missions will demand even higher computational throughput, the researchers also explored a second, more powerful architecture based on high-end domestic FPGAs from both Fudan Microelectronics and Shenzhen Guowei Electronics. These devices—the JFM7VX690T and SMQ7V690T—are functionally equivalent to Xilinx Virtex-7 690T FPGAs, offering massive parallelism through hundreds of thousands of logic cells and high-speed transceivers.
This alternative design adopts a heterogeneous computing model, pairing a domestic ARM processor with a standalone high-capacity FPGA connected via PCI Express. While more complex to develop than the integrated SoC, this modular approach enables superior scalability and peak performance. It allows designers to implement deeply pipelined, highly parallelized compute engines optimized for specific neural workloads, such as large-scale object detection or super-resolution imaging.
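The throughput benefit of such a pipelined host-plus-FPGA split comes largely from overlapping data movement with compute: while the FPGA works on one frame, the host streams the next one across the PCIe link. The back-of-envelope model below makes that concrete with a standard two-stage pipeline timing formula; all timings are invented for illustration and do not describe the paper's hardware.

```python
# Schematic timing model for a host + FPGA accelerator over a PCIe-like link.
# All timings (milliseconds) are invented for illustration only.

def serial_time_ms(n_frames, t_transfer, t_compute):
    """No overlap: transfer a frame, then compute on it, one at a time."""
    return n_frames * (t_transfer + t_compute)

def pipelined_time_ms(n_frames, t_transfer, t_compute):
    """Double-buffered two-stage pipeline: after the first transfer fills
    buffer A, each subsequent transfer into buffer B overlaps the compute
    on buffer A, so steady-state cost per frame is the slower stage."""
    return t_transfer + (n_frames - 1) * max(t_transfer, t_compute) + t_compute

n = 100          # frames in a burst (assumed)
t_xfer = 2.0     # ms per frame over the link (assumed)
t_comp = 5.0     # ms per frame on the FPGA engine (assumed)

print(serial_time_ms(n, t_xfer, t_comp))     # 700.0
print(pipelined_time_ms(n, t_xfer, t_comp))  # 502.0
```

Once compute dominates, throughput approaches one frame per compute interval, which is why deeply pipelined engines can keep a high-capacity FPGA saturated despite the PCIe hop.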
One of the key innovations in this architecture is its support for emerging neural primitives, including dilated convolutions, scale layers, and advanced nonlinear activation functions like PReLU and sigmoid. These features enable compatibility with state-of-the-art models that push the boundaries of accuracy and robustness under challenging conditions—such as detecting small targets in noisy infrared imagery or distinguishing decoys from real threats in missile defense scenarios.
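Two of the primitives named above are easy to state precisely. The sketch below defines PReLU and a 1D dilated convolution in plain Python; parameter values are illustrative, and the paper's hardware implementations are of course not Python.

```python
# Minimal definitions of two primitives mentioned in the text.
# Parameter values (alpha, dilation, inputs) are illustrative only.

def prelu(x, alpha=0.25):
    """Parametric ReLU: negative inputs are scaled by a learned slope
    alpha instead of being zeroed out, as in plain ReLU."""
    return x if x >= 0 else alpha * x

def dilated_conv1d(signal, kernel, dilation=2):
    """1D dilated convolution: kernel taps are spaced `dilation` samples
    apart, enlarging the receptive field without adding weights."""
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(signal[start + i * dilation] * w
                       for i, w in enumerate(kernel)))
    return out

print(prelu(-4.0))                                   # -1.0
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1]))       # [4, 6, 8]
```

The enlarged receptive field of dilated convolutions is what helps with context-hungry tasks like picking small targets out of cluttered infrared imagery, while PReLU's nonzero negative slope preserves gradient information that plain ReLU discards.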
Crucially, both proposed solutions prioritize autonomy and supply chain resilience. All components—from the silicon die to the development tools—are sourced from domestic suppliers. While early versions still rely partially on Xilinx-compatible toolchains like Vivado, Fudan Microelectronics has already introduced PROCISE, an independently developed FPGA synthesis environment, reducing dependence on foreign software ecosystems. This level of vertical integration ensures that China retains full control over the entire lifecycle of these accelerators, from design to deployment and maintenance.
The urgency behind this initiative stems from concrete operational requirements across multiple domains. For instance, lunar and Martian landers must perform real-time visual odometry and hazard avoidance without relying on ground-based command loops due to communication delays. Similarly, autonomous drones and cruise missiles require onboard AI to identify targets, evade defenses, and coordinate swarm tactics—all while operating under strict latency and power budgets.
Current satellite-based surveillance systems face additional challenges: they must process vast amounts of optical and radar data in orbit rather than transmitting raw feeds back to Earth. Onboard AI enables immediate filtering, compression, and event detection—only downlinking actionable intelligence. This capability is especially valuable for time-sensitive reconnaissance and early warning missions, where every second counts.
To quantify these needs, the authors compiled a detailed analysis of representative use cases, including missile obstacle avoidance, space-based early warning, ship detection, cloud masking, and fake target discrimination. Each application imposes distinct algorithmic and computational demands. For example, missile guidance may require MobileNet or YOLO variants running at 25 frames per second with sub-second latency, while high-resolution Earth observation might involve super-resolution reconstruction at gigapixel scales.
Despite their diversity, all scenarios share a common denominator: massive computational intensity measured in giga-multiply-accumulate operations per second (GMAC/s). Some tasks exceed 10 TFLOPS of effective compute, far surpassing what traditional onboard processors can deliver. Without dedicated AI accelerators, these missions would either fail to meet performance thresholds or consume prohibitive amounts of energy—both unacceptable trade-offs in aerospace engineering.
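Figures like these can be sanity-checked with simple arithmetic. The sketch below assumes a hypothetical detector costing 8.5 GMAC per frame, an invented figure merely in the rough range of YOLO-class models, and combines it with the 25 FPS requirement cited above.

```python
# Back-of-envelope compute demand in GMAC/s.
# The per-frame MAC count is an assumed, illustrative figure for a
# YOLO-class detector; it is NOT taken from the paper.

gmac_per_frame = 8.5   # assumed: billions of multiply-accumulates per frame
fps_required = 25      # frame rate cited for missile guidance workloads

demand_gmac_s = gmac_per_frame * fps_required
print(f"Sustained demand: {demand_gmac_s:.1f} GMAC/s")   # 212.5 GMAC/s

# One MAC is conventionally counted as 2 FLOPs (a multiply plus an add),
# so the equivalent floating-point rate is:
demand_gflops = demand_gmac_s * 2
print(f"Equivalent: {demand_gflops:.1f} GFLOPS")         # 425.0 GFLOPS
```

Scaling this from one video stream to multi-sensor payloads, or to heavier models, is how individual missions climb into the multi-TFLOPS regime the authors describe.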
Existing commercial solutions, though impressive on paper, fall short in practice. NVIDIA’s Jetson TX2 offers 1.3 TFLOPS at 7.5–15W, and Intel’s Neural Compute Stick 2 delivers 13.9 GMAC/s at just 1W—ideal for terrestrial edge devices but unsuitable for radiation-hardened space environments or classified defense systems. Moreover, their firmware and drivers are closed-source, raising concerns about backdoors, update policies, and long-term availability.
Domestic alternatives like Cambricon’s MLU accelerators and Huawei’s Kirin 970 NPU demonstrate strong peak performance, with server-class parts reaching up to 16 TFLOPS, but the fastest configurations draw 80–110W and offer limited software portability. Most are tied to consumer mobile or server ecosystems, lacking support for the real-time operating systems (RTOS) commonly used in avionics and weapon control units. None are currently qualified for spaceflight, where reliability under extreme temperatures, vacuum, and ionizing radiation is mandatory.
By contrast, the solutions proposed by Zhou Fan et al. are explicitly designed for rugged, embedded deployment. They emphasize not only raw speed but also determinism, fault tolerance, and longevity—qualities that define mission-critical systems. The use of FPGAs further enhances reliability; unlike monolithic ASICs, they can be reprogrammed in-flight to patch bugs, upgrade algorithms, or repurpose functionality mid-mission.
Another often-overlooked aspect is algorithmic agility. As AI evolves at breakneck pace, today’s cutting-edge model may be obsolete in months. Fixed-function accelerators risk becoming stranded assets unless they can adapt. The FPGA-based designs described here inherently support this evolution, allowing new layers, attention mechanisms, or sparse computation methods to be implemented through configuration updates—without changing the underlying hardware.
From a strategic perspective, this research represents more than just a technical achievement—it signals a maturation of China’s indigenous innovation ecosystem. Just a decade ago, the country relied heavily on imported microelectronics for its most sensitive programs. Today, it possesses the capability to design, fabricate, and deploy sophisticated AI-accelerated systems entirely within its own industrial base.
This shift has profound implications for global technology competition. As major militaries worldwide adopt AI-driven autonomy, access to trusted, secure, and high-performance computing becomes a cornerstone of national defense. Countries unable to produce such technologies domestically risk strategic vulnerability, whether through supply chain disruptions or covert exploitation.
China’s progress in this domain suggests it is closing the gap with Western leaders—not necessarily in terms of raw transistor count or floating-point performance, but in system-level integration, application-specific optimization, and operational readiness. The focus is no longer on catching up, but on building differentiated capabilities tailored to its unique doctrinal and environmental requirements.
For example, the emphasis on small-sample learning and small-target detection reflects real-world constraints faced by Chinese surveillance satellites, which often operate against stealthy or maneuvering adversaries with limited training data. Similarly, the interest in swarm coordination and group decision-making mirrors evolving PLA concepts of distributed warfare, where large numbers of semi-autonomous platforms act in concert.
Looking ahead, the path forward involves several key milestones. First, transitioning these prototypes into radiation-hardened, flight-qualified products capable of surviving launch stresses and years in orbit. Second, expanding software support to include mainstream frameworks like TensorFlow Lite and ONNX Runtime, ensuring seamless model migration from research labs to deployed systems. Third, establishing certification standards and testing protocols to validate safety, security, and interoperability across different platforms.
Collaboration between academia, industry, and government will be essential. While the current work originates from a state-affiliated institute, broader adoption depends on creating open interfaces, reference designs, and developer communities. Encouraging third-party contributions could accelerate innovation while maintaining core security controls.
Ultimately, the success of these efforts will be measured not by transistor density or benchmark scores, but by tangible improvements in mission effectiveness. Can a lunar rover land more safely? Can a missile avoid countermeasures more reliably? Can a satellite detect anomalous behavior faster? These are the questions that matter.
The research conducted by Zhou Fan, Ma Zhong, Ma Yao, and Li Shen provides a solid foundation upon which such capabilities can be built. By combining architectural ingenuity with a clear-eyed assessment of operational needs, they have charted a course toward truly autonomous, intelligent, and sovereign computing for China’s aerospace and defense future.
Their vision—one where machines think fast enough to keep pace with the speed of light and the complexity of war—is no longer science fiction. It is becoming engineering reality.
Zhou Fan, Ma Zhong, Ma Yao, Li Shen, Xi’an Microelectronics Technology Institute, Ship Electronic Engineering, DOI: 10.3969/j.issn.1672-9730.2021.06.021