Lightweight AI Model Boosts Real-Time Object Detection on Automotive Chips
In a significant leap for edge AI in autonomous driving, researchers from Tsinghua University and industry partners have developed a novel object detection architecture that delivers high accuracy while operating efficiently on resource-constrained automotive-grade chips. The innovation—centered on a new “center-convolution” design and quantization-aware deployment—enables real-time 3D perception using only 2D camera inputs, marking a practical step toward scalable, cost-effective autonomous systems.
The breakthrough addresses a core bottleneck in automotive AI: the tension between computational efficiency and detection performance. While deep learning models like convolutional neural networks (CNNs) have revolutionized computer vision, their high computational demands often render them impractical for in-vehicle deployment, where power, latency, and hardware constraints are severe. Most existing solutions either sacrifice accuracy for speed or rely on expensive sensors and high-end GPUs—neither viable for mass-market vehicles.
The newly proposed model, dubbed CR-CenterNet, builds on the anchor-free CenterNet framework but replaces standard 3×3 convolutions in the ResNet18 backbone with a custom “center-convolution” module. This design introduces a parallel 1×1 convolution branch that specifically enhances the representation of central features—critical for precise keypoint localization in CenterNet’s center-point detection paradigm. During training, this dual-branch structure improves feature learning without inflating inference complexity. At deployment, the 1×1 branch is seamlessly fused into the main 3×3 kernel, preserving the original computational footprint while boosting accuracy.
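The fusion step rests on the linearity of convolution: summing the outputs of a 3×3 branch and a parallel 1×1 branch is mathematically identical to running a single 3×3 convolution whose center tap has absorbed the 1×1 kernel. The paper does not publish reference code, so the following is a minimal NumPy sketch of that identity (function names and the naive convolution loop are illustrative, not the authors' implementation):

```python
import numpy as np

def fuse_center_branch(k3, k1):
    """Fold a parallel 1x1 kernel into the center tap of a 3x3 kernel.

    k3: (C_out, C_in, 3, 3) weights of the main 3x3 branch
    k1: (C_out, C_in, 1, 1) weights of the parallel 1x1 branch
    Because convolution is linear, conv(x, k3) + conv(x, k1)
    equals conv(x, k3 + pad(k1)) with k1 placed at the center.
    """
    fused = k3.copy()
    fused[:, :, 1, 1] += k1[:, :, 0, 0]
    return fused

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation, for verification only.

    x: (C_in, H, W) input feature map
    k: (C_out, C_in, kh, kw) kernel with odd kh, kw
    """
    c_out, c_in, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k[o])
    return out
```

Because the fused kernel has exactly the same shape as the original 3×3 kernel, the deployed network's layer count, memory layout, and per-frame FLOPs are unchanged; only training pays for the extra branch.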
On the ImageNet classification benchmark, the modified backbone, named C-ResNet18, achieved a top-1 accuracy of 71.59%, outperforming both the official PyTorch ResNet18 (69.76%) and a re-implemented baseline (70.75%). More importantly, when deployed as CR-CenterNet for object detection in real-world driving scenarios, the model demonstrated a 5.9 percentage point improvement in bird's-eye-view detection accuracy over its ResNet18-based counterpart, reaching 42.5% mean average precision on a custom highway dataset.
What sets this work apart is its end-to-end validation on actual automotive hardware. The team deployed the quantized CR-CenterNet model on Texas Instruments' TDA4VM system-on-chip, a widely adopted automotive-grade processor known for its balance of performance, power efficiency, and functional safety. Using quantization-aware training (QAT), they converted the model from 32-bit floating-point to 8-bit integer precision with little accuracy loss: post-quantization performance dipped only 1.4 percentage points, from 42.5% to 41.1%.
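The core of any INT8 deployment flow, including the TI toolchain used here, is an affine mapping between float32 values and 8-bit integers; QAT simulates this mapping during training so the network adapts to the rounding error. A minimal sketch of symmetric per-tensor int8 quantization (the scale convention is a common simplification and is not taken from the paper):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization.

    Maps float32 values to int8 with a single scale factor chosen so
    the largest-magnitude value lands on +/-127.
    """
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale
```

The round-trip error is bounded by half a quantization step (scale / 2), which is why a well-conditioned network can lose as little accuracy as the 1.4 points reported above.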
The efficiency gains are striking. On the TDA4 chip’s C7x DSP core, the quantized model processes a single frame in just 64 milliseconds—over 12 times faster than the same model running on a high-end server CPU (776 ms) and significantly quicker than on consumer-grade laptop GPUs. This performance enables real-time inference at over 15 frames per second, meeting the latency requirements for dynamic driving environments.
The system’s real-world utility was further validated through a clever 2D-to-3D inference pipeline. Trained exclusively on parking-lot imagery with 2D bounding boxes and ground contact points, CR-CenterNet successfully generalized to highway scenes and accurately reconstructed 3D object positions via inverse projection. This eliminates the need for costly LiDAR or stereo cameras in certain perception tasks, offering a viable path for Level 2+ and Level 3 autonomy in production vehicles.
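Inverse projection of a ground contact point works because the flat-road assumption removes the depth ambiguity of a single camera: the pixel ray is intersected with the known ground plane. A minimal sketch under simplifying assumptions (camera axis-aligned with the road, no extrinsic rotation, pinhole intrinsics `K`; the function name and setup are hypothetical, not the paper's pipeline):

```python
import numpy as np

def ground_point_to_3d(u, v, K, cam_height):
    """Back-project a ground-contact pixel (u, v) to 3D on a flat road.

    Assumes camera coordinates with x right, y down, z forward, and a
    camera mounted cam_height metres above a planar ground, so the
    ground plane is y = cam_height in the camera frame.
    K: 3x3 pinhole intrinsic matrix.
    Returns the (X, Y, Z) position in camera coordinates.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray direction
    if ray[1] <= 0:
        # ray points at or above the horizon: never meets the ground
        raise ValueError("pixel ray does not intersect the ground plane")
    t = cam_height / ray[1]  # scale the ray until it reaches y = cam_height
    return ray * t
```

Note how the geometry, not a learned model, supplies the third dimension: the network only needs to localize the 2D contact point accurately, which is exactly what the center-point detection paradigm is optimized for.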
“By co-designing the model architecture and deployment strategy, we’ve shown that high-performance perception is achievable even on modest automotive silicon,” said GONG Dahan, lead author and Ph.D. candidate at Tsinghua University’s School of Software. “The center-convolution acts like a computational magnifying glass—focusing attention where it matters most—while quantization unlocks the chip’s full potential without degrading reliability.”
The research team emphasized safety and scalability throughout development. All experiments used real-world data collected under diverse lighting and traffic conditions, and the model was tested in both parking and highway scenarios to ensure robustness. Notably, the system requires no cloud connectivity, operating entirely on-device—a critical feature for fail-operational autonomous systems where network latency or outages could be catastrophic.
This work arrives as the automotive industry intensifies its push toward software-defined vehicles. With OEMs and Tier-1 suppliers racing to deploy advanced driver-assistance systems (ADAS) at scale, efficient on-chip AI has become a strategic priority. Traditional approaches relying on over-provisioned compute hardware are giving way to algorithm-hardware co-optimization, where every watt and millisecond counts.
The TDA4 chip, used in this study, is already featured in production vehicles from major global automakers. By demonstrating that a lightweight, quantized model can deliver competitive perception performance on such platforms, the Tsinghua-led team provides a blueprint for next-generation ADAS that is both affordable and deployable today.
Moreover, the center-convolution concept is not limited to automotive applications. Its modular design can be integrated into any CNN-based vision system operating under tight resource constraints—such as drones, robotics, or mobile devices—without requiring changes to inference engines or compilers. This universality enhances its potential impact beyond autonomous driving.
Industry experts note that the real innovation lies in the holistic approach: from architectural novelty to chip-level validation. “Many papers propose clever model tweaks, but few close the loop with real hardware,” said an independent AI hardware analyst. “This team didn’t just simulate—they built, quantized, deployed, and measured. That’s the gold standard for edge AI research.”
The project was supported by China’s National Natural Science Foundation and the China Postdoctoral Innovation Talent Support Program, reflecting national strategic priorities in AI and intelligent transportation. Collaborators included researchers from the Zhuoxi Institute of Brain and Intelligence and HoloMatic Technology (Beijing) Co., Ltd.—a leading Chinese autonomous driving startup—ensuring tight alignment between academic innovation and industrial application.
Looking ahead, the team plans to extend the framework to multi-task learning—simultaneously detecting objects, estimating depth, and predicting motion—while further reducing latency. They also aim to explore compiler-level optimizations that could squeeze additional performance from automotive DSPs and NPUs.
As regulatory bodies worldwide finalize safety standards for automated driving, the demand for transparent, efficient, and verifiable AI systems will only grow. This research demonstrates that cutting-edge performance need not come at the cost of complexity or power. In the race to bring autonomy to every car, sometimes the smallest architectural changes yield the biggest real-world impact.
GONG Dahan¹,², YU Longlong³, CHEN Hui²,⁴, YANG Fan¹,², LUO Pei⁵, DING Guiguang¹,²
¹School of Software, Tsinghua University, Beijing 100084, China
²BNRist, Tsinghua University, Beijing 100084, China
³Zhuoxi Institute of Brain and Intelligence, Hangzhou 311121, China
⁴Department of Automation, Tsinghua University, Beijing 100084, China
⁵HoloMatic Technology (Beijing) Co., Ltd, Beijing 100102, China
CAAI Transactions on Intelligent Systems, 2021, 16(5): 900–907
DOI: 10.11992/tis.202107057