High-Precision Indoor Positioning Achieved with Visible Light and IMU Fusion

In an era where autonomous systems are rapidly permeating indoor environments—from warehouses and hospitals to underground parking lots and smart retail spaces—the demand for reliable, high-accuracy indoor positioning has never been more urgent. Traditional solutions such as Wi-Fi and Bluetooth often fall short, offering only meter-level precision that is insufficient for tasks requiring centimeter-level fidelity. Ultra-wideband (UWB) systems, while more accurate, come with prohibitive costs and complex infrastructure requirements. Against this backdrop, a novel approach leveraging visible light communication (VLC) and monocular vision has emerged as a compelling alternative—delivering both precision and practicality.

A team of researchers from the Information Engineering University in Zhengzhou, China, has developed a groundbreaking indoor positioning method that fuses VLC-enabled rectangular LED light panels with single-camera visual measurement and inertial measurement unit (IMU) assistance. Their work, recently published in the Journal of Geomatics Science and Technology, demonstrates centimeter-level positioning accuracy in real-world indoor settings, with update rates exceeding 30 Hz—performance metrics that meet the stringent demands of modern mobile robotics.

At the heart of this innovation lies a clever synergy between illumination and information. Modern indoor spaces, especially commercial and industrial facilities, are increasingly equipped with flat-panel LED lighting fixtures. These are not merely sources of light; they can be transformed into intelligent positioning beacons through VLC. By modulating the light output at high frequencies imperceptible to the human eye, each LED panel can broadcast a unique identifier that encodes its precise spatial coordinates. A photodiode (PD) receiver on a mobile platform—such as a robot or a handheld device—can decode this signal in real time, instantly acquiring the absolute location of the overhead light source.
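The article does not specify the modulation scheme, so the following is only a sketch of the idea under assumptions: a hypothetical on-off-keyed broadcast in which the photodiode's intensity samples are thresholded, each bit period is majority-voted, and the decoded ID indexes a table of known panel coordinates. The ID length, bit rate, and coordinate table are all made up for illustration.

```python
import numpy as np

# Hypothetical 4-bit panel ID and coordinate table; the actual protocol,
# ID length, and bit rate used in the paper are not specified here.
PANEL_COORDS = {0b1010: (1.20, 3.40, 2.50)}  # ID -> (x, y, z) in metres

def decode_panel_id(samples, samples_per_bit=4, n_bits=4):
    """Threshold photodiode samples and majority-vote each bit period."""
    thresh = samples.mean()
    pid = 0
    for i in range(n_bits):
        chunk = samples[i * samples_per_bit:(i + 1) * samples_per_bit]
        bit = int((chunk > thresh).mean() > 0.5)  # majority vote per bit
        pid = (pid << 1) | bit
    return pid

# Simulate a noisy photodiode stream carrying ID 0b1010.
rng = np.random.default_rng(0)
signal = np.repeat([1.0, 0.0, 1.0, 0.0], 4)
samples = signal + 0.1 * rng.standard_normal(signal.size)
print(PANEL_COORDS[decode_panel_id(samples)])
```

Majority voting over several samples per bit makes the decoder tolerant of shot noise on the photodiode, which is one reason such broadcasts can run at frequencies invisible to the eye yet remain easy to decode.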

However, knowing the beacon’s coordinates is only half the solution. To determine the receiver’s own position relative to that beacon, the system must also understand its geometric relationship to the light source. This is where computer vision enters the picture. The rectangular shape of standard LED panels provides a natural visual marker with four distinct corners. Using a single CMOS camera, the system captures an image of the panel and extracts these corner points through edge detection and contour analysis. Given the known physical dimensions of the panel—0.58 meters by 0.26 meters in the experimental setup—and the camera’s intrinsic parameters (such as focal length and sensor resolution), the relative pose between the camera and the light panel can be computed via a classic photogrammetric technique known as single-image space resection, or more commonly in computer vision, the Perspective-n-Point (PnP) problem.
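The direct-homography style of recovery can be sketched in a few lines of numpy: estimate the plane-to-image homography from the four corner correspondences, then factor it as H ~ K [r1 r2 t]. The intrinsics below are invented for illustration; only the 0.58 m × 0.26 m panel size comes from the paper, and this is not the authors' exact implementation.

```python
import numpy as np

# Assumed camera intrinsics (focal length and principal point are made up);
# the panel size 0.58 m x 0.26 m matches the experimental setup.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
CORNERS = np.array([[0.0, 0.0], [0.58, 0.0], [0.58, 0.26], [0.0, 0.26]])

def homography(obj_xy, img_uv):
    """DLT estimate of the homography mapping panel-plane points to pixels."""
    A = []
    for (X, Y), (u, v) in zip(obj_xy, img_uv):
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    return np.linalg.svd(np.array(A))[2][-1].reshape(3, 3)

def pose_from_homography(Hm, K):
    """Recover R, t from H ~ K [r1 r2 t] for points on the z = 0 panel plane."""
    B = np.linalg.inv(K) @ Hm
    s = 1.0 / np.linalg.norm(B[:, 0])
    if B[2, 2] < 0:          # panel must lie in front of the camera (t_z > 0)
        s = -s
    r1, r2, t = s * B[:, 0], s * B[:, 1], s * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)  # re-orthogonalise the rotation
    return U @ Vt, t

# Round trip: project the corners with a known pose, then recover it.
t_true = np.array([0.10, 0.20, 2.45])
pts3d = np.column_stack([CORNERS, np.zeros(4)])
proj = (K @ (pts3d + t_true).T).T          # R = identity in this example
img = proj[:, :2] / proj[:, 2:]
R_est, t_est = pose_from_homography(homography(CORNERS, img), K)
```

With exact corner pixels the round trip is exact; the instability discussed next appears once real corner detections carry sub-pixel noise.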

Yet, solving PnP with only four coplanar points—especially under real-world conditions involving motion blur, varying illumination, and limited image resolution—can be unstable. Small errors in corner detection, even at the sub-pixel level, can propagate into significant positional and angular inaccuracies. This is where the IMU plays a decisive role.

The research team integrated a high-precision IMU capable of measuring tilt angles (pitch and roll) with an accuracy of 0.05° to 0.1°. While the IMU’s heading (yaw) accuracy is lower—ranging from 1° to 4°—its strength in measuring inclination is far superior to what can be reliably extracted from a single image of a distant, nearly horizontal light panel. By feeding these precise tilt measurements into the pose estimation algorithm, the system effectively decouples the rotational degrees of freedom, constraining the solution space and dramatically improving robustness.
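One way to see the benefit of this decoupling is a toy reconstruction (not the paper's exact formulation): with pitch and roll pinned by the IMU, only yaw and translation remain unknown, and translation is linear once a rotation is fixed, so even a coarse one-dimensional yaw scan recovers the pose. The intrinsics, Euler convention, and true pose below are all assumptions for illustration.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
# Panel corners (0.58 m x 0.26 m) on the z = 0 plane.
PTS3D = np.array([[0, 0, 0], [0.58, 0, 0], [0.58, 0.26, 0], [0, 0.26, 0]], float)

def rotation(roll, pitch, yaw):
    """Rz(yaw) @ Ry(pitch) @ Rx(roll) -- one common convention, assumed here."""
    cr, sr, cp, sp, cy, sy = (np.cos(roll), np.sin(roll), np.cos(pitch),
                              np.sin(pitch), np.cos(yaw), np.sin(yaw))
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def solve_translation(R, pts2d):
    """Given R, t is linear: each viewing ray must contain its 3D point."""
    Kinv = np.linalg.inv(K)
    A, b = [], []
    for X, (u, v) in zip(PTS3D, pts2d):
        d = Kinv @ np.array([u, v, 1.0])
        Dx = np.array([[0, -d[2], d[1]], [d[2], 0, -d[0]], [-d[1], d[0], 0]])
        A.append(Dx)                 # cross product with ray direction = 0
        b.append(-Dx @ (R @ X))
    t, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
    return t

def yaw_scan(roll, pitch, pts2d, n=720):
    """With tilt fixed by the IMU, search the single remaining angle."""
    best = (np.inf, 0.0, None)
    for yaw in np.linspace(-np.pi, np.pi, n, endpoint=False):
        R = rotation(roll, pitch, yaw)
        t = solve_translation(R, pts2d)
        proj = (K @ (R @ PTS3D.T + t[:, None])).T
        err = np.abs(proj[:, :2] / proj[:, 2:] - pts2d).sum()
        if err < best[0]:
            best = (err, yaw, t)
    return best[1], best[2]

# Round trip with a hypothetical true pose (tilt known, yaw unknown).
roll, pitch, yaw_true = 0.02, -0.01, 0.70
t_true = np.array([0.10, 0.20, 2.45])
R_true = rotation(roll, pitch, yaw_true)
proj = (K @ (R_true @ PTS3D.T + t_true[:, None])).T
pts2d = proj[:, :2] / proj[:, 2:]
yaw_est, t_est = yaw_scan(roll, pitch, pts2d)
```

Collapsing six degrees of freedom to four in this way is the intuition behind the robustness gain; the paper's actual algorithms fold the tilt constraint into their solvers analytically rather than by brute-force search.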

The team evaluated four distinct algorithms for solving the PnP problem: a direct homography-based method (DH), the well-known Efficient PnP (EPnP), a quaternion-based iterative method (QI), and a globally convergent orthogonal iteration method (GOI). Through extensive simulation and real-world testing, they found that while iterative methods generally offered higher accuracy, they came at the cost of increased computational load—GOI, for instance, required up to 6,000 microseconds per iteration on a standard CPU, making it impractical for real-time applications without optimization.

Crucially, once the IMU's tilt data was incorporated, even the faster direct methods improved substantially: the DH method, for example, had its positioning error cut by more than half. But the standout performer was the IMU-assisted GOI (A-GOI), which leveraged the DH solution as an initial guess to accelerate convergence. This hybrid approach achieved sub-5-centimeter 3D positioning accuracy and heading correction better than 1°—all while maintaining a processing frequency above 30 Hz on an ARM Cortex-A53 processor running at 1.4 GHz.

Real-world validation was conducted in a controlled 2 m × 2 m × 2.5 m indoor environment. The positioning module—compact enough to be mounted on a small robot—was placed 2.45 meters below a single LED panel. In rotational tests, where the module was rotated in place on a motorized turntable, the A-GOI method produced near-circular trajectories with a maximum deviation of just 62 millimeters from the reference point, compared to 112 millimeters for the IMU-assisted DH (A-DH) method. In dynamic linear motion tests, A-GOI maintained deviations under 30 millimeters from the intended path, while A-DH exhibited fluctuations up to 98 millimeters. Most impressively, the Z-axis (height) estimation with A-GOI showed a standard deviation of only 93 millimeters, compared to 243 millimeters for A-DH—demonstrating exceptional stability in depth perception, a known weakness of monocular systems.

The implications of this work extend far beyond academic interest. As indoor automation scales, the ability to localize precisely without relying on GPS or expensive infrastructure becomes a strategic advantage. VLC-based systems are inherently secure—light doesn’t penetrate walls—and immune to radio-frequency interference, making them ideal for sensitive environments like hospitals or industrial plants. Moreover, since the positioning beacons double as lighting fixtures, deployment costs are minimal: no additional hardware is needed beyond standard LED panels with embedded communication circuitry.

The researchers emphasize that their method is not intended to replace SLAM (Simultaneous Localization and Mapping) but to complement it. Visual SLAM systems, while powerful, suffer from drift over time and can fail in textureless or repetitive environments. By periodically correcting the robot’s pose using absolute references from overhead LED beacons, the proposed system can serve as a “ground truth” anchor, resetting accumulated errors and ensuring long-term stability.

This approach also sidesteps the need for complex multi-camera setups or LiDAR sensors, which remain cost-prohibitive for many applications. A single camera, a small IMU, and a photodiode—components that are now commodity items in consumer electronics—are sufficient to achieve performance that rivals far more expensive systems.

Looking ahead, the team envisions a future where every ceiling light in a smart building acts as a node in a high-precision indoor positioning network. With LED lighting already ubiquitous in modern infrastructure, the marginal cost of adding VLC capability is negligible. Combined with the computational efficiency demonstrated in this study, such a system could be deployed at scale with minimal disruption.

The research also opens new avenues for sensor fusion. Future iterations could integrate wheel odometry or barometric pressure sensors to further refine vertical positioning. Machine learning techniques might be employed to improve corner detection under challenging lighting conditions or partial occlusions. And as camera resolutions increase and processing power grows, even higher accuracy could be achieved without changing the fundamental architecture.

Critically, the system’s design adheres to principles of practicality and scalability. The use of standard rectangular panels means no custom markers are required. The communication protocol is simple and robust, designed to work within the 60-degree reception cone typical of photodiodes. And the entire pipeline—from light reception to pose estimation—runs in real time on embedded hardware, proving its viability for field deployment.

In summary, this work represents a significant leap forward in indoor positioning technology. By intelligently combining three mature technologies—visible light communication, monocular vision, and inertial sensing—the researchers have created a system that is accurate, fast, low-cost, and easy to deploy. For industries racing to automate indoor logistics, this could be the missing link that enables truly autonomous mobile robots to navigate complex environments with confidence.

As the boundaries between the digital and physical worlds continue to blur, precise spatial awareness indoors will become as essential as GPS is outdoors. Thanks to innovations like this, that future may be closer than we think.

Authors: Sun Senzhen, Li Guangyun, Wang Li, Feng Qiqiang
Affiliation: Information Engineering University, Zhengzhou 450001, China
Published in: Journal of Geomatics Science and Technology, Vol. 38, No. 3, 2021
DOI: 10.3969/j.issn.1673-6338.2021.03.007