Enhanced YOLOv5 with Attention Mechanism Boosts Underwater Treasure Detection Accuracy
In the rapidly evolving domain of marine robotics and aquaculture automation, a new breakthrough promises to transform how underwater treasures—such as sea cucumbers, sea urchins, and scallops—are detected and harvested. Researchers from Shenyang Ligong University and Liaoning Technical University have developed an advanced object detection model that significantly improves the accuracy and efficiency of identifying these high-value marine species in complex underwater environments. Dubbed CG-YOLOv5, the model integrates cutting-edge techniques in attention mechanisms, multi-scale detection, and lightweight network design to achieve state-of-the-art performance.
This innovation arrives at a critical juncture. Traditional methods of harvesting underwater delicacies—ranging from trawling nets to manual diving—have long posed serious ecological and safety concerns. Trawling damages seabed ecosystems, while manual collection is labor-intensive, costly, and hazardous for divers. As global demand for sustainable seafood rises, the aquaculture industry is increasingly turning to autonomous underwater vehicles (AUVs) equipped with computer vision systems. However, existing detection algorithms often falter in real-world underwater conditions due to poor visibility, color distortion, dynamic lighting, and occlusion—challenges that CG-YOLOv5 is specifically engineered to overcome.
At the heart of this advancement lies a strategic fusion of three key improvements over the baseline YOLOv5 architecture. First, the team incorporated the Convolutional Block Attention Module (CBAM), a dual-path attention mechanism that simultaneously refines features along channel and spatial dimensions. By emphasizing informative regions and suppressing irrelevant background noise, CBAM enables the network to better distinguish small, camouflaged, or partially obscured targets—common scenarios in natural seabed settings.
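To make the two-stage attention concrete, here is a minimal pure-Python sketch of CBAM's channel-then-spatial refinement. It assumes a tiny shared two-layer MLP for the channel branch and scalar weights standing in for CBAM's 7×7 convolution in the spatial branch; the actual module uses learned convolutional layers throughout, so treat this only as an illustration of the data flow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap, w1, w2):
    """Channel branch: global avg- and max-pooled descriptors pass through a
    shared 2-layer MLP, are summed, and squashed to per-channel weights.
    fmap is a C x H x W nested list; w1 is hidden x C, w2 is C x hidden."""
    C = len(fmap)
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]

    def mlp(v):
        h = [max(0.0, sum(w1[i][j] * v[j] for j in range(C))) for i in range(len(w1))]
        return [sum(w2[i][k] * h[k] for k in range(len(h))) for i in range(C)]

    a, m = mlp(avg), mlp(mx)
    return [sigmoid(a[i] + m[i]) for i in range(C)]

def spatial_attention(fmap, wa, wm):
    """Spatial branch: per-position channel average and maximum, combined by
    scalar weights wa/wm (a stand-in for CBAM's 7x7 conv over the two maps)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    att = []
    for y in range(H):
        row = []
        for x in range(W):
            vals = [fmap[c][y][x] for c in range(C)]
            row.append(sigmoid(wa * sum(vals) / C + wm * max(vals)))
        att.append(row)
    return att

def cbam(fmap, w1, w2, wa, wm):
    """Apply channel attention, then spatial attention, to a feature map."""
    ca = channel_attention(fmap, w1, w2)
    refined = [[[ca[c] * v for v in row] for row in fmap[c]] for c in range(len(fmap))]
    sa = spatial_attention(refined, wa, wm)
    return [[[sa[y][x] * refined[c][y][x] for x in range(len(sa[0]))]
             for y in range(len(sa))] for c in range(len(refined))]
```

Because both attention maps lie in (0, 1), the module can only rescale activations, emphasizing informative channels and positions while suppressing the rest.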
Second, the researchers expanded YOLOv5’s original three detection scales to four. This multi-scale enhancement allows the model to capture objects across a broader range of sizes with greater fidelity. Using K-means clustering on bounding box annotations from a real-world dataset, they recalibrated anchor boxes to better match the morphological characteristics of sea cucumbers, sea urchins, and scallops. The addition of a new detection head at an earlier network layer significantly boosts sensitivity to smaller targets, which are notoriously difficult to detect in underwater imagery.
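The anchor recalibration step can be sketched as a K-means loop that, as is standard for YOLO-family models, clusters bounding-box width/height pairs using 1 − IoU as the distance rather than Euclidean distance. This is an illustrative reimplementation of the general technique, not the authors' exact code:

```python
import random

def wh_iou(box, anchor):
    """IoU of two boxes aligned at the origin (width/height only)."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; distance metric is 1 - IoU, so
    each box is assigned to the anchor it overlaps most."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            i = max(range(k), key=lambda j: wh_iou(b, anchors[j]))
            clusters[i].append(b)
        new = []
        for i, cl in enumerate(clusters):
            if cl:  # new anchor = mean width/height of its cluster
                new.append((sum(b[0] for b in cl) / len(cl),
                            sum(b[1] for b in cl) / len(cl)))
            else:   # empty cluster: keep the old anchor
                new.append(anchors[i])
        if new == anchors:
            break
        anchors = new
    return sorted(anchors, key=lambda a: a[0] * a[1])  # small to large
```

Sorting by area matches how the anchors are then distributed across the four detection scales, smallest anchors to the earliest (highest-resolution) head.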
Third, and perhaps most impactful for real-world deployment, is the integration of Ghost modules to replace standard Bottleneck blocks in the backbone network. Ghost modules generate feature maps through a combination of conventional convolutions and efficient linear operations, drastically reducing computational overhead without sacrificing representational power. The resulting Ghost-Bottleneck structure slashes both parameter count and floating-point operations (FLOPs), making the model far more suitable for edge devices aboard underwater robots with limited processing capacity and power budgets.
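The parameter savings of a Ghost module are easy to quantify. In the general GhostNet scheme, a primary convolution produces only 1/s of the output maps and cheap d×d depthwise "linear" operations generate the rest; the arithmetic below uses illustrative channel counts, not the paper's exact layer dimensions:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def ghost_params(c_in, c_out, k, s=2, d=3):
    """Ghost module: a primary conv makes c_out/s intrinsic maps, then cheap
    d x d depthwise ops generate the remaining (s-1)/s of the maps."""
    intrinsic = c_out // s
    primary = k * k * c_in * intrinsic
    cheap = d * d * intrinsic * (s - 1)
    return primary + cheap

std = conv_params(256, 256, 3)        # 589,824 parameters
gho = ghost_params(256, 256, 3, s=2)  # 294,912 + 1,152 = 296,064
print(std, gho, round(std / gho, 2))  # compression ratio approaches s = 2
```

The same ratio applies, to a first approximation, to FLOPs, which is why swapping Bottleneck blocks for Ghost-Bottlenecks shrinks both model size and compute roughly in proportion.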
The team named their enhanced backbone CGDarkNet-53, reflecting the combined influence of CBAM and Ghost modules. Unlike YOLOv5’s CSP (Cross Stage Partial) architecture, which splits feature pathways to improve gradient flow, CG-YOLOv5 employs a streamlined CSP variant that maintains full gradient integration while benefiting from attention-guided feature selection and lightweight computation.
To validate their approach, the researchers constructed a dataset comprising 781 high-resolution images captured by underwater robots during the Zhanjiang Underwater Robot Competition. These images depict real seabed environments with natural lighting, sediment interference, and varying object densities—conditions far more challenging than synthetic or laboratory-controlled datasets. The images were manually annotated with three classes (sea cucumber, sea urchin, and scallop) and split into training and test sets at a 9:1 ratio. Crucially, no image enhancement or preprocessing—such as dehazing or contrast adjustment—was applied, ensuring the model's robustness under authentic operational conditions.
Training was conducted on a system powered by an NVIDIA RTX 2080 Ti GPU, using PyTorch 1.6.0, CUDA 10.2, and cuDNN 7.6.5. The input resolution was standardized at 640×600 pixels, and the Adam optimizer was employed with a learning rate of 0.01 and batch size of 16. These choices reflect a balance between convergence speed and hardware accessibility, underscoring the model’s practicality for real-world deployment.
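For reference, the reported setup can be collected into a simple configuration dictionary. The key names here are illustrative, not the authors' actual configuration schema:

```python
# Training configuration as reported in the paper; structure and key names
# are hypothetical conveniences for this summary.
TRAIN_CFG = {
    "framework": "PyTorch 1.6.0",
    "cuda": "10.2",
    "cudnn": "7.6.5",
    "gpu": "NVIDIA RTX 2080 Ti",
    "input_size": (640, 600),  # pixels, as reported
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "batch_size": 16,
}
```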
The results were compelling. CG-YOLOv5 achieved a mean Average Precision (mAP) of 95.67%—a 5.49 percentage point improvement over the original YOLOv5, which scored 90.18%. Class-specific performance was equally impressive: sea cucumbers were detected with 96.93% AP, sea urchins at 95.73%, and scallops at 94.35%. These gains were not achieved at the cost of speed: the model averaged just 0.023 seconds per image (roughly 43 frames per second), slower than YOLOv5's 0.015 seconds but significantly faster than alternatives like PP-YOLOv2 (0.077 s) and YOLOX (0.169 s).
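The headline mAP is simply the mean of the per-class APs, each of which is the area under a class's precision-recall curve. A minimal sketch of the standard VOC-style all-point interpolated AP, together with a check that the reported per-class values reproduce the 95.67% figure:

```python
def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision-recall curve after
    making precision monotonically non-increasing from right to left."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))

# mAP is the mean of per-class APs; the paper's reported values recover
# the headline figure.
per_class = [0.9693, 0.9573, 0.9435]  # sea cucumber, sea urchin, scallop
map_score = sum(per_class) / len(per_class)
print(f"mAP = {map_score:.2%}")  # prints "mAP = 95.67%"
```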
Ablation studies further confirmed the individual contributions of each modification. Adding CBAM alone increased mAP by 4.62 points, while introducing the fourth detection scale added 3.72 points. The Ghost module reduced model size by over 27% (from 7.3 million to 5.3 million parameters) and cut FLOPs by one-third, all while slightly improving accuracy—demonstrating that intelligent architectural design can defy the typical accuracy-efficiency trade-off.
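The "over 27%" size reduction follows directly from the reported parameter counts:

```python
# Sanity-check of the reported model-size reduction from the Ghost modules.
params_yolov5 = 7.3e6  # original YOLOv5 parameters
params_cg = 5.3e6      # CG-YOLOv5 parameters
reduction = (params_yolov5 - params_cg) / params_yolov5
print(f"parameter reduction: {reduction:.1%}")  # prints "parameter reduction: 27.4%"
```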
When benchmarked against seven leading object detectors—including SSD, Faster R-CNN, YOLOv4, PP-YOLO, and YOLOX—CG-YOLOv5 consistently outperformed them in both precision and robustness. Notably, algorithms like YOLOv4 and PP-YOLO struggled severely with sea urchin detection, achieving AP scores below 45%, likely due to their inability to handle texture-poor, low-contrast objects against rocky or sandy seabeds. In contrast, CG-YOLOv5 reliably detected even partially buried or clustered specimens, thanks to its attention-guided feature refinement and enhanced multi-scale fusion.
The implications extend beyond academic achievement. For commercial aquaculture operations, higher detection accuracy translates directly into reduced harvest time, lower operational costs, and minimized ecological disruption. Autonomous robots equipped with CG-YOLOv5 could selectively harvest mature specimens while avoiding juveniles or non-target species—a critical step toward sustainable marine resource management. Moreover, the model’s lightweight design enables deployment on compact, battery-powered AUVs, expanding access to small-scale fisheries that lack the infrastructure for large robotic systems.
Looking ahead, the research team acknowledges room for further refinement. While the current model operates on raw underwater imagery, future iterations may integrate pre-processing stages—such as underwater image restoration or color correction—to further boost performance in extremely turbid conditions. Additionally, the framework could be extended to detect additional species or incorporate depth estimation for 3D localization, enabling more sophisticated robotic manipulation.
This work also highlights a broader trend in AI-driven marine technology: the shift from generic, off-the-shelf models to domain-specialized architectures. As underwater datasets grow and domain knowledge accumulates, we can expect increasingly tailored solutions that respect the unique physics and biology of marine environments. CG-YOLOv5 exemplifies this paradigm—engineered not just for “object detection,” but for the specific visual and operational challenges of benthic aquaculture.
From a scientific standpoint, the study demonstrates the power of modular innovation. Rather than proposing an entirely new architecture, the authors strategically enhanced existing components—attention, scaling, and efficiency—each validated through rigorous ablation. This approach not only accelerates development but also ensures compatibility with established YOLO ecosystems, facilitating adoption by practitioners.
In conclusion, CG-YOLOv5 represents a significant leap forward in underwater object detection. By harmonizing attention mechanisms, multi-scale analysis, and lightweight design, it delivers unprecedented accuracy without compromising speed or deployability. As marine robotics transitions from experimental prototypes to commercial tools, such innovations will be indispensable in building a sustainable, efficient, and intelligent blue economy.
Lin Sen (School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China), Liu Meiyi and Tao Zhiyong (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China). Published in Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(18): 307–314. DOI: 10.11975/j.issn.1002-6819.2021.18.035.