An Improved YOLOv11-Based Method for Defect Detection in Lyophilized Vials
Abstract
Lyophilized vials are the primary packaging form for injectable pharmaceuticals. However, conventional vision-based inspection methods have shown limited effectiveness in detecting defects in these vials. The defect regions are typically small and exhibit weak feature responses, while YOLOv11 employs convolutional layers with a fixed structure, which results in a limited receptive field and insufficient cross-scale feature interaction. This diminishes the model’s ability to perceive both fine-grained textures and large-scale structural features in lyophilized vial defect detection. To address this issue, we propose a defect detection network, SAF-YOLO (Spectrum and Attention Fusion YOLO), built upon YOLOv11 and enhanced from the perspectives of spectrum perception and attention mechanisms. For spectrum perception, we introduce the Wavelet-C3K2 (WTC3K2) module into the backbone network. Leveraging wavelet-based spectral perception, this module enables the network to capture multi-spectral features, thereby expanding the receptive field without compromising the extraction of small-object features. For attention enhancement, we design two modules. First, the Global Context Feature Refine (GCFR) module is added between the backbone and neck networks, where spatial adaptive pooling and attention mechanisms improve the network’s capacity to model contextual information. Second, within the neck network, we deploy the Multi-Scale Attention Fusion Module (MSAFM), which integrates multi-branch convolutions with a dual-channel attention mechanism to further strengthen feature perception. Experimental results demonstrate that, across various typical lyophilized vial defect categories, the proposed algorithm achieves a 2.6% improvement in mAP@50 over the baseline YOLOv11, validating the effectiveness of the proposed approach.
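To illustrate the general principle behind wavelet-based spectral perception, the following is a minimal, dependency-free sketch of a single-level 2D Haar decomposition. It is not the WTC3K2 module itself, whose internal design is not specified in this abstract; the function name `haar_dwt2`, the sub-band names, and the toy 4x4 input are assumptions chosen for illustration only. The sketch shows that each sub-band has half the input resolution, so a fixed 3x3 kernel applied to a sub-band spans a 6x6 region of the original image, while a single-pixel "defect" remains visible in the detail sub-bands.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution sub-bands:
    (ll, col_diff, row_diff, diag). Illustrative only, not the paper's module.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    ll, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, cd_row, rd_row, dg_row = [], [], [], []
        for c in range(0, cols, 2):
            # Values of one 2x2 block: a b / d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)  # low-pass (local average)
            cd_row.append((a - b + d - e) / 2)  # left column minus right column
            rd_row.append((a + b - d - e) / 2)  # top row minus bottom row
            dg_row.append((a - b - d + e) / 2)  # diagonal difference
        ll.append(ll_row)
        col_diff.append(cd_row)
        row_diff.append(rd_row)
        diag.append(dg_row)
    return ll, col_diff, row_diff, diag


# Toy input (hypothetical): a 4x4 image with one bright pixel as a small defect.
image = [[0.0] * 4 for _ in range(4)]
image[1][2] = 4.0

ll, col_diff, row_diff, diag = haar_dwt2(image)
print(len(ll), len(ll[0]))   # each sub-band is half the input size per axis
print(col_diff, row_diff, diag)
```

Because this Haar transform is orthonormal, the total signal energy is preserved across the four sub-bands, so downsampling in this way does not discard the small-object response; it redistributes it between the low-frequency and detail channels.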