Dual Attention Transformers: Adaptive Linear and Hybrid Cross Attention for Remote Sensing Scene Classification
Abstract
Vision Transformers (ViTs) have demonstrated strong capabilities in capturing global contextual information compared to convolutional neural networks, making them promising for remote sensing image analysis. However, ViTs often overlook critical local features, limiting their ability to accurately interpret intricate scenes. To address this issue, we propose an adaptive linear hybrid cross attention transformer (ALHCT). It integrates adaptive linear (AL) attention and hybrid cross (HC) attention to simultaneously learn local and global features. AL attention is introduced into the ViT, reducing computational complexity from quadratic to linear. Furthermore, ALHCT incorporates two adaptive linear Swin transformers (ALSTs) to achieve multi-scale feature representation, enabling the model to capture both high-level semantics and fine details. Finally, to enhance global perception and discriminative power, HC attention fuses the local and global features captured by the two ALSTs. Experiments on three remote sensing datasets demonstrate that ALHCT significantly improves classification accuracy, outperforming several state-of-the-art methods and validating its effectiveness in classifying complex remote sensing scenes.
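The quadratic-to-linear reduction claimed for AL attention typically comes from kernelizing the softmax: with a positive feature map φ, attention can be computed as φ(Q)(φ(K)ᵀV) instead of softmax(QKᵀ)V, so cost grows linearly in sequence length. The sketch below is a generic kernelized linear attention (in the style of linear transformers, with φ(x) = elu(x) + 1), not the paper's exact AL formulation; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention sketch (NOT the paper's exact AL attention).

    Q, K: (n, d) query/key matrices; V: (n, d_v) value matrix.
    """
    # Positive feature map phi(x) = elu(x) + 1, a common choice for linear attention
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    # Associativity trick: phi(Q) @ (phi(K)^T @ V) costs O(n * d * d_v)
    # instead of the O(n^2 * d) of softmax(Q K^T) V.
    KV = Kp.T @ V                 # (d, d_v) summary of keys and values
    Z = Qp @ Kp.sum(axis=0)       # (n,) row-wise normalizer
    return (Qp @ KV) / (Z[:, None] + eps)
```

Because the (n, n) attention matrix is never materialized, memory also stays linear in n, which is what makes this practical for the long token sequences produced by high-resolution remote sensing imagery.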