Dual Attention Transformers: Adaptive Linear and Hybrid Cross Attention for Remote Sensing Scene Classification
Abstract
Vision Transformers (ViTs) have demonstrated strong capabilities in capturing global contextual information compared to convolutional neural networks, making them promising for remote sensing image analysis. However, ViTs often overlook critical local features, limiting their ability to accurately interpret intricate scenes. To address this issue, we propose an adaptive linear hybrid cross attention transformer (ALHCT). It integrates adaptive linear (AL) attention and hybrid cross (HC) attention to simultaneously learn local and global features. AL attention is introduced into the ViT, reducing computational complexity from quadratic to linear. Furthermore, ALHCT incorporates two adaptive linear Swin transformers (ALSTs) to achieve multi-scale feature representation, enabling the model to capture both high-level semantics and fine details. Finally, to enhance global perception and discriminative power, HC attention fuses the local and global features captured by the two ALSTs. Experiments on three remote sensing datasets demonstrate that ALHCT significantly improves classification accuracy, outperforming several state-of-the-art methods and validating its effectiveness in classifying complex remote sensing scenes.
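The quadratic-to-linear reduction claimed for AL attention typically comes from kernelizing the softmax: with a positive feature map φ, attention can be computed as φ(Q)(φ(K)ᵀV) instead of softmax(QKᵀ)V, so cost grows linearly in sequence length. The sketch below is a generic kernelized linear attention (in the style of linear transformers, with φ(x) = elu(x) + 1), not the paper's exact AL formulation; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic kernelized linear attention sketch (NOT the paper's exact AL attention).

    Q, K: (n, d) query/key matrices; V: (n, d_v) value matrix.
    """
    # Positive feature map phi(x) = elu(x) + 1, a common choice for linear attention
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    # Associativity trick: phi(Q) @ (phi(K)^T @ V) costs O(n * d * d_v)
    # instead of the O(n^2 * d) of softmax(Q K^T) V.
    KV = Kp.T @ V                 # (d, d_v) summary of keys and values
    Z = Qp @ Kp.sum(axis=0)       # (n,) row-wise normalizer
    return (Qp @ KV) / (Z[:, None] + eps)
```

Because the (n, n) attention matrix is never materialized, memory also stays linear in n, which is what makes this practical for the long token sequences produced by high-resolution remote sensing imagery.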