Hyperspectral imaging (HSI) captures detailed spectral information across hundreds of bands, enabling precise land-cover classification for applications in agriculture, environmental monitoring, and mineral exploration. However, traditional classification methods such as k-nearest neighbors (KNN), support vector machines (SVMs), and convolutional neural networks (CNNs) often struggle when training data are limited, and they rarely capture both global context and fine-grained local features effectively.
To address these challenges, Huang et al. propose a Locally Enhanced Transformer Network (LETNet) for HSI classification. LETNet is designed to be lightweight yet accurate and is built from two core modules: the Multibranch Spatial–Spectral Tokenization (MSST) module and the Dual-Branch Transformer Encoder (DTE).
The MSST module extracts spatial and spectral features with depth-wise and point-wise convolutions, while residual connections preserve the raw input. This design minimizes information loss during tokenization and produces more discriminative tokens. The DTE comprises two branches: a global transformer branch, which captures long-range dependencies with standard self-attention, and a locally enhanced transformer branch, which uses superpixel segmentation and graph convolution to encode spatial priors and strengthen local feature extraction.
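The tokenization idea above can be illustrated with a small NumPy sketch. It is not the authors' MSST implementation: the 3x3 kernel size, the single ReLU-activated layer per branch, the plain summation of spatial branch, spectral branch, and raw input, and the names `depthwise_conv2d`, `pointwise_conv2d`, and `tokenize_patch` are all hypothetical choices made for illustration. The point-wise weights are assumed square (bands x bands) so the residual addition lines up.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise conv: one k x k spatial kernel per band, zero 'same' padding.

    x: (bands, H, W); kernels: (bands, k, k) with odd k.
    """
    bands, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each band is weighted only by its own kernel entry: no cross-band mixing.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pointwise_conv2d(x, weights):
    """Point-wise (1x1) conv: mix bands at every pixel. weights: (out_ch, bands)."""
    return np.einsum("oc,chw->ohw", weights, x)

def tokenize_patch(patch, dw_kernels, pw_weights):
    """Hypothetical MSST-style tokenizer: spatial + spectral branches + residual."""
    spatial = np.maximum(depthwise_conv2d(patch, dw_kernels), 0.0)   # ReLU
    spectral = np.maximum(pointwise_conv2d(patch, pw_weights), 0.0)  # ReLU
    # Residual connection keeps the raw spectra, limiting information loss.
    fused = spatial + spectral + patch
    bands, h, w = fused.shape
    # One token per pixel, each token a bands-dimensional vector: (H*W, bands).
    return fused.reshape(bands, h * w).T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.random((30, 7, 7))            # assumed: 30 bands, 7x7 patch
    dw = rng.normal(size=(30, 3, 3)) * 0.1
    pw = rng.normal(size=(30, 30)) * 0.1
    print(tokenize_patch(patch, dw, pw).shape)  # (49, 30)
```

Splitting spatial filtering (depth-wise) from spectral mixing (point-wise) keeps the parameter count far below a full 3x3 convolution over all bands, which is in the spirit of the lightweight design described here.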
An adaptive fusion strategy with learnable weights then combines the global and local features, letting the network balance the two branches to improve classification performance. Notably, LETNet uses only a single layer each of CNN and transformer encoder, which substantially reduces the number of parameters and lowers the risk of overfitting, a particular advantage when training data are scarce.
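A minimal sketch of the global branch and the adaptive fusion step follows, assuming single-head scaled dot-product self-attention and two learnable scalar logits normalized with a softmax. The paper's exact fusion formula is not reproduced here, so `adaptive_fuse` and its `logits` parameter are hypothetical; in a trained model the logits would be updated by backpropagation, whereas here they are fixed inputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens (global branch)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ v

def adaptive_fuse(global_feat, local_feat, logits):
    """Hypothetical fusion: softmax over two learnable logits weights the branches."""
    w_global, w_local = softmax(logits)
    return w_global * global_feat + w_local * local_feat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(49, 16))            # assumed: 49 tokens, dim 16
    wq, wk, wv = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))
    g = self_attention(tokens, wq, wk, wv)
    local = tokens                                # stand-in for the local branch
    fused = adaptive_fuse(g, local, np.array([0.3, -0.1]))
    print(fused.shape)  # (49, 16)
```

Normalizing with a softmax keeps both weights positive and summing to one, so the fused features stay on the same scale as the branch outputs while the model learns how much to rely on each branch.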
Extensive experiments on four benchmark datasets (PaviaU, Houston2013, LongKou, and SDFC) demonstrate LETNet’s superior accuracy, robustness, and computational efficiency. It consistently outperforms state-of-the-art CNN- and transformer-based models and remains stable and adaptable across different patch sizes and training-sample settings.
LETNet represents a significant advancement in hyperspectral image classification, offering a scalable and efficient solution for real-world remote sensing applications.
Figure 1: Overall framework of the proposed locally enhanced transformer network for HSI classification. (a) The model consists of an MSST module and a DTE. The tokenization module extracts spectral and spatial features through depth-wise and point-wise convolution layers. The DTE consists of a global transformer encoder and a locally enhanced transformer encoder, which focus on capturing the global and local structures of HSI, respectively. (b) A local graph is constructed from superpixel segmentation, providing local prior information for the local branch. (c) We develop an IMSA with local graph convolution and a Convblock to improve spatial–spectral feature extraction in the locally enhanced encoder.
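Panel (b) describes a local graph built from superpixel segmentation. One plausible construction, sketched below, connects every pair of pixels in a patch that share a superpixel and then applies a standard graph-convolution layer with the symmetric normalization D^{-1/2} A D^{-1/2} used in Kipf and Welling's GCN. The superpixel label map is assumed to be precomputed (for example by SLIC); the graph rule, the function names, and the single GCN layer are illustrative assumptions, not the authors' IMSA design.

```python
import numpy as np

def superpixel_adjacency(labels):
    """Hypothetical local graph: link every pair of pixels with the same superpixel label.

    labels: (H, W) integer map from a precomputed superpixel segmentation.
    Returns an (H*W, H*W) 0/1 adjacency; self-loops are included automatically.
    """
    flat = labels.reshape(-1)
    return (flat[:, None] == flat[None, :]).astype(float)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are >= 1 due to self-loops."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(tokens, adj_norm, weight):
    """One graph-convolution layer: ReLU(A_norm @ X @ W). tokens: (N, C_in)."""
    return np.maximum(adj_norm @ tokens @ weight, 0.0)

if __name__ == "__main__":
    labels = np.array([[0, 0], [1, 1]])          # toy 2x2 patch, two superpixels
    x = np.array([[1.0], [3.0], [5.0], [7.0]])   # one feature per pixel token
    a = normalize_adjacency(superpixel_adjacency(labels))
    print(graph_conv(x, a, np.eye(1)).ravel())
```

With this fully connected-within-superpixel graph, every normalized entry inside a superpixel of n pixels equals 1/n, so the layer replaces each pixel's features with its superpixel mean (here 2, 2, 6, 6) before the linear map. That smoothing is one way a superpixel prior can sharpen local, region-consistent features.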
Reference: S. Huang, W. Xiao, H. Chen, S. K. Bejo and H. Zhang, "Hyperspectral Image Classification Based on a Locally Enhanced Transformer Network," in IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-17, 2025, Art no. 5513217.
Link: https://doi.org/
Date of Input: 30/09/2025 | Updated: 30/09/2025 | ainzubaidah
ADMINISTRATION OFFICE
UNIVERSITI PUTRA MALAYSIA
43400 UPM SERDANG