Improving Texture Recognition via Multi-Layer Feature Aggregation from Pre-Trained Vision Architectures

Neshov, N. N.; Tonchev, K.; Bozhilov, I. B.; Petkova, R. R.; Manolova, A. H.

Authors: Neshov, N. N., Tonchev, K., Bozhilov, I. B., Petkova, R. R., Manolova, A. H.
Title: Improving Texture Recognition via Multi-Layer Feature Aggregation from Pre-Trained Vision Architectures
Keywords: DTD, FMD, GTOS-Mobile, KTH-TIPS-2, Multi-Layer Perceptron, texture recognition, transformer architectures

Abstract: Texture recognition is a fundamental task in computer vision, with diverse applications in materials science, medicine, and agriculture. The ability to analyze complex patterns in images has been greatly enhanced by advances in Deep Neural Networks and Vision Transformers. To address the challenging nature of texture recognition, this paper investigates the performance of several pre-trained vision architectures, including both CNN- and transformer-based models. For each architecture, multi-level features are extracted from early, intermediate, and final layers, concatenated, and fed into a trainable Multi-Layer Perceptron (MLP) classifier. The approach is evaluated on five publicly available texture datasets (KTH-TIPS2-b, FMD, GTOS-Mobile, DTD, and Soil), with MLP hyperparameters selected through an exhaustive grid search on one of the datasets. Extensive experiments compare the architectures and demonstrate that aggregating features from different hierarchical levels improves texture recognition in most cases, outperforming even architectures that require substantially higher computational resources. The study also shows the particular effectiveness of transformer-based models, such as BEiTv2, which achieve state-of-the-art results on four of the five examined datasets.
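The aggregation step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes that each tapped layer yields a C x H x W feature map, that each map is reduced by global average pooling (the abstract does not state the pooling method), and that the MLP has a single hidden layer. The channel counts, hidden size, class count, and all function names are hypothetical, and random numbers stand in for the activations of a real pre-trained backbone.

```python
import random

def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) to a C-dim vector."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]

def aggregate(stage_maps):
    """Pool each tapped stage and concatenate the results into one vector."""
    vector = []
    for feature_map in stage_maps:
        vector.extend(global_average_pool(feature_map))
    return vector

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_mlp(in_dim, hidden_dim, n_classes, rng):
    """Randomly initialised one-hidden-layer MLP (untrained, for shape checks only)."""
    def layer(n_out, n_in):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        return weights, [0.0] * n_out
    return layer(hidden_dim, in_dim), layer(n_classes, hidden_dim)

def mlp_forward(x, params):
    (w1, b1), (w2, b2) = params
    return linear(relu(linear(x, w1, b1)), w2, b2)

def fake_map(channels, size, rng):
    """Stand-in for a backbone activation; a real pipeline would use model outputs."""
    return [[[rng.random() for _ in range(size)] for _ in range(size)]
            for _ in range(channels)]

rng = random.Random(0)
# Hypothetical early, intermediate, and final stages: channels grow, resolution shrinks.
stage_maps = [fake_map(4, 8, rng), fake_map(8, 4, rng), fake_map(16, 2, rng)]
features = aggregate(stage_maps)             # 4 + 8 + 16 = 28 values
params = init_mlp(len(features), 32, 5, rng)
logits = mlp_forward(features, params)       # one score per hypothetical class
```

In a real setting the backbone would stay frozen and only the MLP weights would be trained on the concatenated vectors, which is what the abstract calls a trainable MLP classifier.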

Issue

Electronics (Switzerland), vol. 14, 2025, Switzerland, https://doi.org/10.3390/electronics14234779

Citations:
1. Gupta V., Mishra A., Shrivastava N., HyTexNet: Percentile-guided local encoding and deep feature fusion for enhanced texture classification, 2026, Knowledge Based Systems, issue 0, vol. 338, DOI 10.1016/j.knosys.2026.115482, ISSN 09507051 - 2026 - in publications indexed in Scopus and/or Web of Science

Type: journal article, publication in a journal with an impact factor, publication in a refereed journal, indexed in Scopus and Web of Science
