Authors: Neshov, N. N., Tonchev, K., Manolova, A. H., Petkova, R. R., Bozhilov, I. B.
Title: Visual-to-Tactile Cross-Modal Generation Using a Class-Conditional GAN with Multi-Scale Discriminator and Hybrid Loss
Keywords: augmented reality, conditional Generative Adversarial Network (cGAN), cross-modal generation, haptic feedback, hybrid loss, LMT-108 dataset, multi-scale discriminator, texture-to-tactile translation

Abstract: Understanding surface textures through visual cues is crucial for applications in haptic rendering and virtual reality. However, accurately translating visual information into tactile feedback remains a challenging problem. To address this challenge, this paper presents a class-conditional Generative Adversarial Network (cGAN) for cross-modal translation from texture images to vibrotactile spectrograms, using samples from the LMT-108 dataset. The generator is adapted from pix2pix and enhanced with Conditional Batch Normalization (CBN) at the bottleneck to incorporate texture class semantics. A dedicated label predictor, based on DenseNet-201 and trained separately prior to cGAN training, provides the conditioning label. The discriminator is derived from pix2pixHD and uses a multi-scale architecture with three discriminators, each comprising three downsampling layers. A grid search over multi-scale discriminator configurations shows that this setup yields the best perceptual similarity as measured by the Learned Perceptual Image Patch Similarity (LPIPS) metric. The generator is trained with a hybrid loss that combines adversarial, L1, and feature-matching losses derived from intermediate discriminator features, while the discriminators are trained with the standard adversarial loss. Quantitative evaluation with LPIPS and the Fréchet Inception Distance (FID) confirms that the generated spectrograms are closer to real ones than those produced by competing models. Grad-CAM visualizations highlight the benefit of class conditioning. The proposed model outperforms pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions of itself. The generated spectrograms can be converted into vibrotactile signals using the Griffin–Lim algorithm, enabling applications in haptic feedback and virtual material simulation.
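The final step the abstract describes, inverting a generated magnitude spectrogram back to a vibrotactile waveform with the Griffin–Lim algorithm, can be sketched as follows. This is a minimal NumPy/SciPy illustration of the algorithm itself, not the authors' implementation; the STFT parameters (`n_fft`, `hop`, `n_iter`) are placeholder values and would need to match whatever settings were used to compute the spectrograms in the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_fft=256, hop=64, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude matches `mag`
    (Griffin & Lim, 1984): alternate inverse and forward STFTs,
    keeping the target magnitudes fixed and updating only the phase."""
    rng = np.random.default_rng(seed)
    # Start from a random phase estimate with the same shape as `mag`.
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        # Synthesise a waveform from the current magnitude/phase estimate ...
        _, x = istft(mag * phase, nperseg=n_fft, noverlap=n_fft - hop)
        # ... re-analyse it, and retain only the phase of the result.
        _, _, Z = stft(x, nperseg=n_fft, noverlap=n_fft - hop)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, nperseg=n_fft, noverlap=n_fft - hop)
    return x
```

Given a magnitude spectrogram `mag` computed with the same STFT settings, `griffin_lim(mag)` returns a waveform whose short-time magnitude approximates `mag`; increasing `n_iter` generally improves phase consistency at the cost of runtime.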

References

  1. Zhang D. Tron R. Khurshid R.P. Haptic feedback improves human-robot agreement and user satisfaction in shared-autonomy teleoperation Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA) Xi’an, China 30 May–5 June 2021 IEEE Piscataway, NJ, USA 2021 3306 3312
  2. Gani A. Pickering O. Ellis C. Sabri O. Pucher P. Impact of haptic feedback on surgical training outcomes: A randomised controlled trial of haptic versus non-haptic immersive virtual reality training Ann. Med. Surg. 2022 83 104734 10.1016/j.amsu.2022.104734
  3. Hiemstra E. Terveer E.M. Chmarra M.K. Dankelman J. Jansen F.W. Virtual reality in laparoscopic skills training: Is haptic feedback replaceable? Minim. Invasive Ther. Allied Technol. 2011 20 179 184 10.3109/13645706.2010.532502 21438717
  4. Gayathri R. Nam S. Enhancing User Experience in Virtual Museums: Impact of Finger Vibrotactile Feedback Appl. Sci. 2024 14 6593 10.3390/app14156593
  5. Li D. Xiong Q. Zhou X. Yeow R.C.H. A Novel Kinesthetic Haptic Feedback Device Driven by Soft Electrohydraulic Actuators arXiv 2024 10.48550/arXiv.2411.18387 2411.18387
  6. Li X. Liu H. Zhou J. Sun F. Learning cross-modal visual-tactile representation using ensembled generative adversarial networks Cogn. Comput. Syst. 2019 1 40 44 10.1049/ccs.2018.0014
  7. Li Y. Zhu J.Y. Tedrake R. Torralba A. Connecting touch and vision via cross-modal prediction Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Long Beach, CA, USA 15–20 June 2019 10609 10618
  8. Zhong S. Albini A. Jones O.P. Maiolino P. Posner I. Touching a nerf: Leveraging neural radiance fields for tactile sensory data generation Proceedings of the Conference on Robot Learning Auckland, New Zealand 14–18 December 2022 PMLR Cambridge, MA, USA 2023 1618 1628
  9. Yang F. Ma C. Zhang J. Zhu J. Yuan W. Owens A. Touch and go: Learning from human-collected vision and touch arXiv 2022 10.48550/arXiv.2211.12498 2211.12498
  10. Luo S. Yuan W. Adelson E. Cohn A.G. Fuentes R. Vitac: Feature sharing between vision and tactile sensing for cloth texture recognition Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA) Brisbane, QLD, Australia 21–25 May 2018 IEEE Piscataway, NJ, USA 2018 2722 2727
  11. Lee J.T. Bollegala D. Luo S. “Touching to see” and “seeing to feel”: Robotic cross-modal sensory data generation for visual-tactile perception Proceedings of the 2019 International Conference on Robotics and Automation (ICRA) Montreal, QC, Canada 20–24 May 2019 IEEE Piscataway, NJ, USA 2019 4276 4282
  12. Chen J. Zhou S. Vision2Touch: Imaging Estimation of Surface Tactile Physical Properties Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Rhodes Island, Greece 4–10 June 2023 IEEE Piscataway, NJ, USA 2023 1 5
  13. Liu H. Guo D. Zhang X. Zhu W. Fang B. Sun F. Toward image-to-tactile cross-modal perception for visually impaired people IEEE Trans. Autom. Sci. Eng. 2020 18 521 529 10.1109/TASE.2020.2971713
  14. Su Z. Huang B. Miao J. Wang W. Lin X. Configurable Performance-Communication Trade-Off for Quaternion-Based AUVs: A Partitioned Hybrid Event-Triggered Approach IEEE Trans. Veh. Technol. 2025 early access 10.1109/TVT.2025.3599895
  15. Huang B. Song Y. Qin H. Miao J. Zhu C. Safety-enhanced formation maneuver control for electric vehicle with edge-weighted topology and reinforcement learning strategy IEEE Trans. Aerosp. Electron. Syst. 2025 61 14716 14731 10.1109/TAES.2025.3588126
  16. Goodfellow I.J. Pouget-Abadie J. Mirza M. Xu B. Warde-Farley D. Ozair S. Courville A. Bengio Y. Generative adversarial nets Proceedings of the 28th International Conference on Neural Information Processing Systems Montreal, QC, Canada 8–13 December 2014 Volume 2 2672 2680
  17. Li Y. Zhao H. Liu H. Lu S. Hou Y. Research on visual-tactile cross-modality based on generative adversarial network Cogn. Comput. Syst. 2021 3 131 141
  18. Isola P. Zhu J.Y. Zhou T. Efros A.A. Image-to-image translation with conditional adversarial networks Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Honolulu, HI, USA 21–26 July 2017 1125 1134
  19. Wang T.C. Liu M.Y. Zhu J.Y. Tao A. Kautz J. Catanzaro B. High-resolution image synthesis and semantic manipulation with conditional gans Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Salt Lake City, UT, USA 18–23 June 2018 8798 8807
  20. Zhu J.Y. Park T. Isola P. Efros A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks Proceedings of the IEEE International Conference on Computer Vision Venice, Italy 22–29 October 2017 2223 2232
  21. Kim T. Cha M. Kim H. Lee J.K. Kim J. Learning to discover cross-domain relations with generative adversarial networks Proceedings of the International Conference on Machine Learning Sydney, Australia 6–11 August 2017 PMLR Cambridge, MA, USA 2017 1857 1865
  22. Park T. Liu M.Y. Wang T.C. Zhu J.Y. Semantic image synthesis with spatially-adaptive normalization Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Long Beach, CA, USA 15–20 June 2019 2337 2346
  23. Griffin D. Lim J. Signal estimation from modified short-time Fourier transform IEEE Trans. Acoust. Speech Signal Process. 1984 32 236 243
  24. De Vries H. Strub F. Mary J. Larochelle H. Pietquin O. Courville A.C. Modulating early visual processing by language Proceedings of the 31st International Conference on Neural Information Processing Systems Long Beach, CA, USA 4–9 December 2017 6597 6607
  25. Strese M. Schuwerk C. Iepure A. Steinbach E. Multimodal feature-based surface material classification IEEE Trans. Haptics 2016 10 226 239 10.1109/toh.2016.2625787
  26. Selvaraju R.R. Cogswell M. Das A. Vedantam R. Parikh D. Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization Proceedings of the IEEE International Conference on Computer Vision Venice, Italy 22–29 October 2017 618 626
  27. Kingma D.P. Welling M. Auto-encoding variational bayes arXiv 2013 1312.6114
  28. Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A.N. Kaiser Ł. Polosukhin I. Attention is all you need Proceedings of the 31st International Conference on Neural Information Processing Systems Long Beach, CA, USA 4–9 December 2017 6000 6010
  29. Van Den Oord A. Dieleman S. Zen H. Simonyan K. Vinyals O. Graves A. Kalchbrenner N. Senior A. Kavukcuoglu K. Wavenet: A generative model for raw audio arXiv 2016 10.48550/arXiv.1609.03499 1609.03499
  30. Verma P. Chafe C. A generative model for raw audio using transformer architectures Proceedings of the 2021 24th International Conference on Digital Audio Effects (DAFx) Vienna, Austria 8–10 September 2021 IEEE Piscataway, NJ, USA 2021 230 237
  31. Zhu H. Luo M.D. Wang R. Zheng A.H. He R. Deep audio-visual learning: A survey Int. J. Autom. Comput. 2021 18 351 376 10.1007/s11633-021-1293-0
  32. Sung-Bin K. Senocak A. Ha H. Owens A. Oh T.H. Sound to visual scene generation by audio-to-visual latent alignment Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Vancouver, BC, Canada 17–24 June 2023 6430 6440
  33. Ujitoko Y. Ban Y. Vibrotactile signal generation from texture images or attributes using generative adversarial network Proceedings of the Haptics: Science, Technology, and Applications: 11th International Conference, EuroHaptics 2018 Pisa, Italy 13–16 June 2018 Proceedings, Part II 11 Springer Berlin/Heidelberg, Germany 2018 25 36
  34. Ban Y. Ujitoko Y. TactGAN: Vibrotactile designing driven by GAN-based automatic generation Proceedings of the SIGGRAPH Asia 2018 Emerging Technologies Tokyo, Japan 4–7 December 2018 1 2 10.1145/3275476.3275484
  35. Cai S. Ban Y. Narumi T. Zhu K. FrictGAN: Frictional Signal Generation from Fabric Texture Images using Generative Adversarial Network Proceedings of ICAT-EGVE 2020 Virtual Event 2–4 December 2020 11 15
  36. Cai S. Zhao L. Ban Y. Narumi T. Liu Y. Zhu K. GAN-based image-to-friction generation for tactile simulation of fabric material Comput. Graph. 2022 102 460 473
  37. Cai S. Zhu K. Ban Y. Narumi T. Visual-tactile cross-modal data generation using residue-fusion gan with feature-matching and perceptual losses IEEE Robot. Autom. Lett. 2021 6 7525 7532
  38. Xi Q. Wang F. Tao L. Zhang H. Jiang X. Wu J. CM-AVAE: Cross-Modal Adversarial Variational Autoencoder for Visual-to-Tactile Data Generation IEEE Robot. Autom. Lett. 2024 9 5214 5221
  39. Simonyan K. Zisserman A. Very deep convolutional networks for large-scale image recognition arXiv 2014 1409.1556
  40. Agatsuma S. Kurogi J. Saga S. Vasilache S. Takahashi S. Simple Generative Adversarial Network to Generate Three-axis Time-series Data for Vibrotactile Displays Proceedings of the International Conference on Advances in Computer-Human Interactions, ACHI 2020 Valencia, Spain 21–25 November 2020 19 24
  41. Rombach R. Blattmann A. Lorenz D. Esser P. Ommer B. High-resolution image synthesis with latent diffusion models Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition New Orleans, LA, USA 18–24 June 2022 10684 10695
  42. Sohl-Dickstein J. Weiss E. Maheswaranathan N. Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics Proceedings of the International Conference on Machine Learning Lille, France 7–9 July 2015 PMLR Cambridge, MA, USA 2015 2256 2265
  43. Corvi R. Cozzolino D. Poggi G. Nagano K. Verdoliva L. Intriguing properties of synthetic images: From generative adversarial networks to diffusion models Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Vancouver, BC, Canada 17–24 June 2023 973 982
  44. Chen M. Mei S. Fan J. Wang M. Opportunities and challenges of diffusion models for generative AI Natl. Sci. Rev. 2024 11 nwae348 10.1093/nsr/nwae348
  45. Chen C. Ding H. Sisman B. Xu Y. Xie O. Yao B.Z. Tran S.D. Zeng B. Diffusion models for multi-task generative modeling Proceedings of the Twelfth International Conference on Learning Representations Vienna, Austria 7–11 May 2024
  46. Lin X. Xu W. Mao Y. Wang J. Lv M. Liu L. Luo X. Li X. Vision-based Tactile Image Generation via Contact Condition-guided Diffusion Model arXiv 2024 2412.01639
  47. Gu C. Gromov M. Unpaired Image-To-Image Translation Using Transformer-Based CycleGAN Proceedings of the International Conference on Software Testing, Machine Learning and Complex Process Analysis Tomsk, Russia 25–27 November 2021 Springer Berlin/Heidelberg, Germany 2021 75 82
  48. Dubey S.R. Singh S.K. Transformer-based generative adversarial networks in computer vision: A comprehensive survey IEEE Trans. Artif. Intell. 2024 5 4851 4867 10.1109/tai.2024.3404910
  49. Dou Y. Yang F. Liu Y. Loquercio A. Owens A. Tactile-augmented radiance fields Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Seattle, WA, USA 16–22 June 2024 26529 26539
  50. Yang F. Zhang J. Owens A. Generating visual scenes from touch Proceedings of the IEEE/CVF International Conference on Computer Vision Paris, France 1–6 October 2023 22070 22080
  51. Jiang S. Zhao S. Fan Y. Yin P. GelFusion: Enhancing Robotic Manipulation under Visual Constraints via Visuotactile Fusion arXiv 2025 10.48550/arXiv.2505.07455 2505.07455
  52. Huang G. Liu Z. Van Der Maaten L. Weinberger K.Q. Densely connected convolutional networks Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Honolulu, HI, USA 21–26 July 2017 4700 4708
  53. Zhang M. Terui S. Makino Y. Shinoda H. TexSenseGAN: A User-Guided System for Optimizing Texture-Related Vibrotactile Feedback Using Generative Adversarial Network IEEE Trans. Haptics 2025 18 325 339
  54. pytorch.org. Installation of PyTorch v1.12.1 Available online: https://pytorch.org/get-started/previous-versions/ (accessed on 1 September 2025)
  55. Kingma D.P. Adam: A method for stochastic optimization arXiv 2014 1412.6980
  56. Zheng W. Liu H. Wang B. Sun F. Cross-modal learning for material perception using deep extreme learning machine Int. J. Mach. Learn. Cybern. 2020 11 813 823
  57. Zhang R. Isola P. Efros A.A. Shechtman E. Wang O. The unreasonable effectiveness of deep features as a perceptual metric Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Salt Lake City, UT, USA 18–23 June 2018 586 595
  58. Yu Y. Zhang W. Deng Y. Frechet inception distance (fid) for evaluating gans China Univ. Min. Technol. Beijing Grad. Sch. 2021 3 1 7
  59. Krizhevsky A. Sutskever I. Hinton G.E. Imagenet classification with deep convolutional neural networks Proceedings of the 25th International Conference on Neural Information Processing Systems Lake Tahoe, NV, USA 3–6 December 2012 1097 1105

Issue

Sensors, vol. 26, 2026, Switzerland, https://doi.org/10.3390/s26020426

Type: journal article, publication in a journal with an impact factor, publication in a refereed journal, indexed in Scopus and Web of Science