Authors: Neshov, N. N., Tonchev, K., Manolova, A. H., Petkova, R. R., Bozhilov, I. B.
Title: Visual-to-Tactile Cross-Modal Generation Using a Class-Conditional GAN with Multi-Scale Discriminator and Hybrid Loss
Keywords: augmented reality, conditional Generative Adversarial Network (cGAN), cross-modal generation, haptic feedback, hybrid loss, LMT-108 dataset, multi-scale discriminator, texture-to-tactile translation
Abstract: Understanding surface textures through visual cues is crucial for applications in haptic rendering and virtual reality. However, accurately translating visual information into tactile feedback remains a challenging problem. To address this challenge, this paper presents a class-conditional Generative Adversarial Network (cGAN) for cross-modal translation from texture images to vibrotactile spectrograms, using samples from the LMT-108 dataset. The generator is adapted from pix2pix and enhanced with Conditional Batch Normalization (CBN) at the bottleneck to incorporate texture class semantics. A dedicated label predictor, based on a DenseNet-201 and trained separately prior to cGAN training, provides the conditioning label. The discriminator is derived from pix2pixHD and uses a multi-scale architecture with three discriminators, each comprising three downsampling layers. A grid search over multi-scale discriminator configurations shows that this setup yields optimal perceptual similarity measured by Learned Perceptual Image Patch Similarity (LPIPS). The generator is trained using a hybrid loss that combines adversarial, (Formula presented.), and feature matching losses derived from intermediate discriminator features, while the discriminators are trained using standard adversarial loss. Quantitative evaluation with LPIPS and Fréchet Inception Distance (FID) confirms superior similarity to real spectrograms. GradCAM visualizations highlight the benefit of class conditioning. The proposed model outperforms pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions. The generated spectrograms can be converted into vibrotactile signals using the Griffin–Lim algorithm, enabling applications in haptic feedback and virtual material simulation.
References
- Zhang D. Tron R. Khurshid R.P. Haptic feedback improves human-robot agreement and user satisfaction in shared-autonomy teleoperation Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA) Xi’an, China 30 May–5 June 2021 IEEE Piscataway, NJ, USA 2021 3306 3312
- Gani A. Pickering O. Ellis C. Sabri O. Pucher P. Impact of haptic feedback on surgical training outcomes: A randomised controlled trial of haptic versus non-haptic immersive virtual reality training Ann. Med. Surg. 2022 83 104734 10.1016/j.amsu.2022.104734
- Hiemstra E. Terveer E.M. Chmarra M.K. Dankelman J. Jansen F.W. Virtual reality in laparoscopic skills training: Is haptic feedback replaceable? Minim. Invasive Ther. Allied Technol. 2011 20 179 184 10.3109/13645706.2010.532502 21438717
- Gayathri R. Nam S. Enhancing User Experience in Virtual Museums: Impact of Finger Vibrotactile Feedback Appl. Sci. 2024 14 6593 10.3390/app14156593
- Li D. Xiong Q. Zhou X. Yeow R.C.H. A Novel Kinesthetic Haptic Feedback Device Driven by Soft Electrohydraulic Actuators arXiv 2024 10.48550/arXiv.2411.18387 2411.18387
- Li X. Liu H. Zhou J. Sun F. Learning cross-modal visual-tactile representation using ensembled generative adversarial networks Cogn. Comput. Syst. 2019 1 40 44 10.1049/ccs.2018.0014
- Li Y. Zhu J.Y. Tedrake R. Torralba A. Connecting touch and vision via cross-modal prediction Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Long Beach, CA, USA 15–20 June 2019 10609 10618
- Zhong S. Albini A. Jones O.P. Maiolino P. Posner I. Touching a nerf: Leveraging neural radiance fields for tactile sensory data generation Proceedings of the Conference on Robot Learning Auckland, New Zealand 14–18 December 2022 PMLR Cambridge, MA, USA 2023 1618 1628
- Yang F. Ma C. Zhang J. Zhu J. Yuan W. Owens A. Touch and go: Learning from human-collected vision and touch arXiv 2022 10.48550/arXiv.2211.12498 2211.12498
- Luo S. Yuan W. Adelson E. Cohn A.G. Fuentes R. Vitac: Feature sharing between vision and tactile sensing for cloth texture recognition Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA) Brisbane, QLD, Australia 21–25 May 2018 IEEE Piscataway, NJ, USA 2018 2722 2727
- Lee J.T. Bollegala D. Luo S. “Touching to see” and “seeing to feel”: Robotic cross-modal sensory data generation for visual-tactile perception Proceedings of the 2019 International Conference on Robotics and Automation (ICRA) Montreal, QC, Canada 20–24 May 2019 IEEE Piscataway, NJ, USA 2019 4276 4282
- Chen J. Zhou S. Vision2Touch: Imaging Estimation of Surface Tactile Physical Properties Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Rhodes Island, Greece 4–10 June 2023 IEEE Piscataway, NJ, USA 2023 1 5
- Liu H. Guo D. Zhang X. Zhu W. Fang B. Sun F. Toward image-to-tactile cross-modal perception for visually impaired people IEEE Trans. Autom. Sci. Eng. 2020 18 521 529 10.1109/TASE.2020.2971713
- Su Z. Huang B. Miao J. Wang W. Lin X. Configurable Performance-Communication Trade-Off for Quaternion-Based AUVs: A Partitioned Hybrid Event-Triggered Approach IEEE Trans. Veh. Technol. 2025 early access 10.1109/TVT.2025.3599895
- Huang B. Song Y. Qin H. Miao J. Zhu C. Safety-enhanced formation maneuver control for electric vehicle with edge-weighted topology and reinforcement learning strategy IEEE Trans. Aerosp. Electron. Syst. 2025 61 14716 14731 10.1109/TAES.2025.3588126
- Goodfellow I.J. Pouget-Abadie J. Mirza M. Xu B. Warde-Farley D. Ozair S. Courville A. Bengio Y. Generative adversarial nets Proceedings of the 28th International Conference on Neural Information Processing Systems Montreal, QC, Canada 8–13 December 2014 Volume 2 2672 2680
- Li Y. Zhao H. Liu H. Lu S. Hou Y. Research on visual-tactile cross-modality based on generative adversarial network Cogn. Comput. Syst. 2021 3 131 141
- Isola P. Zhu J.Y. Zhou T. Efros A.A. Image-to-image translation with conditional adversarial networks Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Honolulu, HI, USA 21–26 July 2017 1125 1134
- Wang T.C. Liu M.Y. Zhu J.Y. Tao A. Kautz J. Catanzaro B. High-resolution image synthesis and semantic manipulation with conditional gans Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Salt Lake City, UT, USA 18–23 June 2018 8798 8807
- Zhu J.Y. Park T. Isola P. Efros A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks Proceedings of the IEEE International Conference on Computer Vision Venice, Italy 22–29 October 2017 2223 2232
- Kim T. Cha M. Kim H. Lee J.K. Kim J. Learning to discover cross-domain relations with generative adversarial networks Proceedings of the International Conference on Machine Learning Sydney, Australia 6–11 August 2017 PMLR Cambridge, MA, USA 2017 1857 1865
- Park T. Liu M.Y. Wang T.C. Zhu J.Y. Semantic image synthesis with spatially-adaptive normalization Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Long Beach, CA, USA 15–20 June 2019 2337 2346
- Griffin D. Lim J. Signal estimation from modified short-time Fourier transform IEEE Trans. Acoust. Speech Signal Process. 1984 32 236 243
- De Vries H. Strub F. Mary J. Larochelle H. Pietquin O. Courville A.C. Modulating early visual processing by language Proceedings of the 31st International Conference on Neural Information Processing Systems Long Beach, CA, USA 4–9 December 2017 6597 6607
- Strese M. Schuwerk C. Iepure A. Steinbach E. Multimodal feature-based surface material classification IEEE Trans. Haptics 2016 10 226 239 10.1109/toh.2016.2625787
- Selvaraju R.R. Cogswell M. Das A. Vedantam R. Parikh D. Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization Proceedings of the IEEE International Conference on Computer Vision Venice, Italy 22–29 October 2017 618 626
- Kingma D.P. Welling M. Auto-encoding variational bayes arXiv 2013 1312.6114
- Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A.N. Kaiser Ł. Polosukhin I. Attention is all you need Proceedings of the 31st International Conference on Neural Information Processing Systems Long Beach, CA, USA 4–9 December 2017 6000 6010
- Van Den Oord A. Dieleman S. Zen H. Simonyan K. Vinyals O. Graves A. Kalchbrenner N. Senior A. Kavukcuoglu K. Wavenet: A generative model for raw audio arXiv 2016 10.48550/arXiv.1609.03499 1609.03499
- Verma P. Chafe C. A generative model for raw audio using transformer architectures Proceedings of the 2021 24th International Conference on Digital Audio Effects (DAFx) Vienna, Austria 8–10 September 2021 IEEE Piscataway, NJ, USA 2021 230 237
- Zhu H. Luo M.D. Wang R. Zheng A.H. He R. Deep audio-visual learning: A survey Int. J. Autom. Comput. 2021 18 351 376 10.1007/s11633-021-1293-0
- Sung-Bin K. Senocak A. Ha H. Owens A. Oh T.H. Sound to visual scene generation by audio-to-visual latent alignment Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Vancouver, BC, Canada 17–24 June 2023 6430 6440
- Ujitoko Y. Ban Y. Vibrotactile signal generation from texture images or attributes using generative adversarial network Proceedings of the Haptics: Science, Technology, and Applications: 11th International Conference, EuroHaptics 2018 Pisa, Italy 13–16 June 2018 Proceedings, Part II 11 Springer Berlin/Heidelberg, Germany 2018 25 36
- Ban Y. Ujitoko Y. TactGAN: Vibrotactile designing driven by GAN-based automatic generation Proceedings of the SIGGRAPH Asia 2018 Emerging Technologies Tokyo, Japan 4–7 December 2018 1 2 10.1145/3275476.3275484
- Cai S. Ban Y. Narumi T. Zhu K. FrictGAN: Frictional Signal Generation from Fabric Texture Images using Generative Adversarial Network Proceedings of the ICAT-EGVE Virtual 2–4 December 2020 11 15
- Cai S. Zhao L. Ban Y. Narumi T. Liu Y. Zhu K. GAN-based image-to-friction generation for tactile simulation of fabric material Comput. Graph. 2022 102 460 473
- Cai S. Zhu K. Ban Y. Narumi T. Visual-tactile cross-modal data generation using residue-fusion gan with feature-matching and perceptual losses IEEE Robot. Autom. Lett. 2021 6 7525 7532
- Xi Q. Wang F. Tao L. Zhang H. Jiang X. Wu J. CM-AVAE: Cross-Modal Adversarial Variational Autoencoder for Visual-to-Tactile Data Generation IEEE Robot. Autom. Lett. 2024 9 5214 5221
- Simonyan K. Zisserman A. Very deep convolutional networks for large-scale image recognition arXiv 2014 1409.1556
- Agatsuma S. Kurogi J. Saga S. Vasilache S. Takahashi S. Simple Generative Adversarial Network to Generate Three-axis Time-series Data for Vibrotactile Displays Proceedings of the International Conference on Advances in Computer-Human Interactions, ACHI 2020 Valencia, Spain 21–25 November 2020 19 24
- Rombach R. Blattmann A. Lorenz D. Esser P. Ommer B. High-resolution image synthesis with latent diffusion models Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition New Orleans, LA, USA 18–24 June 2022 10684 10695
- Sohl-Dickstein J. Weiss E. Maheswaranathan N. Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics Proceedings of the International Conference on Machine Learning Lille, France 7–9 July 2015 PMLR Cambridge, MA, USA 2015 2256 2265
- Corvi R. Cozzolino D. Poggi G. Nagano K. Verdoliva L. Intriguing properties of synthetic images: From generative adversarial networks to diffusion models Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Vancouver, BC, Canada 17–24 June 2023 973 982
- Chen M. Mei S. Fan J. Wang M. Opportunities and challenges of diffusion models for generative AI Natl. Sci. Rev. 2024 11 nwae348 10.1093/nsr/nwae348
- Chen C. Ding H. Sisman B. Xu Y. Xie O. Yao B.Z. Tran S.D. Zeng B. Diffusion models for multi-task generative modeling Proceedings of the Twelfth International Conference on Learning Representations Vienna, Austria 7–11 May 2024
- Lin X. Xu W. Mao Y. Wang J. Lv M. Liu L. Luo X. Li X. Vision-based Tactile Image Generation via Contact Condition-guided Diffusion Model arXiv 2024 2412.01639
- Gu C. Gromov M. Unpaired Image-To-Image Translation Using Transformer-Based CycleGAN Proceedings of the International Conference on Software Testing, Machine Learning and Complex Process Analysis Tomsk, Russia 25–27 November 2021 Springer Berlin/Heidelberg, Germany 2021 75 82
- Dubey S.R. Singh S.K. Transformer-based generative adversarial networks in computer vision: A comprehensive survey IEEE Trans. Artif. Intell. 2024 5 4851 4867 10.1109/tai.2024.3404910
- Dou Y. Yang F. Liu Y. Loquercio A. Owens A. Tactile-augmented radiance fields Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Seattle, WA, USA 16–22 June 2024 26529 26539
- Yang F. Zhang J. Owens A. Generating visual scenes from touch Proceedings of the IEEE/CVF International Conference on Computer Vision Paris, France 1–6 October 2023 22070 22080
- Jiang S. Zhao S. Fan Y. Yin P. GelFusion: Enhancing Robotic Manipulation under Visual Constraints via Visuotactile Fusion arXiv 2025 10.48550/arXiv.2505.07455 2505.07455
- Huang G. Liu Z. Van Der Maaten L. Weinberger K.Q. Densely connected convolutional networks Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Honolulu, HI, USA 21–26 July 2017 4700 4708
- Zhang M. Terui S. Makino Y. Shinoda H. TexSenseGAN: A User-Guided System for Optimizing Texture-Related Vibrotactile Feedback Using Generative Adversarial Network IEEE Trans. Haptics 2025 18 325 339
- pytorch.org. Installation of PyTorch v1.12.1 Available online: https://pytorch.org/get-started/previous-versions/ (accessed on 1 September 2025)
- Kingma D.P. Adam: A method for stochastic optimization arXiv 2014 1412.6980
- Zheng W. Liu H. Wang B. Sun F. Cross-modal learning for material perception using deep extreme learning machine Int. J. Mach. Learn. Cybern. 2020 11 813 823
- Zhang R. Isola P. Efros A.A. Shechtman E. Wang O. The unreasonable effectiveness of deep features as a perceptual metric Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Salt Lake City, UT, USA 18–23 June 2018 586 595
- Yu Y. Zhang W. Deng Y. Frechet inception distance (fid) for evaluating gans China Univ. Min. Technol. Beijing Grad. Sch. 2021 3 1 7
- Krizhevsky A. Sutskever I. Hinton G.E. Imagenet classification with deep convolutional neural networks Proceedings of the 25th International Conference on Neural Information Processing Systems Lake Tahoe, NV, USA 3–6 December 2012 1097 1105
Issue: Sensors, vol. 26, 2026, Switzerland, https://doi.org/10.3390/s26020426
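Illustrative note on the abstract: the Conditional Batch Normalization (CBN) step it mentions can be sketched as follows. This is a minimal sketch of the general CBN idea (normalize with batch statistics, then scale and shift with class-dependent parameters), not the authors' implementation. The function name `conditional_batch_norm`, the flat per-sample feature vectors, and the per-class `gammas`/`betas` tables are assumptions made for illustration; in a real generator the statistics would also be pooled over spatial positions and the parameters would be learned.

```python
from math import sqrt


def conditional_batch_norm(batch, labels, gammas, betas, eps=1e-5):
    """Hypothetical helper: class-conditional batch normalization.

    batch  -- list of feature vectors, one per sample (all the same length)
    labels -- class index for each sample (e.g. from a label predictor)
    gammas -- gammas[k][c] is the scale for class k, channel c (assumed given)
    betas  -- betas[k][c] is the shift for class k, channel c (assumed given)
    """
    n = len(batch)
    channels = len(batch[0])

    # Per-channel mean and (biased) variance over the batch.
    means = [sum(x[c] for x in batch) / n for c in range(channels)]
    variances = [
        sum((x[c] - means[c]) ** 2 for x in batch) / n for c in range(channels)
    ]

    out = []
    for x, k in zip(batch, labels):
        row = []
        for c in range(channels):
            # Standard batch-norm normalization ...
            x_hat = (x[c] - means[c]) / sqrt(variances[c] + eps)
            # ... followed by a scale and shift selected by the class label.
            row.append(gammas[k][c] * x_hat + betas[k][c])
        out.append(row)
    return out


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 6.0]]
    class_ids = [0, 1]
    gammas = [[1.0, 1.0], [2.0, 2.0]]
    betas = [[0.0, 0.0], [1.0, 1.0]]
    print(conditional_batch_norm(features, class_ids, gammas, betas))
```

Because the scale and shift are indexed by the class label, two samples with identical normalized features but different texture classes are mapped to different activations, which is how the class information reaches the layers after the bottleneck.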