Authors: Neshov, N. N., Christoff, N. V., Sechkova, T., Tonchev, K., Manolova, A. H.
Title: SlowR50-SA: A Self-Attention Enhanced Dynamic Facial Expression Recognition Model for Tactile Internet Applications
Keywords: deep learning, DFEW, emotion recognition, facial expression, FERV39K, self-attention, SlowFast networks, SlowR50, Tactile Internet

Abstract: Emotion recognition from facial expressions is a challenging task due to the subtle and nuanced nature of facial expressions. Within the framework of the Tactile Internet (TI), the integration of this technology has the capacity to transform real-time user interactions by delivering customized emotional feedback. Its influence is far-reaching: it may be used in immersive virtual reality interactions and in remote tele-care applications to identify the emotional states of patients. In this paper, a novel emotion recognition algorithm is presented that integrates a Self-Attention (SA) module into the SlowR50 backbone (SlowR50-SA). Experiments on the DFEW and FERV39K datasets demonstrate that the proposed model achieves good performance in terms of both the Unweighted Average Recall (UAR) and Weighted Average Recall (WAR) metrics, reaching a UAR (WAR) of 57.09% (69.87%) on DFEW and a UAR (WAR) of 39.48% (49.34%) on FERV39K. Notably, SlowR50-SA operates on only eight input frames at low temporal resolution, highlighting its efficiency. The algorithm can be integrated into Tactile Internet applications to enhance the user experience through real-time emotion feedback: it can personalize haptic feedback in virtual reality based on the user's emotional state, and it can help detect signs of stress, anxiety, or depression in patients in remote tele-care settings.
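The architecture outlined in the abstract can be sketched in PyTorch (assumed here as the implementation framework; the paper's exact design is not reproduced). The sketch below loads the Kinetics-pretrained SlowR50 backbone from the PyTorchVideo hub and attaches a standard multi-head self-attention layer over the temporal tokens obtained after spatial pooling, followed by a 7-class emotion classifier matching the DFEW label set. The placement and internal structure of the SA module, the token layout, and the head dimensions are illustrative assumptions, not the authors' published configuration.

    import torch
    import torch.nn as nn

    class SlowR50SA(nn.Module):
        # Illustrative SlowR50 + self-attention head; NOT the authors' exact design.
        def __init__(self, num_classes=7, embed_dim=2048, num_heads=8):
            super().__init__()
            # SlowR50: the slow-only pathway of SlowFast, pretrained on Kinetics-400.
            backbone = torch.hub.load('facebookresearch/pytorchvideo',
                                      'slow_r50', pretrained=True)
            # Keep the stem and residual stages, drop the original classification head.
            self.features = nn.Sequential(*backbone.blocks[:-1])
            self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(embed_dim)
            self.classifier = nn.Linear(embed_dim, num_classes)

        def forward(self, x):                        # x: (B, 3, 8, 224, 224), 8 RGB frames
            feat = self.features(x)                  # (B, 2048, T', H', W')
            tokens = feat.mean(dim=[3, 4]).transpose(1, 2)   # (B, T', 2048) temporal tokens
            attn_out, _ = self.attn(tokens, tokens, tokens)  # self-attention across time
            tokens = self.norm(tokens + attn_out)            # residual connection + LayerNorm
            return self.classifier(tokens.mean(dim=1))       # (B, num_classes) logits

The reported UAR and WAR metrics are standard: UAR is the unweighted mean of per-class recalls, while WAR is the overall accuracy, i.e., recall weighted by class support. A minimal reference computation:

    import numpy as np

    def uar_war(y_true, y_pred, num_classes=7):
        # UAR: average of per-class recalls; WAR: overall accuracy.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        recalls = [(y_pred[y_true == c] == c).mean()
                   for c in range(num_classes) if (y_true == c).any()]
        return float(np.mean(recalls)), float((y_pred == y_true).mean())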


Issue

Electronics (Switzerland), vol. 13, 2024, https://doi.org/10.3390/electronics13091606

Type: journal article, publication in a journal with an impact factor, publication in a refereed journal, indexed in Scopus and Web of Science