Authors: Christoff, N. V., Tonchev, K., Neshov, N. N., Manolova, A. H., Poulkov, V. K.
Title: Audio-Driven 3D Talking Face for Realistic Holographic Mixed-Reality Telepresence
Keywords: 3D; holographic telepresence; talking face

Abstract: A machine's ability to understand human speech from visual input is crucial for efficient communication, yet separating the semantics of speech from facial appearance remains a challenge. This article presents a taxonomy of 3D talking human face methods, categorizing them into GAN-based, NeRF-based, and DLNN-based approaches. Research on mixed-reality telepresence now focuses on talking 3D faces that synthesize natural human faces in response to text or audio input. Audio-visual datasets support training algorithms across different languages and enable speech recognition. Handling noise in the audio data is vital for robust performance, using techniques such as integrating DeepSpeech and adding noise. Latency optimization improves the user experience, and careful selection of techniques keeps latency low. Quantitative and qualitative evaluation methods measure synchronization, face quality, and comparative performance.
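The noise-handling idea mentioned in the abstract can be illustrated with a short, hypothetical sketch that is not taken from the paper: additive noise is mixed into the training audio at a chosen signal-to-noise ratio so that the speech features driving the 3D face stay robust. The function name, SNR parameter, and test tone below are illustrative assumptions.

    import numpy as np

    def add_noise(waveform, snr_db=20.0, rng=None):
        # Mix white Gaussian noise into `waveform` at the requested SNR (in dB).
        rng = rng or np.random.default_rng()
        signal_power = np.mean(waveform ** 2)
        noise_power = signal_power / (10.0 ** (snr_db / 10.0))
        noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
        return waveform + noise

    # Example: corrupt a 1-second, 16 kHz test tone at 10 dB SNR before feature extraction.
    clean = np.sin(2 * np.pi * 440.0 * np.linspace(0.0, 1.0, 16000))
    noisy = add_noise(clean, snr_db=10.0)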

    Published in: Proceedings of the IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), Istanbul, Turkey, 4-7 July 2023, pp. 220-225, Institute of Electrical and Electronics Engineers Inc. DOI: 10.1109/BlackSeaCom58138.2023.10299781

    Copyright IEEE

    Type: publication at an international forum, publication in a refereed edition, indexed in Scopus