Authors: Bozhilov, I. B., Petkova, R. R., Tonchev, K., Manolova, A. H., Poulkov, V. K., Poor, H. V.
Title: Autoencoder Architectures for Low-Rate Sparse Point Cloud Geometry Coding
Keywords: Autoencoder, coding, compression, machine learning, point cloud, source coding

Abstract: Efficient compression of sparse point cloud geometry remains a critical challenge in 3D content processing, particularly at low rates, where conventional codecs struggle. This work proposes a system-level framework for low-rate sparse geometry coding, leveraging three variants of autoencoder-based deep learning architectures: folding-based, graph convolution-based, and transformer-based. The proposed framework operates at the object level and utilizes semantic segmentation to structure scenes into interpretable intermediate representations. Two novel autoencoder architectures—a Graph Autoencoder and a Transformer Graph Autoencoder—are proposed and evaluated alongside a FoldingNet baseline. These models are benchmarked on both synthetic and real-world datasets and compared against standard codecs (G-PCC and Draco) as well as a state-of-the-art learning-based codec (CRCIR). Experimental results indicate that the proposed architectures consistently outperform both traditional and recent learning-based methods, particularly at low bitrates. The models also exhibit high robustness to latent-space noise, making them applicable across a range of transmission scenarios. The proposed Transformer Graph Autoencoder architecture achieves the best overall rate-distortion performance and exhibits strong generalization to unseen data.
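The folding-based decoding idea underlying the FoldingNet baseline can be illustrated with a toy sketch: a shared per-point MLP deforms a fixed 2D grid into a 3D point set, conditioned on a global latent codeword. This is a minimal, hedged illustration with random weights, not the paper's actual architecture; all function names and dimensions here are assumptions for demonstration only.

```python
import numpy as np

def fold(codeword, grid, W1, b1, W2, b2):
    """Toy folding decoder (FoldingNet-style sketch): deform a 2D grid
    into 3D points, conditioned on a global codeword. Weights are
    random placeholders, not trained parameters."""
    n = grid.shape[0]
    # Replicate the codeword and concatenate it with each 2D grid point
    x = np.concatenate([np.tile(codeword, (n, 1)), grid], axis=1)
    h = np.tanh(x @ W1 + b1)   # shared per-point MLP, hidden layer
    return h @ W2 + b2         # (n, 3) reconstructed 3D points

rng = np.random.default_rng(0)
codeword = rng.standard_normal(16)   # latent code produced by an encoder
u, v = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
grid = np.stack([u.ravel(), v.ravel()], axis=1)   # 64 fixed 2D grid points

W1 = rng.standard_normal((18, 32)); b1 = np.zeros(32)  # 16 + 2 = 18 inputs
W2 = rng.standard_normal((32, 3));  b2 = np.zeros(3)
points = fold(codeword, grid, W1, b1, W2, b2)
print(points.shape)   # (64, 3)
```

Because the MLP is shared across grid points, the decoder's parameter count is independent of the number of output points, which is one reason folding-based decoders suit low-rate coding.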

References

  1. E. Calvanese Strinati, S. Barbarossa, J. L. Gonzalez-Jimenez, D. Ktenas, N. Cassiau, L. Maret, and C. Dehos, ‘‘6G: The next frontier: From holographic messaging to artificial intelligence using subterahertz and visible light communication,’’ IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 42–50, Sep. 2019.
  2. I. F. Akyildiz and H. Guo, ‘‘Holographic-type communication: A new challenge for the next decade,’’ ITU J. Future Evolving Technol., vol. 3, no. 2, pp. 421–442, 2022.
  3. I. Bozhilov, R. Petkova, K. Tonchev, A. Manolova, and V. Poulkov, ‘‘HOLOTWIN: A modular and interoperable approach to holographic telepresence system development,’’ Sensors, vol. 23, no. 21, p. 8692, Oct. 2023.
  4. A. M. Aslam, R. Chaudhary, A. Bhardwaj, I. Budhiraja, N. Kumar, and S. Zeadally, ‘‘Metaverse for 6G and beyond: The next revolution and deployment challenges,’’ IEEE Internet Things Mag., vol. 6, no. 1, pp. 32–39, Mar. 2023.
  5. I. Bozhilov, R. Petkova, K. Tonchev, and A. Manolova, ‘‘A systematic survey into compression algorithms for three-dimensional content,’’ IEEE Access, vol. 12, pp. 141604–141624, 2024.
  6. Y. Yang, C. Feng, Y. Shen, and D. Tian, ‘‘FoldingNet: Point cloud autoencoder via deep grid deformation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 206–215.
  7. S. Chen, C. Duan, Y. Yang, D. Li, C. Feng, and D. Tian, ‘‘Deep unsupervised learning of 3D point clouds via graph topology inference and filtering,’’ IEEE Trans. Image Process., vol. 29, pp. 3183–3198, 2020.
  8. X. Ma, Q. Yin, X. Zhang, and L. Tang, ‘‘Foldingnet-based geometry compression of point cloud with multi descriptions,’’ in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2022, pp. 1–6.
  9. A. Ranjan, T. Bolkart, S. Sanyal, and M. J. Black, ‘‘Generating 3D faces using convolutional mesh autoencoders,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 704–720.
  10. I. Bozhilov, K. Tonchev, A. Manolova, and R. Petkova, ‘‘3D human body models compression and decompression algorithm based on graph convolutional networks for holographic communication,’’ in Proc. 25th Int. Symp. Wireless Pers. Multimedia Commun. (WPMC), Oct. 2022, pp. 532–537.
  11. Y. Zhang, J. Lin, R. Li, K. Jia, and L. Zhang, ‘‘Point-DAE: Denoising autoencoders for self-supervised point cloud learning,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 9, pp. 1–15, Sep. 2025.
  12. W. Shen, B. Zhang, H. Xu, X. Li, and J. Wu, ‘‘Multi-space point geometry compression with progressive relation-aware transformer,’’ IEEE Trans. Multimedia, vol. 26, pp. 8969–8980, 2024.
  13. G-PCC Codec Description V9, Standard ISO/IEC JTC1/SC29/WG7, ISO/IEC JTC1/SC29/WG7 N0011, 2020.
  14. Google. (2024). Draco: 3D Data Compression. [Online]. Available: https://google.github.io/draco/
  15. H. Xu, X. Zhang, and X. Wu, ‘‘Fast point cloud geometry compression with context-based residual coding and INR-based refinement,’’ in Proc. Eur. Conf. Comput. Vis., 2024, pp. 270–288.
  16. V-PCC Codec Description, Standard ISO/IEC JTC1/SC29/WG7, 2020.
  17. C. Zhang, M. Liu, W. Huang, Y. Xu, Y. Xu, and D. He, ‘‘Deep joint source-channel coding for wireless point cloud transmission,’’ in Proc. ICASSP - IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2025, pp. 1–5.
  18. S. Xie, Q. Yang, Y. Sun, T. Han, Z. Yang, and Z. Shi, ‘‘Semantic communication for efficient point cloud transmission,’’ in Proc. IEEE Global Commun. Conf., Dec. 2024, pp. 2948–2953.
  19. C. Bian, Y. Shao, and D. Gündüz, ‘‘Wireless point cloud transmission,’’ in Proc. IEEE 25th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Sep. 2024, pp. 851–855.
  20. S. Ibuki, T. Okamoto, T. Fujihashi, T. Koike-Akino, and T. Watanabe, ‘‘Rateless deep joint source channel coding for 3D point cloud,’’ IEEE Access, vol. 13, pp. 39585–39599, 2025.
  21. T. Fujihashi, T. Koike-Akino, T. Watanabe, and P. V. Orlik, ‘‘HoloCast+: Hybrid digital-analog transmission for graceful point cloud delivery with graph Fourier transform,’’ IEEE Trans. Multimedia, vol. 24, pp. 2179–2191, 2022.
  22. S. Ueno, T. Fujihashi, T. Koike-Akino, and T. Watanabe, ‘‘Point cloud soft multicast for untethered XR users,’’ IEEE Trans. Multimedia, vol. 25, pp. 7185–7195, 2023.
  23. J. Xu, B. Ai, W. Chen, A. Yang, P. Sun, and M. Rodrigues, ‘‘Wireless image transmission using deep source channel coding with attention modules,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 2315–2328, Apr. 2022.
  24. D. B. Kurka and D. Gündüz, ‘‘Bandwidth-agile image transmission with deep joint source-channel coding,’’ IEEE Trans. Wireless Commun., vol. 20, no. 12, pp. 8081–8095, Dec. 2021.
  25. M. Yang, C. Bian, and H.-S. Kim, ‘‘Deep joint source channel coding for wireless image transmission with OFDM,’’ in Proc. IEEE Int. Conf. Commun., Jun. 2021, pp. 1–6.
  26. M. Yang, C. Bian, and H.-S. Kim, ‘‘OFDM-guided deep joint source channel coding for wireless multipath fading channels,’’ IEEE Trans. Cognit. Commun. Netw., vol. 8, no. 2, pp. 584–599, Jun. 2022.
  27. S. Inokuma, Y. Sasaki, D. Hisano, Y. Nakayama, and K. Maruta, ‘‘Performance evaluation of MIMO transmission in deep joint source-channel coding,’’ in Proc. IEEE 99th Veh. Technol. Conf. (VTC-Spring), Jun. 2024, pp. 1–5.
  28. N. Frank, D. Lazzarotto, and T. Ebrahimi, ‘‘Latent space slicing for enhanced entropy modeling in learning-based point cloud geometry compression,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 4878–4882.
  29. J. Wang, D. Ding, Z. Li, X. Feng, C. Cao, and Z. Ma, ‘‘Sparse tensor-based multiscale representation for point cloud geometry compression,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 7, pp. 1–18, Jul. 2022.
  30. Y. Zhu, Y. Huang, X. Qiao, Z. Tan, B. Bai, H. Ma, and S. Dustdar, ‘‘A semantic-aware transmission with adaptive control scheme for volumetric video service,’’ IEEE Trans. Multimedia, vol. 25, pp. 7160–7172, 2023.
  31. Y. He, X. Ren, D. Tang, Y. Zhang, X. Xue, and Y. Fu, ‘‘Density-preserving deep point cloud compression,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 2323–2332.
  32. L. Wiesmann, A. Milioto, X. Chen, C. Stachniss, and J. Behley, ‘‘Deep compression for dense point cloud maps,’’ IEEE Robot. Autom. Lett., vol. 6, no. 2, pp. 2060–2067, Apr. 2021.
  33. J. Pang, D. Li, and D. Tian, ‘‘TearingNet: Point cloud autoencoder to learn topology-friendly representations,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 7449–7458.
  34. C. Bormann and P. E. Hoffman, Concise Binary Object Representation (CBOR), Standard RFC 8949, IETF, Dec. 2020.
  35. L. P. Deutsch, DEFLATE Compressed Data Format Specification Version 1.3, Standard RFC 1951, IETF, May 1996.
  36. R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, ‘‘PointNet: Deep learning on point sets for 3D classification and segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 77–85.
  37. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, ‘‘Attention is all you need,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 5998–6008.
  38. E. Perez, F. Strub, H. D. Vries, V. Dumoulin, and A. Courville, ‘‘FiLM: Visual reasoning with a general conditioning layer,’’ in Proc. AAAI Conf. Artif. Intell., vol. 32, 2018, pp. 3942–3951.
  39. W. Liu, Y. Zhang, X. Li, Z. Yu, B. Dai, T. Zhao, and L. Song, ‘‘Deep hyperspherical learning,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 3953–3963.
  40. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, ‘‘ShapeNet: An information-rich 3D model repository,’’ 2015, arXiv:1512.03012.
  41. M. Savva, A. X. Chang, and P. Hanrahan, ‘‘Semantically-enriched 3D models for common-sense knowledge,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 24–31.
  42. A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, ‘‘Matterport3D: Learning from RGB-D data in indoor environments,’’ 2017, arXiv:1709.06158.
  43. B.-S. Hua, Q.-H. Pham, D. T. Nguyen, M.-K. Tran, L.-F. Yu, and S.-K. Yeung, ‘‘SceneNN: A scene meshes dataset with aNNotations,’’ in Proc. 4th Int. Conf. 3D Vis. (3DV), Oct. 2016, pp. 92–101.
  44. N. Barman, M. G. Martini, and Y. Reznik, ‘‘Bjøntegaard delta (BD): A tutorial overview of the metric, evolution, challenges, and recommendations,’’ 2024, arXiv:2401.04039.
  45. M. V. Valkenburg, Reference Data for Engineers: Radio, Electronics, Computers and Communications. Boston, MA, USA: Newnes, 2001.
  46. J. L. Gailly and M. Adler. (1995). Zlib Compression Library. [Online]. Available: https://zlib.net/
  47. J. Prazeres, M. Pereira, and A. M. G. Pinheiro, ‘‘Quality evaluation of point cloud compression techniques,’’ Signal Process., Image Commun., vol. 128, Oct. 2024, Art. no. 117156.
  48. H. Xu, X. Zhang, and X. Wu. (2025). Fast Point Cloud Geometry Compression With Context-Based Residual Coding and INR-Based Refinement. [Online]. Available: https://github.com/hxu160/CRCIRforPCGC

Issue

IEEE Access, vol. 13, pp. 214122-214140, 2025, United States, https://doi.org/10.1109/ACCESS.2025.3646031

Copyright IEEE

Type: journal article; publication in a journal with an impact factor; publication in a refereed journal; indexed in Scopus and Web of Science