Authors: Neshev, S., Tonchev, K., Manolova, A. H., Poulkov, V. K.
Title: 3D Scene Segmentation: A Comprehensive Survey and Open Problems
Keywords: 3D indoor scene segmentation, 3D models, challenges, data acquisition, datasets, deep learning, future directions, performance metrics, segmentation models
Abstract: This paper presents a detailed review of recent advancements in 3D indoor scene segmentation driven by deep learning techniques. It provides an overview of existing segmentation models and examines data representations, data collection methods, augmentation techniques, and available datasets. A comparative analysis of loss functions and an overview of evaluation metrics are conducted to highlight their impact on segmentation performance. Unlike previous surveys, this work introduces a new classification of data augmentation techniques and proposes two novel classification approaches for 3D instance and semantic segmentation. Furthermore, it unifies 3D semantic instance segmentation and 3D panoptic segmentation within an existing framework. The paper also identifies key challenges and open research directions, providing insights into future advancements in the field.
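As a pointer to the evaluation metrics the survey reviews, the sketch below computes per-class intersection-over-union (IoU) and its class-wise mean (mIoU), the metric most commonly reported for 3D semantic segmentation benchmarks such as ScanNet and S3DIS. This is a minimal illustration only; the function names and toy labels are assumptions for the example, and individual benchmarks define their own class sets and ignore labels.

```python
import numpy as np

def per_class_iou(pred, gt, num_classes, ignore_index=-1):
    """Per-class IoU for a labelled 3D point cloud.

    pred, gt: integer arrays of shape (N,), one semantic label per point.
    Points whose ground-truth label equals ignore_index are excluded.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positives
        fp = np.sum((pred == c) & (gt != c))   # false positives
        fn = np.sum((pred != c) & (gt == c))   # false negatives
        denom = tp + fp + fn
        if denom > 0:                          # class absent everywhere -> undefined
            ious[c] = tp / denom
    return ious

def mean_iou(pred, gt, num_classes):
    return np.nanmean(per_class_iou(pred, gt, num_classes))

# Toy usage: 6 points, 3 hypothetical classes (e.g. wall / floor / chair)
gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
print(per_class_iou(pred, gt, 3), mean_iou(pred, gt, 3))  # -> [0.33 0.67 0.5] 0.5
```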
Issue: IEEE Access, vol. 13, pp. 110457–110496, 2025, United States, https://doi.org/10.1109/ACCESS.2025.3583136
Copyright: Institute of Electrical and Electronics Engineers Inc.