[Paper Review] Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Paper Overview

CVPR'25

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model


arxiv.org

Abstract

Generalized few-shot 3D point cloud segmentation (GFS-PCS) aims to teach a model new classes from only a few support samples while preserving its segmentation performance on the base classes.

Existing methods enhance prototypes through interaction with support or query features, but they remain limited by the sparse knowledge contained in the few-shot samples.

The authors therefore propose GFS-VL, a framework that combines dense but noisy pseudo-labels from 3D VLMs with few-shot learning so that the two reinforce each other.

They propose prototype-guided pseudo-label selection to filter out low-quality regions.

An adaptive infilling strategy then combines knowledge from the pseudo-label context and the few-shot samples to adaptively label the areas that were filtered out and left unlabeled.

In addition, they design a novel-base mix strategy that embeds few-shot samples into training scenes, preserving essential context and improving novel class learning.

Beyond the framework itself, the authors add two new benchmarks to the existing ones, enabling a more comprehensive evaluation of generalization.

Keywords

Few-Shot Learning, Few-Shot 3D Segmentation

Related Work

Generalized few-shot point cloud semantic segmentation

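Conventional few-shot segmentation requires additional support samples for each novel class at inference time and predicts only the novel classes, without the base classes. GFS-PCS, in contrast, segments base and novel classes together.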

Generalized Few-Shot Point Cloud Segmentation Via Geometric Words [paper]

Pseudo-Embedding for Generalized Few-Shot 3D Segmentation [paper]

3D VLMs

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding [paper]

GFS-PCS Overview

Problem Definition

The categories are split into base classes $\mathbf{C}^{b}$ and novel classes $\mathbf{C}^{n}$.

The two groups are disjoint, i.e., $\mathbf{C}^{b} \cap \mathbf{C}^{n} = \emptyset$.

Knowledge of $\mathbf{C}^{b}$ is learned from $\mathbf{D}^{base}$, and knowledge of $\mathbf{C}^{n}$ is learned from the $K$-shot set $\mathbf{D}^{novel}$.

In the support data of $\mathbf{D}^{novel}$, the background is labeled -1.

As the figure below shows, novel-class points that appear in $\mathbf{D}^{base}$ carry no labels.

Left: base data; right: base labels (black denotes background).

Left: novel support data; right: novel support labels.
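To make the label conventions concrete, here is a minimal Python sketch. The class split (`wall`, `floor`, `chair` as base; `sofa`, `lamp` as novel) and the helper names are hypothetical, and encoding novel-class points in base scenes as -1 is an assumption based on the figure, which shows them as unlabeled background.

```python
# Hypothetical class split for illustration only (not the paper's split).
BASE_CLASSES = {0: "wall", 1: "floor", 2: "chair"}
NOVEL_CLASSES = {3: "sofa", 4: "lamp"}
IGNORE = -1  # label for background / unlabeled points

# C^b and C^n must not overlap.
assert set(BASE_CLASSES).isdisjoint(NOVEL_CLASSES)


def base_scene_labels(full_labels):
    """Labels as seen in D^base: novel-class points are left unlabeled (-1)."""
    return [y if y in BASE_CLASSES else IGNORE for y in full_labels]


def support_labels(full_labels, target_class):
    """Labels as seen in a D^novel support sample for one novel class:
    only the target class keeps its id, everything else is background (-1)."""
    return [y if y == target_class else IGNORE for y in full_labels]


scene = [0, 1, 3, 2, 4, 3]  # toy ground truth for six points
print(base_scene_labels(scene))   # [0, 1, -1, 2, -1, -1]
print(support_labels(scene, 3))   # [-1, -1, 3, -1, -1, 3]
```

The same scene therefore yields two different, partial views of the truth: base labels with holes where novel objects are, and a support label that knows only one novel class.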

New Evaluation Benchmarks

Method

1. Overview

The GFS-VL framework is shown in the figure above.

The backbone is trained on $\mathbf{D}^{base}$.

With $\Phi$ as the backbone and $\mathcal{H}_{b}$ as the base classifier, the base prediction for an input point cloud $\mathbf{X}$ is $\mathcal{H}_{b}(\Phi(\mathbf{X}))$.

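With $\mathcal{H}_{n}$ as the classifier for the novel classes, the final prediction is obtained by concatenating the two outputs along the class dimension: $[\mathcal{H}_{b}(\Phi(\mathbf{X})); \mathcal{H}_{n}(\Phi(\mathbf{X}))]$.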

In other words, the model can be regarded as being fine-tuned for the novel categories.
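The prediction pipeline can be sketched as below. `backbone`, `linear_head`, and the toy weights are hypothetical stand-ins for $\Phi$, $\mathcal{H}_{b}$, and $\mathcal{H}_{n}$; the sketch only illustrates that base and novel logits are concatenated along the class axis before the argmax.

```python
def backbone(points):
    """Hypothetical stand-in for Phi: one feature vector per (x, y, z) point."""
    return [[x, y, z, 1.0] for x, y, z in points]


def linear_head(weights):
    """Build a per-point linear classifier with one weight vector per class."""
    def head(features):
        return [[sum(w * f for w, f in zip(w_c, feat)) for w_c in weights]
                for feat in features]
    return head


H_b = linear_head([[1, 0, 0, 0], [0, 1, 0, 0]])  # 2 base classes (toy weights)
H_n = linear_head([[0, 0, 1, 0]])                # 1 novel class (toy weights)


def predict(points):
    feats = backbone(points)
    base_logits = H_b(feats)    # H_b(Phi(X)): base-only prediction
    novel_logits = H_n(feats)   # H_n(Phi(X))
    # Concatenate along the class axis: ids 0-1 are base, id 2 is novel.
    full = [b + n for b, n in zip(base_logits, novel_logits)]
    return [max(range(len(row)), key=row.__getitem__) for row in full]


print(predict([(2.0, 0.1, 0.3), (0.0, 0.0, 5.0)]))  # [0, 2]
```

Because the novel logits are simply appended, a single argmax decides between base and novel classes for every point.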

With GFS-VL, the authors aim for complementary learning: 3D VLMs compensate for the limited knowledge available in few-shot learning, while the accurate few-shot support set refines the noisy knowledge of the 3D VLMs.

2. Pseudo-label Selection

The most direct way to use 3D VLMs in training is to take their predictions as pseudo-labels for the novel classes.

Because these raw predictions are noisy, the authors propose pseudo-label selection.

For each novel class, a support prototype is first computed from the few-shot samples.

This is done with the vision encoder $\Theta_{v}$ of the 3D VLM.

$\mathbf{p}^{c}$ denotes the support prototype of class $c$.
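The exact prototype formula is not reproduced in this review. A common choice, assumed in this sketch, is masked average pooling: average the $\Theta_{v}$ features of the points labeled $c$ in each support sample, then average over the $K$ shots. Feature extraction is omitted, and `masked_average` and `support_prototype` are hypothetical names.

```python
def masked_average(features, labels, cls):
    """Mean of the feature vectors whose label equals cls (masked average pooling)."""
    picked = [f for f, y in zip(features, labels) if y == cls]
    if not picked:
        return None
    return [sum(col) / len(picked) for col in zip(*picked)]


def support_prototype(shots, cls):
    """Hypothetical p^c: average of the per-shot masked averages over K shots.
    Each shot is a (features, labels) pair; the features are assumed to be
    precomputed with the 3D VLM vision encoder Theta_v."""
    per_shot = [masked_average(f, y, cls) for f, y in shots]
    per_shot = [p for p in per_shot if p is not None]
    if not per_shot:
        raise ValueError(f"class {cls} does not appear in any support sample")
    return [sum(col) / len(per_shot) for col in zip(*per_shot)]


shots = [
    ([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]], [5, -1, 5]),  # shot 1: two points of class 5
    ([[0.0, 2.0]], [5]),                                  # shot 2: one point of class 5
]
print(support_prototype(shots, 5))  # [1.0, 1.0]
```

Averaging per shot first gives each support sample equal weight regardless of how many points its object has.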

Next, the 3D VLM is run on $\mathbf{D}^{base}$ with the class names as text input to obtain predictions $\hat{\mathbf{Y}}$ over both base and novel classes.

Let $\hat{\mathbf{C}}^{n}$ denote the set of novel class indices present in $\hat{\mathbf{Y}}$.

Then, for each predicted novel class, a prototype is computed from the $\Theta_{v}$ features of the points predicted as that class.

The raw predictions are then refined into high-quality novel-class pseudo-labels.

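Points predicted as a base class are always set to -1. For each novel class, the cosine similarity between its predicted-region prototype and the support prototype $\mathbf{p}^{c}$ is computed; if it falls below the threshold $\tau$, the points of that class are also set to -1.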
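A minimal sketch of this selection rule, under the reading that filtering happens per predicted novel-class region (one prototype per region compared against $\mathbf{p}^{c}$). Function names and the toy data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def select_pseudo_labels(vlm_pred, features, support_protos, tau):
    """Keep a predicted novel region only if its prototype is close to p^c.

    vlm_pred: per-point class ids from the 3D VLM (base and novel).
    support_protos: {novel class id: p^c}.
    Points predicted as base classes are never selected, so they stay -1.
    """
    selected = [-1] * len(vlm_pred)
    for c, p_c in support_protos.items():
        idx = [i for i, y in enumerate(vlm_pred) if y == c]
        if not idx:
            continue  # class c was not predicted in this scene
        region_proto = mean([features[i] for i in idx])
        if cosine(region_proto, p_c) >= tau:
            for i in idx:
                selected[i] = c
    return selected


vlm_pred = [0, 3, 3, 4, 4]  # 0 = base, 3 and 4 = novel
features = [[1, 0], [0, 1], [0, 1], [1, 0], [1, 0]]
protos = {3: [0, 1], 4: [0, 1]}
print(select_pseudo_labels(vlm_pred, features, protos, tau=0.5))  # [-1, 3, 3, -1, -1]
```

Here the region predicted as class 4 looks nothing like its support prototype, so the whole region is discarded rather than trusted.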

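Once these reliable pseudo-labels are obtained, they are assigned to the points labeled -1 in the base labels, yielding $\mathbf{Y}'_{b}$. In other words, $\mathbf{Y}'_{b}$ keeps the base label wherever one exists and takes the selected pseudo-label wherever the base label is -1.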
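The merging step is a simple masked overwrite; a sketch with a hypothetical function name:

```python
IGNORE = -1


def merge_pseudo_labels(base_labels, pseudo_labels):
    """Y'_b: keep every existing base label and fill only the -1 slots."""
    return [p if b == IGNORE else b for b, p in zip(base_labels, pseudo_labels)]


print(merge_pseudo_labels([0, -1, -1, 1, -1], [-1, 3, 3, -1, -1]))  # [0, 3, 3, 1, -1]
```

Ground-truth base labels always win: a pseudo-label can never overwrite a point that already has a base label.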

3. Adaptive Infilling

Even with pseudo-labels added, $\mathbf{Y}'_{b}$ still contains unlabeled regions.

These regions may contain true novel objects that were missed entirely, or novel objects that were only partially captured.

To address this, the authors propose an adaptive infilling approach.

To do so, novel class prototypes are first extracted from the pseudo-labeled points in $\mathbf{Y}'_{b}$.

These extracted prototypes are combined with the previously computed support prototypes $\mathbf{p}^{c}$ to form an adaptive prototype set, whose elements are denoted $\mathbf{m}^{c}$.
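One plausible way to build the adaptive set, assumed here rather than taken from the paper, is to prefer the prototype extracted from the pseudo-labeled points in $\mathbf{Y}'_{b}$ and fall back to the support prototype $\mathbf{p}^{c}$ when class $c$ has no pseudo-labeled points in the scene. All names are hypothetical.

```python
def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def adaptive_prototypes(features, merged_labels, support_protos):
    """Hypothetical construction of {m^c} for every novel class c.

    Prefer the prototype of the points pseudo-labeled c in Y'_b (scene-specific);
    fall back to the support prototype p^c if no point carries label c.
    """
    protos = {}
    for c, p_c in support_protos.items():
        pts = [f for f, y in zip(features, merged_labels) if y == c]
        protos[c] = mean(pts) if pts else list(p_c)
    return protos


features = [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
merged = [0, 3, 3]                         # Y'_b: one base point, two pseudo-labeled as 3
support = {3: [1.0, 1.0], 4: [0.5, 0.5]}  # p^c for novel classes 3 and 4
print(adaptive_prototypes(features, merged, support))  # {3: [0.0, 2.0], 4: [0.5, 0.5]}
```

The design intuition is that a prototype taken from the same scene matches its local appearance better, while the support prototype keeps every novel class covered.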

Next, $\mathbf{Y}''_{b}$ is initialized as $\mathbf{Y}'_{b}$. For each unlabeled point, the cosine similarity between its feature and each prototype $\mathbf{m}^{c}$ is computed, and the threshold $\delta$ decides whether the point is filled in $\mathbf{Y}''_{b}$.
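The infilling step can be sketched as follows. Assigning each unlabeled point to its most similar prototype and accepting it only when that similarity exceeds $\delta$ is an assumption about how the threshold is applied; names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def adaptive_infill(features, merged_labels, protos, delta):
    """Y''_b: start from Y'_b and give each still-unlabeled point the class of its
    most similar prototype m^c, but only when that similarity exceeds delta."""
    filled = list(merged_labels)  # Y''_b initialized as a copy of Y'_b
    for i, y in enumerate(merged_labels):
        if y != -1:
            continue  # already labeled (base label or pseudo-label)
        sims = {c: cosine(features[i], m_c) for c, m_c in protos.items()}
        if not sims:
            continue
        best_c = max(sims, key=sims.get)
        if sims[best_c] > delta:
            filled[i] = best_c
    return filled


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
merged = [0, 3, -1, -1]
protos = {3: [0.0, 1.0], 4: [1.0, 0.0]}
print(adaptive_infill(features, merged, protos, delta=0.9))  # [0, 3, -1, 4]
```

The third point is equally similar to both prototypes (about 0.71), which is below $\delta$, so it stays unlabeled instead of being forced into a class.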

4. Novel-Base Mix

To make full use of the support samples, a novel-base mix approach is introduced.

First, a novel sample $\mathbf{X}_{n}^{c}$ is randomly drawn from $\mathbf{D}^{novel}$.

The sample is then cropped based on the novel class mask $\mathbf{Y}_{n}^{c}$.

Finally, a corner of the base scene is chosen on the XY plane, and the cropped sample is placed there so that the two scenes are merged.
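A rough sketch of the mix under several assumptions not stated in the review: the crop is the XY bounding box of the novel mask (so some surrounding context comes along), the corner is picked at random, the crop's matching corner is aligned to it, and Z is left unchanged. All names are hypothetical.

```python
import random


def crop_by_mask(points, labels, cls):
    """Keep every point inside the XY bounding box of the class-`cls` mask."""
    obj = [p for p, y in zip(points, labels) if y == cls]
    x0, x1 = min(p[0] for p in obj), max(p[0] for p in obj)
    y0, y1 = min(p[1] for p in obj), max(p[1] for p in obj)
    kept = [(p, y) for p, y in zip(points, labels)
            if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
    return [p for p, _ in kept], [y for _, y in kept]


def novel_base_mix(base_pts, base_lbls, sup_pts, sup_lbls, cls, rng):
    """Paste the cropped novel sample into a random XY corner of the base scene."""
    crop_pts, crop_lbls = crop_by_mask(sup_pts, sup_lbls, cls)
    bx = [p[0] for p in base_pts]
    by = [p[1] for p in base_pts]
    cx = [p[0] for p in crop_pts]
    cy = [p[1] for p in crop_pts]
    # Pick one of the four XY corners and align the crop's matching corner to it;
    # if the crop is smaller than the scene, it then lies inside the scene's XY extent.
    dx = max(bx) - max(cx) if rng.choice((True, False)) else min(bx) - min(cx)
    dy = max(by) - max(cy) if rng.choice((True, False)) else min(by) - min(cy)
    moved = [(x + dx, y + dy, z) for x, y, z in crop_pts]  # Z is left unchanged
    # Support labels already mark non-target points as -1.
    return base_pts + moved, base_lbls + crop_lbls


base_pts = [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
base_lbls = [0, 1]
sup_pts = [(5.0, 5.0, 0.0), (6.0, 6.0, 1.0), (20.0, 20.0, 0.0)]
sup_lbls = [3, 3, -1]
pts, lbls = novel_base_mix(base_pts, base_lbls, sup_pts, sup_lbls, 3, random.Random(0))
print(lbls)  # [0, 1, 3, 3]
```

The point far from the novel object is dropped by the crop, while the pasted object keeps its exact support labels, which is what lets the mixed scene carry accurate novel supervision.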

Experiments

2. Experimental Results

3. Ablation Studies

Conclusion

This work introduces a GFS-PCS framework GFS-VL that synergizes dense but noisy pseudo-labels from 3D VLMs with accurate yet sparse few-shot samples, overcoming current GFS-PCS limitations in novel knowledge learning. GFS-VL utilizes prototype-guided pseudo-label selection to target high-quality regions and adaptive infilling to enrich pseudo-labels. Besides, the novel-base mix embeds few-shot samples into training scenes, preserving essential context for improved novel class learning. Identifying the limited diversity in current GFS-PCS evaluations, we introduce two benchmarks with broader, more diverse novel classes for more comprehensive generalization evaluation. GFS-VL achieves leading results and generalizes effectively across models and datasets, showing the potential of 3D VLMs in advancing GFS-PCS. We hope our method and benchmarks serve as a foundation for future research.

KHS Computer Vision