[Paper Review] Mitigating the Hubness Problem for Zero-Shot Learning of 3D Objects

Paper Overview

BMVC'19

https://arxiv.org/abs/1907.06371


Abstract

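In zero-shot learning on 3D point clouds, the hubness problem is more severe than in 2D.

One reason is that 2D vision can rely on models pre-trained on large datasets such as ImageNet, whereas no 3D model has been trained on a dataset of comparable size, so only lower-quality features can be extracted.

To address this, the authors propose a loss that mitigates the hubness problem.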

Keywords

Zero-Shot Learning, Point Clouds Zero-Shot Classification, Hubness Problem

Introduction

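2D models build their backbone by pre-training on large-scale data, so they can cluster features of new classes well.

In 3D, however, the pre-trained backbone sees only a small amount of data, so the hubness problem is more pronounced.

The hubness problem arises when, as in (a) of the figure below, the predictions for most test instances are biased toward a few classes.

The authors investigate the hubness problem in 3D point cloud ZSL and propose a loss that mitigates it.

The loss is computed in an unsupervised manner for each training batch, by counting how many times each seen class is predicted within the batch.

These counts serve as an estimate of the amount of hubness, and the loss reduces this measure by minimizing the skewness within each batch.

Skewness is the degree to which a distribution is lopsided toward one side.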

The Hubness Problem

The hubness problem is tied to the curse of dimensionality in nearest neighbor (NN) search.

In ZSL, it arises for two reasons.

First, the input and semantic features live in high-dimensional spaces.

Second, ZSL methods often rely on ridge regression, which is known to induce hubness.

As a result, the model's predictions become biased toward a few classes.

Hubness is usually measured by the skewness of the empirical distribution $p_j$.

$p_j(i)$ counts how many times the $i$-th sample (prototype) appears among the $j$ nearest neighbors of the test samples.

The skewness of this distribution is then defined as follows.
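Written with the symbols used in this section, and assuming Eq. (1) takes the standard sample-skewness form (which is what the note on its numerator further down and the implementation later in this post suggest), it would read:

$P_j\text{-skewness} = \frac{\sum_{i=1}^{n} \left( p_j(i) - \mathbf{E}[p_j] \right)^3 / n}{\mathrm{Var}[p_j]^{3/2}}$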

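$n$ is the number of test prototypes.

A large skewness means the model is strongly affected by the hubness problem.

(Since $p_j$ is a distribution of counts and the numerator of Eq. (1) runs over $i = 1, \dots, n$, Eq. (1) in effect takes the per-class histogram of counts, subtracts the histogram mean from each bin, cubes the result, and sums.)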
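The measurement described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `k_occurrence` and `skewness`, the toy distance matrix, and the use of the population variance are not taken from the paper.

```python
def k_occurrence(dist, k):
    """Count how often each prototype is among the k nearest neighbors.

    dist[q][i] is the distance from query q to prototype i (hypothetical input layout).
    """
    n = len(dist[0])
    counts = [0] * n
    for row in dist:
        nearest = sorted(range(n), key=lambda i: row[i])[:k]
        for i in nearest:
            counts[i] += 1
    return counts


def skewness(counts):
    """Third central moment divided by Var^(3/2), using the population variance."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    if var == 0:
        return 0.0  # all counts equal: treated here as no hubness at all
    third = sum((c - mean) ** 3 for c in counts) / n
    return third / var ** 1.5


# Toy case: prototype 0 is the nearest neighbor of every query, so it acts as a hub.
toy_dist = [
    [0.1, 0.9, 0.8],
    [0.2, 0.7, 0.9],
    [0.1, 0.8, 0.6],
    [0.3, 0.9, 0.7],
]
hub_counts = k_occurrence(toy_dist, k=1)
hub_skew = skewness(hub_counts)
balanced_skew = skewness([1, 1, 1])
```

In the toy case every query picks prototype 0, so the count histogram is maximally lopsided and the skewness is positive, while a perfectly even histogram gives zero.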

The authors observed the following hubness in 2D and 3D; (a) is 2D and (b) is 3D.

Proposed Method

4.1 Problem Formulation

A set of $n$ 3D point clouds is written as $\mathcal{X} = \left\{x_{1}, ..., x_{n} \right\}$.

Seen class labels are $t^{s} \in \mathcal{T}^s$ and unseen class labels are $t^{u} \in \mathcal{T}^u$; the two sets are disjoint.

The semantic representations of the classes are $\mathbf{e}^{s} \in \varepsilon^{s}$ and $\mathbf{e}^{u} \in \varepsilon^{u}$.

The seen set is defined as follows.

$\mathcal{D}^{s} = \left\{ (\mathcal{X}_{i}^{s}, t_{i}^{s}, \mathbf{e}_{i}^{s}) : i = 1, ..., n_{s} \right\}$

The unseen set is defined analogously.

$\mathcal{D}^{u} = \left\{ (\mathcal{X}_{i}^{u}, t_{i}^{u}, \mathbf{e}_{i}^{u}) : i = 1, ..., n_{u} \right\}$

If testing uses only unseen classes, the task is zero-shot learning (ZSL); if it uses both unseen and seen classes, it is generalized zero-shot learning (GZSL).

4.2 Training

The authors' model has two branches.

In the left branch, a feature vector $\kappa(\mathcal{X}) \in \mathbb{R}^{m}$ is extracted by a point cloud network.

In the right branch, a semantic feature vector $\mathbf{e} \in \mathbb{R}^{d}$ is mapped into the point cloud feature space by the projection network $\upsilon$.

The final loss combines two terms.

$L_S$ is the supervised distance loss.

$L_U$ is the unsupervised skewness loss.

Supervised distance loss

The supervised distance loss $L_S$ uses the labels of seen classes to align each semantic vector with the corresponding point cloud feature vector.

$N$ is the number of samples in the batch, $\kappa(\mathcal{X}_{i}^{s}) \in \mathbb{R}^{m}$ is the point cloud feature, $W$ denotes the weights of $\upsilon(\cdot)$, and $\lambda$ is the weight of the regularization term.
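A loss with these ingredients can be sketched as below. The exact form is an assumption: the sketch uses the mean squared Euclidean distance between each point cloud feature and its projected semantic vector, plus an L2 penalty on $W$ scaled by $\lambda$. The function name and the toy values are hypothetical.

```python
def supervised_distance_loss(point_feats, projected_sems, weights, lam):
    """Mean squared distance between paired vectors plus an L2 weight penalty.

    point_feats[i]    stands for kappa(X_i^s)      (list of floats)
    projected_sems[i] stands for upsilon(e_i^s)    (list of floats, same length)
    weights           stands for W, the projection weights (list of rows)
    lam               stands for lambda, the regularization weight
    """
    n = len(point_feats)
    distance = 0.0
    for feat, sem in zip(point_feats, projected_sems):
        distance += sum((a - b) ** 2 for a, b in zip(feat, sem))
    regularizer = sum(w * w for row in weights for w in row)
    return distance / n + lam * regularizer


# Toy batch of N = 2 samples with 2-dimensional features.
toy_loss = supervised_distance_loss(
    point_feats=[[1.0, 0.0], [0.0, 1.0]],
    projected_sems=[[1.0, 0.0], [0.0, 0.0]],
    weights=[[0.5, 0.0], [0.0, 0.5]],
    lam=0.1,
)
```

The first pair matches exactly and the second is one unit apart, so the distance term is 0.5 and the penalty adds 0.05.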

Unsupervised skewness loss

The prediction for the $i$-th instance in a batch is obtained as follows.

$\mathbf{e}(t)$ is the semantic vector corresponding to label $t$.
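The prediction step can be sketched as a nearest-prototype lookup, assuming (as the inference section and the implementation later in this post indicate) that the label is the one whose projected semantic vector has the highest cosine similarity to the point cloud feature. The names `cosine` and `predict_label` and the toy prototypes are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_label(feature, prototypes):
    """Return the label whose prototype is most similar to the feature.

    prototypes maps each label t to its projected semantic vector, i.e. upsilon(e(t)).
    """
    return max(prototypes, key=lambda t: cosine(feature, prototypes[t]))


toy_prototypes = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}
toy_prediction = predict_label([0.9, 0.2], toy_prototypes)
```

Here the feature points mostly along the first axis, so the lookup returns "chair".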

Labels are predicted for every instance in the batch, and the predicted label set is defined as $\hat{\mathcal{T}}^{s} = \left\{ \hat{t}_{1}^{s}, ..., \hat{t}_{N}^{s} \right\}$.

The frequency of each label is then computed from $\hat{\mathcal{T}}^{s}$ with the histogram function $\mathcal{H}( \hat{t}_{i}^{s})$.

$\mathcal{H}$ counts how many times a given seen class is predicted.

Summing $\mathcal{H}$ over all seen classes therefore gives $N$.

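The skewness loss is defined as follows.

From an implementation point of view, the loss can be computed in the following steps.

First, predicting a seen label with the model gives a one-hot vector of size $s$, the number of seen classes.

Summing the predictions of all instances in the batch then gives the histogram.

Taking the mean of the histogram gives $\mathbf{E}[\mathcal{H}( \hat{t}_{i}^{s})]$.

The rest follows the loss formula.

Training with this loss pushes the predictions within a batch toward a uniform distribution.

In other words, it prevents predictions from concentrating on particular classes.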
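As a small worked example, assume the loss is the standard skewness of the histogram with the population variance (an assumption, since the loss equation is not reproduced in this text). Take a batch of $N = 8$ instances over $s = 4$ seen classes whose predictions give $\mathcal{H} = (5, 1, 1, 1)$:

$\mathbf{E}[\mathcal{H}] = 8 / 4 = 2$

$\mathrm{Var}[\mathcal{H}] = \frac{3^2 + (-1)^2 + (-1)^2 + (-1)^2}{4} = 3$

$\frac{\sum_{t} \left( \mathcal{H}(t) - \mathbf{E}[\mathcal{H}] \right)^3 / 4}{\mathrm{Var}[\mathcal{H}]^{3/2}} = \frac{(27 - 1 - 1 - 1) / 4}{3^{3/2}} \approx \frac{6}{5.196} \approx 1.15$

A perfectly balanced batch, $\mathcal{H} = (2, 2, 2, 2)$, has a zero numerator, but its variance is also zero, so an implementation needs to guard the division.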

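(However, I think there is a considerable risk that training this way degrades the prediction accuracy on the seen classes themselves.)

argmax is not differentiable, so to use it in training it can be replaced with F.gumbel_softmax(hard=True), which returns one-hot vectors in the forward pass while still passing gradients. Note that it samples with Gumbel noise rather than taking the exact argmax.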

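import torch
import torch.nn.functional as F

def loss(self, visual_fea, semantic_fea):
    # Cosine similarity between every point cloud feature and every row of semantic_fea: [N, s]
    Cos = torch.matmul(F.normalize(visual_fea, dim=-1), F.normalize(semantic_fea, dim=-1).T)
    # One-hot prediction per instance, sampled with Gumbel noise (straight-through gradient)
    arg_max = F.gumbel_softmax(Cos, tau=1.0, hard=True, dim=-1)
    # Histogram of predictions over the seen classes; its entries sum to N
    H = torch.sum(arg_max, dim=0)
    H_mean = torch.mean(H)
    # torch.var is the unbiased estimator by default; the clamp avoids dividing by zero when H is uniform
    H_var = torch.clamp(torch.var(H), min=1e-8)

    # Third central moment divided by Var^(3/2), i.e. the skewness of H
    a = torch.pow(H - H_mean, 3)
    a_sum = torch.sum(a)
    loss = a_sum/(self.num_seen_classes*torch.pow(H_var, 1.5))
    return loss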

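However, I found that using the unsupervised skewness loss in training lowered model performance.

In batch training, data are usually sampled at random from the dataset; an approach that ignores this randomness and forces the output distribution to be uniform is hard to accept.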

4.3 Inference

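With the trained model, each unseen semantic representation is projected into the point feature space, and a label is assigned by computing cosine similarity as in Eq. (6).

In GZSL, the model has been trained on seen classes, so the prediction scores for seen classes are inevitably higher.

The scores of seen classes are therefore forcibly lowered by $\beta$.

$\mathbb{I}$ is the indicator function, which is 1 where the condition holds and 0 otherwise.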
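The seen-class penalty can be sketched as follows, assuming it is applied by subtracting $\beta$ from the similarity score of every seen class before taking the argmax. The function name `gzsl_predict`, the class names, and the scores are made up for illustration.

```python
def gzsl_predict(scores, seen_labels, beta):
    """Pick the best label after lowering seen-class scores by beta.

    scores maps each candidate label (seen or unseen) to its similarity score.
    The expression (1 if t in seen_labels else 0) plays the role of the indicator function.
    """
    adjusted = {
        t: s - beta * (1 if t in seen_labels else 0)
        for t, s in scores.items()
    }
    return max(adjusted, key=adjusted.get)


toy_scores = {"chair": 0.80, "sofa": 0.75}  # "chair" is seen, "sofa" is unseen
without_penalty = gzsl_predict(toy_scores, seen_labels={"chair"}, beta=0.0)
with_penalty = gzsl_predict(toy_scores, seen_labels={"chair"}, beta=0.1)
```

Without the penalty the seen class wins narrowly; with $\beta = 0.1$ its score drops below the unseen class, which is the intended correction for the bias toward seen classes.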

Experiments

5.1 Setup

Dataset

ModelNet40, ModelNet10, McGill, SHREC2015

The seen and unseen classes are split as in the following table.

The backbone is pre-trained only on the training data of the seen classes of ModelNet.

Implementation Details

5.2 Overall Results

ZSL Results, GZSL results

5.3 Ablation study

5.4 Experiments Beyond 3D

Conclusion

With the aid of better 3D capture systems, obtaining 3D point cloud data of objects at a very large scale has become more feasible than before. However, 3D point cloud recognition systems have not scaled up to handle this large scale scenario. To readjust such a system with newly available data that have not observed during training, we apply a zero-shot learning approach to facilitate classification of previously unseen input. Similar to ZSL on 2D images, we notice that such classification of 3D point clouds suffers from the hubness problem. Moreover, the hubness problem in 3D is more severe than that observed in the 2D case. One possible reason could be that the 3D features are not trained on millions of 3D instances in the same way that 2D convolutional networks can be. In this paper, we attempt to reduce the effect of the hubness problem while performing ZSL on 3D point cloud objects by proposing a novel loss. In addition, we report results on Generalized ZSL in conjunction with ZSL. Rigorous experiments on both 3D point clouds and 2D image datasets show significant improvement in performance over the current state-of-the-art methods.
