PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

This paper is the follow-up to PointNet.

It builds on the original PointNet and improves its ability to capture local structure.

NeurIPS'17


Abstract

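Because of its design, PointNet cannot capture the local structure

induced by the metric space in which the points live.

The authors therefore introduce a hierarchical neural network that applies PointNet

recursively on a nested partitioning of the input point set.

By exploiting metric space distances, the network can

learn local features at increasing contextual scales.

Point sets are usually sampled with varying densities,

which greatly degrades the performance of networks trained on uniform densities.

The authors therefore propose novel set learning layers

that adaptively combine features from multiple scales.

This network is called PointNet++.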

Introduction

The basic idea of PointNet is to learn a spatial encoding of each point

and then aggregate all individual point features into a global point cloud signature.

By this design, PointNet cannot capture the local structure induced by the metric.

However, exploiting local structure is one of the main reasons for the success of CNNs.

(CNNs, too, are designed to extract geometric features from images.)

(So local features need to be preserved.)

The authors introduce PointNet++, a hierarchical neural network that processes

a set of points sampled in a metric space in a hierarchical manner.

First, the point set is partitioned into overlapping local regions

by the distance metric of the underlying space.

Similar to CNNs, local features capturing fine geometric structures

are extracted from small neighborhoods.

These local features are further grouped into larger units

and processed to produce higher-level features.

This process is repeated until the features of the whole point set are obtained.

PointNet++ has to address two issues.

1. How to generate the partitioning of the point set.

2. How to abstract sets of points or local features

through a local feature learner.

The two issues are correlated, because the partitioning of the point set

has to produce common structures across partitions.

Only then can the weights of the local feature learner be shared (as in a CNN).

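The authors choose PointNet as the local feature learner.

PointNet abstracts sets of local points or features into higher-level representations,

so PointNet++ can be described as a model that applies PointNet recursively.

Each partition is defined as a neighborhood ball in Euclidean space.

To cover the whole set evenly, the centroids are selected among the input point set

by the farthest point sampling (FPS) algorithm.

Deciding the appropriate scale of the local neighborhood balls, however,

is a challenging yet intriguing problem.

The reason is the entanglement of feature scale and non-uniformity of the input point set.

The authors assume that the input point set may have varying density in different areas.

The input point set is therefore very different from CNN inputs.

(CNN inputs lie on regular grids.)

The counterpart of the local partition scale is the kernel size in CNNs.

According to the VGG paper, smaller CNN kernels help improve performance.

For point sets, however, a small neighborhood makes PointNet unstable.

(A small neighborhood may contain too few points because of sampling deficiency, so simply shrinking the neighborhood hurts performance.)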

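A significant contribution of the paper is the following.

PointNet++ leverages neighborhoods at multiple scales

to achieve both robustness and detail capture.

Assisted by random input dropout during training,

the network learns to adaptively weight patterns detected at different scales

and to combine multi-scale features according to the input data.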

2. Problem Statement

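$\mathcal{X} = (M, d)$ is a discrete metric space

whose metric is inherited from Euclidean space $\mathbb{R}^{n}$,

where $M \subseteq \mathbb{R}^{n}$ is the set of points

and $d$ is the distance metric.

The authors are interested in learning set functions $f$ that take such an $\mathcal{X}$ as input

and produce information of semantic interest

regarding $\mathcal{X}$.

$f$ is either a classification function that assigns a label to $\mathcal{X}$ or a segmentation function that assigns a label to each member of $M$.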

3. Method

3.1 Review of PointNet: A Universal Continuous Set Function Approximator

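Given a point set $\left\{ x_{1}, x_{2}, ...,x_{n}\right\}$ with $x_{i} \in \mathbb{R}^{d}$,

a set function $f : \mathcal{X} \rightarrow \mathbb{R}$ that maps

the point set to a vector is defined as follows.

$f(x_{1}, x_{2}, ..., x_{n}) = \gamma \left( \max_{i=1,...,n} \left\{ h(x_{i}) \right\} \right)$ (1)

$\gamma$ and $h$ are MLP networks.

As noted in the PointNet paper, the set function $f$ in Eq. (1)

is invariant to permutations of the input points and can approximate any continuous set function arbitrarily well.

(MLPs are universal function approximators to begin with.)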
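To make Eq. (1) concrete, here is a minimal NumPy sketch of a PointNet-style set function. The helper names (`mlp`, `init_mlp`, `pointnet`), the layer widths, and the random weights are all assumptions for illustration, not the authors' code. It only shows that a shared per-point MLP $h$, a max over points, and a second MLP $\gamma$ give an output that does not depend on the order of the points.

```python
import numpy as np

def mlp(x, weights):
    """Apply a stack of (W, b) layers row-wise, with ReLU between layers."""
    for k, (W, b) in enumerate(weights):
        x = x @ W + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def init_mlp(sizes, rng):
    """Random weights for illustration only (hypothetical helper)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def pointnet(points, h_weights, gamma_weights):
    per_point = mlp(points, h_weights)   # h applied to every point: (n, c)
    pooled = per_point.max(axis=0)       # symmetric max over the set: (c,)
    return mlp(pooled[None, :], gamma_weights)[0]  # gamma on the pooled vector

rng = np.random.default_rng(0)
h_w = init_mlp([3, 16, 32], rng)      # assumed widths
g_w = init_mlp([32, 16, 4], rng)      # assumed widths
pts = rng.standard_normal((10, 3))
out = pointnet(pts, h_w, g_w)
out_perm = pointnet(pts[rng.permutation(10)], h_w, g_w)
```

Shuffling the rows of `pts` leaves the result unchanged up to floating-point rounding, because the max ignores ordering.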

3.2 Hierarchical Point Set Feature Learning

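While PointNet uses a single max pooling operation to aggregate the whole point set,

PointNet++ builds a hierarchical grouping of points

and progressively abstracts larger and larger local regions along the hierarchy.

The architecture of the network is shown in the figure above.

At each level, a set of points is processed and abstracted

to produce a new set with fewer elements.

A set abstraction level consists of three key layers.

1. Sampling layer

Selects a set of points from the input points, which defines the centroids of the local regions.

2. Grouping layer

Constructs local region sets by finding "neighboring" points around the centroids.

3. PointNet layer

Uses a mini-PointNet to encode local region patterns into feature vectors.

A set abstraction level takes an $N \times ( d + C)$ matrix as input,

where $N$ is the number of points, $d$ is the coordinate dimension, and $C$ is the point feature dimension.

It outputs an $N' \times (d + C')$ matrix,
where $N'$ is the number of subsampled points, $d$ is the coordinate dimension, and $C'$ is the dimension of the new feature vectors summarizing local context.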

Sampling Layer

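Given input points $\left\{ x_{1}, x_{2}, ...,x_{n}\right\}$, iterative farthest point sampling (FPS) is used
to choose a subset of points $\left\{ x_{i_{1}}, x_{i_{2}}, ...,x_{i_{m}}\right\}$,
such that $x_{i_{j}}$ is the point farthest (in metric distance) from the already selected set $\left\{ x_{i_{1}}, x_{i_{2}}, ...,x_{i_{j-1}}\right\}$.
In contrast to CNNs, which scan the vector space agnostic of the data distribution,
this sampling strategy generates receptive fields in a data-dependent manner.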
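The iterative selection can be sketched in a few lines of NumPy. This is an illustrative version under assumptions: Euclidean distance, `m` no larger than the number of points, and a fixed starting index `start` (the choice of the first point is not specified in this review).

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Return indices of m points chosen by iterative farthest point sampling."""
    selected = np.empty(m, dtype=np.int64)
    selected[0] = start
    # min_dist[i] = distance from point i to its nearest already-selected point
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for j in range(1, m):
        nxt = int(np.argmax(min_dist))   # farthest from the selected set
        selected[j] = nxt
        min_dist = np.minimum(
            min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return selected

pts = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx = farthest_point_sampling(pts, 3)
print(idx)  # [0 4 2]
```

Starting from point 0, the farthest point is the opposite corner (index 4), and the next pick is one of the two remaining corners. The near-duplicate at index 1 is never chosen, which is why FPS covers the set more evenly than picking points at random.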

Grouping Layer

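The input is a point set of size $N \times (d + C)$ and the coordinates of a set of centroids of size $N' \times d$.
The output is groups of point sets of size $N' \times K \times (d + C)$,
where each group corresponds to a local region and $K$ is the number of points in the neighborhood of a centroid.
$K$ varies across groups, but the subsequent PointNet layer can convert a flexible number of points
into a fixed-length local region feature vector.

Ball query finds all points within a radius of the query point (at most $K$ of them).
Compared with kNN, the local neighborhood of ball query guarantees a fixed region scale.
This makes local region features more generalizable across space,
which is preferred for tasks requiring local pattern recognition (e.g. segmentation).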
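A small NumPy sketch of ball query follows. The function name and the padding rule are assumptions of this sketch: groups with fewer than `K` points inside the ball are padded by repeating the first found index, so that every group has the same shape. Since the features are max pooled afterwards, repeated points do not change the result.

```python
import numpy as np

def ball_query(points, centroids, radius, K):
    """For each centroid, return K indices of points within `radius`."""
    # pairwise Euclidean distances, shape (N', N)
    d = np.linalg.norm(centroids[:, None, :] - points[None, :, :], axis=-1)
    groups = np.empty((centroids.shape[0], K), dtype=np.int64)
    for c in range(centroids.shape[0]):
        inside = np.flatnonzero(d[c] <= radius)[:K]   # cap at K points
        if inside.size == 0:
            # fallback (assumption): use the nearest point if the ball is empty
            inside = np.array([np.argmin(d[c])])
        pad = np.full(K - inside.size, inside[0])     # repeat first index
        groups[c] = np.concatenate([inside, pad])
    return groups

pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [1.0, 0.0], [1.1, 0.0]])
groups = ball_query(pts, pts[[0, 3]], radius=0.25, K=3)
print(groups)  # [[0 1 2]
               #  [3 4 3]]
```

The second centroid has only two points inside its ball, so index 3 is repeated. With kNN the third neighbor would instead be pulled in from 0.8 away, which is exactly the scale drift that ball query avoids.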

PointNet layer

The input is $N'$ local regions of points with data size $N' \times K \times (d + C)$.
Each local region in the output is abstracted by its centroid and a local feature that encodes the centroid's neighborhood;
the output data size is $N' \times (d + C')$.

The coordinates of the points in a local region are first translated into a local frame relative to the centroid point:
$x_{i}^{(j)} = x_{i}^{(j)} - \widehat{x}^{(j)}$
for $i = 1, 2, ..., K$ and $j = 1, 2, ..., d$, where $\widehat{x}$ is the coordinate of the centroid.
PointNet is used as the basic building block for local pattern learning.
By using relative coordinates together with point features,
point-to-point relations in the local region can be captured.
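The shapes and the local-frame translation can be checked with a short NumPy sketch. This is not the authors' implementation: the mini-PointNet is reduced to a single shared linear layer with ReLU, and the function name, weights `W`, `b`, and index arrays are hypothetical.

```python
import numpy as np

def set_abstraction_pointnet(xyz, feats, centroid_idx, group_idx, W, b):
    """Encode each local region into one feature vector.

    xyz: (N, d), feats: (N, C), centroid_idx: (N',), group_idx: (N', K),
    W: (d + C, C'), b: (C',). Returns (N', d + C').
    """
    centroids = xyz[centroid_idx]                          # (N', d)
    # translate neighbours into a frame centred on their centroid
    local_xyz = xyz[group_idx] - centroids[:, None, :]     # (N', K, d)
    grouped = np.concatenate([local_xyz, feats[group_idx]], axis=-1)
    hidden = np.maximum(grouped @ W + b, 0.0)              # shared layer
    new_feats = hidden.max(axis=1)                         # pool over K
    return np.concatenate([centroids, new_feats], axis=-1)

rng = np.random.default_rng(0)
xyz = rng.standard_normal((6, 3))               # N = 6, d = 3
feats = rng.standard_normal((6, 2))             # C = 2
centroid_idx = np.array([0, 3])                 # N' = 2
group_idx = np.array([[0, 1, 2], [3, 4, 5]])    # K = 3
W = rng.standard_normal((5, 4)) * 0.1           # C' = 4
b = np.zeros(4)
out = set_abstraction_pointnet(xyz, feats, centroid_idx, group_idx, W, b)
shifted = set_abstraction_pointnet(xyz + 10.0, feats, centroid_idx, group_idx, W, b)
```

Because of the subtraction of the centroid, shifting the whole cloud by a constant changes only the centroid columns of the output; the learned feature columns stay the same.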

3.3 Robust Feature Learning under Non-Uniform Sampling Density

It is common for a point set to come with non-uniform density in different areas.
Such non-uniformity poses a significant challenge for point set feature learning.
Features learned on dense data may not generalize to sparsely sampled regions.
Conversely, models trained on sparse point clouds may fail to recognize fine-grained local structures.

Ideally, the authors want to inspect a point set as closely as possible
to capture the finest details in densely sampled regions.
However, such close inspection is not suitable in low-density areas,
because local patterns may be corrupted by the sampling deficiency.
In these low-density areas, larger-scale patterns should be looked for in a greater vicinity.

To achieve this, the authors propose density adaptive PointNet layers
that learn to combine features from regions of different scales
when the input sampling density changes.
The hierarchical network with these density adaptive PointNet layers is called PointNet++.

In PointNet++, each abstraction level extracts local patterns at multiple scales
and combines them intelligently according to the local point density.
The authors propose the following two types of density adaptive layers.

Multi-scale grouping (MSG)

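As shown in figure (a) above, a simple but effective way to capture multi-scale patterns is

to apply grouping layers with different scales, each followed by a PointNet that extracts the features of that scale.

Features at different scales are concatenated to form a multi-scale feature. (See the figure above.)

The network is trained to learn an optimized strategy for combining the multi-scale features.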
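A rough NumPy sketch of the MSG idea is given below. It is a simplification under assumptions: each scale uses one shared linear layer with ReLU instead of a full PointNet, the number of points per ball is not capped at $K$, and the function name, radii, and widths are made up for illustration.

```python
import numpy as np

def msg_features(xyz, centroid_idx, radii, scale_weights):
    """Concatenate max-pooled features extracted at several ball radii."""
    centroids = xyz[centroid_idx]
    rel = xyz[None, :, :] - centroids[:, None, :]   # (N', N, d) local frames
    dist = np.linalg.norm(rel, axis=-1)             # (N', N)
    per_scale = []
    for r, (W, b) in zip(radii, scale_weights):
        hidden = np.maximum(rel @ W + b, 0.0)       # (N', N, C_s)
        # ignore points outside the ball; the centroid itself is always inside
        masked = np.where((dist <= r)[..., None], hidden, -np.inf)
        per_scale.append(masked.max(axis=1))        # (N', C_s)
    return np.concatenate(per_scale, axis=-1)       # (N', sum of C_s)

rng = np.random.default_rng(0)
xyz = rng.standard_normal((20, 3))
centroid_idx = np.array([0, 5, 10])
radii = [0.5, 1.0, 2.0]                              # assumed scales
scale_weights = [(rng.standard_normal((3, c)) * 0.1, np.zeros(c))
                 for c in (8, 16, 32)]               # assumed widths
msg = msg_features(xyz, centroid_idx, radii, scale_weights)
print(msg.shape)  # (3, 56)
```

Each centroid ends up with one vector of length 8 + 16 + 32, and the layers that follow are free to learn how much to rely on each scale.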

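To train the network to combine the scales well, input points are randomly dropped with a randomized probability for each instance; this is called random input dropout.

For each training point set, a dropout ratio $\theta$ is

sampled uniformly from $\left [ 0, p \right ]$, where $p \leq 1$, and each point is then dropped with probability $\theta$.

The authors set $p = 0.95$ to avoid generating empty point sets.

As a result, the network is trained on sets of varying sparsity (induced by $\theta$)

and varying uniformity (induced by the randomness of the dropout).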
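Random input dropout is easy to sketch in NumPy. The function below is an illustration under assumptions: dropped points are simply removed (a batched implementation would more likely keep the tensor size fixed), and the guard that keeps at least one point is added here for safety and is not from the paper.

```python
import numpy as np

def random_input_dropout(points, p=0.95, rng=None):
    """Drop each point with probability theta, where theta ~ U[0, p)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, p)                    # one ratio per point set
    keep = rng.random(points.shape[0]) >= theta    # independent per-point draw
    if not keep.any():
        # guard (assumption): never return an empty set
        keep[rng.integers(points.shape[0])] = True
    return points[keep], theta

rng = np.random.default_rng(0)
pts = rng.standard_normal((1024, 3))
kept, theta = random_input_dropout(pts, p=0.95, rng=rng)
print(theta, kept.shape)
```

Calling it once per training sample gives a different `theta` each time, so the network sees everything from nearly complete to very sparse versions of the same shapes.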

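Multi-resolution grouping (MRG)

However, MSG is computationally expensive, because it runs a local PointNet on large-scale neighborhoods for every centroid point.

The authors therefore propose an alternative approach that reduces this cost

while preserving the ability to adaptively aggregate information according to the distributional properties of the points.

As shown in figure (b) above, the feature of a region at level $L_{i}$ is

the concatenation of two vectors.

One vector (left in figure (b)) is obtained by summarizing the features of each subregion

from the lower level $L_{i-1}$ using the set abstraction level.

The other vector (right in figure (b)) is obtained by directly processing all raw points

in the local region with a single PointNet.

When the density of a local region is low, the first vector is less reliable than the second,

because the subregions used to compute it contain even sparser points

and suffer more from sampling deficiency.

In this case, the second vector should be weighted higher.

When the density is high, the opposite holds: the first vector carries finer detail.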
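The two-vector construction can be sketched as follows. This is only a shape-level illustration under assumptions: both branches use a single shared linear layer with ReLU and max pooling, and all names, widths, and inputs are hypothetical.

```python
import numpy as np

def mrg_region_feature(sub_feats, raw_points, W_sub, b_sub, W_raw, b_raw):
    """Concatenate a lower-level summary with a raw-point summary."""
    # vector 1: summarize the subregion features coming from level L_{i-1}
    v1 = np.maximum(sub_feats @ W_sub + b_sub, 0.0).max(axis=0)
    # vector 2: one PointNet-like pass over all raw points of the region
    v2 = np.maximum(raw_points @ W_raw + b_raw, 0.0).max(axis=0)
    return np.concatenate([v1, v2])

rng = np.random.default_rng(0)
sub_feats = rng.standard_normal((4, 16))     # 4 subregions, 16-dim each
raw_points = rng.standard_normal((50, 3))    # raw points, centroid-relative
W_sub, b_sub = rng.standard_normal((16, 32)) * 0.1, np.zeros(32)
W_raw, b_raw = rng.standard_normal((3, 32)) * 0.1, np.zeros(32)
region_feat = mrg_region_feature(sub_feats, raw_points,
                                 W_sub, b_sub, W_raw, b_raw)
print(region_feat.shape)  # (64,)
```

The subregion features are already computed by the lower level, so the only extra work per region is the single pass over the raw points. That is where the saving over MSG comes from.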

3.4. Point Feature Propagation for Set Segmentation

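In the set abstraction layers, the original point set is subsampled.

In the segmentation task, however, a label is needed for every original point.

One solution is to sample all points as centroids in every set abstraction level, but this results in a high computation cost.

Another way is to propagate features from the subsampled points to the original points.

The authors adopt a hierarchical propagation strategy with distance-based interpolation and across-level skip links.

In a feature propagation level,

point features are propagated from $N_{l} \times (d + C)$ points to $N_{l-1}$ points,

where $N_{l-1}$ and $N_{l}$ (with $N_{l} \leq N_{l-1}$) are

the point set sizes of the input and output of set abstraction level $l$.

Feature propagation is achieved by interpolating the feature values $f$ of the $N_{l}$ points

at the coordinates of the $N_{l-1}$ points.

The interpolated features on the $N_{l-1}$ points are then concatenated with the skip-linked point features from the set abstraction level,

and the concatenated features are passed through a "unit pointnet".

The "unit pointnet" is similar to a $1 \times 1$ convolution in CNNs.

A few shared fully connected and ReLU layers are applied to update each point's feature vector.

This process is repeated until the features have been propagated to the original set of points.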
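For the interpolation step, the paper uses an inverse distance weighted average over the $k$ nearest neighbors, with $k = 3$ and power $p = 2$ as defaults. A NumPy sketch of that idea follows; the function name and the small `eps` that avoids division by zero are assumptions of this sketch.

```python
import numpy as np

def interpolate_features(xyz_dense, xyz_sparse, feats_sparse, k=3, p=2, eps=1e-8):
    """Inverse-distance-weighted kNN interpolation of sparse features.

    xyz_dense: (N_{l-1}, d), xyz_sparse: (N_l, d), feats_sparse: (N_l, C).
    Returns interpolated features of shape (N_{l-1}, C).
    """
    d = np.linalg.norm(xyz_dense[:, None, :] - xyz_sparse[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]               # k nearest sparse points
    nn_d = np.take_along_axis(d, nn, axis=1)        # their distances
    w = 1.0 / (nn_d ** p + eps)                     # closer points weigh more
    w = w / w.sum(axis=1, keepdims=True)            # normalize the weights
    return (w[..., None] * feats_sparse[nn]).sum(axis=1)

xyz_sparse = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
feats_sparse = np.array([[1.0], [2.0], [3.0], [4.0]])
xyz_dense = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
interp = interpolate_features(xyz_dense, xyz_sparse, feats_sparse)
```

A dense point that coincides with a sparse point receives essentially that point's feature. The skip link would then be a plain `np.concatenate([interp, skip_feats], axis=-1)` before the shared layers of the unit pointnet, where `skip_feats` stands for the features saved at the matching set abstraction level.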

4. Experiments

Dataset

The paper evaluates on MNIST, ModelNet40, SHREC15, and ScanNet.

4.1 Point Set Classification in Euclidean Metric Space

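To evaluate point cloud classification in 2D and 3D Euclidean spaces,

the authors convert MNIST and ModelNet40 into point clouds.

By default, 512 points are used for MNIST and 1024 points for ModelNet40.

For ModelNet40, face normals are used as additional point features together with more points (5000)

to boost performance a little further. (Marked as "with normal".)

All point sets are normalized to be zero mean and within a unit ball.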
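The normalization described in the paper, zero mean and within a unit ball, can be sketched in NumPy as below. The function name is hypothetical, and the handling of a degenerate set where all points coincide is an assumption.

```python
import numpy as np

def normalize_point_set(points):
    """Center a point set at the origin and scale it into the unit ball."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()   # farthest point from origin
    # degenerate case (assumption): leave an all-identical set at the origin
    return centered / scale if scale > 0 else centered

rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3)) * 5.0 + 2.0
normed = normalize_point_set(cloud)
```

After this step the farthest point lies exactly on the unit sphere, so ball radii such as 0.2 or 0.4 mean the same thing for every shape in the dataset.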

The results are as follows.

Robustness to Sampling Density Variation

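Sensor data captured directly from real scenes

usually suffers from severe irregular sampling issues.

For this reason, the approach selects point neighborhoods at multiple scales

and learns to balance descriptiveness and robustness by weighting them properly.

The left side shows random dropout of points.

The right side shows the performance of the models:

SSG is single scale grouping, DP is dropout,

and MSG and MRG are the layers described above.

MSG+DP turns out to be the most stable.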

4.2 Point Set Segmentation for Semantic Scene Labeling

The goal here is semantic segmentation on indoor scene data (ScanNet).

The RGB information in ScanNet is not used.

"Non-uniform" here means that

the resolution (density) of the data has been reduced.

The labeling results are as follows.

4.3 Point Set Classification in Non-Euclidean Metric Space

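The authors demonstrate that the approach generalizes to non-Euclidean spaces.

As in the following figure, the model must distinguish different objects in the same pose

and recognize the same object in different poses.

The results are as follows.

The non-Euclidean metric space is constructed as follows.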

For each shape in [12], we firstly construct the metric space induced by pairwise geodesic distances. We follow [23] to obtain an embedding metric that mimics geodesic distance. Next we extract intrinsic point features in this metric space including WKS [1], HKS [27] and multi-scale Gaussian curvature [16]. We use these features as input and then sample and group points according to the underlying metric space.

4.4 Feature Visualization

The point cloud patterns learned by the first layer are as follows.

6. Conclusion

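The authors presented PointNet++ for processing point sets sampled in a metric space.

PointNet++ operates recursively on a nested partitioning of the input point set

and is effective in learning hierarchical features with respect to the distance metric.

To handle the non-uniform point sampling issue, two types of set abstraction layers are proposed

that intelligently aggregate multi-scale information according to local point densities.

As future work, it would be worthwhile to study how to accelerate the network,

especially the MSG and MRG layers, by sharing more computation within each local region.

It would also be interesting to find applications in higher-dimensional metric spaces.

(In high dimensions, CNN-based methods become computationally infeasible.)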

Supplementary

B Details in Experiments

Architecture protocol.

$SA(K,r, \left [ l_{1}, ..., l_{d} \right ])$ : a set abstraction ($SA$) level

$K$ : number of local regions, $r$ : ball radius,

$\left [ l_{1}, ..., l_{d} \right ]$ : the PointNet uses $d$ fully connected layers with width $l_{i}$ $( i = 1, ..., d)$

$SA(\left [ l_{1}, ..., l_{d} \right ])$ : a global set abstraction level, which converts the set to a single vector

$SA(K, \left [ r^{(1)}, ..., r^{(m)} \right ], \left [ \left [ l_{1}^{(1)}, ..., l_{d}^{(1)} \right ],...,\left [ l_{1}^{(m)}, ..., l_{d}^{(m)} \right ] \right ])$ : MSG with $m$ scales

$FC(l, dp)$ : a fully connected layer with width $l$ and dropout ratio $dp$

$FP(l_{1}, ... , l_{d})$ : a feature propagation level with $d$ fully connected layers

All fully connected layers are followed by batch normalization and ReLU, except for the last score prediction layer.

B.1 Network Architecture

Classification Network (SSG)

MSG Network

MRG Network

Branches 1 and 2 are concatenated and fed to branch 4.

The outputs of branches 3 and 4 are concatenated and fed to the layers that follow.

Semantic Segmentation Network

The FC layers in the last two FP levels use a dropout ratio of 0.5.

Part Segmentation Network

B.2 Virtual Scan Generation

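How the non-uniform sampling is generated:

For each scene, a camera is placed 1.5m above the centroid of the floor plane, and its view direction is rotated evenly into 8 horizontal directions.

In each direction, an image plane of size 100px $\times$ 75px is used, and rays are cast from the camera through each pixel into the scene.

This selects the points of the scene that are visible from the camera.

As a result, non-uniform data like the following is generated.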

C More Experiments

C.1 Semantic Part Segmentation

C.2 Neighborhood Query : kNN vs Ball Query

Ball query performs best.

C.3 Effect of Randomness in Farthest Point Sampling

Because the FPS algorithm involves randomness,

the standard deviation of the model's performance with respect to that randomness is reported.

C.4 Time and Space Complexity

"PointNet (vanilla)" is the basic PointNet from the original paper; the PointNet++ variant without density adaptive layers is SSG.
