Fine Tuning The Threshold in Face Recognition

2021. 1. 29. 09:15


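Face recognition models are ordinary CNNs (Convolutional Neural Networks). Their job is to represent a face image as a vector. To decide whether two face images show the same person, we compute the distance between their vector representations. If the distance is below a threshold, the two photos are classified as the same person. The question is how to choose that threshold. Most write-ups skip this step. This post looks at how to find the best split point for the threshold.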
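
The decision rule described above can be sketched in a few lines. This is only an illustration: the 4-dimensional vectors stand in for real embeddings, and the helper names `cosine_distance` and `is_same_person` as well as the threshold of 0.3 are assumptions made for the example, not part of deepface.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def is_same_person(a, b, threshold):
    """Classify a pair as the same person when the distance is within the threshold."""
    return cosine_distance(a, b) <= threshold

# Toy vectors standing in for face embeddings (hypothetical values).
anchor = [0.9, 0.1, 0.3, 0.2]
similar = [0.8, 0.2, 0.3, 0.1]
different = [0.1, 0.9, 0.0, 0.4]

print(is_same_person(anchor, similar, threshold=0.3))
print(is_same_person(anchor, different, threshold=0.3))
```

Everything then hinges on the value passed as `threshold`, which is what the rest of this post tunes.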


Black Mirror, Nosedive

Face recognition

Before reading on, recall the usual stages of a modern face recognition pipeline.

A modern face recognition pipeline consists of four stages: detect, align, represent, and verify. This post focuses mainly on the verify stage.

Vlog

Fine tuning the threshold for a deep learning face recognition pipeline is covered in the following video. You can either watch the vlog below or follow this post.

Deepface framework

This post uses the state-of-the-art face recognition models in deepface. The supported models are VGG-Face, Google FaceNet, OpenFace, and Facebook DeepFace, with VGG-Face as the default. This post uses VGG-Face as well.

Dataset

The deepface unit test dataset serves as the master dataset. It contains 25 face photos of 9 people. [archive part 1, archive part 2, archive part 3]


idendities = {
 "Angelina": ["img1.jpg", "img2.jpg", "img4.jpg"
 , "img5.jpg", "img6.jpg", "img7.jpg", "img10.jpg", "img11.jpg"],
 "Scarlett": ["img8.jpg", "img9.jpg"],
 "Jennifer": ["img3.jpg", "img12.jpg"],
 "Mark": ["img13.jpg", "img14.jpg", "img15.jpg"],
 "Jack": ["img16.jpg", "img17.jpg"],
 "Elon": ["img18.jpg", "img19.jpg"],
 "Jeff": ["img20.jpg", "img21.jpg"],
 "Marissa": ["img22.jpg", "img23.jpg"],
 "Sundar": ["img24.jpg", "img25.jpg"]
}

We can generate 38 pairs for the same identities; these are stored in the positives data frame.


import pandas as pd

positives = []
for key, values in idendities.items():
    for i in range(0, len(values)-1):
        for j in range(i+1, len(values)):
            positive = []
            positive.append(values[i])
            positive.append(values[j])
            positives.append(positive)

positives = pd.DataFrame(positives, columns = ["file_x", "file_y"])
positives["decision"] = "Yes"

We can also generate 262 pairs for different identities; these are stored in the negatives data frame.


import itertools

samples_list = list(idendities.values())

negatives = []
for i in range(0, len(idendities) - 1):
    for j in range(i+1, len(idendities)):
        cross_product = itertools.product(samples_list[i], samples_list[j])
        cross_product = list(cross_product)

        for cross_sample in cross_product:
            negative = []
            negative.append(cross_sample[0])
            negative.append(cross_sample[1])
            negatives.append(negative)

negatives = pd.DataFrame(negatives, columns = ["file_x", "file_y"])
negatives["decision"] = "No"
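
The two pair counts quoted above (38 and 262) can be checked without building the data frames. The sketch below only uses the number of images per identity, copied from the `idendities` dictionary; the variable names are chosen for this example.

```python
from math import comb

# Number of images per identity, taken from the idendities dictionary.
images_per_identity = {
    "Angelina": 8, "Scarlett": 2, "Jennifer": 2, "Mark": 3, "Jack": 2,
    "Elon": 2, "Jeff": 2, "Marissa": 2, "Sundar": 2,
}

total_images = sum(images_per_identity.values())

# Same-identity pairs: choose 2 images within each identity.
positive_pairs = sum(comb(n, 2) for n in images_per_identity.values())

# Different-identity pairs: all possible pairs minus the same-identity ones.
negative_pairs = comb(total_images, 2) - positive_pairs

print(total_images, positive_pairs, negative_pairs)
```

Note how unbalanced the classes are: negatives outnumber positives by roughly seven to one, which matters later when reading the accuracy figures.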

Once both are generated, positives and negatives need to be merged.


df = pd.concat([positives, negatives]).reset_index(drop = True)

df.file_x = "dataset/"+df.file_x
df.file_y = "dataset/"+df.file_y

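Finding distances

The data frame now holds the image pairs and their labels. If we pass the image pairs as a Python list, the deepface framework builds the face recognition model only once, which speeds things up dramatically. Otherwise, the framework builds the same face recognition model again for every image pair.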


from deepface import DeepFace

instances = df[["file_x", "file_y"]].values.tolist()
resp_obj = DeepFace.verify(instances, model_name = "VGG-Face", distance_metric = "cosine")

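The result of the verification function is stored in resp_obj. This response object holds the distance value for each image pair. The order of the input pairs and the output may differ, so the distance values must be matched by the index in the response key.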


distances = []
for i in range(0, len(instances)):
    distance = round(resp_obj["pair_%s" % (i+1)]["distance"], 4)
    distances.append(distance)

df["distance"] = distances
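
To see why the lookup goes through the key rather than the iteration order, here is a small sketch with a hand-made dictionary. `mock_resp` is hypothetical: it only imitates the `pair_N` layout used by the loop above, and its distance values are made up.

```python
# Hypothetical response whose entries are deliberately out of input order.
mock_resp = {
    "pair_2": {"distance": 0.64891},
    "pair_1": {"distance": 0.22634},
    "pair_3": {"distance": 0.31472},
}

n_pairs = len(mock_resp)

# Address each entry by its 1-based key so the result follows the input order.
distances_by_key = [
    round(mock_resp["pair_%s" % (i + 1)]["distance"], 4) for i in range(n_pairs)
]

# Iterating over the dictionary directly would follow insertion order instead.
distances_by_iteration = [round(v["distance"], 4) for v in mock_resp.values()]

print(distances_by_key)
print(distances_by_iteration)
```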

Analyzing distances

As a baseline, we can look at the mean and standard deviation of the distances for positive and negative samples. Pairs of images of the same person are positive; pairs of different people are negative.


# Select the distance column explicitly; the frame also holds string columns
tp_mean = round(df[df.decision == "Yes"].distance.mean(), 4)
tp_std = round(df[df.decision == "Yes"].distance.std(), 4)
fp_mean = round(df[df.decision == "No"].distance.mean(), 4)
fp_std = round(df[df.decision == "No"].distance.std(), 4)

The means alone already separate positives from negatives.

Mean of true positives: 0.2263
Std of true positives: 0.0744
Mean of false positives: 0.6489
Std of false positives: 0.12

Distribution

It is also interesting to look at the distributions of the two classes.


df[df.decision == "Yes"].distance.plot.kde()
df[df.decision == "No"].distance.plot.kde()

The positive class appears to have a symmetric distribution, whereas the negative class appears to have a negative skew.


Distributions

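If the distance is less than or equal to 0.3, the two faces can clearly be classified as the same person. Similarly, if the distance is greater than or equal to 0.40, the pair can be classified as different people.

The maximum of the positive class is 0.3637, whereas the minimum of the negative class is 0.3186. This means the range between 0.3186 and 0.3637 contains both positive and negative samples. This is the gray area.

We can split the two classes at the threshold value that maximizes the gain.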
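
The gray area can be computed directly from the two sets of distances. In the sketch below only the two extremes (0.3637 and 0.3186) come from this post; the remaining toy values and the helper name `gray_area` are made up for illustration.

```python
def gray_area(positive_distances, negative_distances):
    """Return (lower, upper) of the overlapping range, or None if the classes do not overlap."""
    upper = max(positive_distances)  # largest same-person distance
    lower = min(negative_distances)  # smallest different-person distance
    if lower > upper:
        return None
    return lower, upper

# Toy data: only 0.3637 and 0.3186 are taken from the article.
toy_positives = [0.18, 0.25, 0.33, 0.3637]
toy_negatives = [0.3186, 0.35, 0.55, 0.71]

band = gray_area(toy_positives, toy_negatives)
lower, upper = band

# Samples inside the band cannot be separated by any single threshold.
ambiguous = [d for d in toy_positives + toy_negatives if lower <= d <= upper]

print(band, len(ambiguous))
```

Any threshold placed inside this band misclassifies at least some pairs, so the choice comes down to which kind of error is more acceptable.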

Statistical approach

Recall that $2\sigma$ corresponds to 95.45% confidence and $3\sigma$ to 99.73% confidence, where $\sigma$ is the standard deviation. Let's set the threshold to $2\sigma$.


sigma = 2  # number of standard deviations above the positive mean
threshold = round(tp_mean + sigma * tp_std, 4)

Using the values calculated above:

Mean of true positives: 0.2263
Std of true positives: 0.0744

With a $2\sigma$ threshold, this gives

$0.2263 + 2 \times 0.0744 = 0.3751$

So, a pair is classified as the same person if its distance is less than 0.3751.
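
The 95.45% and 99.73% figures hold for a normal distribution and can be reproduced with the error function. The sketch below assumes the positive distances are roughly normal, which the density plot suggests but does not prove; the function names are chosen for this example. Because the threshold is only an upper bound, the one-sided coverage is also shown.

```python
import math

def two_sided_coverage(k):
    """P(|X - mean| <= k * std) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

def one_sided_coverage(k):
    """P(X <= mean + k * std) for a normally distributed X."""
    return 0.5 * (1 + math.erf(k / math.sqrt(2)))

# Mean and standard deviation of the positive pairs, as reported above.
tp_mean, tp_std = 0.2263, 0.0744
sigma = 2
threshold = round(tp_mean + sigma * tp_std, 4)

print(round(two_sided_coverage(2) * 100, 2))
print(round(two_sided_coverage(3) * 100, 2))
print(round(one_sided_coverage(2) * 100, 2))
print(threshold)
```

Under the normality assumption, a threshold at the mean plus $2\sigma$ would be expected to cover about 97.7% of same-person pairs, since only the upper tail is cut off.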

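Decision trees

Besides setting the threshold to $2\sigma$, a better way is to find the threshold with a decision tree, because decision tree algorithms split the dataset at the point where information gain is maximized.

I used the [chefboost] framework because it is lightweight and the decision trees it builds can be read as if statements.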


from chefboost import Chefboost as chef
config = {'algorithm': 'C4.5'}
tmp_df = df[['distance', 'decision']].rename(columns = {"decision": "Decision"}).copy()
model = chef.fit(tmp_df, config)

The C4.5 algorithm builds the decision tree below for this dataset. Since we only have the distance and target columns, it produces a simple decision stump, similar to those used in the AdaBoost algorithm.


def findDecision(distance):
    if distance<=0.3147:
        return 'Yes'
    elif distance>0.3147:
        return 'No'

So the decision tree approach sets the threshold to 0.3147.

(The author) prefers using a decision tree to determine the threshold, because this threshold value is known to maximize the gain.
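
To make "maximizing the gain" concrete, the following is a minimal sketch of an information-gain split search over a single numeric feature. It is not chefboost's implementation (C4.5 proper uses gain ratio and handles more cases); the toy distances and the names `entropy` and `best_split` are assumptions made for this example.

```python
import math

def entropy(labels):
    """Binary entropy in bits of a list of 'Yes'/'No' labels."""
    if not labels:
        return 0.0
    p = sum(1 for label in labels if label == "Yes") / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(distances, labels):
    """Try each observed distance as a threshold and keep the one with the highest gain."""
    pairs = list(zip(distances, labels))
    base = entropy(labels)
    best_threshold, best_gain = None, -1.0
    for t in sorted(set(distances))[:-1]:  # the largest value would leave one side empty
        left = [label for d, label in pairs if d <= t]
        right = [label for d, label in pairs if d > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain

# Toy data with one overlapping pair of samples (made-up values).
toy_distances = [0.15, 0.22, 0.30, 0.36, 0.33, 0.45, 0.60, 0.75, 0.80]
toy_labels = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]

threshold, gain = best_split(toy_distances, toy_labels)
print(threshold, round(gain, 4))
```

On this toy data the search settles on the split that leaves one side pure, which is the same behavior that produces the single `if distance <= ...` rule shown above.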

Verification

A pair is classified as true if its distance is less than or equal to the threshold.


df["prediction"] = "No" #init
idx = df[df.distance <= threshold].index
df.loc[idx, 'prediction'] = 'Yes'

Evaluation

This is a classification task, and accuracy alone is not enough to evaluate the model. The confusion matrix, precision, and recall give a fuller picture of how the model performs.


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(df.decision.values, df.prediction.values)
print(cm)
tn, fp, fn, tp = cm.ravel()
recall = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn)/(tn + fp +  fn + tp)
f1 = 2 * (precision * recall) / (precision + recall)

Results of 2 sigma as the threshold

Threshold : 0.3751 ($2\sigma$)
Precision : 90.47619047619048 %
Recall : 100.0 %
F1 score : 95.0 %
Accuracy : 98.66666666666667 %

Results of C4.5 algorithm as the threshold

Threshold : 0.3147 (C4.5 best split point)
Precision : 100.0 %
Recall : 89.47368421052632 %
F1 score : 94.44444444444444 %
Accuracy : 98.66666666666667 %

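The decision tree approach does better on precision, whereas the $2\sigma$ method does better on recall. Recall the definitions: precision is "how many of the predicted positives are actually positive", and recall is "how many of the actual positives were predicted correctly".

If security comes first, precision matters more, because a "same person" decision is then more trustworthy.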
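
The two result sets can be reproduced from confusion matrix counts. The counts below are not printed in this post; they are inferred from the reported percentages together with the 38 positive and 262 negative pairs, so treat them as an assumption. The helper name `metrics` is chosen for this example.

```python
def metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F1 from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

# Counts inferred from the reported percentages (assumption, see lead-in).
two_sigma = metrics(tp=38, fp=4, fn=0, tn=258)  # threshold 0.3751
c45 = metrics(tp=34, fp=0, fn=4, tn=262)        # threshold 0.3147

for name, result in (("2 sigma", two_sigma), ("C4.5", c45)):
    print(name, {key: round(100 * value, 2) for key, value in result.items()})
```

Both thresholds make four mistakes out of 300 pairs, which is why accuracy is identical. The looser threshold accepts four different-person pairs, while the stricter one rejects four same-person pairs.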

Similarity matrix

The figure below visualizes the dataset instances and their distances. Green links are positive with respect to the chosen threshold and red links are negative. As can be seen, there are two distinct identity groups.


Distance matrix

Adding new face recognition models

This post covered the VGG-Face face recognition model and the cosine similarity metric. There are, however, other face recognition models and distance metrics. What if we applied the same approach to the VGG-Face, Google FaceNet, OpenFace, and DeepFace face recognition models and to the cosine, euclidean, and euclidean_l2 similarity metrics?

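Conclusion

We have seen how to determine the threshold, which plays a key role in face recognition studies. Although we fine-tuned the threshold only for the VGG-Face model and the cosine similarity metric, the approach applies to the other supported models and metrics.

The source code used in this post is available here.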

Attribution, NonCommercial, ShareAlike

Dead & Street