Deep Face Detection with MTCNN in Python

2021. 2. 2. 08:21

Face detection is an essential step for a robust face recognition pipeline. MTCNN, which stands for Multi-task Cascaded Convolutional Networks, is a strong face detector that offers high detection scores. As the name suggests, it is a modern deep learning based approach. In this post, we cover face detection and alignment with MTCNN.


The most famous selfie in the Academy Awards 2014

It is an overperforming detector

MTCNN is a large improvement in detection accuracy over OpenCV's haar cascade and Dlib's histogram-based (HoG) approach. Both SSD and MTCNN perform very well at face detection.

SSD is much faster than MTCNN. I tested the different face detectors on a 720p video: MTCNN ran at 1.54 fps whereas SSD ran at 9.20 fps. In other words, MTCNN is about 6 times slower. You can see the detectors' performance in the video below.
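The "about 6 times" figure follows directly from the two frame rates. The sketch below reproduces that arithmetic and shows one way such a frame rate could be measured. The helper names `measure_fps` and `slowdown` are hypothetical and are not part of any detector library, and `process_frame` stands in for whatever detector call you want to time.

```python
import time

def measure_fps(process_frame, frames):
    """Run process_frame on every frame and return processed frames per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

def slowdown(fast_fps, slow_fps):
    """How many times slower the slow detector is compared to the fast one."""
    return fast_fps / slow_fps

# Frame rates reported above: SSD 9.20 fps, MTCNN 1.54 fps
ratio = slowdown(9.20, 1.54)
print(f"MTCNN is {ratio:.2f}x slower than SSD")
```

To time a real detector, you would pass something like `detector.detect_faces` as `process_frame` and the decoded video frames as `frames`.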

Here you can see how to use different face detectors in Python.

Model Structure

MTCNN is mainly based on three separate CNN models: P-Net, R-Net and O-Net.


MTCNN Architecture : P-Net, R-Net, O-Net

P-Net stands for 'Proposal Network'. It searches for faces in 12 x 12 frames. The goal of this network is to produce results quickly.

R-Net stands for 'Refine Network'. It has a deeper structure than P-Net. All candidates from the previous network, P-Net, are fed to R-Net, which rejects a large number of them.

Finally, the Output Network, or O-Net for short, returns the bounding box (the face area) and the facial landmark positions.
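The cascade idea can be illustrated with a toy sketch: each stage only sees the candidates that survived the previous one, so the cheap first stage filters most windows before the deeper networks run. Everything here is an assumption made for illustration. The candidates, the scoring functions and the thresholds are placeholders, and a single precomputed score stands in for what would be three separate networks in the real model.

```python
def run_cascade(candidates, stages):
    """Pass candidates through each stage, keeping those that score at or above its threshold."""
    survivors = list(candidates)
    history = []
    for name, score, threshold in stages:
        survivors = [c for c in survivors if score(c) >= threshold]
        history.append((name, len(survivors)))
    return survivors, history

# Toy candidates: ((x, y, w, h), score) pairs stand in for 12 x 12 image windows.
candidates = [
    ((0, 0, 12, 12), 0.95),
    ((5, 5, 12, 12), 0.72),
    ((40, 8, 12, 12), 0.65),
    ((9, 30, 12, 12), 0.20),
]

# Placeholder scorers and illustrative thresholds, not the real networks.
stages = [
    ("P-Net", lambda c: c[1], 0.6),  # fast proposal stage
    ("R-Net", lambda c: c[1], 0.7),  # rejects many of the proposals
    ("O-Net", lambda c: c[1], 0.9),  # final decision
]

faces, history = run_cascade(candidates, stages)
print(history)
print(faces)
```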

Installation

MTCNN depends on TensorFlow and Keras installations as prerequisites. The package is heavily inspired by David Sandberg's FaceNet implementation. It is available on PyPI.


pip install mtcnn

Face detection

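MTCNN is about as lightweight a solution as it can be. We first construct an MTCNN detector and then pass a NumPy array as input to the detect_faces function of its interface. The code below loads the input image with OpenCV. The detect_faces function returns an array of objects, one per detected face. Each returned object stores the detection score under the confidence key and the coordinates of the detected face under the box key.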


from mtcnn import MTCNN
import cv2

detector = MTCNN()

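img = cv2.imread("img.jpg")  # OpenCV loads images in BGR channel order
# mtcnn expects RGB input, so convert a copy for detection
detections = detector.detect_faces(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))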

for detection in detections:
    score = detection["confidence"]
    if score > 0.90:
        x, y, w, h = detection["box"]
        detected_face = img[int(y):int(y+h), int(x):int(x+w)]
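One detail worth guarding against when cropping: a detector may report a box that extends slightly past the image border, and a negative start index in a NumPy slice counts from the end of the array, which yields a wrong or empty crop. The sketch below clips the box to the image first. `clamp_box` is a hypothetical helper written for this post, not part of mtcnn or OpenCV.

```python
def clamp_box(box, img_width, img_height):
    """Clip an (x, y, w, h) box to the image and return (x1, y1, x2, y2) slice bounds."""
    x, y, w, h = box
    x1 = max(0, int(x))
    y1 = max(0, int(y))
    x2 = min(img_width, int(x + w))
    y2 = min(img_height, int(y + h))
    return x1, y1, x2, y2

# A box that starts 4 pixels left of the image and ends 6 pixels past its right edge
bounds = clamp_box((-4, 10, 50, 60), img_width=40, img_height=100)
x1, y1, x2, y2 = bounds
print(bounds)
```

With an image array, the crop then becomes `detected_face = img[y1:y2, x1:x2]`, where `img_height, img_width = img.shape[:2]`.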

Facial landmarks

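Although the OpenCV-based SSD offers the same level of accuracy, MTCNN also finds some facial landmarks such as the positions of the eyes, nose and mouth. Extracting the eye positions in particular is very important for face alignment. Remember that, according to the Google FaceNet research, face alignment increases the accuracy of a face recognition model by almost 1%.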

OpenCV finds eye positions with the conventional haar cascade method, which underperforms. In other words, even if you adopt the modern SSD, face alignment still has to rely on OpenCV's legacy haar cascade.

The object returned by the detect_faces function also stores the facial landmarks. In this post we focus only on the eye positions.


keypoints = detection["keypoints"]
left_eye = keypoints["left_eye"]
right_eye = keypoints["right_eye"]
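To make the shape of these objects concrete, here is a small sketch that walks over a list of detections and collects the eye coordinates of the confident ones. The sample detections are made up for illustration and only mimic the keys used above (box, confidence, keypoints). `eyes_of_confident_faces` is a hypothetical helper, not an mtcnn function.

```python
def eyes_of_confident_faces(detections, min_confidence=0.90):
    """Return (left_eye, right_eye) pairs for detections above the confidence cutoff."""
    eyes = []
    for detection in detections:
        if detection["confidence"] > min_confidence:
            keypoints = detection["keypoints"]
            eyes.append((keypoints["left_eye"], keypoints["right_eye"]))
    return eyes

# Made-up detections shaped like the objects used above (values are illustrative only).
sample_detections = [
    {"box": [277, 90, 48, 63], "confidence": 0.998,
     "keypoints": {"left_eye": (291, 117), "right_eye": (314, 114)}},
    {"box": [10, 20, 30, 30], "confidence": 0.42,
     "keypoints": {"left_eye": (18, 30), "right_eye": (31, 31)}},
]

eye_pairs = eyes_of_confident_faces(sample_detections)
print(eye_pairs)
```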

Face alignment procedure

If we know the exact eye positions in a detected face, we can align the face. This topic was covered in Face Alignment for Face Recognition in Python within OpenCV. The code below is copied from the source code of the deepface framework.


Alignment procedure

In short, we rotate the base image until both eyes are horizontal.


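import math
import numpy as np
from PIL import Image
from deepface.commons import distance  # module path as in the deepface version this code was copied from; it may differ in other releases

def alignment_procedure(img, left_eye, right_eye):
    #this function aligns given face in img based on left and right eye coordinates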

    left_eye_x, left_eye_y = left_eye
    right_eye_x, right_eye_y = right_eye

    #-----------------------
    #find rotation direction

    if left_eye_y > right_eye_y:
        point_3rd = (right_eye_x, left_eye_y)
        direction = -1 #rotate same direction to clock
    else:
        point_3rd = (left_eye_x, right_eye_y)
        direction = 1 #rotate inverse direction of clock

    #-----------------------
    #find length of triangle edges

    a = distance.findEuclideanDistance(np.array(left_eye), np.array(point_3rd))
    b = distance.findEuclideanDistance(np.array(right_eye), np.array(point_3rd))
    c = distance.findEuclideanDistance(np.array(right_eye), np.array(left_eye))

    #-----------------------

    #apply cosine rule

    if b != 0 and c != 0: #this multiplication causes division by zero in cos_a calculation

        cos_a = (b*b + c*c - a*a)/(2*b*c)
        angle = np.arccos(cos_a) #angle in radian
        angle = (angle * 180) / math.pi #radian to degree

        #-----------------------
        #rotate base image

        if direction == -1:
            angle = 90 - angle

        img = Image.fromarray(img)
        img = np.array(img.rotate(direction * angle))

    #-----------------------

    return img #return img anyway
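As a sanity check on the geometry, the cosine rule construction can be compared with a more direct formulation: the signed angle of the line through the eyes, computed with atan2. The sketch below is not deepface's code. `cosine_rule_angle` re-implements only the angle part of the procedure above in plain Python, and `atan2_angle` is an alternative written for comparison. It assumes right_eye lies to the right of left_eye in image coordinates (larger x); the two functions can disagree otherwise.

```python
import math

def cosine_rule_angle(left_eye, right_eye):
    """Signed rotation angle in degrees, following the triangle construction above."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    if ly > ry:
        third, direction = (rx, ly), -1
    else:
        third, direction = (lx, ry), 1
    a = math.dist(left_eye, third)
    b = math.dist(right_eye, third)
    c = math.dist(right_eye, left_eye)
    if b == 0 or c == 0:
        return 0.0  # the procedure above leaves the image unrotated in this case
    cos_a = (b * b + c * c - a * a) / (2 * b * c)
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding just outside [-1, 1]
    angle = math.degrees(math.acos(cos_a))
    if direction == -1:
        angle = 90 - angle
    return direction * angle

def atan2_angle(left_eye, right_eye):
    """Signed angle in degrees of the line from left_eye to right_eye."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    return math.degrees(math.atan2(ry - ly, rx - lx))

# Illustrative eye coordinates (x, y), with y growing downwards as in images
for eyes in [((0, 0), (10, 0)), ((0, 0), (10, 10)), ((100, 120), (160, 100))]:
    print(eyes, cosine_rule_angle(*eyes), atan2_angle(*eyes))
```

Both functions return the value that the procedure above passes to `img.rotate`, so under the stated assumption the triangle construction is equivalent to taking the slope of the eye line.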

Here you can see how MTCNN detects and aligns faces.


Face detection and alignment with MTCNN

MTCNN in deepface

Running detection and alignment separately may seem complicated, confusing and daunting. deepface wraps the OpenCV haar cascade, SSD, Dlib HoG and MTCNN detectors. With deepface, you can detect and align faces in just a few lines of code.


from deepface import DeepFace
from deepface.commons import functions

backends = ['opencv', 'ssd', 'dlib', 'mtcnn']

for backend in backends:
    #face detection and alignment
    detected_and_aligned_face = DeepFace.detectFace("img.jpg", detector_backend = backend)

    #------------------------

    #face detection
    detected_face = functions.detect_face(img = "img.jpg", detector_backend = backend)

    #face alignment
    aligned_face = functions.align_face(img = detected_face, detector_backend = backend)

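Besides, the face recognition pipeline handles the detection and alignment steps in the background. That is, you do not have to run face detection and alignment manually when you use the pipeline.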


#face verification
obj = DeepFace.verify("img1.jpg", "img2.jpg", detector_backend = 'mtcnn')

#face recognition
df = DeepFace.find(img_path = "img.jpg", db_path = "my_db", detector_backend = 'mtcnn')

The following video shows how to apply these preprocessing steps with deepface.

Conclusion

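We have covered how MTCNN detects faces and how to align faces with the facial landmarks it provides. MTCNN's face detection score is very high, but it is slower than the other models. If accuracy is your priority, adopt MTCNN; if speed is your priority, SSD is an option. However, SSD does not find facial landmarks, so you have to use OpenCV's haar cascade to find the eye positions for face alignment. That can hurt results in production.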

Attribution-NonCommercial-ShareAlike

Dead & Street