Deep Face Detection with OpenCV in Python

2021. 1. 26. 08:47

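Face detection is an early stage of the [face recognition pipeline] and plays a central role in it. Deep-learning-based approaches handle this task faster and more accurately than traditional methods. In this post we use the ResNet SSD (Single Shot-Multibox Detector) with OpenCV.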


[Game of Thrones](https://www.imdb.com/title/tt0944947/) – The Hall of Faces

Dependencies

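First, the source code for this post is available here.

Second, we use an external ResNet model whose pre-trained weights are provided by the OpenCV community. Download the files below and save them in your working directory.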


#model structure: https://github.com/opencv/opencv/raw/3.4.0/samples/dnn/face_detector/deploy.prototxt

#pre-trained weights: https://github.com/opencv/opencv_3rdparty/raw/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel

OpenCV's deep neural networks (dnn) module can load external Caffe models.


import cv2
detector = cv2.dnn.readNetFromCaffe("deploy.prototxt" , "res10_300x300_ssd_iter_140000.caffemodel")

Model Structure

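The original SSD architecture uses a VGG base network; the OpenCV face detector used here pairs the SSD head with a ResNet-10 backbone, as the res10 prefix in the weights file name suggests.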


ResNet SSD

Loading the image

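The SSD model expects input of size (300, 300, 3), so we resize the input image to 300 x 300. Because this is a low resolution, we keep the original image in a separate variable so that no detail is lost when we crop the faces later.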


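image = cv2.imread("image.jpg")
base_img = image.copy()  # keep the full-resolution original for cropping later
original_size = base_img.shape  # (height, width, channels)
target_size = (300, 300)  # (width, height), the order cv2.resize expects
image = cv2.resize(image, target_size)
aspect_ratio_x = (original_size[1] / target_size[0])  # original width / target width
aspect_ratio_y = (original_size[0] / target_size[1])  # original height / target height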

This is the image we will process.


Arya Stark and Jon Snow in Game of Thrones

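Although we resized the original image to 300 x 300 pixels above, OpenCV actually expects input of shape (1, 3, 300, 300). The easiest way to get there is the blobFromImage function. To code the same logic by hand, roll the third dimension to the first position and then call the expand dimensions function to add a dummy dimension on the left.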


#detector expects (1, 3, 300, 300) shaped input
imageBlob = cv2.dnn.blobFromImage(image = image)
#imageBlob = np.expand_dims(np.rollaxis(image, 2, 0), axis = 0)
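
The manual route in the commented-out line can be checked without OpenCV. A minimal sketch, assuming a zero-filled array stands in for the resized image (the name `fake_image` is hypothetical):

```python
import numpy as np

# Stand-in for the resized image: height x width x channels, the layout cv2.imread returns.
fake_image = np.zeros((300, 300, 3), dtype=np.uint8)

# Move the channel axis (index 2) to the front: (300, 300, 3) -> (3, 300, 300).
channels_first = np.rollaxis(fake_image, 2, 0)

# Add a leading batch dimension of size 1: (3, 300, 300) -> (1, 3, 300, 300).
manual_blob = np.expand_dims(channels_first, axis=0)

print(manual_blob.shape)  # (1, 3, 300, 300)
```

This manual version keeps the dtype of the input array, so a cast to float32 may be needed before passing it to the network.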

Feed Forward

Now we can pass the processed image to the Caffe model. This is the basic feed-forward step of a neural network.


detector.setInput(imageBlob)
detections = detector.forward()

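Focusing on strong face candidates

The network output has shape (1, 1, 200, 7); once the two leading singleton dimensions are dropped, it is a (200, 7) matrix. Each row is a face candidate and each column is a feature of that candidate. We filter the candidates based on these features.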


import pandas as pd
column_labels = ["img_id", "is_face", "confidence", "left", "top", "right", "bottom"]
detections_df = pd.DataFrame(detections[0][0], columns = column_labels)


Detections

The 'is_face' feature is 0 for background and 1 for a face, so we drop the rows where this column is 0. We also drop the rows whose 'confidence' is below a threshold (90%, for example).


#0: background, 1: face
detections_df = detections_df[detections_df['is_face'] == 1]
#keep only candidates at or above the 90% confidence threshold
detections_df = detections_df[detections_df['confidence'] >= 0.90]

The left, right, top and bottom coordinates lie between 0 and 1. Recall that we resized the input image to (300, 300) in the earlier step. To get the exact coordinates on the resized image, we therefore multiply these values by 300.


detections_df['left'] = (detections_df['left'] * 300).astype(int)
detections_df['bottom'] = (detections_df['bottom'] * 300).astype(int)
detections_df['right'] = (detections_df['right'] * 300).astype(int)
detections_df['top'] = (detections_df['top'] * 300).astype(int)
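
The filtering and scaling steps can be tried end to end on made-up data. A minimal sketch, assuming a synthetic array with the same (1, 1, N, 7) layout as the detector output; the three rows and their values are invented for illustration and are not results from the model:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for detector.forward(): shape (1, 1, 3, 7).
# Columns: img_id, is_face, confidence, left, top, right, bottom.
fake_detections = np.array([[[
    [0, 1, 0.99, 0.25, 0.25, 0.50, 0.75],  # strong face candidate
    [0, 1, 0.40, 0.50, 0.50, 0.75, 0.75],  # weak candidate, below the threshold
    [0, 0, 0.95, 0.00, 0.00, 0.25, 0.25],  # background class
]]])

column_labels = ["img_id", "is_face", "confidence", "left", "top", "right", "bottom"]
df = pd.DataFrame(fake_detections[0][0], columns=column_labels)

# Keep rows that are labelled as faces and meet the confidence threshold.
threshold = 0.90
mask = (df["is_face"] == 1) & (df["confidence"] >= threshold)
faces = df[mask].copy()  # copy so the assignments below do not write to a view

# Scale normalized coordinates to pixels of the 300 x 300 resized image.
for column in ["left", "top", "right", "bottom"]:
    faces[column] = (faces[column] * 300).astype(int)

print(faces)
```

Combining both conditions in one boolean mask and taking an explicit copy is a design choice of this sketch; it avoids chained filtering and the pandas warning about assigning to a slice.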

Applying these filters removes most of the instances, and only the following 2 instances remain in the pandas DataFrame.


Found faces

Plotting

Now we can extract the detected faces precisely from the original image.


import matplotlib.pyplot as plt
for i, instance in detections_df.iterrows():
    confidence_score = str(round(100*instance["confidence"], 2))+" %"
    left = instance["left"]
    right = instance["right"]
    bottom = instance["bottom"]
    top = instance["top"]
    #scale the 300 x 300 coordinates back to the original resolution before cropping
    detected_face = base_img[int(top*aspect_ratio_y):int(bottom*aspect_ratio_y) ,
                            int(left*aspect_ratio_x):int(right*aspect_ratio_x)]
    print("Id ",i,". Confidence: ", confidence_score)
    plt.imshow(detected_face[:,:,::-1]) #OpenCV loads BGR, matplotlib expects RGB
    plt.show()
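
The crop above scales each box twice: first by 300, then by the aspect ratio. Since 300 x (width / 300) is just the width, the normalized coordinates can also be mapped straight to pixels of the original image. A minimal sketch of that shortcut, assuming a hypothetical helper named `to_pixel_box`; it also clamps the box to the image bounds, on the assumption that a detector may return coordinates slightly outside [0, 1]. Results can differ from the two-step version by a pixel because the intermediate integer cast is skipped:

```python
def to_pixel_box(left, top, right, bottom, width, height):
    """Map a normalized (0..1) box to integer pixel coordinates of a width x height image."""
    def clamp(value, upper):
        # Truncate to an integer and keep it inside the valid range [0, upper].
        return max(0, min(int(value), upper))

    return (
        clamp(left * width, width),
        clamp(top * height, height),
        clamp(right * width, width),
        clamp(bottom * height, height),
    )


# Example with an invented box on a 1200 x 600 image; bottom overshoots and is clamped.
x1, y1, x2, y2 = to_pixel_box(0.25, 0.5, 0.75, 1.05, width=1200, height=600)
print(x1, y1, x2, y2)  # 300 300 900 600
```

With these values a crop would read `base_img[y1:y2, x1:x2]`, rows first and columns second.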


Detected faces

Now we can highlight the detected faces on the original image.


#these calls use the per-face variables of the loop above; place them inside the loop to annotate every face
cv2.putText(base_img, confidence_score, 
            (int(left*aspect_ratio_x), int(top*aspect_ratio_y-10)), 
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
cv2.rectangle(base_img, (int(left*aspect_ratio_x), int(top*aspect_ratio_y)), 
            (int(right*aspect_ratio_x), int(bottom*aspect_ratio_y)), 
            (255, 255, 255), 1) #draw rectangle to main image


Deep Face Detection

Face detectors

There are several face detection solutions. OpenCV offers Haar Cascade and SSD (Single Shot-Multibox Detector). Dlib offers HoG (Histogram of Oriented Gradients) and MMOD (Max-Margin Object Detection). MTCNN (Multi-task Cascaded Convolutional Networks) is a popular solution these days. Of these, Haar Cascade and HoG are legacy methods, whereas SSD, MMOD and MTCNN are modern deep-learning-based methods.

SSD and MTCNN detect faces more accurately. I could not test MMOD because it requires very powerful hardware. The video below compares these techniques. The false positive rate is high for Haar Cascade and HoG: they detect things such as a tie or a face-like badge as faces. SSD and MTCNN give more robust results.

The source code used in the real-time study is available here.
It includes the modern face detection implementations.

SSD is the fastest. I tested these models on a 720-frame video on an i7 laptop. On average, SSD processed 9.20 frames per second, Haar Cascade handled 6.50 fps, Dlib HoG 1.57 fps and MTCNN 1.54 fps. SSD therefore appears to be both more robust and the fastest.
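
Throughput figures like these can be obtained by timing a detector over a sequence of frames. A minimal sketch, assuming a hypothetical `measure_fps` helper and a dummy `dummy_detect` callable standing in for any of the detectors; this is not the benchmark code behind the numbers above:

```python
import time


def measure_fps(detect, frames):
    """Return the average number of frames per second `detect` handles over `frames`."""
    start = time.perf_counter()
    processed = 0
    for frame in frames:
        detect(frame)
        processed += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on extremely fast runs.
    return processed / elapsed if elapsed > 0 else 0.0


def dummy_detect(frame):
    # Placeholder workload; swap in a real detector call to benchmark it.
    return sum(frame)


# Invented input: 200 "frames", each a plain list of numbers.
frames = [list(range(1000)) for _ in range(200)]
print(f"{measure_fps(dummy_detect, frames):.2f} fps")
```

Timing the whole loop rather than individual calls averages out per-frame jitter, which suits a comparison of average fps across detectors.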


Face detector performances

DeepFace already wraps the face detectors mentioned above. Its 'detectFace' function applies detection and then alignment.


from deepface import DeepFace
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
for backend in backends:
    detected_aligned_face = DeepFace.detectFace(img_path = "img.jpg" , detector_backend = backend)

Here you can see how to use the different face detectors in Python.

Alternatively, you can apply detection and alignment manually.


#assumed import path, valid only for deepface versions that ship these helper functions
from deepface.commons import functions
import matplotlib.pyplot as plt
img = functions.load_image("img.jpg")
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
backend = backends[3]
detected_face = functions.detect_face(img = img, detector_backend = backend)
aligned_face = functions.align_face(img = img, detector_backend = backend)
processed_img = functions.detect_face(img = aligned_face, detector_backend = backend)
plt.imshow(processed_img)

Face recognition pipeline

Face detection is the first stage of the face recognition pipeline. The video below helps you build the pipeline end to end.

Attribution-NonCommercial-ShareAlike

Dead & Street