Recently, the field of face recognition has been increasingly investigated by Google, Facebook, Intel, Accenture, IBM, DeepMind, and many other companies. Recognition under constrained environments is now quite satisfactory. However, face recognition in uncontrolled environments remains a challenging problem due to key technical issues such as noise, illumination and pose variation.
Numerous algorithms and techniques have been developed to improve the performance of face recognition. Recently, deep learning has been explored extensively for computer vision applications. This article explains in simple words how deep learning (in the form of a Deep Convolutional Neural Network, Fig. 1) can be used for face recognition. As recognition is based on deep learning, it is referred to as Deep Face Recognition. This article does not focus on the coding or mathematical aspects of deep learning.
Deep Face Recognition
The human brain can automatically and instantly detect multiple faces, even under pose and complex lighting variations, and recognise them remarkably well. For a computer, however, performing all these challenging tasks at the level of the human brain is very difficult.
Face recognition applications have two parts or phases. Fig. 2 shows the architecture of a face recognition system. (1) Phase-I, the enrollment phase: the model/system is trained on millions of prototype face images, a trained model is generated, and the extracted face features are stored in a database. (2) Phase-II, the recognition phase: a query face image is given as input to the model generated in Phase-I so that it can be recognised correctly. The following sections explain all the steps of Deep Face Recognition in detail. The steps within each phase are given below:
(1) Enrollment Phase
1. Face Detection
2. Feature extraction
3. Store Model and extracted feature in Database
(2) Recognition Phase / Query Phase
1. Face Detection
2. Pre-processing
3. Feature Extraction
4. Feature Matching
Now let us look at each step of both phases of Deep Face Recognition in detail.
Phase-I : Enrollment Phase
Step-1. Face Detection
The very first step in the enrollment phase is the detection of faces in images or video frames. The face needs to be located and its region of interest computed. The face detection algorithm by Viola and Jones is the most famous, though it requires rigorous training and is time consuming. Histogram of Oriented Gradients (HOG), by contrast, is a faster and easier algorithm for face detection. Detected faces are passed to the next step, feature extraction.
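To make the HOG idea concrete, here is a minimal sketch of the descriptor computation in NumPy. The cell size and bin count are illustrative choices, and block normalisation is omitted for brevity; a real detector would use a library implementation (e.g. dlib or OpenCV):

```python
import numpy as np

def hog_features(image, cell_size=8, n_bins=9):
    """Compute a simplified HOG descriptor for a grayscale image.

    Gradients are binned by unsigned orientation within
    non-overlapping cells; block normalisation is omitted.
    """
    # Image gradients along rows (y) and columns (x)
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180

    h, w = image.shape
    n_cells_y, n_cells_x = h // cell_size, w // cell_size
    histograms = np.zeros((n_cells_y, n_cells_x, n_bins))

    for cy in range(n_cells_y):
        for cx in range(n_cells_x):
            ys, xs = cy * cell_size, cx * cell_size
            mag = magnitude[ys:ys + cell_size, xs:xs + cell_size]
            ori = orientation[ys:ys + cell_size, xs:xs + cell_size]
            # Assign each pixel's gradient magnitude to an orientation bin
            bins = (ori / (180 / n_bins)).astype(int) % n_bins
            for b in range(n_bins):
                histograms[cy, cx, b] = mag[bins == b].sum()
    return histograms.ravel()
```

A detector then slides a window over the image and classifies each window's HOG vector (typically with a linear SVM) as face or non-face.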
Step-2. Feature extraction
What is the feature measure that best represents a human face? There are numerous methods, such as Principal Component Analysis (PCA), Kernel PCA (K-PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Kernel-based LDA (K-LDA), Discrete Cosine Transform (DCT), Haar features, Local Binary Patterns (LBP), variants of all these methods, and many more. However, researchers have found that the best approach is to let the computer learn which measurements from the images best describe human faces.
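As an example of the classical methods listed above, PCA (the "eigenfaces" approach) projects each face onto the leading principal components of the training set. A minimal NumPy sketch, assuming each face arrives as a flattened row vector:

```python
import numpy as np

def pca_features(faces, n_components=10):
    """Project flattened face vectors onto the top principal components.

    faces: (n_samples, n_pixels) array, one flattened face per row.
    Returns (projections, mean_face, components).
    """
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal directions of the training set
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projections = centered @ components.T
    return projections, mean_face, components
```

A new face is matched by centering it with `mean_face`, projecting onto `components`, and comparing the resulting coefficients with stored ones.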
Deep learning can determine which parts of a face are important to measure. A Deep Convolutional Neural Network (DCNN) can be trained to learn the important features. There are many ways to train a DCNN to obtain the best features. Recent research has presented a method that trains a DCNN to derive 128 measurements for each face. That is, each face is represented by a feature vector of 128 numeric values.
The three-step training process of the DCNN is as follows:
1. Input one training face image of a known person
2. Input another face image of the same known person
3. Input a picture of a totally different person
The algorithm then looks at the measurements it is currently generating for each of those three images. The DCNN is tuned so that the measurements it generates for images #1 and #2 move closer together, while the measurements for images #2 and #3 are pushed further apart.
These steps are repeated for all the images in the training data set. The DCNN learns to generate 128 feature values for each candidate. Training the DCNN to output face feature vectors is time consuming and requires a lot of computing power; it may take a day or more depending on the size of the training dataset. But once the neural network is trained, it can generate features for any face, even ones it has never seen before, almost instantly.
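The tuning objective described above is commonly formulated as a triplet loss. A minimal NumPy sketch on 128-d embeddings (the margin value here is illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on face embeddings.

    Pulls the anchor towards the positive (same person) and pushes it
    away from the negative (different person) by at least `margin`.
    Returns 0 when the negative is already sufficiently far away.
    """
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)
```

During training, this loss is minimised over many such triplets, which is what drives same-person embeddings together and different-person embeddings apart.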
Step-3. Store DCNN model and Feature in Database
Generally, this third step stores the generated features in a database. Here, however, since the DCNN has been trained to determine the features, the DCNN network model is stored as well, alongside the features in the database.
Phase-II : Recognition / Query Phase
Step-1. Face Detection
Again, during the recognition/query phase the face is detected using a well-known face detection algorithm, as in Phase-I.
Step-2. Pre-processing
In this second step of the second phase of face recognition, face images may need to pass through pre-processing steps. The images/video frames may need to be pre-processed to overcome issues such as noise, illumination and pose/rotation. Noise and illumination can be handled using suitable filters [Kalman Filter, Adaptive Retinex (AR), Multi-Scale Self Quotient (SQI), Gabor Filter, etc.].
Pose or rotation of face images can be accounted for by using a 3D transformation or an affine transformation. There are many methods, but one of the best is based on face landmark estimation, invented in 2014 by V. Kazemi and J. Sullivan.
The basic idea is to determine 68 landmark points on every face (the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, etc.). A deep neural network is trained to find these 68 specific points on any face. These 68 points are then used to rotate, scale and translate the face so that it aligns with a frontal face image. The pre-processed image is then passed to the feature extraction step.
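Once eye landmarks are known, a similarity transform (rotation + scale + translation) can map them onto canonical positions. A minimal NumPy sketch, where the canonical eye coordinates and output size are arbitrary choices for illustration:

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye,
                         target_left=(0.35, 0.35),
                         target_right=(0.65, 0.35),
                         out_size=160):
    """Return a 2x3 affine matrix that rotates, scales and translates
    the face so the detected eye centres land on fixed canonical
    positions in an out_size x out_size aligned image."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    # Angle and length of the detected inter-eye vector
    dx, dy = right_eye - left_eye
    angle = np.arctan2(dy, dx)
    eye_dist = np.hypot(dx, dy)
    target_dist = (target_right[0] - target_left[0]) * out_size
    scale = target_dist / eye_dist

    # Combined rotation + scaling matrix
    cos_a, sin_a = np.cos(-angle) * scale, np.sin(-angle) * scale
    rot = np.array([[cos_a, -sin_a],
                    [sin_a,  cos_a]])
    # Translation that puts the left eye at its canonical pixel position
    target = np.array(target_left) * out_size
    trans = target - rot @ left_eye
    return np.hstack([rot, trans[:, None]])
```

The resulting 2x3 matrix can be handed to an image-warping routine (e.g. OpenCV's `cv2.warpAffine`) to produce the aligned face crop.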
Step-3. Feature Extraction
In this third step of Deep Face Recognition, we use the trained DCNN model that was generated during the feature extraction step of the enrollment phase (Phase-I). The query image is given as input, and the DCNN generates its 128 feature values. This feature vector is then compared with the feature vectors stored in the database.
Step-4. Feature Matching
This is the last step of Deep Face Recognition: find the person in our database of known people whose feature vector is closest to that of the query image. This can be done using any basic machine learning classification algorithm; there is no need to use deep learning again. An SVM classifier, Bayesian classifier, Euclidean-distance classifier, correlation-based classifier, etc. can easily be used to match the database feature vectors against the query feature vector. The matcher finds the best matching face in the database and returns the ID of that face image as the recognition output.
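A minimal Euclidean-distance matcher over stored 128-d feature vectors might look like this. The distance threshold is illustrative; a real deployment would calibrate it on validation data:

```python
import numpy as np

def match_face(query, database, threshold=0.6):
    """Return the ID of the closest enrolled face, or None if no
    stored vector is within `threshold` of the query embedding.

    database: dict mapping person ID -> 128-d feature vector.
    """
    best_id, best_dist = None, float("inf")
    for person_id, vector in database.items():
        dist = np.linalg.norm(query - vector)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist <= threshold else None
```

Because the embeddings of the same person cluster tightly after training, a simple nearest-neighbour search like this is usually sufficient at query time.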
For further details about HOG and the implementation of face recognition, one may refer to the references below.
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
- V. Kazemi and J. Sullivan. One millisecond face alignment with an ensemble of regression trees. http://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.