\documentclass[10pt,twocolumn,letterpaper]{article}

\usepackage{relsize}
\usepackage{fg}
\usepackage{times}
\usepackage{epsfig}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{bm}
\usepackage{textcomp}

\renewcommand*{\d}{\mathrm{d}}
\newcommand*{\dd}{\partial}
\newcommand*{\diffp}[2]{\ensuremath{\frac{\dd #1}{\dd #2}}}
\newcommand*{\diffpp}[3]{\ensuremath{\frac{\dd^2 #1}{\dd #2 \dd #3}}}
\newcommand*{\diffppp}[4]{\ensuremath{\frac{\dd^3 #1}{\dd #2 \dd #3 \dd #4}}}
\newcommand*{\difff}[2]{\ensuremath{\frac{\d #1}{\d #2}}}
\newcommand*{\diffff}[3]{\ensuremath{\frac{\d^2 #1}{\d #2 \d #3}}}
\newcommand*{\difffp}[3]{\ensuremath{\frac{\dd\d #1}{\d #2 \dd #3}}}
\newcommand*{\difffpp}[4]{\ensuremath{\frac{\dd^2\d #1}{\d #2 \dd #3 \dd #4}}}

\newcommand{\Matrix}[1]{\begin{bmatrix} #1 \end{bmatrix}}
\newcommand{\Vector}[1]{\Matrix{#1}}
\newcommand*{\SET}[1] {\ensuremath{\mathcal{#1}}}
\newcommand*{\MAT}[1] {\ensuremath{\mathbf{#1}}}
\newcommand*{\VEC}[1] {\ensuremath{\bm{#1}}}
\newcommand*{\CONST}[1]{\ensuremath{\mathit{#1}}}
\newcommand*{\norm}[1]{\mathopen\| #1 \mathclose\|}% use instead of $\|x\|$
\newcommand*{\abs}[1]{\mathopen| #1 \mathclose|}% use instead of $\|x\|$
\newcommand*{\absLR}[1]{\left| #1 \right|}% use instead of $\|x\|$
\newcommand*{\normLR}[1]{\left\| #1 \right\|}% use instead of $\|x\|$

% Include other packages here, before hyperref.

% If you comment hyperref and then uncomment it, you should delete
% egpaper.aux before re-running latex. (Or just hit 'q' on the first latex
% run, let it finish, and you should be clear).
\usepackage[pagebackref=true,breaklinks=true,letterpaper=true,colorlinks,bookmarks=false]{hyperref}

\fgfinalcopy % *** Uncomment this line for the final submission

\def\httilde{\mbox{\tt\raisebox{-.5ex}{\symbol{126}}}}

% Pages are numbered in submission mode, and unnumbered in camera-ready
\iffgfinal\pagestyle{empty}\fi
\begin{document}

%%%%%%%%% TITLE
\title{Expression Invariant 3D Face Recognition with a Morphable Model}

\author{Brian Amberg\\
{\tt\small brian.amberg@unibas.ch} \and
Reinhard Knothe\\
{\tt\small reinhard.knothe@unibas.ch} \and
Thomas Vetter\\
{\tt\small thomas.vetter@unibas.ch}
}

\maketitle
% \thispagestyle{empty}

%%%%%%%%% ABSTRACT
\begin{abstract}
We present an expression-invariant method for face recognition by fitting an
identity/expression-separated 3D Morphable Model to shape data. The
expression model greatly improves recognition and retrieval rates in the
uncooperative setting, while achieving recognition rates on par with the best
recognition algorithms in the face recognition vendor test. The
fitting is performed with a robust nonrigid ICP algorithm. It is able to
perform face recognition in a fully automated scenario and on noisy data.
The system was evaluated on two datasets, one
with a high noise level and strong expressions, and the standard UND range
scan database, showing that while expression invariance increases recognition
and retrieval performance for the expression dataset, it does not decrease
performance on the neutral dataset. The high recognition rates are achieved
even with a purely shape-based method, without taking image data into
account.
\end{abstract}

%%%%%%%%% BODY TEXT
\section{Introduction}
We present a system which uses shape information from a 3D scanner to
perform automated face recognition. The main novelty of the system is its
invariance to expressions. The system is tested on two
public datasets. It is fully automatic and can handle the typical artifacts of
3D scanners, namely outliers and missing regions. Face recognition in this
setting is a difficult task, and difficult tasks benefit from strong prior
knowledge. To introduce this prior knowledge we use a 3D Morphable Model
(3DMM)~\cite{blanz:model}, which is a generative statistical model of 3D faces.
3DMMs have been applied successfully to face recognition on different
modalities. The most challenging setting is recognition from single images
under varying pose and illumination. This was addressed
by~\cite{blanz03:face_rec,romdhani:recognition}. There, a 3DMM with shape,
texture, and illumination model was fit to probe and gallery images. As the
model separates shape and albedo parameters from pose and lighting, it enables
pose- and lighting-invariant recognition. We use the same idea for
expression-invariant face recognition from 3D shape. We fit an
identity/expression separating 3DMM~\cite{blanz03:expression} to shape data and
normalize the resulting face by removing the pose and expression components. See
Figure~\ref{fig:fitting} for an example of expression normalization. The
expression- and pose-normalized data then allows efficient and effective
recognition. A 3DMM has been fitted to range data before~\cite{blanz07:range},
and the results were even evaluated on part of the UND database. Our approach
differs from this work in the fitting method employed, which is independent of
the acquisition device, and in the use of an expression model to improve face
recognition. Additionally, our method is fully automatic,
while~\cite{blanz07:range} needed seven manually selected landmarks.

\begin{figure}
\vspace{-0.5em}
\begin{tabular}{@{ }c@{ }c@{ }c@{ }c@{}}
\includegraphics[height=0.42\linewidth]{16_1_tgt}&
\includegraphics[height=0.42\linewidth]{16_1_expression}&
\includegraphics[height=0.42\linewidth]{16_1_neutral}\\[-0.8em]
\smaller a) Target & \smaller b) Fit & \smaller c) Normalized\\[0.8em]
\includegraphics[height=0.42\linewidth]{16_6_tgt}&
\includegraphics[height=0.42\linewidth]{16_6_expression}&
\includegraphics[height=0.42\linewidth]{16_6_neutral}\\[-0.8em]
\smaller a) Target & \smaller b) Fit & \smaller c) Normalized
\end{tabular}
\vspace{0.5em}
\caption{Expression normalisation for two scans of the same individual.
The robust fitting gives a good estimate (b) of the true face surface given
the noisy measurement (a). It fills in holes and removes artifacts using
prior knowledge from the face model. The pose and expression normalized faces
(c) are used for face recognition.
}
\label{fig:fitting}
\end{figure}
Expression-invariant recognition for shape data was also approached in
\cite{xiaoguang06:face_matching}, where a person-specific 3D Morphable
Expression Model was learned for each subject in the gallery. In contrast, we
use a general 3DMM learned from an independent database of face shapes,
which can be applied without any relearning to a new scan. This makes the
enrollment phase trivial and the recognition phase effectively constant in the
size of the gallery while still being accurate. We have to fit just one
model to the probe, which can then be compared efficiently to
the enrolled subjects by comparing their coefficients in the low-dimensional
face space. While the number of comparisons is still at most linear in the
number of examples (and can be made sublinear with an indexing method), the time
it takes to compare coefficients in face space is negligible compared to the
fitting time.
%
Model-less approaches which align the probe to each example in the database
using e.g.\ ICP~\cite{bowyer05:icp_recognition} suffer from the same problem
as~\cite{xiaoguang06:face_matching}:
because the probe has to be aligned with each gallery scan, these methods scale
linearly in the gallery size, while our model-based approach needs only a
single fit to the probe.

Another interesting model-less approach~\cite{bronstein05:face_rec} compares
surfaces by the distribution of geodesic distances, which stays constant for
nonrigidly deforming (but not stretching or tearing) objects. This approach is
difficult to apply in this setting though, as the scanning produces holes,
disconnected regions, and strong noise, which can best be handled by a method
which uses specific information about the object class.

\section{Model}
A PCA model~\cite{blanz:model} built from 175 subjects was used. It was built
from one neutral expression face scan per identity and 50 expression scans of a
subset of the subjects. The data was registered with a modification
of~\cite{amberg07:nicp}.
The identity model consists of a mean shape $\VEC\mu$ and a matrix of offset
vectors $\MAT M_n$ such that a new face instance $\VEC f$ is generated from a
vector of coefficients $\VEC\alpha_n$ as
\begin{align}
  \VEC f&=\VEC\mu + \MAT M_n\VEC\alpha_n\qquad.
\end{align}
The model is constructed such that the $\alpha_i$ are independently normally
distributed with zero mean and unit variance under the standard assumption of a
Gaussian distribution of the data. This was done by performing PCA
on the data matrix built from the mean-free shape vectors.
Additionally, for each of the 50 expression scans, we calculated an expression
vector as the difference between the expression scan and the corresponding
neutral scan of that subject.
This data is already mode-centered if we regard the neutral
expression as the natural mode of expression data. PCA was applied again to
these offset vectors to get an expression matrix $\MAT M_e$ and
expression coefficients $\VEC\alpha_e$, such that the complete expression model is
\begin{align}
  \VEC f&=\VEC\mu + \MAT M_n\VEC\alpha_n + \MAT M_e\VEC\alpha_e
  =\VEC\mu + \MAT M\VEC\alpha\qquad,\\
  \MAT M &= \Matrix{\MAT M_n &|& \MAT M_e} \qquad \VEC\alpha = \Matrix{\VEC\alpha_n \\ \VEC\alpha_e}\qquad.
\end{align}
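
As an illustration, the following minimal sketch shows how a face instance is
assembled from identity and expression coefficients. The array layout (all
vertex coordinates stacked into one column vector) and the variable names are
assumptions made for this example.
{\small
\begin{verbatim}
import numpy as np

def generate_face(mu, M_n, M_e, a_n, a_e):
    # mu : (3V,) mean shape, x/y/z of all
    #      V vertices stacked into one vector
    # M_n: (3V, K_n) identity basis from PCA
    # M_e: (3V, K_e) expression basis from PCA
    #      on the expression-offset vectors
    # a_n, a_e: coefficients, ~N(0, I) under
    #      the model assumptions
    return mu + M_n @ a_n + M_e @ a_e
\end{verbatim}
}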
The basic assumption of this paper is that the face and expression spaces are
linearly independent, such that each face is represented by a unique set of
coefficients. While the resulting expression and identity matrices are not
perfectly orthogonal, they have little overlap, which together with the
regularisation employed is sufficient for this application. We assume that the
overlap between the spaces is due to the fact that it is impossible to acquire
perfectly consistent neutral expressions.

We use the registered scans and a mirrored version of each registered scan to
increase the variability of the model. This allows us to calculate a model with
more than 175 neutral coefficients.

\section{Fitting}
The fitting algorithm used in this paper is a variant of the nonrigid ICP work
in~\cite{amberg07:nicp}. The main difference is that the deformation model is
a statistical model and the optimisation in each step is an iterative method,
which finds the minimum of a convex function. Additionally, as it is applied to
noisy data (see Figure~\ref{fig:difficult}), we included a more elaborate
robust weighting term. Like other ICP methods, it is a local optimization
method, which does not guarantee convergence to the global minimum, but is
dependent on the initialization. It consists of the following steps (a
schematic sketch is given after the list):
\begin{itemize}
\item Iterate over regularization values $\theta_1>\dots>\theta_N$:
  \begin{itemize}
  \item Repeat until convergence:
    \begin{enumerate}
    \item Find candidate correspondences by searching for the closest compatible
      point for each model vertex.
    \item Weight the correspondences by their distance using a robust estimator.
    \item Fit the 3DMM to these correspondences using a
      regularization strength of $\theta_i$\label{step_fit}.
    \item Continue with the lower $\theta_{i+1}$ if the median change in vertex
      position is smaller than a threshold.
    \end{enumerate}
  \end{itemize}
\end{itemize}
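
The following minimal sketch summarizes this loop in Python-style pseudocode;
the helper functions and parameter names are hypothetical placeholders for the
steps described in this section, not part of our implementation.
{\small
\begin{verbatim}
def fit_morphable_model(scan, model, thetas, tol):
    # thetas: decreasing regularization values
    # hypothetical helpers mirror steps 1-4 above
    params = init_from_nose_tip(scan)
    for theta in thetas:
        while True:
            corr = closest_compatible_points(
                model, params, scan)
            w = robust_weights(corr.distances)
            new = fit_3dmm_to_correspondences(
                model, corr, w, reg=theta)
            done = median_vertex_change(
                params, new) < tol
            params = new
            if done:
                break  # go to next, lower theta
    return params
\end{verbatim}
}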

\begin{figure}
\vspace{-1.0em}
\begin{tabular}{@{ }c@{ }c@{ }c@{ }c@{}}
\includegraphics[height=0.42\linewidth]{56_4_tgt}&
\includegraphics[height=0.42\linewidth]{23_2_tgt}&
\includegraphics[height=0.42\linewidth]{5_6_tgt}\\[-1.0em]
& \smaller a) Targets & \\[0.2em]
\includegraphics[height=0.42\linewidth]{56_4_expression}&
\includegraphics[height=0.42\linewidth]{23_2_expression}&
\includegraphics[height=0.42\linewidth]{5_6_expression}\\[-0.8em]
& \smaller b) Fits &
\end{tabular}
\vspace{0.2em}
\caption{The reconstruction (b) is robust against scans (a) with artifacts, noise, and holes.}
\label{fig:difficult}
\end{figure}
The search for the closest compatible point takes only points into account which
have conforming normals, are closer than a threshold, and are not on or close
to the border of the scan. This has the effect of removing many outliers. The
search is sped up by organizing the target scan in a space partitioning tree
made up of spheres.
The correspondences are then weighted with a robust function of their
residual distance. The robust function is linear for distances smaller than
$2$mm, behaves like $1/x$ between $2$mm and
$20$mm, and is zero for a distance larger than $20$mm.
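
A minimal sketch of one possible weighting consistent with this description is
given below; we read it as full weight below $2$mm, a $1/x$ decay that is
continuous at $2$mm, and a cut-off at $20$mm. The exact functional form and the
continuity constant are assumptions made for illustration.
{\small
\begin{verbatim}
def robust_weight(d_mm):
    # one possible reading (assumption): full
    # weight up to 2 mm, 1/x decay continuous
    # at 2 mm, zero beyond 20 mm
    if d_mm <= 2.0:
        return 1.0
    if d_mm <= 20.0:
        return 2.0 / d_mm
    return 0.0
\end{verbatim}
}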
Note that it is necessary to balance robustness and regularization, as the
right balance depends on the noise characteristic of the data. Suitable values
were determined manually from a few scans of the GavabDB database and kept
constant for all experiments, on the GavabDB as well as on the UND database. In
step~\ref{step_fit} the 3DMM is fit to 3D-3D point correspondences. This is
done with a Gauss-Newton least squares optimization, using an analytic Jacobian
and Gauss-Newton Hessian approximation. Denote the correspondence points by
$\MAT u=\Matrix{\VEC u_1, \dots, \VEC u_n}$ and the rows of the model which
correspond to the $i$th vertex by subscript $i$; then we can write the cost
function minimized in this step as
\begin{align}
  f(\MAT R, \VEC t, \VEC\alpha) &= \sum_i \normLR{\MAT R( \VEC\mu_i + \MAT M_i\VEC\alpha) + \VEC t - \VEC u_i}^2 + \lambda\normLR{\VEC\alpha}^2\qquad.\label{eqn:mincost}
\end{align}
%We make the norm dependent on the target normal by using an orthonormal
%covariance matrix $\MAT C_i$ per vertex, which makes the cost of deviation
%along the normal higher than deviations inside the target surface.
%\begin{align}
% \MAT C_i &= \Matrix{ \VEC n_i^T\\ \nu \VEC a_i^T\\\nu\VEC b_i^T} & \VEC n_i &\bot \VEC a_i \bot \VEC b_i \bot \VEC n_i
%\end{align}
%where $\VEC n_i$ is the normal of the target correspondence and $\nu$ is an
%anisotropy parameter. If we do not use the anisotropic distance measure (i.e.
%$\MAT C_i=\MAT I$), then the cost function Equation~\ref{eqn:mincost} can be
%minimized more efficiently by changing it to
This can be minimized more efficiently by changing the direction of the rigid transform to
\begin{align}
  f(\MAT R, \VEC t, \VEC\alpha) &= \sum_i \normLR{ \VEC\mu_i + \MAT M_i\VEC\alpha + {\VEC t'} - {\MAT R'}\VEC u_i }^2 + \lambda\normLR{\VEC\alpha}^2\nonumber\\
  {\VEC t'} &= \MAT R^{-1}\VEC t\qquad {\MAT R'} = \MAT R^{-1}\qquad,
\end{align}
because then the Jacobian consists of a large constant part and only three
columns which depend on the iteration:
\begin{align}
  F_i &= \VEC\mu_i + \MAT M_i\VEC\alpha + {\VEC t'} - {\MAT R'_{r_1,r_2,r_3}}\VEC u_i\\
  \diffp{F_i}{\VEC\alpha} &= \MAT M_i\qquad
  \diffp{F_i}{\VEC t'} = \MAT I_3\qquad
  \diffp{F_i}{r_i} = \diffp{\MAT R'_{r_1,r_2,r_3}}{r_i}\VEC u_i\\
  \MAT J &= \Matrix{\MAT J_c & | & \MAT J_d }\\
  \MAT J_c &= \Matrix{\MAT M & \VEC 1 \otimes \MAT I_3\\ \MAT I & \MAT 0}\\
  \MAT J_d &= \Matrix{(\MAT I \otimes \diffp{\MAT R'}{r_1})\MAT u^T & (\MAT I \otimes \diffp{\MAT R'}{r_2})\MAT u^T& (\MAT I \otimes \diffp{\MAT R'}{r_3})\MAT u^T\\\MAT 0 & \MAT 0 & \MAT 0}
\end{align}
Accordingly, the Hessian can be approximated as
\begin{align}
  \MAT H &= \Matrix{
    \MAT J_c^T\MAT J_c & (\MAT J_c^T\MAT J_d)^T\\
    \MAT J_c^T\MAT J_d & \MAT J_d^T\MAT J_d
  }\qquad.
\end{align}
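For completeness, the standard Gauss-Newton update implied by this Jacobian and
Hessian approximation is, with $\VEC F$ the stacked residual vector (including
the regularization residuals) and
$\VEC p = \Matrix{\VEC\alpha^T & {\VEC t'}^T & r_1 & r_2 & r_3}^T$,
\begin{align}
  \Delta\VEC p &= -\MAT H^{-1}\MAT J^T\VEC F\qquad,&
  \VEC p &\leftarrow \VEC p + \Delta\VEC p\qquad.
\end{align}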
By precalculating the constant parts of the matrices we can remove most of the
computation time, making step~\ref{step_fit} very fast.

We initialize the registration by locating the tip of the nose with a
heuristic, which assumes that the head is upright and looking into the camera.
This initialization is good enough for a fully automatic
fit, as the fitting behaves like rigid ICP in the beginning, and rigid ICP is
known to have a large basin of convergence.

\section{Experiments}
\begin{figure*}
\begin{tabular}{cc}
\scalebox{0.82}{\input{shrec_MNCG}} &
\scalebox{0.82}{\input{und_MNCG}}
\end{tabular}
\caption{For the expression dataset the retrieval rate is improved by
including the expression model, while for the neutral expression dataset the
performance does not decrease. Plotted is the mean normalized cumulative
gain, which is the number of retrieved correct answers divided by the number
of possible correct answers. Note also the different scales of the MNCG
curves for the two datasets. Our approach has a high accuracy on the
neutral (UND) dataset.}
\label{fig:mcg}
\end{figure*}
\begin{figure*}
\begin{tabular}{cc}
\scalebox{0.82}{\input{shrec_PR}} &
\scalebox{0.82}{\input{und_PR}}
\end{tabular}
\caption{Use of the expression model improves retrieval performance.
Plotted are precision and recall for different retrieval depths. The lower
precision of the UND database is due to the fact that some queries have no
correct answers. For the UND database we achieve total recall when querying
nine answers, even though the maximal number of scans per individual is eight.
For the GavabDB database the expression model gives a strong
improvement in recall rate, but full recall cannot be achieved.}
\label{fig:precision_expression}
\end{figure*}

\begin{figure*}
\begin{tabular}{cc}
\scalebox{0.82}{\input{shrec_FARFRR}} &
\scalebox{0.82}{\input{und_FARFRR}}
\end{tabular}
\caption{Impostor detection is reliable, as the minimum distance to a match
is smaller than the minimum distance to a nonmatch. Note the vast increase in
recognition performance with the expression model on the expression database,
and the fact that the recognition rate does not decrease on the neutral
database, even though we added expression invariance. We can operate at $0$\%
false acceptance rate with less than $4$\% false rejection rate, or at less than
$1$\%\ FAR with less than $1$\%\ FRR.}
\label{fig:impostor}
\end{figure*}

We evaluated the system on two databases, with and without
the expression model. We used the GavabDB~\cite{gavabdb} database and the
UND~\cite{bowyer05:2d3d_recognition} database. For both databases, only the
shape information was used. The GavabDB database contains 427 scans, with seven
scans per ID: three neutral and four with expressions. The expressions in this
dataset vary considerably, including sticking out the tongue and strong facial
distortions. Additionally, it has strong artifacts due to facial hair, motion,
and low scanner quality. This dataset is typical of a non-cooperative
environment. The UND database was used in the face recognition grand
challenge~\cite{frvt06} and consists of 953 scans, with one to eight scans per
ID. It is of better quality and contains only slight expression variations. It
represents a cooperative scenario.

The fitting was initialized by detecting the nose and assuming that the face is
upright and looking along the $z$-axis. To detect the nose we
first removed the spike artifacts typical of range scanners by repeated
min-filtering and removal of large triangles, and then detected the vertex with
the smallest depth which, in its horizontal slice, is sufficiently closer to the
camera than the other pixels in that slice. For the UND dataset this reliably
gives us a point on the tip or ridge of the nose. The heuristic worked for 939
out of 953 scans; in the remaining 16 scans we marked the nose manually. The
GavabDB database has the scans already aligned and the tip of the nose is at
the origin. We used this information for the GavabDB experiments.
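
A rough sketch of such a nose-tip heuristic on a range image is given below.
The helper names, the margin threshold, and the exact candidate selection are
assumptions made for illustration; only the overall idea (the closest point per
horizontal slice, sufficiently closer than the rest of that slice) follows the
description above.
{\small
\begin{verbatim}
import numpy as np

def find_nose_tip(depth, margin_mm=10.0):
    # depth: cleaned range image in mm, NaN =
    # missing, smaller value = closer to camera
    best = None
    for r in range(depth.shape[0]):
        row = depth[r]
        if np.all(np.isnan(row)):
            continue
        c = np.nanargmin(row)   # closest pixel
        rest = np.delete(row, c)
        if np.all(np.isnan(rest)):
            continue
        # sufficiently closer than the rest of
        # this horizontal slice?
        if np.nanmedian(rest) - row[c] > margin_mm:
            if best is None or row[c] < best[2]:
                best = (r, c, row[c])
    return None if best is None else best[:2]
\end{verbatim}
}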
The same
regularisation parameters were used for all experiments, even though the
GavabDB data is noisier than the UND data. The parameters were set manually
based on a few scans from the GavabDB database. We used 100 principal identity
components and 30 expression components for all experiments.

In the experiments the distances between all scans were calculated, and we
measured recognition and retrieval rates by treating every scan once as the
probe and all other scans as the gallery. Both databases were used
independently.

\subsection{Retrieval Measures}
We measure similarity between faces in parameter space as the angle between the
face parameters in Mahalanobis space, which has been shown to give high
recognition rates~\cite{blanz03:face_rec}. The distance measure is
\begin{align}
  s(\VEC\alpha_1, \VEC\alpha_2) &= \arccos\left(\frac{\VEC\alpha_1^T\VEC\alpha_2}{\norm{\VEC\alpha_1}\norm{\VEC\alpha_2}}\right)\qquad.
\end{align}
We observed that the angular measure gives slightly higher recognition rates
than the Mahalanobis distance. The Mahalanobis angle has the effect of
regarding all caricatures of a face, which lie on a ray from the origin towards
an identity, as the same identity. We also evaluated other measures, but found
them to be consistently worse than the Mahalanobis angle.
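
A minimal sketch of this distance is given below; it assumes the coefficient
vectors are already expressed in Mahalanobis space, i.e.\ scaled to unit
variance as described in the model section.
{\small
\begin{verbatim}
import numpy as np

def mahalanobis_angle(a1, a2):
    # angle between coefficient vectors; assumes
    # the coefficients are already whitened, so
    # no additional scaling is needed here
    c = np.dot(a1, a2) / (np.linalg.norm(a1)
                          * np.linalg.norm(a2))
    return np.arccos(np.clip(c, -1.0, 1.0))
\end{verbatim}
}
Scaling a coefficient vector, i.e.\ forming a caricature, leaves this angle
unchanged, which is exactly the behaviour described above.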

\subsection{Results}
As expected, the two datasets behave differently because of the presence of
expressions in the examples.

\subsubsection{UND}
For the UND database we have good recognition rates with the neutral
model. The mean normalized cumulative gain curve in
Figure~\ref{fig:mcg} shows, for varying retrieval depth, the number of
correctly retrieved scans divided by the maximal number of scans that could be
retrieved at this depth. From this it can be seen that the first match is
always the correct match if there is any match in the database. But for some
probes no example is in the gallery. Therefore, for face recognition we have to
threshold the maximum allowed distance to be able to reject impostors. Varying
the distance threshold leads to varying false acceptance rates (FAR) and false
rejection rates (FRR), which are shown in Figure~\ref{fig:impostor}. Even
though we tuned the model to the GavabDB dataset and not the UND dataset, our
recognition rates at any FAR are as good as or better than the best results
from the face recognition vendor test. This shows that our basic
face recognition method without expression modelling gives convincing results.
Now we analyze how the expression modelling impacts recognition results on this
expression-less database. If the face and expression spaces are not orthogonal,
then adding invariance towards expressions should make the recognition rates
decrease. In fact, we observe that the recognition results are slightly lower,
but only by a marginal amount, and still on par with the results from the face
recognition vendor test. Let us now turn towards the expression database, where
we expect to see an increase in recognition rate due to the expression model.

\subsubsection{GavabDB}
The recognition rates on the GavabDB without the expression model are not quite
as good as for the expression-less UND dataset, so here we hope to find some
improvement by using expression normalization. And indeed, the closest-point
recognition rate with only the neutral model is 96.25\%, which improves
to 98.36\% when the expression model is added. The FAR/FRR values also decrease
considerably. The largest improvement can be seen in retrieval performance,
displayed in the precision-recall curves in
Figure~\ref{fig:precision_expression} and the mean normalized cumulative gain
curves in Figure~\ref{fig:mcg}. This is because there are multiple examples in
the gallery, so finding a single match is relatively easy. But retrieving all
examples from the database, even those with strong expressions, is only made
possible by the expression model.

%\emph{TODO: Try also $k$-NN, that should give 100\% recognition rate on the
%GavabDB too.}

\section{Speed}
Though the method as presented needs approximately 40 seconds per
query, it has the potential for speedup. It is possible to parallelize the
closest point estimation and the optimisation, and more elaborate fitting
algorithms, including multiresolution schemes, can be developed. The speed also
depends on the number of vertices and components; for the results presented
here, 11000 vertices and 100 neutral plus 30 expression components were fitted.

\section{Conclusion}
We have used a 3D Morphable Model with a separate expression model to develop
an expression-invariant face recognition algorithm. We have shown that the
system has excellent recognition rates on difficult expression data as well as
on data taken in a cooperative environment. The introduction of expression
invariance did not incur a significant loss of precision on the easier neutral
data. The strong prior knowledge of the 3DMM allows robust handling of noisy
data and allowed us to build a fully automatic face recognition system. We also
introduced a relatively efficient fitting algorithm which, as it has the
potential for parallelization, could be made even faster.

Note that, as we do establish correspondence between the model and the scans,
it is trivial to add image-based classification for datasets where a calibrated
photo is available. This can be done by comparing the rectified textures,
which should result in even higher recognition rates. It is also important to
note that the expression normalization described here for range data can be
applied equally well to other modalities, using any of the proposed 3DMM
fitting algorithms.

In the future we plan to include the additional texture cues and to make the
method faster, such that it is applicable in real-world scenarios where a
processing time of 40 seconds per probe is still a problem. Furthermore, we
would like to investigate more sophisticated fitting algorithms and a morphable
model with a larger expression space.

%\section*{Acknowledgement}
%The authors wish to thank P.\ Paysan for the data
%acquisition. This work was supported in part by a grant from Microsoft
%Research and the Swiss National Science Foundation (200021-103814 and NCCR COME project 5005-66380).

{\small
\bibliographystyle{ieee}
%%use following if all content of bibtex file should be shown
%\nocite{*}
\bibliography{shrec_08}
}

\end{document}