Biometric characteristics can be utilized in order to enable reliable and robust-to-impostor-attacks person recognition. Speaker recognition technology is commonly utilized in various systems enabling natural human computer interaction. The majority of the speaker recognition systems rely only on acoustic information, ignoring the visual modality. However, visual information conveys correlated and complimentary information to the audio information and its integration into a recognition system can potentially increase the system's performance, especially in the presence of adverse acoustic conditions. Acoustic and visual biometric signals, such as the person's voice and face, can be obtained using unobtrusive and user-friendly procedures and low-cost sensors. Developing unobtrusive biometric systems makes biometric technology more socially acceptable and accelerates its integration into every day life. In this paper, we describe the main components of audio-visual biometric systems, review existing systems and their performance, and discuss future research and development directions in this area