DeepFake detection and generation

Using the technique for good and fighting its misuse

DeepFake

Deep learning methods have advanced significantly in recent years. In particular, Generative Adversarial Networks (GANs) can now generate natural-looking images and create false identities in images or videos. We are particularly interested in the DeepFake technique, which uses artificial intelligence to synthesize human faces. Originally created for editing purposes, this technique now raises many ethical and safety issues. Indeed, using the faces of celebrities or politicians to create fake images or videos can cause serious societal problems (e.g., malicious individuals can create fake videos to support "fake news"). Detecting fake videos is therefore an important subject, one that requires skills in multimedia security, communication and psychology.

In this research, we analyse the images generated by DeepFake techniques in order to identify falsified videos. We consider the following research paths:

Generation

Nevertheless, DeepFake methods can also be used for good, for example in video editing or translation. In recent years, voice interaction with computers has made considerable progress. Virtual agents offer a user-friendly human-machine interface while reducing maintenance costs. Speech-based interaction is already effective, as shown by virtual assistants such as Siri, Alexa or Google Assistant; their visual counterpart, however, is still far behind. The level of user engagement is much higher for audiovisual interactions than for purely audio ones. It is therefore desirable to be able to associate visual animations of a face with the generated audio.

The latest advances in audio-driven face video synthesis are remarkable. The proposed approaches generalize across different people: they can synthesize videos of a target actor from the voice of any speaker, including a voice of unknown source or even a synthetic voice generated with standard voice synthesis approaches. These solutions produce videos whose visual synchronization quality is superior to that of photo-realistic audio and video reconstruction techniques. Building on the strengths of existing techniques, we want to develop a Text-to-Speech-to-Video technology accurate enough for commercial use.


Related projects