
Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes
Battista Biggio · Pavel Korshunov · Thomas Mensink · Giorgio Patrini · Arka Sadhu · Delip Rao

Sat Jun 15 08:30 AM -- 06:00 PM (PDT) @ 104 C
Event URL: https://sites.google.com/view/audiovisualfakes-icml2019/

With the latest advances in deep generative models, the synthesis of images, videos, and human voices has achieved impressive realism. In many domains, synthetic media are already difficult for the human eye and ear to distinguish from real media. The potential for misuse of these technologies is seldom discussed in academic papers; instead, concerns are increasingly voiced by media and security organizations as well as by governments. Researchers are starting to experiment with new ways to integrate deep learning with traditional media forensics and security techniques as part of a technological solution. This workshop will bring together experts from the machine learning, computer security, and forensics communities to highlight recent work and discuss future efforts to address these challenges. Our agenda will alternate contributed papers with invited speakers. The invited talks will emphasize connections among the relevant scientific communities and present the standpoint of institutions and media organizations.

Sat 9:00 a.m. - 9:10 a.m.
Welcome Remarks
Giorgio Patrini
Sat 9:10 a.m. - 9:40 a.m.
Invited Talk by Professor Alexei Efros (UC Berkeley) (Invited Talk)
Alexei Efros
Sat 9:40 a.m. - 10:10 a.m.
Invited Talk by Dr. Matt Turek (DARPA) (Invited Talk)
Matt Turek
Sat 10:10 a.m. - 10:30 a.m.

Deepfake detection is formulated as a hypothesis testing problem to classify an image as genuine or GAN-generated. A robust statistics view of GANs is considered to bound the error probability for various GAN implementations in terms of their performance. The bounds are further simplified using a Euclidean approximation for the low error regime. Lastly, relationships between error probability and epidemic thresholds for spreading processes in networks are established.
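To make the hypothesis-testing view concrete, here is a generic textbook sketch (not the authors' robust-statistics bounds). For discrete distributions p (genuine) and q (GAN-generated) with equal priors, no detector can have an error probability below (1 - TV(p, q))/2, where TV is the total variation distance; as a GAN's output distribution approaches p, this floor approaches 1/2, i.e., chance. The function names and the toy distributions are hypothetical.

```python
"""Sketch: binary hypothesis test H0: x ~ p (genuine) vs H1: x ~ q (GAN-generated).

Generic setup on a finite alphabet; not the bounds derived in the paper.
"""


def bayes_error(p, q, prior_genuine=0.5):
    """Minimum achievable error probability (Bayes risk under 0-1 loss)."""
    return sum(min(prior_genuine * pi, (1 - prior_genuine) * qi) for pi, qi in zip(p, q))


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def map_decision(x, p, q, prior_genuine=0.5):
    """Likelihood-ratio (MAP) test: flag x as generated when the posterior favors q."""
    return "generated" if (1 - prior_genuine) * q[x] > prior_genuine * p[x] else "genuine"


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical distribution of some image feature on real images
    q = [0.2, 0.3, 0.5]  # hypothetical distribution of the same feature on GAN output
    print(bayes_error(p, q))                  # equals (1 - TV) / 2 for equal priors
    print(0.5 * (1 - total_variation(p, q)))
    print([map_decision(x, p, q) for x in range(len(p))])
```

With identical distributions (a "perfect" GAN), bayes_error returns 0.5 regardless of the detector, which is the intuition behind bounding detection error in terms of GAN performance.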

Sat 10:30 a.m. - 11:30 a.m.
Poster session 1 and Coffee break (Poster Session)
Sat 11:30 a.m. - 12:00 p.m.
Invited Talk by Professor Pawel Korus (NYU): Neural Imaging Pipelines - the Scourge or Hope of Forensics? (Invited Talk)
Sat 12:00 p.m. - 12:15 p.m.

Manipulating video content is easier than ever. Because of the potential for misuse of manipulated content, multiple detection techniques that analyze the pixel data of videos have been proposed. However, to evade detection, clever manipulators would also need to carefully forge the metadata and auxiliary header information, which is harder to do for videos than for images. In this paper, we propose to identify forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, completely avoiding the pixel space. Using well-known datasets, our results show that this scalable approach can achieve a high manipulation detection score if the manipulators have not carefully sanitized the multimedia stream descriptors.

David Güera
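A minimal sketch of the descriptor-only idea, assuming each video's stream descriptors have already been dumped as a flat dictionary of field/value pairs (for example by a container-inspection tool). Each field=value pair becomes a binary feature, and a Bernoulli naive Bayes model stands in for the "simple binary classifiers"; the paper's actual feature set and classifiers are not given here, and the class name, field names, and values in the toy data are all hypothetical.

```python
import math
from collections import Counter


def featurize(descriptor):
    """Map a flat descriptor dict to a set of binary 'field=value' tokens."""
    return {f"{field}={value}" for field, value in descriptor.items()}


class DescriptorNaiveBayes:
    """Bernoulli naive Bayes over descriptor tokens; label 1 = manipulated, 0 = pristine.

    Hypothetical helper for illustration; training data must contain both labels.
    """

    def fit(self, descriptors, labels):
        self.vocab = set()
        self.token_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for descriptor, label in zip(descriptors, labels):
            tokens = featurize(descriptor)
            self.vocab |= tokens
            self.token_counts[label].update(tokens)
        return self

    def log_score(self, descriptor, label):
        """Log prior plus log likelihood of the token presence pattern (Laplace smoothing)."""
        tokens = featurize(descriptor)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for token in self.vocab:  # tokens never seen in training are ignored
            p = (self.token_counts[label][token] + 1) / (self.class_counts[label] + 2)
            score += math.log(p if token in tokens else 1 - p)
        return score

    def predict(self, descriptor):
        return int(self.log_score(descriptor, 1) > self.log_score(descriptor, 0))


if __name__ == "__main__":
    # Hypothetical toy descriptors: pristine clips carry a device tag, edited ones a tool tag.
    train = [
        {"encoder": "device_fw", "pix_fmt": "yuv420p"},
        {"encoder": "device_fw", "pix_fmt": "yuv420p"},
        {"encoder": "editing_tool", "pix_fmt": "yuv420p"},
        {"encoder": "editing_tool", "pix_fmt": "yuv420p"},
    ]
    model = DescriptorNaiveBayes().fit(train, [0, 0, 1, 1])
    print(model.predict({"encoder": "editing_tool", "pix_fmt": "yuv420p"}))
```

As the abstract notes, this kind of detector is only useful while the descriptors still carry traces of the editing pipeline; a manipulator who rewrites them to match pristine files removes the signal.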
Sat 12:15 p.m. - 12:30 p.m.

From TV news to Google Street View, face obscuration has been used for privacy protection. Due to recent advances in deep learning, obscuration methods such as Gaussian blurring and pixelation are not guaranteed to conceal identity. In this paper, we propose a utility-preserving generative model, UP-GAN, that provides effective face obscuration while preserving facial utility. By utility-preserving we mean preserving facial features that do not reveal identity, such as age, gender, skin tone, pose, and expression. We show that the proposed method outperforms common obscuration methods in terms of both obscuration and utility preservation.

Hanxiang Hao
Sat 12:30 p.m. - 2:00 p.m.
Lunch Break (Break)
Sat 2:00 p.m. - 2:30 p.m.
Invited Talk by Professor Luisa Verdoliva (University Federico II Naples) (Invited Talk)
Luisa Verdoliva
Sat 2:30 p.m. - 2:45 p.m.

The recent increase in social-media-based propaganda, i.e., ‘fake news’, calls for automated methods to detect tampered content. In this paper, we focus on detecting tampering in a video of a person speaking to a camera. This form of manipulation is easy to perform, since one can simply replace part of the audio, dramatically changing the meaning of the video. We consider several detection approaches based on phonetic features and recurrent networks. We demonstrate that by replacing standard MFCC features with embeddings from a DNN trained for automatic speech recognition, combined with mouth landmarks (visual features), we can achieve a significant performance improvement on several challenging publicly available databases of speakers (VidTIMIT, AMI, and GRID), for which we generated sets of tampered data. The evaluations demonstrate a relative equal error rate reduction of 55% (from 10.0% to 4.5%) on the dataset based on the large GRID corpus, and satisfactory generalization of the model to other datasets.

Pavel Korshunov
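The 55% figure above is relative: going from 10.0% to 4.5% EER removes (10.0 - 4.5) / 10.0 = 0.55 of the baseline error. The sketch below shows one common way to estimate an EER, by sweeping a decision threshold over detector scores (assuming higher scores mean "more likely tampered"), together with the relative-reduction arithmetic. It is a generic evaluation helper, not the authors' evaluation code, and the function names and toy scores are illustrative.

```python
def equal_error_rate(genuine_scores, tampered_scores):
    """Approximate the EER by sweeping a threshold over every observed score.

    Assumes higher scores mean "more likely tampered". Returns the mean of the
    false-acceptance and false-rejection rates at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(tampered_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # Tampered clips scored below t are accepted as genuine (missed fakes).
        far = sum(s < t for s in tampered_scores) / len(tampered_scores)
        # Genuine clips scored at or above t are wrongly flagged as tampered.
        frr = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def relative_reduction(baseline, improved):
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(relative_reduction(10.0, 4.5))
    # Hypothetical toy scores with one overlapping pair.
    print(equal_error_rate([0.1, 0.2, 0.3, 0.6], [0.4, 0.7, 0.8, 0.9]))
```

Reporting a relative reduction alongside the absolute numbers avoids ambiguity: the absolute drop here is 5.5 percentage points, while the relative drop is 55%.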
Sat 2:45 p.m. - 3:00 p.m.

This paper evaluates the effectiveness of a CycleGAN-based voice converter (VC) on four speaker identification (SID) systems and an automatic speech recognition (ASR) system for various purposes. Audio samples converted by the VC model are classified by the SID systems as the intended target with up to 46% top-1 accuracy among more than 250 speakers. This encouraging result in imitating the target styles led us to investigate whether converted (synthetic) samples can be used to improve ASR training. Unfortunately, adding synthetic data to the ASR training set only marginally improves word and character error rates. Our results indicate that even though VC models can successfully mimic the style of target speakers as measured by SID systems, improving ASR training with synthetic data from VC systems needs further research to establish its efficacy.

Gokce Keskin
Sat 3:00 p.m. - 3:45 p.m.
Poster session 2 and Coffee break (Poster Session)
Sat 3:45 p.m. - 4:15 p.m.
CVPR19 Media Forensics workshop: a Preview (Invited Talk)
Sat 4:15 p.m. - 4:45 p.m.
Invited Talk by Tom Van de Weghe (Stanford & VRT) (Invited Talk)
Sat 4:45 p.m. - 5:00 p.m.
Invited Talk by Aviv Ovadya (Thoughtful Technology Project) (Invited Talk)
Sat 5:00 p.m. - 6:00 p.m.
Panel Discussion moderated by Delip Rao (Discussion Panel)

Author Information

Battista Biggio (University of Cagliari)
Pavel Korshunov (IDIAP)
Thomas Mensink (University of Amsterdam)
Giorgio Patrini (Deeptrace)
Arka Sadhu (University of Southern California)
Delip Rao (AI Foundation)