Can you spot a DeepFake?

Facebook’s Contest proves it’s tougher than you think!

Can you figure out which are real, and which are fake? Answer at the end of the article. [Image Credits: Facebook]

As a machine learning enthusiast and practitioner, news of the results of Facebook's AI contest pricked up my ears.

Note: Facebook's Public Announcement on the Results

Starting back in September 2019, Facebook AI challenged some of the best academic institutions to develop an algorithm that could identify whether a video was generated by AI or was real.

Universities including Oxford and Berkeley were given a training dataset of over 4,000 videos; by November 2019, 115,000 videos had been released and the competition was expanded onto Kaggle.

Ultimately, the most competitive model could identify about 85% of fakes in-sample, but out of sample this dropped considerably to 65%! That's certainly better than chance, but it's not as good as we would have hoped.

The reasons why models perform differently in and out of sample are complicated, but they come down to how well the machine learning model can generalise. If a good model recognises a certain image, it should also recognise that image when it is rotated. A model that cannot generalise, on the other hand, will struggle with unfamiliar samples.
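To make this concrete, here is a minimal sketch of how you might probe generalisation by comparing a classifier's accuracy on original versus rotated inputs. The `model` and `loader` names are placeholders for any trained PyTorch classifier and test set, not anything from the competition:

```python
import torch
from torchvision import transforms

# Hypothetical: `model` is any trained image classifier and `loader`
# yields (image_batch, label_batch) pairs of unmodified test images.
def accuracy(model, loader, rotate_degrees=0):
    """Measure accuracy, optionally rotating every image first."""
    rotate = transforms.RandomRotation((rotate_degrees, rotate_degrees))
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            if rotate_degrees:
                images = rotate(images)          # fixed-angle rotation
            preds = model(images).argmax(dim=1)  # predicted class per image
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# A large gap between these two numbers points to poor generalisation:
# print(accuracy(model, test_loader), accuracy(model, test_loader, rotate_degrees=90))
```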

Now remember, Facebook are clever.

Facebook generated the fake videos in a variety of ways so that they would reflect the diversity of techniques used to make deepfakes today: image enhancement, plus additional augmentations and distractors such as blur, frame-rate modification, and overlays.

They also took advice from the universities on how to make the deepfakes even harder to identify. All in all, they made the problem difficult not in just one or two ways, but in a wide variety of ways: enough that it would be difficult to hard-code every permutation.

In the testing phase, each participant had to submit their code into a black-box environment, where 10,000 further videos were passed through the contestant's model to see how well it performed.

Here is where it gets tricky

Videos were then altered in ways outside the scope of the training dataset, for example by adding random images to each frame and changing the frame rate and resolution. These are common ways to distort video, and they were used to increase the difficulty level. The results indicate that the models developed could not fully adapt to these new settings.
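As a rough illustration (not the organisers' actual pipeline, and the file names are placeholders), the sketch below applies a few distortions of this kind with OpenCV: reduced resolution, a lower frame rate, blur, and a small overlay image pasted into each frame:

```python
import cv2

def distort_video(src_path, dst_path, overlay_path,
                  scale=0.5, keep_every_nth=2, blur_ksize=7):
    """Write a distorted copy of a video: downscale, drop frames,
    blur, and paste a small overlay image into the corner."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) / keep_every_nth       # lower frame rate
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) * scale)
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) * scale)
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    overlay = cv2.imread(overlay_path)
    overlay = cv2.resize(overlay, (width // 8, height // 8))

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % keep_every_nth == 0:                # drop frames
            frame = cv2.resize(frame, (width, height))     # lower resolution
            frame = cv2.GaussianBlur(frame, (blur_ksize, blur_ksize), 0)
            oh, ow = overlay.shape[:2]
            frame[:oh, :ow] = overlay                      # distractor overlay
            writer.write(frame)
        frame_idx += 1

    cap.release()
    writer.release()

# distort_video("real_or_fake.mp4", "distorted.mp4", "random_logo.png")
```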


Methods that Competitors Used

Attention Dropping

Microsoft Research developed a Weakly Supervised Data Augmentation Network (WS-DAN) that explores the potential of data augmentation: each training image is first represented in terms of its object's discriminative parts, and then augmented in ways that include attention cropping and attention dropping. This guides the learning procedure away from overfitting as more discriminative features are identified.

Secondly, the attention regions provide an accurate location of the object, which lets the model look at the object more closely and further improves performance.

See Better Before Looking Closer: [paper]

In relation to this problem, this allows the model to 'see' the pictures better and in more detail, discerning discriminative face parts. These fine-grained visual classification techniques seem to provide an edge.
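To give a feel for the idea, here is a simplified sketch based on my reading of the paper, not the authors' code. The attention map is assumed to come from the network's attention branch; attention cropping zooms into the highly-attended region, while attention dropping masks it out so the model must find other discriminative parts:

```python
import torch
import torch.nn.functional as F

def attention_crop_drop(image, attention, threshold=0.5):
    """image: (C, H, W) tensor; attention: (h, w) map from the attention
    branch, assumed non-negative. Returns (cropped, dropped) images."""
    C, H, W = image.shape
    # Upsample the attention map to image size and normalise to [0, 1].
    attn = F.interpolate(attention[None, None], size=(H, W),
                         mode="bilinear", align_corners=False)[0, 0]
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    mask = attn > threshold
    if not mask.any():                       # degenerate (flat) attention map
        mask = attn >= attn.max()

    # Attention cropping: take the bounding box of the attended region
    # and resize it back up, so the network sees that part in detail.
    ys, xs = torch.where(mask)
    y0, y1 = ys.min().item(), ys.max().item() + 1
    x0, x1 = xs.min().item(), xs.max().item() + 1
    cropped = F.interpolate(image[None, :, y0:y1, x0:x1], size=(H, W),
                            mode="bilinear", align_corners=False)[0]

    # Attention dropping: erase the attended region so training is pushed
    # towards other discriminative parts of the face.
    dropped = image * (~mask).float()
    return cropped, dropped
```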


Gaining an Edge with Custom Architectures

Architecturally, many of the participants used pretrained EfficientNet networks, but some found an edge in the way they combined predictions from an ensemble.

Ensemble methods are common in machine learning and the higher performers in this challenge showed that an ensemble approach is also useful for dealing with deepfakes.
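As a hedged sketch of the idea, an ensemble can be as simple as averaging the per-video "fake" probabilities from several models. The torchvision EfficientNets below are stand-ins for the competitors' actual fine-tuned detectors:

```python
import torch
from torchvision import models

# Hypothetical stand-ins: in the competition these would be EfficientNets
# fine-tuned on deepfake faces, not the ImageNet-pretrained originals.
backbones = [
    models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT),
    models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.DEFAULT),
]
# Replace each classifier head with a single "probability of fake" logit.
for net in backbones:
    in_features = net.classifier[1].in_features
    net.classifier[1] = torch.nn.Linear(in_features, 1)
    net.eval()

def ensemble_fake_probability(frames):
    """frames: (N, 3, H, W) batch of face crops from one video.
    Averages sigmoid outputs across models and frames."""
    with torch.no_grad():
        per_model = [torch.sigmoid(net(frames)).mean() for net in backbones]
    return torch.stack(per_model).mean().item()
```

Averaging across both frames and models is only one way to combine predictions; the point is that the combination strategy itself is somewhere the top teams found extra performance.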


Non-Learned Enhancements

Finally, an interesting point: none of the top performers used forensic methods such as searching for noise fingerprints or other characteristics that derive from the image creation process. This suggests those methods either aren't effective here or simply aren't widespread. Either way, there's scope for research in this space.
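For anyone curious what such a signal might look like, here is a rough, non-learned sketch (not something any finalist is known to have used) that extracts a noise residual by subtracting a denoised copy of a frame; camera pipelines and generative models tend to leave different patterns in this residual:

```python
import cv2
import numpy as np

def noise_residual(image_path, blur_ksize=5):
    """Return a simple noise residual: the frame minus a denoised copy.
    Real forensic methods use far more careful denoising, e.g. for PRNU."""
    frame = cv2.imread(image_path).astype(np.float32)
    denoised = cv2.GaussianBlur(frame, (blur_ksize, blur_ksize), 0)
    return frame - denoised

# residual = noise_residual("suspect_frame.png")   # placeholder file name
# print(residual.std())  # residual statistics could feed a downstream detector
```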


The results of the competition showed that deepfake videos are hard to identify because detecting them requires well-generalised models. We've seen time and time again that machine learning models are often over-fitted to a particular problem, so that if the input is altered (such as an image being rotated), the model can no longer identify what the image is.

That being said, robustness methods are in growing demand, and a lot of work is being done in this space by the big players in the field. Progress here will be quick, but as lockdowns around the world continue and people spend even more time on the internet, demand for this technology can only increase.

In the first video above, clips 1, 4, and 6 are original, unmodified videos. Clips 2, 3, and 5 are deepfakes created for the Deepfake Detection Challenge.


Thanks for reading again!! Let me know if you have any questions and I’ll be happy to help.

Keep up to date with my latest work here!
