Deep

fake

Lab

Unraveling the mystery around deepfakes.

Discover how deepfakes work and the visual clues you can use to identify them. We are a group of communication designers that have created this project to demonstrate our research into making our own deep fake, and to communicate the signs you can spot to identify them.

01. Tesla Baby

Let’s have a look at a popular example from the Internet.

In this video, Elon Musk’s face has been overlay on a baby. This type of face swap is the most common use of deepfakes. Look closely: the edges of the face are not sharp and the skin colour is different.

Technical details

Visual flaws

A deepfake is created by a computer program that can teach itself how to recreate a face. By adjusting parameters in its system, the program becomes better in recreating a specific person’s face, this is a type of deep learning. The programme then overlays the face it has recreated onto an existing video – like a digital mask. You can see traces of such a mask in this video.

Target Video

Deepfaked Video

Deepfake target video: YouTube | AndrewSchrock | Cutest Baby Montage Ever.

Deepfake video source: YouTube | TheFakening | Baby Elon Musk Montage Deepfake

02. DIY

You can also try this at home. We’ll show you how.

Deepfake videos can be made with home computers, but you need quite a powerful graphics card. This video shows our first trial, which shows why it’s important to use suitable source videos.

Shia LaBeouf

Pilar

Target video source: YouTube | MotivaShian | Shia LaBeouf "Just Do It" Motivational Speech

Technical details

Visual flaws

Skin colour mismatch: there’s a difference in skin tone between the mask and target face. The face seems to be covered by a layer of different colours, showing edges or spots.

Mismatch expressions: the expressions on the deepfake face do not match the target face. Facial features do not behave naturally and are blurry, replicated or even invisible.

Visible edges: the edges of the mask are visible, either as a sharp or blurred edge surrounding the face.

The images used to train the algorithm did not contain the right facial expressions to cover Shia’s face in the video, nor did they contain footage of his face in profile. If the neural network is not trained for these situations, it cannot produce an accurate digital mask. Notice how Shia's mouth sometimes appears from underneath the mask, resulting in two mouths.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

H64

Project reach

200

/2000 images

106000

/268000 times

64

/128 pixels

31

/63 hours

03. Process

So how do you make a deepfake?

You need two videos: a source and a target. The program will train itself using both, and create a mask from the source video that can be overlaid onto the target video using editing software.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

SAEHD

Project reach

750

/2000 images

200000

/268000 times

128

/128 pixels

48

/63 hours

Target video source: The Devil Wears Prada | Andy's Interview

Original

Dataset

Mask

Alignment

Deepfake

Post

Select a target video you want to insert a face onto. Choosing a steady video with a consistent background will give you a better result.

04. Training Data

What happens if we provide the programme with more content? Will it improve?

In this experiment, two programmes were given a different number of images. More source material clearly improves the result. The model had more facial information and could develop a better mask.

Technical details

This experiment was done with the same source video exported at two different frame rates – both models were trained with the exact same studio setup. The amount of training cycles per image is equal, but the training time was longer with the bigger dataset. You can clearly see that the algorithm trained with more images can produce a more refined result that better matches the target.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

SAEHD

Project reach

200

/2000 images

200000

/20000

/268000 times

128

/128 pixels

8

/63 hours

Benedict Cumberbatch

Arthur

Original target video: Sherlock | The Reichenbach Fall | Rooftop Showdown

05. Social fraud

How susceptible are you? Can we steal your social media content and create a good deepfake?

We took all the Facebook images from one of our team members and created a deepfake. In almost all of the source images she was smiling, so the algorithm could not generate a non-smiling mask.

Natalie Portman

Pilar

Original target video: Star Wars: Episode I – The Phantom Menace | Padmé meets Anakin

Technical details

Visual flaws

Blurred face: the mask is blurred. There is a difference in sharpness or resolution between the mask and the rest of the video.

Mismatch expressions: the expressions on the deepfake face do not match the target face. Facial features do not behave naturally and are blurry, replicated or even invisible.

Profile borders: the side view of the face seems wrong. The deepfake mask is broken, less detailed or incorrectly aligned.

A video contains many more facial nuances than the images we took from Facebook. The photos of our team member on social media are self-selected, and therefore missing the kind of images required to create realistic facial expressions. Although better technologies might be able to fabricate expressions, without diverse source material it’s impossible to create something convincing.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

SAEHD

Project reach

165

/2000 images

215000

/268000 times

128

/128 pixels

44

/63 hours

06. Target Choice

We’ve seen how the source is important in training the algorithm. What about the target video?

Even with a good source, it can be hard to create a deepfake. Indiana Jones contains chaotic shots. Compared to the cleaner videos we used before, the algorithm now has difficulty keeping up.

Technical details

Visual flaws

Blurred face: the mask is blurred. There is a difference in sharpness or resolution between the mask and the rest of the video.

Flicker effect: there’s a flicker between the original and deepfake faces. The algorithm can’t recognise the face and stops creating the mask for a moment.

Wrong perspective: the deepfake has a different perspective from the rest of the video or the source and target video differ in focal length.

The deepfake was exported with a resolution of 64 px. The lower resolution means it took less time to train the algorithm, because the model only had to learn how to create a low-resolution image. In close up face shots, the low resolution is evident.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

SAEHD

Project reach

1400

/2000 images

100000

/268000 times

64

/128 pixels

8

/63 hours

Harrison Ford

Andrej

Original target video: Indiana Jones and the Temple of Doom | Rope Bridge Fight

07. Don't Blink

Sometimes the two just melt together. In this video the target even imitates his new face.

This deepfake video was made from a talk show segment where Bill Hader impersonates Arnold Schwarzenegger. By using suitable source material for Arnold Schwarzenegger, the results were convincing.

Bill Hader

Arnold Schwarzenegger

Original target video: Bill Hader Presents: Schwarzenegger Baby

Source deepfaked video: YouTube | Ctrl Shift Face | Bill Hader impersonates Arnold Schwarzenegger [DeepFake]

Technical details

Visual flaws

The face blending, skin tone and resolution are very good. The distant shot makes it difficult to see any blur. The post-production was expertly done. The only clue is when Bill Hader moves his finger in front of his face and it disappears behind the mask. The difference in sharpness suggest that the creator has tried to hide the effect in post-production.

08. Time Matters

What happens if we let the algorithm practice more on the source content. Will the results improve?

For this experiment, one model was trained for four hours and the other for 48. The results of the 48-hour model showed improved facial detail and a more three-dimensional face.

Technical details

Training time is related to the number of times the algorithm processes the images. The process involves creating a face (or digital mask), comparing it with the source image, and then making adjustments to improve the likeness of the mask to the source. The model goes through this cycle once for all source images, and then starts again. The time it takes depends on the power of the computer used.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

SAEHD

Project reach

400

/400

/2000 images

20000

/268000 times

128

/128 pixels

4

/48

/63 hours

Constance Wu

Yueling

Original target video: YouTube | The Late Show with Stephen Colbert | Constance Wu Explains What "Couture" Means

09. Algorithms

One last experiment. How do different algorithms respond to certain conditions?

For this experiment, we created both the source and the target video ourselves. The algorithm’s methods are clearly visible. H128 creates a square mask whilst SAEHD matches the face better.

Arthur

Andrej

Technical details

H128 is the lighter model of the two. It achieves quality results quicker. The more precise mask of SAEHD is better at dealing with the hand and blending with the lighting. H128 seems to be better trained to make the face: the mask is sharper, more stable and performs better with movement and perspective changes. However, experts say that with more training time, SAEHD will outperform H128.

Algorithm

Dataset size

Iteration amount

Output resolution

Training time

SAEHD

H128

Project reach

500

/500

/2000 images

150000

/150000

/268000 times

128

/128 pixels

24

/40

/63 hours

10. Be Aware

Let's see what’s really happening with deepfake videos.

Be aware: deepfakes can be high quality and difficult to spot. Although we have focused on face swaps, deepfakes can also be used for facial reenactment – making it seem as if a person said something.

Technical details

Facial reenactment takes much more computing power but is much harder to recognise. Many of the challenges posed by source videos do not apply to reenactment, but the algorithm acts in a similar way. The recreated parts of the face will be slightly blurred and less detailed.

Also, pay attention to the audio and look for flaws or lip-sync problems. Using the insight you’ve learned on this website, question whether a video is likely to be a target and if the conditions are suitable for a possible deepfake. If in doubt, always check the source of the video.

Deepfake source video: YouTube | VFXChris Ume | Fake Freeman mouth manipulation.

Here’s a handy summary of the skills you acquired on this website, so that you can check videos yourself.

All the original deepfakes in this project were created with the open source software DeepFaceLab v10.1 by Iperov, under the GNU General Public License v3.0