Discover how deepfakes work and the visual clues you can use to identify them. We are a group of communication designers who created this project to demonstrate our research into making our own deepfake, and to communicate the signs you can use to identify one.
Technical details
Visual flaws
Skin colour mismatch: there’s a difference in skin tone between the mask and the target face. The face seems to be covered by a layer of a different colour, showing edges or spots.
A deepfake is created by a computer program that can teach itself how to recreate a face. By adjusting the parameters in its system, the program gets better at recreating a specific person’s face; this is a type of deep learning. The program then overlays the face it has recreated onto an existing video, like a digital mask. You can see traces of such a mask in this video.
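Under the hood this is usually an autoencoder: a shared encoder learns general facial structure, and a separate decoder per person learns to recreate that person’s face. Here is a minimal, illustrative sketch in PyTorch; the layer sizes and names are assumptions for demonstration, not the architecture used for these videos.

```python
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    """Toy face-swap model: one shared encoder, one decoder per person."""

    def __init__(self):
        super().__init__()
        # Shared encoder: compresses a 64x64 RGB face into a small code.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64 * 3, 512),
            nn.ReLU(),
        )
        # One decoder per identity; both learn to turn codes back into faces.
        self.decoder_src = nn.Linear(512, 64 * 64 * 3)
        self.decoder_dst = nn.Linear(512, 64 * 64 * 3)

    def forward(self, faces, identity):
        code = self.encoder(faces)
        decoder = self.decoder_src if identity == "src" else self.decoder_dst
        return torch.sigmoid(decoder(code)).view(-1, 3, 64, 64)

# The swap itself: encode a face from the target video, then decode it with
# the *source* person's decoder to produce the "digital mask".
```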
Target Video
Deepfaked Video
Deepfake target video: YouTube | AndrewSchrock | Cutest Baby Montage Ever
Deepfake source video: YouTube | TheFakening | Baby Elon Musk Montage Deepfake
Shia LaBeouf
Pilar
Target video source: YouTube | MotivaShian | Shia LaBeouf "Just Do It" Motivational Speech
Technical details
Visual flaws
Skin colour mismatch: there’s a difference in skin tone between the mask and the target face. The face seems to be covered by a layer of a different colour, showing edges or spots.
Mismatched expressions: the expressions on the deepfake face do not match the target face. Facial features do not behave naturally and can be blurry, duplicated or even invisible.
Visible edges: the edges of the mask are visible, either as a sharp or a blurred border surrounding the face.
The images used to train the algorithm did not contain the right facial expressions to cover Shia’s face in the video, nor did they contain footage of his face in profile. If the neural network is not trained for these situations, it cannot produce an accurate digital mask. Notice how Shia's mouth sometimes appears from underneath the mask, resulting in two mouths.
Algorithm: H64
Dataset size: 200 / 2,000 images
Iteration amount: 106,000 / 268,000 times
Output resolution: 64 / 128 pixels
Training time: 31 / 63 hours
(Each stat is shown against the project reach: the maximum for that stat across our experiments.)
Algorithm: SAEHD
Dataset size: 750 / 2,000 images
Iteration amount: 200,000 / 268,000 times
Output resolution: 128 / 128 pixels
Training time: 48 / 63 hours
Target video source: The Devil Wears Prada | Andy's Interview
Original: select a target video you want to insert a face onto. Choosing a steady video with a consistent background will give you a better result.
Dataset: record a dataset for the face you want to place (the source), matching the lighting and expressions as closely as possible to the target video.
Mask: cover the faces of other people in the target video, otherwise they will be picked up by the algorithm and confuse the training process.
Alignment: the algorithm crops the faces so it can use them for training, and saves their positions so the mask can be accurately overlaid afterwards.
Deepfake: the algorithm generates a mask of the face from the source video, which you then need to align onto the target video.
Post: video editing software will allow you to blend in the mask better and refine the final result.
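As a concrete illustration of the alignment step, here is a minimal sketch using OpenCV’s stock Haar face detector. It is a simplified stand-in: real deepfake tools use far more precise, landmark-based alignment, and the function name and parameters are our own illustrative choices.

```python
import cv2

# OpenCV's bundled frontal-face detector (coarse compared with the
# landmark-based alignment that deepfake tools actually use).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(video_path):
    """Yield (frame index, bounding box, cropped face) for each detection."""
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # The saved box positions are what let the generated mask be
        # overlaid back onto the right spot in each frame later.
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
            yield index, (x, y, w, h), frame[y:y + h, x:x + w]
        index += 1
    capture.release()
```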
Technical details
This experiment was done with the same source video exported at two different frame rates, giving a smaller and a larger dataset; both models were trained with the exact same studio setup. The number of training cycles per image is equal, but the training time was longer with the bigger dataset (a quick check of the numbers follows the stats below). You can clearly see that the algorithm trained with more images produces a more refined result that better matches the target.
Algorithm: SAEHD
Dataset size: 200 / 2,000 images
Iteration amount: 200,000 / 20,000 / 268,000 times
Output resolution: 128 / 128 pixels
Training time: 8 / 63 hours
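You can check the “equal cycles per image” claim with quick arithmetic. Assuming the smaller export pairs with the 20,000-iteration run and the larger one with the 200,000-iteration run (our reading of the stats above):

```python
# Training cycles per image = iterations / dataset size (pairing assumed).
small = 20_000 / 200      # low frame-rate export
large = 200_000 / 2_000   # high frame-rate export
print(small, large)       # 100.0 100.0 -> the same cycles per image
```

Only the wall-clock time differs, because each pass over the larger dataset processes ten times as many images.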
Benedict Cumberbatch
Arthur
Original target video: Sherlock | The Reichenbach Fall | Rooftop Showdown
Natalie Portman
Pilar
Original target video: Star Wars: Episode I – The Phantom Menace | Padmé meets Anakin
Technical details
Visual flaws
Blurred face: the mask is blurred. There is a difference in sharpness or resolution between the mask and the rest of the video.
Mismatched expressions: the expressions on the deepfake face do not match the target face. Facial features do not behave naturally and can be blurry, duplicated or even invisible.
Profile borders: the side view of the face seems wrong. The deepfake mask is broken, less detailed or incorrectly aligned.
A video contains many more facial nuances than the images we took from Facebook. The photos our team member posted on social media are self-selected, and therefore lack the kinds of images required to recreate realistic facial expressions. Although better technologies might be able to fabricate expressions, without diverse source material it’s impossible to create something convincing.
Algorithm: SAEHD
Dataset size: 165 / 2,000 images
Iteration amount: 215,000 / 268,000 times
Output resolution: 128 / 128 pixels
Training time: 44 / 63 hours
Technical details
Visual flaws
Blurred face: the mask is blurred. There is a difference in sharpness or resolution between the mask and the rest of the video.
Flicker effect: there’s a flicker between the original and deepfake faces. The algorithm can’t recognise the face and stops creating the mask for a moment.
Wrong perspective: the deepfake has a different perspective from the rest of the video, or the source and target videos differ in focal length.
The deepfake was exported at a resolution of 64 px. The lower resolution means it took less time to train the algorithm, because the model only had to learn how to create a low-resolution image: a 64 px face contains a quarter of the pixels of a 128 px one. In close-up face shots, the low resolution is evident.
Algorithm: SAEHD
Dataset size: 1,400 / 2,000 images
Iteration amount: 100,000 / 268,000 times
Output resolution: 64 / 128 pixels
Training time: 8 / 63 hours
Harrison Ford
Andrej
Original target video: Indiana Jones and the Temple of Doom | Rope Bridge Fight
Bill Hader
Arnold Schwarzenegger
Original target video: Bill Hader Presents: Schwarzenegger Baby
Source deepfaked video: YouTube | Ctrl Shift Face | Bill Hader impersonates Arnold Schwarzenegger [DeepFake]
Technical details
Visual flaws
Face occlusion: when objects pass in front of the face, the mask distorts or covers the object.
The face blending, skin tone and resolution are very good, and the distant shot makes it difficult to see any blur. The post-production was expertly done. The only clue is when Bill Hader moves his finger in front of his face and it disappears behind the mask. The difference in sharpness suggests that the creator has tried to hide the effect in post-production.
Technical details
Training time is related to the number of times the algorithm processes the images. Each cycle involves creating a face (the digital mask), comparing it with the source image, and then making adjustments to improve the mask’s likeness to the source. The model goes through this cycle once for every source image, then starts again; the sketch after the stats below traces one such loop. How long this takes depends on the power of the computer used.
Algorithm: SAEHD
Dataset size: 400 / 400 / 2,000 images
Iteration amount: 20,000 / 268,000 times
Output resolution: 128 / 128 pixels
Training time: 4 / 48 / 63 hours
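Here is a minimal sketch of that cycle in PyTorch, assuming a toy fully connected model and made-up dataset and batch sizes rather than the project’s actual settings.

```python
import torch
import torch.nn as nn

# Toy stand-ins: 400 random "cropped faces" and a tiny reconstruction model.
faces = torch.rand(400, 3, 64, 64)
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 512),
                      nn.ReLU(), nn.Linear(512, 64 * 64 * 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(3):                 # one cycle over all images, repeated
    for batch in faces.split(32):
        recreated = model(batch)                     # 1. create the face
        loss = loss_fn(recreated, batch.flatten(1))  # 2. compare with source
        optimizer.zero_grad()
        loss.backward()                              # 3. adjust the model
        optimizer.step()
```

Every iteration counted in the stats is one trip around the inner loop, which is why iteration amount and training time rise together.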
Constance Wu
Yueling
Original target video: YouTube | The Late Show with Stephen Colbert | Constance Wu Explains What "Couture" Means
Arthur
Andrej
Technical details
H128 is the lighter model of the two and achieves quality results more quickly. SAEHD’s more precise mask is better at dealing with the hand and at blending with the lighting. H128 seems to be better trained to make the face: the mask is sharper and more stable, and performs better with movement and perspective changes. However, experts say that with more training time, SAEHD will outperform H128.
Algorithm: SAEHD / H128
Dataset size: 500 / 500 / 2,000 images
Iteration amount: 150,000 / 150,000 / 268,000 times
Output resolution: 128 / 128 pixels
Training time: 24 / 40 / 63 hours
Technical details
Facial reenactment takes much more computing power but is much harder to recognise. Many of the challenges posed by source videos do not apply to reenactment, but the algorithm behaves in a similar way: the recreated parts of the face will be slightly blurred and less detailed.
Also pay attention to the audio and look for flaws or lip-sync problems. Using the insights you’ve learned on this website, ask whether a video is a likely target and whether the conditions are suitable for a possible deepfake. If in doubt, always check the source of the video.
Deepfake source video: YouTube | VFXChris Ume | Fake Freeman mouth manipulation