Technology Untangled: Video editing technology can put words in your mouth


Sometimes in my technical work I am asked to analyze a video to see if it has been edited or altered. The argument typically is that something has been added to a recording to make things appear worse than they really were, that words were added and essentially put in a person’s mouth. For audio alone, a skilled person may be able to create such a recording, but video is a much more difficult technical challenge. Putting words in someone’s mouth involves matching up the movements of the lips and more.

Now of course, Hollywood has been doing this for years. We have all seen more than a few movies with talking dogs and other animals. However, the technical tools and computing power needed for an average person to do this seemed out of reach.

I recently ran across several news articles that caused me to rethink that. The information I saw points to a future where altering a video to put words in a person’s mouth will be no more difficult than altering a picture with Photoshop is today. While some of the examples I viewed were a little rough around the edges, the technical advancement and the progress toward continued improvement are clear.

The first article was about a technology called Face2Face. Researchers are developing a system that uses consumer-grade products and can, for example, alter a YouTube video in real time. It uses a standard webcam to capture video of a live subject’s face and transfers those lip and facial movements to an existing video. They call this technique real-time face capture and re-enactment. One practical application they envision is a real-time foreign language translator for video conferences.

This technology involves more than simply replacing the moving lips with an animated version. Work is also done on the interior of the mouth and the overall facial expression to make things appear more realistic. This software also tracks a subject’s head to account for normal movement during the video clip. This includes head rotation and movement of the eyebrows and facial lines. A perfectly static “talking head” view is not required. This could be thought of as a form of real-time motion video Photoshopping.

The second article I saw discussed a method being researched by a team of computer scientists at the University of Washington. Their work, presented at SIGGRAPH 2017, uses a process to transform input audio from any clip into a time-varying computerized mouth shape. Photo-realistic features are then added to that mouth. The process then smoothly blends the synthesized mouth onto a face in the target video, producing a surprisingly natural effect. Their YouTube video discusses the process in more detail by using various example videos of former President Barack Obama. One segment shows four different Obama target videos all simultaneously speaking the same audio clip. Talk about putting words in a politician’s mouth!
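For readers who want a feel for the last step, here is a minimal sketch of what “smoothly blending” a patch into a frame can look like. It is not the University of Washington team’s method, which the articles I read did not spell out. It is ordinary alpha blending with a feathered edge, and the function names, the grayscale list-of-lists frame format and the `border` parameter are all my own assumptions for illustration.

```python
def feather_mask(height, width, border):
    """Blend weights: 1.0 in the interior, ramping down toward the patch edge."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance in pixels from this pixel to the nearest patch edge.
            edge = min(x, y, width - 1 - x, height - 1 - y)
            row.append(min(1.0, (edge + 1) / (border + 1)))
        mask.append(row)
    return mask


def blend_patch(frame, patch, top, left, border=2):
    """Return a copy of frame with patch alpha-blended in at (top, left).

    frame and patch are hypothetical grayscale images stored as lists of
    rows of floats; a real system would work on color video frames.
    """
    out = [row[:] for row in frame]
    mask = feather_mask(len(patch), len(patch[0]), border)
    for y, patch_row in enumerate(patch):
        for x, value in enumerate(patch_row):
            a = mask[y][x]
            background = frame[top + y][left + x]
            # Weighted mix: mostly patch in the middle, mostly frame at the rim.
            out[top + y][left + x] = a * value + (1.0 - a) * background
    return out


# Toy example: a dark 8x8 frame and a bright 6x6 "mouth" patch.
frame = [[0.0] * 8 for _ in range(8)]
patch = [[1.0] * 6 for _ in range(6)]
blended = blend_patch(frame, patch, top=1, left=1, border=2)
```

The feathered weights are what hide the seam: the middle of the patch replaces the frame completely, while pixels near the patch edge keep most of the original frame.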

This second example is a little further away from a consumer-grade product, but the vision toward the future is obvious. It won’t be too long before video editing/alteration that puts words in people’s mouths will be as skillfully performed on the average desktop computer as today’s best Photoshop hoaxes.

Photoshop capabilities, and the skills of those who work with it, have come a long way in the last 20 years. Most people, including jurors, are well aware that almost any photo can be convincingly faked by a good Photoshop artist. At present, video is of course a much more difficult medium to fake, but as tools like those discussed here become more refined and more available, video as evidence will become less believable. And just as people have developed a healthy skepticism toward the believability of any particular photograph they see, that same skepticism will extend to video.

Video alteration tools will increase the potential for creating fake news and stirring up mischief, especially in the areas of politics and entertainment. I hope that our willingness to question what we see, rather than immediately believe it, will grow along with them.

While there may be a lot more fake video to deal with in the future, I don’t think that this will present an insurmountable problem for lawyers. Just as there are forensic techniques used to validate the authenticity of a digital photograph, forensic video analysts will use similar validation techniques to get to the truth of important matters involving legal evidence, trials and jury matters.

__________

Stephen Bour ([email protected]) is an engineer and legal technology consultant in Indianapolis. His company, the Alliance for Litigation Support Inc., includes Bour Technical Services and Alliance Court Reporting. Areas of service include legal videography, tape analysis, document scanning to CD and courtroom presentation support. The opinions expressed in this column are those of the author.
