Slow-motion video is hard to achieve with most run-of-the-mill cameras on the market. Some flagship phones offer the feature, but it's usually limited in length, resolution, or frame rate, and it's also hindered by the limited storage mobile devices can offer for such large files.
On the other hand, applying slow-motion effects to previously recorded video usually produces unpleasant results, with unnatural movement caused by the software attempting to fill in the gaps between the original frames. However, Nvidia, together with researchers from the University of Massachusetts and the University of California, has created a solution that could make it possible to turn any video into slow motion without sacrificing the smoothness of playback.
The technology, which will be presented at this year's edition of the Computer Vision and Pattern Recognition (CVPR) conference - taking place this week - relies on two convolutional neural networks (CNNs) that work in conjunction to determine where objects are moving across frames and where they will be in the in-between frames. VentureBeat describes how the two CNNs work together:
One convolutional neural network (CNN) estimates the optical flow — the pattern of motion of the objects, surfaces, and edges in the scene — both forward and backward in the timeline between the two input frames. It then predicts how the pixels will move from one frame to the next, generating what’s known as a flow field — a 2D vector of predicted motion — for each frame, which it fuses together to approximate a flow field for the intermediate frame.
A second CNN then interpolates the optical flow, refining the approximated flow field and predicting visibility maps in order to exclude pixels occluded by objects in the frame and subsequently reduce artifacts in and around objects in motion. Finally, the visibility map is applied to the two input images, and the intermediate optical flow field is used to warp (distort) them in such a way that one frame transitions smoothly to the next.
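To make the warp-and-blend step concrete, here is a minimal PyTorch sketch. This is not Nvidia's code: the helper names are our own, and the flow-fusion coefficients and visibility-weighted blend follow the formulation in the researchers' paper as we understand it, so treat the exact details as an assumption.

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp a source frame by sampling it along a backward flow field.

    img:  (B, C, H, W) source frame
    flow: (B, 2, H, W) flow from the intermediate time back to the source
    """
    B, _, H, W = img.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=img.device, dtype=img.dtype),
        torch.arange(W, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    x = xs.unsqueeze(0) + flow[:, 0]  # displaced x coordinates
    y = ys.unsqueeze(0) + flow[:, 1]  # displaced y coordinates
    # Normalize to [-1, 1], the coordinate range grid_sample expects.
    grid = torch.stack(
        (2.0 * x / (W - 1) - 1.0, 2.0 * y / (H - 1) - 1.0), dim=-1
    )
    return F.grid_sample(img, grid, align_corners=True)

def synthesize_intermediate(i0, i1, f01, f10, v_t0, t):
    """Blend the two warped input frames into the frame at time t in (0, 1).

    f01, f10: optical flow between the two inputs, in both directions
    v_t0:     (B, 1, H, W) visibility map for frame 0; frame 1's is 1 - v_t0
    """
    # Approximate the flow from time t back to each input frame by
    # combining the two input flows (assumed quadratic-in-t coefficients).
    f_t0 = -(1 - t) * t * f01 + t * t * f10
    f_t1 = (1 - t) * (1 - t) * f01 - t * (1 - t) * f10
    w0 = backward_warp(i0, f_t0)
    w1 = backward_warp(i1, f_t1)
    v_t1 = 1.0 - v_t0
    # Visibility-weighted blend: occluded pixels draw only from the
    # input frame in which they are actually visible.
    num = (1 - t) * v_t0 * w0 + t * v_t1 * w1
    den = (1 - t) * v_t0 + t * v_t1
    return num / den.clamp(min=1e-6)
```

Note that both warps are "backward": instead of pushing pixels forward and leaving holes, each output pixel looks up where it came from in the source frame, which is what keeps the transitions smooth.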
The researchers used Nvidia Tesla V100 GPUs and the cuDNN-accelerated PyTorch deep learning framework to train the system on 11,000 videos shot at 240 frames per second, after which it could fill in the missing frames of a slow-motion video.
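One appealing property of this setup is that the 240 fps footage supervises itself: hold out a real intermediate frame and ask the network to reproduce it from its neighbors. The sketch below illustrates that idea with a hypothetical PyTorch dataset; the class name, window size, and sampling scheme are our own assumptions, not details from the paper.

```python
import random
import torch
from torch.utils.data import Dataset

class SlowMoTriplets(Dataset):
    """Hypothetical training dataset: each sample pairs two endpoint
    frames with one held-out real intermediate frame and its time t.
    """

    def __init__(self, clips, window=9):
        # clips: list of (num_frames, C, H, W) tensors decoded from the
        # 240 fps source videos (decoding is elided here)
        self.clips = clips
        self.window = window  # frames spanned by one training sample

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        frames = self.clips[idx]
        start = random.randint(0, frames.shape[0] - self.window)
        i0 = frames[start]
        i1 = frames[start + self.window - 1]
        # Pick one of the true in-between frames as the supervision target.
        k = random.randint(1, self.window - 2)
        target = frames[start + k]
        t = torch.tensor(k / (self.window - 1))
        return i0, i1, target, t
```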
The technology yields the results you see in the video above, which looks surprisingly smooth for an artificially generated effect, even on videos shot at just 30 frames per second. The company also worked with YouTube channel The Slow Mo Guys to test the technology on high-frame-rate footage, such as 240 frames per second. What's more, the technology can slow videos down by an arbitrary factor, though, presumably, larger factors will take longer to process, since there are more intermediate frames to fill in.
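That arbitrary-factor property follows from the design: because the network takes the time position t as a continuous input, the slow-down factor is simply the number of times you query it per input interval. A hypothetical usage sketch, where model stands in for the full two-CNN pipeline described above:

```python
def slow_down(i0, i1, factor, model):
    """Expand one input-frame interval into `factor` output frames by
    querying the interpolator at evenly spaced times t in (0, 1).
    """
    frames = [i0]
    for k in range(1, factor):
        t = k / factor
        frames.append(model(i0, i1, t))  # one synthesized frame at time t
    return frames  # i1 starts the next interval

# e.g. factor=8 turns 30 fps footage into an 8x slow-motion sequence,
# at a cost that grows linearly with the factor.
```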
As promising as the technology is, Nvidia doesn't believe it's ready for the consumer market: it needs significant optimization before it can run in real time, and even if it does make its way to consumers, most of the processing will have to be done in the cloud. That said, the technology is certainly interesting, and it could bring slow-motion video to many more people sometime in the future.