Microsoft adds new ways to train custom language models with Video Indexer

A couple of years ago, at its Build developer conference, Microsoft introduced Video Indexer (VI), an Azure service aimed at people with large video libraries. Essentially, the web app generates metadata about media and surfaces insights from it. For example, it automatically transcribes videos, adds closed captions, discovers keywords, and uses facial recognition, among other actions.

Today, Microsoft announced new enhancements for VI. These include the ability to capture manual transcript edits automatically, and the ability for users to add their own closed caption files. The tech giant believes that these features will help organizations improve the accuracy of base language models over time.

To start off, users can manually edit automatic transcriptions through the Timeline pane in the VI portal. The changes are captured in a text file that is automatically added to the language model being used with the relevant video. If no custom language model is in use, the updated text is added to a new language model called Account Adaptations. After reviewing the differences between the old and new text in the manual edits file, users can click 'Train' in the VI portal or call the VI 'train language model' API to update the language model. From then on, the changes are reflected automatically in future videos indexed with the same model.
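
As a rough illustration of the API route described above, the following Python sketch builds (but does not send) a request to train a language model. The endpoint path, the PUT method, and the accessToken query parameter are assumptions based on the general layout of the public Video Indexer REST API and should be checked against the current API reference; build_train_request and its arguments are hypothetical names used only for this example.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed base URL of the Video Indexer REST API; verify before use.
API_ROOT = "https://api.videoindexer.ai"


def build_train_request(location, account_id, model_id, access_token):
    """Build (without sending) a 'train language model' request.

    The route and HTTP method below are assumptions, not confirmed
    by the article.
    """
    segments = (
        location, "Accounts", account_id,
        "Customization", "Language", model_id, "Train",
    )
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    path = "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode({"accessToken": access_token})
    return Request(f"{API_ROOT}/{path}?{query}", method="PUT")
```

Passing the returned object to urllib.request.urlopen would send the call; a real account ID, model ID, and access token are needed for that step.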

The 'update video transcript' API has also been enhanced to automatically add manually uploaded transcript files to the relevant custom models, helping train them further. For example, calling this API for a video titled "Godfather" adds a new transcript file with the same title to the custom language model used to index that video.

With regard to closed caption files, customers can now upload these to custom models as well. Supported file extensions are VTT, SRT, TTML, and TXT. Notably, this feature works with newly created models as well as existing ones. When a subtitle file is uploaded, VI removes all the metadata from it, leaving only the actual text. Once again, the 'Train' option or the 'train language model' API can be used to update the model with the changes made.
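
To give a sense of what removing metadata from a subtitle file involves, here is a minimal Python sketch that reduces SRT or WebVTT content to its spoken text. It is only an illustration, not VI's actual implementation: the function name caption_text and the filtering rules are assumptions, and TTML (an XML format) is not handled.

```python
import re

# Cue timing lines, e.g. "00:00:01,000 --> 00:00:04,000" (SRT)
# or "00:01.000 --> 00:04.000 align:start" (WebVTT).
TIMING = re.compile(r"^(\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+")
# Inline markup such as <i>...</i> or <v Speaker>.
TAG = re.compile(r"<[^>]+>")


def caption_text(raw):
    """Return only the caption text from SRT or WebVTT content."""
    kept = []
    for line in raw.lstrip("\ufeff").splitlines():
        s = line.strip()
        # Skip blanks, the WEBVTT header, numeric cue indices, and timings.
        # Caveat: a caption consisting only of digits is dropped as well.
        if not s or s.startswith("WEBVTT") or s.isdigit() or TIMING.match(s):
            continue
        s = TAG.sub("", s).strip()
        if s:
            kept.append(s)
    return "\n".join(kept)
```

For instance, an SRT cue made up of an index line, a timing line, and one caption line comes back as just the caption line.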

If you are interested in learning more about Video Indexer, you can do so here. In case you want to see the service in action, you can try it out here.