MediaCentral's Visual Detection Introduction

Introduction

As NBCU is continuing to move with the trends of today's digital world, we are looking at artificial intelligence and machine learning to help increase the efficiency of dated workflows and processes in the broadcasting sphere. By incorporating novel technology into existing NBCU products, we are changing how technicians work and how content is ultimately shared with the world.

Purpose

In order to increase efficiency and accuracy in the post-production process, the Global Media Operations (GMO) team has been working to integrate artificial intelligence and machine learning into MediaCentral, an Avid production and post-production platform used by NBC. Our role in the AI/ML project was to help design and build a responsive content moderation panel into MediaCentral.

Content intended for broadcast must follow strict regulations for airing. Each brand has its own standard operating procedures and rules for air. For example, the standards and practices guidelines for the Universal Kids channels will be very different from those for USA.

Currently, NBC has about 8 technicians on the Post Production team who work a minimum of 15-16 hours per day. These technicians must watch and re-watch content, manually flagging time codes wherever content is deemed inappropriate for air under each brand's individual rules. The Avid product they use does not automatically mark the time stamps for screening and flagging content. Screening and writing in the time stamps for a mere 7 minutes of captioning can take a technician as long as two hours. This process is not only expensive and time-consuming, but also inconsistent and prone to human error: one technician's flags can differ from another's.

The GMO team wanted a solution that didn't involve simply purchasing a new service; they wanted to build on products NBCU already used. The solution they came up with to reduce the time, cost, and probability of error during the QC process was to integrate artificial intelligence and machine learning into MediaCentral. Engineering had both the capability and the documentation provided by Avid, the maker of MediaCentral.

Technology

When searching for the most suitable AI/ML tool for the project, the developers evaluated APIs from Google, Microsoft, Amazon, and Clarifai, and ultimately settled on Amazon Rekognition. Amazon's image recognition service was chosen because the GMO team found it much more accurate than the IBM Watson API they were using for transcription. Although the Microsoft Video Indexer has many good features, it was still in its beta phase. Google's Cloud Vision API was also in its early stages, and Clarifai's API analyzes only one frame per second with a file size limit of 80 MB.

Amazon Rekognition takes a frame every 0.2 seconds. In a test of Rekognition's capability, the team had both technicians and Rekognition screen a video containing a split second of nudity. While the human loggers were unable to catch it, Rekognition detected it. Using Amazon's service also eliminated the need for the more complicated workflow that the other APIs would have introduced: using Microsoft or Google would require uploading content to their respective storage systems. Since GMO currently hosts its web servers on Amazon Web Services (AWS), the team wanted to centralize its services on Amazon.

Along with the benefit of centralizing all our services in the Amazon Web Services ecosystem, Rekognition also has a very strong facial recognition feature. Its powerful facial analysis API will be very useful in a later phase of the AI/ML project.

With the development of this new moderated content pane, the Post Production team’s workflow will also change. Instead of logging profanity, nudity, and sex by making a marker in Avid and commenting on it, the technicians will be provided the markers. From there, they can edit and delete the markers accordingly. Having a human logger adds another layer of confidence to the results from the AI/ML technology.
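The hand-off described above can be sketched as a small transformation step: detections from the AI/ML service become pre-populated markers that a technician then reviews, edits, or deletes. This is an illustrative sketch only; the marker structure and field names below are hypothetical and do not represent Avid's actual marker format.

```python
# Hypothetical sketch: turn content detections into pre-populated markers
# for technician review. Field names are illustrative, not Avid's format.

def detections_to_markers(detections, min_confidence=80.0):
    """Keep detections above a confidence floor and shape them as markers."""
    markers = []
    for d in detections:
        if d["confidence"] >= min_confidence:
            markers.append({
                "timecode": d["timecode"],
                "label": d["label"],
                "comment": f"Auto-flagged: {d['label']} ({d['confidence']:.0f}%)",
                "needs_review": True,  # a human logger still confirms each flag
            })
    return markers
```

Keeping a `needs_review` flag on every auto-generated marker reflects the workflow above: the AI proposes, and the human logger adds the final layer of confidence.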

Amazon Rekognition

Amazon

Amazon Rekognition is an image recognition service powered by deep learning. It offers APIs to detect objects and scenes, recognize and analyze faces, compare faces, and search for similar faces in a collection.

It currently supports JPEG and PNG image formats and file sizes up to 15 MB when passed as an S3 object and up to 5 MB when submitted as an image byte array.
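These two input forms and their size limits can be captured in a small helper that builds the `Image` parameter for a Rekognition call. This is a minimal sketch; the bucket and key names are placeholders.

```python
# Minimal sketch: build the Image parameter for a Rekognition request,
# enforcing the documented limits (15 MB via S3, 5 MB as a byte array).

S3_LIMIT = 15 * 1024 * 1024    # 15 MB when passed as an S3 object
BYTES_LIMIT = 5 * 1024 * 1024  # 5 MB when submitted as raw bytes

def build_image_param(size_bytes, bucket=None, key=None, data=None):
    """Return the Image argument for a Rekognition call, or raise if too large."""
    if bucket and key:
        if size_bytes > S3_LIMIT:
            raise ValueError("S3 images must be 15 MB or smaller")
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    if size_bytes > BYTES_LIMIT:
        raise ValueError("Byte-array images must be 5 MB or smaller")
    return {"Bytes": data}
```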

The DetectLabels API analyzes an image and assigns labels based on its visual content. It lets users identify thousands of objects, scenes, and concepts and returns a confidence score for each label.
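A DetectLabels call can be sketched with boto3, AWS's Python SDK. The live call below is commented out because it requires AWS credentials; the bucket and key names are placeholders. The helper that follows filters a response by confidence score, using the label list shape Rekognition returns.

```python
# Sketch of calling DetectLabels and filtering its response by confidence.
# The live call requires boto3 and configured AWS credentials:
#
#   import boto3
#   client = boto3.client("rekognition")
#   response = client.detect_labels(
#       Image={"S3Object": {"Bucket": "media-bucket", "Name": "frame.png"}},
#       MaxLabels=10,
#       MinConfidence=70,
#   )

def labels_above(response, min_confidence):
    """Return (name, confidence) pairs from a DetectLabels-style response."""
    return [(label["Name"], label["Confidence"])
            for label in response.get("Labels", [])
            if label["Confidence"] >= min_confidence]
```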

Although Rekognition does not yet natively support videos and animated images, the Amazon Elastic Transcoder can be used to process video files and extract images and thumbnails for processing. The transcoder creates video thumbnails in .png format for every second of input video. These thumbnails are then processed and analyzed. The Elastic Transcoder is a service that makes it easy to convert media files in the cloud without the need to manage the underlying infrastructure.
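Because the Transcoder emits one thumbnail per second of input video, a thumbnail's sequence number maps directly to an offset in the source, which is what a technician needs when placing a marker. A minimal sketch of that conversion, assuming zero-based thumbnail numbering:

```python
# Sketch: map a thumbnail's second offset to an HH:MM:SS timecode,
# so a flag raised on thumbnail N can be placed at second N of the video.

def seconds_to_timecode(seconds):
    """Convert a whole-second offset into an HH:MM:SS timecode string."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```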