
Microsoft Unveils AI-Powered Image Captioning For The Visually Impaired

Microsoft AI researchers have revealed a new artificial intelligence system for image captioning. The company hopes the technology will help remove accessibility barriers from its products.

“Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation. But, alas, people don’t,” said Saqib Shaikh, a Software Engineering Manager with Microsoft’s AI platform group.

To remedy this, the software giant’s AI-powered image captioning aims to “generate captions for images that are, in many cases, more accurate than the descriptions people write.”

The technology relies on VIsual VOcabulary pre-training (VIVO), which uses paired image-tag data to teach the model a visual vocabulary. A second dataset of fully captioned images is then used to teach the model how to compose those words into fluent descriptions, as illustrated in the sketch below.
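The following is a minimal toy sketch of that two-stage idea, not Microsoft’s implementation: the model sizes, dummy data, and loss choices are assumptions made purely for illustration.

```python
# Conceptual two-stage training sketch (not Microsoft's code): stage 1 learns a
# visual vocabulary from image-tag pairs, stage 2 fine-tunes on captioned images.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = 100          # toy vocabulary of tags / caption tokens
EMBED = 32           # shared embedding size for images and words

image_encoder = nn.Linear(64, EMBED)       # stands in for a real vision backbone
word_embedding = nn.Embedding(VOCAB, EMBED)
decoder = nn.Linear(EMBED, VOCAB)          # stands in for a caption decoder

params = (list(image_encoder.parameters())
          + list(word_embedding.parameters())
          + list(decoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

# Stage 1 (VIVO-style pre-training idea): align image features with their tags
# using paired image-tag data, so the model builds a visual vocabulary.
for _ in range(100):
    images = torch.randn(8, 64)                     # dummy image features
    tags = torch.randint(0, VOCAB, (8,))            # one dummy tag per image
    img_vec = image_encoder(images)
    tag_vec = word_embedding(tags)
    loss = (1 - nn.functional.cosine_similarity(img_vec, tag_vec)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tune on image-caption pairs so the model learns how to phrase
# full descriptions, reusing the visual vocabulary learned in stage 1.
for _ in range(100):
    images = torch.randn(8, 64)
    caption_tokens = torch.randint(0, VOCAB, (8,))  # dummy "next token" targets
    logits = decoder(image_encoder(images))
    loss = nn.functional.cross_entropy(logits, caption_tokens)
    opt.zero_grad(); loss.backward(); opt.step()
```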

Microsoft is offering the new captioning model as part of Azure Cognitive Services, so any developer can bring the technology into their apps. It’s also available in Seeing AI, Microsoft’s app for blind and low-vision users, which narrates the world around them to make everyday surroundings more accessible. The captioning model is slated to reach PowerPoint for the web, Windows, and macOS later this year. This comes in addition to the accessibility features the Microsoft Office suite already provides, such as support for screen readers that read content line by line to assist visually impaired users.
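For developers, requesting a caption looks roughly like the sketch below, which assumes the azure-cognitiveservices-vision-computervision Python SDK; the endpoint, key, and image URL are placeholders rather than real values.

```python
# Rough sketch of requesting an image caption from Azure Cognitive Services
# (Computer Vision). The endpoint, key, and image URL are placeholders.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

endpoint = "https://<your-resource>.cognitiveservices.azure.com/"  # placeholder
key = "<your-subscription-key>"                                    # placeholder

client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))

# Ask the service to describe an image; it returns candidate captions
# along with confidence scores.
result = client.describe_image("https://example.com/photo.jpg", max_candidates=1)
for caption in result.captions:
    print(f"{caption.text} (confidence {caption.confidence:.2f})")
```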

To gauge the performance of their new AI, Microsoft researchers entered it into the ‘nocaps’ challenge, which measures how well an AI system can describe objects it has never seen in its training data. Microsoft’s AI currently ranks first on the nocaps leaderboard.

In 2016, Google claimed its AI could caption images with 94% accuracy, almost as well as humans. Microsoft claims its new AI is twice as good as the image captioning model it has used since 2015. However, Harsh Agrawal, one of the creators of the nocaps benchmark, disputes such comparisons: he told The Verge that the benchmark’s evaluation metrics “only roughly correlate with human preferences” and that it “only covers a small percentage of all the possible visual concepts.” Regardless, these updates should improve accessibility across Microsoft’s widely used products and services.