video OCR

Digitally Transforming GLAM (Galleries, Libraries, Archives and Museums)

Some problems are so complex today that they can only be solved using AI. This certainly applies to what many consider to be the last frontier in Search technology - Audio Visual Media.

By using a combination of AIs (like speech and face recognition), Videospace unlocks hundreds and thousands of hours of knowledge within your media libraries by making them accessible and discoverable.

With the World's First Translated Video Search, we can further unleash the full potential of your media library by making it searchable in other languages, extending its accessibility and discoverability!

Besides running Videospace on a world-class video platform (the same platform used by the 2012 and 2016 Olympics), we are using a combination of the following advanced technologies:

  • Speech Recognition (over 100 languages)

  • Translation (over 60 languages)

  • Face Recognition

  • Video OCR (up to 26 languages)

  • Natural Language Processing (over 20 languages)

  • Video Search Engine - index and search video in time-series

  • World’s First Translated Search Engine – searches over 6,000 different language pairs

To find out more,


Video Big Data (Part 2) - What kind of Video Data?

In the last installment, we explained:

  • Why Video Big Data will absolutely dwarf current Big Data
  • How Video is the most difficult medium to extract data from

This explains why Video Big Data remains a largely unexplored field. It also means that immense opportunities are available, because we have not even scratched the tip of this huge data iceberg.

In this installment, we will examine the kind of data elements that we can extract from videos. 

1. Speech
In an hour of video, a person can say up to 9,000 words. So imagine the amount of data from speech alone. However, the process of transcribing speech is filled with problems, and we are only now starting to reach an acceptable level of accuracy.

2. Text
Besides speech, text is probably the second most important element inside videos. For example, in a presentation or lecture, the speaker would augment the session with a set of slides. Another example is the news ticker that appears during a news broadcast.

3. Objects
A video can contain thousands of objects across different timeframes. Therefore, it can be quite challenging to identify which objects are in the video content and in which scenes they appear.

4. Activities
The difference between video and still images is motion. Different video scenes contain complex activities, such as “running in a group” or “driving a car”. The ability to extract activities gives a lot of insight into what the videos are about. This includes detecting offensive content such as nudity and profanity.

5. Motion
Detecting motion enables you to efficiently identify sections of interest within an otherwise long and uneventful video. That might sound simple, but what if you have 10,000 hours of videos to review every night? Eyeballing every minute of video is a near-impossible task.
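To make the idea concrete, here is a minimal sketch of one common way to flag sections of interest: comparing consecutive frames and keeping the time spans where they differ noticeably. This is an illustration only, not a description of how VideoSpace does it. The frame representation (a flat list of grayscale pixel values), the function names and the threshold are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumption: each frame is a flattened grayscale image, one 0-255 value per pixel.
Frame = Sequence[int]


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    if not curr:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def motion_segments(
    frames: List[Frame], fps: float, threshold: float = 10.0
) -> List[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) spans where consecutive frames differ."""
    segments = []
    start = None  # index of the first frame of the current moving span
    for i in range(1, len(frames)):
        moving = motion_score(frames[i - 1], frames[i]) > threshold
        if moving and start is None:
            start = i
        elif not moving and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:
        # Motion continued until the last frame.
        segments.append((start / fps, len(frames) / fps))
    return segments


# Toy input: five tiny 16-pixel frames at one frame per second.
still = [0] * 16
bright = [200] * 16
frames = [still, still, bright, still, still]
print(motion_segments(frames, fps=1.0))
```

A reviewer would then only need to watch the returned spans rather than the whole recording.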

6. Faces
Detecting faces in videos adds face detection ability to any surveillance or CCTV system. This will be useful for analyzing human traffic within a mall, street or even a restaurant or café. When we include facial recognition, it opens up another data dimension.

7. Emotion
Emotion detection is an extension of face detection that returns an analysis of multiple emotional attributes from the faces detected. With emotion detection, one can gauge an audience's emotional response over a period of time.
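As a rough illustration of gauging a response over time, the sketch below averages per-face scores into fixed time windows. The input format (a timestamp in seconds paired with a single score, such as a "happiness" value between 0 and 1) and the function name are hypothetical. A real emotion detection service would return its own structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def emotion_timeline(
    samples: Iterable[Tuple[float, float]], window: float
) -> Dict[float, float]:
    """Average scores per fixed window; samples are (timestamp_seconds, score)."""
    buckets = defaultdict(list)
    for timestamp, score in samples:
        # Key each sample by the start time of the window it falls into.
        buckets[int(timestamp // window) * window].append(score)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Hypothetical scores for faces detected at four moments in a video.
samples = [(1.0, 0.25), (4.0, 0.75), (12.0, 1.0), (18.0, 0.5)]
print(emotion_timeline(samples, window=10.0))
```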

This list of video data is certainly not exhaustive, but it is definitely a good starting point for the field of Video Big Data. In the next installment, we will examine some of the techniques used to extract this video data.

Yours sincerely,

The VideoSpace Team

ANNOUNCEMENT: Global Launch of World’s First “Video Search Engine with Interactive Results”

Birmingham, 24 January 2018: Babbobox and Infini Videos officially announce the launch of the world’s first “Video Search Engine with Interactive Results” at the Microsoft Tech Summit held in Birmingham, United Kingdom today.

Both tech start-ups Babbobox and Infini Videos believe the future of video search lies in immediate content relevance. Video has proven to be the hardest medium to index because there is so much detail.  Aside from the metadata that an editor may have typed in, most archived videos are essentially unstructured data.  Often, this is because transcripts are not made and scripts are lost, or there isn’t sufficient timing information to align with the video.

To make sense of this data, techniques such as Speech Recognition, Video OCR, Image Analysis and various Cognitive and Artificial Intelligence methods are applied to extract data from media. Since many of the videos in archives contain speech, automatic transcription is a great first step in extracting data from media. With the transcript, an editor is able to search for timecodes in source videos, scrub through those sources, and manually locate viable scenes. This manual process is time-consuming, and not suitable for public use since a text search result does not make for a watchable video.
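The transcript-to-timecode step can be sketched as follows. This is a minimal example that assumes the transcript is already available as timed cues. The `Cue` structure, the function names and the sample lines are invented for illustration and do not describe either company's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str     # words recognised in this span


def format_timecode(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def search_transcript(cues: List[Cue], query: str) -> List[str]:
    """Return 'timecode text' for every cue that contains the query."""
    needle = query.lower()
    return [
        f"{format_timecode(c.start)} {c.text}"
        for c in cues
        if needle in c.text.lower()
    ]


# Hypothetical transcript of an archived video.
cues = [
    Cue(0.0, 4.5, "Welcome to the archive tour"),
    Cue(75.0, 80.0, "This reel shows the 1965 parade"),
    Cue(3725.0, 3730.0, "The parade ends at the harbour"),
]
for hit in search_transcript(cues, "parade"):
    print(hit)
```

Each hit gives the editor a position to jump to, which is the part that otherwise has to be found by scrubbing through the source.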

The innovative “Video Search Engine with Interactive Results” that Babbobox and Infini Videos have co-produced allows a user to search a topic and immediately view the search results as an interactive video. The user can immediately choose scenes within the video that are relevant to the search. With full automation of the indexing and the output scene selection, productivity is enhanced and content for the public will scale up. The platform promises increased productivity as it will provide users with fine control over topics and sources, and allow editors to focus on the direction of the stories.

Babbobox and Infini Videos believe that this will be the future of Video Search.

About Babbobox (website: www.babbobox.com)
Babbobox has launched two World’s Firsts. The first is the World’s First True Unified Search Engine, which has the ability to index and “Search Everything” (all formats including video, audio and documents), positioning Babbobox to become "The Next Generation of Intelligent Storage". With VideoSpace, we created the World’s First Video-Search-as-a-Service, forming the foundations to enable a new breed of video services for the world.

About Infini Videos (website: www.infinivideos.com)
Infini Videos is a B2B online technology platform for the creation and delivery of HTML5 interactive videos. Infini Videos makes it easy to create engaging interactive videos as well as to access the rich data analytics offered on the platform. The company currently offers branching (aka “Choose-your-own-adventure”) and 360-degree types of interactivity. In addition to the technology platform, the company also provides specialized creative services as a one-stop solution for clients. Infini Videos is part of Mediacorp’s MediaPreneur Incubator programme.

To find out more, please CONTACT US

Episode X - A New Search (and a Forceful 2018!)

CLICK to watch how VideoSpace aids the Rebels in their search for the New Death Star...

In search of the new Death Star plans, the Rebels secretly planted thousands of cameras within the Empire in the hope of finding clues. However, with thousands of videos and an impending deadline, how can the Rebels possibly search for information within videos?! Fortunately...

In a galaxy, not far away... 

VideoSpace Search Engine is helping the Rebels search inside videos for speech, text, objects, motion, faces and emotions (note: not for stormtroopers since they wear helmets) automatically! Thus, finding vital clues for the New Death Star plans...

In the meantime, as we help the Rebels fight the Evil Empire, we wish you a...

Forceful 2018! 

From, 

All of us at Babbobox

May The Search Be With You!

What is a Video Search Engine? Part II – Searching Text

In Part I, we found out that there are 7,099 living languages in the world. That includes both written and spoken-only languages. According to Ethnologue (20th edition), out of those 7,099 living languages, 3,866 have a developed writing system.

This leads us to the second part of our series – Searching Text inside a video. Besides Speech, Text is probably the second most important element from which we can extract data.

For example, consider a presentation or talk. Besides speaking, the speaker would augment the session with a set of slides. Therefore, besides the speaker's voice, the text in the slides is another set of data that can be captured. This is important because what speakers say and what they present in their slides can be vastly different.

Text that can be OCRed during a presentation

The technology to capture this text inside the video is called Video OCR (Optical Character Recognition). Video OCR is derived from OCR, a technology that has been around for a long time.

By strict definition, Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image (source: Wikipedia). The first OCR machine that read characters and converted them into standard telegraph code was invented by Emanuel Goldberg in 1914!

Unfortunately, one hundred years on, OCR technology still has some way to go, especially in supporting more languages and recognizing handwriting. However, with more A.I. and Machine Learning, the hope is that researchers can extend what OCR can do now.

Meanwhile, Video OCR is giving OCR a new lease of life by simply adding another dimension – moving images. Given the number of videos that have never been OCRed before and the number of videos being generated every day, the potential for Video OCR is immense.

To find out more about how you can search TEXT inside your videos, visit VideoSpace Video Search Engine or our Video-Search-as-a-Service.

What is a Video Search Engine?

Let's dissect this into 2 parts - "Video" and "Search Engine". 

Let's start with "Search Engine". We are so used to using search engines today that we do not really bother with how a search engine works. And perhaps you shouldn't... why should you, as long as the results are good? We normally start questioning (or complaining) when the results are not what we expect them to be.

So a good search engine should do a couple of things. It should (let's get a bit technical) have:

  • A good Indexing engine
  • Phrase matching
  • Smart Search result Summary 
  • Keyword highlighting
  • Stemming/Lemmas (Word form variations are searched and ranked lower)
  • Complex expression support; nested groups, partial matching, NOT, OR and AND
  • Multiple Format indexing
  • Unicode and non-English language support

If all of the above parameters are measurable, you will be able to figure out whether one engine is better than another.
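To give a feel for a few of the items on that list, here is a minimal sketch of an inverted index with AND, OR and NOT expressed as set operations, plus a naive phrase match. It is a teaching example under simplifying assumptions (no ranking, no stemming, tiny in-memory documents), and the function names and sample documents are made up for illustration.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # \w is Unicode-aware in Python 3, so non-English words are kept too.
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every token to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def lookup(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Document ids containing the term (empty set if the term is unknown)."""
    return set(index.get(term.lower(), set()))


def phrase_match(docs: Dict[str, str], phrase: str) -> Set[str]:
    """Document ids where the words of the phrase appear consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    n = len(words)
    hits = set()
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.add(doc_id)
    return hits


docs = {
    "a": "Video search engines index speech",
    "b": "Search engines rank documents",
    "c": "Speech recognition for video",
}
index = build_index(docs)
print(sorted(lookup(index, "video") & lookup(index, "speech")))      # video AND speech
print(sorted(lookup(index, "rank") | lookup(index, "recognition")))  # rank OR recognition
print(sorted(lookup(index, "search") - lookup(index, "video")))      # search NOT video
print(sorted(phrase_match(docs, "search engines")))                  # exact phrase
```

Nested groups fall out naturally from combining these set operations, which is one reason inverted indexes are a common starting point for this kind of engine.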

So the format that we want to search is "Video". Today, typical search engines can only search "Title" and "Metadata". Even if both "title" and "metadata" are well defined and representative of the video itself, what is missing is the content. Imagine you have a thousand-page document and you can only search the document title and its summary. That's the current state of affairs for video search.

So of course the next question is: what do you want to search for in a "Video"? That's like opening Pandora's Box. Unlike a document, video is multi-dimensional and contains a lot more information. For example, speech, words, people, objects, movement, colours, etc.

Currently, many of these search technologies either do not exist yet or are still in their infancy. What is available now is just scratching the surface. Therefore, the definition of a video search engine is still evolving.

At VideoSpace, we define a Video Search Engine as one that is able to search six key areas:

  • Speech Recognition
  • Words (or Text)
  • Motion Detection
  • Facial Detection
  • Emotion Detection
  • Offensive Content Detection
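One way to picture searching across these areas is a single time-coded list of events, where each detector contributes entries of its own kind. The sketch below is only an assumption about how such data could be organised. The `Event` fields, the kind labels and the sample values are hypothetical and are not taken from the VideoSpace product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "speech", "text", "motion", "face", "emotion", "offensive"
    start: float  # seconds from the beginning of the video
    end: float
    value: str    # recognised words or a label; may be empty


def find_events(
    events: List[Event],
    kind: Optional[str] = None,
    contains: Optional[str] = None,
) -> List[Event]:
    """Filter events by kind and/or by a case-insensitive substring, ordered by time."""
    hits = []
    for event in events:
        if kind is not None and event.kind != kind:
            continue
        if contains is not None and contains.lower() not in event.value.lower():
            continue
        hits.append(event)
    return sorted(hits, key=lambda e: e.start)


# Hypothetical events extracted from one video.
events = [
    Event("motion", 40.0, 42.5, ""),
    Event("text", 13.0, 20.0, "Gallery Opening 2018"),
    Event("speech", 12.0, 15.0, "welcome to the gallery"),
    Event("face", 13.0, 14.0, "1 face"),
]
for event in find_events(events, contains="gallery"):
    print(event.kind, event.start, event.value)
```

Because every event carries a start and end time, a match from any detector points to a position in the video rather than to the video as a whole.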

Numerous reports say the same thing: by 2017, videos will account for more than 70% of all internet traffic. Imagine having the ability to search videos in the future.

The VideoSpace Video Search Engine is taking the leap now. 

VideoSpace to experiment with Video OCR (optical character recognition) technology

In the hunt for added capabilities for indexing and search, VideoSpace is happy to announce that we are currently looking into adding Video OCR (optical character recognition) technology as part of our enterprise offering. 

Video OCR is a lot more advanced than mere OCR in documents and even images, because it has to go through each individual frame of video.
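Going through every frame also means the same caption or slide is recognized many times in a row. A minimal sketch of one way to handle that is to collapse runs of identical per-frame text into time spans. Here the OCR step itself is assumed to have already happened: the input is simply one recognized string per sampled frame, and the function name and sample data are made up for illustration.

```python
from typing import List, Tuple


def collapse_frame_text(
    frame_texts: List[str], fps: float
) -> List[Tuple[float, float, str]]:
    """Merge runs of identical per-frame OCR text into (start_s, end_s, text) spans."""
    spans = []
    start = 0  # index of the first frame in the current run
    for i in range(1, len(frame_texts) + 1):
        # Close the run at the end of the input or when the text changes.
        if i == len(frame_texts) or frame_texts[i] != frame_texts[start]:
            text = frame_texts[start].strip()
            if text:  # frames with no recognised text are skipped
                spans.append((start / fps, i / fps, text))
            start = i
    return spans


# Hypothetical OCR output for seven frames sampled at two frames per second.
frame_texts = ["", "Agenda", "Agenda", "Agenda", "Q3 Results", "Q3 Results", ""]
for span in collapse_frame_text(frame_texts, fps=2.0):
    print(span)
```

In practice OCR output tends to vary slightly between frames, so a real system would likely need fuzzy matching rather than the exact string comparison used here.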

For VideoSpace, this means that we are looking into the possibility of incorporating two state-of-the-art technologies and techniques to enable search - Video Indexing and Video OCR.

As of today, the Video OCR engine that we are experimenting with supports the following languages:

  • Arabic
  • Chinese Simplified
  • Chinese Traditional
  • Czech
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Japanese
  • Korean
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian Cyrillic
  • Serbian Latin
  • Slovak
  • Spanish
  • Swedish
  • Turkish

Watch this space for further updates!