
Video Big Data Whitepaper (FREE download)


The term "Video Big Data" is rarely heard. The reasons are simple:

  1. It's difficult to extract data from videos
  2. It's difficult to make sense of unstructured video data

Therefore, it is no exaggeration to say that video is the most difficult medium to search and extract intelligence from. However, given the volume of video generated daily in the public domain (e.g. YouTube) and the private domain (e.g. broadcasters, CCTV, education), it is also no exaggeration to say that video is the King of Content.

The objective of Big Data is to gain Business Intelligence, and Video Big Data is no different. The obvious difference is the source and the type of data that can be extracted from videos.

This Video Big Data Whitepaper aims to explain how we can extract value and intelligence from videos with a three-step approach:

  1. Extract video data 
  2. Transform unstructured video data
  3. Analyse the data to turn it into intelligence
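As a rough illustration of the three steps, here is a minimal Python sketch. It assumes the raw extraction (speech recognition, OCR and so on) has already happened elsewhere and arrives as hypothetical `(video_id, seconds, source, text)` tuples; the `Observation` schema and the `extract`, `transform` and `analyse` functions are illustrative names, not part of any VideoSpace product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One raw item pulled out of a video (hypothetical schema)."""
    video_id: str
    seconds: float  # offset into the video
    source: str     # which extractor produced it, e.g. "speech" or "ocr"
    text: str


def extract(raw_items):
    """Step 1: wrap raw extractor output as Observation records."""
    return [Observation(*item) for item in raw_items]


def transform(observations):
    """Step 2: turn unstructured text into structured (video_id, seconds, term) rows."""
    rows = []
    for obs in observations:
        for word in obs.text.lower().split():
            term = word.strip(".,!?;:\"'")
            if term:
                rows.append((obs.video_id, obs.seconds, term))
    return rows


def analyse(rows, top_n=3):
    """Step 3: summarise rows as (term, total mentions, number of videos) for the top terms."""
    mentions = Counter(term for _, _, term in rows)
    videos_per_term = {}
    for video_id, _, term in rows:
        videos_per_term.setdefault(term, set()).add(video_id)
    return [(term, count, len(videos_per_term[term]))
            for term, count in mentions.most_common(top_n)]


raw = [
    ("v1", 12.0, "speech", "Big data is big"),
    ("v2", 3.5, "ocr", "BIG SALE"),
    ("v2", 40.0, "speech", "data, data everywhere"),
]
insights = analyse(transform(extract(raw)))
print(insights)  # [('big', 3, 2), ('data', 3, 2), ('is', 1, 1)]
```

In a real deployment the `raw` tuples would come from actual extractors; the sketch only shows the shape of the pipeline, where each step hands structured data to the next.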

With this whitepaper, we hope to share some of our knowledge and experience of working with Video Big Data. By our calculations, Video Big Data will dwarf Big Data as we know it, hence the importance of this whitepaper. We hope you enjoy and benefit from it!

Yours sincerely,

The VideoSpace Team

What is a Video Search Engine?

Let's dissect this into two parts: "Video" and "Search Engine".

Let's start with "Search Engine". We are so used to search engines today that we rarely stop to think about how they actually work. And perhaps you shouldn't have to: why would you, as long as the results are good? We normally start questioning (or complaining) only when the results are not what we expect them to be.

A good search engine should (to get a bit technical) have:

  • A good indexing engine
  • Phrase matching
  • Smart search result summaries
  • Keyword highlighting
  • Stemming/lemmatisation (word-form variations are also matched, but ranked lower)
  • Complex expression support: nested groups, partial matching, NOT, OR and AND
  • Multiple-format indexing
  • Unicode and non-English language support
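To make the first few items concrete, the sketch below is a minimal positional inverted index in Python, assuming plain-text input such as a video transcript. It covers indexing, phrase matching and nested AND/OR/NOT queries; the class and method names are hypothetical, and stemming, summaries, highlighting and multi-format support are deliberately left out.

```python
from collections import defaultdict


def tokenize(text):
    """Lower-case words with surrounding punctuation stripped."""
    tokens = (word.strip(".,!?;:\"'").lower() for word in text.split())
    return [t for t in tokens if t]


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id: [word positions]}; positions enable phrase matching
        self.postings = defaultdict(lambda: defaultdict(list))
        self.docs = set()

    def add(self, doc_id, text):
        self.docs.add(doc_id)
        for position, term in enumerate(tokenize(text)):
            self.postings[term][doc_id].append(position)

    def term(self, word):
        """Documents containing a single word."""
        return set(self.postings.get(word.lower(), {}))

    def phrase(self, text):
        """Documents containing the words of `text` consecutively, in order."""
        terms = tokenize(text)
        if not terms:
            return set()
        candidates = set.intersection(*(self.term(t) for t in terms))
        hits = set()
        for doc_id in candidates:
            # Keep start positions where term i appears at start + i, for every i.
            starts = set(self.postings[terms[0]][doc_id])
            for offset, t in enumerate(terms[1:], start=1):
                starts &= {p - offset for p in self.postings[t][doc_id]}
            if starts:
                hits.add(doc_id)
        return hits

    def evaluate(self, query):
        """Evaluate a string (word or phrase) or a nested tuple such as
        ("AND", q1, q2, ...), ("OR", q1, q2, ...) or ("NOT", q)."""
        if isinstance(query, str):
            return self.phrase(query)
        op, *args = query
        results = [self.evaluate(arg) for arg in args]
        if op == "AND":
            return set.intersection(*results)
        if op == "OR":
            return set.union(*results)
        if op == "NOT":
            return self.docs - results[0]
        raise ValueError(f"unknown operator: {op}")


index = InvertedIndex()
index.add("clip1", "The quick brown fox jumps")
index.add("clip2", "A brown quick fox sleeps")
index.add("clip3", "Quick thinking saves the day")

print(index.phrase("quick brown"))                        # {'clip1'}
print(index.evaluate(("AND", "quick", ("NOT", "fox"))))   # {'clip3'}
print(sorted(index.evaluate(("OR", "sleeps", "day"))))    # ['clip2', 'clip3']
```

Storing word positions, not just document IDs, is what lets "quick brown" match clip1 but not clip2, even though both contain both words.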

If each capability in the list above can be measured, you will be able to tell whether one engine is better than another.
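One common way to make "better" measurable is to score each engine's results for a query against a hand-labelled set of relevant items, using precision, recall and F1. The sketch below assumes such relevance labels exist; the video IDs and result lists are made-up examples, not real benchmark data.

```python
def precision_recall_f1(retrieved, relevant):
    """Score one engine's results for one query against labelled relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical relevance labels and results for a single query.
relevant = {"v1", "v2", "v3", "v4"}
engine_a = ["v1", "v2", "v5"]
engine_b = ["v1", "v2", "v3", "v6", "v7"]

for name, results in [("A", engine_a), ("B", engine_b)]:
    p, r, f = precision_recall_f1(results, relevant)
    print(f"engine {name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Here engine A is more precise but engine B finds more of the relevant videos and scores higher on F1 (about 0.67 versus 0.57). A fair comparison would average such scores over many queries.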

The format we want to search is "Video". Today, typical search engines can only search a video's "Title" and "Metadata". Even if both are well defined and representative of the video, the content itself is missing. Imagine having a thousand-page document in which you can only search the title and its summary. That is the current state of affairs for video search.
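The gap between metadata search and content search can be sketched in a few lines of Python. This assumes each video already has a time-coded transcript (for example from speech recognition); the `Video` record, the search functions and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def words(text):
    """Lower-case words with surrounding punctuation stripped."""
    return {w.strip(".,!?;:\"'").lower() for w in text.split()} - {""}


@dataclass
class Video:
    title: str
    metadata: List[str]
    # (start_seconds, text) segments, e.g. produced by speech recognition
    transcript: List[Tuple[float, str]] = field(default_factory=list)


def search_metadata(videos, word):
    """What typical engines do today: look only at the title and metadata tags."""
    word = word.lower()
    return [v.title for v in videos
            if word in words(v.title) or word in {m.lower() for m in v.metadata}]


def search_content(videos, word):
    """Search inside the video: return (title, timestamp) for each matching segment."""
    word = word.lower()
    return [(v.title, start) for v in videos
            for start, text in v.transcript if word in words(text)]


videos = [
    Video("Quarterly update", ["finance", "2016"],
          [(0.0, "Welcome everyone."), (95.5, "Revenue grew this quarter."),
           (210.0, "Next, our revenue outlook.")]),
    Video("Team offsite", ["culture"], [(12.0, "Great food and great people.")]),
]

print(search_metadata(videos, "revenue"))  # []
print(search_content(videos, "revenue"))   # [('Quarterly update', 95.5), ('Quarterly update', 210.0)]
```

The metadata search misses the video entirely, while the content search not only finds it but also points to where in the video the word is spoken.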

So, of course, the next question is: what do you want to search for in a "Video"? That's like opening Pandora's Box. Unlike a document, video is multi-dimensional and contains a lot more information, for example speech, words, people, objects, movement and colours.

Many of these search technologies do not yet exist or are still in their infancy, and what is available now is only the tip of the iceberg. The definition of a video search engine is therefore still evolving.

At VideoSpace, we would like to define our own version of a Video Search Engine: one that is able to search six key areas:

  • Speech Recognition
  • Words (or Text)
  • Motion Detection
  • Facial Detection
  • Emotion Detection
  • Offensive Content Detection
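One way to hold the output of several such detectors is a time-coded index of annotations, which also allows queries that combine areas (for example, a word being spoken while a face is on screen). The sketch below is a minimal Python illustration; the `Annotation` fields, modality names and labels are assumptions, and the detectors themselves are not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    modality: str  # e.g. "speech", "text", "motion", "face", "emotion", "offensive"
    label: str     # what the detector reported, e.g. a spoken word or an emotion
    start: float   # seconds
    end: float     # seconds


class MultiModalIndex:
    def __init__(self):
        self._by_key = defaultdict(list)  # (modality, label) -> [Annotation]

    def add(self, annotation):
        key = (annotation.modality, annotation.label.lower())
        self._by_key[key].append(annotation)

    def find(self, modality, label):
        """All annotations of one modality/label, in time order."""
        hits = self._by_key.get((modality, label.lower()), [])
        return sorted(hits, key=lambda a: a.start)

    def co_occur(self, query_a, query_b):
        """Time spans where an annotation matching query_a overlaps one matching query_b."""
        spans = []
        for a in self.find(*query_a):
            for b in self.find(*query_b):
                start, end = max(a.start, b.start), min(a.end, b.end)
                if start < end:
                    spans.append((start, end))
        return sorted(spans)


index = MultiModalIndex()
index.add(Annotation("face", "detected", 10.0, 40.0))
index.add(Annotation("speech", "budget", 35.0, 37.0))
index.add(Annotation("speech", "budget", 80.0, 82.0))
index.add(Annotation("emotion", "happy", 30.0, 50.0))

# When is "budget" spoken while a face is on screen?
print(index.co_occur(("face", "detected"), ("speech", "budget")))  # [(35.0, 37.0)]
```

Keeping every detector's output on a shared timeline is the design choice that makes cross-area queries a simple interval overlap test.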

The numbers in industry reports point the same way: by 2017, videos are forecast to account for more than 70% of all internet traffic. Imagine being able to search the content of all those videos in the future.

The VideoSpace Video Search Engine is taking the leap now. 

VideoSpace to experiment with Video OCR (optical character recognition) technology


In our hunt for additional indexing and search capabilities, VideoSpace is happy to announce that we are currently looking into adding Video OCR (optical character recognition) technology to our enterprise offering.

Video OCR is considerably more complex than OCR on documents or even still images, because it has to process each individual frame of the video.
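The frame-by-frame nature of Video OCR can be sketched as follows: sample frames at a fixed interval, run OCR on each sample, and merge consecutive identical results into a single time range so that a caption shown for several seconds is indexed once. The `ocr` callable is a hypothetical stand-in for a real OCR engine, and frames can be any object that callable accepts.

```python
def ocr_video(frames, fps, ocr, every_n=1):
    """Run `ocr` on every `every_n`-th frame and merge repeated text into time ranges.

    Returns (text, start_seconds, end_seconds) tuples. End times are approximate:
    a caption is assumed to stay on screen until the next sampled frame.
    """
    step = every_n / fps          # seconds between sampled frames
    segments = []
    current = None                # [text, start_seconds, last_seen_seconds]
    for i, frame in enumerate(frames):
        if i % every_n:
            continue              # skip frames between samples
        t = i / fps
        text = " ".join(ocr(frame).split())  # normalise whitespace
        if current is not None and text == current[0]:
            current[2] = t        # same caption still visible: extend it
            continue
        if current is not None:   # caption changed or disappeared: close it
            segments.append((current[0], current[1], current[2] + step))
            current = None
        if text:
            current = [text, t, t]
    if current is not None:
        segments.append((current[0], current[1], current[2] + step))
    return segments


def fake_ocr(frame):
    """Stand-in OCR: each "frame" here is just the text a real engine would read."""
    return frame


frames = ["", "", "SALE 50%", "SALE 50%", "SALE 50%", "", "Call now", "Call now"]
print(ocr_video(frames, fps=2, ocr=fake_ocr))
# [('SALE 50%', 1.0, 2.5), ('Call now', 3.0, 4.0)]
```

Processing every frame is the most thorough option but also the most expensive; raising `every_n` cuts the OCR work proportionally at the cost of coarser start and end times.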

For VideoSpace, this means that we are looking into the possibility of combining two state-of-the-art technologies to enable search: Video Indexing and Video OCR.

As of today, the Video OCR engine that we are experimenting with supports the following languages:

  • Arabic
  • Chinese Simplified
  • Chinese Traditional
  • Czech
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Japanese
  • Korean
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian Cyrillic
  • Serbian Latin
  • Slovak
  • Spanish
  • Swedish
  • Turkish

Watch this space for further updates!