Now that Google has released Conversational Search, you might call this Passive Conversational Image Search.
TalkShow shows pictures of the things you're talking about.
User's Guide
- Click the microphone button
- Speak slowly and clearly
- Talk about people, places, and things
- Marvel at the slideshow of occasionally relevant images
The program doesn't use voice commands. Turn it on, and it listens to your conversation, constantly analyzing your speech, looking for any mention of a person or place. It searches a lot, displays little.
Start by saying some famous names: TV shows, bands, cities, monuments, celebrities, historical figures, brands, products. Anything with an image that might be found on the internet. Pause between names.
If it doesn't recognize a name, try using the name in a complete sentence. Entity recognition works better on narrative text than on lists of isolated names.
Try telling a story that mentions some well-known people, places, or things.
Talk about your last vacation or the time you met a celebrity.
Turn it on and let it eavesdrop while you have a conversation.
What it does
TalkShow transcribes your speech, mines the text for proper names, does a series of internet image searches, and displays the search results as a slideshow.
When you pause for a second, the words transcribed since the last pause are examined. If the chunk contains capitalized words, a search for 20 hits is started for each contiguous run of capitalized words. If there are none, the entire phrase is searched for three hits.
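The pause rule can be sketched as a small pure function. This is only an illustration, not the TalkShow source: the names `capitalizedRuns` and `planSearches` are made up, and the sketch assumes the transcript arrives as plain words separated by spaces. A detector this naive would also treat the pronoun "I" and sentence-initial words as names; no filtering is attempted here.

```javascript
// Hypothetical helper: collect contiguous runs of capitalized words.
function capitalizedRuns(text) {
  const runs = [];
  let current = [];
  for (const word of text.split(/\s+/).filter(Boolean)) {
    // Strip punctuation from both ends of the word before testing it.
    const bare = word.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "");
    if (bare && /^\p{Lu}/u.test(bare)) {
      current.push(bare);
    } else {
      if (current.length) runs.push(current.join(" "));
      current = [];
    }
  }
  if (current.length) runs.push(current.join(" "));
  return runs;
}

// Hypothetical planner: 20 hits per capitalized run,
// otherwise 3 hits for the whole phrase.
function planSearches(chunk) {
  const names = capitalizedRuns(chunk);
  if (names.length > 0) {
    return names.map((query) => ({ query, hits: 20 }));
  }
  const phrase = chunk.trim();
  return phrase ? [{ query: phrase, hits: 3 }] : [];
}
```

For example, `planSearches("we visited the Eiffel Tower and then met Tom Hanks")` plans two 20-hit searches, one for "Eiffel Tower" and one for "Tom Hanks", while `planSearches("a big red balloon")` plans a single 3-hit search for the whole phrase.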
Between pauses, the program repeatedly checks the words that have arrived since the last pause, looking for new names. If it finds one, it interrupts the slideshow to display the new image as soon as possible.
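One way to picture the between-pauses check is a watcher that remembers which names it has already reported. The sketch below is an assumption about how such a check could be written, not the actual code: `createInterimWatcher`, `check`, and `reset` are invented names, and the text passed in is assumed to be the whole interim transcript since the last pause, without punctuation.

```javascript
// Hypothetical watcher for names that appear between pauses.
function createInterimWatcher() {
  const seen = new Set();
  // A name here is one or more whitespace-separated tokens,
  // each starting with an uppercase letter.
  const namePattern = /(?<!\S)\p{Lu}\S*(?:\s+\p{Lu}\S*)*/gu;
  return {
    // Returns only the names that have not been reported before.
    check(interimText) {
      const matches = interimText.match(namePattern) || [];
      const fresh = matches.filter((name) => !seen.has(name));
      fresh.forEach((name) => seen.add(name));
      return fresh;
    },
    // Call after a pause, when a new chunk begins.
    reset() {
      seen.clear();
    },
  };
}
```

Each name returned by `check` would be a reason to interrupt the slideshow. A name that grows as words arrive ("Tom", then "Tom Hanks") is reported twice in this sketch, once for each form.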
The slideshow cycles through the results for all queries, looking for images that have not been displayed yet, paying attention to the rank of an image in the search results list. Top hits for each query are displayed first.
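The selection rule can be sketched as follows. This is a guess at one reasonable reading of the description, with invented names (`createSlideshow`, `addResults`, `next`): each call picks the best-ranked image that has not been shown yet, so every query's top hit appears before any query's second hit. Ties go to the earlier query, and an image returned by several queries is shown only once.

```javascript
// Hypothetical slideshow scheduler.
function createSlideshow() {
  const queries = []; // each entry: { query, urls }, urls ordered by rank
  const shown = new Set();
  return {
    addResults(query, urls) {
      queries.push({ query, urls });
    },
    // Returns { query, url, rank } for the next image, or null when
    // every known image has already been displayed.
    next() {
      let best = null;
      for (const { query, urls } of queries) {
        const rank = urls.findIndex((url) => !shown.has(url));
        if (rank === -1) continue; // this query is exhausted
        if (best === null || rank < best.rank) {
          best = { query, url: urls[rank], rank };
        }
      }
      if (best) shown.add(best.url);
      return best;
    },
  };
}
```

With results `["p1", "p2"]` for one query and `["t1", "p1", "t2"]` for another, successive calls to `next()` yield p1, t1, p2, t2, and then `null`. What happens once everything has been shown (looping, waiting for new results) is left open here.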
How it works
This is instant image search using continuous narrative speech transcription and named entity recognition.
- Speech Recognition using the Web Speech API in Google Chrome version 25
- Named Entity Recognition by Alchemy API
- Image Search by Yahoo BOSS and flickr
- Slideshow using jQuery
Speech Recognition:
Voice Driven Web Apps: Introduction to the Web Speech API - HTML5Rocks Updates
http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API
Named Entity Recognition:
Alchemy API
http://www.alchemyapi.com/
The TalkShow JavaScript code decides what to search for partly based on the results of Alchemy's Named Entity Recognition. Alchemy is consulted only on completed phrases. For interim results, the simpler approach of looking for sequences of capitalized words in the speech transcript is used, which allows faster display of images for a named entity. At the moment, the speech recognition software capitalizes names only in English.
Alchemy works for English, French, German, Italian, Russian, Spanish, and Swedish. For other languages, we look up every pause-delimited chunk. You control the program by your placement of pauses.
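The language rules above amount to a small decision table, sketched here under stated assumptions. The function name `chooseStrategy`, the strategy labels, and the use of BCP 47 language tags such as "en-US" are all illustrative. The "wait-for-pause" branch in particular is an inference: without capitalization there is nothing for the interim check to find, so nothing happens until the pause.

```javascript
// Languages listed as supported by the entity recognizer, as ISO 639-1 codes:
// English, French, German, Italian, Russian, Spanish, Swedish.
const NER_LANGUAGES = new Set(["en", "fr", "de", "it", "ru", "es", "sv"]);

// Hypothetical: pick how a piece of transcript should be mined for queries.
function chooseStrategy(languageTag, isFinal) {
  const primary = languageTag.toLowerCase().split("-")[0];
  if (!isFinal) {
    // Interim text: the capitalization heuristic is only usable in English.
    return primary === "en" ? "capitalized-words" : "wait-for-pause";
  }
  // Completed phrase: entity recognition where available,
  // otherwise search the whole pause-delimited chunk.
  return NER_LANGUAGES.has(primary) ? "entity-recognition" : "whole-chunk";
}
```

For instance, `chooseStrategy("ja-JP", true)` returns `"whole-chunk"`, which is why pause placement matters most in languages without entity recognition.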
Image Search:
Yahoo BOSS
http://developer.yahoo.com/boss/search/
flickr
http://www.flickr.com/