Here is a video of Siri and TellMe in use side-by-side


Why should I have to adjust everything I know in order to use a tool? Surely a tool and its subset of features should be designed around the way we speak?

Again, if you'd read the thread you would see that this has already been discussed. "Text ccuk", "Call ccuk", "Find ccuk" are all perfectly natural things to say so you're hardly being forced to compromise. Anyway, whether you think natural language is important is irrelevant, TellMe doesn't offer it and he shouldn't try to use the software incorrectly. It's the equivalent of complaining that you can't view Twitter feeds in the iPhone's contacts application - iPhone doesn't offer this feature so there's no point claiming that it's broken.

TellMe performs at least some of its voice recognition on the phone. I've just turned off all data connections on my phone and TellMe can launch applications, call and text contacts. The only thing it can't do is search using Bing.

That's more than can be said for crappy Vlingo on Android at least :p Driving along the motorway with a skittish data connection renders it nearly useless.

I know you've already conceded that the recognition isn't as good, but that is exactly the reason for the failure in this video; it's not because he used invalid commands.

I think it is based on my experience of TellMe. Recognition is better when I use the correct commands or use natural language in the right place. If I say "Text Jake" it will generally text me. However, if I say "Send a text to Jake" it fails every time.

See you keep dodging around points. Regardless of what you are saying the system couldn't understand the words he was speaking, context aside. I personally don't think it's as good as other offerings.

My WP7 works much better than that. Holy ****, I mean it's no Siri, but that guy was just terrible.

I wonder how long MS spent on research and development for accents which aren't American.

Also, can TellMe work with foreign languages?

You're probably right, but that seems like a crazy way of doing things. It shouldn't assume the correct command will be used every time as it's being spoken to by humans not machines. It should decipher the sentence and then decide if a correct command is used, not decide there's no valid command and crap out.

Yes, TellMe works with foreign accents. But TellMe is NOT directly comparable to Siri. Siri is a complete solution that listens to speech with context and performs an action. TellMe itself is just voice recognition, with no context. Context is provided by the application developer using the TellMe service. It'd be perfectly easy to create something similar to Siri using TellMe. For example, people praise Kinect's voice recognition, and that's powered by the same TellMe service, just with more context provided by the actual software on the Xbox.

And of course, what makes speech recognition work properly is context. Non-contextual speech recognition is never mind-blowing, and context is what lets Siri distinguish between "10 am" and "teen anal". Siri is expecting a time to fit in with the rest of the sentence, so it assumes a time. Note that voice recognition systems don't usually return exact fixed words; they tend to return words accompanied by likelihood probabilities, i.e. it's 90% likely to be this, but maybe 70% likely to be that. With proper context, Siri does better because it can look at all those choices, then look at the types of commands it accepts, and choose appropriately. If Windows Phone was programmed to let you make appointments over speech, it would work fine. But it's not, so it doesn't.
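
To make the point about probabilities plus context concrete, here is a minimal Python sketch. It is not how Siri or TellMe actually work internally; the candidate list, the scores, the alarm grammar and the `CONTEXT_BONUS` weight are all made up for illustration. It only shows how a recogniser's top guess can lose to a lower-scored guess once the app says what kind of sentence it expects.

```python
import re

# Hypothetical n-best list from a recogniser: (transcript, confidence in [0, 1]).
# The numbers are invented for illustration.
N_BEST = [
    ("set an alarm for tenant", 0.90),
    ("set an alarm for 10 am", 0.70),
]

# Hypothetical app grammar: an alarm command has to end in a clock time.
ALARM_PATTERN = re.compile(r"^set an alarm for \d{1,2}(:\d{2})? ?(am|pm)$")

# Assumed weight given to a hypothesis that fits the grammar.
CONTEXT_BONUS = 0.5


def best_without_context(hypotheses):
    """Pick the transcript with the highest raw confidence."""
    return max(hypotheses, key=lambda h: h[1])[0]


def best_with_context(hypotheses, pattern=ALARM_PATTERN, bonus=CONTEXT_BONUS):
    """Pick the transcript with the highest confidence after a grammar bonus."""
    scored = []
    for text, confidence in hypotheses:
        fits = bool(pattern.match(text))
        scored.append((confidence + (bonus if fits else 0.0), text))
    # Highest combined score first.
    return sorted(scored, reverse=True)[0][1]


if __name__ == "__main__":
    print(best_without_context(N_BEST))
    print(best_with_context(N_BEST))
```

Without the grammar, the higher raw score wins; with it, the hypothesis that looks like a time is preferred.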

In the end, Siri is programmed to recognise a lot more. Microsoft's Windows Phone speech is only programmed to recognise a tiny subset of commands. That has nothing to do with problems on TellMe's end; it's just the way it's been programmed on Windows Phone. Hell, even if one company knows another company's implementation is better, they're not going to be stupid enough to tell people their competition is better, are they @___@ Microsoft are hardly going to toot Apple's horn ever, and neither are Google :p

A crazy way of doing things? Why? You're just telling TellMe to do something (command) and not having a conversation (natural human speech) with it. In my opinion, "text John" is a lot clearer, more precise, and shorter than "Can you please send a text to John?" Don't get me wrong, Siri's speech abilities are good and, based on this video, its natural speech recognition is great, but TellMe is not as bad as the video makes it seem; it just needs to be updated.

I'm not trying to dodge around anything. You are just making points without having bothered to read the rest of the thread which is leading to confusion. Read what I've said before trying to pick my posts apart.

You need to read the comment I replied to to see the context. Jakem1 was suggesting the entire sentence was misread because no correct command was used. He said the voice recognition is more accurate when correct commands are used.

I said that is a crazy way of doing things, because the way the software recognises speech shouldn't change based on whether the correct command is used or not. It's expecting human input so occasionally incorrect commands will be used and the software should account for that.

I'm not saying that commands are a bad idea, just that the voice recognition should be equally accurate whether the correct command is used or not.

Siri is still command based - it just features a lot more commands.

Unfortunately most current voice recognition systems ARE based on context. They don't tend to return exact matches of words; they return a list of words and probabilities, and then the software forms a sentence using the probabilities provided by the speech recognition service, along with the context and sentence structures the software supports. Speech recognition isn't advanced enough to perfectly understand speech with no context. Siri is still ultimately command based; it's just more likely to get it right because it's implicitly programmed to recognise more variations of the same thing.
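
As a rough illustration of "command based, but with more variations", here is a small Python sketch. The phrasings and the `INTENT_PATTERNS` table are assumptions made up for this example, not anything taken from Siri or Windows Phone. It shows several sentence structures being mapped onto the same underlying command.

```python
import re

# Hypothetical grammar: several phrasings map to one command.
# These patterns are illustrative only.
INTENT_PATTERNS = {
    "text": [
        r"text (?P<contact>.+)",
        r"send a text to (?P<contact>.+)",
        r"can you please send a text to (?P<contact>.+)",
    ],
    "call": [
        r"call (?P<contact>.+)",
        r"phone (?P<contact>.+)",
    ],
}


def parse_intent(utterance):
    """Return (command, contact) if the utterance fits a known phrasing, else None."""
    cleaned = utterance.lower().strip().rstrip("?.!")
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            match = re.fullmatch(pattern, cleaned)
            if match:
                return intent, match.group("contact")
    return None


if __name__ == "__main__":
    for phrase in ("Text Jake", "Send a text to Jake", "What's the weather"):
        print(phrase, "->", parse_intent(phrase))
```

A grammar with only the first pattern per command behaves like the short-command style; adding more patterns is one simple way to accept more natural phrasing without changing the recogniser itself.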

Both Siri and TellMe can be used for transcribing. Siri is system wide, TellMe can be used inside the Text Messaging app. How accurate are both systems when doing pure transcription? I know from using Siri it's about 98% correct. How is TellMe though? If it's anything like when trying to tell it to use commands I'm not impressed.

It works perfectly well for me. The only things it messes up for me tend to be odd names, but apart from that I've had no problems with it.

For what it's worth, Siri's use of Nuance is directly comparable to Windows Phone's use of TellMe for speech recognition. So for pure transcription it's really Nuance speech recognition vs TellMe speech recognition, rather than Siri vs Windows Phone :p

I agree with you, as I did read the comment and understand why you believe the software should account for any speech inconsistencies. You're right, speech software should understand both, but to be honest I prefer the command way of telling my speech recognition software how to do things. However, I love the fact that companies are pushing the software in new directions, as it benefits us, the consumers.

If that was the case then transcription would suck. It's clear to me that Siri is just better... a lot better.

To text with TellMe you say "Text" and then the name. It's so dumb, because this guy did not say the commands TellMe needs. He would say something like "send this person a text" instead of just saying "text" and then the name.

Also, how it works is that whatever you say without the commands is just searched on Bing. I like this because I don't need to say "Siri, where is the closest Indian restaurant?" I can just say "Indian Restaurant" and Bing Local shows me.
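
The behaviour described here (a short command word first, otherwise fall back to a search) can be sketched in a few lines of Python. This is only a guess at the general shape, not Microsoft's implementation; the `COMMANDS` tuple uses the "Text", "Call" and "Find" commands mentioned earlier in the thread, and the `route` function and its return format are hypothetical.

```python
# Command words mentioned earlier in the thread; the real list is not known here.
COMMANDS = ("text", "call", "find")


def route(utterance):
    """Treat '<command> <argument>' as a command; send anything else to search."""
    words = utterance.strip().split(maxsplit=1)
    if len(words) == 2 and words[0].lower() in COMMANDS:
        return {"action": words[0].lower(), "argument": words[1]}
    # No leading command word: hand the whole phrase to a web search.
    return {"action": "search", "argument": utterance.strip()}


if __name__ == "__main__":
    for phrase in ("Text Jake", "Indian Restaurant", "Send a text to Jake"):
        print(phrase, "->", route(phrase))
```

Under these assumptions "Send a text to Jake" ends up as a search rather than a text, which is the kind of result being argued about in this thread.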

From that video? That video evidently proves how context makes recognition better. Find a video comparing direct voice transcription on a text message where context doesn't exist for a better idea of how they compare on that front.

No it doesn't..not everything is context based, what about Bing/Google searches? What's the context there, words?? :laugh:

If contextual expectations made that big a difference to any voice recognition software then transcription would suck, but it clearly doesn't.

Siri is incredibly accurate at sending text messages, emails etc where the message can be anything.

Your explanation doesn't wash.

Did you not see the point where I said "in this video"? Windows Phone and Android do great transcriptions too of text messages, and side by side with an iPhone give largely similar results.

The Windows Phone tried to search Bing for something completely different to what was said, and there's not really any contextual guesswork involved with Bing searches. Logic dictates transcription would offer the same poor recognition for the man in the video.

I think you may be missing Johnny's point but dictation is more accurate in TellMe than the stuff you've seen in this thread. Check out this video from 56 seconds on for an example of dictation in a text message:

Also, dictation into the Bing app for searching is mostly accurate although it does fall over on some names.

EDIT: It just occurred to me that (as I guess you'd expect) the dictation gets passed to a server for translation so that's one more thing that can't be done without a data connection. It might also explain why it's more accurate than some of the command-based stuff.

Unfair comparison... the person in that video is American. It tends to be other accents these types of software fail on. Show me an Australian and we'll talk :p

:laugh:

It's not Australian but this one's a little closer to home for us and it shows off search and dictation in action:

http://www.youtube.com/watch?v=0vW8vE10Snk

Or ergo, logic dictates that the only reason Siri COULD properly understand it is because there was added context. Certainly for the very first one, you can see where context comes in handy. The second is actually phonetically very similar, and context would easily sway a speech recognition system in favour of "Send a text to <contact>" if that's programmed into its grammar. Same with the third. This video proves nothing, apart from the fact that Siri has a great context engine. (And it is in fact a better system than what Windows Phone has, but I'm only defending TellMe, not what Microsoft have done with it in Windows Phone.)

The actual underlying accuracy of the speech -> text can only be shown in a situation where no context is used, in raw transcription, which that original video didn't show. And unfortunately I have no Australians in my house to make a video comparing how they work for Australians :p

http://m.youtube.com/index?desktop_uri=%2F&gl=GB#/watch?v=E91Qu1nVQtE

I can't find a dictation video but this shows the conversation features...very accurate :p

This topic is now closed to further replies.