speech recognition _ still crap 10 years later
eddiecatflap
Baninated
Join Date: Sep 2002
Location: http://www.rotharmy.com
Nov 26, 2004, 01:49 PM
 
i bought a Quadra in 1994 and it had PlainTalk installed , it wasn't very impressive , but i wasn't expecting much

now with gigahertz processors etc i find it's still total rubbish

any idea when or if it will ever be updated ?
     
Severed Hand of Skywalker
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
Nov 26, 2004, 02:33 PM
 
Same with the voices. I would kill for a new voice that sounded just like HAL.

"Ahhhhhhhhhhhhhhhh"
     
paully dub
Mac Elite
Join Date: Feb 2004
Location: Paris, NY, Rome, etc
Nov 26, 2004, 03:54 PM
 
What's Tiger's speakable items like?

How close are we to a synthetic voice as full and rich (and real) as HAL's anyway? Wouldn't that open the door to anyone's voice, like Elvis or Jimmy Walker?

Most of the current ones are kind of stupid and useless.

Adopt-A-Yankee
     
SafariX
Mac Elite
Join Date: May 2003
Nov 26, 2004, 04:08 PM
 
My voice is my password...




My voice IS my password...




MY voice IS my password....



MY VOICE IS MY PASSWORD MOTHER****ER, LET ME IN BEFORE I **** ON YOUR FACE!

(after calming down):
.....muh eye vuh oye suh, is my pass word uh...

good ol' OS9
     
Spheric Harlot
Clinically Insane
Join Date: Nov 1999
Location: 888500128, C3, 2nd soft.
Nov 26, 2004, 10:04 PM
 
Originally posted by paully dub:
What's Tiger's speakable items like?

How close are we to a synthetic voice as full and rich (and real) as HAL's anyway?
"I'm sorry Paul, I'm afraid they can't do that."
     
eddiecatflap  (op)
Baninated
Join Date: Sep 2002
Location: http://www.rotharmy.com
Nov 27, 2004, 06:21 AM
 
yeah , that Victoria voice really annoys me , it was forgivable ten years ago , but fer gods sake , things have moved on so much since then

it really is a ridiculous situation , i'm amazed apple haven't radically updated PlainTalk

they pontificate about improving the os , but something like this they totally ignore

odd
     
Thain Esh Kelch
Mac Enthusiast
Join Date: May 2001
Location: Denmark
Nov 27, 2004, 09:21 AM
 
I think it would need major development to get beyond the synthetic voices we have now... Some 3rd party software company, or maybe IBM, would have done it by now otherwise..
     
Millennium
Clinically Insane
Join Date: Nov 1999
Nov 27, 2004, 09:33 AM
 
Are you talking about speech recognition or speech synthesis? They're basically opposite technologies, and while your thread title mentions one the thread itself seems to be talking about the other.

Neither technology, however, has gotten any significant attention from Apple in the last decade or so, except to port it to OSX, add a couple of new voices, and downgrade several existing voices (Trinoids in particular has suffered). I'm left wondering why they even bothered to port it if they weren't going to do anything with it.
You are in Soviet Russia. It is dark. Grue is likely to be eaten by YOU!
     
Cadaver
Addicted to MacNN
Join Date: Jan 2003
Location: ~/
Nov 27, 2004, 11:12 AM
 
I use a product at work called PowerScribe. It's a medical speech-to-text dictation system. Does a fairly decent job. Obviously not something one would use on their personal computer, however.

And as far as speech synthesis, try this:
http://www.research.att.com/projects/tts/demo.html

It's most impressive. Not exactly new (a couple years old now), but still impressive.

     
kcmac
Mac Elite
Join Date: Jan 2001
Location: Kansas City, Mo
Nov 27, 2004, 12:15 PM
 
Cadaver,

I used that link a while back to record a voice message that plays in Mail when I have a new message. The voices are much more natural than Fred, for example.
     
tooki
Admin Emeritus
Join Date: Oct 1999
Location: Zurich, Switzerland
Nov 27, 2004, 12:20 PM
 
Originally posted by SafariX:
My voice is my password...




My voice IS my password...




MY voice IS my password....



MY VOICE IS MY PASSWORD MOTHER****ER, LET ME IN BEFORE I **** ON YOUR FACE!

(after calming down):
.....muh eye vuh oye suh, is my pass word uh...

good ol' OS9
That wasn't speech recognition at all. It was voice fingerprint recognition. In other words, it just compared what it recorded to what you spoke into the mike to see if it sounds the same. You could just as well have done it with a non-voice sound.

tooki
     
tooki
Admin Emeritus
Join Date: Oct 1999
Location: Zurich, Switzerland
Nov 27, 2004, 12:20 PM
 
Originally posted by eddiecatflap:
i bought a Quadra in 1994 and it had PlainTalk installed , it wasn't very impressive , but i wasn't expecting much

now with gigahertz processors etc i find it's still total rubbish

any idea when or if it will ever be updated ?
Voice recognition/dictation software has improved substantially since then -- they just don't make it for the Mac.

tooki
     
tooki
Admin Emeritus
Join Date: Oct 1999
Location: Zurich, Switzerland
Nov 27, 2004, 12:22 PM
 
Originally posted by eddiecatflap:
yeah , that Victoria voice really annoys me , it was forgivable ten years ago , but fer gods sake , things have moved on so much since then
it really is a ridiculous situation , i'm amazed apple haven't radically updated PlainTalk
they pontificate about improving the os , but something like this they totally ignore
odd
Umm, they have updated it. In case you didn't notice, there's a new voice, Vicki, that is a vast improvement over Victoria. There's also a new male voice, Bruce, which is a big jump from Fred.

tooki
     
Person Man
Professional Poster
Join Date: Jun 2001
Location: Northwest Ohio
Nov 27, 2004, 04:15 PM
 
Originally posted by tooki:
Umm, they have updated it. In case you didn't notice, there's a new voice, Vicki, that is a vast improvement over Victoria. There's also a new male voice, Bruce, which is a big jump from Fred.

tooki
Yes, they improved those voices, but compare the new improved Apple voices to the AT&T ones linked to above, and the AT&T ones win HANDS DOWN.

Why does Apple continue to let this fall by the wayside when everyone else seems to have surpassed them on this point?

Or try these on for size... so many different accents and languages (the Greek speakers are spot-on!). The "valley girl" voice is hilarious (sounds a lot like Ellen Feiss)
     
arekkusu
Mac Enthusiast
Join Date: Jul 2002
Nov 27, 2004, 09:04 PM
 
Originally posted by tooki:
There's also a new male voice, Bruce, which is a big jump from Fred.
Vicki is new, but Bruce isn't. Bruce is as old as Agnes and Victoria.
     
tooki
Admin Emeritus
Join Date: Oct 1999
Location: Zurich, Switzerland
Nov 28, 2004, 02:14 AM
 
I think it's an updated Bruce. (I could be wrong.)

As for the comparison with AT&T... well, what's Apple's reason? Text-to-speech is not a big thing on the desktop. It's not used for screen readers (which have different requirements), and personal computers aren't used for voice guided phone systems, which is where pretty much all the speech synthesis research goes. AT&T makes those phone systems, which is why they need top-notch speech synthesis.

Give me one real-world example of where text-to-speech is used, and would benefit from higher quality. (The only real-world app that I know of that uses TTS is iStumbler, which can read aloud the network names it finds, as it finds them.)

Besides, our text to speech is still better than Windows'!

tooki
     
cla
Mac Enthusiast
Join Date: Mar 2000
Nov 28, 2004, 04:14 PM
 
Apart from serving as an aid for the visually impaired, human-computer interaction in general could benefit greatly from text-to-speech. The only means an interface has of getting the user's attention today is displaying dialogs, or jumping or flashing icons that annoy the user until they're dealt with.

I'm not saying the computer should read me every dialog box that pops up, but surely there are situations where I would prefer the computer to inform me audibly, such as messages from my firewall software or iChat messages. That way I can decide whether I want to act upon the event or not.

I know people who use text-to-speech as a learning aid: they have the computer read the text aloud while they read it themselves, as a means of maintaining concentration and improving memorization.


Give me one real-world example of where text-to-speech is used, and would benefit from higher quality.
I'm thinking chickens and eggs.
I think text-to-speech would be used more frequently in the real world if the quality improved, especially if it could be combined with speech recognition. Imagine not actually having to sit in front of the computer to handle mail correspondence.
Sounds a bit Star Trek-like... :>


EDIT: Changed "hearing impaired" to "visually impaired" =)
( Last edited by cla; Nov 29, 2004 at 09:42 AM. )
     
cla
Mac Enthusiast
Join Date: Mar 2000
Nov 28, 2004, 04:22 PM
 
Originally posted by tooki:
That wasn't speech recognition at all. It was voice fingerprint recognition. In other words, it just compared what it recorded to what you spoke into the mike to see if it sounds the same. You could just as well have done it with a non-voice sound.
Isn't that about the same thing, considering I have to "teach" speech recognition software (IBM ViaVoice, iListen) what my phonemes and voice sound like?

Or is Apple's built-in engine actually extracting phonemes from my input in order to compare it to a pronunciation database?
     
Person Man
Professional Poster
Join Date: Jun 2001
Location: Northwest Ohio
Nov 28, 2004, 05:06 PM
 
Originally posted by tooki:
Give me one real-world example of where text-to-speech is used, and would benefit from higher quality.
Tiger's Spoken Interface for the visually impaired. While the current Vicki voice is adequate, it can still be a bit hard to follow sometimes, especially if you're not reading along. That kind of thing can ALWAYS benefit from more natural sounding higher quality speech.
     
tooki
Admin Emeritus
Join Date: Oct 1999
Location: Zurich, Switzerland
Nov 30, 2004, 09:45 PM
 
Originally posted by cla:
Isn't that about the same thing, considering I have to "teach" speech recognition software (IBM ViaVoice, iListen) what my phonemes and voice sound like?

Or is Apple's built-in engine actually extracting phonemes from my input in order to compare it to a pronunciation database?
Neither. It had you record any utterance of your choice several times, created an average of them, and then when you went to log in, it checked to see if it's similar enough to allow login. It's comparing audio characteristics, not doing any kind of speech recognition whatsoever.
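
For anyone curious, the enroll-average-compare idea can be sketched roughly like this. It is only an illustration of the general approach, not Apple's code: the feature vectors, the cosine similarity measure, the 0.95 threshold and every function name here are assumptions made up for the example.

```python
import math

def average_template(recordings):
    """Average several equal-length feature vectors into one template."""
    if not recordings:
        raise ValueError("need at least one enrollment recording")
    length = len(recordings[0])
    if any(len(r) != length for r in recordings):
        raise ValueError("all recordings must have the same number of features")
    return [sum(r[i] for r in recordings) / len(recordings) for i in range(length)]

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def accepts(template, attempt, threshold=0.95):
    """Allow login only if the attempt is similar enough to the template.

    The threshold is a hypothetical value chosen for this example.
    """
    return cosine_similarity(template, attempt) >= threshold

# Made-up "audio characteristics" from three enrollment recordings.
enrollment = [
    [0.9, 0.1, 0.4, 0.7],
    [1.0, 0.2, 0.5, 0.6],
    [0.8, 0.0, 0.3, 0.8],
]
template = average_template(enrollment)

print(accepts(template, [0.9, 0.1, 0.4, 0.7]))  # close to the template: True
print(accepts(template, [0.1, 0.9, 0.8, 0.0]))  # very different sound: False
```

Nothing in the sketch knows what words were said, or even that the input is a voice. It only measures how close the numbers are, which is why a non-voice sound would work just as well.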

tooki
     
cla
Mac Enthusiast
Join Date: Mar 2000
Nov 30, 2004, 10:20 PM
 
Originally posted by tooki:
Neither.
Oh, the "Or is Apple's built-in engine actually extracting phonemes from my input in order to compare it to a pronunciation database?"-part was referring to speech recognition.

Wonder if Apple has any tech docs about the inner workings of speech recognition stashed away somewhere.
     
goMac
Posting Junkie
Join Date: May 2001
Location: Portland, OR
Dec 1, 2004, 05:00 AM
 
Originally posted by cla:
Oh, the "Or is Apple's built-in engine actually extracting phonemes from my input in order to compare it to a pronunciation database?"-part was referring to speech recognition.

Wonder if Apple has any tech docs about the inner workings of speech recognition stashed away somewhere.
I heard voice recognition was farmed out to India. Now it just sends what you say to some massive center where a response is quickly created on the other end.

It also explains the accuracy.
8 Core 2.8 ghz Mac Pro/GF8800/2 23" Cinema Displays, 3.06 ghz Macbook Pro
Once you wanted revolution, now you're the institution, how's it feel to be the man?
     
cla
Mac Enthusiast
Join Date: Mar 2000
Dec 1, 2004, 07:29 AM
 
Originally posted by goMac:
I heard voice recognition was farmed out to India. Now it just sends what you say to some massive center where a response is quickly created on the other end.
Gol' dang it!
     
badidea
Professional Poster
Join Date: Nov 2003
Location: Hamburg
Dec 1, 2004, 08:41 AM
 
Originally posted by SafariX:
My voice is my password...




My voice IS my password...




MY voice IS my password....



MY VOICE IS MY PASSWORD MOTHER****ER, LET ME IN BEFORE I **** ON YOUR FACE!

(after calming down):
.....muh eye vuh oye suh, is my pass word uh...

good ol' OS9

Back then I thought it would be really cool to log in via voice recognition, and every time I recorded my "password" I passed the test on the first try ..... I could NEVER log in with just one try though!
***
     
theolein
Addicted to MacNN
Join Date: Feb 2001
Location: zurich, switzerland
Dec 1, 2004, 12:08 PM
 
Originally posted by tooki:

Give me one real-world example of where text-to-speech is used, and would benefit from higher quality....
Language education software would benefit enormously from quality like the AT&T system offers, where the system can pronounce the foreign words or sentences that you type.
weird wabbit
     
paully dub
Mac Elite
Join Date: Feb 2004
Location: Paris, NY, Rome, etc
Dec 1, 2004, 12:19 PM
 
Originally posted by theolein:
Language education software would benefit enormously from quality like the AT&T system offers, where the system can pronounce the foreign words or sentences that you type.
I have the feeling I've seen this already somewhere. The ability to recognise speech is still the major hurdle.

Speakable items is fun for about two minutes.

Adopt-A-Yankee
     
Forte
Forum Regular
Join Date: Feb 2004
Dec 1, 2004, 01:48 PM
 
Speakable items is fun for about two minutes.
Three if you enjoy having the computer speak curse-words and web-slang!
     
cla
Mac Enthusiast
Join Date: Mar 2000
Dec 1, 2004, 02:41 PM
 
An infinite number of minutes, if it actually worked.
     
kcmac
Mac Elite
Join Date: Jan 2001
Location: Kansas City, Mo
Dec 2, 2004, 03:16 PM
 
Check out this article and the video that accompanies it.

David Pogue writes about Dragon NaturallySpeaking and then does a video where he dictates and tries to get it to make a mistake.

Unfortunately it does not work for the Mac. And he states that the apps for the Mac are not ready for use.
     
cla
Mac Enthusiast
Join Date: Mar 2000
Jan 3, 2005, 09:44 AM
 
     
HOMBRESINIESTRO
Dedicated MacNNer
Join Date: Nov 2003
Location: On a West Indian Island.
Jan 3, 2005, 12:33 PM
 
AHHH, I've just seen that Scansoft has acquired Rhetorical (by far my favourite http://www.rhetorical.com/). Please, no, I always wished Apple would buy them/license their system. :-(

This is sad news.
Scarcely pausing for breath, Vroomfondel shouted, "We DON'T demand solid facts! What we demand is the total ABSENCE of solid facts. I demand that I may or may not be Vroomfondel!"
     
cla
Mac Enthusiast
Join Date: Mar 2000
Jan 3, 2005, 01:35 PM
 
Yes, the link is not entirely up-to-date - Babel and Elan have merged as well.
     
[APi]TheMan
Mac Elite
Join Date: Sep 2001
Location: Chico, CA and Carlsbad, CA.
Jan 3, 2005, 01:37 PM
 
Originally posted by SafariX:
My voice is my password...


My voice IS my password...


MY voice IS my password....

MY VOICE IS MY PASSWORD MOTHER****ER, LET ME IN BEFORE I **** ON YOUR FACE!

(after calming down):
.....muh eye vuh oye suh, is my pass word uh...

good ol' OS9
Hah, yes!
"In Nomine Patris, Et Fili, Et Spiritus Sancti"

     
lavar78
Professional Poster
Join Date: Feb 2002
Location: Yorktown, VA
Jan 3, 2005, 04:30 PM
 
I long for a voice that sounds like the Transformer Soundwave. Then it would say "eject" every time I unmounted something.

"I'm virtually bursting with adequatulence!" - Bill McNeal, NewsRadio
     
C.J. Moof
Mac Elite
Join Date: Aug 2001
Location: Madison, WI
Jan 3, 2005, 06:02 PM
 
I find it interesting that the Speech Prefpane has a popup to select the Recognition System in use. That hints at the potential for plugging in an alternate recognition system.

Hopefully it would be one that understands more of what I say than my dog does, unlike the current Speakable Items. And has the ability to differentiate between me and someone on speakerphone....

Although, I did amuse my father in law over Christmas, who wanted an easier way to sleep his iBook. I made a speakable Sleep Now applescript... that was too easy for him- after I started just telling it to go to sleep for laughs.
OS X: Where software installation doesn't require wizards with shields.
     
discotronic
Mac Elite
Join Date: Oct 2003
Location: Richmond,Va
Jan 3, 2005, 07:07 PM
 
Originally posted by C.J. Moof:
I find it interesting that the Speech Prefpane has a popup to select the Recognition System in use. That hints at the potential for plugging in an alternate recognition system.

Hopefully it would be one that understands more of what I say than my dog does, unlike the current Speakable Items. And has the ability to differentiate between me and someone on speakerphone....

Although, I did amuse my father in law over Christmas, who wanted an easier way to sleep his iBook. I made a speakable Sleep Now applescript... that was too easy for him- after I started just telling it to go to sleep for laughs.
Closing the lid not easy enough?
     
Jolt21
Dedicated MacNNer
Join Date: May 2004
Jan 10, 2005, 08:38 AM
 
does anyone use ViaVoice for Panther? if so, how is it?
blah
     
voyageur
Mac Elite
Join Date: Jul 2003
Jan 10, 2005, 09:11 AM
 
We've tried it with Panther. It seemed to work fine on a G4 iMac running Panther, but we had trouble getting through the setup assistant on a dual 1.8 GHz G5 running Panther. The problem on the G5 seemed to be getting ViaVoice to "hear" the microphone input; during setup we would get an error when we tried to test the microphone, even with the input sound turned all the way up in Sound Preferences. But we weren't using the microphone that came with ViaVoice; we were using a high-end Sennheiser handheld mike (with the Andrea USB sound pod). We had used this mike successfully with ViaVoice on Macs running Jaguar. I think we might eventually have gotten it to work with Panther on the G5, but the person who was going to use it lost interest.
     
Rickster
Mac Elite
Join Date: Feb 2001
Location: Vancouver, WA
Jan 10, 2005, 07:38 PM
 
Hey, it's hard to wreck a nice beach.
Rick Roe
icons.cx | weblog
     
jasong
Mac Elite
Join Date: Mar 2000
Location: Allston, MA, USA
Jan 11, 2005, 09:18 AM
 
Originally posted by Rickster:
Hey, it's hard to wreck a nice beach.
Huh?

-- Jason
     
Skip Breakfast
Mac Enthusiast
Join Date: Oct 2002
Location: Seattle
Jan 11, 2005, 08:18 PM
 
Yeah. wha? 0_o
PowerMac G4 Gigabit 1.2GHz, 896MB, 2x 80GB WD SE, Pioneer 107, Radeon 9000 Pro 128MB

Macintosh TV
     
Rickster
Mac Elite
Join Date: Feb 2001
Location: Vancouver, WA
Jan 11, 2005, 10:20 PM
 
Say it out loud. It's somebody's anecdote from talking to researchers at a CS conference, which nicely illustrates some of the difficulties they face.

Seriously, though... Apple's marginal speech support may be sort of a chicken-and-egg thing. Every year at WWDC the head of Apple's Speech Technology group waxes eloquent about all the great uses application developers could put OS X's speech APIs to, and yet nobody ever does anything interesting with it. Surely there are some powers-that-be at Apple wondering why they should put more money and manpower into improving speech frameworks that nobody uses. (Of course, now that Apple's getting to be a major applications developer, they're just as much at fault as the third parties.)

There've been some pretty impressive developments that haven't received much attention, too. Remember when Junk Mail filtering was new in 10.2, and they were talking about their impressive Latent Semantic Mapping technology? They also use it in Speakable Items in 10.2 and newer, making it capable of understanding variations on commands instead of requiring precise wording. (For example, you can say "What time is it", "What's the time", "What hour is it", or pretty much any variation with equivalent meaning, and it'll still be recognized.) It's CPU-intensive, so it has to be turned on manually in System Prefs.
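
To give a feel for what "understanding variations on commands" involves, here is a toy matcher. To be clear about what it is: plain bag-of-words cosine matching, a much simpler stand-in, and not Latent Semantic Mapping or anything Apple ships. The command names, phrases and the 0.5 cutoff are all made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical command table: internal name -> canonical spoken phrase.
COMMANDS = {
    "tell_time": "what time is it",
    "open_mail": "open my mail",
    "sleep_now": "put the computer to sleep",
}

def bag_of_words(text):
    """Lower-case the text and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def best_command(utterance, min_score=0.5):
    """Return the closest command name, or None if nothing is close enough."""
    spoken = bag_of_words(utterance)
    scored = [
        (cosine(spoken, bag_of_words(phrase)), name)
        for name, phrase in COMMANDS.items()
    ]
    score, name = max(scored)
    return name if score >= min_score else None

print(best_command("What hour is it"))  # tell_time
print(best_command("Play some music"))  # None
```

Word overlap copes with "What hour is it" because three of the four words match, but with this cutoff it gives up on "What's the time", which shares only one word with the stored phrase. Closing that gap is the kind of problem a semantic technique is meant to solve, and presumably part of why it costs more CPU.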
Rick Roe
icons.cx | weblog
     
yukon
Mac Elite
Join Date: Oct 2000
Location: Amboy Navada, Canadia.
Jan 11, 2005, 10:21 PM
 
I would think voice recognition, when advanced enough, would be really useful. Many actions can be done more easily with a mouse, but if the dictionary used is large enough, then things could be done more remotely and naturally.

Hard to think up examples though. How about this. At a party, you use iTunes to play music. It's set on shuffle, and it eventually comes up to BillyBoBob-CountryJamboree.mp3. "Computer! Run script iTunes. Pause. Play Royksopp's Poor Leno. Volume up 3.". Could use the new "Party Shuffle" of course, but you can't request specific music with that or a remote without being on the computer. Handwriting recognition has become accurate enough to be useful, voice recognition may be more complex, but it's still useful, if not just impressive.

Speech synthesis can be useful, I've used it to read out books to me....books with writing so complex that human readers get confused after a dozen commas in the same sentence, where a computer can keep track and add proper inflection. You haven't lived until you've heard Fred explaining virtue ethics to you.
This insanity brought to you by:
The French CBC, driving antenna users mad since 1937.
     
   
 
All times are GMT -4.
All contents of these forums © 1995-2017 MacNN. All rights reserved.