
slomojoe

macrumors regular
Original poster
Sep 10, 2018
178
132
Canada
I input voice to text on my phone using Drafts, Office, and Evernote for short snippets of dictation, and I wonder how this works.

Do they all process speech through the same speech-to-text engine, or does each use its own?

Is there a particular app that does the best job (Drafts seems the best, but not by much), or should they all be about the same?

My guess is that they all default to Apple's built-in engine. Is this right?
 

ericwn

macrumors G5
Apr 24, 2016
12,113
10,899
I believe anything you trigger via the dictation button on the keyboard uses Apple’s own voice to text algorithms.
 

slomojoe

macrumors regular
Original poster
Sep 10, 2018
178
132
Canada
ericwn said: "I believe anything you trigger via the dictation button on the keyboard uses Apple's own voice to text algorithms."
Right, thanks, that was my thinking. I know that Microsoft and Google have very good dictation software, and it seems that if I use the Google search app I get better recognition. So it isn't their speech engine I'm activating, but still Apple's universal speech-to-text?

It's hard to do a one-to-one test to see if it really varies or if it's just my imagination.
 

cynics

macrumors G4
Jan 8, 2012
11,959
2,156
Google Speech is the API Google uses in their iOS apps, and iOS devs can use it in their own apps. Google has multiple implementations of speech to text (STT) for on-device (offline) processing, but usually that's used in combination with their objectively better online (Google server) processing. It gets a bit sketchy, though, when it comes to pricing: if you use the Google Speech API, they charge you less if you allow Google to log your users' speech data... and since app development is a business designed to make money, and paying less overhead = more profit... well...

[Screenshot: Google's Speech API pricing terms, showing the discount for allowing data logging]

Much like Apple, they claim it's to improve the speech to text. This is Google's wording, so my bias doesn't rub off. Keep in mind, though, that it's the developer who opts in or out with their app, not the user...
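
In case anyone wants to see what "using the API" actually looks like, here's a rough sketch of calling Google's Cloud Speech-to-Text v1 REST endpoint from Swift. The API key and audio file are placeholders; the endpoint and JSON shape are from Google's public docs:

Code:
import Foundation

// Rough sketch: send a short 16 kHz LINEAR16 (PCM WAV) clip to Google's
// Cloud Speech-to-Text v1 REST endpoint. apiKey is a placeholder.
func recognizeWithGoogle(audioFileURL: URL, apiKey: String) throws {
    let audioData = try Data(contentsOf: audioFileURL)

    var request = URLRequest(url: URL(string:
        "https://speech.googleapis.com/v1/speech:recognize?key=\(apiKey)")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "config": [
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US"
        ],
        "audio": ["content": audioData.base64EncodedString()]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    URLSession.shared.dataTask(with: request) { data, _, _ in
        // The response carries results[].alternatives[].transcript
        if let data = data, let text = String(data: data, encoding: .utf8) {
            print(text)
        }
    }.resume()
}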

Apple has their Speech framework available to iOS devs. It's "free" if you ignore the yearly expense of being a dev and the Apple tax. To be fair, Apple uses everyone's voice to "improve their product". However, Apple requires devs to be very explicit that the STT is going to be used for product improvement.
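
For comparison, the Apple side is just the Speech framework. A bare-bones sketch that transcribes an audio file (error handling omitted); the Info.plist string is that "be very explicit" disclosure:

Code:
import Speech

// Bare-bones sketch using Apple's Speech framework. Requires an
// NSSpeechRecognitionUsageDescription entry in Info.plist -- the string
// Apple makes you show the user explaining where their audio goes.
func transcribe(fileURL: URL) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else { return }
        let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        request.requiresOnDeviceRecognition = false  // false = Apple's servers may process it
        _ = recognizer?.recognitionTask(with: request) { result, error in
            if let result = result, result.isFinal {
                print(result.bestTranscription.formattedString)
            }
        }
    }
}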

Google, on the other hand, skirts around this requirement by giving the user instructions on how to turn the mic on in Settings, with no word that their voice is being logged or anything... here's a comparison.

[Screenshots: Apple's speech recognition permission prompt next to Google's microphone permission prompt]

Also, if you use Apple's Speech framework, you can only ask the user that question one time; subsequent requests are blocked. Apple figures if you bug someone enough they'll eventually say yes... probably right... (more on that in the sketch below).

I should mention that Google is asking for generic microphone permission, while Apple is asking specifically for permission for the app to use their Speech Recognition framework. Google just wants the mic on, and it's just assumed it's to use their Speech API, which is a fair assessment IMO.
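
That one-time prompt is why apps check the stored status before asking. A rough sketch of the usual pattern (once someone has said no, the only fallback is deep-linking to Settings, which is exactly the "go turn it on yourself" flow Google uses):

Code:
import Speech
import UIKit

// Sketch of the standard permission dance. The system alert only ever
// appears while the status is .notDetermined; after the user answers once,
// the answer is stored and re-asking is a no-op.
func ensureSpeechPermission() {
    switch SFSpeechRecognizer.authorizationStatus() {
    case .notDetermined:
        SFSpeechRecognizer.requestAuthorization { _ in }  // fires the one-time alert
    case .denied, .restricted:
        // Can't re-prompt, so deep-link the user to the app's Settings page.
        if let url = URL(string: UIApplication.openSettingsURLString) {
            UIApplication.shared.open(url)
        }
    case .authorized:
        break
    @unknown default:
        break
    }
}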

Apple and Google aren't the only game in town, though. There are a half dozen companies that develop STT SDKs for iOS for high-quality on-device (offline) use.

I prefer Apple; it seems to work better for me, which might just be because I've adjusted to it after using it for so long. I can say they all suck in their own way, and so does Alexa. Privacy concerns aside, I think what it mostly comes down to is how well the STT handles your language and your accent. With my standard East Coast American accent, both of them do fairly well, with Apple picking my voice out from other people and background noise much better. Oh, and it can identify me versus other people, which is nice for HomeKit.

Man...that was long winded...apologies...
 

doobydoooby

macrumors regular
Oct 17, 2011
240
339
Genève, Switzerland
Have a look at Otter; I've found it to be excellent for transcription. It uses its own servers and seems to make fewer errors than any other I've tried. There's a free plan for up to 600 minutes per month.

 

slomojoe

macrumors regular
Original poster
Sep 10, 2018
178
132
Canada
cynics said: "Google Speech is the API Google uses in their iOS apps. [...] Man... that was long winded... apologies..."
Super interesting and helpful, thanks. I take it, then, that a developer can choose the engine they want, which answers my question (even if it doesn't clear things up completely): not all speech to text is processed through Apple; developers can use whatever engine they choose. So the answer is to experiment and see what works for me. Thanks!
 

slomojoe

macrumors regular
Original poster
Sep 10, 2018
178
132
Canada
doobydoooby said: "Have a look at Otter; I've found it to be excellent for transcription. [...]"

also helpful, have bookmarked, thanks!
 