Speech to Text Software? at DVinfo.net

Chris Sgaraglino · August 26th, 2010, 10:07 PM

I have some audio interviews that I would like to convert to text so that I can use them in a variety of different documents.

Any recommendations?

Robert Turchick · August 27th, 2010, 12:42 AM

Nuance - Dragon NaturallySpeaking

I have the iPhone version and it's super accurate. My aunt has been using it on her XP machine for years and raves about it.

Paul Cascio · August 27th, 2010, 09:59 AM

I use Dragon for writing, especially first drafts. It's great once you understand how it works and adjust your speech pattern slightly.

The Dragon iPhone app is my favorite iPhone app.

Jay West · August 28th, 2010, 01:52 PM

I haven't tried the iPhone version but thought it was primarily a dictation app. (Also, we don't yet have iPhone coverage in the area where I live and work.) I have worked a bit with the Nuance/Dragon apps but found their usefulness was limited to dictation (where you can train them to your voice). Never had much luck with them for transcribing meetings, interviews, etc.

But, if you've got Adobe's Premiere Pro CS4 or CS5 or Soundbooth CS4 or CS5, you've got a speech-to-text program in their "metadata speech analysis function" which can do a decent job with interview transcription.

The operative word is "can." I do a fair amount of work for lawyers. This sometimes involves video of depositions and sometimes mp3 or wav audio recordings of meetings and interviews. I've used the CS4/CS5 program when they've needed something immediately: say, when we are on the road, or it's the end of the day, the secretaries have gone home, and the lawyers need something to study that evening.

I import the audio or video file into CS5, put it on the timeline, click on the audio track, go to the source monitor window and click on the Metadata tab, and then click the "Analyze" button. The last time I did this, it transcribed about 6½ hours of deposition into a text file in about an hour.

This could not replace a stenographer's transcription, but it was good enough to be usable for what the lawyer needed.

Adobe Speech Analysis does a very good job of distinguishing one deposition/interview voice from another for purposes of separating questions and answers. It gives you a script that identifies each speaker (with a label/name) and gives the text of what was said. The response/answer is identified as coming from another speaker. Although it does a good job of separating the text from different speakers, sometimes it identifies the same voice as a different speaker.

It has trouble with technical terms.

As you would expect, much depends on the clarity of the recording, the absence of ambient noise, and the timbre of the speaker's voice.

Accents can throw it for a loop. I think of an example of this kind of thing from a Prairie Home Companion/Guy Noir episode. Guy is in North Carolina looking for a town called Boiling Springs. Minnesotan Guy (using a sort of New York accent) pronounces it "boyle-ng" but the North Carolina locals say "bye-lyn." It is hard enough for us to figure out what was meant when a Texan pronounces "cookie" with four syllables or a Minnesotan says, "you betcha" ... well, you get the picture.

So, "usable" and "accurate enough to immediately drop into a publishable document" can be different things. Proofreading and corrections are a must. The system can learn and can work with "reference scripts." Say you have several interview sessions with the same speaker. You use the corrected first text conversion as a "reference script" to help with accuracy in transcribing later interviews.

Actually, this software reminds me of what it was like with PC-based OCR from 15 to 20 years ago. Sometimes, the results were excellent. Sometimes, the scan worked well enough to be usable. Sometimes you were better off just typing the thing from scratch.

Hope this helps.

Vasco Dones · August 29th, 2010, 07:32 AM

Chris,

my two cents:
- used Nuance's Dragon software with barely acceptable results;
not a bad product at all, but it's dictation software, hence it needs to
be fine-tuned to ONE particular voice; you can't expect great results if it's used
with a variety of voices;
- switched to outsourcing the task to transcription services:
I get fairly accurate transcriptions for around $1.55/minute of recorded audio
(the company I outsource to charges between $1.55 and $3.20,
depending on sound quality, accents, interview style, etc.).
Hope this helps.

All the best

Vasco
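
For anyone weighing outsourcing against software, the per-minute rates quoted above can be turned into a rough budget. The sketch below is only illustrative: the function names are made up for this example, and the 6.5-hour length is borrowed from the deposition mentioned earlier in the thread, not from any actual quote.

```python
def transcription_cost(minutes, rate_per_minute):
    """Return the cost in dollars for a recording of the given length."""
    return round(minutes * rate_per_minute, 2)


def cost_range(hours, low_rate=1.55, high_rate=3.20):
    """Return (low, high) cost estimates for a recording of `hours` hours.

    The default rates are the $1.55 to $3.20 per minute range quoted in
    the thread; real rates depend on the service and the recording.
    """
    minutes = hours * 60
    return (
        transcription_cost(minutes, low_rate),
        transcription_cost(minutes, high_rate),
    )


if __name__ == "__main__":
    # Hypothetical example: a 6.5-hour recording.
    low, high = cost_range(6.5)
    print(f"6.5 hours of audio: ${low:.2f} to ${high:.2f}")
```

At those rates a long recording adds up quickly, which is why a rough automated pass followed by proofreading can still be attractive when accuracy requirements allow it.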
