Transcription HTML Handy Tool at DVinfo.net

Greg Paulson · August 2nd, 2009, 01:08 AM

Problem: How do you give a client the audio for an interview they need to listen to, as well as the transcription metadata XML file so they can study roughly what was said?

Supposedly this metadata can be encoded with Flash, but I've got no idea how it gets displayed, or what you can do with it other than just look at it. I dunno.

OK, I wrote this tool today. It's very, very basic, but if you know the code, there are hints about how great this could become.

I can't expect my client to have Soundbooth kicking around, though it has handy features like renaming words, deleting words, and so on: things someone can change to improve the quality of the transcription and get a better handle on picking out talking points. Those features are beyond my skills. But I can do this:

To use this code, you'll need to do a few VERY simple things.
#1 Grab this file. It's got two HTML files in there. Unzip the files into a folder somewhere.
Free File Hosting Made Simple - MediaFire

#2 Take the audio file (.wav, .mp3, whatever) and the resulting transcription file from Soundbooth, copy them to the folder above, and rename the .xml file to 'voices.xml'. The file name is fixed in the code, but it's easy to change.

#3 You need Media Player Classic. In the options tab, select Web Interface.
- Check Listen on port ##### (the default is 13579; check your firewall settings, though it shouldn't matter since everything is served from your localhost).
- Check Serve pages from: and select the folder where you put the files.
- Default Page should be the file named 'MPC.html'

#4 Open the audio file in Media Player Classic. Pause it, play it, whatever.

#5 Open the following link in Firefox:

http://localhost:13579/
or
http://localhost:13579/MPC.html

Two things are happening here. The first is that a smaller window is opened (make sure you allow popups from localhost). This window is just a simple workaround for a problem I have submitting the data to the web server.
The other thing is that the transcription file (voices.xml) is parsed and made ready to be displayed. I've set the max words shown to 1500. Increase that number if you choose.

You'll see displayed on the left a timecode starting at 00:00:00. To the right are 20 words. Next line is another timecode, and another 20 words, and so it goes.
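
The layout described above can be sketched roughly like this. It is only an illustration in Python, not the code from the download: the names `layout_lines`, `WORDS_PER_LINE` and `MAX_WORDS` are made up, and it assumes each word has already been parsed into a (start time in milliseconds, text) pair and that a line takes its start time from its first word.

```python
from typing import List, Tuple

WORDS_PER_LINE = 20   # words shown to the right of each timecode
MAX_WORDS = 1500      # cap on the number of words displayed

def layout_lines(words: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Group (start_ms, text) pairs into display lines.

    Each returned line is (start_ms of its first word, the words joined by spaces).
    """
    words = words[:MAX_WORDS]  # ignore anything past the display cap
    lines = []
    for i in range(0, len(words), WORDS_PER_LINE):
        chunk = words[i:i + WORDS_PER_LINE]
        start_ms = chunk[0][0]  # assumption: line start = first word's start
        lines.append((start_ms, " ".join(text for _, text in chunk)))
    return lines

# Hypothetical input: 45 words, one every half second.
sample = [(i * 500, f"w{i}") for i in range(45)]
for start_ms, text in layout_lines(sample):
    print(start_ms, text)
```

With 45 words this gives two full lines of 20 and a last line of 5. Raising `MAX_WORDS` corresponds to increasing the 1500-word limit mentioned above.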

The timecode, as well as the words, are clickable and will move the playhead of Media Player Classic to the corresponding timecode.

So it's really easy to click around a document and then immediately hear the audio from that spot in the transcription.
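
For anyone curious what a click might send to the player, here is a rough Python sketch that only builds a seek URL. It is an assumption, not the mechanism used in the download (which goes through the popup window): it assumes the web interface accepts a seek request on `command.html` with `wm_command=-1` and a `position` parameter, the way the stock Media Player Classic pages appear to. Check the pages that ship with your build before relying on it. The function name `seek_url` is hypothetical.

```python
import re
from urllib.parse import urlencode

def seek_url(position: str, host: str = "localhost", port: int = 13579) -> str:
    """Build a URL asking the player to jump to an HH:MM:SS position.

    Assumption: the web interface takes wm_command=-1 plus position=HH:MM:SS
    on command.html. Verify against your own Media Player Classic build.
    """
    if not re.fullmatch(r"\d{2}:\d{2}:\d{2}", position):
        raise ValueError(f"expected HH:MM:SS, got {position!r}")
    # safe=":" keeps the colons readable instead of percent-encoding them
    query = urlencode({"wm_command": "-1", "position": position}, safe=":")
    return f"http://{host}:{port}/command.html?{query}"

print(seek_url("00:01:23"))
```

The URL could then be requested with `urllib.request.urlopen` or pasted into a browser while the audio file is open in the player.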

The downside is the transcription is SO off the mark, it's useless. I couldn't make heads or tails of what I was seeing.

----
About the code:
I've got a few things in there that would change the size or colour of the font according to the confidence level of the transcription. Trouble words would be redder or taller than the more accurate words, which would be greener and smaller. I ran into a few odd tag structures in one of my larger .xml files and was getting errors parsing the confidence data, so I disabled this feature.
Confidence is always at 40.
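
To show the idea behind the disabled feature, here is a minimal Python sketch of one way a confidence value could be turned into a colour and font size. It is not the code from the download. It assumes confidence runs from 0 to 100 (the post only says it is pinned at 40), and the function name `confidence_style` and the pixel sizes are made up for illustration.

```python
def confidence_style(confidence: float, min_px: int = 12, max_px: int = 24) -> str:
    """Return a CSS style string: low confidence is red and tall,
    high confidence is green and small.

    Assumption: confidence is on a 0-100 scale; values outside are clamped.
    """
    c = max(0.0, min(100.0, confidence)) / 100.0
    red = round(255 * (1 - c))      # more red for trouble words
    green = round(255 * c)          # more green for accurate words
    size = round(max_px - (max_px - min_px) * c)  # taller when less certain
    return f"color: rgb({red}, {green}, 0); font-size: {size}px"

print(confidence_style(40))
```

With confidence stuck at 40, every word would get the same style, which is effectively what disabling the feature does.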

There is no feedback from Media Player Classic as to where it is in a document. I've not figured out all the back-and-forth communication on this yet. What should happen is that each word is highlighted as it gets played.

Time is only accurate to the second, since I'm not sure of any precise way of setting the playhead. The metadata gives you millisecond accuracy, but that gets divided by 1000 and cut down to whole seconds so Media Player Classic can be told where to skip to (it expects times as '00:00:00').
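
The conversion described above amounts to something like the following Python sketch. The function name `ms_to_timecode` is hypothetical, and the choice to truncate rather than round is an assumption. Truncating means a click never lands after the word it belongs to.

```python
def ms_to_timecode(ms: int) -> str:
    """Drop the sub-second part of a millisecond offset and format it as HH:MM:SS."""
    if ms < 0:
        raise ValueError("offset must not be negative")
    total_seconds = ms // 1000               # this is where the precision is lost
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(ms_to_timecode(83450))
```

A word starting at 83450 ms and one starting at 83990 ms both map to the same second, so clicking either one seeks to the same spot.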

Would be sweet to edit the words, delete them, do those things that can be done in Soundbooth.

As I said, the XML parsing is rough and prone to errors. And if you changed any of the words, I'm not sure how to turn all that data back into an .xml file.
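
On turning edits back into XML, one possible approach is to change the parsed tree in place and serialize it again, rather than rebuilding the file from the displayed words. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The element and attribute names (`transcript`, `word`, `start`, `confidence`) are invented for the example and are NOT Soundbooth's real schema, which would have to be checked against an actual voices.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout, NOT the real Soundbooth format.
SAMPLE = """<transcript>
  <word start="0" confidence="40">hello</word>
  <word start="480" confidence="40">word</word>
</transcript>"""

def replace_word(xml_text: str, index: int, new_text: str) -> str:
    """Change the text of one word and return the document as XML again."""
    root = ET.fromstring(xml_text)
    words = root.findall("word")
    words[index].text = new_text   # attributes such as start stay untouched
    return ET.tostring(root, encoding="unicode")

def delete_word(xml_text: str, index: int) -> str:
    """Remove one word element and return the document as XML again."""
    root = ET.fromstring(xml_text)
    root.remove(root.findall("word")[index])
    return ET.tostring(root, encoding="unicode")

print(replace_word(SAMPLE, 1, "world"))
```

Because only the edited element changes, the timing data for every other word is carried through unchanged.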
