Greg Paulson
August 2nd, 2009, 01:08 AM
Problem: How do you give a client the audio for an interview they need to listen to, as well as
the transcription metadata XML file so they can study roughly what was said?
Supposedly this metadata can be encoded with Flash, but I've got no idea how it gets displayed, or
what you can do with it other than just look at it. I dunno.
OK, I wrote this tool today. It's very, very basic, but if you know the code, there are hints
of how great this could become.
I can't expect my client to have Soundbooth kicking around, though it has features that are handy,
like renaming words, deleting words, and so on: things that someone can change to improve the
quality of the transcription and also get a better handle on picking out talking points. Those
features are beyond my skills. But I can do this:
To use this code, you'll need to do a few VERY simple things.
#1 Grab this file. It's got two HTML files in there. Unzip the files into a folder somewhere.
Free File Hosting Made Simple - MediaFire (http://www.mediafire.com/?sharekey=ab364c3c6627366f61d4646c62b381cbe04e75f6e8ebb871)
#2 Take the audio file (.wav, .mp3, whatever) and the resulting transcription file from Soundbooth,
copy them to the folder above, and rename the .xml file to 'voices.xml'. The file name is fixed in
the code. Easy to change.
#3 You need Media Player Classic. In the options tab, select Web Interface.
- Check Listen on port ##### (the default is 13579; check your firewall settings, though it
shouldn't matter since everything is served from your localhost).
- Check Serve pages from: and select the folder where you put the files.
- Default Page should be the file named 'MPC.html'
#4 Open the audio file via Media Player Classic. Pause it, play it, whatever.
#5 Open the following link in Firefox
http://localhost:13579/
or
http://localhost:13579/MPC.html
Two things are happening here. The first is that a smaller window is being opened (make sure you
allow popups from localhost). This window is just a simple workaround for a problem I have
submitting the data to the web server.
The other thing that is happening is the transcription file (voices.xml) is being parsed and made
ready to be displayed. I've set the max words shown to 1500. Increase that number if you choose.
You'll see displayed on the left a timecode starting at 00:00:00. To the right are 20 words.
Next line is another timecode, and another 20 words, and so it goes.
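A rough sketch of that layout step, written in Python for illustration (this is not the code from
the download). It assumes the words have already been pulled out of voices.xml as
(start time in milliseconds, word) pairs; the names layout_rows, WORDS_PER_LINE and MAX_WORDS are
made up for the example.

```python
from typing import List, Tuple

WORDS_PER_LINE = 20   # 20 words per row, as described above
MAX_WORDS = 1500      # display cap, as described above


def layout_rows(words: List[Tuple[int, str]]) -> List[Tuple[int, List[str]]]:
    """Group (start_ms, word) pairs into rows of WORDS_PER_LINE words.

    Each row is returned as (start_ms of its first word, list of words),
    so the caller can print a timecode on the left and the words on the right.
    """
    capped = words[:MAX_WORDS]  # ignore anything past the display cap
    rows = []
    for i in range(0, len(capped), WORDS_PER_LINE):
        chunk = capped[i:i + WORDS_PER_LINE]
        row_start_ms = chunk[0][0]
        rows.append((row_start_ms, [word for _, word in chunk]))
    return rows
```

Each row keeps the start time of its first word, which is what the timecode on the left is built from.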
The timecodes, as well as the words, are clickable and will move the playhead of Media Player
Classic to the corresponding timecode.
So it's really easy to click around a document and then immediately hear the audio from that spot
in the transcription.
The downside is that the transcription is SO off the mark, it's useless. I couldn't make heads or
tails of what I was seeing.
----
About the code:
I've got a few things in there that would change the size or colour of the font according to the
confidence level of the transcription. Trouble words would be redder or taller than the more
accurate greener and smaller words. I ran into a few odd tag structures in one of my larger .xml
files and was getting errors parsing the confidence data, so I disabled this feature.
Confidence is always at 40.
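Roughly what the confidence styling could look like, sketched in Python for illustration. The
0 to 100 confidence scale is an assumption (the fixed value of 40 suggests it, but I haven't
confirmed it), and parse_confidence and style_for_confidence are made-up names. The fallback of 40
mirrors the fixed value above, so an odd tag doesn't kill the whole page.

```python
DEFAULT_CONFIDENCE = 40.0  # same fallback value the tool currently hard-codes


def parse_confidence(raw, default: float = DEFAULT_CONFIDENCE) -> float:
    """Return the confidence as a float, or the default if the value is missing or malformed."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return default


def style_for_confidence(confidence: float, min_px: int = 12, max_px: int = 24) -> str:
    """Build a CSS style string: low confidence is redder and taller, high is greener and smaller.

    Assumes confidence is on a 0-100 scale; values outside that range are clamped.
    """
    c = max(0.0, min(100.0, confidence)) / 100.0
    red = round(255 * (1.0 - c))
    green = round(255 * c)
    size = round(max_px - (max_px - min_px) * c)
    return f"color: rgb({red}, {green}, 0); font-size: {size}px"
```

The string from style_for_confidence(parse_confidence(raw)) could go straight into a style attribute on each word.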
There is no feedback from Media Player Classic as to where it is in the audio. I've not figured
out all the communication back and forth on this yet. What should happen is that each word is
highlighted as it gets played.
Time is only accurate to the second, since I'm not sure of any more precise way of setting the
playhead. The metadata gives you millisecond accuracy, but the value gets divided by 1000 and
cut down to whole seconds so Media Player Classic can be told where to skip to (it expects times
as '00:00:00').
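The conversion is just integer arithmetic. Here is a small Python sketch of it, plus a guess at
what a seek request to the web interface looks like. The command.html path and the wm_command=-1
and position parameters are my assumption about Media Player Classic's web interface, so check
them against your build; ms_to_timecode and seek_url are made-up names.

```python
from urllib.parse import urlencode


def ms_to_timecode(ms: int) -> str:
    """Convert milliseconds to 'HH:MM:SS', dropping the sub-second part."""
    total_seconds = ms // 1000          # millisecond precision is lost here
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def seek_url(ms: int, host: str = "localhost", port: int = 13579) -> str:
    """Build a seek URL for the web interface (endpoint and parameters are assumptions)."""
    query = urlencode({"wm_command": -1, "position": ms_to_timecode(ms)})
    return f"http://{host}:{port}/command.html?{query}"
```

For example, a word that starts at 75999 ms becomes '00:01:15', so the click lands up to a second
before the word actually starts.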
Would be sweet to edit the words, delete them, do those things that can be done in Soundbooth.
As I said, the XML parsing is rough and prone to errors. And if you changed any of the words, I'm
not sure how to turn all that data back into an .xml file.
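One possible direction for writing edited words back out, sketched in Python with the standard
library's ElementTree. The <words>/<word> layout and the start and confidence attributes are a
made-up, simplified schema for illustration only. It is not the structure Soundbooth actually
writes, so the real element names would have to be substituted in.

```python
import xml.etree.ElementTree as ET


def words_to_xml(words) -> str:
    """Serialize (start_ms, text, confidence) tuples to a simple XML string.

    Hypothetical schema: <words><word start="..." confidence="...">text</word></words>
    """
    root = ET.Element("words")
    for start_ms, text, confidence in words:
        node = ET.SubElement(root, "word",
                             start=str(start_ms),
                             confidence=str(confidence))
        node.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(root, encoding="unicode")
```

Letting the library build the tree avoids hand-assembling tags, which is where the odd tag
structures tend to bite.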