Aligning two audio files?
TingSern Wong September 7th, 2005, 09:11 AM I recorded a session this afternoon using a Sound Devices 442 mixer, outputting to two recording devices: (a) a Canon XL2 and (b) a Sound Devices 722 digital recorder. Each XLR output of the 442 feeds an inexpensive "splitter", so the two (stereo) XLR outputs become four connections: one pair goes to (a), the other pair to (b).
The Canon XL2's tape recording is captured via a Sony DSR25 and recorded as 48 kHz, 16 bits, stereo.
The Sound Devices 722 recording is set at 48 kHz, 24 bits, stereo.
Neither device has any facility to link to an external timecode generator.
I read both audio files into my PC using Sony's Sound Forge 8.0.
When I align the audio of (a) with (b) at time XX (listening to it critically), I find that 30 seconds later they no longer match. I use the time displayed by Sound Forge to position the pointer accurately in both files and play them from XX+30 seconds, but the sounds don't match.
Questions -
a) Is there a way to make the two audio files "equal"?
b) Is there a way to "align" two audio files this way? Or am I hoping for the moon here?
Why I am doing this:
The output from the 722 is much better than the output from the Canon XL2. The 722 records at 24 bits, versus 16 bits for the XL2.
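To put the 16- versus 24-bit difference in numbers, here is the textbook ceiling for an ideal N-bit quantizer driven by a full-scale sine (about 6.02·N + 1.76 dB). This is a theoretical upper bound, not a measurement of either device. Real converters fall well short of it, especially at 24 bits, so the practical advantage of the 722 is smaller than the gap shown here. Bit depth also has nothing to do with the sync drift discussed below.

```python
import math


def ideal_dynamic_range_db(bits):
    """Textbook SNR of an ideal N-bit quantizer with a full-scale sine input:
    20*log10(2**N) + 1.76 dB, i.e. roughly 6.02*N + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 1.76


for bits in (16, 24):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.1f} dB")
```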
Thank you,
TS
A. J. deLange September 7th, 2005, 09:42 AM TS,
Yes to both questions, but it involves additional equipment (i.e. expense). According to its specifications, the Sound Devices 722 accepts word clock. If you derive word clock from the camera, then the audio recorded on the 722 and on the camera's tape will be recorded at exactly the same sample rate, or at least at phase-locked rates (see below), and if aligned at any point they will stay in alignment throughout. There are several devices that will produce word sync from the composite video output of the XL2. The only one I have any familiarity with is the MOTU MIDI Timepiece AV (http://www.motu.com/products/midi/mtpav_usb/body.html/en).
Connect the BNC video output on the XL2 to the BNC video input on the Timepiece, run a cable from the Timepiece's BNC word sync output to the word sync input on the recorder, and then struggle with the manuals to figure out how to get the MOTU device to generate word sync from video and how to get the recorder to sample from an external clock.
The bad news is that the MOTU MIDI Timepiece AV is, as the name suggests, primarily a MIDI router with lots of features you don't want and only a couple you do. It is also a 19" x 1 RU rack unit and runs on 110/220 VAC. At least it draws only 7 watts, so you can run it in the field from a small battery and inverter, and it doesn't weigh much.
Regarding the "see below": there is some debate as to whether the XL2 actually samples at 48 kHz. If it does not, it seems that FCP and other NLEs resample to exactly 48 kHz at capture. This means that in the computer, the streams from the camera and the recorder will be frequency locked. It remains your job to phase lock them, i.e. get them lined up. Once they are aligned at one point, they should stay aligned throughout.
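Once the two streams are frequency locked, "getting them lined up" means finding one constant offset. One common way to find it without relying on ears is cross-correlation; this is a swapped-in technique, not something posted in this thread. The sketch below assumes both excerpts are already decoded to lists of floats at the same effective rate. It uses brute force (O(N × lags)), so it is only suitable for short excerpts around a distinctive sound; real tools use FFT-based correlation. It finds an offset only and will not correct a rate mismatch.

```python
def best_lag(ref, other, max_lag):
    """Return the lag (in samples) that best aligns `other` with `ref`.

    A positive lag means the same event occurs `lag` samples later in `other`.
    Brute-force, unnormalized cross-correlation over lags -max_lag..max_lag.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += x * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# Synthetic check: a few clicks, with `other` being the same signal delayed 25 samples.
ref = [0.0] * 200
for k, amp in ((40, 1.0), (41, -0.6), (90, 0.8)):
    ref[k] = amp
other = [0.0] * 25 + ref[:-25]
print(best_lag(ref, other, max_lag=50))  # → 25
```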
TingSern Wong September 8th, 2005, 12:39 AM Hi AJ,
Thanks for the info ....
I need a bit more info from you: which other manufacturers do you know of that make a word clock extraction unit like the one you describe? I don't need (and can't carry) the MOTU; it is far too big. I can hunt for the "toys" myself if only I had some idea where to look. This appears to be highly specialised, with only a few makers around. Ideally the unit would run on 4 AA batteries or something similar.
(I found something that could do this job very neatly: Horita's PTG. According to the website, it takes in a composite video signal and generates SMPTE timecode. It runs on a single 9V battery and is very small and very portable. I would very much appreciate it if you could confirm whether the Horita PTG will do this job. Thank you.)
Next, the Canon XL2 is indeed set to 48 kHz, 16 bits, but I have no way to tell how accurate that sampling rate is. From my experiments with the two audio files (XL2 and 722), the drift is about 1 to 2 seconds after 30 minutes. Note that the XL2's data has to pass through the Canopus NLE before being encoded as AVI, whereas the 722 audio is simply copied as a WAV file with no additional manipulation. I am not sure whether the drift comes from the XL2 itself or is introduced by the NLE hardware/software when it encodes into AVI format.
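As a rough sketch, the observed drift can be expressed as a clock mismatch in parts per million. This assumes the whole drift comes from a constant rate difference between the two chains (camera plus capture versus the 722). It says nothing about which device is off, so the "implied rate" is simply what one side would have to run at if the other were exactly 48 kHz.

```python
def drift_ppm(drift_s, elapsed_s):
    """Clock mismatch in parts per million implied by `drift_s` of drift after `elapsed_s`."""
    return drift_s / elapsed_s * 1e6


def implied_rate_hz(nominal_hz, ppm):
    """Rate that differs from `nominal_hz` by `ppm` parts per million."""
    return nominal_hz * (1 + ppm / 1e6)


for drift in (1.0, 2.0):
    ppm = drift_ppm(drift, 30 * 60)
    print(f"{drift:.0f} s over 30 min: {ppm:.0f} ppm (~{implied_rate_hz(48000, ppm):.0f} Hz vs 48000 Hz)")
```

By this arithmetic, 1 to 2 seconds in 30 minutes works out to roughly 560 to 1100 ppm.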
TS
A. J. deLange September 8th, 2005, 02:41 PM TS,
I am only familiar with the MOTU unit and am therefore reluctant to advise you in any other direction, for fear of steering you toward something that might not work out for you. You are right that this seems quite specialized, and the equipment selection is so limited that I wonder whether many people try to do what you (and I) are attempting. There was another thread here in which a third chap trying to do essentially the same thing was advised that it wasn't necessary, because the audio can be stretched or compressed in time in post. While that is true, it doesn't seem a very elegant way to do things, given that it should be easy to build gear that derives word clock from video.
I did an extensive search for portable devices to do this job and could find only one. It was similar to the Horita in that it produced LTC (Longitudinal SMPTE Time Code) from video and from the time code that appears on the LANC port of Canon and Sony (and perhaps some other) cameras. It was powered by the camera and reasonably priced, but it is no longer made. All the other devices I could find were mains powered and expensive.
The problem as I see it with the Horita box is that while it produces LTC resolved to a video reference signal, it does not produce word clock, and your recorder cannot, according to my reading of its manual, lock to LTC. Thus with the Horita device you would need either a recorder that can derive sync from LTC or another box to develop word sync from LTC. Such boxes exist (for example the MOTU), but if you have a MIDI Timepiece AV you don't need the Horita device for this job.
MOTU makes another box that is battery operated, records to the hard disk of a laptop (over FireWire), and will sync to LTC. But you already have the 722 for recording and doubtless wouldn't want to carry the digitizer and the laptop. If I knew how to do what you are trying to do with portable gear, I'd be doing it. Note that my searching did not turn up Horita, so thanks for that one.
Good luck, and if you find anything, let us know. There are more people than just you and me interested in this.
TingSern Wong September 8th, 2005, 07:13 PM Hi AJ,
Thank you very much for hunting down the info for me. I have confirmation from Horita themselves that the DTG won't work for this application either. But at least we now know that if we have to generate LTC from composite video, we don't have to drag out the MOTU or its equivalent. The Horita is really very small and looks really neat.
It looks like there is no easy way out. Sound Devices has another audio recorder, the 744T, which does take in SMPTE timecode. Unfortunately, it is a 4-channel recorder and costs twice the price of my 722. I don't need a 4-channel recorder to begin with, and I can't justify paying twice the price of a 722 for something that is way overkill for the audio applications I am doing.
I will keep hunting for a portable, reasonably priced device to generate word clock for the 722. I am also waiting for a reply from Sound Devices about what compatible equipment they know of. I will keep posting here what I find, if such a device exists.
Thanks once again,
TS
Jack Smith September 8th, 2005, 07:57 PM Have you looked at the file properties once captured to the hard drive? Are they both 48k, or has your capture changed that? Another thing: if you record to both devices with a clap at head and tail, capture the two files, load them into an audio editor (e.g. Sound Forge), and trim both head and tail to the clapper, what is the difference in length? Something doesn't sound right here; something might be changing the file properties. It seems like too much drift for such a short recording time.
Until you figure out why they differ (which I would make a priority), a workaround would be to stretch the shorter one without changing its pitch.
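The clapper test can also be measured roughly in code instead of by trimming. This is a minimal sketch under stated assumptions. Both files are already decoded to lists of floats in the range -1..1 at the same nominal rate (decoding the WAV files is not shown). `threshold` is a hypothetical parameter you would tune. Nothing before the head clap or after the tail clap is as loud as the claps. The tail position is the last loud sample rather than the onset, which roughly cancels because both files record the same clap. A positive result means the camera file is longer between the claps.

```python
def clap_span(samples, threshold):
    """Return (first, last) indices whose magnitude reaches `threshold`.

    Assumes the head clap is the first event that loud and the tail clap is
    the last one; louder material in between does not affect the result.
    """
    hits = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not hits:
        raise ValueError("no sample reaches the threshold")
    return hits[0], hits[-1]


def clap_length_difference_s(cam, rec, rate_hz=48000, threshold=0.5):
    """Clap-to-clap duration of `cam` minus that of `rec`, in seconds,
    with both files read at the same nominal sample rate."""
    c0, c1 = clap_span(cam, threshold)
    r0, r1 = clap_span(rec, threshold)
    return ((c1 - c0) - (r1 - r0)) / rate_hz


# Synthetic example at a toy 1000 Hz rate: the camera's claps are 2 samples further apart.
rec = [0.0] * 12000
rec[100] = rec[10100] = 0.9
cam = [0.0] * 12000
cam[100] = cam[10102] = 0.9
print(f"{clap_length_difference_s(cam, rec, rate_hz=1000) * 1000:.1f} ms")  # → 2.0 ms
```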
TingSern Wong September 8th, 2005, 08:07 PM Hi Jack,
They are both indeed 48 kHz; only the bit depths differ (16 bits for the XL2, 24 bits for the 722). They are not that far apart: after 30 minutes, the XL2 audio is about 1 to 2 seconds (as far as my ears can tell) faster than the 722 audio.
I have read somewhere that the Canon XL2 sampling rate is not exactly 48,000 Hz but 48,007 Hz. I'm not sure about the 722 recorder, though; I presume it is more accurate than the XL2.
Hi AJ,
I found something else: Ambient's Lockit. It takes video sync, generates both LTC and word clock, and is fully portable too. I'm not sure about prices yet, but it looks like it might fit the bill here.
Thanks,
TS
A. J. deLange September 9th, 2005, 06:43 AM TS,
The Lockit appears to be able to do the job. Everything looks right except the price; it's definitely professional gear. One of our sponsors (B&H) sells it for US $1180, more than twice the price of the MOTU box, but it is small, runs off A cells, etc.
It is interesting to note that the manufacturer gives specs on how good their crystal (which they say is temperature compensated) is with respect to ageing, calibration, etc. They seem to indicate that two of their boxes can be set to within 0.2 ppm of each other by calibrating each against an external reference daily. At a 0.2 ppm offset, it would take 46 hours to drift by 1 frame. They quote ageing of 1 ppm per year and 0.5 ppm over -10 to +40 °C. I'm guessing the crystals in prosumer cameras are 1 to 2 orders of magnitude worse than this (they are not calibrated daily, or indeed ever, except perhaps when the camera goes to the shop), so it shouldn't be surprising to people (though it seems to be) that you can build up a frame of drift in a few minutes to an hour.
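The 46-hour figure can be checked with a few lines of arithmetic: divide one frame period by the fractional clock offset. The sketch assumes NTSC timing (30000/1001 ≈ 29.97 fps) by default; at PAL's 25 fps the same 0.2 ppm gives about 56 hours. The 10 and 100 ppm cases are illustrative stand-ins for "1 to 2 orders of magnitude worse", not measured camera specs. They come out at roughly an hour and five or six minutes per frame, consistent with the closing remark above.

```python
NTSC_FPS = 30000 / 1001  # about 29.97 fps
PAL_FPS = 25.0


def hours_to_one_frame(ppm, fps=NTSC_FPS):
    """Hours for a constant clock offset of `ppm` to accumulate one frame of drift."""
    frame_s = 1.0 / fps
    return frame_s / (ppm * 1e-6) / 3600.0


print(f"0.2 ppm: {hours_to_one_frame(0.2):.1f} h (NTSC), "
      f"{hours_to_one_frame(0.2, PAL_FPS):.1f} h (PAL)")
for ppm in (10, 100):  # illustrative: 1 and 2 orders of magnitude worse than 0.2 ppm-class gear
    print(f"{ppm} ppm: {hours_to_one_frame(ppm) * 60:.1f} min per frame (NTSC)")
```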
TingSern Wong September 9th, 2005, 09:26 AM Hi AJ,
I have asked Trew Audio whether anything exists (presumably cheaper) that could do this job, other than Ambient's Lockit. I'm waiting for their reply now.
I suppose the high cost reflects the engineering needed to make such an accurate crystal for this job. If this is the only device that can do it (hopefully not), then it comes down to money: I have to weigh whether it is worth paying an arm and a leg for this piece of hardware.
I will keep this forum informed of how this journey goes.
TS
A. J. deLange September 9th, 2005, 11:03 AM The frustrating thing about all of this is that the camera is developing a sample clock to run its A/Ds. It would be a trivially simple matter to buffer this and bring it out on a connector. They are also putting SMPTE time code out on the LANC port, and it would be a simple matter to convert this to LTC and bring that out on a connector. While I'm on a roll, it should also be simple for them to lock the camera to black burst or composite video coming in through another (or the existing BNC) video connector. I'll bet the sum total of all three of those things wouldn't raise the cost to the consumer by more than a couple hundred dollars, if that. But then what percentage of buyers wants, or would even know what to do with, these signals? It would probably be a bad marketing decision.
Michael D. Shivers September 9th, 2005, 11:23 AM I saw your post and it reminded me of some research I was doing a while back. I was reading about color sampling, and at the end of the article (by Adam Wilt) it talks about locked vs. unlocked audio.
It seems that XL2 (and Canon in general) audio does slip by an extremely small amount. The head developer of FCP figured it out and made some adjustments to the system to compensate for it.
This may be why it's not in sync, but it depends on what system you are capturing on.
Scroll down the page at this link until you see "Locked vs unlocked audio" and see if it helps.
http://www.adamwilt.com/DV-FAQ-tech.html#color_sampling
thanks,
michael
George Ellis September 9th, 2005, 03:30 PM Just as a possible cheap solution: I would assume that the "delta" is constant. Find an in and out point that match on both clips. Go into Sound Forge and use the Time Stretch plugin to force a match on the in and out (you will need to either mark in and out or cut the clips). So, if clip a is 45m 20.3s and clip b is 45m 42.5s, use Time Stretch to make clip b 45m 20.3s...
In a pinch, maybe... Definitely not the best solution though.
Edit: you can also calculate a percentage difference from a matching in and out and then apply it to the whole clip.
Jack Smith September 9th, 2005, 09:46 PM Thanks George, you said it better than I did.
TS, even if the sample rate were 48007 versus 48000 Hz, that would be a difference of about 0.015 percent, which would equate to 7 or 8 frames over 30 minutes. That is much less than what you are getting (30 to 60 frames). Have you tried resampling the files? Have you tried the test with a clapper at head and tail?
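George's percentage approach amounts to one ratio taken from a matching in and out, which is then applied to the whole clip. In the sketch below, plain linear-interpolation resampling stands in for Sound Forge's Time Stretch plugin; this is a swapped-in, deliberately crude technique. Unlike a pitch-preserving time stretch, resampling shifts pitch by the same ratio. For ppm-scale clock drift that shift is tiny, but for a large correction like George's illustrative 0.8 % you would want the pitch-preserving stretch he and Jack describe. A real tool would also use a band-limited resampler rather than linear interpolation. Clip lengths are in seconds, and `samples` is assumed to be a list of floats.

```python
def stretch_ratio(in_a, out_a, in_b, out_b):
    """Scale factor that makes clip b's in-to-out span equal clip a's."""
    return (out_a - in_a) / (out_b - in_b)


def conform(samples, ratio):
    """Resample `samples` to round(len(samples) * ratio) samples by linear
    interpolation (a crude stand-in for a proper resampler or time stretch)."""
    n_in = len(samples)
    n_out = round(n_in * ratio)
    if n_in < 2 or n_out < 2:
        return list(samples[:n_out])
    step = (n_in - 1) / (n_out - 1)  # input positions advanced per output sample
    out = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), n_in - 1)
        frac = pos - i
        nxt = samples[min(i + 1, n_in - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# George's example: clip a runs 45m 20.3s and clip b runs 45m 42.5s between matching points.
ratio = stretch_ratio(0.0, 45 * 60 + 20.3, 0.0, 45 * 60 + 42.5)
print(f"scale clip b by {ratio:.6f} ({(ratio - 1) * 100:+.3f} %)")
```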
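Jack's arithmetic can be checked directly. Audio captured at a slightly high rate but played back at the nominal 48 kHz runs long by the fractional rate difference. The sketch takes the 48,007 Hz figure from the thread and assumes NTSC frame timing (30000/1001 fps); at 25 fps the drift would be about 6.6 frames.

```python
NOMINAL_HZ = 48000.0
NTSC_FPS = 30000 / 1001  # about 29.97 fps


def drift_seconds(duration_s, actual_hz, nominal_hz=NOMINAL_HZ):
    """Extra playback time after `duration_s` of audio captured at `actual_hz`
    but played back at `nominal_hz`."""
    return duration_s * (actual_hz - nominal_hz) / nominal_hz


pct = (48007 - NOMINAL_HZ) / NOMINAL_HZ * 100
d = drift_seconds(30 * 60, 48007)
print(f"{pct:.4f} % -> {d:.4f} s -> {d * NTSC_FPS:.1f} frames at 29.97 fps")
# For comparison, the observed 1 to 2 s of drift is roughly 30 to 60 frames.
```

This gives about 0.26 s, or just under 8 frames, in 30 minutes, which supports Jack's point that a 48,007 Hz clock alone cannot explain the 30 to 60 frames observed.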
Douglas Spotted Eagle September 9th, 2005, 09:56 PM Canon cameras record nominal 48 kHz sound at around 48.009 kHz, which can result in almost a second of video/audio slippage over the course of an hour (or around one frame every two minutes).
This is well documented, and Canopus, FCP, and Sony have all had compensating code in their NLEs. In the early days, we had to modify .ini files to get the hardware to handle this correctly.
TingSern Wong September 10th, 2005, 01:02 AM Yes, I understand the principles of locked and unlocked audio, thank you. I was just frustrated that there was no easy (hardware) solution to make these two pieces of very good equipment talk to each other without paying a king's ransom for it.
George's suggestion looks like the best (and cheapest) way out for me: determine the delta between the XL2 and 722 audio, then stretch the 722 audio to match the XL2's (faulty) timing, so that the video frames match the "accurate" audio from the 722.
TS
A. J. deLange September 14th, 2005, 12:46 PM Regarding my September 9th wish that the camera would bring out its sample clock and LTC on connectors and lock to black burst or composite video:
I asked for it on Friday and they granted my wish the following Wednesday (without sample clock, but genlock and LTC will do the job)!
TingSern Wong September 14th, 2005, 07:59 PM What kind of hardware is it? Does it extract timing info from the LANC port? Sounds like a nice thing. Can you point me to a web link?
A. J. deLange September 14th, 2005, 08:20 PM TS,
I was referring to the newly announced XLH1. It will accept or generate both genlock and LTC. The box I use for recording external audio (MOTU Traveler) can resolve to LTC, so my sync problem is solved (as soon as I figure out how to get a $9000 camera past my wife; I wish they'd kept the color scheme the same!)
[Fixed XL1H to read XLH1]
TingSern Wong September 14th, 2005, 08:29 PM Ouch!! The XL1H costs more than the XL2? That doesn't make sense to me. Canon is definitely up to something here. I have to check out what's so great about the XL1H then. Thanks for the heads-up.
Oops... the new camera is not the XL1H but the XL H1 :-), the HD version of the XL line. I can sleep on it, because the PAL version won't be out soon; even the NTSC version will only be available in December this year.