Sample Rate Conversion in Double System Sound [Archive]


Steve House

December 24th, 2005, 09:25 AM

Actually, the Sony M100 Hi-MD recorder does 16-bit 44.1 wav files and sounds pretty darn nice. ... (That would require some sample rate conversion, which might be problematic ...)

Happened to wake up this morning thinking about sample rate conversions (the mind works in strange ways) and maintaining sync. Audio-for-video uses a 48k sample rate, but the most common rate with inexpensive recorders is the CD norm of 44.1k, so as you point out we need to convert the rate when importing the audio into our project. If we don't, one second's worth of audio, 44,100 samples, played at 48,000 sps will play back in 0.91875 seconds, falling out of sync at a rate of about 2.4 frames for each second of roughly 30 fps video. So we convert by resampling as you note. To increase the 44,100 samples in each second of audio to 48,000, we have to add 3900 samples. But where does the data for those samples come from? We could duplicate roughly every 11th sample in the source file, or we could look at each pair of samples in the original file and interpolate between them to come up with a "guess-timate" of what the 1 or 2 samples falling between them would have been had the audio been recorded at 48k. But it seems to me that either method would potentially introduce noise and/or distortion.
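For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration, and the frame rates in the loop are just common examples, not something taken from a particular project.

```python
SOURCE_RATE = 44_100   # samples per second on the recorder (CD norm)
PROJECT_RATE = 48_000  # samples per second expected by the video project

def playback_seconds(source_rate, playback_rate, recorded_seconds=1.0):
    """Duration when audio recorded at source_rate is played at playback_rate."""
    return recorded_seconds * source_rate / playback_rate

def drift_frames_per_second(source_rate, playback_rate, fps):
    """Frames of sync lost for every second of recorded audio."""
    lost_seconds = 1.0 - playback_seconds(source_rate, playback_rate)
    return lost_seconds * fps

def samples_to_add(source_rate, target_rate):
    """Extra samples needed per second of audio after conversion."""
    return target_rate - source_rate

print("plays back in", playback_seconds(SOURCE_RATE, PROJECT_RATE), "s")
print("samples to add per second:", samples_to_add(SOURCE_RATE, PROJECT_RATE))
for fps in (24, 25, 30):  # example frame rates (assumption)
    drift = drift_frames_per_second(SOURCE_RATE, PROJECT_RATE, fps)
    print(f"{fps} fps: about {drift:.1f} frames of drift per second")
```

The drift works out to about 2.4 frames per second at 30 fps and about 2.0 at 25 fps.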

Sample rate conversions where the two rates are even integer multiples of each other seem like they would be straightforward and free of introduced distortion - to record in 96k and resample to 48k you'd just drop every alternate sample. Going the other way, you'd just have to add a sample between each original pair of samples that is the average of the two original samples. But where they're not even integer multiples, such as a 44.1k file going into a 48k project, it looks like there is at least the potential for a significant loss of quality from the resampling process itself.
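To put a number on "not even integer multiples", the sketch below reduces the two rates to their lowest-terms ratio and counts how often an output sample instant lands exactly on an input sample. The helper names are hypothetical. It only looks at where the sample instants fall; it says nothing about what filtering a converter applies.

```python
from fractions import Fraction

def conversion_ratio(src_rate, dst_rate):
    """Output samples produced per input sample, in lowest terms."""
    return Fraction(dst_rate, src_rate)

def coinciding_outputs(src_rate, dst_rate, n_out):
    """Count output samples whose instant falls exactly on an input sample."""
    step = Fraction(src_rate, dst_rate)  # input samples advanced per output sample
    return sum(1 for m in range(n_out) if (m * step).denominator == 1)

print("96k -> 48k ratio:", conversion_ratio(96_000, 48_000))
print("44.1k -> 48k ratio:", conversion_ratio(44_100, 48_000))
print("aligned outputs in one second of 48k:",
      coinciding_outputs(44_100, 48_000, 48_000))
```

The 44.1k to 48k ratio reduces to 160/147, so every 147 input samples have to become 160 output samples, and only one output in every 160 sits exactly on an original sample. All the others fall somewhere in between.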

Comments?

Bob Grant

December 24th, 2005, 05:15 PM

I use Sony's Vegas, which by default resamples perfectly. You face the same issues taking audio from CDs and DAT recordings made at 44.1 kHz. None of this should be a problem with modern software. I know with early versions of FCP one did have to have everything on the timeline at the same sample rate, which meant a trip through another app, but I think that's been addressed.
Apart from that, though, a much bigger problem is that without being able to genlock everything on the shoot, things do run at slightly different clock rates. The biggest problem I have is the CD players at live events.
In post I replace the feed from the desk with audio from the CDs, and it can be half a beat out at the end of a track. Again, with Vegas that's not a problem: just line up the waveforms at the start and Ctrl-drag the end to line up the waveforms at the end. Typically less than 0.2% correction is required.
Most recorders these days do support 48 kHz, though; in fact I can't say I've noticed one that doesn't. I usually record at 24-bit 48 kHz; that way I can keep my levels a bit lower so there's no risk of clipping and still have enough headroom to bring the levels up in post.
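As a rough illustration of the end-point stretch described above for the CD tracks, the correction can be worked out from two sync points. This is only a sketch: the function name is made up, and the numbers (a 4 minute track that ends a quarter of a second early, i.e. half a beat at 120 bpm) are hypothetical, not measurements from a real shoot.

```python
def stretch_factor(clip_start, clip_end, ref_start, ref_end):
    """Factor to stretch a clip by so both of its sync points match the reference.

    All arguments are times in seconds: the positions of the same two
    waveform features in the replacement clip and in the reference feed.
    """
    clip_len = clip_end - clip_start
    ref_len = ref_end - ref_start
    if clip_len <= 0 or ref_len <= 0:
        raise ValueError("end points must come after start points")
    return ref_len / clip_len

# Hypothetical example: the CD copy runs 240.00 s between the two sync
# points, the desk feed recorded on the shoot runs 240.25 s.
factor = stretch_factor(0.0, 240.0, 0.0, 240.25)
correction_percent = (factor - 1.0) * 100.0
print(f"stretch by a factor of {factor:.6f} ({correction_percent:.3f}% correction)")
```

With those assumed numbers the correction comes out at roughly 0.1%, which is in line with the "less than 0.2%" figure.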

Steve House

December 25th, 2005, 08:18 AM

Ben De Rydt

December 26th, 2005, 05:58 AM

Happened to wake up this morning thinking about sample rate conversions (the mind works in strange ways) and maintaining sync. ... To increase the 44,100 samples in each second of audio to 48,000, we have to add 3900 samples. But where does the data for those samples come from?

That's the magic of digital signal processing (DSP). A 44100 samples per second audio stream contains all the information necessary to do a perfect reconstruction* of the original 0 to 22050 Hz audio signal. This reconstructed signal can be resampled to 48000 samples per second without losing anything*. Things get trickier if you want to resample to a lower sampling rate, say 22 kHz, because then a low-pass filter needs to be applied before resampling.

So, in theory, it is perfectly feasible to resample a 44100 Hz digital audio stream to a 48000 Hz one losslessly. In practice there is one big problem: the algorithm needs every input sample from -infinity to +infinity for each output sample. And it needs to go through all of them again for the next output sample. This is clearly not workable, so windowed filters are used. These filters limit the number of input samples needed and define the weight given to each. I can't find any documentation about the windows used by popular audio programs for resampling, not even the number of input samples they use.
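To make the windowing idea concrete, here is a minimal sketch in plain Python. It is not how any particular audio program does it. The Hann window and the 16 samples on either side are arbitrary choices of mine for illustration, and the function name is made up. As written it only handles upsampling, because going down in rate would need the low-pass filtering mentioned earlier.

```python
import math

def windowed_sinc_resample(samples, src_rate, dst_rate, half_width=16):
    """Upsample by evaluating a Hann-windowed sinc at each output instant.

    half_width is the number of input samples used on each side of the
    output instant (an illustrative choice, not a recommendation).
    """
    if dst_rate < src_rate:
        raise ValueError("this sketch only handles upsampling")
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for m in range(n_out):
        t = m * src_rate / dst_rate      # output instant in input-sample units
        centre = math.floor(t)
        acc = 0.0
        # Only the nearest 2 * half_width input samples contribute.
        for n in range(centre - half_width + 1, centre + half_width + 1):
            if 0 <= n < len(samples):
                x = t - n                # distance from input sample n
                sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
                window = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
                acc += samples[n] * sinc * window
        out.append(acc)
    return out

# 10 ms of a 1 kHz sine at 44.1k, converted to 48k.
source = [math.sin(2 * math.pi * 1000 * n / 44_100) for n in range(441)]
converted = windowed_sinc_resample(source, 44_100, 48_000)
ideal = [math.sin(2 * math.pi * 1000 * m / 48_000) for m in range(len(converted))]
worst = max(abs(converted[m] - ideal[m]) for m in range(40, 440))
print(len(converted), "output samples, worst mid-clip error", worst)
```

The window trades accuracy for a finite amount of work: a larger half_width brings the result closer to the ideal reconstruction at the cost of more multiplications per output sample. The error is measured away from the clip edges because the samples that would lie before the start and after the end are simply treated as missing.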

* aside from quantisation errors. All frequencies will be accurately represented, but there might be some error in their respective levels due to the 16-bit digitisation process. There will be some rounding errors on resampling too.
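The size of that 16-bit rounding error can be sketched as follows. This assumes samples normalised to the range -1.0 to 1.0 and simple rounding to the nearest step with no dither. The scaling by 32767 and the function names are illustrative assumptions, not a description of any particular recorder.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value

def quantise16(x):
    """Round a sample in [-1.0, 1.0] to the nearest 16-bit step."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample out of range")
    return round(x * FULL_SCALE) / FULL_SCALE

def worst_error(values):
    """Largest absolute difference between the values and their 16-bit versions."""
    return max(abs(quantise16(v) - v) for v in values)

values = [math.sin(0.001 * k) for k in range(10_000)]
half_step = 0.5 / FULL_SCALE
print("worst rounding error:", worst_error(values))
print("half a quantisation step:", half_step)
```

The error per sample never exceeds half a quantisation step, which is about 0.0015% of full scale.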