Does the resolution of an audio recorder really matter?


Emre Safak
October 12th, 2007, 07:46 AM
After buying a 24/96-capable Marantz recorder I just had a thought. Does the resolution on these things even matter when the SNR is limited by the pre-amp's and mic's internal noise? How low does the resolution need to be before it becomes the bottleneck, say with a good mic in the field?

Wayne Brissette
October 12th, 2007, 08:38 AM
Emre:

I think ultimately it depends on what type of work you are doing. I do everything from docs to orchestras, and do it all with a single recorder. Prior to the Zaxcom Deva, I carried around a mixer, DAT recorder, external A/D converters, all sorts of things. Basically it was a mess. For voice/docs I could get away with the 16-bit limitation of the DAT recorder, but for music, I used an external A/D which captured a 24-bit signal and dithered it, so it ended up as a 16-bit signal before going to the DAT recorder. Some of those recordings have stood up quite well to the all 24-bit recordings I'm doing today.

If you A/B a pure 16-bit recording with a 24-bit recording done at the same time, under identical conditions, you'll be able to hear a definite difference. Where things get fuzzy is the sampling rate. Going from 44.1 kHz to 48 kHz or 96 kHz doesn't always buy you a better recording. In fact several years ago, I took part in a blind test, where several people who claimed they could tell the difference were given identical material recorded at 16 and 24 bits at the same time. That change was pretty easy to detect, but few (I don't think anybody actually) could consistently pick out the 24-bit recordings when only the sampling rate changed. So, certainly there is some validity to the SNR getting better on new equipment and making it less of a point to record at a higher sampling rate. That said, I really would enjoy doing the test again with Sanken's 100 kHz microphone. I don't know if it really would make any difference, but it would be a nice test to take part in anyhow.

Wayne

Emre Safak
October 12th, 2007, 08:56 AM
That's another thing. I totally forgot that most mics top out at 20 kHz... making high sampling rates seemingly all the more ridiculous. Is it more about giving yourself more leeway in the DAW?

Mike Peter Reed
October 12th, 2007, 09:08 AM
I've yet to do my unscientific testing on my recently acquired 702T. Previously I used a Fostex FR2 and could swear there was a difference between recording (of all things) dialogue at 48K and 96K (both at 24-bit). Really really marginal, but somehow the 96K recordings sounded more "real" on playback. Now I've never recorded 192K except for sound effects that might get stretched around, but 96K became my standard just based on the FR2 experience, downsampling to 48K if production requested. Must try the same again on the 702T because I've also read somewhere that certain A/D convertors work better at higher sampling rates. So, if the 702T has far superior A/D than the FR2 then recording at 48K may sound as good as recording 96K on the FR2. If you see what I mean ....

Steve House
October 12th, 2007, 09:41 AM
That's another thing. I totally forgot that most mics top out at 20 kHz... making high sampling rates seemingly all the more ridiculous. Is it more about giving yourself more leeway in the DAW?

If you are mixing digitally the noise floor increases as additional sources are added into the mix. This increased noise floor is the equivalent of a recording at lower bit depth. Losing the equivalent of (just for example) 4 bits due to this effect means that when mixing several 16-bit sources you end up with the equivalent of a 12-bit S/N ratio while mixing the same number of 24-bit sources still leaves you with the equivalent of a 20-bit recording. While the difference between a 16-bit recording and a 24-bit recording may not be audible, the difference between a 12-bit and 20-bit final mix is dramatic. Holman discusses this effect more in-depth in his text "Sound for Film and Video."
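To put rough numbers on that effect, here is a minimal sketch. It assumes every source contributes equal, uncorrelated noise (so the noise powers simply add) and uses the rule of thumb of about 6.02 dB per bit. The function names are made up for illustration and are not taken from Holman's text.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def noise_floor_rise_db(num_sources):
    """Rise in noise floor when summing equal, uncorrelated noise sources."""
    return 10 * math.log10(num_sources)


def equivalent_bits_lost(num_sources):
    """Express the same rise as a loss of resolution in bits."""
    return noise_floor_rise_db(num_sources) / DB_PER_BIT


for n in (2, 4, 16, 64):
    print(f"{n:3d} sources: +{noise_floor_rise_db(n):.1f} dB, "
          f"about {equivalent_bits_lost(n):.1f} bits")
```

Under these assumptions each quadrupling of the number of sources costs roughly one bit of signal-to-noise ratio, whatever the bit depth of the sources.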

Emre Safak
October 12th, 2007, 09:48 AM
... but doesn't the DAW process the audio internally at a higher resolution, like 32-bit float? As long as you acquire with a high enough resolution to exceed your equipment's SNR, shouldn't you be fine?
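As a small check on the 32-bit float point, the sketch below round-trips a few 24-bit sample values through IEEE 754 single precision using only the standard library. It assumes the DAW normalizes samples to the range -1.0 to 1.0 by dividing by 2**23, which is a common convention but an assumption here; `to_float32` is a helper invented for this example.

```python
import struct

FULL_SCALE_24 = 2 ** 23  # 24-bit signed samples span -2**23 .. 2**23 - 1


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A few 24-bit sample values, including both extremes and the smallest steps.
samples = [-(2 ** 23), -1, 0, 1, 1234567, 2 ** 23 - 1]

for s in samples:
    normalized = s / FULL_SCALE_24                 # scale to -1.0 .. 1.0
    restored = to_float32(normalized) * FULL_SCALE_24
    print(s, restored == s)                        # True for every sample
```

Single precision carries a 24-bit significand, so every 24-bit integer sample survives the conversion exactly; any loss comes from later arithmetic, not from loading the audio.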

Marco Leavitt
October 12th, 2007, 09:55 AM
There seems to be consensus among sound people that 24 bit recording is worth it if your recorder is high enough quality to really take advantage of it (most aren't), but 48K is plenty for dialog, and 96K isn't worth the extra storage space. I would think you would want to use a higher sampling rate for orchestras and things like that though.

Martin Pauly
October 12th, 2007, 11:06 AM
I totally forgot that most mics top out at 20 kHz... making high sampling rates seemingly all the more ridiculous.

I have heard/read this argument before, and while my ears agree based on the recording work that I have done, I still wonder if there isn't an advantage in describing the waveform of, say, a 20 kHz note with more detail/precision?

Taking it to an extreme, assume a 24kHz tone recorded with 48kHz sampling rate, how could one distinguish between a pure sine wave and other possible shapes of this note? Is the difference not important because our ears can't tell them apart?

I am not an expert in this area; this is just something that I've been thinking about.

- Martin

Mike Teutsch
October 12th, 2007, 11:13 AM
That's another thing. I totally forgot that most mics top out at 20 kHz... making high sampling rates seemingly all the more ridiculous. Is it more about giving yourself more leeway in the DAW?

I'm no expert, but I think you are confusing the mic's maximum frequency response with the recorder sampling rate. Very different animals.

The mic's output is linear (analog), and the digital recorder samples it at different rates, as set up by you. The more often it is sampled, as in 12 vs 96, the more natural the sound or the more information you have to work with.

Mike

Emre Safak
October 12th, 2007, 01:22 PM
I'm no expert, but I think you are confusing the mic's maximum frequency response with the recorder sampling rate. Very different animals.

Not at all; they are closely related through sampling theory. If your mic's frequency response tops out at 20 kHz, sampling the sound at 44.1 kHz or 48 kHz should be enough to capture all the information.

A. J. deLange
October 12th, 2007, 03:27 PM
With respect to bit depth it works this way: if you have n bits to play with, one is taken up specifying whether the sampled voltage is positive or negative. Two more are taken up by headroom, i.e. an A/D presented with a complex (relative to a tone) input will start to overload when the rms input voltage is around 12 dB (approximately; the actual value depends somewhat on the number of bits) below the rail (the rail is the highest DC voltage which the A/D can represent). The quantizing noise is about 2 bits (10.8 dB) below the least significant bit's power. If you set the noise floor of the source (mic/preamp combination) at the level of the LSB the quantizing noise is then insignificant relative to the source system self noise (the desired condition). You wouldn't want to record within 10 or so dB of the mic/preamp's noise floor so you set things up (mic type, location, preamp gain...) so that the softest sound you record is say 12 dB (2 bits) above the LSB. Thus any level that is more than 2 bits above the LSB and more than 2 bits below the rail is at least 10 dB above the input system (mic/preamp) self noise and below the distortion level. You have, therefore, n - 1 - 2 - 2 = n - 5 bits of dynamic range which is 6*(n-5) dB and, if you know the dynamic range of the sounds to be recorded, you can figure out the required n. If 6*(n-5) is greater than the dynamic range of the source, adding extra bits doesn't do a thing for you. This is, apparently, a hard concept to grasp but nevertheless true. Another thing to consider is that a 24 bit A/D converter is most unlikely to have 6*(24 - 16) dB more dynamic range than a 16 bit A/D converter. We speak, in such cases, of "effective bits". The number of effective bits is the dynamic range divided by 6 plus 5 (with dynamic range defined the way I've defined it here, which is somewhat arbitrary). In any case if A/D converter 1 has 9 dB better dynamic range than A/D converter 2 it has 1.5 more effective bits.
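The bookkeeping in that paragraph can be written out as a short sketch. It uses the same round figure of 6 dB per bit and the same (admittedly arbitrary) definition of dynamic range as the paragraph; the function names are only illustrative.

```python
def usable_dynamic_range_db(n_bits):
    """n - 1 (sign) - 2 (headroom) - 2 (margin above the LSB), at 6 dB per bit."""
    return 6 * (n_bits - 5)


def effective_bits(dynamic_range_db):
    """Invert the relation: dynamic range divided by 6, plus 5."""
    return dynamic_range_db / 6 + 5


print(usable_dynamic_range_db(16))               # 66
print(usable_dynamic_range_db(24))               # 114
print(effective_bits(75) - effective_bits(66))   # 1.5 (a 9 dB difference)
```

The second function also gives the required n for a source of known dynamic range: under this definition a 90 dB source would need 90 / 6 + 5 = 20 bits.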

When it comes to processing consider multiplying 10.5 by 7.1 = 74.55. If we are only allowed 1 decimal place we must truncate to 74.5 or round to 74.6. In either case the error is less than the least significant digit. It is much the same with binary arithmetic. The number of bits lost is more like one or 2 than 4 or 5 if things are done with care. The proper approach is to carry as many bits as you can until the last minute and then round. Floating point does help with this.
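The decimal example can be reproduced exactly with Python's `decimal` module, which avoids binary floating-point surprises. This is only a sketch of the rounding arithmetic, not of how any particular DAW handles word length.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

exact = Decimal("10.5") * Decimal("7.1")   # full-precision product
one_place = Decimal("0.1")                 # we may keep one decimal place

truncated = exact.quantize(one_place, rounding=ROUND_DOWN)
rounded = exact.quantize(one_place, rounding=ROUND_HALF_UP)

print(exact, truncated, rounded)           # 74.55 74.5 74.6
print(abs(exact - truncated), abs(exact - rounded))
```

Both errors come out at 0.05, half of the least significant digit that is kept.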

When it comes to rate, intuitive as it may seem that more samples per cycle would convey more information than 2, this is not the case. Two samples of the highest frequency to be conveyed are sufficient. This is a theoretically correct statement but there are practical considerations. The input signal must be "strictly band limited" which is also a theoretical condition that cannot be met in practice but which can be approximated sufficiently closely. The problem lies in the analog filter which must precede the A/D converter. It is very hard to build an analog filter which doesn't attenuate up to 20 kHz, has flat delay and amplitude response up to that frequency and yet rejects signals strongly at 22.05 kHz (half of 44.1 kHz) and above. It's a little easier to do this for a sampling rate of 48 kHz (needs to attenuate strongly above 24 kHz) and easier still with 96 and 192 kHz rates. What is easy to do is come up with a relatively simple filter which cuts (starts to roll off) at 20 kHz and is way down at 96 kHz, A/D convert, and then use a linear phase (FIR) filter to low pass filter to 20 kHz with stop band starting at 22.05, 24, or 48 kHz. The outputs of these digital filters can then be decimated without introduction of aliasing. If I had to build an A/D box capable of sampling at 192 kHz this is exactly how I would do the lower sampling rates and I'm guessing that I'm not the only guy in the world that knows a little DSP. If I'm right that the manufacturers do it that way then there shouldn't be a detectable difference in the quality of the sound at the higher sample rates (given, of course, that the reconstruction process, to which similar considerations apply, does an equally good job for any sampling rate). The proof of the pudding WRT any particular system is in a double blind triangle test.
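A minimal sketch of the "sample fast, filter digitally, then decimate" approach follows. Everything specific in it is an assumption chosen for illustration: a 96 kHz converter rate, a 127-tap Blackman-windowed sinc filter, a 22 kHz cutoff, and decimation by 2 to 48 kHz. It is not a description of how any real recorder is built.

```python
import math

FS_HIGH = 96_000     # assumed converter rate
NUM_TAPS = 127       # assumed FIR length (odd, so the filter is linear phase)
CUTOFF_HZ = 22_000   # assumed cutoff, between 20 kHz and the 24 kHz limit


def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Linear-phase FIR low-pass: ideal sinc response shaped by a Blackman window."""
    fc = cutoff_hz / fs_hz                  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        x = i - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        phase = 2 * math.pi * i / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]         # normalize to unity gain at DC


def fir_filter(signal, taps):
    """Direct convolution, keeping only fully overlapped output samples."""
    n = len(taps)
    return [sum(taps[k] * signal[i + n - 1 - k] for k in range(n))
            for i in range(len(signal) - n + 1)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def tone(freq_hz, fs_hz, num_samples):
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(num_samples)]


TAPS = windowed_sinc_lowpass(NUM_TAPS, CUTOFF_HZ, FS_HIGH)


def decimated_level_db(freq_hz):
    """Level of a test tone after low-pass filtering and 2:1 decimation."""
    source = tone(freq_hz, FS_HIGH, 2048)
    decimated = fir_filter(source, TAPS)[::2]   # keep every second sample: 48 kHz
    return 20 * math.log10(rms(decimated) / rms(source))


# 1 kHz should pass untouched; 30 kHz would alias to 18 kHz if it were not removed.
for freq in (1_000, 30_000):
    print(f"{freq} Hz: {decimated_level_db(freq):.1f} dB")
```

The design choice worth noting is that the sharp filtering happens in the digital domain, where a long linear-phase filter is cheap, so the analog filter ahead of the converter can stay simple.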

Petri Kaipiainen
October 14th, 2007, 07:37 AM
I'll try to state my view as simply as possible:

24 versus 16 bits: there is a true benefit in using 24 bits for recording, even though 16 bits is more than needed in real life (hardly anybody has systems and listening rooms to use up even 16 bits/96 dB of dynamic range). This is because it gives more safety and headroom in recording and mixing. An example: I took the PA signal from a soundboard in a hotel ballroom, setting the levels from the signal given by the sound person (real signal, not tone). I had no possibility to readjust during the show. To be safe I adjusted the signal to a lowish level on my SD722 using 24 bits. The end result was low in level, max about -25 dBFS. It was perfectly safe to raise levels in post without any noise; the end result was just as good as 16 bits with perfect levels, as the SD722 has an s/n ratio of around 110 dB.
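The arithmetic behind that example can be sketched as follows. It takes the roughly 110 dB s/n figure and the -25 dBFS peak level quoted above at face value, treats 96 dB as the ideal 16-bit range, and ignores the noise of the source itself; the function name is invented for the example.

```python
def noise_below_peaks_db(system_range_db, peak_level_dbfs):
    """Distance from the programme peaks down to the recorder's noise floor."""
    return system_range_db + peak_level_dbfs


SD722_SN_DB = 110      # approximate figure quoted in the post
IDEAL_16_BIT_DB = 96   # the 16 bits / 96 dB figure quoted in the post

print(noise_below_peaks_db(SD722_SN_DB, -25))      # 85
print(noise_below_peaks_db(IDEAL_16_BIT_DB, -25))  # 71
print(noise_below_peaks_db(IDEAL_16_BIT_DB, 0))    # 96
```

So a 24-bit take peaking at -25 dBFS still keeps about 85 dB between peaks and noise, whereas a 16-bit take at the same cautious level would keep only about 71 dB.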

Using higher sampling rates is not useful at all. Waveform "detail" which somebody here mentioned in this thread is nothing more than higher frequencies. Because we can not hear them, mics can not pick them up and speakers can not reproduce them, they are not needed. It is funny some people advocate 24/96 recording when these goals are impossible to achieve at the same time: making a mic which hears the 40-50 kHz range means the membrane is small. A small membrane mic has a bad s/n ratio, less than 70 dB, which is only 12 bits... So it is one or the other, never both at the same time. 96k sampling is useful only with effects, AND only with very special microphones.

Of course using 96 kHz sampling does no harm, it just uses up double the disk space, but nobody should believe he gets more audible quality out of that.
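The point that waveform "detail" is just higher frequencies can be illustrated with a square wave, whose Fourier series contains only odd harmonics with amplitudes 4/(pi*k). The sketch below lists which components of a square wave fall at or below an assumed 20 kHz hearing limit; the names and the limit are illustrative assumptions.

```python
import math

AUDIBLE_LIMIT_HZ = 20_000   # assumed upper limit of hearing


def square_wave_harmonics(fundamental_hz, count):
    """First `count` components of a square wave: odd harmonics k*f, amplitude 4/(pi*k)."""
    return [(k * fundamental_hz, 4 / (math.pi * k)) for k in range(1, 2 * count, 2)]


def audible_components(fundamental_hz, count=5):
    return [(f, a) for f, a in square_wave_harmonics(fundamental_hz, count)
            if f <= AUDIBLE_LIMIT_HZ]


print(audible_components(1_000))    # 1, 3, 5, 7 and 9 kHz all remain
print(audible_components(10_000))   # only the 10 kHz fundamental remains
```

Within a 20 kHz band, a 10 kHz square wave and a 10 kHz sine of the matching amplitude are the same signal: everything that distinguishes their shapes sits at 30 kHz and above.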

Emre Safak
October 14th, 2007, 08:03 AM
Thanks for confirming my suspicions. I'll record at 24/48 from now on and reserve higher sampling rates for mastering.

Mike Teutsch
October 14th, 2007, 08:45 AM
Lots of very interesting info here. Thanks for the education. I'm reading and rereading.

Mike

Seth Bloombaum
October 14th, 2007, 06:56 PM
I'd like to reinforce what Petri wrote - his opinions and practices are shared by me and several pros I know and work with.

For certain, 24-bit is a lifesaver when quieter program comes in, expected or unexpected. It is a great reassurance that when you hear something you want, and realize that it went in at -32 dB, there will be plenty of signal and no audible digital noise floor because you're recording in 24-bit.

Same on 48 kHz - higher sampling rate doesn't get you much if anything.

Martin Pauly
October 14th, 2007, 08:30 PM
Waveform "detail" which somebody here mentioned in this thread is nothing more than higher frequencies.

Ah, now that makes a lot of sense - this was the piece in the puzzle that I was missing. Thanks a lot, Petri!

- Martin

Peter Moretti
October 15th, 2007, 04:35 AM
... You have, therefore, n - 1 - 2 -2 = n-5 bits of dynamic range which is 6*(n-5) dB and, if you know the dynamic range of the sounds to be recorded you can figure out the required n. If 6*(n-5) is greater than the dynamic range of the source adding extra bits doesn't do a thing for you. This is, apparently, a hard concept to grasp but nevertheless true. Another thing to consider is that a 24 bit A/D converter is most unlikely to have 6*(24 - 16) dB more dynamic range than a 16 bit A/D converter.
...

Thank you very much for your detailed explanation. I do have a few questions.

Using your formula, a 16-bit recording should have approximately 66 dB of dynamic range. A 24-bit recording should have 114 dB. That's a 48 dB increase.

But you also state that realizing all of the 24-bit increase is unlikely because of the A/D converter.

1) What is the issue with 24-bit versus 16-bit A/D converters?

2) Is there a "real world" estimate for how much more dynamic range 24-bits actually provides?

3) Do some 24-bit recorders come close to providing the full additional 48 dB? I will be using a Sound Devices 702T which is supposed to have:

A/D Dynamic Range
114 dB, A-weighted bandwidth
110 dB, 20 Hz – 22 kHz bandwidth

While I don't know what "A-weighted bandwidth" is, do these specs mean the SD 702T can provide up to 110 dB of "real world" dynamic range?

Thanks for your help.

Petri Kaipiainen
October 15th, 2007, 05:25 AM
Quite frankly I do not know where A. J. deLange manages to lose his 5 bits worth of dynamic range. It is certain that the theoretical 16*6=96 dB does not hold in real practice, but there were already in the eighties some CDs with real dynamic range of more than 70 dB, and with proper dithering some audio companies have been able to squeeze more than 96 dB out of 16 bits. In a word: 6*(n-5) dB: I do not buy it.

I'll test this with SD722 and a 6dB self noise mic when I get home.

1) modern 24 bit converters are just as good as 16 bit converters, 24 bit recorders have only a 24 bit converter anyway, everything is A/D:ed at 24 bits and downconverted with dither when recording at 16 bits.

2) 8*6=48 dB more. But the reality is that no analog system is quiet enough to utilize this. The best systems have about 120 dB dynamic range, so you get about 30dB more at best. And this depends greatly on the machine, some cheap 16/24 bit recorders are analog limited, 24 bits actually gives almost no quality advantage.

3) SD 7-series recorders are among the best, and even they fall 30 dB short of the theoretical maximum. But no fear: hardly anything in real life has over 140 dB of dynamic range, and NOBODY has systems to reproduce such range (and good for them, as they would be deaf already).

I have a system which can put out about 115 dB peak level across the audible range (75 liter 3.5 way custom made monitors with 500W amp, Genelec 7071 sub), but even in my separate studio/music room the noise floor is about 45 dB: usable dynamic range is under 80 dB, and my conditions are quite ideal for a home setting.

Petri Kaipiainen
October 15th, 2007, 09:41 AM
I made a simple dynamic range test as promised with the SD722 recorder; the mic was an NT1-A, "The World's Quietest Studio Condenser Microphone", with 5 dB of self noise. The room was fairly quiet, the instrument was a hemispherical steel bowl (stolen for the purpose from our German shepherd) struck with a potato masher...

Here are the results from Adobe Audition analyze function (takes were about 15 sec with mostly silence, some spoken notes about the take and 4-5 blows to the bowl 8 inches from the mic).

---------
16 bit take:
                      Left         Right
Min Sample Value:     -30011       -30011
Max Sample Value:     30010        30010
Peak Amplitude:       -.76 dB      -.76 dB
Possibly Clipped:     0            0
DC Offset:            -.002        -.002
Minimum RMS Power:    -94.58 dB    -94.61 dB
Maximum RMS Power:    -4.07 dB     -4.07 dB
Average RMS Power:    -37.39 dB    -37.39 dB
Total RMS Power:      -23.84 dB    -23.84 dB
Actual Bit Depth:     16 Bits      16 Bits

Using RMS Window of 50 ms

24 bit take:
                      Left         Right
Min Sample Value:     -30048       -30048
Max Sample Value:     30048        30048
Peak Amplitude:       -.75 dB      -.75 dB
Possibly Clipped:     0            0
DC Offset:            0            0
Minimum RMS Power:    -93.31 dB    -93.31 dB
Maximum RMS Power:    -3.9 dB      -3.9 dB
Average RMS Power:    -39.78 dB    -39.78 dB
Total RMS Power:      -25.55 dB    -25.55 dB
Actual Bit Depth:     24 Bits      24 Bits

Using RMS Window of 50 ms
----------

What we see from this is that I got a dynamic range of almost 94 dB* with 16 bit sampling, and in this case a 93 dB range with 24 bit sampling. This has of course nothing to do with the sampling depth, but with the room noise floor, which was about 95 dB below the highest blows in this case and varied slightly**. I took the 24 bit take just out of curiosity. Even in this extreme case 24 bits did not give any benefits. Which proves that 16 bits is plenty enough for final release, even if 24 bits is nice to have sometimes when recording.

I think this clearly proves that dynamic ranges of over 90 dB can be easily achieved with 16 bits contrary to what A. J. deLange claims in his post. I still can not understand where he gets his ideas, which clearly do not hold water.
----

*) from absolute peak to lowest RMS value. From absolute peak to absolute low it would have been even more, maybe 95 dB, almost the theoretical maximum.

**) the value differences really have nothing to do with 16/24 differences, but nonstandard test "procedures"...
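The peak and dynamic range figures quoted in this post can be re-derived from the listed sample values. The sketch assumes that both listings report sample values on a 16-bit scale (full scale 32768), which is what the numbers suggest, and defines dynamic range as peak level minus minimum RMS power, as in the footnote.

```python
import math

FULL_SCALE = 32768  # assumed: both listings use a 16-bit sample scale


def peak_dbfs(max_abs_sample):
    """Peak level relative to full scale, in dB."""
    return 20 * math.log10(max_abs_sample / FULL_SCALE)


# Largest absolute sample value and minimum RMS power (left channel) per take.
takes = {
    "16 bit": {"max_abs_sample": 30011, "min_rms_db": -94.58},
    "24 bit": {"max_abs_sample": 30048, "min_rms_db": -93.31},
}

for name, t in takes.items():
    peak = peak_dbfs(t["max_abs_sample"])
    print(f"{name}: peak {peak:.2f} dB, peak to minimum RMS {peak - t['min_rms_db']:.1f} dB")
```

This reproduces the reported peak amplitudes of -.76 dB and -.75 dB, and gives about 93.8 dB and 92.6 dB from peak to the quietest 50 ms window.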

A. J. deLange
October 15th, 2007, 03:43 PM
Lots of questions here. Let's start with the last first and see if we can get a little water into the bowl. The basic ideas are complex enough as to be well beyond what we can reasonably discuss here. Those in search of all the details will have to consult an engineering text which deals with the theory and application of A/D converters. First and foremost we must point out that "dynamic range", the parameter under discussion here, can be, and is, defined in a variety of ways often peculiar to the industry involved. I should point out that my experience is in telecommunications where the best (IMO) definition is a thing called the noise power ratio (NPR). Spur free signal to noise ratio, weighted signal to noise ratio (this is what "A weighting" is about) are others. In any case the concept is the same but the numbers may vary somewhat depending on the definition. For example, if we define dynamic range to be the maximum power level which the device can represent relative to the minimum then the dynamic range is 6*(n-1) dB (one bit gone for the sign). In a 16 bit A/D with the LSB encoding 1 mV the largest (magnitude) signal encodable (the "rail") is -2^15 = -32768 with power 6*15 = 90 dB above the LSB. If we define the dynamic range in terms of the ratio of the rail to the rms quantizing noise level the dynamic range becomes 90 + 10.8 = 100.8 dB.
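For readers wondering where the 10.8 dB comes from, here is a short sketch. It relies on the usual textbook assumption, not stated in the post, that the quantizing error is uniformly distributed over plus or minus half an LSB, which gives an rms value of LSB/sqrt(12).

```python
import math


def rail_to_lsb_db(n_bits):
    """Ratio of the largest encodable magnitude (2**(n-1) counts) to one LSB."""
    return 20 * math.log10(2 ** (n_bits - 1))


# rms of an error uniformly distributed over +/- 0.5 LSB is LSB / sqrt(12)
QUANT_NOISE_BELOW_LSB_DB = 20 * math.log10(math.sqrt(12))

print(f"{QUANT_NOISE_BELOW_LSB_DB:.2f} dB")                          # 10.79 dB
print(f"{rail_to_lsb_db(16):.2f} dB")                                # 90.31 dB
print(f"{rail_to_lsb_db(16) + QUANT_NOISE_BELOW_LSB_DB:.2f} dB")     # 101.10 dB
```

The small differences from 90 and 100.8 dB arise only because the post rounds one bit to 6 dB instead of 6.02 dB.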

These are valid definitions for dynamic range but not terribly useful ones. Better ones are usually motivated by questions like "What are the maximum and minimum useful signal levels my system can handle". The key here lies in the definition of "useful". It is usually defined in terms of signal to noise plus distortion ratio. A handy definition for useful is that the system should not degrade the sensor's performance by more than about 0.4 dB. This is handy because it requires that the system noise and distortion must be 10 dB below the signal power. This is the basis for setting sensor self noise at the level of the LSB. Quantizing noise is then 10.8 dB down on the sensor self noise and the sensor's signal to self noise ratio is degraded by only 0.4 dB by the fact of A/D conversion, i.e. practically speaking, the A/D conversion has no effect on the quality of weak (quiet) signals. If a particular set of circumstances allows the sensor SNR to be degraded by more or less than 0.4 dB then the setting of the LSB relative to the sensor noise floor can be adjusted to accommodate this requirement.
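The 0.4 dB figure follows from adding two uncorrelated noise powers. A minimal sketch, assuming the sensor noise and the added noise are uncorrelated so that their powers sum:

```python
import math


def snr_degradation_db(added_noise_below_db):
    """Loss in SNR when uncorrelated noise this many dB below the existing noise is added."""
    return 10 * math.log10(1 + 10 ** (-added_noise_below_db / 10))


print(f"{snr_degradation_db(10):.2f} dB")     # 0.41 dB
print(f"{snr_degradation_db(10.8):.2f} dB")   # 0.35 dB
print(f"{snr_degradation_db(0):.2f} dB")      # 3.01 dB, equal noise powers
```

The same function shows how quickly the penalty grows if the added noise is allowed to approach the sensor's own noise floor.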

At the loud end of the dynamic range the bad guy is distortion and again the question is as to how much distortion can be tolerated. One definition is that the distortion power should not exceed the self noise power of the sensor. Now note that the distortion can come from the sensor itself or from the A/D converter. The goal in system design is to choose an A/D converter whose dynamic range is greater than that of the sensor, i.e. one whose quantizing noise can be set lower than the sensor's self noise but which does not overload at voltages appreciably higher than the output voltage at which the sensor overloads. To borrow from the r.f. engineer, just as the quantizing noise should be 10 dB or more below the self noise the "IP3" should be 10 dB or more above the "IP3" of the microphone. This leads to yet another definition of dynamic range: 2/3 the difference between IP3 and self noise, but I have never seen this definition applied to audio equipment. But we are interested in the A/D itself. The peak voltage in a sine wave is sqrt(2) times (3 dB) greater than the rms voltage. Thus if a sine wave is applied to an A/D converter with rms power a bit more than 3 dB below the rail it should not clip. This leads to yet another set of possible dynamic range definitions: max distortion free sine wave power to LSB or to quantizing noise power. With sine loading the quantizing noise is not noise but a series of tones so this isn't a particularly (IMO) useful definition. This brings in "dithering" which is the addition of noise at the input whose function is to decorrelate the quantizing "noise" (which spreads the quantizing error tones into more noise like waveforms) at the cost of slightly reduced dynamic range. There are several ways to do this and we're now getting really esoteric.
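The decorrelating effect of dither can be seen in a small simulation. This is only a sketch of one common scheme, triangular (TPDF) dither spanning plus or minus one LSB, applied to a sine of 0.4 LSB amplitude; the signal, the seed and the helper names are all choices made for the example.

```python
import math
import random


def quantize(x):
    """Round to the nearest integer count (1 LSB = 1.0)."""
    return round(x)


def error_signal_correlation(signal, quantized):
    """Normalized correlation between the quantizing error and the input."""
    errors = [q - s for s, q in zip(signal, quantized)]
    num = sum(e * s for e, s in zip(errors, signal))
    den = math.sqrt(sum(e * e for e in errors) * sum(s * s for s in signal))
    return num / den


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [0.4 * math.sin(2 * math.pi * i / 50) for i in range(5000)]

plain = [quantize(s) for s in signal]
# The difference of two uniform variates has a triangular PDF over (-1, 1) LSB.
dithered = [quantize(s + rng.random() - rng.random()) for s in signal]

print(any(plain))                                   # False: the tone vanished
print(error_signal_correlation(signal, plain))      # essentially -1
print(error_signal_correlation(signal, dithered))   # close to 0
```

Without dither the error is the signal itself with the sign flipped, which is as correlated as it can be; with dither the error behaves like noise that is unrelated to the input, at the price of a higher noise floor.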

Real signals are seldom sine waves but rather more complex and the relationship between peak and rms voltages is random. In general if the signal is the sum of several sources (band, orchestra, outdoor noises etc - but notice that I have left speech off the list) the voltage is distributed as a gaussian random variable (bell shaped histogram). Extensive modeling of A/D converters has been done for gaussian loads and this has shown that the output noise plus distortion is dominated by quantizing noise until the load approaches about 13 dB below the rail at which point overload noise rapidly takes over. Thus for gaussian loads one must not load the A/D above that level (7335 counts in a 16 bit A/D). If one does, distortion greater than the quantizing noise will be incurred, violating our rule that the A/D conversion should not degrade the sensor SNR by more than 0.4 dB or so. Note that distortion power increases by at least 3 dB for each dB increase in overload.
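The 7335-count figure, and a feel for why about 13 dB of back-off is needed, can be obtained with a few lines. The sketch assumes an ideal zero-mean gaussian signal and only estimates how often a sample would exceed the rail; it does not model the resulting distortion power.

```python
import math


def counts_below_rail(n_bits, db_below_rail):
    """rms level, in counts, of a load this many dB below the rail."""
    rail = 2 ** (n_bits - 1)
    return rail * 10 ** (-db_below_rail / 20)


def clip_probability(db_below_rail):
    """Chance that one gaussian sample exceeds the rail in magnitude."""
    sigmas = 10 ** (db_below_rail / 20)     # rail expressed in units of rms
    return math.erfc(sigmas / math.sqrt(2))


print(int(counts_below_rail(16, 13)))       # 7335
for backoff_db in (3, 6, 13):
    print(backoff_db, f"{clip_probability(backoff_db):.2e}")
```

At 13 dB of back-off the rail sits roughly 4.5 standard deviations out, so only a few samples per million reach it; at the 3 dB that suffices for a sine wave, a gaussian signal would clip on a substantial fraction of its samples.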

This brings us to Petri's numbers. An rms load of 4 dB down on the rail in a 50 ms window may or may not represent overload depending on the rapidity with which the transient dies out and the purity of the tone. If the dog dish rings like fine crystal then the signal is close to sinusoidal and so no overload. My dogs' (Leonbergers) dishes clunk and so while the signal probably isn't truly gaussian it certainly isn't sinusoidal either. And this brings up the point as to whether the dynamic range is based on an rms or peak value (it could be peak signal to rms noise for example). So before this gets totally out of hand here are some definitions of dynamic range which could be applied to a 16 bit A/D:

Max instantaneous signal (rail) to LSB:               15*6 = 90 dB                 n - 1 bits
Max instantaneous signal to rms quantizing noise:     15*6 + 10.8 = 100.8 dB       n + 0.8 bits
Max rms sine wave to rms quantizing noise:            15*6 + 10.8 - 3 = 97.8 dB    n + 0.3 bits
Max rms gaussian to rms quantizing noise (NPR):       15*6 + 10.8 - 13 = 87.8 dB   n - 1.4 bits
Max rms gaussian to 11 dB above mic noise at LSB:     15*6 - 13 - 11 = 66 dB       n - 5 bits
Max instantaneous to 11 dB above mic noise at LSB:    15*6 - 11 = 79 dB            n - 2.8 bits

The last 2 values represent (to me anyway) reasonable approximations to a good system load and are the source of the 6*(n-5) dB number from yesterday. Here I define the bottom end of the dynamic range as a signal 10 dB above the mic noise and the top end as the maximum distortion free gaussian rms (66 dB) or absolute instantaneous peak (79 dB) load.
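The definitions listed above can be evaluated for any word length with a short sketch. It simply evaluates each expression as written, using the post's round figures (6 dB per bit, 10.8 dB, 3 dB, 13 dB and an 11 dB margin); the dictionary keys are paraphrased labels.

```python
DB_PER_BIT = 6             # round figure used throughout the post
QUANT_NOISE_DB = 10.8      # quantizing noise below the LSB
SINE_CREST_DB = 3          # peak to rms of a sine wave
GAUSSIAN_BACKOFF_DB = 13   # loading limit for gaussian signals
MARGIN_DB = 11             # softest useful signal above mic noise at the LSB


def dynamic_range_definitions(n_bits):
    rail_to_lsb = DB_PER_BIT * (n_bits - 1)
    return {
        "rail to LSB": rail_to_lsb,
        "rail to rms quantizing noise": rail_to_lsb + QUANT_NOISE_DB,
        "rms sine to rms quantizing noise": rail_to_lsb + QUANT_NOISE_DB - SINE_CREST_DB,
        "rms gaussian to rms quantizing noise": rail_to_lsb + QUANT_NOISE_DB - GAUSSIAN_BACKOFF_DB,
        "rms gaussian to margin above mic noise": rail_to_lsb - GAUSSIAN_BACKOFF_DB - MARGIN_DB,
        "rail to margin above mic noise": rail_to_lsb - MARGIN_DB,
    }


for n in (16, 24):
    print(f"{n} bit:")
    for name, db in dynamic_range_definitions(n).items():
        print(f"  {name}: {db:.1f} dB ({db / DB_PER_BIT:.1f} bits)")
```

For 24 bits every figure is simply 48 dB higher, which is the ideal case; whether real converters deliver that is a separate question.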

Back to "A weighting". This is another definition of dynamic range based on signal to quantizing + overload noise ratio with the signal having a particular spectral distribution (there is also C weighting) derived from the spectrum of speech.

There are several reasons why adding a bit to an A/D doesn't always buy 6 dB. Among them are that the dynamic range of the analogue hardware in the A/D converter itself comes into play. Another factor is that when the quantizing noise of the A/D gets very low "phase noise" in the sample clock begins to become appreciable (the samples are not taken at precisely spaced intervals). This is not a problem in the A/D component in itself but rather the circuitry which drives it. Clocks have to be very good.
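To give a feel for how good the clock has to be, the sketch below uses the standard textbook relation for the SNR limit set by random sampling-clock jitter on a full-scale sine wave, SNR = -20*log10(2*pi*f*tj). The relation and the example numbers are brought in for illustration and are not figures from this thread.

```python
import math


def jitter_limited_snr_db(signal_hz, rms_jitter_s):
    """Best achievable SNR for a full-scale sine given rms clock jitter."""
    return -20 * math.log10(2 * math.pi * signal_hz * rms_jitter_s)


def max_jitter_for_snr_s(signal_hz, snr_db):
    """Largest rms jitter that still allows the requested SNR."""
    return 10 ** (-snr_db / 20) / (2 * math.pi * signal_hz)


print(f"{jitter_limited_snr_db(20_000, 1e-9):.1f} dB")           # 78.0 dB with 1 ns jitter
print(f"{max_jitter_for_snr_s(20_000, 114) * 1e12:.1f} ps")      # jitter allowed for 114 dB
```

With these assumptions a 20 kHz tone needs clock jitter in the region of tens of picoseconds to support about 114 dB, which illustrates why the clock circuitry matters more as the quantizing noise drops.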

Just to be clear: I am by no means saying that 16 bits is good enough. Even the committee which defined the CD years back recognized that. With the technology of the day it was a reasonable engineering compromise. Certainly, even though 24 bits may not grant 48 dB more dynamic range (and no, I don't know how much is real but I'd love to find out - my favorite test, the NPR, is practically speaking impossible, AFAIK, for A/D's of that depth - I've generally found an n bit A/D to have n-1.5 to n-2 effective bits but these are for A/D's that clock much faster than audio A/Ds) it is certainly appreciably better than 16. This is provided that one does not make the number one mistake in loading A/D's which is setting the sensor self noise higher than the LSB so that it's "toggling a reasonable number of bits". This gains nothing and throws away dynamic range at the top.

Wow!

Ty Ford
October 15th, 2007, 08:44 PM
Good 24/48 is better than bad 24/48.
Good 24/48 may be better than bad 24/96.

Regards,

Ty Ford

Dan Brockett
October 15th, 2007, 11:07 PM
Hi all:

Good thread with a lot of knowledge imparted. My .02 worth from 10 years selling consumer and professional audio gear is that of course, a higher bit rate is more desirable, if the quality of the A/D and D/A convertors is good.

Most acoustic musicians can speak, at least in rudimentary terms, about the benefits of overtones that occur at exponential frequencies at much higher frequencies than can be heard by human ears. Many people feel that this is the main factor that makes a Stradivarius a Stradivarius and what makes a fine piano a fine piano. All other things being equal, it makes sense to me that overtones, occurring at musical intervals, add depth and richness and DO affect the frequencies that we can hear. Sympathetic overtones are an important part of high quality audio.

Whether or not overtones, the extra frequency content that occurs at musical intervals, relate directly to extended sample rates is highly debatable. Most proponents of formats like SACD are champions of the sound because they say that the formats are much smoother, more linear and more "analog sounding" than CDs or 48 kHz sampled sources. After years of working with the relatively bad audio sound quality of Beta SP, I think that it is debatable also as to whether most of your audience can tell any difference between high quality sound versus low quality/resolution sound. Most of the cars I drive today sound hilarious when I look at the CD player/radio and realize that the "boom sizzle" coming from the speakers is coming out while the tone controls are set to the middle of the dial, a supposedly neutral setting (no cut/no boost usually). With the popularity of iPods, cheap computer speakers, awful sounding car stereos, etc., I believe that the average consumer has almost no ear for audio quality so pursuing sound for picture to a 24/96 level is mostly underappreciated by the audiences.

Unless the project is shown in a theatrical setting with a high quality sound system, I feel that 16/48 is perfectly adequate for dialog-based material that most of your audience will hear on a badly set up, highly inaccurate EQ'd home theater system.

Personally, I feel that the era of the audiophile is over anyway. The CD and iPod prove that convenience trumps sound quality for 99% of people these days.

Dan

Mike Peter Reed
October 16th, 2007, 04:49 AM
The way I am reading all this ... there is a very very minor advantage to recording at 96kHz because even if your mic cannot resolve above 20kHz, it will be subject to zero attenuation at the top end. But - this is dependent upon whether the recorder reconstructs lower sampling rates from its highest, and how well it does it. If it reconstructs well, 48kHz should be absolutely indistinguishable from 96kHz. If it reconstructs badly (eg Fostex FR2?) then best record with the higher rate to begin with if you are looking for the best reproduction.

As to 96kHz taking up twice as much space - that's really not an issue in the 21st century where CF is available at 64GB and counting. (and 8GB is pretty cheap).
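The storage cost is easy to estimate. The sketch assumes uncompressed stereo PCM, ignores file headers and file system overhead, and counts a gigabyte as 10**9 bytes; the function names are illustrative.

```python
def bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bit_depth // 8) * channels


def hours_on_card(card_gb, sample_rate_hz, bit_depth, channels=2):
    """Recording time that fits on a card of the given size (1 GB = 10**9 bytes)."""
    return card_gb * 1e9 / bytes_per_second(sample_rate_hz, bit_depth, channels) / 3600


print(bytes_per_second(96_000, 24, 2))            # 576000 bytes per second
print(f"{hours_on_card(64, 96_000, 24):.1f} h")   # 30.9 h on a 64 GB card
print(f"{hours_on_card(64, 48_000, 24):.1f} h")   # 61.7 h on a 64 GB card
print(f"{hours_on_card(8, 96_000, 24):.1f} h")    # 3.9 h on an 8 GB card
```

Doubling the sample rate halves the recording time, but even at 24/96 a 64 GB card holds more than a full day of stereo recording.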

As we see mics that can resolve above 20kHz (eg the Sennheiser 8000 series and no doubt others) then 96kHz (or beyond) could become the common denominator. I am skeptical about using 96kHz for dialogue, but I do it anyway because on the indie shows I work on one minute I'm doing dialogue, the next scene could be somebody playing the cello, then some wild sound fx ... so 96kHz for me is a sweet spot and means I don't hand in recordings of different sample rates each day, or forget to swap sampling rate between scenes.

I'm not advocating recording 96kHz as standard, far from it. I do what works for me, and the production (mainly the editor and his NLE).

The audiophile is dead, long live the audiophile.

Petri Kaipiainen
October 16th, 2007, 05:50 AM
The irony of 24/96 "super hi-fi" is the fact that you can not get both 24 bit dynamics and 40kHz frequency range at the same time. A mic which can resolve frequencies of over 20 kHz must have a small diaphragm, but small diaphragm mics have inherently bad s/n ratios. Even the very best large diaphragm high voltage condensors do not deliver anywhere near 24 bit resolutions. Special mics reaching to 40kHz and beyond are typically at least 20 dB worse (see Sanken and DPA sites).

Even then 24 bits has it's value as level setting safety feature, but 96 kHz sampling really adds NOTHING audible to the recording. But as it harms nobody, use it if it make you happy (or if you are a bat).

A comment on a previous post: those over-20 kHz high frequency components adding something audible to the musical signal, like purists claim: yes, they do make lower frequency interference components, but as those are within the recording system's specifications they get recorded just as they are heard; there is no need to record the higher make-up components.

Final fact: I have not seen any scientifically valid double blind test where test subjects could even tell a 16/44.1 AD-DA conversion from the original high quality analog live signal. Or where they could distinguish 16/48 from 24/96 SACD. I think this proves that 16/44.1 or 48 is a plenty good enough final format for everything. Most of the time (100% actually) it is not the technical specs of the system but raw reality (mic quality, mic placement, acoustics, background noise, reproduction system, listening room acoustics & noise floor) which sets the real limits. Using time and money and intellectual capacity (?) like we do here to worry about 48 kHz sampling not being good enough is a total waste of time...

A. J. deLange
October 16th, 2007, 06:01 AM
It's been intimated a couple of times and said a couple of times here. Let me say it again. The ultimate test is a double blind triangle (ABC) test.
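To judge whether a listener's score in such a test means anything, one can compute how likely that score is from pure guessing. A minimal sketch using the binomial distribution, assuming a three-way forced choice where a guess is right 1/3 of the time (use 0.5 for an ABX-style two-way choice). The function name and the example scores are hypothetical:

```python
from math import comb


def chance_p_value(correct, trials, guess_prob):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    return sum(
        comb(trials, k) * guess_prob ** k * (1 - guess_prob) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical listener scores out of 12 triangle trials.
    for score in (4, 8, 10):
        print(score, "of 12:", chance_p_value(score, 12, 1 / 3))
```

A small value means the score is hard to explain by guessing alone; a value near 1 means the listener did no better than chance.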

In most music we hear today the compression scheme has done lots more damage to the signal than the A/D conversion ever did.

Petri Kaipiainen
October 16th, 2007, 06:57 AM
And we have to remember that DV audio is pristine uncompressed WAV (this is a VIDEO discussion board...) at better than CD quality. And that while HDV vastly improves the picture quality, the audio side is severely compromised by lossy MP2 compression at about 1:5 ratio. Even then it is passable.

It is indeed an irony that while some lone souls advocate super-hifi standards like SACD because they think CD quality is not good enough (I think it is based not on ears but "because it is there" syndrome), the buying public buys not even CDs but MP3 files, audibly inferior to CDs...

And about AD conversion doing something to the signal: it certainly does, but it also certainly makes it possible to record the signal in a vastly superior way, and cheaply, compared to ANY $$$$$$ analog system ever invented. I have no complaints at all, bless you AD converter!

Gints Klimanis
October 16th, 2007, 10:53 AM
... but doesn't the DAW process the audio internally at a higher resolution, like 32-bit float? As long as you acquire with a high enough resolution to exceed your equipment's SNR, shouldn't you be fine?

32-bit floating point is still 24-bit resolution, just with a larger range.
Your point about processing resolution exceeding source resolution is important.
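The "24-bit resolution, larger range" point follows from the IEEE 754 single-precision layout: 24 significant bits (23 stored plus an implicit leading one) and an 8-bit exponent. A minimal sketch that round-trips values through 32-bit float with the standard `struct` module; the helper name is only for illustration:

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


if __name__ == "__main__":
    # Every integer up to 2**24 survives, so 24-bit samples are stored exactly.
    print(to_float32(2 ** 24 - 1) == 2 ** 24 - 1)
    # One step beyond 24 bits of precision is no longer representable.
    print(to_float32(2 ** 24 + 1) == 2 ** 24 + 1)
    # The exponent supplies range: a very quiet value keeps full precision.
    print(to_float32((2 ** 24 - 1) * 2.0 ** -60) == (2 ** 24 - 1) * 2.0 ** -60)
```

So a 24-bit sample fits in a 32-bit float without loss, and the exponent lets intermediate results get much louder or quieter without clipping or losing those 24 bits.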

Mike Peter Reed
October 16th, 2007, 01:28 PM
4:4:4

Because it is there?

Petri Kaipiainen
October 16th, 2007, 01:44 PM
Bad analogy. 4:4:4 preserves something we can see. 96kHz sampling preserves something that is there, but not for us to hear.

rather: 16/48 WAV, because it is there (not MP2 or MP3)...

Peter Moretti
October 16th, 2007, 10:10 PM
Good 24/48 is better than bad 24/48.
Good 24/48 may be better than bad 24/96.

Regards,

Ty Ford

Ty,

I think there is a sentiment being expressed that very good 16/48 ~= 24/48. I don't know if that's true or not.

But to get very good 16/48, it seems to me that you'd be using a 24-bit recorder just set to 16-bits.

So while the "good 16 is all you need" argument might be true, I don't know if it really matters practically. With either option, you'll need a high quality recorder, which will invariably be 24-bit capable.

Ty Ford
October 16th, 2007, 11:12 PM
In every situation I have heard, 24 bit beats 16 bit.

24 bit recorder to get 16 bits? Well, if you don't record at full level you won't get all the bits and most of us don't record at full level. So, in that situation, you'd be using less than 24 bits, even though you HAD it to use.

Bottom line, having a maximum of 24, even if you don't use them is better than having only 16 and not being able to use all 16.

A LOT depends on the converters. A LOT. When I was testing the Sound Devices 744T, I purposely recorded with peaks (PEAKS) at -30 dB, just to test the preamps and A/D converters. I then normalized the audio so I could hear it. While there was some hiss, it was low in level. Not many other recorders will allow you to get away with underrecording by that much.

A fictional test perhaps, but it does say a lot for the converters.
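The arithmetic behind the headroom point: each bit is worth about 6.02 dB, so peaks at -30 dB leave roughly five of the top bits unused. A minimal sketch, assuming the -30 dB figure is relative to digital full scale (dBFS) and ignoring the converter's own noise floor; the function name is only for illustration:

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def bits_unused(peak_dbfs):
    """Top bits left idle when the loudest peak sits at peak_dbfs (<= 0)."""
    return -peak_dbfs / DB_PER_BIT


if __name__ == "__main__":
    idle = bits_unused(-30)
    for word_length in (16, 24):
        print(word_length, "bit word:", round(word_length - idle, 1), "bits in use")
```

With the same -30 dB peaks, a 24-bit word still has about 19 bits in play, while a 16-bit word is down to about 11.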

Regards,

Ty Ford

Glenn Chan
October 17th, 2007, 12:19 AM
1- My Coles notes version of sampling theory:

When you (down)sample a signal, you have to deal with three issues:
A- Imperfect frequency response. e.g. The highest frequencies aren't as loud as they should be. And you don't necessarily get perfect response up to the Nyquist limit... e.g. 48 kHz sampling doesn't necessarily give perfect response up to 24 kHz.
If your sampling rate is sufficiently high, then this isn't really a problem.

B- Aliasing. Weird artifacts if frequencies higher than the Nyquist limit aren't filtered out.

C- Ringing/phase artifacts.
(*Usually not a problem for audio??)

You can get (mostly) rid of two of these problems but not all three at once. So you have to pick which issue you can live with.

To get rid of aliasing, you can either use a bad microphone (one with poor frequency response) and/or you can apply analog and/or digital filtering.
You need to apply some analog filtering before the A-->D converter (and/or assume that the microphone won't produce high frequencies that will alias). Any aliasing that gets into the A-->D converter can't really be gotten rid of.

However, there are some limitations as to analog filtering... there are limits to what kind of frequency response you can get (and cost considerations). The ideal frequency response would be very good response right up to the Nyquist limit, dropping off immediately for frequencies past the Nyquist limit (sometimes called a "brickwall" response).

So your system might implement a mix of analog and digital filtering. With digital filtering, you would oversample the signal... so if you intend to output 48 kHz, you'd sample at some multiple of that rate (e.g. twice / 96 kHz, four times, etc.). Apply digital filtering to get rid of frequencies above 24 kHz (the Nyquist limit of the 48 kHz signal you intended to make), and then downsample the signal to 48 kHz.
In equipment that does this, the A->D is already sampling at 96 kHz or some other rate above 48 kHz. So the higher sampling rate is kind of there already.
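The oversample, filter, downsample chain can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular recorder does it: a 101-tap Hamming-windowed sinc filter with a 20 kHz cutoff, a test signal made of a 1 kHz tone plus an ultrasonic 30 kHz tone, and hypothetical function names throughout.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate_by_2(signal, taps=None):
    """Halve the sample rate; with taps, low-pass first (only kept samples are computed)."""
    if taps is None:
        return signal[::2]  # naive: no anti-alias filter
    return [
        sum(t * signal[i - k] for k, t in enumerate(taps))
        for i in range(len(taps) - 1, len(signal), 2)
    ]


def tone_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of one frequency component (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs_hz) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


FS_IN = 96_000
# 0.1 s of a 1 kHz tone plus a 30 kHz tone, sampled at 96 kHz.
signal = [
    math.sin(2 * math.pi * 1_000 * i / FS_IN) + 0.5 * math.sin(2 * math.pi * 30_000 * i / FS_IN)
    for i in range(9_600)
]
taps = lowpass_taps(101, 20_000, FS_IN)

# 4320 samples is a whole number of cycles of both 1 kHz and 18 kHz at 48 kHz.
naive = decimate_by_2(signal)[:4_320]
filtered = decimate_by_2(signal, taps)[:4_320]

# At 48 kHz the 30 kHz tone folds down to 48 - 30 = 18 kHz.
naive_alias = tone_amplitude(naive, 18_000, 48_000)
filtered_alias = tone_amplitude(filtered, 18_000, 48_000)

if __name__ == "__main__":
    print("18 kHz alias, no filter:  ", naive_alias)
    print("18 kHz alias, with filter:", filtered_alias)
```

Without the filter the 30 kHz tone lands in the audio band as an 18 kHz alias at essentially full strength; with the filter it is strongly attenuated, while the 1 kHz tone passes through nearly untouched.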

2- Anyways I kind of rambled on there. The key point to note is that a reasonable system sampling at 48 kHz won't give perfect frequency response up to 24 kHz... you'll get good performance up to some number lower than that. And performance depends on implementation.

Lots more information here:
http://www.wescottdesign.com/articles/Sampling/sampling.html

3- An oversampled system is not that bad an idea at all since you can avoid both ringing and aliasing artifacts.
Cost-wise, a higher sampling rate does add cost. But to lower the sampling rate, you'd probably have to accept some ringing artifacts (aliasing won't fly), and you need to apply digital filtering to do that (and digital filtering costs money... so there's a balance there).

(I don't engineer this stuff, so the information above may not be correct. There's a bunch of trade-offs to take into consideration. Like I said... it's the Coles notes version.)

- And to echo AJ's comment...
The ultimate test is a double blind triangle (ABC) test.

Peter Moretti
October 20th, 2007, 01:12 AM
Glenn,

Having read Jay Rose's books, I'm familiar with some of the arguments for oversampling. It seems that conventional wisdom is: "96K doesn't really help but it definitely can't hurt, so use it if you want to."

Have you heard or found any noticeable benefits to 96K?

Ben Winter
October 23rd, 2007, 01:21 PM
Same on 48KHz - higher sampling rate doesn't get you much if anything.

Not sure if this has been mentioned already or not, but I just had a discussion with my engineering professor about this. Higher sampling rates reduce the incidence of aliasing when the sound is sampled as a discrete-time sinusoid. While the range is usually from 20Hz to 20kHz, the aliases of the frequencies within this range extend far beyond this range.
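The folding behaviour is easy to compute: a sampled tone is indistinguishable from one at its distance to the nearest multiple of the sample rate. A minimal sketch (the function name is only for illustration), assuming no anti-alias filter in front of the sampler:

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a tone after sampling with no anti-alias filter."""
    nearest_multiple = sample_rate_hz * round(freq_hz / sample_rate_hz)
    return abs(freq_hz - nearest_multiple)


if __name__ == "__main__":
    for rate in (48_000, 96_000):
        print("30 kHz tone sampled at", rate, "Hz appears at", alias_frequency(30_000, rate), "Hz")
```

At 48 kHz a 30 kHz component folds back into the audible band, whereas at 96 kHz it stays at 30 kHz, below the 48 kHz Nyquist limit and out of hearing range.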

Seth Bloombaum
October 23rd, 2007, 03:40 PM
There is the theory side of the discussion, and there is the "this is what my ears tell me when listening to reference monitors" side of the discussion.

My comments are purely on the application side, this is what my ears tell me and what other working sound engineers have mentioned to me. 24/48 acquisition* is the sweet spot for many who make their living at this.

*with good mics, good placement, good A/D converters, proper gain structures and a hundred other things that go into making a good recording...