384 kbps Mpeg1 layer II audio (HDV) - is equivalent to what?


Shannon Rawls
December 28th, 2005, 02:35 PM
What is HDV audio equivalent to in terms of PCM audio?

I found out that MPEG1/Layer2 audio is in fact 16-bit at a 48kHz sampling frequency (THAT'S GOOD NEWS PEOPLE), which is probably why nobody is having problems with its quality in the real world. Of course it's better to convert it to PCM for editing and tweaking, but for acquisition, it's fine.

However, my question still remains: what is it equivalent to? Because they are both 16-bit/48kHz, does this mean that, hertz for hertz, HDV audio sounds just as good as DV audio in ALL situations (movie dialogue, singing, interviews, musician recording, etc.)?

The reason I ask is that the Canon XL-H1 will also record either 2-channel or 4-channel audio in HDV mode.
In 2-channel mode, it records MPEG1/Layer2 16-bit/48kHz audio on each channel (384kbps transfer rate).
In 4-channel mode, it records MPEG2/Layer2 16-bit/48kHz audio on each channel (192kbps transfer rate).

Now, consider this: to record 4 channels of DV PCM audio, you take a hit in the sampling (where the sound counts and the ears can hear it), because it drops down to only 12-bit/32kHz. When recording 4-channel HDV MPEG audio, however, you don't take that reduction in sampling quality. Each of the 4 channels remains 16-bit/48kHz (only the transfer rate is reduced).

Since I like to investigate how things REALLY work (like 24f/24p) rather than repeat misused terminology and titles, I'd like to know the truth about this HDV MPEG audio. After investigating the specs, I wonder whether 4-channel HDV audio is actually better than 4-channel DV audio when capturing. Is it just as good (sound-quality-wise) as recording in 2-channel HDV mode, since it doesn't take a hit in sampling? Compared with 4-channel 12-bit/32kHz DV PCM audio, is it brighter, with richer highs and deeper lows? Is it louder, with more clarity and crispness? I do realize PCM will hold up better than MPEG after going through a few generations of modifications, but that's not what I'm talking about right now. I'm talking about the MASTER FILE... which is better?

AUDIO RECORDING MASTERS need only reply. *smile*

- ShannonRawls.com

Hse Kha
December 29th, 2005, 09:43 PM
I think the bit rate of 384kbps for just 2 channels results in very good sound quality. It's hard to tell apart from uncompressed PCM.

Remember, most DVDs are only 384kbps or 448kbps, and that is for 5.1 channels!

Douglas Spotted Eagle
December 30th, 2005, 09:08 PM
Quoting Shannon Rawls: "After investigating the specs, I wonder if 4 channel HDV audio is actually better than 4 channel DV audio when capturing. I wonder if it's just as good (sound quality wise) as if you were recording in 2 channel HDV mode since it doesn't take a hit in sampling?"

Technically, no. Practically, yes.
HDV is exceptionally efficient for audio. It's not PCM audio, but it's darn close for *most* things. Will it sound better than 12-bit, 32kHz audio? Yes. Can you see the differences on a scope? Yes. Can you hear the difference? Absolutely.
As you commented, it's about perception, like perceived resolution vs. actual resolution. As for perceived quality, assuming the DACs are good (and Canon's DACs are pretty good given what they are), the HDV audio is quite good. I wouldn't use it for intimate, exceptionally dynamic recordings, but I wouldn't use a DV deck in PCM for those, either.

Steve House
December 31st, 2005, 07:20 AM
As I understand it, the 16-bit depth and 48kHz sample rate apply to the original digital audio stream after conversion from analog. That digital audio then has to be stored as data. It can be stored uncompressed, or it can be compressed to save room and speed up transfer, and that's where MPEG, MP3, and other formats come in. MPEG audio compression is "lossy", meaning that some of the information in the original data that is considered relatively unimportant is discarded. 192kbps throws away more data than 384kbps does, raising the bar for what counts as important. As a result, more of the bits carrying information about subtle detail in the waveform may be thrown away. It's a judgement call whether that would matter for the particular sounds you're recording.
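Steve's point that a lower bit rate discards more detail can be illustrated with a deliberately crude Python sketch. It assumes a hypothetical toy "codec" that simply keeps fewer bits per sample. That is not how MPEG Layer II works (it splits the signal into subbands and allocates bits per band using a psychoacoustic model), but it shows the same trade-off: fewer bits kept, larger error.

```python
import math

def requantize(samples, bits):
    """Snap 16-bit integer samples onto a coarser grid with `bits` bits of precision.

    Toy illustration only: a real MPEG Layer II encoder splits the signal into
    32 subbands and uses a psychoacoustic model to decide how many bits each
    band gets, instead of coarsening the whole waveform uniformly.
    """
    step = 2 ** (16 - bits)  # one coarse step spans this many 16-bit levels
    return [round(s / step) * step for s in samples]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sample lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# 10 ms of a 1 kHz tone at 48 kHz, stored as 16-bit integers.
RATE = 48000
tone = [int(20000 * math.sin(2 * math.pi * 1000 * n / RATE)) for n in range(480)]

errors = {bits: rms_error(tone, requantize(tone, bits)) for bits in (12, 8, 4)}
for bits, err in errors.items():
    print(f"{bits:2d} bits kept -> RMS error {err:8.1f} (in 16-bit levels)")
```

The error for each setting can never exceed half of its step size, so every bit dropped roughly doubles the worst-case error.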

By analogy, a still image can be stored with varying degrees of compression as a BMP bitmap, or as a TIF, GIF, or JPG file. If the image is 1024x768, a single row of pixels in a plain purple area of the screen would be represented by a string of the value FF00FF, FF00FF, FF00FF, FF00FF ... for 1024 repetitions. We could save a lot of space if, instead of writing out the 6,000-plus characters it takes to define the line, we just wrote "1024 x FF00FF" and let the viewing software expand it. Further, if there were some very subtle shading, with purple shading into red so the sequence was FF00FF, FF00FE, FF00FD, etc., we could save even more space by ignoring the difference between pixels that are very close to those on either side. How much of that loss we consider acceptable is up to us: it's a trade-off between file size on the one hand and subtle detail and colour information on the other, and that's why there are so many different image file formats. Audio works the same way.
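Steve's "1024 x FF00FF" idea is run-length encoding. Here is a small Python sketch of it applied to his hypothetical purple row. The `tolerance` parameter is an illustrative addition that mimics his second, lossy step of ignoring near-identical neighbours; it is not part of any particular image format.

```python
def run_length_encode(pixels, tolerance=0):
    """Collapse runs of (nearly) identical pixel values into [count, value] pairs.

    tolerance=0 gives lossless run-length encoding. A positive tolerance lets a
    pixel join the current run when its packed 24-bit value is within
    `tolerance` of the run's first pixel, which throws detail away (lossy).
    Comparing packed RGB integers is a simplification; it only makes sense
    here because the shading changes a single colour channel.
    """
    runs = []
    for p in pixels:
        if runs and abs(p - runs[-1][1]) <= tolerance:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return runs

def run_length_decode(runs):
    """Expand [count, value] pairs back into a flat list of pixels."""
    return [value for count, value in runs for _ in range(count)]

# One 1024-pixel row of solid purple, and one row shading from FF00FF toward red.
solid_row = [0xFF00FF] * 1024
shaded_row = [0xFF00FF - (i // 64) for i in range(1024)]

print(run_length_encode(solid_row))                     # [[1024, 16711935]], i.e. 1024 x FF00FF
print(len(run_length_encode(shaded_row)))               # 16 runs, decodes back exactly
print(len(run_length_encode(shaded_row, tolerance=2)))  # 6 runs, but the shading is flattened
```

The lossless version always decodes to the original row; the tolerant version stores fewer runs at the cost of subtle shading, which is exactly the trade-off Steve describes.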

A. J. deLange
December 31st, 2005, 08:56 AM
What is HDV audio equivalent to? Only you can answer that. I think we all agree that 16 bits sampled 48,000 times per second is sufficient to convey audio in high fidelity, but recording it in that form requires 768,000 bits per second (96,000 bytes) per channel. Now if I type "1kHz1s1V177", meaning that the audio is a 1 kHz tone of duration 1 second and amplitude 1 volt with a starting phase of 177 degrees, I have conveyed in 11 bytes (and with greater accuracy) the same information that takes 96,000 bytes to convey with PCM. I have compressed the audio into a much denser form, and I can do that because of the trivially simple signal I chose as an example, i.e. a constant-amplitude, constant-frequency sinusoid. Real-world sounds are usually more complex than constant sinusoids (though some compression schemes decompose them into sums of sinusoids), so lossless compression of the magnitude in my little example is not a reality.

Schemes which do allow appreciable compression, such as the 2-channel MPEG scheme, in which 4:1 compression is achieved, and the 4-channel system, in which the compression is 8:1, are viable. But how good are they? This ultimately has to be a subjective determination involving a panel of listeners, preferably using triangle testing. In a triangle test the listener is given a box with three buttons, two of which connect the compressed audio to his headphones and one of which connects the uncompressed audio (or vice versa), and is asked to tell which of the three is different and whether it sounds better or worse than the others. If the listeners can't tell the difference (with statistical significance), or if they can but say the compressed audio sounds just as good or better, then the compressor designer has succeeded. If the panel finds degradation, the story is different, and the industry decides whether the degradation is still within the bounds of acceptability. We accept a fair amount of degradation in our cell phones.
Will we accept it in HDV or will we go to dual recording? Only experience will tell.
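The "with statistical significance" part of A. J.'s triangle test can be sketched in Python. This covers only the "which one is different?" half of the test, not the better/worse preference half. A listener who hears no difference still picks the odd one out one time in three, so the panel's score is compared against a binomial distribution with p = 1/3. The 5% threshold and the 30-trial panel below are conventional, hypothetical choices, not figures from the thread.

```python
from math import comb

def triangle_test_p_value(correct, trials):
    """One-sided binomial p-value for a triangle (odd-one-out) listening test.

    A listener who cannot hear any difference still picks the odd sample
    by chance one time in three, so the null hypothesis is p = 1/3.
    Returns the probability of `correct` or more right answers by pure guessing.
    """
    p = 1 / 3
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(correct, trials + 1))

# Hypothetical panel result: 30 trials, 16 correct identifications.
p_value = triangle_test_p_value(16, 30)
print(f"p = {p_value:.4f}; difference heard at the 5% level: {p_value < 0.05}")
```

A small p-value means the panel is doing better than guessing, i.e. the compression is audible to them; a large one means the compressor has passed this particular panel.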

It is generally true that the higher the level of compression, the more "artifacts" will be perceived, so the 2-channel audio is going to sound more natural than the 4-channel audio. It is also generally true that a compression scheme works better for some kinds of sound than for others. The VoIP compressor Speex, for example, gives beautiful voice at 5kBps, and there are compressors which do a very good job on music. The compressor in a video camera has to handle voice, the rustling of the breeze, thunder claps, dogs barking (which the XL-H1 does not do very well, though to be fair the AGC may have been at fault in that case), the babbling of a brook, and the roar of an aircraft taking off. This means that tricks like modeling the human vocal tract (used in speech compressors) aren't available, which only makes the job more difficult.

4:1 compression isn't much. One can usually get 2:1 lossless compression (i.e. the decompressed data is identical to the input data) just by writing the data in Huffman-coded form (Morse code, which gives common letters short codes, works on a similar idea), so I'm guessing the 2-channel mode is going to be acceptable for most purposes. The 4-channel mode I'd be more wary of.
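To make the Huffman idea concrete, here is a Python sketch that builds a Huffman code over raw sample values and reports the average code length. The "quiet passage" is a hypothetical test signal. Real lossless audio coders such as FLAC first predict each sample and then entropy-code the prediction residual (with Rice codes), so plain Huffman coding of raw samples is a simplification; how much it saves depends heavily on the material, and loud, busy audio saves far less than this quiet example.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from
    how often each symbol occurs in `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single symbol still needs 1 bit
        return {s: 1 for s in freq}
    # Heap items: (combined count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)   # the two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def average_bits_per_symbol(symbols):
    """Average Huffman code length, weighted by how often each symbol occurs."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols) / len(symbols)

# Hypothetical quiet passage: 16-bit samples that never exceed +/-100.
quiet = [int(100 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(48000)]
print(f"{average_bits_per_symbol(quiet):.2f} bits/sample instead of 16")
```

Because the quiet signal uses at most 201 distinct values, even a plain 8-bit code would fit it, and Huffman coding can only do as well or better.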

FWIW, JVC says: "The GY-HD100U records CD quality audio at 384Kbps in the MPEG1 Layer 2 format."

Shannon Rawls
December 31st, 2005, 11:46 AM
WOW,

Now that was a BOATLOAD of knowledge. Thank you, three awesome guys. That was very helpful.

As a result of your posts, I'm going to go looking for the differences in the MPEG schemes...

2 Channel HDV Audio = MPEG1
4 Channel HDV Audio = MPEG2
(both are layer2)

When I find the difference in those two formats, I'll report back. (unless you already know *smile*)


Alex Filacchione
January 5th, 2006, 10:09 AM
A couple of things...

The overall quality of the sound can depend largely on the codec used to encode the audio. You could encode with, say, a Fraunhofer codec, then encode the same audio at the same bitrate with "Bill's Nifty mp3 encoder" (I just made that up), and the results can be dramatically different in terms of quality. I have heard, for example, 128kbps MP3 files that sound exceptional, and 192kbps ones of the same material that sound awful (less dynamic range, sibilance, etc.).

So technically the rate of a certain file may be equivalent to the rate of some other uncompressed file and of a third compressed file, but that doesn't necessarily tell you what the ultimate sound quality is going to be. It can, however, give you a "ballpark", I guess. So many other things affect the sound quality, like the mics, mic placement, preamps, and even the mic cables (Belden, Mogami, and Canare are the way to go for cables).

As far as editing goes, if at all possible it's best to keep the audio in its compressed form and edit it that way, PROVIDED that the editing software does not decompress and then recompress on every edit (some do, some don't, and I don't know which do and which don't; someone else can answer that). That way you don't have to go through an additional lossy recompression. In other words, if you get the MPEG audio off the camera, it has already been compressed and has lost a certain amount of quality. If you then decode it into PCM (WAV or whatever) format, edit it, and recompress it for final HDV output, you put it through a further stage of quality loss. You might want to try some experiments with your footage and audio to determine whether the loss from recompressing after editing is even noticeable. You might not hear a difference at all, and then again you could hear a big difference.
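Alex's decode, edit, re-encode chain can be sketched with a hypothetical toy lossy "codec" that just snaps samples to a fixed grid. Real MPEG audio coding is far more sophisticated, so the numbers mean nothing in absolute terms; the sketch only shows why a second lossy pass after an edit (here an assumed 0.9 gain change) adds its own error on top of the first.

```python
import math

def toy_encode_decode(samples, step):
    """Hypothetical lossy round trip: snap every sample to a grid of size `step`.
    Stands in for a real codec only in that each pass adds rounding error."""
    return [step * round(s / step) for s in samples]

RATE, STEP, GAIN = 48000, 64.0, 0.9
original = [8000 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(4800)]

gen1 = toy_encode_decode(original, STEP)   # encoded once in the camera
edited = [GAIN * s for s in gen1]          # decoded to PCM, gain change in the editor
gen2 = toy_encode_decode(edited, STEP)     # re-encoded for final HDV output

reference = [GAIN * s for s in original]   # the same edit applied to a pristine source
worst1 = max(abs(a - b) for a, b in zip(gen1, original))
worst2 = max(abs(a - b) for a, b in zip(gen2, reference))
print(f"worst error after 1 encode: {worst1:.1f}, after edit + re-encode: {worst2:.1f}")
```

In this toy, the first pass can be off by at most half a step, while the second generation can be off by that plus the scaled first-pass error. With no edit at all, snapping to the same grid twice changes nothing, which loosely mirrors Alex's proviso about software that does not recompress unedited audio.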

The BEST solution, if at all possible, is to record audio with something other than the camera and have everything synced up with SMPTE timecode. That's a lot of extra hardware and money, though, and is not always practical. I also don't know what difficulties, if any, you would encounter due to the way HDV GOPs are encoded. I don't really deal with HDV, just SD for now, but I thought the info above might be helpful.
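The arithmetic behind lining up a separately recorded track with SMPTE timecode is simple enough to sketch. This Python example assumes non-drop-frame timecode at an integer frame rate (25 fps by default) and 48 kHz audio; the clip start times are hypothetical, and 29.97 fps drop-frame timecode would need extra handling that is not shown.

```python
def timecode_to_samples(timecode, fps=25, sample_rate=48000):
    """Convert non-drop-frame SMPTE timecode "HH:MM:SS:FF" to an audio sample offset.

    Assumes an integer frame rate (e.g. 25 or 30 fps). 29.97 fps drop-frame
    timecode skips frame numbers and would need extra handling, not shown here.
    """
    hh, mm, ss, ff = (int(field) for field in timecode.split(":"))
    if not (0 <= ff < fps and 0 <= ss < 60 and 0 <= mm < 60):
        raise ValueError(f"invalid timecode for {fps} fps: {timecode}")
    total_frames = ((hh * 60 + mm) * 60 + ss) * fps + ff
    return total_frames * sample_rate // fps

# Hypothetical: the camera clip starts at 01:00:10:00, the external recorder at 01:00:00:00.
offset = timecode_to_samples("01:00:10:00") - timecode_to_samples("01:00:00:00")
print(f"trim {offset} samples from the start of the recorder's file")
```

At 25 fps each frame corresponds to exactly 1920 samples of 48 kHz audio (1600 at 30 fps), so a shared timecode pins the external audio to the picture to within a sample.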

Alex F

Joshua Provost
January 5th, 2006, 02:50 PM
2 Channel HDV Audio = MPEG1
4 Channel HDV Audio = MPEG2
(both are layer2)

Here is a FAQ on MPEG audio (http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/) that should be very informative.

In answer to your questions, MPEG-2 Layer-2 is basically the same as MPEG-1 Layer-2, but supports more than two channels (which is why HDV uses MPEG-2 Layer-2 for 4-channel support). There is no difference in terms of efficiency and quality.

The whole process is lossy. 2 channels of MPEG-1 Layer-2 may be acceptable (maybe, read further) at 384kbps, but cramming 4 channels into that same bitrate? Asking for trouble. Overall, the quality will suffer.

The audio is only 16-bit/48kHz in the sense that that is what goes in and what comes out. In between, it is a lossy, perceptually encoded bitstream, and there is no guarantee that the 16-bit/48kHz audio you get out won't be the worse for wear.

MPEG-1 Layer-2 audio is the older version that has been around since VCD and before. MPEG-1 Layer-3 audio is what we all know as MP3. MP3 is more efficient than Layer-2, by about 2:1 or more. So, to put it into a context we can all relate to, 2-channel Layer-2 audio at 384kbps is going to be roughly equivalent to a 192kbps MP3 file. That's pretty good for most people, but I wouldn't say it's good enough for professional work. Personally, I can audibly tell the difference in MP3 bitrates up to 256kbps, and I sometimes hear artifacts at that rate as well.

How does it compare to PCM? Well, 2-channel 16-bit/48kHz PCM is 1536kbps, four times as many bits.
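The bit-rate figures used in this thread (768kbps per channel, 1536kbps for stereo PCM, and A. J.'s 4:1 and 8:1 ratios) follow from one multiplication. This Python sketch assumes, as Joshua and A. J. do, that both HDV modes share a single 384kbps audio stream.

```python
def pcm_bitrate_kbps(bits=16, sample_rate=48000, channels=2):
    """Uncompressed PCM bit rate in kbps: bits/sample x samples/s x channels."""
    return bits * sample_rate * channels / 1000

# Assumption (following the thread): both HDV modes share one 384 kbps audio stream.
HDV_AUDIO_KBPS = 384

for channels in (2, 4):
    pcm = pcm_bitrate_kbps(channels=channels)
    print(f"{channels} ch 16-bit/48 kHz PCM: {pcm:.0f} kbps -> "
          f"{pcm / HDV_AUDIO_KBPS:.0f}:1 compression at {HDV_AUDIO_KBPS} kbps")

# For comparison, DV's uncompressed 4-channel mode at 12-bit/32 kHz:
print(f"DV 4 ch 12-bit/32 kHz PCM: {pcm_bitrate_kbps(12, 32000, 4):.0f} kbps")
```

Interestingly, DV's 4-channel 12-bit/32kHz mode carries the same raw 1536kbps as 2-channel 16-bit/48kHz; it trades bit depth and sample rate for channel count instead of using perceptual coding.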

Generally, I'm disappointed by the low bitrate set aside for audio in the HDV spec. 384kbps would have been good enough if the format was Layer-3, since it is more efficient. I suspect it comes down to licensing, with Layer-3 being more expensive than Layer-2. Or perhaps simply that Layer-2 has been around longer and more software supports it.

Josh

Steve Crisdale
January 6th, 2006, 01:41 AM
Quoting Joshua Provost: "Generally, I'm disappointed by the low bitrate set aside for audio in the HDV spec. 384kbps would have been good enough if the format was Layer-3, since it is more efficient. I suspect it comes down to licensing, with Layer-3 being more expensive than Layer-2. Or perhaps simply that Layer-2 has been around longer and more software supports it."

The audio spec of HDV was set to match broadcast MPEG data streams such as those sent to air by the major networks, just as the video stream was supposed to be designed to match the MPEG2 HD spec.

When played back on a home theatre setup, the audio is indistinguishable from what one gets with any standard terrestrial or satellite broadcast. I have actually discerned greater ambience, with superior spatial characteristics, in HDV audio than in MPEG broadcast streams. Whether that is because broadcasters run another encode on already-compressed programs would be pure conjecture on my part... but it would seem logical to assume that is the case.

Those signatories/parties involved in the original HDV group, including Canon, would have found it difficult to remain part of that consortium if they had "broken" from the agreed specification for HDV.

Panasonic is not a member so they aren't encumbered by obligations to the HDV specification, so they can offer totally different options in video and audio if they desire.

Jeremy M West
January 6th, 2006, 07:34 AM
Quoting Steve Crisdale: "The audio spec of HDV was set to match broadcast MPEG data streams such as those sent to air by the major networks, just as the video stream was supposed to be designed to match the MPEG2 HD spec."

Steve, while this may be true of HDV, our concern is this: if we capture audio in this format to begin with, then take it into post, then re-encode it for transmission, and then possibly have it re-encoded again for broadcast, how much degradation can we expect? We produce a daily syndicated show that airs worldwide, and the multiple generations of rendering have our engineers spooked.

This is a big concern for us and one of the main reasons we are looking at XDCAM HD and DVCPRO HD, since they capture uncompressed PCM. Should we be that concerned?

Joshua Provost
January 6th, 2006, 02:34 PM
Quoting Steve Crisdale:
"The audio spec of HDV was set to match broadcast MPEG data streams such as those sent to air by the major networks, just as the video stream was supposed to be designed to match the MPEG2 HD spec.

"When played back on a home theatre setup, the audio is indistinguishable from what one gets with any standard terrestrial or satellite broadcast. I have actually discerned greater ambience, with superior spatial characteristics, in HDV audio than in MPEG broadcast streams. Whether that is because broadcasters run another encode on already-compressed programs would be pure conjecture on my part... but it would seem logical to assume that is the case.

"Those signatories/parties involved in the original HDV group, including Canon, would have found it difficult to remain part of that consortium if they had "broken" from the agreed specification for HDV.

"Panasonic is not a member so they aren't encumbered by obligations to the HDV specification, so they can offer totally different options in video and audio if they desire."

I didn't say anything about Canon or Panasonic, just the HDV spec.

Isn't the point of acquisition to capture really good quality image and sound, above and beyond what the broadcast spec may be, so that by the time it has gone through numerous post processes and broadcast, it is as high quality as possible? Since when do we only want to capture as good as, but no better than, the final delivery format?

Like I said, disappointing. Uncompressed PCM, of course, isn't subject to recompression at any point during post, only on final delivery.

It's strange that DV25 has 25Mbps video and uncompressed PCM audio, yet HDV has 25Mbps video and didn't have room left for uncompressed PCM.

Steve Crisdale
January 6th, 2006, 07:49 PM
There was a time where responding to queries on these forums and forums like it elsewhere on the net was interesting, stimulating and informative.

A time when developments in videographics were viewed with a sense of voyaging into uncharted territory that held gains for those prepared to use their ingenuity. Even queries from people who thought the new technology might not be "good enough" discussed developments without the negativity that has recently become so evident.

From a personal standpoint, responding on forums is now tedious and counterproductive, due to the tendency of some, whether deliberate or unintentional, to see every word written by anyone else as a personal affront. An answer to a question is an answer to a question, not a personal manifesto with a hidden agenda of personal vilification. In light of such growing paranoia, I will avoid posting a response to anyone anywhere from this point onwards.