View Full Version : using frequency analyzer to aid with sound mix
Josh Bass May 16th, 2010, 12:13 PM I just want to make sure I understand this...
If I take an audio clip of a voice/line of dialogue and run it through a frequency analyzer (e.g. the free one in Audacity), I would look for which frequencies "peak", and those are the ones with the most "energy"? So if putting music/other sound FX under that voice, I should either a) enhance the dialogue at those freqs, or b) dip the music/sound FX at the same ones. Correct?
Also, if it's the same voice recorded with the same mic, will analyzing one tiny sample/line of dialogue give me settings I can use whenever I'm dealing with that voice anywhere in the project? Or does the freq plot change with every new line of dialogue/every word/etc.?
Seth Bloombaum May 16th, 2010, 01:16 PM If I take an audio clip of a voice/line of dialogue and run it through a frequency analyzer (e.g. the free one in Audacity), I would look for which frequencies "peak", and those are the ones with the most "energy"? So if putting music/other sound FX under that voice, I should either a) enhance the dialogue at those freqs, or b) dip the music/sound FX at the same ones. Correct?
This isn't black and white, but, generally, yes.
However, typically you're not going to mess with boosting the vox at those frequencies; more likely you'll wind up just making the vocals sound as good as possible, then using your freq analysis to help guide your EQ of the music.
But the issue is typically intelligibility. You want plenty of it for the dialog, and you may need to make space in the music's EQ/level for it, particularly if the music has vocals. The point of the freq analysis and playing with dialog EQ is to help guide you to the frequencies most important to intelligibility.
Another closely related concern is dynamic range. Typically music is highly compressed; your raw dialog isn't. Even without music, compression/limiting/volume-max filters really help intelligibility, and they become even more important against music.
Once you develop an ear for this kind of efx work and mixing, you'll probably find you don't need the frequency analyzer much.
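If you're curious what the analyzer is actually doing under the hood, here's a minimal sketch in Python with numpy/scipy - purely illustrative, and the file name and the "top 5" cutoff are placeholder assumptions, not anything Audacity exposes:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import welch

    # Load a dialog clip; fold stereo to mono so we analyze one spectrum
    rate, samples = wavfile.read("dialog.wav")        # placeholder file name
    if samples.ndim > 1:
        samples = samples.mean(axis=1)
    samples = samples.astype(np.float64)
    samples /= max(np.max(np.abs(samples)), 1e-12)    # normalize to +/-1

    # Welch's method averages many short FFTs into one smoothed spectrum
    freqs, power = welch(samples, fs=rate, nperseg=4096)

    # The strongest bins are candidate frequencies to dip in the music EQ
    strongest = np.argsort(power)[::-1][:5]
    for i in sorted(strongest):
        print(f"{freqs[i]:7.1f} Hz   {10 * np.log10(power[i]):6.1f} dB")

The strongest bins for dialog will usually land down near the fundamental (low hundreds of Hz), but keep in mind intelligibility mostly lives higher up, in the consonant range around 2-4 kHz.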
Also, if it's the same voice recorded with the same mic, will analyzing one tiny sample/line of dialogue give me settings I can use whenever I'm dealing with that voice anywhere in the project? Or does the freq plot change with every new line of dialogue/every word/etc.?
Mostly yes, I think so. Certainly it would be the starting point. More tweaking can always be done.
*************************************************************
Good reference monitors are pretty important for this kind of work...
Bill Davis May 16th, 2010, 02:21 PM I just want to make sure I understand this...
If I take an audio clip of a voice/line of dialogue and run it through a frequency analyzer (e.g. the free one in Audacity), I would look for which frequencies "peak", and those are the ones with the most "energy"? So if putting music/other sound FX under that voice, I should either a) enhance the dialogue at those freqs, or b) dip the music/sound FX at the same ones. Correct?
Also, if it's the same voice recorded with the same mic, will analyzing one tiny sample/line of dialogue give me settings I can use whenever I'm dealing with that voice anywhere in the project? Or does the freq plot change with every new line of dialogue/every word/etc.?
Josh,
The voice will be in the same RANGE throughout. But voices do change day to day, and even scene to scene. A relaxed pair of vocal cords may resonate more freely, providing more bass presence than the tightened vocal cords of the same actor in a "stressed" situation. So like everything else in audio, it's consistency mixed with variability.
One thing you're NOT mentioning is the largest single challenge of recording audio on a movie set: preserving a robust signal-to-noise ratio. The signal is whatever you WANT to record. Let's say in a scene where one character is talking, your signal is a clean vocal track of those words. If you wish the voice to sound REALISTIC in the space of the room, you would consider the room's natural reverberation part of the SIGNAL, since it's part of the natural sound you wish to record. This is what makes the tones of a voice recorded in a closet much different from those recorded in a large reverberant space such as a church. So that's SIGNAL.
But what if the SIGNAL in the natural space has TOO MUCH reverb or echo, or suffers from other audio issues that you DON'T want in your final track? Then the same reverberation, echo, or even the sound of the clock ticking on the mantle in the room that was previously a "natural" part of the scene BECOMES noise.
Once you understand that, you start approaching every set and every room as a listening challenge: determine what sounds are SIGNAL and what sounds are NOISE. Then you set out to ELIMINATE the noise and get the most accurate, strongest, and cleanest recording of the SIGNAL that you can.
Part of the magic of the movies is that much of what you eliminate from the SIGNAL, such as reverb or echo, can often be added back in post AT PRECISELY THE LEVELS YOU WISH.
So this pushes for "clean" field recordings that concentrate even MORE on bringing back clear audio tracks without excessive color or room issues.
In your original question, you talk about analyzing frequencies in such a way as to be able to isolate and SWEETEN just the sound of the voice track and not have those changes affect the NOISE of the scene.
You need to realize that this might or MIGHT NOT be possible, particularly if the NOISE you've identified is at or near the same frequencies as the SIGNAL.
A great example is recording in a space that's TOO reverberant. How can you ask any system to keep the ORIGINAL voice SIGNAL - yet eliminate the reverb NOISE which is composed of PRECISELY the same signal - just reflected off other surfaces?
How do you extract a barking dog's sounds from the human sounds if they are largely produced in the same way and at around the same frequencies?
I think these are some reasons that most of the time, the kind of frequency manipulation you're asking about is very limited in motion picture practice.
It's SOOOO much easier just to record things properly in the first place - which means preserving the most robust possible S/N ratio - than to try to FIX THINGS IN POST with fancy manipulation.
Put another way, you could say audio is like a nice glass of iced tea.
Well created, it can be perfectly sweetened to taste after the fact.
But if it comes in from the field tainted with mud and leaves, it's VERY hard to filter out the crap to create something pleasing.
Finally, as to your question about whether it's better to "dip" VO around music or enhance certain frequencies of one or the other: the answer is that it's done both ways. If you wish the soup to taste saltier, you can either increase the amount of salt added, OR add the same amount of salt to LESS soup; the effect is precisely the same. From your questions, I'd suggest doing some web surfing on topics like "gain structure" and "audio sweetening" and start building your experience as an "audio chef." After all, if becoming a great cook was as simple as following something in a book of recipes, then considering the number of cookbooks sold over history, we should all be eating like KINGS every day!
The truth is that the knowledge is the beginning. Going out and APPLYING that knowledge is the path to expertise.
Good luck.
Paul R Johnson May 16th, 2010, 02:39 PM If you're mixing a music product, then one of the common techniques is to give each instrument or sound source its own space. But if you cut out a space with EQ, then you have destroyed part of that sound. It doesn't matter what it was; it won't be the same with a chunk chopped out. This might not matter if the content is appropriate, but if it's a recognisable sound, maybe a music track with something well known in the section chopped out, it sounds wrong. However, if it sounds 'wrong' with the space made, then it was probably a wrong choice anyway! The snag is that to do this properly, you need better EQ than many editors have as standard. This means faffing about with audio editors, decent monitoring loudspeakers and a good ear. Sound always gets left till last.
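To make the "carving a space" idea concrete, here's a rough sketch of a gentle dip using the standard RBJ audio-EQ-cookbook peaking filter, in Python - the centre frequency, Q and depth are assumptions you'd tune by ear, and the file names are placeholders:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import lfilter

    def peaking_eq(f0, q, gain_db, fs):
        # RBJ cookbook peaking biquad; a negative gain_db gives a dip
        a_lin = 10 ** (gain_db / 40.0)
        w0 = 2 * np.pi * f0 / fs
        alpha = np.sin(w0) / (2 * q)
        b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
        a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
        return b / a[0], a / a[0]

    rate, music = wavfile.read("music.wav")           # placeholder file name
    music = music.astype(np.float64)

    # A broad, shallow 4 dB dip centred at 2.5 kHz, roughly where
    # speech consonants live, leaves room for the dialog
    b, a = peaking_eq(f0=2500.0, q=1.0, gain_db=-4.0, fs=rate)
    carved = lfilter(b, a, music, axis=0)

    wavfile.write("music_carved.wav", rate,
                  np.clip(carved, -32768, 32767).astype(np.int16))

A wide, shallow dip like this is usually far less audible than a deep narrow notch - which is exactly the "chunk chopped out" problem above.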
Josh Bass May 16th, 2010, 03:14 PM Ok, let me go back. I may have confused someone.
This is for an animated piece. Voices are already laid down, and were all recorded in an improvised sound booth. They sound nice, if a little punchy at times (recorded with an SM57 - I tried the several mics available to me, and liked it the best).
What I'm worried about is how to mix them with the sound FX and music I will bring in.
PS: I thought voices WERE compressed in pro stuff? At 2 or 3 to 1?
Bill Davis May 20th, 2010, 01:02 AM Dude,
Go ahead then and TRY compressing them. Nothing to lose there. But I suspect that if you listen to the result carefully over full-range speakers or headphones, you'll find that compression will not only bring the voice you want forward but also reveal a LOT of issues you didn't notice before.
That's because you'll not only compress and maximize the voice, but also any hiss, rumble, room air conditioning, refrigerator compressors, bad fluorescent light ballasts, and/or traffic noise that you didn't even know was part of the track until you told the software to smash everything into a narrow range of loudness and bump the result up to FULL volume.
Most people never REALLY listen to their audio tracks. When they do, they realize that there's an excellent reason why professional VOs are done in ABSOLUTELY silent spaces and with low-noise equipment, including both microphones AND processing electronics.
Just how it is.
Good luck, tho.
Josh Bass May 20th, 2010, 11:14 PM Am I wrong then? Like, when I'm watching simpsons/family guy/etc., are those vox not compressed? Am I hearing the dynamic range with which they were originally recorded?
Having seen my last short screened in a number of different venues, I didn't think the dialogue sounded that bad (or bad at all, really), and it was compressed 2:1, threshold somewhere between -20 and -40 (can't check it easily, so I'm going by memory). As long as you keep a "room tone" around to help even out sudden changes, it's not that noticeable.
Seth Bloombaum May 21st, 2010, 10:09 AM ...Typically music is highly compressed; your raw dialog isn't. Even without music, compression/limiting/volume-max filters really help intelligibility, and they become even more important against music...
Am I wrong then? Like, when I'm watching simpsons/family guy/etc., are those vox not compressed? Am I hearing the dynamic range with which they were originally recorded?...
Yes, the vox are compressed. What I wrote above started with raw dialog. From there, depending on where the piece is intended to be played, there may be 2:1 to 5:1 compression of the dialog, additional volume maximization of the whole program, and further compression/limiting/maxing by the broadcaster upon playback for air.
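To put numbers on what a ratio like 2:1 means, here's a bare-bones static compressor sketch in Python - no attack/release smoothing, so it's an illustration of the gain math, not a stand-in for a real plugin:

    import numpy as np

    def compress(x, threshold_db=-20.0, ratio=2.0):
        # Above the threshold, the overshoot is shrunk to 1/ratio of itself
        level_db = 20 * np.log10(np.abs(x) + 1e-10)
        over = np.maximum(level_db - threshold_db, 0.0)
        gain_db = -over * (1.0 - 1.0 / ratio)
        return x * 10 ** (gain_db / 20.0)

    # A -6 dBFS peak with 2:1 above -20 dBFS: 14 dB over becomes 7 dB over,
    # so the peak comes out at -13 dBFS
    peak = np.array([10 ** (-6 / 20)])
    print(20 * np.log10(compress(peak)))              # ~ [-13.]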
But every distribution method calls for something a little different. Theatrical release is a different animal, with much lower compression in the chain, but it typically justifies a level of dialog recording and editing that touches each word for volume if needed, preserving intelligibility without so much compression. Of course, theaters are controlled playback environments...
Locally originated broadcast programming probably represents the other extreme of very little dynamic range.
Online distribution tends towards lower dynamic range, but there's really no standard approach for the internet.
How will this piece be distributed? Where will it be heard? This info should inform your approach to mixing, including dynamic range.
Paul R Johnson May 21st, 2010, 10:20 AM I'm also pretty certain that many sat channels are compressing yet again before broadcast. So the audio people probably spend a great deal of time on their audio quality, maximising the available dynamic range - but the problem is really that now we're all digital, the available dynamic range is too wide. Hence the normalise tool that practically everything we edit and record on has. I think most of us mix on loudspeakers of much better quality than the average TV - and now that flat displays are the norm, most have pretty feeble speakers, so bass gets filtered off so they don't rattle too much. From channel hopping and often coming across the same programme, I've noticed the sound is frequently very different. NCIS on CBS and NCIS on FX here in the UK sound different. FX seems to sound better on my TV than CBS, suggesting that some extra processing is being added.
My own opinion is that too much compression is often added. I tend to use light compression on speech with a skilled artiste, but up to maybe 3:1 for somebody who either changes levels too much, or physically changes mic distance and can't control it. I don't like the sound when compressed this much - so I try not to.
Josh Bass May 21st, 2010, 11:38 AM distribution could be anywhere...theater at fests...tv...online.
Seth Bloombaum May 21st, 2010, 02:14 PM distribution could be anywhere...theater at fests...tv...online.
Then I would suggest that's at least two mixes: one theater mix and one TV/online mix, and probably a second version of the TV/online mix that is the same mix but peaking at -3 dB.
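For what it's worth, that "-3 dB version" really is just an overall gain change on the finished mix, something like this sketch (Python; the file names are placeholders, and it assumes a 16-bit WAV):

    import numpy as np
    from scipy.io import wavfile

    rate, mix = wavfile.read("final_mix.wav")         # placeholder file name
    mix = mix.astype(np.float64) / 32768.0            # 16-bit ints -> -1..1

    target = 10 ** (-3.0 / 20.0)                      # -3 dBFS as linear gain
    mix *= target / np.max(np.abs(mix))               # loudest sample = -3 dBFS

    wavfile.write("final_mix_minus3.wav", rate, (mix * 32767).astype(np.int16))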
Josh Bass May 21st, 2010, 03:21 PM Is that really how it's done?
I mean, let's say you have a movie... "Watchmen" (love it or hate it, it's an example). You're telling me the sound mixer (the post guy, I mean) makes two or three completely different versions of the film, one for theatrical, one for Blu-ray/DVD, one for TV broadcast, etc.?
That seems like an insane amount of work. You'd basically have to do everything three times... compress, EQ, ride/alter volume levels, etc.
I thought I read somewhere on here, in a thread several years ago, about creating one mix and then altering your master level for the different distribution outlets (e.g. one with peaks at -3, one with peaks at -10, etc.) rather than remixing everything.
Anyway, for the last several projects I've been using the full range (going up to -1 or whatever for the loudest sounds), and have not seen any problems any time it's been screened or watched online (haven't dealt much with TV yet). Have I just been lucky?
Bill Davis May 21st, 2010, 03:42 PM Josh,
Yes, that's REALLY how it's done. If your movie's going to be shown in a Dolby Certified 5.1 theater - the audio MIX better be 5.1. When that SAME movie gets chopped up for promos to post on the Apple and Microsoft web sites, what's the point of the 5.1 encoding? It's useless. So there's a simple STEREO mix for that. (sometimes MONO!)
What's the point of wasting digital encoding space for the subwoofer channel (which EVERY mainstream theatre expects) in a mix that will NEVER be played on a subwoofer? Same for rear channel information. That stuff has to be assigned to somewhere in the main stereo mix if you're going to hear it on a two-channel system.
The even larger issue is that depending on the recording techniques used originally, you can get phase problems like comb filtering that can screw up audio signals when you collapse stuff originally recorded in stereo down to a mono signal.
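If you want a quick sanity check before collapsing a stereo mix to mono, something like this sketch will flag the worst of it (Python; the file name is a placeholder, and a real mastering session would use a proper correlation meter):

    import numpy as np
    from scipy.io import wavfile

    rate, stereo = wavfile.read("stereo_mix.wav")     # placeholder file name
    stereo = stereo.astype(np.float64)
    left, right = stereo[:, 0], stereo[:, 1]

    # Correlation near +1 folds to mono safely; near -1 means cancellation
    corr = np.corrcoef(left, right)[0, 1]
    mono = (left + right) / 2.0
    drop_db = 20 * np.log10(np.max(np.abs(mono)) / np.max(np.abs(stereo)))

    print(f"L/R correlation: {corr:+.2f}")
    print(f"Mono fold-down peak change: {drop_db:+.1f} dB")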
So audio mastering is always a BIG DEAL.
I had a personal experience a long time ago. I volunteered to do sound for a friend's "digital feature" - he did the audio mix and mastering himself and somehow screwed up the phase relationship between the right and left channels of his playback DVD.
The sound of ALL DIALOG as heard in the theater at the premiere was HORRIBLE. Phasing in and out, thin and screechy!
On the DVD copies they handed out to the cast and crew - the sound was GREAT.
Recording audio AND mastering audio is IMPORTANT.
Always and forever.
BTW, the discussion of "compression" finally brought up an important point.
YOU can compress. The mastering engineer can compress. The Satellite uplink engineer can compress. And there will almost certainly be an automatic gain control at the final transmission point for broadcast that essentially "compresses" the signal.
That's FOUR stages of possible compression between the audio and the audience.
So how much did you want to add in the beginning?
Josh Bass May 21st, 2010, 04:27 PM I'm not going to lie, there won't be that much work put into it. . .it will be a simple stereo mix. It's to going to be "hollywood" quality sound; it's going to be ultra no budget DIY-quality sound, and I'm okay with that. The dialogue compression would be enough to make it pop compared to any sound around it. I"m thinking from everyone's comments here that 2:1 will be fine.
Let me put it this is way. . .is there a way to make one mix and repurpose it fairly easily for different media? My instinct is to go ahead and utilize the full range , and deal with problems when they come. This is a no-budget project and not for a client.
When you guys make your no-budget shorts/features, how do you handle this side of things? Do you actually mix/compress/EQ 3 or more times?