View Full Version : looking for a iriver replacement
Steve House June 13th, 2012, 03:58 AM
About the way I sync audio tracks: I never do it by looking at a person's mouth. I always do it by looking at the waveform, first finding an obvious peak in the sound near the beginning that I can locate in all my audio tracks. I use my XH-A1 as the master to sync all the rest of the audio against.
Are you looking at a true impulse noise, like clapper sticks banging together or even a handclap, in the camera audio track as you head-end sync point?
Noa Put June 13th, 2012, 02:31 PM Well, in my tests I clapped my hands, but I don't do that in church. An obvious sound peak can come from sliding my Canon into my Manfrotto tripod head: it makes a distinct clicking sound once the safety pin moves into place, which is easily seen in the waveform on all recorders. I don't listen to the sound until I hear that clicking sound; on my XH-A1's waveform those peaks are easy to find, since the clicking sound happens closest to the camera's microphone.
Steve House June 13th, 2012, 04:29 PM Well, in my tests I clapped my hands, but I don't do that in church. An obvious sound peak can come from sliding my Canon into my Manfrotto tripod head: it makes a distinct clicking sound once the safety pin moves into place, which is easily seen in the waveform on all recorders. I don't listen to the sound until I hear that clicking sound; on my XH-A1's waveform those peaks are easy to find, since the clicking sound happens closest to the camera's microphone.
So you're saying that both the sound recorder in the groom's pocket and the camera are rolling when you attach the camera into the tripod mount and the 'click' of the mount's pin is clearly audible on both the camera audio (which I'd expect) AND on the audio track from the groom's lav & recorder (surprising)?
Noa Put June 13th, 2012, 04:45 PM Yes, it could be, depending on how close I am standing to the groom. But like I said, it can be any peak sound, possibly one even louder; it doesn't necessarily have to be the tripod head's clicking sound.
Noa Put June 13th, 2012, 04:54 PM Maybe I have to clarify my statement about not looking at a person's mouth to synchronize. I don't look at people speaking to sync the sound with their mouth movements; I just pick the part where I hear the first person speak and find that part in all my audio recorders. Then I do some rough syncing based on what I hear, look at the waveform to find an obvious peak sound, sync precisely on that peak, and verify again just by listening. I always use my XH-A1 as a "base" and sync all other recorders to it; I make them sound equally loud so I will hear out-of-sync issues immediately. Then I check again at the end and listen to each track separately with my XH-A1 recording running along as the base audio track. If the audio is drifting I will hear it, and I use my waveform again just to verify, looking for obvious peaks to compare.
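[Editor's note: the peak-matching Noa describes by hand can also be automated. The offset between two recordings of the same event is where their cross-correlation peaks. A minimal NumPy sketch; the function name and the 48 kHz default are illustrative, not anything from Noa's actual workflow:]

```python
import numpy as np

def find_offset(reference, other, sample_rate=48000):
    """Estimate how far `other` lags `reference`, in seconds,
    by locating the peak of their cross-correlation."""
    corr = np.correlate(other, reference, mode="full")
    # index (len(reference) - 1) corresponds to zero lag
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate
```

Fed two tracks that both contain a sharp transient like the tripod-head click, it returns the amount one track must be slid to line the clicks up.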
Greg Miller June 13th, 2012, 06:37 PM Noa,
It sounds as if you have given a lot of thought to evolving your "sync test" procedure. Assuming that you are zooming in to the frame level (and I believe you have confirmed that in an earlier post), then I see nothing wrong with the procedure and I tend to trust your conclusions.
So perhaps you are indeed blessed with an unusually copacetic batch of equipment... the audio gods have smiled upon you!
One thing I have often wondered in the past, when people are discussing sync drift, is this: Which is really drifting? The camera, or the audio recorder? I'm not sure whether anyone has actually tested for that. Maybe audio recorders are really stable, and the cameras account for most of the drift. (I tend to doubt that, since cameras should have pretty complex and accurate time bases... but I don't know for certain.)
One could, for example, record an hour from a known time source (such as the WWV time broadcasts here in the USA). Then upon playback, carefully sync a "beep" at the beginning of the track with the current broadcast, and compare the two sets of beeps (recorded beeps, and live beeps) during the course of the hour-long playback.
A similar test could be made with the video, by filming a running time code display, then comparing upon playback.
Of course human reflex time would enter into the above tests, so it would not be simple to perform them accurately.
Just speculating here, but thought I'd throw out this little nagging thought at the back of my brain. Maybe someone has conducted more detailed tests, or knows more about the innards of the equipment (especially the camera time bases).
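[Editor's note: Greg's WWV-style test reduces to simple arithmetic once the beep times are measured off the playback. A hypothetical sketch; the function name and the sample numbers are invented for illustration:]

```python
def clock_error(beep_times, interval=1.0):
    """Fractional clock error of a recorder, given the measured playback
    times of beeps known to have been broadcast `interval` seconds apart.
    Positive means the recording plays back long on an accurate player,
    i.e. the recorder's sample clock ran fast."""
    nominal = (len(beep_times) - 1) * interval
    measured = beep_times[-1] - beep_times[0]
    return (measured - nominal) / nominal
```

Using the first and last beep, rather than hand-timing each one, also averages out the human-reflex error Greg mentions to a single start/end uncertainty.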
Noa Put June 14th, 2012, 12:29 AM For these types of questions I usually look at National Geographic :) I also wonder, then, why video cameras, small and big, from different brands, shooting to different formats, stay synced all the way through a multi-cam shoot. I have done 4 multi-cam shoots in those circumstances without an issue. Is it that just because they are cameras we should not question their ability to hold sync, while with audio recorders we should?
Steve House June 14th, 2012, 04:08 AM Because of the need to synchronize all the various elements in the broadcast chain, from original camera through to final monitor, the tolerances for camera sample rate clocks are tighter than is the case for low-cost audio recorders. If a camera runs even a fraction of a percent off the nominal sample rate, the picture gets scrambled.
But consider the case of an audio recorder that is 1% too slow. It is supposed to run at 48,000 samples per second but it's only running at 47,520. If you record exactly 1 minute of audio it should contain 2,880,000 samples, but in fact it will only contain 2,851,200. When you then play it back ON THE SAME RECORDER, as is usually the case in the applications these recorders are intended to fill, it will use the same slow sample clock to control the playback duration, and those 2,851,200 samples will play back in exactly 1 minute. But now put that same file in a playback device that is running at the proper 48,000 samples per second. When it plays the 2,851,200 samples it has to work with in this file, only 2,851,200/2,880,000 of a minute will have elapsed - what SHOULD play in 60 seconds as far as the original recorder is concerned actually plays in 59.4 seconds. At 25 fps (since Noa is in PAL-land), that 0.6 seconds amounts to 15 frames of "drift" in only 1 minute of recording.
Luckily even the cheapest audio clock circuits are better than that. But it doesn't take much to push one far enough off nominal to run a few frames fast or slow in, say, 10 minutes. It turns out that the NLEs we all use are built on the assumption that the camera is God when it comes to sync and its sample clock is running at EXACTLY 48,000 SPS (samples per second) - everything else has to conform to the clock embedded in the video file.
Even if the camera were slightly off - not enough to scramble the image, but still off - there would be no detectable change in the picture: motion would be a little faster or slower than it was in the original scene, but we'd never notice it. But if a separate audio recorder is even the slightest bit different from the 48 kHz the standard calls for, the timing of the audio file will gradually diverge from that of the video file when they are married together under the same sample rate clock on the NLE timeline. One hour of PAL video has 90,000 frames. To maintain frame-level accuracy over that hour, your audio recorder's clock can deviate from the camera's by no more than about 0.0011% (one frame is 0.04 s, and 0.04 s / 3600 s is about 0.0011%) - doable, but it takes some sound engineering and very tight production testing and QA monitoring, and those don't come cheap. It's that engineering and QA that makes a Lockit box, which is essentially ONLY a highly accurate clock circuit, cost in the neighborhood of a thousand bucks US.
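[Editor's note: Steve's arithmetic generalizes neatly. A small sketch (the function name and defaults are mine) that reproduces his 47,520-samples-per-second example:]

```python
def drift_frames(true_rate, nominal_rate=48000, minutes=1.0, fps=25.0):
    """Frames of A/V drift accumulated when audio captured at
    `true_rate` samples/s is played back by a device that assumes
    `nominal_rate` samples/s. Positive means the audio ends early."""
    recorded_seconds = minutes * 60.0
    samples = recorded_seconds * true_rate        # samples actually captured
    playback_seconds = samples / nominal_rate     # time they occupy on playback
    return (recorded_seconds - playback_seconds) * fps
```

`drift_frames(47520)` gives the 15 frames per minute from the post; plugging in a clock only 0.0011% slow over 60 minutes gives roughly the one-frame-per-hour figure.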
Noa Put June 14th, 2012, 04:24 AM Something tells me that you are involved in some kind of audio production every day :D
Thx for the clarification; it's too complicated for me though, I'll just continue looking at my waveforms. :)
Bill Grant June 14th, 2012, 05:16 AM I was gonna say, if we gotta work this hard to break it... it probably ain't broke. :-)
Bill
Greg Miller June 14th, 2012, 11:47 AM Steve,
Thanks for confirming what I had suspected (and explaining the math) about video time stability.
If you set up a few different cameras side by side, and shot the same scene, would they all be pretty close (within a frame or two) at the end of an hour or so?
Still, I'm a bit confused as to why a clock rate error would cause the picture to get "scrambled." Wouldn't that just be analogous to cranking your film cam at 23.5 FPS instead of 24.00 fps? Sure, there is the additional complexity of sequential scanning, but I still don't see it.
After all, when NTSC TV was switching from 30.00 fps (monochrome) to 29.97 fps (color), that didn't cause any "scrambling." Each scan line took slightly longer, each frame took slightly longer, but the phase-locked loops in the receivers could easily accommodate that change and everything worked just fine.
So, then, why would a slight clock rate error in a digital video cam cause picture "scrambling"?
Steve House June 15th, 2012, 04:33 AM It can scramble when you have to cut between or mix sources with slightly different clocks. If you cut from one video source to another, the frame boundaries and the scan lines within the frames must be exactly "in register" with each other or there will be a glitch. In other words, camera A and camera B must start and end a field at exactly the same instant. That's why, for live switching, each camera plus the switcher plus the title generator, etc. are all genlocked to a common house clock. As it makes its way through a typical broadcast chain, the signal has to pass through any number of different devices, all of which have their own clocks. One paper I read gave the example of a remote feed from a camera that doesn't have access to the studio's genlock but needs to be switched with picture from in-studio cameras that do. While there are circuits that can correct for minor variation from one device to another, or for the fact that two asynchronous devices won't have their frames exactly "in register", they can only go so far. If the remote camera's clock is too far from the studio clock driving the switcher, the picture will have problems.
Greg Miller June 15th, 2012, 06:21 AM Thanks, Steve, I'm starting to see some of that.
Certainly I'm aware of the genlock situation, going back to the old days of vacuum tube camera chains. In a live switching -- or worse, a live fading -- situation, of course every source would need to be exactly in sync. You'd need to be at exactly the same point in the same scan line, or everything would be a mess, because you'd always be transmitting sync from the sync generator, and not switching sync from one source or the other.
But I still don't quite get it, in terms of editing. Let's say two cameras shoot a one-minute scene. One camera's file has 1798 frames, the other's has 1795 frames. When these two files are imported into the NLE, there are now two sets of frames.
Couldn't you view either set of frames, by itself, without any problem?
And, if so, then if you start your edit with file A, and, at frame 500 you cut to file B, why would there be a problem? You're doing it at a frame transition, right?
In other words, does the NLE know or care that one frame from file A represents 0.0333704 seconds, while one frame from file B represents 0.0334262 seconds? I'd think that once the files are in the NLE, a frame is a frame is a frame.
(Obviously, I work only on the audio side, someone else works on the video side... But I would like to understand this, just for the sake of increasing my overall knowledge of the world.)
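[Editor's note: the per-frame durations Greg quotes follow directly from his frame counts, and so does the real-time divergence when an NLE treats every frame as equal. A quick sketch of the arithmetic, using only the numbers from the post:]

```python
seconds, frames_a, frames_b = 60.0, 1798, 1795

dur_a = seconds / frames_a   # ~0.0333704 s per frame from camera A
dur_b = seconds / frames_b   # ~0.0334262 s per frame from camera B

# If the NLE plays file B's frames at file A's rate, the clip that
# covered 60 s of real time now occupies less timeline:
timeline_b = frames_b * dur_a
divergence = seconds - timeline_b   # ~0.1 s, i.e. about 3 frames
```

So within the NLE a frame is indeed just a frame; the 3-frame difference reappears only as a gradual slip against real time (and against the audio).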
Richard Crowley June 15th, 2012, 09:49 AM But I still don't quite get it, in terms of editing. Let's say two cameras shoot a one-minute scene. One camera's file has 1798 frames, the other's has 1795 frames. When these two files are imported into the NLE, there are now two sets of frames.
Couldn't you view either set of frames, by itself, without any problem?
And, if so, then if you start your edit with file A, and, at frame 500 you cut to file B, why would there be a problem? You're doing it at a frame transition, right?
In other words, does the NLE know or care that one frame from file A represents 0.0333704 seconds, while one frame from file B represents 0.0334262 seconds? I'd think that once the files are in the NLE, a frame is a frame is a frame.
(Obviously, I work only on the audio side, someone else works on the video side... But I would like to understand this, just for the sake of increasing my overall knowledge of the world.)
Even when (not IF) camera clocks drift, the bottom line is that they produce a different number of frames for the time period in question. If you are doing live switching (and especially wipes and dissolves), the video sources must be in exact lock-step with each other. And after the advent of color, the sync-requirements bar was raised by an order of magnitude: not only vertical and horizontal sync-lock, but color reference (burst) sync as well, to within a fraction of a degree of the subcarrier sine wave.
However, shooting multiple cameras without genlock indeed produces video files with varying numbers of frames for a given time period. Sometimes it is only a few frames and makes no difference, but sometimes (with long time periods and/or less accurate equipment clocks) it may add up to several seconds. And a similar phenomenon occurs between video camcorders and non-genlocked audio recorders.
To directly answer the questions: when dumped into a video NLE, yes, of course you can view either set of frames without any problem. And again, there is no issue with transitions related to the video frames themselves. Dumping your video files into an NLE has the same effect as "genlocking" the cameras together for the purposes of frame synchronization. However, it does NOT solve the time-accuracy problem.
The issue is what period of real time each frame represents. Unless you are in Boulder, Colorado, with your camera genlocked to the National Institute of Standards and Technology (NIST) time reference station (WWV, et al.), no camera on this planet produces a frame in exactly 1/29.97002997002997... seconds. Even when cameras are genlocked together using a really accurate sync generator, they are never perfect. But it is close enough for what we do every day.
And remember that we got those weird (1/1.001) numbers when we devised the NTSC color standard to be backward compatible with the existing (RS170) monochrome system. And because of that difference between SMPTE timecode and Real Time, we must apply the "drop-frame" kludge to keep timecode synchronized to wall-clock time.
In the case of sources (video or audio) that don't agree in time period (even though "frame-synced" together in the NLE), we must apply corrective measures to re-establish video/audio synchronization (or video/video synchronization when cutting between different cameras). I do mostly live event musical productions, and IME this is easier to do than it is to explain. The easiest way is simply to "pull up" the video track to match the reference audio track wherever there is a video transition. This simple method is sufficient for most purposes.
Greg Miller June 15th, 2012, 03:27 PM Thanks, Richard. I certainly understand that for live switching "absolute" sync is necessary... and my earlier question ignored live switching and asked only about the NLE ramifications. I could not understand how a slight difference in frame rate would cause pictures to be "scrambled" in an NLE situation.
shooting multiple cameras without genlock indeed produces video files with varying numbers of frames for a given time period.
And yes, I realize that if you lay down two video tracks and one or more audio tracks, each from a recorder with a slightly different clock rate, there will of course be "drift" in regard to real time, which will require a bit of extra work to fix.
there is no issue with transitions related to the video frames themselves. Dumping your video files into an NLE has the same effect as "genlocking" the cameras together for the purposes of frame synchronization.
That's exactly what I thought. Hence I was puzzled by the comment that different clock rates would cause the images to be "scrambled"; perhaps I misunderstood something there.
OK, if I understand Richard correctly, the frame rate won't have any effect on the image quality of the individual frames; it will only affect the exact number of frames in a given time period. So then, to get back to the earlier question: is a video camera's timebase necessarily more accurate than an audio recorder's?
Since we're talking here about audio recorders that are basically consumer level (Tascam DR-0x, Zoom H1, etc.), let's talk about low-price consumer-level video recorders (take your pick). If we line up a few low-price ($500?) video recorders and a few low-price ($100?) audio recorders, would the video recorders necessarily - or even likely - have timebases that are significantly more accurate than the audio recorders? Why or why not?
Richard Crowley June 15th, 2012, 04:04 PM is a video camera's timebase necessarily more accurate than an audio recorder's?
Since we're talking here about audio recorders that are basically consumer level (Tascam DR-0x, Zoom H1, etc.), let's talk about low-price consumer-level video recorders (take your pick). If we line up a few low-price ($500?) video recorders and a few low-price ($100?) audio recorders, would the video recorders necessarily - or even likely - have timebases that are significantly more accurate than the audio recorders? Why or why not?
Simply, no. Why would we expect above-average performance from any level of consumer gear? There is no fundamental difference between the kind of components used in low-price (<$1000) camcorders or low-price audio recorders. A mass-production crystal or oscillator is a commodity item just like any other component that gets soldered to the PC board. They are what they are. Some people are lucky and experience low error rates, but that is simply a law of averages. If you want real sync, then you must use equipment that sends/receives clock information from others.
In another forum I prognosticated that it seems likely someone will (or should!) offer an audio recorder that accepts the video monitor signal from your camcorder (or DSLR) and uses that video clock to phase-lock the 48 kHz (or 96 kHz) audio sample clock. Modern digital technology makes this so simple I'm surprised it isn't available already. It is likely only the (relatively) small size of the market that keeps it from being available now.
Steve House June 15th, 2012, 07:23 PM Agree with Richard - I was thinking of higher-level cameras than the sub-$1000 consumer cams when I wrote that the camera would have a better clock.