View Full Version : 4:4:4 10bit single CMOS HD project



Kyle Granger
April 1st, 2005, 01:29 PM
If a thread is just calling a routine, there should be no additional latency. It should be as fast as if it were called by WinMain(), which is just a thread too.

BTW, Linux is neither simple nor foolproof (but a damned good OS). It is possible to write inefficient and buggy code on any platform, even on the Mac. ;-)

Kyle Granger
April 1st, 2005, 01:36 PM
If your display is chewing up 60% of the CPU (this is also true when not writing?), you may want to skip every other frame on the display and bring it down to 30%.

60% is way high.

Wayne Morellini
April 1st, 2005, 09:39 PM
<<<-- Originally posted by Kyle Granger : If a thread is just calling a routine, there should be no additional latency. It should be as fast as if it were called by WinMain(), which is just a thread too.
-->>>

Obin, what is in your inner loops? If you are calling routines each time you get a pixel, you will be wasting a lot of time on call latency. One way to get around this is to flatten out the code (or, a simpler choice at this stage, compile inline), eliminating as many subroutines as possible by integrating them into one routine in the inner loops. If you have profiled your software properly, you will know which loops the program spends 90% of its execution time in. It helps, a lot, to define all the work to be done on each pixel at once (though capture has its own speed/timing separate from storage, and the two can't always be integrated conveniently). We would probably be surprised at the number of projects that don't model this behaviour properly, so it is worth a rescan. My memory has gone again, so I have forgotten the third and most vital thing; I will try to update if I remember.
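
To show the shape I mean, a rough untested sketch (ProcessPixel and the bit layout are invented names, not your code):

// Per-pixel subroutine calls: roughly two million calls per
// 1920x1080 frame, so call overhead alone can dominate the loop.
void PackFrameSlow(const unsigned short* src, unsigned short* dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = ProcessPixel(src[i]);   // hypothetical per-pixel routine
}

// Flattened: the same work written out inline in one loop, where it
// stays in registers and the instruction cache.
void PackFrameFlat(const unsigned short* src, unsigned short* dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (unsigned short)((src[i] >> 6) & 0x03FF);  // e.g. keep top 10 bits
}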

I have been involved with Forth, and am aware of the large (unseen) latency problems in Windows PC systems. In the old days, hits of 1000s of percent happened. I doubt much of that happens in XP, but from using XP it looks far from ideal. So 50% of your execution cycles could be slipping away, and that is just the ones you can prevent (why do you think the Mac always does so well?).

I think it is good to profile the weaknesses of your OS/PC, and work around them.

Wayne Morellini
April 1st, 2005, 10:13 PM
<<<-- Originally posted by Kyle Granger : If your display is chewing up 60% of the CPU (this is also true when not writing?), you may want to skip every other frame on the display and bring it down to 30%.

60% is way high. -->>>

I forget whether Obin is using the 3.4GHz P4 or the 2GHz PM, but wouldn't 30% be high, even for a software solution?

I know Obin is using GPU programming for display, so I would expect it to be closer to 6%. I would still look into what I said before about slow software emulation of missing GPU functions (and still keep those latency problems in mind).

Obin:

There was another thing I forgot (those sites I suggested about configuring a machine for best performance would help): write the inner-loop code so you can force it to stay in the cache. If the code strays outside the cache, a page has to be read in, and another potentially written out, only for the process to be reversed when it strays somewhere else; that could easily consume 30% (and making a call to a foreign routine whose cache setup you have no control over might do just that, which could also be a problem with GPU software emulation). A page is big, that's a lot of cycles, and even a subroutine call can burn a lot of cycles before you hit new code. Subroutine-oriented languages tend to have problems on modern PC machines (and others), partly because their high-speed memories are not made for low-latency, non-sequential instruction flow out of cache.
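
A related data-side sketch (routine names invented, untested): rather than sweeping the whole ~4MB frame once per pass and flushing the cache each time, fuse the passes so each block is finished while it is still in cache:

const int BLOCK = 32768;   // pixels: ~64KB of 16-bit data, well inside L2
for (int base = 0; base < numPixels; base += BLOCK) {
    int n = (numPixels - base < BLOCK) ? (numPixels - base) : BLOCK;
    PackTo10Bit(src + base, packed + base * 5 / 4, n);  // 1.25 bytes/pixel out
    MakePreview(src + base, preview, base, n);          // second pass, data still hot
}

The same thinking applies to the code itself: keep the hot loops small enough to live in the instruction cache.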

I don't know how C compilers are in general nowadays, but the code they produced used to perform pretty poorly compared to the Intel compiler, and I think MS eventually improved theirs (I don't know if to the same level as Intel). But you could get a massive boost by switching to the best compiler back in those days. Worth finding out about.

As long as you have an active model of how the machine (and the OS) actually physically works in your head, plus the experience, you can see lots of issues you could never see in the code itself. I only have the physical machine sufficiently mapped in my mind, so I can make good guesses; I suggest buying advanced books on real-time games programming (with machine code as well as C) if you really get stuck.

I am going to take a hunch, knowing how lousy PCs can get, that the performance difference between unrefined code and the most refined code might be ten times on a Windows PC. So if you have improved your performance by ten times since you started coding, you are close to the maximum you can get. Does that sound possible, Kyle?

Kyle Granger
April 2nd, 2005, 05:32 AM
> so if you have improved your performance by ten times
> since you started coding, you are close to the maximum
> you can get. Does that sound possible Kyle?

I suppose a factor of ten could well be possible, but honestly, I haven't thought about it too much.

Obin,

A few more suggestions, just to get your application working

1) Display one out of three images. This will give you 8 frames/sec, and should bring your graphics CPU usage down to 20% (from 60%). That should let you work in peace.

2) Profile where your display CPU usage is going. Is it in the processing of your RAW 16-bit data, or in sending the data to the GPU and displaying it? These are clearly separate tasks, easy enough to comment out and profile separately.

3) Try displaying only one of the primaries. I.e., for every 2x2 square of Bayer pixels, display only one of the green pixels as a monochrome (luma) bitmap (see the sketch after this list).

4) Consider using OpenGL for the screen drawing. Sending a bitmap to the GPU and displaying it is only a few lines of code (see the note below). There is a lot of introductory code available on the net. It should not be complicated at all.
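
For suggestion 3, a rough sketch of what I mean (untested; the names and the RGGB layout are assumptions, so adjust for your actual Bayer order):

// Take one green sample per 2x2 Bayer cell and build an 8-bit luma
// preview at quarter resolution (960x540 from 1920x1080). Assumes
// 16-bit samples with the significant bits at the top; shift by 4
// instead if your 12-bit data is right-justified.
void GreenPreview(const unsigned short* raw, int w, int h,
                  unsigned char* luma)
{
    for (int y = 0; y < h; y += 2) {
        const unsigned short* row = raw + y * w;
        for (int x = 0; x < w; x += 2)
            *luma++ = (unsigned char)(row[x + 1] >> 8);  // G of the R G pair
    }
}

And for suggestion 4, once a GL context is up, drawing that buffer really is one call: glDrawPixels(w/2, h/2, GL_LUMINANCE, GL_UNSIGNED_BYTE, luma).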

Good luck!

Obin Olson
April 3rd, 2005, 12:35 AM
thank you Kyle..I am working on all your ideas

Obin Olson
April 3rd, 2005, 11:51 AM
we are doing a bunch of re-coding now with the software to streamline things a bit....and I am going to get a new graphics card to see if that helps the CPU overhead with the display..looks like the older GPU card I have may be spitting the tasks back out to the CPU, giving us the very high display CPU%

Wayne Morellini
April 3rd, 2005, 11:44 PM
If you get a new graphics card to measure the results, get one that is closest to what your code and GPU shader package depend on. That will be the latest mid-to-high-end card from Nvidia or ATI. Nvidia has had the most advanced shaders in its cards over the last year or so, just not always the fastest at the functions the games have been using. ATI either has similar capability in their latest top cards now, or will by the time the xbox2 comes out (DX10 compliant).

So either Nvidia is a clear winner for you (some of their lower-end cards have the same shader functions), or ATI is, if the functions you use are all supported there. It is a compromise as to which, as ATI may have a low-cost DirectX 10 part by the end of the year (or maybe only in the xbox2). DirectX 10 would definitely outclass everything out there for shader programming. With DX 10, or 11, you could put most of the image code directly on the card, only dumping results back to the PC to be saved, as it is supposed to support most full program-flow capabilities. Some of us want to implement new true 3D raytracing software that will make ordinary 3D look second rate; that is difficult on a PC.

Go to Tomshardware.com, www.digit-life.com, or extremetech.com to find articles on the current situation with cards and DirectX.

I don't know about the latest Intel GPU, but most integrated GPUs are a compromise and support limited hardware functionality. ATI or Nvidia might have near-desktop functionality in their integrated GPUs, but they have problems with shared memory nonlinearly stealing memory time (making memory loads jump from place to place, which is the worst thing to do, unless managed). Some integrated chips have their own memory, though; as long as it is big enough for you, the programs, and the OS to occupy at once, you will get the best efficiency.

What card were you using, Obin? You should be able to map the low-level functionality of a card to the instructions/functions you use from its formal low-level specifications on the maker's web site, probably in a whitepaper-type PDF document (or email their development section). They all support the same functionality in different ways, but apart from Nvidia and ATI (and maybe the slower Matrox and Wildcat cards) there are no other cards to look at in terms of completeness and performance.

You can get around many issues with integrated graphics as well, by finding out what it does do, and separating the GPU-shader-supported execution into one batch and the stuff that has to be done in software into your own customised routines (bypassing DirectX), as much as is feasible for performance -- a process of program code factoring.

Have a good day.

Wayne.

Obin Olson
April 4th, 2005, 08:27 PM
we are now testing a bypassed method of image calculations without GPU support to see what the results will be...looks like our current setup has the gpu choking and shooting all the work BACK to the cpu...providing our 50-60% cpu numbers just for preview!

I will know more in the morning...would it be too much to ask for some PROGRESS!!? ;)

Wayne Morellini
April 5th, 2005, 04:25 AM
Good move. What percentage are the bypassed routines taking?

I have news on the next ATI chip with new shaders, mid-year; the low-end or low-powered versions might be end of the year (I imagine something like this may come out on main-boards). Whatever solution you go for, try to get involved with that vendor's official development section; they should have answers to many of the questions, hopefully in low-cost support documents. Intel/AMD and Microsoft are also good sources for development information, I think. http://www.gamedev.net/reference/ has good resources too, and igda.org and gamasutra are also spots that may help.

I should be posting links (if God is willing) about new silent coolers, storage, etc. in the technical thread in the next day or so. I should also be posting technical design tips, which I haven't in times past, because so much of this stuff is a potential source of patentable income -- but some of it is not, or less so.

Obin Olson
April 5th, 2005, 06:40 PM
well well..I get 36% cpu load now with the image resize being done by the cpu and then feeding that to the gpu...this is working very well but we still get choked up with the save AND display at the same time...

I have a profiling test app now from my programmer that I will try. It will tell us what the HECK is going on in my DFI system here..he says things are working on his system but not mine..Kyle, any ideas why we would have display refresh issues when we start saving raw data to the disks? Display AND packing only take about 45-50% CPU and I KNOW saving will not take 50%!!! it's like the thing has timing issues..we did try your suggestions from before..any more ideas pop into your head?..we are still using DirectDraw for display AFTER pixel packing, and resize is done with the CPU

Wayne Morellini
April 5th, 2005, 07:44 PM
<<<-- Originally posted by Obin Olson : well well..I get 36% cpu load now with the image resize being done by the cpu and then feeding that to the gpu...this is working very well but we still get choked up with the save AND display at the same time... -->>>

I am not Kyle, but I must say, that is more like it. Strange -- image resize should be a basic function on most GPUs; it should not hurt. Maybe it is the way the resize is done. But I assume you are talking about a resize from one screen resolution to another. Cards nowadays should have hardware that auto-displays an image at a different resolution from the one it is stored at, virtually for free, no resize needed. Still, are we talking about a resize after the GPU is finished versus a resize before? If the GPU is stalling, then resizing pre or post would explain the differences.

The rest of the stuff (what did you mean by "dfi system", anyway?): it is most likely that memory-access timing thing I mentioned last year. Too many things competing for memory at the same time stall the memory pipeline, causing long delays in access to main memory (keep in cache and sequentialise everything that does not need to be parallel, then adjust it all to work around each other).

Kyle Granger
April 6th, 2005, 03:04 AM
Obin,
Wayne is absolutely correct when he says the GPU should be doing the resize: you get that for free.
What is the size of the bitmap you are creating? How are you doing the Bayer interpolation?

Obin Olson
April 6th, 2005, 08:26 AM
about 960x540, or 1/4 the resolution of the 1080 image..this is what we do so that the image will fit on a small 1024x768 screen

Obin Olson
April 6th, 2005, 08:32 AM
we take the RGGB and make it one pixel instead of 4..this is what the GPU was choking on and spitting back to the cpu

Kyle Granger
April 6th, 2005, 08:50 AM
Obin,
> we take the RGGB and make it one pixel
That sounds reasonable. The CPU should create the one pixel from the four Bayer pixels, and then you send 960x540 RGB to the GPU. It Should Just Work.
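
In code that step is tiny. A sketch (untested; names and bit layout are assumptions):

// One RGB pixel per RGGB quad: 1920x1080 Bayer -> 960x540 RGB.
void QuadToRGB(const unsigned short* raw, int w, int h,
               unsigned char* rgb)   // out: (w/2) * (h/2) * 3 bytes
{
    for (int y = 0; y < h; y += 2) {
        const unsigned short* r0 = raw + y * w;        // R G R G ...
        const unsigned short* r1 = raw + (y + 1) * w;  // G B G B ...
        for (int x = 0; x < w; x += 2) {
            *rgb++ = (unsigned char)(r0[x] >> 8);               // R
            *rgb++ = (unsigned char)((r0[x + 1] + r1[x]) >> 9); // (G1+G2)/2
            *rgb++ = (unsigned char)(r1[x + 1] >> 8);           // B
        }
    }
}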

Obin Olson
April 6th, 2005, 09:16 AM
it works well..with pixel packing and resize, the cpu % is at 50%

Jonathon Landell
April 6th, 2005, 09:26 AM
Go Obin! Go Obin!

This is addicting. I'm not even a programmer, and I've gotten to where I'm checking this forum 4, 5 times a day. You guys are freaks of nature. I wish I could be a freak. :)

Enthusiastically wishing you the greatest success...

-Jonathon

Obin Olson
April 6th, 2005, 02:44 PM
thx john! if I keep at it I have a feeling I will get it done.. the more I fight the more I am willing to keep at it till I get some RESULTS!


I really want RESULTS from our project after shooting the VariCam..such a dirty image when you want to color grade it!

Obin Olson
April 6th, 2005, 03:02 PM
I hate message boards. really. such a TW


so on a better note I have the test results from our profiler and am awaiting a reply from my programmer...hope it will be a good one ;)

once again I had the chance to shoot a 30sec spot on the Panasonic VariCam...I sure am glad I have not been fooled into buying that thing...try and do ANYTHING to the images in post..sooooo much noise!! SOOOOO much compression..while some think the DVCPROHD codec is great, I am not one of them -- unless you never touch your images in post (I never shoot without a lot of post color grading)..our camera will beat the crap outa the VariCam images!!

VariCam would be great for DOC work and ENG/EFP production that does not need to be fooled with in post

Radek Svoboda
April 7th, 2005, 06:14 AM
Obin, 1080p has 2.25x the pixels of 720p, so the new Panasonic mini camera will be 2.25x more compressed in 1080p.

Is your camera 1080p, Obin? Will it use the Altasens CMOS? If so, it could be better than HDCAM.

Jason Rodriguez
April 7th, 2005, 06:52 AM
Have to agree with you on that one Obin.

DVCProHD, the way the Varicam does it, is IMHO only a couple of steps above HDV.

HDV is totally unusable for me.

DVCProHD can be very nice, but if you push it too far, it does fall apart quickly. But the reason is not necessarily the codec itself, but the way that Panasonic records to the codec.

First off, at 1280x720, the codec is actually prefiltering to 960x720. So you're not dealing with square pixels at the native codec level.

Second, although the codec is running at 100Mb/s, which is pretty nice, the images you get from the Varicam only have that data-rate when you're recording at 60fps. When you record at 24fps, you're still using 100Mb/s on tape, but the useable frames are only giving you a data-rate of 40Mb/s, which isn't that much better than 25Mb/s DV for an HD-size image!

That's what I actually like about the Cinealta. When recording at 24p, the camera actually changes the speed of the tape, so you're using all the data-rate the codec can support at any given frame-rate. While only 185Mb/s (including audio) for 1920x1080, at least at 24fps you still get 185Mb/s -- not something less, like the Varicam gives you.

If the Varicam actually recorded at 100Mb/s for each frame-rate, you'd be singing a different tune about the compression. But unfortunately it doesn't, so at frame-rates under 60fps you're not getting the data-rate you think you're getting; you're getting much less. Put another way: at 24fps, the image is 2.5 times more compressed than you think it should be.

Leon Nox
April 7th, 2005, 10:21 AM
I've just seen a project from a Varicam...it's really nice to see such a big picture, but as you look at the midtones and less-lit areas you see noise, noise, and noise...don't even try to adjust gamma ;)

Obin Olson
April 7th, 2005, 11:28 AM
Radek: we will use the Altasens, yes..


Jason: so true..how can they still get 60k for that thing?!
ugghh and the 960 images!?!?! what the heck is that all about! what a JOKE!

Obin Olson
April 7th, 2005, 05:46 PM
I wonder what the new little Panasonic will be like? if it were VariCam quality for the 10k they are going to ask, I would say that is about the right price..

so...looks like we have some weird things happening in our system..seems that we have some strange delays and timing issues...we are looking into it now...more when I get some info...

Jason Rodriguez
April 7th, 2005, 06:47 PM
Well Obin, the Varicam is 5-6 year old technology.

HDCAM prefilters as well to 1440x1080 instead of the full 1920x1080 raster.

The only tape formats right now that do not pre-filter are HDCAM-SR (brand new) and Panasonic's D-5. Except for the Panavision Genesis, there are no cameras with a built-in HDCAM-SR recorder.

I did see a prototype of a P2-based D-5 recording camera at last year's NAB, though. Approximately the same form-factor as the Varicam. With FILM REC mode, that should be a very interesting camera, as long as it does 1920x1080/24p.

Obin Olson
April 7th, 2005, 10:29 PM
Jason what do you shoot anyway? are you a feature DP? high-end commercials? pron?(gotta be as the "fleshtones" are not good enough with 4:2:2 HD)

LOL :)

Jason Rodriguez
April 7th, 2005, 11:20 PM
Short films (up to half-hour) and commercial/special effects stuff right now. Would like to move up to features in the future, but right now my main line of work is in post production, directing, and visual effects supervision.

I'm more of a Director/Visual Effects Supervisor who DPs, rather than just a pure DP.

I try to shoot on the highest-end formats I can for special-effects work, which right now typically means a Cinealta with high-end lenses (such as Digiprimes or the HD Primo; I haven't used the new Fujinon HAe-series yet, so I can't comment on those) into a disk-based DDR system for uncompressed 4:2:2 or 10-bit 4:4:4 recording to DPX files. I'd love to shoot with a Viper, but haven't gotten the opportunity yet.

When on location, I just shoot HDCAM right now, or 16mm. Very little 35mm work around here.

And I definitely don't shoot porn ;)

Wayne Morellini
April 8th, 2005, 01:52 AM
Jason,

I have to agree with you. After I was told the spec of the format, I quickly realised that everything was not as good as it seemed (though I did read that it was supposed to be recorded 10-bit). 6.?:1 is only going to be equivalent to 13.?:1 MPEG2 compression at the most (maybe a bit better for motion). If they went to MPEG2 at 100Mb/s instead, then we would be getting a clear winner.

But it turns out that the new JVC will do uncompressed out up to 60fps, and there is talk that the Pana might do the same. They are still 1/3-inch chips, though. So a documenters' delight they may stay.

Obin Olson
April 8th, 2005, 08:39 AM
Awesome, Jason...about like myself...I would say I am more the pure DP/editor, though, as my brother does all the effects/animation/compositing/greenscreen/motion tracking etc. on our work ;) sorry 'bout the porn..it was a joke, trust me...

Obin Olson
April 8th, 2005, 08:40 AM
running lots of tests today on our system to pin down what is going south causing the issues we have been having for weeks..

Rob LaPoint
April 8th, 2005, 08:31 PM
Obin, all I can say is that at 5:00 on a Friday when the rest of us are at happy hour you are earning every great reward you have coming when this project gets finished. Great work man!

Obin Olson
April 9th, 2005, 09:17 AM
Steve N., what is the highest you think we can run the 3300rgb before noise starts to set in on the image? I need to make SURE the rolling shutter crap is not going to be an issue, and I know the Altasens has a MUCH higher max MHz than the 3300rgb, right? what is the most MHz we can push from the PCI-X Epix card?

Obin Olson
April 9th, 2005, 03:17 PM
Kyle:

I've been analyzing data since last night. I now have a pretty good
> picture on what to do next to get this to work. For some reason there is
> some latency in writing to disk and the system does not perform at full
> speed. I will be doing some trials on this in the coming days with different
> methods of writing to disk to see if I can improve the speed. Theoretical
> speed should be sufficient, but it is not operating at full speed in
> reality. We are getting about 80MB/sec on the writing operation now, and we
> would need much more, nearer to 130MB/sec, to do it in the time we have. Right
> now we have 41.66ms between frames at 24fps (1000/24). The loop overhead is
> very low at 0.013ms. Here is what is going on in this time:
>
> Framebuffer read: 9ms. That is copying data from the framebuffer to memory
> via the SDK
> Packing the data to 10bit: 6.8ms
> Converting data for display: 2.5ms
> Displaying data: 3.6ms
>
> Thus our processing overhead is around 21.9ms, and that leaves us
> 19.76ms for the actual saving to disk. The fastest method I have tested so
> far takes 34.9ms to do this. Thus the system does not work properly,
> as it takes too long to save. I have some ideas to improve things in the
> frame overhead.
>
> It seems that DirectX is not helping there, and by dropping it I
> might be able to gain about 1ms from the overhead. We can probably also save
> another 1ms or two by displaying during recording at 1/8 resolution
> instead of 1/4. Another thing would be to increase the efficiency of your
> memory. I do not know if you can get faster memory than what you currently
> have there, or overclock the board to get a faster memory speed, but compared
> to my board you are about 3 to 4ms slower on framebuffer read than I am. You
> are faster on most everything else though. So let's say we get an extra 2ms
> from memory and CPU overclocking. That would reduce the overhead by around
> 5ms if we get lucky. I will do some tests on this in the coming days to see
> if that can be done. I will also play with the parameters for writing to
> disk and research whether there are high-performance disk I/O
> routines that would be faster than the Windows API WriteFile that I am
> currently using. So far I have not found much on this.
>
> In the coming days I will see how much overhead I can shave off the
> routines, and also whether we can get the disk throughput up. It's too
> bad that you do not have a twin-disk SCSI setup, as we could use both
> drives and effectively get rid of the problem; the overlapped I/O API
> would be ideal in a twin-disk setup.
>
> BTW, I want to also try the overlapped I/O routines, which are a bit
> slower, but might prevent reentrancy. I will keep you posted there. We are
> getting there slowly...

our tests show 140MB/sec write speed from a disk speed test app...that is for the 2 RAID SATA 10k RPM disks
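
(For reference, the 130 figure presumably follows from the frame size: a packed 10-bit 1920x1080 frame is 1920 x 1080 x 1.25 bytes, about 2.6MB, and writing 2.6MB in the 19.76ms left per frame works out to roughly 131MB/sec -- so 140MB/sec of benchmarked throughput leaves very little margin.)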

Obin Olson
April 9th, 2005, 05:10 PM
ok does anyone know if this would help us?

File Mapping: What Why When?


File Mapping is an easier and faster way to access files, once you know how to do it. It's one of the capabilities of Windows that is so good, you can't understand why so few use it.

With File Mapping you don't get a handle to use with other APIs; you get a pointer to the raw data in memory. And Windows is the one that has to worry about what part of the file to copy to memory, and so on.

The one disadvantage of this system is that you can't change the size of the file while it is mapped. With the normal file-handling functions, you can execute a WriteFile at the EOF and the file will grow, but you can't do that with file mapping. You have to unmap the file and change the size with any known method (for example, SetEndOfFile).

But the ease of use and the speed increase are so big that you won't want to use the normal file-handling functions any more. Wrap it in a class, and use files as memory!
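
In Win32 terms the whole mechanism is only a handful of calls. A minimal sketch (error handling omitted, file name invented):

#include <windows.h>

HANDLE file = CreateFileA("capture.raw", GENERIC_READ | GENERIC_WRITE, 0,
                          NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
HANDLE map  = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, 0, NULL);
BYTE*  p    = (BYTE*)MapViewOfFile(map, FILE_MAP_ALL_ACCESS, 0, 0, 0);

p[0] = 42;            // read/write the file as ordinary memory

UnmapViewOfFile(p);   // to grow the file: unmap, resize
CloseHandle(map);     // (e.g. SetEndOfFile), then remap
CloseHandle(file);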

Kyle Granger
April 10th, 2005, 05:58 AM
Obin,
I used File Mapping on the SETI@home project, where we wanted to share a large chunk of memory between two processes. I am sure you don't need it for your capture program.
I am using overlapped file IO, and it works fine.
One thing I noticed is that your capture is taking 21.6% of your time (9ms of 41.66ms). This is 0% for me. It is just getting the data from the frame-grabber card to system memory; you should not be paying ANYTHING for that. But it may be another unfixable problem with the Epix card.
Before doing a 1/8-resolution draw routine, I would first thin out your drawing by skipping one or two frames (giving you a display of 12 or 8 FPS).

> Packing the data to 10bit: 6.8ms
> Converting data for display: 2.5ms
> Displaying data: 3.6ms

You want to convert your RAW data (16-bit) for the display, not the already-10-bit-packed data.
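
The shape of the overlapped writing I use is roughly this (a sketch; the frame variables are made up and error handling is omitted):

#include <windows.h>

HANDLE file = CreateFileA("capture.raw", GENERIC_WRITE, 0, NULL,
                          CREATE_ALWAYS, FILE_FLAG_OVERLAPPED, NULL);
OVERLAPPED ov = {0};
ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
ov.Offset = frameIndex * frameBytes;   // use OffsetHigh too past 4GB

// Queue the write and keep working; it completes in the background.
if (!WriteFile(file, frameBuf, frameBytes, NULL, &ov) &&
    GetLastError() != ERROR_IO_PENDING)
    ;  // a real write error

// ... capture, pack, and display the next frame here ...

DWORD written;
GetOverlappedResult(file, &ov, &written, TRUE);  // TRUE = wait for it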

Kyle Granger
April 10th, 2005, 06:01 AM
I have tried doing exactly what you are doing on my system: SI-3300, 1920x1080, 61.44 MHz clock, 24.0512 fps.

Here are the CPU usage stats I get:

1) Capture and display: 16-18%.
2) Capture, display and writing: 40-44%. This is 72MB/sec, packed 12-bit. No dropped frames; I just got a 2.327 GB file after recording 745 frames, 31 seconds.
3) Capture (24fps) and display (12fps): 9%.
4) Capture only: 0.25%, est. (somewhere between 0 and 1%)

Please also bear in mind that my system is slower than yours: 1.7 GHz Xeon, 3-4 year old NVidia Quadro 2 with 32 MB, AGP 4x.

Kyle Granger
April 10th, 2005, 06:31 AM
You should consider perhaps the Pleora iPORT GigeLink. My software only works with that interface -- it can't help you out with the Epix card.

Maybe Steve can give you a good price on it! ;-)

Just a suggestion.

Longer term, especially with something like the Altasens SI-1920, you may want a frame grabber without the capture overhead, and one that handles 12-bit packed data, too.

Obin Olson
April 10th, 2005, 09:15 AM
how is your rolling shutter at 61MHz, Kyle....

Kyle Granger
April 10th, 2005, 10:47 AM
> how is your rolling shutter at 61mhz Kyle....
I don't know -- the scene was static. The vertical blanking was only 28 lines, so the scan duration of the active lines is 1080 / 1108 * 41.66 ms, about 40.6 ms.
I use faster "shutter" times with the SI-1280.

Kyle Granger
April 10th, 2005, 11:10 AM
I think you want your scan time in the range of 1/48 - 1/60 of a second.
What kind of artifacts you see is heavily dependent on what is going on in the shot.
In general, I would not use something like 1/24.62 sec. as in my little test. But for a talking head, it could work.

Obin Olson
April 10th, 2005, 01:09 PM
I have found that less than 65MHz sucks -- things "lean" in the image

Kyle Granger
April 10th, 2005, 02:11 PM
Well, you're stuck between a rock and a hard place. As you increase the clock rate, the pixel quality decreases.

You can always just grab fewer pixels:
1) 1920x817 (2.35:1)
2) 1600x900
3) 1280x720
etc...

This will cut your scan time (and RS) significantly.

Obin Olson
April 10th, 2005, 06:19 PM
see Kyle that is the GOOD thing about the Epix card...I can crank it up as far as I want and still get frames every 41.6ms or 24fps....the framegrabber will just sit spitting out frames REALLY fast but that does NOT mean you need to pick them up...see?

Wayne Morellini
April 10th, 2005, 10:57 PM
<<<-- Originally posted by Obin Olson : ok does anyone know if this would help us?

File Mapping: What Why When?


File Mapping is an easier and faster way to access files, once you know how to do it. It's one of the capabilities of Windows that is so good, you can't understand why so few use it. -->

Probably because many programmers do not think there is anything past normal programming. Real-time embedded programming is a different kettle of fish; it is the real game in programming, and normal programming style doesn't help that much. But then again, file mapping might be unsuitable for normal applications. I am unfamiliar with the term; I probably use a different method in my OS design.

Pick the average commercial desktop program -- non-games, and not hardware co-processing intensive (like 3D or media players) -- and you might find that it could be sped up ten times and be a tenth of the size if it were done properly, but this is a tall order for most programmers.

I know that, for me to program a capture app in 3 weeks that operates at max efficiency (90+%), I would have to spend up to 6 months researching, brushing up my embedded programming knowledge, and designing the best solution before starting. To take the average programmer (one who can do it) up to that level might take a year of practice.

P.S. How long does the mapping process take?

------------------------------

When I hit reply I thought you were asking for help with the questions above, but it looks like you were not. At the risk of incurring the wrath of Odin ;), I say: you are on the right track; you are looking for the correct things now. You just have to do the research to find out the best methods. Asking here will only get you my half-knowledge and whatever Kyle can throw in over that. Going to the resources I suggested (in times past) and asking on more professional forums to do with Windows embedded real-time work would help you get the rest of the knowledge.


<<<-- Originally posted by Obin Olson : see Kyle that is the GOOD thing about the Epix card...I can crank it up as far as I want and still get frames every 41.6ms or 24fps....the framegrabber will just sit spitting out frames REALLY fast but that does NOT mean you need to pick them up...see? -->>>

I thought the Epix did not have a buffer, which means it is buffering in main memory: you are picking the frame up from main memory, packing it back to main memory, and then saving to disk. If it does have a buffer with space for two frames, that is excellent. Otherwise, this might explain why capture is so high (I thought it was supposed to be 6% some time ago?). In the unlikely event that the drivers (or MB chipset drivers/BIOS) are not written to be most efficient at this process, it could be really stuffing up main-memory access in competition with your program. Your options: 1. Determine the card's timing and work around it. 2. Do the transfer yourself, and establish the best timing for the system. 3. Figure out which of those is fastest. 4. Write a test program that determines the best combination of methods and timings on any MB/HDD/system, and use the relevant methods and timings on a system-by-system basis. Such a program could run at installation or hardware change; I think ATI uses a similar testing method when installing their graphics cards (the screen goes blank). You might like to leave a program like that until you are finished.

Kyle Granger
April 11th, 2005, 02:43 AM
> ...I can crank it up as far as I want and still get frames

Yes, I know what you are saying.

There is a limit for the camera, though. The noise etc. in the pixels at some point becomes unusable; the CMOS pixels don't reset fast enough. I think you are already close to the limit at 65 MHz.

The Epix card probably maxes out at 85 MHz.

> I thought the Epix did not have a buffer,

Yeah, I think Wayne is right here. If you're on a 64-bit/66MHz PCI slot, then burstiness is not an issue (528MB/sec max bandwidth).

misc. note:
With the GigeLink, I have a maximum bandwidth of about 120MB/sec (the limit of gigabit Ethernet). But there is a 16MB buffer, so the burstiness can be smoothed out.

With the SI-1300, I also run the clock high, and add lots of vertical blanking, all with the aim of reducing rolling shutter. But often with a resolution of 1280x692.

Wayne Morellini
April 11th, 2005, 04:11 AM
Kyle (a good chunk of this is for other readers, hence some of the explanation):

Because they are trying to write so close to the limit of the drives, burstiness on the drive save can be a problem, inducing misses and waits to re-access the track. But as they are already getting 80MB/s out of a maximum of 140MB/s, it may not be such a problem in their system.

But like traffic on a freeway, a little bit of a problem can ripple through and cause greater problems somewhere else (the number of control points some of these programmers must use must be staggering). Simply dumping the information from the control points to the display can upset everything, so it is best to build a log data structure in memory, one page at a time (kept in cache). It shouldn't be that drastic here, though. Obin, just realise that even testing can affect your results, so keep a version without it to compare against if things look a bit strange.

I think burstiness of memory access can also have an effect. Let's assume the driver writer isn't experienced and the driver is writing out data in dribs and drabs, or some other non-optimised size. On a system like this, with high memory traffic from competing parallel sources, that would stuff up the competing processes' memory access. Memory is optimised for writing in chunks, and the Rambus memory on various Intel boards much more so. So it is best to write at least a memory page at a time, and probably bigger (the details are not in my head at the moment). If they write smaller than a page, the setup overhead for the write access goes up; writing a few words of memory at a time can send it through the roof (never-to-be-seen-again sort of height), as other processes can steal memory away to access a separate page. Writing a string of pages at a time reduces the overhead a little more. Now let's hypothetically say the driver writer does not block other processes from stealing memory control away during page writes: now you have odd-sized writes competing with each other, with massive overheads.

Running a program from main memory instead of cache is many times slower. Having a loop that crosses a page boundary produces overhead. This is why I say keep the code in the cache and leave main memory for the card, chipset, graphics, and disk to manage among themselves (not forgetting that you need to pack and preview from it too). Because these devices should be programmed to write in optimised chunks, this will reduce overheads in main memory. You have three processes putting data in and out of memory in parallel at one time. Although the data rate might be a fraction (under 600MB/s) of what main memory can deliver, memory conflicts add significant overheads. There is enough room for many of these things to be done in parallel, in the background, and hardly affect the processor load. The trick is controlling them and keeping the processor out of the way (working around each other).

All of this can lead to burstiness in memory, which translates to burstiness in execution and drive writes.
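
To illustrate the page-sized-chunk point (purely hypothetical; the file handle is assumed already open with write access):

#include <windows.h>

// VirtualAlloc returns page-aligned memory; packing a whole frame
// into it and issuing ONE big WriteFile keeps the memory and disk
// traffic sequential, instead of dribbling out small odd-sized
// writes that fight the other streams for the memory bus.
const DWORD frameBytes = 1920 * 1080 * 10 / 8;   // packed 10-bit frame
BYTE* frameBuf = (BYTE*)VirtualAlloc(NULL, frameBytes,
                                     MEM_COMMIT | MEM_RESERVE,
                                     PAGE_READWRITE);
// ... pack the frame into frameBuf ...
DWORD written;
WriteFile(file, frameBuf, frameBytes, &written, NULL);  // one big write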

Leon Nox
April 14th, 2005, 03:50 AM
hi, I have been asking around a lot about CMOS, and most people told me that I have to use a Camera Link interface (it seems there is just one Camera Link interface for laptops, but anyway there are no hard disks fast enough to write to, so I cannot use a laptop)...I think the transfer rate for 1920x1080 @ 25fps, 10-bit, is about 192MB/s, isn't it?...so I started to look for a mini/micro motherboard, and I found some really interesting ones...the real problem is where to record: if I use a single SATA drive I've got 150MB/s, but that is just the optimum...so I should implement a RAID configuration with at least 4 SATA drives to reach 200MB/s, but then I have to carry all that with me...think about the weight and the volume...any suggestions?

Kyle Granger
April 14th, 2005, 04:39 AM
1920x1080 @ 24p, 12-bit = 72MB/sec
@25p = 75MB/sec
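
(That is 1920 x 1080 pixels x 1.5 bytes for packed 12-bit, about 3.1MB per frame, times the frame rate.)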

Kyle Granger
April 14th, 2005, 05:09 AM
Leon,

I record 1920x1080 24p with no problems, with 2-port 3ware RAID 8006 controller, and two Maxtor SATA drives. My max bandwidth is 114MB/sec