4:4:4 10bit single CMOS HD project - Page 181 at DVinfo.net
Old April 1st, 2005, 01:29 PM   #2701
If a thread is just calling a routine, there should be no additional latency. It should be as fast as if it were called by WinMain(), which is just a thread too.

BTW, Linux is neither simple nor foolproof (but it is a damned good OS). It is possible to write inefficient and buggy code on any platform, even on the Mac. ;-)
Kyle Granger is offline  
Old April 1st, 2005, 01:36 PM   #2702
If your display is chewing up 60% of the CPU (is this also true when not writing?), you may want to skip every other frame on the display and bring it down to 30%.

60% is way high.
Kyle Granger is offline  
Old April 1st, 2005, 09:39 PM   #2703
<<<-- Originally posted by Kyle Granger : If a thread is just calling a routine, there should be no additional latency. It should be as fast as if it were called by WinMain(), which is just a thread too.
-->>>

Obin, what is in your inner loops? If you are calling routines each time you get a pixel, you will be wasting a lot of time on call latency. One way around this is to flatten out the code (or, a simpler choice at this stage, compile inline), eliminating as many subroutines as possible by integrating them into one routine in the inner loops. If you have profiled your software properly, you will know which loops the program spends 90% of its execution time in. It also helps, a lot, to do all the work on each pixel at once rather than in separate passes (capture has its own speed/timing separate from storage, of course, and can't conveniently be integrated with it). We would probably be surprised at how many projects don't model this behaviour properly, so it is worth a rescan. My memory has gone again, so I have forgotten the third and most vital thing; I will try to update if I remember.
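
A minimal sketch of the flattening idea, assuming a 10-bit packed capture format; the names and the packing layout are placeholders for illustration, not the project's actual code:

[code]
// Hypothetical sketch only: names and the 10-bit packing layout are
// assumptions, not the project's actual code. Assumes the source buffer
// is padded by a couple of bytes at the end.
#include <cstdint>
#include <cstddef>

// Per-pixel subroutine: extracts one 10-bit sample starting at bitPos
// (MSB-first). Correct, but called once per pixel.
static uint16_t unpack10(const uint8_t* src, size_t bitPos)
{
    size_t byte = bitPos >> 3, shift = bitPos & 7;   // shift is 0, 2, 4 or 6
    uint32_t w = (uint32_t)src[byte] << 16 |
                 (uint32_t)src[byte + 1] << 8 | src[byte + 2];
    return (uint16_t)((w >> (14 - shift)) & 0x3FF);
}

// Slow shape: a subroutine call sits inside the inner loop.
void unpackPerPixelCall(const uint8_t* src, uint16_t* dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = unpack10(src, i * 10);
}

// Flattened shape: the unpack logic is written out in the loop body, so
// the hot path is one straight-line block. Four 10-bit pixels come out
// of every 5-byte group in a single pass, with no call/return per pixel.
void unpackFlattened(const uint8_t* src, uint16_t* dst, size_t n)
{
    for (size_t i = 0; i + 4 <= n; i += 4, src += 5) {
        uint64_t g = (uint64_t)src[0] << 32 | (uint64_t)src[1] << 24 |
                     (uint64_t)src[2] << 16 | (uint64_t)src[3] << 8 | src[4];
        dst[i]     = (uint16_t)((g >> 30) & 0x3FF);
        dst[i + 1] = (uint16_t)((g >> 20) & 0x3FF);
        dst[i + 2] = (uint16_t)((g >> 10) & 0x3FF);
        dst[i + 3] = (uint16_t)( g        & 0x3FF);
    }
}
[/code]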

I have been involved with Forth, and am aware of the large (unseen) latency problems in Windows PC systems. In the old days, performance hits of thousands of percent happened. I doubt much of that happens in XP, but from using XP it looks far from ideal. So 50% of your execution cycles could be slipping away, and those are just the ones you can prevent (why do you think the Mac always does so well?).

I think it is good to profile the weaknesses of your OS/PC, and work around them.
Wayne Morellini is offline  
Old April 1st, 2005, 10:13 PM   #2704
<<<-- Originally posted by Kyle Granger : If your display is chewing up 60% of the CPU (is this also true when not writing?), you may want to skip every other frame on the display and bring it down to 30%.

60% is way high. -->>>

I forget whether Obin is using the 3.4GHz P4 or the 2GHz Pentium M, but wouldn't 30% still be high, even for a software-only solution?

I know Obin is using GPU programming for display, so I would expect it to be closer to 6%. I would still look into what I said before about slow software emulation of missing GPU functions (and still keep those latency problems in mind).

Obin:

There was another thing I forgot (the sites I suggested on configuring a machine for best performance would help here): writing the inner-loop code so you can force it to stay in the cache. If the code strays outside the cache, a cache line (or worse, a whole page) has to be read in, and another potentially evicted, only for the process to be reversed when execution strays somewhere else. That could easily consume 30% (and making a call to a foreign routine whose cache behaviour you have no control over might do exactly that, which could also be a problem with GPU software emulation). That is a lot of cycles; even a subroutine call can burn many cycles before you hit new code. Subroutine-oriented languages tend to have problems on modern PCs (and other machines), partly because high-speed memory is not made for low-latency, non-sequential instruction flow outside the cache.
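
For the data side of the same story, a rough sketch of what "stay in the cache" can mean in practice; the chunk size and the two per-pixel operations are made-up placeholders:

[code]
// Rough sketch: every per-pixel pass runs over one cache-sized chunk at a
// time, so the second pass finds its data still hot instead of re-reading
// the whole multi-megabyte frame from main memory. The chunk size and the
// two example operations are placeholders to illustrate the pattern.
#include <algorithm>
#include <cstdint>
#include <cstddef>

const size_t kChunkPixels = 16 * 1024;   // ~32 KB of 16-bit pixels; tune per CPU

void processFrameChunked(const uint16_t* in, uint16_t* out, size_t nPixels)
{
    for (size_t base = 0; base < nPixels; base += kChunkPixels) {
        const size_t end = std::min(base + kChunkPixels, nPixels);
        for (size_t i = base; i < end; ++i)          // pass 1: 10-bit -> 8-bit range
            out[i] = in[i] >> 2;
        for (size_t i = base; i < end; ++i)          // pass 2: scale to video levels
            out[i] = (uint16_t)(out[i] * 219 / 255 + 16);
    }
}
[/code]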

I don't know how C compilers are in general nowadays, but the code they produced used to perform pretty poorly compared to the Intel compiler, and I think MS eventually improved theirs (I don't know if it reached Intel's level). You could get a massive boost switching to the best compiler back in those days. Worth finding out about.

As long as you have an accurate model in your head of how the machine (and the OS) physically works, plus the experience, you can see lots of issues you can never see in the code itself. I only have the physical machine sufficiently mapped in my mind, so I can make good guesses; I suggest buying advanced books on real-time games programming (with machine code as well as C) if you really get stuck.

I am going to take a hunch, knowing how lousy PCs can get, that the performance difference between unrefined code and the most refined code might be ten times on a Windows PC. So if you have improved your performance by ten times since you started coding, you are close to the maximum you can get. Does that sound possible, Kyle?
Wayne Morellini is offline  
Old April 2nd, 2005, 05:32 AM   #2705
> so if you have improved your performance by ten times
> since you started coding, you are close to the maximum
> you can get. Does that sound possible, Kyle?

I suppose a factor of ten could well be possible, but honestly, I haven't thought about it too much.

Obin,

A few more suggestions, just to get your application working:

1) Display one out of three frames. This will give you 8 frames/sec and should bring your graphics CPU usage down to 20% (from 60%). That should let you work in peace.

2) Profile where your display CPU usage is going. Is it in the processing of your RAW 16-bit data, or in sending the data to the GPU and displaying it? These are clearly separate tasks, easy enough to comment out and profile separately.

3) Try displaying only one of the primaries. I.e., for every 2x2 square of Bayer pixels, display only one of the green pixels as a monochrome (luma) bitmap; see the sketch after this list.

4) Consider using OpenGL for the screen drawing. Sending a bitmap to the GPU and displaying it is only a few lines of code, and there is a lot of introductory code available on the net. It should not be complicated at all.
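
For point 3, a minimal sketch, assuming an RGGB mosaic phase and 10-bit samples (both assumptions; the real phase depends on the sensor, and the names are placeholders):

[code]
// Suggestion 3 sketched: one green sample per 2x2 Bayer cell becomes one
// monochrome preview pixel. Assumes an RGGB phase and 10-bit samples,
// which may not match the actual sensor; adjust the offsets to suit.
#include <cstdint>
#include <cstddef>

void bayerGreenToLuma(const uint16_t* bayer, size_t width, size_t height,
                      uint8_t* luma /* (width/2) x (height/2) */)
{
    for (size_t y = 0; y < height; y += 2) {
        const uint16_t* row = bayer + y * width;      // R G R G ... on even rows
        uint8_t* out = luma + (y / 2) * (width / 2);
        for (size_t x = 0; x < width; x += 2)
            out[x / 2] = (uint8_t)(row[x + 1] >> 2);  // take the green, 10-bit -> 8-bit
    }
}
[/code]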

Good luck!
Kyle Granger is offline  
Old April 3rd, 2005, 12:35 AM   #2706
thank you Kyle..I am working on all your ideas
Obin Olson is offline  
Old April 3rd, 2005, 11:51 AM   #2707
We are doing a bunch of re-coding now to streamline the software a bit, and I am going to get a new graphics card to see if that helps the CPU overhead from the display. It looks like the older GPU card I have may be spitting the tasks back out to the CPU, giving us the very high display CPU%.
Obin Olson is offline  
Old April 3rd, 2005, 11:44 PM   #2708
DirectX cards.

If you get a new graphics card to measure the results, get one that is closest to what your code and GPU shader package depend on. That will be a recent mid-to-high-end card from Nvidia or ATI. Nvidia has had the most advanced shaders in its cards over the last year or so, just not always the fastest at the functions games actually use. ATI either has similar capability in its latest top cards now, or will by the time the Xbox 2 comes out (DX10 compliant).

Either Nvidia is a clear winner for you (some of their lower-end cards have the same shader functions), or the functions you need are all supported on ATI. It is a compromise as to which, as ATI may have a low-cost DirectX 10 part by the end of the year (or maybe only in the Xbox 2), and DirectX 10 would definitely outclass everything out now for shader programming. With DX10, or 11, you could whack most of the image code directly on the card, only dumping results back to the PC to be saved, as it is supposed to support fuller program-flow capabilities. Some of us want to implement new true 3D raytracing software that will make ordinary 3D look second rate; that is difficult on a PC.

Go to Tomshardware.com, www.digit-life.com, or extremetech.com to find articles on the current situation with cards and DirectX.

I don't know about the latest Intel GPU, but most integrated GPUs are a compromise and support limited hardware functionality. ATI or Nvidia integrated parts might have near-desktop functionality in the GPU, but they have problems with shared memory nonlinearly stealing memory time (making memory loads jump from place to place, which is the worst thing you can do unless it is managed). Some integrated chips have their own memory, though; as long as it is big enough for the programs and the OS to occupy at once, you will get the best efficiency.

What card were you using, Obin? You should be able to map the low-level functionality of a card to the instructions/functions you use from its formal low-level specifications on the maker's web site, probably in a whitepaper-type PDF document (or email their development section). The makers support the same functionality in different ways, but apart from Nvidia and ATI (and maybe the slower Matrox and Wildcat parts) there are no other cards worth looking at in terms of completeness and performance.

You can get around many issues with integrated graphics as well, by finding out what the chip actually does do, then separating the GPU-shader-supported execution into one batch and the stuff that has to be done in software into your own customised routines (bypassing DirectX as much as is feasible for performance). It is a process of program code factoring.
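
The thread doesn't say which API Obin's shader package sits on, but assuming Direct3D 9, the driver's capability bits can be read up front, so you know before shipping a shader whether the hardware device will take it at all:

[code]
// Assumes Direct3D 9 (an assumption; the package may use something else).
// Queries the HAL device's pixel shader version so code paths can be
// chosen before any work is sent to a card that would only emulate it.
#include <d3d9.h>
#pragma comment(lib, "d3d9.lib")

bool hardwareHasPixelShader2()
{
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return false;

    D3DCAPS9 caps;
    HRESULT hr = d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);
    d3d->Release();

    // PixelShaderVersion packs major.minor in the low word; this is the
    // standard comparison used in the DirectX SDK samples.
    return SUCCEEDED(hr) && caps.PixelShaderVersion >= D3DPS_VERSION(2, 0);
}
[/code]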

Have a good day.

Wayne.
Wayne Morellini is offline  
Old April 4th, 2005, 08:27 PM   #2709
We are now testing a bypassed method of image calculations, without GPU support, to see what the results will be. It looks like our current setup has the GPU choking and shooting all the work BACK to the CPU, producing our 50-60% CPU numbers just for preview!

I will know more in the morning... would it be too much to ask for some PROGRESS!!? ;)
Obin Olson is offline  
Old April 5th, 2005, 04:25 AM   #2710
Good move. What percentage are the bypassed routines taking?

I have news on the next ATI chip with new shaders, due mid-year; the low-end or low-powered versions might come by the end of the year (I imagine something like this may turn up on main-boards). Whatever solution you go for, try to get involved with that vendor's official developer programme; they should have answers to many of these questions, hopefully in low-cost support documents (Intel/AMD and Microsoft are also good sources of development information, I think). http://www.gamedev.net/reference/ has good resources too, and igda.org and gamasutra are also spots that may help.

I should be posting links (God willing) about new silent coolers, storage, etc. in the technical thread in the next day or so. I should also be posting technical design tips, which I haven't done in times past because so much of that material is a potential source of patentable income, but some of it is not, or less so.
Wayne Morellini is offline  
Old April 5th, 2005, 06:40 PM   #2711
Well, well... I get 36% CPU load now, with the image resize being done by the CPU and the result fed to the GPU. This is working very well, but we still get choked up doing the save AND the display at the same time...

I have a profiling test app from my programmer now that I will try; it will tell us what the HECK is going on in my dfi system here. He says things work on his system but not mine. Kyle, any ideas why we would have display refresh issues when we start saving raw data to the disks? Display AND packing together take only about 45-50% CPU, and I KNOW saving will not take 50%! It's like the thing has timing issues. We did try your earlier suggestions... any more ideas pop into your head? We are still using DirectDraw for display, AFTER pixel packing and resize are done on the CPU.
Obin Olson is offline  
Old April 5th, 2005, 07:44 PM   #2712
<<<-- Originally posted by Obin Olson : Well, well... I get 36% CPU load now, with the image resize being done by the CPU and the result fed to the GPU. This is working very well, but we still get choked up doing the save AND the display at the same time... -->>>

I am not Kyle, but I must say, that is more like it. Strange, though: image resize should be a basic function on most GPUs, and it should cost next to nothing. Maybe it is the way the resize is done. I assume you are talking about resizing from one resolution to another; cards nowadays should have hardware that displays an image at a different resolution from the one it is stored at virtually for free, with no resize pass needed. Still, are we talking about a resize after the GPU is finished, or a resize before? If the GPU is stalling, then resizing pre versus post would explain the differences.

As for the rest (what did you mean by "dfi system", anyway?), it is most likely that memory access timing thing I mentioned last year: too many things competing for memory at the same time stall the memory pipeline, causing long delays in access to main memory. Keep things in cache, sequentialise everything that does not need to be parallel, then adjust it all to work around each other. One common way to decouple the save from the display is sketched below.
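
A common shape for the save-while-displaying clash, sketched with modern C++ threads (in 2005 this would be CreateThread and a critical section, but the structure is the same); all names are hypothetical, not Obin's code:

[code]
// Frames are handed to a dedicated writer thread through a queue, so a
// disk stall delays the writer, never the capture/display loop. Sketch
// only; names are hypothetical.
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

static std::queue<std::vector<uint8_t>> g_frames;
static std::mutex g_mtx;
static std::condition_variable g_cv;
static bool g_done = false;

void writerThread(std::FILE* out)
{
    for (;;) {
        std::unique_lock<std::mutex> lock(g_mtx);
        g_cv.wait(lock, [] { return g_done || !g_frames.empty(); });
        if (g_frames.empty()) return;              // finished and drained
        std::vector<uint8_t> frame = std::move(g_frames.front());
        g_frames.pop();
        lock.unlock();                             // write with the lock released
        std::fwrite(frame.data(), 1, frame.size(), out);
    }
}

// Called from the capture/display side: O(1), never touches the disk.
void queueFrameForSave(std::vector<uint8_t> frame)
{
    {
        std::lock_guard<std::mutex> lock(g_mtx);
        g_frames.push(std::move(frame));
    }
    g_cv.notify_one();
}
[/code]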
Wayne Morellini is offline  
Old April 6th, 2005, 03:04 AM   #2713
Obin,
Wayne is absolutely correct when he says the GPU should be doing the resize: you get that for free.
What is the size of the bitmap you are creating? How are you doing the Bayer interpolation?
Kyle Granger is offline  
Old April 6th, 2005, 08:26 AM   #2714
About 960x540, or 1/4 the resolution of the 1080 image. This is what we do so that the preview will fit on a small 1024x768 screen.
Obin Olson is offline  
Old April 6th, 2005, 08:32 AM   #2715
We take the RGGB and make it one pixel instead of 4. This is what the GPU was choking on and spitting back to the CPU.
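
A minimal sketch of that 2x2 decimation on the CPU, assuming an RGGB phase and 10-bit samples (both assumptions; the names are placeholders):

[code]
// Each RGGB cell becomes one RGB preview pixel: 1920x1080 Bayer in,
// 960x540 RGB out, the two greens averaged. Assumes an RGGB phase and
// 10-bit samples; adjust to the sensor's actual layout.
#include <cstdint>
#include <cstddef>

void bayerToQuarterRGB(const uint16_t* bayer, size_t width, size_t height,
                       uint8_t* rgb /* (width/2) * (height/2) * 3 bytes */)
{
    for (size_t y = 0; y < height; y += 2) {
        const uint16_t* r0 = bayer + y * width;    // R G R G ...
        const uint16_t* r1 = r0 + width;           // G B G B ...
        uint8_t* out = rgb + (y / 2) * (width / 2) * 3;
        for (size_t x = 0; x < width; x += 2, out += 3) {
            out[0] = (uint8_t)(r0[x] >> 2);                 // R, 10-bit -> 8-bit
            out[1] = (uint8_t)((r0[x + 1] + r1[x]) >> 3);   // (G1 + G2) / 2
            out[2] = (uint8_t)(r1[x + 1] >> 2);             // B
        }
    }
}
[/code]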
Obin Olson is offline  