Don't think it's MB/sec, Wayne.
|
I said bits per second.
|
BTW Obin,
beware that you can't change the DNxHD codec in any way for redistribution, at least not without AVID's approval and licensing fees. In other words, this isn't "open source" as in GPL, BSD, etc. |
Kyle,
Yes, it is a pain. I prefer the way traditional Internet newsgroups work. It would be good if the board could do the same in the editing window, using ">" for indentation of quotes, but replace the edit-view ">" indentations in the thread view with box indentations around each quote level, leaving the present reply without a box. I am sure I have seen this done with this software on other boards.

That thread, I don't know what it was about, apart from being another project like this one. It was mostly savaged (edited) by the time I got there, with very little left about the actual project; we suspect it was just a troll that has been subsequently deleted. They should really make a general discussion forum called "Arguments" for each board and Usenet group, so all the trolls and flamers can get together. That was a valid suggestion, Rob.

Guys, this thread has sure slowed down, and I'm glad. I looked over the thread history; do you realise we laid down over 50% of this thread in the first three months? |
It will be an "option" for export, Jason ;) if you have the codec installed then you're golden ;)
|
ok..so we are doing well with the bitpacking (thanks Kyle). We have a solid 33fps 1080p 12-bit at 88MB/sec..things seem to be on a good track again and we are looking at getting a working test version with display and save today..
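Roughly, the idea of the bitpacking, as a sketch only: two 12-bit samples folded into three bytes. The exact layout here is illustrative, not necessarily the one our code uses.

```cpp
#include <cstdint>
#include <cstddef>

// Pack pairs of 12-bit samples (held in uint16_t) into 3 bytes each.
// 2 samples x 12 bits = 24 bits = 3 bytes, a 25% saving over 16-bit
// storage. Assumes an even sample count.
void pack12(const uint16_t* src, uint8_t* dst, size_t count)
{
    for (size_t i = 0; i + 1 < count; i += 2) {
        uint16_t a = src[i]     & 0x0FFF;   // first 12-bit sample
        uint16_t b = src[i + 1] & 0x0FFF;   // second 12-bit sample
        *dst++ = uint8_t(a >> 4);                       // a[11:4]
        *dst++ = uint8_t(((a & 0x0F) << 4) | (b >> 8)); // a[3:0] | b[11:8]
        *dst++ = uint8_t(b & 0xFF);                     // b[7:0]
    }
}
```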
|
...I am keeping my fingers crossed! our overhead for bitpacking and save is only 4-13% CPU..this is good! as we will have leftover CPU for other things...display is 60% CPU...
|
I got some updates but it's too late to test as I am at home now..I will go in the morning and see what we have! my programmer says the save and display are working well on his system...now I gotta test it on the Intel mobile board/chip here....
|
<<<-- Originally posted by Obin Olson : ...I am keeping my fingers crossed! our overhead for bitpacking and save is only 4-13% CPU..this is good! as we will have leftover CPU for other things...display is 60% CPU... -->>>
I thought the display was being done by the GPU; if so, then display should be close to 4% also. This leads me to believe that the onboard GPU you are using does not support all the GPU features you are using, and is emulating the unsupported instructions in software.

My experience with DirectX games is that when you switch from a custom-made software renderer to the DirectX-only software renderer, you get great slowdowns (I forget how much, but at least half, maybe 4+ times). I think DirectX is good when it addresses actual hardware, but poor when it has to emulate it. So maybe something similar is happening here.

The best bet is to have a profiling program that looks to see which features are available in hardware, and which can't be emulated quicker on the local processors (some hardware is that slow compared to a high-speed processor with speed left over going to waste). Then have dynamic code that uses custom-written high-speed portions to emulate the missing hardware instead of DirectX. This may not be doable with the GPU programming system you have; you might have to rearrange the code so that most of the portions supported in hardware are together, and separate from most of the emulated bits.

One thing I think is happening in DirectX is that their dynamic code is not written for speed, and is taking context hits through the different abstraction layers they use. The methods used to jump between different code portions can also greatly slow you down. If you can get this down to a low percentage then you will have enough left over to run lossless compression, or to run on a very quiet, low-speed processor.

Happy Easter
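As a concrete sketch of the kind of capability check I mean, here is a DirectX 9 caps query. Which caps actually matter depends on the shader features Obin's code uses; pixel shader 2.0 is just an example.

```cpp
#include <d3d9.h>
#include <cstdio>

// Ask the adapter what it supports in hardware, so you know what
// DirectX would otherwise have to emulate in software.
bool CheckGpuCaps()
{
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return false;

    D3DCAPS9 caps;
    d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

    bool hasPS20 = caps.PixelShaderVersion >= D3DPS_VERSION(2, 0);
    printf("Pixel shader 2.0 in hardware: %s\n", hasPS20 ? "yes" : "no");

    d3d->Release();
    return hasPS20;
}
```
|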
Is the Altasens CMOS available already? If not, when will it become available?
|
Radek,
The short answer is yes. Shipping now. In the SI-1920HD camera. The longer answer is that we are still trying to bridge the gap between a working piece of hardware and a cinematography tool. Please contact me off-list for specific sales information. Steve |
I've been following these homemade HD threads for months and have been trying to decide what my best option is. I've been thinking about the SI-3300 for a while, but I am unsure whether it is in my price range with whatever other expenses it will require. I'm someone who would otherwise be buying a prosumer DV or HD camera but can't stand the limitations of the color sampling and compression of the DV and HDV formats.
Do cameras (SI-3300 and 1920) from Silicon Imaging have built-in GigE in the unit, or do they use a Camera Link to GigE adapter (and is that then added to the price)?

Also, it sounds like there is no tried-and-true method of capturing video... It would be nice if someone could just condense the best (and cheapest) options for each required piece of hardware/software necessary for getting an HD image onto a HDD (camera, grabber, software). Sounds like StreamPix (also expensive) and XCAP have their problems... but is such software adequate for adjusting, previewing, and capturing uncompressed video streams to HDD? Is a special framegrabber required for GigE cameras, or is a standard Gigabit Ethernet connection all you need?

Also, it is still unclear to me whether these cameras can do 24 or 48fps while maintaining a 1/48th shutter... Maybe I just need to look harder for these answers, but if anyone can help me, thanks. I'm trying to learn as much as I can about this.

Anyone know anything about the Epix "silicon video" packages that have Micron CMOS sensor cameras bundled with cable, grabber and software? 1280x1024 at 30p over GigE with all components needed for capture at only $995 seems like a great deal. http://www.epixinc.com/products/sv9m001.htm |
the HUGE issue is SOFTWARE...and that is my focus..and as of now, after 6 months, we are VERY close but not done....
from my programmer: I've just completed a new suite of tests on a new system and I am still running into the same problem, even though the technology and approach are totally different. When calling the routine across threads, it seems there is a pretty big delay involved that ends up being close to the saving time itself. I'm building a fourth test case right now to see if I can find a way around it, as I still have 3-4 different things to try.

can anyone on here HELP with this issue? it seems that call times across threads are the last problem between us and working software...we are at the LIMIT of the CPU at this point and MUST increase the performance or we will not have a working system |
I can tell you right now that the 1280 images are not going to be as clear as you may want...it's a single-CMOS camera, not a 3CMOS system...that makes a BIG change in the resolution of your image (after all, you're looking at RGGB in the space of 4 pixels, instead of 4 pixels that EACH show RGB). I would try and go with the 1080p if you can...Kyle on the board has some software that will record the raw images from a GigaBit camera - the 3300rgb...The 3300rgb is a GREAT camera...if you can capture your data from it!!
|
<<<-- Originally posted by Obin Olson :
can anyone on here HELP with this issue? it seems that call times across threads are the last problem between us and working software...we are at the LIMIT of the CPU at this point and MUST increase the performance or we will not have a working system -->>>

Hunt around for a real-time embedded system/kernel for Windows XP (these used to be popular for previous versions of Windows).

I do not know exactly what you mean, but I can guess. Yes, getting a routine to run on another thread can introduce a big latency problem. There is a large latency hit in subroutine calls on modern computers and memory systems (for other readers here: the further you get from primary cache towards storage, the longer it takes, of course). When you call these routines through Windows/abstraction layers, they add a major hit too, and I imagine waking up another thread to do something also has a significant hit. Just calculate all the various hits and find the shortest path. Windows is not so good at these things; I don't know if you can really reduce it (apart from an embedded replacement kernel) other than by doing things differently.

Also, the programming technique you use to make a thread wait for something to happen can waste lots of cycles, and putting something to sleep and waking it up can cost a lot too, but there are ways to do it with less latency and fewer cycles. I am used to thinking of latency in terms of less than one cycle; all this PC stuff is so primitive. Rob.L should know how to do these things on a PC; it would be good to ask him. Your CPU overhead for view should be a lot less than what it is at the moment. What is Kyle doing on his system?

Thanks for your continuing efforts. Rob.S, when are you coming back with yours? He might have some ideas that could help.
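For example, here is a minimal sketch of a low-overhead wakeup between threads using plain Win32 events. The names are illustrative, and whether this beats what the code does now is exactly what needs measuring.

```cpp
#include <windows.h>

HANDLE g_frameReady;   // auto-reset event: signalled when a frame is queued

// Writer thread: sleeps in the kernel until a frame is ready, costing
// no CPU cycles while idle (no busy polling).
DWORD WINAPI WriterThread(LPVOID)
{
    for (;;) {
        WaitForSingleObject(g_frameReady, INFINITE);
        // ... dequeue one frame from the shared buffer and write it ...
    }
}

// The capture thread calls this after queuing a frame.
void SignalFrame()
{
    SetEvent(g_frameReady);   // one kernel transition to wake the writer
}

// Setup, e.g. in WinMain:
//   g_frameReady = CreateEvent(NULL, FALSE, FALSE, NULL);  // auto-reset
//   CreateThread(NULL, 0, WriterThread, NULL, 0, NULL);
```
|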
I got an easy and foolproof solution.
It is called : LINUX |
Yes, I think Mac OSX as well, but how do we get this Windows version completed?
I have news that a dual-core Mac Mini will be out mid-year; that would be a good basis for the Mac version. I have a technical update on the technical thread with other new Macs, batteries, the JVC HD lower-compression camera, interesting stuff: http://www.dvinfo.net/conf/showthrea...872#post294872 By the way, where is Ronald Biese? He was interested in a Linux version. |
Juan, how hard would it be to convert our software to Linux? we are still looking at other solutions..I am going to overclock the board a bit and see if that will give us enough power
|
so anyone have some answers to our questions? we are a bit stuck at this point... I understand Linux but that is a last resort as we have everything working but this last mile...
|
Hi Obin,
OK, you seem to have a problem. Before jumping to another OS or the Mac or overclocking your board, it may be wise to first determine WHAT the problem is. It SOUNDS like there is a thread interaction problem (waiting for a mutex, or some signal, not sleeping?), but you should first exclude a problem within a single thread. Wayne is right about problems with subroutine calls and Windows programming in general, but I believe the problem is PROBABLY much less exotic. Also, if you assume the problem exists in your code, you then have a chance to fix it. These things for me have always turned out to be bugs. And bugs can be fixed. For what it is worth, my application is extremely multi-threaded, and I have not experienced these problems. You said earlier that the problem had to do with "When calling the routine across threads", but I don't really know what "across threads" means.

1) Profile each of your threads individually with a canonical test case (1920x1080x24p, or some standard config file). I assume there are three main threads or tasks: Capture, Display, and Writing. Get a rough CPU usage for each. Display can just read the same buffer, 24 times a second. Same with Writing. If the CPU usage is reasonable (say, 3% for Capture, 17% for Display, and 24% for Writing), then there is some interaction problem.
2) Try thread interactions, doing just Capture and Display. Does that work with the same CPU usage as Capture only PLUS Display only?
3) Try Capture + Writing.
4) Try Display + Writing.
5) Look at how the threads get fed. Are they properly sleeping? If they are waiting for a signal, can you test just by polling and sleeping 1-10 ms?
6) Is one thread running at too high a priority?

If you are using DirectX for the interface to the GPU, you will have to rewrite that for Linux.
If a thread is just calling a routine, there should be no additional latency. It should be as fast as if it were called by WinMain(), which is just a thread too.
BTW, Linux is neither simple nor foolproof (but it is a damned good OS). It is possible to write inefficient and buggy code on any platform, even on the Mac. ;-)
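For step 1, here is a sketch of one way to get per-thread CPU numbers on Windows, using GetThreadTimes. The thread handle and name are placeholders for your Capture, Display, and Writing threads.

```cpp
#include <windows.h>
#include <cstdio>

// Report how much CPU time the given thread has consumed so far.
// Call periodically for each thread and compare against wall time.
void ReportThreadCpu(HANDLE thread, const char* name)
{
    FILETIME create, exit, kernel, user;
    if (GetThreadTimes(thread, &create, &exit, &kernel, &user)) {
        ULARGE_INTEGER k, u;
        k.LowPart = kernel.dwLowDateTime;  k.HighPart = kernel.dwHighDateTime;
        u.LowPart = user.dwLowDateTime;    u.HighPart = user.dwHighDateTime;
        double ms = (k.QuadPart + u.QuadPart) / 10000.0;  // 100 ns units
        printf("%s: %.1f ms of CPU time\n", name, ms);
    }
}
```
|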
If your display is chewing up 60% of the CPU (is this also true when not writing?), you may want to skip every other frame on the display and bring it down to 30%.
60% is way high.
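The skip itself is only a couple of lines; a sketch, where DrawFrame stands in for whatever your real display call is:

```cpp
void DrawFrame(const void* frame);  // your existing display routine (assumed)

// Display every second frame: halves display CPU, turning a 24 fps
// capture into a 12 fps preview.
void MaybeDisplay(const void* frame)
{
    static unsigned n = 0;
    if ((n++ & 1) == 0)
        DrawFrame(frame);
}
```
|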
<<<-- Originally posted by Kyle Granger : If a thread is just calling a routine, there should be no additional latency. It should be as fast as if it were called by WinMain(), which is just a thread too.
-->>> Obin, what is in your inner loops? If you are calling routines each time you get a pixel, you will be wasting a lot of time on latency. One way to get around this is to flatten out the code (or, a simpler choice at this stage, to compile inline), where you eliminate as many subroutines as possible by integrating them into one routine in the inner loops. If you have profiled your software properly, you will know which loops the program spends 90% of its execution time in.

It helps, a lot, to define the work to be done on each pixel at once (i.e. capture has its own speed/timing separate from storage and can't conveniently be integrated together). We would probably be surprised at the number of developments that don't model this behaviour properly, so it is worth a rescan. My memory has gone again, so I have forgotten the third and most vital thing; I will try to update if I remember it again.

I have been involved with Forth, and am aware of the large (unseen) latency problems in Windows PC systems. In the old days, hits of 1000s of percent happened. I doubt much of that happens in XP, but from using XP it looks far from ideal. So 50% of your execution cycles could be slipping away, and that is just the ones you can prevent (how come you think the Mac always does so well). I think it is good to profile the weaknesses of your OS/PC, and work around them.
<<<-- Originally posted by Kyle Granger : If your display is chewing up 60% of the CPU (is this also true when not writing?), you may want to skip every other frame on the display and bring it down to 30%.
60% is way high. -->>> I forget whether Obin is using the 3.4GHz P4 or the 2GHz PM, but wouldn't 30% be high, even for a software solution? I know Obin is using GPU programming for display, so I would expect it to be closer to 6%. I would still suggest what I said before about the slow software emulation of missing GPU functions (and still keep in mind those latency problems).

Obin: there was another thing I forgot (those sites I suggested about configuring a machine for best performance would help): writing the inner-loop code so you can force it to stay in the cache. If the code strays outside the cache, a page has to be read in, and another potentially read out, only for the process to be reversed when it strays somewhere else; that could easily consume 30% (and making a call to a foreign routine, whose layout in the cache you have no control over, might do just that, which could also be a problem with GPU software emulation). A page is big; that's a lot of cycles, and even a subroutine call can burn a lot of cycles before you hit new code. Subroutine-oriented languages tend to have a lot of problems on modern PC machines (and others), partly because their high-speed memories are not made for low-latency, non-sequential instruction flow out of cache.

I don't know how C compilers are in general nowadays, but the code they produced used to perform pretty poorly compared to the Intel compiler, and I think MS eventually improved theirs (I don't know if it reached the same level as Intel). But you could get a massive boost switching to the best compiler back in those days. Worth finding out about.

As long as you have an active model of how the machine (and the OS) actually physically works in your head, plus the experience, you can see lots of issues you can never see in the code itself. I only have the physical machine sufficiently mapped in my mind, so I can make good guesses; I suggest buying advanced books on real-time games programming (with machine code as well as C) if you really get stuck. I am going to take a hunch, knowing how lousy PCs can get, that the performance difference between unrefined code and the most refined code might be ten times on a Windows PC. So if you have improved your performance by ten times since you started coding, you are close to the maximum you can get. Does that sound possible, Kyle?
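The data side of staying in cache is easier to sketch than the code side: process the frame in strips small enough to remain resident between passes. The strip height here is an assumption, to be tuned against the real cache size.

```cpp
#include <cstdint>
#include <cstddef>

const size_t kStripRows = 16;   // assumed; tune to the actual L2 cache

// Run the per-pixel stages over one strip at a time while it is hot in
// cache, instead of streaming the whole frame through each stage.
void ProcessInStrips(uint16_t* frame, size_t width, size_t height)
{
    for (size_t y0 = 0; y0 < height; y0 += kStripRows) {
        size_t y1 = (y0 + kStripRows < height) ? y0 + kStripRows : height;
        for (size_t y = y0; y < y1; ++y)
            for (size_t x = 0; x < width; ++x)
                frame[y * width + x] &= 0x0FFF;   // stand-in for real work
    }
}
```
|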
> so if you have improved your performance by ten times
> since you started coding, you are close to the maximum
> you can get. Does that sound possible Kyle?

I suppose a factor of ten can well be possible, but honestly, I haven't thought about it too much.

Obin,
A few more suggestions, just to get your application working:
1) Display one out of three images. This will give you 8 frames/sec, and should bring your graphics CPU usage down to 20% (from 60%). This should let you work in peace.
2) Profile where your Display CPU usage is going. Is it in the processing of your RAW 16-bit data, or in sending the data to the GPU and displaying it? These are clearly separate tasks, easy enough to comment out to profile separately.
3) Try displaying only one of the primaries. I.e., for every 2x2 square of Bayer pixels, display only one of the green pixels as a monochrome (luma) bitmap.
4) Consider using OpenGL for the screen drawing. Sending a bitmap to the GPU and displaying it is only a few lines of code (see the sketch below). There is a lot of introductory code available on the net. It should not be complicated at all.

Good luck!
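On point 4, a sketch of the few lines I mean. It assumes a GL context is already current and that the driver accepts non-power-of-two texture sizes; otherwise pad the texture up to, say, 1024x1024.

```cpp
#include <windows.h>
#include <GL/gl.h>

// Upload one RGB frame as a texture and draw it as a full-window quad.
void DrawBitmap(const unsigned char* rgb, int w, int h)
{
    static GLuint tex = 0;
    if (!tex) {
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    }
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, w, h, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, rgb);

    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
    glTexCoord2f(0, 1); glVertex2f(-1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1,  1);
    glTexCoord2f(0, 0); glVertex2f(-1,  1);
    glEnd();
    // Then SwapBuffers(hdc) presents the frame.
}
```
|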
thank you Kyle..I am working on all your ideas
|
we are doing a bunch of re-coding now with the software to streamline things a bit....and I am going to get a new graphics card to see if that helps the CPU overhead with the display..looks like the older GPU card I have may be spitting the tasks back out to the CPU, giving us the very high display CPU %
|
DirectX cards.
If you get a new graphics card to measure the results, get one that is closest to what your code and GPU shader package depend on. This will either be the latest mid-to-high-end card from Nvidia or ATI. Nvidia has had the most advanced shaders in its cards over the last year or so, though not often the fastest at the functions games have actually been using. ATI either has similar capability in its latest top cards now, or will by the time the Xbox 2 comes out (DX10 compliant).
Either Nvidia is a clear winner for you (some of their lower-end cards have the same shader functions), or the functions you use are all supported on ATI. It is a compromise as to which, as ATI may have a low-cost one by the end of the year (or maybe only in the Xbox 2) with DirectX 10. DirectX 10 would definitely outclass everything out there for shader programming. With DX 10, or 11, you could whack most of the image code directly onto the card, only dumping results to the PC to be saved (as it is supposed to support most full program-flow capabilities). Some of us want to implement new true 3D raytracing software that will make ordinary 3D look second-rate; that is difficult on a PC.

Go to Tomshardware.com, www.digit-life.com, or extremetech.com to find articles on the current situation with cards and DirectX. I don't know about the latest Intel GPU, but most integrated GPUs are a compromise and support limited hardware functionality. ATI or Nvidia might have near-desktop functionality in an integrated GPU (but these have problems with shared memory nonlinearly stealing memory time, making memory loads jump from place to place, which is the worst thing to do unless managed). But some integrated chips have their own memory; as long as it is big enough for you, the programs and the OS to occupy at once, then you will get the best efficiency.

What card were you using, Obin? You should be able to map the low-level functionality of a card to the instructions/functions you use, from its formal low-level specifications on its web site, probably listed in a whitepaper-type PDF document (or email their development section). Vendors also support the same functionality in different ways, but apart from Nvidia and ATI (and maybe the slower Matrox and Wildcat cards) there are no other cards to look at in terms of completeness and performance. You can also get around many issues with integrated graphics by finding out what it does do, and separating the GPU-shader-supported execution into one batch and the stuff that has to be done in software into your own customised software routines (bypassing DirectX) as much as is feasible for performance, by a process of program code factoring.

Have a good day.
Wayne. |
we are now testing a bypassed method of image calculations without GPU support to see what the results will be...looks like our current setup has the GPU choking and shooting all the work BACK to the CPU...producing our 50-60% CPU numbers just for preview!
I will know more in the morning...would it be too much to ask for some PROGRESS!!? ;) |
Good move. How much CPU are the bypassed routines taking?
I have news on the next ATI chip with new shaders, mid-year; the low-end or low-powered versions might come at the end of the year (I imagine something like this may come out on main-boards). Whatever solution you go for, try to get involved with that vendor's official development section; they should have answers to many of the questions, hopefully in low-cost support documents (Intel/AMD and Microsoft are also good sources for development information, I think). http://www.gamedev.net/reference/ has good resources too, and igda.org and gamasutra are also spots that may help.

I should be posting links (God willing) about new silent coolers, storage etc. in the technical thread in the next day or so. I should also be posting technical design tips, which I haven't in times past because so much of the stuff is a potential source of patentable income, but some of it is not, or less so. |
well well..I get 36% CPU load now with the image resize being done by the CPU and then feeding that to the GPU...this is working very well, but we still get choked up with the save AND display at the same time...
I have a profiling test app now from my programmer that I will try. It will tell us what the HECK is going on in my DFI system here..he says things are working on his system but not mine..Kyle, any ideas why we would have display refresh issues when we start saving raw data to the disks? Display AND packing only take about 45-50% CPU, and I KNOW saving will not take 50%!!! it's like the thing has timing issues..we did try your suggestions from before..any more ideas pop into your head? we are still using DirectDraw for display AFTER pixel packing and resize are done with the CPU |
<<<-- Originally posted by Obin Olson : well well..I get 36% CPU load now with the image resize being done by the CPU and then feeding that to the GPU...this is working very well, but we still get choked up with the save AND display at the same time... -->>>
I am not Kyle, but I must say, that is more like it. Strange: image resize should be a basic function on most GPUs, and it should not cost anything noticeable. Maybe it is the way the resize is done. I assume you are talking about a resize from one screen resolution to another; cards nowadays should have hardware that automatically displays an image at a different resolution than the one it is stored at, virtually for free, with no explicit resize needed. Still, are we talking about a resize after the GPU is finished, versus a resize before? If the GPU is stalling, then doing the resize pre or post would explain the differences.

As for the rest (what did you mean by "DFI system" anyway?), it is most likely that memory access timing thing I mentioned last year: too many things competing for memory at the same time, stalling the memory pipeline and causing long delays in access to main memory. Keep things in cache and sequentialise everything that does not need to be parallel, then adjust it all to work around each other. |
Obin,
Wayne is absolutely correct when he says the GPU should be doing the resize: you get that for free. What is the size of the bitmap you are creating? How are you doing the Bayer interpolation?
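For what it is worth, a sketch of getting the resize for free with a DirectX 9 StretchRect blit. The device and source surface are assumed to exist already, and StretchRect has surface pool and format restrictions worth checking against your setup.

```cpp
#include <d3d9.h>

// Let the GPU scale during the blit: copy the full-resolution source
// surface onto the smaller back buffer with bilinear filtering.
void BlitScaled(IDirect3DDevice9* device, IDirect3DSurface9* src)
{
    IDirect3DSurface9* backBuffer = 0;
    device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &backBuffer);

    // NULL rects mean whole surfaces; the size difference is the resize.
    device->StretchRect(src, NULL, backBuffer, NULL, D3DTEXF_LINEAR);

    backBuffer->Release();
    device->Present(NULL, NULL, NULL, NULL);
}
```
|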
about 960x540, or 1/4 the resolution of the 1080 image..this is what we do so that the image will fit on a small 1024x768 screen
|
we take the RGGB and make it one pixel instead of 4..this is what the GPU was choking on and spitting back to the CPU
|
Obin,
> we take the RGGB and make it one pixel

That sounds reasonable. The CPU should create the one pixel from the four Bayer pixels, and then you send 960x540 RGB to the GPU. It Should Just Work.
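A sketch of that 2x2 reduction, assuming an RGGB layout with R at the top left (check your sensor's actual pattern), and 8-bit samples for simplicity where the real pipeline is 12-bit:

```cpp
#include <cstdint>
#include <cstddef>

// Collapse each 2x2 RGGB block into one RGB pixel: take R and B
// directly and average the two greens. 1920x1080 Bayer in,
// 960x540 RGB out.
void BayerToQuarterRGB(const uint8_t* bayer, uint8_t* rgb,
                       size_t srcW, size_t srcH)
{
    for (size_t y = 0; y < srcH; y += 2) {
        for (size_t x = 0; x < srcW; x += 2) {
            uint8_t r  = bayer[y * srcW + x];
            uint8_t g1 = bayer[y * srcW + x + 1];
            uint8_t g2 = bayer[(y + 1) * srcW + x];
            uint8_t b  = bayer[(y + 1) * srcW + x + 1];
            *rgb++ = r;
            *rgb++ = uint8_t((g1 + g2) / 2);
            *rgb++ = b;
        }
    }
}
```
|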
it works well; with pixel packing and resize, the CPU is at 50%
|
Encouragement
Go Obin! Go Obin!
This is addicting. I'm not even a programmer, and I've gotten to where I'm checking this forum 4, 5 times a day. You guys are freaks of nature. I wish I could be a freak. :) Enthusiastically wishing you the greatest success... -Jonathon |
thx Jonathon! if I keep at it I have a feeling I will get it done.. the more I fight the more I am willing to keep at it till I get some RESULTS!
I really want RESULTS from our project after shooting the VariCam..such a dirty image when you want to color grade it! |
I hate message boards. really. such a TW
so, on a better note, I have the test results from our profiler and am awaiting a reply from my programmer...hope it will be a good one ;) once again I had the chance to shoot a 30-sec spot on the Panasonic VariCam...I sure am glad I have not been fooled into buying that thing...try and do ANYTHING to the images in post..sooooo much noise!! SOOOOO much compression..while some think the DVCPROHD codec is great, I am not one of them, at all, unless you never touch your images in post (I never shoot without a lot of post color grading)..our camera will beat the crap outa the VariCam images!! The VariCam would be great for DOC work and ENG/EFP production that does not need to be fooled with in post |