how fast: Packing < 16 bit pixels into words?
Les Dit August 3rd, 2004, 12:42 AM I have a programming question that maybe some of you can answer: How fast can a high-end P4 system with dual-channel DDR be expected to bit-shift and repack 12-bit data into a few words?
Is this something that MMX or SSE instructions can help with?
I suppose this would happen in cache if it's a scan line at a time.
So, how many megapixels per second, for lack of a better metric?
I don't care if it's not real time, but how fast is it? I want to move images across a gigE network, and I'm weighing packing against just transferring.
Thanks
-Les
Rob Lohman August 3rd, 2004, 05:21 AM If I understand you correctly this can be done very fast, especially
since it can be done inline in your buffer (i.e., no need to allocate
a second buffer). I have no indication of exactly how fast, but
probably fast enough to do realtime over a gigE network,
especially if it were coded in assembly.
12 bits work out nicely as well, since you can store each pixel as 8 + 4
bits instead of 16 bits. So you will need to process two 16 bit
pixels at a time and pack them into 24 bits (3 bytes). That's not
hard to do in C or assembly, as long as you know how many
pixels there are in the memory block.
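To make the "two 16 bit pixels into 3 bytes" idea concrete, here is a minimal C sketch for a single pair. It assumes each pixel sits in the low 12 bits of a 16-bit word; the function name `pack_pair_12` and the byte layout (low 8 bits of the first pixel, then the two leftover nibbles, then the high 8 bits of the second pixel) are hypothetical choices for illustration, not a standard format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs two 12-bit pixels (each in the low 12 bits
 * of a 16-bit word) into 3 bytes. Assumed layout:
 *   out[0] = p0 bits 7..0
 *   out[1] = p0 bits 11..8 (low nibble) | p1 bits 3..0 (high nibble)
 *   out[2] = p1 bits 11..4                                          */
static void pack_pair_12(uint16_t p0, uint16_t p1, uint8_t out[3])
{
    assert(p0 <= 0x0FFF && p1 <= 0x0FFF);   /* only 12 significant bits */
    out[0] = (uint8_t)(p0 & 0xFF);
    out[1] = (uint8_t)(((p0 >> 8) & 0x0F) | ((p1 & 0x0F) << 4));
    out[2] = (uint8_t)(p1 >> 4);
}
```

For example, the pair 0xABC, 0x123 comes out as the bytes 0xBC, 0x3A, 0x12: 24 bits instead of 32.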
I'm not sure if MMX/SSE could help here; I have no experience
with those. But if someone checks the spec on the available
instructions it shouldn't be too hard. Basically, in assembly it boils
down to something like this (the following code assumes the
pixel is in the lower 12 bits of each little-endian (Intel format) 16-bit word, and it is not tested):
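CLD                          ; string instructions walk forward through memory
MOV ESI, [source buffer]
MOV EDI, ESI                 ; pack in place; the output trails the input
MOV ECX, [number of 16bit pixels]
SHR ECX, 1                   ; one iteration per pair of pixels
Start:
; load 2 16 bit pixels, 32 bits
LODSD                        ; EAX = p1:p0
MOV BX,AX                    ; BX = p0
SHR EAX, 16                  ; AX = p1
; re-pack them as 24 bits
SHL BX, 4                    ; p0's 12 bits now sit at the top of BX
SHLD AX,BX,4                 ; AX = (p1 << 4) | p0 bits 11..8
STOSW                        ; 2 bytes: p1 bits 3..0 + p0 bits 11..8, then p1 bits 11..4
SHR BX, 4                    ; BX = p0 with its upper 4 bits cleared
MOV EAX,EBX ;probably faster than AL,BL
STOSB                        ; 1 byte: p0 bits 7..0
LOOP Start                   ; decrement ECX, repeat until it reaches zero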
Something like this. It can probably be optimized further, but this
should basically do what you want pretty fast (it uses one 386
instruction, SHLD, to speed up some of the packing). This routine
overwrites the buffer it is given, transforming every 4 bytes into
3 bytes. So for every n pixels (n must be a multiple of 2!)
you will get back a buffer of (n * 2) - (n / 2) bytes. The bits of the
two pixels end up in a somewhat unusual order, but that is rarely a problem if you have enough time
on the other end to unpack the data, which can probably be
done in realtime as well.
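For comparison with what a C compiler might produce, the assembly routine above can be expressed in portable C roughly as follows. This is a sketch, not compiler output: the names `pack12_inplace` and `unpack12` are hypothetical, and the code reads the buffer byte by byte so it does not depend on the host's endianness. It reproduces the routine's byte order (p1 low nibble with p0 high nibble, then p1 bits 11..4, then p0 bits 7..0), packs in place, and returns (n * 2) - (n / 2) bytes; the unpacker is the matching step for the receiving end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place 12-bit packer mirroring the assembly routine's byte order.
 * buf holds npixels little-endian 16-bit words with data in the low
 * 12 bits; npixels must be even. Returns the packed size in bytes. */
static size_t pack12_inplace(uint8_t *buf, size_t npixels)
{
    assert(npixels % 2 == 0);
    size_t out = 0;
    for (size_t in = 0; in < npixels * 2; in += 4) {
        /* Read both pixels before writing. Output trails input, so
         * this never overwrites bytes that have not been read yet. */
        uint16_t p0 = (uint16_t)((buf[in]     | (buf[in + 1] << 8)) & 0x0FFF);
        uint16_t p1 = (uint16_t)((buf[in + 2] | (buf[in + 3] << 8)) & 0x0FFF);
        buf[out++] = (uint8_t)(((p1 & 0x0F) << 4) | (p0 >> 8)); /* STOSW low byte  */
        buf[out++] = (uint8_t)(p1 >> 4);                         /* STOSW high byte */
        buf[out++] = (uint8_t)(p0 & 0xFF);                       /* STOSB           */
    }
    return out;
}

/* Inverse of pack12_inplace: expands packed bytes into 16-bit pixels. */
static void unpack12(const uint8_t *packed, size_t npixels, uint16_t *pixels)
{
    assert(npixels % 2 == 0);
    for (size_t i = 0; i < npixels; i += 2, packed += 3) {
        pixels[i]     = (uint16_t)(((packed[0] & 0x0F) << 8) | packed[2]);
        pixels[i + 1] = (uint16_t)((packed[1] << 4) | (packed[0] >> 4));
    }
}
```

Four pixels (8 bytes) shrink to 6 bytes, matching (4 * 2) - (4 / 2).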
I've looked around the web a bit regarding MMX and SSE and I don't
think they will help for this. MMX seems to largely operate on
words/double words/quad words, which is not what we are
trying to do. SSE seems to be mostly for floating point work,
which is also not what you are trying to do.
One last thing. In this case it will introduce another loop over
the data. Whenever possible you should try to integrate packing
into either the writing routine or the reading one so you don't
waste time going over the data again.
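Folding the packing into the write path might look like the following C sketch, assuming the data goes out one scanline at a time. The `write_fn` callback type and the `send_scanline12` and `count_bytes` names are hypothetical stand-ins for whatever actually sends the data (a socket send, an fwrite wrapper, etc.); the byte layout is an assumed one, not the assembly routine's. Each pixel is read once and the packed bytes go straight to the writer, so there is no separate packing pass over the whole image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink: stands in for a socket send, fwrite wrapper, etc. */
typedef void (*write_fn)(void *ctx, const uint8_t *data, size_t len);

/* Packs one scanline of 12-bit pixels (low 12 bits of each word) into a
 * small staging buffer and hands it straight to the writer. width must
 * be even, and stage must hold width * 3 / 2 bytes. */
static void send_scanline12(const uint16_t *line, size_t width,
                            uint8_t *stage, write_fn write, void *ctx)
{
    assert(width % 2 == 0);
    uint8_t *o = stage;
    for (size_t x = 0; x < width; x += 2) {
        uint16_t p0 = line[x] & 0x0FFF;
        uint16_t p1 = line[x + 1] & 0x0FFF;
        *o++ = (uint8_t)(p0 & 0xFF);                          /* p0 bits 7..0      */
        *o++ = (uint8_t)((p0 >> 8) | ((p1 & 0x0F) << 4));     /* p0 11..8, p1 3..0 */
        *o++ = (uint8_t)(p1 >> 4);                            /* p1 bits 11..4     */
    }
    write(ctx, stage, (size_t)(o - stage));
}

/* Example sink for illustration: just totals the bytes it receives. */
static void count_bytes(void *ctx, const uint8_t *data, size_t len)
{
    (void)data;
    *(size_t *)ctx += len;
}
```

Because the staging buffer only holds one scanline, it should stay in cache while the writer consumes it.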
Rob Belics August 3rd, 2004, 07:59 AM Hey! A fellow asm coder! Have I seen you on hutch's board or win32asm before?
Rob Lohman August 3rd, 2004, 08:28 AM Hmmm, all Robs seem to be ASM coders thus far, hehe. Rob Scott
at least codes as well; I think he can do ASM too.
Nope, never been to any of those boards. It has been a pretty
long time since I've done anything in ASM (at least 5+ years),
but did a lot of low-level stuff in the DOS days and whatnot.
Never really forgot it although I never ventured into the whole
386/protected mode/MMX/SSE stuff etc.
Gives you great insight into computers / operating systems and
how things work, don't you think?
My programming experience went like this:
(Quick)basic -> assembly -> pascal/delphi -> C(++) -> Visual Basic -> C#
Rob Belics August 3rd, 2004, 08:46 AM I used to design hardware so assembly was part of the job. Begrudgingly learned C, then C++. Looks like I'll be starting a server business so need to get into C#, Java, etc. But I'd rather do it all in assembly.
But programming has taken a back seat for a few years now that I've gotten back into film. So I hesitate to answer MMX/SSE questions since it would only be from a foggy memory.
My foggy memory says SSE can do this 12-bit work on chip but, as I said, I don't recall.
Rob Lohman August 3rd, 2004, 08:50 AM It's like the other way around for me. I'm doing programming as
a job and hopefully will be moving to some film related stuff in
the near future. Too bad I ain't exactly on the right side of the
globe for that.
Les Dit August 3rd, 2004, 11:51 AM Thanks Rob,
I think it would be interesting to see what the ASM output of the Visual C compiler would look like to do the same thing!
A test for their optimizer?
I hear the Intel compiler is the best, but I don't think it's popular.
-Les
Rob Belics August 3rd, 2004, 11:57 AM The output of compilers can be really bizarre to look at, especially Microsoft's. Names get mangled and it can be hard to follow the logic flow. Though efficient, it is just hard to follow sometimes.
The optimizers are very good. But when timing is critical, it can still be best to hand-optimize.
I just happen to think that, in the programming world, you get arguments about HLLs vs assembly all the time. Just like the film/digital arguments.
Rob Lohman August 4th, 2004, 02:05 PM If you build the function correctly in C I think Microsoft's and Intel's
compilers will probably closely match what I've written. They
might even include some more tricks. I was reading an
assembly optimization guide for Intel processors the other day.
Interesting stuff regarding cache misses etc. Such stuff will
take quite a lot of time if you want to do it correctly (i.e. do it in
C first, time that, do it in assembly, time it again and see what can
be further improved, etc.)
Les Dit August 4th, 2004, 02:11 PM Thanks again for the info guys.
On a related note: I just did some network tests between 2 identical P4 3 GHz machines with a tool called iperf.
I'm getting 90 megabytes a second between the two !!
This has no file system overhead, it's just raw data passing between the two, but it looks very good!
-Les
Rob Lohman August 4th, 2004, 02:25 PM Is this through the Microsoft drivers and TCP/IP stack? What kind of
network is this exactly?
Les Dit August 4th, 2004, 02:34 PM Microsoft drivers, for the Marvel Yukon chip on the motherboards.
8 port gig E switch.
Nothing fancy!
-Les
Rob Scott August 9th, 2004, 03:56 AM Les, Rob Lohman is correct -- it should be possible to write a very efficient routine in assembler. In the ObscuraCapture app (tm :-) (http://www.obscuracam.com/wiki/static/Capture.html) I've been able to pack the 10-bit data from the SI-1300 camera at over 250 MB/sec.
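Rob's actual ObscuraCapture routine isn't shown in this thread, so the following is only a hypothetical C sketch of the same kind of job: packing 10-bit pixels four at a time into 5 bytes, plus a rough throughput measurement with the standard `clock()` timer. The names `pack10` and `pack10_mb_per_sec` and the layout (high 8 bits of each pixel, then one byte collecting the four 2-bit remainders) are assumptions, and the MB/s figure counts input bytes (2 per pixel), which may not be how the 250 MB/sec above was measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Packs 10-bit pixels (low 10 bits of each word) four at a time into
 * 5 bytes: bits 9..2 of each pixel, then one byte holding the four
 * 2-bit remainders (pixel k's bits 1..0 at bit position 2*k). */
static size_t pack10(const uint16_t *in, size_t npixels, uint8_t *out)
{
    assert(npixels % 4 == 0);
    uint8_t *o = out;
    for (size_t i = 0; i < npixels; i += 4) {
        uint8_t low = 0;
        for (int k = 0; k < 4; k++) {
            uint16_t p = in[i + k] & 0x03FF;
            *o++ = (uint8_t)(p >> 2);                 /* bits 9..2 */
            low |= (uint8_t)((p & 0x03) << (2 * k));  /* bits 1..0 */
        }
        *o++ = low;
    }
    return (size_t)(o - out);
}

/* Rough input throughput in MB/s over `reps` passes, timed with clock().
 * Returns 0 if allocation fails or the run was too short to measure. */
static double pack10_mb_per_sec(size_t npixels, int reps)
{
    assert(npixels > 0 && npixels % 4 == 0 && reps > 0);
    size_t out_len = npixels / 4 * 5;
    uint16_t *in = malloc(npixels * sizeof *in);
    uint8_t *out = malloc(out_len);
    if (!in || !out) { free(in); free(out); return 0.0; }
    for (size_t i = 0; i < npixels; i++)
        in[i] = (uint16_t)(i & 0x03FF);

    volatile unsigned sink = 0;   /* keeps the packing work observable */
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        pack10(in, npixels, out);
        sink += out[(size_t)r % out_len];
    }
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    (void)sink;
    free(in);
    free(out);
    return secs > 0 ? (double)npixels * 2 * reps / secs / 1e6 : 0.0;
}
```

Timing a full 8-megapixel frame over a few dozen repetitions, with the optimizer on, should give a more meaningful number than a tiny buffer would.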
Les Dit August 9th, 2004, 11:44 AM Thanks Rob, that's fast enough. I'm looking at options for speeding up my film scanner, it has an 8 megapixel camera on it.
Are you guys using GCC? I was wondering what the interactive debugger is like on that. Most of my code is written by my programmer, but I do mods and add features. I am OK with the MS visual debugger, but MS isn't issuing many bug fixes for the C side of that dev system anymore, so we want to switch to the GCC system. I'm not comfortable with its optimizer, and I hear the debugger is command line.
What type of system are you getting 250 MB a sec on? Dual DDR with an 800 MHz FSB?
Thanks
-Les
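Les's iperf figure of about 90 megabytes a second and the 8-megapixel scanner camera give a quick way to weigh packing against just transferring. The sketch below assumes 90 MB/s means 90,000,000 bytes/s and that a frame is exactly 8,000,000 pixels, and it ignores protocol and file-system overhead; `frames_per_sec` is a hypothetical helper name.

```c
#include <assert.h>

/* Frames per second a link can carry for a given pixel count and
 * bits per pixel. Ignores all protocol and file-system overhead. */
static double frames_per_sec(double link_bytes_per_sec, double pixels,
                             double bits_per_pixel)
{
    assert(bits_per_pixel > 0.0 && pixels > 0.0);
    double frame_bytes = pixels * bits_per_pixel / 8.0;
    return link_bytes_per_sec / frame_bytes;
}
```

With these assumptions, unpacked 16-bit frames (16 MB each) come to 5.625 frames per second, while packed 12-bit frames (12 MB each) come to 7.5 frames per second. Packing only pays off if the packer keeps ahead of the link, i.e. processes well over 90 MB/s of input.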
Rob Scott August 9th, 2004, 12:09 PM Les Dit wrote:
Are you guys using GCC?
I'm using MS VC++. I started out using MinGW (basically GCC for Windows) but had trouble calling DirectX. There is a very nice graphical IDE for MinGW, and it had a decent visual debugger IIRC.
What type of system are you getting 250 MB a sec on? Dual DDR with an 800 MHz FSB?
No, it's just a basic laptop running an AMD Athlon XP-M 2500. I'm not sure about the memory configuration. (And no, I don't have the camera connected to the laptop. I have some code that simulates it.)
Rob Lohman August 10th, 2004, 02:01 AM Les: there have been Service Packs for MS Visual Studio 6 (and
thus MSVC) up till recently. What problems do you have exactly?
Otherwise there is also a new MSVC in the Visual Studio .NET
range of applications (which still generates native code!) which
has already been updated once.
Les Dit August 10th, 2004, 03:12 AM Rob,
My programmer was getting "internal compiler error"s when he was working on some custom subtitler code for me. He seemed to indicate that the patches are infrequent for 'regular C' and they mainly concentrate on C# now.
Or maybe since he is mainly a LINUX guy, he just wants to use GCC ... LOL
I'd prefer him to use Visual C, because I'm happy with that myself.
-Les
Rob Scott August 10th, 2004, 03:21 AM Les Dit wrote:
He seemed to indicate that the patches are infrequent for 'regular C' and they mainly concentrate on C# now.
It's possible, but I have been very pleased with the stability and performance of VC++.
Rob Lohman August 10th, 2004, 05:09 AM Same here... never had any problems with it. Updates aren't
frequent, that is true!