View Full Version : Compression Question: From 1080 to 720. How does it work?


Miggy Sanchez
February 22nd, 2016, 07:44 PM
Hi there

When an image is compressed from 1080 to 720, how is that being achieved?

Are pixels in the 1080 source simply being thrown away to get down to 720?

Are pixels being merged?

I have no idea really.

Any info/advice appreciated as always.

Mig

Jim Andrada
February 22nd, 2016, 09:38 PM
No, they don't just throw away pixels. Instead they re-sample the image and calculate new pixels based on the surrounding hi-res pixels. There are quite a few ways to do it - and that's pretty much where my knowledge ends.
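
To give a rough idea of what "calculate new pixels based on the surrounding hi-res pixels" can mean, here is a minimal bilinear-resampling sketch in Python/NumPy. It's my own simplified illustration, not what any particular NLE actually does - real tools use better kernels such as bicubic or Lanczos:

import numpy as np

def bilinear_downscale(src, out_h, out_w):
    """Blend the four surrounding source pixels for each output pixel."""
    in_h, in_w = src.shape
    ys = (np.arange(out_h) + 0.5) * in_h / out_h - 0.5   # fractional source rows
    xs = (np.arange(out_w) + 0.5) * in_w / out_w - 0.5   # fractional source cols
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 2)
    wy = (ys - y0)[:, None]                              # vertical blend weights
    wx = (xs - x0)[None, :]                              # horizontal blend weights
    tl = src[y0][:, x0]          # top-left neighbours
    tr = src[y0][:, x0 + 1]      # top-right
    bl = src[y0 + 1][:, x0]      # bottom-left
    br = src[y0 + 1][:, x0 + 1]  # bottom-right
    return (tl * (1 - wy) * (1 - wx) + tr * (1 - wy) * wx
            + bl * wy * (1 - wx) + br * wy * wx)

frame_1080 = np.random.rand(1080, 1920)               # stand-in for a luma plane
frame_720 = bilinear_downscale(frame_1080, 720, 1280)
print(frame_720.shape)                                 # (720, 1280)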

Seth Bloombaum
February 23rd, 2016, 01:18 AM
The exact resampling method depends on the tool you're using. For example, in Sony Vegas Pro, Draft, Good, and Best choices are available when rendering. Good works fine until you want to change the size of (rescale) the video when rendering. Best should always be chosen when rescaling; it uses an advanced "bicubic" method.

Bicubic refers to some maths that I'm really not familiar with.

Yadif is the open source method generally referred to as the best rescaler. Strangely, common NLEs don't use it. However, you can get it in the freeware Handbrake, which is a very good h.264/MP4 encoder, that uses the best available methods of rescaling, deinterlacing, and MP4 encoding.

Miggy Sanchez
February 24th, 2016, 06:58 AM
Thanks Jim and Seth for taking the time to reply. Appreciate it.

In a sense then re-sampling the image and calculating new pixels based on the surrounding hi res pixels is a kind of "merging of the pixels"?

Does downscaling result in a better quality picture? (Do I hear a "it depends"?)

Cheers.

Bryan Worsley
February 24th, 2016, 11:58 AM
Yadif is the open source method generally referred to as the best rescaler.

Yadif is a deinterlacer - name derivation "Yet Another De-Interlace Filter"

Jim Andrada
February 24th, 2016, 02:38 PM
I guess you could call it "merging", but it's pretty complicated and there are a lot of different algorithms. Have to admit you're making me curious about exactly how it's done, but sampling is a tough science - it's a big part of reading hard drives and tape drives (no, they don't just write neat little 1's and 0's on the medium). It's REALLY complicated.

And yes, down rezzing can improve SOME aspects of the image under SOME conditions. But probably not ALL aspects under ALL conditions.

Damn - now I AM curious. I'll research it a little and see if I can come up with a better explanation.

Thanks for asking

Jim Andrada
February 24th, 2016, 04:11 PM
Found this - it's a simplified story but the ideas are right.

Basics of Image Resampling (http://entropymine.com/imageworsener/resample/)

Jon Fairhurst
February 24th, 2016, 04:14 PM
For more information, search the web on "Finite Impulse Response" filters and "Fourier Transforms" as well as "resampling algorithms". I expect that there are some good tutorials and lectures out there.

And yes, downsampling can have benefits. Assuming that there is high frequency information in the signal (as compared to a gray field), the obvious downside is that you lose resolution and detail. The upside is that high-frequency noise is reduced too. And if you preserve all the bits, you can get more bit depth.

For instance, if you have a camera signal with UHD (2160p) resolution and downsample it to FHD (1080p), you are reducing the spatial information by a factor of four. That information doesn't just vanish. It shows up as higher accuracy (more bits). Consider a simple algorithm where you average four pixels to create one new, larger pixel. And consider that the values of the original (integer-only) pixels are 13, 13, 14, and 13. The result is 13.25, which is no longer an integer. You need two more bits to represent this, meaning that you have two more bits of information than you had before.
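
To make that 2x2 example concrete, here's a tiny Python/NumPy illustration (mine, using the same made-up pixel values) of how the average of four 8-bit samples carries extra fractional precision:

import numpy as np

block = np.array([[13, 13],
                  [14, 13]], dtype=np.uint8)   # four 8-bit UHD pixels

avg = block.astype(np.float64).mean()
print(avg)                                     # 13.25 - needs 2 extra fractional bits
# Keeping the integer sum shows the same thing: 13+13+14+13 = 53,
# a value spanning a 10-bit range (0..1020) built from four 8-bit samples.
print(int(block.astype(np.int32).sum()))       # 53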

Assuming that the processing is done with good quality, there are no downsides to shooting at a higher resolution than you deliver, except that it might take more initial storage, more transfer and processing time and that you (obviously) lose resolution. On the other hand, if the processing is done badly, you could get aliasing, ringing, banding, and all sorts of artifacts. So use good quality tools. :)

Miggy Sanchez
February 25th, 2016, 01:54 AM
Ask and ye shall receive!!!

Jim, Jon, and Bryan, thanks for the links/references...

David Heath
February 25th, 2016, 11:31 AM
And yes, downsampling can have benefits. Assuming that there is high frequency information in the signal (as compared to a gray field), the obvious downside is that you lose resolution and detail. The upside is that high-frequency noise is reduced too.
Not sure I agree with the conclusions. If you downsample you are almost inevitably throwing information away, so the overall quality can only be less. The point about noise reduction is a little more complex, since you're comparing lower overall noise with larger pixels - which, being larger, will show up a given level of noise more......

A rough analogy may be audio, where you start off with a noisy full frequency signal. Cut off a lot of the HF and it will certainly reduce the hiss - but at the expense of the higher frequencies. Does it make the overall result "better"? That's where it starts to become subjective.....

The real benefit of downsampling is that it means a smaller image, which is likely to encode to a smaller file size/data rate. That may be more important than absolute quality, and that is likely to be the reason for doing any downsample.
And if you preserve all the bits, you can get more bit depth.

For instance, if you have a camera signal with UHD (2160p) resolution and downsample it to FHD (1080p), you are reducing the spatial information by a factor of four. That information doesn't just vanish. It shows up as higher accuracy (more bits). Consider a simple algorithm where you average four pixels to create one new, larger pixel. And consider that the values of the original (integer-only) pixels are 13, 13, 14, and 13. The result is 13.25,.........
This has come up before, and the best I would say is that it's questionable logic and may depend very much on how the original 8-bit signal is derived.

To try to illustrate the point, we have to think about how the four pixels in your example are derived, and whether it's a good real-world model. If we assume the input is a uniform scene whose value in the "10 bit" world is 13.25, it has to be asked how it gets sampled in the first place.

Ideally, each sample should be rounded to the nearest ("8-bit") integer, so each of the samples would be rounded to 13, and the average will also be 13! The process of downsampling has gained no bitdepth advantage - 13.25 has still ended up as 13.

To end up with what you say implies a certain randomness in the rounding process, so that in this idealised situation, statistically 3 of every 4 samples will be rounded down by 0.25 and 1 in 4 will be rounded up by 0.75. This is really saying that noise in the system is required for the principle to hold true.

And the gotcha is that 10 bit really makes most difference in a low-noise signal. If the signal is noisy in the first place, all 10 bit may do is define the noise more accurately! I'm not saying that this "4 8-bit samples transforms to 1 10-bit" idea is altogether wrong, but there's a lot more to it than is often assumed.
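
To put a number on the "how does 13.25 get sampled in the first place" point, here's a small Python/NumPy sketch of my own (the half-LSB noise level is just an assumption): with no noise, every sample rounds to 13 and averaging 2x2 blocks gains nothing; with some noise present, the block averages start to scatter around 13.25.

import numpy as np

rng = np.random.default_rng(0)
true_level = 13.25
clean = np.full((4, 4), true_level)
noisy = clean + rng.normal(0.0, 0.5, clean.shape)   # assumed sensor noise

q_clean = np.round(clean)   # every pixel becomes 13
q_noisy = np.round(noisy)   # a mixture of 12s, 13s and 14s

# Average each 2x2 block (the "downscale"):
print(q_clean.reshape(2, 2, 2, 2).mean(axis=(1, 3)))  # all 13.0 - no gain
print(q_noisy.reshape(2, 2, 2, 2).mean(axis=(1, 3)))  # values scattered around 13.25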
Assuming that the processing is done with good quality, there are no downsides to shooting at a higher resolution than you deliver, except that it might take more initial storage, more transfer and processing time and that you (obviously) lose resolution. On the other hand, if the processing is done badly, you could get aliasing, ringing, banding, and all sorts of artifacts. So use good quality tools. :)
Yes to most of that, but there are a couple of things to consider. First is the datarate of the recorded 4K signal. It's highly likely not to be 4x that of an equivalent HD image with an equivalent compression system, which means that any given block of x pixels square must be more heavily compressed in 4K than in HD, yes? That may not normally be an issue, as normally that block will be physically far smaller in 4K than in HD - assuming the same screen size in each case - but when you start to zoom in the picture........

The final thought to consider is aliasing. If we imagine starting off with a 1080 signal, then also assume it has blocks of lines corresponding to 1000, 900, 800, 700, and 600 lpph. We downconvert to 720p and hope it will keep the last two blocks intact, but the others carry more detail than a 720 system can handle. A perfect system would just turn them into grey blocks. The alternative is aliasing, where the aliases "wrap round" to give lower-frequency banding rather than plain grey. (I won't go into the theory, but take it from me that 800 will transform to 640, 900 to 540, and 1,000 to 440 lpph.)

Not good, and even worse with movement. So a good downscaler will filter out the (too) fine detail in the image BEFORE doing the actual downscale. (Though you may not be aware of it.)
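
A quick arithmetic check of the fold-back figures quoted above - it's just the reflection about the 720-line limit, shown here in Python:

for detail in (800, 900, 1000):
    alias = 2 * 720 - detail          # detail above the limit reflects about it
    print(detail, "lpph ->", alias)   # 640, 540, 440 lpph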

Jon Fairhurst
February 25th, 2016, 12:56 PM
David, that's a lot to respond to. :)

Regarding "throwing information away", that's not necessarily true. If you use a good filter, you take all of the information as the input. Of course, for downsampling, we will reduce the HF content, but as I showed before, that information shows up as more bit depth. Of course, if you immediately round or truncate back to 8-bits, then you are truly throwing the information away.

One place where this fails is when you have a perfectly flat field that is noise free and already has an error. An example would be a digital cartoon with a flat sky that was supposed to be a given color of cyan, but was uniformly rounded to a different, perfectly uniform value. In this case, the information is already lost and downsampling will never get it back.

But now consider a real sky with subtle variations, and noise in the system. (Or a synthetic sky with more bits where you added dither before rounding.) If the actual value was 13.285714..., then in two sevenths of the area, the rounded pixels will be 14 and in five sevenths of the area, the rounded pixels will be 13. Dithering like this is good practice anyway as without it, shallow ramps will show contouring artifacts. With dither or natural noise, the density of the "13s" will gradually reduce and the density of the "14s" will gradually increase as the ramp level slowly grows.
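
Here's a rough Python/NumPy simulation of that dither idea (my own sketch with a simple uniform dither, not any particular tool's implementation): quantise a flat 13.285714... field with and without dither, then compare the average that a downscale would recover.

import numpy as np

rng = np.random.default_rng(0)
true_level = 13 + 2.0 / 7.0                     # 13.285714...
field = np.full((720, 1280), true_level)

undithered = np.round(field)                                      # every pixel is 13
dithered = np.round(field + rng.uniform(-0.5, 0.5, field.shape))  # mix of 13s and 14s

print(undithered.mean())   # 13.0 - the fractional part is gone for good
print(dithered.mean())     # ~13.2857 - about 2/7 of the pixels rounded up to 14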

With such a signal, you do indeed gain two bits of information as you downscale UHD to FHD.

Now let's say you do some color correction using a deep bit depth system (as one should). In this case, you are indeed using these extra two bits in the math. After that process, round back to 8 bits.

Except... First dither. Let's now say that you have a flat field of 13.25 values after processing. When you dither and then round, a quarter of the values in that area will have a value of 14 and three quarters will be 13. Of course, if one doesn't need to go back to 8-bits, you just keep that information.

So yes, you are correct that this doesn't work when a) the information is already lost (such as when a noise-free signal was truncated without dithering), or b) the bits are thrown away after filtering without dithering. But for most non-synthetic signals and well-processed synthetic signals, one has the option of doing the conversion well.

One caveat: If the original signal was heavily compressed, it might have block artifacts. These artifacts are like undithered, rounded, flat fields. In this case, the information was already lost (there's no HF detail or bit depth information), so downconversion won't get that information back. Then again for a flat field, you haven't lost any detail due to downsampling either.

Regarding the filtering, yes, one loses resolution by design. If you simply throw away pixels, you get aliasing and phase shifts. By filtering too softly (like with a cheap, weighted average filter), one loses detail. By filtering too sharply, one can get ringing. But like Baby Bear's porridge, one can select a filter that is "just right", where the edges are sharp and the ringing just on the edge of being noticeable. If your tool doesn't have a good downconversion filter, one trick is to add or reduce sharpness in the original before downconversion to produce the desired result.
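
As a hedged sketch of "filter before you downscale" (Python with NumPy/SciPy; the 450-cycle test tone and the use of scipy.signal.resample_poly are my choices, not a recommendation of any particular tool): a 1080-sample line carrying detail too fine for 720 samples aliases badly if you simply pick pixels, but largely disappears - as it should - when an anti-alias filter is applied first.

import numpy as np
from math import gcd
from scipy.signal import resample_poly

n_in, n_out = 1080, 720
# Detail finer than a 720-sample line can represent (450 cycles across the line):
x = np.cos(2 * np.pi * 450 * np.arange(n_in) / n_in)

# Naive "throw pixels away" downscale - nearest-neighbour picking:
naive = x[np.round(np.arange(n_out) * n_in / n_out).astype(int) % n_in]

# Polyphase resample with a built-in low-pass (filter first, then decimate):
g = gcd(n_out, n_in)
filtered = resample_poly(x, n_out // g, n_in // g)

print(np.abs(naive).mean())     # large - the detail has aliased into coarse false bands
print(np.abs(filtered).mean())  # small - the too-fine detail has been filtered out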

To summarize good practice:
* Use a higher bit depth setting when down converting.
* Dither before rounding.
* Use a good down conversion tool.

David Heath
February 25th, 2016, 05:42 PM
Regarding "throwing information away", that's not necessarily true. ........ Of course, for downsampling, we will reduce the HF content, but as I showed before, that information shows up as more bit depth.
Oh, I think it is reasonably accurate to say "If you downsample you are almost inevitably throwing information away......"

As far as the bitdepth point goes, then I did say "I'm not saying that this "4 8-bit samples transforms to 1 10-bit" is altogether wrong, but there's a lot more to it than is often assumed", and I stick by that. I certainly think that when downsampling, what you lose in resolution information is never compensated for in extra bitdepth, even when the circumstances are such that it works well.

Of course, it depends what the original is. It's plausible to have a (say) 1080 signal, but for the information contained in it not to make use of the full capacity of the signal. As an example I'd quote a camera like the HVX200. Its 960x540 chipset had a limiting resolution of about 1150x650. It could be recorded as a 1080p signal - but that didn't affect the fundamental resolution limit. And downconvert to 720p (at 1280x720) and you wouldn't see any significant softening.

But start off with a 1920x1080 chipset and it's a different matter. Downconvert to 720 and information WILL be lost - how much depending on the image itself. (In the extreme, with, say, a pattern of very fine lines at 900 lpph, the downscaled 720p should be just a uniform grey raster!! So ALL the information is lost! :) But that's an extreme example!)

As far as dithering goes, whilst it may be used to disguise or mask banding, isn't it almost equivalent to saying "adding noise"? So whilst it may make the pseudo 8-to-10-bit transform work in theory, the dither is introducing a certain randomness (noise) into the sample values - so isn't the whole point of raising bitdepth (making the exact sample value more precise) negated?

I'm not arguing that there is no benefit in post-producing in 10 bit, even if acquisition was 8 bit, but that is a separate argument from the claim that useful bitdepth can increase with downsampling.

Seth Bloombaum
February 25th, 2016, 07:51 PM
...Yadif is the open source method generally referred to as the best rescaler. Strangely, common NLEs don't use it. However, you can get it in the freeware Handbrake, which is a very good h.264/MP4 encoder, that uses the best available methods of rescaling, deinterlacing, and MP4 encoding.
Yadif is a deinterlacer - name derivation "Yet Another De-Interlace Filter"
Quite right, Bryan. I managed to transpose the deinterlacer and the rescaler. Handbrake *does* use state-of-the-art methods for both, but Lanczos is the scaler and Yadif is the deinterlacer.

Bryan Worsley
February 26th, 2016, 09:48 AM
There is also a modified version of Yadif - YadifMod - that takes its spatial (interpolation) predictions from an external source - typically NNEDI3, which itself can be used as an upscaler....but not a downscaler.

Yadifmod - Avisynth wiki (http://avisynth.nl/index.php/Yadifmod)

Nnedi3 - Avisynth wiki (http://avisynth.nl/index.php/Nnedi3)

I use them all the time in my AVISynth processing routines. I'm not sure if YadifMod has a counterpart in ffmpeg, which is what Handbrake runs on. But the AVISynth version is actually a modification of a port of the original Yadif from Mplayer (ffmpeg).

I use Spline36 for downscaling 1080p to 720p, by the way. Although considered a "sharp" resizer, it is less prone to ringing artifacts than Lanczos, at least using AVISynth resize filters.

Jon Fairhurst
February 26th, 2016, 01:07 PM
David, you are absolutely correct that high frequency information near and above the new (output) Nyquist frequency is thrown away. Poof. Gone.

But don't discount information theory when it comes to oversampling. 1-bit audio doesn't just work in some conditions. SACDs just plain work. You can in fact trade sampling frequency for bit depth.

https://en.wikipedia.org/wiki/Oversampling
"Oversampling improves resolution..." (In this case, they mean bit depth resolution.) https://en.wikipedia.org/wiki/Oversampling#Resolution

David Heath
February 28th, 2016, 07:09 AM
But don't discount information theory when it comes to oversampling. 1-bit audio doesn't just work in some conditions. SACDs just plain work. You can in fact trade sampling frequency for bit depth.

I'm not arguing against most of the basic principles there. And maybe the best example of "1-bit sampling" is the traditional way photographs were reproduced in newspapers when the printing process was "black ink or nothing" - so a photograph which appeared to have grey scales was composed of black dots.

I've also heard a suggestion put forward for very high frame rate TV. The problem for such is obviously the huge amounts of raw data. Inter-frame coding is obviously one way forward, but another suggestion is for one bit coding within each frame, and gradations of tone for a pixel to be conveyed by how many frames have it "white" and how many "black". Obviously we are talking about very high frame rates indeed, but in theory the principle is as you say in both those cases - oversampling spatially in the first case, temporally in the second to trade sampling frequency for bit depth.

https://en.wikipedia.org/wiki/Oversampling
"Oversampling improves resolution..." (In this case, they mean bit depth resolution.) https://en.wikipedia.org/wiki/Oversampling#Resolution

That is really talking about audio sampling in the time domain, but yes, I'm sure a lot of the principle holds good in the spatial domain.

But note it puts some maths to it: "The number of samples required to get n bits of additional data precision is: number of samples = (2^n)^2 = 2^(2n)."

So if we want to move from 8 bit to 10 bit (2 bits of additional data precision), that formula predicts we need 2^(2x2) = 2^4 = 16 times as many samples! Not 4x.

And even then it qualifies it.
This averaging is only possible if the signal contains equally distributed noise which is enough to be observed by the A/D converter.[3] If not, in the case of a stationary input signal, all 2^n samples would have the same value and the resulting average would be identical to this value; so in this case, oversampling would have made no improvement.
Which is more or less what I said earlier about "we have to think how the four pixels in your example are derived, and whether it's a good real world model".

So whilst I don't disagree with the basic principle of what you're saying - that oversampling can be traded for better bitdepth - I do disagree with simply saying "downscale an 8-bit 4K signal to FHD and it can be considered as 10 bit."

The above formula predicts that the BEST that could be hoped for is a "9-bit" signal, and even this is dependent on circumstance. You'd need to be talking about 16x as many samples to really get 10 bit - in other words, an 8K source.
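
Putting those numbers side by side - this is just the quoted formula evaluated in Python, nothing more:

# samples needed = 2**(2*n) for n extra bits of precision (per the article's formula)
for extra_bits in (1, 2, 4):
    print(extra_bits, "extra bit(s) ->", 2 ** (2 * extra_bits), "x as many samples")
# 1 ->   4x : UHD down to FHD buys roughly a "9-bit" result on this reckoning
# 2 ->  16x : a true 10-bit FHD result would need an 8K source
# 4 -> 256x : the 20-bit to 24-bit converter example from the article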

Jon Fairhurst
February 29th, 2016, 12:59 PM
Good catch on the formula; however, the article first says this:

When oversampling by a factor of N, the dynamic range increases by log2(N) bits, because there are N times as many possible values for the sum.

So the basic formula says four times the samples gives two more bits of dynamic range.

The next statements about noise are in the context of an A/D or D/A converter. The A/D is the equivalent of a camera sensor system. They assume that for a given A/D technology, if you speed up the clock, it will have a shorter sampling time, which would increase noise. This is similar to increasing the resolution of a video sensor. That makes each pixel smaller, so the noise increases.

But the context I'm presenting is signal-only. I'm not comparing the 4K downsampled signal to what you would have gotten had the sensor been 2K with its inherent lower noise. In our case, we have a given signal with its given noise depending on camera, ISO, etc. We are just looking at the extra dynamic range without comparing it to an engineering tradeoff with a lower res, lower noise camera.

And yeah, the part about needing distributed noise is important. This is a problem in synthetic media, but not typically with real scenes as the signal varies. One real pixel might be 0.1 shy of the recorded value while the next is 0.1 too hot. Scene variation gives us that randomness, even when the noise is quantized. But yeah, don't apply a heavy handed noise reduction to create plastic faces before the downsampling. The downsampled signal would show that same, low resolution, low noise, inaccurate face tone. So, yeah, conditions need to be right.

Also keep in mind that a good digital low pass filter (used for downsampling) has many taps across many samples horizontally and vertically. So each new 2K pixel gets a small contribution from a wide range of 4K pixels. This helps ensure that the noise contribution is random as each new pixel gets fed by more than its nearest neighbor.

The bottom line is that one "can" get the equivalent of more bits of information by downsampling, but only if that additional information hasn't already been lost.

David Heath
February 29th, 2016, 03:32 PM
Good catch on the formula; however, the article first says this:

When oversampling by a factor of N, the dynamic range increases by log2(N) bits, because there are N times as many possible values for the sum.

So the basic formula says four times the samples gives two more bits of dynamic range.
That's not how I read it. Even before that quote, the article states that:
For instance, to implement a 24-bit converter, it is sufficient to use a 20-bit converter that can run at 256 times the target sampling rate. Combining 256 consecutive 20-bit samples can increase the signal-to-noise ratio at the voltage level by a factor of 16 (the square root of the number of samples averaged), effectively adding 4 bits to the resolution and producing a single sample with 24-bit resolution.
In other words, to increase the bitdepth by 4 bits - here from 20 to 24 bits - it's necessary to have 256x as many samples. Which is in line with the later formula I used ("The number of samples required to get n bits of additional data precision is: number of samples = (2^n)^2 = 2^(2n).")

So in that case we're talking about 4 extra bits - so n=4 - and we need 2^(2x4) = 2^8 = 256 times as many samples. This is all consistent with needing 16x as many to get an equivalence with 2 extra bits.

What you quote above should be seen as an interim step.

The way I see it, your earlier example (4 values of 13, 13, 13, 14, averaging to give 13.25) is an idealised one, which may not be typical - and is not likely to be. The next block of 4 may be 13, 13, 14, 14 and give an averaged value of 13.5, the next may be 13, 13, 13, 13, and so on, on a statistical basis. It's only when you get up to 16 samples that you can realistically expect, statistically, three times as many "13" values as "14". (In practice I'd expect other values such as 12 and 15, but the average to become more predictably 13.25.) But this always assumes that what is really 13.25 gets digitised randomly, and not always perfectly to the nearest integer - in which case it would always be 13, and the average of any number of samples would then always be 13 exactly. But I think we're agreed on that.....?
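
A small Monte Carlo of that statistical point (my own Python/NumPy sketch; the half-LSB noise level is an assumption): quantise a noisy 13.25 level to integers and compare how tightly 4-sample and 16-sample block averages pin the value down.

import numpy as np

rng = np.random.default_rng(1)
true_level = 13.25
# 160,000 8-bit-style samples of a 13.25 level with ~0.5 LSB of noise:
samples = np.round(true_level + rng.normal(0.0, 0.5, 16 * 10000))

avg4 = samples.reshape(-1, 4).mean(axis=1)    # UHD->FHD style 4:1 averaging
avg16 = samples.reshape(-1, 16).mean(axis=1)  # 8K->FHD style 16:1 averaging

print(avg4.mean(), avg4.std())    # centred near 13.25, but individual blocks spread widely
print(avg16.mean(), avg16.std())  # same centre, roughly half the spread per block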

Jon Fairhurst
February 29th, 2016, 09:30 PM
What's weird is that SACD (1-bit) samples at just 64 times 44.1 kHz, yet competes with DVD-A, which goes up to 24 bits at 192 kHz. Of course, it's sigma-delta, which is a bit different from PCM, but still...

And regarding the 13.25 example, that's where the many samples in the filter help out. It's not just nearest neighbor, but point taken that it relies on a random distribution to work.