ONCE upon a time, life was easy. When you wanted the best possible performance from your video surveillance system you went out and bought Panasonic or Pelco cameras and S-VHS timelapse recorders and all was good. But things have changed. Over the past 10 years performance has morphed into a paper specification based on 3 easy-to-remember numbers (resolution, minimum scene illumination, dynamic range). These 3 numbers have bred an expectation of simplicity thanks to our human cognitive biases. We desperately want there to be just 3 simple numbers to cleave to so we don’t have to switch on our prefrontal cortices. Trouble is, the surveillance specs we now cling to are utterly subjective.
I have to confess, it’s not entirely helpful blurring the lines between camera performance and, let’s face it, network performance, as I’m doing here. But in the current market there really is little choice but to lump these things together. The reason is that it does not matter what your optical solution and chipset are doing up front: viewing and recording performance now come down to the capability of the video coding coming out of a camera’s tailpipe or being laid down by the encoders in DVRs or NVRs.
It goes without saying that for a few years now, people have been buying H.264 without really knowing what they are buying. What they are buying is essentially no more than a word that describes a list of encoding options compression developers might use. Whether they take 1 of these options or all 12 will dictate the quality of the video produced and recorded. So while H.264 provides a broad palette of options, it does not say which options are best for your application, nor does it specify which options should be used. That selection is entirely up to the manufacturer. If all this is bending your head, good.
We need intelligent discourse on H.264, and we all need to know that H.264 cannot be blithely written into a tender specification as a guarantee of compression quality. Instead, H.264 is a general label in the same way the words ‘petrol engine’ are a label on the fender of a car. It says nothing whatever about the performance of a given compression solution.
Let’s begin by outlining the 12 encoding options that can apply to H.264. They include: I and P Slices, B Slices, Multiple Reference Frames, In-Loop De-Blocking Filter, CAVLC Entropy Coding, CABAC Entropy Coding, Interlaced Coding (PicAFF, MBAFF), 8×8 vs 4×4 Transform Adaptivity, Quantization Scaling Matrices, Separate Cb and Cr QP Control, Separate Colour Plane Coding and finally, Predictive Lossless Coding. Of these 12 encoding options, Baseline Profile H.264 might include good or bad interpretations of 3 or 4 parameters, Main Profile might use 7 encoding options and only High 4:4:4 Predictive Profile will use all 12.
Before going further, you need to understand the difference between profiles and levels with H.264. Put simply, an H.264 ‘profile’ defines a box of coding algorithms (I-Frames, P-Frames, B-Frames, etc.) that a developer can deploy in order to build a bitstream said to conform to the H.264 standard. An H.264 ‘level’, by contrast, constrains the physical parameters of that bitstream and defines real things like resolution and data rate.
There are a couple of key profiles that apply to H.264, and in this feature we are going to talk about the primary one, Baseline Profile. At this stage of the game, Baseline Profile H.264 compression is what most of the manufacturers are talking up. Some go further, but not many. Baseline Profile has benefits but it also comes with some technological burdens, and users and integrators need to be aware of them. The key feature of Baseline H.264 is that it’s 30 per cent more efficient than MPEG-4 in terms of bandwidth consumption and storage requirements. This is nice, and if all you have to do is appease the network administrator, then Baseline Profile will suit you nicely. In terms of its rule set, Baseline will generally use I Frames and P Frames, there will be an In-Loop De-Blocking Filter and there will be Multiple Reference Frames. That’s about all you’ll get.
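To make the profile and level distinction concrete, here is a minimal Python sketch that reads the two identifying bytes from the start of an H.264 sequence parameter set (SPS) and reports which of the 12 options listed above the declared profile permits. The function name `describe_sps` and the grouping of tools are illustrative assumptions, not anyone's product code. The sketch ignores special cases such as level 1b, and it lists only the tools named in this article, so Baseline extras like flexible macroblock ordering are left out. Note that a profile tells you what an encoder is allowed to use, not what it actually uses.

```python
# Tools from the article's list of 12, grouped by the profile that first permits them.
BASELINE = ["I and P slices", "Multiple reference frames",
            "In-loop de-blocking filter", "CAVLC entropy coding"]
MAIN = BASELINE + ["B slices", "CABAC entropy coding",
                   "Interlaced coding (PicAFF, MBAFF)"]
HIGH = MAIN + ["8x8 vs 4x4 transform adaptivity",
               "Quantization scaling matrices",
               "Separate Cb and Cr QP control"]
HIGH_444_PREDICTIVE = HIGH + ["Separate colour plane coding",
                              "Predictive lossless coding"]

# profile_idc values as defined by the H.264 standard.
PROFILES = {
    66: ("Baseline", BASELINE),
    77: ("Main", MAIN),
    100: ("High", HIGH),
    244: ("High 4:4:4 Predictive", HIGH_444_PREDICTIVE),
}


def describe_sps(sps: bytes) -> dict:
    """Summarise an SPS NAL unit (start code already stripped).

    Byte 0 is the NAL header, byte 1 is profile_idc,
    byte 2 holds the constraint flags, byte 3 is level_idc.
    """
    if len(sps) < 4 or (sps[0] & 0x1F) != 7:  # NAL unit type 7 = SPS
        raise ValueError("not an SPS NAL unit")
    profile_idc, level_idc = sps[1], sps[3]
    name, tools = PROFILES.get(profile_idc, (f"other ({profile_idc})", []))
    # level_idc is the level number times ten, e.g. 30 means Level 3.0.
    return {"profile": name, "level": level_idc / 10, "permitted_tools": tools}


if __name__ == "__main__":
    info = describe_sps(bytes([0x67, 66, 0xC0, 30]))
    print(info["profile"], info["level"], len(info["permitted_tools"]))
```

Two cameras that both report Baseline at the same level can still produce very different pictures, because the profile only limits the toolbox; how well each tool is used is up to the manufacturer.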
With Baseline Profile, De-Blocking filters are important as they soften the visible edges between neighbouring encoding blocks that have been compressed by different amounts. This variation matters because there are areas in a scene that need little compression and areas that need a lot. De-Blocking allows a device to achieve this while retaining a natural-looking image. This profile’s use of Multiple Reference Frames is also important. H.264 compares a frame with the image or two before it to establish what has changed, then uses those earlier images as references from which to build the new one. It’s a smart technique that reduces the size of the bitstream, but it takes a lot of processing power in the chipset to pull off.
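The idea behind Multiple Reference Frames can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any particular encoder is written: real H.264 encoders make this choice per block and combine it with a motion search, whereas the hypothetical `best_reference` function here compares whole frames, represented as flat lists of brightness values, using the sum of absolute differences.

```python
def sad(frame_a, frame_b):
    """Sum of absolute differences between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def best_reference(current, previous_frames):
    """Return (index, cost) of the earlier frame that best matches `current`.

    A lower cost means less residual information has to be coded.
    """
    costs = [sad(current, ref) for ref in previous_frames]
    index = min(range(len(costs)), key=costs.__getitem__)
    return index, costs[index]


if __name__ == "__main__":
    two_frames_ago = [10, 10, 10, 10]   # empty scene
    one_frame_ago = [10, 50, 50, 10]    # an object has entered
    current = [10, 52, 49, 10]          # the object has barely moved
    print(best_reference(current, [two_frames_ago, one_frame_ago]))  # (1, 3)
```

The more candidate references an encoder evaluates, the better its chance of finding a close match, and the more comparisons the chipset has to run for every frame, which is where the processing burden comes from.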
Also valuable with Baseline Profile is Context-Adaptive Variable-Length Coding (CAVLC); the variable length allows tailoring of the image quality required. CAVLC allows reduction of file size and bandwidth, but this too requires heavy processing. Importantly, CABAC is considered to offer about 15 per cent better quality than CAVLC for the same data rate, although CABAC is not part of Baseline Profile and only becomes available from Main Profile upwards.
But there’s another complexity in this discussion. The general standard for video is MPEG-2 and many devices employ it. The way many devices deliver H.264 is to take this MPEG-2 compression and convert it to H.264 after capture by the camera, at the DVR or NVR. They achieve this conversion using transcoding. Typical H.264 Baseline is a 2CIF encoder (720 x 240), which is no different to MPEG-4 in terms of quality. But often this 2CIF signal will be transcoded to 4CIF in order to achieve the current perceived gold standard, H.264. Trouble is, the process is undertaken with little regard to what’s happening behind the scenes. Yes, with Baseline Profile H.264 you get 2.5x better compression, but with transcoding you generally wind up with blockier images that are of poorer quality.
Latency is another issue with transcoding. When image streams are being transcoded between the camera and the server, the server is doing more work as it battles to handle conversion of the bitstream being stored. The simplest and most common method is to completely decode the MPEG-2 bitstream and then re-encode it with an H.264 encoder. There are going to be system issues: handling SCTE-35 digital program insertion (DPI) messages will require that decode and encode operations be tightly coupled. The key thing is to realise that the quality of transcoding with this simple approach will not be high. Typically, if the product you buy is transcoding MPEG-2 at 4Mb/s into H.264, you will lose 20 per cent compression efficiency.
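Bitrate figures like the 4Mb/s quoted above are easier to weigh once they are turned into disk space. The following Python sketch does that arithmetic. The function names, the 16-camera and 30-day figures, and the `saving` parameter are all assumptions chosen for illustration, and applying a 30 per cent saving to a 4Mb/s stream is a what-if, not a measured result for any product.

```python
SECONDS_PER_DAY = 86_400


def gigabytes_per_day(bitrate_mbps: float) -> float:
    """Storage for one continuous stream over 24 hours, in decimal GB."""
    bits_per_day = bitrate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1_000_000_000


def storage_gb(bitrate_mbps: float, cameras: int, days: int,
               saving: float = 0.0) -> float:
    """Total storage, optionally reducing the bitrate by a fractional saving."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in the range [0, 1)")
    return gigabytes_per_day(bitrate_mbps * (1.0 - saving)) * cameras * days


if __name__ == "__main__":
    # One 4 Mb/s stream recorded around the clock: about 43.2 GB per day.
    print(round(gigabytes_per_day(4.0), 1))
    # Hypothetical site: 16 cameras, 30 days of retention.
    print(round(storage_gb(4.0, 16, 30), 1))
    print(round(storage_gb(4.0, 16, 30, saving=0.30), 1))
```

Run against a real tender, the same arithmetic shows why a claimed efficiency gain only matters if the picture quality at the lower bitrate is still fit for purpose.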
We are going to look at H.264 in more detail next month, with a specific focus on Extended Baseline Profiles and Main Profiles. Suffice to say, end users and integrators need to think hard about all the parameters of the modern networked surveillance system and ensure they only buy H.264 products that meet all their complex requirements.