Advanced Video Coding on Linux
by Dave Berton October 2006
Original article at Linux Journal

"Use H.264 to create high-quality, low-bitrate digital video with currently available tools on Linux."

The impact of H.264 on the world of digital video compression is growing. Companies such as Apple are already switching wholeheartedly to it. As part of the MPEG-4 standard (Part 10), H.264 is now part of both the HD DVD and Blu-ray specifications for high-definition optical discs. And for good reason: H.264 can encode video at very low bitrates while maintaining an incredibly high perceived quality.

Of particular interest are the low-bitrate possibilities this video codec provides. Luckily for those who run Linux, H.264 (also known as Advanced Video Coding, or AVC) has a very successful and effective open source implementation known as x264. In fact, the x264 project won the Doom9 2005 codec comparison test. x264 continues to make progress and improvements, and it remains an active project. So let's take advantage of what it offers: an extremely high-quality AVC encoding tool that can be used right away for DVD and home movie backups, for creating video clips for streaming over the web, or just for experimenting with the latest video encoding technology.

The balance of this article will focus on the basic steps involved in creating standard .mp4 files that contain H.264 video coupled with AAC audio (Advanced Audio Coding, also an MPEG standard). The vagaries and subtle corners of hard-core video encoding are beyond the scope of this discussion, but hopefully this introduction will encourage you to explore the topic further.

Since both AVC and AAC are now MPEG standards, it stands to reason that many tools (commercial and otherwise) already support them. For example, Apple's QuickTime natively supports the video files we will be creating. And mplayer, the well-known and successful open source media player, also supports .mp4 playback.

Getting Started

Creating a standards-compliant video file involves three basic steps: encoding the video, encoding the audio, and combining the two. The software tools we will need are mplayer and its companion encoder mencoder, the faac AAC encoder, x264, and MP4Box from the GPAC project.

Our goal will be to produce a low-bitrate video file suitable for posting on the web. It will be a small file, but the quality will be exceptional compared with a higher-bitrate XviD encoding. Our source video will be a home movie clip called 'max.dv', a 9-second raw DV file captured directly from a digital video camera.

We'll process the audio first, since it is a fairly straightforward operation. First, have mplayer dump the raw PCM audio directly from the video source:

  mplayer -ao pcm -vc null -vo null max.dv

This will produce a file called 'audiodump.wav'. The video portion of the source file is ignored. Now encode this wave file to AAC.

  faac --mpeg-vers 4 audiodump.wav

The --mpeg-vers parameter specifies the MPEG version; 4 selects MPEG-4 AAC. The audio portion of our work is now finished, and you can listen to 'audiodump.aac' by playing it with mplayer.

When it comes to encoding the video, we have several options. The highest-quality encodes can only be made using multiple passes: the source video is processed twice (or more) so that the encoder can choose the best possible distribution of bits across the output file. Multiple passes also let us pinpoint the bitrate, and therefore the size, of the output. However, encoding with an AVC encoder such as x264 is very processor-intensive and can run quite slowly, so we may not want to sit through a lengthy multi-pass encode. Instead, we could encode in a single pass. This still produces outstanding results, though never quite as good as a multi-pass encode, and we give up the ability to target a specific file size and bitrate. It all depends on which matters more to you: time or quality.
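To see what pinpointing the file size involves, here is a minimal sketch of the arithmetic behind a size-targeted multi-pass encode. The function name target_video_kbps is hypothetical, and the sketch assumes container overhead is negligible.

```shell
#!/bin/sh
# target_video_kbps SIZE_BYTES DURATION_S AUDIO_KBPS
# Hypothetical helper: the video bitrate (kb/s) that, together with the audio
# stream, fills SIZE_BYTES over DURATION_S seconds. Container overhead is ignored.
target_video_kbps() {
    size_bytes=$1
    duration_s=$2
    audio_kbps=$3
    total_kbps=$(( size_bytes * 8 / duration_s / 1000 ))   # budget for all streams
    echo $(( total_kbps - audio_kbps ))                     # what is left for video
}

# Example: a 600,000-byte file for a 9-second clip with 64 kb/s audio.
target_video_kbps 600000 9 64    # prints 469
```

In a two-pass run you would give the result to x264's --bitrate option, first with --pass 1 and then with --pass 2. Real files carry some container overhead, so aiming slightly below the size limit is safer.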

Fortunately, x264 provides a good middle ground. Its Constant Rate Factor mode (sometimes called constant quality) instructs x264 to take into account the differences between high- and low-motion scenes. Since the eye loses detail in high-motion scenes anyway, x264 uses fewer bits there so that it can allocate them elsewhere, resulting in much better overall visual quality. This mode gives the highest quality possible without multiple passes, which is a great time saver. The cost of using it, however, is giving up control over the final file size and bitrate; that control requires multiple passes, which would double the encoding time. So for our example we will stick with one pass, using the constant rate factor option (--crf) for greatly improved quality. Good values for the constant rate factor range from approximately 18 to 26, where a lower value produces higher quality but larger files. Your needs in terms of size, time and quality may differ, however; if so, investigate multi-pass mode, which gives you more control.

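The x264 encoder accepts only raw YUV 4:2:0 input, so we use mencoder to decode the DV file into raw I420 frames and feed them to x264 through a named pipe (FIFO). mencoder writes into the pipe in the background while x264 reads from it: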

  mkfifo tmp.fifo.yuv
  mencoder -vf format=i420 -nosound -ovc raw -of rawvideo \
      -ofps 23.976 -o tmp.fifo.yuv max.dv 2>&1 > /dev/null &
  x264 -o max-video.mp4 --fps 23.976 --crf 26 --progress \
      tmp.fifo.yuv 720x480
  rm tmp.fifo.yuv
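The named pipe lets mencoder and x264 run at the same time without ever writing the uncompressed video to disk. The pattern can be tried without either program. In this minimal sketch, head and wc are stand-ins chosen purely for illustration: head plays the role of mencoder, writing two 720x480 I420 frames' worth of zero bytes, and wc plays the role of x264, reading everything from the pipe.

```shell
#!/bin/sh
# Demonstrate the producer/consumer pattern used above with a named pipe.
dir=$(mktemp -d)
fifo="$dir/tmp.fifo.yuv"
mkfifo "$fifo"

# Producer (stand-in for mencoder): two 720x480 I420 frames of zero bytes.
# It runs in the background because opening a FIFO for writing blocks
# until a reader opens the other end.
head -c $(( 720 * 480 * 3 / 2 * 2 )) /dev/zero > "$fifo" &

# Consumer (stand-in for x264): read everything the producer writes.
bytes=$(( $(wc -c < "$fifo") ))
wait                      # reap the background producer

rm "$fifo"
rmdir "$dir"
echo "$bytes bytes passed through the FIFO"
```

It should report 1036800 bytes, exactly two frames' worth, without any file of that size ever being created.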

We must specify the frame rate (--fps) because raw YUV carries no header, so x264 has no other way of knowing what is being fed into it; the same applies to the width and height of the incoming video, given after the input file name. Encoding this way uses x264's default encoding parameters, which are quite good, but there are a few improvements we can make. In particular, we can improve some of the encoding strategies it uses without sacrificing too much extra encoding time. The number and range of the parameters x264 accepts is very large, and they are all geared toward improving the quality of the output in some way. However, some options are more expensive than others in time and processor load, and some sacrifice compatibility with certain media players, notably QuickTime. To remain compatible with the existing install base of QuickTime users, we need to keep a few things in mind.
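The width and height matter because x264 splits the incoming byte stream into frames purely by size. Here is a minimal sketch of that size calculation, assuming the I420 layout that mencoder's format=i420 filter produces: a full-resolution luma (Y) plane followed by two chroma planes at half width and half height. The helper name i420_frame_bytes is hypothetical.

```shell
#!/bin/sh
# i420_frame_bytes WIDTH HEIGHT
# Hypothetical helper: bytes in one raw I420 (YUV 4:2:0) frame.
i420_frame_bytes() {
    w=$1
    h=$2
    luma=$(( w * h ))                 # one Y sample per pixel
    chroma=$(( (w / 2) * (h / 2) ))   # U and V are each subsampled 2x in both directions
    echo $(( luma + 2 * chroma ))
}

i420_frame_bytes 720 480    # prints 518400
i420_frame_bytes 480 320    # prints 230400

# 23.976 is the rounded form of the exact NTSC film rate 24000/1001,
# which is how the XviD command in the footnote spells it.
awk 'BEGIN { printf "%.3f\n", 24000 / 1001 }'    # prints 23.976
```

If the dimensions given to x264 do not match the stream, every frame boundary lands in the wrong place and the output is scrambled.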

QuickTime and H.264

It is nice that QuickTime 7 supports H.264-encoded video; Apple itself encodes all of its online movie trailers using H.264. Although this is good and fosters adoption of the codec, the QuickTime implementation has some limitations, most notably in its support for B-Frames and profiles. A short detour is needed to explain what this means for our encoding project.

The H.264 standard defines a number of profiles, including Baseline, Main, Extended and High. These profiles delineate the technical capabilities a decoder must possess. As its name suggests, Baseline is the simplest and least demanding profile, while Main, Extended and High require more processing power and support for more technical features in order to decode properly. QuickTime 7 supports Baseline and parts of the Main profile, but it will choke on features of the Extended and High profiles.

B-Frames (bidirectionally predicted frames) are one of the frame types used in compressed digital video. Rather than storing a complete picture, a B-Frame references information in other frames, both earlier and later in display order, so the decoder must already have decoded those reference frames before it can reconstruct it. B-Frames are interleaved with the other frame types: I-Frames, which are coded without reference to other frames, and P-Frames, which are predicted from previously decoded frames. It's a technical detail, but the QuickTime 7 H.264 decoder supports no more than two consecutive B-Frames. This is unfortunate, since using more B-Frames would let us increase quality under some circumstances.
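Because a B-Frame may refer to a frame displayed after it, the encoder has to send that later reference frame first. The following minimal sketch models this with a simplified string of frame types in display order; real encoders choose frame types adaptively and H.264 allows more general reference structures, so treat it only as an illustration. The function names decode_order and max_b_run are hypothetical; the second one checks a pattern against the two-consecutive-B-Frame limit described above.

```shell
#!/bin/sh
# decode_order PATTERN
# Hypothetical helper: PATTERN lists frame types in display order (e.g. IBBPBBP)
# and should end with an I or P frame. Prints type+display index in the order a
# decoder receives them: each B-frame waits until the next I/P frame is sent.
decode_order() {
    printf '%s\n' "$1" | awk '{
        out = ""; pending = ""
        for (i = 1; i <= length($0); i++) {
            label = sprintf("%s%d", substr($0, i, 1), i - 1)
            if (substr($0, i, 1) == "B") {
                pending = pending " " label    # hold until its future reference is sent
            } else {
                out = out " " label pending    # reference frame first, then held B-frames
                pending = ""
            }
        }
        sub(/^ /, "", out)
        print out
    }'
}

# max_b_run PATTERN
# Hypothetical helper: length of the longest run of consecutive B-frames.
max_b_run() {
    printf '%s\n' "$1" | awk '{
        run = 0; max = 0
        for (i = 1; i <= length($0); i++) {
            if (substr($0, i, 1) == "B") { run++; if (run > max) max = run }
            else run = 0
        }
        print max
    }'
}

decode_order IBBPBBP    # prints I0 P3 B1 B2 P6 B4 B5
max_b_run IBBPBBP       # prints 2 (within the QuickTime 7 limit)
max_b_run IBBBP         # prints 3 (too many for QuickTime 7)
```

Note how P3 is sent before B1 and B2, even though it is displayed after them.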

To remain QuickTime-compatible, we need to keep these limitations in mind. However, the quality of our low-bitrate encode will not suffer much because of them, and there are a few additional options we can enable that improve things quite a bit. The first is the subpixel motion estimation setting (--subme), which controls the precision of the motion estimation calculations x264 performs during encoding. Increasing it to 6, the maximum, gains a lot of visual quality at the cost of some additional encoding time, but it is worth it. We can also configure which partition types x264 analyzes during motion estimation (--analyse), which leads to higher-quality encodes. Note that some types of analysis, such as those using the 8x8 DCT, are for High profile encodes only and are not supported by QuickTime, so we avoid those settings. Finally, we can disable PSNR (peak signal-to-noise ratio) calculations with --no-psnr to buy back a little speed. PSNR is just a quality measurement and has no effect on the actual encoding quality.

Putting all this together, we can now output a high-quality, low-bitrate, Quicktime-compatible and standards-compliant video encoding using H.264.

  mkfifo tmp.fifo.yuv
  mencoder -vf format=i420 -nosound -ovc raw -of rawvideo \
      -ofps 23.976 -o tmp.fifo.yuv max.dv 2>&1 > /dev/null &
  x264 -o max-video.mp4 --fps 23.976 --bframes 2 --progress --crf 26 \
      --subme 6 --analyse p8x8,b8x8,i4x4,p4x4 --no-psnr tmp.fifo.yuv 720x480
  rm tmp.fifo.yuv
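If you experiment with other x264 options, it is easy to drift outside what QuickTime 7 can decode. Here is a minimal sketch of a pre-flight check that scans an x264 argument list for the two limitations discussed above: more than two B-Frames, and the High profile 8x8 transform (requested with --8x8dct or the i8x8 partition in --analyse). The function name qt_check is hypothetical, and it covers only these cases, not every High or Extended profile feature.

```shell
#!/bin/sh
# qt_check X264_ARGS...
# Hypothetical helper: prints a warning for each option that the QuickTime 7
# limits described above would reject, and returns 1 if any were found.
# Assumes every option that takes a value is followed by one.
qt_check() {
    status=0
    while [ $# -gt 0 ]; do
        case $1 in
            --bframes)
                if [ "$2" -gt 2 ]; then
                    echo "warning: --bframes $2 (QuickTime 7 allows at most 2)"
                    status=1
                fi
                shift ;;                      # skip the option's value
            --8x8dct)
                echo "warning: --8x8dct is a High profile feature"
                status=1 ;;
            --analyse)
                case ",$2," in
                    *,i8x8,*)
                        echo "warning: i8x8 analysis is a High profile feature"
                        status=1 ;;
                esac
                shift ;;                      # skip the partition list
        esac
        shift
    done
    return $status
}

# The options used in this article pass the check.
qt_check --bframes 2 --crf 26 --subme 6 --analyse p8x8,b8x8,i4x4,p4x4 --no-psnr \
    && echo "ok"
```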

There are further improvements we can make. Since this video is destined for the web, we would most likely want to reduce the frame size to something friendlier, possibly crop out unwanted areas, or make other adjustments. For example, to reduce the frame size, run the following commands:

  mkfifo tmp.fifo.yuv
  mencoder -vf scale=480:320,format=i420 -nosound -ovc raw -of rawvideo \
      -ofps 23.976 -o tmp.fifo.yuv max.dv 2>&1 > /dev/null &
  x264 -o max-video.mp4 --fps 23.976 --bframes 2 --progress --crf 26 \
      --subme 6 --analyse p8x8,b8x8,i4x4,p4x4 --no-psnr tmp.fifo.yuv 480x320
  rm tmp.fifo.yuv

Here we instruct mencoder to scale the output to 480x320, and also tell x264 to accept that frame size. This will further reduce the file size, which is appropriate for video on the web.
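When picking a smaller frame size, it helps to keep the picture's proportions and to choose dimensions that divide evenly into H.264's 16x16 macroblocks (I420 also needs even dimensions, since chroma is halved in both directions). Here is a minimal sketch, using a hypothetical helper named scaled_height, that derives a height from a chosen width and rounds it to the nearest multiple of 16. Like the article's 480x320, it preserves the 3:2 shape of the stored 720x480 frame and does not correct for DV's non-square pixels.

```shell
#!/bin/sh
# scaled_height SRC_W SRC_H DST_W
# Hypothetical helper: height that keeps SRC_W:SRC_H proportions at width DST_W,
# rounded to the nearest multiple of 16 (the H.264 macroblock size).
scaled_height() {
    src_w=$1
    src_h=$2
    dst_w=$3
    h=$(( (dst_w * src_h + src_w / 2) / src_w ))   # proportional height, rounded
    echo $(( (h + 8) / 16 * 16 ))                   # snap to a multiple of 16
}

scaled_height 720 480 480    # prints 320
scaled_height 720 480 320    # prints 208

# Pixels per frame, before and after scaling to 480x320:
echo $(( 720 * 480 )) $(( 480 * 320 ))    # prints 345600 153600
```

A 480x320 frame has about 44% as many pixels as a 720x480 one, so the encoder has fewer than half as many pixels to spend bits on.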

Final Steps

Based on the QuickTime file format, the .mp4 container can store many types of media, and it is also the MPEG standard container for H.264 video and AAC audio, which is how we will be using it. Use MP4Box, part of the GPAC project, to combine the audio and video streams we've just created:

  MP4Box -add max-video.mp4 -add audiodump.aac -fps 23.976 max-x264.mp4

This produces the final output file 'max-x264.mp4'. You can play it back with mplayer, or with Apple's QuickTime Player on a non-Linux OS. You can also embed the file in a web page for playback in a browser by following Apple's instructions for embedding QuickTime movies. Free software tools such as mplayerplug-in can play this file from within Firefox on Linux.

By way of comparison, here are the file sizes and video bitrates of the original raw DV file 'max.dv', our H.264-encoded file 'max-x264.mp4', and a corresponding XviD encoding, 'max-xvid.avi', which was created* from the same source video.

File            File Size   Video Bitrate
max.dv          32M         3 MB/s (about 24,000 kb/s)
max-xvid.avi    623K        418 kb/s
max-x264.mp4    522K        392 kb/s
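The table lists video bitrates only. Here is a minimal sketch for checking a file's overall average bitrate (all streams plus container overhead) from its size and duration. The helper name avg_kbps is hypothetical, and the example assumes the sizes above are in units of 1024 bytes, as ls -h reports them, over the 9-second clip length.

```shell
#!/bin/sh
# avg_kbps SIZE_BYTES DURATION_S
# Hypothetical helper: average bitrate of the whole file in kb/s (1 kb = 1000 bits).
avg_kbps() {
    echo $(( $1 * 8 / $2 / 1000 ))
}

avg_kbps $(( 522 * 1024 )) 9    # max-x264.mp4: prints 475
avg_kbps $(( 623 * 1024 )) 9    # max-xvid.avi: prints 567
```

The gap between these totals and the video bitrates in the table is taken up by the audio track and container overhead. For exact figures, wc -c < max-x264.mp4 reports the file's size in bytes.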

And here are accompanying screenshots of each sample.

[Screenshots: three matching frames from each file, shown for raw DV, XviD at 418 kb/s, and x264 at 392 kb/s]

As you can see, the visual quality of the H.264-encoded file is just as high as that of the XviD version, arguably higher, yet at a lower bitrate and file size. This shows that you can achieve similar results in less space, or much better results in the same space, with H.264 compared to other codecs such as XviD. In addition, the workflow and options for encoding with x264 are very similar to XviD's, but with greatly improved output, so if you are used to encoding with XviD, many of the concepts and options should be familiar to you when working with x264.

[Close-up detail comparison: XviD vs. x264]

The more you experiment with x264, the more you will discover the amazing savings in bitrates and file sizes while still maintaining an extremely high visual quality. The world of video encoding is definitely a black art, as there are hundreds of variables and options which can be brought to bear in any particular encoding project. There is no 'one size fits all' method of video encoding. However, the technical superiority of H.264 over XviD or regular MPEG-2 encoded video is too great not to take advantage of. And you can start taking advantage of it today, using the tools described above. Since H.264 is an MPEG standard encoding, used with an MPEG standard audio codec inside of an MPEG standard container format, all the work you invest in using these tools to encode your video will be future-proof as well as high-quality. Use the techniques outlined above as a starting point for your own H.264 encoding projects, and you'll discover why H.264 is becoming the next standard for video encoding.


* Command line used to create max-xvid.avi:
   mencoder max.dv -vf scale=480:320 -ovc xvid -xvidencopts \
       fixed_quant=7:qpel:nopacked -oac mp3lame \
       -ofps 24000/1001 -o max-xvid.avi