Julio E. Duran
Technology Development Specialist
julio.duran@intelsat.int
1.0 INTRODUCTION
The MPEG-2 Standard has made it practical to use compressed digital video signals in consumer products, which has opened a wide range of applications (including new Internet services). The widespread use of the MPEG-2 standard demonstrated the need for a detailed technical study of the MPEG-2 structure. The research began a few weeks ago, first by compiling all available information and then by synthesizing it. This paper provides an overview of the MPEG-2 Standard (Video, Audio and System). The MPEG-2 System is an ISO/IEC standard (13818-1) that defines the syntax and semantics of bitstreams in which digital audio, visual data, and other data are multiplexed and transmitted. This standard integrates all Video, Audio and Data services. In addition, digital compression plays a critical role in making television a truly interactive medium. Over the next few years, new technologies built on standards such as MPEG-2 are expected to be developed.
2.1 What is MPEG?
MPEG (Moving Picture Experts Group) is a group of experts that meets under the ISO (International Organization for Standardization) to generate standards for digital video and audio compression. Its official name is ISO/IEC JTC1/SC29/WG11 [1]. MPEG core technology includes many patents held by companies and individuals worldwide, but the MPEG committee only sets the technical standards and does not deal with patents or intellectual property issues.
2.2 What is MPEG-1?
MPEG-1, the first project of MPEG, was published in 1993 as ISO/IEC 11172 [2,3]. It is a five-part standard defining audio, video, systems, conformance testing and simulation software. It has been applied in the CD and Video-CD systems for publishing full-screen motion video on CD-ROM. However, it principally supports video coding at bit rates up to about 1.5 Mbit/s, giving quality similar to VHS, and stereo audio at 192 kbit/s per channel.
2.3 What is MPEG-2?
During 1990, MPEG recognized the need for a second, related standard for coding video at higher data rates in an interlaced format. MPEG-2, the second project of MPEG, was published in 1994 as ISO/IEC 13818-1 (Systems), 13818-2 (Video) and 13818-3 (Audio); in 1995 as ISO/IEC 13818-4 (Conformance), 13818-5 (Simulation software), 13818-6 (Digital storage media) and 13818-9 (Real-time interface); in 1996 as ISO/IEC 13818-8 (10-bit video extension); and in 1997 as ISO/IEC 13818-7 (Non-backward-compatible audio) [3].
The MPEG-2 concept is similar to MPEG-1, but it includes extensions to cover a wider range of applications and adds syntax for efficient coding of interlaced video. The primary application was the all-digital transmission of broadcast TV quality video at coded bit rates between 4 and 9 Mbit/sec. However, the MPEG-2 syntax has been found efficient for other applications as well. MPEG-2 is aimed at diverse applications such as television broadcasting, digital storage media, digital high-definition TV (HDTV), and communication. Some of the possible applications of MPEG-2 Video are: Broadcast Satellite Services (BSS) to the home; digital sound broadcasting (DSB); digital terrestrial television broadcasting (DTTB); electronic cinema (EC); fixed satellite television (FSS); home television theater (HTT); Digital Video Broadcasting (DVB); Digital Audio Broadcasting (DAB); Integrated Services Digital Broadcasting (ISDB); interactive storage media (ISM); videoconferencing and videophone; multimedia mailing (MMM); and network database services (NDB). MPEG-2 is still developing, and new digital applications that need MPEG-2 may appear in the future.
2.4 Other MPEG Standards
2.4.1 MPEG-3
MPEG-3 started in 1991, targeting HDTV applications with sampling dimensions up to 1920 pixels x 1080 lines x 30 Hz and coded bit rates between 20 and 40 Mbit/sec. It was later discovered that, with some (compatible) fine tuning, the MPEG-2 syntax worked very well for HDTV-rate video. HDTV is now part of the MPEG-2 High-1440 Level and High Level toolkit.
2.4.2 MPEG-4
Before MPEG-2 was finished, a new project, MPEG-4, was started. It targeted Very Low Bitrate applications, defined loosely as having sampling dimensions up to 176 pixels x 144 lines x 10 Hz and coded bit rates between 4800 and 64,000 bits/sec. This new standard would be used in low bit rate videophone, interactive mobile multimedia communications, mobile audio-visual applications, multimedia electronic mail, electronic newspapers, interactive multimedia databases, games, multimedia videotext and low bit rate speech coding.
MPEG-4 is now in the application identification phase. The workplan foresees November 1998 as the date for the official sanction of the proposed standard. This standard is going to have three parts: System, Video, and Audio.
2.4.3 MPEG-7
The industry trend logically follows the increasing demand for digital audiovisual content. The MPEG members recognized this trend and initiated a new work item. MPEG-7 will be a standardized description of various types of multimedia information. This description will be associated with the content itself, to allow fast and efficient searching for material that is of interest to the user. MPEG-7 is formally called Multimedia Content Description Interface. MPEG-7 will address applications that can be stored (on-line or off-line) or streamed (e.g. the broadcast push model on the Internet), and that can operate in both real-time and non-real-time environments. The International Standard will be available in November 2000.
3.0 MPEG-2 VIDEO
3.1 Video Compression Basics
Video compression technology makes efficient use of the available channel bandwidth in both video distribution and transmission systems.
A block diagram of a typical digital video application is shown in Figure 1. The video source is encoded by the video encoder, and the output of the encoder is a string of bits. This process consists of three basic operations, as shown in Figure 1. In the first step, the video signal is analyzed and represented in a more efficient form, where most of the important information is concentrated in a small number of coefficients. It is in this step that temporal and spatial redundancy are removed. The second step is quantization, in other words determining the scaling of the represented data. The third step is the assignment of codewords, which are strings of bits used to represent the data stream after quantization. Each of these steps, and how it contributes to reducing the number of bits required to represent the images, is discussed in later sections of this document.

Figure 1. Video Compression System
The channel encoder transforms the string of bits from the video encoder into a form suitable for transmission over a communications channel through several steps (multiplexing, scrambling, error correction, modulation and transmission), but this discussion isn’t the focus of this document.
3.1.1 Video Structure Hierarchy
The first step in coding video is to organize the images into subimages or blocks for processing. The analog representation of television consists of lines, fields and frames of video, so the television signal is already sampled in the vertical and temporal dimensions. Transforming the video into its digital representation also requires sampling in the horizontal dimension, producing a rectangular array of picture elements called pels. Each pel (picture element) or pixel is therefore sampled in three dimensions: X (horizontal), Y (vertical), and T (time). Each pel consists of light intensity (luminance) and color (chrominance) information quantized to an appropriate number of bits (8- or 10-bit accuracy).
The next step is to organize the pels into blocks. A block is a set of 8 x 8 values representing luminance or chrominance information, and normally consists of either an array of pel values or the transform of those values into an array of coefficients. Blocks are organized into macroblocks. A macroblock consists of four blocks of luminance (Y) information (a 16 x 16 array of values) and a number of chroma (Cr and Cb) blocks. When the number of chroma blocks is four (two each of Cr and Cb), as shown in Figure 2, the format is 4:2:2.
One or more contiguous macroblocks in a row are grouped together to form a slice. The order of the macroblocks within a slice follows the conventional television raster scan, from left to right. When an error occurs in the data stream, the decoder can skip to the start of the next slice, so the slice is the minimum unit for recovery and resynchronization after an error. The group of slices that constitutes the active picture area is called a frame or picture (Figure 2).

Figure 2. Video Structure Hierarchy.
One or more pictures (frames) in sequence are combined into a group of pictures (GOP) to allow random access into the sequence and to provide boundaries for interpicture coding.
Finally, a video sequence is represented by a sequence header, one or more groups of pictures, and an end of sequence code in the data stream.
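The block and macroblock arithmetic of this section can be sketched in a few lines of Python. This is only an illustration: the function name `macroblock_layout` is hypothetical, and the 4:2:0 and 4:4:4 entries are the other standard MPEG-2 chroma formats, added here as an assumption alongside the 4:2:2 case described in the text.

```python
# Chroma blocks (Cb + Cr) per macroblock. The text describes 4:2:2 (four chroma
# blocks); the 4:2:0 and 4:4:4 entries are assumptions added for comparison.
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}
LUMA_BLOCKS = 4        # four 8 x 8 luminance blocks per macroblock
BLOCK_SIZE = 8         # a block is 8 x 8 values
MACROBLOCK_SIZE = 16   # a macroblock covers 16 x 16 luminance pels


def macroblock_layout(width, height, chroma_format="4:2:2"):
    """Count macroblocks and blocks for a picture of width x height pels."""
    mb_cols = -(-width // MACROBLOCK_SIZE)    # ceiling division
    mb_rows = -(-height // MACROBLOCK_SIZE)
    blocks_per_mb = LUMA_BLOCKS + CHROMA_BLOCKS[chroma_format]
    macroblocks = mb_cols * mb_rows
    return {
        "macroblocks_per_row": mb_cols,
        "macroblock_rows": mb_rows,
        "macroblocks": macroblocks,
        "blocks_per_macroblock": blocks_per_mb,
        "total_blocks": macroblocks * blocks_per_mb,
        "values_per_macroblock": blocks_per_mb * BLOCK_SIZE * BLOCK_SIZE,
    }


layout = macroblock_layout(720, 576, "4:2:2")
print(layout)
```

For a 720 x 576 picture this gives 45 x 36 macroblocks, each carrying eight 8 x 8 blocks in the 4:2:2 format.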
3.1.2 Removal of Temporal Redundancy
Video sequences are usually correlated in time, resulting in a lot of temporal redundancy among adjacent pictures. Motion compensation attempts to remove this temporal redundancy from the information that is transmitted. The process works as follows: each frame of the sequence is quite similar to the preceding one, so only the differences between adjacent pictures are coded. The difference is called the motion-compensated residual or prediction error. For a typical video sequence, the energy in the residual is much less than in the original video. The motion estimation process assumes that the same imagery appears in consecutive video pictures, although possibly at different locations.
Multiple prediction methods are available to provide motion compensation for progressive and interlaced pictures. The simplest form of predictive coding is Differential Pulse Code Modulation (DPCM) [5]. This kind of coding produces two types of picture: the P-picture and the B-picture. The other picture types, the D-picture and the I-picture, do not use this kind of coding.

Figure 3. Video Picture Order.
Intra pictures (I-pictures) are coded without reference to other pictures. Moderate compression is achieved by reducing spatial redundancy (see section 3.1.3), but not temporal redundancy. They provide access points in the bitstream where decoding can begin [4].
Predictive pictures (P-pictures) are coded using motion-compensated prediction from a past I-picture or P-picture (the prediction is in the forward direction only) and may be used as a reference for further prediction. By reducing both spatial and temporal redundancy, P-pictures offer increased compression compared to I-pictures (see Table 1) [4].
Bidirectionally-predictive pictures (B-pictures) use both past and future I-pictures or P-pictures for motion compensation and offer the highest degree of compression (see Table 1). To enable backward prediction from a future frame, the coder reorders the pictures from natural display order to bitstream order, so that each B-picture is transmitted after the past and future pictures it references. This introduces a reordering delay that depends on the number of consecutive B-pictures, and the use of B-pictures also requires additional memory in the receiver [4]. B-pictures themselves are not used as references for predicting other pictures, so long runs of consecutive B-pictures (I(n), B(n+1), B(n+2), B(n+3), ...) are avoided.
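The reordering from display order to bitstream order can be illustrated with a minimal sketch. It assumes each picture is labelled with its type letter followed by its display index (for example "B1"); the function name `display_to_bitstream_order` is hypothetical and not part of any MPEG software.

```python
def display_to_bitstream_order(pictures):
    """Reorder picture labels so each B-picture follows both of its references.

    `pictures` is a list of labels in display order, e.g. ["I0", "B1", "P2"].
    """
    bitstream = []
    pending_b = []  # B-pictures waiting for their future reference picture
    for picture in pictures:
        if picture.startswith("B"):
            pending_b.append(picture)
        else:
            # An I- or P-picture is sent first, then the B-pictures that
            # depend on it as their future reference.
            bitstream.append(picture)
            bitstream.extend(pending_b)
            pending_b = []
    # Trailing B-pictures have no future reference inside this list.
    bitstream.extend(pending_b)
    return bitstream


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_bitstream_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder must hold P3 in memory while it decodes B1 and B2, which is the extra receiver memory and delay mentioned above.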
| MPEG Standard | I-picture (bits) | P-picture (bits) | B-picture (bits) | Average (bits) |
|---|---|---|---|---|
| MPEG-1 SIF (1.15 Mb/sec) | 150,000 | 50,000 | 20,000 | 38,000 |
| MPEG-2 601 (4.00 Mb/sec) | 400,000 | 200,000 | 80,000 | 130,000 |
Table 1[5]. Typical frame size in bits in a current transmission.
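The averages in Table 1 can be roughly cross-checked in two ways: by dividing the channel bit rate by the frame rate, and by averaging the picture sizes over a group of pictures. The sketch below assumes 30 pictures per second and a 12-picture GOP with a reference picture every third picture; neither value is given in the table, so both are illustrative assumptions, as are the function names.

```python
def average_picture_bits(bit_rate, frame_rate):
    """Average bits available per picture at a constant channel bit rate."""
    return bit_rate / frame_rate


def gop_average_bits(i_bits, p_bits, b_bits, gop_length=12, ref_spacing=3):
    """Average picture size over one GOP.

    gop_length and ref_spacing are assumed values (one I-picture per GOP and
    a reference picture every ref_spacing pictures).
    """
    num_i = 1
    num_p = gop_length // ref_spacing - 1
    num_b = gop_length - num_i - num_p
    total = i_bits * num_i + p_bits * num_p + b_bits * num_b
    return total / gop_length


# MPEG-1 SIF row of Table 1
print(round(average_picture_bits(1_150_000, 30)))        # 38333
print(round(gop_average_bits(150_000, 50_000, 20_000)))  # 38333

# MPEG-2 601 row of Table 1
print(round(average_picture_bits(4_000_000, 30)))          # 133333
print(round(gop_average_bits(400_000, 200_000, 80_000)))   # 136667
```

Under these assumptions both estimates land close to the rounded averages quoted in the table.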
3.1.3 Removal of Spatial Redundancy
Motion compensation reduces the temporal redundancy of the video signal, but spatial redundancy remains in the motion-compensated residual. In the special case where no motion-compensated processing is performed, the only opportunity to reduce the data rate is to encode the original pictures as a series of I-pictures. The process converts each 8 x 8 block of spatial intensity values into an 8 x 8 array of coefficients describing the spatial frequency content of the original intensity information [6]. The discrete cosine transform (DCT) [7] compacts most of the energy of the residual into only a small fraction of the transform coefficients, so coding and transmitting only these high-energy coefficients can still reconstruct high-quality video. The transform process itself does not reduce the number of bits required to represent the image. However, the transform coefficients lend themselves to further processing that is suitable for bit rate reduction, such as quantizing the coefficients or run-length coding the zeros [1], as discussed in the following sections.
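As an illustration of the energy compaction described above, the following sketch applies a direct (unoptimized) orthonormal two-dimensional DCT to an 8 x 8 block. It is written for clarity rather than speed, and is not the fast algorithm a real encoder would use; the function name `dct_8x8` is hypothetical.

```python
import math

N = 8  # block size used by MPEG-2


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8 x 8 block given as a list of 8 rows."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical spatial frequency
        for v in range(N):      # horizontal spatial frequency
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs


# A perfectly flat block: all of its energy ends up in the single DC coefficient.
flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_8x8(flat)
print(round(flat_coeffs[0][0], 6))  # 800.0
```

The transform produces 64 coefficients from 64 samples, so by itself it saves nothing; the gain comes from most coefficients being close to zero and therefore cheap to code after quantization.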
3.1.4 Quantization
The processing discussed up to this point has created a representation of the original luminance and chrominance components of the pictures in the form of motion vectors and spatial frequency coefficients. However, no compression has been achieved yet. At this point quantization is performed: each coefficient is divided by a value N (where N > 1) and the result is rounded to the nearest integer, which scales the values and thus reduces the bit rate required.
The MPEG-2 standard allows the quantization values to be changed for each block when required for coding complex pictures. The MPEG-2 syntax also allows the quantization matrices to be specified for every picture for improved coding efficiency, and the quantization matrices can be adjusted to help match the distribution of the data to the channel data rate.
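The divide-and-round operation can be sketched as follows. This is a simplified illustration with one step size N for all coefficients and integer inputs; actual MPEG-2 quantization combines a per-coefficient matrix with a scale factor, which is not modelled here. The function names are hypothetical.

```python
def quantize(coefficients, step):
    """Divide each integer coefficient by `step` and round to the nearest integer."""
    if step <= 1:
        raise ValueError("step must be greater than 1")
    levels = []
    for c in coefficients:
        magnitude = (abs(c) + step // 2) // step  # halves round away from zero
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Decoder side: scale the transmitted levels back up."""
    return [level * step for level in levels]


coeffs = [800, 37, -12, 3, 1]
levels = quantize(coeffs, 16)
print(levels)                  # [50, 2, -1, 0, 0]
print(dequantize(levels, 16))  # [800, 32, -16, 0, 0]
```

Small coefficients collapse to zero and the remaining ones need fewer bits, at the cost of a reconstruction error that the decoder cannot undo.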
3.1.5 Variable length Coding (VLC) and Codeword Assignment
Quantization creates an efficient discrete representation of the data to be transmitted. The quantized values can be represented using uniform or fixed-length codewords, in which case every quantized value is represented by the same number of bits. Greater efficiency, in terms of bit rate, can be achieved by employing what is known as entropy coding. From information theory, the entropy is the theoretical minimum average bit rate required to code a message. One optimal codeword design method is Huffman coding (a method of variable length coding). In Huffman coding, a code book is generated that minimizes the average codeword length, subject to the constraints of integer codeword lengths and unique decodability; the entropy is the lower bound on that average.
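A minimal sketch of Huffman code construction is shown below. The symbol counts are invented for illustration (they stand for how often each quantized value might occur), and MPEG-2 itself uses fixed, predefined VLC tables rather than building a code per picture, so this only demonstrates the principle.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Return {symbol: bit string} for a dict of symbol frequencies."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {symbol: "0" for symbol in frequencies}
    tie_breaker = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tie_breaker), {symbol: ""})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie_breaker), merged))
    return heap[0][2]


def average_length(frequencies, codes):
    """Average codeword length in bits per symbol."""
    total = sum(frequencies.values())
    return sum(frequencies[s] * len(codes[s]) for s in frequencies) / total


# Hypothetical occurrence counts of quantized coefficient values.
counts = {0: 45, 1: 16, -1: 13, 2: 12, -2: 9, 3: 5}
codes = huffman_code(counts)
print({s: len(c) for s, c in codes.items()})
print(average_length(counts, codes))
```

With these counts the most frequent value gets a 1-bit codeword and the average comes to 2.24 bits per symbol, compared with 3 bits for a fixed-length code over six symbols.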
Both quantization and entropy coding can be applied in varying degrees. As a result, a slow-moving scene, for example in a videoconference, may support relatively high levels of quantization and entropy coding, which translates into a higher level of compression.
Motion compensation, adaptive quantization, and variable length coding produce highly variable amounts of compressed video data as a function of time. A buffer is used to smooth the variable input bit rate into a fixed output bit rate for transmission.
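The smoothing role of the buffer can be illustrated with a small simulation. The model is deliberately simplified: one whole picture enters the buffer per picture period and the channel drains a fixed number of bits per period. The picture sizes are taken from the MPEG-2 row of Table 1, while the 25 pictures per second rate and the function name are assumptions for the example.

```python
def simulate_encoder_buffer(picture_bits, channel_bit_rate, frame_rate):
    """Return the buffer occupancy (in bits) after each picture period."""
    drain_per_picture = channel_bit_rate / frame_rate
    occupancy = 0.0
    trace = []
    for bits in picture_bits:
        occupancy += bits                                    # variable-rate input
        occupancy = max(0.0, occupancy - drain_per_picture)  # fixed-rate output
        trace.append(occupancy)
    return trace


# I, P, B, B pictures in bitstream order, sizes from Table 1 (MPEG-2 row).
sizes = [400_000, 200_000, 80_000, 80_000]
print(simulate_encoder_buffer(sizes, 4_000_000, 25))
# [240000.0, 280000.0, 200000.0, 120000.0]
```

The large I-picture fills the buffer and the small B-pictures let it empty again; an encoder would typically raise the quantization when the occupancy approaches the buffer size.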
3.2 MPEG-2 Profiles and Levels
The MPEG-2 syntax addresses a variety of performance grades. These grades provide different levels of performance and complexity, which are described in Table 2 of Profiles and Levels.
| Level | Simple | Main | SNR Scalable | Spatially Scalable | High |
|---|---|---|---|---|---|
| Picture types | I, P | I, P, B | I, P, B | I, P, B | I, P, B |
| Chroma format | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0 / 4:2:2 |
| Scalability | Non-scalable | Non-scalable | SNR scalable | Spatially scalable | SNR scalable |
| High (≤ 1920 pels, ≤ 1152 lines) | - | ≤ 80 Mbits/sec | - | - | ≤ 100 Mbits/sec |
| High-1440 (≤ 1440 pels, ≤ 1152 lines) | - | ≤ 60 Mbits/sec | - | ≤ 60 Mbits/sec | ≤ 80 Mbits/sec |
| Main (≤ 720 pels, ≤ 576 lines) | ≤ 15 Mbits/sec | ≤ 15 Mbits/sec | ≤ 15 Mbits/sec | - | ≤ 20 Mbits/sec |
| Low (≤ 352 pels, ≤ 288 lines) | - | ≤ 4 Mbits/sec | ≤ 4 Mbits/sec | - | - |
Table 2[8]. MPEG-2 System (Levels and Profile)
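The profile and level limits lend themselves to a simple lookup. The sketch below encodes the bit rate and picture size limits of Table 2 as Python dictionaries; the assignment of each limit to a profile column follows the commonly cited profile/level combinations and should be checked against ISO/IEC 13818-2 before being relied on. The names `MAX_BIT_RATE_MBPS`, `LEVEL_LIMITS` and `conforms` are hypothetical.

```python
# Maximum bit rate in Mbit/s for each defined (profile, level) combination.
MAX_BIT_RATE_MBPS = {
    ("Main", "High"): 80,
    ("High", "High"): 100,
    ("Main", "High-1440"): 60,
    ("Spatially Scalable", "High-1440"): 60,
    ("High", "High-1440"): 80,
    ("Simple", "Main"): 15,
    ("Main", "Main"): 15,
    ("SNR Scalable", "Main"): 15,
    ("High", "Main"): 20,
    ("Main", "Low"): 4,
    ("SNR Scalable", "Low"): 4,
}

# Maximum picture size (pels per line, lines) for each level.
LEVEL_LIMITS = {
    "High": (1920, 1152),
    "High-1440": (1440, 1152),
    "Main": (720, 576),
    "Low": (352, 288),
}


def conforms(profile, level, width, height, bit_rate_mbps):
    """Check a stream's size and bit rate against a profile/level combination."""
    max_rate = MAX_BIT_RATE_MBPS.get((profile, level))
    if max_rate is None:
        return False  # combination not defined in the table
    max_width, max_height = LEVEL_LIMITS[level]
    return (width <= max_width and height <= max_height
            and bit_rate_mbps <= max_rate)


print(conforms("Main", "Main", 720, 576, 9))    # True
print(conforms("Main", "Main", 1920, 1080, 9))  # False: picture too large
print(conforms("Simple", "High", 720, 576, 9))  # False: undefined combination
```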
3.3 Video Bit Stream Format
The final step is to form the compressed video bit stream into packets. The video bit stream is preceded by a sequence header, which carries information describing the format of the sequence. The Video Structure Hierarchy is shown in Figure 4a for a 4:2:2 ratio (CCIR Rec. 601), and Figure 4b shows how the video bit stream format is formed from it.

Figure 4a. Video Structure Hierarchy MPEG-2 (* Optional)

Figure 4b. Video Bit Stream Format (*Optional).
The sequence header extensions provide profile and level information, source video and scanning format, source chroma resolution and color information. The functions of the header fields are described in Table 3.
| Field | Function |
|---|---|
| SHC | 32 bits. Sequence Header Code (0x000001B3). |
| HSV | 12 bits. Horizontal Size Value. |
| VSV | 12 bits. Vertical Size Value. |
| ARI | 4 bits. Aspect Ratio Information: 0 = Forbidden, 1 = Square pixels, 2 = 4:3 display, 3 = 16:9 display, 4-F = Reserved. |
| FRC | 4 bits. Frame Rate Code: 0 = Forbidden, 1 = 23.976 (24/1.001), 2 = 24, 3 = 25, 4 = 29.97 (30/1.001), 5 = 30, 6 = 50, 7 = 59.94 (60/1.001), 8 = 60, 9-F = Reserved. |
| BRV | 18 bits. Bit Rate Value. |
| MB | 1 bit. Marker Bit. |
| VBS | 10 bits. VBV Buffer Size. |
| CPF | 1 bit. Constrained Parameter Flag; set to 0 (used in MPEG-1). |
| LIQM | 1 bit. Load Intra Quantizer Matrix flag: 1 = IQM follows, 0 = no change in value. |
| IQM | 512 bits. Intra Quantizer Matrix: a list of 64 8-bit unsigned integers. |
| LNIQM | 1 bit. Load Non-Intra Quantizer Matrix flag: 1 = NIQM follows, 0 = no change in value. |
| NIQM | 512 bits. Non-Intra Quantizer Matrix: a list of 64 8-bit unsigned integers. |
| GOP | Provides time code information and B-picture flags. |
| P.H. | Picture Header; describes the type of picture (I-picture, P-picture or B-picture). |
| E.O.S. | Sequence End Code (0x000001B7); terminates the video sequence. |
Table 3. Video Bit Stream Sequence Header[1]
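Reading the fixed-width fields of Table 3 out of a byte string can be sketched as below. The field widths and their order come from the table; the `BitReader` and `pack_fields` helpers and the dictionary keys are hypothetical names chosen for the example. Interpreting the Bit Rate Value in units of 400 bit/s follows ISO/IEC 13818-2 and is not stated in the table itself.

```python
SEQUENCE_HEADER_CODE = 0x000001B3


class BitReader:
    """Reads big-endian bit fields from a byte string."""

    def __init__(self, data):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, nbits):
        if nbits > self.remaining:
            raise ValueError("not enough data")
        self.remaining -= nbits
        return (self.value >> self.remaining) & ((1 << nbits) - 1)


def parse_sequence_header(data):
    reader = BitReader(data)
    if reader.read(32) != SEQUENCE_HEADER_CODE:
        raise ValueError("sequence header code not found")
    header = {
        "horizontal_size": reader.read(12),
        "vertical_size": reader.read(12),
        "aspect_ratio_code": reader.read(4),
        "frame_rate_code": reader.read(4),
        "bit_rate_value": reader.read(18),
    }
    header["bit_rate_bps"] = header["bit_rate_value"] * 400  # unit: 400 bit/s
    reader.read(1)  # marker bit
    header["vbv_buffer_size"] = reader.read(10)
    header["constrained_parameters_flag"] = reader.read(1)
    if reader.read(1):  # load intra quantizer matrix
        header["intra_quantizer_matrix"] = [reader.read(8) for _ in range(64)]
    if reader.read(1):  # load non-intra quantizer matrix
        header["non_intra_quantizer_matrix"] = [reader.read(8) for _ in range(64)]
    return header


def pack_fields(fields):
    """Pack (value, width) pairs into bytes, most significant bit first."""
    value = 0
    total_bits = 0
    for field_value, width in fields:
        value = (value << width) | (field_value & ((1 << width) - 1))
        total_bits += width
    padding = -total_bits % 8
    return (value << padding).to_bytes((total_bits + padding) // 8, "big")


# Build a test header: 720 x 576, 4:3 display, 25 frames/s, 4 Mbit/s.
example = pack_fields([(SEQUENCE_HEADER_CODE, 32), (720, 12), (576, 12),
                       (2, 4), (3, 4), (10000, 18), (1, 1), (112, 10),
                       (0, 1), (0, 1), (0, 1)])
print(parse_sequence_header(example))
```

Without the optional quantizer matrices the header occupies exactly 96 bits (12 bytes).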
3.4 Advantages of MPEG-2 over MPEG-1
MPEG-2 codes interlaced video signals, such as those originating from electronic cameras, more efficiently. MPEG-1 was designed only for progressive (i.e., noninterlaced) video pictures.
Although MPEG-2 encoders may be focused on interlaced video coding, they often possess more mature and powerful coding methods that can also be applied to create better MPEG-1 bitstreams. MPEG-2 encoders are at least 50% more powerful than MPEG-1 encoders capable of processing the same sample rate.
4.0 MPEG-2 AUDIO
MPEG is developing the MPEG-2 Audio Standard for low bit rate coding of Multichannel audio. MPEG-2 Audio coding will supply up to five full-bandwidth channels (left, right, center and two surround channels), plus an additional Low Frequency Enhancement channel (Subwoofer Channel), and/or up to seven commentary/multilingual channels. The MPEG-2 Audio Standard will also extend the stereo and mono coding of the MPEG-1 Audio Standard (ISO/IEC 11172-3) to half sampling rates (16 kHz, 22.05 kHz, and 24 kHz), for improved quality at bit rates at or below 64 kbit/s per channel.
MPEG-2 audio attempts to maintain as much compatibility with MPEG-1 audio syntax as possible while adding discrete surround-sound channels to the original MPEG-1 limit of 2 channels (Left, Right or matrix center and difference). MPEG-2 can add a total of 5.1 channels (Multichannel Extension) that consist of two main channels (L, R), two side/rear [SL (Surround Left), SR (Surround Right)], center, and 20-120 Hz (Low Frequency Enhancement) special effects channel (hence ".1" in "5.1").
4.1 Audio Encoding Basics
Two mechanisms are available for reducing the bit rate of sound signals. The first uses statistical correlation to remove redundancy from the signal stream. The second uses the psychoacoustic characteristics of the human hearing system, such as spectral and temporal masking, to reduce the number of bits required to recreate the original sounds.
4.2 Multichannel Extension Format
The ISO/IEC MPEG-2 audio service (ISO/IEC 13818-3) is a backward-compatible Multichannel audio extension of the MPEG-1 stereo audio service. The MPEG-2 audio service is capable of providing 5.1 audio channels while ensuring that existing two-channel decoders will still be able to decode compatible stereo information. The process for handling the additional Multichannel data while preserving backward compatibility is shown in Figures 5 and 6.

Figure 5. MPEG Multichannel Extension.
The variable length of the Ancillary Data field in the MPEG-1 format (see Figure 6) means that it can be used either to carry the C, Ls, Rs and LFE information or to carry a second stereo program (L2, R2).

Figure 6. Audio Bit Stream Format.
4.3 Low Sampling Rates
MPEG-2 defines three new sampling frequencies, half those used in MPEG-1, that are closer to the Nyquist rate and that allow for improved coding schemes.
Reduced sampling rates offer the following advantages (at the price of a reduced bandwidth):
The discrimination between MPEG-1 and MPEG-2 depends on the value of one bit in the header, called the ID bit or identification bit:
| Standard | ID bit |
|---|---|
| MPEG-1 Audio | ID = 1 |
| MPEG-2 Audio | ID = 0 |
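How a decoder might combine the ID bit with the sampling frequency index of the audio header can be sketched as a lookup table. The three MPEG-2 rates are the half sampling rates named in section 4.0; the ordering of the 2-bit index (44.1, 48, 32 kHz for MPEG-1) follows the MPEG audio header definition and is an addition not given in this paper. The function name is hypothetical.

```python
# Sampling rates in Hz, indexed by the 2-bit sampling frequency field.
SAMPLING_RATES_HZ = {
    1: (44100, 48000, 32000),  # ID = 1: MPEG-1 Audio
    0: (22050, 24000, 16000),  # ID = 0: MPEG-2 Audio, low sampling rates
}


def audio_sampling_rate(id_bit, frequency_index):
    """Return the sampling rate selected by the ID bit and frequency index."""
    if id_bit not in SAMPLING_RATES_HZ:
        raise ValueError("ID bit must be 0 or 1")
    if not 0 <= frequency_index <= 2:
        raise ValueError("frequency index 3 is reserved")
    return SAMPLING_RATES_HZ[id_bit][frequency_index]


print(audio_sampling_rate(1, 0))  # 44100
print(audio_sampling_rate(0, 0))  # 22050
```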
4.4 Dolby AC-3
The AC-3 system uses a hybrid backward/forward adaptive core bit allocation routine. The core routine is relatively simple and is based on a specific psychoacoustic model.
The AC-3 syntax assembles an AC-3 sync frame from a 16-bit sync word and an 8-bit word that indicates the sampling rate and frame size (SI), the bit stream information (BSI), six transform-coded audio blocks (32 ms of audio), and a 16-bit CRC error check code. The BSI contains information about the number of channels coded, dialog level, language code, and associated services. A 5-bit field in the BSI indicates the level of average spoken dialog within the encoded audio program relative to the level of a full-scale 1 kHz sine wave [1].
The first application of AC-3 was the digital optical soundtrack in 35 mm cinema, in addition to analog Dolby Stereo, which matrix-processes four channels onto two optical soundtracks (4:2:4 Multichannel system). The coder will also be a candidate in the upcoming ISO/MPEG evaluations of non-backward-compatible MPEG-2 (13818-7) coding algorithms.
4.5 MPEG-2 Advanced Audio Coding (AAC)
The MPEG-2 AAC standard is a new, state-of-the-art audio standard that provides very high audio quality at a rate of 64 kbit/s per channel for Multichannel operation. It provides a capability of up to 48 main audio channels, 16 low frequency effects channels, 16 overdub/multilingual channels, and 16 data streams. There are three profiles in the AAC standard: Main Profile, Low Complexity Profile, and Scalable Sampling Rate (SSR) Profile. The Main Profile is intended for use when processing power, and especially memory, are not at a premium. The Low Complexity Profile is intended for use when processing cycles and memory are constrained, and the SSR Profile when a scalable decoder is required. The Main and Low Complexity Profiles have been tested at 320 kbit/s for 5-channel audio programs, and both have demonstrated better quality than competing audio coding algorithms running at 640 kbit/s (MPEG-2 BC) for a 5-channel program.
AAC is a state-of-the-art audio compression algorithm that provides compression superior to that of older algorithms such as AC-3. AAC and AC-3 are both transform coders, but AAC uses a filterbank with a finer frequency resolution that enables superior signal compression. AAC also uses a number of new tools such as temporal noise shaping, backward adaptive linear prediction, joint stereo coding techniques and Huffman coding of quantized components. AAC is much more flexible than AC-3, in that AAC supports a wide range of sampling rates and bit rates, from one to 48 audio channels, up to 15 low frequency enhancement channels, multi-language capability and up to 15 embedded data streams.
5.0 MPEG-2 SYSTEM
MPEG-2 System is an ISO/IEC standard (13818-1) that defines the syntax and semantics of bitstreams in which digital audio, visual data, and data are multiplexed. This is specified in two forms: Program Stream and Transport Stream. Each is optimized for a different set of applications.
The Program Stream is designed for use in relatively error-free environments and is suitable for applications that may involve software processing. There are two reasons for this. First, the program stream comprises a succession of relatively long, variable-length packets, each beginning with a packet header. An error in the packet header would cause the loss of the entire packet; since a program stream packet may contain several kilobytes of data, this can mean the loss or corruption of an entire video frame. Second, the variability of packet length means that a decoder cannot predict where one packet will finish and the next will start.
The Transport Stream was devised for multi-program applications such as broadcasting (DVB and ISDB), where a single transmission can accommodate many independent programs. It was also designed for use in environments where errors are likely, such as transmission over noisy media. Transport stream packets are 188 bytes long. The Transport Stream is also used in ATM (section 6.4).
5.1 MPEG-2 Transport Structures
The transport is constructed in a series of steps. First, the raw digital information is formatted into elementary bit streams. Second, the elementary bit streams are formed into packets with descriptive headers. Third, the packetized data is then multiplexed into a program transport. Fourth, multiple program transport streams may then be multiplexed to form a system level multiplex transport stream.
5.2 Multiplexing Video, Audio and Data
The signal ensemble for transmission is formed by multiplexing the several component video, audio and data bit streams that make up the service. The two multiplexing schemes are motivated by different application requirements. In Figure 7, digitized video, audio and data streams, termed elementary bit streams, are first formed into variable-length packet elementary streams or PES packets.

Figure 7. The MPEG-2 Multiplexing concept.
In the program stream approach, packets from various elementary bit streams are multiplexed by transmitting the bits of complete packets in sequence, resulting in a sequence of variable-length packets (PES packet Video, PES packet Audio and PES packet Data) in the channel (see Figure 8). In the transport stream approach, PES packets, including the PES headers, from various elementary bit streams are carried as payload within fixed-length transport packets (see Figure 8). Each PES packet of a particular elementary bit stream (Video, Audio, or Data) therefore occupies a variable number of transport packets.

Figure 8. Program Stream and Transport Stream Approaches.
5.3 Elementary Bit Stream
The Video, Audio and Data are presented to the transport as elementary bit streams. The format of elementary bit streams is defined for the MPEG-2 system, with each elementary stream consisting of a 5-byte fixed-length component and a variable-length elementary stream descriptor [video elementary stream (see section 3.3 Video Bit Stream Format), audio elementary stream (see section 4.2 Audio Bit Stream Format) or data stream]. In this example, the elementary bit stream shown in Figure 9 is for a video format. The ratio used is the same as in the example of Figure 4a (4:2:2).
Figure 9. Elementary Bit Stream Format (*Optional).
The elementary bit stream description is shown in Table 4.
| Field | Function/Usage |
|---|---|
| Stream Type | Indicates the application being considered in this elementary stream. |
| Elementary PID | Indicates the PID of the transport bit stream containing the elementary bit stream. |
| ES Info-length | Indicates the length of a variable-length Elementary Stream Descriptor. |
Table 4. Elementary Stream Description.
5.4 Packet Elementary Streams (PES)
Prior to entering the transport layer, an elementary bit stream must be transformed into PES packets. A PES packet begins with a PES Packet Start Code, the unique Stream ID of the elementary stream, and the PES Packet Length. PES packets carrying various types of elementary streams (Video, Audio or Data) can be multiplexed to form a Program Stream or Transport Stream. The PES Packet Format is shown in Figure 10.

Figure 10 [1]. PES Packet Format.
The identification information is followed by a description of the PES header contents, including the PES header flags, PES packet length, header fields and a data block payload. The payload is a stream of contiguous bytes from a single elementary stream (video, audio or data). The flags are used in the MPEG system to conserve data space: when a flag is set to "0", the corresponding field is not present. For example, if the PES header ESCR flag is "0", the ESCR field is not present, saving 42 bits [1]. The fields are described in Tables 5 and 6.
| Field | Function |
|---|---|
| Packet Start Code Prefix | Indicates the start of a new packet. Takes the value 0x000001. |
| Stream ID | Specifies the type and number of the stream: 1011 1100 = Reserved Stream; 1011 1101 = Private Stream 1; 1011 1110 = Padding Stream; 1011 1111 = Private Stream 2; 110x xxxx = MPEG Audio Stream (number xxxxx); 1110 xxxx = MPEG Video Stream (number xxxx); 1111 xxxx = Data Stream (number xxxx). |
| PES Packet Length | Specifies the number of bytes of the PES packet (maximum of 64 kBytes). |
| PES Header Flags (each flag = 1 bit, unless otherwise noted) | |
| PESS (PES Scrambling Control) | 2 bits. Indicates the scrambling of the PES packet: not scrambled (00), scrambled (01-11). |
| PESP (PES Priority) | Indicates priority (1 = High, 0 = No priority). |
| DAI (Data Alignment Indicator) | Indicates the alignment (1 = aligned, 0 = not aligned). |
| CY (Copyright) | 1 = Copyright, 0 = Not copyright. |
| OOC (Original or Copy) | 1 = Original, 0 = Copy. |
| PTSDTSF (PTS DTS Flags) | 2 bits. 00 = Not present, 10 = PTS present, 11 = Both present (PTS and DTS). |
| ESCRF (ESCR Flag) | Indicates the presence of the ES Clock Reference. |
| ESRFES (ES Rate Flag) | Indicates the presence of the ES Rate field. |
| DSMTMF (DSM Trick Mode Flag) | Indicates the presence of an 8-bit field describing the Digital Storage Media (DSM) mode. |
| ACIF (Additional Copy Info Flag) | Indicates the presence of the Additional Copy Info field. |
| PESCRCF (PES CRC Flag) | Indicates the presence of the CRC field. |
| PESEXT (PES Extension Flag) | Indicates the presence of the PES extension (Private Data). |
| PES Header Optional Fields | |
| PTS (Presentation Time Stamp) | Informs the decoder of the intended time of presentation of a presentation unit. |
| DTS (Decoding Time Stamp) | Informs the decoder of the intended time of decoding of an access unit. |
| DSM Trick Mode | The field is partitioned as follows: Trick Mode Control (3 bits), Field ID (2 bits), Intra Slice Refresh (1 bit), Frequency Truncation (2 bits). |
| Trick Mode Control | DSM mode: Fast Forward (000), Slow Motion (001), Freeze Frame (010), Fast Reverse (011), Reserved (1xx). |
| Field ID | Display mode: Field 1 (00), Field 2 (01), Complete Frame (10), Reserved (11). |
| Intra Slice Refresh | Indicates that each picture is composed of intra slices. |
| Frequency Truncation | Coefficients sent by the DSM: only DC coefficients (00), the first three coefficients in scan order (01), the first six coefficients (10). |
| Field Rep Control | Indicates how many times the decoder should repeat field 1 as both the top and bottom fields alternately. |
Table 5. Description of the PES Packet Format.
|
PES Extension Flag |
|
|
Field |
Function |
|
PES Private Data Flag |
Indicates whether the PES packet contains private data. |
|
Program Private Sequence Counter Flag |
Indicates whether an MPEG-1 system packet header or an MPEG-2 program stream packet header is present. |
|
STD Buffer Flag |
Indicates whether the STD buffer scale and STD buffer size fields are encoded. |
|
PES Extension Field Flag |
Indicates the presence of additional data in the PES Header. |
Table 6. Description of the PES Packet Format (PES extension flags).
The start of the PES packet payload (Data Bytes) is not required to be aligned with the start of a unit of the bit stream (Video, Audio or Data); see Figure 11. Thus a new elementary stream unit may start at any point in the payload of a PES packet. A PES packet may be of variable length, subject to a maximum length of 64 kBytes.

Figure 11. Conversion of an elementary stream into a PES.
5.5 Program Stream (Structure)
In a program stream, PES packets derived from the contributing streams are organized into Packs (see Figure 12). A pack comprises a Pack Header, an optional System Header and any number of PES packets taken from any of the contributing elementary streams in any order [9]. The length of a pack is variable and depends on the rate of transmission, because a pack header must occur at least every 0.7 seconds. The system header contains a summary of the characteristics of the program stream, such as its maximum data rate and the number of contributing video and audio elementary streams.
Program Streams were designed for relatively error-free media such as CD-ROMs. MPEG-1 is a program stream based system: it was developed with the needs of recording media in mind, for applications such as DSM (Digital Storage Media).

Figure 12. Structure of the MPEG-2 Programme Stream.
5.6 Transport Stream (Structure)
A PES packet is divided among one or more Transport Packets. A transport packet is always 188 bytes (1504 bits) long. Each transport packet consists of a 4-byte Header, an optional Adaptation Header (used, where necessary, to fill out the 188 bytes of the transport packet) and a Payload (184 bytes when no Adaptation Header is present). For example (see Figure 13), if the second video PES packet is 1 kByte (1000 bytes) long, it is carried in several Transport Packets of 188 bytes each: 1000 bytes divided by 184 bytes is 5.43, so six (6) Transport Packets are needed, and the excess space in the sixth (last) packet, 6 × 184 − 1000 = 104 bytes, is filled by the adaptation header.

Figure 13. Structure of the MPEG-2 Transport Stream Multiplexer.
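The arithmetic of this example can be checked with a short Python sketch. It assumes, as in the example above, that every packet carries a full 184-byte payload except the last, whose unused space is taken up by the adaptation header; the function name `packetize` is hypothetical.

```python
TS_PACKET_SIZE = 188                                # bytes, fixed
TS_HEADER_SIZE = 4                                  # bytes, fixed
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE   # 184 bytes


def packetize(pes_length: int) -> tuple[int, int]:
    """Return (transport packets needed, bytes left over in the last one)."""
    packets = -(-pes_length // TS_PAYLOAD_SIZE)     # ceiling division
    stuffing = packets * TS_PAYLOAD_SIZE - pes_length
    return packets, stuffing


print(packetize(1000))  # (6, 104)
```

A PES packet whose length is an exact multiple of 184 bytes needs no filling at all under this simplified model.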
The transport stream multiplex consists entirely of short, fixed-length transport packets (see Figures 14, 15 and 16). The nature of the data being carried is identified by the Packet Header. The packet header consists of a fixed-length link layer and a variable-length adaptation layer, as discussed above.

Figure 14. Transport Packet Format (Header Format)

Figure 15. Transport Packet Format ( Adaptation Header Format)

Figure 16. Adaptation Header Format (PCR and OPCR structure).
The contents of the Transport Packet Header (see Table 7) and of the Adaptation Header, including its optional fields (see Table 8), are described below.
|
Header Format |
|
|
Field |
Function |
|
Sync Byte (value: 47 hex) |
Packet synchronization (binary 0100 0111). |
|
Transport Unit Error Indicator |
Indicates if packet is erroneous (0: no problem, 1: problem packet, payload is not used) |
|
Payload Unit Start Indicator |
Indicates that a PES packet header or the start of a Program Specific Information (PSI) table is present in the payload. A PES packet header always begins the payload of the packet, whereas the starting byte of a PSI table in the packet is indicated using a pointer field (0: no PES header present, 1: PES header present). |
|
Transport Priority |
Indicates priority at input to transmission channels which support prioritization (1: High, 0: Low). |
|
PID |
Packet Identifier for Multiplex / Demultiplex (13 bits). The PID field is used to distinguish transport packets containing data from one elementary stream from those carrying data from other elementary streams or control streams. |
|
Transport Scrambling Control |
2 bits: Indicates the descrambling key to use for the packet (00: Not Scrambled; 01: Reserved; 10: "Even" Key; 11: "Odd" Key). The MPEG-2 definition allows scrambling at two levels: within the PES packet structure and at the transport layer. |
|
Adaptation Field Control |
2 bits: Indicates whether an adaptation field follows (00: Reserved; 01: Payload only (184 bytes); 10: Adaptation Field only; 11: Adaptation Field followed by Payload). |
|
Continuity Counter |
Increments by one for each packet within the same PID and Transport Priority. |
|
Table 7. Description of the Transport Packet Header. |
|
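As a rough illustration of Table 7, the following Python sketch unpacks the 4-byte header of one transport packet. The field order follows the table; the bit widths of the single-bit indicators and of the 4-bit continuity counter are not stated in the text and are taken here, as an assumption, from the usual layout of ISO/IEC 13818-1. The function name and dictionary keys are hypothetical.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Unpack the 4-byte header of one 188-byte transport packet."""
    if len(packet) != 188:
        raise ValueError("a transport packet is always 188 bytes long")
    if packet[0] != 0x47:
        raise ValueError("sync byte 47 hex not found")
    return {
        # Byte 1: three 1-bit indicators, then the top 5 bits of the PID.
        "transport_error_indicator": bool(packet[1] & 0x80),
        "payload_unit_start_indicator": bool(packet[1] & 0x40),
        "transport_priority": bool(packet[1] & 0x20),
        # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # Byte 3: 2 + 2 + 4 bits.
        "transport_scrambling_control": (packet[3] >> 6) & 0b11,
        "adaptation_field_control": (packet[3] >> 4) & 0b11,
        "continuity_counter": packet[3] & 0x0F,
    }


# Example: a packet on PID 0 with payload only and counter value 7.
example = bytes([0x47, 0x40, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(example)["pid"])  # 0
```

A demultiplexer would typically call such a function on every packet and route the payload according to the returned PID.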
|
Adaptation Header Format |
|
|
Field |
Function |
|
Discontinuity Indicator |
Indicates there is a discontinuity in the PCR values that will be received from this packet onward. This occurs when bit streams are spliced. This flag should be used at the receiver to change the phase of the local clock. |
|
Random Access Indicator |
Indicates that the packet contains data that can serve as a random access point into the bit stream. |
|
Elementary Stream Priority Indicator |
Logical indication of priority of the data transmitted in the packet. |
|
Program Clock Reference (PCR) |
The phase of the local decoder clock is compared to the PCR value in the bit stream to determine whether the decoding process is synchronized. The PCR field can be modified during the transmission process. |
|
Original Program Clock Reference (OPCR) |
An OPCR provides the same reference information for the recording and playback of a single program and is not modified during the transmission process. |
|
Splice Countdown |
Indicates the number of packets in the bit stream, with the same PID as the current packet, that remain until a splicing point packet. |
|
Transport Private Data |
Used to carry private data in the stream. |
|
Adaptation Field Extension Length |
Indicates the length of the adaptation field extension. |
|
Table 8. Description of the Adaptation Header. |
|
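To make the role of the PCR more concrete, the sketch below converts a 6-byte PCR field into a time value. The bit layout is not given in the text; the code assumes the layout commonly described for ISO/IEC 13818-1 (a 33-bit base counted at 90 kHz, 6 reserved bits and a 9-bit extension counted at 27 MHz, combined as base × 300 + extension). The function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # assumed MPEG-2 system clock frequency


def decode_pcr(field: bytes) -> int:
    """Return the PCR in 27 MHz ticks from its 6-byte (48-bit) encoding."""
    if len(field) != 6:
        raise ValueError("the PCR field is expected to be 6 bytes long")
    raw = int.from_bytes(field, "big")
    base = raw >> 15          # top 33 bits (90 kHz units)
    extension = raw & 0x1FF   # bottom 9 bits (27 MHz units)
    return base * 300 + extension


def pcr_to_seconds(pcr_ticks: int) -> float:
    """Convert 27 MHz ticks to seconds."""
    return pcr_ticks / SYSTEM_CLOCK_HZ


# Example: a base of 90000 (one second at 90 kHz), reserved bits set to 1.
raw_value = (90_000 << 15) | (0x3F << 9)
print(pcr_to_seconds(decode_pcr(raw_value.to_bytes(6, "big"))))  # 1.0
```

A decoder would compare successive values of this kind with its local clock and adjust the clock when the two drift apart.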
5.7 Programme Specific Information (PSI)
Programme Specific Information is additional information carried within the transport stream that explicitly states the relationship between the available programmes and the PID values of their component elementary streams. With this information the decoder can determine which elementary streams belong to each programme. The programme specific information comprises four types of table, each of which may be carried as the payload of one or more transport packets. These tables are:
Program Map Table (PMT): This table gives the relationship among the elementary streams that constitute a program, their attributes, and the PIDs of the packets in which the program is sent. Each programme has its own Program Map Table.
Network Information Table (NIT): This table gives information about the transmission channel in which the program is sent.
Program Association Table (PAT): This table is transmitted as the payload of transport packets with PID = 0 in the transport packet header, and lists the PIDs where the PMTs and NITs can be found. It maps each program number onto the bit stream location of its Program Map Table (each programme is listed along with the PID value of the transport packets that contain its Program Map Table).
Conditional Access Table (CAT): If any elementary streams within a transport stream are scrambled, a Conditional Access Table must be present. The table provides details of the scrambling system or systems in use and gives the PID values of the transport packets that contain the conditional access management and entitlement information. The format of this information depends on the type of scrambling system employed.
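The way a decoder might use these tables to find the elementary streams of one programme can be sketched as follows. The table contents, PID values and function name are invented for illustration, and the tables are modelled as plain Python dictionaries rather than parsed from real PSI sections.

```python
PAT_PID = 0x0000  # the PAT is carried in packets with PID = 0

# Hypothetical PAT: program number -> PID of that programme's PMT.
pat = {1: 0x0100, 2: 0x0200}

# Hypothetical PMTs, keyed by PMT PID: stream type -> elementary stream PID.
pmts = {
    0x0100: {"video": 0x0101, "audio": 0x0102},
    0x0200: {"video": 0x0201, "audio": 0x0202, "data": 0x0203},
}


def elementary_pids(program_number: int, pat: dict, pmts: dict) -> list[int]:
    """Follow PAT -> PMT -> elementary stream PIDs for one programme."""
    pmt_pid = pat[program_number]      # step 1: look the programme up in the PAT
    stream_map = pmts[pmt_pid]         # step 2: read the PMT found at that PID
    return sorted(stream_map.values())  # step 3: PIDs to demultiplex


print([hex(pid) for pid in elementary_pids(2, pat, pmts)])
# ['0x201', '0x202', '0x203']
```

The decoder would then pass only transport packets carrying these PIDs to the video, audio and data decoders for the selected programme.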
6.0 Interoperability and New Services
The introduction of digital technology in imaging applications, together with the widespread implementation of digital communications, has created an opportunity to establish a universal standard for digital image coding, sound coding and data for a wide range of applications.
6.1 Digital Video Broadcasting (DVB)
The European Digital Video Broadcasting (DVB) project paid particular attention to constructing a digital architecture that could accommodate satellite media applications. DVB is intended to provide DTH (direct-to-home) multi-programme TV services in the BSS (Broadcast Satellite Service) and FSS (Fixed Satellite Service) bands and is aimed at the consumer IRD (Integrated Receiver Decoder). The DVB standard is supported by the MPEG-2 ISO/IEC standard (13818-1) and will be discussed in the next IOM.
6.2 Integrated Services Digital Broadcasting (ISDB)
Integrated Services Digital Broadcasting (ISDB) is an all-digital broadcasting system for the production, scheduling, transmission and reception of programs. The ISDB system can provide not only existing TV and radio services but also data and multimedia information services. Data broadcasting services using the data channel of satellite broadcasting are being studied. PRESENT is a multimedia broadcasting service that allows users to view multimedia information, such as news, weather and stock market information, interactively at any time. Studies of a video coding scheme suitable for ISDB have been based on MPEG-2.
6.3 Digital Audio Visual Council (DAVIC)
The Digital Audio Visual Council (DAVIC) is a non-profit association registered in Geneva. Its purpose is to advance the success of emerging digital audio-visual applications and services, initially of the broadcast and interactive type, through the timely availability of internationally agreed specifications of open interfaces and protocols that maximize interoperability across countries and applications or services. The goals of DAVIC are to identify, select, augment, develop and obtain the endorsement by formal standards bodies of specifications of interfaces, protocols and architectures of digital audio-visual applications and services [10]. DAVIC is progressing towards agreements for the implementation of interactive services using the MPEG-2 standard.
6.4 Harmonization with ATM
To ensure that a harmonized solution for the widest range of applications is achieved, MPEG, an ISO/IEC working group designated JTC1/SC29/WG11, is working jointly with the ITU-TS Study Group 15 Experts Group for ATM Video Coding. The MPEG-2 system packet multiplex structure was specifically tailored to the needs of broadcast television services, with consideration given to compatibility with the ATM structure.
An MPEG-2 transport packet is 188 bytes long, while an ATM cell is 53 bytes long, of which 48 bytes are payload. Two MPEG-2 transport packets contain 376 bytes of information. At the AAL-5 level, these 376 bytes can be transported within 8 ATM cells, leaving 8 bytes of free space available for other use. This 8-byte trailer could be used to transport application-dependent information and actual payload length information, and to provide additional error protection such as a 32-bit CRC. The total is 384 bytes, which is equal to the payload of 8 ATM cells (48 bytes per cell).
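The byte count behind this mapping can be verified with a few lines of Python. This is only a check of the arithmetic given above; the function name `atm_mapping` and its default arguments are hypothetical.

```python
TS_PACKET_SIZE = 188    # bytes per MPEG-2 transport packet
ATM_CELL_PAYLOAD = 48   # payload bytes per ATM cell


def atm_mapping(ts_packets: int = 2, cells: int = 8) -> tuple[int, int, int]:
    """Return (transport bytes, ATM payload capacity, bytes left for a trailer)."""
    data = ts_packets * TS_PACKET_SIZE
    capacity = cells * ATM_CELL_PAYLOAD
    return data, capacity, capacity - data


print(atm_mapping())  # (376, 384, 8)
```

The 8 bytes left over are exactly the space the text proposes to use for the trailer.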
7.0 Conclusions
The MPEG-2 standard has been designed as an efficient container for the video, audio and data services that will be carried from the source to the consumer. The standard can be used to transport any kind of broadcast service (video, audio or data broadcasting).
This document has focused on describing the MPEG-2 System standard. The next activity is to study the new applications and standards that are being built on top of the MPEG-2 transport layer, such as DVB, DAVIC, DSM-CC, ISDB and others. The potential advantages of using MPEG-2 are considerable, as indicated throughout the document. In particular, MPEG-2 readily addresses the integration of all digital broadcasting services. However, ATM also appears to be an efficient mechanism for delivering video, sound and data services over the types of switched network envisioned for the future global information infrastructure. The compatibility between MPEG-2 and ATM is therefore a key issue to follow up with additional studies.
References
[1] S. N. Baron, M. I. Krivocheev, "Digital Image and Audio Communications", Van Nostrand Reinhold, 1996.
[2] J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, D. J. LeGall, "MPEG Video Compression Standard", Chapman & Hall, 1997.
[3] ISO/IEC IS 13818-X, International Standard, MPEG Standard, 1994.
[4] P. N. Tudor, "MPEG-2 Video Compression Tutorial", BBC (R&D), 1995.
[5] R. Schafer, T. Sikora, "Digital Video Coding Standards and their Role in Video Communications", Proceedings of IEEE, Vol. 83, No. 6, June 1995.
[6] ITU-R Document 11-3/15, "MPEG Digital Compression Systems", 9 August 1994.
[7] IEEE Standard 1180-1990, "Discrete Cosine Transform", Dec. 1990.
[8] P. McGarvey, F. Pearce, "The Digital Broadcasting Revolution", Financial Times, Jan. 1997.
[9] P. A. Sarginson, "MPEG-2: A Tutorial Introduction to the System Layer", IEEE, 1995.
[10] ISO/IEC JTC1/SC29/WG11, "DSM-CC FAQ Version 1.0", Feb. 1997.
[11] DAVIC 1.0 Specification Part 1, "Description of DAVIC Functionalities", DAVIC, 1995.