Standards for Mobile Video Telephony
3GPP
has specified standards for mobile video telephony, taking into account
the nature of the mobile network channel. In fact, as introduced in the
previous section,
two different types of channels can enable mobile video telephony
applications: circuit-switched and packet-switched channels. Following
this dual approach, 3GPP has defined two different sets of standard
specifications:
-
Specifications for CS mobile video telephony are
based on the ITU-T H.324 standards for video telephony terminals over
circuit-switched channels. H.324-based terminals also can be
implemented over GSM-based CS channels (HSCSD, ECSD).
-
Specifications for PS mobile video telephony are
based on the IETF SIP standard for video telephony over packet-switched
channels.
Figure 21.2 summarizes the mapping between the mobile network channels and the standards for mobile video telephony defined in 3GPP.
A more-detailed description of the standards for mobile
video telephony for CS and PS networks is given in the following
sections.
21.4.1 Circuit-Switched Mobile Video Telephony
H.324 terminals for 3GPP circuit-switched mobile video telephony are essentially ITU-T H.324 terminals with Annex C [6] and with the modifications specified by 3GPP [7] since Release '99. In 3GPP, these are called 3G-324M terminals.
The system architecture of a 3G-324M terminal is depicted in Figure 21.3. [8] The mandatory elements of this architecture are a wireless interface, the H.223 multiplexer with Annexes A and B, [9] and the H.245 system control protocol (version 3 or later). [10] 3G-324M terminals are specified to work at bit rates of at least 32 kbps.
We will give an overview of the basic building blocks
of a 3G-324M terminal, considering also some implementation guidelines,
as described in 3GPP TSGS-SA. [11] The reader interested in the differences between H.324 and 3G-324M terminals can find more information in References [12] and [13]. Here we will not emphasize these differences.
21.4.1.1 Media Elements
3G-324M terminals can support a wide set of
media. They can be either continuous media (speech and video) or
discrete media (real-time text). Among the former set, the following
codecs can be supported in a mobile terminal:
-
AMR (Adaptive MultiRate) narrowband is the mandatory speech codec for 3G-324M terminals, [14]
if speech is supported. Speech is encoded at 8 kHz sampling frequency
and at eight different bit rates ranging from 4.75 to 12.20 kbps.
-
G.723.1 is the recommended speech codec. [15]
It encodes speech at two bit rates, 5.3 and 6.3 kbps. The G.723.1 codec
is needed if interoperation with GSTN (General Switched Telephone
Network) terminals is a requirement. [16]
-
H.263 video Profile 0 Level 10 is the mandatory codec, if video is supported. [17]
-
MPEG-4 Visual is an optional codec that can be supported at Simple Profile Level 0. [18]
-
H.261 is another optional video codec [19] that can be supported by 3G-324M terminals.
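To make the AMR figures above concrete, the following minimal sketch computes how many encoded speech bits one frame carries at each narrowband mode. It assumes the 20 ms AMR frame duration and the standard eight-mode rate set, neither of which is spelled out in the text above (only the 4.75 to 12.20 kbps range is). It ignores any framing or padding added by lower layers.

```python
# AMR narrowband modes in kbps. The text gives only the 4.75-12.20 range;
# the intermediate values are the standard AMR mode set (assumption here).
AMR_NB_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAME_MS = 20  # assumption: one AMR frame covers 20 ms of speech


def speech_bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Encoded speech bits in one frame (kbps x ms = bits)."""
    return round(rate_kbps * frame_ms)


for rate in AMR_NB_MODES_KBPS:
    bits = speech_bits_per_frame(rate)
    octets = -(-bits // 8)  # ceiling division: octets needed to hold the bits
    print(f"{rate:5.2f} kbps -> {bits:3d} bits ({octets} octets) per frame")
```

At the highest mode this gives 244 bits per frame, so even several speech frames fit into a comparatively small packet.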
The discrete media defined in 3GPP specifications of
circuit-switched video telephony terminals are in the framework of the
optional user data application:
-
T.120 [20]
is a protocol that allows multipoint data conferencing for transfer of
data, images, and sharing of whiteboard and applications.
-
T.140 [21]
is a protocol that allows real-time text conversation between two
3G-324M terminals. Text sessions can be opened in a stand-alone fashion
or simultaneously with speech, video, and other data applications.
Further information about this capability is available in Reference [22].
21.4.1.2 System Control and Multiplexing
In this section a general description of the system control and the multiplexing is given. Figure 21.4 shows a more detailed view of the 3G-324M protocol stack.
The control protocol H.245 [23]
provides end-to-end signaling for proper operation of a 3G-324M
terminal, capability exchange, and messages to open and fully describe
the content of logical channels. Most of the control signaling occurs
at the beginning and at the end of the terminal call. The needed
bandwidth for H.245 signaling is always allocated on-demand by the
H.223 multiplexer. [24] This ensures that most of the channel bandwidth is effectively used by the media.
H.324 Annex C [25]
also introduces the Control Channel Segmentation and Reassembly Layer
(CCSRL), which is used to split large control channel packets. The
segmentation is required because successful transmission of large
packets at high error rates may be difficult, and the connection setup
may even fail without CCSRL.
Control messages can make use of retransmission for
providing guaranteed delivery. H.324 uses the (Numbered) Simple
Retransmission Protocol, or (N)SRP, [26] for this functionality.
The multiplex protocol H.223 [27] multiplexes audio, video, data, and control streams into a single bit stream, and demultiplexes the received bit stream into separate
bit streams. H.223 should support a bit rate of at least 32 kbps toward the
wireless interface. However, lower bit rates are also possible,
especially over GSM-based channels (HSCSD, ECSD). The multiplexer
consists of an adaptation layer (AL) that exchanges information between
the higher layers (i.e., audio/video codecs and system control), and a
lower layer called the multiplex layer (MUX) that is responsible for
transferring information received from the AL to the eventual mobile
multilink layer and the physical layer(s). The AL handles the
appropriate error detection and correction, sequence numbering, and
retransmission procedures for each information stream. Three different
ALs are specified in the H.223 Recommendation, each targeted to a
different type of data:
-
The AL1 adaptation layer is designed primarily
for transfer of data or control information, which is relatively delay
insensitive but requires full error correction. AL1 itself does not
provide any error control or retransmission procedure; it relies on
higher layers (i.e., (N)SRP) for this functionality. AL1 works in
framed (AL1F) and unframed (AL1U) modes. The former is used for transfer
of control data, while the latter is used for user data transfer, such
as chat data or other T.120- or T.140-enabled applications.
-
The AL2 adaptation layer is intended primarily
for digital audio, which is delay sensitive, but may be able to accept
occasional errors with only minor degradation of performance. AL2
receives data from its higher layer (i.e., an audio codec) and
transfers it to the MUX layer after adding an 8-bit CRC (Cyclic
Redundancy Check) and optional 8-bit sequence numbers which can be used
to detect missing or misdelivered data.
-
The AL3 adaptation layer is designed for the
transfer of digital video. It appends a 16-bit CRC to the data received
from its higher layer (i.e., a video encoder), and it passes
information to the MUX layer. AL3 includes optional provision for
retransmission and sequence numbering by means of an 8- or 16-bit
control field. 3GPP recommends encapsulating one MPEG-4 video packet
into an AL3-SDU (Service Data Unit). To avoid additional delays caused
by possible retransmissions, video data can instead be transferred using
AL2, which has a smaller packet overhead and does not allow
retransmission procedures. [28]
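The AL2 framing described above (an optional 8-bit sequence number plus an 8-bit CRC around the codec data) can be sketched as follows. This is an illustration only: the field order, the CRC coverage, and the CRC parameters (polynomial x^8 + x^2 + x + 1, zero initial value) are assumptions made for the example, and the helper names `al2_wrap` and `al2_unwrap` are hypothetical. Recommendation H.223 remains the normative definition.

```python
from typing import Optional, Tuple

CRC8_POLY = 0x07  # assumption: x^8 + x^2 + x + 1, MSB first, zero initial value


def crc8(data: bytes, poly: int = CRC8_POLY) -> int:
    """Bitwise CRC-8 over data."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def al2_wrap(payload: bytes, seq: Optional[int] = None) -> bytes:
    """Prefix the optional 8-bit sequence number and append the 8-bit CRC."""
    header = b"" if seq is None else bytes([seq & 0xFF])
    body = header + payload
    return body + bytes([crc8(body)])


def al2_unwrap(pdu: bytes, has_seq: bool) -> Tuple[Optional[int], bytes]:
    """Check the CRC and return (sequence number or None, payload)."""
    if len(pdu) < (2 if has_seq else 1):
        raise ValueError("PDU too short")
    body, received = pdu[:-1], pdu[-1]
    if crc8(body) != received:
        raise ValueError("CRC mismatch")
    if has_seq:
        return body[0], body[1:]
    return None, body


pdu = al2_wrap(b"speech frame", seq=7)
print(al2_unwrap(pdu, has_seq=True))
```

A receiver that sees a CRC mismatch simply discards or conceals the frame; a gap in the sequence numbers reveals missing or misdelivered data, as the AL2 description notes.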
The MUX layer is responsible for mixing the various
logical channels from the sending ALs (e.g., data, audio, video, and
control) into a single bit stream to be forwarded to the physical layer
for transmission. All MUX layer packets are delimited using HDLC flags,
and include an 8-bit header, which contains, among other data, a 3-bit
CRC for error detection. The variable-length information field of each
MUX packet can contain 0 or more octets from multiple (segmentable)
logical channels. To guarantee error resilience and a low delay, MUX
packets are recommended to be between 100 and 200 bytes (for speech
data, this means to encapsulate 1 to 3 speech frames into a MUX
packet). [29]
To provide higher error resilience for data
transmission over mobile networks, four different H.223 multiplexer
levels are defined, [30] offering progressively increasing error
robustness at the cost of progressively increasing overhead and
complexity. The different levels are based on a different multiplexer
packet structure:
-
H.223 Level 0 describes the basic functionality
as defined in Recommendation H.223. All 3G-324M terminals should be
able to interwork using this level.
-
H.223 Level 1 is described in Annex A of
Recommendation H.223. The HDLC flag used to delimit multiplex packets
in the MUX layer of H.223 is replaced with a longer flag, and HDLC
zero-bit insertion (bit stuffing) is not used.
-
H.223 Level 2 is described in Annex B of
Recommendation H.223. In addition to the features of H.223 Level 1, a
24-bit (optionally also 32-bit) header describing the multiplexer
packet is used. The header includes error protection (using Extended
Golay Codes) and packet length fields.
-
H.223 Level 3 is described in Annexes C and D of
Recommendation H.223. The level includes the features of H.223 Level 2.
Furthermore, additional error protection and other features are
provided to increase the protection of the payload. For instance, H.223
Level 3 defines changes not only to the MUX layer, but also to the
adaptation layer, so that the various ALs in Figure 21.4 are replaced with more robust ones that make use of Reed-Solomon codes.
Two 3G-324M terminals establish a connection at the
highest level supported by both terminals. This also ensures
interoperability with GSTN H.324 terminals. A dynamic level change
procedure can be used to adjust error resilience when channel
conditions vary during a connection. The levels can be used
independently in the receiving and transmitting directions.
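The outcome of the level selection described above can be sketched in a few lines. This shows only the result (the highest level common to both terminals), not the on-the-wire procedure by which terminals detect each other's level. The function name `common_level` and the representation of capabilities as sets of integers are assumptions made for the example.

```python
from typing import Set

H223_LEVELS = {0, 1, 2, 3}


def common_level(local: Set[int], remote: Set[int]) -> int:
    """Return the highest H.223 level supported by both terminals."""
    unknown = (local | remote) - H223_LEVELS
    if unknown:
        raise ValueError(f"unknown H.223 levels: {sorted(unknown)}")
    shared = local & remote
    if not shared:
        raise ValueError("terminals share no H.223 level")
    return max(shared)


# A 3G-324M terminal supporting levels 0-2 calling one that supports 0-3:
print(common_level({0, 1, 2}, {0, 1, 2, 3}))
# The same terminal calling a terminal that implements only the basic level:
print(common_level({0, 1, 2}, {0}))
```

Because every terminal is expected to interwork at Level 0, the shared set is never empty in practice, and the fallback to Level 0 is what keeps basic H.223 peers reachable.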
The optional Mobile Multilink Layer (MML) [31]
was introduced in Release 4 of the 3GPP 3G-324M specifications.
It allows data to be transferred over up to eight independent physical
connections, each providing the same transmission rate, in order to
yield a higher aggregate bit rate. The MML provides the split
functionality toward the lower protocol stack layers (HSCSD, ECSD, or
CS UTRAN mobile networks) and the aggregation functionality toward the
upper protocol stack layers.
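The split and aggregation roles of the MML can be illustrated with a simple round-robin distribution over equal-rate channels. This is a conceptual sketch only: the real MML framing is defined in the H.324 specification and is not reproduced here, and the chunk size and the function names `split` and `aggregate` are hypothetical.

```python
from typing import List

MAX_CHANNELS = 8  # the MML allows up to eight physical connections


def split(stream: bytes, n_channels: int, chunk: int = 4) -> List[List[bytes]]:
    """Distribute fixed-size chunks of the stream round-robin over the channels."""
    if not 1 <= n_channels <= MAX_CHANNELS:
        raise ValueError("number of channels must be between 1 and 8")
    lanes: List[List[bytes]] = [[] for _ in range(n_channels)]
    for index, start in enumerate(range(0, len(stream), chunk)):
        lanes[index % n_channels].append(stream[start:start + chunk])
    return lanes


def aggregate(lanes: List[List[bytes]]) -> bytes:
    """Re-interleave the chunks in the order in which they were split."""
    depth = max((len(lane) for lane in lanes), default=0)
    out = []
    for row in range(depth):
        for lane in lanes:
            if row < len(lane):
                out.append(lane[row])
    return b"".join(out)


data = bytes(range(50))
assert aggregate(split(data, 3)) == data
```

Round-robin distribution works here because the connections provide the same transmission rate, so the chunks arrive at a comparable pace on every channel.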
Call setup issues in circuit-switched networks and
the capability of 3G-324M terminals to download HTTP content are not
addressed here. The interested reader can find additional details
in Curcio and coworkers [32] and in Annex I of ITU-T Recommendation H.324, [33] respectively.
21.4.2 Packet-Switched Mobile Video Telephony
Mobile video telephony applications have been
included in the framework of packet-switched conversational multimedia
applications of 3GPP Release 5 specifications. A conversational
multimedia application is any application that requires very low delays
and error rates. For instance, a Voice over IP (VoIP) application or a
one- or two-way multimedia application with the mentioned quality
requirements belongs to this category.
Release 5 3GPP specifications for video telephony are
tightly connected to the 3GPP network specification. In fact, the call
control mechanism in the IP Multimedia Subsystem (IMS) of 3GPP Network
Release 5 is based on the SIP protocol defined by IETF. This is the
same protocol used for the control plane of mobile videophones, defined
in the framework of packet-switched conversational multimedia
applications in 3GPP. Figure 21.5
shows the protocol stack for PS mobile videophones. In the next
sections, a brief description of the codecs and protocols depicted in Figure 21.5 will be given.
21.4.2.1 Media Elements
The codecs and payload formats used for mobile video telephony are described in the specification. [34]
Media either can be continuous (speech and video) or discrete
(real-time text). For interoperability reasons, 3GPP has ensured that
the mandatory codecs for PS video telephony are the same codecs defined
for CS video telephony (3G-324M). However, codecs other than the
mandatory or recommended ones can be used, provided they are signaled
and negotiated through SIP/SDP.
The codecs for continuous media are:
-
AMR narrowband is the mandatory speech codec, [35] if speech is supported in PS videophones. AMR speech is packetized using the payload format described in Sjoberg et al. [36]
-
AMR wideband is the mandatory speech codec [37]
whenever wideband speech is supported in the terminal. AMR wideband
speech is packetized according to the payload format in Sjoberg et al. [38]
-
H.263 baseline is the mandatory codec when video is supported. [39] H.263 video is encapsulated following the payload format defined in Bormann et al. [40]
-
H.263 Version 2 Interactive and Streaming
Wireless Profile (Profile 3) Level 10 is an optional codec to be
supported by the terminals. [41]
It provides better coding efficiency and error resilience in a mobile
environment than the baseline H.263, because of the use of the
video codec Annexes I, J, K, and T. The packetization algorithm is the
same as that defined for the H.263 baseline. [42]
-
MPEG-4 Visual is an optional codec that can be supported at Simple Profile Level 0. [43] Encapsulation of MPEG-4 video is done according to the payload format defined in Kikuchi et al. [44]
Whenever discrete media are available in mobile
videophone terminals, T.140 is the real-time text conversation standard
to be optionally supported [45] for chat applications. Packetization of text data follows the formats defined in Hellstrom. [46]
The protocol used for the transport of packetized media data is the Real-Time Transport Protocol (RTP). [47]
RTP provides real-time delivery of media data, including
functionalities such as packet sequence numbers and time stamping. The
latter allows intermedia synchronization in the receiving terminal. RTP
runs on top of UDP and IPv4/v6.
RTP comes with its control protocol (RTCP), which
allows QoS monitoring. Each endpoint sends quality reports to, and
receives them from, the other endpoint. The quality reports carry information
such as number of packets sent, number of bytes sent, fraction of
packets lost, number of packets lost, and packet interarrival jitter.
Further details about RTCP will be given in Section 21.5.
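Two of the report fields listed above, the fraction of packets lost and the interarrival jitter, can be computed as in the following sketch. It follows the calculations given in the RTP specification (an 8-bit fixed-point loss fraction and a jitter estimate smoothed with a gain of 1/16). The function names and the choice of plain integers for the inputs are assumptions made for the example.

```python
def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Loss fraction since the last report, as an 8-bit fixed-point value (0-255)."""
    lost = expected_interval - received_interval
    if expected_interval == 0 or lost <= 0:
        return 0  # nothing expected, or duplicates made up for the losses
    return (lost << 8) // expected_interval


def update_jitter(jitter: float, transit_prev: int, transit_now: int) -> float:
    """Update the running jitter estimate from two packets' transit times.

    A transit time is the arrival time minus the RTP timestamp, both
    expressed in RTP timestamp units.
    """
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


# 100 packets expected and 75 received in the reporting interval:
print(fraction_lost(100, 75) / 256)  # loss expressed as a fraction of 1
```

The loss fraction is carried as an integer out of 256, so the receiver of the report divides by 256 to recover a ratio.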
21.4.2.2 System Control
The Session Initiation Protocol (SIP) defined in IETF [48]
is an application layer control protocol for creating, modifying, and
terminating sessions with one or more participants. SIP establishes the
logical binding between the media streams of two video telephony
terminals. As shown in Figure 21.5,
SIP can run on top of TCP and UDP (other transport protocols also
are allowed). However, UDP is assumed to be the preferred transport
protocol in 3GPP IPv4- or IPv6-based networks. [49]
SIP makes use of the Session Description Protocol (SDP) [50]
to describe the session properties. Among the parameters used to
describe the session are IP addresses, ports, payload formats, types of
media (audio, video, etc.), media codecs (H.263, AMR, etc.), and
session bandwidth.
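The session parameters listed above map directly onto SDP lines. The following sketch assembles a minimal session description offering AMR speech and H.263 video. The IP address (from the documentation range), the port numbers, the dynamic payload type numbers, the bandwidth value, and the exact `rtpmap`/`fmtp` strings are illustrative assumptions, not values taken from the 3GPP specifications; `build_sdp` is a hypothetical helper.

```python
def build_sdp(ip: str, audio_port: int, video_port: int, bandwidth_kbps: int) -> str:
    """Assemble a minimal SDP body with one audio and one video stream."""
    lines = [
        "v=0",                                 # protocol version
        f"o=- 0 0 IN IP4 {ip}",                # origin (session id and version are placeholders)
        "s=-",                                 # session name
        f"c=IN IP4 {ip}",                      # connection address for the media
        f"b=AS:{bandwidth_kbps}",              # session bandwidth in kbps
        "t=0 0",                               # unbounded session time
        f"m=audio {audio_port} RTP/AVP 97",    # media type, port, payload type
        "a=rtpmap:97 AMR/8000",                # speech codec and clock rate
        f"m=video {video_port} RTP/AVP 98",
        "a=rtpmap:98 H263-2000/90000",         # video codec and clock rate
        "a=fmtp:98 profile=0;level=10",        # H.263 profile and level
    ]
    return "\r\n".join(lines) + "\r\n"


sdp = build_sdp("192.0.2.10", 49152, 49154, 64)
media_lines = [line for line in sdp.split("\r\n") if line.startswith("m=")]
print(media_lines)
```

In a call, a body like this would be carried in the INVITE request, and the callee would answer with its own description in the final response.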
A simple IETF SIP signaling example between two video telephony terminals is presented in Figure 21.6.
A SIP call setup is essentially a three-way handshake
between caller and callee. The main legs are INVITE (to
initiate a call), 200/OK (to communicate a definitive successful
response) and ACK (to acknowledge the response). However,
implementations can make use of provisional responses, such as
100/TRYING and 180/RINGING when it is expected that a final response
will take more than 200 ms. 100/TRYING indicates that the next-hop
server has received the request and that some unspecified action is
being taken on behalf of this call (for example, a database query).
180/RINGING indicates that the callee is trying to alert the user.
After the call has been established, the actual media
transfer (speech and video) can take place. The release of the call is
made by means of the BYE method, and the successful call release is
communicated to the caller through a 200/OK message.
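The call setup and release just described can be written down as two short message sequences. The sketch below models them as plain data and separates provisional (1xx) responses from the three-way handshake. It assumes that the caller is the party that releases the call, as in the description above, and it does not model retransmissions, proxies, or message bodies. The names `SETUP`, `RELEASE`, and `three_way_handshake` are hypothetical.

```python
from typing import List, Tuple

Flow = List[Tuple[str, str]]  # (sender, message)

SETUP: Flow = [
    ("caller", "INVITE"),
    ("callee", "100 Trying"),
    ("callee", "180 Ringing"),
    ("callee", "200 OK"),
    ("caller", "ACK"),
]

RELEASE: Flow = [
    ("caller", "BYE"),
    ("callee", "200 OK"),
]


def is_provisional(message: str) -> bool:
    """True for 1xx responses such as 100 Trying or 180 Ringing."""
    code = message.split()[0]
    return code.isdigit() and 100 <= int(code) < 200


def three_way_handshake(flow: Flow) -> List[str]:
    """Drop provisional responses, leaving the essential legs of the setup."""
    return [message for _, message in flow if not is_provisional(message)]


print(three_way_handshake(SETUP))
print(len(SETUP), "setup messages,", len(RELEASE), "release messages")
```

Counting the entries gives five messages for this simple setup and two for the release; the provisional responses add to the message count but not to the handshake itself.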
Quality of service of signaling is an important
issue when measuring the performance of terminals for mobile video
telephony. In Section 21.5 of this chapter we will clarify the concepts of Post Dialing Delay (T1), Answer-Signal Delay (T2), and Call Release Delay (T3) shown in Figure 21.6. The next section addresses SIP signaling in 3GPP networks.
21.4.2.3 Call Control Issues
SIP-based mobile applications based on IETF
signaling can be implemented in 3GPP Release '99 and 4 networks. In
this case, only the mobile applications resident in the mobile
terminals run the SIP protocol, while the network is not aware of it.
A further step has been made in 3GPP Release 5
specifications, where SIP has been selected to govern the core
call-control mechanism of the whole IP multimedia subsystem. Here, both
the network and the mobile terminal implement the SIP protocol and
exchange SIP messages for establishing and releasing calls. This choice
has been made to enable the transition toward all-IP mobile networks.
The SIP protocol in 3GPP Release 5 networks is more complex than the
IETF SIP, because of factors such as resource reservation or the
increased number of involved network elements. For a deeper
understanding of call control in 3GPP networks, the reader is referred to
the 3GPP specifications. [51], [52], [53] Here we will give an example of SIP signaling for call setup and release between a mobile terminal and a 3GPP network (see Figures 21.7 and 21.8). [54]
The mobile terminal (or UE, user equipment) initiates a
call toward the mobile originated (MO) network. The UE sends the first
INVITE (1) message to the P-CSCF (proxy-call session control function)
that works as a call router toward other network elements and the
destination mobile terminal. Before the 180/RINGING (19) message is
received by the UE, the messages (11–17) are exchanged mainly to allow
resource reservation in the network and PDP context activation between
the UE and the network. PRACK messages [55]
play the same role as ACK, but they apply to provisional responses
(such as 183/SESSION PROGRESS or 180/RINGING) that cease to be
retransmitted when PRACK is received (more details about reliability of
SIP messages are available in Section 21.5).
In a 3GPP network, the total number of SIP messages
exchanged by the UE for establishing a call is 12 (plus resource
reservation), while a simple IETF call setup requires 5 SIP messages.
Call release signaling is shown in Figure 21.8.
The number of SIP messages exchanged by the UE is 2 (the same number as
in IETF SIP call release), plus the required signaling to release the
PDP contexts resources. In this scenario, messages (2-3) can occur even
before BYE (1) and in parallel with procedure 4 (remove resource
reservation).