End-to-End System Architecture
A mobile video telephony system is a real-time
system of the conversational type. It is real-time because the playback
of continuous media, such as audio and video, must occur in an
isochronous fashion. A video telephony application is different from a
streaming application because the former has the following properties:
- Bidirectional data transfer: Media always flows
from a source mobile videophone to a destination videophone, and vice
versa. In this sense, the data flow is symmetric between the two
endpoints.
- Real-time media encoding: Each videophone must
have both encoding and decoding capabilities. Speech and video signals
must be encoded and transmitted to the peer in real time. This
requirement implies that mobile videophones need more processing power
than mobile streaming devices, which require only decoding capability.
Real-time encoding must be performed efficiently and with minimal delay.
- Delay sensitivity: Mobile video telephony systems
are real-time systems with conversational features. A high level of
interactivity between the two endpoints is therefore required for the
system to be usable for speech and video conversations. A conversation
can be held only if end-to-end delays are very low and preferably
constant. For instance, the conversational, interactive character of a
dialog between two parties is lost when end-to-end delays exceed a few
hundred milliseconds. Low delay is the most critical success factor for
a mobile video telephony service. To guarantee low end-to-end delays,
both the network and the mobile stations must be optimized for
processing conversational traffic. Error resilience is another very
important factor in mobile videophone systems: any mechanism for error
detection and correction/concealment must run within the maximum
allowed delay budget. For this reason, retransmission algorithms at the
network or application level normally cannot be used, since each
retransmission adds at least one extra round trip to the delay of the
affected data. Forward error correction (FEC) and error concealment
algorithms are therefore the only practical choices for providing
resilience against the bit errors (or packet losses) introduced by the
air interface.
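As a minimal sketch of FEC, the Python code below implements single-parity XOR coding, the basic idea behind generic parity FEC schemes such as RFC 5109. This is a swapped-in illustration; the text does not name a specific code. It assumes fixed-length (padded) packets grouped in blocks: the sender adds one parity packet per group, and the receiver can rebuild any single lost packet without waiting for a retransmission, at the cost of 1/k extra bandwidth for a group of k packets. The function names and sample payloads are hypothetical.

```python
from functools import reduce
from typing import List, Optional

# Hypothetical helpers; assumes every packet in a group has the same length.


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two packets of equal length."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length (padded) packets")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group: List[bytes]) -> bytes:
    """Sender side: one parity packet protecting a whole group."""
    return reduce(xor_bytes, group)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Receiver side: rebuild at most one lost packet (marked None)."""
    lost = [i for i, pkt in enumerate(received) if pkt is None]
    if len(lost) > 1:
        raise ValueError("single parity can repair only one loss per group")
    repaired = list(received)
    if lost:
        # XOR of the parity with every surviving packet equals the lost one.
        survivors = [pkt for pkt in received if pkt is not None]
        repaired[lost[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


if __name__ == "__main__":
    group = [b"SPCH", b"VID1", b"VID2"]
    parity = make_parity(group)
    print(recover([b"SPCH", None, b"VID2"], parity) == group)
```

Running the example prints `True`: the second packet is reconstructed locally, so no extra round trip is spent. A single parity packet cannot repair two losses in the same group; stronger codes trade more bandwidth for that capability.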
A mobile video telephony system consists mainly of two mobile videophones, used by the end users, and the mobile network. Figure 21.1
shows the high-level architecture of a typical mobile video
telephony system over an IP-based mobile network. We follow an
end-to-end approach, analyzing each part of the system in turn.
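Following this end-to-end view, the delay requirement can be checked by adding up per-segment delays from videophone A, through the mobile network, to videophone B. The Python sketch below is a back-of-the-envelope calculation; the segment names, the millisecond values, and the 250 ms one-way budget are all hypothetical. It also illustrates why retransmission (ARQ, automatic repeat request) is usually ruled out: a single retry on one radio hop adds at least a round trip on that hop.

```python
# Hypothetical one-way delay contributions (ms) along the path
# videophone A -> mobile network -> videophone B. Illustrative values only.
SEGMENTS_MS = {
    "encode_A": 40.0,        # capture, encoding, packetization at A
    "uplink_radio": 40.0,    # air interface, A -> network
    "core_network": 20.0,    # transport inside the mobile network
    "downlink_radio": 40.0,  # air interface, network -> B
    "decode_B": 60.0,        # jitter buffering, decoding, playout at B
}

# Assumed one-way budget, within the "few hundred milliseconds" range.
BUDGET_MS = 250.0


def total_delay_ms(segments: dict) -> float:
    """Sum the one-way delay of every segment on the path."""
    return sum(segments.values())


def delay_with_retransmissions_ms(segments: dict, hop: str,
                                  retransmissions: int = 1) -> float:
    """Lower bound on one-way delay for data needing ARQ retries on `hop`.

    Each retry costs at least one extra round trip on that hop (loss report
    back to the sender, then the resent data), modelled as 2 * hop delay.
    Timeouts and processing overheads are ignored.
    """
    return total_delay_ms(segments) + retransmissions * 2 * segments[hop]


def fits_budget(delay_ms: float, budget_ms: float = BUDGET_MS) -> bool:
    """True if the one-way delay stays within the budget."""
    return delay_ms <= budget_ms


if __name__ == "__main__":
    base = total_delay_ms(SEGMENTS_MS)
    arq = delay_with_retransmissions_ms(SEGMENTS_MS, "uplink_radio")
    print(f"no retransmission: {base:.0f} ms, fits: {fits_budget(base)}")
    print(f"one uplink retry:  {arq:.0f} ms, fits: {fits_budget(arq)}")
```

With these placeholder values the path costs 200 ms and fits the budget, while one uplink retry raises it to at least 280 ms and does not. Real budgets depend on the codecs, radio bearers, and network in use.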
Mobile videophone A is connected to the mobile network
through a logical connection, called a Packet Data Protocol (PDP)
context, established between the network and the mobile station's
address. The PDP context uses physical transport channels in the uplink
and downlink directions to enable data transfer in both directions. The
mobile device can roam (i.e., when moving, change network operator
without affecting the received service), provided radio coverage is
always available to guarantee the service. The mobile videophone is
equipped with ordinary telephony hardware (microphone and speaker) and
video hardware (camera and display).
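The PDP context can be pictured as a record binding the mobile station's address to a negotiated QoS profile. The Python sketch below is a simplified, hypothetical model: the four traffic classes are those defined for UMTS in 3GPP TS 23.107, but the chosen fields, the `supports_call` check, the address, and the example bit rates are illustrative assumptions, not the actual 3GPP data structure.

```python
from dataclasses import dataclass
from enum import Enum


class TrafficClass(Enum):
    """UMTS QoS traffic classes (3GPP TS 23.107)."""
    CONVERSATIONAL = "conversational"
    STREAMING = "streaming"
    INTERACTIVE = "interactive"
    BACKGROUND = "background"


@dataclass(frozen=True)
class PdpContext:
    """Simplified, hypothetical view of an activated PDP context."""
    pdp_address: str            # address assigned to the mobile station
    traffic_class: TrafficClass
    guaranteed_ul_kbps: float   # uplink: videophone -> network
    guaranteed_dl_kbps: float   # downlink: network -> videophone

    def supports_call(self, send_kbps: float, receive_kbps: float) -> bool:
        """True if the call's media rates fit this conversational context."""
        return (
            self.traffic_class is TrafficClass.CONVERSATIONAL
            and send_kbps <= self.guaranteed_ul_kbps
            and receive_kbps <= self.guaranteed_dl_kbps
        )


if __name__ == "__main__":
    ctx = PdpContext("10.0.0.1", TrafficClass.CONVERSATIONAL, 64.0, 64.0)
    media_kbps = 12.2 + 48.0  # e.g. AMR 12.2 kbps speech + hypothetical 48 kbps video
    print(ctx.supports_call(media_kbps, media_kbps))
```

Because the call is symmetric, the same media rate is checked against both the uplink and downlink guarantees; the example prints `True`.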
Speech and video content is captured live from the
microphone and camera, encoded in real time by the mobile device, and
transmitted in the uplink direction toward the network and the other
end user. In the opposite (downlink) direction, the network delivers
speech and video data to mobile videophone A, which decodes it, plays
back the speech, and displays the video. In addition, the videophone
sends and receives information for session establishment,
QoS control, and media synchronization. The videophone may react
promptly to received QoS reports, taking appropriate actions to
guarantee the best possible media quality at any instant.
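One possible reaction to QoS reports is to adapt the video encoder's target bit rate. The Python sketch below uses additive-increase/multiplicative-decrease (AIMD), a generic congestion-control technique chosen here for illustration; the text does not prescribe a specific algorithm. It assumes each report carries a loss fraction like the 8-bit "fraction lost" field of RTCP receiver reports (RFC 3550), a fixed-point value to be divided by 256. The class names, thresholds, step sizes, and bit-rate limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosReport:
    """Hypothetical digest of a QoS report received from the peer."""
    fraction_lost: float  # share of packets lost since last report, 0.0..1.0

    @classmethod
    def from_rtcp(cls, raw_fraction_lost: int) -> "QosReport":
        """Convert RTCP's 8-bit fixed-point 'fraction lost' (value / 256)."""
        if not 0 <= raw_fraction_lost <= 255:
            raise ValueError("RTCP fraction lost is an 8-bit field")
        return cls(raw_fraction_lost / 256)


class VideoRateController:
    """AIMD adaptation of the video encoder's target bit rate (sketch)."""

    def __init__(self, start_kbps: float = 48.0, min_kbps: float = 16.0,
                 max_kbps: float = 64.0, loss_threshold: float = 0.02,
                 step_kbps: float = 4.0, backoff: float = 0.75) -> None:
        self.target_kbps = start_kbps
        self.min_kbps = min_kbps
        self.max_kbps = max_kbps
        self.loss_threshold = loss_threshold
        self.step_kbps = step_kbps
        self.backoff = backoff

    def on_report(self, report: QosReport) -> float:
        """Update and return the target bit rate after one QoS report."""
        if report.fraction_lost > self.loss_threshold:
            # Losses above threshold: cut the rate multiplicatively.
            self.target_kbps = max(self.min_kbps, self.target_kbps * self.backoff)
        else:
            # Clean report: probe for more bandwidth additively.
            self.target_kbps = min(self.max_kbps, self.target_kbps + self.step_kbps)
        return self.target_kbps


if __name__ == "__main__":
    ctrl = VideoRateController()
    for raw in (26, 0, 0, 0):  # 26/256 is roughly 10% loss
        print(ctrl.on_report(QosReport.from_rtcp(raw)))
```

The example prints 36.0, 40.0, 44.0 and 48.0: one lossy report drops the rate quickly, and clean reports then recover it gradually, which keeps the encoder from repeatedly overloading a degraded radio link.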
The mobile network carries conversational multimedia
and control traffic in the uplink and downlink directions, allowing
real-time communication between the two mobile videophone users.
Mobile videophone B is placed at the other end of the architecture shown in Figure 21.1. Its functionality mirrors that of mobile videophone A.