A fully functional RTSP/RTP streaming server hello world example in C++ for experimentation with JPEG payloads.
Anyone who has ever tried to develop their own media streaming server gets overwhelmed by the
huge number of different transmission protocols, RFCs and media codecs. You are buried
in technical information and thousands of pages of specifications. So it is definitely not an
easy job to recognize the general architecture of a streaming server and to concentrate on what is
relevant. You have to deal at least with the following technologies to understand how a streaming
server works:
- RTP, RTSP, RTCP, RTP over TCP, UDP, Unicast, Multicast, SDP
- RTP Payload audio / video formats and codecs like MJPEG, H264, MPEG-2, G.711, PCM
The following tutorial provides a highly simplified C++ (Visual Studio 2010) hello world example of a
streaming server which could serve as a base pattern for experimentation and for the development of
more sophisticated solutions. Everything that could obscure the basic understanding of RTP streaming
was stripped away, so no advanced error handling is implemented. I did not deal with multicasting
and RTCP, or with payloads more complex than MJPEG, such as MPEG-2. Even the simplest implementation
that still makes sense as a tutorial already comprises about 800 lines of code.
The tutorial covers the following aspects of a media streaming server:
- Multiclient operation of an RTSP/RTP server. Multiple clients (for example VLC) can connect to the server.
- A simplified RTSP parser which handles the session establishment between an RTSP player client and the server. The parser was partially adopted from the Live555 project and simplified.
- RTP media streaming with transport over UDP and TCP (RTP over RTSP).
- A very basic media payload format which is packetized into RTP network packets for streaming.
We only deal with MJPEG as RTP payload here. The streaming server of the tutorial offers
two RTP MJPEG streaming channels, mjpeg/1 and mjpeg/2. To keep everything as simple as possible,
the sample works with synthetic images which are generated in the source code of the project.
To get a first impression, start the RTSPTestServer application in a console window. After that you
can start an RTSP player; I recommend VLC. To connect VLC to one of the streams which
the streaming server simulates, open the menu "Media" and select "Open network stream ...". In the
"Open Media" dialog window enter the URL of one of the two available streams. If the streamer and
VLC reside on the same machine, the URL for channel 1 is:
rtsp://127.0.0.1:8554/mjpeg/1
Port 8554 is the RTSP port on which the streamer accepts connections from clients. After
pressing the "Play" button the stream should start. You should see an alternating display of a
red and a green JPEG frame at a frame rate of 25 frames per second. You may open multiple
instances of VLC and connect them to the streamer. It is possible to open the same channel
multiple times. For example, the following screenshot shows four VLC instances which are
connected to the streamer.
The mjpeg/1 stream runs with an image resolution of 48x32 pixels and the second stream, mjpeg/2,
with 64x48 pixels.
RTSP and RTP are Internet protocol specifications which define the different aspects that make
up a streaming session between an RTSP/RTP streaming server and a player client. There are
other media streaming protocols in use, for example the large number of different streaming
methods which came up with the HTML5 standard. We only deal with RTSP and RTP here.
RTSP (Real Time Streaming Protocol) is defined in RFC 2326.
Despite its name, RTSP does not actually transport any media such as video and audio.
RTSP is a companion protocol for RTP, which does the actual work of media transport.
RTSP covers all aspects of controlling an RTP streaming session. A player connects to the
RTSP connection handler of a streaming server and exchanges RTSP requests with the server.
These requests and their responses are defined in RFC 2326. For example, RTSP requests
are used to
- select the streams which the client would like to play,
- query the format of the stream (codecs) and the transport method (for example UDP or TCP based),
- start and stop a streaming session
An RTSP/RTP server starts a stream when the client, after some preceding requests, sends an
RTSP PLAY request. To handle RTSP requests from the client, a streaming server must
implement an RTSP parser that interprets the client messages. A simplified RTSP parser is
implemented in the unit "RTSPSession.cpp" of the streamer implementation.
I will not discuss the RTSP request/response communication in detail here. The Internet
and RFC 2326 offer a huge amount of information on that and on the possible pitfalls
when implementing a parser module.
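To make the role of the parser more concrete, here is a minimal sketch (not the Live555-derived
parser in RTSPSession.cpp) that extracts the request method, the URL and the CSeq header, and maps
the URL to one of the two channels. The names RtspRequest and ParseRtspRequest are hypothetical,
and the sketch assumes the complete request has already been read from the socket; a real parser
must also handle case-insensitive header names, partial reads and request bodies.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical minimal view of an RTSP request; names are illustrative only.
struct RtspRequest {
    std::string method;   // e.g. "DESCRIBE", "SETUP", "PLAY", "TEARDOWN"
    std::string url;      // e.g. "rtsp://127.0.0.1:8554/mjpeg/1"
    int cseq = -1;        // CSeq header value, must be echoed in the response
    int streamId = 0;     // 1 or 2 for mjpeg/1 or mjpeg/2, 0 if unknown
};

// Parses the request line and the CSeq header of a complete RTSP request.
// Returns false if the request line is malformed.
// Simplification: header names are matched case-sensitively.
bool ParseRtspRequest(const std::string& text, RtspRequest& req) {
    std::istringstream in(text);
    std::string line;
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();

    // Request line: METHOD URL RTSP/1.0
    std::istringstream requestLine(line);
    std::string version;
    if (!(requestLine >> req.method >> req.url >> version)) return false;
    if (version.rfind("RTSP/", 0) != 0) return false;

    // Map the URL to one of the two simulated channels.
    const std::string prefix = "/mjpeg/";
    std::size_t pos = req.url.find(prefix);
    if (pos != std::string::npos && pos + prefix.size() < req.url.size()) {
        char channel = req.url[pos + prefix.size()];
        if (channel == '1' || channel == '2') req.streamId = channel - '0';
    }

    // Header lines until the empty line that ends the header section.
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;
        if (line.compare(0, 5, "CSeq:") == 0)
            req.cseq = std::stoi(line.substr(5));  // stoi skips the leading blank
    }
    return true;
}
```

The parsed CSeq value is what the response has to echo, so that the client can match responses to
its requests.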
RTP (Real-time Transport Protocol) is the actual media transport protocol. Like any Internet
standard it is well defined, in this case in RFC 3550. That RFC defines the structure of the
RTP packets into which media samples are packetized. A media stream consists of a series of RTP packets
which are transmitted from the streamer to the client. RFC 3550, together with companion
RFCs for individual payload types, describes the rules for how media samples are packetized. For
example, there are rules for how an image of a video stream must be fragmented into multiple small
RTP packets if the image data is bigger than the payload data size which may be put into one
RTP packet.
RTP itself is a generic transport protocol and independent from the payload type it transports.
Besides the payload packaging, RTP can use different transport methods. Basically
we have two different transport methods for RTP packets. RTP over UDP uses UDP as low level
transport protocol. UDP is a lightweight protocol and does not waste much bandwidth on
control data and flow control. UDP does not guarantee that a transmitted packet arrives,
let alone in order. That leads to image errors (artifacts) or audio dropouts if network
conditions are bad. For media transport that fact is often accepted in favor of the
simplicity of UDP transport. A big disadvantage of UDP is that it often cannot be
transmitted across heterogeneous networks like the Internet because of firewalls and router
limitations. UDP transport would also be the technological base for multicast streaming. I will
not discuss multicast here; my personal opinion of multicast is that it is a dead technology path
with applications only in quite special environments.
The streamer in the tutorial only supports unicast connections between a client and the server,
even with UDP transport. That means each client (VLC player) has its own stream
and may control it without disturbing other clients which are connected to the streamer.
RTP may also be transmitted over TCP. If the client selects that transport mode
in the RTSP SETUP request, the streamer sends the RTP packets over the TCP connection which is
already established for the RTSP communication. In that case there are two logical transports on
the same TCP connection. TCP transport has some protocol overhead compared with UDP,
but it has significant advantages which by far outweigh the disadvantages.
TCP is a reliable transport protocol and guarantees that the data arrives at the client in order.
In heterogeneous networks TCP is often the only transport which works. Despite the commonly
held view on the Internet that UDP is the recommended streaming transport, the author prefers
TCP based transport and has found that all disadvantages which are normally ascribed to
TCP can be overcome with a good implementation without losing its advantages over UDP.
Picture 2 shows the simplified communication channels which are established for UDP and
TCP transport. TCP uses just one channel for both RTSP (control) and RTP (media), whereas UDP
always establishes at least two channels (with RTCP it would become even more
complicated). That makes both the implementation and the decision whether a network is suited for
that transport type much harder.
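The transport decision is made in the SETUP request, in which the client sends a Transport header.
The following sketch shows one way to evaluate it. TransportInfo and ParseTransport are
hypothetical names, and the sketch assumes the two common forms "RTP/AVP;unicast;client_port=a-b"
for UDP and "RTP/AVP/TCP;unicast;interleaved=a-b" for RTP over RTSP, ignoring all other parameters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result of evaluating the Transport header of a SETUP request.
struct TransportInfo {
    bool tcp = false;         // true: RTP over RTSP (interleaved), false: RTP over UDP
    int clientRtpPort = 0;    // UDP only: client port for RTP
    int clientRtcpPort = 0;   // UDP only: client port for RTCP
    int rtpChannel = 0;       // TCP only: interleaved channel for RTP
    int rtcpChannel = 1;      // TCP only: interleaved channel for RTCP
};

// Reads a number pair such as "5000-5001" that follows a key like "client_port=".
static bool ReadPair(const std::string& header, const std::string& key,
                     int& first, int& second) {
    std::size_t pos = header.find(key);
    if (pos == std::string::npos) return false;
    pos += key.size();
    std::size_t dash = header.find('-', pos);
    if (dash == std::string::npos) return false;
    first = std::stoi(header.substr(pos, dash - pos));
    second = std::stoi(header.substr(dash + 1));  // stops at ';' or end of string
    return true;
}

// Decides between UDP and TCP transport and extracts ports or channels.
TransportInfo ParseTransport(const std::string& header) {
    TransportInfo t;
    t.tcp = header.find("RTP/AVP/TCP") != std::string::npos;
    if (t.tcp)
        ReadPair(header, "interleaved=", t.rtpChannel, t.rtcpChannel);
    else
        ReadPair(header, "client_port=", t.clientRtpPort, t.clientRtcpPort);
    return t;
}
```

With UDP the server has to open its own sockets and send to the client ports; with TCP it only
remembers the channel numbers for the interleaved framing on the existing RTSP connection.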
VLC is able to handle both RTP transport types. To switch between UDP and TCP based transport,
open the "Preferences" dialog from the "Tools" menu. In the "Input & codecs" section you find the
option "Live555 stream transport" with the two values "HTTP (default)" and "RTP over RTSP (TCP)".
The "HTTP" option actually means UDP; it is not clear why it is labeled HTTP.
You can check which transport type is really used, for example, by sniffing the network communication.
RTP as a transport protocol is independent of the type of media data which it transmits.
RTP may transmit a huge variety of video and audio formats. Some audio formats
which may be transported by RTP are G.711, GSM or PCM. Video formats which may be transported
are MPV (MPEG-1/MPEG-2 video), H.261 or H.264. There are many more. Each audio or video format
which may be put into RTP packets is called a payload. Normally audio and video data are
transmitted in compressed formats. That's why a payload type usually also stands for an audio or
video compression format, that is, a codec. For each format there are different rules for how to put
the data into RTP packets. Here we demonstrate just a very simple MJPEG RTP payload
packetization. The rules which must be applied to put MJPEG images into RTP packets are described
in RFC 2435. Even a simple format like JPEG has lots of different characteristics or profiles.
The implementation of a general RTP packetizer just for JPEG may already be a challenge. That's
why RTP payload packetizers normally consider only a limited subset of the degrees of freedom
which a compression standard or the corresponding payload RFC offers.
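The association between payload type numbers and formats can be illustrated with a small lookup
table. The entries below are a few static payload types from the RTP audio/video profile
(RFC 3551); the struct and function names are my own, for illustration only.

```cpp
#include <string>

// A few static RTP payload types from the RTP/AVP profile (RFC 3551).
// H.264 has no static number; it uses a dynamic type (96-127) announced in the SDP.
struct StaticPayloadType {
    int number;
    const char* encoding;
    int clockRate;   // RTP timestamp units per second
};

constexpr StaticPayloadType kStaticPayloadTypes[] = {
    {0,  "PCMU", 8000},    // G.711 mu-law
    {3,  "GSM",  8000},
    {8,  "PCMA", 8000},    // G.711 A-law
    {26, "JPEG", 90000},   // RFC 2435, the payload used by this tutorial's streamer
    {31, "H261", 90000},
    {32, "MPV",  90000},   // MPEG-1/MPEG-2 video
};

// Returns the table entry for a static payload type, or nullptr for unknown or dynamic types.
const StaticPayloadType* FindPayloadType(int number) {
    for (const auto& pt : kStaticPayloadTypes)
        if (pt.number == number) return &pt;
    return nullptr;
}
```

The clock rate matters later for the RTP timestamps: video payloads such as JPEG count in 90 kHz
ticks, while narrowband audio counts in 8 kHz ticks.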
The following picture shows the architecture of the tutorial sample of the RTSPTestServer.
The test server runs two different thread types. The master thread (main program thread) waits
for incoming client connections on the master TCP socket of the streamer, which listens on port
8554. For each incoming client the master thread creates a session thread which handles the RTSP
and RTP communication individually for each client without disturbing the other clients. The
number of session threads, and hence RTP channels, is not limited by the server.
A session thread handles two event types. One event signals incoming data on the RTSP TCP
socket of the connected client. That data gets interpreted by the RTSP request parser. The
parser checks the different request types and their parameters and creates the corresponding responses:
- In response to the DESCRIBE request the streamer announces the payload type which it supports for
the requested channel. In the case of MJPEG that is payload type 26.
- In the SETUP request the streamer and the client negotiate the RTP transport method (UDP or TCP).
- Finally, streaming starts when the client sends a PLAY request.
- The TEARDOWN request stops the streaming.
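The DESCRIBE answer is typically an SDP (Session Description Protocol) body. A minimal sketch of how
a server could announce MJPEG with payload type 26 follows. The session name, the origin line and
the helper names BuildMjpegSdp and BuildDescribeResponse are assumptions; a real server would
usually add further attributes such as a control URL.

```cpp
#include <string>

// Builds a minimal SDP body for one MJPEG channel (illustrative values only).
std::string BuildMjpegSdp(const std::string& serverIp) {
    std::string sdp;
    sdp += "v=0\r\n";                                  // SDP version
    sdp += "o=- 0 0 IN IP4 " + serverIp + "\r\n";      // origin
    sdp += "s=mjpeg test stream\r\n";                  // session name
    sdp += "c=IN IP4 0.0.0.0\r\n";                     // connection data
    sdp += "t=0 0\r\n";                                // unbounded session time
    sdp += "m=video 0 RTP/AVP 26\r\n";                 // video, port set in SETUP, PT 26 = JPEG
    return sdp;
}

// Wraps the SDP into a DESCRIBE response; CSeq must echo the request's CSeq.
std::string BuildDescribeResponse(int cseq, const std::string& sdp) {
    std::string response = "RTSP/1.0 200 OK\r\n";
    response += "CSeq: " + std::to_string(cseq) + "\r\n";
    response += "Content-Type: application/sdp\r\n";
    response += "Content-Length: " + std::to_string(sdp.size()) + "\r\n";
    response += "\r\n";   // empty line separates headers from the SDP body
    response += sdp;
    return response;
}
```

Port 0 in the media line is common for RTSP, because the actual ports or interleaved channels are
only negotiated in the subsequent SETUP request.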
The second event which controls the session thread signals that a new image
is available for streaming. Our server simulates a video source with a frame rate of 25 frames
per second; for that, a simple periodic timer is used. Each timer event alternates between two
JPEG frames. The timer event is only handled if the session is in the PLAY state, that is, after the
client has sent an RTSP PLAY request.
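The interplay of the 25 fps timer, the PLAY state and the RTP timestamp can be sketched as follows.
JPEG over RTP uses a 90 kHz clock, so each frame advances the timestamp by 90000 / 25 = 3600 ticks.
SessionState and FrameClock are hypothetical names; the sketch starts the timestamp at 0 only to
keep the example readable, whereas RFC 3550 recommends a random initial value.

```cpp
#include <cstdint>

// Hypothetical per-session state; only the Playing state produces RTP packets.
enum class SessionState { Init, Ready, Playing };

struct FrameClock {
    static constexpr std::uint32_t kClockRate = 90000;               // RTP clock for JPEG
    static constexpr std::uint32_t kFps = 25;
    static constexpr std::uint32_t kTicksPerFrame = kClockRate / kFps; // 3600 ticks = 40 ms

    std::uint32_t timestamp = 0;    // real servers start at a random value
    std::uint32_t frameIndex = 0;   // selects one of the two alternating test images

    // Called on every 40 ms timer tick. Returns true if a frame should be sent and
    // writes the image index (0 or 1) and the RTP timestamp to use for it.
    bool OnTimer(SessionState state, int& imageIndex, std::uint32_t& rtpTimestamp) {
        if (state != SessionState::Playing) return false;   // ignore ticks before PLAY
        imageIndex = static_cast<int>(frameIndex % 2);      // red/green alternation
        rtpTimestamp = timestamp;
        timestamp += kTicksPerFrame;   // unsigned arithmetic wraps around, as RTP expects
        ++frameIndex;
        return true;
    }
};
```

All packets that belong to the same frame carry the same timestamp; only the sequence number
increases per packet.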
For a more realistic extension of the streamer, the image production mechanism is the part
of the sample which must be modified first. For example, one could read JPEG images from files
driven by a timer, or connect to a network camera which produces a trigger event each time a
new JPEG image is available for RTP streaming. Of course, for such an extension the RTP packetization
must be extended as well to support RFC 2435 for the characteristics of the JPEG images which are used.
The RTP packetization for our very basic sample is implemented in the CStreamer class. That class
handles packetization for both UDP and TCP transport. In order not to obscure the basic
working principles, we simulate two very simple single color JPEG images. These images are so small
that each fits into just one RTP packet, which in Ethernet environments must normally fit into about
1500 bytes. In realistic applications the images are much bigger than what one RTP packet can transport.
In that case the image must be split into multiple fragments. Each fragment is wrapped into an RTP packet
which consists of RTP header information and the actual JPEG fragment. RFC 2435 describes the details
of this fragmentation.
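For images bigger than one packet, the fragmentation can be sketched as computing offset and length
pairs over the scan data. The helper name FragmentScanData is hypothetical and the maximum payload
size is an assumed parameter, which must leave room for the IP, UDP, RTP and JPEG headers. The last
fragment is flagged because RFC 2435 uses the RTP marker bit to signal the end of a frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One RTP payload fragment of a JPEG frame's scan data.
struct JpegFragment {
    std::size_t offset;   // goes into the 24-bit "fragment offset" of the RFC 2435 header
    std::size_t length;   // number of scan data bytes in this packet
    bool last;            // last fragment of the frame: set the RTP marker bit
};

// Splits scanSize bytes of scan data into fragments of at most maxPayload bytes.
// maxPayload is an assumed budget, e.g. about 1400 bytes on Ethernet.
std::vector<JpegFragment> FragmentScanData(std::size_t scanSize, std::size_t maxPayload) {
    std::vector<JpegFragment> fragments;
    if (maxPayload == 0) return fragments;   // avoid an endless loop
    for (std::size_t offset = 0; offset < scanSize; offset += maxPayload) {
        std::size_t length = std::min(maxPayload, scanSize - offset);
        fragments.push_back({offset, length, offset + length == scanSize});
    }
    return fragments;
}
```

For the tiny simulated images of the tutorial this yields exactly one fragment with offset 0, which
is why the sample can skip fragmentation entirely.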
The two images which we simulate for each stream are implemented as fixed arrays in the JPEGSamples.cpp unit.
That unit contains four JPEG images, two for each of our streams. The images just contain the so-called scan
data, that is, the compressed image data without any JPEG headers.
Our very simple packetization mechanism is implemented in the procedure SendRtpPacket. That procedure
sets the RTP specific header data of an RTP packet and sends it to the client depending on the
transport mechanism which was negotiated with the client during the RTSP SETUP request/response.
The RTP header of an RTP packet has a length of 12 bytes (without optional CSRC entries or header
extensions). Its detailed structure is defined in RFC 3550.
The RTP header contains data for:
- the RTP protocol version
- the payload type (26 for JPEG) and a marker bit
- a packet (sequence) counter which may be used by the client to detect packet losses
- a timestamp which controls the timing of the replay in the client
- a unique identifier for the RTP stream (SSRC)
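The fixed RTP header fields listed above can be serialized as in the following sketch, which writes
the 12 bytes in network byte order as laid out in RFC 3550, section 5.1. The function name and
parameters are my own; it assumes no padding, no header extension and no contributing sources
(CSRC count 0), which is also the situation of the simple streamer.

```cpp
#include <cstdint>
#include <vector>

// Writes the fixed 12-byte RTP header in network byte order (big-endian).
// Assumes P=0, X=0 and CC=0 (no padding, no extension, no CSRC list).
std::vector<std::uint8_t> BuildRtpHeader(std::uint8_t payloadType, bool marker,
                                         std::uint16_t sequence, std::uint32_t timestamp,
                                         std::uint32_t ssrc) {
    std::vector<std::uint8_t> h(12);
    h[0] = 0x80;                                                    // V=2, P=0, X=0, CC=0
    h[1] = static_cast<std::uint8_t>((marker ? 0x80 : 0x00) | (payloadType & 0x7F));
    h[2] = static_cast<std::uint8_t>(sequence >> 8);               // sequence number
    h[3] = static_cast<std::uint8_t>(sequence);
    for (int i = 0; i < 4; ++i) {
        h[4 + i] = static_cast<std::uint8_t>(timestamp >> (24 - 8 * i));  // timestamp
        h[8 + i] = static_cast<std::uint8_t>(ssrc >> (24 - 8 * i));       // SSRC
    }
    return h;
}
```

For example, BuildRtpHeader(26, true, seq++, ts, ssrc) would produce the header of a single-packet
JPEG frame; the sequence number increases by one for every packet sent.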
The RTP header is followed by an 8-byte JPEG specific header which is described in RFC 2435. It contains
metadata which describes parameters of the image like:
- width and height
- color format
- fragment offset (the position of the scan data if the image is bigger than the capacity of one RTP packet)
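A corresponding sketch for the 8-byte JPEG header of RFC 2435, section 3.1, is shown below.
AppendJpegHeader is a hypothetical helper; it assumes a frame without restart markers and a Q value
in the range 1 to 99, which does not require an additional quantization table header, and it encodes
width and height in units of 8 pixels as the RFC demands.

```cpp
#include <cstdint>
#include <vector>

// Appends the 8-byte main JPEG header of RFC 2435 to an RTP payload buffer.
// width and height must be multiples of 8 and at most 2040 pixels.
void AppendJpegHeader(std::vector<std::uint8_t>& out, std::uint32_t fragmentOffset,
                      std::uint8_t type, std::uint8_t q,
                      std::uint16_t width, std::uint16_t height) {
    out.push_back(0);                                                // type-specific, 0 here
    out.push_back(static_cast<std::uint8_t>(fragmentOffset >> 16));  // 24-bit fragment offset,
    out.push_back(static_cast<std::uint8_t>(fragmentOffset >> 8));   // big-endian
    out.push_back(static_cast<std::uint8_t>(fragmentOffset));
    out.push_back(type);                                             // 0: 4:2:2, 1: 4:2:0 sampling
    out.push_back(q);                                                // quality factor Q
    out.push_back(static_cast<std::uint8_t>(width / 8));             // width in 8-pixel blocks
    out.push_back(static_cast<std::uint8_t>(height / 8));            // height in 8-pixel blocks
}
```

For the 48x32 pixel images of mjpeg/1 the last two bytes would be 6 and 4; the type and Q values the
tutorial's sample images actually use have to be taken from its source code.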
More details about the structure of the simulated JPEG RTP packets can be found in the source code of the example.
The RTP packets which are created by the sample code can be transmitted directly by using UDP
as transport mechanism.
Sending RTP over TCP requires an additional modification of the RTP packet. Because it is sent
on the same logical TCP connection as the RTSP communication, the client must be able to differentiate
between RTSP and RTP packets. That's why the SendRtpPacket procedure inserts an additional RTP over
RTSP header in front of the packet. The length of that "demultiplexing" header is 4 bytes. The magic
byte $ (0x24) indicates to the client that the packet does not contain RTSP data. It is followed by
a one-byte channel number, which identifies whether we transmit an RTP or an RTCP packet, and by the
two-byte length of the embedded packet.
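The 4-byte demultiplexing header can be sketched like this. The function name FrameInterleaved is
hypothetical; the layout (the $ byte, a one-byte channel and a two-byte length in network byte order)
follows the interleaved framing of RFC 2326, section 10.12. The channel values are whatever the
client requested in the interleaved parameter of the SETUP Transport header, typically 0 for RTP
and 1 for RTCP.

```cpp
#include <cstdint>
#include <vector>

// Prefixes an RTP (or RTCP) packet with the 4-byte interleaved header so that it can
// share the RTSP TCP connection. Packets longer than 65535 bytes cannot be framed.
std::vector<std::uint8_t> FrameInterleaved(std::uint8_t channel,
                                           const std::vector<std::uint8_t>& rtpPacket) {
    std::vector<std::uint8_t> framed;
    framed.reserve(rtpPacket.size() + 4);
    framed.push_back('$');                                               // 0x24 magic byte
    framed.push_back(channel);                                           // channel from SETUP
    framed.push_back(static_cast<std::uint8_t>(rtpPacket.size() >> 8));  // 16-bit length in
    framed.push_back(static_cast<std::uint8_t>(rtpPacket.size()));       // network byte order
    framed.insert(framed.end(), rtpPacket.begin(), rtpPacket.end());
    return framed;
}
```

The client reads the 4 bytes, then exactly "length" bytes of RTP data, and afterwards expects either
the next $ frame or an RTSP message on the same connection.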
The implementation of the streamer hello world project was done in C++ and Visual Studio 2010.
The C++ sources, the corresponding header files and the complete Visual Studio 2010 project may be
downloaded as a ZIP file here:
The following link provides the binary of the server:
This introductory tutorial was intended to demonstrate some basic concepts of how to implement an RTSP/RTP
based streaming media server. A "full" featured implementation is a really challenging project. One
quickly gets lost in a jungle of technologies and terminology which hide the basic working principles.
That's why I stripped away everything that could prevent one from seeing the essentials.
Of course it is not always necessary to have a "fully" featured streamer with all imaginable payload
types and transport mechanisms. If just one or a small number of payload types is required, or if
multicast is not necessary because of a limited number of clients and sufficient network
bandwidth, then the challenge for an implementation goes down drastically. Furthermore, it makes a big
difference for an implementation whether you must consider live streaming sources or file based video
on demand streaming. As always, the implementation costs depend on the application scenario. For a simple
point-to-point RTP media transport the sample in the tutorial could already be a good starting point.