Carnegie Mellon University
Information Networking Institute
THESIS
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
Master of Science in Information Networking
"Telephony on a PDA: the INI SipPhone"
PRESENTED BY
Athanasios P Kosmidis
Accepted by the Information Networking Institute
Thesis Advisor: ____________________________ Date: __________
(Prof. Marvin Sirbu)
Reader: _______________________________________ Date: __________
(Prof. Ragunathan Rajkumar)
MSIN Academic Advisor: _________________________ Date: __________
(Prof. Richard Stern)

Carnegie Mellon University
Information Networking Institute
TELEPHONY ON A PDA:
THE INI SIPPHONE
A Thesis Submitted to the
Information Networking Institute
In Partial Fulfillment of the Requirements
for the degree of
MASTER OF SCIENCE
in
INFORMATION NETWORKING
By
Athanasios P Kosmidis
Pittsburgh, Pennsylvania
May 2002

Copyright © 2002, Athanasios P. Kosmidis, All rights reserved

Carnegie Mellon University
Information Networking Institute
Athanasios P Kosmidis
May 2002
"Telephony on a PDA: the INI SipPhone"
MSIN thesis Report
Acknowledgements
This thesis was made possible due to the efforts of many individuals beyond its
author.
First, I would like to acknowledge not only the help, support, feedback and
guidance provided by my advisor, Professor Marvin Sirbu, but also his invaluable
contribution in setting up most of the environment necessary for using the
resulting system.
My reader, Professor Ragunathan Rajkumar, set aside much of his limited time in
order to discuss the technology used in this thesis, and provide feedback on the
work done and the accompanying documents.
Furthermore, I would like to thank Sue Jones, Lisa Currin and Tracey Bragg for
their support within the Information Networking Institute; Joe Kern, Jasen Lentz
and Laura Bowser for their invaluable help in setting up the required systems;
the developers of the Wavelink system for their assistance; dynamicsoft Inc., for
donating to the INI the SIP Proxy and Registrar Servers needed to support this
project; and INI students for making life in the computer clusters more bearable.
Finally, I would like to dedicate this thesis to my family, my friends, and AEK
and Original 21, who have supported me in their own ways and deserve much
more.

Table of Contents
List of Tables
List of Figures
Abstract
1. Introduction
2. Requirements Analysis
2.1 Functional Requirements
2.2 Other Requirements
2.3 Selection of technology
3. Underlying Technology Overview
3.1 Session Initiation Protocol (SIP) and Session Description Protocol (SDP)
3.2 Real-time Transport Protocol (RTP)
3.3 Digest Authentication
4. System Architecture
4.1 Higher-level Design
4.2 User Agent Layer
4.3 Authentication Sub-Layer
4.4 Parsing Layer
4.5 Transport Layer
4.6 Application Manager Layer
4.7 Graphical User Interface Layer
4.8 An example of component interaction
5. Design and Implementation
5.1 Session Initiation Protocol (SIP) stack
5.2 Authentication sub-layer
5.3 Graphical User Interface
5.4 Media Transmission (RTP)
6. Porting to the PDA
6.1 Cross-Compilation
6.2 Issues with porting to the Zaurus
7. Project Postmortem
8. Future work
9. Conclusions
References
Bibliography
Appendix I - User's Guide
Appendix II - Main Classes
Appendix III - Maintenance and how-to for future work
Appendix IV - Zaurus specifications
Appendix V - Related Work
Appendix VI - General Public License

List of Tables
Table 1 - Linux-based Personal Digital Assistants
Table 2 - Comparison between SIP and H.323
Table 3 - SIP functionality in the Wavelink system

List of Figures
Figure 1 - Direct SIP call
Figure 2 - Call through SIP proxy
Figure 3 - System Architecture diagram
Figure 4 - Component interaction example
Figure 5 - Association between main User Agent data structures
Figure 6 - REGISTER flowchart
Figure 7 - Incoming INVITE flowchart
Figure 8 - Audio transmission from the Zaurus

Abstract
The world is becoming increasingly IP-centric, with a large number of devices
getting networked every day. At the same time, individuals are starting to favor
smaller and lighter devices over their desktops and laptops. As their modalities
and patterns of use get shaped, there is a trend of adding PDA-type tools (like to-
do lists) to cellular phones, thus striving towards a single device one can carry
around and still be both productive and reachable.
This thesis follows a different path, since the increased connectivity of PDAs
creates a new challenge: turning one into a phone. More specifically, the system
built uses the Session Initiation Protocol for establishing the sessions (as Third
Generation cellular phones will) and the Real-time Transport Protocol for
transporting voice packets over an 802.11b network like the one on CMU's
campus. Furthermore, through the authenticated use of a SIP-to-PSTN gateway,
it is also able to make and accept phone calls to and from the telephone network.
The system, released under a General Public License, was built for Sharp's
Zaurus PDA, but can be run on a Linux desktop or laptop as well.

1. Introduction
The convergence of telecommunications and computing has led to a flourishing
period of new products, services and ideas that have broken the barriers of
the functionality that a computer or a telephone provides alone.
At the same time, the continuous shift to smaller devices and the increasing
choices for enhancing their connectivity create the possibility of providing
communication services on such devices.
Using technologies like the Session Initiation Protocol (SIP) and the Real-time
Transport Protocol (RTP), the work done and presented in this document aims to
create software that allows communication in the form of Voice over IP using a
Personal Digital Assistant (PDA).

2. Requirements Analysis
The purpose of this chapter is to go through the initial requirements for the
product. Given those, the selected tools, protocols and devices will be briefly
described.
2.1 Functional Requirements
Voice communication has been an integral part of everyday life for so long that a
certain pattern of use has emerged. On the other hand, using computers
introduces different modalities.
The main requirement is for the application to provide the functionality of a
"normal phone". This entails primarily the setup and termination of each call, both
when the user is called and when she is the caller. Additionally, it should support
full-duplex voice communication between the two parties, as in a normal
telephone conversation.
The call setup and termination (a part of call signaling) should incorporate
authentication, in order to prevent use of the application by anyone other
than the owner of the device it runs on. Furthermore, the phone application
should be able to access the device's address book, and vice versa, giving
the user the feeling of a single source of contacts.
The application should be implemented in a way that the user of a PDA with a
wireless connection can take advantage of these features without being
unreasonably restricted by the limits of the device or connectivity. This includes
not only lightweight protocols and efficient algorithms, but also calls for an

intuitive and non-obstructing user interface, providing access to required
functionality depending on the state of the application (talking, dialing etc).
Finally, although the application will be designed to run on a handheld computer,
there should be nothing preventing its use with equal ease on a desktop or laptop
machine.
2.2 Other Requirements
While the functional requirements constitute a very important part of the user's
interaction with the target application, some others go beyond the functionality
apparent to the end user.
Any telecommunications-related application calls for robust software.
Furthermore, the software should behave in a "forgiving" way: it should expect
that the individuals using it will make small mistakes, and that the machines
interacting with it will not necessarily follow the protocols exactly
(unfortunately, they quite often do not). In such cases, it should point out
the user's mistake or continue the interaction, as long as the valid input is
adequate to do so.
The user should not be exposed to the technical details of the underlying
protocols and exchange of messages between the computers involved in a call.
By the same token, these interactions should be such that the resources are not
consumed unreasonably because they may be limited (e.g. when calling from a
PDA over a wireless link); in particular, the network traffic should be minimal.
Finally, the instructions, options and limitations, as well as the code, should
be well documented, serving both a seamless user experience and the ease of
future modifications or enhancements.

2.3 Selection of technology
The available technology for this application spans several dimensions:
- Hardware Platform
- Operating System
- Programming Language / Code base
- Call Signaling Protocol
- Media Transmission Protocol
- Authentication Protocol
- Wireless Networking capability
The above can be quite interrelated, but because the focus is on a handheld
device and the technical requirements may necessitate fairly low-level access to
the machine and operating system, the most appropriate platform and Operating
System combination is a Linux handheld device. The following table summarizes
the main alternatives in that area, as of the fall/winter of 2001 (all are based on
Intel's StrongARM 206 MHz processor).
Device | Memory | Audio | Compact Flash | PCMCIA | Input methods
-------+--------+-------+---------------+--------+--------------
Compaq iPAQ | 32MB RAM, 16MB flash | Jack for output, integrated microphone | No | Yes (through special sleeve) | Handwriting recognition
Samsung YOPY | 64MB RAM (developer's version: 32), 16MB flash | Jack for output, integrated microphone | One type-II in the developer's version only | No | Keyboard (production version only)
Sharp Zaurus | 64MB RAM (developer version: 32), 16MB flash | Input and output through single jack | One type-II slot | No | Retractable keyboard
Table 1 - Linux-based Personal Digital Assistants [10]
The handheld of choice for this application was Sharp's Zaurus, a recently
launched product that features both a Compact Flash slot and a Secure
Digital/Multimedia Card slot, while providing a combined headset and
microphone jack and full-duplex sound processing capabilities. It is based on a
206-MHz ARM processor, and the developer's version has 32MB of RAM (the
consumer version, released in April 2002, has 64MB).
The Zaurus met our requirements for this small platform: network connectivity
(through an 802.11b Compact Flash card; the lack of such connectivity
disqualified the YOPY), both audio input and output, and the ability to use a
headset (rather than having to bring the device close to one's mouth in order
to talk, as with the iPAQ).
Regarding the software component of this project, the main signaling protocols
available are H.323 (an ITU standard) and the Session Initiation Protocol (SIP,
an IETF standard). While the former is more widely used, the latter is rapidly
gaining popularity, along with its sister standard, Session Description Protocol
(SDP) used for transmitting and negotiating information regarding the sessions. A
comparison between the two can be summarized in the following table:
Criterion | SIP | H.323
----------+-----+------
Architecture | Horizontal protocol | Vertical protocol suite
Complexity | Low | High
Encoding | Text | Binary
Scalability | Good | Poor
Internet "fit" | Good | Poor
Use | Limited but growing | Widespread
Table 2 - Comparison between SIP and H.323 [21]
As a consequence, the protocol of choice for signaling is SIP, in combination with
SDP. When it comes to the actual voice transmission, there is a single dominant
protocol: the Real-time Transport Protocol (RTP). RTP is responsible for
transmitting real-time data and supports timing reconstruction and loss detection;
it typically runs over UDP rather than TCP, because real-time media cannot
afford the delay of TCP retransmissions.
A previous Information Networking Institute thesis ("Wavelink", by N. Gupta, V.
Keswani, H. Mak, R. Narjala and A. Pavuluri [5]) had resulted in an application
with a preliminary SIP stack, interfacing with an RTP stack for the media
transmission portion. This infrastructure, implemented in C++ and for a Linux
architecture, made up an ideal code base for the purposes of this thesis. In the
following table one can see the features implemented by the Wavelink stack, as
well as the ones partially implemented or missing.
Feature | Wavelink implementation
--------+------------------------
Parser | Partial (not interoperable on any test made)
Session Initiation | Complete
Session Termination | Partial (not fully interoperable)
Audio application | Complete
Call acceptance | Complete
Call rejection | Missing
Authentication | Missing
Ability to choose between direct calls and calls through proxy | Missing
Response to all requests | Missing
Call progress feedback to user | Missing
Exponential backoff periods between retransmissions | Complete
User preferences | Missing
Contact management | Missing
Registration | Partial (non interoperable with current SIP Registrar Server)
Periodic Registrations | Missing
Table 3 - SIP functionality in the Wavelink system
Finally, most projects of this kind, while acknowledging the importance of
security, only include it in the "future work" section. One of the goals of this
thesis, however, was to produce an application that can actually be used by
members of the Carnegie Mellon University community. This poses challenges
such as monitoring the Proxy and Registrar Servers and, most importantly,
controlling access to the SIP gateway to the Public Switched Telephone Network
that the Information Networking Institute currently operates (footnote 1).
Without proper access control, anyone would be able to make costly toll calls
to landlines. Furthermore, with billing functionality in place, one can exercise
complete control over which calls users can make and how they pay, on an
individual basis.
Footnote 1: Cisco Systems 2600 Multiservice Router/Gateway
Choosing which authentication protocol to use was a much easier decision
than the others. SIP supports Basic and Digest Authentication; the former sends
the credentials in plain text across the network, while the latter has a
challenge-response mechanism that does not reveal a user's password. As a
result, the authentication protocol of choice for this project was Digest
Authentication.
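The challenge-response idea behind Digest Authentication can be illustrated with a short Python sketch. It follows the RFC 2617 computation for a challenge that carries no qop directive; the function names, the credentials and the nonce are hypothetical values chosen for illustration, and this is not the code of the INI SipPhone itself.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest response for a challenge without a qop directive."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # derived from the secret
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # value sent to the server


# Hypothetical values, for illustration only.
answer = digest_response(
    "thanos", "ini.cmu.edu", "secret", "INVITE",
    "sip:94123617323@franc.ini.cmu.edu", "abc123",
)
print(answer)
```

Only the final hash travels over the network; because the server picks a fresh nonce for each challenge, a captured response cannot simply be replayed against a later challenge.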

Having set up the fundamentals on which this project will be based, the next
chapter will discuss the main technologies used, thus providing the reader with
the knowledge necessary for following the remainder of this document.

3. Underlying Technology Overview
The main technologies used in this application are:
- Session Initiation Protocol (SIP, RFC 2543 [8])
- Session Description Protocol (SDP, RFC 2327 [7])
- Real-time Transport Protocol (RTP, RFC 1889 [20])
- Digest Authentication (RFC 2617 [2])
3.1 Session Initiation Protocol (SIP) and Session Description
Protocol (SDP)
The Session Initiation Protocol is a standard for initiating, modifying and
terminating communication sessions; it lies within the application layer of the OSI
reference model, and is independent of the underlying layers. It is based on
HTTP/1.1 (RFC 2616 [3]), and features few message interactions per session, as
well as simple analysis and debugging due to its text-based encoding.
SIP has been gaining in popularity compared to its competitor, H.323, since its
second version was standardized by the Internet Engineering Task Force, in
1999. In addition to this, it has been adopted as the signaling protocol for Third
Generation Wireless Systems (3G) [1] and for Windows XP [12], thus promising
even more widespread use in the near future.
Each SIP user has a unique address that resembles an email address, with the
prefix "sip:". For instance, my SIP address is <sip:thanos@ini.cmu.edu>. Each
such address constitutes a SIP Uniform Resource Identifier (URI).
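As a rough illustration of the address format, the following Python sketch splits a simple SIP URI into its parts. The function name parse_sip_uri is hypothetical, and the sketch handles only the forms that appear in this document (optional angle brackets, user, host, optional port and ";name=value" parameters), not the full URI grammar.

```python
def parse_sip_uri(uri: str) -> dict:
    """Split a simple SIP URI into user, host, port and parameters."""
    uri = uri.strip().lstrip("<").rstrip(">")
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "sip":
        raise ValueError("not a SIP URI: " + uri)
    # Parameters such as ";user=phone" follow the address part.
    address, *raw_params = rest.split(";")
    user, _, hostport = address.rpartition("@")
    host, _, port = hostport.partition(":")
    params = dict(p.partition("=")[::2] for p in raw_params if p)
    return {
        "user": user or None,
        "host": host,
        "port": int(port) if port else 5060,  # 5060 is the well-known SIP port
        "params": params,
    }


print(parse_sip_uri("<sip:thanos@ini.cmu.edu>"))
```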

The Session Description Protocol is a separate IETF standard used by SIP to
describe the session. It is, like SIP, text-based, consists of a series of
<attribute>=<value> lines, and constitutes the body of a SIP message. For
instance, a user may describe a session by its owner, a subject, and the media
details, with the following lines appended after the SIP headers:
o=thanos 0 0 IN IP4 128.2.237.89
s=Re: Party!!!
m=audio 4987 RTP/AVP 0
The popularity that SIP enjoys has resulted in an abundance of documentation
about it, and for the purposes of this document, the focus will be on the
functionality that is directly related to the work done. As a result, the remainder of
this section will associate the actions a user will take on an actual phone call with
SIP messages.
The entities involved in the exchange of these messages are:
- SIP client: User Agent Client
- SIP server: User Agent Server
- Registrar Server (or Location Server)
- Proxy Server
The User Agent is the basic software component of a SIP stack. It is responsible
for initiating and receiving messages, holding the data structures making up the
client's state, as well as interfacing with the applications the stack supports.
The Registrar Server is similar to a lookup service; it associates each SIP
address with one or more others. For instance, my SIP address,
<sip:thanos@ini.cmu.edu>, is an alias for my real location, which may be
<sip:root@fluorine.ini.cmu.edu>. The association between these SIP URIs is
held within the Registrar Server's database.
The Proxy Server acts in a way similar to an HTTP proxy; it forwards the
requests it receives to the appropriate party, which it determines with the help of
the Registrar Server. For example, when someone wants to call me, they will
send a message to the Proxy Server at ini.cmu.edu, requesting that
<sip:thanos@ini.cmu.edu> gets called. The address of the Proxy Server is
typically a known entity, although the related literature mentions how a client may
go about locating the appropriate address via DNS [4, 8].
Registering a user
When users want to add contact information to be associated with their unique
SIP addresses, they have to send a registration request to a Registrar Server.
This request will contain their unique address and the actual contact address,
and, minimally, a transaction sequence number, a Call ID which is made globally
unique by including the initiator's host address, and the "last hop" (Via field).
For instance, when I want to let the Registrar Server know that I will be available
at <sip:root@fluorine.ini.cmu.edu> for the next hour (3600 seconds), I send a
request of type REGISTER in a message like the following:
REGISTER sip:franc.ini.cmu.edu SIP/2.0
CSeq: 1 REGISTER
Call-Id: 2_971750444@fluorine.ini.cmu.edu
Contact: sip:root@fluorine.ini.cmu.edu:5060
Expires: 3600
From: sip:thanos@ini.cmu.edu
To: sip:thanos@ini.cmu.edu
User-Agent: INI SipPhone
Accept-Language: en
Via: SIP/2.0/UDP 128.2.237.122:5060

Although this message includes an Expires field, it is not required because
Registrar Servers typically have default expiration periods; including the field in
the message ensures that the appropriate registration takes place.
When I want to notify the server that I am no longer available at
<sip:root@fluorine.ini.cmu.edu>, I will send the same message as above, but
with an expiration value of 0 (zero). The server will then delete that registration.
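The registration behavior described above (a binding that expires after a given period, a server-side default when no period is supplied, and removal when the period is zero) can be sketched as a small in-memory table. This is only an illustrative Python model, not the dynamicsoft server's implementation; the class name RegistrarBindings and the default of 3600 seconds are assumptions.

```python
import time


class RegistrarBindings:
    """Toy in-memory table mapping a SIP address to its contact addresses."""

    def __init__(self, default_expires=3600):
        self.default_expires = default_expires  # assumed default period
        self._bindings = {}  # address -> {contact: absolute expiry time}

    def register(self, address, contact, expires=None, now=None):
        now = time.time() if now is None else now
        if expires is None:
            expires = self.default_expires  # no Expires field was supplied
        contacts = self._bindings.setdefault(address, {})
        if expires == 0:
            contacts.pop(contact, None)     # Expires: 0 removes the binding
        else:
            contacts[contact] = now + expires

    def lookup(self, address, now=None):
        """Return the contacts whose registration has not yet expired."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(address, {})
        return sorted(c for c, until in contacts.items() if until > now)


table = RegistrarBindings()
table.register("sip:thanos@ini.cmu.edu",
               "sip:root@fluorine.ini.cmu.edu:5060", expires=3600)
print(table.lookup("sip:thanos@ini.cmu.edu"))
```

A proxy would consult such a table to decide where to forward a request, and would report that the user was not found when the lookup returns nothing.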
Making a call
When a user makes a call, the User Agent Client of the SIP layer sends a
Request message of type INVITE. This message has to include at least the SIP
addresses of the caller and the callee, a sequence number, a Call ID and the Via
field.
An example of such a message is:
INVITE sip:94123617323@franc.ini.cmu.edu;user=phone SIP/2.0
CSeq: 2 INVITE
Call-Id: 3_1229208662@fluorine.ini.cmu.edu
Contact: sip:root@fluorine.ini.cmu.edu:5060
Content-Length: 158
Content-Type: application/sdp
From: sip:thanos@ini.cmu.edu
Timestamp: 1020973456
To: sip:94123617323@franc.ini.cmu.edu
User-Agent: INI SipPhone
Accept-Language: en
Via: SIP/2.0/UDP 128.2.237.122:5060

v=0
o=thanos 0 0 IN IP4 128.2.237.122
m=audio 4000 RTP/AVP 0
a=rtpmap:0 pcmu/8000/1

SDP lines like the ones above generally do not change much across
sessions. The first line (v) shows the protocol version number, and the second
(o) describes the "owner" of the stream and includes IP identifiers. The media
line (m) describes the session's media attributes; it contains the medium type
(audio), the port on which the sender listens for media packets (4000), the
transport protocol and profile (RTP/AVP) and the payload format number (0,
i.e. G.711 PCMU) [6]. If the sender's client supported multiple formats, their
corresponding numbers would follow. After it, the attribute line (a) provides
optional details for each medium being used. In this case, it simply expands on
the attributes of the RTP stream (payload format, encoding/sampling
frequency/number of channels). When multiple media are described, each has its
own attribute line, and the receiver can distinguish between them and associate
them with the supported formats based on the rtpmap value.
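To make the line structure concrete, here is a minimal Python sketch that reads an SDP body like the one in the INVITE above. The function name parse_sdp and the layout of the returned dictionary are assumptions made for illustration; the sketch covers only the v, o, s, c, t, m and a=rtpmap lines shown in this document, not the whole of RFC 2327.

```python
def parse_sdp(body: str) -> dict:
    """Parse the SDP lines used in this section into a small dictionary."""
    session = {"media": []}
    for line in body.splitlines():
        line = line.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # tolerate blank or malformed lines
        kind, value = line[0], line[2:]
        if kind == "m":
            # m=<medium> <port> <transport> <payload format numbers...>
            medium, port, transport, *formats = value.split()
            session["media"].append({
                "medium": medium,
                "port": int(port),
                "transport": transport,
                "payload_types": [int(f) for f in formats],
                "rtpmap": {},
            })
        elif kind == "a" and value.startswith("rtpmap:") and session["media"]:
            # a=rtpmap:<payload format number> <encoding>/<rate>/<channels>
            number, encoding = value[len("rtpmap:"):].split(None, 1)
            session["media"][-1]["rtpmap"][int(number)] = encoding
        else:
            session[kind] = value
    return session


body = ("v=0\n"
        "o=thanos 0 0 IN IP4 128.2.237.122\n"
        "m=audio 4000 RTP/AVP 0\n"
        "a=rtpmap:0 pcmu/8000/1\n")
print(parse_sdp(body)["media"])
```

Attaching each rtpmap entry to the most recent media line mirrors the way the attribute line follows the medium it describes.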
This message can either be transmitted directly to the other party or through a
proxy. Sending the message to the Proxy can only be successful if the callee is
registered on the Registrar Server, otherwise the Proxy will notify the caller that
the requested user is not found. On the other hand, the direct call will only be
successful if there is a SIP client on the other end and the requested user is
active there.
Figure 1 - Direct SIP call
(Diagram labels: Call (Session Initiation); Accept Call; Session establishment)
Figure 2 - Call through SIP proxy
(Diagram labels: Call; Request recipient's address; Response; Accepted;
Session establishment. Entities shown: Registrar Server, Proxy Server)
User receiving a call
When a user receives a call like the one above, its User Agent Server may
respond with a message that notifies the other end that it is trying to locate the
user (thus acting as an implicit acknowledgement). This message has a
response code of 100, and belongs to the group of "Informational" responses.
SIP/2.0 100 Trying
Via: SIP/2.0/UDP 128.2.237.122:5060
From: sip:user@ini.cmu.edu
To: sip:thanos@ini.cmu.edu
Call-ID: 3_1229208662@dsr.ini.cmu.edu
CSeq: 9345744 INVITE
Content-Length: 0
After checking whether the requested user is available, the User Agent Server
will generally respond with a message of one of these types: Ringing (code:
180), User Not Found (code: 404), Moved Permanently (code: 301), or Moved
Temporarily (code: 302). The remainder of the message will be the same.
In the case of a "180 Ringing" response, the User Agent Server waits until the
callee "picks up the receiver"; at that point, it sends a "200 OK" to the caller,
notifying it that the call has been established. The caller will respond with an
acknowledgement (ACK) message.
SIP/2.0 200 OK
Via: SIP/2.0/UDP 128.2.237.122:5060
From: sip:user@ini.cmu.edu
To: sip:thanos@ini.cmu.edu;tag=9EF0D911-1AE
Call-ID: 3_1229208662@dsr.ini.cmu.edu
Contact:<sip:thanos@128.2.237.122:5060;user=phone>
CSeq: 9345744 INVITE
Content-Type:application/sdp
Content-Length: 134

v=0
o=user 8045 5614 IN IP4 128.2.237.89
c=IN IP4 128.2.237.89
t=0 0
m=audio 19054 RTP/AVP 0
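Because SIP responses are plain text, a receiving User Agent can read the status line and headers with very little code. The following Python sketch shows one way to do so and to group response codes by their first digit (100 falls in the "Informational" group, as noted above); the function names are hypothetical, and the parsing is deliberately simplified (no header folding, no multiple headers with the same name).

```python
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def parse_response(message: str):
    """Return (status code, reason phrase, headers) for a SIP response."""
    head, _, _body = message.replace("\r\n", "\n").partition("\n\n")
    status_line, *header_lines = head.split("\n")
    version, code, reason = status_line.split(" ", 2)
    if not version.startswith("SIP/"):
        raise ValueError("not a SIP response: " + status_line)
    headers = {}
    for line in header_lines:
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def response_class(code: int) -> str:
    """Map a status code to the name of its response class."""
    return RESPONSE_CLASSES.get(code // 100, "Unknown")


code, reason, headers = parse_response(
    "SIP/2.0 180 Ringing\n"
    "Call-ID: 3_1229208662@dsr.ini.cmu.edu\n"
    "CSeq: 9345744 INVITE\n"
    "Content-Length: 0\n\n"
)
print(code, reason, response_class(code), headers["cseq"])
```

Header names are stored in lower case because the messages in this document spell the same header both as Call-Id and Call-ID.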
Proxy receiving a call
When the INVITE request goes through a proxy, the proxy will send a "100
Trying" message to the caller, provided that it has an active registration for the
user (otherwise, it will terminate the transaction with a "404 User Not Found"
response). It will then forward the request to the appropriate User Agent Server,
which will behave as described above, sending its subsequent responses
through the proxy server. This facilitates the recording of sessions in the server,
which can help build billing services.

Note that the exchange of messages described above does not take
authentication into account; the description of such messages is discussed under
the security considerations later in this chapter.
This thesis uses the SIP Proxy and Registrar Servers supplied by dynamicsoft©
(footnote 2); they support basic and digest authentication, call detail records,
as well as service bundles and the Call Processing Language (CPL).
Footnote 2: SIP Proxy Server Version 5.2.1.7 and SIP Location Server Version 4.0
3.2 Real-time Transport Protocol (RTP)
The Real-time Transport Protocol, as its name implies, is a protocol for
transport-layer transmissions of real-time data (such as audio and video). It
does not make any guarantees regarding the quality of the transmission, but
typically takes advantage of the low overhead involved in UDP (as opposed to
TCP), since real-time media cannot afford the delay of TCP retransmission.
The necessary control for establishing, maintaining and terminating a real-time
data transmission session is provided by a sister protocol, the RTP Control
Protocol (RTCP).
An RTP packet includes:
- A sequence number (2 bytes); this enables the receiver to determine
  whether a packet is arriving in order or, if it is old, to discard it. The initial
  sequence number is randomly assigned.
- A timestamp (4 bytes); being a part of a real-time session, each packet
  carries a linearly, monotonically increasing notion of the time at which it
  was produced. Since a receiver will typically buffer data ahead (using a
  dejitter buffer), it can use this field to position the data in time.
- A synchronization source identifier and contributing source identifiers,
  used for session multiplexing purposes (4 bytes each).
- The payload (data; variable size).
When packets are missing or arrive out of order, the RTP sequence numbers and
timestamps let the receiver detect the voids and fill them with "comfort noise"
[14] (for audio streams) or extrapolation.
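As a rough illustration of the fields listed above, the following Python sketch packs and unpacks the fixed 12-byte RTP header. The bit layout follows the RTP specification rather than the thesis code, and the function names pack_rtp and unpack_rtp are hypothetical. It assumes no padding, no header extension and no marker bit.

```python
import struct

RTP_VERSION = 2  # version number defined by the RTP specification


def pack_rtp(payload_type, seq, timestamp, ssrc, payload):
    """Build an RTP packet with no padding, extension, marker or CSRC entries."""
    byte0 = RTP_VERSION << 6          # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F       # M=0 plus the 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet):
    """Split a packet into its header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = byte0 & 0x0F         # each contributing source adds 4 bytes
    offset = 12 + 4 * csrc_count
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }
```

The 2-byte sequence number wraps around at 65536, which is why the sketch masks it before packing; a receiver comparing sequence numbers has to take that wrap-around into account.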
The audio payload consists of the data that the audio device driver
supplies to the RTP layer; most audio drivers can be configured to supply
different types of data (e.g. 16-bit signed little-endian) and with different sampling
parameters (including sampling frequency, number of channels and number of
bits per channel). The two ends of the transmission must agree on the encoding
in each direction in order to engage in an intelligible conversation.
The telecommunications industry has specified several standard codecs through
the CCITT/ITU-T, including recommendation G.711 [15]. It defines PCMU (PCM-
µ-law companded) and PCMA (PCM-A-law companded), used in North American
and European telephone exchanges respectively.
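To make the idea of companding more concrete, here is a minimal Python sketch of µ-law encoding for one 16-bit signed linear sample. It follows the commonly used table-free formulation of G.711 µ-law; the constants BIAS and CLIP and the name linear_to_ulaw come from that formulation and are assumptions, not taken from the thesis implementation.

```python
BIAS = 0x84    # 132, added before the segment search
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # The assembled byte is transmitted with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Each sample is reduced from 16 bits to 8, so an 8000 Hz mono stream needs 8000 bytes of payload per second before RTP and UDP overhead.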
In this project, an RTP application is launched upon successful establishment of
a session. The codec used is PCMU; its payload type (the number 0) is included
in the "media" line of the SDP portion of the message sent during an INVITE:
INVITE sip:94123617323@franc.ini.cmu.edu;user=phone SIP/2.0
CSeq: 2 INVITE
Call-Id: 3_1229208662@fluorine.ini.cmu.edu
Contact: sip:root@128.2.237.122:5060
Content-Length: 158
Content-Type: application/sdp
From: sip:thanos@ini.cmu.edu
Timestamp: 1020973456
To: sip:94123617323@franc.ini.cmu.edu
User-Agent: INI SipPhone
Accept-Language: en
Via: SIP/2.0/UDP 128.2.237.122:5060

v=0
o=thanos 0 0 IN IP4 128.2.237.122
s=3_1229208662@fluorine.ini.cmu.edu
c=IN IP4 128.2.237.122
t=0 0
m=audio 4000 RTP/AVP 0
a=rtpmap:0 pcmu/8000/1
The actual mechanics of preparing the audio for transmission are detailed in the
implementation section.
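The SDP portion of the INVITE above can be assembled and inspected with a few lines of code. The following Python sketch is only an illustration: build_sdp and audio_payload_types are hypothetical helpers, and the field values simply mirror the example message.

```python
def build_sdp(user, host, rtp_port, session_name,
              payload_type=0, encoding="pcmu/8000/1"):
    """Return an SDP body offering a single audio stream."""
    lines = [
        "v=0",
        f"o={user} 0 0 IN IP4 {host}",
        f"s={session_name}",
        f"c=IN IP4 {host}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} {encoding}",
    ]
    return "\r\n".join(lines) + "\r\n"


def audio_payload_types(sdp):
    """Return the payload types listed on the first audio media line."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=audio <port> <transport> <payload type> [<payload type> ...]
            return [int(pt) for pt in line.split()[3:]]
    return []
```

The Content-Length header of the enclosing SIP message would then be the length in bytes of the encoded body, for example len(body.encode("ascii")).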
3.3 Digest Authentication
Digest authentication is built in such a way that it can verify that two parties know
a shared secret (in this case, the password), without actually communicating that
secret either in plaintext or in an encrypted form on its own. It is based on a
challenge-response paradigm: the server sends a challenge and expects a
response that will only be valid if it uses the secret in its calculations. Digest
Authentication was originally designed for HTTP (RFC 2617 [2]); since SIP is
closely modeled on HTTP, it uses the same scheme.
The most widely used algorithm for calculating digests and, therefore, providing
challenges and responding to them, is the MD5 checksum [13], developed by R.
Rivest and RSA Data Security, Inc. [17].
When a User Agent sends a request to the Proxy or Registrar Server (like the
ones included earlier in this chapter, for instance), the Server will check whether
authentication is enabled for the particular user. If so, it will respond with "401
Unauthorized" (for Registration requests) or "407 Proxy Authentication Required",
together with a challenge. This challenge (also called a "nonce") is typically a
hash over a few fields that make it less susceptible to replay attacks. Most
frequently, these fields will include at least a timestamp and a secret key
residing on the server.
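One possible way for a server to build and later check such a nonce is sketched below in Python. The construction (a Base64-encoded timestamp plus an MD5 hash over the timestamp and a server secret) is an assumption made for illustration; it is not the scheme used by the dynamicsoft server, and make_nonce and nonce_is_fresh are hypothetical names.

```python
import base64
import hashlib
import time


def make_nonce(server_secret, timestamp=None):
    """Create a nonce that embeds its own creation time."""
    if timestamp is None:
        timestamp = int(time.time())
    stamp = str(timestamp)
    digest = hashlib.md5(f"{stamp}:{server_secret}".encode()).hexdigest()
    return base64.b64encode(f"{stamp}:{digest}".encode()).decode()


def nonce_is_fresh(nonce, server_secret, max_age, now=None):
    """Check that the nonce was issued by this server and has not expired."""
    if now is None:
        now = int(time.time())
    try:
        stamp, digest = base64.b64decode(nonce).decode().split(":", 1)
        issued = int(stamp)
    except ValueError:
        return False  # not Base64, not text, or not in the expected format
    expected = hashlib.md5(f"{stamp}:{server_secret}".encode()).hexdigest()
    return digest == expected and 0 <= now - issued <= max_age
```

Because the timestamp is part of the hashed data, a client cannot extend the lifetime of a nonce without knowing the server secret.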

As a result, an example response for the REGISTER request given earlier can
be:
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 128.2.237.122:5060
From: sip:thanos@ini.cmu.edu
To: sip:thanos@ini.cmu.edu;tag=0.9806376119931339037cb
Call-ID: 2_1592914782@fluorine.ini.cmu.edu
CSeq: 1 REGISTER
WWW-Authenticate:Digest realm="ini.cmu.edu",
domain="sip:franc.ini.cmu.edu",
nonce="4XeAA6D5dWCgCPy7MKR+qA==", algorithm="MD5"
Content-Length: 0
Upon receipt of this response, the User Agent will have to acknowledge it and
generate a new request, incrementing the sequence number and, of course,
responding to the challenge with a digest. The digest is calculated as follows:
A1 = concat(username,":",realm,":",password)
A2 = concat(Method,":",domain)
Digest = MD5(concat(MD5(A1),":",nonce,":",MD5(A2)))
where concat("one",":","two") == "one:two" and
Method == "INVITE" or "REGISTER"
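The calculation above translates almost directly into code. The Python sketch below assumes, as RFC 2617 does, that MD5() denotes the lowercase hexadecimal form of the hash; the function names are hypothetical and the password used in the usage note is made up.

```python
import hashlib


def md5_hex(text):
    """MD5 of a string, as 32 lowercase hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest for a challenge, following the formula above."""
    a1 = f"{username}:{realm}:{password}"   # concat(username,":",realm,":",password)
    a2 = f"{method}:{uri}"                  # concat(Method,":",domain)
    return md5_hex(f"{md5_hex(a1)}:{nonce}:{md5_hex(a2)}")
```

A call such as digest_response("test", "ini.cmu.edu", "secret", "REGISTER", "sip:franc.ini.cmu.edu", "4XeAA6D5dWCgCPy7MKR+qA==") yields the 32-character value that goes into the response parameter of the Authorization header; the password itself never appears in the message.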
The new request the User Agent will send is like the following:
REGISTER sip:franc.ini.cmu.edu SIP/2.0
CSeq: 2 REGISTER
Call-Id: 2_1592914782@fluorine.ini.cmu.edu
Expires: 3600
From: sip:thanos@ini.cmu.edu
To: sip:thanos@ini.cmu.edu
Contact: sip:root@128.2.237.122:5060
User-Agent: INI SipPhone
Accept-Language: en
Authorization: Digest username="test",realm="ini.cmu.edu",nonce="4XeAA6D5dWCgCPy7MKR+qA==",response="b6bba2985e55dfccf90a02053abec778",uri="sip:franc.ini.cmu.edu"
Via: SIP/2.0/UDP 128.2.237.122:5060
Following successful receipt of this message and verification of the user's identity
by the Server, the request goes through. The lifetime of the nonce is set on the
SIP server, and in general, the shorter it is the better; the main advantage of
having longer challenge lifetimes is that once a client authenticates for a
particular request through a response to a challenge, it can include the same
response in subsequent requests until the challenge changes (and the server re-
requests credentials).
Digest Authentication is not immune to attacks. In particular, a client can fall
victim to a man-in-the-middle attack, whereby a fake server requests only basic
authentication, or chooses a challenge that will easily lead to the password given
the response sent from the client. On the other hand, replay attacks are not very
likely, because of the timestamp which is included in the challenge. In order to
further decrease the likelihood of such an attack, it has been proposed that
Digest Authentication for SIP use a "predictive nonce" (or pnonce), which is
computed by hashing over the source IP address, the From and To fields in a
SIP message, and the Call-ID [9].

4. System Architecture
This chapter deals with the architecture and design of the system. It follows a
top-down approach, providing a higher-level view of the system's architecture
first.
The purpose of this chapter is to illustrate the boundaries between the system's
various components, as well as the interactions and data flows among them. The
inner workings of each component will be detailed in subsequent chapters.
4.1 Higher-level Design
This is an illustration of the conceptual positioning of the components of the
system in layers:
Figure 3 - System Architecture diagram
[Figure 3: layered diagram showing the Graphical User Interface, the Application
Manager, the Application Layer (Real-time Transport Protocol), the User Agent
(Server, Client), the Authentication Module, the Parsers (Direct, Reverse), the
Sockets (Receiving, Sending) and the Network.]
The User Agent is, as in most SIP applications, the "heart" of the system. It
contains all the logic resulting from the SIP specification and carries a substantial
portion of the burden of the application. As a consequence, the following
discussion starts with the User Agent Layer, and follows the path towards the
Transport Layer. After that, it will deal with how the end user and the supported
application interact with it, by discussing the Graphical User Interface and the
Application Manager Layers, respectively.
4.2 User Agent Layer
The User Agent Layer is conceptually divided into two parts: the Server and the
Client. The Server is responsible for analyzing incoming SIP messages (e.g.
INVITE requests from someone else). The Client is responsible for sending SIP
messages, possibly after being instructed to do so by the Server (e.g. send a
REGISTER request or respond to an INVITE request).
4.3 Authentication Sub-Layer
The Authentication portion of the User Agent is responsible for analyzing and
creating the authentication-related part of SIP messages; it only becomes part of
the mechanics of the process for the messages that actually require this. This
makes it possible for the user to disable authentication procedures (which will, of
course, make sense only if the SIP server does not require authentication for the
particular user).
As a result, when a message with authentication-related fields arrives, the
Authentication component will store them in the relevant data structures. These
may be, for instance, parts of the challenge that the SIP server sent; in that case,
the User Agent client will instruct the Authentication component to calculate the
response that will be included in the next message.
4.4 Parsing Layer

As is shown in the diagram, the User Agent layer communicates with the
Transport layer (and through it with the outside world) through the Parsing layer.
This layer is responsible for parsing the incoming messages into SIP and SDP data
structures that the User Agent can understand and manipulate (direct parsing),
as well as for creating the outgoing messages from SIP and SDP data structures
that the User Agent has constructed for marshalling (reverse parsing). As a
consequence, the direct parser interacts with the User Agent Server, while the
reverse parser interacts with the User Agent Client.
The parsing layer is also responsible for detecting errors in incoming messages.
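The two directions of parsing can be pictured with a very small hand-written example. The Python sketch below is not the parser used by this system; direct_parse and reverse_parse are hypothetical names, header folding across lines is not handled, and error detection is reduced to two basic checks.

```python
def direct_parse(text):
    """Turn a plaintext SIP message into a simple data structure."""
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    if "SIP/2.0" not in lines[0]:
        raise ValueError("missing or malformed start line")
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), value.strip()))
    return {"start_line": lines[0], "headers": headers, "body": body}


def reverse_parse(message):
    """Turn the data structure back into a plaintext SIP message."""
    lines = [message["start_line"]]
    lines += [f"{name}: {value}" for name, value in message["headers"]]
    return "\r\n".join(lines) + "\r\n\r\n" + message["body"]
```

Keeping the headers as an ordered list of pairs preserves both their order and any repeated fields, so a message that is parsed and then re-created comes out unchanged.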
4.5 Transport Layer
Following the pattern of the client and server paradigms, the Transport Layer can
be divided into two parts: the incoming (listening) socket and the outgoing
(sending) one.
As per the SIP specification, the listening socket will be accepting messages
from the outside world. On receipt of such a message, it will pass it to the higher
layers, as described above. Each outgoing socket takes the corresponding
message from the higher layers and takes care of its transmission to the other
end.
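Because UDP gives no delivery guarantee, a sending component of this kind usually repeats the datagram until a response arrives or a retry limit is reached. The Python sketch below shows one way to structure such a loop so that another thread can stop it; the function name, the retry count and the interval are assumptions, and the actual transmission is passed in as a callable so that the example stays independent of any particular socket.

```python
import threading


def send_with_retries(send, datagram, stop, max_tries=5, interval=0.5):
    """Call send(datagram) until stop is set or max_tries is reached.

    Returns the number of transmissions actually performed.
    """
    tries = 0
    while tries < max_tries and not stop.is_set():
        send(datagram)
        tries += 1
        # Sleep for the interval, but wake up at once if stop gets set.
        stop.wait(interval)
    return tries
```

With a real UDP socket, send could be a small function that calls sock.sendto(datagram, (host, port)); the thread that receives the matching response would then call stop.set() on the shared threading.Event.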
4.6 Application Manager Layer
The Application Manager is excluded from the mechanics of the system unless
(a) user intervention is involved or required, or (b) an exchange of SIP messages
leads to interaction with the applications that the stack supports.
As a result, all the user requests go through the Application Manager Layer,
since end users are only interested in the result in terms of the functionality
visible to them: a call being made, a call that needs to be answered, logging in,
and so on. Furthermore, the applications supported by the stack are launched
and terminated by this layer; in the case of this telephony application,
therefore, the audio portion is controlled by this layer.
4.7 Graphical User Interface Layer
All a user sees is the Graphical User Interface, which communicates with the rest
of the application by interacting downwards in the illustration provided earlier.
This layer is therefore responsible for enabling users to use a dialpad, make and
receive calls, register with the SIP server, manage contacts, redial numbers and
so on.
Furthermore, it does some basic error checking in order to prevent a user
mistake from propagating downwards; for instance, it checks for incorrect SIP
addresses, Registrar and Proxy Servers, already existing contact entries, and
the like.
4.8 An example of component interaction
The following is an example of the interaction of components; the use case
selected is that of an outgoing call. In the following diagram the arrows represent
the calling of the various functions and the transition of control across the
system. Along the arrows are numbers to guide the reader through the
description that follows the diagram.

Figure 4 - Component interaction example
[Figure 4: the component diagram of Figure 3, annotated with arrows numbered 1
to 18 (including 1a/1b, 6a/6b and 14a/14b).]
Please note that this is a simplified trace that leaves out details; for example,
the "100 Trying" and "180 Ringing" messages preceding the "200 OK" response, as
well as some acknowledgements, are missing. Furthermore, each different
incoming message results in updates to the user's screen (like "Ringing...").
1. The user clicks on the desired buttons, forming a number to call; she then
presses the "Call" button. The GUI detects that this is a valid number and
makes a request to the Application Manager.
2. The Application Manager receives the request (that so far only has the
dialed number), adds the necessary application-specific parameters (like
the audio requirement), and then passes the request to the User Agent
Client.
3. The User Agent Client fetches the appropriate data for the purposes of
creating a message: the caller, the contact address, the sequence
number, the call-ID, and the nature of the request (INVITE), among others.
It then asks for the reverse parsing of the request into a message and
spawns a thread that will be responsible for its transmission. The request
is marked as "pending".
4. After turning the request data structure into a plaintext message (an
INVITE request-type message), the Reverse Parser passes it onto the
sending socket created in the thread.
5. The Transport Layer's sending socket sends a UDP datagram to the
appropriate receiver, and repeats until it has tried enough times or it is told
to stop.
6. The Transport Layer's receiving socket receives a message; it promptly
pushes it to the Parsing Layer.
7. The direct parser turns it into a data structure, passing it onto the User
Agent.
8. On receipt of the data structure, the User Agent Server realizes it is for the
previously pending request, since it was created from a "407 Proxy
Authentication Required" response from the Proxy server, with the
appropriate sequence number and the same Call-ID. It instructs the thread
it had created to stop sending the datagram, since the data has already
gone through.
9. The User Agent Server contacts the Authentication sub-layer with the
corresponding fields (the realm, the method and the nonce from the
message, as well as the username and the password of the appropriate
user), looking for a response to the challenge of the Proxy server.
10. The Authentication sub-layer calculates the response, and passes it back
to the User Agent Server.
11. Having received the response, the User Agent performs some necessary
operations (like incrementing the sequence number as is required by the
protocol) and repeats the previous request with a new thread.
12. As in (4)
13. As in (5)
14. A new message arrives at the receiving socket, as in (6), which pushes it
up.
15. As in (7)

16. The User Agent notices that the message is a "200 OK" from the other
party. It instructs the Application Manager to launch the media
transmission application...
17. ...and notifies the user by showing the corresponding message on her
dialpad.
18. The Application Manager satisfies the User Agent's request.
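Steps 8 to 11 of the trace above can be summarized in a short Python sketch. The data layout is an assumption made for illustration: pending requests are kept in a dictionary keyed by Call-ID and sequence number, the parsed response is a plain dictionary, and PendingRequest and handle_challenge are hypothetical names.

```python
import hashlib
import threading


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class PendingRequest:
    def __init__(self, method, uri, call_id, cseq, authorization=None):
        self.method = method
        self.uri = uri
        self.call_id = call_id
        self.cseq = cseq
        self.authorization = authorization
        self.stop_sending = threading.Event()  # watched by the sending thread


def handle_challenge(pending, response, username, password):
    """Return the follow-up request for a 401/407 response, or None."""
    if response["status"] not in (401, 407):
        return None
    request = pending.pop((response["call_id"], response["cseq"]), None)
    if request is None:
        return None                    # not an answer to a pending request
    request.stop_sending.set()         # step 8: stop repeating the datagram
    # Steps 9 and 10: compute the response to the challenge.
    a1 = f"{username}:{response['realm']}:{password}"
    a2 = f"{request.method}:{request.uri}"
    digest = md5_hex(f"{md5_hex(a1)}:{response['nonce']}:{md5_hex(a2)}")
    # Step 11: same Call-ID, incremented sequence number, new pending entry.
    retry = PendingRequest(request.method, request.uri, request.call_id,
                           request.cseq + 1, authorization=digest)
    pending[(retry.call_id, retry.cseq)] = retry
    return retry
```

The returned request would then follow the same path as the original one, through the reverse parser and a new sending thread.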

5. Design and Implementation
This chapter deals with the design and implementation of the SIP stack, the
Authentication sub-layer, the Graphical User Interface and the media
transmission (RTP) component. It should enhance the understanding of the way
in which the various modules outlined in the System Architecture chapter
contribute the functionality that makes up the application, and be helpful to
anyone willing to enhance it.
5.1 Session Initiation Protocol (SIP) stack
The SIP stack makes up most of the functionality provided, and is naturally the
basic component of this system. This section will describe the design and
implementation of the mechanisms that create the behavior that complies with
RFC 2543 [8].
The SIP stack implemented can be divided into the following components (see
footnote 3):
- Application Manager
- User Agent
- SIP and SDP parser
- Transmission and receipt of messages.
Footnote 3: Please note that although the Authentication component conceptually
belongs to the above list, it is treated separately for the purposes of this
discussion, since authentication is not a requirement according to the SIP
specification.

Data Structures
While the interaction of the above components is outlined in the previous
chapter, the data structures they use are not divided that clearly among them.
This is because of the extensive sharing of data that is required for the stateful
nature of this signaling stack. The following are the fundamental data
structures used by the SIP stack; they have not changed since the Wavelink
implementation:
- Users
  o End user of the application
  o Participants in sessions
- Sessions (pending)
  o Call-ID
  o Method
  o Participant
- SIP Messages
  o Fields
  o Values
- SDP Messages
  o Attributes
  o Values
The Sessions, among the fundamental data structures, are associated with the
others as is depicted below:

Figure 5 - Association between main User Agent data structures
[Figure 5: the Sessions collection contains Session A (Call-ID, Message X,
Message Y, Participant) and Session B (Call-ID, Message Z, Participant).]
Implementation
The SIP engine can be divided into two parts, mirroring the User Agent's division
into a Client and a Server: one deals with incoming messages (either new requests
or responses to previous ones), and the other deals with the user's requests
(thus constructing messages and sending them across the network). In practice,
however, the specification rarely allows the two parts to operate separately.
When, for example, an INVITE request is received, we still need
to return an acknowledgement, while at the same time performing the necessary
functions for proceeding with the handling of the request itself.
As a consequence, the SIP stack is designed in such a way that the incoming
messages follow a relatively standard path (according to their nature, of course),
while using both the "Client" and the "Server" parts of the User Agent. Since
each outgoing message is only one part of an interaction, the stack proceeds in
a way that can be depicted by the following simplified flowcharts, which show an
outgoing REGISTER request and an incoming INVITE:
Figure 6 - REGISTER flowchart
[Figure 6: flowchart with the nodes sendRegister(); wait for responses...;
Received Response?; handleRegisterResponse(); Need to send ACK?; sendAck();
ensureAckGetsThrough(); Keep relevant state and data. Branch labels: "200 OK"
and "Response not 200 OK".]
Figure 7 - Incoming INVITE flowchart
[Figure 7: flowchart with the nodes wait for messages; received an INVITE
request; Notify user through the GUI; Send "200 OK", keep state; Send "603
Decline"; sendAck(); ensureAckGetsThrough(). Branch labels: "User Accepts" and
"User Declines".]

In the above figures, the REGISTER request comes through the Application
Manager and the INVITE request from the Transport Layer. Beyond these, the
interactions of the User Agent with the lower layers of software are quite
frequent and depend on the transmission and receipt of the corresponding messages.
These interactions are primarily facilitated through the use of threads. A single
thread monitors the port on which the application is listening for messages and
pushes them upwards. Almost every time the application sends a message
across, a new thread is spawned in order to carry out the request, without
holding up the main SIP engine from dealing with other messages or
performing the necessary operations on data structures.
It would be a mistake to assume that all incoming messages belong to the same
session! It is not impossible to receive an INVITE request while registering, for
example; quite the contrary. This is why concurrency and synchronization
controls are required in the right places in order to ensure the correct multiplexing
of requests with responses, and that the appropriate action is taken at every
step.
These controls are exercised through the use of semaphores: there is one
semaphore associated with each Session object. As a result, synchronization
becomes straightforward, since each SIP message data structure contains the
Call-ID that provides the appropriate Session instance; through that instance a
thread can gain exclusive access to the flow of the messages.
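The one-semaphore-per-session arrangement described above might look as follows in Python. This is a sketch under assumptions: the class and method names are hypothetical, messages are plain dictionaries carrying a call_id entry, and "handling" a message is reduced to recording it.

```python
import threading


class Session:
    def __init__(self, call_id):
        self.call_id = call_id
        self.semaphore = threading.Semaphore(1)  # one holder at a time
        self.messages = []


class SessionTable:
    def __init__(self):
        self._sessions = {}
        self._table_lock = threading.Lock()      # protects the dictionary itself

    def session_for(self, call_id):
        """Return the Session for a Call-ID, creating it on first use."""
        with self._table_lock:
            if call_id not in self._sessions:
                self._sessions[call_id] = Session(call_id)
            return self._sessions[call_id]

    def handle(self, message):
        """Process a message while holding its session's semaphore."""
        session = self.session_for(message["call_id"])
        with session.semaphore:
            session.messages.append(message)
            return len(session.messages)
```

Threads working on different Call-IDs never wait for each other, while two messages for the same session are processed strictly one after the other.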
The session associations provided through this implementation do not only
facilitate synchronization and proper handling of messages; they also help in
determining duplicate messages. Each message can be placed in time not
necessarily by a timestamp (it is not required), but by the combination of the Call-
ID, the sequence number and the method. For example, consider the messages
A and B, which have the same Call-ID:

Message A:
REGISTER sip:franc.ini.cmu.edu SIP/2.0
CSeq: 1 REGISTER
Call-Id: 2_1592914782@fluorine.ini.cmu.edu
Contact: sip:root@128.2.237.122:5060
Expires: 3600
From: sip:thanos@ini.cmu.edu
To: sip:thanos@ini.cmu.edu
User-Agent: INI SipPhone
Accept-Language: en
Via: SIP/2.0/UDP 128.2.237.122:5060
Message B:
ACK sip:thanos@ini.cmu.edu;tag=0.9806376119931339037cb SIP/2.0
CSeq: 1 ACK
Call-Id: 2_1592914782@fluorine.ini.cmu.edu
Content-Length: 0
From: sip:thanos@ini.cmu.edu
To: sip:thanos@ini.cmu.edu;tag=0.9806376119931339037cb
User-Agent: INI SipPhone
Accept-Language: en
Via: SIP/2.0/UDP 128.2.237.122:5060
It is clear that B was an acknowledgement for a response sent for the same call
(a "200 OK" response, for example).
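The combination of Call-ID, sequence number and method can serve directly as a lookup key for spotting duplicates. The Python sketch below is a simplified illustration with hypothetical names; it treats a message as a dictionary of header values and ignores the distinction between requests and responses.

```python
def message_key(message):
    """Build the (Call-ID, sequence number, method) triple of a message."""
    number, method = message["CSeq"].split()
    return (message["Call-Id"], int(number), method)


class DuplicateFilter:
    def __init__(self):
        self._seen = set()

    def is_duplicate(self, message):
        """Report whether a message with the same key was seen before."""
        key = message_key(message)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False
```

Applied to the example, messages A and B produce different keys because their methods differ, whereas a retransmitted copy of A would be recognized as a duplicate.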
Regarding the messages and their data structures, extreme care has been taken
in order to ease modifications that may be required by future incarnations of the
SIP protocol, or extensions specific to this implementation. (One can extend the
protocol to suit one's needs; for instance, custom instant messaging can be
implemented. However, the implementation should not rely on the assumption
that the other end will have the same features, thus maintaining its
interoperability.)
In particular, each supported field value is represented by a distinct C++ object,
and multiple such values may be attributed to a single Field. For example, some
messages have multiple "Via" field values. The implementation allows the
programmer to look up the values of a particular field by name, returning the
corresponding vector or a null pointer.
As a result, the SIP and SDP direct and reverse parsers form a layer of C++
software, the core of which was generated by flex and bison. The field values
extracted (or to be populated) are simply added to the tail of the list of the
existing ones for the particular field.
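The lookup-by-name behavior and the append-to-tail rule can be mimicked in a few lines. The following Python sketch is only an analogy for the C++ design: the class name SipMessage and its methods are hypothetical, a list stands in for the vector and None for the null pointer. Field names are compared without regard to case, as SIP header names are case-insensitive.

```python
class SipMessage:
    def __init__(self):
        # lower-cased field name -> list of values, in order of arrival
        self._fields = {}

    def add(self, name, value):
        """Append a value to the tail of the list for the given field."""
        self._fields.setdefault(name.lower(), []).append(value)

    def lookup(self, name):
        """Return the list of values for a field, or None if it is absent."""
        return self._fields.get(name.lower())
```

A message that has passed through a proxy would, for example, return two entries for lookup("Via"), in the order in which they appeared in the message.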
This approach has the added benefit of making responding to messages very
efficient, since this now typically involves simply taking the original message and
changing some fields, before handing it to the components responsible for
reverse parsing and transmission.
Overall, the SIP stack is designed slightly differently compared to the Wavelink
system. The main difference has to do with the way the path of the messages is
decided: the previous design had limited flexibility as to how messages can be
handled, and that made the authentication sub-layer, discussed below, very
difficult to design and implement, because authentication causes considerable
changes to this path.
Furthermore, the implementation has been through substantial change in order to
provide the necessary functionality and conform to the SIP specification.
Because of the nature of these changes, they are widespread rather than
self-contained, and affect almost every function in the User Agent class. The
Application Manager has received fewer changes, while the Parsing layer has
been left untouched, although an intermediate sub-layer was added to address
some important issues.
5.2 Authentication sub-layer

The Authentication part of the application consists of the MD5 libraries that
perform the digest, and the related data and algorithms; no provision for it was
made by the Wavelink design and implementation.
Although authentication is applicable only for INVITE and REGISTER messages,
its effects on the SIP stack are more profound. An example exchange of
messages illustrating the mechanics of authentication for SIP follows:
- The application sends a REGISTER request to the SIP Registrar server.
- Upon receipt, the server responds with a "100 Trying" message.
- As it finds that authentication is enabled for the particular user, it sends a
"401 Unauthorized" message back, along with a challenge.
- The application receives the message and performs the necessary
computations on the challenge. It then sends a follow-up REGISTER
request with the response.
- The server receives the response, sends a "100 Trying" and subsequently
performs the tasks required to record the user's registration in the
database. Upon their successful completion, it sends a "200 OK" back.
Therefore, the authentication mechanism changes the flow of messages as it
was described in the previous section. It requires the request to be restarted, but
not with a different session altogether.
Data Structures
The data structures associated with this part of the system are:
- Challenge
- Authentication Realm
- Method (REGISTER/INVITE)
- Requested URI
- Username
- Password
- Response
All the above are required in order to perform the computations leading to the
necessary exchange of messages. The necessary modules should therefore
share the part of the data that is constant and, whenever requested, return a
response to the challenge provided.
Implementation
As the purpose of this thesis was not to build an authentication module from
scratch, but to perform authentication when required, the relevant algorithms
were the MD5 algorithms provided by RSA Labs©, under General Public
License. These provide functionality for initializing a response with data,
performing the digest over the data, and returning it.
The SIP stack uses a custom-built API for performing these functions and for
retrieving the response. Given the earlier flow of messages, the implementation
of such functionality is similar in the cases of REGISTER and INVITE requests.
More specifically, after sending a request of this kind, the User Agent waits
for the SIP server's response. Upon receipt of a response that indicates that
authentication is required, it returns an acknowledgement and requests from the
authentication module the digest response to the challenge provided. When this
is done, it adds the corresponding field to the SIP message and, as per the
specification, increments the sequence number but leaves the Call-ID untouched.
It then repeats the request, following the normal SIP procedures, since the SIP
server can understand that it actually is a follow-up to the previous one based on
the Call-ID and sequence number.
Notice that it is efficient to save the original SIP message in order to facilitate its
repetition after augmenting it with the corresponding fields, even though nothing
in the specification of the protocol requires it to be saved. As a result, the stack
saves all the initial REGISTER and INVITE messages until the authentication
response is sent or the server indicates that it is not needed (in the case of a
user set up on the server to bypass authentication, if the server supports such a
feature).
5.3 Graphical User Interface
The Graphical User Interface is an important part of any application; no matter
how good the underlying design and implementation is, a user interface that does
not properly address the needs of users can make it much less usable. For the
purposes of this system, the GUI that came with the Wavelink thesis was
removed completely.
The choice of look-and-feel is generally limited for handhelds, and this is the
case for this application as well: the Zaurus comes with support for the Qt
environment [23], running a Qtopia server. Of course, since it still is a Linux
machine, it can be refitted with the X Window System, but implementing it
for the Zaurus would go far beyond the scope of this thesis, and the
implementation available seems to have very little support. After all, one of the
objectives of this thesis was to produce an application that is usable with
minimal user configuration (and by an end user with no knowledge of Linux).
Qtopia comes with a package for developing custom widgets that developers can
use. It provides a "drag and drop" interface for the appearance of the widget and
generates an XML file containing the details; the meta-compilers provided can
turn each such file into a C++ class that, when loaded, shows up as designed.
One can subsequently subclass it, provide a custom implementation for its
components, and integrate it with an application.
For the purposes of this system, the necessary capabilities of a Graphical User
Interface are:
- a dialpad
- contact management support
- previous calls (both incoming and outgoing) for redialing
- registering with the SIP Server.
The actual functionality, appearance and behavior of the widgets were the product
of discussions with a few people, who helped make the interface more user-friendly,
less intrusive and less full of jargon.
The following snapshot of the dialpad should provide the reader with a better
understanding of the design and implementation of the Graphical User Interface:

Pressing any of the buttons named "Redial", "Find", "Add", "Login" or "Proxy" opens a
new, smaller widget that enables the user to perform the corresponding
function. Furthermore, in the case of an incoming call, a widget appears
prompting the user with the options to answer or decline the call.
Data Structures
A certain amount of sharing exists between the GUI and the lower layers of
software comprising this system. The most important data are as follows:
- username
- password
- SIP URI
- registrar and proxy servers
- current contact URI
- number currently displayed or being called.
As mentioned earlier, the Graphical User Interface layer performs some error
checking in order to prevent errors propagating to the lower layers. This implies
that some intelligence about the above fields lies within its design and
implementation.
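As an illustration of the shared data and of the kind of checking the GUI layer could perform on it, consider the sketch below. The struct name, field names and both validation rules are assumptions made for the example; in particular, the URI check is a deliberately rough heuristic for the "sip:user@host" form and not a full SIP URI parser.

```cpp
#include <cctype>
#include <string>

// Hypothetical container for the data shared between the GUI and lower layers.
struct SharedState {
    std::string username;
    std::string password;
    std::string sipUri;
    std::string registrar;
    std::string proxy;
    std::string contactUri;
    std::string displayedNumber;
};

// Rough check for the form "sip:user@host": non-empty user and host parts
// and no whitespace anywhere. Real SIP URIs allow more than this.
bool looksLikeSipUri(const std::string& uri) {
    const std::string scheme = "sip:";
    if (uri.compare(0, scheme.size(), scheme) != 0) return false;
    const std::string::size_type at = uri.find('@', scheme.size());
    if (at == std::string::npos) return false;
    if (at == scheme.size()) return false;   // empty user part
    if (at + 1 >= uri.size()) return false;  // empty host part
    for (std::string::size_type i = 0; i < uri.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(uri[i]))) return false;
    }
    return true;
}

// A dialed number is accepted if it is non-empty and contains only
// dialpad characters (digits, '*' and '#').
bool isDialable(const std::string& number) {
    if (number.empty()) return false;
    for (std::string::size_type i = 0; i < number.size(); ++i) {
        const char c = number[i];
        const bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
        if (!digit && c != '*' && c != '#') return false;
    }
    return true;
}
```

Rejecting malformed input at this level means the lower layers can assume that the fields they receive are at least syntactically plausible.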
Implementation
Every time a button is pressed, the event handler put in place checks how it
affects the application. In the case of the button "4" on the dialpad, for instance, it
updates the display of the number accordingly. On the other hand, some events
need to take place without user intervention, being triggered from the lower
layers of software. These are primarily the handling of incoming calls and a
request to log in if the token lifetime has expired or the user has resumed the
device after suspending it.
For events triggered by the user, the application first checks whether the entered
data is correct or whether the state of the application permits the requested
operation. Once these checks pass, the Graphical User Interface calls
the Application Manager (the immediately lower layer and a part of the SIP
stack) in the case of a registration request or a call. If the user is simply
performing address book-type functions, this layer of software itself implements the
fetching and storing of contacts in the appropriate files (footnote 4), in order to
avoid passing the data to another layer of software for this relatively simple task,
which does not involve SIP-related operations or network functions.
5.4 Media Transmission (RTP)
The media transmission component consists of an implementation of the full RTP
stack (provided by Vovida© [26] under General Public License) and a layer of
software that controls the mechanics associated with the rest of the application.
As a result, this section will primarily deal with this layer rather than going into
details about the Real-time Transport Protocol.
This part of the system has been changed very little since its implementation for
the Wavelink thesis.
Footnote 4: The address book of the Zaurus adds a Unique ID to each record that acts
as a primary key; there is no documentation on how this ID is selected, and the Zaurus
will not recognize arbitrary ones. For this reason, as a workaround, contact management
is done by having an application-specific address book: a new contact is stored in it,
while stored contacts are retrieved from both.

Data Structures
The main data requirements for this part of the system consist of:
- Audio-specific information
  - Audio devices and formats
- The other end's IP address
- Sending port number
- Receiving port number
In order to avoid adding unnecessary complexity to the system, the media
transmission component is an individual executable, launched from the main
program through a fork and exec. The other end's IP address and the port
numbers are passed to it as parameters and it is responsible for setting up the
communication between the two machines, thus achieving an appropriate
separation of concerns.
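A minimal sketch of such a launch, using the POSIX fork() and execv() calls, is shown below. The function names, the executable path and the order of the arguments are assumptions for illustration; the thesis does not specify the actual command line of the media executable.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Build the argument list for the media executable. The order
// (program, remote IP, sending port, receiving port) is assumed.
std::vector<std::string> buildMediaArgs(const std::string& program,
                                        const std::string& remoteIp,
                                        int sendPort, int recvPort) {
    std::vector<std::string> args;
    args.push_back(program);
    args.push_back(remoteIp);
    args.push_back(std::to_string(sendPort));
    args.push_back(std::to_string(recvPort));
    return args;
}

// Fork and replace the child with the media executable.
// Returns the child's pid in the parent, or -1 if the launch could not start.
pid_t launchMedia(const std::vector<std::string>& args) {
    if (args.empty()) return -1;
    std::vector<char*> argv;
    for (std::vector<std::string>::size_type i = 0; i < args.size(); ++i) {
        argv.push_back(const_cast<char*>(args[i].c_str()));
    }
    argv.push_back(nullptr);  // execv expects a null-terminated array

    const pid_t pid = fork();
    if (pid == 0) {
        execv(argv[0], argv.data());
        _exit(127);  // only reached if execv itself failed
    }
    return pid;
}
```

Because the child is a separate process, the main program only needs to remember its pid in order to stop it when the call ends.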
Implementation
On startup, the RTP executable takes care of setting the parameters for the
sound card (16 bits per sample, single channel, 8000 Hz sampling frequency). It
then opens the device with the appropriate permissions and spawns two threads:
one for receiving and one for sending sound.
In order to conform to the supported codec (G.711), this layer of software
performs the companding from linear data to µ-law (for sending) and vice versa
(for receiving) for each packet. On the sending side this results in an 8-bit per
sample, single channel, 8000 Hz sound packet, which is then handed over to the RTP
stack for transmission.
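The companding step can be illustrated with the widely used bias-and-segment formulation of G.711 µ-law shown below. This is a sketch for the reader and is not taken from the SipPhone sources; the function names are hypothetical.

```cpp
#include <cstdint>

// Encode one 16-bit linear PCM sample as an 8-bit G.711 mu-law byte.
std::uint8_t linearToMulaw(std::int16_t sample) {
    const int kBias = 0x84;   // 132, added before the segment search
    const int kClip = 32635;  // largest magnitude that still fits after the bias
    int magnitude = sample;   // widened so that -32768 can be negated safely
    const int sign = (magnitude < 0) ? 0x80 : 0x00;
    if (magnitude < 0) magnitude = -magnitude;
    if (magnitude > kClip) magnitude = kClip;
    magnitude += kBias;

    // The segment (exponent) is given by the position of the highest set bit.
    int exponent = 7;
    for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1) {
        --exponent;
    }
    const int mantissa = (magnitude >> (exponent + 3)) & 0x0F;
    // mu-law bytes are transmitted with all bits inverted.
    return static_cast<std::uint8_t>(~(sign | (exponent << 4) | mantissa));
}

// Decode one 8-bit G.711 mu-law byte back to a 16-bit linear PCM sample.
std::int16_t mulawToLinear(std::uint8_t code) {
    const int kBias = 0x84;
    const int inverted = ~code & 0xFF;
    const int sign = inverted & 0x80;
    const int exponent = (inverted >> 4) & 0x07;
    const int mantissa = inverted & 0x0F;
    const int magnitude = (((mantissa << 3) + kBias) << exponent) - kBias;
    return static_cast<std::int16_t>(sign ? -magnitude : magnitude);
}
```

Each 16-bit sample becomes one byte, which halves the payload size; decoding returns an approximation of the original sample, with coarser steps at larger amplitudes.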

6. Porting to the PDA
One objective of the implementation of this system was to minimize the platform-
specific aspects in such a way that porting to any Linux device was as smooth as
possible. This section describes the general steps taken to port the system from
a Linux desktop to the PDA of choice, Sharp's Zaurus, as well as the issues
raised by some discrepancies between the two systems.
6.1 Cross-Compilation
Being a C++ implementation, this project's source code was compiled and linked
using GNU's g++ compiler and linker for Linux (footnote 5). The same tool was used for
building the RTP libraries from the Vovida RTP stack, compiling the audio
application's implementation and linking with them.
In order to achieve compilation for a platform other than the development
machine, the programmer needs to use a cross-compiler suitable for the target
device. In general, the cross-compilation tools depend only on the processor of
the target device, but in some cases several device-dependent options (for
graphics or optimization, for instance) necessitate the use of device-specific
tools.
The cross-compiler and linker used for this system was GNU's generic
arm-linux-g++; it comes with the standard libraries compiled for the ARM processor
(like POSIX threads).
Footnote 5: The development machine used during this thesis had Red Hat © 7.1
installed, but with a downgraded gcc because of known problems with the version
supplied with it (2.96).

In order to facilitate cross-compilation in the future, the Makefile supplied
configures itself, depending on the environment variables set for the graphics
and the target system. For more details, the reader can consult the README file
accompanying the release.
6.2 Issues with porting to the Zaurus
The porting process to the Zaurus was slightly more complicated than changing
the commands in the Makefile.
One of the earliest complications was the generation of C++ code from the SIP
grammar using flex and bison: the generated code used a data structure that the
cross-compiler could not compile. Investigating whether a different flavor of flex
was needed showed that there is one for the ARM processor that differs only in one
of the include files; the flex executable is the same. After a few changes in the
source code for the application (without breaking the desktop compilation, of
course) and the inclusion of the specific file, the cross-compiler could go ahead
and produce the correct object files.
The most important issues involve the audio part of the system. A minor one is
simply that the development desktop used only one device for audio, while the
Zaurus (and some laptops) uses two (/dev/dsp1 in addition to /dev/dsp0). As a
result, instead of opening one device, the application on the Zaurus has to open
/dev/dsp1 with write permissions and /dev/dsp0 with read permissions.
Unfortunately, the second issue is much more important and severely impinges
on the functionality of the system. When the audio application starts, it sets the
audio parameters of the sound device; two settings are important for this
application: (a) sampling-related and (b) buffer sizes. While sampling parameters
for both input and output (8000 Hz, mono, 16 bits per sample) were set without a
problem, the input device would not allow its user to set the buffer size. This
buffer is the place where data from the device is placed before it is passed to the
application performing a read() from it.
While the output device driver permitted the desired size of 512 bytes, the only
size permitted by the input device driver was 8192 bytes, which at 8000 samples
a second and 16 bits per sample amounts to roughly 0.5 seconds of audio. This
means that the application must receive half a second of audio at a time, which
will be divided into packets by the RTP stack and sent, in a burst of network
traffic, to the other end.
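Figure 8 - Audio transmission from the Zaurus (flow shown: /dev/dsp1 buffer, size=8192 bytes -> read() -> encapsulation in RTP packet -> transmission to the receiver)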
The consequences from this inflexibility are two-fold: first, the incoming sound
from the microphone leaves the audio device on its way to the network with a
delay of half a second. As a result, sound from the Zaurus to any other end
arrives with, at the very least, a 500ms latency, much more than the delay the
human ear can ignore.
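The half-second figure follows directly from the buffer size and the sampling parameters, as the short calculation below shows. The function names are hypothetical, and the 20 ms packet duration used in the usage note is an assumption (a common choice for G.711), not a value stated in this thesis.

```cpp
// Duration, in milliseconds, of the audio held by a PCM buffer.
double bufferDurationMs(unsigned bufferBytes, unsigned sampleRateHz,
                        unsigned bytesPerSample, unsigned channels) {
    const double bytesPerSecond =
        static_cast<double>(sampleRateHz) * bytesPerSample * channels;
    return 1000.0 * bufferBytes / bytesPerSecond;
}

// Number of whole packets produced from one buffer, for a given packet
// duration in milliseconds.
unsigned packetsPerBuffer(double bufferMs, double packetMs) {
    return static_cast<unsigned>(bufferMs / packetMs);
}
```

With 8192 bytes at 8000 Hz, 2 bytes per sample and one channel, `bufferDurationMs` gives 512 ms, whereas the 512-byte output buffer holds only 32 ms. Assuming 20 ms packets, `packetsPerBuffer(512.0, 20.0)` yields 25 packets per burst.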
Additionally, this delay imposes some requirements on the receiving side of this
data: since audio packets will arrive not only late but also in a burst (about 25
packets at a time, every half a second), the receiver must be able to hold all of
them in its dejitter buffer. While most software receivers in the conducted tests
were able to deal with this requirement (including the exact same system running
on a desktop), most problems arose when the receiver was the Cisco SIP-to-PSTN
gateway operated by the Information Networking Institute. More
specifically, the gateway would not handle this much traffic at such rates and
would only be able to send about 200ms out of every second (footnote 6) to the
receiving phone through the telephone network. The INI operates a Pingtel VoIP
phone (footnote 7) as well, which performed only slightly better, playing about
400ms worth of audio each second (footnote 6).
It should be noted that traffic arriving at the Zaurus from the other end's
microphone is not only of very acceptable quality, but also has minimal delay.
Therefore, one can hear the person talking from a PSTN line through the Cisco
gateway without a problem, but the PDA user is not audible to the other end
(footnote 8).
As a result, this problem eliminates the ability to make calls from the PDA to a
telephone (the problem does not apply to the desktop version). A formal request
has been placed with the developers of the device and its ROM, but there has
not been any feedback regarding the possibility of this being fixed. The consumer
version of the Zaurus (SL-5500), which was released in the US in April 2002,
also does not allow the modification of the microphone buffer's size. On the other
hand, the general audio quality exhibited by the system when running on the
consumer version is much better, and this can be attributed to more efficient
processing of the audio data, possibly due to the increased memory capacity
(64MB compared to 32MB in the developer's version, SL-5000D). A phone call to
a landline from the consumer Zaurus through the Cisco gateway, however, is still
largely incomprehensible.
Footnote 6: These figures were based on empirical measurements only.
Footnote 7: Pingtel Xpressa ©
Footnote 8: Cisco documentation indicates a maximum dejitter buffer capability of 250 msec.

7. Project Postmortem
This chapter will attempt to draw some conclusions regarding the approach used
in building the INI SipPhone. It will focus on the fact that the final product is
based, like most software, on an existing software infrastructure, as well as on
some characteristics of the components of this system that affect the end result.
Modifying and extending software
Any developer is aware of the difficulties involved in modifying, extending, or
even maintaining existing software. Although the author has been in a similar
situation a few times in the past, the size and complexity of this project made this
one distinctly different.
As mentioned earlier in this document, the code base used for the final product is
a substantial part of an earlier MSIN thesis. This report would be insufficient
without acknowledging the work the five students had put in building the
"Wavelink" system, as well as their efforts to include documentation in the source
code. Furthermore, the similar coding style (including software patterns) all
developers used was not only maintained throughout most of the implementation,
but also proved very helpful.
On the other hand, an attempt to build the desired system simply relying on the
above would not have succeeded. It took a lot of communication with the
previous developers, quite a bit of trial and error, lengthy discussions with Kunal
Trivedi, who is well aware of the original design, and finally a lot of tracing of the
several threads of control co-existing in the system.

What should have been done, and what the author advises anyone who wants to
extend software to do, is to hold a meeting with the previous developers very
early in the project so that a sufficient understanding of the mechanics is
shared. Although the practical problems involved in this case were not trivial, it is
probably the best way around similar situations. Thankfully, all developers took
the time and made the effort to respond to my questions as accurately as
possible, despite the long time that has passed since the development of the
"Wavelink" system.
System components
The Linux operating system on which the application was built has been around
for a long time and has all the advantages of Linux: stability, robustness, open
source, and clean interfaces. On the other hand, a lot of other components and
aspects of this application are relatively new in the market. The development
version of the Zaurus used was purchased about 10 days after its public release.
Additionally, the system built is the first one of its kind available for a consumer-
level Linux device, to the best knowledge of the author.
Consequently, there is a combination of immature components, characterized by
lack of real support, and growing but limited development expertise. In addition to
this, the fact that Linux is not a commercial operating system, and thus can bring
only minuscule revenues to the manufacturers of the devices using it, makes
availability of device drivers for an even less widespread system such as a PDA
problematic, to say the least.
Among the results of this problem were, for instance, the lack of support for most
802.11b wireless cards for the Zaurus for a long time, but also the unsuccessful
attempts to get around the problems with the audio device by contacting the
manufacturers.

There is no doubt that these obstacles will be overcome in the future. After all,
the past year has seen a few products combining PDA and phone functionalities
(included in Appendix V); on the other hand, they have received generally
unfavorable reviews.

8. Future work
While the desired functionality of the system has been achieved,
there is definitely room for improvement and, of course, added functionality.
The first candidate for future work is the parsing layer, for two distinct reasons.
First, the code generated from the grammar by flex and bison makes up a relatively
large fraction of the executable; redesigning it to reduce its size can make the
application more compact. Furthermore, the direct parser actually consists of two
sub-layers: because the original parser did not conform to the standard, an
additional one had to be added, which eliminates the unnecessary lines from the
messages, although this should be done within the same layer.
Another possibility that may provide better functionality and expose the
application to more users is porting it to the Compaq iPAQ. The iPAQ does not
have the sound device problems the Zaurus has, and can run Linux (although
Linux is not officially supported!); most importantly, it is among the best selling
PDAs. In order to port the software to an iPAQ running Linux, one has to install
the Qt graphics environment, cross-compile the application using the iPAQ-
specific compiler, and deal with any device-specific issues that may arise.
Regarding possible additional functionality, support for video- and multi-
conferencing can be implemented. For the case of video, apart from the
computing power and equipment requirements, one has to add support for a
videoconferencing application (like vic). This involves simply recognizing the
relevant SDP headers and launching the application as required.
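Recognizing the relevant SDP information could look roughly like the sketch below, which scans a session description for media ("m=") lines of the form "m=<media> <port> <transport> <format list>" defined by SDP [7]. The struct and function names are hypothetical, and the sketch deliberately ignores everything except the media type, port and transport.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One parsed "m=" line of a session description.
struct MediaLine {
    std::string media;      // e.g. "audio" or "video"
    int port;               // transport port; 0 means the stream is refused
    std::string transport;  // e.g. "RTP/AVP"
};

// Collect all media lines found in an SDP body.
std::vector<MediaLine> parseMediaLines(const std::string& sdp) {
    std::vector<MediaLine> result;
    std::istringstream in(sdp);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF
        if (line.compare(0, 2, "m=") != 0) continue;
        std::istringstream fields(line.substr(2));
        MediaLine m;
        std::string portToken;
        if (fields >> m.media >> portToken >> m.transport) {
            // The port may be written as "port/count"; atoi stops at the '/'.
            m.port = std::atoi(portToken.c_str());
            result.push_back(m);
        }
    }
    return result;
}

// True if the description offers at least one video stream with a usable port.
bool offersVideo(const std::string& sdp) {
    const std::vector<MediaLine> lines = parseMediaLines(sdp);
    for (std::vector<MediaLine>::size_type i = 0; i < lines.size(); ++i) {
        if (lines[i].media == "video" && lines[i].port != 0) return true;
    }
    return false;
}
```

An application could call `offersVideo()` on the body of an incoming INVITE and launch the video tool only when it returns true.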
When it comes to multi-conferencing, SIP supports it by issuing as many
INVITEs as there are participants. Most of the complication is within the audio
application, since the RTP side will need to compose a single audio stream from
multiple ones and take care of volume normalization.
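The composition step can be sketched as a sample-wise sum of the decoded linear frames. Note that the example below uses simple saturation (hard clipping) to keep the result within the 16-bit range; this is a stand-in chosen for brevity and not a true volume normalization scheme. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mix several frames of 16-bit linear PCM into one frame by summing the
// samples at each position and saturating the sum to the int16 range.
// Frames may differ in length; missing samples count as silence.
std::vector<std::int16_t> mixFrames(
        const std::vector<std::vector<std::int16_t> >& frames) {
    std::size_t length = 0;
    for (std::size_t f = 0; f < frames.size(); ++f) {
        if (frames[f].size() > length) length = frames[f].size();
    }
    std::vector<std::int16_t> mixed(length, 0);
    for (std::size_t i = 0; i < length; ++i) {
        int sum = 0;
        for (std::size_t f = 0; f < frames.size(); ++f) {
            if (i < frames[f].size()) sum += frames[f][i];
        }
        if (sum > 32767) sum = 32767;
        if (sum < -32768) sum = -32768;
        mixed[i] = static_cast<std::int16_t>(sum);
    }
    return mixed;
}
```

The mixing has to happen on linear samples, so each incoming µ-law stream would be expanded first and the mixed result played back directly.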
Instant Messaging and Presence can be implemented through SIMPLE (SIP for
Instant Messaging and Presence Leveraging Extensions), and the extensions
proposed in the related draft (draft-ietf-simple-presence-06 [16]). Among the
specific requirements for such an implementation are the new SIP request types
(SUBSCRIBE and NOTIFY) as well as a SIP Presence server.
Finally, and taking advantage of the mobility provided by a Personal Digital
Assistant combined with network connectivity, a lot can be achieved by adding
location-based services. Specifically, ongoing research at Carnegie Mellon
(footnote 9) is looking into mapping signal strengths from wireless access points
to actual locations. Combining the two research projects can result in an
extensive set of offerings, although knowledge of an individual's location
introduces privacy issues that should be addressed by any such implementation.
Footnote 9: This research is being undertaken by Professor Alex Hills, Professor
Peter Steenkiste, and CMU's Wearable Group, among others.

9. Conclusions
This thesis had set a simple goal in the beginning: to enable a user of a PDA to
place calls in a secure, authenticated way. This has been achieved by the design
and implementation of a Voice Over IP solution using the Session Initiation
Protocol, which was based on a previous MSIN thesis.
Due to the combined limitations of the PDA currently used and the available
gateway, calls from it to a PSTN phone offer sound of very low quality to the
receiving end, while the desktop version has no such problems.
Authentication capabilities are implemented using Digest Authentication, which is
supported by the SIP standard and the dynamicsoft Proxy Server operated by
the Information Networking Institute. As a result, call detail recording by the SIP
Proxy side can enable billing functionality based on the identity of each caller, as
well as more advanced access control.
This system has been tested with commercial and public-domain clients and
servers (including servers from dynamicsoft, Nortel and Lucent, and clients from
eStara and Pingtel) and has been found to be fully interoperable with them.
Moreover, it is designed in such a way that its SIP portion is as forgiving as
possible of mistakes made by its counterparts. The end user can configure
preferences including default SIP server locations and authentication criteria.
Through the system developed, this thesis contributes the first Voice Over IP
solution running on a commercial-grade Linux Personal Digital Assistant, namely
the Zaurus SL-5500. The software is released under a General Public License,
found in Appendix VI, and can be ported to any device running Linux, while a
version for the desktop is readily available.

Finally, possible future work includes making the system more compact in size,
as well as implementing conferencing functionality, presence and instant
messaging, or location-based services.

References
1. 3G TR 23.821, 23.228, "3GPP TSG and SA: Architecture Principles for
Release 2000"
2. Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S., Leach, P.,
Luotonen, A. and L. Stewart, "HTTP Authentication: Basic and Digest
Access Authentication", IETF RFC 2617, June 1999
3. Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. and T.
Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", IETF RFC 2616, June
1999
4. Gulbrandsen, A., Vixie, P. and L. Esibov, "A DNS RR for specifying the
location of services (DNS SRV)", IETF RFC 2782, February 2000
5. Gupta, N., Keswani, V., Mak, H., Narjala, R. and A. Pavuluri, "Wavelink:
Handheld Wireless Multimedia Over IP", Carnegie Mellon University,
Information Networking Institute, Master's thesis, 2000
6. H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
Minimal Control", IETF RFC 1890, January 1996
7. Handley, M. and V. Jacobson, "SDP: Session Description Protocol", IETF
RFC 2327, April 1998
8. Handley, M., Schulzrinne, H., Schooler, E. and J. Rosenberg, "SIP:
Session Initiation Protocol", IETF RFC 2543, March 1999
9. J. Rosenberg, "Request Header Integrity in SIP and HTTP Digest using
Predictive Nonces", IETF Draft, draft-rosenberg-sip-http-pnonce-00.txt,
June 16, 2001
10. Linux Devices, http://www.linuxdevices.com
11. Mehta, P. and S. Udani, "Voice over IP", IEEE Potentials, pp. 36-40,
October/November 2001
12. Microsoft Corp., http://www.microsoft.com
13. R. Rivest, "The MD5 Message-Digest Algorithm", IETF RFC 1321, April
1992

14. R. Zopf, "RTP Payload for Comfort Noise", IETF Draft,
draft-ietf-avt-rtp-cn-06.txt, April 2002
15. Recommendation G.711, International Telecommunication Union, The
International Telegraph and Telephone Consultative Committee, Geneva,
1988
16. Rosenberg, J., Willis, D., Sparks, R., Campbell, B., Schulzrinne, H.,
Lennox, J., Huitema, C., Aboba, B., Gurle, D. and D. Oran, "Session
Initiation Protocol (SIP) Extensions for Presence", IETF Draft,
draft-ietf-simple-presence-06.txt, April 3, 2002
17. RSA Security, Inc., http://www.rsasecurity.com
18. Schulzrinne, H. and J. Rosenberg, "Signaling for Internet Telephony",
Proceedings of the 6th IEEE International Conference on Network
Protocols (ICNP), Austin, Texas, October 1998
19. Schulzrinne, H. and J. Rosenberg, "The Session Initiation Protocol:
Internet-Centric Signaling", IEEE Communications Magazine, pp. 134-141,
October 2000
20. Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A
Transport Protocol for Real-Time Applications", IETF RFC 1889, January
1996
21. SIP Center, http://www.sipcenter.com
22. SIP, Columbia University, http://www.cs.columbia.edu/sip
23. Trolltech AS, http://www.trolltech.com
24. Unofficial Sharp Zaurus SL-5500 FAQ,
http://www.newbreedsoftware.com/zaurus-faq, April 2002
25. Vlaovic, B. and Z. Brezocnik, "Packet Based Telephony",
EUROCON'2001, Trends in Communications, International Conference on
Trends in Communications, pp. 210-213, Volume 1, 2001
26. Vovida Networks, Inc., http://www.vovida.org

Bibliography
Camarillo, Gonzalo. SIP demystified. New York: McGraw-Hill, 2002.
Dalheimer, Matthias Kalle. Programming with Qt. Second Edition. California:
O'Reilly and Associates, 2002.
Deitel, Harvey and Paul Deitel. C++ How to program. Third Edition. New Jersey:
Prentice Hall, 2001.
Peterson, Larry and Bruce Davie. Computer Networks: A systems approach.
Second Edition. California: Morgan Kaufmann, 1999.
Stevens, Richard. UNIX Network Programming, Volume 1: Networking APIs -
Sockets and XTI. Second Edition. New Jersey: Prentice Hall, 1997.
Stroustrup, Bjarne. The C++ Programming Language. Third Edition.
Massachusetts: Addison-Wesley, 1997.

Appendix I - User's Guide
Configuration file
SipPhone is configurable through a file called sip.cfg; an example file is as
follows:
Registrar franc.ini.cmu.edu
Redirect franc.ini.cmu.edu
Port 5060
Username thanos
Authentication on
Registration_Duration 600
Login_Ticket_Lifetime 600
Require_Login_After_Suspend on
PDA_AddressBook_File /home/root/Applications/addressbook.xml
SIP_AddressBook_File sipaddressbook.xml
Although most fields are self-explanatory, here is a short description for each and
the impact it has on the application (note that login and registration mean the
same thing in this context):
- The first two lines supply the addresses of the SIP servers. If any of these
  entries is blank, the application will call directly (this implies that the SIP
  URI should be the actual location of the user).
- Following that, the SIP User Agent Server port is given.
- The username entry supplies the default username for use when
  registering; while using the SipPhone, a user can change this on
  registration. It is also the default address that the application will recognize
  calls for (after registration, the registered address is recognized as well).
- Authentication is by default "on", meaning that the SIP phone will respond
  to authentication requests. For security purposes, when this option is set,
  a user will be prompted to log in (register) on startup and a click on
  "Cancel" will cause the application to exit, during that or subsequent
  registrations.
- Registration duration is simply the lifetime of the registration with the
  Registrar Server (in seconds).
- The login token lifetime denotes the amount of time the application will
  wait before it asks the user to log in again (in seconds). Normally, it will
  be equal to the registration duration.
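A file in this format is straightforward to read. The sketch below shows one possible way to load the "key value" lines into a map; it is an illustration only and does not reproduce the SipPhone's actual configuration code. The function names are hypothetical, and the sketch assumes that each setting occupies a single line and that values contain no spaces.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "Key value" lines into a map. A key with no value is stored with an
// empty string, which the application can treat as a blank entry.
std::map<std::string, std::string> parseConfig(const std::string& text) {
    std::map<std::string, std::string> settings;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        if (!(fields >> key)) continue;  // skip blank lines
        std::string value;
        fields >> value;                 // stays empty if nothing follows the key
        settings[key] = value;
    }
    return settings;
}

// True if the given option is present and set to "on".
bool isEnabled(const std::map<std::string, std::string>& settings,
               const std::string& key) {
    const std::map<std::string, std::string>::const_iterator it = settings.find(key);
    return it != settings.end() && it->second == "on";
}
```

For the example file above, `isEnabled(settings, "Authentication")` would return true, and a blank `Redirect` entry would map to an empty string.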
Functionality
The main functionality of the INI SipPhone is very similar to that of a usual
phone: one uses the dialpad in order to place a call by pressing the
corresponding buttons. In addition to this, in the case of an incoming call,
"Incoming call" is displayed on the dialpad's screen and a pop-up dialog displays
the caller's identification and prompts the user to either accept or reject the call.
Furthermore, the user can select among the last 5 entries dialed or the last 5
incoming callers to place calls to, and also has the option to add any of the
incoming callers to the address book. The Find dialog will search for the given
string in the address book, and on failure will allow the user to add a new contact.

While adding a contact, a user can select to place a call (using either a SIP URI
or a phone number) directly after the addition.
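The redial lists described above (the last 5 entries dialed and the last 5 incoming callers) amount to a small bounded history. A minimal sketch follows; the class name and interface are assumptions for illustration, and the thesis does not describe how the actual lists are stored.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Keeps the most recent addresses, newest first, up to a fixed capacity.
class RecentCalls {
public:
    explicit RecentCalls(std::size_t capacity = 5) : capacity_(capacity) {}

    // Record a new call; the oldest entry is dropped once the list is full.
    void add(const std::string& address) {
        entries_.push_front(address);
        if (entries_.size() > capacity_) entries_.pop_back();
    }

    const std::deque<std::string>& entries() const { return entries_; }

private:
    std::size_t capacity_;
    std::deque<std::string> entries_;
};
```

One instance would hold outgoing calls and another incoming callers, so that the Redial dialog can present both lists.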
Regarding SIP-related functionality, the Login dialog has two tabs: "Simple" and
"Advanced". The first only contains a username and password field, while the
second also prompts for the SIP domain, the Contact, and the SIP Registrar
server. All these take the default value on startup, and the password field's
contents do not show up. Moreover, one can change the proxy used by clicking
on the appropriate button.

Appendix II - Main Classes
The following are descriptions of the fundamental functions of the main classes.
Please note that some arguments may not appear here for simplicity.
UserAgent class
This class contains the main SIP functionality. It contains data structures that
include the following:
string proxyHostName
string registrarHostName
SessionDb sessions
Its main functions are:
void UserAgent::setUpListen(listening_socket)
The main listening loop, which matches incoming messages with the
corresponding functions.
Request UserAgent::newRequest(call_id, sending_socket, method, URI)
Creates the body of a new request from its most basic fields; the rest will
be added depending on the nature of the request.
Response UserAgent::sendInvite(call_id)
This starts a new session by sending an INVITE request, and matches the
responses with the appropriate handling function, keeping state and

Page 71
Carnegie Mellon University
Information Networking Institute
Athanasios P Kosmidis
May 2002
"Telephony on a PDA: the INI SipPhone"
MSIN thesis Report
62
informing the user. The return value is the final response, which may be
passed from the following function.
Response UserAgent::handleInvitexxxResp(error_handling_args)
These 6 functions (depending on which type of response arrives) handle
responses to INVITE requests; they are called by sendInvite(). The
return value is the final response, if any.
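The six handlers presumably correspond to the six classes of SIP responses (1xx through 6xx). The following is a minimal sketch of how the dispatch inside sendInvite() could select a handler from the status code. The names RespClass, classify(), isFinal() and handlerFor() are hypothetical and do not come from the SipPhone source.

```cpp
#include <string>

// Hypothetical names: the thesis only states that six handleInvitexxxResp()
// functions exist; mapping them to the SIP response classes is an assumption.
enum class RespClass {
    Provisional = 1,  // 1xx
    Success,          // 2xx
    Redirection,      // 3xx
    ClientError,      // 4xx
    ServerError,      // 5xx
    GlobalFailure,    // 6xx
    Invalid
};

// Map a SIP status code (100-699) to its response class.
RespClass classify(int statusCode) {
    if (statusCode < 100 || statusCode > 699) return RespClass::Invalid;
    return static_cast<RespClass>(statusCode / 100);
}

// Every valid response other than 1xx is final, i.e. it ends the transaction.
bool isFinal(int statusCode) {
    return statusCode >= 200 && statusCode <= 699;
}

// Name of the handler that would be invoked for this status code,
// or an empty string if the code is out of range.
std::string handlerFor(int statusCode) {
    const RespClass c = classify(statusCode);
    if (c == RespClass::Invalid) return "";
    return "handleInvite" + std::to_string(static_cast<int>(c)) + "xxResp";
}
```

With this mapping, a 180 Ringing is provisional and would keep sendInvite() waiting, while a 486 Busy Here is final and would be routed to the 4xx handler.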
Response UserAgent::sendRegister(call_id, URI, expire_registration_flag)
This generates and sends a REGISTER request, matching the responses
with the appropriate handling function, keeping state and informing the
user. If expire_registration_flag is raised, the request will have an
Expires value of 0, removing the registration from the server. The return
value is the final response, which may be passed back from
handleRegisterxxxResp().
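The effect of expire_registration_flag can be illustrated with a small sketch that builds the text of a REGISTER request. The function buildRegister(), its parameters, and the default lifetime of 3600 seconds are assumptions made for illustration; the real sendRegister() works on the system's own message classes and adds further headers.

```cpp
#include <sstream>
#include <string>

// Builds the text of a REGISTER request (illustrative only).
// Via, tags, authentication and other headers are omitted for brevity.
// defaultExpires = 3600 is an assumed value, not taken from the thesis.
std::string buildRegister(const std::string& registrar,
                          const std::string& userUri,
                          const std::string& contactUri,
                          const std::string& callId,
                          bool expireRegistration,
                          int defaultExpires = 3600) {
    std::ostringstream out;
    out << "REGISTER sip:" << registrar << " SIP/2.0\r\n"
        << "To: <" << userUri << ">\r\n"
        << "From: <" << userUri << ">\r\n"
        << "Call-ID: " << callId << "\r\n"
        << "CSeq: 1 REGISTER\r\n"
        << "Contact: <" << contactUri << ">\r\n"
        // An Expires value of 0 asks the registrar to remove the binding.
        << "Expires: " << (expireRegistration ? 0 : defaultExpires) << "\r\n"
        << "Content-Length: 0\r\n\r\n";
    return out.str();
}
```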
Response UserAgent::handleRegisterxxxResp(error_handling_args)
Similar to handleInvitexxxResp(), for REGISTER requests.
void UserAgent::cancelSession(call_id)
Sends a CANCEL message, and matches the responses with the
appropriate handling function.
Response UserAgent::handleCancelxxxResp(error_handling_args)
Similar to handleInvitexxxResp(), for CANCEL requests.
void UserAgent::endSession(call_id, URI)
Sends a BYE message, and matches the responses with the appropriate
handling function.
Response UserAgent::handleByexxxResp(error_handling_args)
Similar to handleInvitexxxResp(), for BYE requests.
void UserAgent::convertRespToAck(SipMessage, URI)
Helper function used by any handler that needs to send an
acknowledgement.
void UserAgent::ensureAckGetsThrough(acknowledgement_args)
Helper function used by any function that sends an acknowledgement that
the protocol requires to reach the other party. Its parameter contains,
among others, the socket used in sending the ACK request.
void UserAgent::sendAck(error_handling_args)
Used for sending acknowledgements; it sometimes needs to be combined
with the ensureAckGetsThrough() function.
void UserAgent::handleInviteRequest()
Handles an INVITE request, generating and sending the required
messages and informing the user through the GUI.
void UserAgent::handleByeRequest()
Handles a BYE request, generating and sending the required messages
and informing the user through the GUI.
void UserAgent::handleCancelRequest()
Handles a CANCEL request, generating and sending the required
messages and informing the user through the GUI.
Response UserAgent::sendErrorIfNotWellForm(sip_message)
Checks the incoming message for non-parsing errors, like invitations to
users who are not logged in, or duplicate messages. If appropriate, it
sends a response back to the sender. Called by setUpListen().
void UserAgent::convertReqToResp(sip_message, response_number, msg_body, URI)
Converts a request to a response-type message, since they share most of
the headers. It is called by request handlers.
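As an illustration of why the conversion is cheap, the sketch below copies the headers that a SIP response takes over from its request (Via, From, To, Call-ID and CSeq) and replaces only the start line and the body. The SipMessage struct is a deliberately simplified stand-in for the real class, and the signature differs from the one listed above: a reason phrase is added and the URI argument is left out.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical message type; the real SipMessage is richer.
struct SipMessage {
    std::string startLine;
    std::vector<std::pair<std::string, std::string>> headers;
    std::string body;
};

// Builds a response that reuses the request's identifying headers.
// Header names are matched case-sensitively, which is a simplification.
SipMessage convertReqToResp(const SipMessage& req, int code,
                            const std::string& reason,
                            const std::string& body) {
    static const char* kCopied[] = {"Via", "From", "To", "Call-ID", "CSeq"};
    SipMessage resp;
    resp.startLine = "SIP/2.0 " + std::to_string(code) + " " + reason;
    for (const auto& header : req.headers) {
        for (const char* name : kCopied) {
            if (header.first == name) {
                resp.headers.push_back(header);  // keep the original order
                break;
            }
        }
    }
    resp.headers.emplace_back("Content-Length", std::to_string(body.size()));
    resp.body = body;
    return resp;
}
```

Headers that only make sense in the request (Subject in the test data, for instance) are dropped, which is why a request handler can obtain a valid skeleton for its reply in one call.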
AppManager class
The AppManager class takes most of the data from the Graphical User Interface,
and primarily holds constants for the supported applications.
void AppManager::newUser(new_uri)
Adds a new Contact to the local database (used for adding the default
contact, root@hostname).
void AppManager::registerUser(user_uri, contact_uri)
Prepares a new REGISTER request and calls UserAgent::sendRegister().
void AppManager::deRegisterUser(user_uri, contact_uri)
As above, but sets the appropriate flag for UserAgent::sendRegister().

CallId AppManager::newSession(to_uri, application)
Prepares a new INVITE request and calls UserAgent::sendInvite().
void AppManager::cancelSession(call_id)
Prepares a new CANCEL request and calls UserAgent::cancelSession(call_id).
void AppManager::endSession(call_id)
Prepares a new BYE request and calls UserAgent::endSession(call_id).
void AppManager::startApp(application, call_id)
Launches the appropriate application for the call.
void AppManager::closeApp(application)
Closes the specific application.
SessionDb class
This class contains a database of Sessions and provides access functions
that also deal with synchronization issues.
SessionObj* const SessionDb::newSessionObj(uri, call_id)
Creates a new Session object for a new call.
SessionObj* const SessionDb::getSessionObj(call_id)
Retrieves a Session object for the particular call, gaining exclusive access
to its data.
void SessionDb::releaseSessionObj(call_id)
Releases the Session object, making it available to any thread waiting to
retrieve it.
UserObj* const SessionDb::newUser(uri, call_id)
Creates a new User object as part of the call.
UserObj* const SessionDb::getUser(uri)
Retrieves the User object associated with the particular URI.
void SessionDb::deleteUser(uri)
Removes the User object associated with the particular URI.
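The exclusive-access behaviour of getSessionObj() and releaseSessionObj() can be sketched with a mutex and a condition variable. This is only one possible realization under stated assumptions: the thesis does not say which synchronization primitives are used, the inUse flag is hypothetical, and the User-related functions are left out.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>

struct SessionObj {
    std::string uri;
    bool inUse = false;  // true while some thread holds the object
};

class SessionDb {
public:
    // Creates (or resets the URI of) the session for this call ID.
    SessionObj* newSessionObj(const std::string& uri, const std::string& callId) {
        std::lock_guard<std::mutex> guard(mutex_);
        SessionObj& session = sessions_[callId];
        session.uri = uri;
        return &session;
    }

    // Blocks until no other thread holds the session, then claims it.
    // Returns nullptr if the call ID is unknown.
    SessionObj* getSessionObj(const std::string& callId) {
        std::unique_lock<std::mutex> lock(mutex_);
        auto it = sessions_.find(callId);
        if (it == sessions_.end()) return nullptr;
        SessionObj& session = it->second;
        released_.wait(lock, [&session] { return !session.inUse; });
        session.inUse = true;
        return &session;
    }

    // Gives the session back and wakes any thread waiting to retrieve it.
    void releaseSessionObj(const std::string& callId) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            auto it = sessions_.find(callId);
            if (it == sessions_.end()) return;
            it->second.inUse = false;
        }
        released_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable released_;
    std::map<std::string, SessionObj> sessions_;  // node addresses stay stable
};
```

Every getSessionObj() must be paired with a releaseSessionObj(); a handler that returns early without releasing would block all later accesses to that call.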
TpSocket class
The TpSocket class acts as a socket, either receiving or sending messages. Its
variables include:
string remote_host
int port
bool remote_host_status
bool socket_status
The last two variables mentioned are used for determining whether the remote
host is valid, and whether the socket is sending, receiving, or inactive,
respectively.

void TpSocket::startSend(sip_message, number_of_retransmissions)
Starts sending the corresponding message, retrying as appropriate.
void TpSocket::sendOnce(sip_message)
For messages that only need to be sent once (like optional ACKs).
sip_message TpSocket::recv()
Listens for a new message; once it is received, parses it into the
appropriate data structure.
void TpSocket::terminateRecv()
Stops listening for messages.
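The thesis does not spell out the timing used by startSend() when retrying. A common choice for SIP over UDP is the backoff of RFC 3261, where the waiting time starts at T1 = 500 ms and doubles up to T2 = 4 s. The sketch below computes such a schedule; the function name retransmitSchedule() is hypothetical, and whether TpSocket uses exactly these values is an assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Waiting time before each retransmission: starts at t1 and doubles,
// but never exceeds t2. The defaults follow RFC 3261 (T1 and T2).
std::vector<std::chrono::milliseconds> retransmitSchedule(
        int numberOfRetransmissions,
        std::chrono::milliseconds t1 = std::chrono::milliseconds(500),
        std::chrono::milliseconds t2 = std::chrono::milliseconds(4000)) {
    std::vector<std::chrono::milliseconds> waits;
    std::chrono::milliseconds interval = t1;
    for (int i = 0; i < numberOfRetransmissions; ++i) {
        waits.push_back(interval);
        interval = std::min(interval * 2, t2);  // exponential backoff with a cap
    }
    return waits;
}
```

A sending loop would transmit the message, wait for the next interval in the list, and stop early as soon as a matching response arrives.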
MyDialPad class
This class is the main component of the Graphical User Interface. Most of its
functions make calls to the lower layers. Its data structures are primarily the
widgets appearing on the screen; it therefore retrieves most of the necessary
data from their contents through QT-specific access functions.
void MyDialPad::clickedRegister()
Brings up the Login dialog, checks the data entered, and if no errors exist
it calls register().
void MyDialPad::register()
Initiates a registration request by calling AppManager::registerUser().

void MyDialPad::clickedCall()
Checks the data entered, and if no errors exist it changes the button
showing "Call" to show "Hang up", and it calls one of the two following
functions.
void MyDialPad::call(callee)
This is called if the callee corresponds to a PSTN number. It places a call
to AppManager::newSession().
void MyDialPad::call(callee_ID, callee_hostname)
This is called if the callee is a SIP URI. It places a call to
AppManager::newSession().
void MyDialPad::clickedRedial()
Brings up the Redial dialog, and shows the corresponding calls.
void MyDialPad::clickedProxy()
Brings up the Proxy dialog.
void MyDialPad::showMessage(message)
Shows the corresponding message on the dialpad's screen. Called from
lower layers of software.
void MyDialPad::callEnded(message)
Shows the corresponding message on the screen, and changes the button
showing "Hang up" to show "Call" again.

void MyDialPad::haveACall(caller, lock)
Called by UserAgent::handleInviteRequest(). Shows the Incoming Call
dialog, which displays the caller's identity. Once the user accepts
or rejects the call, the lock (the semaphore the User Agent has requested)
is released and control returns to the User Agent.
void MyDialPad::updateRedialList()
After having placed or received a call, it updates the redial list.
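The choice that clickedCall() makes between the two call() overloads depends on whether the text entered is a PSTN number or a SIP URI. The sketch below shows one way to make that decision. The rules used here (an '@' marks a SIP URI, while digits with optional '+' and '-' mark a PSTN number) are assumptions, as are the names Callee and parseCallee().

```cpp
#include <cctype>
#include <string>

// Result of inspecting the text typed into the dialpad (hypothetical type).
struct Callee {
    enum Kind { Pstn, SipUri, Invalid } kind;
    std::string user;      // phone number, or the user part of a SIP URI
    std::string hostname;  // empty for PSTN numbers
};

Callee parseCallee(const std::string& input) {
    std::string text = input;
    if (text.compare(0, 4, "sip:") == 0) text.erase(0, 4);  // optional scheme

    const std::string::size_type at = text.find('@');
    if (at != std::string::npos) {
        // Both the user part and the host part must be non-empty.
        if (at == 0 || at + 1 == text.size()) return {Callee::Invalid, "", ""};
        return {Callee::SipUri, text.substr(0, at), text.substr(at + 1)};
    }

    if (text.empty()) return {Callee::Invalid, "", ""};
    for (unsigned char c : text) {
        if (!std::isdigit(c) && c != '+' && c != '-')
            return {Callee::Invalid, "", ""};
    }
    return {Callee::Pstn, text, ""};
}
```

A result of kind Pstn would lead to call(callee), a result of kind SipUri to call(callee_ID, callee_hostname), and Invalid to an error message on the dialpad's screen.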
Authentication class
The Authentication class is wrapped around the MD5 algorithm and provides an
API for its functionality. Its most used functions are the following:
Authentication::Authentication(nonce, realm, URI)
Constructs the object as appropriate; called by UserAgent::sendInvite() or
UserAgent::sendRegister() upon receipt of a challenge.
void Authentication::performDigest(username, password)
Calculates the response based on the data it already has and the
username and password.
string Authentication::getField()
Returns the string that will be placed within the response message.
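For reference, the computation behind performDigest() can be sketched as follows, assuming the basic form of HTTP Digest authentication without the qop extension: response = MD5(HA1:nonce:HA2), with HA1 = MD5(username:realm:password) and HA2 = MD5(method:URI). The free functions md5Hex() and digestResponse() are hypothetical stand-ins for the Authentication class, and the method argument is an addition that the class listing above does not show.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>

using u32 = std::uint32_t;

// Compact MD5 (RFC 1321) returning the digest as 32 lowercase hex digits.
std::string md5Hex(const std::string& msg) {
    static const u32 S[64] = {
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5, 9,  14, 20, 5, 9,  14, 20, 5, 9,  14, 20, 5, 9,  14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21};
    u32 K[64];  // K[i] = floor(2^32 * |sin(i + 1)|)
    for (int i = 0; i < 64; ++i)
        K[i] = static_cast<u32>(std::floor(std::fabs(std::sin(i + 1.0)) * 4294967296.0));

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (LE).
    std::string data = msg;
    const std::uint64_t bitLen = static_cast<std::uint64_t>(msg.size()) * 8;
    data.push_back(static_cast<char>(0x80));
    while (data.size() % 64 != 56) data.push_back('\0');
    for (int i = 0; i < 8; ++i)
        data.push_back(static_cast<char>((bitLen >> (8 * i)) & 0xff));

    u32 a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476;
    for (std::size_t off = 0; off < data.size(); off += 64) {
        u32 M[16];
        for (int i = 0; i < 16; ++i) {
            M[i] = 0;
            for (int j = 0; j < 4; ++j)
                M[i] |= static_cast<u32>(static_cast<unsigned char>(data[off + 4 * i + j])) << (8 * j);
        }
        u32 A = a0, B = b0, C = c0, D = d0;
        for (int i = 0; i < 64; ++i) {
            u32 F;
            int g;
            if (i < 16)      { F = (B & C) | (~B & D); g = i; }
            else if (i < 32) { F = (D & B) | (~D & C); g = (5 * i + 1) % 16; }
            else if (i < 48) { F = B ^ C ^ D;          g = (3 * i + 5) % 16; }
            else             { F = C ^ (B | ~D);       g = (7 * i) % 16; }
            F = F + A + K[i] + M[g];
            A = D; D = C; C = B;
            B = B + ((F << S[i]) | (F >> (32 - S[i])));  // rotate left by S[i]
        }
        a0 += A; b0 += B; c0 += C; d0 += D;
    }

    static const char* kHex = "0123456789abcdef";
    const u32 words[4] = {a0, b0, c0, d0};
    std::string out;
    for (u32 word : words)
        for (int i = 0; i < 4; ++i) {
            const unsigned byte = (word >> (8 * i)) & 0xff;
            out.push_back(kHex[byte >> 4]);
            out.push_back(kHex[byte & 0xf]);
        }
    return out;
}

// Digest response without qop: MD5(HA1:nonce:HA2).
std::string digestResponse(const std::string& username, const std::string& realm,
                           const std::string& password, const std::string& method,
                           const std::string& uri, const std::string& nonce) {
    const std::string ha1 = md5Hex(username + ":" + realm + ":" + password);
    const std::string ha2 = md5Hex(method + ":" + uri);
    return md5Hex(ha1 + ":" + nonce + ":" + ha2);
}
```

The password itself never appears in the message; only the resulting hash is placed in the authorization field, which is the string getField() would return.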
RtpAudio class
This class is responsible for managing the audio streams and the underlying RTP
stack.
int RtpAudio::openAudioDevice(device_name)
Opens the device with the appropriate permissions.
int RtpAudio::setAudioParams(parameters)
Sets the parameters for the device opened.
int RtpAudio::startSound()
Launches the two threads responsible for sending and receiving RTP
packets.
int RtpAudio::stopSound()
Terminates the two threads.
void RtpAudio::actualReadSend()
This function contains the code for the sending thread. It reads from the
device's buffer data coming through the microphone, performs the
appropriate conversions and passes the data to the RTP stack.
void RtpAudio::actualRecvPlay()
This function contains the code for the receiving thread. It gets the
incoming data from the RTP stack and plays it through the audio output.
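To make clearer what the RTP stack adds to each block of audio handed over by the sending thread, the sketch below packs the fixed 12-byte RTP header defined in RFC 3550. It illustrates the wire format only; the function packRtpHeader() is hypothetical, and the stack used by the SipPhone builds this header internally.

```cpp
#include <array>
#include <cstdint>

// Packs the fixed RTP header (RFC 3550) with no CSRC list, padding or
// extension. All multi-byte fields are in network (big-endian) byte order.
std::array<std::uint8_t, 12> packRtpHeader(std::uint8_t payloadType, bool marker,
                                           std::uint16_t sequence,
                                           std::uint32_t timestamp,
                                           std::uint32_t ssrc) {
    std::array<std::uint8_t, 12> header{};
    header[0] = 0x80;  // version 2, P = 0, X = 0, CC = 0
    header[1] = static_cast<std::uint8_t>((marker ? 0x80 : 0x00) | (payloadType & 0x7f));
    header[2] = static_cast<std::uint8_t>(sequence >> 8);
    header[3] = static_cast<std::uint8_t>(sequence & 0xff);
    for (int i = 0; i < 4; ++i) {
        header[4 + i] = static_cast<std::uint8_t>(timestamp >> (24 - 8 * i));
        header[8 + i] = static_cast<std::uint8_t>(ssrc >> (24 - 8 * i));
    }
    return header;
}
```

For 8 kHz audio sent in 20 ms packets, for example, the sequence number grows by 1 and the timestamp by 160 from one packet to the next; payload type 0 denotes G.711 mu-law in the standard RTP audio profile.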

Appendix III - Maintenance and how-to for future work
The nature of this system is such that it can serve as a platform for even more
functionality. In order to make these additions possible, every effort has been
made to produce a modular design that separates the system components
appropriately. On the other hand, the complexity inherent in the SIP protocol will
require widespread changes in the User Agent class in the case of SIP-specific
extensions. For this purpose, the developer is encouraged to follow the path of
each SIP request and response as is described in the documentation found both
in this document and in the software release. In particular, to support
additional headers in the SIP messages, one need only create a subclass of
FieldVal and implement the necessary code within that class (at a minimum, the
function getField(), which returns the string part of the message).
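A new header could then look like the following sketch. The shape of FieldVal shown here (a single pure virtual getField()) is an assumption, since the real base class is not reproduced in this document, and SubjectField is a hypothetical example; whether getField() should include the trailing line break is likewise left open.

```cpp
#include <string>

// Assumed shape of the base class; the real FieldVal may declare more.
class FieldVal {
public:
    virtual ~FieldVal() {}
    // Returns the text of this header as it appears in the message.
    virtual std::string getField() const = 0;
};

// Hypothetical new header: the Subject of a call.
class SubjectField : public FieldVal {
public:
    explicit SubjectField(const std::string& text) : text_(text) {}
    std::string getField() const override { return "Subject: " + text_; }

private:
    std::string text_;
};
```

Code that assembles a message only needs a reference to FieldVal, so it can emit the new header without knowing about SubjectField.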
For extensions that have limited interconnection with the SIP mechanisms, one
should be able to wrap them around the current functionality without having to
perform a detailed analysis of the current system beyond the module or layer to
be changed.
Regarding the changes suggested for the parsing layer earlier in this document,
the developer is encouraged to keep the current design and implement a lighter
parsing layer, possibly by using any SIP and SDP parsing modules that may be
publicly available at the time.
It is likely that the need may arise to use a windowing environment other than
QT. The underlying system is developed in a way that does not use special QT-
related functions, and can adapt to a new Graphical User Interface by simply
replacing the functions being called by the underlying layers with their
corresponding ones.
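One way to organize such a replacement is to collect the GUI functions called from the lower layers behind a small interface, so that only its implementation changes with the toolkit. The sketch below is a suggestion rather than a description of the current code: UserInterface, RecordingInterface and reportBye() are hypothetical names, and only showMessage() and callEnded() are taken from the MyDialPad listing in Appendix II.

```cpp
#include <string>
#include <vector>

// Hypothetical interface gathering the GUI entry points used by lower layers.
class UserInterface {
public:
    virtual ~UserInterface() {}
    virtual void showMessage(const std::string& message) = 0;
    virtual void callEnded(const std::string& message) = 0;
};

// Toolkit-free implementation that records what it would have displayed.
class RecordingInterface : public UserInterface {
public:
    void showMessage(const std::string& message) override {
        lines.push_back(message);
    }
    void callEnded(const std::string& message) override {
        lines.push_back(message);
        callActive = false;  // stands in for switching "Hang up" back to "Call"
    }

    std::vector<std::string> lines;
    bool callActive = true;
};

// Stand-in for a lower layer reporting that the remote party hung up.
void reportBye(UserInterface& ui, const std::string& peer) {
    ui.showMessage("BYE received from " + peer);
    ui.callEnded("Call ended");
}
```

A QT-based dialpad and a replacement for another windowing environment would both implement the same two functions, leaving the User Agent code untouched.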

Similarly, the system does not use any specific operations the Zaurus provides,
so porting it to another device can be relatively straightforward, possibly requiring
a different cross-compiler than the one used.
Supporting applications other than audio should be equally simple: audio
characteristics are currently contained in an object (AppObj), so a
video-conferencing application, for instance, may be supported in a similar
manner.
Finally, the author can be reached at thanoskosmidis@yahoo.com and would be
happy to answer any questions that will help maintain or extend the system.

Appendix IV - Zaurus specifications
The following are the relevant specifications for Sharp's Zaurus SL-5500, taken
from the unofficial "Frequently Asked Questions" [24].
The SL-5500 has:
- 206 MHz StrongARM SA-1110 CPU
- 64 MB SDRAM
- 16 MB ROM (flash)
- 240x320 pixels, 16 bits-per-pixel reflective 3.5" LCD display with front light
- Linux 2.4 kernel
- Personal Java
- Qtopia (Trolltech's QT/Embedded, plus the applications) [23]
- Stereo audio out, Mono audio in (1/8" headphone jack)
- On-screen handwriting recognition (with word completion)
- On-screen keyboard (with word completion)
- On-screen letter pickboard (with word completion)
- Unicode character picker
- 37-key QWERTY thumbpad/keyboard
- IR port
- USB/Serial docking station
- Compact Flash Type II expansion port
- Secure Digital / MultiMedia Card (MMC) expansion port
- Rechargeable, removable lithium-ion battery.

Appendix V - Related Work
Innovation and research in related fields have resulted in a lot of new products
and initiatives, some of which are outlined in this section.
- Cisco Systems, Inc., through its AVVID (Architecture for Voice, Video, and
  Integrated Data), is offering both software and hardware-based services
  using SIP or H.323. Similar, primarily software-based products are offered
  by Avaya, dynamicsoft, and others.
- SpectraLink Corporation (http://www.spectralink.com) has released an
  802.11b wireless phone that uses the H.323 suite of protocols. It can
  integrate with Cisco Systems' IP telephony software for voice
  communication.
- Vonage Holdings Corporation (http://www.vonage.com) has Voice-over-IP
  telephony services targeting broadband users, offering traditional phone
  line capabilities using SIP.
- Net2Phone, Inc. (http://www.net2phone.com) is considered the leading
  VoIP provider to the end user.
- Audiovox Communications Corporation (http://www.audiovox.com) has
  released the Thera, a Pocket-PC based PDA with a built-in CDMA phone,
  which will become available in the summer of 2002.
- Research in Motion Ltd. released the Blackberry 5810 in April 2002, a
  wireless handheld with optional GSM service. Samsung released the
  SPH-E120 in March 2002, a CDMA2000-compliant phone with PDA
  functionality based on the Palm operating system. About a year prior to
  Samsung, Kyocera had released the QCP-6035, which only differs in the
  version of CDMA it supports (800/1900).
- The OpenH323 project (http://www.openh323.org) is an open-source
  initiative that aims to provide an integrated solution for various desktop
  platforms. Another open-source project (http://www.pocketbone.com)
  emerged from OpenH323 and focused on videoconferencing on the
  Pocket-PC platform in particular.
As can be deduced from the above, most projects use a cellular telephony
platform and create the hardware in a way that accommodates PDA functionality
(like address books, to-do lists, appointments etc.). Additionally, it is expected
that the third generation cellular phones will be hosting applications capable of
providing similar functions.
While these offerings will appeal to the established base of cellular
telephone users, the increasing number of 802.11 installations will make systems
like the one presented in this document more suitable. For instance,
organizations with Wireless LAN installations on their campuses will be able to
provide continuous communication among their employees through their
personal devices.

Appendix VI - General Public License
Copyright © 2002, Athanasios P. Kosmidis. All rights reserved.
License to copy and use this software, which is derived from the Wavelink
system created by N. Gupta, V. Keswani, H. Mak, R. Narjala and A. Pavuluri, is
granted provided that it is identified as the "INI SipPhone created by Athanasios
P. Kosmidis" in all material mentioning or referencing this software or this
function.
License is also granted to make and use derivative works provided that such
works are identified as "derived from the INI SipPhone created by Athanasios P.
Kosmidis" in all material mentioning or referencing the derived work.
Athanasios P. Kosmidis makes no representations concerning either the
merchantability of this software or the suitability of this software for any particular
purpose. It is provided "as is" without express or implied warranty of any kind.
These notices must be retained in any copies of any part of this documentation
and/or software.