For most people, using an ordinary phone
is a common daily occurrence, as is listening to digitally
recorded music on a favorite CD. Having your voice transmitted
in data packets is only a small extension of these technologies.
Voice was originally carried through the phone network
as an analog signal, but this has been replaced
in much of the world by digital networks. Although many of our
phones are still analog, the network that carries that voice has
become digital.
In today's phone networks, the analog voice going into our analog
phones is digitized as it enters the phone network. This digitization
process, shown in Figure 1 below, records a sample of the loudness
(voltage) of the signal at fixed intervals of time. These digital
voice samples travel through the network one byte at a time.
Figure 1. Digital Sampling of an analog voice signal
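The sampling process described above can be sketched in a few lines of Python. This is a simplified illustration rather than the exact encoding used in the phone network: it assumes 8,000 samples per second and a linear mapping of voltage to one byte, whereas telephone networks use a logarithmic (companded) mapping. The function names are made up for this example.

```python
import math

SAMPLE_RATE_HZ = 8000  # samples per second (assumed telephone-grade rate)


def sample_tone(freq_hz, duration_s, amplitude=1.0):
    """Sample a sine tone at fixed intervals; quantize each sample to one byte."""
    n_samples = round(SAMPLE_RATE_HZ * duration_s)
    levels = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE_HZ                                     # time of this sample
        voltage = amplitude * math.sin(2 * math.pi * freq_hz * t)  # range -1..+1
        levels.append(round((voltage + 1.0) / 2.0 * 255))          # map to 0..255
    return bytes(levels)


def reconstruct(encoded):
    """Turn each byte back into a voltage, as the destination device would."""
    return [b / 255 * 2.0 - 1.0 for b in encoded]


# One millisecond of a 1 kHz tone becomes 8 one-byte samples.
voice_bytes = sample_tone(freq_hz=1000, duration_s=0.001)
print(list(voice_bytes))
```

Because each voltage is rounded to one of 256 levels, the reconstructed signal is a close approximation of the input rather than an exact copy.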
At the destination phone line, each byte
is put into a device (a digital-to-analog converter) that takes the
voltage number and produces
that voltage for the destination phone. Since the output signal
closely matches the input signal, we can understand what was originally
said.
The next step in the evolution of that technology is to take the numbers
that represent the voltage and group them together in a data packet, similar to
the way computers send and receive information over the Internet.
Voice over IP is the technology of taking units of sampled speech
data and using an IP (Internet Protocol) data packet to carry
the information to its destination.
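The grouping of samples into packets can be illustrated with a short sketch. It assumes one-byte samples at 8,000 per second and a 20 ms packet interval, and it uses a made-up `VoicePacket` structure with a sequence number and timestamp; it is not the layout of any real protocol header.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20                                            # assumed packet interval
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 one-byte samples


@dataclass
class VoicePacket:
    sequence: int    # lets the receiver detect loss and reordering
    timestamp: int   # index of the first sample carried in this packet
    payload: bytes   # the grouped voice samples


def packetize(samples: bytes):
    """Split a stream of one-byte samples into fixed-size packets."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_PACKET)):
        chunk = samples[start:start + SAMPLES_PER_PACKET]
        packets.append(VoicePacket(seq, start, chunk))
    return packets


def depacketize(packets):
    """Reassemble the sample stream, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.payload for p in ordered)
```

The sequence number is what allows the receiver to put packets back in order, something the byte-at-a-time phone network never has to do.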
So at its most basic level, the concept of VoIP is straightforward.
The complexity of VoIP lies in the many ways to represent the
data, the setup of the connection between the initiator of the call
and the receiver of the call, and the types of networks that carry
IP packets are not the only data packets used to carry voice.
Although they won't be discussed here, there are also Voice over Frame
Relay (VoFR) and Voice over ATM (VoATM) technologies. Many of
the VoIP issues being discussed also apply to the other packetized
voice technologies.
Carrying voice in packets
of bytes has advantages over sending individual
bytes of voice, as is done in the phone network today. Having voice
and data share the same network is one of the prime motivators
for businesses, since it can reduce expenses. Another advantage is
that data network equipment is significantly cheaper than the
equipment needed to multiplex many voice channels together onto a single
high-speed link.
What are some of the applications that use packetized voice?
Business phones that plug into Ethernet ports at the office (e.g. Nortel i2004)
Telephone conversations using the Cable TV system
Video Conferencing (e.g. H.323)
Digital Cell Phones (e.g. GSM)
Telephone conversations over the Internet (e.g. Net2Phone)
Large businesses are at the forefront
of deploying VoIP and have been able to justify the investment.
Moving a phone number between analog phones is a costly process (~$100/phone),
but for IP phones it only requires the user to take their phone
from the current location and plug it into a data port at their
new location.
VoIP phones can be tied into larger systems to gain the benefit
of sophisticated call services controlled by a computer system.
In call centers, the call and the data for the call are instantly
correlated. Customer contact management for sales personnel is
so much easier because the contact database can automatically
make the call and the computer can simultaneously pull up current
orders, backorders and tie that information into the company inventory
Within the backbone transport systems, VoIP has also been shown
to be cost effective. Bringing many low-speed (64
kbit/s) lines together onto an OC-48 (2.4 Gbit/s) link is expensive
in TDM systems. In data networks, multiplexing low-speed links
onto high-speed links is very economical.
On the consumer side, the cable system operators are deploying
VoIP technology in the home using the PacketCable architecture
These applications are the driving factors that allow manufacturers
to make equipment, service providers to offer services, and customers
to increase their productivity. VoIP technology only becomes
useful when compelling applications meet the needs of customers.
The chapter Compelling VoIP Applications starting on page 10 provides
more details on these applications and the advantages for the
The seminar on VoIP Applications has more details about these applications.
Why has VoIP deployment been so slow?
Although the benefits of packetized voice outweigh the disadvantages, it should
be noted that these disadvantages have contributed to the slow
adoption of the technology. Among the disadvantages are:
For any given compression algorithm, voice packets take
extra bytes. The IP and transport headers (typically UDP and RTP
for voice, rather than TCP) add bytes not carried in the current voice networks.
To get lower bandwidth, the voice compression
algorithms and echo cancellation require additional processing
power, which makes digital phones more expensive than analog
phones.
The data networks have had difficulty providing
low enough delay and high enough reliability that customers
As a practical matter, VoIP can't be deployed
instantly everywhere, which means that there must be connections
between the current voice networks and VoIP networks. Standardizing
the way to interconnect these networks involves many
complexities, and the equipment can be costly.
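The first disadvantage in the list above, the extra header bytes, can be put into rough numbers. The sketch below assumes voice carried as RTP over UDP over IPv4 with minimum-size headers (12, 8, and 20 bytes) and one packet every 20 ms; link-layer framing is ignored, and the function name is invented for the illustration.

```python
IP_HEADER = 20    # bytes, IPv4 without options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed RTP header without extensions


def on_wire_kbps(codec_kbps, frame_ms=20):
    """Bandwidth of one voice stream including IP/UDP/RTP headers."""
    payload_bytes = codec_kbps * 1000 * frame_ms / 1000 / 8   # voice bytes per packet
    packet_bytes = payload_bytes + IP_HEADER + UDP_HEADER + RTP_HEADER
    packets_per_second = 1000 / frame_ms
    return packet_bytes * 8 * packets_per_second / 1000


for rate in (64, 8):
    print(f"{rate} kbit/s codec -> {on_wire_kbps(rate):.0f} kbit/s on the wire")
```

Under these assumptions a 64 kbit/s stream needs 80 kbit/s once headers are added, and an 8 kbit/s compressed stream needs 24 kbit/s, so the relative overhead grows as the voice is compressed more heavily.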
When businesses expand their usage of
VoIP technology and the consumers adopt this technology, the access
networks will finally be able to achieve the critical mass required
to allow the economical conversion to VoIP.
The seminar on VoIP
Problems has a more complete discussion of some of the challenges
of the technology.
The VoIP Technology
The main aspects
of VoIP technology to understand are: controlling the call,
encoding (digitizing) the voice, and interconnecting
with today's Public Switched Telephone Network (PSTN).
Controlling a Call
In today's PSTN, there are three types
of control (signaling) being performed for a call: supervision,
alerting, and addressing. Supervision monitors the state of the
phone which allows the central office to know when the receiver
has been picked up to make a call or when a call is terminated.
Alerting is the notification at the destination that a call is
present (ringing), along with simple call progress tones during a
call (e.g. busy signal and ringback). Addressing enables the user
to dial a specific phone anywhere in the world.
In VoIP, these same functions need to exist and they are invoked
by sending appropriate messages between the various elements that
control the call. There are also many extensions to each of the
control categories. There are advanced services such as Caller-ID,
Call Waiting, three way calling, and voice mail that need to be
provided in the VoIP system.
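One way to picture message-driven call control is as a small state machine. The sketch below is purely illustrative: the state names and event names are invented for this example and do not correspond to the messages of any particular VoIP signaling protocol.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    DIALING = auto()
    RINGING = auto()
    CONNECTED = auto()


# (current state, incoming message) -> next state
TRANSITIONS = {
    (CallState.IDLE, "off_hook"): CallState.DIALING,      # supervision
    (CallState.DIALING, "digits"): CallState.RINGING,     # addressing, then alerting
    (CallState.RINGING, "answer"): CallState.CONNECTED,   # supervision at the far end
    (CallState.DIALING, "on_hook"): CallState.IDLE,
    (CallState.RINGING, "on_hook"): CallState.IDLE,
    (CallState.CONNECTED, "on_hook"): CallState.IDLE,     # call terminated
}


def run_call(events):
    """Apply a sequence of control messages and return the final call state."""
    state = CallState.IDLE
    for event in events:
        # Messages that make no sense in the current state are ignored.
        state = TRANSITIONS.get((state, event), state)
    return state
```

Advanced services such as call waiting would add further states and messages to a table like this one.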
There are several approaches to digitizing
the voice samples. These approaches vary in the information that
is transmitted, the complexity of the algorithm, and the assumptions
made about the sound being transmitted (e.g. voice, fax, music). Different
applications select the best voice coding method based on what
needs to be accomplished, the amount of bandwidth that the underlying
network can supply, and how much the user wants to spend for the
The Pulse Code Modulation (PCM) algorithm for digitizing speech
makes no assumptions about the sound and therefore does the best
job on various types of sounds. It also produces the highest bit-rate
for the data and has the shortest delay. The various PCM algorithms,
which include ADPCM (Adaptive Differential Pulse
Code Modulation) and DPCM (Differential Pulse Code Modulation),
share the same basis: the algorithm samples the signal at fixed time intervals
(e.g. 8,000 times/second) and then generates a number based on
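The idea behind the differential variants can be shown with a small sketch. It is a deliberately simplified, lossless version: each output number is the difference from the previous sample. Real DPCM and ADPCM coders go further and quantize that difference into fewer bits, which is where the bandwidth saving comes from; that step is omitted here.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous sample."""
    previous = 0
    diffs = []
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for d in diffs:
        previous += d
        samples.append(previous)
    return samples


pcm = [100, 102, 105, 104, 101]   # made-up PCM sample values
print(dpcm_encode(pcm))           # after the first value, the numbers stay small
```

Because neighboring speech samples tend to be close in value, the differences are usually much smaller than the samples themselves and can be represented with fewer bits.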
Another way to encode speech is to use a model of the way people
generate speech. An algorithm such as Linear Predictive Coding
(LPC) models human speech production as an excitation source feeding a vocal
tract that has constrictions in it. People change the constriction
points to make various sounds. LPC uses a series of filters that
accomplish a similar task. In the LPC algorithm, the filter coefficients
and the excitation type are the only information that needs to
be transmitted.
LPC coding algorithms require a large amount of processing power
and provide the lowest data rate. LPC works well for sending human
speech sounds, not very well for music and it does not work at
all for transmitting fax (or computer modem) sounds.
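The receiving side of an LPC-style coder can be sketched as a filter driven by an excitation signal. This is a minimal illustration with made-up coefficient values and helper names: it shows only how sound is regenerated from coefficients plus an excitation, not how a real coder analyzes speech to find those coefficients, and practical coders also send details such as gain and pitch.

```python
def lpc_synthesize(coefficients, excitation):
    """All-pole filter: y[n] = e[n] + sum over k of a[k] * y[n-1-k]."""
    output = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(coefficients):
            if n - 1 - k >= 0:               # only use outputs that already exist
                y += a * output[n - 1 - k]
        output.append(y)
    return output


def pulse_train(length, period):
    """Voiced-style excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# A single impulse through a one-coefficient filter decays geometrically.
print(lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0]))
```

Changing the coefficients plays the role of changing the constriction points in the vocal tract, which is why a model like this suits speech but not fax or modem tones.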
A third type of algorithm represents sounds digitally
by their frequency content. Instead of sampling the
waveform in fixed units of time, the sound is represented in bands
of frequency. This works well for speech since vowels are mostly low
frequency and consonants are mostly high frequency. This third type
of algorithm is called a Sub Band Coder (SBC).
There are also algorithms that use a mixture of these approaches
and produce adequate sound quality at medium bit rates. An
example of such a hybrid coder is the Code Excited Linear Prediction
(CELP) coder.
The following table provides a quick summary of the main voice
coding algorithms. The Mean Opinion Score (MOS) is a subjective
number indicating how people feel about the quality of the voice
signal for that algorithm (higher is better). G.711 is the reference
point and this coding algorithm is used in today's public network.
Table 1. Voice Coding Standards
For more information on these coding standards, please see Voice
Coding Algorithms.
Interconnection to the PSTN
VoIP networks and the PSTN (Public Switched
Telephone Network) in many instances must work together to deliver
a phone call. Connecting these networks together has proved to
be very difficult because of the many different types of systems
involved, the many different types of interconnections possible,
and the billing/regulatory issues associated with combining regulated
and non-regulated networks.
Figure 2. Many combinations of the VoIP network are possible
As shown in Figure 2, there are many
combinations of networks and devices. These combinations require
the existing PSTN network control systems to communicate with
the data network control systems.
As an example, assume a call is going from a VoIP phone
to an analog phone in a different city. The VoIP phone dials the
destination phone number. The local data network only knows about
its own local IP addresses and so forwards the call to the access network,
which in this case is a citywide VoIP network. Not knowing where
that 10-digit phone number is located, the city IP network needs
to locate the destination city and then find a data network (if
one exists) to get to that particular city.
In the destination city, the data network needs to query the local
PSTN analog system and convert the Connect IP messages to the
proper signaling message for that type of voice switch. The voice
switch then checks to see if the phone is busy and, if so, sends
a message back to the IP network. Eventually, some piece of equipment
needs to generate a busy signal waveform to send to the VoIP phone.
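The step of locating a network that can reach the dialed number can be pictured as a lookup keyed by the leading digits of the phone number. The sketch below is hypothetical throughout: the routing table, the gateway host names, and the longest-prefix rule are assumptions chosen for illustration, not a description of how any deployed network resolves numbers.

```python
# Hypothetical table: leading digits of a phone number -> next-hop gateway.
ROUTES = {
    "212": "gateway-nyc.example.net",
    "415": "gateway-sf.example.net",
    "415555": "ip-pbx-sf.example.net",   # a more specific route wins
}


def find_next_hop(phone_number, routes=ROUTES):
    """Return the gateway for the longest matching digit prefix, or None."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    for length in range(len(digits), 0, -1):
        hop = routes.get(digits[:length])
        if hop is not None:
            return hop
    return None   # no data network reaches that number; hand off to the PSTN


print(find_next_hop("415-555-0123"))
```

A `None` result corresponds to the case in the example where no data network exists to the destination city and the call has to leave the IP network.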
Converting between IP addresses and 10-digit phone numbers is
not a trivial process and involves many steps. In the example
shown, the call may be a long distance call subject to certain
rate charges, or this may be just a local call. Knowing the underlying
regulatory structure is required to provide proper billing.
The IP addresses for the devices may not even be a public address.
In many cases, the temporary IP addresses assigned to devices
are useable only within the company. In the presence of these
private addresses or company firewalls, how does an outsider know
how to reach a VoIP phone within a company? What happens when
the temporary IP address of the VoIP phone changes?
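Whether a phone's address is usable from outside the company can be checked with Python's standard `ipaddress` module. The helper name below is made up for this sketch, and a public address is only a necessary condition: a firewall may still block the call.

```python
import ipaddress


def has_public_address(address):
    """True if the address is not in a private or otherwise non-public range."""
    return not ipaddress.ip_address(address).is_private


for addr in ("192.168.1.20", "10.0.0.7", "8.8.8.8"):
    print(addr, "public" if has_public_address(addr) else "private")
```

A phone with a private address needs some intermediary at the company boundary to relay calls to it, which is one of the problems the paragraph above raises.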
While these problems have solutions, the difficulty of standardizing on a common
solution has delayed the deployment of equipment.
For more information on VoIP to the Public
network, please see VoIP
Additional VoIP seminars:
Voice Coding Algorithms - A description of the various methods for digitizing speech.
VoIP Applications - The VoIP technology only becomes
useful when compelling applications meet the needs of customers.
The corporate, cable telephony, and video conferencing applications
VoIP Problems - Deployment of VoIP has been slower than expected because of problems with underlying networks, standardization issues, and network control devices.
Voice over IP carries digitized speech in IP
packets.
The major applications for VoIP are in corporate
LANs, cable telephony, and video conferencing.
VoIP has been slow to deploy because of difficulties
in getting the underlying networks to deliver reliable service and a lack
of standardization in connecting to the existing public network.