Seminar - An Introduction to Voice over IP (VoIP)

Introduction

For most people, using an ordinary phone is a common daily occurrence, as is listening to digitally recorded music on a favorite CD. Transmitting your voice in data packets is only a small extension of these technologies. The phone network originally carried voice as an analog signal, but in much of the world this has been replaced by digital networks. Although many of our phones are still analog, the network that carries the voice has become digital.

In today's phone networks, the analog voice going into our analog phones is digitized as it enters the phone network. This digitization process, shown in Figure 1 below, records a sample of the loudness (voltage) of the signal at fixed intervals of time. These digital voice samples travel through the network one byte at a time.

Figure 1. Digital Sampling of an analog voice signal

At the destination phone line, each byte is fed into a device that converts the number back into the corresponding voltage for the destination phone. Since the output signal closely matches the input signal, we can understand what was originally spoken.
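The sampling and playback steps described above can be sketched in a few lines of Python. This is only an illustration: the rate of 8,000 samples per second and one byte per sample match telephone practice, but the linear mapping from voltage to a byte, the function names, and the 1 kHz test tone are assumptions made here (real telephone networks use a logarithmic scale rather than a linear one).

```python
import math

SAMPLE_RATE_HZ = 8000   # samples per second for telephone speech
LEVELS = 256            # one byte per sample gives 256 possible values


def sample_and_quantize(signal, duration_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sample signal(t), expected in the range -1.0..1.0, into bytes 0..255."""
    count = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(count):
        t = n / sample_rate_hz
        level = max(-1.0, min(1.0, signal(t)))   # clip the "voltage"
        # Linear mapping of -1.0..1.0 onto 0..255 (an assumed simplification).
        samples.append(round((level + 1.0) / 2.0 * (LEVELS - 1)))
    return bytes(samples)


def reconstruct(data):
    """Turn each byte back into a voltage in the range -1.0..1.0."""
    return [b / (LEVELS - 1) * 2.0 - 1.0 for b in data]


def tone(t):
    """Hypothetical input signal: a 1 kHz test tone."""
    return math.sin(2 * math.pi * 1000 * t)


pcm = sample_and_quantize(tone, duration_s=0.001)   # 1 ms of signal
voltages = reconstruct(pcm)
```

One millisecond of signal produces eight bytes, and each reconstructed voltage differs from the original by at most half a quantization step.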

The evolution of that technology is to take the numbers that represent the voltage and group them together in a data packet, similar to the way computers send and receive information over the Internet. Voice over IP is the technology of taking units of sampled speech data and using an IP (Internet Protocol) data packet to carry the information to its destination.
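As a rough sketch of this grouping step, the Python below splits a stream of one-byte samples into packets and reassembles them at the far end. The packet size of 160 samples (20 ms of speech at 8,000 samples per second), the sequence number, and the function names are assumptions chosen for illustration; real VoIP packets also carry protocol headers that are not modeled here.

```python
SAMPLES_PER_PACKET = 160   # assumed: 20 ms of speech at 8,000 samples/second


def packetize(pcm, samples_per_packet=SAMPLES_PER_PACKET):
    """Split a byte stream of samples into (sequence_number, payload) pairs."""
    packets = []
    for seq, start in enumerate(range(0, len(pcm), samples_per_packet)):
        packets.append((seq, pcm[start:start + samples_per_packet]))
    return packets


def depacketize(packets):
    """Put packets back in sequence order and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))


speech = bytes(range(256)) * 2            # 512 stand-in sample bytes
packets = packetize(speech)
restored = depacketize(reversed(packets)) # arrival order does not matter
```

The sequence number is what lets the receiver rebuild the original order even when the network delivers packets out of order.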

So at its most basic level, the concept of VoIP is straightforward. The complexity of VoIP comes in the many ways to represent the data, setting up the connection between the initiator of the call and the receiver of the call, and the types of networks that carry the call.

IP packets are not the only way to carry voice in data packets. Although they are not discussed here, voice over Frame Relay (VoFR) and voice over ATM (VoATM) technologies also exist. Many of the VoIP issues discussed here also apply to these other packetized voice technologies.

Applications

There are advantages to using a packet of bytes representing the voice compared to sending individual bytes of voice as is done in the phone network today. Having voice and data share the same network is one of the prime motivators for business since it can reduce expenses. Another advantage is that data network equipment is significantly cheaper than the equipment to multiplex many voice channels together onto a single high speed link.

What are some of the applications that use packetized voice?

Business phones that plug into Ethernet ports at the office (e.g. Nortel i2004)
Telephone conversations using the Cable TV system (e.g. PacketCable)
Video Conferencing (e.g. H.323)
Digital Cell Phones (e.g. GSM)
Telephone conversations over the Internet (e.g. Net2Phone)

Large businesses are at the forefront of deploying VoIP and have been able to justify the investment. Moving phone numbers on analog phones is a costly process (~$100/phone) but for IP phones, it only requires the user to take their phone from the current location and plug it into a data port at their new location.

VoIP phones can be tied into larger systems to gain the benefit of sophisticated call services controlled by a computer system. In call centers, the call and the data for the call are instantly correlated. Customer contact management for sales personnel is much easier because the contact database can automatically make the call while the computer simultaneously pulls up current orders and backorders and ties that information into the company inventory control systems.

Within the backbone transport systems, VoIP has also been shown to be cost effective. The process of bringing many low-speed (64 kbits/sec) lines together onto OC-48 (2.4 Gbits/sec) is expensive in TDM systems. In data networks, multiplexing low-speed links onto high-speed links is very economical.

On the consumer side, the cable system operators are deploying VoIP technology in the home using the PacketCable architecture of CableLabs.

These applications are the driving factors in allowing manufacturers to make equipment, service providers to offer services, and customers to increase their productivity. The VoIP technology only becomes useful when compelling applications meet the needs of customers.

The seminar on VoIP Applications has more details about the VoIP applications.

Why has VoIP deployment been so slow?

While the benefits of packetized voice outweigh the disadvantages, it should be noted that these disadvantages have contributed to the slow adoption of the technology. Among the disadvantages are:

For any given compression algorithm, voice packets take extra bytes. Each packet carries IP and transport headers (typically UDP and RTP rather than TCP for the voice itself) that are not carried in the current voice networks.
The voice compression algorithms used to lower bandwidth, together with echo cancellation, require additional processing power that makes digital phones more expensive than analog phones.
Data networks have had difficulty providing the low delay and high reliability that customers expect.
As a practical matter, VoIP can't be deployed instantly everywhere which means that there must be connections between the current voice networks and VoIP networks. There are many complexities associated with standardizing the way to interconnect these networks and the equipment can be costly.
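The cost of the extra header bytes can be estimated with a short calculation. The sketch below assumes that the voice is carried in RTP over UDP over IPv4 with minimum-size headers (12 + 8 + 20 bytes) and ignores link-layer framing; the function name and the 20 ms packet interval are also assumptions made for illustration.

```python
IPV4_HEADER_BYTES = 20   # minimum IPv4 header, no options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12    # fixed RTP header, no extensions


def packet_bandwidth_kbps(codec_kbps, packet_ms):
    """Bandwidth on the wire for one direction of a call, headers included."""
    payload_bytes = codec_kbps * packet_ms / 8      # kbit/s * ms = bits
    header_bytes = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    return (payload_bytes + header_bytes) * 8 / packet_ms


uncompressed = packet_bandwidth_kbps(64, 20)   # 64 kbit/s voice, 20 ms packets
compressed = packet_bandwidth_kbps(8, 20)      # 8 kbit/s voice, 20 ms packets
```

Under these assumptions a 64 kbits/sec voice stream needs 80 kbits/sec on the wire, and an 8 kbits/sec stream needs 24 kbits/sec, so the headers weigh most heavily on the most compressed codecs.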

When businesses expand their usage of VoIP technology and the consumers adopt this technology, the access networks will finally be able to achieve the critical mass required to allow the economical conversion to VoIP.

The seminar on VoIP Problems has a more complete discussion of some of the challenges of the technology.

The VoIP Technology

The main aspects of understanding the VoIP technology are: controlling the call, methods of encoding (digitizing) the voice, and interconnection with today's Public Switched Telephone Network (PSTN).

Controlling a Call

In today's PSTN, there are three types of control (signaling) being performed for a call: supervision, alerting, and addressing. Supervision monitors the state of the phone, which allows the central office to know when the receiver has been picked up to make a call or when a call is terminated. Alerting is the notification at the destination that a call is present (ringing) and also simple call progress tones during a call (e.g. busy signal and ringback). Addressing enables the user to dial a specific phone anywhere in the world.

In VoIP, these same functions need to exist and they are invoked by sending appropriate messages between the various elements that control the call. There are also many extensions to each of the control categories. There are advanced services such as Caller-ID, Call Waiting, three way calling, and voice mail that need to be provided in the VoIP system.
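One way to picture the three control functions is as a small state machine driven by messages. The Python sketch below is purely illustrative: the state names, event names, and transition table are hypothetical and do not correspond to any particular VoIP signaling protocol.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    DIALING = "dialing"
    RINGING = "ringing"
    BUSY = "busy"
    CONNECTED = "connected"


# (current state, event) -> next state; all event names are hypothetical.
TRANSITIONS = {
    (CallState.IDLE, "off_hook"): CallState.DIALING,      # supervision
    (CallState.DIALING, "dialed_free"): CallState.RINGING,  # addressing, then alerting
    (CallState.DIALING, "dialed_busy"): CallState.BUSY,     # addressing, then busy tone
    (CallState.RINGING, "answer"): CallState.CONNECTED,   # supervision at far end
}


def next_state(state, event):
    """Apply one signaling event; unknown events leave the state unchanged."""
    if event == "on_hook":          # supervision: hanging up ends any call
        return CallState.IDLE
    return TRANSITIONS.get((state, event), state)


def run(events, state=CallState.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, `run(["off_hook", "dialed_free", "answer"])` ends in the connected state, while an `on_hook` event at any point returns the phone to idle.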

Coding Algorithms

There are several approaches to digitizing the voice samples. These approaches vary by the information that is transmitted, the complexity of the algorithm, and the assumptions of the sound being transmitted (e.g. voice, fax, music). Different applications select the best voice coding method based on what needs to be accomplished, the amount of bandwidth that the underlying network can supply, and how much the user wants to spend for the call.

The Pulse Code Modulation (PCM) algorithm for digitizing speech makes no assumptions about the sound and therefore does the best job on various types of sounds. It also produces the highest bit rate for the data and has the shortest delay. The basis of the various PCM algorithms, which include ADPCM (Adaptive Differential Pulse Code Modulation) and DPCM (Differential Pulse Code Modulation), is that the algorithm samples the signal at fixed time intervals (e.g. 8,000 times/second) and then generates a number based on each sample.
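The differential idea behind DPCM can be sketched as follows. This is a simplified, lossless illustration written for this seminar: practical DPCM and ADPCM coders also quantize the differences (and ADPCM adapts the step size), which is omitted here. The function names and sample values are assumptions.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for difference in differences:
        total += difference
        samples.append(total)
    return samples


samples = [100, 104, 109, 111, 110, 106]   # made-up, slowly changing samples
encoded = dpcm_encode(samples)              # [100, 4, 5, 2, -1, -4]
decoded = dpcm_decode(encoded)
```

Because speech changes slowly from one sample to the next, the differences after the first value are small numbers, which is what lets a differential coder spend fewer bits per sample than plain PCM.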

Another way to encode speech is to use a model of the way people generate speech. In an algorithm such as Linear Predictive Coding (LPC), human speech is modeled as an excitation source feeding a vocal tract that has constrictions in it. People change the constriction points to make various sounds. LPC uses a series of filters that accomplish a similar task. In the LPC algorithm, the filter coefficients and the excitation type are the only information that needs to be transmitted.

LPC coding algorithms require a large amount of processing power and provide the lowest data rate. LPC works well for sending human speech sounds and not very well for music, and it does not work at all for transmitting fax (or computer modem) sounds.
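The receiving side of an LPC-style coder can be sketched as a filter driven by an excitation signal. The Python below is a toy illustration under stated assumptions: the single coefficient, the pulse spacing, and the function names are invented for the example, and the analysis step that derives coefficients from real speech is not shown.

```python
def lpc_synthesize(coefficients, excitation):
    """All-pole filter: y[n] = e[n] + sum over k of a[k] * y[n - 1 - k]."""
    output = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(coefficients):
            if n - 1 - k >= 0:
                y += a * output[n - 1 - k]
        output.append(y)
    return output


def pulse_train(length, period):
    """Excitation for a voiced sound: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# Toy parameters: a one-coefficient filter and a pulse every 3 samples.
excitation = pulse_train(6, 3)
speech = lpc_synthesize([0.5], excitation)
```

Only the coefficient list and a description of the excitation (here, its type and period) would need to cross the network; the receiver regenerates every output sample from them.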

A third type of algorithm for digitally representing sounds uses the frequency content of the sounds. Instead of sampling the waveform in fixed units of time, the sound is represented in units of frequency. This works well for speech since vowels are mostly low frequency and consonants are mostly high frequency. This third type of algorithm is called a Sub Band Coder (SBC).
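The splitting step of a sub-band coder can be illustrated with the simplest possible two-band filter bank, in which pair averages form the low band and pair differences form the high band. This is a sketch under that assumption only: practical sub-band coders use better filters and more bands, and then assign a different number of bits to each band. The function names and sample values are made up.

```python
def split_bands(samples):
    """Split an even-length signal into a low band and a high band."""
    pairs = range(0, len(samples) - 1, 2)
    low = [(samples[i] + samples[i + 1]) / 2 for i in pairs]    # slow changes
    high = [(samples[i] - samples[i + 1]) / 2 for i in pairs]   # fast changes
    return low, high


def merge_bands(low, high):
    """Recombine the two bands into the original samples."""
    samples = []
    for l, h in zip(low, high):
        samples.extend((l + h, l - h))
    return samples


signal = [10, 12, 14, 12, 8, 6, 7, 9]   # made-up sample values
low, high = split_bands(signal)
restored = merge_bands(low, high)
```

Each band holds half as many values as the input, and a coder is free to describe the band that matters less for a given sound more coarsely.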

There are also algorithms that use a mixture of these approaches and produce adequate sound quality at medium bit rates. An example of such a hybrid coder is the Code Excited Linear Prediction (CELP) algorithm.

The following table provides a quick summary of the main voice coding algorithms. The Mean Opinion Score (MOS) is a subjective number indicating how people feel about the quality of the voice signal for that algorithm (higher is better). G.711 is the reference point and this coding algorithm is used in today's public network.

Table 1. Voice Coding Standards

Algorithm	Bit Rate (kbits/sec)	Compression Complexity	Delay (milliseconds)	MOS
G.711 PCM	64	None	0.25	4.4
G.723.1 MPMLQ	6.3	Medium	30	3.9
G.723.1 ACELP	5.3	Medium	30	3.6
G.726 ADPCM	32	Low	0.25	4.2
G.728 LD-CELP	16	Very high	3 - 5	4.2
G.729a CS-ACELP	8	High	10	4.2

For more information on these coding standards, please see Voice Coding Algorithms.
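The trade-off between bandwidth and quality in Table 1 can be expressed as a small selection routine. The bit rates and MOS values below are copied from the table; the selection rule (highest MOS that fits, ties going to the lower bit rate) and the function name are assumptions, and the rule deliberately ignores complexity, delay, and cost.

```python
# (algorithm, bit rate in kbits/sec, MOS), copied from Table 1
CODECS = [
    ("G.711 PCM", 64, 4.4),
    ("G.723.1 MPMLQ", 6.3, 3.9),
    ("G.723.1 ACELP", 5.3, 3.6),
    ("G.726 ADPCM", 32, 4.2),
    ("G.728 LD-CELP", 16, 4.2),
    ("G.729a CS-ACELP", 8, 4.2),
]


def best_codec(available_kbps):
    """Pick the highest-MOS codec whose bit rate fits the available bandwidth."""
    candidates = [codec for codec in CODECS if codec[1] <= available_kbps]
    if not candidates:
        return None
    # Prefer higher MOS; among equal MOS values prefer the lower bit rate.
    name, _, _ = max(candidates, key=lambda codec: (codec[2], -codec[1]))
    return name
```

With 64 kbits/sec available this returns G.711 PCM; with 40 kbits/sec it returns G.729a CS-ACELP, since three codecs share a MOS of 4.2 and G.729a needs the least bandwidth.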

Interconnection to the PSTN

VoIP networks and the PSTN (Public Switched Telephone Network) in many instances must work together to deliver a phone call. Connecting these networks together has proved to be very difficult because of the many different types of systems involved, the many different types of interconnections possible, and the billing/regulatory issues associated with combining regulated and non-regulated networks.

Figure 2. Many combinations of the VoIP network are possible

As shown in Figure 2, there are many combinations of networks and devices. These combinations require the existing PSTN network control systems to communicate with the data network control systems.

As an example, assume a call is going from a VoIP phone to an analog phone in a different city. The VoIP phone dials the destination phone number. The local data network only knows about its own local IP addresses and so forwards the call to the access network, which in this case is a citywide VoIP network. Not knowing where that 10-digit phone number is located, the city IP network needs to locate the destination city and then find a data network (if it exists) to get to that particular city.

In the destination city, the data network needs to query the local analog PSTN system and convert the Connect IP messages to the proper signaling messages for that type of voice switch. The voice switch then checks to see if the phone is busy and, if so, sends a message back to the IP network. Eventually, some piece of equipment needs to generate a busy signal waveform to send to the VoIP phone.

Converting between IP addresses and 10-digit phone numbers is not a trivial process and involves many steps. In the example shown, the call may be a long distance call subject to certain rate charges, or it may be just a local call. Knowing the underlying regulatory structure is required to provide proper billing.
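One small piece of this conversion, finding which gateway serves a dialed number, can be sketched as a longest-prefix lookup. Everything in the example is hypothetical: the routing table, the gateway addresses (taken from address ranges reserved for documentation), and the function name. It is not a description of how any particular network performs the lookup.

```python
# Hypothetical routing table: dialed-number prefix -> gateway IP address.
GATEWAYS = {
    "212": "192.0.2.10",       # gateway for one area code
    "212555": "192.0.2.20",    # more specific route for one exchange
    "415": "198.51.100.7",
}


def find_gateway(phone_number, table=GATEWAYS):
    """Return the gateway for the longest matching prefix, or None."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    for length in range(len(digits), 0, -1):
        gateway = table.get(digits[:length])
        if gateway is not None:
            return gateway
    return None   # no data network route known; the call must go another way
```

A number such as 212-555-0100 matches the more specific six-digit entry, while a number in an unlisted area code returns `None`, which corresponds to the case in the example above where no data network reaches the destination city.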

The IP address of a device may not even be a public address. In many cases, the temporary IP addresses assigned to devices are usable only within the company. In the presence of these private addresses or company firewalls, how does an outsider know how to reach a VoIP phone within a company? What happens when the temporary IP address of the VoIP phone changes?
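Whether an address belongs to one of the private ranges can be checked with Python's standard `ipaddress` module, as in the short sketch below. The helper name and the sample addresses are assumptions for illustration, and the check says nothing about firewalls, which can block calls even to public addresses.

```python
import ipaddress


def is_private_address(address):
    """True if the address is reserved for private (non-public) use."""
    return ipaddress.ip_address(address).is_private


phone_addresses = ["10.1.2.3", "192.168.1.20", "8.8.8.8"]   # sample addresses
private = [a for a in phone_addresses if is_private_address(a)]
```

A phone holding one of the private addresses cannot be reached directly by an outside caller; some intermediate device has to translate or relay the call.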

While these problems have solutions, standardizing on a common solution has delayed the deployment of equipment.

For more information on VoIP to the Public network, please see VoIP Problems.

More Information

Additional VoIP seminars:

Voice Coding Algorithms - A description of the various methods for digitizing speech.

VoIP Applications - The VoIP technology only becomes useful when compelling applications meet the needs of customers. The corporate, cable telephony, and video conferencing applications are examined.

VoIP Problems - Deployment of VoIP has been slower than expected because of problems with underlying networks, standardization issues, and network control devices.

In Summary:

Voice over IP carries digitized speech in IP packets.
The major applications for VoIP are in corporate LANs, cable telephony, and video conferencing.
VoIP has been slow to deploy because of difficulties in underlying networks delivering reliable service and lack of standardization connecting into the existing public network.