In a VoIP call, speech at one end is converted to a binary format and transmitted across a data network to the receiving end, where it is converted back to an analog voice signal. A signal may be converted multiple times, depending on the path it takes. The basic process of this conversion is as follows:
First, the audio signal is filtered to roughly 300-3400 Hz, since most of the energy in human speech falls in this range. The filtered signal is then sampled at a rate of 8,000 Hz (8,000 samples/second), satisfying the Nyquist-Shannon sampling theorem: the sampling rate must be at least twice the highest frequency in the signal, here just under 4,000 Hz. This produces a Pulse Amplitude Modulation (PAM) signal. The PAM signal is then quantized into a Pulse Code Modulation (PCM) signal, which is digital. Since each PCM sample is only 8 bits, we need a method of mapping the continuous analog amplitudes onto 256 discrete levels. Two companding standards do this: μ-law, used in North America and Japan, and A-law, used in the rest of the world. Both assign more quantization levels to low-amplitude signals, where most speech energy lies, and fewer levels to high amplitudes, which occur less often. The PCM samples are then packaged into packets and shipped across the network.
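The sample-then-compand pipeline above can be sketched in a few lines of Python. This uses the continuous μ-law formula (μ = 255) for clarity; the actual G.711 standard uses a piecewise-linear segmented approximation of this curve, so treat this as an illustration of the idea rather than a wire-compatible encoder. The function names are my own.

```python
import math

MU = 255  # mu-law companding parameter (North America/Japan)

def sample_tone(freq_hz, duration_s, rate_hz=8000):
    """Sample a sine tone at the telephony rate of 8,000 Hz (the PAM stage)."""
    n = int(duration_s * rate_hz)
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]

def mulaw_encode(x):
    """Compand one sample in [-1, 1] into an 8-bit code (the PCM stage)."""
    sign = 0x80 if x < 0 else 0
    # Logarithmic curve: fine resolution near zero, coarse at the extremes
    magnitude = math.log1p(MU * min(abs(x), 1.0)) / math.log1p(MU)
    return sign | int(magnitude * 127)

def mulaw_decode(code):
    """Expand an 8-bit code back to an approximate analog sample."""
    sign = -1.0 if code & 0x80 else 1.0
    magnitude = (code & 0x7F) / 127
    return sign * (math.expm1(magnitude * math.log1p(MU)) / MU)

# One second of a 440 Hz tone -> 8,000 PAM samples -> 8,000 one-byte PCM codes
pcm = [mulaw_encode(s) for s in sample_tone(440, 1.0)]
```

Each second of audio becomes exactly 8,000 one-byte samples, which is where the 64 Kbps figure for uncompressed telephone-quality voice comes from.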
There are two main components that perform this voice sampling and compression: Digital Signal Processors (DSPs) and CODECs. A Digital Signal Processor is a specialized microprocessor that performs complex operations on digital signals. A CODEC (coder-decoder) is an algorithm that converts an analog signal into digital output; that output can be transmitted over a data network and then reconverted to an analog signal at the other end. A CODEC can also compress the digitized voice to minimize bandwidth requirements, taking a standard 64 Kbps voice signal down to as low as 5.3 Kbps. I will go into further detail on CODECs in our next section.
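The bit rates quoted above follow directly from the sampling parameters, which a quick sanity check confirms. The 5,300 bps figure is the rate of the ITU-T G.723.1 low-rate codec, used here only for the arithmetic; the compression algorithm itself is not shown.

```python
SAMPLE_RATE = 8000      # samples per second (covers the ~4 kHz voiceband)
BITS_PER_SAMPLE = 8     # one PCM byte per sample

# Uncompressed PCM payload bit rate: 8,000 samples/s * 8 bits = 64 Kbps
pcm_bps = SAMPLE_RATE * BITS_PER_SAMPLE

# Low-rate codec for comparison (G.723.1 operates at 5.3 Kbps)
low_rate_bps = 5300

print(pcm_bps)                           # 64000
print(round(pcm_bps / low_rate_bps, 1))  # 12.1
```

So a low-rate CODEC can carry the same call in roughly one-twelfth the bandwidth, at the cost of extra DSP work and some loss of fidelity.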