Showing posts with label CODEC. Show all posts

Media Gateway Controller

The Media Gateway Controller (MGC) controls the interface cards and application cards in the MG1000E chassis. At the hardware level the MGC provides conference channels and tone generation for the phones. DSP Daughterboards connect to the MGC to provide tone and conference functions between TDM and IP phones. The MGC replaces the legacy Small Systems Controller (SSC). It has about 10 times the processing power of the SSC and four times the memory. It uses a standard CompactFlash card for internal storage and has two internal ports for installing DSP Daughterboards. There are six Ethernet ports on the MGC in total: four on the front (two ELAN, two TLAN) and two available through the backplane via a 50-pin to Serial/ELAN/TLAN adapter.

There are two types of DSP Daughterboards that can be placed into the MGC. The 96-port DSP Daughterboard can be placed in Slot 2 only, and provides 96 channels of voice. The 32-port DSP Daughterboard can be placed in either Slot 1 or Slot 2. Using one or two DSP32 DBs you can achieve 32 or 64 channels of voice, and using a standalone DSP96 DB or a DSP32 and a DSP96 you can achieve 96 or 128 channels of voice. If the system requires more than 128 channels of voice to be converted from IP to TDM a separate Voice Gateway Media Controller card will be required. The DSP resources provide VoIP CODECs, compression, and echo cancellation.
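
The slot rules above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name and the "DSP32"/"DSP96" labels are made up for this example and are not part of any Nortel tool or API.

```python
# Channel counts per daughterboard type, as described in the text.
DSP_CHANNELS = {"DSP32": 32, "DSP96": 96}


def mgc_voice_channels(slot1=None, slot2=None):
    """Return the total DSP voice channels for a daughterboard layout.

    slot1 and slot2 are None (empty), "DSP32", or "DSP96".
    """
    # The 96-port daughterboard can be placed in Slot 2 only.
    if slot1 == "DSP96":
        raise ValueError("the 96-port daughterboard fits in Slot 2 only")
    total = 0
    for board in (slot1, slot2):
        if board is not None:
            total += DSP_CHANNELS[board]
    return total


print(mgc_voice_channels("DSP32", "DSP96"))
```

Anything above the 128 channels returned for the largest layout would need the separate card mentioned above.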

The MGC provides 60 channels of Tone and Digit Switching (TDS), sixteen Digitone Receivers (DTR) or Extended Tone Detector (XTD) units, and additional tone service ports. It also provides two loops of 30 conference units each, for a total of 60 conference channels.

Components of a SIP Network

As I mentioned previously, devices on a SIP network are called User Agents. A User Agent Client and a User Agent Server are both required for communication.

SIP defines three server elements: a Registrar, a Proxy Server, and a Redirect Server. A Registrar accepts REGISTER requests and is typically co-located with a proxy server or a redirect server. It can offer location services, and it registers SIP parties in a SIP domain. A Proxy Server functions as both a server and a client, making and receiving requests on behalf of other SIP clients. It handles requests locally or forwards them to other servers, and it can interpret and rewrite a SIP request message before forwarding it to another server or User Agent. A Redirect Server accepts SIP requests, reads the information, and maps the address in the request to the next hop. It then returns an appropriate message to the client, and the client can contact the next hop directly. A Redirect Server does not forward SIP requests to other servers and does not accept calls; it only responds with 3xx series responses. It is important to note that these three types of SIP servers are logically, not physically, distinct.

In addition to the three server types and the User Agents that are present in a SIP network, there are also Session Border Controllers (SBCs) and gateways on the network. A Session Border Controller is inserted into the signaling path and handles the signaling and media being sent across the network. The SBC can also perform anti-tromboning functions. Tromboning occurs when two devices on the same subnet both connect through a third server when they should be allowed to connect directly. By directing the two devices to speak directly to each other, the SBC frees up the trunks for use by other devices. An SBC can also implement transcoding when required, but this is not recommended, as was discussed previously. An SBC can restrict call flow, or allow connections that could not otherwise exist (due to a firewall between the two endpoints, or other security policies). The SBC can also enforce interception of network sessions. Without it there would be no way to intercept the flow of data between two SIP devices, as law enforcement agencies may require, because under normal circumstances SIP tries to have endpoints communicate directly with each other. A Gateway simply acts as a connecting point between a SIP network and a dissimilar network, such as H.323 or the PSTN. Any calls that are sent out to the PSTN have to go through a gateway.

Bandwidth Requirements

I would like to talk a little bit about the bandwidth requirements for a VoIP network. Bandwidth is a measurement of how much traffic can go through your network.

For lower bandwidth connections (i.e. less than 1Mbps) it is recommended that no more than 50-55% of the total available bandwidth be used for voice traffic. For connections greater than 1 Mbps you can use up to 85% of your available bandwidth for voice traffic. This leaves some headroom on the links to handle bursts of traffic without overloading the link. A bottleneck can occur at any point on the network, including device to switch, switch to switch, switch to gateway, and many more. As such we need to ensure that there is sufficient bandwidth in every pipe on our network to handle the traffic that is being sent.
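
To make the rule of thumb concrete, here is a small sketch that turns a link speed into a voice budget and a call count. The function names are hypothetical, the 50% figure is simply the conservative end of the 50-55% range, and treating a link of exactly 1 Mbps as "high speed" is an assumption, since the guideline does not cover that case.

```python
def voice_budget_bps(link_bps):
    """Bandwidth (bps) that may be used for voice on a link."""
    # Below 1 Mbps: use 50% (low end of the 50-55% guideline).
    # 1 Mbps and above: up to 85%. Exactly 1 Mbps is an assumption.
    percent = 50 if link_bps < 1_000_000 else 85
    return link_bps * percent // 100


def max_calls(link_bps, per_call_bps):
    """Whole number of calls (one direction) that fit in the budget."""
    return voice_budget_bps(link_bps) // per_call_bps


# Example: a 512 kbps link, with calls that need 48 kbps each way.
print(voice_budget_bps(512_000), max_calls(512_000, 48_000))
```

The per-call figure is supplied by the caller, because it depends on the CODEC and the layer 2 overhead.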

In addition to the bandwidth required by the data being transmitted, every packet needs additional bandwidth that depends on which layer 2 method you are using. A Frame Relay connection adds 6 bytes of overhead. Point-to-Point Protocol (PPP) adds 8 bytes of overhead. Full-duplex Ethernet adds either 38 bytes or 44 bytes; the additional 6 bytes apply only if you have implemented 802.1p packet prioritization.

The amount of bandwidth that is required must include the amount of bandwidth required for the CODEC and the amount of bandwidth required for the transport of the packet, including the layer 2 overhead.

Each packet has the following components:
Voice Payload → This is a variable size, and depends on the configuration of your VoIP implementation
RTP Header → 12 bytes, this contains information about the source & destination
UDP Header → 8 bytes, this contains information on the ports used at the source and destination
IPv4 Header → 20 bytes, included in all IPv4 packets. An IPv6 header is even larger, as the header must include larger addresses and other information

In addition, there are two further components, the Frame Check Sequence of 4 Bytes and the Interframe Gap of 12 Bytes

As we can see, the packet size is 40 bytes plus the size of the payload. As I mentioned earlier, the size of the payload is variable, and depends on the sample length and the CODEC bit rate. The payload size can be calculated using the following formula:

Payload = Bit Rate * Sample Length (s) / 8 bits per byte

This will give us the number of bytes of payload that need to be included in the packet. We then add to this the amount of overhead from Layer 2 and the amount of IP overhead to give us our total packet size. As an example, let’s take a 20 ms sample size using G.729 (8 kbps) over Ethernet using 802.1p:

Payload = 8kbps * 0.02s / 8
Payload = 8000 bps * 0.02s / 8
Payload = 160 bits / 8 = 20 Byte payload size
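
The same payload formula as a small Python function. The function name is made up for this sketch, and the sample length is passed in milliseconds (rather than seconds) purely to keep the arithmetic exact.

```python
def payload_bytes(bit_rate_bps, sample_ms):
    """Voice payload per packet, in bytes.

    Payload = bit rate * sample length (s) / 8 bits per byte,
    with the sample length given in milliseconds.
    """
    return bit_rate_bps * sample_ms / 1000 / 8


# G.729 (8 kbps) with a 20 ms sample, as in the worked example.
print(payload_bytes(8000, 20))
```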

We then add our overhead to the payload:

Packet size = Payload + Layer 2 overhead + IP Overhead + FCS + IFG
Packet size = 20 Bytes + 44 Bytes + 40 Bytes + 4 Bytes + 12 Bytes
Packet size = 120 Bytes for 20 ms of voice traffic

We can then determine how much bandwidth is required by converting this to bits per second. First we need the number of voice packets per second. Each packet carries 20 ms (0.02 seconds) of audio, so we send 50 packets per second. Multiplying 120 bytes/packet by 50 packets/second gives 6000 bytes per second, and multiplying by 8 gives 48,000 bps of bandwidth for each direction of a call.
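
The whole calculation can be wrapped up in one function. This is a sketch that follows the Ethernet example step by step and uses the overhead figures exactly as they are listed in this post; the function and constant names are invented for the example.

```python
IP_UDP_RTP_OVERHEAD = 20 + 8 + 12  # IPv4 + UDP + RTP headers, in bytes
FCS_BYTES = 4                      # Frame Check Sequence
IFG_BYTES = 12                     # Interframe Gap


def call_bandwidth_bps(bit_rate_bps, sample_ms, layer2_overhead_bytes):
    """One-way bandwidth for a single call over Ethernet, in bps."""
    payload = bit_rate_bps * sample_ms / 1000 / 8
    packet = (payload + layer2_overhead_bytes + IP_UDP_RTP_OVERHEAD
              + FCS_BYTES + IFG_BYTES)
    packets_per_second = 1000 / sample_ms
    return packet * packets_per_second * 8  # bytes/s -> bits/s


# G.729, 20 ms samples, Ethernet with 802.1p (44 bytes of overhead).
print(call_bandwidth_bps(8000, 20, 44))
```

Swapping in 64000 for the bit rate gives the equivalent G.711 figure, which shows how much of the total is header overhead rather than voice.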

It should also be noted that if we are using the RTP Control Protocol (RTCP) we need to add an additional 146 bytes of data to our packet size.

Voice Activity Detection can reduce our bandwidth requirements, as was mentioned before. It enables a data network carrying voice traffic to detect the absence of audio. When audio is absent, it will instead send a signal that indicates that there is no noise, thereby reducing the payload size. A conservative estimate of the bandwidth savings gained using VAD is 30%. VAD is also referred to as Silence Suppression.

EDIT: A Bandwidth calculator is available here.

CODECs

There are four main CODECs used in VoIP. These are G.711, G.729a/b, G.726, and G.723.1.

G.711 is a 64kbps PCM encoding. It uses the most bandwidth and does not offer any compression. It has the best voice quality, as voice signals are transmitted without any loss due to compression. It is also the most fault-tolerant CODEC. If one packet is lost, it will not represent any noticeable amount of disruption to the conversation, as it does not contain a lot of voice data. G.711 has the least processing delay of any CODEC as it does not have to compress the samples, simply sending them along the network uncompressed. This is only recommended for high-speed (100Mbps) LAN implementations, as it does require a great deal of bandwidth. Attempting to transmit G.711 across low-bandwidth pipes will result in undesired jitter.

G.726 can compress data down to 40, 32, 24, or 16 kbps. It uses ADPCM encoding which works by looking at the difference between the previous sample and the current sample, and only sending the difference. It uses less bandwidth than G.711, but sacrifices voice quality.
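
The "send only the difference" idea can be shown with plain delta coding. To be clear, this is not G.726: real ADPCM also predicts the next sample and adapts its quantizer step size, which this toy sketch leaves out. The function names are made up for the example.

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous one."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    current = 0
    samples = []
    for delta in deltas:
        current += delta
        samples.append(current)
    return samples


samples = [100, 102, 105, 103]
print(delta_encode(samples))
```

Because neighbouring voice samples are usually close together, the differences are small numbers that need fewer bits than the samples themselves.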

G.729a/b is able to compress data down to an 8 kbps stream. It uses CS-ACELP encoding, which works by matching sound patterns in the voice sample to a codebook. It then sends the code (not the sample) across the network, where the receiving end looks up the code and reproduces the sound based on the code. G.729 has near-toll quality sound, meaning that it is almost as good as using an analog phone. It also uses a technology called silence suppression to further increase bandwidth savings. Silence suppression is quite simple: in any conversation there is usually only one end talking, so roughly half of the bandwidth would be carrying silence. To deal with this, G.729 sends a silence signal to the other end, which plays white noise to indicate that there is no voice coming from the other end. G.729 offers the best balance of quality and bandwidth savings, and is the most commonly implemented CODEC in VoIP.
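
The codebook idea on its own can be sketched as a nearest-match lookup. This is a toy illustration only: the four-entry codebook and the function names are invented, and real CS-ACELP combines linear prediction with much larger adaptive and algebraic codebooks.

```python
# A tiny made-up codebook: each entry is a short pattern of samples.
CODEBOOK = [
    (0, 0, 0, 0),
    (10, 20, 10, 0),
    (-10, -20, -10, 0),
    (30, 30, -30, -30),
]


def encode_frame(frame):
    """Return the index of the codebook entry closest to the frame."""
    def distance(entry):
        return sum((a - b) ** 2 for a, b in zip(frame, entry))
    return min(range(len(CODEBOOK)), key=lambda i: distance(CODEBOOK[i]))


def decode_frame(index):
    """The receiver looks the code up and plays back the stored pattern."""
    return CODEBOOK[index]


code = encode_frame((9, 18, 12, 1))
print(code, decode_frame(code))
```

Only the small index crosses the network, which is where the bandwidth saving comes from; the reproduced pattern is close to the original but not identical.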

G.723.1 is a standard that covers two encoding rates. It can operate at 5.3 kbps using ACELP encoding, much like G.729. It can also operate at 6.3 kbps using MP-MLQ encoding. MP-MLQ also uses a codebook to translate voice signals into data, but uses a different compression scheme. G.723.1 has the lowest bandwidth requirements, but at the same time has the lowest voice quality of the different CODECs.

So that is a summary of the different CODECs used for VoIP. G.711 is the CODEC of choice when bandwidth is not an issue, and G.729 is the CODEC of choice when bandwidth is an issue.

It should also be noted that transcoding may occur from one CODEC to another as the voice signal is carried across the network. For example, internal communications on the high-speed LAN may be in G.711, but the signal may need to be transmitted at a higher compression rate using G.729 when it is sent across the WAN. When transcoding occurs, there is additional delay while the signal is converted. As such, it is generally a good idea to use one CODEC for the entire network: choose the CODEC that works best for the network as a whole, rather than the best one for each individual segment.

Analog to Digital Conversion

When communications occur across VoIP the speech from one end is converted to a binary format and is transmitted across a data network to the receiving end. At the receiving end it is converted back to an analog voice signal. A signal may be converted multiple times, depending on the path it takes. The basic process of this conversion is as follows:
First, the audio signal is filtered to 300-3400 Hz, as most of the energy in human speech falls in this range. The filtered signal is then sampled at a rate of 8,000 Hz (8000 samples/second), which satisfies the Nyquist-Shannon sampling theorem: the sampling rate must be at least twice the highest frequency in the signal. This results in a Pulse Amplitude Modulation (PAM) signal. The PAM signal is then converted into a Pulse Code Modulation (PCM) signal, which is a digital signal. Since each PCM sample has a limited amount of space (it is only 8 bits!) we must find a method of mapping the analog signal to this 8-bit value. There are two methods used to do this: μ-law and A-law. μ-law is used in North America and Japan, while A-law is used in the rest of the world. These standards assign more quantization levels to the lower signal amplitudes, where most speech occurs, and fewer levels to the higher amplitudes, which occur less often. These PCM samples are then packaged into packets and shipped across the network.
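
The μ-law mapping can be sketched with the continuous companding formula, using μ = 255. This is an illustration under stated assumptions: real G.711 encoders use a segmented approximation of this curve and then quantize to 8 bits, and the function names here are made up.

```python
import math

MU = 255  # companding constant used for mu-law


def mu_law_compress(x):
    """Map a sample in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# A quiet sample gets a large share of the output range.
for x in (0.01, 0.1, 1.0):
    print(x, round(mu_law_compress(x), 3))
```

Quiet samples are stretched across a large part of the output range before quantization, so they keep more detail than they would with evenly spaced steps.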

There are two main components that perform this voice sampling and compression: Digital Signal Processors (DSPs) and CODECs. A Digital Signal Processor is a specialized microprocessor that performs complex operations on digital signals. A CODEC is an algorithm that converts analog signals into digital outputs. The digital output can be transmitted over a data network and then reconverted to an analog signal at the other end. A CODEC can compress and convert voice to digital data to minimize bandwidth requirements. A CODEC can compress a standard 64 kbps voice signal down to as low as 5.3 kbps. I will go into further detail on CODECs in our next section.