A VoIP call consists of two main parts. The first is call management, which sets the call up and tears it down: it signals call initiation, ringing, disconnection, and the other messages exchanged between the two endpoints. This signalling is carried by SIP. The second part is the audio, which is transmitted using RTP. Since the bandwidth consumed by SIP is insignificant, the rest of this article focuses on calculating the bandwidth consumed by the audio.
Since raw audio can be rather large, it needs to be encoded before it is sent over the network. This is done using a codec. Codecs differ in the audio quality they produce, the bandwidth they consume, and the CPU load they impose. It is therefore important to select the right codec for your application.
Before we delve into the differences between the common codecs, let’s introduce another principle which allows us to calculate the bandwidth utilised accurately. When you need to send data over the network, the data needs to be packaged. The ‘packaging’ contains information which allows the data to be delivered to its destination and rebuilt correctly. As you might imagine, the packaging does not come free: it adds to the bandwidth consumption.
There are several layers of network packaging, corresponding to the layers of the network model. The encoded audio is packaged into RTP packets. In turn, the RTP packets are packaged into UDP datagrams, which are then packaged into IP packets. Ethernet is the most common type of network, and it adds one more wrapper, the Ethernet frame.
We shall refer to these packages collectively as overhead. Irrespective of the codec used, the overhead added to each packet is fixed. Below is the bandwidth consumed by each layer of overhead:
- RTP – 4.8 kbps
- UDP – 3.2 kbps
- IP – 8 kbps
- Ethernet (not using QoS) – 15.2 kbps
The total overhead is 31.2 kbps.
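The per-layer figures above are consistent with the standard header sizes multiplied by a rate of 50 packets per second. The sketch below reproduces them under that assumption. The packet rate (20 ms of audio per packet) and the 38-byte Ethernet figure (header, frame check sequence, preamble, and inter-frame gap) are assumptions that the article does not state, and the names used are purely illustrative.

```python
# Header sizes in bytes per packet.
# Assumption: the Ethernet figure counts the 14-byte header, 4-byte FCS,
# 8-byte preamble and 12-byte inter-frame gap (no 802.1Q/QoS tag).
HEADER_BYTES = {"RTP": 12, "UDP": 8, "IP": 20, "Ethernet": 38}

# Assumption: one packet every 20 ms, i.e. 50 packets per second.
PACKETS_PER_SECOND = 50


def overhead_bps(header_bytes, packets_per_second=PACKETS_PER_SECOND):
    """Return the bandwidth consumed by one header, in bits per second."""
    return header_bytes * 8 * packets_per_second


# Per-layer overhead in kilobits per second (1 kbps = 1000 bits per second).
per_layer_kbps = {
    name: overhead_bps(size) / 1000 for name, size in HEADER_BYTES.items()
}

# Sum in whole bits first, then convert, to avoid float rounding drift.
total_overhead_kbps = sum(overhead_bps(s) for s in HEADER_BYTES.values()) / 1000

for name, kbps in per_layer_kbps.items():
    print(f"{name}: {kbps:.1f} kbps")
print(f"Total: {total_overhead_kbps:.1f} kbps")
```

With a different packetisation interval the packet rate changes, and the overhead in kbps changes with it, so the 31.2 kbps total should be read as tied to the 50 packets per second assumed here.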
Now that we understand the basics, let’s look at the differences between the common codecs which can be used to encode the audio in a VoIP call. The following table shows the expected audio quality, the CPU resources required to encode and decode the audio, the base bit rate of the encoded audio, and the total bandwidth consumption once the overhead is taken into consideration.

|Codec|Audio Quality|CPU Resources|Base Size|Total Size (Base + Overhead)|
|---|---|---|---|---|
|G711|Good|Very Low|64 kbps|95.2 kbps|
|G722|Very Good|Low|64 kbps|95.2 kbps|
|GSM|Acceptable|Average|13 kbps|44.2 kbps|
|G729|Average|High|8 kbps|39.2 kbps|

Note that the above bandwidth consumption is in kilobits per second. You will need to divide by 8 to get the equivalent in kilobytes per second. Using the above data, we can derive the following figures:

|Codec|Kilobits per second|Kilobytes per second|Kilobytes per minute|Megabytes per hour|
|---|---|---|---|---|
|G711|95.2|11.9|714|41.8|
|G722|95.2|11.9|714|41.8|
|GSM|44.2|5.525|331.5|19.4|
|G729|39.2|4.9|294|17.2|

Here are some notes and suggestions for the application of specific codecs:
- G729 is the codec which consumes the least bandwidth, and it still offers good audio quality for its bit rate. However, that comes with two drawbacks:
  - Its efficiency comes at the cost of CPU usage. Encoding audio to such a small size while maintaining quality is more CPU intensive.
  - G729 is a proprietary codec. Because of this, the number of simultaneous G729 calls cannot exceed half your 3CX Phone System simultaneous call license.
- Given these drawbacks, G729 should only be used when really required, such as for external calls to VoIP providers, calls across a bridge, or remote extensions (in short, any call made over the internet). You can configure the PBX to fall back to GSM if G729 calls cannot be made.
- Although G711 and G722 consume over twice as much bandwidth as the other codecs, most Local Area Networks are able to handle this bandwidth. Using the figures above, a one-hour call using G711 is equivalent to transferring a 41.8 MB file. If that causes an issue, you should consider upgrading your network.
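The arithmetic behind figures such as the 41.8 MB per hour quoted for G711 can be sketched as follows. This is a minimal illustration rather than a definitive calculator: it takes the base bit rates and the 31.2 kbps overhead from this article, and it assumes that 1 MB means 1024 kB, a convention that happens to reproduce the 41.8 MB figure. The function and variable names are hypothetical.

```python
# Total per-call overhead quoted in the article, in kilobits per second.
OVERHEAD_KBPS = 31.2

# Base codec bit rates from the article, in kilobits per second.
CODEC_BASE_KBPS = {"G711": 64, "G722": 64, "GSM": 13, "G729": 8}


def usage(base_kbps, overhead_kbps=OVERHEAD_KBPS):
    """Return (kbps, kB/s, kB/min, MB/hour) for one direction of one call."""
    total_kbps = base_kbps + overhead_kbps
    kbytes_per_second = total_kbps / 8            # 8 bits per byte
    kbytes_per_minute = kbytes_per_second * 60
    # Assumption: 1 MB = 1024 kB.
    mbytes_per_hour = kbytes_per_minute * 60 / 1024
    return total_kbps, kbytes_per_second, kbytes_per_minute, mbytes_per_hour


for codec, base in CODEC_BASE_KBPS.items():
    kbps, kb_s, kb_min, mb_h = usage(base)
    print(f"{codec}: {kbps:.1f} kbps, {kb_s:.3f} kB/s, "
          f"{kb_min:.1f} kB/min, {mb_h:.1f} MB/h")
```

These figures cover the audio stream in one direction only. To estimate the load on a link, multiply by the number of simultaneous calls, and account for both directions where the link carries them.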