Video Transcoding Using GPUs for Efficiency in a Cloud Environment

May 25th, 2017

In previous blogs, we have explored the quantifiable benefits of performing audio transcoding on Graphics Processing Units (GPUs) rather than on Central Processing Units (CPUs). Now I want to turn to video transcoding and discuss how GPUs can achieve similar efficiencies in cloud deployments.

As most of you know, Unified Communications (UC) is becoming pervasive and driving video adoption in the enterprise. Yet transport of this video traffic, both within an enterprise and between enterprises, is often hampered by technology incompatibilities. Add the demands of mobility and Bring Your Own Device (BYOD), and the situation becomes even more complex. What must happen is that video interworking and transcoding needs to ramp in line with video adoption.

As I examine this need to ramp, I will start by looking back in time to a few key lessons we learned from the adoption of Voice over IP (VoIP):

  • Historically, VoIP interworking and transcoding were performed by highly specialized hardware at low scale. Scaling up often meant adding more hardware at similar densities; while innovation increased those densities, it did not change the reliance on specialized hardware
  • As VoIP moved into virtual, cloud deployments, audio transcoding on CPUs could not be shown to scale at a reasonable cost point. A new model was needed to provide VoIP interworking and transcoding effectively and efficiently
  • This new model called for GPUs. As discussed in a prior blog (Media Transcoding in the Cloud – GPU Performance Assessment), we have already shown how GPUs enable audio transcoding to scale with much less hardware, lower power consumption, and less rack space.

I see a similar evolution path for video transcoding, where the use of GPUs enables efficient scaling and delivers a significant performance-versus-cost benefit.

From our lab testing, Table 1 below shows comparative numbers for H.264 <-> H.264 video transcoding on a dual-socket, 20-core CPU solution* versus a four-card NVIDIA M60 GPU solution.

 

Video transcode type (Decode + Encode) | # of sessions on CPU (20-core server) | # of sessions on GPU (4 M60 cards)
HD <-> HD                              | 20                                    | 152
HD <-> SD                              | 12                                    | 152
SD <-> SD                              | 58                                    | 280
SD <-> HD                              | 22                                    | 260

Table 1. Comparative # of sessions using CPU vs. GPU for video transcoding
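The blog does not describe the software stack used in our lab, so purely as an illustration of the kind of workload Table 1 counts, here is a minimal sketch of a single H.264 <-> H.264 transcode session in which both the decode and the encode run on the GPU. It assumes an ffmpeg build with NVIDIA NVDEC/NVENC support (the h264_cuvid decoder and h264_nvenc encoder); the file names and bitrate are hypothetical.

```python
# Illustrative only: one H.264 -> H.264 GPU transcode session of the kind
# counted in Table 1. Assumes an ffmpeg build with NVDEC/NVENC enabled
# (h264_cuvid decoder, h264_nvenc encoder) and an NVIDIA GPU in the host.
import subprocess

def gpu_transcode_h264(src: str, dst: str, bitrate: str = "4M") -> None:
    """Decode src on the GPU (NVDEC) and re-encode it on the GPU (NVENC)."""
    cmd = [
        "ffmpeg", "-y",
        "-c:v", "h264_cuvid",      # hardware H.264 decode (NVDEC)
        "-i", src,
        "-c:v", "h264_nvenc",      # hardware H.264 encode (NVENC)
        "-b:v", bitrate,
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical input/output files for a single HD <-> HD session.
    gpu_transcode_h264("input_1080p.mp4", "output_1080p.mp4")
```

A production media server would keep many such sessions resident on each GPU rather than launching a process per file; the sketch only illustrates that both the decode and encode stages stay on GPU hardware.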

 

As can be seen from Table 1, the use of GPUs increases performance by roughly 5x to 12x depending on transcode type, at an estimated 5x incremental cost. Depending on transcode type and scale requirements, GPUs are therefore equivalent to, or substantially more attractive than, CPUs.
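To make those figures concrete, the per-type ratios can be computed directly from Table 1. The sketch below uses only the table's session counts plus the estimated 5x incremental cost noted above (a rough estimate, not a quoted price):

```python
# GPU vs. CPU session ratios, taken directly from Table 1.
# Values are (sessions on 20-core CPU server, sessions on 4 M60 cards).
table1 = {
    "HD<->HD": (20, 152),
    "HD<->SD": (12, 152),
    "SD<->SD": (58, 280),
    "SD<->HD": (22, 260),
}
COST_FACTOR = 5.0  # estimated incremental cost of the GPU configuration

for kind, (cpu, gpu) in table1.items():
    speedup = gpu / cpu                 # raw capacity gain
    price_perf = speedup / COST_FACTOR  # capacity gain per unit of cost
    print(f"{kind}: {speedup:.1f}x sessions, {price_perf:.1f}x price/performance")
# Prints roughly 4.8x-12.7x capacity gains, i.e. about 1.0x-2.5x price/performance.
```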

But here is the best part. Aggregate performance and compelling densities can be achieved when both audio and video transcoding are handled in a common GPU solution. To look at this, we bumped the CPU configuration up to a dual-socket, 36-core solution* handling both video and audio transcoding.

Beginning with Table 2, you can see the comparison for AMR-WB <-> G.711 audio transcoding alone.

 

Audio transcode type (Decode + Encode) | # of sessions on CPU (36-core server) | # of sessions on GPU (4 M60 cards)
AMR-WB <-> G.711                       | 2272                                  | 12288

Table 2. Comparative # of sessions using CPU vs. GPU for audio transcoding

 

With GPUs, we see roughly a 5.5x increase in the number of audio sessions that can be handled, at an estimated 4x cost increase. This clearly shows that GPUs add substantial value for the scale and performance of audio transcoding.

Table 3 shows the aggregate results of putting video and audio transcoding together.

 

Transcode type (Decode + Encode)         | # of sessions on CPU (36-core server), video + audio | # of sessions on GPU (4 M60 cards), video + audio
HD <-> HD video / AMR-WB <-> G.711 audio | 20 + 1136                                             | 152 + 11520
HD <-> SD video / AMR-WB <-> G.711 audio | 12 + 1136                                             | 152 + 11520
SD <-> SD video / AMR-WB <-> G.711 audio | 58 + 1136                                             | 280 + 11520
SD <-> HD video / AMR-WB <-> G.711 audio | 22 + 1136                                             | 260 + 11520

Table 3. Comparative # of sessions using CPU vs. GPU for video and audio transcoding

 

As Table 3 shows, when combined transcoding is done on CPUs, the audio transcoding capacity drops by 50%, while on GPUs the decrease is only about 6%. In three of the four transcoding scenarios, a common GPU investment achieves more than a 10x increase in combined audio-plus-video session capacity. With results like this, GPUs are clearly the right answer for scale and performance in secure interworking and transcoding.
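Those percentages fall straight out of Tables 2 and 3; as a quick sanity check using only the published session counts:

```python
# Audio capacity lost when audio and video transcoding share the same hardware,
# computed from Table 2 (audio only) and Table 3 (combined) session counts.
cpu_audio_only, cpu_audio_combined = 2272, 1136    # dual-socket 36-core server
gpu_audio_only, gpu_audio_combined = 12288, 11520  # four M60 cards

cpu_drop = 1 - cpu_audio_combined / cpu_audio_only
gpu_drop = 1 - gpu_audio_combined / gpu_audio_only
print(f"CPU audio capacity drops by {cpu_drop:.0%}")   # 50%
print(f"GPU audio capacity drops by {gpu_drop:.1%}")   # 6.3%
```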

As a company that has focused on secure interworking and transcoding for real-time communications for the last 20 years, I know we are ideally positioned to use GPUs to deliver a better customer experience and to ensure performance at scale for both audio and video interworking and transcoding in our customers' virtual, cloud networks.

* The CPU configurations mentioned here assume an additional 2-4 cores are reserved for hypervisor overhead.
