WebRTC Explained: What It Is and How It Works

Web-based communication is changing faster than you might imagine. Making a video call or sharing a file now requires nothing more than a URL, thanks to WebRTC, which has opened up new possibilities for developers and businesses. Open-sourced by Google in 2011 as a framework comprising standards, protocols, and JavaScript APIs, WebRTC (Web Real-Time Communication) enables the peer-to-peer transfer of audio, video, and data without plugins, native applications, or any third-party proprietary software. It is supported by the major browsers, including Chrome, Safari, Firefox, and Opera, and its scope extends beyond web browsers. It is an exciting and disruptive technology built on free JavaScript APIs: a few lines of JavaScript are enough to get started with video calling and peer-to-peer data transfer. But it is not quite that simple. Under the hood, WebRTC relies on a range of protocols and mechanisms, including SDP (Session Description Protocol), ICE, STUN, TURN, and UDP/TCP, together with NAT traversal, to provide secure, interoperable (across browsers), real-time peer-to-peer communication. Signaling protocols such as SIP (Session Initiation Protocol) can be layered on top, but WebRTC does not mandate one.

How WebRTC Works

Like any genuinely disruptive technology, WebRTC can sound like magic at first. Developers may be lured by the promise of building the next Zoom, only to discover how hard it is to understand the many moving parts and how they fit together into the overall puzzle called WebRTC. It becomes manageable, however, once you understand the basic concepts. Enabling WebRTC communication requires the following four steps:

Step 1: Get access to media streams from the webcam or microphone (carried out via the getUserMedia JavaScript API)

Step 2: Gather basic network information such as IP addresses and ports (carried out via the RTCPeerConnection JavaScript API) and share it with the other browser through signaling

Step 3: Exchange information about the parameters of the media data (generated via the RTCPeerConnection JavaScript API)

Step 4: Transfer the media data (carried out via the RTCPeerConnection JavaScript API; arbitrary application data travels over the RTCDataChannel JavaScript API)

Broadly, these objectives are achieved with the following three main JavaScript APIs:

1. getUserMedia (or MediaStream API)

This API gives developers access to the device's webcam and microphone in the form of audio/video stream objects. It also helps select the desired input device when several media capture devices are available. Whether the task is taking a user's profile picture, collecting audio samples, or recording audio/video, getUserMedia is the entry point. For instance, opening the default media devices works as follows:

A call to getUserMedia() triggers a permission request (unless permission was granted earlier) that the user must accept before the page gets access to a MediaStream.
If permission is denied, the returned promise is rejected with NotAllowedError (older browsers used the name PermissionDeniedError).
If no matching device is found, the promise is rejected with NotFoundError.
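
The flow above could be sketched roughly as follows. This is a minimal illustration, not a complete implementation: the helper names (describeMediaError, openDefaultMedia), the returned reason strings, and the injectable mediaDevices parameter are assumptions made for this example, while getUserMedia() and the error names come from the browser API.

```javascript
// Map the error names getUserMedia() can reject with to short reason codes.
// The reason strings are illustrative and not part of any API.
function describeMediaError(err) {
  switch (err && err.name) {
    case "NotAllowedError":       // current name for a denied permission request
    case "PermissionDeniedError": // legacy name used by older browsers
      return "permission-denied";
    case "NotFoundError":         // no device satisfies the requested constraints
      return "no-matching-device";
    default:
      return "unknown-error";
  }
}

// Open the default camera and microphone. `mediaDevices` is injectable
// (an assumption of this sketch) and defaults to the real browser API.
async function openDefaultMedia(mediaDevices = globalThis.navigator?.mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ audio: true, video: true });
    return { ok: true, stream };
  } catch (err) {
    return { ok: false, reason: describeMediaError(err) };
  }
}
```

In a page, the returned stream would typically be assigned to a video element's srcObject property to show a local preview.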

2. RTCPeerConnection

RTCPeerConnection is the heart and soul of WebRTC, and its most complicated part. It handles almost everything that takes place in a peer-to-peer session, including the following functions:

Setting up and creating a peer-to-peer connection
Taking care of session management
Managing all Session Description Protocol (SDP) message exchanges and handling negotiations through ICE candidates (using STUN and TURN if required)
Encoding and decoding media streams (audio/video) in real time
Handling network-related issues such as bandwidth estimation and packet loss
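
As a rough sketch of how the caller side of these functions might look in code: buildRtcConfig, startCall, and the signaling object with its send method are hypothetical names invented for this example, and the server URLs in the usage note are placeholders. RTCPeerConnection, addTrack, createOffer, setLocalDescription, and onicecandidate are the standard browser APIs.

```javascript
// Build an RTCPeerConnection configuration from STUN URLs and an optional TURN server.
function buildRtcConfig(stunUrls, turn) {
  const iceServers = [{ urls: stunUrls }];
  if (turn) {
    iceServers.push({ urls: turn.urls, username: turn.username, credential: turn.credential });
  }
  return { iceServers };
}

// Caller side: create the connection, attach local tracks, and send an SDP offer.
// `signaling.send` is a hypothetical app-defined transport (for example a WebSocket wrapper).
async function startCall(localStream, signaling, config) {
  const pc = new RTCPeerConnection(config);
  localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));

  // Forward each locally gathered ICE candidate to the remote peer.
  pc.onicecandidate = ({ candidate }) => {
    if (candidate) signaling.send({ type: "candidate", candidate });
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: "offer", sdp: pc.localDescription.sdp });
  return pc;
}
```

A call such as buildRtcConfig(["stun:stun.example.org:3478"]) yields a configuration with a single STUN entry; the callee would mirror startCall with setRemoteDescription, createAnswer, and setLocalDescription.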

Once a peer connection between browsers is established, multimedia streams can be sent to the remote browser. Establishing it, however, is not as easy as it sounds because of the following three obstacles:

Both peers are likely to reside within private networks or behind multiple layers of NAT. Consequently, neither of them is directly reachable.
Neither peer has the basic network information about the other, such as IP address and port, that is vital to establishing communication.
And finally, both need to traverse the NAT.

It is important to understand why these obstacles arise in the first place. The simple reason is that the Internet moved beyond the client-server paradigm long ago, while most end-user devices still sit behind NAT without a publicly reachable address.

WebRTC Signaling and NAT Traversal

Before browsers can start communicating, three things are needed:

Identifying the peers
Exchanging session descriptions to set up media ports and IP addresses
Sharing information about the media data, which is conveyed through SDP (Session Description Protocol)
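
To make the role of SDP more concrete, the sketch below pulls the media sections out of a session description. The function name listMediaSections and the hand-written SDP fragment are assumptions for illustration; real offers produced by a browser are much longer. The "m=" line layout (media, port, protocol, formats) follows the SDP format.

```javascript
// Extract the media sections ("m=" lines) from an SDP blob.
// An m= line has the form: m=<media> <port> <proto> <fmt> ...
function listMediaSections(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => {
      const [media, port, proto, ...formats] = line.slice(2).trim().split(/\s+/);
      return { media, port: Number(port), proto, formats };
    });
}

// Hand-written SDP fragment for illustration only.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(listMediaSections(sampleSdp).map((section) => section.media)); // one audio and one video section
```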

Nowadays, most people access the web from behind a firewall or a NAT (Network Address Translation) device, which maps the private IP addresses of the local network to a shared public address. The IP address that the public Internet sees can therefore be very different from the address the user's device actually holds, and some devices block unsolicited traffic toward the user's network altogether. Some enterprises do not allow any traffic into their network without vetting it. As a result, it is not always possible to communicate directly with a peer browser located in a private network. That is where STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers come into the picture. This is how the process goes:

The browser sends a request to a STUN server to discover its public IP address.
The server responds with the public IP address and port from which it saw the request arrive.
The browser builds a set of Interactive Connectivity Establishment (ICE) candidates, each containing an IP address, a port, and a transport protocol.
It shares these candidates with the peer through signaling, so that the peer can try to connect to the public IP address and port.
The peer browser does the same thing using its own STUN or TURN server; if no direct path works, the traffic is relayed through the TURN server.
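
An ICE candidate is ultimately a line of text carrying exactly these fields. The sketch below parses one; parseIceCandidate is a hypothetical helper written for this example, and the sample line uses documentation-range addresses rather than real ones.

```javascript
// Parse the core fields of an ICE candidate line.
// Layout: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseIceCandidate(line) {
  const parts = line.replace(/^a=/, "").replace(/^candidate:/, "").trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error("not a well-formed ICE candidate line");
  }
  const [foundation, component, transport, priority, address, port, , type] = parts;
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx" (learned via STUN), "prflx", or "relay" (allocated on a TURN server)
  };
}

// Illustrative server-reflexive candidate, as a STUN lookup might produce.
const sampleCandidate =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154";
console.log(parseIceCandidate(sampleCandidate));
```

The srflx type marks an address discovered through STUN, while relay marks one allocated on a TURN server.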

Note that signaling is not part of the WebRTC framework; it was left out for valid reasons. Different applications may need different protocols, and the WebRTC working group did not want to limit developers' choices.
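
Because the choice is left to the application, a signaling layer can be as simple as a relay that forwards messages between named peers. The in-memory sketch below only illustrates the idea; SignalingHub, its methods, and the message shape are all invented for this example, and a real deployment would put a network transport such as WebSocket behind the same interface.

```javascript
// Minimal in-memory signaling relay: peers register under an ID and the hub
// forwards offer/answer/candidate messages to the addressed peer.
class SignalingHub {
  constructor() {
    this.peers = new Map(); // peerId -> handler(message)
  }

  register(peerId, onMessage) {
    this.peers.set(peerId, onMessage);
  }

  // Deliver `payload` from one peer to another; returns false if the target is unknown.
  send(from, to, payload) {
    const deliver = this.peers.get(to);
    if (!deliver) return false;
    deliver({ from, ...payload });
    return true;
  }
}

const hub = new SignalingHub();
const bobInbox = [];
hub.register("alice", () => {});
hub.register("bob", (msg) => bobInbox.push(msg));
hub.send("alice", "bob", { type: "offer", sdp: "v=0" }); // the sdp value is a placeholder
console.log(bobInbox[0].type); // prints "offer"
```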

3. RTCDataChannel

Apart from audio and video, WebRTC handles the bidirectional transmission of arbitrary data, including text chats, game state, and files, through the RTCDataChannel API. Each data channel is created on an existing RTCPeerConnection.
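
A text chat over a data channel might be wired up roughly like this. The helper openChatChannel and the "chat" label are assumptions for this sketch; createDataChannel, readyState, send, and onmessage belong to the standard API.

```javascript
// Open a text channel on an existing peer connection and wire up handlers.
// `pc` is expected to be an RTCPeerConnection; "chat" is an arbitrary label.
function openChatChannel(pc, onText) {
  const channel = pc.createDataChannel("chat");
  channel.onmessage = (event) => onText(String(event.data));

  return {
    channel,
    // Send only once the channel is open; report whether the text was sent.
    send(text) {
      if (channel.readyState !== "open") return false;
      channel.send(text);
      return true;
    },
  };
}
```

The remote peer does not call createDataChannel itself in this arrangement; it receives the channel through the datachannel event on its own RTCPeerConnection.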

Summing up…

The importance of WebRTC in peer-to-peer communication is obvious, but factors such as a Multipoint Control Unit (MCU), multitenancy, and SIP integration need to be taken into account when building a reliable, robust, and scalable video calling solution. For developers, there is another way to build a video calling solution: they can opt for a CPaaS provider like EnableX, which offers the features, building blocks, and SDKs needed to build scalable video calling solutions. To try it out, sign up for free.

About The Author

Jason Wills

Jason works behind the scenes at EnableX, helping to turn complex tech into practical tools that developers and businesses can actually use. With several years of experience in product development and platform architecture, he focuses on making communication technologies simpler, smarter and easier to build with.

Whether he’s writing step-by-step guides, product tips or explaining how our APIs work, Jason keeps things clear and useful.
