
CVA-101 Free interactive preview

Module 1
Introduction to VoIP and SIP

A friendly tour from "what is a phone call?" all the way to the SIP messages that move modern voice traffic. No prior telecoms knowledge assumed.

3 parts · 6 interactives · ~40 minutes

Part A · foundations

Before SIP, there were copper lines

To understand why SIP exists, it helps to remember what came before. The next few slides walk through how phones used to work, what changed when audio moved onto the internet, and the problem that VoIP created which SIP had to solve.

Part A · lesson 1

How phone calls worked for 100 years

From the 1880s through the 1990s, a phone call worked like this. Two copper wires ran from your house to a building called a telephone exchange. When you picked up the receiver, the exchange noticed the change in current and gave you a dial tone. When you dialled a number, you were telling the exchange which other pair of wires to connect yours to.

During the call, your voice travelled as an analogue electrical signal down the wire, through the exchange, possibly through a few more exchanges, and out to the other person. The connection was a physical circuit reserved for your call. When you hung up, the circuit was released.

The key idea:

On the old telephone network, your phone number was tied to a specific pair of wires going to a specific physical address. If you moved house, you had to call the phone company to get the wires re-mapped. Your phone could not leave the wall.

Part A · lesson 2

Then voice moved onto the internet

Starting in the late 1990s, engineers worked out how to digitise an audio stream and send it across the internet as a series of small packets. This is Voice over IP, or VoIP.

Instead of a dedicated copper circuit, your voice now travels as IP packets sharing a network with web traffic, email, and everything else. There is no longer a physical wire reserved for your call. The endpoints just need an internet connection and software (or a hardware "IP phone") that can encode and decode the audio.

| Question | Old phone network (POTS) | VoIP |
| --- | --- | --- |
| What carries the voice? | Analogue current on copper wires | Digital IP packets across any network |
| Where is your phone? | Wired to a specific address | Anywhere with an internet connection |
| What is your "phone"? | A piece of hardware on the wall | Could be hardware, an app, a browser tab |
| Who controls features? | The phone company's switches | The endpoints and the software |

Part A · lesson 3

VoIP introduced a brand new problem

On the old phone network, finding you was easy: your number was wired to a known address. On VoIP, that assumption breaks. Your "phone" is now software that could be on a desktop at home, a laptop at a coffee shop, a mobile app on a flight to Tokyo, or all three of those at once.

So when someone dials your number, the network faces a question that did not exist before:

The mobility problem:

"Where, on the entire global internet, is the user with this number right now, and how do I deliver an incoming call to them?"

The next slide is the solution. It is the single most important idea in SIP, and everything else follows from it.

Part A · lesson 4 · the big idea

The solution: registration

The trick is to flip the problem around. Instead of the network trying to find the phone, the phone tells the network where it is. Repeatedly.

Every few minutes, your VoIP phone or app sends a small message to a server called a registrar, saying in effect: "Hi, I am Alice. I am currently reachable at this internet address. Please remember that for the next hour."

The registrar writes this down in its database. When someone calls Alice, the network looks her up in the registrar's database and forwards the call to whatever address she most recently registered. If she has moved, her phone has already sent a fresh registration with the new address.

Why this matters:

Registration is what lets the same SIP account ring on your desk phone in the morning, your laptop in the afternoon, and your mobile in the evening. It is the foundation of every modern VoIP feature: call forwarding, mobile twinning, Find Me / Follow Me, hosted PBX, work-from-anywhere. None of this was possible on the old phone network.
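The registration idea above can be sketched as a small lookup table with expiry times. This is a toy illustration, not how any particular registrar is implemented: the `Registrar` class, its method names, and the addresses are all hypothetical, and it keeps only one address per user, whereas real SIP registrars can hold several at once.

```python
from dataclasses import dataclass, field


@dataclass
class Registrar:
    """Toy in-memory registrar: remembers where each user is reachable."""

    # user -> (address, absolute expiry time in seconds); hypothetical layout
    bindings: dict = field(default_factory=dict)

    def register(self, user: str, address: str, expires: int, now: float) -> None:
        """Store or refresh the address for `user`, valid for `expires` seconds."""
        self.bindings[user] = (address, now + expires)

    def lookup(self, user: str, now: float):
        """Return the current address, or None if unknown or expired."""
        entry = self.bindings.get(user)
        if entry is None:
            return None
        address, expiry = entry
        if now >= expiry:
            # The phone stopped refreshing, so forget the stale address.
            del self.bindings[user]
            return None
        return address


registrar = Registrar()
registrar.register("alice", "10.4.7.22:5060", expires=3600, now=0)
# Alice moves to another network; her phone registers again with the new address.
registrar.register("alice", "192.0.2.15:5060", expires=3600, now=600)
print(registrar.lookup("alice", now=700))   # the most recent address
print(registrar.lookup("alice", now=5000))  # None: the registration has expired
```

Passing `now` explicitly keeps the sketch deterministic. A real server would read the clock itself.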

Part A check · one question

Quick check before we continue

  1. Why does VoIP need a "registration" mechanism that the old POTS phone network did not?

    On POTS, your number was wired to a physical address. On VoIP, the "phone" is software that can run anywhere, so the network needs the phone to tell it where it currently is. That is what registration does.

Part B · meet SIP

Now: what does SIP actually do?

You now know why we need a VoIP signaling protocol. Part B introduces SIP itself, the cast of characters in a SIP system, and how a single call moves through it.

Part B · lesson 1

What SIP is, in one paragraph

SIP, the Session Initiation Protocol, is a set of rules for the messages that a VoIP system sends behind the scenes. SIP is what carries the "I want to call Bob" message, the "Bob's phone is ringing" message, and the "Bob picked up" message. It is also what carries the registration messages that Part A introduced.

SIP is signaling only. It sets up calls, modifies them, and tears them down. It does not carry the audio. The audio is carried by a different protocol called RTP, which we will meet in Part C.

Two useful comparisons:

SIP is to a phone call what HTTP is to a web page. They look strikingly similar. Both are text-based, both have requests and responses, both have headers. If you have ever poked around a browser's network tab, you already know how to read SIP.

SIP is also like the envelope of a letter, while RTP is the letter itself. The envelope routes the message; the contents say what is being communicated.
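To make the HTTP comparison concrete, here is a minimal sketch that splits a SIP message into its start line, headers, and body. The function name `parse_sip_message` is hypothetical, the sample message is deliberately abbreviated (a real INVITE carries more headers), and `bob@biloxi.com` is an assumed example address. Header folding and other edge cases are ignored.

```python
def parse_sip_message(text: str):
    """Split a SIP message into (start_line, headers, body).

    Headers are returned as a list of (name, value) pairs because the
    same header name may legitimately appear more than once.
    """
    # A blank line (CRLF CRLF) separates the headers from the body, as in HTTP.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


# Abbreviated, illustrative request. Not a complete, valid INVITE.
sample = (
    "INVITE sip:bob@biloxi.com SIP/2.0\r\n"
    "To: <sip:bob@biloxi.com>\r\n"
    "From: <sip:alice@atlanta.com>;tag=1928301774\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(sample)
print(start_line)
for name, value in headers:
    print(f"{name} -> {value}")
```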

Part B · lesson 2 · meet the cast

The endpoints: User Agents

SIP calls every endpoint a User Agent, often shortened to UA. A User Agent is anything that can make or receive a call: an IP desk phone, a softphone app, a meeting room system, even an automated voicemail box.

Within any single transaction (a request and its response), a User Agent plays one of two roles depending on which side it is on:

| Role | What it does | Plain-English analogy |
| --- | --- | --- |
| UAC (User Agent Client) | Sends the request. The "caller" side of a transaction. | The person dialling the phone. |
| UAS (User Agent Server) | Receives the request and sends back the response. | The person whose phone is ringing. |

An endpoint is usually both:

Your softphone is a UAC when you make a call (you initiate). It is a UAS when someone calls you (you receive). Same software, different role per transaction.

Part B · lesson 3

The middlemen: proxies, registrars, B2BUAs

Calls rarely go straight from one endpoint to another. They pass through a few SIP servers along the way. Each one has a specific job.

| Server | What it does | Plain-English analogy |
| --- | --- | --- |
| Registrar | Accepts REGISTER messages from User Agents and remembers where each user is currently reachable. | The phone book the network looks people up in. |
| Proxy | Forwards SIP messages along to the right destination. Reads the headers, may add its own, but does not change the contents of the message. | The post office. Routes the envelope, leaves the letter inside untouched. |
| B2BUA (Back-to-Back User Agent) | Looks like a single proxy from the outside, but is really two endpoints glued together. It can change anything in the message, including the audio details. Used by service providers, SBCs, and call recorders. | A receptionist who takes one call, hangs up, then makes a new call to whoever should actually handle it. |

Why the proxy/B2BUA distinction matters in real life:

If a call's audio settings mysteriously change in flight ("the codec was wrong by the time it reached the other side"), it is almost never a proxy doing it. Proxies cannot. It is almost always a B2BUA. Knowing this saves hours when you debug your first weird call.
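The difference can be sketched in a few lines. This is a simplified model under stated assumptions: `SipMessage`, `proxy_forward`, and `b2bua_relay` are hypothetical names, the host name is made up, the body is a placeholder string rather than real media negotiation text, and a real Via header carries more parameters than shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SipMessage:
    start_line: str
    headers: tuple  # ((name, value), ...) in order
    body: str


def proxy_forward(msg: SipMessage, proxy_host: str) -> SipMessage:
    """A proxy adds its own Via header on top; the body passes through unchanged."""
    via = ("Via", f"SIP/2.0/UDP {proxy_host}")  # simplified: real Via has more fields
    return replace(msg, headers=(via,) + msg.headers)


def b2bua_relay(msg: SipMessage, new_body: str) -> SipMessage:
    """A B2BUA originates a fresh message on its second leg, so the body may differ."""
    return SipMessage(msg.start_line, msg.headers, new_body)


original = SipMessage(
    start_line="INVITE sip:bob@biloxi.com SIP/2.0",
    headers=(("To", "<sip:bob@biloxi.com>"),),
    body="audio offer: codec A, codec B",  # placeholder text, not real syntax
)

via_proxy = proxy_forward(original, "proxy.atlanta.com")
via_b2bua = b2bua_relay(original, "audio offer: codec B")

print(via_proxy.body == original.body)  # True: the proxy left the body alone
print(via_b2bua.body == original.body)  # False: the B2BUA rewrote it
```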

Part B · interactive

Try it: match each description to the SIP role it describes

  • UAC (User Agent Client): The endpoint that sends the request, the "calling" side.
  • UAS (User Agent Server): The endpoint that receives the request and answers it.
  • Proxy (forwards requests): A SIP server that forwards messages. Reads headers, must not change the call's audio details.
  • Registrar (tracks bindings): A SIP server that records where each user is currently reachable.
  • B2BUA (Back-to-back UA): A SIP server that pretends to be one endpoint and behaves as another. Free to rewrite anything.

Part B · lesson 4

SIP addresses look like email addresses

Every user and server in SIP has a name called a URI. SIP URIs deliberately look like email addresses, with a scheme prefix on the front:

sip:alice@atlanta.com

That is Alice on the atlanta.com domain. The scheme can be sip: for plain SIP, sips: when the signaling must be carried securely over TLS, or tel: for a plain phone number. URIs can also carry extra parameters, each introduced by a semicolon, like ;transport=tls or ;user=phone.
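A minimal sketch of pulling such a URI apart is shown below. The function name `dissect_sip_uri` is hypothetical, and the parsing is deliberately simplified: it handles only the common sip: and sips: shape of scheme, user, host, optional port, and semicolon parameters. It does not handle tel: URIs, passwords, IPv6 hosts, or embedded headers.

```python
def dissect_sip_uri(uri: str) -> dict:
    """Break a simple sip:/sips: URI into its parts (simplified sketch)."""
    scheme, _, rest = uri.partition(":")
    # Everything after the first semicolon is a list of ;name=value parameters.
    address, *raw_params = rest.split(";")
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name] = value
    user, at, hostport = address.rpartition("@")
    host, _, port = hostport.partition(":")
    return {
        "scheme": scheme,
        "user": user if at else None,  # server URIs may have no user part
        "host": host,
        "port": int(port) if port else None,
        "params": params,
    }


print(dissect_sip_uri("sip:alice@atlanta.com"))
print(dissect_sip_uri("sip:alice@atlanta.com:5060;transport=tls"))
```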

Part B · lesson 5

Two URIs per user: AOR and Contact

Now that you know about registration (Part A) and URIs (previous slide), here is a distinction that confuses everyone the first time. SIP gives every user two URIs:

| Address of Record (AOR) | Contact URI |
| --- | --- |
| The "who". Alice's stable, public, business-card identity. | The "where". Where Alice's phone or app is right now. |
| sip:alice@atlanta.com | sip:alice@10.4.7.22:5060 (her current laptop's IP) |
| Stable. The thing you give out and that is dialled. | Changes every time her device moves networks. |
| Appears in From / To headers of SIP messages. | Appears in the Contact header. Sent in the REGISTER message. |

Memory hook: AOR is "who", Contact is "where, right now". The whole point of registration is to keep an up-to-date map from AORs to Contacts.
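To show where each URI lands in practice, here is a rough sketch that assembles the text of a REGISTER request. The function name `build_register` is hypothetical, and the tag, Call-ID, and branch values are placeholders chosen for illustration. A real phone generates unique values for them and usually has to authenticate as well, which is left out here.

```python
def build_register(aor: str, contact: str, expires: int) -> str:
    """Assemble an illustrative REGISTER request binding `aor` to `contact`."""
    domain = aor.split("@", 1)[1]        # e.g. "atlanta.com"
    device = contact.split("@", 1)[1]    # e.g. "10.4.7.22:5060"
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {device};branch=z9hG4bK-placeholder",
        "Max-Forwards: 70",
        f"To: <{aor}>",                      # the AOR: who is registering
        f"From: <{aor}>;tag=placeholder-tag",
        f"Call-ID: placeholder-call-id@{device}",
        "CSeq: 1 REGISTER",
        f"Contact: <{contact}>",             # the Contact: where she is right now
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP, like HTTP, ends each line with CRLF and the header block with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_register("sip:alice@atlanta.com", "sip:alice@10.4.7.22:5060", 3600))
```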

Part C · the call in motion

From theory to a real call

In Part C you will see how SIP and the audio relate (separately!), pick the right network transport for different scenarios, and then watch a complete real call play out from "I want to call Bob" through "Bob hangs up".

Part C · lesson 1

SIP and the audio travel separately

Earlier we said SIP is just signaling, not the audio. The audio uses a different protocol called RTP (Real-time Transport Protocol). RTP is the actual stream of small packets carrying encoded voice between the two endpoints.

Beginners often assume SIP and RTP travel together. They do not. SIP and RTP are deliberately decoupled and frequently take totally different paths across the internet. The next slide proves it visually.

Why this is the most important troubleshooting fact in VoIP:

If a call connects but neither side hears anything, that is an RTP problem. Looking at SIP traces will not help you. You need to debug RTP, NAT, and firewalls.
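To make "a stream of small packets" concrete, the sketch below decodes the fixed 12-byte header that starts every RTP packet, as laid out in RFC 3550. The function name `parse_rtp_header` and the sample packet are hypothetical. The sample is built by hand rather than captured from a real call, and header extensions and CSRC lists are not decoded.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550) of one packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet is at least 12 bytes long")
    # Network byte order: two single bytes, a 16-bit and two 32-bit fields.
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,           # 2 for current RTP
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,   # identifies the audio encoding
        "sequence": sequence,            # increases by 1 per packet
        "timestamp": timestamp,
        "ssrc": ssrc,                    # identifies the sending stream
        "payload_len": len(packet) - 12,
    }


# Hand-built sample: version 2, payload type 0, then 160 bytes of silence.
sample_packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x1234ABCD) + bytes(160)
print(parse_rtp_header(sample_packet))
```

When debugging a silent call, a quick check like this on captured packets shows whether RTP is arriving at all and whether the sequence numbers are advancing.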

Part C · interactive

SIP signaling vs RTP media: the two paths

[Diagram: Alice's softphone and Bob's softphone at either end, with the Atlanta proxy and the Biloxi proxy between them. SIP signaling travels through the proxies. RTP audio flows two-way, directly between the endpoints.]
Both paths active. SIP signaling threads through the proxies; RTP flows directly (and bidirectionally) between the endpoints because nothing told it to do otherwise. This is why one-way audio is almost never a SIP problem.

Part C · lesson 2

SIP runs over UDP, TCP, or TLS

SIP itself does not care which network transport carries it. The standard defines three options and each has tradeoffs. Pick the wrong one for the scenario and you spend a month chasing weird call failures.

| Transport | Default port | Best for | Watch out for |
| --- | --- | --- | --- |
| UDP | 5060 | Small messages, simple endpoints, low latency | Fragmentation when the message gets big (more than ~1300 bytes) |
| TCP | 5060 | Large messages, persistent connections (helpful through NAT) | Connection state, head-of-line blocking |
| TLS | 5061 | Anything carrier-grade or carrying personal data | Cert management; sips: URIs require TLS hop-by-hop |
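The table's rules of thumb can be written down as a tiny decision function. This is only a sketch of the guidance above, not a rule from the SIP standard: the function name `pick_transport` and its two inputs are assumptions, and real deployments weigh more factors (NAT behaviour, what the far end supports, operational policy).

```python
# Default ports from the table above.
DEFAULT_PORT = {"UDP": 5060, "TCP": 5060, "TLS": 5061}

# Message size above which the table warns about UDP fragmentation.
UDP_SAFE_BYTES = 1300


def pick_transport(message_bytes: int, sensitive: bool) -> str:
    """Suggest a transport from typical message size and data sensitivity."""
    if sensitive:
        return "TLS"   # personal data or carrier-grade traffic: encrypt the signaling
    if message_bytes > UDP_SAFE_BYTES:
        return "TCP"   # too big for a single UDP datagram to be safe
    return "UDP"       # small and simple: the lightest option


for size, sensitive in [(600, False), (1800, False), (600, True)]:
    transport = pick_transport(size, sensitive)
    print(size, sensitive, "->", transport, "port", DEFAULT_PORT[transport])
```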

The next slide gives you three real-world scenarios. Pick the transport you would use.

Part C · interactive

Pick the right transport for each scenario

Scenario A. A small office PBX with 30 desk phones on the same LAN. Calls are short, traffic is light, no sensitive data on the wire.

Scenario B. A SIP trunk between two service-provider boxes that routinely carries large messages with multiple codec offers and a 1.8 KB SDP body.

Scenario C. A hospital deploying softphones over the public internet. Patient information may appear in call metadata.

Part C · lesson 3

The call you are about to watch uses four SIP methods and three responses

SIP requests are named by a "method", a verb at the start of the message. Responses are named by a three-digit status code followed by a short reason phrase. You will see both on the next slide. Keep this glossary handy.

| Message | Type | What it means in the call |
| --- | --- | --- |
| REGISTER | Method | "Here is where I am right now." (Part A.) |
| INVITE | Method | "I want to start a call with you." |
| 100 Trying | Response | "I got your INVITE, working on it." |
| 180 Ringing | Response | "The destination phone is ringing." |
| 200 OK | Response | "The other side picked up." (Or "your request worked.") |
| ACK | Method | "I confirm I got your 200 OK." Only used to confirm the final response to an INVITE. |
| BYE | Method | "I am hanging up." |

The 1xx, 2xx, 3xx ... codes follow HTTP's pattern: 1xx = provisional (informational), 2xx = success, 3xx = redirection, 4xx = client error, 5xx = server error. SIP adds a 6xx class for global failures.
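Since the class of a response is just its first digit, sorting status codes into classes takes one integer division. The sketch below uses the class names from the SIP specification (RFC 3261). The function names `classify` and `is_final` are hypothetical.

```python
# First digit of the status code -> response class (per RFC 3261).
STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify(code: int) -> str:
    """Return the response class for a SIP status code."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def is_final(code: int) -> bool:
    """1xx responses are progress reports; anything from 200 up ends the transaction."""
    return code >= 200


for code in (100, 180, 200):
    print(code, classify(code), "final" if is_final(code) else "not final")
```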

Bonus interactive · the marquee feature

A complete real call

Now you have all the vocabulary. On the next slide you will watch the call unfold one message at a time. Press Play, or click any message in the ladder to jump to it and inspect the actual SIP text.

Animated call-flow trace

INVITE → 200 OK → ACK → BYE

[Animated ladder diagram with four columns: Alice, Atlanta proxy, Biloxi proxy, Bob.]
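The same sequence can be followed in code as a small state machine seen from Alice's side. This is a deliberately simplified sketch, not the transaction model the SIP specification defines: the state names, the event strings, and the `run` helper are all hypothetical, and it assumes the happy path where Bob answers and later hangs up.

```python
# (current state, event) -> next state, from the caller's (Alice's) point of view.
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100 Trying"): "calling",    # progress only, nothing changes
    ("calling", "recv 180 Ringing"): "ringing",
    ("ringing", "recv 200 OK"): "answered",
    ("answered", "send ACK"): "in call",          # third message of the INVITE handshake
    ("in call", "recv BYE"): "hanging up",
    ("hanging up", "send 200 OK"): "ended",       # BYE needs only request + response
}


def run(events, state="idle"):
    """Apply events in order and return the final state (KeyError on a bad step)."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


call = [
    "send INVITE",
    "recv 100 Trying",
    "recv 180 Ringing",
    "recv 200 OK",
    "send ACK",
    "recv BYE",
    "send 200 OK",
]

print(run(call[:5]))  # in call
print(run(call))      # ended
```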

Knowledge check

Module 1 quiz: 6 questions

Answer all six, check, change any wrong answers, check again. The explanations under each question are the part that actually teaches you.

80% (5 of 6) is a pass.

  1. Why does VoIP need registration when the old POTS phone network did not?

    On POTS, your number was wired to a physical address. On VoIP, your endpoint can be anywhere; registration is how it tells the network where it currently is.

  2. Which entity is allowed to change the audio details (the message body) of a SIP call in flight?

    Proxies forward the message but cannot change its body. A B2BUA acts as two endpoints glued together and can rewrite anything, which is why it is the answer when the audio settings change in flight.

  3. True or false: SIP signaling and RTP audio always travel along the same path between two endpoints.

    False. SIP and RTP are independent. Lesson 1 of Part C showed RTP going directly between endpoints while SIP threaded through proxies. This is the most common case.

  4. What is a user's Address of Record (AOR)?

    AOR is "who". The Contact URI is "where, right now". Registration's job is to map one to the other.

  5. A call connects but neither side hears any audio. Where do you start debugging?

    If the call connected, SIP did its job. Audio is RTP. No audio means RTP, NAT, or firewall. Reading SIP traces will not help you find an RTP problem.

  6. In the call you stepped through, why does Bob send a 200 OK and Alice respond with ACK?

    INVITE is special: it gets a three-message sequence (INVITE, 200, ACK). All other methods (BYE, REGISTER, OPTIONS ...) are two messages, request and response.

Module 1 complete

That's Module 1.

You have just experienced one module of CVA-101 in the format every module follows: short conceptual lessons, an animated set-piece, and a knowledge check at the end.

The full CVA-101 has eight more modules built the same way: SIP message anatomy in detail, response codes, transactions and dialogs, registration in depth (including authentication), SDP and the offer/answer model, troubleshooting, and CVA exam prep, plus six Wireshark labs on real captured traces.

Ready for the rest?

CVA-101 is the first half of the CVA, Certified VoIP Associate, credential.
