Most VoIP engineers can decode an RTP packet in their sleep. Ask the same engineers what's in an RTCP Sender Report and you'll usually get hand-waving. That gap matters, because RTCP is where the real answer lives when a call "just sounded weird" but the RTP stream looks clean in Wireshark.

RTCP (RTP Control Protocol, defined alongside RTP in RFC 3550) is the out-of-band feedback channel for every media session you've ever set up. It is where endpoints tell each other about the jitter, packet loss, and round-trip delay they are observing right now. Read fluently, it will tell you exactly which leg of a call is misbehaving — often before the customer opens a ticket. This post walks through the four RTCP packet types you'll see in the wild, how to interpret the numbers, and what the values actually mean when you are staring at a trace at 2am.

The four RTCP packet types you'll actually see

RFC 3550 defines five core RTCP packet types. In real-world VoIP traffic you encounter four of them regularly:

  • SR (Sender Report, PT=200) — sent by any participant actively transmitting media.
  • RR (Receiver Report, PT=201) — sent by receive-only participants, or attached to an SR when a participant is also receiving other streams.
  • SDES (Source Description, PT=202) — CNAME, tool identifier, and other identity items.
  • BYE (PT=203) — session termination.

APP (PT=204) is vendor-private and rarely decodes into anything actionable. RFC 4585 (AVPF) and RFC 5104 add NACK, PLI, FIR and similar feedback messages, but those are mostly a WebRTC concern — on a typical SIP trunk you will not see them.

The important behavioural point: every endpoint sends RTCP periodically. The RFC 3550 timing algorithm scales the interval with participant count and the 5% RTCP bandwidth budget, but a two-party voice call settles at the 5-second minimum interval, so expect roughly one compound RTCP packet every 5 seconds in each direction. A clean 30-second call should therefore show on the order of a dozen RTCP packets across both directions. If your trace shows two, something is dropping RTCP — almost always an SBC or firewall with an RTP-only pinhole, or a vendor implementation doing rtcp-mux poorly.
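A back-of-the-envelope check makes that expectation concrete. This is a sketch with illustrative helper names, assuming the fixed 5-second minimum interval (RFC 3550 randomises each interval between 0.5× and 1.5×, so treat the result as a rough floor, not an exact count):

```python
# Rough sanity check for RTCP presence in a capture.
# Assumes the 5-second minimum reporting interval of a two-party call.

def expected_rtcp_packets(call_seconds: float, directions: int = 2) -> int:
    """Approximate lower bound on compound RTCP packets a clean capture should show."""
    return int(call_seconds // 5) * directions

def rtcp_looks_suppressed(call_seconds: float, observed: int) -> bool:
    """Flag a trace with far fewer RTCP packets than the interval predicts."""
    return observed < expected_rtcp_packets(call_seconds) // 2
```

For a 30-second call this predicts about 12 packets across both directions; a trace showing 2 fails the check and points at an RTP-only pinhole or broken rtcp-mux.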

Sender Reports: the NTP timestamp trick

The SR carries a block that a lot of engineers skim past: the 64-bit NTP timestamp and the 32-bit RTP timestamp, both representing the same instant at the sender. This pairing is the hinge of call-quality diagnostics, because it lets the receiver map RTP timestamps (which are codec-relative and start at a random value) back to wall-clock time.

When you combine SR NTP timestamps from both sides with the LSR (Last SR) and DLSR (Delay since Last SR) fields carried in the RR on the return path, you can compute round-trip time exactly. The formula from RFC 3550 §6.4.1:

RTT = A − LSR − DLSR

where A is the NTP time (middle 32 bits) at which the RR was received. LSR is the middle 32 bits of the most recent SR's NTP timestamp as seen by the reporter, and DLSR is how long that SR sat at the receiver before the RR was emitted; both are in units of 1/65536 seconds. Wireshark computes this for you; when you have to do it by hand from a raw capture, a few lines of Python are enough.
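A minimal sketch of that computation, assuming you have already parsed LSR and DLSR out of the RR and have the RR's wall-clock arrival time (function names are illustrative):

```python
# RTT = A - LSR - DLSR, per RFC 3550 section 6.4.1.
# All three quantities are in "middle 32 bits of an NTP timestamp" form:
# low 16 bits of the NTP seconds, high 16 bits of the NTP fraction,
# i.e. units of 1/65536 seconds.

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900 (NTP) and 1970 (Unix)

def to_ntp_middle32(unix_time: float) -> int:
    """Convert a Unix timestamp to the middle 32 bits of a 64-bit NTP timestamp."""
    ntp = unix_time + NTP_EPOCH_OFFSET
    seconds = int(ntp)
    fraction = int((ntp - seconds) * 2**32)
    return ((seconds & 0xFFFF) << 16) | (fraction >> 16)

def rtt_seconds(rr_arrival_unix: float, lsr: int, dlsr: int) -> float:
    """Round-trip time in seconds from one SR/RR exchange."""
    a = to_ntp_middle32(rr_arrival_unix)
    rtt_units = (a - lsr - dlsr) & 0xFFFFFFFF  # wrap-safe 32-bit subtraction
    return rtt_units / 65536.0
```

The `& 0xFFFFFFFF` matters: A and LSR are both truncated 32-bit values, so the subtraction has to be done modulo 2^32 or a wrap mid-call produces a nonsense negative RTT.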

What to look for:
  • RTT above 150ms on a domestic call. Investigate the media path. An off-net leg, a transcoding SBC, or a mis-pinned route into a remote POP is usually in play.
  • RTT that grows monotonically over the call. Queueing somewhere — classic symptom of a shared uplink being saturated by a large TCP flow, or a bufferbloat-prone CPE.
  • LSR = 0 in the RR. The receiver has never seen an SR. Almost always indicates one-way RTCP, or an SBC that is suppressing SRs in one direction (common with some B2BUA implementations that don't regenerate RTCP correctly).

Receiver Reports: reading loss and jitter honestly

The RR block is where most of the diagnostic gold is. Per SSRC, each RR carries:

  • Fraction lost — 8-bit fixed-point, the fraction of packets lost since the previous report.
  • Cumulative number of packets lost — 24-bit running counter.
  • Extended highest sequence number received — low 16 bits are the sequence number, high 16 are the cycle count.
  • Interarrival jitter — a smoothed estimate of the statistical variance of the RTP packet inter-arrival times.
  • LSR / DLSR — as described above.

The subtle field is interarrival jitter. RFC 3550 §6.4.1 defines it as a running estimate updated once per packet — J(i) = J(i−1) + (|D(i−1,i)| − J(i−1))/16 — where D is the difference between the receive-time spacing and the RTP-timestamp spacing of two consecutive packets. Two things to burn into muscle memory:

  1. It is in RTP timestamp units, not milliseconds. For G.711 at 8kHz, divide by 8 to get ms. For G.722, divide by 8 as well — the RTP clock rate stays 8000 Hz despite the 16kHz sampling rate, a legacy error preserved for compatibility (see RFC 3551 §4.5.2). For Opus at 48kHz, divide by 48.
  2. It is a rolling smoothed value, not an instantaneous one. Network spikes show up as a slow climb and a slower decay, not a sharp edge. A jitter reading of 20ms does not mean the last packet was 20ms late; it means the long-run average spacing variance is around 20ms.
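The unit conversion is trivial but easy to get wrong under pressure, G.722 especially. A small lookup sketch (payload names and table are illustrative, not from any library):

```python
# Convert RR interarrival jitter (RTP timestamp units) to milliseconds.
# Keyed by RTP clock rate, which is NOT always the audio sampling rate.

RTP_CLOCK_HZ = {
    "PCMU": 8000,   # G.711 mu-law
    "PCMA": 8000,   # G.711 A-law
    "G722": 8000,   # 8 kHz on the wire despite 16 kHz audio (RFC 3551 4.5.2)
    "opus": 48000,
}

def jitter_ms(jitter_units: int, codec: str) -> float:
    """Divide by RTP timestamp ticks per millisecond."""
    return jitter_units / (RTP_CLOCK_HZ[codec] / 1000.0)
```

So a reported jitter of 160 on a G.711 stream and 960 on an Opus stream are the same 20 ms.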

What to look for:
  • Converted jitter > 30ms sustained. You are eating the safety margin on a typical 60ms adaptive jitter buffer. Concealment or drop-on-underrun will follow.
  • Fraction lost > 1% sustained. MOS is degraded. With G.711 + PLC you'll usually survive; with G.729 or a low-bitrate Opus mode you will hear it.
  • Fraction lost alternating between 0% and 5% across reports. Bursty loss — classic wireless-radio pattern. The endpoint is on Wi-Fi, or upstream of a wireless backhaul.
  • Cumulative lost going negative. Sequence number cycled without the cycle count being updated. Implementation bug; worth raising a ticket with the vendor and pinning the firmware version.
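The negative-cumulative-lost symptom falls straight out of how these two fields relate. A sketch of the receiver-side bookkeeping (names are illustrative):

```python
# The RR's "extended highest sequence number received" packs the 16-bit
# cycle count into the high half and the 16-bit sequence number into
# the low half. Cumulative loss is expected minus received.

def extend_seq(ext_highest: int) -> int:
    """Unpack the 32-bit RR field into one monotonic sequence number."""
    cycles = ext_highest >> 16     # times the 16-bit seq wrapped 65535 -> 0
    seq = ext_highest & 0xFFFF     # highest sequence number seen
    return cycles * 65536 + seq

def cumulative_lost(ext_highest: int, base_seq: int, received: int) -> int:
    """expected - received; goes negative if cycles lags a real wrap."""
    expected = extend_seq(ext_highest) - base_seq + 1
    return expected - received
```

If an implementation misses a wrap, `extend_seq` comes out 65536 too low, `expected` drops below `received`, and the counter goes negative — exactly the bug signature described above.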

What RTCP can't tell you (and what it can)

RTCP is essentially a "how did the last few seconds of media feel" feedback channel from the far end. That framing matters when you are triaging a ticket.

What RTCP will tell you:

  • Which direction is bad — SR/RR asymmetry between the two endpoints immediately localises a degraded leg.
  • Whether loss is steady or bursty — and therefore whether you are looking at congestion, radio, or a policer.
  • Whether jitter is rising, stable, or decaying.
  • Whether the far side is actually receiving anything from you — if the RR stream stops, it is not just your uplink.

What RTCP will not tell you:

  • Why packets are being lost. That is a layer-below question; take it to the IP/Ethernet trace.
  • Codec-level artefacts — PLC substitutions, FEC recovery, comfort noise insertion, and DTX gaps are not reported.
  • Media content problems (one-way audio caused by SRTP key mismatch, for example — the stream will often look perfect at the RTCP layer).
  • RTP/RTCP mux state. Check the SDP a=rtcp-mux attribute, not the RTCP itself.

One critical gotcha: RTCP cumulative counters persist for the life of the session. If an endpoint was reporting loss twenty seconds ago, cumulative_lost keeps climbing even after the problem clears. Always correlate fraction lost (a delta, per report) with your capture clock. A sudden jump from 0% to 5% between two reports means the event happened in those five seconds, regardless of what the cumulative counter says.
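Decoding the fraction-lost field itself is one line; it is an 8-bit fixed-point value, the loss fraction scaled by 256 (helper name is illustrative):

```python
# RR "fraction lost": packets lost since the previous report,
# expressed as an 8-bit fixed-point fraction (value / 256).

def fraction_lost_percent(field: int) -> float:
    """Raw 0..255 field -> percentage lost over the last report interval."""
    return field / 256.0 * 100.0
```

A raw value of 13 is about 5% — the kind of jump between two consecutive reports that pins the loss event to that five-second window.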

Closing the loop

If you take one thing from this, it should be this: when a customer says "that call had a weird echo / it was choppy / the audio cut out for a second," your first move should be to pull the RTCP, not the SIP. The SIP signalling established the session; the RTCP is what the session actually felt like end-to-end. Engineers who read RTCP fluently are the ones who can pinpoint a quality problem rather than describe it.

SIPT-201 spends a full module on RTCP

The module covers not just the packet format, but the Wireshark and tshark workflows for computing RTT across an SBC, correlating SR/RR asymmetry end-to-end, and diagnosing the specific failure modes that hosted PBX and UCaaS platforms produce in production. If you're the engineer who ends up on those 2am tickets, SIPT-201 is built for you.
