REGISTER is the quietest part of SIP and the one that breaks first. The signalling looks trivial, the headers fit on half a page, and the RFC 3261 §10 description seems to say everything you need. Then a softphone drops off the network at minute 31, a UA stops receiving INVITEs five minutes after its laptop sleeps, or a fleet of 4,000 endpoints all refreshes at the same second and takes the registrar offline for a slot of NTP-aligned panic. None of those failures come from anything exotic. They all come from misunderstanding three things: what the registrar actually stores, how the Contact lifetime is negotiated, and which keep-alive mechanism keeps the NAT pinhole open between refreshes.

This post is the field reference we wish every senior VoIP engineer had on the wall.

The model: AOR-to-Contact bindings

The registrar is a database. Its rows are bindings. Each binding has four columns: the Address-of-Record (AOR) the user is reachable at, the Contact URI that says where the UA actually is, the expiry timestamp that says when the binding becomes invalid, and (in any modern deployment) the +sip.instance instance-id and reg-id that disambiguate the same AOR registering from multiple devices or multiple flows from the same device (RFC 5626 §4).

That's the entire model. Everything else - Path, Service-Route, q-values, outbound flows, GRUU - is a refinement of one of those four columns.
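
As a rough sketch of that four-column model, a binding can be written as a small record. The class, field, and function names below are illustrative assumptions, not taken from any particular registrar implementation.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass
class Binding:
    """One registrar row: the four columns described above."""
    aor: str                            # Address-of-Record, e.g. sip:alice@example.net
    contact: str                        # where the UA actually is
    expires_at: float                   # absolute expiry, seconds since the epoch
    instance_id: Optional[str] = None   # +sip.instance, if the UA sent one
    reg_id: Optional[int] = None        # reg-id, if the UA sent one

    def is_live(self, now: Optional[float] = None) -> bool:
        """A binding is usable only until its expiry timestamp."""
        current = time.time() if now is None else now
        return current < self.expires_at


def live_bindings(table: list[Binding], aor: str, now: float) -> list[Binding]:
    """Return the unexpired bindings for one AOR."""
    return [b for b in table if b.aor == aor and b.is_live(now)]
```

A location lookup for an inbound INVITE is then just live_bindings(table, aor, now); Path and the other refinements would hang off the same record.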

The lifecycle of a binding is also a one-page story. A UA sends REGISTER sip:registrar.example.net SIP/2.0 with its AOR in the To header, its location in the Contact header, and a requested lifetime in either the Expires header or as a ;expires= parameter on the Contact. The registrar either accepts (200 OK), challenges (401 Unauthorized), or pushes back on the requested lifetime (423 Interval Too Brief, with a Min-Expires header). The UA refreshes before the binding times out, typically at 75-90% of the registrar's granted lifetime. When the UA goes away cleanly, it sends a REGISTER with Contact: * and Expires: 0 to wipe its bindings.

The headers that matter, in order of how often they bite:

Header | RFC | What it really does
Contact (with optional ;expires=) | RFC 3261 §10, §20.10 | The location half of the binding. Per-Contact expiry overrides the message-level Expires.
Expires | RFC 3261 §20.19 | Default expiry for any Contact in the message that lacks its own ;expires=.
Min-Expires (in 423 response) | RFC 3261 §20.23 | Registrar's floor. UA must reissue with at least this value.
Path | RFC 3327 | Inserted by an edge proxy on the way IN. Tells the registrar which proxy to traverse on subsequent inbound requests.
Service-Route | RFC 3608 | Inserted by the registrar in the 200 OK. Tells the UA which proxies to traverse on subsequent outbound requests.
Supported: path, outbound, gruu | RFC 3327, RFC 5626, RFC 5627 | UA capability advertisement. The registrar/proxy will only insert/honour what was advertised.
Contact ;+sip.instance= and ;reg-id= | RFC 5626 §4.2 | The two pieces that turn a Contact into a flow-aware binding. The instance-id is per-device; the reg-id is per-flow from that device.

Note the deliberate omission of Route - Route is a header for non-REGISTER traffic that the UA constructs from the Service-Route it learned at registration time. Pre-loaded Route handling for REGISTER itself is a separate edge case that we'll come back to in a later post.

Per-Contact expiry overrides the message Expires

This is the first place engineers get confused. RFC 3261 §10.2.1.1 is explicit: if a Contact has its own ;expires= parameter, that value is the requested lifetime for that Contact. The message-level Expires header is the default only for Contacts that don't carry their own. A REGISTER like this is perfectly legal:

    Contact: <sip:alice@192.0.2.10:5060>;expires=300
    Contact: <sip:alice@198.51.100.20:5060>;expires=3600
    Expires: 86400

That request asks the registrar to bind two locations with two different lifetimes; the Expires: 86400 is ignored because every Contact has its own value. Multi-Contact registers come up in failover scenarios, in soft-phone-plus-deskphone fork-and-pickup deployments, and in any GRUU + flow-aware configuration. If you see one in a trace and find yourself reading Expires: to determine the binding lifetime, stop - read the ;expires= parameter.
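
The precedence rule can be sketched in a few lines. This is a simplified illustration, not a full SIP parser: it assumes each Contact value is in name-addr form with angle brackets and sits on a single line, and the function name and the 3600-second fallback (used when neither value is present) are assumptions made for the example.

```python
import re
from typing import Optional

# Matches an ;expires= header parameter (text after the closing '>').
EXPIRES_PARAM = re.compile(r";\s*expires\s*=\s*(\d+)", re.IGNORECASE)


def requested_lifetimes(contacts: list[str], expires_header: Optional[int],
                        default: int = 3600) -> dict[str, int]:
    """Map each Contact URI to the lifetime the UA is asking for."""
    result = {}
    for value in contacts:
        uri_end = value.index(">")
        uri = value[value.index("<") + 1:uri_end]
        # Only inspect header parameters, not parameters inside the URI.
        match = EXPIRES_PARAM.search(value[uri_end + 1:])
        if match:
            result[uri] = int(match.group(1))   # per-Contact value wins
        elif expires_header is not None:
            result[uri] = expires_header        # message-level default
        else:
            result[uri] = default               # local fallback (assumed)
    return result
```

Fed the two Contacts from the example with a message-level value of 86400, this returns 300 and 3600; the 86400 is never consulted.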

The other place the parameter-versus-header distinction matters: 200 OK responses. A compliant registrar returns the full current binding set in Contact headers, with each Contact's ;expires= parameter set to the actual granted lifetime, which may be less than the UA requested. If the UA asked for 3600 and the registrar's local policy caps at 600, the 200 OK comes back with ;expires=600 on that Contact. The UA must refresh before that timer hits zero, not before its own original 3600 hits zero. If your softphone is dropping off at four-minute intervals when you asked for an hour, the registrar's policy is sending the answer back in the response - look at the Contact in the 200 OK, not the request.

423 Interval Too Brief and Min-Expires

RFC 3261 §20.23 introduces Min-Expires for one purpose: to let a registrar reject a too-short lifetime explicitly and tell the UA where the floor is. The UA sends Expires: 30. The registrar's floor is 60. The registrar could silently round up and answer 200 OK with a 60-second binding, but RFC 3261 §10.3 provides only for shortening a requested interval, and a UA that ignores the granted value would keep refreshing every 30 seconds anyway, doubling the refresh load for nothing. Instead, the registrar replies:

    SIP/2.0 423 Interval Too Brief
    Min-Expires: 60

The UA is required to retry with Expires (or per-Contact ;expires=) of at least 60. A non-conformant UA that retries with the same 30 will loop into another 423. We've seen this in production with embedded SIP firmware: the device's "registration interval" config knob is hard-coded to a value below the registrar's floor, and the device fails to register at all. The fingerprint in the trace is unmistakable - REGISTER, 423, REGISTER (same Expires), 423, REGISTER, 423 - and the fix is to raise the device's interval, not to lower the registrar's floor. (Lowering the floor to accommodate one broken device costs you the registration storm and NAT pinhole arithmetic we'll get to in the next section.)
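
On the UA side, the behaviour that breaks the loop is small. The sketch below models only the 423 case; the function name and the decision to raise on a 423 without Min-Expires are assumptions made for illustration.

```python
from typing import Optional


def expires_after_response(requested: int, status: int,
                           min_expires: Optional[int]) -> Optional[int]:
    """Pick the Expires value for the retry, or None if no 423 retry is needed."""
    if status != 423:
        return None
    if min_expires is None:
        # A 423 without Min-Expires gives nothing to retry with; do not guess.
        raise ValueError("423 response carried no Min-Expires header")
    # Never retry below the registrar's floor; this is what ends the ping-pong.
    return max(requested, min_expires)
```

A device whose configured interval is hard-coded below the floor is effectively skipping the max() step, which is why it produces the REGISTER, 423, REGISTER, 423 pattern.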

A more interesting 423 case shows up with outbound (RFC 5626): the registrar's floor for outbound is typically higher than for non-outbound bindings, because the outbound spec assumes the keep-alive is doing most of the NAT-pinhole work. Mixing outbound-advertising and non-outbound UAs against the same registrar can produce per-UA-class 423s if local policy isn't matched to advertised capabilities.

NAT pinholes, refresh storms, and the cost of short Expires

The hidden cost of a too-short Expires is signalling load. The hidden cost of a too-long Expires is NAT pinhole loss. The interaction between the two is where most SIP deployments quietly waste bandwidth or quietly lose calls.

The first number you need: most home and SOHO NAT routers expire a UDP mapping after 30 seconds of silence, give or take. Some go to 60. Linux netfilter keeps two UDP timers: nf_conntrack_udp_timeout (30s by default) and nf_conntrack_udp_timeout_stream (120s on current kernels, 180s on older ones). The longer timer applies only after conntrack has seen sustained two-way traffic and marked the flow "assured", which doesn't happen for SIP signalling that's mostly one-way idle. The number you can rely on across the residential and small-business installed base is 30s. TCP and TLS NAT mappings expire on the order of minutes (the Linux default for established connections is 432,000s, but most commercial NAT/firewall stacks ship 5-30 minutes for idle TCP), which is why you'll see UDP deployments paying a much higher keep-alive cost than TCP/TLS deployments doing the same work.

The second number: a REGISTER refresh on UDP costs roughly 1.5-2.5 KB on the wire (REGISTER + 200 OK, sometimes with a Path header chain) and one round trip. A keep-alive on UDP costs 4 bytes of payload - the bare \r\n\r\n "double-CRLF" heartbeat that most embedded UAs send. (RFC 5626 defines the double-CRLF ping in §3.5.1 for connection-oriented transports; for UDP flows the outbound spec uses STUN binding requests, §3.5.2, which are slightly larger.) A keep-alive every 25-28 seconds for 24 hours is ~3,100-3,500 packets and ~13 KB of payload out plus about the same in. A REGISTER refresh every 60 seconds for 24 hours is ~1,440 transactions and ~2.5-3.5 MB out + in - and that's per UA. Multiply by 4,000 endpoints behind the same SBC and you've engineered a registration storm.

The right pattern, for UDP transport behind NAT, is Expires of 600-3,600 seconds plus a keep-alive every 25-28 seconds. The keep-alive holds the pinhole; the REGISTER refresh confirms the binding to the registrar. Decouple the two. Sending REGISTERs every 30 seconds because "we have to keep the NAT open" is a misconfiguration that wastes ~30× the signalling bandwidth a proper keep-alive would.
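
The daily totals are easy to check by hand. The sketch below counts payload bytes only (IP and UDP headers are left out), and the 2,000-byte transaction size is an assumed midpoint of the 1.5-2.5 KB range quoted above; per_day is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day(interval_s: float, bytes_each: float) -> tuple[int, float]:
    """Return (events per day, total kilobytes per day) for a periodic message."""
    events = int(SECONDS_PER_DAY // interval_s)
    return events, events * bytes_each / 1000


# 4-byte keep-alive payload, one direction.
print(per_day(25, 4))      # (3456, 13.824)
print(per_day(28, 4))      # (3085, 12.34)
# REGISTER + 200 OK at an assumed 2,000 bytes per transaction.
print(per_day(60, 2000))   # (1440, 2880.0)
```

So a 25-28 second keep-alive works out to a little over 3,000 packets and roughly 13 KB of payload per day in each direction, against about 2.9 MB per day for a 60-second REGISTER refresh from the same UA.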

The third number: refresh storm timing. If 4,000 UAs all set Expires to 3,600 and all register at the moment the SBC reboots, you'll see all 4,000 refresh at minute 54 (90% of 3,600), then again at minute 108, and so on. The fix is jitter. RFC 3261 §10.2.4 mandates that the UA refresh some time before expiry but does not specify exact timing; a properly written UA jitters the refresh time uniformly in the 75-95% range to spread the load. If you're operating the registrar side and seeing CPU spikes at predictable cadences, your UA fleet isn't jittering. The mitigation pattern, when you can't fix the UAs, is to randomise the granted Expires value at the registrar by 5-10% to break the synchrony - most registrars expose this as a "registration jitter" or "expires randomization" setting.
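
UA-side jitter of this kind is a one-liner. The sketch below draws the refresh point uniformly from the 75-95% window of the granted lifetime; the function name and the optional rng parameter (there to make the behaviour reproducible in tests) are assumptions for the example.

```python
import random
from typing import Optional


def refresh_delay(granted_expires: int, low: float = 0.75, high: float = 0.95,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next REGISTER, jittered uniformly."""
    source = rng if rng is not None else random
    # Scale the granted (not requested) lifetime by a random fraction.
    return granted_expires * source.uniform(low, high)
```

With a granted lifetime of 3,600 seconds, each UA picks its own refresh point somewhere between minute 45 and minute 57, so a fleet that registered in the same second spreads its next refresh over a twelve-minute window instead of one.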

RFC 5626 outbound: the spec that made flow-aware registration mean something

The original RFC 3261 model treats the AOR-to-Contact map as a flat list. When two devices register the same AOR, the registrar has two Contacts; when an inbound INVITE arrives, it may fork to both. That model breaks the moment a single device is reachable over two flows (a mobile phone with both WiFi and LTE, an SBC client with primary and backup) or when the same logical user has the same softphone running on a laptop and a phone simultaneously and you want server-controlled failover instead of parallel forking.

RFC 5626 introduced two parameters that turn the flat list into a flow-aware map. +sip.instance is a UUID per device (the same instance-id that GRUU builds on); reg-id is a number 1..2^31-1 per flow from that device. The combination (AOR, instance-id, reg-id) uniquely identifies a flow. The registrar stores all flows but routes inbound traffic to one at a time in a server-controlled priority order (sequential failover) instead of forking. The UA declares its instance with Contact: <...>;+sip.instance="<urn:uuid:...>" and the flow with ;reg-id=N.

The relevant headers in the same trace:

    REGISTER sip:registrar.example.net SIP/2.0
    ...
    Supported: outbound, path, gruu
    Contact: <sip:alice@198.51.100.20:5060;ob>;
      +sip.instance="<urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6>";
      reg-id=1;
      expires=3600

The ;ob parameter on the Contact URI is the outbound-flag - the UA signalling that traffic to this Contact should be routed back over the registered flow rather than to the literal URI host:port. If the UA is behind NAT (it almost always is, in outbound deployments) the literal 198.51.100.20:5060 would be either a private IP or a NAT-mapped port that won't survive the pinhole expiring. The registrar (and any path proxy) uses the flow - i.e., the recorded connection - instead.
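
A minimal sketch of the flow-keyed map follows. The FlowTable class is hypothetical, and the ascending reg-id ordering in targets() is only a placeholder for whatever server-controlled priority a real registrar applies.

```python
from collections import defaultdict


class FlowTable:
    """Bindings keyed by (AOR, instance-id, reg-id); names are illustrative."""

    def __init__(self) -> None:
        # (aor, instance_id) -> {reg_id: contact}
        self._flows: dict[tuple[str, str], dict[int, str]] = defaultdict(dict)

    def register(self, aor: str, instance_id: str, reg_id: int, contact: str) -> None:
        # Re-registering the same key replaces the flow instead of adding a binding.
        self._flows[(aor, instance_id)][reg_id] = contact

    def targets(self, aor: str, instance_id: str) -> list[str]:
        """Contacts to try one at a time (placeholder policy: ascending reg-id)."""
        flows = self._flows.get((aor, instance_id), {})
        return [flows[r] for r in sorted(flows)]
```

The point of the key is visible in register(): a refresh from the same device and flow overwrites its own entry, so a changed source port cannot pile up as a second binding.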

This is why outbound deployments require the edge proxy to insert Path on the way in, the registrar to insert Service-Route in the response, and the UA to honour Service-Route for subsequent traffic. Without those two header chains, the AOR-to-flow mapping is meaningless on inbound INVITEs.

Path (RFC 3327) and Service-Route (RFC 3608) without confusing them

Two-sentence rule of thumb. Path is inserted by an edge proxy on the inbound REGISTER and tells the registrar "to reach this UA, route the inbound request back through me." Service-Route is inserted by the registrar in the outbound 200 OK and tells the UA "to send subsequent requests outward, put these proxies in your Route header."

They are mirror images of each other and they get swapped in conversation constantly.

Path goes in the REGISTER on the way to the registrar. Each edge proxy that the REGISTER traverses can push itself onto the Path header (top-down: closest to the UA is at the bottom, closest to the registrar is at the top, by the time the registrar sees it). The registrar stores the complete Path list as a property of the binding - alongside the Contact, expiry, and instance/reg-id. When an INVITE arrives at the registrar (or, more commonly, at a proxy that queries the registrar's location service), the binding's Path is converted into Route headers on the way back out to the UA. This is what makes the "remember the way back" property of REGISTER work: the breadcrumb trail isn't reconstructed on every inbound call; it's stored at registration time.

Service-Route goes in the 200 OK on the way back to the UA. The registrar builds the list of proxies that this UA's outbound requests should traverse (in some deployments in cooperation with path-aware proxies). The UA stores the Service-Route as a per-AOR property and inserts it as the leading Route headers on any subsequent OUT-bound INVITE, BYE, OPTIONS, etc.

Two practical consequences. First, REGISTER itself doesn't use Service-Route (the REGISTER reaches the registrar by host part of the Request-URI, not by Route). Second, a UA that loses its Service-Route memory between REGISTERs (poorly written embedded firmware that doesn't persist it, or a softphone that crashed) will send subsequent INVITEs directly to the registrar's domain instead of through the intermediate proxy chain, often producing a 403 or a 480 from a proxy that expects to be in the path.
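
The ordering of the Path list is the part that gets misremembered, so here is a small sketch of it. The helper names and proxy hostnames are hypothetical; the only behaviour modelled is that each proxy places its own URI ahead of the values already present, and that the stored list is replayed in the same order as Route headers.

```python
def add_path(path: list[str], proxy_uri: str) -> list[str]:
    """Each proxy puts its own URI ahead of the Path values already present."""
    return [proxy_uri] + path


def route_headers(uris: list[str]) -> list[str]:
    """Render a stored Path (or Service-Route) list as Route header lines."""
    return [f"Route: <{uri}>" for uri in uris]


# REGISTER passes the edge proxy first, then a core proxy, then the registrar.
path = add_path([], "sip:edge.example.net;lr")
path = add_path(path, "sip:core.example.net;lr")

# An inbound INVITE toward the UA replays the stored list top to bottom.
for line in route_headers(path):
    print(line)
```

The same route_headers() step applies on the UA side to the Service-Route list it learned from the 200 OK.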

Authentication: 401, the nonce, and the cost of a re-challenge

REGISTER almost always traverses an authentication challenge. The UA sends an unauthenticated REGISTER; the registrar replies 401 Unauthorized with a WWW-Authenticate header carrying a realm and a nonce (RFC 3261 §22, with the Digest mechanics in RFC 7616 superseding the obsolete RFC 2617). The UA computes the Digest hash with its credentials and the registrar's nonce and re-sends. The registrar verifies.

Two things that trip people up.

First, the nonce can carry server-side state - usually an opaque timestamp or sequence - and the registrar can elect to challenge again on a subsequent REGISTER if the nonce has expired. This is what Authentication-Info: nextnonce="..." is for (RFC 7616 §3.5, originally RFC 2617): instead of forcing a 401 on every refresh, the registrar can pass the next valid nonce back inside the 200 OK, and the UA uses that nonce on the next REGISTER. Without nextnonce, every REGISTER refresh on a stale-nonce registrar costs an extra round trip and an extra Digest computation.

Second, the nonce-count (nc) field. A registrar that allows nonce reuse will accept the same nonce on multiple REGISTERs from the same UA as long as nc increments monotonically (nc=00000001, nc=00000002, ...). A registrar that doesn't allow nonce reuse - typically configured for higher security - challenges on every REGISTER. The cost difference is measurable on large fleets; we've seen registrars CPU-bound on Digest computation when nonce reuse was inadvertently disabled.

The interaction with refresh storms: if every REGISTER refresh produces a 401 + REGISTER + 200 OK exchange instead of just REGISTER + 200 OK, the registration storm doubles in transactions and triples in CPU. Confirm in trace that you see Authentication-Info: nextnonce=... in 200 OKs and that subsequent REGISTERs don't trigger a 401.
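
For reference, the per-REGISTER Digest work with qop=auth looks like the sketch below. It uses MD5, the legacy algorithm, purely for illustration; the function names are assumptions, and a deployment that has moved to SHA-256 would substitute the hash.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Lower-case hex MD5 of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: int, cnonce: str,
                    qop: str = "auth") -> str:
    """Compute the Digest 'response' value for one request."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    nc_field = f"{nc:08x}"   # nonce-count is eight hex digits: 00000001, 00000002, ...
    return md5_hex(f"{ha1}:{nonce}:{nc_field}:{cnonce}:{qop}:{ha2}")
```

When the registrar allows nonce reuse, the UA keeps the nonce and calls this again with nc incremented on each refresh; when it does not, every refresh pays for a 401 round trip before this computation can even start.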

De-registration and ghost bindings

The clean way for a UA to de-register is Contact: * with Expires: 0. The wildcard tells the registrar "clear all bindings for this AOR"; the zero-expiry tells it to do so now. This is what a softphone should send when the user clicks "log out" or when the OS signals a clean shutdown.

What actually happens in the field is messier. Mobile devices lose IP without sending anything. Laptops sleep mid-REGISTER. Embedded UAs reboot without de-registering. The result is ghost bindings - Contact URIs still recorded in the registrar that don't correspond to a reachable device. Until the binding expires, inbound calls fork to the ghost and either silently time out or produce a 408. Long-Expires deployments have more ghosts than short-Expires deployments. This is one of the few legitimate reasons to choose a shorter Expires than the keep-alive arithmetic strictly requires - typically 1,200-1,800 seconds is the sweet spot for residential/mobile fleets where ungraceful disconnects are common.

A subtle source of ghosts is the port-swap on NAT rebinding. The UA's external port behind a NAT is unstable: if the UA's outbound flow has been idle past the NAT mapping timeout, the next packet from the UA gets a new external port, and the registrar (without RFC 5626 outbound) sees this as a new Contact URI rather than the same binding. If the UA's REGISTER refresh interval is longer than the NAT mapping timeout - which is exactly the failure mode we warned about above - every refresh creates a new binding without expiring the old one. The registrar's binding table for a single AOR grows over time as a ghost-binding linked list. Fix: shorten the keep-alive (not the Expires) so the NAT mapping never expires, OR deploy outbound (RFC 5626) so the registrar tracks the flow rather than the literal URI.
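
A registrar-side check for this accretion can be sketched as a grouping pass over one AOR's Contacts. The function name, the regular expression (which does not handle IPv6 literals), and the threshold of four ports are all assumptions for illustration.

```python
import re
from collections import defaultdict

CONTACT_RE = re.compile(
    r"^sips?:(?P<user>[^@]+)@(?P<host>[^:;>]+)(?::(?P<port>\d+))?"
)


def rebinding_suspects(contacts: list[str], threshold: int = 4) -> dict[str, list[int]]:
    """Group Contact URIs by user@host and report hosts seen on many ports."""
    ports = defaultdict(set)
    for uri in contacts:
        m = CONTACT_RE.match(uri)
        if m and m.group("port"):
            ports[f"{m.group('user')}@{m.group('host')}"].add(int(m.group("port")))
    # Several distinct ports on one host suggests NAT rebinding between REGISTERs.
    return {k: sorted(v) for k, v in ports.items() if len(v) >= threshold}
```

Run against a per-AOR binding dump, any key it returns is a candidate for the keep-alive fix or the move to outbound described above.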

q-value forking, GRUU, and "where the call actually goes"

Multiple Contacts under one AOR is the normal state of affairs once a user has more than one device. Without RFC 5626 outbound, the registrar by default parallel forks an inbound INVITE to every Contact whose q-value is highest, then sequentially forks down through lower q-values. The q-value (;q=0.5 on the Contact) is a UA-asserted preference 0..1 that the registrar may honour or override per local policy.
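
The default forking order can be sketched as sort-then-group. The function name is hypothetical, and the sketch takes the q-values at face value, whereas a real registrar may override them per local policy.

```python
from itertools import groupby


def fork_plan(contacts: list[tuple[str, float]]) -> list[list[str]]:
    """Return groups to try in sequence; each group is forked in parallel."""
    # Highest q first; sorted() is stable, so equal q-values keep input order.
    ordered = sorted(contacts, key=lambda c: c[1], reverse=True)
    return [[uri for uri, _ in group]
            for _, group in groupby(ordered, key=lambda c: c[1])]
```

For Contacts with q-values 1.0, 0.5, 1.0, and 0.1, the plan is to ring the two q=1.0 targets together, then the 0.5 target, then the 0.1 target.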

GRUU (RFC 5627) extends the model by issuing a Globally Routable User Agent URI per +sip.instance - i.e., a URI that targets one specific device rather than the AOR. The registrar includes pub-gruu and temp-gruu Contact parameters in the 200 OK, and the UA uses the GRUU as its Contact in dialog-forming requests (so that mid-dialog re-INVITEs and BYEs route to the same physical device that handled the initial INVITE, not to an arbitrary fork target). GRUU is widely supported in modern registrars and underused; many one-way-audio and call-transfer bugs that show up only on multi-device users are really "the mid-dialog request forked to the wrong device" bugs that GRUU prevents.

The interaction between forking and outbound matters: a registrar with both outbound and non-outbound capabilities advertised by the same UA prefers the outbound flow but will fall back to URI-based routing on flow failure. This is the value of reg-id - the UA can advertise primary and backup flows for the same +sip.instance, and the registrar treats them as a failover pair rather than as targets to fork to.

What to look for in a real trace

Five fingerprints we keep on the wall.

(1) Per-Contact expiry mismatch. The UA sent Expires: 3600 and is happy. The registrar returned Contact: <...>;expires=300 in the 200 OK. The UA refreshes at minute 54 (90% of 3,600) - but the binding expired at minute 5. The fix is on the UA: respect the returned ;expires=, not the requested Expires. The fingerprint in the trace is "registrar accepts, then minute-5 inbound call gets 404, then UA registers again only at minute 54."

(2) 423 ping-pong on registration. REGISTER (Expires: 30) → 423 (Min-Expires: 60) → REGISTER (Expires: 30, again) → 423. The UA isn't honouring Min-Expires. Reconfigure its registration-interval knob to >=60.

(3) Ghost-binding accretion under NAT rebinding. Look at the registrar's per-AOR binding table. If you see 4-12 Contacts for the same AOR with very similar but slightly different host:port values, the UA is being rebound by NAT between REGISTERs. Shorten the keep-alive interval to ~25 seconds, or deploy outbound (RFC 5626) and switch the UA to flow-aware bindings.

(4) 401 on every refresh. REGISTER → 401 → REGISTER (with credentials) → 200 OK → minute-54 REGISTER → 401 → REGISTER → 200 OK. The 200 OK doesn't carry Authentication-Info: nextnonce. Either the registrar isn't configured for nonce reuse, or the UA isn't honouring nextnonce. Cost: 2× transactions and ~3× CPU on the registrar. Confirm Authentication-Info in the 200 OK, then fix the configuration.

(5) Service-Route ignored on outbound INVITE. REGISTER → 200 OK with Service-Route: <sip:proxy.example.net;lr>. Later, the UA sends an INVITE directly to the registrar's domain without prepending Route: <sip:proxy.example.net;lr>. The proxy responds 403 or the call routes oddly. The UA isn't honouring Service-Route. Confirm with a tshark filter for sip.Route headers on subsequent requests.

The diagnostic ordering is always the same: read the 200 OK headers first (Contact ;expires, Service-Route, Path, Authentication-Info), then check the UA's refresh cadence in the trace timeline, then check the keep-alive packets between REGISTERs. Most bugs surface in the first step; the timeline and keep-alives confirm the diagnosis.
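
Fingerprint (1) reduces to one subtraction once the three timestamps have been read from the trace. The function name is hypothetical and times are in seconds from the start of the capture.

```python
def unbound_window(ok_time: float, granted_expires: int,
                   next_register_time: float) -> float:
    """Seconds during which the AOR had no live binding (0 if refreshed in time)."""
    binding_expiry = ok_time + granted_expires
    return max(0.0, next_register_time - binding_expiry)


# Granted 300 s in the 200 OK, but the UA refreshed at 90% of its requested 3600 s.
print(unbound_window(0, 300, 3240))   # 2940.0
```

That 2,940 seconds is the gap between minute 5 and minute 54 in which inbound calls to the AOR find nothing to route to.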

Putting it together: the registration profile we recommend

For UDP transport behind NAT, on a real-world deployment:

  • Expires: 1200-3600 seconds (registrar-granted; UA-requested may be higher and clamped down by the registrar).
  • Keep-alive: every 25-28 seconds (STUN per RFC 5626 §3.5.2 if outbound is in play on UDP; otherwise the bare double-CRLF heartbeat most UAs implement).
  • Outbound (RFC 5626) advertised and used whenever the UA spec supports it: Supported: outbound, Contact: ...;ob;+sip.instance="..."; reg-id=....
  • Path (RFC 3327) inserted at the edge proxy and stored as part of the binding.
  • Service-Route (RFC 3608) inserted in the 200 OK and respected on subsequent outbound traffic.
  • GRUU (RFC 5627) issued and used on dialog-forming requests for multi-device users.
  • Nonce reuse with Authentication-Info: nextnonce to avoid 401 round trips on every refresh.
  • Expires jitter (registrar-side ±5-10% randomisation, or UA-side 75-95% refresh window) to break refresh-storm synchrony.

For TCP and TLS transports, the keep-alive interval can extend to 60-120 seconds (the underlying NAT mapping is longer-lived), and CRLFCRLF over a long-lived TCP connection is even cheaper than over UDP. The Expires can comfortably extend to 7200-86400 seconds because the binding survives transport-level reconnects, provided the UA implements RFC 5626 outbound flow-recovery semantics.

For TLS specifically (and SIPS URIs), there's a further consideration: the registrar's binding includes the transport, so a UA that registers over TLS and tries to receive an inbound INVITE over UDP will see the registrar route the INVITE over TLS regardless. This is correct per RFC 5630 §3.3 but surprises engineers used to thinking of REGISTER as transport-agnostic - it isn't.


REGISTER is small, but it's load-bearing. If the bindings are wrong, calls don't arrive. If the expiry is wrong, calls drop at predictable minutes past the hour. If the keep-alive is wrong, the NAT pinhole closes and the fleet goes dark in 30-second increments. None of these failures look like SIP failures from the application - they look like network problems. The diagnostic is always in the REGISTER trace, and it's almost always one of the five fingerprints above.

If you want the full mechanics - including the registrar state machine, the Path/Service-Route header insertion rules at the proxy layer, the RFC 7616 SHA-256 Digest transition, the GRUU temp-vs-pub URI lifecycle, the q-value-plus-outbound forking decision tree, and the field-tested keep-alive arithmetic for every transport - those land squarely in SIPT-101 Module 5: Registration and Location Services, part of the Certified VoIP Associate (CVA) track at SIP Train.

That module is one of three CVA modules where the protocol design directly maps to the operational arithmetic you'll do in production. The other two are Module 3 (transactions and dialogs) and Module 7 (troubleshooting). Together they're the bedrock of CVA. If you've read this far, you already have most of the mental model - the course gives you the trace exercises, the registrar-side configuration patterns, and the failure-mode fingerprints to turn it into reliable diagnostic muscle.



References:

  • RFC 3261 §10, §20.10, §20.19, §20.23 - REGISTER mechanics, Contact, Expires, Min-Expires
  • RFC 3327 - Path header
  • RFC 3608 - Service-Route header
  • RFC 5626 - Managing Client-Initiated Connections in SIP (outbound, +sip.instance, reg-id, ;ob, CRLF and STUN keep-alives)
  • RFC 5627 - Obtaining and Using GRUUs
  • RFC 5630 - The Use of the SIPS URI Scheme in SIP (transport binding in REGISTER)
  • RFC 7616 - HTTP Digest Access Authentication (current; supersedes RFC 2617)
  • RFC 3261 §22, §22.4 - SIP authentication framework
