Debug Flow
Generate filtered flow trace blocks. Fill the form, copy, paste.
Trace a packet through FortiOS
Fill in a destination or source IP and click Generate.
Manual reference · raw commands
diagnose debug disable
diagnose debug reset
diagnose debug flow trace stop
diagnose debug flow filter clear
diagnose debug flow show iprope enable
Running diagnose debug enable on a busy box without filters will flood the console and may impact CPU.
Packet Sniffer
Built-in tcpdump-style capture. Useful for proving traffic is or isn't reaching the box.
Build a sniffer one-liner
Fill in at least one host and click Generate.
| Level | Includes | When to use |
|---|---|---|
| 1 | Packet headers | Quick check that traffic is hitting |
| 2 | Headers + IP payload | Decode protocol data |
| 3 | Headers + ethernet | L2 / MAC investigation |
| 4 | Headers + interface name | Most common, see ingress/egress port |
| 5 | Headers + IP payload + interface | Full L3 trace |
| 6 | Headers + ethernet + interface | L2/L3 full dump |
Append 0 l at the end to capture an unlimited number of packets (count 0) with local timestamps. Use 0 a for absolute UTC timestamps if you need to correlate with logs.
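A minimal sketch of how such a one-liner could be assembled, assuming the usual argument order of interface, filter, verbosity level, packet count, and timestamp format. The function name `build_sniffer` and its defaults are hypothetical and are not the generator this page uses.

```python
def build_sniffer(hosts, interface="any", port=None, verbosity=4, count=0, ts="l"):
    """Assemble a FortiOS sniffer one-liner (illustrative helper, not an official tool)."""
    if not hosts:
        raise ValueError("at least one host is required")
    if verbosity not in range(1, 7):
        raise ValueError("verbosity must be between 1 and 6")
    if ts not in ("a", "l"):
        raise ValueError("timestamp format must be 'a' (UTC) or 'l' (local)")
    # Every listed host, and the optional port, must appear in a packet for it to match.
    parts = [f"host {h}" for h in hosts]
    if port is not None:
        parts.append(f"port {port}")
    bpf = " and ".join(parts)
    # count=0 means capture until interrupted.
    return f"diagnose sniffer packet {interface} '{bpf}' {verbosity} {count} {ts}"


print(build_sniffer(["10.0.0.1"]))
print(build_sniffer(["10.0.0.1", "8.8.8.8"], port=443, ts="a"))
```

Level 4 is the default here because the table above marks it as the most common choice.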
VPN
IPsec, SSL VPN, and Remote Access. Filtered IKE debugs and status commands.
IPsec / RAVPN IKE debug (filtered)
Enter the peer IP and click Generate.
diag vpn ike gateway
diag vpn ike gateway list name <name>
diag vpn tunnel list
diag vpn tunnel list name <name>
get vpn ipsec tunnel summary

get vpn ssl mon
get vpn ssl monitor

diag debug application pppoe -1
diag debug application ppp -1
diag debug enable
Concepts · local-id / peer-id / exchange-interface-ip
When multiple dial-up IPsec tunnels terminate on a single public IP, each with its own IKE Phase 1, explicit Local and Peer IDs are required. Without them, the hub firewall cannot tell incoming connections apart or decide which Phase 1 a tunnel should land on.
Setting these IDs ensures the FortiGate correctly matches the inbound IKE negotiation to the specific Phase 1 configuration. Standard practice is to configure the Local ID on the remote sites and the corresponding Peer ID on the DC firewall.
If there is only one dial-up Phase 1 on the DC's public IP, you can safely skip setting IKE IDs. The FortiGate only has one configuration to match against anyway, so it knows exactly where to land the tunnel. Just keep in mind that skipping IDs is a bit of a trap if you ever need to add a second dial-up tunnel to that IP later.
When building the BGP overlay, the hub and spoke need to dynamically exchange IPs to route to each other. You can use set exchange-interface-ip enable to just swap the tunnel interface IPs, but the cleaner approach is peering BGP over loopbacks.
To get loopback peering working across a dial-up setup, use set exchange-ip-addr4 <loopback-ip>. This explicitly hands over the loopback address during the IKE exchange so BGP can establish properly.
Example real-world use case: large multi-spoke overlay where each spoke needs its IPsec tunnel interface IP advertised back so the hub can route. Without it, you'd have to statically configure every tunnel IP.
Reference · KB articles
- VPN troubleshooting: FD46611
- HA reference: FD49264
- Yuriskinfo debug cheat sheet: github.com/yuriskinfo/cheat-sheets
Routing & BGP
Route inspection, BGP peering, soft reconfigs.
get router info routing-table details <host>
get router info routing-table database
show system interface

get router info bgp sum
get router info bgp network                          # see weight
get router info bgp network <x.x.x.x>
get router info bgp neighbors <ip> advertised-routes
get router info bgp neighbors <ip> received-routes

execute router clear bgp all soft                    # soft reconfig all
execute router clear bgp ip 172.16.x.x               # specific neighbor
# Inside neighbor:
set shutdown enable                                  # take peer down
unset shutdown                                       # bring it back
A hard clear (execute router clear bgp all) bounces the TCP session and re-establishes from scratch. Always try soft first.
SD-WAN
Health-check, service mapping, and the FIB interaction model.
diagnose sys sdwan health-check
diagnose sys sdwan health-check status
diagnose sys sdwan health-check status <name>

diagnose sys sdwan service
diagnose sys sdwan member
diagnose sys sdwan neighbor
diagnose sys sdwan zone

diagnose internet-service id <id>
diagnose internet-service match root <ip>
diagnose firewall internet-service-app match root <ip>
Concepts · how FIB and SD-WAN rules interact
The FIB acts as the gatekeeper for SD-WAN rules. You cannot apply an SD-WAN rule if the routing table doesn't already agree the traffic belongs in the SD-WAN zone.
Order of evaluation:
1. Regular Policy Routes (PBR): always checked first. If a manual PBR matches with a valid gateway, traffic routes immediately and both SD-WAN rules and the standard table are bypassed.
2. FIB lookup: if no PBR match, FortiOS does a standard routing-table lookup before applying SD-WAN rules.
   - If the best route points to a non-SD-WAN interface, SD-WAN rules are ignored entirely. This protects LAN-to-LAN and out-of-band management traffic from being hijacked.
   - If the best route points to an SD-WAN member (e.g. default route via the sd-wan zone), the SD-WAN engine unlocks.
3. SD-WAN rules: evaluated top-down. Application signatures and SLA health pick the best physical member.
   - Caveat: the chosen member must also have a valid route to the destination, otherwise FortiGate skips it and evaluates the next.
4. Implicit SD-WAN rule: fallback at the bottom. Hands control back to the FIB and load-balances across members using ECMP.
This is why "I added an SD-WAN rule and nothing happened" almost always traces back to step 2: the routing table doesn't see the SD-WAN zone as the best path for that destination.
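The evaluation order above can be modelled in a few lines. This is a simplified sketch, not FortiOS behaviour in full: the data shapes (tuples for routes, dicts for rules), the function names, and the assumption that each rule's member list is already ordered by SLA preference are all hypothetical.

```python
import ipaddress


def longest_match(fib, dst):
    """Return the interfaces of the most specific FIB prefix covering dst (the ECMP set)."""
    addr = ipaddress.ip_address(dst)
    hits = [(ipaddress.ip_network(p), intf) for p, intf in fib
            if addr in ipaddress.ip_network(p)]
    if not hits:
        return []
    best_len = max(net.prefixlen for net, _ in hits)
    return [intf for net, intf in hits if net.prefixlen == best_len]


def select_egress(dst, policy_routes, fib, sdwan_members, sdwan_rules):
    addr = ipaddress.ip_address(dst)
    # 1. Policy routes win outright when they match and the gateway is valid.
    for prefix, intf, gateway_ok in policy_routes:
        if gateway_ok and addr in ipaddress.ip_network(prefix):
            return ("policy-route", intf)
    # 2. FIB lookup is the gatekeeper for everything SD-WAN.
    best = longest_match(fib, dst)
    if not best:
        return ("no-route", None)
    if not any(intf in sdwan_members for intf in best):
        return ("fib", best[0])
    # 3. SD-WAN rules, top-down. A member is usable only if it has a route to dst.
    routed = {intf for p, intf in fib if addr in ipaddress.ip_network(p)}
    for rule in sdwan_rules:
        if addr in ipaddress.ip_network(rule["dst"]):
            for member in rule["members"]:
                if member in routed:
                    return ("sdwan-rule:" + rule["name"], member)
    # 4. Implicit rule: hand back to the FIB and spread across ECMP members.
    return ("implicit", sorted(i for i in best if i in sdwan_members))


fib = [("0.0.0.0/0", "wan1"), ("0.0.0.0/0", "wan2"), ("10.0.0.0/8", "lan")]
members = {"wan1", "wan2"}
rules = [{"name": "voice", "dst": "203.0.113.0/24", "members": ["wan2", "wan1"]}]
print(select_egress("10.1.1.1", [], fib, members, rules))
print(select_egress("203.0.113.5", [], fib, members, rules))
print(select_egress("8.8.8.8", [], fib, members, rules))
```

The first call never reaches the SD-WAN rules because the best route points at `lan`, which is the "nothing happened" case described above.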
Firewall
Sessions, NAT, policy lookup, Geo-IP.
# Filter
diagnose sys session filter clear
diagnose sys session filter src <ip>
diagnose sys session filter dst <ip>
diagnose sys session filter dport <port>
diagnose sys session filter proto 6
# List / count / clear matching
diagnose sys session list
diagnose sys session stat
diagnose sys session clear

diagnose firewall ippool list
diagnose firewall ippool-all list
diagnose firewall ippool-all stats
config system dhcp server
edit 4
config exclude-range
edit 1
set start-ip 172.16.11.137
set end-ip 172.16.11.137
next
end
next
end
config firewall address
edit "China"
set type geography
set associated-interface "INTERNET"
set country "CN"
next
edit "Russia"
set type geography
set associated-interface "INTERNET"
set country "RU"
next
end
config firewall addrgrp
edit "Geo Blocked Countries"
set member "Russia" "China"
next
end
config firewall policy
edit 76
set name "Geo Block"
set srcintf "INTERNET"
set dstintf "any"
set srcaddr "Geo Blocked Countries"
set dstaddr "all"
set schedule "always"
set service "ALL"
set logtraffic all
set logtraffic-start enable
next
end
HA & System
Failover, sync checks, LTE APN, FortiGuard, performance.
execute ha failover set 1       # force failover
execute ha failover status      # check
execute ha failover unset 1     # revert

get system ha status
diag sys ha reset-uptime
exec ha manage 1 ct             # connect to peer
diag sys ha checksum show

get system performance status
diagnose sys top 1 50
diagnose hardware sysinfo memory
diagnose hardware sysinfo conserve
get system status

diag fdsm central-management status
diag debug rating
diag test update info
execute update-now
show sys fortiguard
show sys ntp
show sys dns

diagnose sys waninfo ipify

get system arp
diag sys arp delete <port> <x.x.x.x>
get hardware nic <port>
config system lte-modem
set apn "yesbusinessip"
end
# Wait for the profile-changed prompt, answer Y to reboot.
# Allow up to ~5 minutes for the modem to come back.
config system console
set output standard
end
diag traffictest client-intf port1
diag traffictest server-intf port1
diag traffictest port 5201
diag traffictest run -c 116.216.16.4
SSH crypto · disable vs enable strong-crypto
When to disable: connecting to legacy gear (older switches, OOB consoles) that only supports older MAC algorithms. Note this weakens management plane security; only do it if you must.
config system global
set strong-crypto disable
set ssh-mac-algo hmac-md5 hmac-sha1 hmac-sha2-256 hmac-sha2-256-etm@openssh.com hmac-sha2-512 hmac-sha2-512-etm@openssh.com
end
Restore strong-crypto:
config system global
set strong-crypto enable
set ssh-mac-algo hmac-sha2-256 hmac-sha2-256-etm@openssh.com hmac-sha2-512 hmac-sha2-512-etm@openssh.com
end
Support recovery details
Fortinet AU support: 1 800 043 218
Recovery password: 131703-234421-082764-435666-575696-441881-056496-671858
Keep this collapsed in shared screens.
VoIP & SIP
SIP ALG management for 3CX and similar PBX deployments.
diagnose debug disable
diagnose debug reset
diagnose debug application sip -1
diagnose debug enable

diagnose sys sip-proxy calls list
diagnose sys sip-proxy stats list
diagnose sys sip-proxy stats clear
diagnose sys sip status
diagnose sys sip dialog list
diagnose sys sip mapping list
config system settings
set default-voip-alg-mode kernel-helper-based
set sip-expectation disable
set sip-nat-trace disable
end
config system session-helper
delete 13
end
# Clear all sessions and restart the PBX server afterwards.
config system settings
set default-voip-alg-mode kernel-helper-based
set sip-nat-trace disable
set gui-voip-profile enable
set gui-security-profile-group enable
end
config system session-helper
delete 13
end
Restore SIP ALG to factory defaults
config system settings
unset default-voip-alg-mode
unset sip-nat-trace
end
config system session-helper
edit 13
set name sip
set protocol 17
set port 5060
next
end
Concepts
Background knowledge and reference material. More entries to come.
VXLAN
What is VXLAN?
VXLAN (Virtual eXtensible LAN) is a way to carry a Layer 2 Ethernet network over a Layer 3 IP network. In simple terms, it lets you take an Ethernet frame from one switch or server, wrap it, send it across a routed network, and unwrap it at the other end. RFC 7348 describes it as a Layer 2 overlay scheme on a Layer 3 network.
How it works
VXLAN encapsulates the original Ethernet frame inside:
- Outer Ethernet
- Outer IP
- Outer UDP
- VXLAN header
So VXLAN is basically a Layer 2 overlay over a Layer 3 underlay. The standard VXLAN header is 8 bytes, and the VNI is a 24-bit field inside that header. VXLAN runs over UDP with IANA-assigned destination port 4789.
Why VXLAN is used
- Traditional VLANs are limited to 4094 IDs
- Large Layer 2 domains do not scale well
- It allows Layer 2 extension across routed networks
- It provides much larger segmentation using VNI (24 bits, around 16 million segments)
- Commonly used in datacentres and EVPN fabrics
VLAN vs VXLAN
- VLAN = local Layer 2 segmentation
- VXLAN = Layer 2 overlay carried across Layer 3
- VLAN ID = 12 bits
- VNI = 24 bits
A VLAN and a VNI can be mapped, for example VLAN 20 -> VNI 10020, but they are not the same field.
VXLAN packet structure
A VXLAN packet wraps an original Ethernet frame inside:
Outer Ethernet + Outer IP + Outer UDP + VXLAN header + Inner Ethernet frame
VXLAN header (8 bytes)
- Flags: 8 bits
- Reserved: 24 bits
- VNI: 24 bits
- Reserved: 8 bits
The VNI lives inside the VXLAN header, not inside the 802.1Q VLAN tag.
 0                  7 8                 15 16                23 24                31
+--------------------+--------------------+--------------------+--------------------+
|       Flags        |                           Reserved                           |
+--------------------+--------------------+--------------------+--------------------+
|                        VNI (24 bits)                         |      Reserved      |
+--------------------+--------------------+--------------------+--------------------+
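The 8-byte layout above maps onto two 32-bit big-endian words. A minimal sketch of packing and unpacking it follows; the function names are hypothetical, and the flag value 0x08 is the "VNI valid" bit defined in RFC 7348.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_header(data):
    """Return (flags, vni) from the first 8 bytes of a VXLAN payload."""
    word1, word2 = struct.unpack("!II", data[:8])
    return word1 >> 24, word2 >> 8


header = pack_vxlan_header(10020)
print(header.hex())
print(unpack_vxlan_header(header))
```

VNI 10020 is the example value from the VLAN 20 -> VNI 10020 mapping mentioned earlier.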
Encapsulation overhead
VXLAN over IPv4 = 50 bytes total:
- Outer Ethernet = 14 bytes
- Outer IPv4 = 20 bytes
- Outer UDP = 8 bytes
- VXLAN header = 8 bytes
VXLAN over IPv6 = 70 bytes total:
- Outer Ethernet = 14 bytes
- Outer IPv6 = 40 bytes
- Outer UDP = 8 bytes
- VXLAN header = 8 bytes
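The totals above are plain sums, and the same numbers tell you how large the underlay IP MTU must be to carry a full-size inner frame. A small sketch, assuming an untagged inner Ethernet frame (14-byte header); the function names are hypothetical.

```python
OUTER_ETHERNET = 14
OUTER_UDP = 8
VXLAN_HEADER = 8
OUTER_IP = {4: 20, 6: 40}
INNER_ETHERNET = 14  # assumption: inner frame carries no 802.1Q tag


def vxlan_overhead(ip_version=4):
    """Bytes added on the wire in front of the original Ethernet frame."""
    return OUTER_ETHERNET + OUTER_IP[ip_version] + OUTER_UDP + VXLAN_HEADER


def underlay_ip_mtu(inner_ip_mtu=1500, ip_version=4):
    """IP MTU the underlay needs so the inner IP packet is not fragmented.

    The outer Ethernet header is not counted in an IP MTU, but the inner
    Ethernet header is, because it is now part of the UDP payload.
    """
    return inner_ip_mtu + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IP[ip_version]


print(vxlan_overhead(4), vxlan_overhead(6))
print(underlay_ip_mtu(1500, 4), underlay_ip_mtu(1500, 6))
```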
Why UDP and not TCP?
- Lightweight
- Avoids TCP overhead and session handling
- Allows ECMP and load-balancing using UDP source-port entropy
Standard VXLAN uses UDP destination port 4789.
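To illustrate the source-port entropy point: the encapsulating device can hash fields of the inner flow into the outer UDP source port, so the underlay's ECMP hashing spreads different inner flows across paths while keeping each flow on one path. A sketch, assuming CRC32 as the hash and the dynamic port range 49152-65535; real devices use their own hash functions, and the function name is hypothetical.

```python
import zlib

PORT_BASE = 49152   # start of the dynamic/private port range
PORT_SPAN = 16384   # 49152..65535 inclusive


def vxlan_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Derive a stable outer UDP source port from the inner flow's 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return PORT_BASE + zlib.crc32(key) % PORT_SPAN


# The same inner flow always maps to the same outer source port.
print(vxlan_source_port("10.0.0.5", "10.0.1.9", 6, 51000, 443))
print(vxlan_source_port("10.0.0.5", "10.0.1.9", 6, 51001, 443))
```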
Mental model
- 802.1Q VLAN tag = sticker on the Ethernet frame
- VXLAN header = wrapper around the whole frame
- VNI = label on that wrapper
One-line summary
VXLAN is a Layer 2 overlay over a Layer 3 IP network that encapsulates Ethernet frames using UDP and identifies segments using a 24-bit VNI.
FortiOS Packet Flow Architecture
1. Architecture Overview
To maintain deterministic performance and low latency across enterprise deployments, FortiOS utilizes an architecture called Parallel Path Processing (PPP). When a packet arrives at an interface, FortiGate determines whether the traffic matches an active session in the stateful session table or requires a new session evaluation through the kernel.
[ Ingress Packet Flow ] -> [ Kernel Processing Layer ] -> [ UTM/NGFW Inspection ] -> [ Egress Packet Flow ]
Processing splits into two primary paths:
- The Slow Path (Kernel Session Creation): The first packet of a new flow traverses the complete ingress, routing, and security policy stack to instantiate an entry in the session table.
- The Fast Path (ASIC Acceleration): Subsequent packets matching an established session bypass the core routing and policy evaluation engines, offloading directly to Network Processors (NP6/NP7) or Content Processors (CP9/CP10).
Phase 1: Ingress Packet Flow (Physical to Link Layer)
The packet enters the physical interface transceiver and is placed into the Rx FIFO queue for initial hardware validation checks.
- Network Interface & Driver Layer: The network interface card (NIC) or NP processor validates physical layer integrity. Packets with invalid checksums, malformed structures, or protocol header length mismatches (TCP, UDP, ICMP, SCTP, or GRE) are dropped immediately.
- Stateful Session Lookup: The FortiGate checks the packet's 5-tuple (Source IP, Destination IP, Source Port, Destination Port, Protocol) against the master session table. If a match is found, the packet transitions directly to UTM/NGFW Inspection or Egress processing depending on acceleration status. If no match is found, the packet is flagged as a new flow.
- DoS Policy Inspection: Before consuming CPU cycles in the kernel, traffic is evaluated against configured IPv4 or IPv6 DoS policies. Volumetric checks (such as SYN floods, UDP floods, or ICMP sweeps) are enforced here at the hardware or driver level.
- IPsec VPN Decryption: If the incoming packet matches a configured IPsec tunnel, the IPsec engine decrypts it (accelerated by CP9/CP10). The unencrypted inner packet is then re-injected into the pipeline.
- Admission Control: FortiOS verifies that the packet source or destination does not match the system quarantine list and evaluates captive portal authentication criteria if enforced.
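The stateful session lookup in this phase is what separates the slow path from the fast path. A toy model follows, assuming a plain dictionary keyed by the 5-tuple and a shared entry for the reply direction; the class and function names are hypothetical and no FortiOS internals are implied.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")


def reverse(flow):
    """Return the reply-direction tuple for a flow."""
    return FiveTuple(flow.dst_ip, flow.src_ip, flow.dst_port, flow.src_port, flow.proto)


class SessionTable:
    def __init__(self):
        self._sessions = {}  # FiveTuple -> shared session record

    def classify(self, flow):
        session = self._sessions.get(flow)
        if session is not None:
            # Known session: skip routing and policy evaluation.
            session["packets"] += 1
            return "existing-session"
        # New flow: DoS, routing and policy checks would run here before
        # the session is installed for both directions.
        session = {"packets": 1}
        self._sessions[flow] = session
        self._sessions[reverse(flow)] = session
        return "new-session"


table = SessionTable()
query = FiveTuple("10.0.0.5", "8.8.8.8", 51000, 53, 17)
print(table.classify(query))           # first packet of the flow
print(table.classify(query))           # later packet, same direction
print(table.classify(reverse(query)))  # reply traffic hits the same session
```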
Phase 2: Kernel Processing Layer (Routing & The Gatekeeper Architecture)
For a new session, FortiOS must determine the egress path before evaluating firewall privileges. The Forwarding Information Base (FIB) and SD-WAN rules are deeply intertwined, with the FIB acting as the ultimate gatekeeper.
Order of Evaluation Sequence:
- Destination NAT (DNAT / Virtual IP Lookup): FortiOS evaluates Virtual IPs (VIPs) early in the kernel. If the destination IP matches a VIP configuration, the packet destination is rewritten. Subsequent routing and firewall policy lookups depend entirely on this post-NAT destination IP.
- Regular Policy Routes (PBR): PBR entries are checked before any other routing decision. If a manually created Policy Route matches the traffic and contains a valid, reachable gateway, FortiGate routes the traffic immediately. When a PBR match occurs, both the SD-WAN engine and the standard routing table (FIB) are completely bypassed.
- The FIB Lookup (The Gatekeeper): If no PBR rule matches, the FortiGate executes a standard Routing Table (FIB) lookup for the destination IP before any SD-WAN logic is considered.
  - If the best route points to a non-SD-WAN interface, the FortiGate completely ignores all SD-WAN rules and routes the traffic directly out of that specific physical or logical interface. This protects internal LAN-to-LAN, inter-VLAN, or out-of-band management traffic from being accidentally hijacked by SD-WAN configurations.
  - If the best route points to an SD-WAN member (such as a default 0.0.0.0/0 route pointing to the logical SD-WAN zone or a member interface), only then does the firewall unlock and evaluate the SD-WAN rule base.
- SD-WAN Rules Evaluation: Once unlocked, the engine evaluates your custom SD-WAN service rules sequentially from top to bottom. It reviews application signatures, user groups, and live performance metrics (latency, jitter, packet loss via Performance SLA probes) to select the optimal physical member interface.
  - Crucial Caveat: The specific SD-WAN member selected by a rule must also possess a valid route to the destination in the routing table. If a valid route for that specific member does not exist, the firewall skips that member and evaluates the next one in the rule strategy.
- The Implicit SD-WAN Rule (FIB Fallback): If the traffic hits the SD-WAN engine but fails to match any user-defined, custom SD-WAN rules, it falls through to the "Implicit Rule" at the bottom of the stack. This rule hands control back to the FIB, load-balancing traffic across the available SD-WAN member interfaces using standard Equal-Cost Multi-Path (ECMP) routing based on your configured implicit algorithm.
Phase 3: Firewall Policy & Session Management
With the ingress interface, egress interface, and post-NAT IP attributes established, the firewall determines access privileges.
- Firewall Policy Matching (iprope table lookup): The kernel scans the firewall policy list sequentially inside the internal security policy table (known as the iprope table). It evaluates match criteria based on source/destination interfaces, source/destination IPs (or Internet Service Database / ISDB objects), service ports, and user schedules.
- Implicit Deny: If the packet fails to match any user-defined policy in the iprope table, it hits Policy 0 (the implicit deny rule) and is dropped.
- Session Helpers / Application Layer Gateways (ALGs): For complex protocols that embed IP/port information within their payloads (such as SIP, FTP, TFTP, or H.323), FortiOS applies built-in session helpers to open dynamic pinholes for secondary data channels.
- Session Instantiation: The kernel allocates an entry in the master session table, transitions the state from dirty to validated, and writes the routing, NAT, and security profile application tags into the session structure.
Phase 4: Security Profile Processing (UTM/NGFW Engine)
If the matching firewall policy contains security profiles, the packet is directed to the inspection engines. FortiOS executes this using two distinct architectural models based on policy configuration.
| Feature / Step | Flow-Based Inspection Mode | Proxy-Based Inspection Mode |
|---|---|---|
| Official Engine Core | IPS Engine (IPS Decoders & IPSA Engine) | WAD Daemon (Worker Application Daemon) |
| Memory Footprint | Low (packets processed stream-style on the fly) | High (buffers connections, acts as termination point) |
| Connection Handling | Original TCP handshake passes through to destination. | Connection is split into two independent segments (Client-to-FGT and FGT-to-Server). |
| Inspection Mechanism | Pattern matching occurs inside packet streams as they transit. | Files/payloads are fully reassembled in memory before inspection. |
Security Engine Sequence (Flow Mode):
- SSL/TLS Decryption: If configured, the built-in CP9/CP10 processor intercepts the TLS handshake for Certificate or Deep Inspection.
- IPS Engine Decoders: The IPS engine applies specific decoders to identify the exact application protocol and format the stream data.
- Parallel Inspection Pass: IPS signatures, Application Control, Local URL Filtering, and Botnet checking happen simultaneously in a single pass accelerated via Content Processors.
- Flow-Based AntiVirus: The AV engine loaded by the IPS process performs stream-based scanning against known malware hashes without waiting for the complete file to download.
Phase 5: Egress Packet Flow (Data Link to Physical Layer)
Once a packet is approved by the routing, policy, and security inspection engines, it enters the final phase before physical transmission.
- Source NAT (SNAT) Translation: The packet headers are modified at this post-processing stage. The source IP is rewritten to the configured IP pool or egress interface IP, and the source port is translated inside the NAT session tracking range.
- Forwarding & Traffic Shaping (QoS): The packet is evaluated against Traffic Shaping policies. If interface bandwidth limits are breached, packets are queued, delayed, or dropped based on priority configurations (such as Strict Priority or Guaranteed Bandwidth allocations).
- IPsec VPN Encapsulation: If the routing table dictates that the packet exit via an IPsec tunnel, the payload is encrypted (AES/3DES), an ESP header is prepended, and the packet is re-routed through the physical path toward the VPN gateway endpoint.
- Layer 2 Frame Assembly (ARP Lookup): The FortiGate updates the Layer 2 headers. It queries its ARP table for the MAC address of the next-hop gateway. The destination MAC is updated with the next-hop hardware address, and the source MAC is updated with the address of the egress physical interface.
- Tx Queue and Transmission: The finalized frame is loaded into the interface Tx FIFO queue, converted into electrical or optical signals by the transceiver, and transmitted onto the wire.
Flow Engineering & Diagnostics Field Reference
get hardware nic <port>
diagnose debug reset
diagnose debug flow filter saddr <ip>
diagnose debug flow filter dport <port>
diagnose debug flow show function-name enable
diagnose debug flow show iprope enable
diagnose debug flow trace start 100
diagnose debug enable
config system npu
set fastpath disable
end
802.1X · PEAP-MSCHAPv2
What it is
A password-based EAP method. The RADIUS server presents a certificate so the client can validate it. Once a TLS tunnel is built, the client sends its username and password inside the tunnel using MSCHAPv2. The RADIUS server validates the password against AD.
Server proves itself with a cert. Client proves itself with a password. One-way cert auth on the outside, password auth inside the tunnel.
The three players
| Role | Component | Responsibility |
|---|---|---|
| Supplicant | Client device (laptop, phone) | Sends credentials, validates server cert |
| Authenticator | NAS (AP, WLC, switch) | Relays EAP between supplicant and RADIUS. Does not participate in the auth itself |
| Authentication Server | RADIUS (NPS, ClearPass, FreeRADIUS, RADIUSaaS) | Validates the credentials, makes the policy decision |
EAP runs end-to-end between supplicant and RADIUS. The NAS is a postman. It wraps EAP in EAPOL on the wireless side, repackages it inside RADIUS attributes on the wired side, and forwards it. It cannot read the EAP payload.
What each side needs
| Side | Requirement |
|---|---|
| RADIUS server | Server certificate trusted by clients |
| Client | Username, password, and trust of the RADIUS server's CA |
| Backend | AD account with valid password |
| PKI | Server cert only. No client certs needed |
End-to-end auth flow
- User connects to SSID, supplicant prompted for credentials
- AP wraps EAP-Identity in Access-Request (code 1) to RADIUS on UDP 1812
- RADIUS replies with Access-Challenge (code 11) to start the PEAP TLS handshake
- Multiple Request and Challenge round trips negotiate the TLS tunnel. Server presents its certificate, supplicant validates it, both derive session keys
- Supplicant sends MSCHAPv2 challenge/response inside the encrypted TLS tunnel
- RADIUS validates the MSCHAPv2 response against AD via the Netlogon Secure Channel to a domain controller
- RADIUS queries AD via LDAP for group memberships
- RADIUS walks Network Policies top to bottom, first match wins
- Matched policy injects tunnel attributes into reply
- RADIUS sends Access-Accept (code 2) with Tunnel-Type, Tunnel-Medium-Type, Tunnel-Assignment-Id, MS-MPPE keys
- AP reads Tunnel-Assignment-Id, matches its VLAN Assignment Rule, bridges client into assigned VLAN
- AP uses MS-MPPE keys to derive PMK for WPA2 4-way handshake
- Client gets DHCP from the assigned VLAN's scope
- AP sends Accounting-Request Start (code 4) on UDP 1813
- RADIUS replies Accounting-Response (code 5)
- Periodic Interim-Updates throughout session, Stop on disconnect
Critical truth about the request packet
The Access-Request does not contain a VLAN or tunnel ID. It carries only:
- User-Name
- EAP-Message (outer plaintext, inner encrypted once TLS is up)
- NAS-IP-Address, NAS-Identifier
- Called-Station-Id (AP BSSID + SSID)
- Calling-Station-Id (client MAC)
- NAS-Port-Type (Wireless-802.11)
The tunnel attributes only appear on the return leg (Access-Accept), generated by RADIUS based on policy match.
RADIUS message codes - Authentication (UDP 1812)
| Code | Message | Direction | Purpose |
|---|---|---|---|
| 1 | Access-Request | NAS to RADIUS | Here are credentials, please authenticate |
| 2 | Access-Accept | RADIUS to NAS | Yes, authenticated, here are the attributes |
| 3 | Access-Reject | RADIUS to NAS | No, denied |
| 11 | Access-Challenge | RADIUS to NAS | I need more info, send next EAP message |
RADIUS message codes - Accounting (UDP 1813)
| Code | Message | Direction | Purpose |
|---|---|---|---|
| 4 | Accounting-Request | NAS to RADIUS | Session started, updated, or ended. Log it |
| 5 | Accounting-Response | RADIUS to NAS | Logged, acknowledged |
Acct-Status-Type values
| Value | Meaning |
|---|---|
| 1 | Start - session began |
| 2 | Stop - session ended |
| 3 | Interim-Update - periodic heartbeat |
| 7 | Accounting-On - NAS booted |
| 8 | Accounting-Off - NAS shutting down |
Key RADIUS attributes for VLAN assignment
| Attribute | Number | Type | Value |
|---|---|---|---|
| Tunnel-Type | 64 | Integer | 13 (VLAN) |
| Tunnel-Medium-Type | 65 | Integer | 6 (IEEE-802) |
| Tunnel-Private-Group-ID | 81 | String | VLAN ID, e.g. 520 |
| Tunnel-Assignment-Id | 82 | String | Freeform tag, e.g. 520 or staff-corp |
Aruba Instant typically uses attribute 82. Cisco WLC typically uses attribute 81. Both achieve the same outcome.
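The first-match policy walk and the attribute values in the table above can be sketched as follows. This is a simplified model only: the policy dictionaries, the `grant` flag, and the function name are hypothetical and do not reflect how NPS or any other RADIUS server stores its policies.

```python
TUNNEL_TYPE_VLAN = 13        # Tunnel-Type (64)
TUNNEL_MEDIUM_IEEE_802 = 6   # Tunnel-Medium-Type (65)


def evaluate_policies(user_groups, policies):
    """Walk policies top to bottom; the first policy whose group matches decides."""
    for policy in policies:
        if policy["group"] in user_groups:
            if not policy.get("grant", True):
                return ("Access-Reject", {})
            return ("Access-Accept", {
                "Tunnel-Type": TUNNEL_TYPE_VLAN,
                "Tunnel-Medium-Type": TUNNEL_MEDIUM_IEEE_802,
                "Tunnel-Assignment-Id": policy["tag"],  # attribute 82
            })
    # No policy matched: deny.
    return ("Access-Reject", {})


policies = [
    {"group": "Staff", "tag": "520"},
    {"group": "Students", "tag": "530"},
    {"group": "Contractors", "tag": "540", "grant": False},
]
print(evaluate_policies({"Staff", "Students"}, policies))
print(evaluate_policies({"Contractors"}, policies))
```

A user in both Staff and Students lands on the Staff VLAN because that policy is listed first, which is why policy order matters.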
RADIUS packet structure
+--------+--------+----------------+--------------------+----------+
|  Code  |   ID   |     Length     |   Authenticator    | Attribs  |
| 1 byte | 1 byte |    2 bytes     |      16 bytes      | variable |
+--------+--------+----------------+--------------------+----------+
ID pairs requests with replies. The Authenticator field plus the Message-Authenticator attribute (80, HMAC-MD5 over the packet using the shared secret) provide integrity. Message-Authenticator is mandatory when EAP is in use.
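A minimal sketch of reading that layout: a 20-byte fixed header followed by type-length-value attributes, where each attribute's length byte counts its own two header bytes. The helper names are hypothetical, and no integrity checking (Authenticator or Message-Authenticator) is attempted here.

```python
import struct

RADIUS_CODES = {
    1: "Access-Request", 2: "Access-Accept", 3: "Access-Reject",
    4: "Accounting-Request", 5: "Accounting-Response", 11: "Access-Challenge",
}


def build_radius(code, ident, attributes, authenticator=bytes(16)):
    """Serialise a packet from (type, value-bytes) pairs. For testing the parser only."""
    body = b"".join(bytes([t, len(v) + 2]) + v for t, v in attributes)
    return struct.pack("!BBH", code, ident, 20 + len(body)) + authenticator + body


def parse_radius(packet):
    """Split a RADIUS packet into code, ID, authenticator and raw attributes."""
    if len(packet) < 20:
        raise ValueError("RADIUS packets are at least 20 bytes")
    code, ident, length = struct.unpack("!BBH", packet[:4])
    if length < 20 or length > len(packet):
        raise ValueError("length field does not match the packet")
    attributes = []
    offset = 20
    while offset < length:
        if offset + 2 > length:
            raise ValueError("truncated attribute header")
        attr_type, attr_len = packet[offset], packet[offset + 1]
        if attr_len < 2 or offset + attr_len > length:
            raise ValueError("malformed attribute")
        attributes.append((attr_type, packet[offset + 2:offset + attr_len]))
        offset += attr_len
    return {
        "code": RADIUS_CODES.get(code, code),
        "id": ident,
        "authenticator": packet[4:20],
        "attributes": attributes,
    }


# User-Name is attribute 1; NAS-Port-Type is attribute 61 (19 = Wireless-802.11).
pkt = build_radius(1, 42, [(1, b"alice"), (61, struct.pack("!I", 19))])
print(parse_radius(pkt))
```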
Onboarding workflow for new user category
One-time setup (network team):
- Create RADIUS Network Policy with condition: User Groups contains X
- Set RADIUS attributes: Tunnel-Type=VLAN, Tunnel-Medium-Type=802, Tunnel-Assignment-Id=N
- Create AP VLAN Assignment Rule: if Tunnel-Assignment-Id equals N then bridge VLAN N
- Ensure VLAN N exists on switch, has DHCP scope, has SVI / gateway
Per user (systems team):
- Create user in AD
- Add to the appropriate AD group
AD touches identity only. RADIUS holds the policy. AP holds the rule.
Strengths
- No client-side PKI required, just username and password
- Easy onboarding, especially for BYOD where pushing a client cert is hard
- Works with existing AD passwords, no parallel credential lifecycle
- Cheap and fast to deploy at scale
Weaknesses
- Rogue RADIUS attack surface. If the supplicant is misconfigured (no CA trust enforcement, no server name validation), an attacker can stand up a fake RADIUS server, capture the MSCHAPv2 challenge/response, and crack it offline to recover the password
- Passwords are crackable offline. MSCHAPv2 is a known-weak hash protocol. Capture once, brute force forever
- User-typed credentials. Subject to phishing, shoulder surfing, weak passwords, reuse across services
- Password expiry causes mass WiFi failures. Every password rotation cycle generates support tickets
- Server cert validation is the only thing keeping it secure. If clients are not enforcing it (and many supplicants default to permissive), the whole method falls apart
Common NPS gotcha
On the policy Overview tab, the "Access permission" radio must be set to "Grant access". You can have perfect Conditions, Constraints, and Settings and the policy will still deny if this is wrong. Catches everyone exactly once.
Debug cheat sheet
| Symptom | Likely cause | Where to look |
|---|---|---|
| No reply from RADIUS | UDP 1812 blocked or wrong shared secret | Firewall, RADIUS client config |
| Access-Reject with no reason | Wrong shared secret or AP not registered as RADIUS client | NPS Event Viewer event 6273 |
| Many Challenges then Reject | Cert validation failure | RADIUS cert trust chain, client trust store |
| Auth succeeds, wrong VLAN | Wrong Tunnel-Assignment-Id value or no AP rule match | RADIUS policy attributes, AP rule list |
| Session drops at fixed interval | Session-Timeout reauth failing | Look for reauth attempt in capture |
| Auth works, no accounting | UDP 1813 blocked or NAS not configured for accounting | Firewall, accounting client config |
Capture and log sources
- Wireshark on RADIUS NIC, filter radius. Cleanest view of all RADIUS traffic
- NPS Event Viewer. Custom Views > Server Roles > Network Policy and Access Services. Event 6272 = granted, 6273 = denied. Shows matched policy name and reason
- NPS log files in %SystemRoot%\System32\LogFiles\ in IAS format
- SPAN or mirror on AP uplink if RADIUS access not available
Combine packet capture (what flew on the wire) with RADIUS event log (which policy was picked and why) for full diagnosis.
Reference values
| Item | Value |
|---|---|
| RADIUS auth port | UDP 1812 (old: 1645) |
| RADIUS accounting port | UDP 1813 (old: 1646) |
| CoA port | UDP 3799 |
| Tunnel-Type for VLAN | 13 |
| Tunnel-Medium-Type for IEEE-802 | 6 |
| NAS-Port-Type for wireless | 19 |
| RFC for RADIUS | 2865 (auth), 2866 (accounting) |
| RFC for tunnel attributes | 2868 |
| RFC for CoA | 5176 |
| RFC for EAP | 3748 |
802.1X · EAP-TLS
What it is
A certificate-based EAP method. Both sides present certificates. The RADIUS server validates the client's certificate. The client validates the RADIUS server's certificate. The TLS handshake itself is the authentication. No password is ever exchanged.
Server proves itself with a cert. Client proves itself with a cert. Mutual cert authentication.
The three players
Same as PEAP-MSCHAPv2. Supplicant, authenticator, authentication server. EAP runs end-to-end between supplicant and RADIUS. The NAS is a relay.
What each side needs
| Side | Requirement |
|---|---|
| RADIUS server | Server certificate trusted by clients, plus the CA chain to validate client certs |
| Client | Client certificate, private key, and trust of the RADIUS server's CA |
| Backend | AD account or device object the cert can be mapped to |
| PKI | Full PKI: issuing CA, cert enrolment automation, revocation mechanism (CRL or OCSP) |
End-to-end auth flow
- Client connects to SSID
- AP wraps EAP-Identity in Access-Request (code 1) to RADIUS on UDP 1812
- RADIUS replies with Access-Challenge (code 11) containing EAP-TLS Start
- Client sends TLS ClientHello inside Access-Request
- RADIUS replies with ServerHello, server Certificate, CertificateRequest, ServerHelloDone
- Client validates server cert against its trusted CA store
- Client sends its own Certificate, ClientKeyExchange, CertificateVerify (signs the handshake with its private key, proving possession), ChangeCipherSpec, Finished
- RADIUS validates the client certificate: chains to a trusted CA, not expired, not revoked (CRL or OCSP check), CertificateVerify signature valid (proves client holds the private key)
- RADIUS maps the certificate to an identity (cert-to-account mapping, usually via UPN or DNS name in the Subject Alternative Name)
- RADIUS queries AD via LDAP for the mapped account's group memberships
- RADIUS walks Network Policies top to bottom, first match wins
- Matched policy injects tunnel attributes into reply
- RADIUS sends Access-Accept (code 2) with Tunnel-Type, Tunnel-Medium-Type, Tunnel-Assignment-Id, MS-MPPE keys
- AP reads Tunnel-Assignment-Id, matches its VLAN Assignment Rule, bridges client into assigned VLAN
- AP uses MS-MPPE keys to derive PMK for WPA2 4-way handshake
- Client gets DHCP from the assigned VLAN's scope
- AP sends Accounting-Request Start (code 4) on UDP 1813
- Periodic Interim-Updates, Stop on disconnect
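The tunnel attributes returned in the Access-Accept step above are plain type-length-value fields on the wire. The following is a minimal Python sketch of that encoding, assuming the RFC 2868 layouts; the helper names are hypothetical, VLAN "20" is only an example, and it uses Tunnel-Private-Group-ID (type 81) for the VLAN string. Swap in type 82 if your platform keys on Tunnel-Assignment-Id, which has the same layout.

```python
import struct

TUNNEL_TYPE = 64               # RFC 2868
TUNNEL_MEDIUM_TYPE = 65        # RFC 2868
TUNNEL_PRIVATE_GROUP_ID = 81   # RFC 2868 (82 would be Tunnel-Assignment-Id)


def tagged_int_attr(attr_type: int, value: int, tag: int = 0) -> bytes:
    """Type(1) Length(1) Tag(1) Value(3): always 6 bytes. Tag 0 means unused."""
    return struct.pack("!BBB", attr_type, 6, tag) + value.to_bytes(3, "big")


def string_attr(attr_type: int, text: str) -> bytes:
    """Type(1) Length(1) String(n), sent here without the optional tag byte."""
    data = text.encode("ascii")
    return struct.pack("!BB", attr_type, 2 + len(data)) + data


def vlan_attributes(vlan_id: str) -> bytes:
    """Hypothetical helper: the three attributes a VLAN assignment needs."""
    return (
        tagged_int_attr(TUNNEL_TYPE, 13)            # 13 = VLAN
        + tagged_int_attr(TUNNEL_MEDIUM_TYPE, 6)    # 6 = IEEE-802
        + string_attr(TUNNEL_PRIVATE_GROUP_ID, vlan_id)
    )


print(vlan_attributes("20").hex())
```

Comparing this hex against the attribute bytes in a capture of the Access-Accept is a quick way to confirm which VLAN attribute your RADIUS server actually sends.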
What is different from PEAP-MSCHAPv2
| Stage | PEAP-MSCHAPv2 | EAP-TLS |
|---|---|---|
| TLS tunnel built | Yes, to protect inner method | Yes, and it IS the auth |
| Server proves itself | Server cert | Server cert |
| Client proves itself | Password inside the tunnel | Client cert + CertificateVerify signature |
| Inner method | MSCHAPv2 | None, the handshake is the auth |
| Credential on the wire | MSCHAPv2 challenge-response derived from the password hash, inside the TLS tunnel | Nothing secret transmitted |
| Backend validation | AD checks password | RADIUS validates cert, maps to AD account |
| Group to VLAN mapping | RADIUS policy, identical | RADIUS policy, identical |
| MS-MPPE keys delivered | Yes | Yes |
The auth half changes, the policy half does not. Tunnel attributes, VLAN assignment, MS-MPPE key delivery, accounting, all identical.
Device certs vs user certs
This is the design decision that bites schools and corporate fleets. The cert determines what the network sees.
| Cert type | Lives in | What gets authenticated | When auth happens |
|---|---|---|---|
| Device (machine) cert | Local Machine cert store | The laptop | At boot, before any user logs in |
| User cert | Current User cert store | The logged-in user | When the user logs in |
Implications for VLAN assignment:
- Device cert: VLAN follows the laptop. A staff laptop stays on the staff VLAN regardless of who logs in. If a teacher hands the laptop to a student, the student lands on the staff VLAN. This is by design, not a bug
- User cert: VLAN follows the user. When a different user logs in, the supplicant re-authenticates with the new user's cert and lands on a different VLAN
User certs solve the lending scenario but introduce operational cost: every user needs a cert on every device they use, provisioning is per-user-per-device, and the chicken-and-egg of "first login on a new device needs cert enrolment but the network needs the cert" usually requires a wired bridge or a staged onboarding flow.
The common compromise for managed-fleet environments: device certs for the standard fleet, plus a documented policy that loaner devices come from the appropriate VLAN's pool, not by lending across user categories.
Cert-to-account mapping
When EAP-TLS succeeds, the RADIUS server has a validated certificate but no AD account yet. Mapping the cert to an account is what lets policy evaluation proceed.
| Method | How it works | Notes |
|---|---|---|
| UPN in SAN | Cert's Subject Alternative Name contains a UPN (user@domain.com), RADIUS looks up the AD user with that UPN | Most common for user certs |
| DNS name in SAN | Cert's SAN contains the device DNS name, RADIUS looks up the AD computer object | Most common for device certs |
| Explicit mapping | AD account has an altSecurityIdentities attribute with the cert's issuer and serial | Used for high-assurance environments |
| Implicit mapping | RADIUS derives the account from cert fields without explicit linkage | Default for many setups |
Cert-to-account mapping is where many EAP-TLS deployments quietly fail. Symptoms: cert validates fine, but RADIUS rejects because it cannot find a matching account. Look at NPS Event 6273 reason codes.
Strengths
- No password on the wire, ever. Removes phishing, password spray, MSCHAPv2 cracking, password expiry headaches
- Mutual authentication enforced by the protocol. Cannot be silently weakened by client misconfiguration the way PEAP can
- Private key never leaves the device. Often hardware-backed (TPM), so even stealing the device does not yield extractable credentials
- Cleanest path for zero-trust and posture-based access
Weaknesses
- PKI is real infrastructure. Issuing CA, enrolment automation, renewal lifecycle, revocation. None of it is free
- Onboarding friction. Every device needs a cert before it can connect. BYOD and unmanaged devices are painful
- Cert expiry causes silent dropouts. If renewal automation breaks, devices fall off the network with no obvious cause
- Non-domain devices need a separate path. Printers, IoT, AV gear usually fall back to MAB or a PSK SSID
- Troubleshooting is harder. Debugging PKI chains, revocation reachability, SAN mismatches, clock skew, cert-to-account mapping
Common deployment patterns
| Environment | Typical EAP method |
|---|---|
| Managed Intune fleet (corporate / school staff and students) | EAP-TLS with device certs via SCEP or PKCS profiles |
| BYOD / personal devices | PEAP-MSCHAPv2 or a separate PSK SSID with portal onboarding |
| Printers, IoT, AV | MAB (MAC Authentication Bypass) into restricted VLAN |
| Guest WiFi | PSK or captive portal, no 802.1X |
Pure-EAP-TLS-everywhere designs are rare outside high-security environments. Most real deployments mix EAP-TLS for managed devices with PEAP or MAB for everything else.
SCEPman and RADIUSaaS specifics
When using cloud-based RADIUS like RADIUSaaS with SCEPman issuing certs via Intune:
- SCEPman acts as the issuing CA, certs are deployed to devices via Intune SCEP profiles
- RADIUSaaS validates the cert chain and provides the RADIUS endpoint reachable over RadSec (RADIUS over TLS, TCP 2083) from the APs
- Cert-to-account mapping happens against Entra ID rather than on-prem AD
- VLAN assignment via Tunnel-Private-Group-Id or Tunnel-Assignment-Id works the same way
Product capabilities change, verify current SCEPman and RADIUSaaS docs before committing a design.
Debug cheat sheet (EAP-TLS specific)
| Symptom | Likely cause | Where to look |
|---|---|---|
| Client never connects, no auth attempt | No client cert provisioned, or supplicant not configured for EAP-TLS | Client cert store, supplicant config |
| Auth fails immediately after server cert | Client cert not trusted by RADIUS, or cert expired | RADIUS server CA chain config |
| Auth fails after client cert sent | Cert-to-account mapping failure, or revocation check failed | NPS Event 6273, CRL/OCSP reachability |
| Cert valid but no group assignment | AD account exists but not in any group the policy matches | AD group membership, policy conditions |
| Random intermittent failures | Clock skew, CRL fetch timeout, expired intermediate CA | Time sync, CRL distribution points |
| Works on Windows, fails on macOS / iOS | Supplicant cert validation differences, SAN format requirements | Apple requires SAN, not just Subject CN |
Related concepts
- MAB (MAC Authentication Bypass) - fallback for devices that cannot do 802.1X. Same RADIUS server, different policy that matches on Calling-Station-Id (MAC) instead of EAP
- CoA (Change of Authorization) - RFC 5176, UDP 3799. RADIUS pushes session changes (force re-auth, terminate, bounce port) without the client reconnecting. Useful for posture, quarantine, role changes
- RadSec - RADIUS over TLS, TCP 2083. Replaces UDP plus shared-secret with a TLS tunnel. Used by RADIUSaaS and any cloud RADIUS service where AP-to-server traverses the public internet
- EAP-TTLS - similar to PEAP, server cert + inner method, but the inner method is more flexible. Rare in Microsoft shops, more common in mixed environments
Reference values
| Item | Value |
|---|---|
| RFC for EAP-TLS | 5216 |
| RFC for EAP | 3748 |
| RFC for RADIUS | 2865 |
| RadSec port | TCP 2083 |
| CoA port | UDP 3799 |
| Tunnel-Type for VLAN | 13 |
| Tunnel-Medium-Type for IEEE-802 | 6 |
TCP/IP
Scope
Deep reference for packet capture analysis, MTU/MSS sizing, and troubleshooting transport-layer issues. Covers IPv4, IPv6, TCP, UDP at the byte level. The diagrams below follow standard RFC bit-position notation: each character pair represents one bit, bits 0-31 left to right, MSB first.
IPv4 header (RFC 791, 20 bytes minimum, up to 60 with options)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |    DSCP   |ECN|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      TTL      |    Protocol   |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Source Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Destination Address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Options (if IHL > 5)              |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field | Bits | Notes |
|---|---|---|
| Version | 4 | Always 4 for IPv4 |
| IHL (Internet Header Length) | 4 | In 32-bit words. Min 5 (=20 bytes), max 15 (=60 bytes) |
| DSCP | 6 | QoS class. EF=46 (voice), AF41=34, CS6=48 (routing), etc. |
| ECN | 2 | Explicit Congestion Notification. 00=not-ECT, 11=CE (congestion) |
| Total Length | 16 | Header + data, in bytes. Max 65535 |
| Identification | 16 | Fragment reassembly key |
| Flags | 3 | bit0=Reserved (0), bit1=DF (Don't Fragment), bit2=MF (More Fragments) |
| Fragment Offset | 13 | In 8-byte units, offset of this fragment in the original packet |
| TTL | 8 | Hop count. Decremented at each router. 0 = drop + ICMP Time Exceeded |
| Protocol | 8 | Next-layer protocol (see protocol numbers table below) |
| Header Checksum | 16 | Header only, not payload. Recomputed at every hop (TTL changes) |
| Source / Destination | 32 each | IPv4 addresses |
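To make the field table concrete, here is a small Python sketch that unpacks the fixed 20-byte part of an IPv4 header using the bit widths listed above. The function name and the sample bytes are made up for illustration, and options are ignored.

```python
import struct
from ipaddress import IPv4Address


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header (options are not parsed)."""
    if len(raw) < 20:
        raise ValueError("IPv4 header needs at least 20 bytes")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,      # IHL is in 32-bit words
        "dscp": tos >> 2,                          # top 6 bits
        "ecn": tos & 0x03,                         # bottom 2 bits
        "total_length": total_len,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": str(IPv4Address(src)),
        "dst": str(IPv4Address(dst)),
    }


# Hand-built sample: DSCP EF, DF set, TTL 64, TCP, checksum left at zero.
sample = bytes.fromhex("45b800541c4640004006" "0000" "0a010101" "c0a80001")
header = parse_ipv4_header(sample)
print(header["dscp"], header["df"], header["src"], header["dst"])
```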
IPv4 fragmentation flags - what to look for in pcap
- DF set, MF clear - normal unfragmented packet. If too big for an MTU on path, gets dropped with ICMP Type 3 Code 4 (Fragmentation Needed). This is how PMTU Discovery works.
- DF clear, MF set - this is a fragment, more to come
- DF clear, MF clear, Fragment Offset > 0 - this is the LAST fragment
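- DF set, MF set - unusual combination. RFC 791 does not forbid it (a source can fragment and still forbid further fragmentation), but it is rare enough in practice to suggest a broken sender or a crafted packet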
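The flag combinations above reduce to a short decision function. This is an illustrative Python sketch, not a detection rule: the function name and return strings are invented, and it treats DF on any fragment as merely suspicious.

```python
def classify_fragment(df: bool, mf: bool, offset: int) -> str:
    """Label an IPv4 packet from its DF/MF flags and fragment offset."""
    if df and (mf or offset > 0):
        return "suspicious: DF set on a fragment"
    if mf:
        return "first fragment" if offset == 0 else "middle fragment"
    if offset > 0:
        return "last fragment"
    return "unfragmented"


print(classify_fragment(df=True, mf=False, offset=0))
print(classify_fragment(df=False, mf=False, offset=185))
```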
IPv6 header (RFC 8200, fixed 40 bytes)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |              Flow Label               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Payload Length        |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                   Source Address (128 bits)                   |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                Destination Address (128 bits)                 |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Key differences from IPv4
- Fixed 40-byte header - no IHL field, no options inside the header. Options go in extension headers (Hop-by-Hop, Routing, Fragment, AH, ESP, etc.) chained via Next Header.
- No header checksum - relies on L2 (Ethernet FCS) and L4 (TCP/UDP checksum). Routers don't have to recompute anything per hop.
- No router-level fragmentation - only the source can fragment, via the Fragment extension header. Path MTU Discovery is effectively mandatory.
- Hop Limit replaces TTL - same function, more honest name.
- Next Header replaces Protocol - same numbering scheme, but can point to an extension header rather than directly to L4.
- Flow Label (20 bits) - for ECMP hashing and QoS. Rarely populated by endpoints today.
Common protocol numbers (IPv4 Protocol / IPv6 Next Header)
| Number | Protocol | Where you'll see it |
|---|---|---|
| 1 | ICMP | ping, traceroute, PMTU discovery |
| 2 | IGMP | multicast group membership |
| 6 | TCP | most things |
| 17 | UDP | DNS, DHCP, VXLAN, RADIUS, IPsec NAT-T |
| 41 | IPv6-in-IPv4 | 6in4, 6to4 tunnels |
| 47 | GRE | Generic Routing Encapsulation |
| 50 | ESP | IPsec encrypted payload |
| 51 | AH | IPsec auth header (rare today) |
| 58 | ICMPv6 | v6 ping, ND, RA, PMTU |
| 88 | EIGRP | Cisco IGP |
| 89 | OSPF | OSPFv2/v3 hello and LSA |
| 112 | VRRP / CARP | Virtual router redundancy, including VRRP on FortiGate (FGCP HA heartbeats use their own Ethertypes, not IP protocol 112) |
| 132 | SCTP | signalling, some telco |
TCP header (RFC 9293, 20 bytes minimum, up to 60 with options)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Acknowledgment Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Data Of| Rsvd  |C|E|U|A|P|R|S|F|          Window Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Options (if Data Offset > 5)          |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field | Bits | Notes |
|---|---|---|
| Source / Destination Port | 16 each | 0-65535 |
| Sequence Number | 32 | Byte position in the stream. Wraps at 2^32 |
| Acknowledgment Number | 32 | Next sequence number expected. Only meaningful if ACK flag set |
| Data Offset | 4 | Header length in 32-bit words. Min 5, max 15 |
| Reserved | 4 | Must be zero. The low bit was the experimental NS (ECN-nonce) flag from RFC 3540, since moved to Historic; RFC 9293 lists all 4 bits as reserved |
| Control bits (flags) | 8 | CWR, ECE, URG, ACK, PSH, RST, SYN, FIN (see table below) |
| Window Size | 16 | Receive buffer space available, in bytes. Scaled if Window Scale option negotiated in SYN |
| Checksum | 16 | Over pseudo-header + TCP header + data |
| Urgent Pointer | 16 | Only valid if URG set. Rarely used today |
| Options | 0-320 | MSS, Window Scale, SACK Permitted, Timestamps, etc. |
TCP control bits (flags)
| Bit | Hex | Flag | Purpose |
|---|---|---|---|
| 0 (MSB) | 0x80 | CWR | Congestion Window Reduced (ECN response) |
| 1 | 0x40 | ECE | ECN-Echo (peer experienced congestion) |
| 2 | 0x20 | URG | Urgent Pointer field is significant |
| 3 | 0x10 | ACK | Acknowledgment field is significant |
| 4 | 0x08 | PSH | Push buffered data to receiving application now |
| 5 | 0x04 | RST | Reset the connection (abort) |
| 6 | 0x02 | SYN | Synchronize sequence numbers (connection setup) |
| 7 (LSB) | 0x01 | FIN | No more data from sender (graceful close) |
Common flag combinations seen in captures
| Combo | Hex | Meaning |
|---|---|---|
| SYN | 0x02 | Client opening connection |
| SYN + ACK | 0x12 | Server accepting connection |
| ACK | 0x10 | Data acknowledgment, no data this segment |
| PSH + ACK | 0x18 | Data carrying segment, deliver to app immediately |
| FIN + ACK | 0x11 | Graceful close, half-shutdown |
| RST | 0x04 | Hard abort, no negotiation (often used by firewalls) |
| RST + ACK | 0x14 | Hard abort in response to data on a dead connection |
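The hex values in the two tables above are just bitmasks over the flags byte (byte 13 of the TCP header). A short Python sketch of the decode, with a hypothetical function name:

```python
# (mask, name) pairs in wire order, MSB first, matching the control bits table.
TCP_FLAGS = [
    (0x80, "CWR"), (0x40, "ECE"), (0x20, "URG"), (0x10, "ACK"),
    (0x08, "PSH"), (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN"),
]


def decode_tcp_flags(flag_byte: int) -> list[str]:
    """Return the names of every flag set in the TCP flags byte."""
    return [name for mask, name in TCP_FLAGS if flag_byte & mask]


for value in (0x02, 0x12, 0x18, 0x14):
    print(hex(value), "+".join(decode_tcp_flags(value)))
```

The same masks are what a capture filter such as tcp[13] & 4 != 0 tests against.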
TCP connection states (RFC 9293)
| State | Meaning |
|---|---|
| CLOSED | No connection. Fictional starting state |
| LISTEN | Server waiting for an incoming SYN |
| SYN-SENT | Client sent SYN, waiting for SYN+ACK |
| SYN-RECEIVED | Server got SYN, sent SYN+ACK, waiting for ACK |
| ESTABLISHED | Connection open, data can flow both ways |
| FIN-WAIT-1 | Sent FIN, waiting for ACK or peer's FIN |
| FIN-WAIT-2 | Our FIN was ACKed, waiting for peer's FIN |
| CLOSE-WAIT | Peer sent FIN, we ACKed it. Waiting for local app to call close() |
| CLOSING | Simultaneous close - both sides sent FIN before ACKs arrived |
| LAST-ACK | Sent our FIN (passive close), waiting for final ACK from peer |
| TIME-WAIT | Active closer waits 2*MSL after sending final ACK, to absorb stragglers |
Normal 3-way handshake
Client                                      Server
------                                      ------
CLOSED                                      LISTEN
  |                                           |
  |---- SYN (seq=x) ------------------------->|
SYN-SENT                                      |
  |                                           |
  |<--- SYN+ACK (seq=y, ack=x+1) -------------|
  |                                      SYN-RECEIVED
  |                                           |
  |---- ACK (seq=x+1, ack=y+1) -------------->|
ESTABLISHED                              ESTABLISHED
Normal 4-way close (active vs passive)
Active closer                      Passive closer
-------------                      --------------
ESTABLISHED                        ESTABLISHED
  |                                  |
  |---- FIN, ACK ------------------->|
FIN-WAIT-1                           |
  |                                  |
  |<--- ACK -------------------------|
FIN-WAIT-2                         CLOSE-WAIT
  |                                  |
  |                          (app calls close)
  |                                  |
  |<--- FIN, ACK --------------------|
  |                                LAST-ACK
  |                                  |
  |---- ACK ------------------------>|
TIME-WAIT                          CLOSED
  |
(wait 2*MSL)
  |
CLOSED
TIME-WAIT and 2*MSL - what every senior engineer should know
- The side that sends FIN first is the active closer and ends up in TIME-WAIT.
- RFC says wait 2 * MSL (Maximum Segment Lifetime). RFC suggests MSL = 2 minutes, so 2*MSL = 4 minutes default.
- Linux: hardcoded at 60 seconds, not configurable (kernel constant TCP_TIMEWAIT_LEN).
- Windows: registry TcpTimedWaitDelay, default 240s, range 30-300s.
- Why it exists: absorb delayed segments from this connection so they don't pollute a new connection on the same 4-tuple, and ensure the peer's FIN gets retransmitted and ACKed if the final ACK was lost.
- Symptom of exhaustion: "Address already in use" when restarting a server on a well-known port. Or ephemeral port exhaustion on a busy client (HTTP load test, monitoring poller).
- Linux tunables: net.ipv4.tcp_tw_reuse=1 (safe to enable, reuses TIME-WAIT for outgoing). tcp_tw_recycle was removed in 4.12 (NAT-hostile, do not look for it).
- Quick check on Linux: ss -tan state time-wait | wc -l
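Where ss is not available, the same count can be taken from /proc/net/tcp, where the fourth column is the socket state in hex and 06 is TIME-WAIT. The sketch below is a rough Python equivalent under that assumption; the function name is hypothetical, and IPv6 sockets live in /proc/net/tcp6, which it does not read.

```python
TIME_WAIT_STATE = "06"  # hex state code for TIME-WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text: str) -> int:
    """Count TIME-WAIT sockets in the text of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT_STATE:
            count += 1
    return count


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as handle:
            print(count_time_wait(handle.read()))
    except FileNotFoundError:
        print("no /proc/net/tcp on this system")
```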
UDP header (RFC 768, fixed 8 bytes)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- Length covers header + data (so minimum 8).
- Checksum is optional in IPv4 (set to 0 to skip), mandatory in IPv6.
- No sequence numbers, no flow control, no retransmission. Anything above that lives in the application protocol (QUIC, DNS, RTP, etc.).
MTU / MSS math reference
Standard Ethernet MTU is 1500 bytes (payload after Ethernet header). MSS is carried as a TCP option in the SYN and SYN-ACK: each side advertises the largest segment it is willing to receive, and the sender stays within the peer's value.
| Path type | Headers consumed | Effective MSS |
|---|---|---|
| Plain TCP over IPv4 | 20 IPv4 + 20 TCP = 40 | 1460 |
| Plain TCP over IPv6 | 40 IPv6 + 20 TCP = 60 | 1440 |
| TCP in PPPoE (DSL, FTTN) | 40 + 8 PPPoE = 48 | 1452 |
| TCP in GRE | 40 + 24 GRE = 64 | 1436 |
| TCP in IPsec ESP (tunnel mode, AES-GCM) | ~40 + 50-60 ESP overhead | ~1380-1400 |
| TCP in VXLAN over IPv4 | 40 + 50 VXLAN = 90 | 1410 |
| TCP in VXLAN over IPv6 | 60 + 70 VXLAN = 130 | 1370 |
FortiGate tip: set tcp-mss-sender and set tcp-mss-receiver on the firewall policy, or set tcp-mss on the interface, clamp the MSS in transit. Common for VPN paths where PMTU is unreliable.
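Every fixed-overhead row in the MSS table is the same subtraction: link MTU minus encapsulation overhead minus the inner IP and TCP headers. A small Python sketch, with a hypothetical function name and the overhead figures taken from the table:

```python
def effective_mss(mtu: int = 1500, ip_header: int = 20, tcp_header: int = 20,
                  encap_overhead: int = 0) -> int:
    """MSS = MTU - tunnel/encapsulation overhead - inner IP header - TCP header."""
    return mtu - encap_overhead - ip_header - tcp_header


print("plain IPv4 :", effective_mss())
print("plain IPv6 :", effective_mss(ip_header=40))
print("PPPoE      :", effective_mss(encap_overhead=8))
print("GRE        :", effective_mss(encap_overhead=24))
print("VXLAN v4   :", effective_mss(encap_overhead=50))
print("VXLAN v6   :", effective_mss(ip_header=40, encap_overhead=70))
```

IPsec ESP is left out on purpose: its overhead depends on cipher, padding, and NAT-T, which is why the table gives a range rather than one number.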
Path MTU Discovery (PMTUD) - how it breaks
- Sender sets DF flag. Router along path with smaller MTU drops the packet and returns ICMP Type 3 Code 4 (Fragmentation Needed, DF Set) with the next-hop MTU.
- Sender caches the new MTU per destination and reduces segment size.
- Common breakage: a firewall along the path drops the ICMP unreachable. The sender keeps retransmitting full-size DF-set packets, so the handshake completes but the application hangs as soon as large data segments are sent. Classic "small requests work, large ones don't" symptom.
- Workaround: MSS clamping at the tunnel ingress (see FortiGate tip above), or set DF=0 to allow router fragmentation (IPv4 only, slow).
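The discovery loop and its black-hole failure mode can be shown with a toy simulation. This Python sketch is a deliberate simplification: the function name and parameters are invented, each list entry stands for one hop's MTU, and a blocked ICMP message simply means the sender retries the same size until it gives up.

```python
def discover_path_mtu(link_mtus: list[int], start_mtu: int = 1500,
                      icmp_blocked: bool = False, max_tries: int = 10):
    """Return (path_mtu, attempts), or (None, attempts) if discovery stalls."""
    size = start_mtu
    for attempt in range(1, max_tries + 1):
        # First hop whose MTU is smaller than the DF-set packet drops it.
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size, attempt       # packet fit every hop
        if icmp_blocked:
            continue                   # ICMP never arrives: same size is resent
        size = bottleneck              # ICMP Type 3 Code 4 carried next-hop MTU
    return None, max_tries


print(discover_path_mtu([1500, 1400, 1300]))
print(discover_path_mtu([1500, 1400, 1300], icmp_blocked=True))
```

With ICMP delivered, the sender converges in one extra attempt per smaller hop. With ICMP blocked, it never learns the limit, which is the stall described above.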
Wireshark display filters - deep troubleshooting picks
| Filter | What it catches |
|---|---|
| tcp.flags.syn == 1 and tcp.flags.ack == 0 | Connection attempts (SYN only) |
| tcp.flags.reset == 1 | RST packets - who killed the connection |
| tcp.analysis.retransmission | Wireshark-detected retransmits (loss indicator) |
| tcp.analysis.fast_retransmission | Triggered by 3 duplicate ACKs |
| tcp.analysis.duplicate_ack | Dup ACKs - receiver telling sender it's missing data |
| tcp.analysis.zero_window | Receiver buffer full - application not draining |
| tcp.analysis.window_full | Sender has filled the peer's window |
| tcp.analysis.out_of_order | Packets arriving out of sequence |
| tcp.analysis.lost_segment | Wireshark sees a gap in sequence numbers |
| tcp.stream eq N | Isolate one specific TCP conversation |
| tcp.window_size == 0 | Same as zero_window but the raw field |
| ip.ttl < 5 | Low TTL - either looping or near end-of-life |
| ip.flags.df == 1 and ip.len > 1400 | Large DF-set packets - PMTU candidates |
| icmp.type == 3 and icmp.code == 4 | ICMP Frag Needed - PMTUD signal |
| ip.frag_offset > 0 or ip.flags.mf == 1 | Any IPv4 fragmentation |
BPF capture filters (tcpdump / wireshark capture mode)
| Filter | Effect |
|---|---|
| tcp port 443 | HTTPS to/from any host |
| host 10.1.1.1 and not port 22 | Everything to/from a host except SSH (don't capture your own session) |
| tcp[tcpflags] & (tcp-syn\|tcp-fin) != 0 | Any SYN or FIN - connection setup/teardown only |
| tcp[13] == 0x02 | SYN only (no ACK) |
| tcp[13] == 0x12 | SYN+ACK only |
| tcp[13] & 4 != 0 | Any RST |
| icmp | All ICMP |
| icmp[icmptype] == icmp-unreach | ICMP Type 3 (Destination Unreachable, includes Frag Needed) |
| vlan 100 and host 10.1.1.1 | Tagged traffic on VLAN 100 |
| net 192.168.0.0/16 | RFC1918 16-bit space |
| greater 1400 | Packets of 1400 bytes or more (greater means >=) - useful for MTU hunting |
Wireshark TCP analysis flags - what they actually mean
| Flag | Heuristic |
|---|---|
| [TCP Retransmission] | Same seq seen again, more than RTT later. Probably loss. |
| [TCP Fast Retransmission] | Retransmit triggered after 3 dup ACKs (RFC 5681) |
| [TCP Dup ACK] | Receiver got out-of-order data, ACKing the last in-order byte again |
| [TCP Out-Of-Order] | Lower seq arrives after higher seq |
| [TCP Spurious Retransmission] | Retransmit of data already ACKed - sender timed out unnecessarily |
| [TCP Zero Window] | Receiver advertised window = 0, asking sender to pause |
| [TCP Window Update] | Receiver re-opens window after Zero Window |
| [TCP Previous Segment Not Captured] | Capture started mid-stream or capture missed packets - not necessarily a network problem |
| [TCP Keep-Alive] | 1-byte segment with seq one less than expected, used to probe a quiet connection |
Useful ports for network engineers
| Port | Proto | Service |
|---|---|---|
| 22 | TCP | SSH |
| 53 | UDP/TCP | DNS (TCP for AXFR and >512 byte replies) |
| 67/68 | UDP | DHCP server / client |
| 69 | UDP | TFTP (firmware uploads, FortiManager backup) |
| 123 | UDP | NTP |
| 161/162 | UDP | SNMP get / trap |
| 179 | TCP | BGP |
| 443 | TCP | HTTPS, FortiGate GUI, SSL VPN portal |
| 500 | UDP | IKE (IPsec phase 1) |
| 514 | UDP | syslog |
| 520 | UDP | RIP |
| 541 | TCP | FortiGuard (older), FortiManager |
| 636 | TCP | LDAPS |
| 1645/1646 | UDP | RADIUS auth/acct (legacy) |
| 1701 | UDP | L2TP |
| 1812/1813 | UDP | RADIUS auth/acct |
| 2055 | UDP | NetFlow |
| 3306 | TCP | MySQL |
| 3389 | TCP | RDP |
| 3478 | UDP | STUN (Teams, WebRTC) |
| 3799 | UDP | RADIUS CoA |
| 4500 | UDP | IPsec NAT-T |
| 4789 | UDP | VXLAN |
| 5060/5061 | UDP/TCP | SIP / SIP-TLS |
| 6081 | UDP | Geneve (used by AWS Gateway Load Balancer and others) |
| 8443 | TCP | FortiGate GUI alternate, common admin port |
| 10443 | TCP | FortiGate SSL VPN default |
Reference RFCs
| RFC | Title |
|---|---|
| 768 | UDP |
| 791 | IPv4 |
| 792 | ICMP |
| 1191 | Path MTU Discovery |
| 2474 / 3168 | DSCP / ECN |
| 4443 | ICMPv6 |
| 5681 | TCP Congestion Control |
| 8200 | IPv6 |
| 9293 | TCP (current, obsoletes RFC 793) |
QoS
Scope
QoS only matters when there is congestion. When the pipe is wider than demand, QoS does nothing useful. When demand exceeds capacity, QoS decides what gets through first and what waits or drops. The full job is: classify, mark, then enforce (queue, police, shape, drop). This section focuses on the practical reference values and FortiGate-specific behaviour, with deep dives into Microsoft Teams QoS and SD-WAN shaping.
The ToS byte: DSCP + ECN
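The second byte (offset 1) of the IPv4 header. In IPv6 the same 8 bits form the Traffic Class field, which follows the 4-bit Version and therefore straddles bytes 0 and 1. Same format, same semantics.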
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|    DSCP   |ECN|
+-+-+-+-+-+-+-+-+
Top 6 bits = DSCP. Bottom 2 bits = ECN. The legacy "IP Precedence" field was the top 3 bits of DSCP, which is why CS values (Class Selector) line up: CS5 (5 in IPPrec) = 101000 in DSCP = decimal 40.
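The split described above is two bit operations. A minimal Python sketch with hypothetical helper names:

```python
def split_tos(tos_byte: int) -> tuple[int, int]:
    """Return (dscp, ecn) from the 8-bit ToS / Traffic Class value."""
    return tos_byte >> 2, tos_byte & 0b11


def class_selector(ip_precedence: int) -> int:
    """DSCP value of CSn: the 3 precedence bits followed by three zero bits."""
    return ip_precedence << 3


print(split_tos(0xB8))      # a ToS byte as seen in a capture
print(class_selector(5))    # CS5
```

This is also why capture tools that show the raw ToS byte report EF as 0xB8 (184) rather than 46.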
DSCP codepoint reference
| Name | Decimal | Hex | Binary | Typical use |
|---|---|---|---|---|
| CS0 / DF (Best Effort) | 0 | 0x00 | 000000 | Default. Everything unmarked. |
| CS1 | 8 | 0x08 | 001000 | Scavenger / lower than best effort (bulk backups, P2P) |
| AF11 / AF12 / AF13 | 10 / 12 / 14 | 0x0A / 0x0C / 0x0E | 001010 / 001100 / 001110 | Low priority data, drop precedence 1/2/3 |
| CS2 | 16 | 0x10 | 010000 | OAM, network management |
| AF21 / AF22 / AF23 | 18 / 20 / 22 | 0x12 / 0x14 / 0x16 | 010010 / 010100 / 010110 | Low-latency data (transactional). AF21 = Teams screen share |
| CS3 | 24 | 0x18 | 011000 | Call signalling (SIP, H.323, SCCP) |
| AF31 / AF32 / AF33 | 26 / 28 / 30 | 0x1A / 0x1C / 0x1E | 011010 / 011100 / 011110 | Multimedia streaming (one-way video) |
| CS4 | 32 | 0x20 | 100000 | Real-time interactive (gaming) |
| AF41 / AF42 / AF43 | 34 / 36 / 38 | 0x22 / 0x24 / 0x26 | 100010 / 100100 / 100110 | Multimedia conferencing. AF41 = Teams video |
| CS5 | 40 | 0x28 | 101000 | Broadcast video (legacy) |
| VA (Voice Admit) | 44 | 0x2C | 101100 | Admission-controlled voice (RFC 5865) |
| EF (Expedited Forwarding) | 46 | 0x2E | 101110 | Real-time / voice. Teams audio |
| CS6 | 48 | 0x30 | 110000 | Network control (OSPF, BGP, ISIS). Do NOT use for user traffic |
| CS7 | 56 | 0x38 | 111000 | Reserved. Don't use. |
How to read AFxy
- x (first digit) = the queue / class. 1=low priority data, 2=low-latency data, 3=streaming, 4=conferencing.
- y (second digit) = drop precedence within that class. 1=lowest probability of drop, 3=highest. WRED uses this to decide what to drop first under congestion.
- So AF43 means "conferencing class, drop me first if congested". AF41 means "conferencing class, drop me last".
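The AFxy names map to decimal DSCP values by a fixed formula, DSCP = 8x + 2y, which reproduces the values in the codepoint table. A short Python sketch (the function name is made up):

```python
def af_to_dscp(af_class: int, drop_precedence: int) -> int:
    """Convert AFxy to its decimal DSCP value: 8 * x + 2 * y."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


for x, y in ((4, 1), (2, 1), (1, 3)):
    print(f"AF{x}{y} = {af_to_dscp(x, y)}")
```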
ECN bits
| Value | Code | Meaning |
|---|---|---|
| 00 | Not-ECT | Sender is not ECN capable. Drop on congestion. |
| 10 | ECT(0) | ECN capable. Mark instead of drop. |
| 01 | ECT(1) | ECN capable. Same as ECT(0) functionally. |
| 11 | CE | Congestion Experienced. A router along the path is congested. |
An ECN-aware router sees congestion, finds ECT(0)/ECT(1), and sets the field to 11 (CE) instead of dropping. Receiver echoes via the TCP ECE flag in its ACKs. Sender reduces window (sends CWR flag in next segment). No packet drop, no retransmit. ECN only pays off when both endpoints negotiate it, the congested hop marks rather than drops, and nothing on the path clears the bits. Many middleboxes still zero them out.
802.1Q PCP / CoS (Layer 2 priority)
Inside the 802.1Q tag, the TCI (Tag Control Information) field is 16 bits:
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PCP |D|     VLAN ID (VID)     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
PCP (Priority Code Point) = 3 bits, also called CoS or 802.1p. DEI = 1 bit, drop eligibility indicator. VID = 12 bits, VLAN ID 0-4095.
| PCP | IEEE name | Typical use |
|---|---|---|
| 0 | BE - Best Effort | Default |
| 1 | BK - Background | Bulk, lowest priority (ranks below PCP 0) |
| 2 | EE - Excellent Effort | Important data |
| 3 | CA - Critical Applications | Call signalling |
| 4 | VI - Video | Conferencing video, streaming |
| 5 | VO - Voice | Real-time voice (<10ms latency) |
| 6 | IC - Internetwork Control | OSPF, routing |
| 7 | NC - Network Control | STP, LLDP |
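Pulling PCP, DEI, and VID out of the 16-bit TCI is a matter of shifts and masks. A minimal Python sketch with a hypothetical function name:

```python
def parse_tci(tci: int) -> dict:
    """Split the 16-bit 802.1Q TCI into PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    return {
        "pcp": (tci >> 13) & 0b111,
        "dei": (tci >> 12) & 0b1,
        "vid": tci & 0x0FFF,
    }


# Example: PCP 5 (voice), DEI 0, VLAN 100 packs to 0xA064.
print(parse_tci(0xA064))
```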
Standard DSCP ↔ CoS mapping
Most vendors copy the top 3 bits of DSCP into the 3-bit CoS field. Useful when a router strips the 802.1Q tag on egress to a routed interface and the L2 priority gets lost - the DSCP survives.
| DSCP | CoS / PCP | Why |
|---|---|---|
| EF (46) / VA (44) | 5 | Top 3 bits = 101 = 5 (Voice) |
| AF41-43 (34-38) / CS4 (32) | 4 | Top 3 bits = 100 = 4 (Video) |
| AF31-33 (26-30) / CS3 (24) | 3 | Top 3 bits = 011 = 3 (Signalling) |
| AF21-23 (18-22) / CS2 (16) | 2 | Top 3 bits = 010 = 2 (Excellent Effort) |
| AF11-13 (10-14) / CS1 (8) | 1 | Top 3 bits = 001 = 1 (Background in IEEE naming, below best effort) |
| CS0 (0) | 0 | Top 3 bits = 000 |
| CS6 (48) | 6 | Network control |
| CS7 (56) | 7 | Reserved |
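The default mapping in the table is a 3-bit right shift of the DSCP value. A one-function Python sketch, assuming the "copy the top 3 bits" behaviour described above (vendors may allow custom maps, and the function name is invented):

```python
def dscp_to_cos(dscp: int) -> int:
    """Default DSCP to CoS/PCP mapping: keep the top 3 of the 6 DSCP bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp >> 3


for name, dscp in (("EF", 46), ("AF41", 34), ("CS3", 24), ("AF21", 18), ("CS1", 8)):
    print(name, dscp, "->", dscp_to_cos(dscp))
```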
Trust boundaries
- The trust boundary is the point in your network where you start honouring whatever DSCP/CoS the upstream device set. Inside: trust. Outside: classify and re-mark.
- Access ports (user devices): generally don't trust. Re-mark to CoS 0 / DSCP 0 unless the device is known good (Teams client with GPO, SIP phone with config).
- SIP phone uplink with PC dangling off it: trust CoS from the phone (CDP/LLDP-MED tells you what VLAN and how to trust), don't trust the PC behind it.
- Inter-switch trunks within your admin domain: trust.
- WAN edge (to ISP / MPLS): re-mark to whatever the carrier expects. Some scrub on ingress (set to 0), some preserve, some map DSCP to MPLS EXP. Confirm in writing with the carrier.
- Internet egress: assume markings will be wiped by the first ISP hop. End-to-end DSCP only works inside one admin domain or across a contracted MPLS service.
Microsoft Teams QoS - the canonical configuration
| Traffic type | Source port range | Protocol | DSCP | Class |
|---|---|---|---|---|
| Audio | 50000-50019 | TCP / UDP | 46 | EF (Expedited Forwarding) |
| Video | 50020-50039 | TCP / UDP | 34 | AF41 |
| Application / Screen sharing | 50040-50059 | TCP / UDP | 18 | AF21 |
These are source port ranges on the Teams client. You must enable QoS in the Teams Admin Centre (Meetings > Meeting settings) AND configure the client to insert DSCP markings via GPO/Intune. By default Teams uses any ephemeral port 1024-65535 and classification by port range fails.
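When the network has to classify on behalf of the client, the logic is a lookup from source port to DSCP. The Python sketch below mirrors the table above; the names are hypothetical and it assumes the tenant is locked to the 50000-50059 ranges.

```python
# (source port range, traffic label, DSCP) taken from the Teams table.
TEAMS_PORT_CLASSES = [
    (range(50000, 50020), "audio", 46),     # EF
    (range(50020, 50040), "video", 34),     # AF41
    (range(50040, 50060), "sharing", 18),   # AF21
]


def teams_dscp_for_source_port(port: int):
    """Return (label, dscp) for a Teams media source port, or None if no match."""
    for ports, label, dscp in TEAMS_PORT_CLASSES:
        if port in ports:
            return label, dscp
    return None


for port in (50005, 50025, 50059, 51000):
    print(port, teams_dscp_for_source_port(port))
```

Anything that returns None falls through to best effort, which is exactly what happens to all Teams media when the port ranges are not locked.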
Where to enable Teams QoS marking
- Teams Admin Centre tenant setting: Meetings > Meeting settings > Network > Quality of Service. Turn on "Insert Quality of Service (QoS) markers" and lock to the 50000-50059 port ranges.
- Windows clients (domain joined): GPO Policy-based QoS targeting ms-teams.exe (new Teams) and teams.exe (classic), DSCP 46, source ports 50000-50019, protocol TCP+UDP. Repeat for video and sharing.
- Windows clients (Intune managed): NetworkQoSPolicy CSP via Intune. Same OMA-URI settings, applied per-device.
- Windows PowerShell (one-off): New-NetQosPolicy -Name "TeamsAudio" -AppPathNameMatchCondition "ms-teams.exe" -IPProtocolMatchCondition Both -IPSrcPortStartMatchCondition 50000 -IPSrcPortEndMatchCondition 50019 -DSCPAction 46
- Teams Rooms on Android: tenant-level only. Note: Android Teams Rooms uses DSCP 34 (AF41) for both video AND screen sharing, not 18.
- Teams Rooms on Windows: same as Windows clients.
- macOS / Linux Teams clients: client-side DSCP marking is limited or absent. Mark at the network layer (FortiGate shaping policy matching the port ranges).
Teams QoS gotchas
| Symptom / situation | Cause / fix |
|---|---|
| QoS enabled but packets unmarked in capture | Tenant setting not on. Or client build date predates QoS support. Or the port ranges are not locked in the tenant (still using 1024-65535). |
| Markings present LAN-side, absent WAN-side | ISP scrubbed DSCP. Mark again at WAN edge or accept that DSCP only works to the SD-WAN tunnel egress. |
| Audio fine, video and sharing degrade | EF queue is fine but AF41/AF21 are not policed/protected. Build a hierarchical shaper that reserves bandwidth for AF41 too. |
| Mac users complain, Windows fine | Mac client doesn't reliably mark DSCP. Use FortiGate to classify by source port range and mark on the firewall. |
| "Why is my BGP / OSPF flapping under load?" | You marked routing traffic with low DSCP, or you didn't and EF is starving it. Network control traffic should always be CS6. |
| Bandwidth usage estimate | Audio ~100 kbps, video ~1.5 Mbps per stream, screen share ~500 kbps - 4 Mbps. Plan EF queue at audio+headroom, AF41 for sum of expected concurrent video. |
Queueing strategies
| Method | How it works | Trade-off |
|---|---|---|
| FIFO | First in, first out. Single queue. | No QoS at all. Default on simple ports. |
| PQ (Priority Queueing) | Strict priority - high queue served before low. | High can starve everything else. Only safe with policed high. |
| WFQ (Weighted Fair Queueing) | Flows get fair share weighted by IP precedence. | Flow-based, no admin control over which apps win. |
| CBWFQ (Class-Based WFQ) | Admin-defined classes, each with a bandwidth guarantee. | Better but voice can still queue behind data in its class. |
| LLQ (Low Latency Queueing) | CBWFQ + a strict-priority queue for voice/real-time, policed. | Industry standard for voice/video. Police the PQ class so it can't starve others. |
| HQF / H-QoS | Hierarchical: parent shaper sets the ceiling, child classes share within it. | What FortiGate shaping profiles implement. Best for WAN edges. |
Policing vs shaping
- Policing: hard ceiling. Excess is dropped (or re-marked) immediately. No buffering. Low memory, sharp edges, can be brutal on TCP. Use on ingress to protect downstream.
- Shaping: smooth ceiling. Excess is buffered and released at the configured rate. Adds latency but avoids loss. Use on egress to fit the next-hop pipe (ISP CIR, MPLS contract rate).
- FortiGate does shaping, not policing. Conceptually it's a token-bucket egress shaper.
- Rule of thumb: shape down, police up. Shape your egress to the carrier rate. Police user-side ingress to protect your shaper from overflowing.
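The difference between the two is easiest to see when the same token bucket drives both. The sketch below is a simplified model, not FortiOS code: the class and function names are hypothetical, packets are (arrival_time_seconds, size_bits) tuples, and the policer assumes no packet is larger than the burst size.

```python
class TokenBucket:
    """Token bucket: refills at rate_bps, holds at most burst_bits."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = float(burst_bits)  # start with a full bucket
        self.last = 0.0

    def refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last = now


def police(packets, bucket):
    """Policer: forward conforming packets, drop the rest immediately."""
    sent, dropped = [], []
    for arrival, bits in packets:
        bucket.refill(arrival)
        if bucket.tokens >= bits:
            bucket.tokens -= bits
            sent.append((arrival, bits))
        else:
            dropped.append((arrival, bits))
    return sent, dropped


def shape(packets, bucket):
    """Shaper: never drop; delay each packet until enough tokens exist."""
    departures = []
    clock = 0.0
    for arrival, bits in packets:
        clock = max(clock, arrival)  # cannot leave before it arrives
        bucket.refill(clock)
        if bucket.tokens < bits:
            # Wait exactly long enough for the missing tokens to accumulate.
            clock += (bits - bucket.tokens) / bucket.rate_bps
            bucket.last = clock
            bucket.tokens = 0.0  # everything accumulated is spent on this packet
        else:
            bucket.tokens -= bits
        departures.append((clock, bits))
    return departures


# Four 1000-bit packets arriving at once against a 1000 bps / 1000-bit bucket.
arrivals = [(0.0, 1000)] * 4
sent, dropped = police(arrivals, TokenBucket(1000, 1000))
shaped = shape(arrivals, TokenBucket(1000, 1000))
print(len(sent), "sent,", len(dropped), "dropped by the policer")
print("shaper departure times:", [t for t, _ in shaped])
```

The policer passes one packet and drops three; the shaper delivers all four but spreads them one second apart, which is the latency-for-loss trade described above.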
WRED / RED - drop strategies
- Tail drop (default): queue full, drop new arrivals. Causes global TCP synchronisation - everyone halves window at once, queue drains, everyone ramps up together, queue refills, repeat. Sawtooth utilisation.
- RED (Random Early Detection): probabilistic drop as queue depth grows. Smooths TCP behaviour by hitting individual flows at different times.
- WRED (Weighted RED): per-class drop curves. AF13 starts dropping at 30% queue depth, AF11 at 60% - drop precedence in action: the higher drop precedence (AF13) is discarded first. EF rarely uses WRED because it's normally strict-priority and policed instead.
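A WRED drop curve is a small piecewise-linear function. The sketch below uses the classic RED shape; the thresholds and the 10% maximum drop probability are illustrative assumptions, with AF13 (higher drop precedence) given the lower minimum threshold so it is discarded before AF11. Real implementations feed in an averaged queue depth rather than the instantaneous one.

```python
def wred_drop_probability(avg_depth, min_th, max_th, max_p):
    """Classic RED curve: 0 below min_th, linear ramp up to max_p, 1 at max_th."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Hypothetical per-class profiles; depths are fractions of the queue size.
PROFILES = {
    "AF11": dict(min_th=0.60, max_th=0.90, max_p=0.10),
    "AF13": dict(min_th=0.30, max_th=0.90, max_p=0.10),
}

for depth in (0.2, 0.5, 0.8, 0.95):
    probs = {name: wred_drop_probability(depth, **p) for name, p in PROFILES.items()}
    print(depth, probs)
```

At 50% depth AF11 is still untouched while AF13 is already being dropped at a low rate, so the two markings within one class see different loss under the same congestion.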
FortiGate QoS - architecture and quirks
- FortiGate shapes on egress only. To limit download speed for users, apply a shaper on the LAN-side interface (the egress for return traffic) - or use the reverse field on the shaping policy.
- Three shaper types: shared (one bucket per policy or aggregate), per-IP (one bucket per source IP within a policy), interface-based / shaping profile (hierarchical, attached to physical interface).
- Shaping policies are separate from firewall policies, evaluated after the firewall policy matches. Order matters.
- outbandwidth must be set on the interface for any shaping profile to work. Without a ceiling, the shaper has nothing to apportion. This is the single most common QoS misconfig on FortiGate.
- SD-WAN service rules can match on DSCP (tos / tos-mask) and steer based on it, and can re-mark on egress (dscp-forward enable). Useful when CE-marked traffic should prefer MPLS over internet.
- Three priority levels: high, medium, low. Within a priority, guaranteed-bandwidth is honoured first, then maximum-bandwidth caps.
FortiGate shaper + shaping policy (Teams audio example)
config firewall shaper traffic-shaper
    edit "Teams-Audio-EF"
        set maximum-bandwidth 100000
        set per-policy enable
        set priority high
        set diffserv-forward enable
        set diffservcode-forward 101110
        set diffserv-reverse enable
        set diffservcode-rev 101110
    next
    edit "Teams-Video-AF41"
        set maximum-bandwidth 4000000
        set per-policy enable
        set priority medium
        set diffserv-forward enable
        set diffservcode-forward 100010
    next
    edit "Bulk-CS1"
        set guaranteed-bandwidth 1000
        set maximum-bandwidth 1048576
        set priority low
        set diffserv-forward enable
        set diffservcode-forward 001000
    next
end
config firewall shaping-policy
    edit 1
        set name "Teams audio (port 50000-50019)"
        set srcaddr "internal-subnets"
        set dstaddr "all"
        set service "ALL_UDP"
        set srcintf "internal"
        set dstintf "virtual-wan-link"
        set traffic-shaper "Teams-Audio-EF"
        set traffic-shaper-reverse "Teams-Audio-EF"
    next
end
For port-range matching you'll typically build a custom firewall service object with the UDP source port range, then reference it in the shaping-policy service field.
FortiGate hierarchical shaping profile (egress)
config firewall shaping-profile
    edit "wan-uplink"
        set type queuing
        set default-class-id 31
        config shaping-entries
            edit 1
                set class-id 5
                set priority high
                set guaranteed-bandwidth-percentage 30
                set maximum-bandwidth-percentage 100
            next
            edit 2
                set class-id 10
                set priority medium
                set guaranteed-bandwidth-percentage 40
                set maximum-bandwidth-percentage 100
            next
            edit 3
                set class-id 31
                set priority low
                set guaranteed-bandwidth-percentage 10
                set maximum-bandwidth-percentage 100
            next
        end
    next
end
config system interface
    edit "wan1"
        set egress-shaping-profile "wan-uplink"
        set outbandwidth 100000
    next
end
Match traffic into class IDs via shaping policies (set class-id 5) or via firewall policies. Class IDs 2-31 are admin-defined; class 0 is internal, class 1 is reserved.
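To sanity-check a profile before applying it, the percentages can be converted into absolute rates. This is a rough sketch that assumes outbandwidth is expressed in kbps and that the guaranteed percentages must not add up to more than 100; the function name is hypothetical and the class table simply mirrors the profile above.

```python
OUTBANDWIDTH_KBPS = 100000  # mirrors "set outbandwidth 100000"; kbps is assumed

# class-id -> (guaranteed %, maximum %), copied from the shaping profile above
CLASSES = {5: (30, 100), 10: (40, 100), 31: (10, 100)}


def class_rates(outbandwidth_kbps, classes):
    """Return ({class_id: (guaranteed_kbps, max_kbps)}, unreserved_kbps)."""
    total_guaranteed = sum(g for g, _ in classes.values())
    if total_guaranteed > 100:
        raise ValueError(f"guaranteed percentages add up to {total_guaranteed}%")
    rates = {
        cid: (outbandwidth_kbps * g // 100, outbandwidth_kbps * m // 100)
        for cid, (g, m) in classes.items()
    }
    unreserved = outbandwidth_kbps * (100 - total_guaranteed) // 100
    return rates, unreserved


rates, unreserved = class_rates(OUTBANDWIDTH_KBPS, CLASSES)
for cid, (guaranteed, ceiling) in rates.items():
    print(f"class {cid}: guaranteed {guaranteed} kbps, ceiling {ceiling} kbps")
print(f"unreserved: {unreserved} kbps")
```

With the values above, 20% of the link is not reserved for any class and is available to whichever class bursts toward its ceiling.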
FortiGate SD-WAN service rule with DSCP forwarding
config system sdwan
    config service
        edit 1
            set name "Teams-EF-prefer-MPLS"
            set mode priority
            set priority-members 1 2
            set dst "Microsoft-365-services"
            set tos 0xb8
            set tos-mask 0xfc
        next
        edit 2
            set name "Set-EF-on-egress"
            set priority-members 1
            set dscp-forward enable
            set dscp-forward-tag 101110
            set dst "voip-server"
        next
    end
end
The tos / tos-mask pair matches packets already marked. 0xb8 = 10111000 = EF (46) shifted left 2 bits because ToS field is 8 bits wide. Mask 0xfc ignores the bottom 2 ECN bits. dscp-forward-tag sets a new DSCP on egress through the SD-WAN.
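The shift described above is easy to get wrong by hand, so a small helper can generate the values for other codepoints. This is an illustrative sketch; the function names are hypothetical, and the printed lines only reuse the tos, tos-mask and dscp-forward-tag keywords already shown in the config above.

```python
DSCP_MASK = 0xFC  # top six bits of the ToS byte; the low two bits are ECN


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP value into its position in the 8-bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def tos_to_dscp_ecn(tos):
    """Split a ToS byte (for example from a capture) into (dscp, ecn)."""
    return (tos & DSCP_MASK) >> 2, tos & 0x03


for name, dscp in {"EF": 46, "AF41": 34, "AF21": 18, "CS6": 48}.items():
    print(
        f"{name}: set tos {dscp_to_tos(dscp):#04x} / "
        f"set tos-mask {DSCP_MASK:#04x} / dscp-forward-tag {dscp:06b}"
    )
```

For EF this prints tos 0xb8 with tag 101110, matching the service rules above; AF41 comes out as 0x88, AF21 as 0x48 and CS6 as 0xc0.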
FortiGate QoS troubleshooting
diagnose firewall shaper traffic-shaper list
diagnose firewall shaper per-ip-shaper list
diagnose firewall shaper interface-shaper list
diagnose firewall iprope list 100015
diagnose netlink interface list <intf>
get system performance status
diag sniffer packet any 'host x.x.x.x' 4 0 a # 'a' includes timestamp; check TOS byte
- diag firewall shaper traffic-shaper list shows per-shaper counters: current bandwidth, packets dropped, bytes dropped. Drops here mean your shaper is doing its job.
- diag firewall iprope list 100015 shows shaping-policy rules with the shaper attached (forward and reverse). This is where you confirm a policy is actually getting the shaper you expect.
- get system performance status shows CPU. Shaping is CPU-bound on lower-end models, so check before blaming the shaper for slowness.
Wireshark filters for QoS work
| Filter | What it catches |
|---|---|
| ip.dsfield.dscp == 46 | EF packets (Teams audio if marked) |
| ip.dsfield.dscp == 34 | AF41 (Teams video) |
| ip.dsfield.dscp == 18 | AF21 (Teams screen share) |
| ip.dsfield.dscp != 0 | Anything marked (sanity check that markings are surviving the path) |
| ip.dsfield.ecn != 0 | ECN-capable packets, including CE-marked |
| ip.dsfield.ecn == 3 | CE-marked (congestion experienced) - someone's queue is filling |
| vlan.priority == 5 | CoS 5 frames (voice on 802.1Q) |
| vlan.priority != 0 | Any non-default CoS |
| udp.srcport >= 50000 and udp.srcport <= 50019 | Teams audio source port range |
Common gotchas across the board
| Gotcha | What's actually happening |
|---|---|
| FortiGate shaper does nothing | outbandwidth not set on the interface. Or shaping policy not matching - check diag firewall iprope list 100015. |
| Download speed not limited by FortiGate shaper | You shaped the WAN interface egress (uploads). For downloads, shape on the LAN interface egress OR use traffic-shaper-reverse. |
| DSCP wiped at ISP | Normal for internet. Re-mark inside the SD-WAN overlay, or accept end-to-end DSCP only works to the tunnel egress. |
| IPsec strips inner DSCP | Default behaviour: outer ESP packet gets DSCP 0. Configure copy-dscp on the IPsec tunnel (set copy-tos enable on FortiGate vpn ipsec phase1-interface) to preserve. |
| MPLS provider doesn't honour your DSCP | Most map DSCP→EXP at PE ingress, with a contracted mapping. Confirm the mapping table with the carrier. Default is usually first 3 bits of DSCP into 3-bit EXP. |
| Voice quality worse with QoS enabled | EF queue too small or strict-priority not policed - EF traffic exceeds its allocation and tail-drops itself. Or you marked everything EF and now nothing is preferred. |
| SD-WAN performance SLA + QoS not playing nice | SD-WAN steers based on link health (jitter/loss/latency). QoS prioritises within a link. They're complementary, not redundant. SLA decides which member, QoS decides queue order on that member. |
| Teams traffic marked EF but not surviving the tunnel | Enable copy-dscp on the FortiGate IPsec tunnel (set copy-tos enable on phase1-interface). Without it, inner DSCP is lost on the outer ESP header. |
| Switch trust on access port marked everything CS6 | Trust was on, attacker / misconfigured device sent DSCP 48 (CS6 = network control). Either don't trust access ports, or use port-level policers that clamp DSCP to a max value. |
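The MPLS row above mentions the usual default of copying the first 3 bits of DSCP into EXP. A quick sketch shows what that does to the codepoints used in this section; the function name is hypothetical and a carrier's contracted mapping may differ from this default.

```python
def default_exp(dscp):
    """Top three bits of the 6-bit DSCP, i.e. the class selector portion."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3


CODEPOINTS = {"CS6": 48, "EF": 46, "AF41": 34, "AF43": 38, "AF21": 18, "CS1": 8}

for name, dscp in CODEPOINTS.items():
    print(f"{name} (DSCP {dscp}) -> EXP {default_exp(dscp)}")
```

AF41 and AF43 both land on EXP 4, so drop precedence within an AF class does not survive this default mapping. That is one reason to confirm the table with the carrier.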
Reference RFCs
| RFC | Title |
|---|---|
| 2474 | DSCP definition (replaces ToS interpretation) |
| 2475 | DiffServ architecture |
| 2597 | Assured Forwarding (AF) PHB group |
| 3168 | ECN |
| 3246 | Expedited Forwarding (EF) PHB |
| 4594 | DiffServ Service Class configuration guidelines (the "what should I use for X" guide) |
| 5865 | Voice-Admit (VA) codepoint |
| IEEE 802.1Q | PCP, DEI, VID definitions |