Aaqhil · Layer 8

Debug Flow

Generate filtered flow trace blocks.

Manual reference · raw commands

Reset filters and stop tracing

diagnose debug disable
diagnose debug reset
diagnose debug flow trace stop
diagnose debug flow filter clear

Pro tip: Always set filters before enabling debug. Running diagnose debug enable on a busy box without filters will flood the console and may impact CPU.
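
The filter-first habit can be sketched as a small block builder. This is an illustrative Python sketch, not the page's own generator. The function name and defaults are hypothetical, and the filter keywords (saddr, daddr, proto, port) are written as I recall them from the FortiOS CLI, so confirm them with `diagnose debug flow filter ?` on your build.

```python
from typing import List, Optional


def build_flow_trace(saddr: Optional[str] = None,
                     daddr: Optional[str] = None,
                     proto: Optional[int] = None,
                     port: Optional[int] = None,
                     count: int = 100) -> List[str]:
    """Return a filtered debug-flow block; refuse to build one without a filter."""
    if saddr is None and daddr is None and proto is None:
        raise ValueError("set at least one filter before enabling debug")
    if proto is not None and not 1 <= proto <= 254:
        raise ValueError("protocol number must be in 1-254")

    # Start from a clean state so stale filters cannot widen the trace.
    lines = ["diagnose debug reset", "diagnose debug flow filter clear"]
    if saddr is not None:
        lines.append(f"diagnose debug flow filter saddr {saddr}")
    if daddr is not None:
        lines.append(f"diagnose debug flow filter daddr {daddr}")
    if proto is not None:
        lines.append(f"diagnose debug flow filter proto {proto}")
    if port is not None:
        lines.append(f"diagnose debug flow filter port {port}")

    # Filters are in place; only now start the trace and enable output.
    lines += [
        "diagnose debug flow show function-name enable",
        f"diagnose debug flow trace start {count}",
        "diagnose debug enable",
    ]
    return lines


if __name__ == "__main__":
    print("\n".join(build_flow_trace(saddr="10.0.0.5", port=443)))
```

The builder always emits `diagnose debug enable` last, which mirrors the advice above: filters first, output second.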

Packet Sniffer

Built-in tcpdump-style capture. Useful for proving traffic is or isn't reaching the box.

Verbosity reference

FortiOS sniffer

Level	Includes	When to use
`1`	Packet headers	Quick check that traffic is hitting
`2`	Headers + IP payload	Decode protocol data
`3`	Headers + ethernet	L2 / MAC investigation
`4`	Headers + interface name	Most common, see ingress/egress port
`5`	Headers + IP payload + interface	Full L3 trace
`6`	Headers + ethernet + interface	L2/L3 full dump

Append 0 l to the command: 0 is the packet count (0 means capture until you press Ctrl+C) and l prints absolute local timestamps. Use 0 a for absolute UTC timestamps if you need to correlate with logs.
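
As a rough illustration of how the pieces of a sniffer one-liner fit together, the Python sketch below assembles one from the usual inputs. The function name and defaults are hypothetical. The argument order (interface, quoted filter, verbosity, count, timestamp format) follows the standard `diagnose sniffer packet` syntax as I understand it.

```python
def build_sniffer(interface: str, host1: str, host2: str = "",
                  port: int = 0, verbosity: int = 4,
                  count: int = 0, timestamp: str = "l") -> str:
    """Build a 'diagnose sniffer packet' one-liner from simple inputs."""
    if not host1:
        raise ValueError("at least one host is required")
    if not 1 <= verbosity <= 6:
        raise ValueError("verbosity must be 1-6")
    if timestamp not in ("a", "l"):
        raise ValueError("timestamp must be 'a' (UTC) or 'l' (local)")

    # The filter uses tcpdump-style terms joined with 'and'.
    terms = [f"host {host1}"]
    if host2:
        terms.append(f"host {host2}")
    if port:
        terms.append(f"port {port}")
    packet_filter = " and ".join(terms)

    return (f"diagnose sniffer packet {interface} "
            f"'{packet_filter}' {verbosity} {count} {timestamp}")


if __name__ == "__main__":
    print(build_sniffer("any", "10.0.0.5", port=443))
```

Verbosity 4 with interface `any` is used as the default here because the table above calls it the most common choice for seeing the ingress and egress port.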

VPN

IPsec, SSL VPN, and Remote Access. Filtered IKE debugs and status commands.

IPsec status

diag vpn ike gateway list
diag vpn ike gateway list name <name>
diag vpn tunnel list
diag vpn tunnel list name <name>
get vpn ipsec tunnel summary

SSL VPN status

get vpn ssl monitor                 # can be abbreviated: get vpn ssl mon

PPP / PPPoE debug

diag debug application pppoe -1
diag debug application ppp -1
diag debug enable

Concepts · local-id / peer-id / exchange-interface-ip

When multiple dial-up IPsec tunnels terminate on a single public IP, each with its own IKE Phase 1, explicit Local and Peer IDs are required. Without them, the hub firewall cannot tell the incoming connections apart or decide which Phase 1 a tunnel should land on.

Setting these IDs lets the FortiGate match the identity presented during the inbound IKE negotiation to the specific Phase 1 configuration (in IKEv2 the initiator's ID travels in IKE_AUTH rather than IKE_SA_INIT). Standard practice is to configure the Local ID on the remote sites and the corresponding Peer ID on the DC firewall.

If there is only one dial-up Phase 1 on the DC's public IP, you can safely skip setting IKE IDs. The FortiGate only has one configuration to match against anyway, so it knows exactly where to land the tunnel. Just keep in mind that skipping IDs is a bit of a trap if you ever need to add a second dial-up tunnel to that IP later.

When building the BGP overlay, the hub and spoke need to dynamically exchange IPs to route to each other. You can use set exchange-interface-ip enable to just swap the tunnel interface IPs, but the cleaner approach is peering BGP over loopbacks.

To get loopback peering working across a dial-up setup, use set exchange-ip-addr4 <loopback-ip>. This explicitly hands over the loopback address during the IKE exchange so BGP can establish properly.

Example real-world use case: large multi-spoke overlay where each spoke needs its IPsec tunnel interface IP advertised back so the hub can route. Without it, you'd have to statically configure every tunnel IP.

Reference · KB articles

VPN troubleshooting: FD46611
HA reference: FD49264
Yuriskinfo debug cheat sheet: github.com/yuriskinfo/cheat-sheets

Routing & BGP

Route inspection, BGP peering, soft reconfigs.

Routing table inspection

get router info routing-table all
get router info routing-table database
get router info routing-table details <host>
get router info routing-table bgp
get router info routing-table static
get router info routing-table connected
show system interface

BGP inspection

get router info bgp sum
get router info bgp network                # see weight
get router info bgp network <ip>
get router info bgp neighbors <ip> advertised-routes
get router info bgp neighbors <ip> received-routes
get router info bgp neighbors <ip> routes
diagnose ip router bgp all enable
diagnose ip router bgp level info

BGP peer reset

production impact

execute router clear bgp all soft           # soft reconfig all
execute router clear bgp ip 172.16.x.x soft # soft reset specific neighbor
execute router clear bgp ip 172.16.x.x in   # soft inbound only
execute router clear bgp ip 172.16.x.x out  # soft outbound only
execute router clear bgp ip 172.16.x.x      # hard reset specific neighbor
# Inside neighbor:
set shutdown enable        # take peer down
unset shutdown             # bring it back

Soft reset uses the route-refresh capability and doesn't tear down the session. Hard reset (for example execute router clear bgp all) bounces the TCP session and re-establishes it from scratch. Always try soft first.

SD-WAN

Health-check, service mapping, and the FIB interaction model.

Health-check (member SLA)

diagnose sys sdwan health-check
diagnose sys sdwan health-check status
diagnose sys sdwan health-check status <name>

Service rules & members

diagnose sys sdwan service4
diagnose sys sdwan member
diagnose sys sdwan neighbor
diagnose sys sdwan zone

Internet service database (ISDB) lookup

app-aware steering

diagnose internet-service id <id>
diagnose internet-service match root <ip>
diagnose firewall internet-service-app match root <ip>

Concepts · how FIB and SD-WAN rules interact

The FIB acts as the gatekeeper for SD-WAN rules. You cannot apply an SD-WAN rule if the routing table doesn't already agree the traffic belongs in the SD-WAN zone.

Order of evaluation:

1. Regular Policy Routes (PBR): always checked first. If a manual PBR matches with a valid gateway, traffic routes immediately and both SD-WAN rules and the standard table are bypassed.
2. FIB lookup: if no PBR matches, FortiOS does a standard routing-table lookup before applying SD-WAN rules.
   - If the best route points to a non-SD-WAN interface, SD-WAN rules are ignored entirely. This protects LAN-to-LAN and out-of-band management traffic from being hijacked.
   - If the best route points to an SD-WAN member (e.g. default route via the sd-wan zone), the SD-WAN engine unlocks.
3. SD-WAN rules: evaluated top-down. Application signatures and SLA health pick the best physical member.
   - Caveat: the chosen member must also have a valid route to the destination, otherwise FortiGate skips it and evaluates the next.
4. Implicit SD-WAN rule: fallback at the bottom. Hands control back to the FIB and load-balances across members using ECMP.

This is why "I added an SD-WAN rule and nothing happened" almost always traces back to step 2: the routing table doesn't see the SD-WAN zone as the best path for that destination.
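
The order of evaluation can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not FortiOS logic: the data shapes and function names are hypothetical, policy routes and SD-WAN rules match on destination prefix only, and SLA health is reduced to a pre-sorted member list per rule.

```python
import ipaddress
from typing import List, Optional, Set, Tuple

Route = Tuple[str, str]             # (prefix, egress interface)
SdwanRule = Tuple[str, List[str]]   # (destination prefix, members in preference order)


def _covers(prefix: str, dst: str) -> bool:
    return ipaddress.ip_address(dst) in ipaddress.ip_network(prefix)


def best_route(fib: List[Route], dst: str) -> Optional[str]:
    """Longest-prefix match over the FIB; returns the egress interface."""
    matches = [(ipaddress.ip_network(p).prefixlen, intf)
               for p, intf in fib if _covers(p, dst)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


def select_path(dst: str, policy_routes: List[Route], fib: List[Route],
                sdwan_members: Set[str],
                sdwan_rules: List[SdwanRule]) -> Tuple[str, Optional[str]]:
    # Step 1: policy routes win outright.
    for prefix, intf in policy_routes:
        if _covers(prefix, dst):
            return "policy-route", intf

    # Step 2: the FIB decides whether SD-WAN is even consulted.
    intf = best_route(fib, dst)
    if intf is None:
        return "no-route", None
    if intf not in sdwan_members:
        return "fib", intf

    # Step 3: SD-WAN rules top-down; a member needs its own route to dst.
    for prefix, members in sdwan_rules:
        if not _covers(prefix, dst):
            continue
        for member in members:
            if any(i == member and _covers(p, dst) for p, i in fib):
                return "sdwan-rule", member

    # Step 4: implicit rule, control goes back to the FIB.
    return "implicit", intf


if __name__ == "__main__":
    fib = [("0.0.0.0/0", "wan1"), ("0.0.0.0/0", "wan2"), ("10.0.0.0/8", "lan")]
    rules = [("0.0.0.0/0", ["wan2", "wan1"])]
    print(select_path("10.1.1.1", [], fib, {"wan1", "wan2"}, rules))
    print(select_path("8.8.8.8", [], fib, {"wan1", "wan2"}, rules))
```

In the first call the catch-all SD-WAN rule never fires, because the best route for 10.1.1.1 points at a non-member interface. That is the "nothing happened" case described above.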

Firewall

Sessions, NAT, policy lookup, Geo-IP.

Session inspection

# Filter
diagnose sys session filter clear
diagnose sys session filter src <ip>
diagnose sys session filter dst <ip>
diagnose sys session filter dport <port>
diagnose sys session filter proto 6

# List / count / clear matching
diagnose sys session list
diagnose sys session stat
diagnose sys session clear          # with no filter set, this clears ALL sessions

NAT / IP pool

diagnose firewall ippool list
diagnose firewall ippool-all list
diagnose firewall ippool-all stats

DHCP — exclude an IP from lease

config system dhcp server
    edit 4
        config exclude-range
            edit 1
                set start-ip 172.16.11.137
                set end-ip   172.16.11.137
            next
        end
    end
end

Geo-IP block template

template

config firewall address
    edit "China"
        set type geography
        set associated-interface "INTERNET"
        set country "CN"
    next
    edit "Russia"
        set type geography
        set associated-interface "INTERNET"
        set country "RU"
    next
end

config firewall addrgrp
    edit "Geo Blocked Countries"
        set member "Russia" "China"
    next
end

config firewall policy
    edit 76
        set name "Geo Block"
        set srcintf "INTERNET"
        set dstintf "any"
        set srcaddr "Geo Blocked Countries"
        set dstaddr "all"
        set action deny                # explicit; deny is also the default action
        set schedule "always"
        set service "ALL"
        set logtraffic all
        set logtraffic-start enable
    next
end

HA & System

Failover, sync checks, LTE APN, FortiGuard, performance.

HA failover

disruptive

execute ha failover set 1          # force failover
execute ha failover status         # check
execute ha failover unset 1        # revert

HA status & sync

get system ha status
diag sys ha reset-uptime
exec ha manage 1 <admin-user>      # connect to peer
diag sys ha checksum show

Performance & conserve mode

get system performance status
diagnose sys top 1 50
diagnose hardware sysinfo memory
diagnose hardware sysinfo conserve
get system status

FortiGuard & central mgmt

diag fdsm central-management status
diag debug rating
diag test update info
execute update-now
show sys fortiguard
show sys ntp
show sys dns

Get egress public IP

diagnose sys waninfo ipify

ARP / NIC

get system arp
diag ip arp delete <port> <ip>
get hardware nic <port>

LTE / 4G modem

field

config system lte-modem
    set apn "yesbusinessip"
end

# Wait for the profile-changed prompt, answer Y to reboot.
# Allow up to ~5 minutes for the modem to come back.

config system console
    set output standard
end

Built-in traffic test (iperf)

diag traffictest client-intf port1
diag traffictest server-intf port1
diag traffictest port 5201
diag traffictest run -c 203.0.113.10

SSH crypto · disable vs enable strong-crypto

When to disable: connecting to legacy gear (older switches, OOB consoles) that only supports older MAC algorithms. Note this weakens management plane security; only do it if you must.

config system global
    set strong-crypto disable
    set ssh-mac-algo hmac-md5 hmac-sha1 hmac-sha2-256 hmac-sha2-256-etm@openssh.com hmac-sha2-512 hmac-sha2-512-etm@openssh.com
end

Restore strong-crypto:

config system global
    set strong-crypto enable
    set ssh-mac-algo hmac-sha2-256 hmac-sha2-256-etm@openssh.com hmac-sha2-512 hmac-sha2-512-etm@openssh.com
end

Support recovery details

Fortinet AU support: 1 800 043 218

Keep this collapsed in shared screens.

VoIP & SIP

SIP ALG management for 3CX and similar PBX deployments.

SIP debug

diagnose debug disable
diagnose debug reset
diagnose debug application sip -1
diagnose debug enable

SIP ALG session status

diagnose sys sip-proxy calls list
diagnose sys sip-proxy stats list
diagnose sys sip-proxy stats clear
diagnose sys sip status
diagnose sys sip dialog list
diagnose sys sip mapping list

Remove SIP ALG (kernel-helper based — recommended for 3CX)

3CX preset

config system settings
    set default-voip-alg-mode kernel-helper-based
    set sip-expectation disable
    set sip-nat-trace disable
end

config system session-helper
    delete 13
end

# Clear all sessions and restart the PBX server afterwards.

3CX preferred settings

3CX preset

config system settings
    set default-voip-alg-mode kernel-helper-based
    set sip-nat-trace disable
    set gui-voip-profile enable
    set gui-security-profile-group enable
end

config system session-helper
    delete 13
end

Restore SIP ALG to factory defaults

config system settings
    unset default-voip-alg-mode
    unset sip-nat-trace
end

config system session-helper
    edit 13
        set name sip
        set protocol 17
        set port 5060
    next
end

Concepts

Background knowledge and reference material. More entries to come.

VXLAN

What is VXLAN?

VXLAN (Virtual eXtensible LAN) is a way to carry a Layer 2 Ethernet network over a Layer 3 IP network. In simple terms, it lets you take an Ethernet frame from one switch or server, wrap it, send it across a routed network, and unwrap it at the other end. RFC 7348 describes it as a Layer 2 overlay scheme on a Layer 3 network.

How it works

VXLAN encapsulates the original Ethernet frame inside:

Outer Ethernet
Outer IP
Outer UDP
VXLAN header

So VXLAN is basically a Layer 2 overlay over a Layer 3 underlay. The standard VXLAN header is 8 bytes, and the VNI is a 24-bit field inside that header. VXLAN runs over UDP with IANA-assigned destination port 4789.

Why VXLAN is used

Traditional VLANs are limited to 4094 IDs
Large Layer 2 domains do not scale well
It allows Layer 2 extension across routed networks
It provides much larger segmentation using VNI (24 bits, around 16 million segments)
Commonly used in datacentres and EVPN fabrics

VLAN vs VXLAN

VLAN = local Layer 2 segmentation
VXLAN = Layer 2 overlay carried across Layer 3
VLAN ID = 12 bits
VNI = 24 bits

A VLAN and a VNI can be mapped, for example VLAN 20 -> VNI 10020, but they are not the same field.

VXLAN packet structure

A VXLAN packet wraps an original Ethernet frame inside:

Outer Ethernet + Outer IP + Outer UDP + VXLAN header + Inner Ethernet frame

VXLAN header (8 bytes)

Flags: 8 bits
Reserved: 24 bits
VNI: 24 bits
Reserved: 8 bits

The VNI lives inside the VXLAN header, not inside the 802.1Q VLAN tag.

 0                  7 8                 15 16                23 24                31
+--------------------+--------------------+--------------------+--------------------+
|       Flags        |                           Reserved                           |
+--------------------+--------------------+--------------------+--------------------+
|                        VNI (24 bits)                         |      Reserved      |
+--------------------+--------------------+--------------------+--------------------+
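
To make the bit layout concrete, here is a minimal Python sketch that packs and unpacks the 8-byte header. The function names are illustrative. The flag value 0x08 is the "VNI valid" (I) bit defined in RFC 7348.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the I flag; must be set when the VNI is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(10020)
    print(header.hex())        # 0800000000272400
    print(unpack_vni(header))  # 10020
```

VNI 10020 is 0x002724, which lands in bytes 5 to 7 of the header, followed by the final reserved byte.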

Encapsulation overhead

VXLAN over IPv4 = 50 bytes total:

Outer Ethernet = 14 bytes
Outer IPv4 = 20 bytes
Outer UDP = 8 bytes
VXLAN header = 8 bytes

VXLAN over IPv6 = 70 bytes total:

Outer Ethernet = 14 bytes
Outer IPv6 = 40 bytes
Outer UDP = 8 bytes
VXLAN header = 8 bytes
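
The arithmetic, plus the underlay MTU it implies, can be checked with a short Python sketch. The function names are illustrative, and the MTU helper assumes an untagged inner frame. For IP MTU sizing, the 14 bytes that count are the inner Ethernet header carried as payload, because the outer Ethernet header is not part of the underlay IP MTU.

```python
OUTER_ETHERNET = 14
INNER_ETHERNET = 14   # assumption: inner frame carries no 802.1Q tag
OUTER_IPV4 = 20
OUTER_IPV6 = 40
OUTER_UDP = 8
VXLAN_HEADER = 8


def vxlan_overhead(ipv6: bool = False) -> int:
    """Bytes added in front of the original Ethernet frame."""
    outer_ip = OUTER_IPV6 if ipv6 else OUTER_IPV4
    return OUTER_ETHERNET + outer_ip + OUTER_UDP + VXLAN_HEADER


def required_underlay_ip_mtu(inner_ip_mtu: int = 1500, ipv6: bool = False) -> int:
    """Underlay IP MTU needed to carry an inner IP packet without fragmentation."""
    outer_ip = OUTER_IPV6 if ipv6 else OUTER_IPV4
    return inner_ip_mtu + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + outer_ip


if __name__ == "__main__":
    print(vxlan_overhead())                     # 50
    print(vxlan_overhead(ipv6=True))            # 70
    print(required_underlay_ip_mtu())           # 1550
    print(required_underlay_ip_mtu(ipv6=True))  # 1570
```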

Why UDP and not TCP?

Lightweight
Avoids TCP overhead and session handling
Allows ECMP and load-balancing using UDP source-port entropy

Standard VXLAN uses UDP destination port 4789.
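
Source-port entropy can be sketched as follows. The hash choice (CRC32) and the function name are assumptions for illustration only, since real VTEPs use their own hardware hash. The 49152-65535 range is the dynamic port range that RFC 7348 recommends for the outer source port.

```python
import zlib

VXLAN_DST_PORT = 4789
PORT_BASE = 49152
PORT_SPAN = 65535 - PORT_BASE + 1  # 16384 ports


def vxlan_source_port(src_ip: str, dst_ip: str, proto: int,
                      src_port: int, dst_port: int) -> int:
    """Map an inner flow to a stable outer UDP source port."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # Same inner flow -> same outer port, so packets of one flow stay in order.
    return PORT_BASE + zlib.crc32(key) % PORT_SPAN


if __name__ == "__main__":
    a = vxlan_source_port("10.0.0.5", "10.0.1.9", 6, 51000, 443)
    b = vxlan_source_port("10.0.0.5", "10.0.1.9", 6, 51001, 443)
    print(a, b, VXLAN_DST_PORT)
```

Because the outer destination port is always 4789, the outer source port is the only part of the outer 5-tuple that varies per inner flow, and that is what underlay ECMP hashes on.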

Mental model

802.1Q VLAN tag = sticker on the Ethernet frame
VXLAN header = wrapper around the whole frame
VNI = label on that wrapper

One-line summary

VXLAN is a Layer 2 overlay over a Layer 3 IP network that encapsulates Ethernet frames using UDP and identifies segments using a 24-bit VNI.

FortiOS Packet Flow Architecture

1. Architecture Overview

To maintain deterministic performance and low latency across enterprise deployments, FortiOS utilizes an architecture called Parallel Path Processing (PPP). When a packet arrives at an interface, FortiGate determines whether the traffic matches an active session in the stateful session table or requires a new session evaluation through the kernel.

[ Ingress Packet Flow ] -> [ Kernel Processing Layer ] -> [ UTM/NGFW Inspection ] -> [ Egress Packet Flow ]

Processing splits into two primary paths:

The Slow Path (Kernel Session Creation): The first packet of a new flow traverses the complete ingress, routing, and security policy stack to instantiate an entry in the session table.
The Fast Path (ASIC Acceleration): Subsequent packets matching an established session bypass the core routing and policy evaluation engines, offloading directly to Network Processors (NP6/NP7) or Content Processors (CP9/CP10).
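
The slow-path / fast-path split can be pictured with a toy session table keyed on the 5-tuple. This Python sketch is a deliberately simplified model with hypothetical names. In a real FortiGate, whether an established session is actually offloaded depends on the hardware and the policy, which the sketch ignores.

```python
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int

    def reply(self) -> "FiveTuple":
        """The same flow as seen in the return direction."""
        return FiveTuple(self.dst_ip, self.src_ip,
                         self.dst_port, self.src_port, self.proto)


class SessionTable:
    def __init__(self) -> None:
        self._sessions: Dict[FiveTuple, int] = {}
        self._next_id = 1

    def classify(self, pkt: FiveTuple) -> str:
        """Return 'fast' for an established session, 'slow' for a new flow."""
        if pkt in self._sessions:
            return "fast"
        # Slow path: routing and policy evaluation would happen here, then
        # the session is installed for both directions of the flow.
        session_id = self._next_id
        self._next_id += 1
        self._sessions[pkt] = session_id
        self._sessions[pkt.reply()] = session_id
        return "slow"


if __name__ == "__main__":
    table = SessionTable()
    syn = FiveTuple("10.0.0.5", "203.0.113.10", 51000, 443, 6)
    print(table.classify(syn))          # slow
    print(table.classify(syn.reply()))  # fast
```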

Phase 1: Ingress Packet Flow (Physical to Link Layer)

The packet enters the physical interface transceiver and is placed into the Rx FIFO queue for initial hardware validation checks.

Network Interface & Driver Layer: The network interface card (NIC) or NP processor validates physical layer integrity. Packets with invalid checksums, malformed structures, or protocol header length mismatches (TCP, UDP, ICMP, SCTP, or GRE) are dropped immediately.
Stateful Session Lookup: The FortiGate checks the packet's 5-tuple (Source IP, Destination IP, Source Port, Destination Port, Protocol) against the master session table. If a match is found, the packet transitions directly to UTM/NGFW Inspection or Egress processing depending on acceleration status. If no match is found, the packet is flagged as a new flow.
DoS Policy Inspection: Before consuming CPU cycles in the kernel, traffic is evaluated against configured IPv4 or IPv6 DoS policies. Volumetric checks (such as SYN floods, UDP floods, or ICMP sweeps) are enforced here at the hardware or driver level.
IPsec VPN Decryption: If the incoming packet matches a configured IPsec tunnel, the IPsec engine decrypts it (accelerated by CP9/CP10). The unencrypted inner packet is then re-injected into the pipeline.
Admission Control: FortiOS verifies that the packet source or destination does not match the system quarantine list and evaluates captive portal authentication criteria if enforced.

Phase 2: Kernel Processing Layer (Routing & The Gatekeeper Architecture)

For a new session, FortiOS must determine the egress path before evaluating firewall privileges. The Forwarding Information Base (FIB) and SD-WAN rules are deeply intertwined, with the FIB acting as the ultimate gatekeeper.

Order of evaluation:

Destination NAT (DNAT / Virtual IP Lookup): FortiOS evaluates Virtual IPs (VIPs) early in the kernel. If the destination IP matches a VIP configuration, the packet destination is rewritten. Subsequent routing and firewall policy lookups depend entirely on this post-NAT destination IP.
Regular Policy Routes (PBR): PBR entries are checked before any other routing decision. If a manually created policy route matches the traffic and has a valid, reachable gateway, FortiGate routes the traffic immediately. When a PBR match occurs, both the SD-WAN engine and the standard routing table (FIB) are completely bypassed.
The FIB Lookup (The Gatekeeper): If no PBR rule matches, the FortiGate executes a standard Routing Table (FIB) lookup for the destination IP before any SD-WAN logic is considered.
- If the best route points to a NON-SD-WAN interface, the FortiGate completely ignores all SD-WAN rules and routes the traffic directly out of that specific physical or logical interface. This protects internal LAN-to-LAN, inter-VLAN, or out-of-band management traffic from being accidentally hijacked by SD-WAN configurations.
- If the best route points to an SD-WAN member (such as a default 0.0.0.0/0 route pointing to the logical SD-WAN zone or a member interface), only then does the firewall unlock and evaluate the SD-WAN rule base.
SD-WAN Rules Evaluation: Once unlocked, the engine evaluates your custom SD-WAN service rules sequentially from top to bottom. It reviews application signatures, user groups, and live performance metrics (latency, jitter, packet loss via Performance SLA probes) to select the optimal physical member interface.
- Crucial Caveat: The specific SD-WAN member selected by a rule must also possess a valid route to the destination in the routing table. If a valid route for that specific member does not exist, the firewall skips that member and evaluates the next one in the rule strategy.
The Implicit SD-WAN Rule (FIB Fallback): If the traffic hits the SD-WAN engine but fails to match any user-defined, custom SD-WAN rules, it falls through to the "Implicit Rule" at the bottom of the stack. This rule hands control back to the FIB, load-balancing traffic across the available SD-WAN member interfaces using standard Equal-Cost Multi-Path (ECMP) routing based on your configured implicit algorithm.

Reverse Path Forwarding (RPF) Check: To prevent spoofing and asymmetric routing complications, FortiGate cross-references the source IP against the routing table. Strict RPF drops the packet unless the best route back to the source IP points out of the exact interface the packet arrived on. Feasible-path (loose) RPF, the FortiOS default, accepts the packet as long as an active route to the source IP exists through the ingress interface, even if it is not the best route.

Phase 3: Firewall Policy & Session Management

With the ingress interface, egress interface, and post-NAT IP attributes established, the firewall determines access privileges.

Firewall Policy Matching (iprope table lookup): The kernel scans the firewall policy list sequentially inside the internal security policy table (known as the iprope table). It evaluates match criteria based on source/destination interfaces, source/destination IPs (or Internet Service Database / ISDB objects), service ports, and user schedules.
Implicit Deny: If the packet fails to match any user-defined policy in the iprope table, it hits Policy 0 (the implicit deny rule) and is dropped.
Session Helpers / Application Layer Gateways (ALGs): For complex protocols that embed IP/port information within their payloads (such as SIP, FTP, TFTP, or H.323), FortiOS applies built-in session helpers to open dynamic pinholes for secondary data channels.
Session Instantiation: The kernel allocates an entry in the master session table, transitions the state from dirty to validated, and writes the routing, NAT, and security profile application tags into the session structure.

Phase 4: Security Profile Processing (UTM/NGFW Engine)

If the matching firewall policy contains security profiles, the packet is directed to the inspection engines. FortiOS executes this using two distinct architectural models based on policy configuration.

Feature / Step	Flow-Based Inspection Mode	Proxy-Based Inspection Mode
Official Engine Core	IPS Engine (IPS Decoders & IPSA Engine)	WAD Daemon (Worker Application Daemon)
Memory Footprint	Low (packets processed stream-style on the fly)	High (buffers connections, acts as termination point)
Connection Handling	Original TCP handshake passes through to destination.	Connection is split into two independent segments (Client-to-FGT and FGT-to-Server).
Inspection Mechanism	Pattern matching occurs inside packet streams as they transit.	Files/payloads are fully reassembled in memory before inspection.

Security Engine Sequence (Flow Mode):

SSL/TLS Decryption: If configured, the built-in CP9/CP10 processor intercepts the TLS handshake for Certificate or Deep Inspection.
IPS Engine Decoders: The IPS engine applies specific decoders to identify the exact application protocol and format the stream data.
Parallel Inspection Pass: IPS signatures, Application Control, Local URL Filtering, and Botnet checking happen simultaneously in a single pass accelerated via Content Processors.
Flow-Based AntiVirus: The AV engine loaded by the IPS process performs stream-based scanning against known malware hashes without waiting for the complete file to download.

Phase 5: Egress Packet Flow (Data Link to Physical Layer)

Once a packet is approved by the routing, policy, and security inspection engines, it enters the final phase before physical transmission.

Source NAT (SNAT) Translation: The packet headers are modified at this post-processing stage. The source IP is rewritten to the configured IP pool or egress interface IP, and the source port is translated inside the NAT session tracking range.
Forwarding & Traffic Shaping (QoS): The packet is evaluated against Traffic Shaping policies. If interface bandwidth limits are breached, packets are queued, delayed, or dropped based on priority configurations (such as Strict Priority or Guaranteed Bandwidth allocations).
IPsec VPN Encapsulation: If the routing table dictates that the packet exit via an IPsec tunnel, the payload is encrypted (AES/3DES), an ESP header is prepended, and the packet is re-routed through the physical path toward the VPN gateway endpoint.
Layer 2 Frame Assembly (ARP Lookup): The FortiGate updates the Layer 2 headers. It queries its ARP table for the MAC address of the next-hop gateway. The destination MAC is updated with the next-hop hardware address, and the source MAC is updated with the address of the egress physical interface.
Tx Queue and Transmission: The finalized frame is loaded into the interface Tx FIFO queue, converted into electrical or optical signals by the transceiver, and transmitted onto the wire.

Flow Engineering & Diagnostics Field Reference

Physical Layer & NIC Status Check

FortiOS hardware nic

get hardware nic <port>

Real-Time Kernel Packet Flow Trace

kernel trace

diagnose debug reset
diagnose debug flow filter saddr <ip>
diagnose debug flow filter dport <port>
diagnose debug flow show function-name enable
diagnose debug flow trace start 100
diagnose debug enable

Bypass FastPath (Force Kernel Processing for Debugging)

diagnostic override

config system npu
    set fastpath disable
end

Visual Flow Reference

Slow path on first packet, fast path on every packet after.

[Flow diagram not reproduced. Legend: security / policy phase, kernel routing phase, fast path (ASIC), side branch outcome, drop.]

802.1X · PEAP-MSCHAPv2

What it is

A password-based EAP method. The RADIUS server presents a certificate so the client can validate it. Once a TLS tunnel is built, the client sends its username and password inside the tunnel using MSCHAPv2. The RADIUS server validates the password against AD.

Server proves itself with a cert. Client proves itself with a password. One-way cert auth on the outside, password auth inside the tunnel.

The three players

Role	Component	Responsibility
Supplicant	Client device (laptop, phone)	Sends credentials, validates server cert
Authenticator	NAS (AP, WLC, switch)	Relays EAP between supplicant and RADIUS. Does not participate in the auth itself
Authentication Server	RADIUS (NPS, ClearPass, FreeRADIUS, RADIUSaaS)	Validates the credentials, makes the policy decision

EAP runs end-to-end between supplicant and RADIUS. The NAS is a postman. It receives EAP in EAPOL frames on the wireless side, repackages it inside RADIUS attributes on the wired side, and forwards it. It cannot read the TLS-protected inner payload.

What each side needs

Side	Requirement
RADIUS server	Server certificate trusted by clients
Client	Username, password, and trust of the RADIUS server's CA
Backend	AD account with valid password
PKI	Server cert only. No client certs needed

End-to-end auth flow

User connects to SSID, supplicant prompted for credentials
AP wraps EAP-Identity in Access-Request (code 1) to RADIUS on UDP 1812
RADIUS replies with Access-Challenge (code 11) to start the PEAP TLS handshake
Multiple Request and Challenge round trips negotiate the TLS tunnel. Server presents its certificate, supplicant validates it, both derive session keys
Supplicant sends MSCHAPv2 challenge/response inside the encrypted TLS tunnel
RADIUS validates the MSCHAPv2 response against AD via the Netlogon Secure Channel to a domain controller
RADIUS queries AD via LDAP for group memberships
RADIUS walks Network Policies top to bottom, first match wins
Matched policy injects tunnel attributes into reply
RADIUS sends Access-Accept (code 2) with Tunnel-Type, Tunnel-Medium-Type, Tunnel-Assignment-Id, MS-MPPE keys
AP reads Tunnel-Assignment-Id, matches its VLAN Assignment Rule, bridges client into assigned VLAN
AP uses MS-MPPE keys to derive PMK for WPA2 4-way handshake
Client gets DHCP from the assigned VLAN's scope
AP sends Accounting-Request Start (code 4) on UDP 1813
RADIUS replies Accounting-Response (code 5)
Periodic Interim-Updates throughout session, Stop on disconnect

Critical truth about the request packet

The Access-Request does not contain a VLAN or tunnel ID. It carries only:

User-Name
EAP-Message (outer plaintext, inner encrypted once TLS is up)
NAS-IP-Address, NAS-Identifier
Called-Station-Id (AP BSSID + SSID)
Calling-Station-Id (client MAC)
NAS-Port-Type (Wireless-802.11)

The tunnel attributes only appear on the return leg (Access-Accept), generated by RADIUS based on policy match.

RADIUS message codes - Authentication (UDP 1812)

Code	Message	Direction	Purpose
1	Access-Request	NAS to RADIUS	Here are credentials, please authenticate
2	Access-Accept	RADIUS to NAS	Yes, authenticated, here are the attributes
3	Access-Reject	RADIUS to NAS	No, denied
11	Access-Challenge	RADIUS to NAS	I need more info, send next EAP message

RADIUS message codes - Accounting (UDP 1813)

Code	Message	Direction	Purpose
4	Accounting-Request	NAS to RADIUS	Session started, updated, or ended. Log it
5	Accounting-Response	RADIUS to NAS	Logged, acknowledged

Acct-Status-Type values

Value	Meaning
1	Start - session began
2	Stop - session ended
3	Interim-Update - periodic heartbeat
7	Accounting-On - NAS booted
8	Accounting-Off - NAS shutting down

Key RADIUS attributes for VLAN assignment

Attribute	Number	Type	Value
Tunnel-Type	64	Integer	13 (VLAN)
Tunnel-Medium-Type	65	Integer	6 (IEEE-802)
Tunnel-Private-Group-ID	81	String	VLAN ID, e.g. `520`
Tunnel-Assignment-Id	82	String	Freeform tag, e.g. `520` or `staff-corp`

Aruba Instant typically uses attribute 82. Cisco WLC typically uses attribute 81. Both achieve the same outcome.

RADIUS packet structure

+--------+--------+----------------+--------------------+----------+
| Code   | ID     | Length         | Authenticator      | Attribs  |
| 1 byte | 1 byte | 2 bytes        | 16 bytes           | variable |
+--------+--------+----------------+--------------------+----------+

ID pairs requests with replies. The Authenticator field plus the Message-Authenticator attribute (80, HMAC-MD5 over the packet using the shared secret) provide integrity. Message-Authenticator is mandatory when EAP is in use.
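
As a hedged illustration of the layout and the integrity check, the Python sketch below parses the 20-byte header and recomputes Message-Authenticator for an Access-Request. The function names are hypothetical. The calculation follows RFC 3579 as I understand it: HMAC-MD5 over the whole packet, keyed with the shared secret, with the attribute's 16-byte value set to zero. Replies use the Request Authenticator from the original request instead, which this sketch does not cover.

```python
import hashlib
import hmac
import struct
from typing import Tuple

HEADER_LEN = 20
MESSAGE_AUTHENTICATOR = 80


def parse_header(packet: bytes) -> Tuple[int, int, int, bytes]:
    """Return (code, identifier, length, authenticator)."""
    if len(packet) < HEADER_LEN:
        raise ValueError("RADIUS packet shorter than the 20-byte header")
    code, ident, length = struct.unpack("!BBH", packet[:4])
    return code, ident, length, packet[4:HEADER_LEN]


def request_message_authenticator(packet: bytes, secret: bytes) -> bytes:
    """Recompute attribute 80 for an Access-Request."""
    buf = bytearray(packet)
    offset = HEADER_LEN
    # Attributes are type (1 byte), length (1 byte, includes these 2), value.
    while offset + 2 <= len(buf):
        attr_type, attr_len = buf[offset], buf[offset + 1]
        if attr_len < 2 or offset + attr_len > len(buf):
            raise ValueError("malformed attribute")
        if attr_type == MESSAGE_AUTHENTICATOR:
            if attr_len != 18:
                raise ValueError("Message-Authenticator must be 18 bytes long")
            # Zero the 16-byte value before hashing.
            buf[offset + 2:offset + 18] = bytes(16)
            return hmac.new(secret, bytes(buf), hashlib.md5).digest()
        offset += attr_len
    raise ValueError("no Message-Authenticator attribute present")


if __name__ == "__main__":
    attrs = bytes([1, 7]) + b"alice" + bytes([80, 18]) + bytes(16)
    pkt = struct.pack("!BBH", 1, 7, HEADER_LEN + len(attrs)) + bytes(16) + attrs
    print(parse_header(pkt)[:3])
    print(request_message_authenticator(pkt, b"testing123").hex())
```

A receiver would compare the recomputed value against the one in the packet using `hmac.compare_digest` and silently drop the packet on mismatch.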

Onboarding workflow for new user category

One-time setup (network team):

Create RADIUS Network Policy with condition: User Groups contains X
Set RADIUS attributes: Tunnel-Type=VLAN, Tunnel-Medium-Type=802, Tunnel-Assignment-Id=N
Create AP VLAN Assignment Rule: if Tunnel-Assignment-Id equals N then bridge VLAN N
Ensure VLAN N exists on switch, has DHCP scope, has SVI / gateway

Per user (systems team):

Create user in AD
Add to the appropriate AD group

AD touches identity only. RADIUS holds the policy. AP holds the rule.
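
That split of responsibilities can be sketched in Python. Everything named here (group names, policy names, the default VLAN, the fallback behaviour when no AP rule matches) is hypothetical. Only the attribute names and the values 13 and 6 come from the attribute table above.

```python
from typing import Dict, List, Optional, Set

TUNNEL_TYPE_VLAN = 13
TUNNEL_MEDIUM_IEEE_802 = 6

# RADIUS side: ordered policies, first match wins (hypothetical examples).
NETWORK_POLICIES: List[Dict[str, str]] = [
    {"name": "Staff WiFi", "group": "WiFi-Staff", "assignment_id": "520"},
    {"name": "Guest WiFi", "group": "WiFi-Guest", "assignment_id": "530"},
]

# AP side: Tunnel-Assignment-Id value -> bridged VLAN (hypothetical examples).
AP_VLAN_RULES: Dict[str, int] = {"520": 520, "530": 530}


def radius_decision(user_groups: Set[str],
                    policies: List[Dict[str, str]]) -> Optional[Dict[str, object]]:
    """Return Access-Accept attributes, or None to model an Access-Reject."""
    for policy in policies:
        if policy["group"] in user_groups:
            return {
                "Tunnel-Type": TUNNEL_TYPE_VLAN,
                "Tunnel-Medium-Type": TUNNEL_MEDIUM_IEEE_802,
                "Tunnel-Assignment-Id": policy["assignment_id"],
            }
    return None


def ap_vlan(accept: Dict[str, object], rules: Dict[str, int],
            default_vlan: int) -> int:
    """Apply the AP's VLAN rule; fall back to a default VLAN (assumption)."""
    return rules.get(str(accept["Tunnel-Assignment-Id"]), default_vlan)


if __name__ == "__main__":
    accept = radius_decision({"WiFi-Guest", "Domain Users"}, NETWORK_POLICIES)
    print(accept)
    if accept is not None:
        print(ap_vlan(accept, AP_VLAN_RULES, default_vlan=1))
```

A user in both groups lands in VLAN 520 because the staff policy is evaluated first, which is why policy order matters as much as group membership.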

Strengths

No client-side PKI required, just username and password
Easy onboarding, especially for BYOD where pushing a client cert is hard
Works with existing AD passwords, no parallel credential lifecycle
Cheap and fast to deploy at scale

Weaknesses

Rogue RADIUS attack surface. If the supplicant is misconfigured (no CA trust enforcement, no server name validation), an attacker can stand up a fake RADIUS server, capture the MSCHAPv2 challenge/response, and crack it offline to recover the password
Passwords are crackable offline. MSCHAPv2 is a known-weak hash protocol. Capture once, brute force forever
User-typed credentials. Subject to phishing, shoulder surfing, weak passwords, reuse across services
Password expiry causes mass WiFi failures. Every password rotation cycle generates support tickets
Server cert validation is the only thing keeping it secure. If clients are not enforcing it (and many supplicants default to permissive), the whole method falls apart

Common NPS gotcha

On the policy Overview tab, the "Access permission" radio must be set to "Grant access". You can have perfect Conditions, Constraints, and Settings and the policy will still deny if this is wrong. Catches everyone exactly once.

Debug cheat sheet

Symptom	Likely cause	Where to look
No reply from RADIUS	UDP 1812 blocked or wrong shared secret	Firewall, RADIUS client config
Access-Reject with no reason	Wrong shared secret or AP not registered as RADIUS client	NPS Event Viewer event 6273
Many Challenges then Reject	Cert validation failure	RADIUS cert trust chain, client trust store
Auth succeeds, wrong VLAN	Wrong Tunnel-Assignment-Id value or no AP rule match	RADIUS policy attributes, AP rule list
Session drops at fixed interval	Session-Timeout reauth failing	Look for reauth attempt in capture
Auth works, no accounting	UDP 1813 blocked or NAS not configured for accounting	Firewall, accounting client config

Capture and log sources

Wireshark on RADIUS NIC, filter radius. Cleanest view of all RADIUS traffic
NPS Event Viewer. Custom Views > Server Roles > Network Policy and Access Services. Event 6272 = granted, 6273 = denied. Shows matched policy name and reason
NPS log files in %SystemRoot%\System32\LogFiles\ in IAS format
SPAN or mirror on AP uplink if RADIUS access not available

Combine packet capture (what flew on the wire) with RADIUS event log (which policy was picked and why) for full diagnosis.

Reference values

Item	Value
RADIUS auth port	UDP 1812 (old: 1645)
RADIUS accounting port	UDP 1813 (old: 1646)
CoA port	UDP 3799
Tunnel-Type for VLAN	13
Tunnel-Medium-Type for IEEE-802	6
NAS-Port-Type for wireless	19
RFC for RADIUS	2865 (auth), 2866 (accounting)
RFC for tunnel attributes	2868
RFC for CoA	5176
RFC for EAP	3748

802.1X · EAP-TLS

What it is

A certificate-based EAP method. Both sides present certificates. The RADIUS server validates the client's certificate. The client validates the RADIUS server's certificate. The TLS handshake itself is the authentication. No password is ever exchanged.

Server proves itself with a cert. Client proves itself with a cert. Mutual cert authentication.

The three players

Same as PEAP-MSCHAPv2. Supplicant, authenticator, authentication server. EAP runs end-to-end between supplicant and RADIUS. The NAS is a relay.

What each side needs

Side	Requirement
RADIUS server	Server certificate trusted by clients, plus the CA chain to validate client certs
Client	Client certificate, private key, and trust of the RADIUS server's CA
Backend	AD account or device object the cert can be mapped to
PKI	Full PKI: issuing CA, cert enrolment automation, revocation mechanism (CRL or OCSP)

End-to-end auth flow

Client connects to SSID
AP wraps EAP-Identity in Access-Request (code 1) to RADIUS on UDP 1812
RADIUS replies with Access-Challenge (code 11) containing EAP-TLS Start
Client sends TLS ClientHello inside Access-Request
RADIUS replies with ServerHello, server Certificate, CertificateRequest, ServerHelloDone
Client validates server cert against its trusted CA store
Client sends its own Certificate, ClientKeyExchange, CertificateVerify (signs the handshake with its private key, proving possession), ChangeCipherSpec, Finished
RADIUS validates the client certificate: chains to a trusted CA, not expired, not revoked (CRL or OCSP check), CertificateVerify signature valid (proves client holds the private key)
RADIUS maps the certificate to an identity (cert-to-account mapping, usually via UPN or DNS name in the Subject Alternative Name)
RADIUS queries AD via LDAP for the mapped account's group memberships
RADIUS walks Network Policies top to bottom, first match wins
Matched policy injects tunnel attributes into reply
RADIUS sends Access-Accept (code 2) with Tunnel-Type, Tunnel-Medium-Type, Tunnel-Assignment-Id, MS-MPPE keys
AP reads Tunnel-Assignment-Id, matches its VLAN Assignment Rule, bridges client into assigned VLAN
AP uses MS-MPPE keys to derive PMK for WPA2 4-way handshake
Client gets DHCP from the assigned VLAN's scope
AP sends Accounting-Request Start (code 4) on UDP 1813
Periodic Interim-Updates, Stop on disconnect
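
The code numbers quoted in the flow above (1, 11, 2, 4) sit in the first byte of every RADIUS packet. A minimal Python sketch, assuming the fixed 20-byte header layout from RFC 2865 (code, identifier, length, 16-byte authenticator). The helper name `parse_radius_header` and the sample bytes are illustrative, not taken from any RADIUS product:

```python
import struct

# RADIUS packet codes defined in RFC 2865 / RFC 2866.
RADIUS_CODES = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    4: "Accounting-Request",
    5: "Accounting-Response",
    11: "Access-Challenge",
}

def parse_radius_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte RADIUS header (hypothetical helper)."""
    if len(packet) < 20:
        raise ValueError("RADIUS header is 20 bytes minimum")
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    return {
        "code": code,
        "name": RADIUS_CODES.get(code, "Unknown"),
        "identifier": identifier,
        "length": length,                # whole packet, header + attributes
        "authenticator": packet[4:20],
    }

# Made-up sample: a bare Access-Challenge header with no attributes.
sample = struct.pack("!BBH", 11, 7, 20) + bytes(16)
print(parse_radius_header(sample)["name"])
```

Handy when a capture tool is not available and all you have is a hex dump of the UDP payload.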

What is different from PEAP-MSCHAPv2

Stage	PEAP-MSCHAPv2	EAP-TLS
TLS tunnel built	Yes, to protect inner method	Yes, and it IS the auth
Server proves itself	Server cert	Server cert
Client proves itself	Password inside the tunnel	Client cert + CertificateVerify signature
Inner method	MSCHAPv2	None, the handshake is the auth
Credential on the wire	MSCHAPv2 challenge/response (derived from the password hash), inside the tunnel	Nothing secret transmitted
Backend validation	AD checks password	RADIUS validates cert, maps to AD account
Group to VLAN mapping	RADIUS policy, identical	RADIUS policy, identical
MS-MPPE keys delivered	Yes	Yes

The auth half changes, the policy half does not. Tunnel attributes, VLAN assignment, MS-MPPE key delivery, accounting, all identical.

Device certs vs user certs

This is the design decision that bites schools and corporate fleets. The cert determines what the network sees.

Cert type	Lives in	What gets authenticated	When auth happens
Device (machine) cert	Local Machine cert store	The laptop	At boot, before any user logs in
User cert	Current User cert store	The logged-in user	When the user logs in

Implications for VLAN assignment:

Device cert: VLAN follows the laptop. A staff laptop stays on the staff VLAN regardless of who logs in. If a teacher hands the laptop to a student, the student lands on the staff VLAN. This is by design, not a bug
User cert: VLAN follows the user. When a different user logs in, the supplicant re-authenticates with the new user's cert and lands on a different VLAN

User certs solve the lending scenario but introduce operational cost: every user needs a cert on every device they use, provisioning is per-user-per-device, and the chicken-and-egg of "first login on a new device needs cert enrolment but the network needs the cert" usually requires a wired bridge or a staged onboarding flow.

The common compromise for managed-fleet environments: device certs for the standard fleet, plus a documented policy that loaner devices come from the appropriate VLAN's pool, not by lending across user categories.

Cert-to-account mapping

When EAP-TLS succeeds, the RADIUS server has a validated certificate but no AD account yet. Mapping the cert to an account is what lets policy evaluation proceed.

Method	How it works	Notes
UPN in SAN	Cert's Subject Alternative Name contains a UPN (user@domain.com), RADIUS looks up the AD user with that UPN	Most common for user certs
DNS name in SAN	Cert's SAN contains the device DNS name, RADIUS looks up the AD computer object	Most common for device certs
Explicit mapping	AD account has an `altSecurityIdentities` attribute with the cert's issuer and serial	Used for high-assurance environments
Implicit mapping	RADIUS derives the account from cert fields without explicit linkage	Default for many setups

Cert-to-account mapping is where many EAP-TLS deployments quietly fail. Symptoms: cert validates fine, but RADIUS rejects because it cannot find a matching account. Look at NPS Event 6273 reason codes.

Strengths

No password on the wire, ever. Removes phishing, password spray, MSCHAPv2 cracking, password expiry headaches
Mutual authentication enforced by the protocol. Cannot be silently weakened by client misconfiguration the way PEAP can
Private key never leaves the device. Often hardware-backed (TPM), so even stealing the device does not yield extractable credentials
Cleanest path for zero-trust and posture-based access

Weaknesses

PKI is real infrastructure. Issuing CA, enrolment automation, renewal lifecycle, revocation. None of it is free
Onboarding friction. Every device needs a cert before it can connect. BYOD and unmanaged devices are painful
Cert expiry causes silent dropouts. If renewal automation breaks, devices fall off the network with no obvious cause
Non-domain devices need a separate path. Printers, IoT, AV gear usually fall back to MAB or a PSK SSID
Troubleshooting is harder. Debugging PKI chains, revocation reachability, SAN mismatches, clock skew, cert-to-account mapping

Common deployment patterns

Environment	Typical EAP method
Managed Intune fleet (corporate / school staff and students)	EAP-TLS with device certs via SCEP or PKCS profiles
BYOD / personal devices	PEAP-MSCHAPv2 or a separate PSK SSID with portal onboarding
Printers, IoT, AV	MAB (MAC Authentication Bypass) into restricted VLAN
Guest WiFi	PSK or captive portal, no 802.1X

Pure-EAP-TLS-everywhere designs are rare outside high-security environments. Most real deployments mix EAP-TLS for managed devices with PEAP or MAB for everything else.

SCEPman and RADIUSaaS specifics

When using cloud-based RADIUS like RADIUSaaS with SCEPman issuing certs via Intune:

SCEPman acts as the issuing CA, certs are deployed to devices via Intune SCEP profiles
RADIUSaaS validates the cert chain and provides the RADIUS endpoint reachable over RadSec (RADIUS over TLS, TCP 2083) from the APs
Cert-to-account mapping happens against Entra ID rather than on-prem AD
VLAN assignment via Tunnel-Private-Group-Id or Tunnel-Assignment-Id works the same way

Product capabilities change, verify current SCEPman and RADIUSaaS docs before committing a design.

Debug cheat sheet (EAP-TLS specific)

Symptom	Likely cause	Where to look
Client never connects, no auth attempt	No client cert provisioned, or supplicant not configured for EAP-TLS	Client cert store, supplicant config
Auth fails immediately after server cert	Client cert not trusted by RADIUS, or cert expired	RADIUS server CA chain config
Auth fails after client cert sent	Cert-to-account mapping failure, or revocation check failed	NPS Event 6273, CRL/OCSP reachability
Cert valid but no group assignment	AD account exists but not in any group the policy matches	AD group membership, policy conditions
Random intermittent failures	Clock skew, CRL fetch timeout, expired intermediate CA	Time sync, CRL distribution points
Works on Windows, fails on macOS / iOS	Supplicant cert validation differences, SAN format requirements	Apple requires SAN, not just Subject CN

Related concepts

MAB (MAC Authentication Bypass) - fallback for devices that cannot do 802.1X. Same RADIUS server, different policy that matches on Calling-Station-Id (MAC) instead of EAP
CoA (Change of Authorization) - RFC 5176, UDP 3799. RADIUS pushes session changes (force re-auth, terminate, bounce port) without the client reconnecting. Useful for posture, quarantine, role changes
RadSec - RADIUS over TLS, TCP 2083. Replaces UDP plus shared-secret with a TLS tunnel. Used by RADIUSaaS and any cloud RADIUS service where AP-to-server traverses the public internet
EAP-TTLS - similar to PEAP, server cert + inner method, but the inner method is more flexible. Rare in Microsoft shops, more common in mixed environments

Reference values

Item	Value
RFC for EAP-TLS	5216
RFC for EAP	3748
RFC for RADIUS	2865
RadSec port	TCP 2083
CoA port	UDP 3799
Tunnel-Type for VLAN	13
Tunnel-Medium-Type for IEEE-802	6

TCP/IP

Scope

Deep reference for packet capture analysis, MTU/MSS sizing, and troubleshooting transport-layer issues. Covers IPv4, IPv6, TCP, UDP at the byte level. The diagrams below follow standard RFC bit-position notation: each character pair represents one bit, bits 0-31 left to right, MSB first.

IPv4 header (RFC 791, 20 bytes minimum, up to 60 with options)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |   DSCP    |ECN|         Total Length          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Identification         |Flags|     Fragment Offset     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      TTL      |   Protocol    |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Source Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Destination Address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Options (if IHL > 5)              |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field	Bits	Notes
Version	4	Always 4 for IPv4
IHL (Internet Header Length)	4	In 32-bit words. Min 5 (=20 bytes), max 15 (=60 bytes)
DSCP	6	QoS class. EF=46 (voice), AF41=34, CS6=48 (routing), etc.
ECN	2	Explicit Congestion Notification. 00=not-ECT, 11=CE (congestion)
Total Length	16	Header + data, in bytes. Max 65535
Identification	16	Fragment reassembly key
Flags	3	bit0=Reserved (0), bit1=DF (Don't Fragment), bit2=MF (More Fragments)
Fragment Offset	13	In 8-byte units, offset of this fragment in the original packet
TTL	8	Hop count. Decremented at each router. 0 = drop + ICMP Time Exceeded
Protocol	8	Next-layer protocol (see protocol numbers table below)
Header Checksum	16	Header only, not payload. Recomputed at every hop (TTL changes)
Source / Destination	32 each	IPv4 addresses
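
The field table above can be sketched as a small Python decoder for the fixed 20-byte part of the header. This is an illustrative helper (`parse_ipv4_header` is a made-up name), it ignores options and does not verify the checksum, and the sample bytes are hand-built rather than taken from a real capture:

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (illustrative helper)."""
    if len(raw) < 20:
        raise ValueError("IPv4 header is 20 bytes minimum")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,                # IHL counts 32-bit words
        "dscp": tos >> 2,                                    # top 6 bits
        "ecn": tos & 0x03,                                   # bottom 2 bits
        "total_length": total_len,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),                     # flags bit 1
        "mf": bool(flags_frag & 0x2000),                     # flags bit 2
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # field is in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,                                # not verified here
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hand-built sample: 10.1.1.1 -> 10.2.2.2, TCP, DF set, DSCP 46 (EF), TTL 64.
sample = bytes.fromhex("45b8003c" "1c464000" "40060000" "0a010101" "0a020202")
header = parse_ipv4_header(sample)
print(header["src"], "->", header["dst"], "dscp", header["dscp"], "df", header["df"])
```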

IPv4 fragmentation flags - what to look for in pcap

DF set, MF clear - normal unfragmented packet. If too big for an MTU on path, gets dropped with ICMP Type 3 Code 4 (Fragmentation Needed). This is how PMTU Discovery works.
DF clear, MF set - this is a fragment, more to come
DF clear, MF clear, Fragment Offset > 0 - this is the LAST fragment
DF set, MF set - illegal combination, indicates a broken sender or malicious packet
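
The four cases above reduce to a short decision function. A sketch in Python, assuming the DF, MF and Fragment Offset values have already been extracted from the header. The function name and return strings are illustrative:

```python
def classify_fragment(df: bool, mf: bool, offset: int) -> str:
    """Map DF / MF / Fragment Offset to the pcap cases listed above."""
    if df and mf:
        return "illegal: DF and MF both set"
    if mf:
        return "fragment, more to come"
    if offset > 0:
        return "last fragment"
    if df:
        return "unfragmented, DF set"
    return "unfragmented, DF clear"

for case in [(True, False, 0), (False, True, 0), (False, False, 185), (True, True, 0)]:
    print(case, "->", classify_fragment(*case))
```

The last branch covers a case the list does not spell out: an unfragmented packet with DF clear, which a router is allowed to fragment in transit.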

IPv6 header (RFC 8200, fixed 40 bytes)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |              Flow Label               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Payload Length         |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                   Source Address (128 bits)                   |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                                                               +
|                Destination Address (128 bits)                 |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Key differences from IPv4

Fixed 40-byte header - no IHL field, no options inside the header. Options go in extension headers (Hop-by-Hop, Routing, Fragment, AH, ESP, etc.) chained via Next Header.
No header checksum - relies on L2 (Ethernet FCS) and L4 (TCP/UDP checksum). Routers don't have to recompute anything per hop.
No router-level fragmentation - only the source can fragment, via the Fragment extension header. Path MTU Discovery is effectively mandatory.
Hop Limit replaces TTL - same function, more honest name.
Next Header replaces Protocol - same numbering scheme, but can point to an extension header rather than directly to L4.
Flow Label (20 bits) - for ECMP hashing and QoS. Rarely populated by endpoints today.
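
Because the header is fixed at 40 bytes, decoding it is simpler than IPv4. A minimal Python sketch under the layout in the diagram above. `parse_ipv6_header` is a made-up name, extension headers are not followed, and the sample uses documentation-prefix addresses:

```python
import ipaddress
import struct

def parse_ipv6_header(raw: bytes) -> dict:
    """Decode the fixed 40-byte IPv6 header (illustrative helper)."""
    if len(raw) < 40:
        raise ValueError("IPv6 header is 40 bytes")
    first_word, payload_len, next_header, hop_limit = struct.unpack("!IHBB", raw[:8])
    return {
        "version": first_word >> 28,                  # top 4 bits
        "traffic_class": (first_word >> 20) & 0xFF,   # next 8 bits (DSCP + ECN)
        "flow_label": first_word & 0xFFFFF,           # low 20 bits
        "payload_length": payload_len,
        "next_header": next_header,                   # may be an extension header
        "hop_limit": hop_limit,
        "src": str(ipaddress.IPv6Address(raw[8:24])),
        "dst": str(ipaddress.IPv6Address(raw[24:40])),
    }

# Hand-built sample: Traffic Class 0xB8 (DSCP 46), flow label 0x12345, TCP, hop limit 64.
sample = (
    struct.pack("!IHBB", (6 << 28) | (0xB8 << 20) | 0x12345, 20, 6, 64)
    + ipaddress.IPv6Address("2001:db8::1").packed
    + ipaddress.IPv6Address("2001:db8::2").packed
)
v6 = parse_ipv6_header(sample)
print(v6["src"], "->", v6["dst"], "next header", v6["next_header"])
```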

Common protocol numbers (IPv4 Protocol / IPv6 Next Header)

Number	Protocol	Where you'll see it
1	ICMP	ping, traceroute, PMTU discovery
2	IGMP	multicast group membership
6	TCP	most things
17	UDP	DNS, DHCP, VXLAN, RADIUS, IPsec NAT-T
41	IPv6-in-IPv4	6in4, 6to4 tunnels
47	GRE	Generic Routing Encapsulation
50	ESP	IPsec encrypted payload
51	AH	IPsec auth header (rare today)
58	ICMPv6	v6 ping, ND, RA, PMTU
88	EIGRP	Cisco IGP
89	OSPF	OSPFv2/v3 hello and LSA
112	VRRP / CARP	Virtual router redundancy (including VRRP on FortiGate; FGCP HA heartbeats do not use it)
132	SCTP	signalling, some telco

TCP header (RFC 9293, 20 bytes minimum, up to 60 with options)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Acknowledgment Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Data Of| Rsvd  |C|E|U|A|P|R|S|F|          Window Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent Pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Options (if Data Offset > 5)          |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field	Bits	Notes
Source / Destination Port	16 each	0-65535
Sequence Number	32	Byte position in the stream. Wraps at 2^32
Acknowledgment Number	32	Next sequence number expected. Only meaningful if ACK flag set
Data Offset	4	Header length in 32-bit words. Min 5, max 15
Reserved	4	Must be zero. Was 3 reserved bits + NS (ECN-nonce, RFC 3540, since moved to Historic). RFC 9293 lists all 4 bits as reserved
Control bits (flags)	8	CWR, ECE, URG, ACK, PSH, RST, SYN, FIN (see table below)
Window Size	16	Receive buffer space available, in bytes. Scaled if Window Scale option negotiated in SYN
Checksum	16	Over pseudo-header + TCP header + data
Urgent Pointer	16	Only valid if URG set. Rarely used today
Options	0-320	MSS, Window Scale, SACK Permitted, Timestamps, etc.
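
The same table as a Python decoder, for reading a header out of a hex dump. A sketch only: `parse_tcp_header` is an illustrative name, options are returned as raw bytes without being parsed, and the sample SYN is hand-built:

```python
import struct

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20 bytes of a TCP header plus the raw options (illustrative)."""
    if len(raw) < 20:
        raise ValueError("TCP header is 20 bytes minimum")
    (sport, dport, seq, ack, off_rsvd, flags,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", raw[:20])
    header_bytes = (off_rsvd >> 4) * 4        # Data Offset counts 32-bit words
    return {
        "src_port": sport,
        "dst_port": dport,
        "seq": seq,
        "ack": ack,
        "header_bytes": header_bytes,
        "flags": flags,                       # the 8 control bits as one byte
        "window": window,                     # unscaled value from the header
        "checksum": checksum,
        "urgent_pointer": urgent,
        "options": raw[20:header_bytes],      # empty when Data Offset is 5
    }

# Hand-built SYN from port 51000 to 443 carrying one option: MSS = 1460 (02 04 05 b4).
sample = struct.pack("!HHIIBBHHH", 51000, 443, 1000, 0, 6 << 4, 0x02, 64240, 0, 0)
sample += bytes.fromhex("020405b4")
tcp = parse_tcp_header(sample)
print(tcp["dst_port"], hex(tcp["flags"]), tcp["options"].hex())
```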

TCP control bits (flags)

Bit	Hex	Flag	Purpose
0 (MSB)	`0x80`	CWR	Congestion Window Reduced (ECN response)
1	`0x40`	ECE	ECN-Echo (peer experienced congestion)
2	`0x20`	URG	Urgent Pointer field is significant
3	`0x10`	ACK	Acknowledgment field is significant
4	`0x08`	PSH	Push buffered data to receiving application now
5	`0x04`	RST	Reset the connection (abort)
6	`0x02`	SYN	Synchronize sequence numbers (connection setup)
7 (LSB)	`0x01`	FIN	No more data from sender (graceful close)

Common flag combinations seen in captures

Combo	Hex	Meaning
SYN	`0x02`	Client opening connection
SYN + ACK	`0x12`	Server accepting connection
ACK	`0x10`	Data acknowledgment, no data this segment
PSH + ACK	`0x18`	Data carrying segment, deliver to app immediately
FIN + ACK	`0x11`	Graceful close, half-shutdown
RST	`0x04`	Hard abort, no negotiation (often used by firewalls)
RST + ACK	`0x14`	Hard abort in response to data on a dead connection
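
The hex values in both tables are bitwise ORs of the individual flag masks. A small Python sketch that turns the flags byte (offset 13 in the TCP header, the `tcp[13]` of BPF filters) back into names. The function name is illustrative:

```python
# Masks from the control-bits table, MSB first.
TCP_FLAGS = [
    (0x80, "CWR"), (0x40, "ECE"), (0x20, "URG"), (0x10, "ACK"),
    (0x08, "PSH"), (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN"),
]

def decode_tcp_flags(value: int) -> list:
    """Return the names of the flags set in a TCP flags byte, MSB first."""
    return [name for mask, name in TCP_FLAGS if value & mask]

for combo in (0x02, 0x12, 0x10, 0x18, 0x11, 0x04, 0x14):
    print(f"0x{combo:02x}", "+".join(decode_tcp_flags(combo)))
```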

TCP connection states (RFC 9293)

State	Meaning
CLOSED	No connection. Fictional starting state
LISTEN	Server waiting for an incoming SYN
SYN-SENT	Client sent SYN, waiting for SYN+ACK
SYN-RECEIVED	Server got SYN, sent SYN+ACK, waiting for ACK
ESTABLISHED	Connection open, data can flow both ways
FIN-WAIT-1	Sent FIN, waiting for ACK or peer's FIN
FIN-WAIT-2	Our FIN was ACKed, waiting for peer's FIN
CLOSE-WAIT	Peer sent FIN, we ACKed it. Waiting for local app to call close()
CLOSING	Simultaneous close - both sides sent FIN before ACKs arrived
LAST-ACK	Sent our FIN (passive close), waiting for final ACK from peer
TIME-WAIT	Active closer waits 2*MSL after sending final ACK, to absorb stragglers

Normal 3-way handshake

Client                                Server
------                                ------
CLOSED                                LISTEN
   |                                    |
   |---- SYN  (seq=x)             ---->|
SYN-SENT                                |
   |                                    |
   |<--- SYN+ACK (seq=y, ack=x+1) -----|
   |                              SYN-RECEIVED
   |                                    |
   |---- ACK  (seq=x+1, ack=y+1) ----->|
ESTABLISHED                          ESTABLISHED

Normal 4-way close (active vs passive)

Active closer                         Passive closer
-------------                         --------------
ESTABLISHED                           ESTABLISHED
   |                                    |
   |---- FIN, ACK                ---->|
FIN-WAIT-1                              |
   |                                    |
   |<--- ACK                     -----|
FIN-WAIT-2                           CLOSE-WAIT
   |                                    |
   |                              (app calls close)
   |                                    |
   |<--- FIN, ACK                -----|
   |                                LAST-ACK
   |                                    |
   |---- ACK                     ---->|
TIME-WAIT                            CLOSED
   |
   |  (wait 2*MSL)
   |
CLOSED

TIME-WAIT and 2*MSL - what every senior engineer should know

The side that sends FIN first is the active closer and ends up in TIME-WAIT.
RFC says wait 2 * MSL (Maximum Segment Lifetime). RFC suggests MSL = 2 minutes, so 2*MSL = 4 minutes default.
Linux: hardcoded at 60 seconds, not configurable (kernel constant TCP_TIMEWAIT_LEN).
Windows: registry TcpTimedWaitDelay, default 240s, range 30-300s.
Why it exists: absorb delayed segments from this connection so they don't pollute a new connection on the same 4-tuple, and ensure the peer's FIN gets retransmitted and ACKed if the final ACK was lost.
Symptom of exhaustion: "Address already in use" when restarting a server on a well-known port. Or ephemeral port exhaustion on a busy client (HTTP load test, monitoring poller).
Linux tunables: net.ipv4.tcp_tw_reuse=1 (generally safe to enable, reuses TIME-WAIT sockets for outgoing connections, relies on TCP timestamps). tcp_tw_recycle was removed in 4.12 (NAT-hostile, do not look for it).
Quick check on Linux: ss -tan state time-wait | wc -l

UDP header (RFC 768, fixed 8 bytes)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Length covers header + data (so minimum 8).
Checksum is optional in IPv4 (set to 0 to skip), mandatory in IPv6.
No sequence numbers, no flow control, no retransmission. Anything above that lives in the application protocol (QUIC, DNS, RTP, etc.).
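
With only four 16-bit fields, the UDP header decodes in one `struct.unpack`. A minimal Python sketch. The helper name is illustrative, and treating a zero checksum as "not computed" is the IPv4 interpretation described above:

```python
import struct

def parse_udp_header(raw: bytes) -> dict:
    """Decode the 8-byte UDP header (illustrative helper)."""
    if len(raw) < 8:
        raise ValueError("UDP header is 8 bytes")
    sport, dport, length, checksum = struct.unpack("!HHHH", raw[:8])
    if length < 8:
        raise ValueError("UDP Length includes the header, so it cannot be below 8")
    return {
        "src_port": sport,
        "dst_port": dport,
        "payload_bytes": length - 8,
        "checksum_computed": checksum != 0,   # 0 means skipped (IPv4 only)
    }

# Hand-built sample: 4-byte payload to port 53, checksum left at 0.
sample = struct.pack("!HHHH", 53000, 53, 8 + 4, 0) + b"test"
udp = parse_udp_header(sample)
print(udp)
```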

MTU / MSS math reference

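Standard Ethernet MTU is 1500 bytes (payload after Ethernet header). MSS is advertised by each side in the Options field of its SYN or SYN-ACK and is the largest TCP payload that side is willing to receive. It is an announcement, not a two-way negotiation, and the lower of the two values effectively limits the path.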

Path type	Headers consumed	Effective MSS
Plain TCP over IPv4	20 IPv4 + 20 TCP = 40	1460
Plain TCP over IPv6	40 IPv6 + 20 TCP = 60	1440
TCP in PPPoE (DSL, FTTN)	40 + 8 PPPoE = 48	1452
TCP in GRE	40 + 24 GRE = 64	1436
TCP in IPsec ESP (tunnel mode, AES-GCM)	~40 + 50-60 ESP overhead	~1380-1400
TCP in VXLAN over IPv4	40 + 50 VXLAN = 90	1410
TCP in VXLAN over IPv6	60 + 70 VXLAN = 130	1370
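
The arithmetic in the table is MTU minus encapsulation overhead minus inner IP header minus TCP header. A Python sketch that reproduces the fixed-overhead rows. The function name and parameters are illustrative, a 20-byte TCP header (no options) is assumed, and the ESP row is left out because its overhead varies with cipher and padding:

```python
ETHERNET_MTU = 1500

def effective_mss(inner_ip: int, encap_overhead: int = 0,
                  mtu: int = ETHERNET_MTU, tcp: int = 20) -> int:
    """MSS = MTU - encapsulation overhead - inner IP header - TCP header."""
    return mtu - encap_overhead - inner_ip - tcp

rows = {
    "Plain TCP over IPv4": effective_mss(20),                 # 1460
    "Plain TCP over IPv6": effective_mss(40),                 # 1440
    "TCP in PPPoE": effective_mss(20, 8),                     # 1452
    "TCP in GRE": effective_mss(20, 24),                      # 1436
    "TCP in VXLAN over IPv4": effective_mss(20, 50),          # 1410
    "TCP in VXLAN over IPv6": effective_mss(40, 70),          # 1370
}

for path, mss in rows.items():
    print(f"{path:26} {mss}")
```

Pass a different `mtu` to model a path with a known smaller link, for example `effective_mss(20, 24, mtu=1492)` for GRE running over PPPoE.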

FortiGate tip: set tcp-mss-sender and set tcp-mss-receiver on the firewall policy, or set tcp-mss on the interface, to clamp the MSS in transit. Common for VPN paths where PMTU is unreliable.

Path MTU Discovery (PMTUD) - how it breaks

Sender sets DF flag. Router along path with smaller MTU drops the packet and returns ICMP Type 3 Code 4 (Fragmentation Needed, DF Set) with the next-hop MTU.
Sender caches the new MTU per destination and reduces segment size.
Common breakage: a firewall along the path drops the ICMP unreachable. The sender keeps retransmitting full-size DF-set packets, so the handshake completes but data stalls and the application hangs. Classic "small requests work, large ones don't" symptom.
Workaround: MSS clamping at the tunnel ingress (see FortiGate tip above), or clear the DF flag (DF=0) to allow router fragmentation (IPv4 only, slow).

Wireshark display filters - deep troubleshooting picks

Filter	What it catches
`tcp.flags.syn == 1 and tcp.flags.ack == 0`	Connection attempts (SYN only)
`tcp.flags.reset == 1`	RST packets - who killed the connection
`tcp.analysis.retransmission`	Wireshark-detected retransmits (loss indicator)
`tcp.analysis.fast_retransmission`	Triggered by 3 duplicate ACKs
`tcp.analysis.duplicate_ack`	Dup ACKs - receiver telling sender it's missing data
`tcp.analysis.zero_window`	Receiver buffer full - application not draining
`tcp.analysis.window_full`	Sender has filled the peer's window
`tcp.analysis.out_of_order`	Packets arriving out of sequence
`tcp.analysis.lost_segment`	Wireshark sees a gap in sequence numbers
`tcp.stream eq N`	Isolate one specific TCP conversation
`tcp.window_size == 0`	Same as zero_window but the raw field
`ip.ttl < 5`	Low TTL - either looping or near end-of-life
`ip.flags.df == 1 and ip.len > 1400`	Large DF-set packets - PMTU candidates
`icmp.type == 3 and icmp.code == 4`	ICMP Frag Needed - PMTUD signal
`ip.frag_offset > 0 or ip.flags.mf == 1`	Any IPv4 fragmentation

BPF capture filters (tcpdump / wireshark capture mode)

Filter	Effect
`tcp port 443`	HTTPS to/from any host
`host 10.1.1.1 and not port 22`	Everything to/from a host except SSH (don't capture your own session)
`tcp[tcpflags] & (tcp-syn|tcp-fin) != 0`	Any SYN or FIN - connection setup/teardown only
`tcp[13] == 0x02`	SYN only (no ACK)
`tcp[13] == 0x12`	SYN+ACK only
`tcp[13] & 4 != 0`	Any RST
`icmp`	All ICMP
`icmp[icmptype] == icmp-unreach`	ICMP Type 3 (Destination Unreachable, includes Frag Needed)
`vlan 100 and host 10.1.1.1`	Tagged traffic on VLAN 100
`net 192.168.0.0/16`	RFC1918 16-bit space
`greater 1400`	Packets of 1400 bytes or more (`greater` means >=) - useful for MTU hunting

Wireshark TCP analysis flags - what they actually mean

Flag	Heuristic
[TCP Retransmission]	Same seq seen again, more than RTT later. Probably loss.
[TCP Fast Retransmission]	Retransmit triggered after 3 dup ACKs (RFC 5681)
[TCP Dup ACK]	Receiver got out-of-order data, ACKing the last in-order byte again
[TCP Out-Of-Order]	Lower seq arrives after higher seq
[TCP Spurious Retransmission]	Retransmit of data already ACKed - sender timed out unnecessarily
[TCP Zero Window]	Receiver advertised window = 0, asking sender to pause
[TCP Window Update]	Receiver re-opens window after Zero Window
[TCP Previous Segment Not Captured]	Capture started mid-stream or capture missed packets - not necessarily a network problem
[TCP Keep-Alive]	1-byte segment with seq one less than expected, used to probe a quiet connection

Useful ports for network engineers

Port	Proto	Service
22	TCP	SSH
53	UDP/TCP	DNS (TCP for AXFR and >512 byte replies)
67/68	UDP	DHCP server / client
69	UDP	TFTP (firmware uploads, FortiManager backup)
123	UDP	NTP
161/162	UDP	SNMP get / trap
179	TCP	BGP
443	TCP	HTTPS, FortiGate GUI, SSL VPN portal
500	UDP	IKE (IPsec phase 1)
514	UDP	syslog
520	UDP	RIP
541	TCP	FortiGuard (older), FortiManager
636	TCP	LDAPS
1645/1646	UDP	RADIUS auth/acct (legacy)
1701	UDP	L2TP
1812/1813	UDP	RADIUS auth/acct
2055	UDP	NetFlow
3306	TCP	MySQL
3389	TCP	RDP
3478	UDP	STUN (Teams, WebRTC)
3799	UDP	RADIUS CoA
4500	UDP	IPsec NAT-T
4789	UDP	VXLAN
5060/5061	UDP/TCP	SIP / SIP-TLS
6081	UDP	Geneve (used by AWS Gateway Load Balancer and others)
8443	TCP	FortiGate GUI alternate, common admin port
10443	TCP	FortiGate SSL VPN default

Reference RFCs

RFC	Title
768	UDP
791	IPv4
792	ICMP
1191	Path MTU Discovery
2474 / 3168	DSCP / ECN
4443	ICMPv6
5681	TCP Congestion Control
8200	IPv6
9293	TCP (current, obsoletes RFC 793)

QoS

Scope

QoS only matters when there is congestion. When the pipe is wider than demand, QoS does nothing useful. When demand exceeds capacity, QoS decides what gets through first and what waits or drops. The full job is: classify, mark, then enforce (queue, police, shape, drop). This section focuses on the practical reference values and FortiGate-specific behaviour, with deep dives into Microsoft Teams QoS and SD-WAN shaping.

The ToS byte: DSCP + ECN

Byte 1 (zero-indexed) of the IPv4 header. In IPv6 the same 8 bits are the Traffic Class field, which spans bits 4-11 of the header and so straddles bytes 0 and 1. Same format, same semantics.

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|   DSCP    |ECN|
+-+-+-+-+-+-+-+-+

Top 6 bits = DSCP. Bottom 2 bits = ECN. The legacy "IP Precedence" field was the top 3 bits of DSCP, which is why CS values (Class Selector) line up: CS5 (5 in IPPrec) = 101000 in DSCP = decimal 40.
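
The bit split above is a shift and a mask. A small Python sketch, useful because captures and some CLIs show the whole ToS byte (for example 0xB8) rather than the DSCP value. Function names are illustrative:

```python
def split_tos(tos: int) -> tuple:
    """Split a ToS / Traffic Class byte into (dscp, ecn)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS is one byte")
    return tos >> 2, tos & 0x03

def make_tos(dscp: int, ecn: int = 0) -> int:
    """Build the ToS byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not (0 <= dscp <= 63 and 0 <= ecn <= 3):
        raise ValueError("DSCP is 6 bits, ECN is 2 bits")
    return (dscp << 2) | ecn

def class_selector(precedence: int) -> int:
    """CSn = IP Precedence n in the top 3 bits of DSCP, low 3 bits zero."""
    if not 0 <= precedence <= 7:
        raise ValueError("IP Precedence is 3 bits")
    return precedence << 3

print(split_tos(0xB8))        # (46, 0): EF, Not-ECT
print(hex(make_tos(46)))      # 0xb8
print(class_selector(5))      # 40
```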

DSCP codepoint reference

Name	Decimal	Hex	Binary	Typical use
CS0 / DF (Best Effort)	0	`0x00`	`000000`	Default. Everything unmarked.
CS1	8	`0x08`	`001000`	Scavenger / lower than best effort (bulk backups, P2P)
AF11 / AF12 / AF13	10 / 12 / 14	`0x0A` / `0x0C` / `0x0E`	`001010` / `001100` / `001110`	Low priority data, drop precedence 1/2/3
CS2	16	`0x10`	`010000`	OAM, network management
AF21 / AF22 / AF23	18 / 20 / 22	`0x12` / `0x14` / `0x16`	`010010` / `010100` / `010110`	Low-latency data (transactional). AF21 = Teams screen share
CS3	24	`0x18`	`011000`	Call signalling (SIP, H.323, SCCP)
AF31 / AF32 / AF33	26 / 28 / 30	`0x1A` / `0x1C` / `0x1E`	`011010` / `011100` / `011110`	Multimedia streaming (one-way video)
CS4	32	`0x20`	`100000`	Real-time interactive (gaming)
AF41 / AF42 / AF43	34 / 36 / 38	`0x22` / `0x24` / `0x26`	`100010` / `100100` / `100110`	Multimedia conferencing. AF41 = Teams video
CS5	40	`0x28`	`101000`	Broadcast video (legacy)
VA (Voice Admit)	44	`0x2C`	`101100`	Admission-controlled voice (RFC 5865)
EF (Expedited Forwarding)	46	`0x2E`	`101110`	Real-time / voice. Teams audio
CS6	48	`0x30`	`110000`	Network control (OSPF, BGP, ISIS). Do NOT use for user traffic
CS7	56	`0x38`	`111000`	Reserved. Don't use.

How to read AFxy

x (first digit) = the queue / class. 1=low priority data, 2=low-latency data, 3=streaming, 4=conferencing.
y (second digit) = drop precedence within that class. 1=lowest probability of drop, 3=highest. WRED uses this to decide what to drop first under congestion.
So AF43 means "conferencing class, drop me first if congested". AF41 means "conferencing class, drop me last".
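
The AF decimal values in the codepoint table follow directly from x and y: the class occupies the top 3 bits and the drop precedence the next 2, so DSCP = 8x + 2y. A Python sketch with illustrative function names:

```python
def af_to_dscp(x: int, y: int) -> int:
    """AFxy -> DSCP decimal. x = class (1-4), y = drop precedence (1-3)."""
    if not (1 <= x <= 4 and 1 <= y <= 3):
        raise ValueError("AF classes are 1-4, drop precedences are 1-3")
    return 8 * x + 2 * y

def dscp_to_af(dscp: int) -> str:
    """DSCP decimal -> 'AFxy', or raise if the value is not an AF codepoint."""
    x, y = dscp >> 3, (dscp >> 1) & 0x03
    if dscp & 1 or not (1 <= x <= 4 and 1 <= y <= 3):
        raise ValueError(f"DSCP {dscp} is not an AF codepoint")
    return f"AF{x}{y}"

print(af_to_dscp(4, 1))   # 34
print(af_to_dscp(2, 1))   # 18
print(dscp_to_af(38))     # AF43
```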

ECN bits

Value	Code	Meaning
`00`	Not-ECT	Sender is not ECN capable. Drop on congestion.
`10`	ECT(0)	ECN capable. Mark instead of drop.
`01`	ECT(1)	ECN capable. Same as ECT(0) functionally.
`11`	CE	Congestion Experienced. A router along the path is congested.

An ECN-aware router that sees congestion and finds ECT(0) or ECT(1) sets the field to 11 (CE) instead of dropping the packet. The receiver echoes this with the TCP ECE flag in its ACKs. The sender reduces its window and sets the CWR flag in its next segment. No packet drop, no retransmit. ECN only works if both endpoints negotiate it and every hop preserves the bits. Many middleboxes still zero it out.

802.1Q PCP / CoS (Layer 2 priority)

Inside the 802.1Q tag, the TCI (Tag Control Information) field is 16 bits:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PCP |D|     VLAN ID (VID)     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

PCP (Priority Code Point) = 3 bits, also called CoS or 802.1p. DEI = 1 bit, drop eligibility indicator. VID = 12 bits, VLAN ID 0-4095 (0 and 4095 are reserved, so usable VLANs are 1-4094).

PCP	IEEE name	Typical use
0	BE - Best Effort	Default
1	BK - Background	Bulk, lowest priority (ranks below PCP 0)
2	EE - Excellent Effort	Important data
3	CA - Critical Applications	Call signalling
4	VI - Video	Conferencing video, streaming
5	VO - Voice	Real-time voice (<10ms latency)
6	IC - Internetwork Control	OSPF, routing
7	NC - Network Control	STP, LLDP
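
The TCI layout above unpacks with two shifts and a mask. A Python sketch for reading PCP, DEI and VID out of the 16-bit value that follows the 0x8100 TPID in a tagged frame. Function names are illustrative:

```python
def parse_tci(tci: int) -> dict:
    """Split a 16-bit 802.1Q TCI into PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI is 16 bits")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 0x1, "vid": tci & 0x0FFF}

def build_tci(pcp: int, dei: int, vid: int) -> int:
    """Assemble a TCI from its three fields."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0x0FFF):
        raise ValueError("PCP is 0-7, DEI is 0-1, VID is 0-4095")
    return (pcp << 13) | (dei << 12) | vid

print(parse_tci(0xA064))             # PCP 5, DEI 0, VLAN 100
print(hex(build_tci(5, 0, 100)))     # 0xa064
```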

Standard DSCP ↔ CoS mapping

Most vendors copy the top 3 bits of DSCP into the 3-bit CoS field. Useful when a router strips the 802.1Q tag on egress to a routed interface and the L2 priority gets lost - the DSCP survives.

DSCP	CoS / PCP	Why
EF (46) / VA (44)	5	Top 3 bits = 101 = 5 (Voice)
AF41-43 (34-38) / CS4 (32)	4	Top 3 bits = 100 = 4 (Video)
AF31-33 (26-30) / CS3 (24)	3	Top 3 bits = 011 = 3 (Signalling)
AF21-23 (18-22) / CS2 (16)	2	Top 3 bits = 010 = 2 (Excellent Effort)
AF11-13 (10-14) / CS1 (8)	1	Top 3 bits = 001 = 1 (Background)
CS0 (0)	0	Top 3 bits = 000 = 0 (Best Effort)
CS6 (48)	6	Network control
CS7 (56)	7	Reserved
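
Every row in the table is the same operation: drop the low 3 bits of the DSCP. A Python sketch of that default mapping, assuming the "copy the top 3 bits" behaviour described above (individual vendors and platforms may let you override it with a custom map):

```python
def dscp_to_cos(dscp: int) -> int:
    """Default DSCP -> CoS/PCP mapping: keep the top 3 of the 6 DSCP bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is 6 bits")
    return dscp >> 3

for name, dscp in [("EF", 46), ("VA", 44), ("AF41", 34), ("CS3", 24),
                   ("AF21", 18), ("AF11", 10), ("CS0", 0), ("CS6", 48)]:
    print(f"{name:5} DSCP {dscp:2} -> CoS {dscp_to_cos(dscp)}")
```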

Trust boundaries

The trust boundary is the point in your network where you start honouring whatever DSCP/CoS the upstream device set. Inside: trust. Outside: classify and re-mark.
Access ports (user devices): generally don't trust. Re-mark to CoS 0 / DSCP 0 unless the device is known good (Teams client with GPO, SIP phone with config).
SIP phone uplink with PC dangling off it: trust CoS from the phone (CDP/LLDP-MED tells you what VLAN and how to trust), don't trust the PC behind it.
Inter-switch trunks within your admin domain: trust.
WAN edge (to ISP / MPLS): re-mark to whatever the carrier expects. Some scrub on ingress (set to 0), some preserve, some map DSCP to MPLS EXP. Confirm in writing with the carrier.
Internet egress: assume markings will be wiped by the first ISP hop. End-to-end DSCP only works inside one admin domain or across a contracted MPLS service.

Microsoft Teams QoS - the canonical configuration

Traffic type	Source port range	Protocol	DSCP	Class
Audio	50000-50019	TCP / UDP	46	EF (Expedited Forwarding)
Video	50020-50039	TCP / UDP	34	AF41
Application / Screen sharing	50040-50059	TCP / UDP	18	AF21

These are source port ranges on the Teams client. You must enable QoS in the Teams Admin Centre (Meetings > Meeting settings) AND configure the client to insert DSCP markings via GPO/Intune. By default Teams uses any ephemeral port 1024-65535 and classification by port range fails.
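As a sketch of how a network-side classifier could use those ranges (for example when the client cannot mark), the lookup below maps a source port to the DSCP from the table. The function and table names are hypothetical, and it assumes the tenant is locked to the 50000-50059 ranges.

```python
# (port range, traffic type, DSCP) taken from the table above.
TEAMS_PORT_CLASSES = [
    (range(50000, 50020), "audio", 46),    # EF
    (range(50020, 50040), "video", 34),    # AF41
    (range(50040, 50060), "sharing", 18),  # AF21
]


def classify_teams_port(src_port: int):
    """Return (traffic type, DSCP) for a Teams source port, or None if out of range."""
    for ports, name, dscp in TEAMS_PORT_CLASSES:
        if src_port in ports:
            return name, dscp
    return None


for port in (50005, 50025, 50045, 51000):
    print(port, classify_teams_port(port))
```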

Where to enable Teams QoS marking

Teams Admin Centre tenant setting: Meetings > Meeting settings > Network > Quality of Service. Turn on "Insert Quality of Service (QoS) markers" and lock to the 50000-50059 port ranges.
Windows clients (domain joined): GPO Policy-based QoS targeting ms-teams.exe (new Teams) and teams.exe (classic), DSCP 46, source ports 50000-50019, protocol TCP+UDP. Repeat for video and sharing.
Windows clients (Intune managed): NetworkQoSPolicy CSP via Intune. Same OMA-URI settings, applied per-device.
Windows PowerShell (one-off): New-NetQosPolicy -Name "TeamsAudio" -AppPathNameMatchCondition "ms-teams.exe" -IPProtocolMatchCondition Both -IPSrcPortStartMatchCondition 50000 -IPSrcPortEndMatchCondition 50019 -DSCPAction 46
Teams Rooms on Android: tenant-level only. Note: Android Teams Rooms uses DSCP 34 (AF41) for both video AND screen sharing, not 18.
Teams Rooms on Windows: same as Windows clients.
macOS / Linux Teams clients: client-side DSCP marking is limited or absent. Mark at the network layer (FortiGate shaping policy matching the port ranges).

Teams QoS gotchas

Symptom / situation	Cause / fix
QoS enabled but packets unmarked in capture	Tenant setting not on. Or client build date predates QoS support. Or admin port range not locked (still using 1024-65535).
Markings present LAN-side, absent WAN-side	ISP scrubbed DSCP. Mark again at WAN edge or accept that DSCP only works to the SD-WAN tunnel egress.
Audio fine, video and sharing degrade	EF queue is fine but AF41/AF21 are not policed/protected. Build a hierarchical shaper that reserves bandwidth for AF41 too.
Mac users complain, Windows fine	Mac client doesn't reliably mark DSCP. Use FortiGate to classify by source port range and mark on the firewall.
"Why is my BGP / OSPF flapping under load?"	You marked routing traffic with low DSCP, or you didn't and EF is starving it. Network control traffic should always be CS6.
Bandwidth usage estimate	Audio ~100 kbps, video ~1.5 Mbps per stream, screen share ~500 kbps - 4 Mbps. Plan EF queue at audio+headroom, AF41 for sum of expected concurrent video.
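The bandwidth estimate in the last row can be turned into a rough sizing calculation. This is only a sketch: the per-stream figures are the approximations from the table, and the 20% audio headroom is an assumed value, not a recommendation.

```python
AUDIO_KBPS = 100    # approximate per-call audio rate from the table
VIDEO_KBPS = 1500   # approximate per-stream video rate from the table


def plan_queues(audio_calls: int, video_streams: int, headroom_pct: int = 20) -> dict:
    """Estimate EF and AF41 queue sizes in kbps (headroom_pct is an assumption)."""
    ef_kbps = audio_calls * AUDIO_KBPS * (100 + headroom_pct) // 100
    af41_kbps = video_streams * VIDEO_KBPS
    return {"ef_kbps": ef_kbps, "af41_kbps": af41_kbps}


# 20 concurrent calls, 10 of them with video.
print(plan_queues(audio_calls=20, video_streams=10))
```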

Queueing strategies

Method	How it works	Trade-off
FIFO	First in, first out. Single queue.	No QoS at all. Default on simple ports.
PQ (Priority Queueing)	Strict priority - high queue served before low.	High can starve everything else. Only safe when the high queue is policed.
WFQ (Weighted Fair Queueing)	Flows get fair share weighted by IP precedence.	Flow-based, no admin control over which apps win.
CBWFQ (Class-Based WFQ)	Admin-defined classes, each with a bandwidth guarantee.	Better but voice can still queue behind data in its class.
LLQ (Low Latency Queueing)	CBWFQ + a strict-priority queue for voice/real-time, policed.	Industry standard for voice/video. Police the PQ class so it can't starve others.
HQF / H-QoS	Hierarchical: parent shaper sets the ceiling, child classes share within it.	What FortiGate shaping profiles implement. Best for WAN edges.
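To make the LLQ row concrete, here is a toy round-based scheduler. It is a simplification, not how any vendor implements LLQ: the priority queue is served first in every round but capped at a per-round packet budget (standing in for the policer), and the remaining classes get a weighted share. All names are hypothetical.

```python
from collections import deque


def llq_dequeue(priority_q, class_qs, weights, pq_budget, rounds):
    """Per round: send up to pq_budget priority packets, then up to
    weights[name] packets from each class queue, in dict order."""
    sent = []
    for _ in range(rounds):
        for _ in range(pq_budget):
            if not priority_q:
                break
            sent.append(priority_q.popleft())
        for name, queue in class_qs.items():
            for _ in range(weights[name]):
                if not queue:
                    break
                sent.append(queue.popleft())
    return sent


voice = deque(["v1", "v2", "v3"])
classes = {"video": deque(["d1", "d2"]), "bulk": deque(["b1", "b2"])}
order = llq_dequeue(voice, classes, {"video": 2, "bulk": 1}, pq_budget=2, rounds=2)
print(order)
```

Because the priority budget is capped, bulk traffic still gets a packet out in the first round even though voice has more queued.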

Policing vs shaping

Policing: hard ceiling. Excess is dropped (or re-marked) immediately. No buffering. Low memory, sharp edges, can be brutal on TCP. Use on ingress to protect downstream.
Shaping: smooth ceiling. Excess is buffered and released at the configured rate. Adds latency but avoids loss. Use on egress to fit the next-hop pipe (ISP CIR, MPLS contract rate).
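FortiGate shapers are token-bucket based. Classic shared and per-IP shapers drop traffic that exceeds maximum-bandwidth rather than buffering it, so in practice they behave like policers; shaping profiles with type queuing buffer on egress.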
Rule of thumb: shape down, police up. Shape your egress to the carrier rate. Police user-side ingress to protect your shaper from overflowing.
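The difference between the two is easiest to see on a bursty arrival pattern. The sketch below is a deliberately simplified per-tick model (no burst allowance, so not a full token bucket); `police` and `shape` are hypothetical names and the units are arbitrary packets per tick.

```python
def police(arrivals, rate):
    """Policing: anything above `rate` in a tick is dropped immediately."""
    sent = [min(a, rate) for a in arrivals]
    dropped = sum(a - s for a, s in zip(arrivals, sent))
    return sent, dropped


def shape(arrivals, rate, queue_limit):
    """Shaping: excess is queued and released at `rate`; whatever is still
    queued beyond queue_limit after sending is dropped."""
    sent, backlog, dropped = [], 0, 0
    for a in arrivals:
        backlog += a
        out = min(backlog, rate)
        backlog -= out
        if backlog > queue_limit:
            dropped += backlog - queue_limit
            backlog = queue_limit
        sent.append(out)
    return sent, dropped, backlog


burst = [10, 0, 0, 10, 0, 0]
print("police:", police(burst, rate=4))
print("shape: ", shape(burst, rate=4, queue_limit=20))
```

With the same ceiling, the policer loses most of each burst while the shaper delivers everything, at the cost of packets waiting one or two ticks in the queue.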

WRED / RED - drop strategies

Tail drop (default): queue full, drop new arrivals. Causes global TCP synchronisation - everyone halves window at once, queue drains, everyone ramps up together, queue refills, repeat. Sawtooth utilisation.
RED (Random Early Detection): probabilistic drop as queue depth grows. Smooths TCP behaviour by hitting individual flows at different times.
WRED (Weighted RED): per-class drop curves. For example, AF13 starts dropping at 30% queue depth and AF11 at 60%, so the higher drop precedence is discarded first - drop precedence in action. EF rarely uses WRED because it's normally strict-priority and policed instead.
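A sketch of the classic RED drop curve, applied with two different minimum thresholds to mimic WRED. The thresholds (30 and 60), the maximum threshold of 100 and the maximum drop probability of 0.1 are illustrative assumptions, and real implementations work on an averaged queue depth.

```python
def red_drop_probability(avg_depth, min_th, max_th, max_p):
    """Linear RED curve: 0 below min_th, ramping to max_p at max_th, then drop all."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Two WRED profiles sharing one queue (depth expressed as % of queue size).
profiles = {
    "higher drop precedence": {"min_th": 30, "max_th": 100, "max_p": 0.1},
    "lower drop precedence": {"min_th": 60, "max_th": 100, "max_p": 0.1},
}
for depth in (20, 45, 65, 90):
    row = {name: round(red_drop_probability(depth, **p), 4) for name, p in profiles.items()}
    print(f"depth {depth}%: {row}")
```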

FortiGate QoS - architecture and quirks

FortiGate shapes on egress only. To limit download speed for users, apply a shaper on the LAN-side interface (the egress for return traffic) - or set traffic-shaper-reverse on the shaping policy.
Three shaper types: shared (one bucket per policy or aggregate), per-IP (one bucket per source IP within a policy), interface-based / shaping profile (hierarchical, attached to physical interface).
Shaping policies are separate from firewall policies, evaluated after the firewall policy matches. Order matters.
outbandwidth must be set on the interface for any shaping profile to work. Without a ceiling, the shaper has nothing to apportion. This is the single most common QoS misconfig on FortiGate.
SD-WAN service rules can match on DSCP (tos / tos-mask) and steer based on it, and can re-mark on egress (dscp-forward enable). Useful when EF-marked traffic should prefer MPLS over internet.
Three priority levels: high, medium, low. Within a priority, guaranteed-bandwidth is honoured first, then maximum-bandwidth caps.

FortiGate shaper + shaping policy (Teams audio example)

config firewall shaper traffic-shaper
    edit "Teams-Audio-EF"
        set maximum-bandwidth 100000
        set per-policy enable
        set priority high
        set diffserv enable
        set diffservcode 101110
    next
    edit "Teams-Video-AF41"
        set maximum-bandwidth 4000000
        set per-policy enable
        set priority medium
        set diffserv enable
        set diffservcode 100010
    next
    edit "Bulk-CS1"
        set guaranteed-bandwidth 1000
        set maximum-bandwidth 1048576
        set priority low
        set diffserv enable
        set diffservcode 001000
    next
end

config firewall shaping-policy
    edit 1
        set name "Teams audio (port 50000-50019)"
        set srcaddr "internal-subnets"
        set dstaddr "all"
        set service "ALL_UDP"
        set srcintf "internal"
        set dstintf "virtual-wan-link"
        set traffic-shaper "Teams-Audio-EF"
        set traffic-shaper-reverse "Teams-Audio-EF"
    next
end

For port-range matching you'll typically build a custom firewall service object with the UDP source port range, then reference it in the shaping-policy service field.

FortiGate hierarchical shaping profile (egress)

config firewall shaping-profile
    edit "wan-uplink"
        set type queuing
        set default-class-id 31
        config shaping-entries
            edit 1
                set class-id 5
                set priority high
                set guaranteed-bandwidth-percentage 30
                set maximum-bandwidth-percentage 100
            next
            edit 2
                set class-id 10
                set priority medium
                set guaranteed-bandwidth-percentage 40
                set maximum-bandwidth-percentage 100
            next
            edit 3
                set class-id 31
                set priority low
                set guaranteed-bandwidth-percentage 10
                set maximum-bandwidth-percentage 100
            next
        end
    next
end

config system interface
    edit "wan1"
        set egress-shaping-profile "wan-uplink"
        set outbandwidth 100000
    next
end

Match traffic into class IDs via shaping policies (set class-id 5) or via firewall policies. Class IDs 2-31 are admin-defined; class 0 is internal, class 1 is reserved.
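As a quick sanity check on the profile above, the percentages can be converted into absolute rates. This sketch assumes outbandwidth is expressed in kbps; `guaranteed_kbps` is a hypothetical helper.

```python
def guaranteed_kbps(outbandwidth_kbps: int, guaranteed_pct: dict) -> dict:
    """Convert per-class guaranteed-bandwidth percentages into kbps."""
    total = sum(guaranteed_pct.values())
    if total > 100:
        raise ValueError(f"guarantees add up to {total}%, more than the interface has")
    return {cid: outbandwidth_kbps * pct // 100 for cid, pct in guaranteed_pct.items()}


# Class IDs and percentages from the "wan-uplink" profile, outbandwidth 100000.
rates = guaranteed_kbps(100000, {5: 30, 10: 40, 31: 10})
print(rates)
print("unreserved kbps:", 100000 - sum(rates.values()))
```

With these values 20% of the interface is not guaranteed to any class and is shared according to priority and the maximum-bandwidth-percentage caps.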

FortiGate SD-WAN service rule with DSCP forwarding

config system sdwan
    config service
        edit 1
            set name "Teams-EF-prefer-MPLS"
            set mode priority
            set priority-members 1 2
            set dst "Microsoft-365-services"
            set tos 0xb8
            set tos-mask 0xfc
        next
        edit 2
            set name "Set-EF-on-egress"
            set priority-members 1
            set dscp-forward enable
            set dscp-forward-tag 101110
            set dst "voip-server"
        next
    end
end

The tos / tos-mask pair matches packets that are already marked. 0xb8 = 10111000 = EF (46) shifted left 2 bits, because DSCP occupies the top 6 bits of the 8-bit ToS byte. Mask 0xfc ignores the bottom 2 ECN bits. dscp-forward-tag sets a new DSCP on egress through the SD-WAN.
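The arithmetic can be reproduced in a few lines. This is a sketch that assumes the match is a plain bitwise AND of the ToS byte with the mask; the helper names are hypothetical.

```python
def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the top 6 bits of the 8-bit ToS byte (ECN bits = 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


def tos_matches(packet_tos: int, rule_tos: int, rule_mask: int) -> bool:
    """Compare only the bits selected by the mask."""
    return (packet_tos & rule_mask) == (rule_tos & rule_mask)


print(hex(dscp_to_tos(46)), format(46, "06b"))  # ToS byte and 6-bit DSCP string for EF
print(tos_matches(0xb9, 0xb8, 0xfc))            # EF with an ECN bit set
print(tos_matches(dscp_to_tos(34), 0xb8, 0xfc)) # AF41 against the EF rule
```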

FortiGate QoS troubleshooting

diagnose firewall shaper traffic-shaper list
diagnose firewall shaper per-ip-shaper list
diagnose firewall shaper interface-shaper list
diagnose firewall iprope list 100015
diagnose netlink interface list <intf>
get system performance status
diag sniffer packet any 'host 203.0.113.10' 4 0 a    # 'a' includes timestamp; check TOS byte

diag firewall shaper traffic-shaper list shows per-shaper counters: current bandwidth, packets dropped, bytes dropped. Drops here mean your shaper is doing its job.
diag firewall iprope list 100015 shows shaping-policy rules with the shaper attached (forward and reverse). This is where you confirm a policy is actually getting the shaper you expect.
get system performance status shows CPU. Shaping is CPU-bound on lower-end models, so check before blaming the shaper for slowness.

Wireshark filters for QoS work

Filter	What it catches
`ip.dsfield.dscp == 46`	EF packets (Teams audio if marked)
`ip.dsfield.dscp == 34`	AF41 (Teams video)
`ip.dsfield.dscp == 18`	AF21 (Teams screen share)
`ip.dsfield.dscp != 0`	Anything marked (sanity check that markings are surviving the path)
`ip.dsfield.ecn != 0`	ECN-capable packets, including CE-marked
`ip.dsfield.ecn == 3`	CE-marked (congestion experienced) - someone's queue is filling
`vlan.priority == 5`	CoS 5 frames (voice on 802.1Q)
`vlan.priority != 0`	Any non-default CoS
`udp.srcport >= 50000 and udp.srcport <= 50019`	Teams audio source port range

Common gotchas across the board

Gotcha	What's actually happening
FortiGate shaper does nothing	`outbandwidth` not set on the interface. Or shaping policy not matching - check `diag firewall iprope list 100015`.
Download speed not limited by FortiGate shaper	You shaped the WAN interface egress (uploads). For downloads, shape on the LAN interface egress OR use `traffic-shaper-reverse`.
DSCP wiped at ISP	Normal for internet. Re-mark inside the SD-WAN overlay, or accept end-to-end DSCP only works to the tunnel egress.
IPsec strips inner DSCP	Default behaviour: outer ESP packet gets DSCP 0. Configure copy-dscp on the IPsec tunnel (`set copy-tos enable` on FortiGate `vpn ipsec phase1-interface`) to preserve.
MPLS provider doesn't honour your DSCP	Most map DSCP→EXP at PE ingress, with a contracted mapping. Confirm the mapping table with the carrier. Default is usually first 3 bits of DSCP into 3-bit EXP.
Voice quality worse with QoS enabled	EF queue too small or strict-priority not policed - EF traffic exceeds its allocation and tail-drops itself. Or you marked everything EF and now nothing is preferred.
SD-WAN performance SLA + QoS not playing nice	SD-WAN steers based on link health (jitter/loss/latency). QoS prioritises within a link. They're complementary, not redundant. SLA decides which member, QoS decides queue order on that member.
Teams traffic marked EF but not surviving the tunnel	FortiGate IPsec `copy-dscp` or `set copy-tos enable` on phase1-interface. Without it, inner DSCP is lost on the outer ESP header.
Switch trust on access port marked everything CS6	Trust was on, attacker / misconfigured device sent DSCP 48 (CS6 = network control). Either don't trust access ports, or use port-level policers that clamp DSCP to a max value.

Reference RFCs

RFC	Title
2474	DSCP definition (replaces ToS interpretation)
2475	DiffServ architecture
2597	Assured Forwarding (AF) PHB group
3168	ECN
3246	Expedited Forwarding (EF) PHB
4594	DiffServ Service Class configuration guidelines (the "what should I use for X" guide)
5865	Voice-Admit (VA) codepoint
IEEE 802.1Q	PCP, DEI, VID definitions

Network engineering. Sharpened.
