util/linuxfw: fix nftables endianness and add connmark conditional check (#19725)

Fix the following issues:

1. Endianness Bug: The nftables runner used hardcoded big-endian byte
   arrays for firewall mark values (0xff0000, etc.), which broke bitwise
   operations on little-endian systems (x86/x64 and most ARM). As a
   result, connmark save/restore rules silently failed. Fixed by using
   binary.NativeEndian to generate the byte order of the host system.

2. Connmark Restore Conditional Check: The connmark restore mechanism
   unconditionally overwrote packet marks, even when Tailscale hadn't set
   any mark bits in conntrack. This destroyed mark bits set by other
   systems (VPNs, policy routing, vendor flags), breaking coexistence.
   Fixed by adding a conditional check so that the restore runs only when
   (ct mark & 0xff0000) != 0, preventing the worst case of wiping all
   marks to zero.
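
A minimal sketch of the endianness problem described in item 1, assuming
Go 1.21+ for binary.NativeEndian. The helper names below are hypothetical
and are not the ones used in util/linuxfw. The little-endian read is made
explicit so the result does not depend on the machine running the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tailscaleMask is the mask value quoted in the commit message.
const tailscaleMask uint32 = 0x00ff0000

// hostOrderBytes encodes v in the byte order of the machine running the code.
func hostOrderBytes(v uint32) []byte {
	b := make([]byte, 4)
	binary.NativeEndian.PutUint32(b, v)
	return b
}

// decodeHostOrder reads a 32-bit value back using the host byte order.
func decodeHostOrder(b []byte) uint32 {
	return binary.NativeEndian.Uint32(b)
}

// asLittleEndianHost decodes b the way a little-endian CPU reads a 32-bit word.
func asLittleEndianHost(b []byte) uint32 {
	return binary.LittleEndian.Uint32(b)
}

func main() {
	// Big-endian spelling of 0xff0000, as a hardcoded byte array.
	hardcoded := []byte{0x00, 0xff, 0x00, 0x00}

	// A little-endian host interprets those bytes as a different mask.
	fmt.Printf("hardcoded bytes on a little-endian host: %#x\n", asLittleEndianHost(hardcoded))

	// Host-order bytes always decode back to the intended mask.
	fmt.Printf("host-order bytes decode to: %#x\n", decodeHostOrder(hostOrderBytes(tailscaleMask)))
}
```

On a little-endian host the hardcoded array selects bits 8-15 instead of
bits 16-23, so a comparison against the Tailscale mark bits never matches.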

Changes:
- util/linuxfw/linuxfw.go: Added nativeEndianUint32() helper and updated
  all mask functions to use native byte order instead of hardcoded bytes
- util/linuxfw/nftables_runner.go: Added conditional check in
  makeConnmarkRestoreExprs() to only restore when ct mark has Tailscale
  bits set; added detailed comment about bit preservation limitations
- util/linuxfw/iptables_runner.go: Added conditional check using -m
  connmark ! --mark to match nftables behavior
- Tests updated: Fixed byte-level regression tests to expect little-endian
  byte sequences and verify the new conditional check
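
For illustration, the iptables side of the conditional check could look
roughly like the argument list below. This is a sketch under assumptions:
the helper name is hypothetical, and the exact arguments used in
iptables_runner.go are not shown in this commit message. It relies only on
the standard connmark match (value/mask with "!" inversion) and the
CONNMARK --restore-mark target with --nfmask and --ctmask.

```go
package main

import (
	"fmt"
	"strings"
)

// connmarkRestoreArgs is a hypothetical helper that builds rule arguments
// which restore the masked connmark only when the masked ct mark is non-zero.
func connmarkRestoreArgs(mask uint32) []string {
	m := fmt.Sprintf("%#x", mask)
	return []string{
		// Only consider packets of known connections.
		"-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED",
		// Inverted match: (ct mark & mask) != 0.
		"-m", "connmark", "!", "--mark", "0x0/" + m,
		// Copy only the masked bits from the ct mark into the packet mark.
		"-j", "CONNMARK", "--restore-mark", "--nfmask", m, "--ctmask", m,
	}
}

func main() {
	fmt.Println(strings.Join(connmarkRestoreArgs(0xff0000), " "))
}
```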

Note: Perfect bit preservation in nftables remains challenging
due to nftables expression VM limitations. The current implementation
prevents the critical case of wiping marks with zero.
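
The three behaviors discussed here can be modeled with plain integer
arithmetic. This is only a sketch of the semantics, not the nftables
expressions themselves, and the function names are hypothetical.

```go
package main

import "fmt"

// tsMask is the Tailscale fwmark mask quoted in the commit message.
const tsMask uint32 = 0x00ff0000

// restoreUnconditional models the old nftables rule: the packet mark is
// always replaced by (ct mark & mask), even when that value is zero.
func restoreUnconditional(pktMark, ctMark uint32) uint32 {
	return ctMark & tsMask
}

// restoreConditional models the new rule: the packet mark is left alone
// unless the ct mark carries Tailscale bits. Non-Tailscale packet bits are
// still overwritten when the restore does run.
func restoreConditional(pktMark, ctMark uint32) uint32 {
	if ctMark&tsMask == 0 {
		return pktMark
	}
	return ctMark & tsMask
}

// restoreMerged models the ideal merge that iptables CONNMARK with
// --nfmask performs: only the masked bits of the packet mark change.
func restoreMerged(pktMark, ctMark uint32) uint32 {
	return (pktMark &^ tsMask) | (ctMark & tsMask)
}

func main() {
	// Packet marked by another system; no Tailscale bits in conntrack.
	fmt.Printf("%#x %#x %#x\n",
		restoreUnconditional(0x1, 0), restoreConditional(0x1, 0), restoreMerged(0x1, 0))

	// Same packet mark; conntrack carries a Tailscale bit.
	fmt.Printf("%#x %#x %#x\n",
		restoreUnconditional(0x1, 0x40000), restoreConditional(0x1, 0x40000), restoreMerged(0x1, 0x40000))
}
```

The first case is the one the conditional check fixes. The second case
shows the remaining limitation: the foreign bit 0x1 survives only in the
merged variant.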

Updates #3310
Fixes #11803
Related to #8555

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Mike O'Driscoll
2026-05-14 09:11:24 -04:00
committed by GitHub
parent e7415e6393
commit 48919f708b
5 changed files with 55 additions and 15 deletions
+26 -4
@@ -1917,9 +1917,7 @@ func (n *nftablesRunner) DelSNATRule() error {
 }
 func nativeUint32(v uint32) []byte {
-	b := make([]byte, 4)
-	binary.NativeEndian.PutUint32(b, v)
-	return b
+	return nativeEndianUint32(v)
 }
 func makeStatefulRuleExprs(tunname string) []expr.Any {
@@ -2106,6 +2104,24 @@ func (n *nftablesRunner) DelStatefulRule(tunname string) error {
 // makeConnmarkRestoreExprs creates nftables expressions to restore mark from conntrack.
 // Implements: ct state established,related ct mark & 0xff0000 != 0 meta mark set ct mark & 0xff0000
+//
+// LIMITATION: Unlike iptables CONNMARK --restore-mark with --nfmask, this implementation
+// overwrites non-Tailscale bits in the packet mark rather than merging them. This is a
+// fundamental limitation of the Linux kernel's nftables expression VM (not the Go library).
+//
+// The nftables Bitwise expression only supports: (register & CONSTANT_MASK) ^ CONSTANT_XOR.
+// It cannot perform register-to-register operations needed for perfect bit preservation:
+//
+//	meta mark = (meta mark & ~0xff0000) | (ct mark & 0xff0000)
+//	             ^^^^^                     ^^^^^^^
+//	             needs meta mark and ct mark combined
+//
+// In contrast, iptables CONNMARK is a specialized kernel module with custom C code that
+// can atomically merge marks from different sources.
+//
+// The conditional check (ct mark & 0xff0000 != 0) prevents the worst case of wiping all
+// mark bits to zero. Perfect bit preservation would require kernel
+// changes to add register-to-register bitwise operations to nftables.
 func makeConnmarkRestoreExprs() []expr.Any {
 	return []expr.Any{
 		// Load conntrack state into register 1
@@ -2141,7 +2157,13 @@ func makeConnmarkRestoreExprs() []expr.Any {
 			Mask: getTailscaleFwmarkMask(),
 			Xor:  []byte{0x00, 0x00, 0x00, 0x00},
 		},
-		// Set packet mark from register 1
+		// Check if masked ct mark is non-zero (critical: prevents wiping marks with 0)
+		&expr.Cmp{
+			Op:       expr.CmpOpNeq,
+			Register: 1,
+			Data:     []byte{0, 0, 0, 0},
+		},
+		// Set packet mark from register 1 (contains ct mark & 0xff0000)
 		&expr.Meta{
 			Key:            expr.MetaKeyMARK,
 			SourceRegister: true,