Commit Graph

10656 Commits

Author SHA1 Message Date
codinget c2ddadca72 fix(wasm): validate ping type early; fallback DNS resolver for exit node
Add a switch guard before the 30-second context in ping() so that invalid
ping type strings (e.g. "disco" vs "Disco") reject immediately with a clear
error rather than silently timing out because userspaceEngine.Ping has no
default case.

For queryDNS(), detect SERVFAIL responses returned with an empty resolver
list (the typical state when an exit node is active but the DNS manager
forwarder has no configured upstreams) and fall back to querying 8.8.8.8
via the dialer — which honours exit-node routing — for A/AAAA record types.
Fall further back to the browser's native resolver if UserDial fails.

Also accept bare IP addresses in whoIs() (in addition to ip:port) so
callers don't need to fabricate a port when they only have a peer IP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 18:01:34 +00:00
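
The early ping-type guard described in c2ddadca72 can be sketched roughly
as follows. This is a minimal illustration, not the code from the commit:
validatePingType and ping are hypothetical names, and the accepted strings
are an assumption based on the ping types listed in bd124abc3c (TSMP,
disco, ICMP, peerapi).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// validatePingType is a hypothetical helper. The accepted spellings are an
// assumption; matching is case-sensitive, so "Disco" is rejected.
func validatePingType(t string) error {
	switch t {
	case "disco", "TSMP", "ICMP", "peerapi":
		return nil
	default:
		return fmt.Errorf("invalid ping type %q", t)
	}
}

// ping rejects bad input before the 30-second context is created, so the
// caller gets an immediate error instead of a silent timeout.
func ping(ip, pingType string) error {
	if err := validatePingType(pingType); err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_, _ = ctx, ip // a real implementation would hand these to the engine
	return nil
}

func main() {
	fmt.Println(ping("100.64.0.1", "Disco"))
	fmt.Println(ping("100.64.0.1", "disco"))
}
```

Putting the switch ahead of context.WithTimeout keeps the failure path free
of any waiting.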
codinget 453261aef0 feat(tsconnect): add peerAPIURL to netmap and localAPI in-process bridge
Include the PeerAPI base URL (http://ip:port) in every node entry of the
notifyNetMap payload — for self via LocalBackend.GetPeerAPIPort, for peers
by reading the PeerAPI4/PeerAPI6 Services entries in their Hostinfo. The URL
mirrors the address-family preference used by peerAPIBase (prefer IPv4).

Add a localAPI(method, path, body?) WASM binding that dispatches in-process
HTTP requests directly to a LocalAPI handler with full read/write/cert
permissions, returning {status, body}. Enables TypeScript callers to access
any LocalAPI endpoint (ACL policy, Taildrive shares, etc.) without network
setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 18:01:34 +00:00
codinget bd124abc3c feat(tsconnect): add whoIs, queryDNS, ping, suggestExitNode WASM bindings
Expose four LocalBackend capabilities to JavaScript:
- whoIs(addrPort, proto?): resolves a connecting ip:port to a tailnet node
  and user profile; returns null for unknown peers
- queryDNS(name, type?): queries the tailnet DNS resolver (MagicDNS +
  upstream); parses A/AAAA/CNAME/TXT answers into strings
- ping(ip, type?, size?): pings a tailnet peer (TSMP, disco, ICMP, peerapi)
  with a 30 s context timeout; returns latency and path details
- suggestExitNode(): asks the coordination server for the best exit node

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget 9fd2f3bbf4 feat(tsconnect): add getCert, listenTLS, setFunnel + fix TLS cert for WASM
Enable ACME TLS certificates on js/wasm by dropping the !js build tag from
cert.go and routing storage through the state store. Add getCert, listenTLS,
and setFunnel WASM bindings with a combinedTLSListener that merges Funnel
ingress and direct tailnet connections. Notify the control plane immediately
after serve config changes to accelerate Funnel DNS provisioning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget a6b286b414 fix(tsconnect): pin types to avoid monorepo @types pollution
Replace skipLibCheck with an explicit types list so TypeScript and
dts-bundle-generator only auto-include @types/golang-wasm-exec and
@types/qrcode, preventing @types/eslint-scope and @types/ws from
leaking in from a parent node_modules when built inside a monorepo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget bc9884ce69 fix(tsconnect): skipLibCheck to avoid monorepo @types conflicts
When tsconnect is built inside a JS monorepo, TypeScript walks up the
directory tree and auto-discovers @types/eslint-scope and @types/ws
from the root node_modules, causing spurious type errors unrelated to
tsconnect itself. skipLibCheck suppresses these.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget 3f52ae7be2 fix(tsconnect): lowercase name/size in waitingFiles JSON
apitype.WaitingFile has no json tags so it serialised as {Name, Size}.
Introduce a local jsWaitingFile struct with json:"name" / json:"size"
so the JS side receives idiomatic camelCase property names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
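
The effect of the json tags in 3f52ae7be2 can be seen in a small
standalone sketch. untaggedFile stands in for a struct without tags; the
field set of jsWaitingFile here is assumed from the commit message rather
than copied from the source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untaggedFile stands in for a struct with no json tags: encoding/json
// uses the Go field names as-is.
type untaggedFile struct {
	Name string
	Size int64
}

// jsWaitingFile mirrors the wrapper described in the commit message; the
// exact fields are an assumption.
type jsWaitingFile struct {
	Name string `json:"name"`
	Size int64  `json:"size"`
}

// marshal returns the JSON encoding of v as a string.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(untaggedFile{Name: "a.txt", Size: 3}))  // {"Name":"a.txt","Size":3}
	fmt.Println(marshal(jsWaitingFile{Name: "a.txt", Size: 3})) // {"name":"a.txt","size":3}
}
```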
codinget fbc7982e01 fix(taildrop): restore incoming file progress notifications
The io.Copy in PutFile was writing directly to wc, bypassing the
incomingFile wrapper whose Write method increments f.copied and fires
a throttled sendFileNotify on progress. As a result, notifyIncomingFiles
on the JS side only ever fired once (on completion) with received=0,
making progress UI impossible. The original inFile wrapping was lost
during the Android SAF refactor.

Also surface the PartialFile.Done flag through jsIncomingFile so JS can
distinguish the final "transfer complete" notification from in-progress
updates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
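
The bug shape in fbc7982e01 (copying into the inner writer instead of the
counting wrapper) can be reproduced in isolation. progressWriter and
copyDemo are hypothetical names for illustration; the real incomingFile
wrapper also throttles notifications, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// progressWriter is a hypothetical stand-in for a wrapper whose Write
// method tracks how many bytes have been copied so far.
type progressWriter struct {
	w      io.Writer
	copied int64
}

func (p *progressWriter) Write(b []byte) (int, error) {
	n, err := p.w.Write(b)
	p.copied += int64(n)
	return n, err
}

// copyDemo copies payload twice: once past the wrapper (the bug) and once
// through it (the fix). It returns the byte count each wrapper observed.
func copyDemo(payload string) (wrapped, bypassed int64) {
	// Bug: the wrapper exists, but io.Copy targets the inner writer.
	pw := &progressWriter{w: io.Discard}
	io.Copy(pw.w, strings.NewReader(payload))
	bypassed = pw.copied

	// Fix: copy into the wrapper so every Write is counted.
	pw = &progressWriter{w: io.Discard}
	io.Copy(pw, strings.NewReader(payload))
	return pw.copied, bypassed
}

func main() {
	wrapped, bypassed := copyDemo("hello")
	fmt.Println("through wrapper:", wrapped, "bypassing wrapper:", bypassed)
}
```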
codinget c4a2eb3451 fix(tsconnect): guard nil n.Prefs in notify callback
n.Prefs is *PrefsView (a pointer), so calling n.Prefs.Valid() on a
Notify where Prefs is nil auto-dereferenced nil and panicked. The
callback's defer recover() swallowed the panic, which meant every
Notify without Prefs (Health-only, FilesWaiting, IncomingFiles,
OutgoingFiles, etc.) never reached the file-related JS calls.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
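
The nil-pointer failure described in c4a2eb3451 follows from how Go calls
a value-receiver method through a pointer. The sketch below uses
simplified stand-in types (prefsView, notify); they are assumptions for
illustration and not the real ipn types.

```go
package main

import "fmt"

type prefs struct{ exitNode string }

// prefsView is a simplified stand-in for a view type whose Valid method
// has a value receiver.
type prefsView struct{ p *prefs }

func (v prefsView) Valid() bool { return v.p != nil }

// notify is a simplified stand-in for a notification with optional Prefs.
type notify struct {
	Prefs        *prefsView
	FilesWaiting bool
}

// prefsValidUnguarded calls Valid through a possibly nil pointer. Go
// dereferences the pointer to copy the receiver, so a nil Prefs panics.
// The recover mimics a callback that swallows the panic.
func prefsValidUnguarded(n notify) (valid, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return n.Prefs.Valid(), false
}

// prefsValidGuarded checks for nil first, so notifications without Prefs
// fall through to whatever handling comes next.
func prefsValidGuarded(n notify) bool {
	return n.Prefs != nil && n.Prefs.Valid()
}

func main() {
	n := notify{FilesWaiting: true} // no Prefs attached
	_, panicked := prefsValidUnguarded(n)
	fmt.Println("unguarded panicked:", panicked)
	fmt.Println("guarded valid:", prefsValidGuarded(n))
}
```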
codinget 705eebe5fc feat(tsconnect): add outgoing file transfer progress notifications
- Export UpdateOutgoingFiles on taildrop.Extension so it can be called
  from outside the package (wasm bridge, package main).
- Wrap sendFile's PUT body with progresstracking.NewReader so bytes-sent
  is sampled roughly once per second during transfer.
- Create an OutgoingFile entry (with UUID, peer ID, name, declared size)
  before the PUT and call UpdateOutgoingFiles on each progress tick and
  on completion (setting Finished/Succeeded). This flows into the IPN
  notify stream as OutgoingFiles notifications.
- Add jsOutgoingFile struct and wire n.OutgoingFiles into a new
  notifyOutgoingFiles callback in run(), mirroring notifyIncomingFiles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget 4ef06f2498 feat(tsconnect): add notifyFilesWaiting and notifyIncomingFiles callbacks
Wire two new callbacks into the IPN notify stream:

- notifyFilesWaiting: fires when a completed inbound transfer is staged
  and ready to retrieve via waitingFiles(). Triggered by n.FilesWaiting
  in the notify stream.
- notifyIncomingFiles: fires with a JSON snapshot of in-progress inbound
  transfers whenever progress changes (roughly once per second while
  active, plus once at completion). The jsIncomingFile struct carries
  name, started (Unix ms), declaredSize, and received bytes. An empty
  array indicates all active transfers have finished.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget bdfcc55797 feat(taildrop): fix DirectFileMode, void callbacks, and empty WaitingFiles
- Add SetStagedFileOps to Extension: sets fileOps without enabling
  DirectFileMode, so WASM clients use staged retrieval (WaitingFiles,
  OpenFile, DeleteFile) instead of direct-write mode.
- Add directFileOps bool field: SetFileOps (Android SAF) sets it true;
  SetStagedFileOps (WASM JS) leaves it false. onChangeProfile now uses
  `fops != nil && e.directFileOps` to determine DirectFileMode.
- Add jsCallVoid to jsFileOps: void ops (openWriter, write, closeWriter,
  remove) now use cb(err?: string) instead of cb(null, err: string).
- Fix waitingFiles() returning JSON null when no files are waiting:
  normalise nil slice to empty slice before marshalling.
- Update wireTaildropFileOps to call SetStagedFileOps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
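
The waitingFiles() fix in bdfcc55797 relies on encoding/json marshalling a
nil slice as null and an empty slice as []. A minimal sketch, with
waitingFile and marshalFiles as hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// waitingFile is a hypothetical element type for the example.
type waitingFile struct {
	Name string `json:"name"`
	Size int64  `json:"size"`
}

// marshalFiles normalises a nil slice to an empty one so that callers
// always receive a JSON array.
func marshalFiles(files []waitingFile) (string, error) {
	if files == nil {
		files = []waitingFile{}
	}
	b, err := json.Marshal(files)
	return string(b), err
}

func main() {
	var none []waitingFile
	raw, _ := json.Marshal(none)
	fmt.Println(string(raw)) // null

	fixed, _ := marshalFiles(none)
	fmt.Println(fixed) // []
}
```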
codinget 2ddaf2f5aa feat(tsconnect): expose exit node selection to JS
Add exit node support to the wasm JS bridge:

- Include `exitNodeOption` and `stableNodeID` on each peer in the
  notifyNetMap payload so callers can identify which peers are exit
  nodes and reference them by stable ID.
- Call `notifyExitNode(stableNodeID)` whenever prefs change, so
  callers can track which exit node (if any) is currently active.
- Expose `setExitNode(stableNodeID)` — sets ExitNodeID via EditPrefs.
- Expose `setExitNodeEnabled(enabled)` — toggles the last-used exit
  node on/off via SetUseExitNodeEnabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 17:56:31 +00:00
codinget 8357137a59 feat(tsconnect): add TCP listening to ipn.listen
Extend ipn.listen to also accept "tcp"/"tcp4"/"tcp6" and return a
TCPListener bound to a netstack gonet.TCPListener. The listener
exposes accept/close/addr like a Go net.Listener and additionally
implements Symbol.asyncIterator so JS callers can write:

  for await (const conn of listener) { ... }

The async iterator returns done when the listener is closed (via
errors.Is(net.ErrClosed)) and rejects on any other accept error.
Symbol-keyed properties are set via Reflect.set since syscall/js
only exposes string-keyed Set.
2026-05-18 01:18:09 +00:00
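
The close-versus-error distinction that 8357137a59 describes for the async
iterator corresponds to a common Go accept loop. The sketch below shows
only the Go side with a plain loopback listener; acceptAll and
closedListenerDemo are hypothetical names, and nothing here models the
syscall/js bridge.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// acceptAll accepts connections until ln is closed. A closed listener is
// treated as a clean end of iteration (nil); any other error is returned.
func acceptAll(ln net.Listener, handle func(net.Conn)) error {
	for {
		c, err := ln.Accept()
		if errors.Is(err, net.ErrClosed) {
			return nil
		}
		if err != nil {
			return err
		}
		handle(c)
	}
}

// closedListenerDemo opens a loopback listener, closes it, and runs the
// loop. ran is false if the environment does not allow listening at all.
func closedListenerDemo() (ran bool, err error) {
	ln, lerr := net.Listen("tcp", "127.0.0.1:0")
	if lerr != nil {
		return false, nil
	}
	ln.Close()
	return true, acceptAll(ln, func(c net.Conn) { c.Close() })
}

func main() {
	ran, err := closedListenerDemo()
	fmt.Println("ran:", ran, "err:", err)
}
```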
codinget 4acd937b0f feat(tsconnect): expose dialTLS to JS
Add ipn.dialTLS(addr, opts?) which dials a TCP connection through
the Tailscale dialer and performs a TLS handshake on top, returning
a JS Conn just like ipn.dial.

WASM has no system root pool, so verification defaults to the
baked-in LetsEncrypt ISRG roots already linked via net/bakedroots.
That covers any tailnet HTTPS endpoint provisioned via
`tailscale cert`. Callers can override with opts.caCerts (PEM) or
bypass entirely with opts.insecureSkipVerify, and override SNI with
opts.serverName.

Marginal binary cost is ~10 KiB on top of the existing ~31.6 MiB
wasm: crypto/tls and the x509 verification path are already pulled
in by control/controlclient and net/tlsdial.
2026-05-18 01:18:09 +00:00
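
The option handling that 4acd937b0f describes for dialTLS (caCerts,
insecureSkipVerify, serverName) maps naturally onto crypto/tls. The sketch
below is an assumption about how such options could be turned into a
tls.Config; dialTLSOptions and tlsConfigFor are hypothetical names, and
the baked-in default roots are not modelled.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// dialTLSOptions is a hypothetical Go-side mirror of the JS opts object.
type dialTLSOptions struct {
	CACerts            []byte // PEM-encoded certificates, optional
	ServerName         string // SNI override, optional
	InsecureSkipVerify bool
}

// tlsConfigFor builds a tls.Config for host. Leaving RootCAs nil defers to
// whatever default pool the platform or caller provides.
func tlsConfigFor(host string, o dialTLSOptions) (*tls.Config, error) {
	cfg := &tls.Config{
		ServerName:         host,
		InsecureSkipVerify: o.InsecureSkipVerify,
	}
	if o.ServerName != "" {
		cfg.ServerName = o.ServerName
	}
	if len(o.CACerts) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(o.CACerts) {
			return nil, errors.New("caCerts: no valid PEM certificates found")
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

func main() {
	cfg, err := tlsConfigFor("node.example.ts.net", dialTLSOptions{ServerName: "alt.example"})
	fmt.Println(cfg.ServerName, err)

	_, err = tlsConfigFor("node.example.ts.net", dialTLSOptions{CACerts: []byte("not pem")})
	fmt.Println(err)
}
```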
codinget 301137edc4 feat(tsconnect): expose dial, listen and listenICMP to JS
Wire up the userspace networking primitives to the JS bridge so
browser callers can initiate outbound and receive inbound traffic
over the Tailscale network:

- ipn.dial(network, addr) wraps a tsdial UserDial into a JS Conn
  with read/write/close/localAddr/remoteAddr.
- ipn.listen(network, addr) wraps a netstack ListenPacket into a
  JS PacketConn with readFrom/writeTo/close/localAddr.
- ipn.listenICMP("icmp4"|"icmp6"|"icmp") creates a raw ICMP
  endpoint on the underlying gVisor stack and wraps it as a
  PacketConn for sending/receiving ping traffic.

To support listenICMP, netstack.Impl gains a Stack() accessor that
returns the underlying *stack.Stack so jsIPN can call NewEndpoint
with icmp.ProtocolNumber4/6.

Binary I/O uses js.CopyBytesToGo / js.CopyBytesToJS to move bytes
across the syscall/js boundary without base64 round-trips.
2026-05-18 01:18:09 +00:00
codinget dec913b1e3 fix(tsconnect): drop nethttpomithttp2 build tag
After 1d93bdce2 ("control/controlclient: remove x/net/http2, use
net/http"), the noise control client uses net/http's Transport with
Protocols.SetUnencryptedHTTP2(true). The nethttpomithttp2 build tag
strips the bundled HTTP/2 implementation from net/http, so at runtime
the control client fails the first register request with "http:
Transport does not support unencrypted HTTP/2" and the wasm never
connects.

Drop the tag so the bundled HTTP/2 ships in the wasm binary.
2026-05-18 01:18:09 +00:00
Brad Fitzpatrick 2b338dd6a8 wgengine, cmd/tailscaled, control/controlclient: remove Engine watchdog
The Engine watchdog wrapped every wgengine.Engine method call in a
goroutine with a 45s timeout and crashed the process on timeout. It
was added years ago to surface deadlocks during development, but the
underlying deadlocks have long since been fixed, and even when it did
fire it produced obscure stack traces (from inside the watchdog
goroutine, not the original caller) without buying much.

Audit of userspaceEngine's methods shows none have cyclic locking or
unbounded blocking now that ResetAndStop no longer loops waiting for
DERPs to drain (fa49009ee). The watchdog is dead weight; remove it
along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch.

Updates #19759

Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-15 16:49:28 -07:00
Simon Law 5d1bf80597 feature/routecheck: add ts_omit_routecheck feature flag (#19638)
RouteCheck, which checks that overlapping routers are reachable, is
enabled by default for both tailscaled and tsnet.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-15 15:50:50 -07:00
Noel O'Brien 894ff5d8ee cmd/hello: split css and js into separate files (#19771)
Move the inline CSS and JS into separate files to be more friendly
to Content Security Policies. ServeHTTP is updated to serve these
assets from the '/static/' path.

Updates tailscale/corp#32398

Signed-off-by: Noel O'Brien <noel@tailscale.com>
2026-05-15 09:37:22 -07:00
Alex Chan 0cb432ed84 all: update more references to Tailnet/Network Lock
Updates tailscale/corp#37904

Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-15 16:23:50 +01:00
Fernando Serboncini c355618e73 wgengine/router/osrouter: skip netfilter add-ons when chain setup fails (#19757)
linuxRouter has two blocks (connmark rules and the CGNAT drop rule) that
gate on cfg.NetfilterMode, the requested config state. If
setNetfilterModeLocked fails, those blocks still run as though the
requested mode took effect, which can cause an error.

We now gate both blocks on r.netfilterMode, matching the pattern used by
SNAT, stateful, and loopback paths.

Fixes #19737

Change-Id: Ia6003a082db99c376e662132d725661afbac0ee9

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-15 09:32:30 -04:00
License Updater 1d3562b314 licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2026-05-14 21:04:41 -07:00
Brad Fitzpatrick ef1bb5ac16 util/cibuild, cache_key_test: skip TestTsgoRevInCacheKey outside Tailscale CI
cibuild.On() returns true for any CI environment that sets CI=true,
including Alpine Linux's package build CI. TestTsgoRevInCacheKey was
guarded by cibuild.On() (or use of tsgo), so it ran under Alpine's CI
with stock Go, where go.toolchain.rev isn't blended into build cache
keys, and unsurprisingly failed.

Add cibuild.OnTailscaleCI, which keys off GITHUB_REPOSITORY_OWNER to
distinguish tailscale/tailscale's own GitHub Actions CI from arbitrary
downstream CI, and use it in TestTsgoRevInCacheKey.

Fixes #19754

Change-Id: Id31cfe71903a235f1460dca1e2fdf334e3ba1ee5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-14 15:55:05 -07:00
Brad Fitzpatrick fa49009eee wgengine: simplify ResetAndStop, drop drain loop
Since f343b496c3 ("wgengine, all: remove LazyWG, use wireguard-go
callback API for on-demand peers"), Reconfig is fully synchronous:
magicConn.UpdatePeers, wgdev.RemovePeer, router.Set, and dns.Set all
return when the work is done, and the peer list is updated under
wgLock before Reconfig returns. So after Reconfig with empty configs,
len(st.Peers) is already 0.

The old loop also waited for st.DERPs to drain to 0, but UpdatePeers
only edits maps; active DERP connections idle out on their own
timeout. The sole caller (LocalBackend.stopEngineAndWait) doesn't
inspect st.DERPs anyway; it just hands the Status to
setWgengineStatusLocked. So the drain-wait was for nothing observable
and could theoretically (or at least appear to readers to) loop
forever holding b.mu. Remove that reader confusion by removing
the backoff loop entirely.

Updates #19759

Change-Id: Ibfac3f0baabcad7604b713c934a8fc37932e0a50
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-14 15:45:38 -07:00
Brad Fitzpatrick 93440604e0 tstest/natlab/vmtest: add TestPeerRelay
Add a VM-based natlab test that exercises the peer-relay feature
(feature/relayserver) end-to-end across three Tailscale nodes whose
network topology makes a direct A<->B UDP path impossible: both peers
are behind HardNAT (FreeBSD/pfSense-style endpoint-dependent NAT) with
no port-mapping services, while the relay node is behind One2OneNAT so
its STUN-discovered WAN endpoint is reachable from both peers. The
test enables the relay server via EditPrefs, then waits for an a->b
PingDisco whose PingResult.PeerRelay is set (proving magicsock chose
the peer-relay path, not DERP), and finally asserts that the relay's
DebugPeerRelaySessions LocalAPI reports the session.

The existing TestPeerRelayPing in tstest/integration runs three
tailscaled processes on the loopback interface with no NATs; this new
vmtest covers peer relay through real per-VM kernels and NATs.

To wire control-server capabilities into vmtest, also add a
PeerRelayGrants() EnvOption (sibling of AllOnline,
SameTailnetUser) that flips testcontrol.Server.PeerRelayGrants so the
wildcard packet filter grants tailcfg.PeerCapabilityRelay and
PeerCapabilityRelayTarget; without those caps magicsock won't consider
any peer a candidate relay.

Updates #13038

Change-Id: Ib3440b83ec442da0d3b89ffa48ceea9398ea9062
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-14 14:47:29 -07:00
Andrew Lytvynov 9437a634e6 scripts/installer.sh: handle Zorin OS versions separately from Ubuntu (#19758)
Their version scheme is different, even though the OS is based on
Ubuntu. We need to check Zorin's version numbers to pick the right
APT_KEY_TYPE.

Updates #18925

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-14 14:04:04 -07:00
M. J. Fromberger 4eb977413a tstest/natlab/vmtest: add helpers for fatal step errors (#19753)
In a lot of places, we construct an error to End a step, then immediately log
it to the governing test as test fatal. Save ourselves a bit of boilerplate by
putting methods on Step for that.

There are a couple cases this doesn't cover, e.g., where we construct the Step
outside a subtest that wants to fail individually, but it helps enough to pay
for its lines.

Updates #13038

Change-Id: I71f9900942962de16609b6b198d3ba13d6958a5f
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-05-14 09:24:47 -07:00
Claus Lensbøl 8203edc099 .github/workflows: change natlab test trigger label (#19750)
The label "natlab" is a bit confusing and also used for other things.
Instead, change the trigger label to "run-natlab-tests".

Updates #13038

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-14 11:53:13 -04:00
Fernando Serboncini 2a06fb66d0 cmd/cloner: preserve nil-valued entries when cloning map (#19749)
The codegen path for map-of-slice-of-pointer fields skipped
nil-valued entries. That dropped the key from the map.

This broke how dns.Config.Routes uses nil values as sentinels.

Fixes #19730
Fixes #19732
Fixes #19746
Fixes #19744

Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-14 10:30:59 -04:00
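
The behaviour that 2a06fb66d0 restores can be sketched by hand for a
map-of-slice-of-pointer field. This is not the generated code; resolver
and cloneRoutes are hypothetical names used only to show that a key with a
nil value survives the clone.

```go
package main

import "fmt"

// resolver is a hypothetical element type for the example.
type resolver struct{ Addr string }

// cloneRoutes deep-copies src. A nil slice value is copied as nil rather
// than skipped, so the key stays present in the result.
func cloneRoutes(src map[string][]*resolver) map[string][]*resolver {
	if src == nil {
		return nil
	}
	dst := make(map[string][]*resolver, len(src))
	for k, v := range src {
		if v == nil {
			dst[k] = nil // keep the key: nil is a meaningful sentinel
			continue
		}
		s := make([]*resolver, len(v))
		for i, r := range v {
			if r != nil {
				c := *r
				s[i] = &c
			}
		}
		dst[k] = s
	}
	return dst
}

func main() {
	src := map[string][]*resolver{
		"corp.example.": nil,
		"lab.example.":  {{Addr: "10.0.0.53"}},
	}
	dst := cloneRoutes(src)
	_, ok := dst["corp.example."]
	fmt.Println("nil-valued key preserved:", ok, "entries:", len(dst))
}
```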
Mike O'Driscoll 48919f708b util/linuxfw: fix nftables endianness and add connmark conditional check (#19725)
Fix the following issues:

1. Endianness Bug: The nftables runner used hardcoded
   big-endian byte arrays for firewall mark values (0xff0000, etc.), breaking
   bitwise operations on little-endian systems (all x86/x64, ARM). This caused
   connmark save/restore rules to silently fail. Fixed by using
   binary.NativeEndian to generate correct byte order for the host system.

2. Connmark Restore Conditional Check: The connmark restore
   mechanism unconditionally overwrote packet marks, even when Tailscale
   hadn't set any mark bits in conntrack. This destroyed mark bits set by
   other systems (VPNs, policy routing, vendor flags), breaking coexistence.
   Fixed by adding a conditional check to only restore when (ct mark &
   0xff0000) != 0, preventing the worst case of wiping all marks to zero.

Changes:
- util/linuxfw/linuxfw.go: Added nativeEndianUint32() helper and updated
  all mask functions to use native byte order instead of hardcoded bytes
- util/linuxfw/nftables_runner.go: Added conditional check in
  makeConnmarkRestoreExprs() to only restore when ct mark has Tailscale
  bits set; added detailed comment about bit preservation limitations
- util/linuxfw/iptables_runner.go: Added conditional check using -m
  connmark ! --mark to match nftables behavior
- Tests updated: Fixed byte-level regression tests to expect little-endian
  byte sequences and verify the new conditional check

Note: Perfect bit preservation in nftables remains challenging
due to nftables expression VM limitations. The current implementation
prevents the critical case of wiping marks with zero.

Updates #3310
Fixes #11803
Related to #8555

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-05-14 09:11:24 -04:00
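
Both fixes in 48919f708b can be illustrated in a few lines. The helper
name nativeEndianUint32 comes from the commit message, but its signature
here is an assumption; shouldRestoreMark and tailscaleMarkMask are
hypothetical names that restate the (ct mark & 0xff0000) != 0 condition.
binary.NativeEndian requires Go 1.21 or later.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tailscaleMarkMask is the mask value quoted in the commit message.
const tailscaleMarkMask uint32 = 0xff0000

// nativeEndianUint32 encodes v in the host's byte order. The signature is
// an assumption; only the name appears in the commit message.
func nativeEndianUint32(v uint32) []byte {
	b := make([]byte, 4)
	binary.NativeEndian.PutUint32(b, v)
	return b
}

// shouldRestoreMark reports whether the conntrack mark carries any bits
// under the mask, which is the condition for restoring the packet mark.
func shouldRestoreMark(ctMark uint32) bool {
	return ctMark&tailscaleMarkMask != 0
}

func main() {
	fmt.Printf("mask bytes on this host: % x\n", nativeEndianUint32(tailscaleMarkMask))
	fmt.Println(shouldRestoreMark(0x00000000)) // false: nothing to restore
	fmt.Println(shouldRestoreMark(0x00040000)) // true: a masked bit is set
	fmt.Println(shouldRestoreMark(0x00000001)) // false: only foreign bits set
}
```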
James Tucker e7415e6393 util/eventbus: unify Subscriber/SubscriberFunc cores; structural symmetry
Brings Subscriber[T] in line with the same non-generic-core pattern already
applied to SubscriberFunc[T] and Publisher[T]:

  - Renames subscriberFuncCore to subscriberCore and shares it between
    Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a
    *subscriberCore plus their respective per-T delivery state
    (Subscriber: chan T; SubscriberFunc: nothing, the user callback is
    captured in the dispatch closure).

  - The bus's outputs map and subscriber-interface itab key on
    *subscriberCore for both subscriber kinds, so adding a new Subscribe[T]
    call site no longer pays a per-T itab, dictionary, or equality function
    for the subscriber-interface side.

  - Subscribe[T] now hoists the non-generic constructor portion into
    newSubscriberCore (timer setup, core allocation, cached type/typeName,
    unregister method-value), matching SubscribeFunc.

The dispatch loop is intentionally NOT extracted to a non-generic helper for
Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send
'case s.read <- t:' must appear lexically inside the select; the only way to
lift it into a non-generic loop is to bridge typed and untyped via a per-event
goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep
dispatchTyped on the generic facade and accept the per-shape stencil cost as
the cheaper alternative.

Symbol-level effect on tailscaled (linux/amd64, measured via
`go tool nm -size`):

  Before:
    (*Subscriber[T]).dispatch
      2 shape stencils:        1,682 + 1,549 = 3,231 B
      3 thin per-T wrappers:   124 B each   =   372 B
      2 deferwrap1 helpers:    62 B each    =   124 B
      total:                                 3,727 B

  After:
    (*Subscriber[T]).dispatchTyped
      2 shape stencils:        1,678 + 1,582 = 3,260 B
      0 per-T wrappers (replaced by closure stored on core)
      2 deferwrap1 helpers:    62 B each    =   124 B
      total:                                 3,384 B

  dispatch path .text delta:                   -343 B (-9.2%)

Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) =
~2,700 B each on production tailscaled. The shape count matches before/after
(two distinct GC shapes for the Subscriber[T] event types in this binary).
What changes is that the per-T thin wrappers are eliminated because
Subscriber[T] no longer implements the subscriber interface directly.

Whole-binary section deltas:

  .text:        -2,304 B  (includes the dispatch savings plus other
                            small downstream effects)
  .rodata:        +512 B  (additional closure-type metadata)
  .gopclntab:   -2,981 B  (fewer per-T compiled functions => less metadata)

Stripped tailscaled (linux/amd64): no change at the file level (the savings
fall below the linker's section-alignment boundary). Unstripped builds shrink
by ~2,900 B.

Behavior is unchanged:
  BenchmarkBasicThroughput:       2,161 ns/op,  0 B/op,  0 allocs/op
  BenchmarkBasicFuncThroughput:   2,493 ns/op, 144 B/op, 2 allocs/op
  BenchmarkSubsThroughput:        3,727 ns/op,  0 B/op,  0 allocs/op

Updates #12614

Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-13 17:36:30 -07:00
Brad Fitzpatrick dc323b1351 derp/derpserver: collapse clients and clientsAtomic into one hashtriemap
Server.clientsAtomic was introduced in 6b729795c3 as a lock-free
mirror of Server.clients to skip Server.mu on the packet send hot
path. This drops the non-concurrent map and makes all the existing
callers of the old plain map just use the concurrent map, but still
holding Server.mu.

BenchmarkLookupDestHashTrie is unchanged at ~2ns/op.

Fixes #19726

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I0894e4d86914d152b9b5fef969a3184bcb96f678
2026-05-13 16:57:26 -07:00
Nick Khyl 4d68493144 health: avoid publishing health.Change when warnable visibility remains unchanged
Warnables with a non-zero TimeToVisible are only published on the eventbus when
they remain unhealthy long enough to become visible.

However, we still publish a health.Change when a warning that was never visible
(and was never published to the eventbus) becomes healthy.

This PR fixes that and reduces churn when there is no actual state change. In
particular, it avoids unnecessary IPN bus notifications sent to GUI/CLI clients,
captive portal detection, etc.

Updates tailscale/corp#39759 (noticed while working on it)

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2026-05-13 17:02:35 -05:00
Adriano Sela Aviles 41286c2b56 ipn/ipnlocal,tsd: add NoiseRoundTripper to tsd.Sys
Adds a new NoiseRoundTripper field to tsd.Sys
to expose an http.RoundTripper to make requests
over the control plane Noise connection.

This will be used in PAM use cases soon.

Updates tailscale/corp#41800

Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
2026-05-13 14:56:28 -07:00
Nick Khyl 32f984f54c net/dns: create a new hosts file if it doesn't exist on Windows
A missing hosts file is not a fatal error. We should log it, but still proceed
and create a new one instead of failing the DNS reconfiguration completely.

Fixes #19733

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2026-05-13 16:10:36 -05:00
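
The approach in 32f984f54c (log a missing hosts file, then create it) can
be sketched with the standard library. readOrCreate and hostsDemo are
hypothetical names, and the example uses a temporary directory instead of
the real Windows hosts path.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// readOrCreate reads path. If the file does not exist, it logs that fact,
// creates an empty file, and reports created=true instead of failing.
func readOrCreate(path string) (data []byte, created bool, err error) {
	data, err = os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		log.Printf("%s does not exist; creating a new one", path)
		if err := os.WriteFile(path, nil, 0o644); err != nil {
			return nil, false, err
		}
		return nil, true, nil
	}
	return data, false, err
}

// hostsDemo calls readOrCreate twice on a fresh temporary path and returns
// whether each call had to create the file.
func hostsDemo() (first, second bool, err error) {
	dir, err := os.MkdirTemp("", "hosts-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "hosts")
	if _, first, err = readOrCreate(path); err != nil {
		return false, false, err
	}
	_, second, err = readOrCreate(path)
	return first, second, err
}

func main() {
	first, second, err := hostsDemo()
	fmt.Println("created on first call:", first, "on second call:", second, "err:", err)
}
```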
Claus Lensbøl bb47ea2c6b tstest/natlab/vmtest: start migrating old natlab tests to vmtest (#19727)
Instead of having two entry points for running natlab tests, start
converting the connectivity tests to use the vmtest framework.

Grid and pair tests have yet to be moved over.

Updates #13038

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-13 16:44:53 -04:00
Fran Bull 3a6261b79b feature/conn25: keep addrAssignments through pool reconfig
Fixes tailscale/corp#40250

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-13 11:00:47 -07:00
Simon Law e4e59a2af0 wgengine/netstack: stop inject goroutine from leaking in Impl.Start (#19721)
This patch fixes a data race in wgengine/netstack that surfaced while
running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient.
Because these two tests both set up the TS_DEBUG_NETSTACK envknob, a
race happens because netstack.Impl.Close leaked its inject goroutine.
The inject goroutine also reads the TS_DEBUG_NETSTACK envknob, so if
it is still running when the next test starts, then it will break.

This patch also cleans up the tests a bit, ensuring that neither of
them runs in T.Parallel. It also adds a T.Cleanup call to clear the
envknob.

Fixes #19720

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-13 08:13:40 -07:00
Simon Law 6467f0d067 ipn/ipnlocal: fix minor typo in shouldUseOneCGNATRoute (#19719)
This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute
would claim that an Android machine was actually macOS.

Updates #cleanup
Updates #19652

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-12 21:55:29 -07:00
Brad Fitzpatrick 6b729795c3 derp/derpserver: use hashtriemap for peer lookup
Replace the process-global Server.mu lookup in the packet send hot path
with a global hashtriemap mirror of local clientSet entries. The
authoritative clients map remains guarded by Server.mu; clientsAtomic is
only a lock-free fast path for active local clients.

Misses, stale inactive client sets, duplicate accounting, and mesh
forwarding still fall back to lookupDestUncached. This avoids taking
Server.mu for the common local active-client send path, at the cost of
adding one global concurrent map that mirrors Server.clients for local
peers.

The benchmark uses four destination peers. The before run sets
TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup
path; the after run uses the hashtrie fast path.

    goos: linux
    goarch: amd64
    pkg: tailscale.com/derp/derpserver
    cpu: Intel(R) Xeon(R) 6975P-C
                          │    before     │                after                │
                          │    sec/op     │   sec/op     vs base                │
    LookupDestHashTrie-16   176.050n ± 1%   1.904n ± 6%  -98.92% (p=0.000 n=10)

                          │   before   │             after              │
                          │    B/op    │    B/op     vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

                          │   before   │             after              │
                          │ allocs/op  │ allocs/op   vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)

Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-12 16:08:16 -07:00
Adriano Sela Aviles 72578de033 ipn/{ipnlocal,localapi},client/local: add per-dst cap resolution for services
Adds two new cap resolution methods alongside the existing PeerCaps:

PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves
the service name to its VIP addresses via the node's service IP mappings
and returns caps scoped to that service. Exposed on /v0/whois via the
svc_name query parameter and on client/local.Client as WhoIsForService.

PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary
destination IP. Exposed on /v0/whois via the svc_addr query parameter
and on client/local.Client as WhoIsForIP.

svc_name takes priority over svc_addr when both are present. Invalid
values for either return 400. The existing PeerCaps/WhoIs path is
unchanged: without a service parameter, WhoIs returns only host-level
caps.

Updates tailscale/corp#41632

Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
2026-05-12 15:50:39 -07:00
DeedleFake ad8ead9c94 cmd/tailscale/cli: add RunWithContext
Fixes #12778

Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c
Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>
2026-05-12 12:27:55 -07:00
M. J. Fromberger 9f48567bf1 ipn/ipnlocal,wgengine/magicsock: add basic counters for cached peer connectivity (#19699)
Add new clientmetric counters for establishing contact with peers while using
cached network map data. To do this, instrument the magicsock.Conn with a bit
to indicate whether its peer data came from a cached netmap. If so, there are
two conditions we will count as establishing connectivity to a peer:

  - Receipt of a CallMeMaybe from a peer via disco.
  - Establishing a valid endpoint address for a peer.

In vmtest, add Env.ClientMetrics to scrape metrics from the specified node.
Use this to check that counters were updated in caching tests.

Updates https://github.com/tailscale/projects/issues/13
Updates #12639

Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-05-12 12:01:05 -07:00
James Tucker 120bfcf1cc util/eventbus: extract non-generic SubscriberFunc constructor body and cache type name
Two changes that share the same intent of reducing per-T duplication
in code that doesn't actually depend on T:

1. Hoist the non-generic portion of newSubscriberFunc[T] into a
   newSubscriberFuncCore() helper. The hoisted work is the
   timer setup, the subscriberFuncCore allocation, and the
   unregister closure (which captures only the non-generic
   reflect.Type and *subscribeState). The generic body now does
   only the two T-bound things it has to: compute reflect.TypeFor[T]
   and create the dispatch closure.

   Effect on the per-shape-stencil body of newSubscriberFunc[T]:
     before: 523 B per shape (in synthetic test)
     after:  293 B per shape (-230 B per shape; -44% on this body)

2. Cache reflect.Type.String() once at construction (in core.typeName)
   instead of recomputing it every time the dispatch closure runs.
   The dispatch closure also now takes the *subscriberFuncCore directly
   rather than building an intermediate dispatchFuncState struct on
   every call.

   Effect on the dispatch closure body (newSubscriberFunc[T].func1):
     before: 581 B per shape
     after:  480 B per shape (-101 B per shape; -17%)

Combined effect on tailscaled (linux/amd64):
  named-symbol savings via symcost: ~7 KB
  stripped binary delta:            -8 KB (page-quantized)
  arm64 binary delta:                0 (page-quantized)

  cumulative reduction from baseline (5167ff412):
    linux/amd64:  -110,592 bytes (-0.391%)
    linux/arm64:  -131,072 bytes (-0.499%)

Throughput is also improved by the typeName cache: BenchmarkBasic
goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot
path no longer allocates a string on every event.
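
The two changes above can be illustrated with a much-reduced sketch. The names core, newCore, and newSubscriber are hypothetical and are not the identifiers used in util/eventbus; the point is only that the generic function keeps the two T-bound steps while the rest, including the cached type name, lives in non-generic code.

```go
package main

import (
	"fmt"
	"reflect"
)

// core holds everything that does not depend on T.
type core struct {
	typ      reflect.Type
	typeName string // computed once at construction, not per event
	dispatch func(c *core, ev any) bool
}

// newCore is non-generic, so it is compiled once no matter how many
// event types are subscribed.
func newCore(typ reflect.Type, dispatch func(*core, any) bool) *core {
	return &core{typ: typ, typeName: typ.String(), dispatch: dispatch}
}

// newSubscriber does only the T-bound work: compute the reflect.Type
// and build the closure that performs the type assertion.
func newSubscriber[T any](fn func(T)) *core {
	return newCore(reflect.TypeFor[T](), func(c *core, ev any) bool {
		v, ok := ev.(T)
		if !ok {
			return false // event of another type; not delivered
		}
		fn(v)
		return true
	})
}

func main() {
	s := newSubscriber(func(v string) { fmt.Println("got", v) })
	s.dispatch(s, "hello")
	fmt.Println(s.typeName, s.dispatch(s, 42))
}
```

Passing the core into the dispatch closure, rather than building a per-call state struct, mirrors the second change described above.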

Updates #12614

Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-12 11:16:04 -07:00
Brad Fitzpatrick 758ebe9839 tstest/natlab/vmtest: use short paths for Unix sockets
macOS limits Unix socket paths to 104 bytes. The Go test TempDir
path (e.g. /var/folders/.../TestDirectConnection...679197086/001/)
easily exceeds that, causing "bind: invalid argument". Create a
short /tmp/vmtest* directory for all socket files (vnet, QMP,
dgram) so the paths stay well under the limit on every platform.
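
A minimal sketch of the approach, assuming a helper pair named shortSockDir and sockPath (both hypothetical, not the functions in vmtest). The length check uses >= on the assumption that the 104-byte limit includes the trailing NUL.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// maxSockPath is the macOS limit cited in the commit message.
const maxSockPath = 104

// shortSockDir creates a short directory under /tmp for socket files
// and returns a cleanup function that removes it.
func shortSockDir() (dir string, cleanup func(), err error) {
	dir, err = os.MkdirTemp("/tmp", "vmtest")
	if err != nil {
		return "", nil, err
	}
	return dir, func() { os.RemoveAll(dir) }, nil
}

// sockPath joins dir and name and rejects results that would not fit.
func sockPath(dir, name string) (string, error) {
	p := filepath.Join(dir, name)
	if len(p) >= maxSockPath {
		return "", fmt.Errorf("socket path %q is %d bytes; limit is %d", p, len(p), maxSockPath)
	}
	return p, nil
}

func main() {
	dir, cleanup, err := shortSockDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	p, err := sockPath(dir, "qmp.sock")
	fmt.Println(p, err)
}
```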

Updates #13038

Change-Id: I721d24561d1766aaa964692bc77f40a131aa9455
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-11 21:54:27 -07:00
Brad Fitzpatrick f4c5613156 tstest/natlab/vmtest: don't require KVM; use TCG on macOS
startCloudQEMU hardcoded -machine q35,accel=kvm and -cpu host,
which fails on any host without KVM (notably macOS). Replace
with a qemuAccelArgs helper that probes /dev/kvm and falls back
to QEMU's TCG software emulation, matching the pattern already
used by tstest/integration/nat. Also wire the helper into
startGokrazyQEMU so gokrazy VMs pick up KVM when available.
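
The probe-and-fall-back pattern might look like the sketch below. accelArgs is a hypothetical stand-in for the qemuAccelArgs helper, whose real signature is not shown here; it takes the device path as a parameter so the fallback can be exercised. The TCG flags (accel=tcg, -cpu max) are assumptions; only the KVM flags appear in the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// accelArgs returns QEMU acceleration flags. It uses KVM when the
// device node at kvmPath can be opened read-write, and otherwise
// falls back to TCG software emulation.
func accelArgs(kvmPath string) []string {
	f, err := os.OpenFile(kvmPath, os.O_RDWR, 0)
	if err != nil {
		// No usable KVM (for example on macOS): assumed TCG flags.
		return []string{"-machine", "q35,accel=tcg", "-cpu", "max"}
	}
	f.Close()
	return []string{"-machine", "q35,accel=kvm", "-cpu", "host"}
}

func main() {
	fmt.Println(accelArgs("/dev/kvm"))
}
```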

Updates #13038

Change-Id: I7745518db823279b1880957bb14ca2ffdaab4c50
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-11 19:18:17 -07:00
Brad Fitzpatrick e062b46984 tstest/natlab, .github/workflows: add opt-in natlab CI workflow
The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat
tests are gated behind --run-vm-tests because they need KVM and are
slow. Until now nothing in CI exercised them apart from a single
canary TestEasyEasy run on every PR.

Add .github/workflows/natlab-test.yml that runs the full opt-in suite
on demand (workflow_dispatch), on PRs labeled "natlab", and on main
every 12 hours via cron. The workflow has two phases:

  - "prepare" builds the gokrazy VM image, downloads the Ubuntu and
    FreeBSD cloud images once via the new natlabprep tool, and emits
    a dynamic JSON matrix of every TestX function it finds in the two
    opt-in packages.
  - "test" is a per-test matrix that depends on prepare. Each matrix
    job restores the shared caches and runs a single test, so adding
    a new TestFoo is automatically picked up on the next run without
    any workflow edits.
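
As a rough illustration of how a dynamic matrix can be derived from test source, the sketch below finds top-level TestX functions with a regular expression and emits a JSON object. findTests and matrixJSON are hypothetical helpers, the {"test": [...]} shape is an assumed matrix layout, and the actual workflow may discover tests differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// testFuncRx matches top-level test functions taking *testing.T.
// TestMain takes *testing.M and is therefore not matched.
var testFuncRx = regexp.MustCompile(`(?m)^func (Test\w*)\(\w+ \*testing\.T\)`)

// findTests returns the names of the test functions declared in src,
// in source order.
func findTests(src string) []string {
	var names []string
	for _, m := range testFuncRx.FindAllStringSubmatch(src, -1) {
		names = append(names, m[1])
	}
	return names
}

// matrixJSON encodes the names as a one-dimensional matrix object.
func matrixJSON(names []string) (string, error) {
	b, err := json.Marshal(map[string][]string{"test": names})
	return string(b), err
}

func main() {
	src := "package x\n\nfunc TestEasyEasy(t *testing.T) {}\n\nfunc helper() {}\n"
	out, err := matrixJSON(findTests(src))
	fmt.Println(out, err)
}
```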

Rename the existing natlab-integrationtest.yml to natlab-basic.yml
since it's the small smoke variant (just TestEasyEasy on every PR);
the new natlab-test.yml is the bigger suite. The job inside is
renamed to EasyEasy for the same reason.

Move the macOS arm64 host check from vmtest.Env.Start into
vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips
immediately on a non-macOS host, and add an explicit
skipIfNotMacOSArm64 helper at the top of the two macOS-only tests
so the platform requirement is obvious to readers.

Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default
(it was the overwhelming majority of bytes in CI logs, with no signal
in healthy runs) and replace it with a periodic "still waiting" line
that only fires after 10s, so a truly stuck agent connection still
surfaces.

Updates #13038

Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-11 17:14:46 -07:00
James Tucker 4eec4423b4 util/eventbus: move Publisher publisher-interface impl to a non-generic core
Mirrors the same refactor previously applied to SubscriberFunc:

  - Publisher[T]: a thin user-facing facade. Holds a pointer to a
    non-generic publisherCore and exposes Publish/Close/ShouldPublish.
  - publisherCore: a non-generic struct that owns the *Client back-
    pointer, stop flag, and cached reflect.Type. It implements the
    package-private publisher interface (publishType, Close).
    The bus's per-Client publisher set is set.Set[publisher] keyed
    on this single non-generic type.

The publisher interface only exists to support diagnostic
introspection (Debugger.PublishTypes returning the list of types a
client publishes). Previously, satisfying that diagnostic-only
interface forced *Publisher[T] to be the implementor and cost a
per-T itab, generic dictionary, and equality function on every
event type ever passed through Publish[T]. Moving the
implementation to a non-generic core lets the diagnostic surface
work unchanged while charging zero per-T cost for the
diagnostic-driven generic interface.

Publisher[T].Publish is also slimmed: the channel/select/stopFlag
loop is now a non-generic publish() helper that takes the value as
'any'. The per-T body is reduced to forwarding the boxed value to
the helper.
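
The facade/core split can be sketched as below. This is a simplified model, not the eventbus code: the field names, the NewPublisher constructor, and the plain channel standing in for the bus are all assumptions. Only publisherCore satisfies the publisher interface, so the interface costs nothing per event type.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// publisher is the package-private interface used for introspection.
type publisher interface {
	publishType() reflect.Type
	Close()
}

// publisherCore is non-generic and is the only implementor of publisher.
type publisherCore struct {
	typ  reflect.Type
	out  chan any      // stand-in for the bus's input (assumption)
	done chan struct{} // closed by Close
	once sync.Once
}

var _ publisher = (*publisherCore)(nil)

func (c *publisherCore) publishType() reflect.Type { return c.typ }
func (c *publisherCore) Close()                    { c.once.Do(func() { close(c.done) }) }

// publish is the non-generic helper; it receives the value boxed as any.
func (c *publisherCore) publish(v any) {
	select {
	case <-c.done:
		return // already closed: drop the value
	default:
	}
	select {
	case c.out <- v:
	case <-c.done:
	}
}

// Publisher is the thin generic facade; its per-T body only forwards.
type Publisher[T any] struct{ core *publisherCore }

func NewPublisher[T any](out chan any) *Publisher[T] {
	return &Publisher[T]{core: &publisherCore{
		typ: reflect.TypeFor[T](), out: out, done: make(chan struct{}),
	}}
}

func (p *Publisher[T]) Publish(v T) { p.core.publish(v) }
func (p *Publisher[T]) Close()      { p.core.Close() }

func main() {
	out := make(chan any, 1)
	p := NewPublisher[int](out)
	p.Publish(7)
	fmt.Println(<-out, p.core.publishType())
	p.Close()
}
```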

Measured impact (util/eventbus/sizetest):

  total per-flow binary cost:
    linux/amd64:  2252.8 B/flow -> 1900.5 B/flow  (-352.3 B / -15.6%)
    linux/arm64:  2228.2 B/flow -> 1835.0 B/flow  (-393.2 B / -17.6%)

  Publisher per-receiver attribution:
    linux/amd64:   635.2 B/flow ->  369.6 B/flow  (-265.6 B / -41.8%)
    linux/arm64:   751.7 B/flow ->  373.2 B/flow  (-378.5 B / -50.4%)

Cumulative reduction from the original baseline (5167ff412):
    linux/amd64:  3096.6 B/flow -> 1900.5 B/flow  (-1196.1 B / -38.6%)
    linux/arm64:  3145.7 B/flow -> 1835.0 B/flow  (-1310.7 B / -41.7%)

Dropped per-T symbols (200-flow eventbus binary):

  - .dict.Publisher[T]                   was 14,400 B (72 B/T)
  - type:.eq.Publisher[T]                was 11,832 B (58 B/T)
  - go:itab.*Publisher[T],publisher      was  8,000 B (40 B/T)
  - (*Publisher[T]).Close shape stencils collapsed to 1

Behavior is unchanged: BenchmarkBasicThroughput is within noise
(2018 -> 2038 ns/op at -benchtime=2s) and all eventbus tests pass.

Updates #12614

Change-Id: I61979c2bf95d2a711c2321e6e0b4b7d15980e9f5
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-11 14:39:42 -07:00
James Tucker d72cde1a6b util/eventbus: move SubscriberFunc subscriber-interface impl to a non-generic core
Splits SubscriberFunc[T] into:

  - SubscriberFunc[T]: a thin user-facing facade that holds only a
    pointer to a non-generic core. It exposes Close() to user code,
    which forwards to the core.
  - subscriberFuncCore: a non-generic struct that owns all the
    subscriber state (stop flag, unregister, logf, slow timer,
    cached reflect.Type) and implements the bus's package-private
    subscriber interface. Its dispatch() invokes a closure
    captured at construction time that performs the
    vals.Peek().Event.(T) type assertion and runs the user
    callback on the unboxed value.

The bus's outputs map and subscriber-interface itab are
parameterized only by *subscriberFuncCore, not by T, eliminating
both the per-T itab and the per-T generic dictionary that
previously scaled with the number of subscribed event types.

Measured impact (util/eventbus/sizetest):

  total per-flow binary cost:
    linux/amd64:  3039.2 B/flow -> 2252.8 B/flow  (-786.4 B / -25.9%)
    linux/arm64:  3145.7 B/flow -> 2228.2 B/flow  (-917.5 B / -29.2%)

  SubscriberFunc per-receiver attribution:
    linux/amd64:   840.8 B/flow ->  300.8 B/flow  (-540.0 B / -64.2%)
    linux/arm64:   849.9 B/flow ->  303.8 B/flow  (-546.1 B / -64.3%)

Dropped per-T symbols (200-flow eventbus binary):

  - (*SubscriberFunc[T]).dispatch     was 26,639 B total (130 B/T)
  - (*SubscriberFunc[T]).subscribeType was  3,600 B total ( 18 B/T)
  - .dict.SubscriberFunc[T]            was 14,400 B total ( 72 B/T)
  - go:itab.*SubscriberFunc[T],...     was  9,600 B total ( 48 B/T)

Of the 840.8 B/flow originally attributed to SubscriberFunc on
linux/amd64, 540 B/flow is now gone, dropping the receiver to about
300 B/flow.

Behavior is unchanged: BenchmarkBasicThroughput is within noise
(1955 -> 1941 ns/op on the test box) and all eventbus tests pass.

Updates #12614

Change-Id: I646b3b05fd8d95f9afead59bfd0f69cd18b7a709
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-11 12:14:05 -07:00