WIP: rebase for 2026-05-18 #7
Add a --headless flag to the Host.app Run subcommand for running macOS VMs without a GUI, enabling use from test frameworks. Key changes: - HostCli.swift: When --headless is set, run the VM via VMController + RunLoop.main.run() instead of NSApplicationMain. Using the RunLoop (not dispatchMain) is required because VZ framework callbacks depend on RunLoop sources. - VMController.swift: Add headless parameter to createVirtualMachine that configures a single socket-based NIC (no NAT NIC). This matches the NIC configuration used when creating/saving VMs, so saved state restoration works correctly. A NIC count mismatch causes VZ to silently fail to execute guest code. - TailMacConfigHelper.swift: Clean up socket network device logging. - Config.swift: Move VM storage from ~/VM.bundle to ~/.cache/tailscale/vmtest/macos/. - TailMac.swift: Fix dispatchMain→RunLoop.main.run() in the create command (same VZ RunLoop requirement). Updates #13038 Change-Id: Iea51c043aa92e8fc6257139b9f0e2e7677072fa2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>Add an opt-in metrics.LabelMap tracking why patchifyPeer fails to convert a PeersChanged entry into a PeersChangedPatch. The stats are gated behind the TS_DEBUG_PATCHIFY_PEER_MISS envknob so there is zero overhead in normal operation. peerChangeDiff now takes an optional onFalse callback that is called with the field name on every non-patchable return path. When the envknob is off, nil is passed and replaced with a no-op at the top of peerChangeDiff. 
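The optional-callback pattern described for peerChangeDiff can be sketched as follows. This is a minimal illustration only: the `peer` type, the `diffPatchable` function, and the fields it compares are hypothetical stand-ins, not the real tailcfg types or the actual peerChangeDiff logic. A plain map stands in for metrics.LabelMap.

```go
package main

import "fmt"

// peer is a hypothetical, simplified stand-in for a node record.
type peer struct {
	Name     string
	Hostinfo string
	Online   bool
}

// diffPatchable reports whether the change from old to cur could be sent as
// a small patch. If onFalse is non-nil, it is called with a reason on every
// false return path.
func diffPatchable(old, cur *peer, onFalse func(why string)) bool {
	if onFalse == nil {
		// Replace nil with a no-op once, so the return paths below
		// need no nil checks.
		onFalse = func(string) {}
	}
	if old == nil {
		onFalse("peer_not_found")
		return false
	}
	if old.Name != cur.Name {
		onFalse("Name")
		return false
	}
	if old.Hostinfo != cur.Hostinfo {
		onFalse("Hostinfo")
		return false
	}
	return true // in this sketch, only Online may differ
}

func main() {
	misses := map[string]int{}
	count := func(why string) { misses[why]++ }

	a := &peer{Name: "a", Hostinfo: "v1"}
	b := &peer{Name: "a", Hostinfo: "v2"}
	diffPatchable(a, b, count)   // records "Hostinfo"
	diffPatchable(nil, b, count) // records "peer_not_found"
	diffPatchable(a, b, nil)     // stats disabled: nothing recorded
	fmt.Println(misses)
}
```

Passing nil when the knob is off keeps the caller's hot path free of any counting work, which is the zero-overhead property the commit message describes.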
The resulting metric renders as: counter_patchify_miss{why="Hostinfo"} 2 counter_patchify_miss{why="peer_not_found"} 1170 Updates tailscale/corp#40088 Change-Id: I2d4b9074bf42ec03ab296c0629a54106bafa873e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>Before: tka initialized at head 325557575a59525354484e4a534f494b4c4e56575435583737564b5036584c4d4c335534554255344c344c36484c5a444a323341 After: tka initialized at head 2UWWZYRSTHNJSOIKLNVWT5X77VKP6XLML3U4UBU4L4L6HLZDJ23A Printing the AUM hash as hex makes it difficult to compare to other AUM hashes; stringifying it will make it consistent with other printing. Updates #cleanup Change-Id: Ic1e23a9ce6a71a53cff7d2190f9fa06eb838ab89 Signed-off-by: Alex Chan <alexc@tailscale.com>Adds a CI check to keep opted-in directories' README.md files in sync with their package godoc. For now tsnet (and its sub-packages under tsnet/example) is the only opted-in tree. The list of directories lives in misc/genreadme/genreadme.go as defaultRoots, so CI and humans both just run `./tool/go run ./misc/genreadme` with no arguments. The check piggybacks on the existing go_generate job in test.yml and fails if any README.md is out of date, pointing the user at the same command. Along the way: - tempfork/pkgdoc now emits Markdown instead of plain text: headings become level-2 with no {#hdr-...} anchors, and [Symbol] doc links resolve to pkg.go.dev URLs, including for symbols in the current package (which the default Printer would otherwise emit as bare #Name fragments with no backing anchor in a README). Parsing no longer uses parser.ImportsOnly, so doc.Package knows the package's symbols and can resolve [Symbol] links at all. - genreadme also emits a pkg.go.dev Go Reference badge at the top of a library package's README; suppressed for package main. 
- tsnet/tsnet.go's package godoc is expanded in idiomatic godoc syntax — [Type], [Type.Method], reference-style [link]: URL definitions — rather than Markdown-flavored [text](url) or backtick-quoted identifiers, so that both pkg.go.dev and the generated README.md render cleanly from a single source. Fixes #19431 Fixes #19483 Fixes #19470 Change-Id: I8ca37e9e7b3bd446b8bfa7a91ac548f142688cb1 Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Walter Poupore <walterp@tailscale.com> Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>CmdName was re-opening the running executable and scanning it in 64KiB chunks for the Go modinfo markers on every call. The same modinfo is already parsed at startup and exposed via runtime/debug.ReadBuildInfo, so prefer that on non-Windows. Windows still takes the scanning path because its GUI-binary override keys off the on-disk executable name. benchstat of BenchmarkCmdName (Linux, before vs after): goos: linux goarch: amd64 pkg: tailscale.com/version cpu: Intel(R) Xeon(R) 6975P-C │ /tmp/old.txt │ /tmp/new.txt │ │ sec/op │ sec/op vs base │ CmdName-16 556045.5n ± 1% 825.6n ± 1% -99.85% (p=0.000 n=10) │ /tmp/old.txt │ /tmp/new.txt │ │ B/op │ B/op vs base │ CmdName-16 64.587Ki ± 0% 1.156Ki ± 0% -98.21% (p=0.000 n=10) │ /tmp/old.txt │ /tmp/new.txt │ │ allocs/op │ allocs/op vs base │ CmdName-16 8.000 ± 0% 7.000 ± 0% -12.50% (p=0.000 n=10) Fixes #19486 Change-Id: I925c5e28b64815a602459beb6c8dab8779339a6c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>Previously, handleLocalPackets intercepted traffic to the Tailscale service IP (100.100.100.100 / fd7a:115c:a1e0::53) only for an allow-list of ports: TCP 53/80/8080 and UDP 53. Any other port returned filter.Accept, letting the packet fall through to the ACL filter and wireguard-go, which would attempt a peer lookup. 
No peer owns the quad-100 AllowedIP, so after ~5s pendopen.go would log: open-conn-track: timeout opening ...; no associated peer node This is the common "conntrack error no peer found for 100.100.100.100:853" log spam seen in the wild (e.g. from systemd-resolved or another resolver speculatively trying DoT on quad-100). It also leaks quad-100 packets onto the tailnet. Remove the port allow-list so handleLocalPackets absorbs every quad-100 packet into netstack regardless of IP protocol or port. Traffic never reaches the conntrack / peer-routing layers. With the allow-list gone, acceptTCP needs a corresponding guard: on a quad-100 TCP port we don't serve, execution used to fall through to the isTailscaleIP case (quad-100 is in the tailscale IP range), which rewrote the dial target to 127.0.0.1:<port> and forwardTCP'd the connection to whatever happened to be listening on the host's loopback at that port. Add a hittingServiceIP case that RSTs cleanly instead, placed before the isTailscaleIP fallthrough. TestQuad100UnservedTCPPortDoesNotForward is a new integration test that injects a TCP SYN to 100.100.100.100:853 via handleLocalPackets, stubs forwardDialFunc, and asserts the dialer is not invoked; it catches regressions of the acceptTCP recursion/loopback-redirection case. Fixes #15796 Fixes #19421 Updates #3261 Updates #11305 Signed-off-by: James Tucker <james@tailscale.com>Add a Go benchmark that exercises a single tailnet client (a [tsnet.Server] running in the test process) against a synthetic large initial netmap and a stream of caller-driven peer add/remove deltas, all in-process. The harness is split in two parts: - tstest/largetailnet, a reusable package containing a [Streamer] that hijacks the map long-poll on a [testcontrol.Server] via the new AltMapStream hook, sends one initial MapResponse with N synthetic peers, and forwards caller-supplied delta MapResponses on the same stream. 
Helpers like MakePeer / AllocPeer build synthetic peers with unique IDs and addresses derived from the Tailscale ULA range. - tstest/largetailnet/largetailnet_test.go, BenchmarkGiantTailnet (headless tailscaled workload, no IPN bus subscriber) and BenchmarkGiantTailnetBusWatcher (GUI-client workload with one Notify subscriber attached). Both are gated on --actually-test-giant-tailnet (skipped by default), stand up an in-process testcontrol + tsnet.Server, let Up block until the initial N-peer netmap has been processed, then ResetTimer and run add+remove pairs via b.Loop. Per-delta sync is via a test-only [ipnlocal.LocalBackend.AwaitNodeKeyForTest] channel that closes once the just-added peer key appears in the netmap (no-watcher variant) or via bus-Notify drain (bus-watcher variant). To support the hijack, [testcontrol.Server] grows an AltMapStream hook and a small MapStreamWriter interface for benchmarks/stress tests that need to drive a controlled MapResponse sequence; the normal serveMap path is untouched when AltMapStream is nil. The streamer answers non-streaming "lite" map polls (which controlclient issues before the streaming long-poll to push HostInfo) with an empty MapResponse and returns immediately, so the streaming poll that follows is the one that gets the initial netmap. The benchmark is intended for before/after comparisons of netmap- and delta-handling changes targeted at large tailnets. CPU profiles on unmodified main show the expected O(N) hotspots: setControlClientStatusLocked / authReconfigLocked / userspaceEngine.Reconfig / setNetMapLocked, plus JSON encoding of the full Notify.NetMap to bus watchers (which dominates the BusWatcher variant). 
Median ms/op over 10 runs on unmodified main, by tailnet size N: N no-watcher bus-watcher 10000 32 166 50000 222 865 100000 504 1765 250000 1551 4696 Recommended invocation: go test ./tstest/largetailnet/ -run=^$ \ -bench='BenchmarkGiantTailnet(BusWatcher)?$' \ -benchtime=2000x -timeout=10m \ --actually-test-giant-tailnet \ --giant-tailnet-n=250000 \ -cpuprofile=/tmp/giant.cpu.pprof Updates #12542 Change-Id: I4f5b2bb271a36ba853d5a0ffe82054ef2b15c585 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>The Online bit in PeerStatus comes from control's last-known state and can lag reality, so gating "tailscale file cp" on it is both unreliable and pushes correctness onto the server. Just try the push directly. In runCp, when the target's PeerStatus says it's offline, no longer fail upfront; getTargetStableID returns the StableID anyway. Replace the static "is offline" warning with a 3-second timer armed for the first file: if the timer fires before peerAPI bytes have flowed, we print a warning to stderr. The wording depends on whether control reported the peer offline ("is reportedly offline; trying anyway") or online ("is not replying; trying anyway"). The warning is printed with a leading vt100 clear-line and a trailing newline so it doesn't get painted over by the progress redraw and so the next progress redraw lands on a fresh line below it. Both the timer disarm and the progress display now read from tailscaled's OutgoingFile.Sent (subscribed via WatchIPNBus) instead of the local-body counter. That's the difference between bytes-acked-by- local-tailscaled (what countingReader.n was measuring; useless for detecting an unreachable peer because for small files net/http buffers the entire body into the unix-socket conn before the peerAPI dial has even started) and bytes-pulled-toward-peerAPI (what tailscaled is actually doing, reflected in OutgoingFile.Sent). The previous code reported 100% within milliseconds for a 3 KiB file even when the peer was unreachable. 
Add --update-interval (default 250ms) to control the progress repaint cadence; zero or negative disables the progress display entirely. The printer now also stops repainting once it observes Sent at full size with a near-zero rate for >2s, so a stuck transfer doesn't keep clobbering whatever the rest of runCp is trying to print.

Updates #18740
Change-Id: I189bd1c2cd8e094d372c4fee23114b1d2f8024b4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

When --vmtest-web is set, Host.app is launched with --screenshot-port 0 to start a localhost HTTP server that captures the VZVirtualMachineView display. The Go test harness parses the SCREENSHOT_PORT=<port> line from stdout, then polls every 2 seconds for JPEG thumbnails and pushes them over WebSocket to the web dashboard. Clicking a screenshot thumbnail opens a full-resolution image proxied through the web UI's /screenshot/{node} endpoint. Screenshot events are excluded from the EventBus history (they're large and only the latest matters, stored in NodeStatus.Screenshot).

Updates #13038
Change-Id: I9bc67ddd1cc72948b33c555d4be3d8db06a41f6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

DNSConfig resource (#19429) a29e42135b

Two cloud-platform nodes (e.g. sr-a and sr-b in TestSiteToSite) boot in parallel via errgroup and both call ensureCompiled and the inline image preparation block, racing to Begin() the same shared *Step (which is deduped by name in Env.Step). The second goroutine panics:

    panic: Step "Compile linux_amd64 binaries": Begin called in state running
    panic: Step "Prepare ubuntu-24.04 image": Begin called in state done

ensureCompiled had a TOCTOU dedup attempt (released compileMu before doing the work, only added to the compiled set at the end), and image preparation had no dedup at all. Replace the compiled set with a per-key map[string]*sync.Once for each of compile and image preparation, so concurrent callers serialize on the Once and only the first executes Begin/work/End.

Fixes commit 02ffe5baa8.
Updates #13038 Change-Id: If710bcc9e0aafebf0ad5b61553bae11458d976d7 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>Add a vmtest that brings up two gokrazy nodes A and B behind two One2OneNAT networks (so direct UDP works in both directions and any slowness can't be blamed on NAT traversal), establishes a WireGuard tunnel A → B with TSMP, then rotates B's disco key four times and asserts that the data plane recovers in both directions after each rotation. All pings are TSMP (the data-plane ping; disco pings would not exercise the WireGuard tunnel itself). The five pings: 1. A → B (initial; brings up the tunnel; 30s budget) 2. B → A after rotate (LocalAPI rotate-disco-key debug action) 3. A → B after rotate (LocalAPI) 4. B → A after restart (SIGKILL; gokrazy supervisor respawns) 5. A → B after restart (SIGKILL) Each post-rotation ping gets a 15-second budget. Two unavoidable multi-second waits dominate today: - The rotate-then-a→b phase takes ~10s on main because of LazyWG. After B's WantRunning bounce, B's wgengine resets its sentActivityAt/recvActivityAt maps and trims A out of the wireguard-go config as an "idle peer"; B only re-adds A on inbound activity, by which point A's first few TSMP packets have been silently dropped at B's tundev. The bradfitz/rm_lazy_wg branch removes that trimming entirely (verified locally: this phase drops to <100ms there). - The restart phases take ~5s for wireguard-go's RekeyTimeout handshake retry. After SIGKILL+respawn the first WG handshake init from the restarted node sometimes goes into the void (likely the brief peer-removed window in the receiver's two-step maybeReconfigWireguardLocked reconfig during which the peer is absent from wireguard-go), and wg-go's 5s+jitter retransmit timer is the next opportunity to retry. That retry succeeds and the staged TSMP packet flushes. Intrinsic to the protocol's retransmit policy. 
Once LazyWG is removed and the first-handshake-after-reconfig race is fixed, the budget should drop to 5s. Supporting changes: ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and back on after rotating the disco key. magicsock.Conn.RotateDiscoKey only resets local disco state; without also dropping wireguard-go session keys, peers keep encrypting with their stale per-peer session against us until their rekey timer fires (WireGuard has no data-plane signaling to invalidate sessions). Bouncing WantRunning runs the engine through Reconfig(empty) → authReconfig, which drops every peer's WG session so the next packet either way triggers a fresh handshake. ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys" LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns a map[NodePublic]DiscoPublic from the current netmap. Tests reach it via [local.Client.DebugResultJSON]. We do not surface disco keys via [ipnstate.PeerStatus] because adding a non-comparable [key.DiscoPublic] field there breaks reflect-based test helpers (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and general LocalAPI clients have no need for disco keys. Since the debug LocalAPI is gated behind the ts_omit_debug build tag, this endpoint is automatically stripped from small binaries. cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk) to drive the SIGKILL phase. On gokrazy the supervisor respawns tailscaled within a second. tstest/integration/testcontrol: add Server.AllOnline. When set, every peer entry in MapResponses is marked Online=true. Several disco-key handling fast paths in controlclient and wgengine (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire for online peers; without this flag, tests exercising disco-key rotation only hit the offline-peer code paths, which mask issues and are several seconds slower in this scenario. 
Finer-grained per-node online tracking can be added later. tstest/natlab/vmtest: add Env.RotateDiscoKey, Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an [AllOnline] EnvOption that plumbs through to testcontrol.Server.AllOnline, and an exported Env.Ping(from, to, type, timeout). Ping replaces the unexported helper so callers can specify both a ping type (PingDisco for warming peer state, PingTSMP for asserting end-to-end connectivity) and a deadline. PeerDiscoKey returns its LocalAPI error so callers inside tstest.WaitFor can retry transient failures rather than fataling the test.

Updates #12639
Updates #13038
Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

If a user passes --advertise-tags=foo,bar (with no colons in any segment), automatically prepend "tag:" client-side so it goes on the wire as "tag:foo,tag:bar". Segments that already contain a colon are left untouched and must be fully-qualified ("tag:foo"), which keeps the door open for future colon-bearing syntax.

This was originally added in cd07437ad (2020-10-28) and then reverted in 1be01ddc6 (2020-11-10) over forward-compatibility concerns. But on 2026-04-29 it was realized that this was always safe for future extensibility anyway (tags can't contain colons; tag:foo:bar is invalid, per the 2020 CheckTag restrictions). So if we later wanted some hypothetical --advertise-tags=tagset:setfoo or "group:foo", we'd still have syntax available for it, as it can't conflict with tag:group:foo.
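The auto-qualification rule can be sketched in a few lines. `qualifyTags` is a hypothetical helper name for this illustration, and the real CLI code may validate or trim segments differently. The sketch does no tag validation at all.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifyTags prepends "tag:" to every comma-separated segment that contains
// no colon. Segments that already contain a colon are passed through as-is.
func qualifyTags(flagVal string) string {
	if flagVal == "" {
		return ""
	}
	segs := strings.Split(flagVal, ",")
	for i, s := range segs {
		if !strings.Contains(s, ":") {
			segs[i] = "tag:" + s
		}
	}
	return strings.Join(segs, ",")
}

func main() {
	fmt.Println(qualifyTags("foo,bar"))     // tag:foo,tag:bar
	fmt.Println(qualifyTags("tag:foo,bar")) // tag:foo,tag:bar
	fmt.Println(qualifyTags("group:foo"))   // group:foo (left untouched)
}
```

Because the rule keys only on the presence of a colon, any future prefixed form passes through unchanged and can be accepted or rejected by later validation.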
Avery signed off on this on Slack: "Ok, I withdraw my objection to auto-qualifying tag names in advertise-tags and I hope I won't regret it :)" Updates #861 Change-Id: I06935b0d3ae909894c95c9c2e185b7d6a219ff32 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>Add two narrower accessors alongside the existing [LocalBackend.NetMap], with docs that distinguish their semantics: - NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with a possibly-stale Peers slice). For callers that only read non-Peers fields like SelfNode, DNS, PacketFilter, capabilities. - NetMapWithPeers: documented as returning an up-to-date Peers slice. For callers that genuinely need to iterate Peers or call PeerByXxx. Mark the existing NetMap deprecated and point readers at the two new accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently return the same value (b.currentNode().NetMap()): this commit is a no-op behaviorally, just a renaming and migration of in-tree callers. A subsequent change in the same series will switch NetMapWithPeers to actually rebuild the Peers slice from the live per-node-backend peers map (O(N) per call), at which point the distinction between the two new accessors becomes load-bearing. Migrate in-tree callers to the appropriate accessor based on what fields they read: - NetMapNoPeers (most common): localapi handlers, peerapi accept, GetCertPEMWithValidity, web client noise request, doctor DNS resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh SSH-policy/cap reads, several LocalBackend internals (isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork nil-check, serve config). - NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to disk for fast restart), PeerByTailscaleIP lookup. Tests still call the legacy NetMap; they'll see the deprecation warning but otherwise behave identically. 
Also add two pieces of plumbing the next change in this series will need, but which are already useful on their own: - [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON] that decodes directly into a target type T, avoiding the marshal/unmarshal roundtrip callers otherwise need. - localapi "current-netmap" debug action: returns the current netmap (with peers) as JSON. Documented as debug-only — the netmap.NetworkMap shape is internal and may change without notice. This commit is part of a series breaking up a larger change for review; on its own it is a no-op refactor. Updates #12542 Change-Id: Idbb30707414f8da3149c44ca0273262708375b02 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>Add two narrow LocalAPI accessors so callers don't have to subscribe to the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped fields: - GET /localapi/v0/cert-domains returns DNS.CertDomains. - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig. Migrate in-tree callers off the netmap-on-the-bus pattern: - kube/certs.waitForCertDomain still wakes on the IPN bus but now queries CertDomains via LocalClient.CertDomains rather than reading n.NetMap.DNS.CertDomains. The kube LocalClient interface and FakeLocalClient gain a CertDomains method. - cmd/tailscale dns status calls LocalClient.DNSConfig directly instead of opening a NotifyInitialNetMap watcher. - cmd/tailscale configure kubeconfig switches from a netmap watcher + serviceDNSRecordFromNetMap to LocalClient.DNSConfig + serviceDNSRecordFromDNSConfig. This is part of a series moving callers away from depending on the netmap traveling on the IPN bus, so the bus payload can shrink in a later change. 
Updates #12542
Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

The purpose of this package is to test the iOS dependency closure, but it had drifted from the actual import list of the ipn-go-bridge package in the corp repo (the Go side of the iOS / macOS app). Update the imports to match ipn-go-bridge's GOOS=ios import list, adding many missing packages including wgengine/netstack, feature/{taildrop,syspolicy,condregister}, the util/syspolicy/* subpackages, types/{key,lazy,logid,netmap}, tsd, safesocket, util/{eventbus,must,set}, and several net/* and ipn/* packages.

Drop two now-stale BadDeps entries (for now!): database/sql/driver and github.com/google/uuid are reached via wgengine/netstack -> github.com/prometheus-community/pro-bing, which netstack imports on darwin || ios for ICMP user-ping, so the iOS app already ships them. But we should fix that later.

Updates #19633
Change-Id: Ic50779fdb195685a2e8ccd7c513eee91b0feeaf8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

--report posture flag in up eac531da8e

The tailscale.com/wif package brings in the AWS SDK (github.com/aws/aws-sdk-go-v2/{config,sts,...} and github.com/aws/smithy-go) to support fetching ID tokens from AWS IMDS for workload identity federation. Until now, tsnet pulled this in unconditionally via feature/condregister/identityfederation, costing ~70 unwanted deps for every tsnet program whether or not it uses workload identity federation.

These AWS SDK deps were originally removed from tsnet on 2025-09-29 by commit 69c79cb9f ("ipn/store, feature/condregister: move AWS + Kube store registration to condregister"). They were then accidentally added back on 2026-01-14 by commit 6a6aa805d ("cmd,feature: add identity token auto generation for workload identity", PR #18373) when the new wif package was wired into tsnet via feature/identityfederation.

Drop the blanket import.
tsnet programs that want workload identity federation now opt in with: import _ "tailscale.com/feature/identityfederation" The hook lookup in resolveAuthKey already uses GetOk and degrades gracefully when the feature isn't linked, so existing programs that don't use workload identity federation see no behavior change. The tailscale CLI still imports the condregister wrapper directly, so its behavior is also unchanged. Lock this in with TestDeps additions: tailscale.com/wif as a BadDep, plus substring checks in OnDep that fail on any github.com/aws/ or k8s.io/ dependency creeping back in. Also, switch cmd/gitops-pusher from the condregister wrapper to a direct import of feature/identityfederation: gitops-pusher's auth flow calls HookExchangeJWTForTokenViaWIF directly, so it shouldn't be subject to the ts_omit_identityfederation build tag. Updates #12614 Change-Id: I70599f2bdd4d3666b26a859d5b76caa5d6b94507 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>The (*SubscriberFunc[T]).dispatch method body — a ~40-line select loop with slow-subscriber timer, snapshot handling, ctx-cancel draining, and a CI stack-dump branch — was previously fully duplicated by the Go compiler for every distinct GC shape of T. None of that body actually depends on T except for the type assertion and the user callback invocation. This change moves the loop body into a non-generic dispatchFunc() helper, leaving (*SubscriberFunc[T]).dispatch as a tiny wrapper that: - performs the vals.Peek().Event.(T) type assertion - spawns the callback goroutine via `go runFuncCallback(s.read, t, callDone)` — a regular generic function call, not a closure, so that `go` binds the args to the goroutine's frame instead of allocating a closure on the heap. This preserves the zero-extra-allocation behavior of the original (*SubscriberFunc[T]).runCallback method. 
- resolves T's name via reflect.TypeFor[T]().String() (cached on the stack rather than recomputed on each %T formatting) - calls dispatchFunc with the callDone channel The %T formatting in the original logf calls is replaced with %s on the resolved name string, removing per-T fmt instantiations. A new BenchmarkBasicFuncThroughput is added alongside the existing BenchmarkBasicThroughput so per-event allocation behavior on the SubscribeFunc dispatch path is covered by the benchmark suite. Measured impact (util/eventbus/sizetest): SubscriberFunc per-flow attribution: linux/amd64: 912.5 B/flow -> 840.8 B/flow (-71.7 B/flow) linux/arm64: 917.5 B/flow -> 849.9 B/flow (-67.6 B/flow) The total per-flow size delta on amd64 dropped from 3,096.6 B to 3,039.2 B (-57 B/flow). The arm64 total stayed at 3,145.7 B because the linker's page-aligned section sizing absorbed the improvement on this binary; the symcost-attributed per-receiver number is the real signal. Behavior is unchanged: BenchmarkBasicThroughput stays at 0 allocs/op and BenchmarkBasicFuncThroughput holds at the same 2 allocs/op, 144 B/op as the prior eventbus implementation. All eventbus tests pass. Updates #12614 Change-Id: I85f933f50f58cd25bbfe5cc46bdda7aab22f0bf7 Signed-off-by: James Tucker <james@tailscale.com>Running all vmtests in tstest/natlab/vmtest locally was breaking later tasks in the queue. The goroutine dump on timeout had goroutines hanging around for 9 minutes, meaning that something was not getting cleaned up. goroutine 262 [select, 9 minutes]: gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.commonRead({...}) Add a timeout of Now() to gonet TCP connections when the test ends (inspired by ServeUnixConn()), and wait for them to shut down before exiting the test. Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>Splits SubscriberFunc[T] into: - SubscriberFunc[T]: a thin user-facing facade that holds only a pointer to a non-generic core. 
It exposes Close() to user code, which forwards to the core. - subscriberFuncCore: a non-generic struct that owns all the subscriber state (stop flag, unregister, logf, slow timer, cached reflect.Type) and implements the bus's package-private subscriber interface. Its dispatch() invokes a closure captured at construction time that performs the vals.Peek().Event.(T) type assertion and runs the user callback on the unboxed value. The bus's outputs map and subscriber-interface itab are parameterized only by *subscriberFuncCore, not by T, eliminating both the per-T itab and the per-T generic dictionary that previously scaled with the number of subscribed event types. Measured impact (util/eventbus/sizetest): total per-flow binary cost: linux/amd64: 3039.2 B/flow -> 2252.8 B/flow (-786.4 B / -25.9%) linux/arm64: 3145.7 B/flow -> 2228.2 B/flow (-917.5 B / -29.2%) SubscriberFunc per-receiver attribution: linux/amd64: 840.8 B/flow -> 300.8 B/flow (-540.0 B / -64.2%) linux/arm64: 849.9 B/flow -> 303.8 B/flow (-546.1 B / -64.3%) Dropped per-T symbols (200-flow eventbus binary): - (*SubscriberFunc[T]).dispatch was 26,639 B total (130 B/T) - (*SubscriberFunc[T]).subscribeType was 3,600 B total ( 18 B/T) - .dict.SubscriberFunc[T] was 14,400 B total ( 72 B/T) - go:itab.*SubscriberFunc[T],... was 9,600 B total ( 48 B/T) Of the original 913 B/flow attributed to SubscriberFunc, 540 B/flow is now gone, dropping the receiver to 300 B/flow. Behavior is unchanged: BenchmarkBasicThroughput is within noise (1955 -> 1941 ns/op on the test box) and all eventbus tests pass. Updates #12614 Change-Id: I646b3b05fd8d95f9afead59bfd0f69cd18b7a709 Signed-off-by: James Tucker <james@tailscale.com>Mirrors the same refactor previously applied to SubscriberFunc: - Publisher[T]: a thin user-facing facade. Holds a pointer to a non-generic publisherCore and exposes Publish/Close/ShouldPublish. - publisherCore: a non-generic struct that owns the *Client back- pointer, stop flag, and cached reflect.Type. 
It implements the package-private publisher interface (publishType, Close). The bus's per-Client publisher set is set.Set[publisher] keyed on this single non-generic type. The publisher interface only exists to support diagnostic introspection (Debugger.PublishTypes returning the list of types a client publishes). Previously, satisfying that diagnostic-only interface forced *Publisher[T] to be the implementor and cost a per-T itab, generic dictionary, and equality function on every event type ever passed through Publish[T]. Moving the implementation to a non-generic core lets the diagnostic surface work unchanged while charging zero per-T cost for the diagnostic-driven generic interface. Publisher[T].Publish is also slimmed: the channel/select/stopFlag loop is now a non-generic publish() helper that takes the value as 'any'. The per-T body is reduced to forwarding the boxed value to the helper. Measured impact (util/eventbus/sizetest): total per-flow binary cost: linux/amd64: 2252.8 B/flow -> 1900.5 B/flow (-352.3 B / -15.6%) linux/arm64: 2228.2 B/flow -> 1835.0 B/flow (-393.2 B / -17.6%) Publisher per-receiver attribution: linux/amd64: 635.2 B/flow -> 369.6 B/flow (-265.6 B / -41.8%) linux/arm64: 751.7 B/flow -> 373.2 B/flow (-378.5 B / -50.4%) Cumulative reduction from the original baseline (5167ff412): linux/amd64: 3096.6 B/flow -> 1900.5 B/flow (-1196.1 B / -38.6%) linux/arm64: 3145.7 B/flow -> 1835.0 B/flow (-1310.7 B / -41.7%) Dropped per-T symbols (200-flow eventbus binary): - .dict.Publisher[T] was 14,400 B (72 B/T) - type:.eq.Publisher[T] was 11,832 B (58 B/T) - go:itab.*Publisher[T],publisher was 8,000 B (40 B/T) - (*Publisher[T]).Close shape stencils collapsed to 1 Behavior is unchanged: BenchmarkBasicThroughput is within noise (2018 -> 2038 ns/op at -benchtime=2s) and all eventbus tests pass. 
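The "thin generic facade over a non-generic core" pattern used in these eventbus commits can be illustrated with a small self-contained sketch. All names here (`subscriber`, `core`, `Sub`, `NewSub`) are hypothetical and far simpler than the real eventbus types. The point is only that the interface is implemented once by `*core`, while the T-specific work lives in a closure.

```go
package main

import (
	"fmt"
	"reflect"
)

// subscriber is the interface the bus stores. It has no type parameter.
type subscriber interface {
	typeName() string
	dispatch(ev any) bool
}

// core owns all subscriber state and implements subscriber once, so there is
// a single itab no matter how many event types are subscribed.
type core struct {
	name    string            // type name, computed once at construction
	deliver func(ev any) bool // closure doing the only T-specific work
}

func (c *core) typeName() string     { return c.name }
func (c *core) dispatch(ev any) bool { return c.deliver(ev) }

// Sub is the thin typed facade handed to users. It holds only a pointer.
type Sub[T any] struct{ c *core }

// NewSub builds the core and captures the type assertion and the user
// callback in a closure.
func NewSub[T any](fn func(T)) *Sub[T] {
	return &Sub[T]{c: &core{
		name: reflect.TypeFor[T]().String(),
		deliver: func(ev any) bool {
			t, ok := ev.(T)
			if !ok {
				return false
			}
			fn(t)
			return true
		},
	}}
}

func main() {
	s := NewSub(func(v string) { fmt.Println("string event:", v) })
	n := NewSub(func(v int) { fmt.Println("int event:", v) })

	// The bus-side collection is keyed on the non-generic core only.
	subs := []subscriber{s.c, n.c}
	for _, ev := range []any{"hello", 42} {
		for _, sub := range subs {
			sub.dispatch(ev)
		}
	}
}
```

The trade-off is that values are boxed as `any` when they cross into the core, in exchange for not instantiating the interface machinery per event type.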
Updates #12614 Change-Id: I61979c2bf95d2a711c2321e6e0b4b7d15980e9f5 Signed-off-by: James Tucker <james@tailscale.com>The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat tests are gated behind --run-vm-tests because they need KVM and are slow. Until now nothing in CI exercised them apart from a single canary TestEasyEasy run on every PR. Add .github/workflows/natlab-test.yml that runs the full opt-in suite on demand (workflow_dispatch), on PRs labeled "natlab", and on main every 12 hours via cron. The workflow has two phases: - "prepare" builds the gokrazy VM image, downloads the Ubuntu and FreeBSD cloud images once via the new natlabprep tool, and emits a dynamic JSON matrix of every TestX function it finds in the two opt-in packages. - "test" is a per-test matrix that depends on prepare. Each matrix job restores the shared caches and runs a single test, so adding a new TestFoo is automatically picked up on the next run without any workflow edits. Rename the existing natlab-integrationtest.yml to natlab-basic.yml since it's the small smoke variant (just TestEasyEasy on every PR); the new natlab-test.yml is the bigger suite. The job inside is renamed to EasyEasy for the same reason. Move the macOS arm64 host check from vmtest.Env.Start into vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips immediately on a non-macOS host, and add an explicit skipIfNotMacOSArm64 helper at the top of the two macOS-only tests so the platform requirement is obvious to readers. Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default (it was the overwhelming majority of bytes in CI logs, with no signal in healthy runs) and replace it with a periodic "still waiting" line that only fires after 10s, so a truly stuck agent connection still surfaces. 
Updates #13038
Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

Two changes that share the same intent of reducing per-T duplication in code that doesn't actually depend on T:

1. Hoist the non-generic portion of newSubscriberFunc[T] into a newSubscriberFuncCore() helper. The hoisted work is the timer setup, the subscriberFuncCore allocation, and the unregister closure (which captures only the non-generic reflect.Type and *subscribeState). The generic body now does only the two T-bound things it has to: compute reflect.TypeFor[T] and create the dispatch closure.

   Effect on the per-shape-stencil body of newSubscriberFunc[T]:
     before: 523 B per shape (in synthetic test)
     after:  293 B per shape (-230 B per shape; -44% on this body)

2. Cache reflect.Type.String() once at construction (in core.typeName) instead of recomputing it every time the dispatch closure runs. The dispatch closure also now takes the *subscriberFuncCore directly rather than building an intermediate dispatchFuncState struct on every call.

   Effect on the dispatch closure body (newSubscriberFunc[T].func1):
     before: 581 B per shape
     after:  480 B per shape (-101 B per shape; -17%)

Combined effect on tailscaled (linux/amd64):
  named-symbol savings via symcost: ~7 KB
  stripped binary delta: -8 KB (page-quantized)
  arm64 binary delta: 0 (page-quantized)

  cumulative reduction from baseline (5167ff412):
    linux/amd64: -110,592 bytes (-0.391%)
    linux/arm64: -131,072 bytes (-0.499%)

Throughput is also improved by the typeName cache: BenchmarkBasic goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot path no longer allocates a string on every event.

Updates #12614
Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3
Signed-off-by: James Tucker <james@tailscale.com>

Replace the process-global Server.mu lookup in the packet send hot path with a global hashtriemap mirror of local clientSet entries.
The authoritative clients map remains guarded by Server.mu; clientsAtomic is only a lock-free fast path for active local clients. Misses, stale inactive client sets, duplicate accounting, and mesh forwarding still fall back to lookupDestUncached.

This avoids taking Server.mu for the common local active-client send path, at the cost of adding one global concurrent map that mirrors Server.clients for local peers.

The benchmark uses four destination peers. The before run sets TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup path; the after run uses the hashtrie fast path.

    goos: linux
    goarch: amd64
    pkg: tailscale.com/derp/derpserver
    cpu: Intel(R) Xeon(R) 6975P-C
                          │    before     │                after                │
                          │    sec/op     │   sec/op     vs base                │
    LookupDestHashTrie-16   176.050n ± 1%   1.904n ± 6%  -98.92% (p=0.000 n=10)

                          │    before    │               after               │
                          │     B/op     │    B/op     vs base               │
    LookupDestHashTrie-16   0.000 ± 0%     0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

                          │    before    │               after               │
                          │  allocs/op   │ allocs/op   vs base               │
    LookupDestHashTrie-16   0.000 ± 0%     0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)
Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

Brings Subscriber[T] in line with the same non-generic-core pattern already applied to SubscriberFunc[T] and Publisher[T]:

- Renames subscriberFuncCore to subscriberCore and shares it between Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a *subscriberCore plus their respective per-T delivery state (Subscriber: chan T; SubscriberFunc: nothing, the user callback is captured in the dispatch closure).
- The bus's outputs map and subscriber-interface itab key on *subscriberCore for both subscriber kinds, so adding a new Subscribe[T] call site no longer pays a per-T itab, dictionary, or equality function for the subscriber-interface side.
- Subscribe[T] now hoists the non-generic constructor portion into newSubscriberCore (timer setup, core allocation, cached type/typeName, unregister method-value), matching SubscribeFunc.

The dispatch loop is intentionally NOT extracted to a non-generic helper for Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send 'case s.read <- t:' must appear lexically inside the select; the only way to lift it into a non-generic loop is to bridge typed and untyped via a per-event goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep dispatchTyped on the generic facade and accept the per-shape stencil cost as the cheaper alternative.

Symbol-level effect on tailscaled (linux/amd64, measured via `go tool nm -size`):

  Before: (*Subscriber[T]).dispatch
    2 shape stencils: 1,682 + 1,549 = 3,231 B
    3 thin per-T wrappers: 124 B each = 372 B
    2 deferwrap1 helpers: 62 B each = 124 B
    total: 3,727 B

  After: (*Subscriber[T]).dispatchTyped
    2 shape stencils: 1,678 + 1,582 = 3,260 B
    0 per-T wrappers (replaced by closure stored on core)
    2 deferwrap1 helpers: 62 B each = 124 B
    total: 3,384 B

  dispatch path .text delta: -343 B (-9.2%)

Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) = ~2,700 B each on production tailscaled. The shape count matches before/after (two distinct GC shapes for the Subscriber[T] event types in this binary). What changes is that the per-T thin wrappers are eliminated because Subscriber[T] no longer implements the subscriber interface directly.

Whole-binary section deltas:
  .text:      -2,304 B (includes the dispatch savings plus other small downstream effects)
  .rodata:    +512 B (additional closure-type metadata)
  .gopclntab: -2,981 B (fewer per-T compiled functions => less metadata)

Stripped tailscaled (linux/amd64): no change at the file level (the savings fall below the linker's section-alignment boundary). Unstripped builds shrink by ~2,900 B.
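The constraint on the typed channel send can be illustrated with a small sketch. It is not the eventbus code: NewSubscriber, the `stop` channel, and the field layout are hypothetical, and only the names subscriberCore, newSubscriberCore, typeName, and dispatchTyped come from the message above. The point is that the `case s.read <- t:` send needs a chan T, so the select around it has to live in generic code, while everything else sits on the shared core.

```go
package main

import (
	"fmt"
	"reflect"
)

// subscriberCore holds everything that does not depend on T.
type subscriberCore struct {
	typ      reflect.Type
	typeName string        // cached once instead of calling typ.String() per event
	stop     chan struct{} // hypothetical shutdown signal
}

func newSubscriberCore(typ reflect.Type) *subscriberCore {
	return &subscriberCore{typ: typ, typeName: typ.String(), stop: make(chan struct{})}
}

// String is only used for demo output.
func (c *subscriberCore) String() string { return fmt.Sprintf("subscriber[%s]", c.typeName) }

// Subscriber is the typed facade: shared core plus its per-T channel.
type Subscriber[T any] struct {
	core *subscriberCore
	read chan T
}

// NewSubscriber is a hypothetical constructor for this sketch.
func NewSubscriber[T any](buf int) *Subscriber[T] {
	return &Subscriber[T]{
		core: newSubscriberCore(reflect.TypeFor[T]()),
		read: make(chan T, buf),
	}
}

// dispatchTyped stays generic because the send on the typed channel has to
// be a case of this select; a non-generic helper has no chan T to send on.
func (s *Subscriber[T]) dispatchTyped(v any) bool {
	t, ok := v.(T)
	if !ok {
		return false
	}
	select {
	case s.read <- t:
		return true
	case <-s.core.stop:
		return false
	}
}

func main() {
	s := NewSubscriber[string](1)
	delivered := s.dispatchTyped("hi")
	fmt.Println(s.core, delivered, <-s.read)
}
```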
Behavior is unchanged:
  BenchmarkBasicThroughput:     2,161 ns/op, 0 B/op, 0 allocs/op
  BenchmarkBasicFuncThroughput: 2,493 ns/op, 144 B/op, 2 allocs/op
  BenchmarkSubsThroughput:      3,727 ns/op, 0 B/op, 0 allocs/op

Updates #12614
Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe
Signed-off-by: James Tucker <james@tailscale.com>

Wire up the userspace networking primitives to the JS bridge so browser callers can initiate outbound and receive inbound traffic over the Tailscale network:

- ipn.dial(network, addr) wraps a tsdial UserDial into a JS Conn with read/write/close/localAddr/remoteAddr.
- ipn.listen(network, addr) wraps a netstack ListenPacket into a JS PacketConn with readFrom/writeTo/close/localAddr.
- ipn.listenICMP("icmp4"|"icmp6"|"icmp") creates a raw ICMP endpoint on the underlying gVisor stack and wraps it as a PacketConn for sending/receiving ping traffic.

To support listenICMP, netstack.Impl gains a Stack() accessor that returns the underlying *stack.Stack so jsIPN can call NewEndpoint with icmp.ProtocolNumber4/6.

Binary I/O uses js.CopyBytesToGo / js.CopyBytesToJS to move bytes across the syscall/js boundary without base64 round-trips.

Extend ipn.listen to also accept "tcp"/"tcp4"/"tcp6" and return a TCPListener bound to a netstack gonet.TCPListener. The listener exposes accept/close/addr like a Go net.Listener and additionally implements Symbol.asyncIterator so JS callers can write:

    for await (const conn of listener) { ... }

The async iterator returns done when the listener is closed (via errors.Is(net.ErrClosed)) and rejects on any other accept error. Symbol-keyed properties are set via Reflect.set since syscall/js only exposes string-keyed Set.

apitype.WaitingFile has no json tags so it serialised as {Name, Size}. Introduce a local jsWaitingFile struct with json:"name" / json:"size" so the JS side receives idiomatic camelCase property names.
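The effect of the json tags can be sketched as follows. This is an illustration, not the code from the change: WaitingFile here is a local stand-in for apitype.WaitingFile (its field types are an assumption), and marshalForJS is a hypothetical helper. Only the jsWaitingFile name and its two tags come from the message above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WaitingFile is a local stand-in for apitype.WaitingFile: no json tags, so
// encoding/json uses the Go field names "Name" and "Size".
type WaitingFile struct {
	Name string
	Size int64
}

// jsWaitingFile has the same fields with lower-case JSON names.
type jsWaitingFile struct {
	Name string `json:"name"`
	Size int64  `json:"size"`
}

// marshalForJS (hypothetical helper) converts and encodes the files.
func marshalForJS(files []WaitingFile) (string, error) {
	out := make([]jsWaitingFile, 0, len(files))
	for _, f := range files {
		// Struct conversion ignores tags, so this is a plain field copy.
		out = append(out, jsWaitingFile(f))
	}
	b, err := json.Marshal(out)
	if err != nil {
		return "", fmt.Errorf("marshal waiting files: %w", err)
	}
	return string(b), nil
}

func main() {
	files := []WaitingFile{{Name: "report.pdf", Size: 1024}}

	untagged, _ := json.Marshal(files)
	tagged, err := marshalForJS(files)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(untagged))
	fmt.Println(tagged)
}
```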
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tka: keep the CompactionDefaults alongside the other limits