Commit Graph

2719 Commits

Author SHA1 Message Date
Alex Chan 0cb432ed84 all: update more references to Tailnet/Network Lock
Updates tailscale/corp#37904

Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-15 16:23:50 +01:00
Fernando Serboncini 2a06fb66d0 cmd/cloner: preserve nil-valued entries when cloning map (#19749)
The codegen path for map-of-slice-of-pointer fields, skipped
nil-valued entries. That dropped the key from the map.

This broke how dns.Config.Routes uses nil values sentinels.

Fixes #19730
Fixes #19732
Fixes #19746
Fixes #19744

Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-14 10:30:59 -04:00
Brad Fitzpatrick 6b729795c3 derp/derpserver: use hashtriemap for peer lookup
Replace the process-global Server.mu lookup in the packet send hot path
with a global hashtriemap mirror of local clientSet entries. The
authoritative clients map remains guarded by Server.mu; clientsAtomic is
only a lock-free fast path for active local clients.

Misses, stale inactive client sets, duplicate accounting, and mesh
forwarding still fall back to lookupDestUncached. This avoids taking
Server.mu for the common local active-client send path, at the cost of
adding one global concurrent map that mirrors Server.clients for local
peers.

The benchmark uses four destination peers. The before run sets
TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup
path; the after run uses the hashtrie fast path.

    goos: linux
    goarch: amd64
    pkg: tailscale.com/derp/derpserver
    cpu: Intel(R) Xeon(R) 6975P-C
                          │    before     │                after                │
                          │    sec/op     │   sec/op     vs base                │
    LookupDestHashTrie-16   176.050n ± 1%   1.904n ± 6%  -98.92% (p=0.000 n=10)

                          │   before   │             after              │
                          │    B/op    │    B/op     vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

                          │   before   │             after              │
                          │ allocs/op  │ allocs/op   vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)

Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-12 16:08:16 -07:00
DeedleFake ad8ead9c94 cmd/tailscale/cli: add RunWithContext
Fixes #12778

Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c
Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>
2026-05-12 12:27:55 -07:00
Francois Marier ead5ce65a3 cmd/pgproxy: fix client TLS handshake timeout
There is a 30-second timeout set on client TLS connections but the handshake was
called on the wrong connection and so the timeout was never used in practice.

Signed-off-by: Francois Marier <francois@fmarier.org>
2026-05-11 11:12:11 -07:00
Brad Fitzpatrick 87a74c3aa2 tsnet: make workload identity federation opt-in
The tailscale.com/wif package brings in the AWS SDK
(github.com/aws/aws-sdk-go-v2/{config,sts,...} and github.com/aws/smithy-go)
to support fetching ID tokens from AWS IMDS for workload identity
federation. Until now, tsnet pulled this in unconditionally via
feature/condregister/identityfederation, costing ~70 unwanted deps for
every tsnet program whether or not it uses workload identity federation.

These AWS SDK deps were originally removed from tsnet on 2025-09-29 by
commit 69c79cb9f ("ipn/store, feature/condregister: move AWS + Kube
store registration to condregister"). They were then accidentally added
back on 2026-01-14 by commit 6a6aa805d ("cmd,feature: add identity
token auto generation for workload identity", PR #18373) when the new
wif package was wired into tsnet via feature/identityfederation.

Drop the blanket import. tsnet programs that want workload identity
federation now opt in with:

    import _ "tailscale.com/feature/identityfederation"

The hook lookup in resolveAuthKey already uses GetOk and degrades
gracefully when the feature isn't linked, so existing programs that
don't use workload identity federation see no behavior change. The
tailscale CLI still imports the condregister wrapper directly, so its
behavior is also unchanged.

Lock this in with TestDeps additions: tailscale.com/wif as a BadDep,
plus substring checks in OnDep that fail on any github.com/aws/ or
k8s.io/ dependency creeping back in.

Also, switch cmd/gitops-pusher from the condregister wrapper to a
direct import of feature/identityfederation: gitops-pusher's auth flow
calls HookExchangeJWTForTokenViaWIF directly, so it shouldn't be
subject to the ts_omit_identityfederation build tag.

Updates #12614

Change-Id: I70599f2bdd4d3666b26a859d5b76caa5d6b94507
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 18:43:45 -07:00
Tom Proctor b74eeda055 cmd/testwrapper: print unit for package duration (#19663)
Include the unit (s) when printing the time taken to test each package.

Updates #cleanup

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2026-05-06 22:31:48 +01:00
Alex Chan eac531da8e cmd/tailscale/cli: unhide --report posture flag in up
This was originally hidden during the beta period in both `up` and `set`,
then when device posture went GA we unhid the flag in `set` but not in
`up`.

This is confusing for users, because an error message can direct them to
run `tailscale up` with this flag if they've set it previously, but the
help text won't tell them what it does.

Updates #5902
Updates #17972

Change-Id: I9a31946f4b3bb411feed0f5a6449d7ff9a5ba9d3
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-05 10:12:36 +01:00
Brad Fitzpatrick 883d4fd2cd wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead
Fixes #19633
Fixes #13760

Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Brad Fitzpatrick 9bb7ca6116 cmd/vet/lowerell, drive/driveimpl: forbid variables named "l" or "I"
Add a new vet checker that rejects variables, parameters, named
return values, receivers, range/type-switch bindings, type
parameters, struct fields, and constants named "l" (lowercase ell)
or "I" (uppercase i). Both are hard to distinguish from the digit
"1" and from each other in too many fonts.

Rename the two pre-existing struct fields named "l" (both of type
net.Listener) in drive/driveimpl/drive_test.go to "ln", matching the
convention used elsewhere for net.Listener locals.

Rename the test-fixture struct fields "I" (single int label) to
"Int" in metrics/multilabelmap_test.go and util/deephash/deephash_test.go,
preserving the "first letters of types" convention used alongside
neighboring fields like I8/I16/U/U8.

Also teach pkgdoc_test.go to skip testdata/ directories, which
the go tool ignores; they are not real packages.

Fixes #19631

Change-Id: I71ad2fa990705f7a070406ebcdb8cefa7487d849
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:03:28 -07:00
Tom Meadows ee10f9881c cmd/k8s-operator: add authkey reissuing to recorder reconciler (#19556)
also fixes memory leak with authKeyReissuing map on ProxyGroup
reconciler authkey reissue.

Updates #19311

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-05-01 18:26:55 +01:00
Andrew Lytvynov f15a4f4416 client/web: move API permission checks into handlers (#19576)
There are only a couple endpoints that check peer capabilities. Keeping
permission checks with the code that assumes they were performed, rather
than with the routing layer, feels easier to reason about.

Check that the caller is actually a peer and pass their capabilities via
a context value for handlers that want to check them.

Along with this, simplify the helper handler wrappers that are not
needed for most of the endpoints.

Updates #40851

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-01 09:01:53 -07:00
Brad Fitzpatrick bbcb8650d4 cmd/tailscale/cli: fetch netmap via current-netmap debug action
Stop opening an IPN bus subscription with NotifyInitialNetMap purely to
read the current netmap once. Use the LocalAPI debug current-netmap
action (added in 159cf8707) instead, which returns the current netmap
synchronously without subscribing to the bus.

Updates #12542

Change-Id: I8aa2096d65aaea4dfe62634f03ce06b5470e0e51
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 07:53:51 -07:00
Brad Fitzpatrick 4c3ed5ab32 all: migrate code off Notify.NetMap to Notify.SelfChange
Move tailscaled's in-tree reactive users from of IPN bus Notify.NetMap
updates to the narrower Notify.SelfChange signal introduced earlier in
this series. Consumers that need additional state (peers, DNS config,
etc.) fetch it on demand via the LocalAPI.

It is a step toward the larger goal of not fanning Notify.NetMap out
to every bus watcher on Linux/non-GUI hosts.

A future change stops sending Notify.NetMap entirely on Linux and
non-GUI platforms. (eventually once macOS/iOS/Windows migrate to the
upcoming new Notify APIs, we'll remove ipn.Notify.NetMap entirely)

Updates #12542

Change-Id: I51ea9d86bdca1909d6ac0e7d5bd3934a3a4e8516
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 06:51:40 -07:00
Brad Fitzpatrick 9f343fdc0c client/local, ipn/localapi, all: add CertDomains and DNSConfig accessors
Add two narrow LocalAPI accessors so callers don't have to subscribe to
the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped
fields:

  - GET /localapi/v0/cert-domains returns DNS.CertDomains.
  - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig.

Migrate in-tree callers off the netmap-on-the-bus pattern:

  - kube/certs.waitForCertDomain still wakes on the IPN bus but now
    queries CertDomains via LocalClient.CertDomains rather than
    reading n.NetMap.DNS.CertDomains. The kube LocalClient interface
    and FakeLocalClient gain a CertDomains method.
  - cmd/tailscale dns status calls LocalClient.DNSConfig directly
    instead of opening a NotifyInitialNetMap watcher.
  - cmd/tailscale configure kubeconfig switches from a netmap watcher
    + serviceDNSRecordFromNetMap to LocalClient.DNSConfig +
    serviceDNSRecordFromDNSConfig.

This is part of a series moving callers away from depending on the
netmap traveling on the IPN bus, so the bus payload can shrink in a
later change.

Updates #12542

Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 13:50:46 -07:00
Brad Fitzpatrick 92179b1fc7 cmd/hello: split server into helloserver package
Move the template, request handler, and HTTP/HTTPS server wiring out
of package main and into a new cmd/hello/helloserver package so the
server can be embedded in other binaries. The main package now only
constructs a helloserver.Server with the production addresses and
calls Run.

While here, drop the -http, -https, and -test-ip flags along with the
dev-mode template and fake-data fallbacks they enabled; the binary is
only run in production.

Updates tailscale/corp#32398

Change-Id: Id1d38b981733334cafc596021130f36e1c1eed67
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 08:40:55 -07:00
David Bond 644c3224e9 cmd/{containerboot,k8s-operator}: don't return pointers to maps (#19593)
This commit modifies the usage of the `egressservices.Configs` type
within containerboot and the k8s operator.

Originally it was being thrown around as a pointer which is not required
as maps are already pointers under the hood.

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-30 16:11:00 +01:00
Brad Fitzpatrick 815bb291c9 cmd/tailscale/cli: allow tag without "tag:" prefix in 'tailscale up'
If a user passes --advertise-tags=foo,bar (with no colons in any
segment), automatically prepend "tag:" client-side so it goes on the
wire as "tag:foo,tag:bar". Segments that already contain a colon are
left untouched and must be fully-qualified ("tag:foo"), which keeps
the door open for future colon-bearing syntax.

This was originally added in cd07437ad (2020-10-28) and then reverted
in 1be01ddc6 (2020-11-10) over forward-compatibility concerns. But
then it was realized in 2026-04-29 that this was always safe for
future extensiblity anyway (tags can't contain colons-- tag:foo:bar is
invalid anyway, per the 2020 CheckTag restrictions). So if we wanted
to perhaps some hypothetical --advertise-tags=tagset:setfoo or "group:foo",
we'd still have syntax to do, as it can't conflict with tag:group:foo.

Avery signed off on this on Slack: "Ok, I withdraw my objection to
auto-qualifying tag names in advertise-tags and I hope I won't regret
it :)"

Updates #861

Change-Id: I06935b0d3ae909894c95c9c2e185b7d6a219ff32
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 07:13:48 -07:00
Brad Fitzpatrick 15cba0a3f6 tstest/natlab/vmtest: add TestDiscoKeyChange
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).

The five pings:

  1. A → B  (initial; brings up the tunnel; 30s budget)
  2. B → A  after rotate (LocalAPI rotate-disco-key debug action)
  3. A → B  after rotate (LocalAPI)
  4. B → A  after restart (SIGKILL; gokrazy supervisor respawns)
  5. A → B  after restart (SIGKILL)

Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:

  - The rotate-then-a→b phase takes ~10s on main because of LazyWG.
    After B's WantRunning bounce, B's wgengine resets its
    sentActivityAt/recvActivityAt maps and trims A out of the
    wireguard-go config as an "idle peer"; B only re-adds A on
    inbound activity, by which point A's first few TSMP packets
    have been silently dropped at B's tundev. The
    bradfitz/rm_lazy_wg branch removes that trimming entirely
    (verified locally: this phase drops to <100ms there).

  - The restart phases take ~5s for wireguard-go's RekeyTimeout
    handshake retry. After SIGKILL+respawn the first WG handshake
    init from the restarted node sometimes goes into the void
    (likely the brief peer-removed window in the receiver's
    two-step maybeReconfigWireguardLocked reconfig during which
    the peer is absent from wireguard-go), and wg-go's 5s+jitter
    retransmit timer is the next opportunity to retry. That retry
    succeeds and the staged TSMP packet flushes. Intrinsic to the
    protocol's retransmit policy.

Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.

Supporting changes:

  ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
  back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
  only resets local disco state; without also dropping wireguard-go
  session keys, peers keep encrypting with their stale per-peer
  session against us until their rekey timer fires (WireGuard has no
  data-plane signaling to invalidate sessions). Bouncing WantRunning
  runs the engine through Reconfig(empty) → authReconfig, which
  drops every peer's WG session so the next packet either way
  triggers a fresh handshake.

  ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
  LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
  a map[NodePublic]DiscoPublic from the current netmap. Tests reach
  it via [local.Client.DebugResultJSON]. We do not surface disco
  keys via [ipnstate.PeerStatus] because adding a non-comparable
  [key.DiscoPublic] field there breaks reflect-based test helpers
  (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
  general LocalAPI clients have no need for disco keys. Since the
  debug LocalAPI is gated behind the ts_omit_debug build tag, this
  endpoint is automatically stripped from small binaries.

  cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
  to drive the SIGKILL phase. On gokrazy the supervisor respawns
  tailscaled within a second.

  tstest/integration/testcontrol: add Server.AllOnline. When set,
  every peer entry in MapResponses is marked Online=true. Several
  disco-key handling fast paths in controlclient and wgengine
  (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull
  NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire
  for online peers; without this flag, tests exercising disco-key
  rotation only hit the offline-peer code paths, which mask issues
  and are several seconds slower in this scenario. Finer-grained
  per-node online tracking can be added later.

  tstest/natlab/vmtest: add Env.RotateDiscoKey,
  Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
  [AllOnline] EnvOption that plumbs through to
  testcontrol.Server.AllOnline, and an exported
  Env.Ping(from, to, type, timeout). Ping replaces the unexported
  helper so callers can specify both a ping type (PingDisco for
  warming peer state, PingTSMP for asserting end-to-end
  connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
  error so callers inside tstest.WaitFor can retry transient
  failures rather than fataling the test.

Updates #12639
Updates #13038

Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:58:00 -07:00
Alex Valiushko 01d0bdd253 cmd/derper,derp: add metrics for rate limit hits (#19560)
Expvars track count of rate limiters exceeding their threshold.
Covers (1) global rate limiter and (2) total of local rate limiters.

Also publish optional rate-limit metrics during ExpVar() call
if -rate-config is specified. Fixes current rate-limit metrics
being published outside of "derp" in /debug/vars.

Updates tailscale/corp#38509

Change-Id: Ic7f5a1e890d0d7d3d7b679daa4b5f8926a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
2026-04-29 10:29:09 -07:00
Brad Fitzpatrick 02ffe5baa8 tstest/natlab/vmtest: add macOS VM snapshot caching for fast test starts
Cache a pre-booted macOS VM snapshot on disk so subsequent test runs
restore from the snapshot instead of cold-booting. The snapshot is keyed
by the Tart base image digest and a code version constant
(macOSSnapshotCodeVersion); bumping either invalidates the cache.

Snapshot preparation (one-time):
- Boot the Tart base image with a NAT NIC (--nat-nic flag)
- Wait for SSH, compile and install cmd/tta as a LaunchDaemon
- TTA polls the host via AF_VSOCK for an IP assignment; during prep
  the host replies "wait"
- Disconnect NIC, save VM state via SIGINT

Test fast path (cached, ~7s to agent connected):
- APFS clone the snapshot, write test-specific config.json
- Launch Host.app with --disconnected-nic --attach-network --assign-ip
- VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM)
- TTA's vsock poll gets the IP config, sets static IP via ifconfig
  (bypasses DHCP entirely), switches driver addr to the IP directly
  (bypasses DNS), and resets the dial context so the reverse-dial
  reconnects immediately
- TTA agent connects to test driver within ~2s of IP assignment

Key optimizations:
- 4GB RAM instead of 8GB: halves SaveFile.vzvmsave (1.4GB vs 2.4GB),
  halves restore time (5.5s vs 11s)
- AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved)
- Direct IP dial: bypasses DNS resolution for test-driver.tailscale
- Dial context reset: cancels stale in-flight dials from snapshot
- Kill instead of SIGINT for test VM cleanup (no state save needed)
- Parallel VM launches

Also:
- Add TestDriverIPv4/TestDriverPort constants to vnet
- Add --nat-nic and --assign-ip flags to Host.app
- Fix SIGINT handler: retain DispatchSource globally, use dispatchMain()
- Add vsock listener (port 51011) to Host.app for IP config protocol
- Add disconnectNetwork() to VMController for clean snapshot state
- Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed

Updates #13038

Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 08:17:13 -07:00
David Bond a29e42135b cmd/k8s-operator: add nodeSelector to DNSConfig resource (#19429)
This commit modifies the `DNSConfig` resource to allow customisation of
the `spec.nodeSelector` field in the nameserver pods.

Closes: https://github.com/tailscale/tailscale/issues/19419

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-29 15:56:33 +01:00
Noel O'Brien 40088602c9 cmd/hello: remove hello.ipn.dev (#19567)
Fixes #19566

Signed-off-by: Noel O'Brien <noel@tailscale.com>
2026-04-28 17:54:29 -07:00
Brad Fitzpatrick ec7b11d986 tstest/natlab/vmtest, cmd/tta: add TestTaildrop
Add a vmtest that brings up two Ubuntu nodes, each behind its own
EasyNAT, joined to the tailnet. The sender pushes a small file via
"tailscale file cp" and the receiver fetches it via "tailscale file
get --wait", asserting that the filename and contents round-trip
unchanged.

To make Taildrop work in vmtest, three small pieces were needed:

The Linux/FreeBSD cloud-init now starts tailscaled with --statedir as
well as --state=mem:, so the daemon has a VarRoot to host Taildrop's
incoming-files directory. State itself remains in-memory (so nothing
persists across reboots); only the var-root scratch space is on disk.

vmtest.New grows a variadic EnvOption parameter and a SameTailnetUser
helper. When the option is passed, Start sets AllNodesSameUser=true
on the embedded testcontrol.Server. Cross-node Taildrop requires the
sender and receiver to share a Tailnet user (or have an explicit
PeerCapabilityFileSharingTarget granted between them, which we don't
plumb here), so TestTaildrop opts in. Existing tests don't.

cmd/tta gains /taildrop-send and /taildrop-recv handlers that wrap
"tailscale file cp" and "tailscale file get --wait", plus
Env.SendTaildropFile and Env.RecvTaildropFile helpers in vmtest that
drive them.

Updates #13038

Change-Id: I8f5f70f88106e6e2ee07780dd46fe00f8efcfdf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 12:27:55 -07:00
Brad Fitzpatrick 4b8e0ede6d tstest/natlab/{vmtest,vnet}, cmd/tta: add TestMullvadExitNode
Add a vmtest that brings up a Tailscale client, an Ubuntu VM acting
as a Mullvad-style plain-WireGuard exit node, and a non-Tailscale
webserver, each on its own NAT'd vnet network with a distinct WAN
IP. The test exercises Tailscale's IsWireGuardOnly peer code path:
the way the control plane wires Mullvad exit nodes into a client's
netmap, including the per-client SelfNodeV4MasqAddrForThisPeer
source-IP rewrite that lets a Tailscale CGNAT IP egress through a
plain-WireGuard tunnel that has no idea what Tailscale is.

The mullvad VM doesn't run wireguard-tools or kernel WireGuard;
instead, a new TTA endpoint /wg-server-up creates a real Linux TUN
named wg0, drives it with wireguard-go (already vendored), and
configures the kernel side (ip addr/up, ip_forward, iptables NAT
MASQUERADE) so decrypted traffic from the peer egresses with the
mullvad VM's WAN IP. Userspace vs kernel WireGuard makes no
difference on the wire — what's being tested is Tailscale's
plain-WireGuard exit-node code path, not the kernel module — and
this lets the test avoid downloading and installing .deb packages
inside the VM.

Adds Env.BringUpMullvadWGServer (calls /wg-server-up, returns the
generated WG public key as a key.NodePublic), Env.SetExitNodeIP
(EditPrefs ExitNodeIP directly, for exit nodes whose IPs aren't
discoverable via TTA), Env.ControlServer (exposes the underlying
testcontrol.Server so tests can UpdateNode / SetMasqueradeAddresses
to inject custom peers), and Env.Status (fetches a node's tailscale
status, used to read the client's pubkey so we can pin it as the
WG server's only allowed peer).

The test verifies that the webserver's echoed source IP is the
client's WAN with no exit node selected, the mullvad VM's WAN with
the WG-only peer selected as exit, and the client's WAN again after
clearing.

Updates #13038

Change-Id: I5bac4e0d832f05929f12cb77fa9946d7f5fb5ef1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 11:31:48 -07:00
Brad Fitzpatrick f7f8b0a0a5 cmd/tailscale/cli: drive "file cp" progress and offline warning from peerAPI
The Online bit in PeerStatus comes from control's last-known state and
can lag reality, so gating "tailscale file cp" on it is both unreliable
and pushes correctness onto the server. Just try the push directly.

In runCp, when the target's PeerStatus says it's offline, no longer
fail upfront; getTargetStableID returns the StableID anyway. Replace
the static "is offline" warning with a 3-second timer armed for the
first file: if the timer fires before peerAPI bytes have flowed, we
print a warning to stderr. The wording depends on whether control
reported the peer offline ("is reportedly offline; trying anyway") or
online ("is not replying; trying anyway"). The warning is printed with
a leading vt100 clear-line and a trailing newline so it doesn't get
painted over by the progress redraw and so the next progress redraw
lands on a fresh line below it.

Both the timer disarm and the progress display now read from
tailscaled's OutgoingFile.Sent (subscribed via WatchIPNBus) instead of
the local-body counter. That's the difference between bytes-acked-by-
local-tailscaled (what countingReader.n was measuring; useless for
detecting an unreachable peer because for small files net/http buffers
the entire body into the unix-socket conn before the peerAPI dial has
even started) and bytes-pulled-toward-peerAPI (what tailscaled is
actually doing, reflected in OutgoingFile.Sent). The previous code
reported 100% within milliseconds for a 3 KiB file even when the peer
was unreachable.

Add --update-interval (default 250ms) to control the progress repaint
cadence; zero or negative disables the progress display entirely. The
printer now also stops repainting once it observes Sent at full size
with a near-zero rate for >2s, so a stuck transfer doesn't keep
clobbering whatever the rest of runCp is trying to print.

Updates #18740

Change-Id: I189bd1c2cd8e094d372c4fee23114b1d2f8024b4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 11:03:58 -07:00
Brad Fitzpatrick 88cb6f58f8 tool/updateflakes, cmd/nardump: replace update-flake.sh with Go tool
Consolidate go.mod.sri and go.toolchain.rev.sri into a single
flakehashes.json file at the repo root, owned by a new Go program at
tool/updateflakes. The JSON is consumed by flake.nix via
builtins.fromJSON and by any future Go code via the FlakeHashes
struct that defines its schema.

Each block records its input fingerprint alongside the SRI it
produced: the goModSum (a sha256 over go.mod and go.sum) for the
vendor block, and the literal rev string from go.toolchain.rev for
the toolchain block. updateflakes regenerates a block only when its
recorded fingerprint disagrees with the current input.

Doing the gating by content rather than file mtimes avoids the usual
mtime hazards across git checkouts, clones, and merges. It also
means re-runs with no input changes are essentially free, and a
re-run that touches only one input pays only for that one block.

The two blocks have no shared state -- vendor invokes go mod vendor
into one tempdir, toolchain fetches and extracts a tarball into
another -- so they run concurrently via errgroup. Cold time is
bounded by the slower of the two rather than their sum.

Also takes the opportunity to fold the toolchain fetch into a single
curl|tar pipeline (no intermediate .tar.gz on disk).

Split cmd/nardump into a thin package main and a new package nardump
library at cmd/nardump/nardump that holds the NAR encoder and SRI
helper. tool/updateflakes imports the library directly rather than
building and exec'ing the nardump binary at runtime. The library
uses fs.ReadLink (Go 1.25+) instead of os.Readlink, so it no longer
requires the caller to chdir into the FS root for symlink targets to
resolve. WriteNAR now wraps its writer in a bufio.Writer internally
(unless the caller already passed one) and flushes on return, so
callers don't pay for tiny writes against slow underlying writers.

The cache-busting line in flake.nix and shell.nix is known to live
at end of file, so updateCacheBust walks the lines in reverse.

make tidy timings on this machine, before: ~14s every run.
After:

  warm (no input changes):       0.05s
  vendor block stale only:       1.4s
  toolchain block stale only:    5.0s
  cold (no flakehashes.json):    5.0s

Updates #6845

Change-Id: I0340608798f1614abf147a491bf7c68a198a0db4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 10:18:32 -07:00
Daniel Pañeda 7735b15de3 cmd/k8s-operator: truncate long label values in metrics resources (#18895)
* cmd/k8s-operator: truncate long label values in metrics resources

Kubernetes label values have a 63-character limit, but resource names
can be up to 253 characters. When a Service or Ingress with a long
name is exposed via Tailscale, the operator fails to reconcile because
it uses the parent resource name directly as label values on metrics
Services.

Truncate label values that may exceed the limit by keeping the first
54 characters and appending a SHA256-based hash suffix to preserve
uniqueness.

Fixes #18894

Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-operator: move TruncateLabelValue to shared k8s-operator package

Move the label truncation helper to k8s-operator/utils.go so it can be
reused by other components that need to produce valid Kubernetes labels.

Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-operator: truncate long domain label values in cert resources

Applies TruncateLabelValue to certResourceLabels in order to prevent API
server validation failures. This covers both the HA Ingress and kube-apiserver
proxy reconcilers, as both flow through certResourceLabels.

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-operator: remove empty metrics_resources_test.go, use hyphens in test names to satisfy go vet

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

---------

Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-28 14:11:59 +01:00
Will Norris 2d85f37f39 client/systray: support several different color themes
Currently we only have a dark theme icon with white and grey dots over
a black background. For some desktops, a logo with black and grey dots
over a white background might be preferable. And for desktops where the
bar is *almost* black or white, but not quite, an option to render the
logo with dots only and no background can look really nice.

Add a new -theme flag to the systray command with the default staying
the same as it is today.

Updates #18303

Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
2026-04-27 18:54:14 -07:00
Brad Fitzpatrick 5c1738fd56 tstest/natlab/{vmtest,vnet}, cmd/tta: add TestExitNode
Add a vmtest TestExitNode that brings up a client, two exit nodes, and a
non-Tailscale webserver, each on its own NAT'd vnet network with a
distinct WAN IP. The test cycles the client's exit node setting between
off, exit1, and exit2 and asserts that the webserver echoes the expected
post-NAT source IP for each.

Three pieces were needed to make this work:

vnet now forwards TCP between simulated networks at the packet level,
mirroring the existing UDP path. When a guest VM sends TCP to another
simulated network's WAN IP, the source network's gateway rewrites src
via doNATOut and routeTCPPacket hands the packet off to the destination
network, which rewrites dst via doNATIn and writes the rewritten frame
onto the destination LAN. The TCP stacks of the two guest VM kernels
talk end-to-end; vnet just NATs the IP/port headers in flight, so all
TCP semantics (handshakes, options, sequence numbers, payload) are
preserved without a gvisor TCP termination in the middle. Adds a
focused TestInterNetworkTCP that exercises this path without any
Tailscale machinery.

cmd/tta binds its outbound dial to the default route's interface using
SO_BINDTODEVICE. Without that, the moment tailscaled installs
0.0.0.0/0 → tailscale0 in response to setting an exit node, TTA's
existing TCP connection to test-driver gets rerouted through the exit
node. From the test driver's perspective the connection's packets then
arrive with the exit node's WAN IP as the source rather than the
client's, so they don't match the existing flow and the connection is
dead — manifesting in the test as a hang on EditPrefs (which had
actually completed in milliseconds on the daemon side, but whose
response never made it back). Pinning the socket to the underlying NIC
keeps TTA's agent connection on a real interface regardless of any
policy routing tailscaled installs later. We bind rather than carry the
Tailscale bypass fwmark because the fwmark approach is conditional on
tailscaled having configured SO_MARK-based policy routing, while
binding is unconditional.

vmtest grows an Env.SetExitNode helper that sets ExitNodeIP via
EditPrefs through the agent, used by the new test.

Updates #13038

Change-Id: I9fc8f91848b7aa2297ef3eaf71fed9d96056a024
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-27 16:54:20 -07:00
BeckyPauley 7477a6ee47 cmd/k8s-operator: use dynamic resource names in e2e ingress tests (#19536)
Replace hardcoded resource names with dynamically generated names in
k8s-operator-e2e ingress tests to avoid collisions with stale resources.

Updates #tailscale/corp#40612

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-04-27 13:40:46 +01:00
Andrew Lytvynov ad9e6c1925 go.mod: bump github.com/google/go-containerregistry (#19500)
This drops an indirect dependency on the old github.com/docker/docker
(which was replaced with github.com/moby/moby) and fixes a couple recent
CVEs.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-04-23 10:39:27 -07:00
Brad Fitzpatrick f289f7e77c tstest/natlab/vmtest,cmd/tta: add TestSiteToSite
Verifies that site-to-site Tailscale subnet routing with
--snat-subnet-routes=false preserves the original source IP
end-to-end.

Topology: two sites, each with a Linux subnet router on a NATted WAN
plus an internal LAN, and a non-Tailscale backend on each LAN. Backends
are given static routes pointing to their local subnet router for the
remote site's prefix; an HTTP GET from backend-a to backend-b over
Tailscale returns a body containing backend-a's LAN IP.

Adds the supporting vmtest.SNATSubnetRoutes NodeOption and plumbs
snat-subnet-routes through TTA's /up handler. The webserver started by
vmtest.WebServer now also echoes the remote IP, for the preservation
assertion.

Adds a /add-route TTA endpoint (Linux-only for now) and a vmtest
Env.AddRoute helper so the test can install the backend static routes
through TTA rather than needing a host SSH key and debug NIC.

ensureGokrazy now always rebuilds the natlab qcow2 (once per test
process, via sync.Once) so the test picks up the new TTA and webserver
behavior.

This is pulled out of a larger pending change that adds FreeBSD
site-to-site subnet routing support; figured we should have at least
the Linux test covering what works today.

Updates #5573

Change-Id: I881c55b0f118ac9094546b5fbe68dddf179bb042
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-22 12:11:30 -07:00
Fernando Serboncini 81fbcc1ac8 cmd/tsnet-proxy: add tsnet-based port proxy tool (#19468)
Exposes a local port on the tailnet under a chosen hostname. Raw TCP by
default; --http or --https reverse-proxy with Tailscale-User-* identity
headers from WhoIs, matching tailscaled's serve header conventions.

Useful as a one-shot to put a dev server on the tailnet.

Fixes #19467

Change-Id: I79f63cfbbedf7e40cf0f1f51cbae8df86ae90cdf

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-04-22 13:34:18 -04:00
James 'zofrex' Sanderson ffae275d4d ipn/ipnlocal,tailcfg: add /debug/tka c2n endpoint (#19198)
Updates tailscale/corp#35015

Signed-off-by: James Sanderson <jsanderson@tailscale.com>
2026-04-20 16:00:03 +01:00
BeckyPauley b239e92eb6 cmd/k8s-operator: add e2e test setup and l7 ingress test for multi-tailnet (#19426)
This change adds setup for a second tailnet to enable multi-tailnet e2e
tests. When running against devcontrol, a second tailnet is created via the
API. Otherwise, credentials are read from SECOND_TS_API_CLIENT_SECRET.

Also adds an l7 HA Ingress test for multi-tailnet.

Fixes tailscale/corp#37498

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-04-17 17:03:25 +01:00
Andrew Dunham d52ae45e9b cmd/cloner: deep-clone pointer elements in map-of-slice values
The cloner's codegen for map[K][]*V fields was doing a shallow
append (copying pointer values) instead of cloning each element.
This meant that cloned structs aliased the original's pointed-to
values through the map's slice entries.

Mirror the existing standalone-slice logic that checks
ContainsPointers(sliceType.Elem()) and generates per-element
cloning for pointer, interface, and struct types.

Regenerate net/dns and tailcfg which both had affected
map[...][]*dnstype.Resolver fields.

Fixes #19284

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2026-04-17 11:36:05 -04:00
Bjorn Stange 47ecbe5845 cmd/k8s-operator: add priorityClassName support to helm chart (#19236)
Expose priorityClassName in the operator Helm chart values so that
users can configure the operator deployment with a Kubernetes
PriorityClass. This prevents the operator pods from being preempted
by lower-priority workloads.

Fixes #19235

Signed-off-by: Bjorn Stange <bjorn.stange@expel.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 12:57:12 +01:00
Brad Fitzpatrick 50d7176333 control/tsp, cmd/tsp: add low-level Tailscale protocol client and tool
Add a new control/tsp package providing a client for speaking the
Tailscale protocol to a coordination server over Noise, along with a
cmd/tsp binary exposing it as a low-level composable tool for
generating keys, registering nodes, and issuing map requests.

Previously developed out-of-tree at github.com/bradfitz/tsp; imported
here without git history.

Updates #12542

Change-Id: I6ad21143c4aefe8939d4a46ae65b2184173bf69f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-16 20:00:25 -07:00
David Bond eea39eaf52 cmd/k8s-operator: add affinity rules to DNSConfig (#19360)
This commit modifies the `DNSConfig` custom resource to allow the
user to specify affinity rules on the nameserver pods.

Updates: https://github.com/tailscale/tailscale/issues/18556

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-15 22:39:04 +01:00
Jonathan Nobels acc43356c6 control/controlclient: enable request signatures on macOS (#19317)
fixes tailscale/corp#39422

Updates tailscale/certstore for properly macOS support and
builds the request signing support into macOS builds.  iOS and builds
that do not use cGo are omitted.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2026-04-15 14:11:14 -04:00
Tom Meadows 5eb0b4be31 cmd/containerboot,cmd/k8s-proxy,kube: add authkey renewal to k8s-proxy (#19221)
* kube/authkey,cmd/containerboot: extract shared auth key reissue package

Move auth key reissue logic (set marker, wait for new key, clear marker,
read config) into a shared kube/authkey package and update containerboot
to use it. No behaviour change.

Updates #14080

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* kube/authkey,kube/state,cmd/containerboot: preserve device_id across restarts

Stop clearing device_id, device_fqdn, and device_ips from state on startup.
These keys are now preserved across restarts so the operator can track
device identity. Expand ClearReissueAuthKey to clear device state and
tailscaled profile data when performing a full auth key reissue.

Updates #14080

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/containerboot: use root context for auth key reissue wait

Pass the root context instead of bootCtx to setAndWaitForAuthKeyReissue.
The 60-second bootCtx timeout was cancelling the reissue wait before the
operator had time to respond, causing the pod to crash-loop.

Updates #14080

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-proxy: add auth key renewal support

Add auth key reissue handling to k8s-proxy, mirroring containerboot.
When the proxy detects an auth failure (login-state health warning or
NeedsLogin state), it disconnects from control, signals the operator
via the state Secret, waits for a new key, clears stale state, and
exits so Kubernetes restarts the pod with the new key.

A health watcher goroutine runs alongside ts.Up() to short-circuit
the startup timeout on terminal auth failures.

Updates #14080

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

---------

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-15 16:13:46 +01:00
Brad Fitzpatrick 9fbe4b3ed2 all: fix six tests that failed with -count=2
Avery found a bunch of tests that fail with -count=2.

Updates tailscale/corp#40176 (tracks making our CI detect them)

Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13 18:52:57 -07:00
Brad Fitzpatrick a97850f7e2 cmd/derper: fix TestLookupMetric to pass when run alone
TestLookupMetric was added in e8d140654 (2023-08-17) without
initializing the dnsCache and dnsCacheBytes globals. When run in
isolation, handleBootstrapDNS writes a nil body (from the
uninitialized dnsCacheBytes), causing getBootstrapDNS to fail
decoding an empty response with EOF.

Add a setDNSCache test helper that stores the dnsEntryMap, marshals
dnsCacheBytes, and registers a t.Cleanup to nil both out, so tests
that forget to call it will hit the dnsCache-nil fatal in
getBootstrapDNS rather than silently depending on prior test state.

Also add AssertNotParallel and a dnsCache-nil fatal check to
getBootstrapDNS, the central helper all bootstrap DNS tests flow
through, to prevent future tests from running in parallel (they
all mutate package-level DNS caches and metrics) and to give a
clear error if a test forgets to initialize the DNS caches.

Fixes #19388

Change-Id: I8ad454ec6026c71f13ecfa14d25925df5478b908
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13 17:20:43 -07:00
Brad Fitzpatrick 6500d3c3f8 cmd/containerboot: mark TestContainerBoot as flaky
Updates #19380

Change-Id: Ib1be53836e37224265d10abd0c2213644ea54d64
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-13 15:21:42 -07:00
Jordan Whited 929ad51be0 cmd/derper: mark rate-config flag as experimental and unstable
Updates tailscale/corp#38509

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-04-13 12:24:59 -07:00
Mike O'Driscoll ca5db865b4 cmd/derper,derp: add --rate-config file with SIGHUP reload (#19314)
Add a --rate-config flag pointing to a JSON file for per-client receive
rate limits (bytes/sec and burst bytes). The config is reloaded on SIGHUP,
updating all existing client connections live. The --per-client-rate-limit
and --per-client-rate-burst flags are removed in favor of the config file.

In derpserver, rate limiting uses an atomic.Pointer[xrate.Limiter] per
client: nil when unlimited or mesh (zero overhead), non-nil when
rate-limited.

Document that clientSet.activeClient Store operations require Server.mu.

Updates tailscale/corp#38509

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-04-10 18:37:54 -04:00
Fernando Serboncini 6b7caaf7ee cmd/k8s-operator: set PreferDualStack on ProxyGroup egress services (#19194)
On dual-stack clusters defaulting to IPv6, the ProxyGroup egress
service only got an IPv6 address, which causes request failures.
Individual egress proxies already set PreferDualStack correctly.

Fixes: #18768

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-04-09 13:33:39 -04:00
David Bond 85d6ba9473 cmd/k8s-operator: migrate to tailscale-client-go-v2 (#19010)
This commit modifies the kubernetes operator to use the `tailscale-client-go-v2`
package instead of the internal tailscale client it was previously using. This
now gives us the ability to expand out custom resources and features as they
become available via the API module.

The tailnet reconciler has also been modified to manage clients as tailnets
are created and removed, providing each subsequent reconciler with a single
`ClientProvider` that obtains a tailscale client for the respective tailnet
by name, or the operator's default when presented with a blank string.

Fixes: https://github.com/tailscale/corp/issues/38418

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-09 14:39:46 +01:00
Brad Fitzpatrick ec0b23a21f vmtest: add VM-based integration test framework
Add tstest/natlab/vmtest, a high-level framework for running multi-VM
integration tests with mixed OS types (gokrazy + Ubuntu/Debian cloud
images) connected via natlab's vnet virtual network.

The vmtest package provides:
  - Env type that orchestrates vnet, QEMU processes, and agent connections
  - OS image support (Gokrazy, Ubuntu2404, Debian12) with download/cache
  - QEMU launch per OS type (microvm for gokrazy, q35+KVM for cloud)
  - Cloud-init seed ISO generation with network-config for multi-NIC
  - Cross-compilation of test binaries for cloud VMs
  - Debug SSH NIC on cloud VMs for interactive debugging
  - Test helpers: ApproveRoutes, HTTPGet, TailscalePing, DumpStatus,
    WaitForPeerRoute, SSHExec

TTA enhancements (cmd/tta):
  - Parameterize /up (accept-routes, advertise-routes, snat-subnet-routes)
  - Add /set, /start-webserver, /http-get endpoints
  - /http-get uses local.Client.UserDial for Tailscale-routed requests
  - Fix /ping for non-gokrazy systems

TestSubnetRouter exercises a 3-VM subnet router scenario:
  client (gokrazy) → subnet-router (Ubuntu, dual-NIC) → backend (gokrazy)
  Verifies HTTP access to the backend webserver through the Tailscale
  subnet route. Passes in ~30 seconds.

Updates tailscale/tailscale#13038

Change-Id: I165b64af241d37f5f5870e796a52502fc56146fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-08 17:24:18 -07:00