10620 Commits

Author SHA1 Message Date
Claus Lensbøl bb47ea2c6b tstest/natlab/vmtest: start migrating old natlab tests to vmtest (#19727)
Instead of having two entry points for running natlab tests, start
converting the connectivity tests to use the vmtest framework.

Grid and pair tests have yet to be moved over.

Updates #13038

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-13 16:44:53 -04:00
Fran Bull 3a6261b79b feature/conn25: keep addrAssignments through pool reconfig
Fixes tailscale/corp#40250

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-13 11:00:47 -07:00
Simon Law e4e59a2af0 wgengine/netstack: stop inject goroutine from leaking in Impl.Start (#19721)
This patch fixes a data race in wgengine/netstack that surfaced while
running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient.
Because these two tests both set up the TS_DEBUG_NETSTACK envknob, a
race happens because netstack.Impl.Close leaked its inject goroutine.
The inject goroutine also reads the TS_DEBUG_NETSTACK envknob, so if
it is still running when the next test starts, then it will break.

This patch also cleans up the tests a bit, ensuring that neither of
them run in T.Parallel. It also adds a T.Cleanup call to clear the
envknob.

Fixes #19720

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-13 08:13:40 -07:00
Simon Law 6467f0d067 ipn/ipnlocal: fix minor typo in shouldUseOneCGNATRoute (#19719)
This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute
would claim that an Android machine was actually macOS.

Updates #cleanup
Updates #19652

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-12 21:55:29 -07:00
Brad Fitzpatrick 6b729795c3 derp/derpserver: use hashtriemap for peer lookup
Replace the process-global Server.mu lookup in the packet send hot path
with a global hashtriemap mirror of local clientSet entries. The
authoritative clients map remains guarded by Server.mu; clientsAtomic is
only a lock-free fast path for active local clients.

Misses, stale inactive client sets, duplicate accounting, and mesh
forwarding still fall back to lookupDestUncached. This avoids taking
Server.mu for the common local active-client send path, at the cost of
adding one global concurrent map that mirrors Server.clients for local
peers.
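
As a rough illustration of the fast-path/fallback shape described above, here is a minimal sketch. It assumes sync.Map as a stand-in for the hashtriemap, and every name in it (server, client, lookup, lookupSlow) is hypothetical rather than taken from derpserver:

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for a connected local client.
type client struct{ name string }

// server keeps an authoritative map guarded by mu, plus a lock-free
// mirror that is only ever used as a fast path. sync.Map stands in
// for the hashtriemap here (an assumption made for this sketch).
type server struct {
	mu      sync.Mutex
	clients map[string]*client // authoritative; guarded by mu
	fast    sync.Map           // mirror: string -> *client
}

func newServer() *server {
	return &server{clients: make(map[string]*client)}
}

// add registers c in the authoritative map and in the mirror.
func (s *server) add(key string, c *client) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.clients[key] = c
	s.fast.Store(key, c)
}

// remove deletes key from both maps.
func (s *server) remove(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.clients, key)
	s.fast.Delete(key)
}

// lookup tries the mirror first and takes mu only on a miss.
func (s *server) lookup(key string) (*client, bool) {
	if v, ok := s.fast.Load(key); ok {
		return v.(*client), true
	}
	return s.lookupSlow(key)
}

// lookupSlow is the mutex-guarded fallback path.
func (s *server) lookupSlow(key string) (*client, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.clients[key]
	return c, ok
}

func main() {
	s := newServer()
	s.add("peer1", &client{name: "a"})
	if c, ok := s.lookup("peer1"); ok {
		fmt.Println("found", c.name)
	}
}
```

The mirror is written only while mu is held, so it cannot diverge from the authoritative map between add and remove; readers on the hot path never touch the mutex.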

The benchmark uses four destination peers. The before run sets
TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup
path; the after run uses the hashtrie fast path.

    goos: linux
    goarch: amd64
    pkg: tailscale.com/derp/derpserver
    cpu: Intel(R) Xeon(R) 6975P-C
                          │    before     │                after                │
                          │    sec/op     │   sec/op     vs base                │
    LookupDestHashTrie-16   176.050n ± 1%   1.904n ± 6%  -98.92% (p=0.000 n=10)

                          │   before   │             after              │
                          │    B/op    │    B/op     vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

                          │   before   │             after              │
                          │ allocs/op  │ allocs/op   vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)

Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-12 16:08:16 -07:00
Adriano Sela Aviles 72578de033 ipn/{ipnlocal,localapi},client/local: add per-dst cap resolution for services
Adds two new cap resolution methods alongside the existing PeerCaps:

PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves
the service name to its VIP addresses via the node's service IP mappings
and returns caps scoped to that service. Exposed on /v0/whois via the
svc_name query parameter and on client/local.Client as WhoIsForService.

PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary
destination IP. Exposed on /v0/whois via the svc_addr query parameter
and on client/local.Client as WhoIsForIP.

svc_name takes priority over svc_addr when both are present. Invalid
values for either return 400. The existing PeerCaps/WhoIs path is
unchanged: without a service parameter, WhoIs returns only host-level
caps.

Updates tailscale/corp#41632

Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
2026-05-12 15:50:39 -07:00
DeedleFake ad8ead9c94 cmd/tailscale/cli: add RunWithContext
Fixes #12778

Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c
Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>
2026-05-12 12:27:55 -07:00
M. J. Fromberger 9f48567bf1 ipn/ipnlocal,wgengine/magicsock: add basic counters for cached peer connectivity (#19699)
Add new clientmetric counters for establishing contact with peers while using
cached network map data. To do this, instrument the magicsock.Conn with a bit
to indicate whether its peer data came from a cached netmap. If so, there are
two conditions we will count as establishing connectivity to a peer:

  - Receipt of a CallMeMaybe from a peer via disco.
  - Establishing a valid endpoint address for a peer.

In vmtest, add Env.ClientMetrics to scrape metrics from the specified node.
Use this to check that counters were updated in caching tests.

Updates https://github.com/tailscale/projects/issues/13
Updates #12639

Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-05-12 12:01:05 -07:00
James Tucker 120bfcf1cc util/eventbus: extract non-generic SubscriberFunc constructor body and cache type name
Two changes that share the same intent of reducing per-T duplication
in code that doesn't actually depend on T:

1. Hoist the non-generic portion of newSubscriberFunc[T] into a
   newSubscriberFuncCore() helper. The hoisted work is the
   timer setup, the subscriberFuncCore allocation, and the
   unregister closure (which captures only the non-generic
   reflect.Type and *subscribeState). The generic body now does
   only the two T-bound things it has to: compute reflect.TypeFor[T]
   and create the dispatch closure.

   Effect on the per-shape-stencil body of newSubscriberFunc[T]:
     before: 523 B per shape (in synthetic test)
     after:  293 B per shape (-230 B per shape; -56% on this body)

2. Cache reflect.Type.String() once at construction (in core.typeName)
   instead of recomputing it every time the dispatch closure runs.
   The dispatch closure also now takes the *subscriberFuncCore directly
   rather than building an intermediate dispatchFuncState struct on
   every call.

   Effect on the dispatch closure body (newSubscriberFunc[T].func1):
     before: 581 B per shape
     after:  480 B per shape (-101 B per shape; -17%)

Combined effect on tailscaled (linux/amd64):
  named-symbol savings via symcost: ~7 KB
  stripped binary delta:            -8 KB (page-quantized)
  arm64 binary delta:                0 (page-quantized)

  cumulative reduction from baseline (5167ff412):
    linux/amd64:  -110,592 bytes (-0.391%)
    linux/arm64:  -131,072 bytes (-0.499%)

Throughput is also improved by the typeName cache: BenchmarkBasic
goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot
path no longer allocates a string on every event.

Updates #12614

Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-12 11:16:04 -07:00
Brad Fitzpatrick 758ebe9839 tstest/natlab/vmtest: use short paths for Unix sockets
macOS limits Unix socket paths to 104 bytes. The Go test TempDir
path (e.g. /var/folders/.../TestDirectConnection...679197086/001/)
easily exceeds that, causing "bind: invalid argument". Create a
short /tmp/vmtest* directory for all socket files (vnet, QMP,
dgram) so the paths stay well under the limit on every platform.
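
A minimal sketch of the idea, assuming a hypothetical sockPath helper and treating the 104-byte figure above as a hard cap (the real vmtest code may be structured differently):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// maxSockPath is the macOS limit cited above. The path plus its
// trailing NUL must fit, so the check below is deliberately strict.
const maxSockPath = 104

// sockPath joins dir and name and rejects results that would not
// fit in a Unix socket address. It is a hypothetical helper.
func sockPath(dir, name string) (string, error) {
	p := filepath.Join(dir, name)
	if len(p) >= maxSockPath {
		return "", fmt.Errorf("socket path %q is %d bytes; limit is %d", p, len(p), maxSockPath)
	}
	return p, nil
}

func main() {
	// A short directory directly under /tmp instead of the long
	// per-test TempDir path.
	dir, err := os.MkdirTemp("/tmp", "vmtest")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	p, err := sockPath(dir, "qmp.sock")
	fmt.Println(p, err)
}
```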

Updates #13038

Change-Id: I721d24561d1766aaa964692bc77f40a131aa9455
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-11 21:54:27 -07:00
Brad Fitzpatrick f4c5613156 tstest/natlab/vmtest: don't require KVM; use TCG on macOS
startCloudQEMU hardcoded -machine q35,accel=kvm and -cpu host,
which fails on any host without KVM (notably macOS). Replace
with a qemuAccelArgs helper that probes /dev/kvm and falls back
to QEMU's TCG software emulation, matching the pattern already
used by tstest/integration/nat. Also wire the helper into
startGokrazyQEMU so gokrazy VMs pick up KVM when available.
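
The probe-and-fall-back logic might look roughly like the sketch below. The function name accelArgs, its kvmPath parameter, and the exact TCG flags (accel=tcg, -cpu max) are assumptions for illustration; only the KVM flags come from the text above:

```go
package main

import (
	"fmt"
	"os"
)

// accelArgs returns QEMU machine/CPU flags. It uses KVM when kvmPath
// can be opened read-write and otherwise falls back to TCG software
// emulation. The TCG flag values are assumptions of this sketch.
func accelArgs(kvmPath string) []string {
	f, err := os.OpenFile(kvmPath, os.O_RDWR, 0)
	if err != nil {
		return []string{"-machine", "q35,accel=tcg", "-cpu", "max"}
	}
	f.Close()
	return []string{"-machine", "q35,accel=kvm", "-cpu", "host"}
}

func main() {
	fmt.Println(accelArgs("/dev/kvm"))
}
```

Opening the device read-write, rather than only checking that it exists, also covers hosts where /dev/kvm is present but not accessible to the current user.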

Updates #13038

Change-Id: I7745518db823279b1880957bb14ca2ffdaab4c50
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-11 19:18:17 -07:00
Brad Fitzpatrick e062b46984 tstest/natlab, .github/workflows: add opt-in natlab CI workflow
The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat
tests are gated behind --run-vm-tests because they need KVM and are
slow. Until now nothing in CI exercised them apart from a single
canary TestEasyEasy run on every PR.

Add .github/workflows/natlab-test.yml that runs the full opt-in suite
on demand (workflow_dispatch), on PRs labeled "natlab", and on main
every 12 hours via cron. The workflow has two phases:

  - "prepare" builds the gokrazy VM image, downloads the Ubuntu and
    FreeBSD cloud images once via the new natlabprep tool, and emits
    a dynamic JSON matrix of every TestX function it finds in the two
    opt-in packages.
  - "test" is a per-test matrix that depends on prepare. Each matrix
    job restores the shared caches and runs a single test, so adding
    a new TestFoo is automatically picked up on the next run without
    any workflow edits.

Rename the existing natlab-integrationtest.yml to natlab-basic.yml
since it's the small smoke variant (just TestEasyEasy on every PR);
the new natlab-test.yml is the bigger suite. The job inside is
renamed to EasyEasy for the same reason.

Move the macOS arm64 host check from vmtest.Env.Start into
vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips
immediately on a non-macOS host, and add an explicit
skipIfNotMacOSArm64 helper at the top of the two macOS-only tests
so the platform requirement is obvious to readers.

Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default
(it was the overwhelming majority of bytes in CI logs, with no signal
in healthy runs) and replace it with a periodic "still waiting" line
that only fires after 10s, so a truly stuck agent connection still
surfaces.

Updates #13038

Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-11 17:14:46 -07:00
James Tucker 4eec4423b4 util/eventbus: move Publisher publisher-interface impl to a non-generic core
Mirrors the same refactor previously applied to SubscriberFunc:

  - Publisher[T]: a thin user-facing facade. Holds a pointer to a
    non-generic publisherCore and exposes Publish/Close/ShouldPublish.
  - publisherCore: a non-generic struct that owns the *Client back-
    pointer, stop flag, and cached reflect.Type. It implements the
    package-private publisher interface (publishType, Close).
    The bus's per-Client publisher set is set.Set[publisher] keyed
    on this single non-generic type.

The publisher interface only exists to support diagnostic
introspection (Debugger.PublishTypes returning the list of types a
client publishes). Previously, satisfying that diagnostic-only
interface forced *Publisher[T] to be the implementor and cost a
per-T itab, generic dictionary, and equality function on every
event type ever passed through Publish[T]. Moving the
implementation to a non-generic core lets the diagnostic surface
work unchanged while charging zero per-T cost for the
diagnostic-driven generic interface.

Publisher[T].Publish is also slimmed: the channel/select/stopFlag
loop is now a non-generic publish() helper that takes the value as
'any'. The per-T body is reduced to forwarding the boxed value to
the helper.

Measured impact (util/eventbus/sizetest):

  total per-flow binary cost:
    linux/amd64:  2252.8 B/flow -> 1900.5 B/flow  (-352.3 B / -15.6%)
    linux/arm64:  2228.2 B/flow -> 1835.0 B/flow  (-393.2 B / -17.6%)

  Publisher per-receiver attribution:
    linux/amd64:   635.2 B/flow ->  369.6 B/flow  (-265.6 B / -41.8%)
    linux/arm64:   751.7 B/flow ->  373.2 B/flow  (-378.5 B / -50.4%)

Cumulative reduction from the original baseline (5167ff412):
    linux/amd64:  3096.6 B/flow -> 1900.5 B/flow  (-1196.1 B / -38.6%)
    linux/arm64:  3145.7 B/flow -> 1835.0 B/flow  (-1310.7 B / -41.7%)

Dropped per-T symbols (200-flow eventbus binary):

  - .dict.Publisher[T]                   was 14,400 B (72 B/T)
  - type:.eq.Publisher[T]                was 11,832 B (58 B/T)
  - go:itab.*Publisher[T],publisher      was  8,000 B (40 B/T)
  - (*Publisher[T]).Close shape stencils collapsed to 1

Behavior is unchanged: BenchmarkBasicThroughput is within noise
(2018 -> 2038 ns/op at -benchtime=2s) and all eventbus tests pass.

Updates #12614

Change-Id: I61979c2bf95d2a711c2321e6e0b4b7d15980e9f5
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-11 14:39:42 -07:00
James Tucker d72cde1a6b util/eventbus: move SubscriberFunc subscriber-interface impl to a non-generic core
Splits SubscriberFunc[T] into:

  - SubscriberFunc[T]: a thin user-facing facade that holds only a
    pointer to a non-generic core. It exposes Close() to user code,
    which forwards to the core.
  - subscriberFuncCore: a non-generic struct that owns all the
    subscriber state (stop flag, unregister, logf, slow timer,
    cached reflect.Type) and implements the bus's package-private
    subscriber interface. Its dispatch() invokes a closure
    captured at construction time that performs the
    vals.Peek().Event.(T) type assertion and runs the user
    callback on the unboxed value.

The bus's outputs map and subscriber-interface itab are
parameterized only by *subscriberFuncCore, not by T, eliminating
both the per-T itab and the per-T generic dictionary that
previously scaled with the number of subscribed event types.
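
The facade/core split can be sketched as follows. This is a simplified illustration with hypothetical names (core, Subscriber, NewSubscriber), not the eventbus code; it keeps only the type assertion and the user callback inside the generic part:

```go
package main

import (
	"fmt"
	"reflect"
)

// core is non-generic: its methods are compiled once, no matter how
// many event types are subscribed.
type core struct {
	typeName string           // cached once at construction
	dispatch func(v any) bool // closure built by the generic constructor
}

// deliver is the non-generic entry point a bus would call.
func (c *core) deliver(v any) bool { return c.dispatch(v) }

// Subscriber is the thin generic facade; it holds only a pointer.
type Subscriber[T any] struct{ c *core }

// NewSubscriber does the two T-bound things: resolve T's name and
// build the closure that unboxes the value and runs the callback.
func NewSubscriber[T any](fn func(T)) *Subscriber[T] {
	c := &core{
		typeName: reflect.TypeOf((*T)(nil)).Elem().String(),
		dispatch: func(v any) bool {
			t, ok := v.(T)
			if !ok {
				return false
			}
			fn(t)
			return true
		},
	}
	return &Subscriber[T]{c: c}
}

func main() {
	s := NewSubscriber(func(v string) { fmt.Println("got", v) })
	s.c.deliver("hello")
	fmt.Println(s.c.typeName)
}
```

Anything that stores subscribers can now be keyed on *core alone, which is what removes the per-T interface table in this sketch.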

Measured impact (util/eventbus/sizetest):

  total per-flow binary cost:
    linux/amd64:  3039.2 B/flow -> 2252.8 B/flow  (-786.4 B / -25.9%)
    linux/arm64:  3145.7 B/flow -> 2228.2 B/flow  (-917.5 B / -29.2%)

  SubscriberFunc per-receiver attribution:
    linux/amd64:   840.8 B/flow ->  300.8 B/flow  (-540.0 B / -64.2%)
    linux/arm64:   849.9 B/flow ->  303.8 B/flow  (-546.1 B / -64.3%)

Dropped per-T symbols (200-flow eventbus binary):

  - (*SubscriberFunc[T]).dispatch     was 26,639 B total (130 B/T)
  - (*SubscriberFunc[T]).subscribeType was  3,600 B total ( 18 B/T)
  - .dict.SubscriberFunc[T]            was 14,400 B total ( 72 B/T)
  - go:itab.*SubscriberFunc[T],...     was  9,600 B total ( 48 B/T)

Of the original 913 B/flow attributed to SubscriberFunc, 540 B/flow
is now gone, dropping the receiver to 300 B/flow.

Behavior is unchanged: BenchmarkBasicThroughput is within noise
(1955 -> 1941 ns/op on the test box) and all eventbus tests pass.

Updates #12614

Change-Id: I646b3b05fd8d95f9afead59bfd0f69cd18b7a709
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-11 12:14:05 -07:00
Francois Marier ead5ce65a3 cmd/pgproxy: fix client TLS handshake timeout
There is a 30-second timeout set on client TLS connections but the handshake was
called on the wrong connection and so the timeout was never used in practice.

Signed-off-by: Francois Marier <francois@fmarier.org>
2026-05-11 11:12:11 -07:00
Fran Bull 2f45a6a9d8 feature/conn25: return expired assignments to address pools
Make it possible to remove the least recently used expired address
assignment from addrAssignments.
Before checking out a new address from the IP pools, return a handful of
expired addresses.

Updates tailscale/corp#39975

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-08 14:33:06 -07:00
Fran Bull 82346f3882 feature/conn25: move addrAssignments to their own file
Updates tailscale/corp#39975

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-08 14:33:06 -07:00
Claus Lensbøl 469d356ed8 tstest/natlab/vmtest: add test for direct conn with cached netmap (#19660)
When a peer is not able to connect to control after a restart and is
using a cached netmap, that node should be able to connect to another
peer in its tailnet (given that the home DERP of that peer has not
changed in the meantime).

Add a test that starts two peers and connects them to a tailnet with
caching enabled. Then blackhole traffic to control from one peer and
restart it. Verify that the connection between the two ends up direct.

Adds facilities for expecting a certain path type between nodes.

Updates: #19597

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-08 16:57:27 -04:00
Fran Bull ee2378b141 feature/conn25: follow CNAMEs when rewriting DNS response
If a DNS query for a domain that should be routed through a connector
results in CNAME records in the response, collapse the CNAME chain to an
A/AAAA record for the domain -> magic IP.

Fixes tailscale/corp#39978

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-08 08:12:24 -07:00
Brad Fitzpatrick 24eb157448 go.toolchain.rev: bump to Go 1.26.3
Updates tailscale/corp#41490

Change-Id: I35b67bdbcd71468fea03b033b17aeefe1319dc45
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-07 15:33:05 -07:00
Alex Chan d6ffc0d986 tka,ipn: reduce boilerplate in Tailnet Lock tests
The `CreateStateForTest` helper reduces boilerplate in cases where the test
only cares about the trusted keys and not the disablement values (and makes
it more obvious where the disablement values are meaningful).

The `setupChonkStorage` helper reduces the boilerplate when creating on-disk
TKA storage in tests.

The `fakeLocalBackend` helper reduces the boilerplate when setting up a
`LocalBackend` instance in the IPN tests.

Updates #cleanup

Change-Id: Iacfba1be5f7fab208eec11e4369d63c7d7519da5
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-07 21:49:27 +01:00
Fernando Serboncini 495d3acc7b tstest/natlab/vmtest: kill QEMU when test process dies (#19676)
Re-exec the test binary as a thin wrapper that holds a pipe inherited
from the test. When the test goes away (any reason, including SIGKILL,
panic, or OOM), the kernel closes the pipe write end; the wrapper sees
EOF and SIGKILLs itself, taking QEMU and its children with it.
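
The pipe trick can be demonstrated in a single process. In the sketch below, closing the write end stands in for the test process dying; the helper names are hypothetical and the real wrapper would kill itself instead of returning:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// waitForEOF blocks until every write end of the pipe behind r has
// been closed. io.Copy reports a nil error on a clean EOF.
func waitForEOF(r io.Reader) error {
	_, err := io.Copy(io.Discard, r)
	return err
}

// simulateParentExit creates a pipe, starts a watcher on the read
// end, and closes the write end the way the kernel would when the
// owning process goes away. It reports whether EOF was observed.
func simulateParentExit() bool {
	r, w, err := os.Pipe()
	if err != nil {
		return false
	}
	defer r.Close()

	done := make(chan error, 1)
	go func() { done <- waitForEOF(r) }()

	w.Close() // stands in for the test process exiting
	return <-done == nil
}

func main() {
	if simulateParentExit() {
		fmt.Println("EOF seen; a real wrapper would now kill itself and its children")
	}
}
```

The approach needs no signal handling in the parent, which is why it also covers SIGKILL and OOM kills: the kernel closes the descriptor regardless of how the process ended.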

Updates #13038

Change-Id: Ib2151098193551396c1d7bb51b07da3bd6b2cfb4

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-07 16:14:27 -04:00
Claus Lensbøl 76248a68b2 tstest/natlab/vnet: close gonet sockets when test is done (#19677)
Running all vmtests in tstest/natlab/vmtest locally was breaking later
tasks in the queue. The goroutine dump on timeout had goroutines hanging
around for 9 minutes, meaning that something was not getting cleaned up.

  goroutine 262 [select, 9 minutes]:
  gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.commonRead({...})

Add a timeout of Now() to gonet TCP connections when the test ends
(inspired by ServeUnixConn()), and wait for them to shut down before
exiting the test.
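
The same unblocking technique works on any net.Conn that supports deadlines. The sketch below uses net.Pipe as a stand-in for the gonet TCP connections (an assumption made so the example is self-contained):

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// unblockRead parks a goroutine in Read on an idle connection and
// then sets a read deadline of Now to force the Read to return.
func unblockRead() error {
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()

	errc := make(chan error, 1)
	go func() {
		buf := make([]byte, 1)
		_, err := a.Read(buf) // b never writes, so this would block
		errc <- err
	}()

	a.SetReadDeadline(time.Now()) // wake the reader
	return <-errc                 // wait for it to shut down
}

// isDeadline reports whether err is a deadline-exceeded error.
func isDeadline(err error) bool {
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println(unblockRead())
}
```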

Updates #13038

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-07 14:57:07 -04:00
Hazel T 33b9579c21 scripts/installer.sh: add openSUSE Slowroll as a Tumbleweed derivative (#19662)
Fixes: #14927

Signed-off-by: Hazel T <hazel@tailscale.com>
2026-05-07 12:43:55 +01:00
Erisa A 76712b32d9 .github: install ca-certificates on Kali to fix installer tests (#19673)
Updates #cleanup

Signed-off-by: Erisa A <erisa@tailscale.com>
2026-05-07 12:20:09 +01:00
James Tucker 0def0f19bd util/eventbus: extract SubscriberFunc.dispatch loop to a non-generic helper
The (*SubscriberFunc[T]).dispatch method body — a ~40-line select
loop with slow-subscriber timer, snapshot handling, ctx-cancel
draining, and a CI stack-dump branch — was previously fully
duplicated by the Go compiler for every distinct GC shape of T.
None of that body actually depends on T except for the type
assertion and the user callback invocation.

This change moves the loop body into a non-generic dispatchFunc()
helper, leaving (*SubscriberFunc[T]).dispatch as a tiny wrapper
that:

  - performs the vals.Peek().Event.(T) type assertion
  - spawns the callback goroutine via `go runFuncCallback(s.read,
    t, callDone)` — a regular generic function call, not a closure,
    so that `go` binds the args to the goroutine's frame instead of
    allocating a closure on the heap. This preserves the
    zero-extra-allocation behavior of the original
    (*SubscriberFunc[T]).runCallback method.
  - resolves T's name via reflect.TypeFor[T]().String() (cached on
    the stack rather than recomputed on each %T formatting)
  - calls dispatchFunc with the callDone channel

The %T formatting in the original logf calls is replaced with %s
on the resolved name string, removing per-T fmt instantiations.

A new BenchmarkBasicFuncThroughput is added alongside the existing
BenchmarkBasicThroughput so per-event allocation behavior on the
SubscribeFunc dispatch path is covered by the benchmark suite.

Measured impact (util/eventbus/sizetest):

  SubscriberFunc per-flow attribution:
    linux/amd64:  912.5 B/flow -> 840.8 B/flow  (-71.7 B/flow)
    linux/arm64:  917.5 B/flow -> 849.9 B/flow  (-67.6 B/flow)

The total per-flow size delta on amd64 dropped from 3,096.6 B to
3,039.2 B (-57 B/flow). The arm64 total stayed at 3,145.7 B
because the linker's page-aligned section sizing absorbed the
improvement on this binary; the symcost-attributed per-receiver
number is the real signal.

Behavior is unchanged: BenchmarkBasicThroughput stays at 0
allocs/op and BenchmarkBasicFuncThroughput holds at the same 2
allocs/op, 144 B/op as the prior eventbus implementation. All
eventbus tests pass.

Updates #12614

Change-Id: I85f933f50f58cd25bbfe5cc46bdda7aab22f0bf7
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-06 18:56:09 -07:00
Brad Fitzpatrick 87a74c3aa2 tsnet: make workload identity federation opt-in
The tailscale.com/wif package brings in the AWS SDK
(github.com/aws/aws-sdk-go-v2/{config,sts,...} and github.com/aws/smithy-go)
to support fetching ID tokens from AWS IMDS for workload identity
federation. Until now, tsnet pulled this in unconditionally via
feature/condregister/identityfederation, costing ~70 unwanted deps for
every tsnet program whether or not it uses workload identity federation.

These AWS SDK deps were originally removed from tsnet on 2025-09-29 by
commit 69c79cb9f ("ipn/store, feature/condregister: move AWS + Kube
store registration to condregister"). They were then accidentally added
back on 2026-01-14 by commit 6a6aa805d ("cmd,feature: add identity
token auto generation for workload identity", PR #18373) when the new
wif package was wired into tsnet via feature/identityfederation.

Drop the blanket import. tsnet programs that want workload identity
federation now opt in with:

    import _ "tailscale.com/feature/identityfederation"

The hook lookup in resolveAuthKey already uses GetOk and degrades
gracefully when the feature isn't linked, so existing programs that
don't use workload identity federation see no behavior change. The
tailscale CLI still imports the condregister wrapper directly, so its
behavior is also unchanged.

Lock this in with TestDeps additions: tailscale.com/wif as a BadDep,
plus substring checks in OnDep that fail on any github.com/aws/ or
k8s.io/ dependency creeping back in.

Also, switch cmd/gitops-pusher from the condregister wrapper to a
direct import of feature/identityfederation: gitops-pusher's auth flow
calls HookExchangeJWTForTokenViaWIF directly, so it shouldn't be
subject to the ts_omit_identityfederation build tag.

Updates #12614

Change-Id: I70599f2bdd4d3666b26a859d5b76caa5d6b94507
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 18:43:45 -07:00
Adriano Sela Aviles daddb14b8f control/controlhttp: use ws:// when HTTPSPort is NoPort in JS dialer
When HTTPS is explicitly disabled (HTTPSPort == NoPort), the JS WebSocket
dialer should use ws:// instead of wss://. This matches the behavior of
the non-JS client and fixes connections to development control servers
e.g. http://localhost:31544.

Updates tailscale/corp#40944

Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
2026-05-06 15:58:58 -07:00
Brad Fitzpatrick d06cc56987 wgengine/magicsock: add more docs, checks to Test32bitAlignment
Per recent chat with @raggi about all this, I went and looked at this
test again.

Updates #cleanup

Change-Id: Icb7d87b1ed2cebf481ee4e358a3aa603e63fb8a4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 15:29:44 -07:00
Brad Fitzpatrick 15bb10dbce tsnet: ban awsstore and kubestore as deps in TestDeps
Commit 69c79cb9f (Sep 2025) moved awsstore and kubestore registration
behind condregister build tags so tsnet wouldn't pull in the AWS SDK
and Kubernetes client by default. The accompanying TestDeps BadDeps
entry was missed, so PR #19667 (which re-added those imports) wasn't
caught by the test.

Add the two packages to BadDeps so future regressions fail the test.

Updates #19667
Updates #12614

Change-Id: I903b7c976e5e122cc0c0b956dc73740f5d474fac
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 14:57:47 -07:00
Tom Proctor b74eeda055 cmd/testwrapper: print unit for package duration (#19663)
Include the unit (s) when printing the time taken to test each package.

Updates #cleanup

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2026-05-06 22:31:48 +01:00
kari-ts c721189cef ipn/ipnlocal: prefer one CGNAT route on Android (#19652)
Android rebuilds its VpnService interface when the VPN route
configuration changes, which tears down long-lived TCP connections
through the tunnel. Use the same automatic OneCGNATRoute behavior as
macOS on Android, and prefer the single CGNAT route when no other
interface is using the CGNAT, falling back to fine-grained peer routes
otherwise.

Updates tailscale/tailscale#19591

Signed-off-by: kari <kari@tailscale.com>
2026-05-05 19:11:17 -07:00
Brad Fitzpatrick f844c8bc32 util/winutil/gp: deflake TestGroupPolicyReadLockClose
The test goroutine read lockCnt immediately after Lock returned, racing
with Close: close(lk.closing) wakes lockSlow's select, whose deferred
Add(-2) on lockCnt can run before Close's CAS clears the LSB. When that
happens, lockCnt is briefly 1 (3 - 2) instead of 0 (1 + 2 - 2 - 1),
producing "lockCnt: got 1; want 0".

Move the lockCnt assertion into the main test goroutine, after
Close has returned and the Lock goroutine has finished, so both updates
have settled before we read.

Fixes #19647

Change-Id: Ia67036ff73a1beb528cbd621460db9048f3066ad
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-05 14:02:35 -07:00
Jonathan Nobels 872d79089e VERSION.txt: this is v1.99.0 (#19645)
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2026-05-05 15:07:20 -04:00
Evan Lowry aa21b0c008 client/systray: fix recommended exit node not showing as selected (#19627)
When an exit node was set before launching systray, the recommended row
in exit nodes rendered as not selected even when the active exit node
was at the same location.

This looks to be two different things:

- suggestExitNode takes its own suggestion into account, and not the
  user's active exit node. When a Mullvad city is reached via the picker
  rather than the recommended row, the suggester's pick and
  prefs.ExitNodeID end up as distinct peers in the same city, resulting
  in an ID-only equality check missing the match.
- Toggle state was constructed and mutated via .Check(), which for newly
  created elements may be cached (such as when launching systray, with
  an already active node).

Fixes #19626

Signed-off-by: Evan Lowry <evan@tailscale.com>
2026-05-05 10:49:38 -03:00
Alex Chan eac531da8e cmd/tailscale/cli: unhide --report posture flag in up
This was originally hidden during the beta period in both `up` and `set`,
then when device posture went GA we unhid the flag in `set` but not in
`up`.

This is confusing for users, because an error message can direct them to
run `tailscale up` with this flag if they've set it previously, but the
help text won't tell them what it does.

Updates #5902
Updates #17972

Change-Id: I9a31946f4b3bb411feed0f5a6449d7ff9a5ba9d3
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-05 10:12:36 +01:00
Brad Fitzpatrick 883d4fd2cd wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead
Fixes #19633
Fixes #13760

Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Brad Fitzpatrick 81569e891f tstest/iosdeps: update import list to mirror ipn-go-bridge
The purpose of this package is to test the iOS dependency closure, but
it had drifted from the actual import list of the ipn-go-bridge package
in the corp repo (the Go side of the iOS / macOS app).

Update the imports to match ipn-go-bridge's GOOS=ios import list,
adding many missing packages including wgengine/netstack,
feature/{taildrop,syspolicy,condregister}, the util/syspolicy/*
subpackages, types/{key,lazy,logid,netmap}, tsd, safesocket,
util/{eventbus,must,set}, and several net/* and ipn/* packages.

Drop two now-stale BadDeps entries (for now!): database/sql/driver and
github.com/google/uuid are reached via wgengine/netstack ->
github.com/prometheus-community/pro-bing, which netstack imports on
darwin || ios for ICMP user-ping, so the iOS app already ships them.
But we should fix that later.

Updates #19633

Change-Id: Ic50779fdb195685a2e8ccd7c513eee91b0feeaf8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Brad Fitzpatrick 9bb7ca6116 cmd/vet/lowerell, drive/driveimpl: forbid variables named "l" or "I"
Add a new vet checker that rejects variables, parameters, named
return values, receivers, range/type-switch bindings, type
parameters, struct fields, and constants named "l" (lowercase ell)
or "I" (uppercase i). Both are hard to distinguish from the digit
"1" and from each other in too many fonts.

Rename the two pre-existing struct fields named "l" (both of type
net.Listener) in drive/driveimpl/drive_test.go to "ln", matching the
convention used elsewhere for net.Listener locals.

Rename the test-fixture struct fields "I" (single int label) to
"Int" in metrics/multilabelmap_test.go and util/deephash/deephash_test.go,
preserving the "first letters of types" convention used alongside
neighboring fields like I8/I16/U/U8.

Also teach pkgdoc_test.go to skip testdata/ directories, which
the go tool ignores; they are not real packages.
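
For a feel of what such a check involves, here is a deliberately simplified sketch built on go/parser and go/ast. It is not the lowerell checker: it flags every occurrence of an identifier named "l" or "I", uses included, whereas the checker described in this commit targets declarations. The function name findBadNames is hypothetical:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findBadNames parses src and returns the position of every
// identifier spelled "l" or "I". Unlike a real vet checker, it does
// not distinguish declarations from uses.
func findBadNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil, err
	}
	var hits []string
	ast.Inspect(f, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && (id.Name == "l" || id.Name == "I") {
			hits = append(hits, fset.Position(id.Pos()).String())
		}
		return true
	})
	return hits, nil
}

func main() {
	hits, err := findBadNames("package p\nfunc f() { l := 1; _ = l }\n")
	fmt.Println(hits, err)
}
```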

Fixes #19631

Change-Id: I71ad2fa990705f7a070406ebcdb8cefa7487d849
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:03:28 -07:00
Andrew Lytvynov 0cf899610c util/linuxfw/linuxfwtest: remove unused package (#19520)
Added in 2022, this appears to be unused now.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-04 12:33:12 -07:00
License Updater ca2317439d licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2026-05-04 10:34:27 -07:00
Jordan Whited ce76f44df2 derp/derpserver: remove global rate limiter
Which can be unfair around varying packet sizes.

Updates tailscale/corp#40962

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-04 09:41:14 -07:00
Fernando Serboncini 29122506be misc/git_hook: propagate shared HOOK_VERSION (#19476)
Move HOOK_VERSION into the githook package and export it as
githook.HookVersion, so tailscale/corp can reference it via
the shared-code bump instead of having to bump HOOK_VERSION
by hand.

New launcher.sh composes the wanted version from 2 sources:
the shared HOOK_VERSION and an optional repo local version,
misc/git_hook/HOOK_VERSION, for repo-specific config bumps.

Updates tailscale/corp#40381

Change-Id: I7cf16889ba53cb564cc2df7dfd7588748f542c55

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-04 12:38:28 -04:00
George Jones 290a6cc03c appc, feature/conn25: handle exact and wildcard domains correctly (#19202)
Installed SplitDNS routes are always treated as wildcard domains,
so the domains that we pass to the local resolver should be normalized
and have any leading *. wildcard prefix removed.

When looking at DNS responses to see if the domain matches, we need to
consider both exact matches and wildcard matches. We now keep separate
maps of exact-match domains and wildcard domains. When matching, we
first check the exact-match map; if there is no match there, we check
the wildcard map repeatedly, removing a label after each check, until
we find a match.
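
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the appc or conn25 implementation: the domainMatcher type and its helpers are hypothetical, and the sketch assumes that a wildcard entry also matches its own base domain, because the wildcard map is checked before the first label is removed.

```go
package main

import (
	"fmt"
	"strings"
)

// domainMatcher is a hypothetical type for illustration only.
type domainMatcher struct {
	exact    map[string]bool // "example.com" matches only itself
	wildcard map[string]bool // "example.com" stands for "*.example.com"
}

// canon lowercases a name and drops a trailing dot.
func canon(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// newDomainMatcher sorts patterns into the two maps, stripping any
// leading "*." prefix from wildcard patterns.
func newDomainMatcher(patterns []string) *domainMatcher {
	m := &domainMatcher{exact: map[string]bool{}, wildcard: map[string]bool{}}
	for _, p := range patterns {
		p = canon(p)
		if base, ok := strings.CutPrefix(p, "*."); ok {
			m.wildcard[base] = true
		} else {
			m.exact[p] = true
		}
	}
	return m
}

// match checks the exact map first, then walks the wildcard map,
// removing the leftmost label after each failed check.
func (m *domainMatcher) match(name string) bool {
	name = canon(name)
	if m.exact[name] {
		return true
	}
	for name != "" {
		if m.wildcard[name] {
			return true
		}
		_, rest, ok := strings.Cut(name, ".")
		if !ok {
			break // no labels left to remove
		}
		name = rest
	}
	return false
}

func main() {
	m := newDomainMatcher([]string{"example.com", "*.corp.example.net"})
	for _, n := range []string{"example.com", "www.example.com", "a.b.corp.example.net"} {
		fmt.Println(n, m.match(n))
	}
}
```

Keeping the two maps separate means an exact entry such as example.com never accidentally matches www.example.com.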

Rather than looking for matching self-hosted domains (domains serviced
by the connector being run on the self-node), we now track the apps that
are being serviced by the connector on the self-node. When checking
whether a DNS response should be rewritten, the response is ignored if
any of the matching apps for the domain are in the self-hosted apps set.

Fixes tailscale/corp#39272

Signed-off-by: George Jones <george@tailscale.com>
2026-05-01 17:33:21 -04:00
Fran Bull bdf3419e7d net/dns: add custom scheme resolvers
If another part of the client code registers a custom scheme with the
forwarder, the forwarder will check resolver addresses to see if they
match the scheme. If they do, the corresponding custom scheme handler
will be called to find the actual address for the resolver at this
moment. If the handler returns the empty string then that resolver will
be ignored.
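
A rough sketch of that flow follows. It is not the net/dns forwarder API: schemeRegistry, schemeHandler, and the "demo-scheme" name are all hypothetical, chosen only to show the register, match, and ignore-on-empty behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// schemeHandler returns the address to use for a resolver right now,
// or "" if the resolver should be ignored. Hypothetical type.
type schemeHandler func(addr string) string

// schemeRegistry maps a custom scheme to its handler. Hypothetical type.
type schemeRegistry struct {
	handlers map[string]schemeHandler
}

func (r *schemeRegistry) register(scheme string, h schemeHandler) {
	if r.handlers == nil {
		r.handlers = map[string]schemeHandler{}
	}
	r.handlers[scheme] = h
}

// resolve passes ordinary addresses through unchanged, replaces
// custom-scheme addresses with the handler's answer, and drops any
// resolver whose handler returns the empty string.
func (r *schemeRegistry) resolve(addrs []string) []string {
	var out []string
	for _, a := range addrs {
		scheme, _, ok := strings.Cut(a, "://")
		if !ok {
			out = append(out, a) // plain address, no scheme
			continue
		}
		h, found := r.handlers[scheme]
		if !found {
			out = append(out, a) // scheme nobody registered
			continue
		}
		if actual := h(a); actual != "" {
			out = append(out, actual)
		}
	}
	return out
}

func main() {
	var reg schemeRegistry
	reg.register("demo-scheme", func(addr string) string {
		if addr == "demo-scheme://primary" {
			return "100.64.0.7:53" // made-up address for the demo
		}
		return "" // no peer available; ignore this resolver
	})
	fmt.Println(reg.resolve([]string{"8.8.8.8", "demo-scheme://primary", "demo-scheme://down"}))
}
```

Because the handler runs on each call to resolve, the answer can change between queries without the resolver list itself being rewritten.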

This is useful if you want to dynamically determine where to send
certain DNS requests. It is being added to support new app connector
(conn25) work that would like to make sure it sends DNS requests to the
current connector peer in a high availability configuration.

Updates tailscale/corp#39858

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-01 14:01:10 -07:00
Rollie Ma 78126c5d9f tailcfg: add node capability for services in desktop clients (#19605)
Add a node capability to help determine whether the desktop clients
should show the services list/menu/section.

Updates: https://github.com/tailscale/corp/issues/40900

Change-Id: Ie34b3362f921d710173b2a0dd190354352bb26f0

Signed-off-by: Rollie Ma <rollie@tailscale.com>
2026-05-01 12:07:33 -07:00
Tom Meadows ee10f9881c cmd/k8s-operator: add authkey reissuing to recorder reconciler (#19556)
Also fix a memory leak in the authKeyReissuing map on ProxyGroup
reconciler authkey reissue.

Updates #19311

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-05-01 18:26:55 +01:00
Alex Chan 3ced30b0b6 tka: clarify that this limit is on disablement *values* not *secrets*
Values get written into TKA state; secrets don't.

Updates #cleanup

Change-Id: Ief9831dcb1102f584a33b2e71b611b38ca463724
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-01 18:25:39 +01:00
Andrew Lytvynov f15a4f4416 client/web: move API permission checks into handlers (#19576)
There are only a couple endpoints that check peer capabilities. Keeping
permission checks with the code that assumes they were performed, rather
than with the routing layer, feels easier to reason about.

Check that the caller is actually a peer and pass their capabilities via
a context value for handlers that want to check them.
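
The general pattern can be sketched as below. This is not the client/web code: peerCaps, withPeerCaps, demoLookup, the X-Demo-Peer header, and the "demo-cap/update" capability name are hypothetical stand-ins, and the peer lookup is stubbed with a request header purely for the demo.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// peerCaps is a hypothetical capability set for the calling peer.
type peerCaps map[string]bool

// peerCapsKey is an unexported context key type, so no other package
// can collide with it.
type peerCapsKey struct{}

// withPeerCaps rejects callers that are not peers and stores the
// peer's capabilities in the request context for downstream handlers.
func withPeerCaps(lookup func(*http.Request) (peerCaps, bool), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		caps, ok := lookup(r)
		if !ok {
			http.Error(w, "not a peer", http.StatusUnauthorized)
			return
		}
		ctx := context.WithValue(r.Context(), peerCapsKey{}, caps)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// handleUpdate performs its own permission check, next to the code
// that depends on it.
func handleUpdate(w http.ResponseWriter, r *http.Request) {
	caps, _ := r.Context().Value(peerCapsKey{}).(peerCaps)
	if !caps["demo-cap/update"] {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	fmt.Fprintln(w, "ok")
}

// demoLookup stands in for a real peer lookup; it trusts a header.
func demoLookup(r *http.Request) (peerCaps, bool) {
	switch r.Header.Get("X-Demo-Peer") {
	case "admin":
		return peerCaps{"demo-cap/update": true}, true
	case "viewer":
		return peerCaps{}, true
	}
	return nil, false
}

// status sends one request through the wrapped handler and returns
// the HTTP status code.
func status(peer string) int {
	req := httptest.NewRequest(http.MethodPost, "/update", nil)
	if peer != "" {
		req.Header.Set("X-Demo-Peer", peer)
	}
	rec := httptest.NewRecorder()
	withPeerCaps(demoLookup, http.HandlerFunc(handleUpdate)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(status("admin"), status("viewer"), status(""))
}
```

Handlers that do not care about capabilities can simply never read the context value.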

Along with this, simplify the helper handler wrappers that are not
needed for most of the endpoints.

Updates #40851

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-01 09:01:53 -07:00
Brad Fitzpatrick bbcb8650d4 cmd/tailscale/cli: fetch netmap via current-netmap debug action
Stop opening an IPN bus subscription with NotifyInitialNetMap purely to
read the current netmap once. Use the LocalAPI debug current-netmap
action (added in 159cf8707) instead, which returns the current netmap
synchronously without subscribing to the bus.

Updates #12542

Change-Id: I8aa2096d65aaea4dfe62634f03ce06b5470e0e51
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 07:53:51 -07:00