Fix the following issues:
1. Endianness Bug: The nftables runner used hardcoded
big-endian byte arrays for firewall mark values (0xff0000, etc.), breaking
bitwise operations on little-endian systems (all x86/x64, ARM). This caused
connmark save/restore rules to silently fail. Fixed by using
binary.NativeEndian to generate correct byte order for the host system.
2. Connmark Restore Conditional Check: The connmark restore
mechanism unconditionally overwrote packet marks, even when Tailscale
hadn't set any mark bits in conntrack. This destroyed mark bits set by
other systems (VPNs, policy routing, vendor flags), breaking coexistence.
Fixed by adding a conditional check to only restore when (ct mark &
0xff0000) != 0, preventing the worst case of wiping all marks to zero.
Changes:
- util/linuxfw/linuxfw.go: Added nativeEndianUint32() helper and updated
all mask functions to use native byte order instead of hardcoded bytes
- util/linuxfw/nftables_runner.go: Added conditional check in
makeConnmarkRestoreExprs() to only restore when ct mark has Tailscale
bits set; added detailed comment about bit preservation limitations
- util/linuxfw/iptables_runner.go: Added conditional check using -m
connmark ! --mark to match nftables behavior
- Tests updated: Fixed byte-level regression tests to expect little-endian
byte sequences and verify the new conditional check
Note: Perfect bit preservation in nftables remains challenging
due to nftables expression VM limitations. The current implementation
prevents the critical case of wiping marks with zero.
Updates #3310Fixes#11803
Related to #8555
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Brings Subscriber[T] in line with the same non-generic-core pattern already
applied to SubscriberFunc[T] and Publisher[T]:
- Renames subscriberFuncCore to subscriberCore and shares it between
Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a
*subscriberCore plus their respective per-T delivery state
(Subscriber: chan T; SubscriberFunc: nothing, the user callback is
captured in the dispatch closure).
- The bus's outputs map and subscriber-interface itab key on
*subscriberCore for both subscriber kinds, so adding a new Subscribe[T]
call site no longer pays a per-T itab, dictionary, or equality function
for the subscriber-interface side.
- Subscribe[T] now hoists the non-generic constructor portion into
newSubscriberCore (timer setup, core allocation, cached type/typeName,
unregister method-value), matching SubscribeFunc.
The dispatch loop is intentionally NOT extracted to a non-generic helper for
Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send
'case s.read <- t:' must appear lexically inside the select; the only way to
lift it into a non-generic loop is to bridge typed and untyped via a per-event
goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep
dispatchTyped on the generic facade and accept the per-shape stencil cost as
the cheaper alternative.
Symbol-level effect on tailscaled (linux/amd64, measured via
`go tool nm -size`):
Before:
(*Subscriber[T]).dispatch
2 shape stencils: 1,682 + 1,549 = 3,231 B
3 thin per-T wrappers: 124 B each = 372 B
2 deferwrap1 helpers: 62 B each = 124 B
total: 3,727 B
After:
(*Subscriber[T]).dispatchTyped
2 shape stencils: 1,678 + 1,582 = 3,260 B
0 per-T wrappers (replaced by closure stored on core)
2 deferwrap1 helpers: 62 B each = 124 B
total: 3,384 B
dispatch path .text delta: -343 B (-9.2%)
Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) =
~2,700 B each on production tailscaled. The shape count matches before/after
(two distinct GC shapes for the Subscriber[T] event types in this binary).
What changes is that the per-T thin wrappers are eliminated because
Subscriber[T] no longer implements the subscriber interface directly.
Whole-binary section deltas:
.text: -2,304 B (includes the dispatch savings plus other
small downstream effects)
.rodata: +512 B (additional closure-type metadata)
.gopclntab: -2,981 B (fewer per-T compiled functions => less metadata)
Stripped tailscaled (linux/amd64): no change at the file level (the savings
fall below the linker's section-alignment boundary). Unstripped builds shrink
by ~2,900 B.
Behavior is unchanged:
BenchmarkBasicThroughput: 2,161 ns/op, 0 B/op, 0 allocs/op
BenchmarkBasicFuncThroughput: 2,493 ns/op, 144 B/op, 2 allocs/op
BenchmarkSubsThroughput: 3,727 ns/op, 0 B/op, 0 allocs/op
Updates #12614
Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe
Signed-off-by: James Tucker <james@tailscale.com>
Server.clientsAtomic was introduced in 6b729795c3 as a lock-free
mirror of Server.clients to skip Server.mu on the packet send hot
path. This drops the non-concurrent map and makes all the existing
callers of the old plain map just use the concurrent map, but still
holding Server.mu.
BenchmarkLookupDestHashTrie is unchanged at ~2ns/op.
Fixes#19726
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I0894e4d86914d152b9b5fef969a3184bcb96f678
Warnables with a non-zero TimeToVisible are only published on the eventbus when
they remain unhealthy long enough to become visible.
However, we still publish a health.Change when a warning that was never visible
(and was never published to the eventbus) becomes healthy.
This PR fixes that and reduces churn when there is no actual state change. In
particular, it avoids unnecessary IPN bus notifications sent to GUI/CLI clients,
captive portal detection, etc.
Updates tailscale/corp#39759 (noticed while working on it)
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Adds a new NoiseRoundTripper field to tsd.Sys
to expose an http.RoundTripper to make requests
over the control plane Noise connection.
This will be used in PAM use cases soon.
Updates tailscale/corp#41800
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
A missing hosts file is not a fatal error. We should log it, but still proceed
and create a new one instead of failing the DNS reconfiguration completely.
Fixes#19733
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Instead of having two entry points for running natlab tests, start
converting the connectivity tests to use the vmtest framework.
Grid and pair tests have yet to be moved over.
Updates #13038
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This patch fixes a data race in wgengine/netstack that surfaced while
running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient.
Because these two tests both setup the TS_DEBUG_NETSTACK envknob, a
race happens because netstack.Impl.Close leaked its inject goroutine.
The inject goroutine also reads the TS_DEBUG_NETSTACK envknob, so if
it is still running when the next test starts, then it will break.
This patch also cleans up the tests a bit, ensuring that neither of
them run in T.Parallel. It also adds a T.Cleanup call to clear the
envknob.
Fixes#19720
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute
would claim that an android machines was actually macOS.
Updates #cleanup
Updates #19652
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Replace the process-global Server.mu lookup in the packet send hot path
with a global hashtriemap mirror of local clientSet entries. The
authoritative clients map remains guarded by Server.mu; clientsAtomic is
only a lock-free fast path for active local clients.
Misses, stale inactive client sets, duplicate accounting, and mesh
forwarding still fall back to lookupDestUncached. This avoids taking
Server.mu for the common local active-client send path, at the cost of
adding one global concurrent map that mirrors Server.clients for local
peers.
The benchmark uses four destination peers. The before run sets
TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup
path; the after run uses the hashtrie fast path.
goos: linux
goarch: amd64
pkg: tailscale.com/derp/derpserver
cpu: Intel(R) Xeon(R) 6975P-C
│ before │ after │
│ sec/op │ sec/op vs base │
LookupDestHashTrie-16 176.050n ± 1% 1.904n ± 6% -98.92% (p=0.000 n=10)
│ before │ after │
│ B/op │ B/op vs base │
LookupDestHashTrie-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
│ before │ after │
│ allocs/op │ allocs/op vs base │
LookupDestHashTrie-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)
Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds two new cap resolution methods alongside the existing PeerCaps:
PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves
the service name to its VIP addresses via the node's service IP mappings
and returns caps scoped to that service. Exposed on /v0/whois via the
svc_name query parameter and on client/local.Client as WhoIsForService.
PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary
destination IP. Exposed on /v0/whois via the svc_addr query parameter
and on client/local.Client as WhoIsForIP.
svc_name takes priority over svc_addr when both are present. Invalid
values for either return 400. The existing PeerCaps/WhoIs path is
unchanged: without a service parameter, WhoIs returns only host-level
caps.
Updates tailscale/corp#41632
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
Add new clientmetric counters for establishing contact with peers while using
cached network map data. To do this, instrument the magicsock.Conn with a bit
to indicate whether its peer data came from a cached netmap. If so, there are
two conditions we will count as establishing connectivity to a peer:
- Receipt of a CallMeMaybe from a peer via disco.
- Establishing a valid endpoint address for a peer.
In vmtest, add Env.ClientMetrics to scrape metrics from the specified node.
Use this to check that counters were updated in caching tests.
Updates https://github.com/tailscale/projects/issues/13
Updates #12639
Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Two changes that share the same intent of reducing per-T duplication
in code that doesn't actually depend on T:
1. Hoist the non-generic portion of newSubscriberFunc[T] into a
newSubscriberFuncCore() helper. The hoisted work is the time
timer setup, the subscriberFuncCore allocation, and the
unregister closure (which captures only the non-generic
reflect.Type and *subscribeState). The generic body now does
only the two T-bound things it has to: compute reflect.TypeFor[T]
and create the dispatch closure.
Effect on the per-shape-stencil body of newSubscriberFunc[T]:
before: 523 B per shape (in synthetic test)
after: 293 B per shape (-230 B per shape; -56% on this body)
2. Cache reflect.Type.String() once at construction (in core.typeName)
instead of recomputing it every time the dispatch closure runs.
The dispatch closure also now takes the *subscriberFuncCore directly
rather than building an intermediate dispatchFuncState struct on
every call.
Effect on the dispatch closure body (newSubscriberFunc[T].func1):
before: 581 B per shape
after: 480 B per shape (-101 B per shape; -17%)
Combined effect on tailscaled (linux/amd64):
named-symbol savings via symcost: ~7 KB
stripped binary delta: -8 KB (page-quantized)
arm64 binary delta: 0 (page-quantized)
cumulative reduction from baseline (5167ff412):
linux/amd64: -110,592 bytes (-0.391%)
linux/arm64: -131,072 bytes (-0.499%)
Throughput is also improved by the typeName cache: BenchmarkBasic
goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot
path no longer allocates a string on every event.
Updates #12614
Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3
Signed-off-by: James Tucker <james@tailscale.com>
macOS limits Unix socket paths to 104 bytes. The Go test TempDir
path (e.g. /var/folders/.../TestDirectConnection...679197086/001/)
easily exceeds that, causing "bind: invalid argument". Create a
short /tmp/vmtest* directory for all socket files (vnet, QMP,
dgram) so the paths stay well under the limit on every platform.
Updates #13038
Change-Id: I721d24561d1766aaa964692bc77f40a131aa9455
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
startCloudQEMU hardcoded -machine q35,accel=kvm and -cpu host,
which fails on any host without KVM (notably macOS). Replace
with a qemuAccelArgs helper that probes /dev/kvm and falls back
to QEMU's TCG software emulation, matching the pattern already
used by tstest/integration/nat. Also wire the helper into
startGokrazyQEMU so gokrazy VMs pick up KVM when available.
Updates #13038
Change-Id: I7745518db823279b1880957bb14ca2ffdaab4c50
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat
tests are gated behind --run-vm-tests because they need KVM and are
slow. Until now nothing in CI exercised them apart from a single
canary TestEasyEasy run on every PR.
Add .github/workflows/natlab-test.yml that runs the full opt-in suite
on demand (workflow_dispatch), on PRs labeled "natlab", and on main
every 12 hours via cron. The workflow has two phases:
- "prepare" builds the gokrazy VM image, downloads the Ubuntu and
FreeBSD cloud images once via the new natlabprep tool, and emits
a dynamic JSON matrix of every TestX function it finds in the two
opt-in packages.
- "test" is a per-test matrix that depends on prepare. Each matrix
job restores the shared caches and runs a single test, so adding
a new TestFoo is automatically picked up on the next run without
any workflow edits.
Rename the existing natlab-integrationtest.yml to natlab-basic.yml
since it's the small smoke variant (just TestEasyEasy on every PR);
the new natlab-test.yml is the bigger suite. The job inside is
renamed to EasyEasy for the same reason.
Move the macOS arm64 host check from vmtest.Env.Start into
vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips
immediately on a non-macOS host, and add an explicit
skipIfNotMacOSArm64 helper at the top of the two macOS-only tests
so the platform requirement is obvious to readers.
Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default
(it was the overwhelming majority of bytes in CI logs, with no signal
in healthy runs) and replace it with a periodic "still waiting" line
that only fires after 10s, so a truly stuck agent connection still
surfaces.
Updates #13038
Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Mirrors the same refactor previously applied to SubscriberFunc:
- Publisher[T]: a thin user-facing facade. Holds a pointer to a
non-generic publisherCore and exposes Publish/Close/ShouldPublish.
- publisherCore: a non-generic struct that owns the *Client back-
pointer, stop flag, and cached reflect.Type. It implements the
package-private publisher interface (publishType, Close).
The bus's per-Client publisher set is set.Set[publisher] keyed
on this single non-generic type.
The publisher interface only exists to support diagnostic
introspection (Debugger.PublishTypes returning the list of types a
client publishes). Previously, satisfying that diagnostic-only
interface forced *Publisher[T] to be the implementor and cost a
per-T itab, generic dictionary, and equality function on every
event type ever passed through Publish[T]. Moving the
implementation to a non-generic core lets the diagnostic surface
work unchanged while charging zero per-T cost for the
diagnostic-driven generic interface.
Publisher[T].Publish is also slimmed: the channel/select/stopFlag
loop is now a non-generic publish() helper that takes the value as
'any'. The per-T body is reduced to forwarding the boxed value to
the helper.
Measured impact (util/eventbus/sizetest):
total per-flow binary cost:
linux/amd64: 2252.8 B/flow -> 1900.5 B/flow (-352.3 B / -15.6%)
linux/arm64: 2228.2 B/flow -> 1835.0 B/flow (-393.2 B / -17.6%)
Publisher per-receiver attribution:
linux/amd64: 635.2 B/flow -> 369.6 B/flow (-265.6 B / -41.8%)
linux/arm64: 751.7 B/flow -> 373.2 B/flow (-378.5 B / -50.4%)
Cumulative reduction from the original baseline (5167ff412):
linux/amd64: 3096.6 B/flow -> 1900.5 B/flow (-1196.1 B / -38.6%)
linux/arm64: 3145.7 B/flow -> 1835.0 B/flow (-1310.7 B / -41.7%)
Dropped per-T symbols (200-flow eventbus binary):
- .dict.Publisher[T] was 14,400 B (72 B/T)
- type:.eq.Publisher[T] was 11,832 B (58 B/T)
- go:itab.*Publisher[T],publisher was 8,000 B (40 B/T)
- (*Publisher[T]).Close shape stencils collapsed to 1
Behavior is unchanged: BenchmarkBasicThroughput is within noise
(2018 -> 2038 ns/op at -benchtime=2s) and all eventbus tests pass.
Updates #12614
Change-Id: I61979c2bf95d2a711c2321e6e0b4b7d15980e9f5
Signed-off-by: James Tucker <james@tailscale.com>
Splits SubscriberFunc[T] into:
- SubscriberFunc[T]: a thin user-facing facade that holds only a
pointer to a non-generic core. It exposes Close() to user code,
which forwards to the core.
- subscriberFuncCore: a non-generic struct that owns all the
subscriber state (stop flag, unregister, logf, slow timer,
cached reflect.Type) and implements the bus's package-private
subscriber interface. Its dispatch() invokes a closure
captured at construction time that performs the
vals.Peek().Event.(T) type assertion and runs the user
callback on the unboxed value.
The bus's outputs map and subscriber-interface itab are
parameterized only by *subscriberFuncCore, not by T, eliminating
both the per-T itab and the per-T generic dictionary that
previously scaled with the number of subscribed event types.
Measured impact (util/eventbus/sizetest):
total per-flow binary cost:
linux/amd64: 3039.2 B/flow -> 2252.8 B/flow (-786.4 B / -25.9%)
linux/arm64: 3145.7 B/flow -> 2228.2 B/flow (-917.5 B / -29.2%)
SubscriberFunc per-receiver attribution:
linux/amd64: 840.8 B/flow -> 300.8 B/flow (-540.0 B / -64.2%)
linux/arm64: 849.9 B/flow -> 303.8 B/flow (-546.1 B / -64.3%)
Dropped per-T symbols (200-flow eventbus binary):
- (*SubscriberFunc[T]).dispatch was 26,639 B total (130 B/T)
- (*SubscriberFunc[T]).subscribeType was 3,600 B total ( 18 B/T)
- .dict.SubscriberFunc[T] was 14,400 B total ( 72 B/T)
- go:itab.*SubscriberFunc[T],... was 9,600 B total ( 48 B/T)
Of the original 913 B/flow attributed to SubscriberFunc, 540 B/flow
is now gone, dropping the receiver to 300 B/flow.
Behavior is unchanged: BenchmarkBasicThroughput is within noise
(1955 -> 1941 ns/op on the test box) and all eventbus tests pass.
Updates #12614
Change-Id: I646b3b05fd8d95f9afead59bfd0f69cd18b7a709
Signed-off-by: James Tucker <james@tailscale.com>
There is a 30-second timeout set on client TLS connections but the handshake was
called on the wrong connection and so the timeout was never used in practice.
Signed-off-by: Francois Marier <francois@fmarier.org>
Make it possible to remove the least recently used expired address
assignment from addrAssignments.
Before checking out a new address from the IP pools, return a handful of
expired addresses.
Updates tailscale/corp#39975
Signed-off-by: Fran Bull <fran@tailscale.com>
When a peer is not able to connect to control after a restart and is
using a cached netmap, that nodes should be able to connect to another
peer in its tailnet (given that the home DERP of that peer has not
changed in the meantime).
Add test that starts two peers and connects them to a tailnet with
caching enabled. Then blackhole traffic to control from one peer and
restart it. Verify that the connection between the two ends up direct.
Adds facilities for expecting a certain path type between nodes.
Updates: #19597
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
If a DNS query for a domain that should be routed through a connector
results in CNAME records in the response, collapse the CNAME chain to an
A/AAAA record for the domain -> magic IP.
Fixestailscale/corp#39978
Signed-off-by: Fran Bull <fran@tailscale.com>
The `CreateStateForTest` helper reduces boilerplate in cases where the test
only cares about the trusted keys and not the disablement values (and makes
it more obvious where the disablement values are meaningful).
The `setupChonkStorage` helper reduces the boilerplate when creating on-disk
TKA storage in tests.
The `fakeLocalBackend` helper reduces the boilerplate when setting up a
`LocalBackend` instance in the IPN tests.
Updates #cleanup
Change-Id: Iacfba1be5f7fab208eec11e4369d63c7d7519da5
Signed-off-by: Alex Chan <alexc@tailscale.com>
Re-exec the test binary as a thin wrapper that holds a pipe inherited
from the test. When the test goes away (any reason, including SIGKILL,
panic, or OOM), the kernel closes the pipe write end; the wrapper sees
EOF and SIGKILLs itself, taking QEMU and its children with it.
Updates #13038
Change-Id: Ib2151098193551396c1d7bb51b07da3bd6b2cfb4
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Running all vmtests in tstest/natlab/vmtest locally was breaking later
tasks in the queue. The goroutine dump on timeout had goroutines hanging
around for 9 minutes, meaning that something was not getting cleaned up.
goroutine 262 [select, 9 minutes]:
gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.commonRead({...})
Add a timeout of Now() to gonet TCP connections when the test ends
(inspired by ServeUnixConn()), and wait for them to shut down before
exiting the test.
Updates #13038
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
The (*SubscriberFunc[T]).dispatch method body — a ~40-line select
loop with slow-subscriber timer, snapshot handling, ctx-cancel
draining, and a CI stack-dump branch — was previously fully
duplicated by the Go compiler for every distinct GC shape of T.
None of that body actually depends on T except for the type
assertion and the user callback invocation.
This change moves the loop body into a non-generic dispatchFunc()
helper, leaving (*SubscriberFunc[T]).dispatch as a tiny wrapper
that:
- performs the vals.Peek().Event.(T) type assertion
- spawns the callback goroutine via `go runFuncCallback(s.read,
t, callDone)` — a regular generic function call, not a closure,
so that `go` binds the args to the goroutine's frame instead of
allocating a closure on the heap. This preserves the
zero-extra-allocation behavior of the original
(*SubscriberFunc[T]).runCallback method.
- resolves T's name via reflect.TypeFor[T]().String() (cached on
the stack rather than recomputed on each %T formatting)
- calls dispatchFunc with the callDone channel
The %T formatting in the original logf calls is replaced with %s
on the resolved name string, removing per-T fmt instantiations.
A new BenchmarkBasicFuncThroughput is added alongside the existing
BenchmarkBasicThroughput so per-event allocation behavior on the
SubscribeFunc dispatch path is covered by the benchmark suite.
Measured impact (util/eventbus/sizetest):
SubscriberFunc per-flow attribution:
linux/amd64: 912.5 B/flow -> 840.8 B/flow (-71.7 B/flow)
linux/arm64: 917.5 B/flow -> 849.9 B/flow (-67.6 B/flow)
The total per-flow size delta on amd64 dropped from 3,096.6 B to
3,039.2 B (-57 B/flow). The arm64 total stayed at 3,145.7 B
because the linker's page-aligned section sizing absorbed the
improvement on this binary; the symcost-attributed per-receiver
number is the real signal.
Behavior is unchanged: BenchmarkBasicThroughput stays at 0
allocs/op and BenchmarkBasicFuncThroughput holds at the same 2
allocs/op, 144 B/op as the prior eventbus implementation. All
eventbus tests pass.
Updates #12614
Change-Id: I85f933f50f58cd25bbfe5cc46bdda7aab22f0bf7
Signed-off-by: James Tucker <james@tailscale.com>
The tailscale.com/wif package brings in the AWS SDK
(github.com/aws/aws-sdk-go-v2/{config,sts,...} and github.com/aws/smithy-go)
to support fetching ID tokens from AWS IMDS for workload identity
federation. Until now, tsnet pulled this in unconditionally via
feature/condregister/identityfederation, costing ~70 unwanted deps for
every tsnet program whether or not it uses workload identity federation.
These AWS SDK deps were originally removed from tsnet on 2025-09-29 by
commit 69c79cb9f ("ipn/store, feature/condregister: move AWS + Kube
store registration to condregister"). They were then accidentally added
back on 2026-01-14 by commit 6a6aa805d ("cmd,feature: add identity
token auto generation for workload identity", PR #18373) when the new
wif package was wired into tsnet via feature/identityfederation.
Drop the blanket import. tsnet programs that want workload identity
federation now opt in with:
import _ "tailscale.com/feature/identityfederation"
The hook lookup in resolveAuthKey already uses GetOk and degrades
gracefully when the feature isn't linked, so existing programs that
don't use workload identity federation see no behavior change. The
tailscale CLI still imports the condregister wrapper directly, so its
behavior is also unchanged.
Lock this in with TestDeps additions: tailscale.com/wif as a BadDep,
plus substring checks in OnDep that fail on any github.com/aws/ or
k8s.io/ dependency creeping back in.
Also, switch cmd/gitops-pusher from the condregister wrapper to a
direct import of feature/identityfederation: gitops-pusher's auth flow
calls HookExchangeJWTForTokenViaWIF directly, so it shouldn't be
subject to the ts_omit_identityfederation build tag.
Updates #12614
Change-Id: I70599f2bdd4d3666b26a859d5b76caa5d6b94507
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When HTTPS is explicitly disabled (HTTPSPort == NoPort), the JS WebSocket
dialer should use ws:// instead of wss://. This matches the behavior of
the non-JS client and fixes connections to development control servers
e.g. http://localhost:31544.
Updates tailscale/corp#40944
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
Per recent chat with @raggi about all this, I went and looked at this
test again.
Updates #cleanup
Change-Id: Icb7d87b1ed2cebf481ee4e358a3aa603e63fb8a4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Commit 69c79cb9f (Sep 2025) moved awsstore and kubestore registration
behind condregister build tags so tsnet wouldn't pull in the AWS SDK
and Kubernetes client by default. The accompanying TestDeps BadDeps
entry was missed, so PR #19667 (which re-added those imports) wasn't
caught by the test.
Add the two packages to BadDeps so future regressions fail the test.
Updates #19667
Updates #12614
Change-Id: I903b7c976e5e122cc0c0b956dc73740f5d474fac
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Android rebuilds its VpnService interface when the VPN route
configuration changes, which tears down long lived TCP connections
through the tunnel. Use the same automatic OneCGNATRoute behavior as
macOS on Android, and prefer the single CGNAT route when no other
interface is using the CGNAT, falling back to fine grained peer routes
otherwise.
Updates tailscale/tailscale#19591
Signed-off-by: kari <kari@tailscale.com>
The test goroutine read lockCnt immediately after Lock returned, racing
with Close: close(lk.closing) wakes lockSlow's select, whose deferred
Add(-2) on lockCnt can run before Close's CAS clears the LSB. When that
happens, lockCnt is briefly 1 (3 - 2) instead of 0 (1 + 2 - 2 - 1),
producing "lockCnt: got 1; want 0".
Move the lockCnt assertion into the main test goroutine, after both
Close has returned and the Lock goroutine has finished, so both updates
have settled before we read.
Fixes#19647
Change-Id: Ia67036ff73a1beb528cbd621460db9048f3066ad
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When an exit node was set before launching systray, the recommended row
in exit nodes rendered as not selected even when the active exit node
was at the same location.
This looks to be two different things:
- suggestExitNode takes its own suggestion into account, and not the
users active exit node. When a mullvad city is reached via the picker
rather than the recommended row, the suggester's pick and
prefs.ExitNodeID end up as distinct peers in the same city, resulting
in an ID-only equality check missing the match.
- Toggle state was constructed and mutated via .Check(), which for newly
created elements may be cached (such as when launching systray, with
an already active node).
Fixes#19626
Signed-off-by: Evan Lowry <evan@tailscale.com>
This was originally hidden during the beta period in both `up` and `set`,
then when device posture went GA we unhid the flag in `set` but not in
`up`.
This is confusing for users, because an error message can direct them to
run `tailscale up` with this flag if they've set it previously, but the
help text won't tell them what it does.
Updates #5902
Updates #17972
Change-Id: I9a31946f4b3bb411feed0f5a6449d7ff9a5ba9d3
Signed-off-by: Alex Chan <alexc@tailscale.com>
The purpose of this package is to test the iOS dependency closure, but
it had drifted from the actual import list of the ipn-go-bridge package
in the corp repo (the Go side of the iOS / macOS app).
Update the imports to match ipn-go-bridge's GOOS=ios import list,
adding many missing packages including wgengine/netstack,
feature/{taildrop,syspolicy,condregister}, the util/syspolicy/*
subpackages, types/{key,lazy,logid,netmap}, tsd, safesocket,
util/{eventbus,must,set}, and several net/* and ipn/* packages.
Drop two now-stale BadDeps entries (for now!): database/sql/driver and
github.com/google/uuid are reached via wgengine/netstack ->
github.com/prometheus-community/pro-bing, which netstack imports on
darwin || ios for ICMP user-ping, so the iOS app already ships them.
But we should fix that later.
Updates #19633
Change-Id: Ic50779fdb195685a2e8ccd7c513eee91b0feeaf8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a new vet checker that rejects variables, parameters, named
return values, receivers, range/type-switch bindings, type
parameters, struct fields, and constants named "l" (lowercase ell)
or "I" (uppercase i). Both are hard to distinguish from the digit
"1" and from each other in too many fonts.
Rename the two pre-existing struct fields named "l" (both of type
net.Listener) in drive/driveimpl/drive_test.go to "ln", matching the
convention used elsewhere for net.Listener locals.
Rename the test-fixture struct fields "I" (single int label) to
"Int" in metrics/multilabelmap_test.go and util/deephash/deephash_test.go,
preserving the "first letters of types" convention used alongside
neighboring fields like I8/I16/U/U8.
Also teach pkgdoc_test.go to skip testdata/ directories, which
the go tool ignores; they are not real packages.
Fixes#19631
Change-Id: I71ad2fa990705f7a070406ebcdb8cefa7487d849
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Move HOOK_VERSION into the githook package and export it as
githook.HookVersion, so tailscale/corp can reference it via
the shared-code bump instead of having to bump HOOK_VERSION
by hand.
New launcher.sh composes the wanted version from 2 sources:
the shared HOOK_VERSION and an optional repo local version,
misc/git_hook/HOOK_VERSION, for repo-specific config bumps.
Updates tailscale/corp#40381
Change-Id: I7cf16889ba53cb564cc2df7dfd7588748f542c55
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Installed SplitDNS routes are always treated as wildcard domains,
so the domains that we pass to the local resolver should be normalized
and have any leading *. wildcard prefix removed.
When looking at DNS responses to see if the domain matches, we need to
consider both exact matches and wildcard matches. We now keep separate
maps of exact-match domains and wildcard domains, and when we match we
check to see if there's a match in the exact-match map, otherwise we
check against the wild card match map until we find a match, removing
a label after each check.
Rather than looking for matching self-hosted domains (domains serviced
by the connector being run on the self-node), the apps that are being
serviced by the connector on the self-node are tracked instead. When
checking to see if a DNS response should be rewritten, it is ignored
if any of the matching apps for the domain are in the self-hosted apps set.
Fixestailscale/corp#39272
Signed-off-by: George Jones <george@tailscale.com>