As a warm-up to making natlab support multiple operating systems,
start with an easy one (in that it's also Unixy and open source like
Linux) and add FreeBSD 15.0 as a VM OS option for the vmtest
integration test framework, and add TestSubnetRouterFreeBSD which
tests subnet routing through a FreeBSD VM (Gokrazy → FreeBSD →
Gokrazy).
Key changes:
- Add FreeBSD150 OSImage using the official FreeBSD 15.0
BASIC-CLOUDINIT cloud image (xz-compressed qcow2)
- Add GOOS()/IsFreeBSD() methods to OSImage for cross-compilation
and OS-specific behavior
- Handle xz-compressed image downloads in ensureImage
- Refactor compileBinaries into compileBinariesForOS to support
multiple GOOS targets (linux, freebsd), with binaries registered
at <goos>/<name> paths on the file server VIP
- Add FreeBSD-specific cloud-init (nuageinit) user-data generation:
string-form runcmd (nuageinit doesn't support YAML arrays),
fetch(1) instead of curl, FreeBSD sysctl names for IP forwarding,
mkdir /usr/local/bin, PATH setup for tta
- Skip network-config in cidata ISO for FreeBSD (DHCP via rc.conf)
Updates tailscale/tailscale#13038
Change-Id: Ibeb4f7d02659d5cd8e3a7c3a66ee7b1a92a0110d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add tstest/natlab/vmtest, a high-level framework for running multi-VM
integration tests with mixed OS types (gokrazy + Ubuntu/Debian cloud
images) connected via natlab's vnet virtual network.
The vmtest package provides:
- Env type that orchestrates vnet, QEMU processes, and agent connections
- OS image support (Gokrazy, Ubuntu2404, Debian12) with download/cache
- QEMU launch per OS type (microvm for gokrazy, q35+KVM for cloud)
- Cloud-init seed ISO generation with network-config for multi-NIC
- Cross-compilation of test binaries for cloud VMs
- Debug SSH NIC on cloud VMs for interactive debugging
- Test helpers: ApproveRoutes, HTTPGet, TailscalePing, DumpStatus,
WaitForPeerRoute, SSHExec
TTA enhancements (cmd/tta):
- Parameterize /up (accept-routes, advertise-routes, snat-subnet-routes)
- Add /set, /start-webserver, /http-get endpoints
- /http-get uses local.Client.UserDial for Tailscale-routed requests
- Fix /ping for non-gokrazy systems
TestSubnetRouter exercises a 3-VM subnet router scenario:
client (gokrazy) → subnet-router (Ubuntu, dual-NIC) → backend (gokrazy)
Verifies HTTP access to the backend webserver through the Tailscale
subnet route. Passes in ~30 seconds.
Updates tailscale/tailscale#13038
Change-Id: I165b64af241d37f5f5870e796a52502fc56146fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Multi-NIC support:
- Add nodeNIC type and node.extraNICs for secondary network interfaces
- Add netForMAC/macForNet to route packets to the correct network by MAC
- Update initFromConfig to allocate a MAC + LAN IP per network
- Fix handleEthernetFrameFromVM, ServeUnixConn to use netForMAC
- Fix MACOfIP, writeEth, WriteUDPPacketNoNAT, gVisor write path, and
createARPResponse to use macForNet (return the MAC actually on that
network, not the node's primary MAC)
- Fix createDHCPResponse for multi-NIC (correct client IP and subnet)
- Add nodeNICMac for secondary NIC MAC generation
- Add Node accessors: NumNICs, NICMac, Networks, LanIP
DHCP fixes:
- Include LeaseTime, SubnetMask, Router, DNS in DHCP Offer (not just
Ack). systemd-networkd requires these to accept an Offer.
- Fix DHCP response source IP: use gateway IP instead of echoing
the request's destination (which was 255.255.255.255 for discovers)
New VIPs:
- cloud-init.tailscale: serves per-node cloud-init meta-data, user-data,
and network-config for VMs booting with nocloud datasource
- files.tailscale: serves binary files (tta, tailscale, tailscaled)
registered via RegisterFile for cloud VM provisioning
- Add ControlServer() accessor for test control server
This is necessary for a three-VM natlab subnet router
integration test, coming later.
Updates #13038
Change-Id: I59f9f356bae9b5509c117265237983972dfdd5af
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When SetSubnetRoutes is called, also send updatePeerChanged to all
other connected nodes so they re-fetch their MapResponse and learn
about the updated AllowedIPs. Without this, peers never see new
subnet routes until they happen to reconnect to the control server.
Discovered while working on a three-VM natlab subnet router
integration test, coming later.
Updates #13038
Change-Id: I20e7a2fda994a8ab0e7a24240e6eae536f4f5f15
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Move the ipn/desktop blank import from cmd/tailscaled/tailscaled_windows.go
into feature/condregister/maybe_desktop_sessions.go, consistent with how
all other modular features are registered. tailscaled already imports
feature/condregister, so it still gets ipn/desktop on Windows.
Updates #12614
Change-Id: I92418c4bf0e67f0ab40542e47584762ac0ffa2b2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.
The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
slice/map literal with bad name field values
Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.
Updates #19242
Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Previously, running `add/remove/revoke-keys` without passing any keys
would fail with an unhelpful error:
```console
$ tailscale lock revoke-keys
generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: no provided key is currently trusted
```
or
```console
$ tailscale lock revoke-keys
generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: network-lock is not active
```
Now they fail with a more useful error:
```console
$ tailscale lock revoke-keys
missing argument, expected one or more tailnet lock keys
```
Fixes#19130
Change-Id: I9d81fe2f5b92a335854e71cbc6928e7e77e537e3
Signed-off-by: Alex Chan <alexc@tailscale.com>
Before sending a fix for #18991, this adds an integration test that
locks in that the proxymap WhoIs code works with two nodes running as
different users, with the second node running a localhost service and
able to use its local tailscaled to identify a Tailscale connection
from the other tailscaled.
Updates #18991
Change-Id: I6fbb0810204d77d2ac558f0cc786b73e3248d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.
It also paves the wave for tsnet apps to opt-in to SSH support via a
blank feature import in the future.
Updates #12614
Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Two issues caused TestCollectPanic to flake:
1. ETXTBSY: The test exec'd the tailscaled binary directly without
going through StartDaemon/awaitTailscaledRunnable, so it lacked
the retry loop that other tests use to work around a mysterious
ETXTBSY on GitHub Actions.
2. Shared filch files: The test didn't pass --statedir or TS_LOGS_DIR,
so all parallel test instances wrote panic logs to the shared system
state directory (~/.local/share/tailscale). Concurrent runs would
clobber each other's filch log files, causing the second run to not
find the panic data from the first.
Fix both by adding awaitTailscaledRunnable before the first exec, and
passing --statedir and TS_LOGS_DIR to isolate each test's log files,
matching what StartDaemon does.
It now passes x/tools/cmd/stress.
Fixes#15865
Change-Id: If18b9acf8dbe9a986446a42c5d98de7ad8aae098
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix its/it's, who's/whose, wether/whether, missing apostrophes
in contractions, and other misspellings across the codebase.
Updates #cleanup
Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The test ping() passed the full 60s context to each PingWithOpts call,
so if the first attempt hung (DERP not yet registered), the retry loop
never reached attempt 2. Use a 2s per-call timeout instead.
Updates: #18810
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Updating a node on a testcontrol server should trigger netmap updates to
all connected streaming clients. This was not the case previous to this
change and consequently caused race conditions in tests. It was possible
for a test to call UpdateNode and for connected nodes to never see the
update propagate.
Updates #16340Fixes#18703
Signed-off-by: Harry Harpham <harry@tailscale.com>
This switches our gokrazy builds to use a new variant of cmd/gok called
opinionated about using monorepos: https://github.com/bradfitz/monogok
And with that, we can get rid of all the go.mod files and builddir forests
under gokrazy/**.
Updates #13038
Updates gokrazy/gokrazy#361
Change-Id: I9f18fbe59b8792286abc1e563d686ea9472c622d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Two methods were recently added to the testcontrol.Server type:
AddDNSRecords and SetGlobalAppCaps. These two methods should trigger
netmap updates for all nodes connected to the Server instance, the way
that other state-change methods do (see SetNodeCapMap, for example).
This will also allow us to get rid of Server.ForceNetmapUpdate, which
was a band-aid fix to force the netmap updates which should have been
triggered by the aforementioned methods.
Fixestailscale/corp#37102
Signed-off-by: Harry Harpham <harry@tailscale.com>
Instead of relying on the local timezone, which may cause
non-deterministic behavior in some CIs, we force timezone
to be UTC on default created clocks.
Fixes: tailscale/corp#37005
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Similarly to allowing link-local multicast in #13661, we should also allow broadcast traffic
on permitted interfaces when the killswitch is enabled due to exit node usage on Windows.
This always includes internal interfaces, such as Hyper-V/WSL2, and also the LAN when
"Allow local network access" is enabled in the client.
Updates #18504
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignemnt for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
This also become cumbersome to simply keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the proejct maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This change allows tsnet nodes to act as Service hosts by adding a new
function, tsnet.Server.ListenService. Invoking this function will
advertise the node as a host for the Service and create a listener to
receive traffic for the Service.
Fixes#17697Fixestailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it becomes signed.
This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.
Updates tailscale/corp#33599
Signed-off-by: Alex Chan <alexc@tailscale.com>
And fix up the TestAutoUpdateDefaults integration tests as they
weren't testing reality: the DefaultAutoUpdate is supposed to only be
relevant on the first MapResponse in the stream, but the tests weren't
testing that. They were instead injecting a 2nd+ MapResponse.
This changes the test control server to add a hook to modify the first
map response, and then makes the test control when the node goes up
and down to make new map responses.
Also, the test now runs on macOS where the auto-update feature being
disabled would've previously t.Skipped the whole test.
Updates #11502
Change-Id: If2319bd1f71e108b57d79fe500b2acedbc76e1a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
SetSubnetRoutes was not sending update notifications to nodes when their
approved routes changed, causing nodes to not fetch updated netmaps with
PrimaryRoutes populated. This resulted in TestUserMetricsRouteGauges
flaking because it waited for PrimaryRoutes to be set, which only happened
if the node happened to poll for other reasons.
Now send updateSelfChanged notification to affected nodes so they fetch
an updated netmap immediately.
Fixes#17962
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Linux kernel versions 6.6.102-104 and 6.12.42-45 have a regression
in /proc/net/tcp that causes seek operations to fail with "illegal seek".
This breaks portlist tests on these kernels.
Add kernel version detection for Linux systems and a SkipOnKernelVersions
helper to tstest. Use it to skip affected portlist tests on the broken
kernel versions.
Thanks to philiptaron for the list of kernels with the issue and fix.
Updates #16966
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.
Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.
Updates #15830
Updates #17654
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.
Updates https://github.com/tailscale/corp/issues/33537
Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
I added a RemoveAll() method on tka.Chonk in #17946, but it's only used
in the node to purge local AUMs. We don't need it in the SQLite storage,
which currently implements tka.Chonk, so move it to CompactableChonk
instead.
Also add some automated tests, as a safety net.
Updates tailscale/corp#33599
Change-Id: I54de9ccf1d6a3d29b36a94eccb0ebd235acd4ebc
Signed-off-by: Alex Chan <alexc@tailscale.com>
This requires making the internals of LocalBackend a bit more generic,
and implementing the `tka.CompactableChonk` interface for `tka.Mem`.
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates https://github.com/tailscale/corp/issues/33599
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.
Updates #12639
Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch creates a set of tests that should be true for all implementations of Chonk and CompactableChonk, which we can share with the SQLite implementation in corp.
It includes all the existing tests, plus a test for LastActiveAncestor which was in corp but not in oss.
Updates https://github.com/tailscale/corp/issues/33465
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously, running `tailscale lock log` in a tailnet without Tailnet
Lock enabled would return a potentially confusing error:
$ tailscale lock log
2025/10/20 11:07:09 failed to connect to local Tailscale service; is Tailscale running?
It would return this error even if Tailscale was running.
This patch fixes the error to be:
$ tailscale lock log
Tailnet Lock is not enabled
Fixes#17586
Signed-off-by: Alex Chan <alexc@tailscale.com>
This patch fixes several issues related to printing login and device
approval URLs, especially when `tailscale up` is interrupted:
1. Only print a login URL that will cause `tailscale up` to complete.
Don't print expired URLs or URLs from previous login attempts.
2. Print the device approval URL if you run `tailscale up` after
previously completing a login, but before approving the device.
3. Use the correct control URL for device approval if you run a bare
`tailscale up` after previously completing a login, but before
approving the device.
4. Don't print the device approval URL more than once (or at least,
not consecutively).
Updates tailscale/corp#31476
Updates #17361
## How these fixes work
This patch went through a lot of trial and error, and there may still
be bugs! These notes capture the different scenarios and considerations
as we wrote it, which are also captured by integration tests.
1. We were getting stale login URLs from the initial IPN state
notification.
When the IPN watcher was moved to before Start() in c011369, we
mistakenly continued to request the initial state. This is only
necessary if you start watching after you call Start(), because
you may have missed some notifications.
By getting the initial state before calling Start(), we'd get
a stale login URL. If you clicked that URL, you could complete
the login in the control server (if it wasn't expired), but your
instance of `tailscale up` would hang, because it's listening for
login updates from a different login URL.
In this patch, we no longer request the initial state, and so we
don't print a stale URL.
2. Once you skip the initial state from IPN, the following sequence:
* Run `tailscale up`
* Log into a tailnet with device approval
* ^C after the device approval URL is printed, but without approving
* Run `tailscale up` again
means that nothing would ever be printed.
`tailscale up` would send tailscaled the pref `WantRunning: true`,
but that was already the case so nothing changes. You never get any
IPN notifications, and in particular you never get a state change to
`NeedsMachineAuth`. This means we'd never print the device approval URL.
In this patch, we add a hard-coded rule that if you're doing a simple up
(which won't trigger any other IPN notifications) and you start in the
`NeedsMachineAuth` state, we print the device approval message without
waiting for an IPN notification.
3. Consider the following sequence:
* Run `tailscale up --login-server=<custom server>`
* Log into a tailnet with device approval
* ^C after the device approval URL is printed, but without approving
* Run `tailscale up` again
We'd print the device approval URL for the default control server,
rather than the real control server, because we were using the `prefs`
from the CLI arguments (which are all the defaults) rather than the
`curPrefs` (which contain the custom login server).
In this patch, we use the `prefs` if the user has specified any settings
(and other code will ensure this is a complete set of settings) or
`curPrefs` if it's a simple `tailscale up`.
4. Consider the following sequence: you've logged in, but not completed
device approval, and you run `down` and `up` in quick succession.
* `up`: sees state=NeedsMachineAuth
* `up`: sends `{wantRunning: true}`, prints out the device approval URL
* `down`: changes state to Stopped
* `up`: changes state to Starting
* tailscaled: changes state to NeedsMachineAuth
* `up`: gets an IPN notification with the state change, and prints
a second device approval URL
Either URL works, but this is annoying for the user.
In this patch, we track whether the last printed URL was the device
approval URL, and if so, we skip printing it a second time.
Signed-off-by: Alex Chan <alexc@tailscale.com>
This patch extends the integration tests for `tailscale up` to include tailnets
where new devices need to be approved. It doesn't change the CLI, because it's
mostly working correctly already -- these tests are just to prevent future
regressions.
I've added support for `MachineAuthorized` to mock control, and I've refactored
`TestOneNodeUpAuth` to be more flexible. It now takes a sequence of steps to
run and asserts whether we got a login URL and/or machine approval URL after
each step.
Updates tailscale/corp#31476
Updates #17361
Signed-off-by: Alex Chan <alexc@tailscale.com>
Saves 139 KB.
Also Synology support, which I saw had its own large-ish proxy parsing
support on Linux, but support for proxies without Synology proxy
support is reasonable, so I pulled that out as its own thing.
Updates #12614
Change-Id: I22de285a3def7be77fdcf23e2bec7c83c9655593
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It has nothing to do with logtail and is confusing named like that.
Updates #cleanup
Updates #17323
Change-Id: Idd34587ba186a2416725f72ffc4c5778b0b9db4a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This PR cleans up a bunch of things in ./tstest/integration/vms:
- Bumps version of Ubuntu that's actually run from CI 20.04 -> 24.04
- Removes Ubuntu 18.04 test
- Bumps NixOS 21.05 -> 25.05
Updates#cleanup
Signed-off-by: Irbe Krumina <irbe@tailscale.com>