Ensure that hardware attestation keys are not added to tailscaled
state stores that are Kubernetes Secrets or AWS SSM as those Tailscale
devices should be able to be recreated on different nodes, for example,
when moving Pods between nodes.
Updates tailscale/tailscale#18302
Signed-off-by: Irbe Krumina <irbekrm@gmail.com>
TPM-based features have been incredibly painful due to the heterogeneous
devices in the wild, and many situations in which the TPM "changes" (is
reset or replaced). All of this leads to a lot of customer issues.
We hoped to iron out all the kinks and get all users to benefit from
state encryption and hardware attestation without manually opting in,
but the long tail of kinks is just too long.
This change disables TPM-based features on Windows and Linux by default.
Node state should get auto-decrypted on update, and old attestation keys
will be removed.
There's also tailscaled-on-macOS, but it won't have a TPM or Keychain
bindings anyway.
Updates #18302
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Soft-fail on initial unmarshal and try again, ignoring the
AttestationKey. This helps in cases where something about the
attestation key storage (usually a TPM) is messed up. The old key will
be lost, but at least the node can start again.
Updates #18302
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
An error returned by net.Listener.Accept() causes the owning http.Server to shut down.
With the deprecation of net.Error.Temporary(), there's no way for the http.Server to test
whether the returned error is temporary / retryable or not (see golang/go#66252).
Because of that, errors returned by (*safesocket.winIOPipeListener).Accept() cause the LocalAPI
server (aka ipnserver.Server) to shut down, and tailscaled process to exit.
While this might be acceptable in the case of non-recoverable errors, such as programmer errors,
we shouldn't shut down the entire tailscaled process for client- or connection-specific errors,
such as when we couldn't obtain the client's access token because the client attempts to connect
at the Anonymous impersonation level. Instead, the LocalAPI server should gracefully handle
these errors by denying access and returning a 401 Unauthorized to the client.
In tailscale/tscert#15, we fixed a known bug where Caddy and other apps using tscert would attempt
to connect at the Anonymous impersonation level and fail. However, we should also fix this on the tailscaled
side to prevent a potential DoS, where a local app could deliberately open the Tailscale LocalAPI named pipe
at the Anonymous impersonation level and cause tailscaled to exit.
In this PR, we defer token retrieval until (*WindowsClientConn).Token() is called and propagate the returned token
or error via ipnauth.GetConnIdentity() to ipnserver, which handles it the same way as other ipnauth-related errors.
Fixes#18212Fixestailscale/tscert#13
Signed-off-by: Nick Khyl <nickk@tailscale.com>
In dynamically changing environments where ACME account keys and certs
are stored separately, it can happen that the account key would get
deleted (and recreated) between issuances. If that is the case,
we currently fail renewals and the only way to recover is for users
to delete certs.
This adds a config knob to allow opting out of the replaces extension
and utilizes it in the Kubernetes operator where there are known
user workflows that could end up with this edge case.
Updates #18251
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
updates tailscale/corp#33891
Addresses several older the TODO's in netmon. This removes the
Major flag precomputes the ChangeDelta state, rather than making
consumers of ChangeDeltas sort that out themselves. We're also seeing
a lot of ChangeDelta's being flagged as "Major" when they are
not interesting, triggering rebinds in wgengine that are not needed. This
cleans that up and adds a host of additional tests.
The dependencies are cleaned, notably removing dependency on netmon
itself for calculating what is interesting, and what is not. This includes letting
individual platforms set a bespoke global "IsInterestingInterface"
function. This is only used on Darwin.
RebindRequired now roughly follows how "Major" was historically
calculated but includes some additional checks for various
uninteresting events such as changes in interface addresses that
shouldn't trigger a rebind. This significantly reduces thrashing (by
roughly half on Darwin clients which switching between nics). The individual
values that we roll into RebindRequired are also exposed so that
components consuming netmap.ChangeDelta can ask more
targeted questions.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
The existing client metric methods only support incrementing (or
decrementing) a delta value. This new method allows setting the metric
to a specific value.
Updates tailscale/corp#35327
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
tcpHandlerForVIPService was missing ProxyProtocol support that
tcpHandlerForServe already had. Extract the shared logic into
forwardTCPWithProxyProtocol helper and use it in both handlers.
Fixes#18172
Signed-off-by: Raj Singh <raj@tailscale.com>
It appears (*controlclient.Auto).Shutdown() can still deadlock when called with b.mu held, and therefore the changes in #18127 are unsafe.
This reverts #18127 until we figure out what causes it.
This reverts commit d199ecac80.
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Previously we only set this when it updated, which was fine for the first
call to Start(), but after that point future updates would be skipped if
nothing had changed. If Start() was called again, it would wipe the peer API
endpoints and they wouldn't get added back again, breaking exit nodes (and
anything else requiring peer API to be advertised).
Updates tailscale/corp#27173
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Based on PR #16700 by @lox, adapted to current codebase.
Adds support for proxying HTTP requests to Unix domain sockets via
tailscale serve unix:/path/to/socket, enabling exposure of services
like Docker, containerd, PHP-FPM over Tailscale without TCP bridging.
The implementation includes reasonable protections against exposure of
tailscaled's own socket.
Adaptations from original PR:
- Use net.Dialer.DialContext instead of net.Dial for context propagation
- Use http.Transport with Protocols API (current h2c approach, not http2.Transport)
- Resolve conflicts with hasScheme variable in ExpandProxyTargetValue
Updates #9771
Signed-off-by: Peter A. <ink.splatters@pm.me>
Co-authored-by: Lachlan Donald <lachlan@ljd.cc>
If a packet arrives while WireGuard is being reconfigured with b.mu held, such as during a profile switch,
calling back into (*LocalBackend).GetPeerAPIPort from (*Wrapper).filterPacketInboundFromWireGuard
may deadlock when it tries to acquire b.mu.
This occurs because a peer cannot be removed while an inbound packet is being processed.
The reconfig and profile switch wait for (*Peer).RoutineSequentialReceiver to return, but it never finishes
because GetPeerAPIPort needs b.mu, which the waiting goroutine already holds.
In this PR, we make peerAPIPorts a new syncs.AtomicValue field that is written with b.mu held
but can be read by GetPeerAPIPort without holding the mutex, which fixes the deadlock.
There might be other long-term ways to address the issue, such as moving peer API listeners
from LocalBackend to nodeBackend so they can be accessed without holding b.mu,
but these changes are too large and risky at this stage in the v1.92 release cycle.
Updates #18124
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Previously, callers of (*LocalBackend).resetControlClientLocked were supposed
to call Shutdown on the returned controlclient.Client after releasing b.mu.
In #17804, we started calling Shutdown while holding b.mu, which caused
deadlocks during profile switches due to the (*ExecQueue).RunSync implementation.
We first patched this in #18053 by calling Shutdown in a new goroutine,
which avoided the deadlocks but made TestStateMachine flaky because
the shutdown order was no longer guaranteed.
In #18070, we updated (*ExecQueue).RunSync to allow shutting down
the queue without waiting for RunSync to return. With that change,
shutting down the control client while holding b.mu became safe.
Therefore, this PR updates (*LocalBackend).resetControlClientLocked
to shut down the old client synchronously during the reset, instead of
returning it and shifting that responsibility to the callers.
This fixes the flaky tests and simplifies the code.
Fixes#18052
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This patch adds an integration test for Tailnet Lock, checking that a node can't
talk to peers in the tailnet until it becomes signed.
This patch also introduces a new package `tstest/tkatest`, which has some helpers
for constructing a mock control server that responds to TKA requests. This allows
us to reduce boilerplate in the IPN tests.
Updates tailscale/corp#33599
Signed-off-by: Alex Chan <alexc@tailscale.com>
In preparation for exposing its configuration via ipn.ConfigVAlpha,
change {Masked}Prefs.RelayServerPort from *int to *uint16. This takes a
defensive stance against invalid inputs at JSON decode time.
'tailscale set --relay-server-port' is currently the only input to this
pref, and has always sanitized input to fit within a uint16.
Updates tailscale/corp#34591
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In suggestExitNodeLocked, if no exit node candidates have a home DERP or
valid location info, `bestCandidates` is an empty slice. This slice is
passed to `selectNode` (`randomNode` in prod):
```go func randomNode(nodes views.Slice[tailcfg.NodeView], …) tailcfg.NodeView {
…
return nodes.At(rand.IntN(nodes.Len()))
}
```
An empty slice becomes a call to `rand.IntN(0)`, which panics.
This patch changes the behaviour, so if we've filtered out all the
candidates before calling `selectNode`, reset the list and then pick
from any of the available candidates.
This patch also updates our tests to give us more coverage of `randomNode`,
so we can spot other potential issues.
Updates #17661
Change-Id: I63eb5e4494d45a1df5b1f4b1b5c6d5576322aa72
Signed-off-by: Alex Chan <alexc@tailscale.com>
In PR tailscale/corp#34401, the `traffic-steering` feature flag does
not automatically enable traffic steering for all nodes. Instead, an
admin must add the `traffic-steering` node attribute to each client
node that they want opted-in.
For backwards compatibility with older clients, tailscale/corp#34401
strips out the `traffic-steering` node attribute if the feature flag
is not enabled, even if it is set in the policy file. This lets us
safely disable the feature flag.
This PR adds a missing test case for suggested exit nodes that have no
priority.
Updates tailscale/corp#34399
Signed-off-by: Simon Law <sfllaw@tailscale.com>
When the underlying transport returns a network error, the RoundTrip
method returns (nil, error). The defer was attempting to access resp
without checking if it was nil first, causing a panic. Fix this by
checking for nil in the defer.
Also changes driveTransport.tr from *http.Transport to http.RoundTripper
and adds a test.
Fixes#17306
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Change-Id: Icf38a020b45aaa9cfbc1415d55fd8b70b978f54c
With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.
Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.
Updates #15830
Updates #17654
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
These validations were previously performed in the CLI frontend. There
are two motivations for moving these to the local backend:
1. The backend controls synchronization around the relevant state, so
only the backend can guarantee many of these validations.
2. Doing these validations in the back-end avoids the need to repeat
them across every frontend (e.g. the CLI and tsnet).
Updates tailscale/corp#27200
Signed-off-by: Harry Harpham <harry@tailscale.com>
This commit enables user to set service backend to remote destinations, that can be a partial
URL or a full URL. The commit also prevents user to set remote destinations on linux system
when socket mark is not working. For user on any version of mac extension they can't serve a
service either. The socket mark usability is determined by a new local api.
Fixestailscale/corp#24783
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
Now that we support using an in-memory backend for TKA state (#17946),
this function always returns `nil` – we can always support Network Lock.
We don't need it any more.
Plus, clean up a couple of errant TODOs from that PR.
Updates tailscale/corp#33599
Change-Id: Ief93bb9adebb82b9ad1b3e406d1ae9d2fa234877
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.
Updates https://github.com/tailscale/corp/issues/33537
Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.
Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.
magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.
Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.
Updates tailscale/corp#34037
Signed-off-by: James Tucker <james@tailscale.com>
For manual (human) testing, this lets the user disable control plane
map polls with "tailscale set --sync=false" (which survives restarts)
and "tailscale set --sync" to restore.
A high severity health warning is shown while this is active.
Updates #12639
Updates #17945
Change-Id: I83668fa5de3b5e5e25444df0815ec2a859153a6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This requires making the internals of LocalBackend a bit more generic,
and implementing the `tka.CompactableChonk` interface for `tka.Mem`.
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates https://github.com/tailscale/corp/issues/33599
Pick up a fix for https://pkg.go.dev/vuln/GO-2025-4116 (even though
we're not affected).
Updates #cleanup
Change-Id: I9f2571b17c1f14db58ece8a5a34785805217d9dd
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Includes adding StartPaused, which will be used in a future change to
enable netmap caching testing.
Updates #12639
Change-Id: Iec39915d33b8d75e9b8315b281b1af2f5d13a44a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds the --proxy-protocol flag to 'tailscale serve' and
'tailscale funnel', which tells the Tailscale client to prepend a PROXY
protocol[1] header when making connections to the proxied-to backend.
I've verified that this works with our existing funnel servers without
additional work, since they pass along source address information via
PeerAPI already.
Updates #7747
[1]: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
Change-Id: I647c24d319375c1b33e995555a541b7615d2d203
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.
Updates #12639
Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Remove the State enum (StateNew, StateNotAuthenticated, etc.) from
controlclient and replace it with two explicit boolean fields:
- LoginFinished: indicates successful authentication
- Synced: indicates we've received at least one netmap
This makes the state more composable and easier to reason about, as
multiple conditions can be true independently rather than being
encoded in a single enum value.
The State enum was originally intended as the state machine for the
whole client, but that abstraction moved to ipn.Backend long ago.
This change continues moving away from the legacy state machine by
representing state as a combination of independent facts.
Also adds test helpers in ipnlocal that check independent, observable
facts (hasValidNetMap, needsLogin, etc.) rather than relying on
derived state enums, making tests more robust.
Updates #12639
Signed-off-by: James Tucker <james@tailscale.com>
As a baby step towards eventbus-ifying controlclient, make the
Observer optional.
This also means callers that don't care (like this network lock test,
and some tests in other repos) can omit it, rather than passing in a
no-op one.
Updates #12639
Change-Id: Ibd776b45b4425c08db19405bc3172b238e87da4e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>