When the client of a connector assigns transit IP addresses for a
connector we need to let wireguard know that packets for the transit IPs
should be sent to the connector node. We do this by:
* keeping a map of node -> transit IPs we've assigned for it
* setting a callback hook within wireguard reconfig to ask us for these
extra allowed IPs.
* forcing wireguard to do a reconfig after we have assigned new transit
IPs.
And this commit is the last part: forcing the wireguard reconfig after a
new address assignment.
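
A minimal sketch of how the three parts listed above could fit together. The names here (NodeID, transitIPs, Assign, ExtraAllowedIPs, the reconfig hook) are hypothetical and are not the real types or hooks in this repo:

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
)

// NodeID is a hypothetical stand-in for the real node identifier type.
type NodeID int64

// exampleIP is a documentation address used as a pretend transit IP.
var exampleIP = netip.MustParseAddr("192.0.2.1")

// transitIPs keeps the node -> transit IPs map described above.
type transitIPs struct {
	mu     sync.Mutex
	byNode map[NodeID][]netip.Prefix
	// reconfig is a hypothetical hook that forces a WireGuard reconfig.
	reconfig func()
}

// Assign records a transit IP for a node, then forces a reconfig so the
// new address is picked up.
func (t *transitIPs) Assign(id NodeID, ip netip.Addr) {
	t.mu.Lock()
	if t.byNode == nil {
		t.byNode = make(map[NodeID][]netip.Prefix)
	}
	t.byNode[id] = append(t.byNode[id], netip.PrefixFrom(ip, ip.BitLen()))
	t.mu.Unlock()
	if t.reconfig != nil {
		t.reconfig() // called outside the lock so the callback can re-enter
	}
}

// ExtraAllowedIPs is the callback the reconfig path would use to ask for
// the extra allowed IPs of a node. It returns a copy of the stored slice.
func (t *transitIPs) ExtraAllowedIPs(id NodeID) []netip.Prefix {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]netip.Prefix(nil), t.byNode[id]...)
}

func main() {
	reconfigs := 0
	t := &transitIPs{reconfig: func() { reconfigs++ }}
	t.Assign(7, exampleIP)
	fmt.Println(t.ExtraAllowedIPs(7), reconfigs)
}
```

The reconfig hook runs after the lock is released because the reconfig is expected to call back into ExtraAllowedIPs.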
Fixes tailscale/corp#38124
Signed-off-by: Fran Bull <fran@tailscale.com>
By polling RTM_GETSTATS via netlink. RTM_GETSTATS is a relatively
efficient and targeted (single device) polling method available since
Linux v4.7.
The tundevstats "feature" can be extended to other platforms in the
future, and it's trivial to add new rtnl_link_stats64 counters on
Linux.
Updates tailscale/corp#38181
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Replace byte-at-a-time ReadByte loops with Peek+Discard in the DERP
read path. Peek returns a slice into bufio's internal buffer without
allocating, and Discard advances the read pointer without copying.
Introduce util/bufiox with a BufferedReader interface and ReadFull
helper that uses Peek+copy+Discard as an allocation-free alternative
to io.ReadFull.
- derp.ReadFrameHeader: replace 5× ReadByte with Peek(5)+Discard(5),
reading the frame type and length directly from the peeked slice.
Remove now-unused readUint32 helper.
    name               old ns/op  new ns/op  speedup
    ReadFrameHeader-8  24.2       12.4       ~2x
    (0 allocs/op in both)
- key.NodePublic.ReadRawWithoutAllocating: replace 32× ReadByte with
bufiox.ReadFull. Addresses the "Dear future" comment about switching
away from byte-at-a-time reads once a non-escaping alternative exists.
    name                            old ns/op  new ns/op  speedup
    NodeReadRawWithoutAllocating-8  140        43.6       ~3.2x
    (0 allocs/op in both)
- derpserver.handleFramePing: replace io.ReadFull with bufiox.ReadFull.
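
For illustration, a sketch of the Peek+copy+Discard pattern using only the standard library. The interface and function names are assumptions and may not match util/bufiox exactly; the header layout (1-byte type, 4-byte big-endian length) is likewise assumed for the example:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
)

// BufferedReader is the subset of *bufio.Reader the helpers need.
type BufferedReader interface {
	Peek(n int) ([]byte, error)
	Discard(n int) (int, error)
}

// readFull fills buf from br using Peek+copy+Discard. It assumes len(buf)
// fits in the reader's internal buffer, and on a short read it returns 0
// and the Peek error rather than a partial count.
func readFull(br BufferedReader, buf []byte) (int, error) {
	b, err := br.Peek(len(buf))
	if err != nil {
		return 0, err
	}
	n := copy(buf, b)
	if _, err := br.Discard(n); err != nil {
		return n, err
	}
	return n, nil
}

// readFrameHeader reads the 5-byte header straight from the peeked slice.
func readFrameHeader(br BufferedReader) (byte, uint32, error) {
	b, err := br.Peek(5)
	if err != nil {
		return 0, 0, err
	}
	typ, length := b[0], binary.BigEndian.Uint32(b[1:5])
	if _, err := br.Discard(5); err != nil {
		return 0, 0, err
	}
	return typ, length, nil
}

// fromBytes wraps a byte slice in a *bufio.Reader for the example.
func fromBytes(p []byte) *bufio.Reader {
	return bufio.NewReader(bytes.NewReader(p))
}

func main() {
	br := fromBytes([]byte{0x04, 0x00, 0x00, 0x00, 0x03, 'a', 'b', 'c'})
	typ, n, err := readFrameHeader(br)
	if err != nil {
		panic(err)
	}
	payload := make([]byte, n)
	if _, err := readFull(br, payload); err != nil {
		panic(err)
	}
	fmt.Printf("type=%d len=%d payload=%q\n", typ, n, payload)
}
```

The values are read out of the peeked slice before Discard is called, since the slice is only valid until the next read operation on the reader.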
Updates tailscale/corp#38509
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
When we don't care about the payload value and are just checking whether
a set contains an IP/prefix, we can use `bart.Lite` for the same lookup
times but a lower memory footprint.
Fixes #19075
Change-Id: Ia709e8b718666cc61ea56eac1066467ae0b6e86c
Signed-off-by: Alex Chan <alexc@tailscale.com>
tsnet has a 5s sleep as part of its logic waiting to log successful auth.
Add an additional channel that will interrupt this sleep early if the
local backend's state changes before then. This is early enough in the
bootstrap logic that the local client has not been set up yet, so we
subscribe directly on the local backend in keeping with the rest of the
function, but it would be nice to port the whole function to the new
eventbus in a separate change.
Note this does not affect how quickly auth actually happens, it just
ensures we more responsively log the fact that auth state has changed.
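
A sketch of the interruptible wait, assuming a notification channel fed by the backend's state-change callback. The function and channel names are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// waitOrStateChange blocks for at most d. It returns true if stateChanged
// fired first, and false if the full duration elapsed.
func waitOrStateChange(d time.Duration, stateChanged <-chan struct{}) bool {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-stateChanged:
		return true
	case <-t.C:
		return false
	}
}

func main() {
	ch := make(chan struct{}, 1)
	ch <- struct{}{} // simulate a backend state change notification
	fmt.Println(waitOrStateChange(5*time.Second, ch))
}
```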
Updates #16340
Change-Id: I7a28fd3927bbcdead9a5aad39f4a3596b5f659b0
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
When racing multiple upstream DNS resolvers, a REFUSED (RCode 5) response
from a broken or misconfigured resolver could win the race and be returned
to the client before healthier resolvers had a chance to respond with a
valid answer. This caused complete DNS failure in cases where, e.g., a
broken upstream resolver returned REFUSED quickly while a working resolver
(such as 1.1.1.1) was still responding.
Previously, only SERVFAIL (RCode 2) was treated as a soft error. REFUSED
responses were returned as successful bytes and could win the race
immediately. This change also treats REFUSED as a soft error in the UDP
and TCP forwarding paths, so the race continues until a better answer
arrives. If all resolvers refuse, the first REFUSED response is returned
to the client.
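
A simplified model of the selection rule, processing responses in arrival order rather than racing real sockets. The types and function names are hypothetical. Returning the first soft error of either kind when every resolver fails is an assumption; the text above only specifies the all-REFUSED case:

```go
package main

import "fmt"

// RCode values from RFC 1035.
const (
	rcodeSuccess  = 0
	rcodeServFail = 2
	rcodeRefused  = 5
)

// response is a hypothetical summary of one upstream reply.
type response struct {
	resolver string
	rcode    int
}

// isSoftError reports whether the race should keep waiting for a better
// answer after seeing this RCode.
func isSoftError(rcode int) bool {
	return rcode == rcodeServFail || rcode == rcodeRefused
}

// pickResponse walks responses in arrival order and returns the first one
// that is not a soft error. If all are soft errors, the first is returned.
// The bool is false only when there were no responses at all.
func pickResponse(arrivals []response) (response, bool) {
	for _, r := range arrivals {
		if !isSoftError(r.rcode) {
			return r, true
		}
	}
	if len(arrivals) > 0 {
		return arrivals[0], true
	}
	return response{}, false
}

func main() {
	got, _ := pickResponse([]response{
		{resolver: "broken", rcode: rcodeRefused},
		{resolver: "1.1.1.1", rcode: rcodeSuccess},
	})
	fmt.Println(got.resolver)
}
```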
Additionally, SERVFAIL responses from upstream resolvers are now returned
verbatim to the client rather than replaced with a locally synthesized
packet. Synthesized SERVFAIL responses were authoritative and guaranteed
to include a question section echoing the original query; upstream
responses carry no such guarantees but may include extended error
information (e.g. RFC 8914 extended DNS errors) that would otherwise
be lost.
Fixes #19024
Signed-off-by: Brendan Creane <bcreane@gmail.com>
This path is currently only used by DERP servers that have also
enabled `verify-clients` to ensure that only authorized clients
within a Tailnet are allowed to use said DERP server.
The previous naive linear scan in NodeByKey would almost
certainly lead to bad outcomes with a large enough netmap, so
address an existing todo by building a map of node key -> node ID.
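
A sketch of the index, with hypothetical stand-in types (the real code uses the repo's node key and node ID types, and the netmap's own peer list):

```go
package main

import "fmt"

// NodeID and NodeKey are hypothetical stand-ins for the real types.
type NodeID int64
type NodeKey string

type Node struct {
	ID  NodeID
	Key NodeKey
}

// buildKeyIndex builds the node key -> node ID map once per netmap, so
// each lookup is a map access instead of a scan over all nodes.
func buildKeyIndex(nodes []Node) map[NodeKey]NodeID {
	idx := make(map[NodeKey]NodeID, len(nodes))
	for _, n := range nodes {
		idx[n.Key] = n.ID
	}
	return idx
}

func main() {
	idx := buildKeyIndex([]Node{{ID: 1, Key: "nodekey:aa"}, {ID: 2, Key: "nodekey:bb"}})
	id, ok := idx["nodekey:bb"]
	fmt.Println(id, ok)
}
```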
Updates #19042
Signed-off-by: Amal Bansode <amal@tailscale.com>
This repo's module is tailscale.com, and the tailscale-client-go-v2 repo
uses tailscale.com/client/tailscale/v2. It seems from #19010 that if we
have the client module as a dependency in this module, go vet will start
to consider the client module as part of tailscale.com/...
I'm not sure if this is a bug in go vet, but for now let's take the easy
fix and specify ./... instead. In my testing, it seems like this is
sufficient to make sure it just walks the file hierarchy and doesn't
find the client module as a sub-path.
Updates tailscale/corp#38418
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
That will be able to be plugged into the hooks in
wgengine/filter/filter.go to let connector packets flow.
Fixes tailscale/corp#37144
Fixes tailscale/corp#37145
Signed-off-by: Fran Bull <fran@tailscale.com>
Rather than printing `unknown subcommand: drive` for any Taildrive
commands run in the macOS GUI, print an error message directing the user
to the GUI client and the docs page.
Updates #17210
Fixes #18823
Change-Id: I6435007b5911baee79274b56e3ee101e6bb6d809
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates #19050
When tsnet.Server.start() is called with both Hostname and Dir explicitly
set, os.Executable() failure should not prevent the server from starting.
Extend the existing ios fallback to also cover darwin, where the same
failure occurs when the Go runtime is embedded in a framework launched
via Xcode's debug launcher.
Signed-off-by: Prakash Rudraraju <prakashrj@yahoo.com>
Also implement a limit of one on the number of goroutines that can be
waiting to do a reconfig via AuthReconfig, to prevent extensions from
calling too fast and taxing resources.
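
One way to cap the number of waiters at one is a buffered channel used as a single waiting slot. This is only a sketch: the type, constructor, and error names are hypothetical, and whether the real method rejects or drops extra callers is an assumption here:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errBusy = errors.New("a reconfig is already queued")

// reconfigurer serializes reconfigs and allows at most one queued caller.
type reconfigurer struct {
	mu     sync.Mutex    // held while a reconfig runs
	waiter chan struct{} // capacity 1: the single waiting slot
	do     func()        // the actual reconfig work
}

func newReconfigurer(do func()) *reconfigurer {
	return &reconfigurer{waiter: make(chan struct{}, 1), do: do}
}

// AuthReconfig runs a reconfig, or returns errBusy if another caller is
// already waiting for its turn.
func (r *reconfigurer) AuthReconfig() error {
	select {
	case r.waiter <- struct{}{}: // claim the waiting slot
	default:
		return errBusy
	}
	r.mu.Lock()
	<-r.waiter // now running: free the slot for the next waiter
	defer r.mu.Unlock()
	r.do()
	return nil
}

func main() {
	r := newReconfigurer(func() { fmt.Println("reconfig") })
	fmt.Println(r.AuthReconfig())
}
```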
Even with the protection, the new method should only be used in
experimental or proof-of-concept contexts. The current intended use is
for an extension to be able to force a reconfiguration of WireGuard, and
have the reconfiguration call back into the extension for extra Allowed
IPs.
If in the future WireGuard is able to reconfigure individual peers more
dynamically, an extension might be able to hook into that process, and
this method on ipnext.Host may be deprecated.
Fixes tailscale/corp#38120
Updates tailscale/corp#38124
Updates tailscale/corp#38125
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
The actual secret is passed through argon2 first, so a timing attack is
not feasible remotely, and pretty unlikely locally. Still, clean this
up.
Fixes #19063
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
* ipn: reject advertised routes with non-address bits set
The config file path, EditPrefs local API, and App Connector API were
accepting invalid subnet route prefixes with non-address bits set (e.g.,
2a01:4f9:c010:c015::1/64 instead of 2a01:4f9:c010:c015::/64). All three
paths now reject prefixes where prefix != prefix.Masked() with an error
message indicating the expected masked form.
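
The check itself can be sketched with net/netip as follows. The function name and the error wording are illustrative, not the ones used in the three code paths:

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateRoute rejects a prefix whose address has bits set beyond the
// prefix length, and names the expected masked form in the error.
func validateRoute(s string) error {
	pfx, err := netip.ParsePrefix(s)
	if err != nil {
		return err
	}
	if masked := pfx.Masked(); pfx != masked {
		return fmt.Errorf("route %v has non-address bits set; expected %v", pfx, masked)
	}
	return nil
}

func main() {
	fmt.Println(validateRoute("2a01:4f9:c010:c015::1/64"))
	fmt.Println(validateRoute("2a01:4f9:c010:c015::/64"))
}
```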
Updates tailscale/corp#36738
Signed-off-by: Brendan Creane <bcreane@gmail.com>
* address review comments
Signed-off-by: Brendan Creane <bcreane@gmail.com>
---------
Signed-off-by: Brendan Creane <bcreane@gmail.com>
Rename variables to match their types after the server -> connector
rename.
Updates tailscale/corp#37144
Updates tailscale/corp#37145
Signed-off-by: Fran Bull <fran@tailscale.com>
When a client starts up without being able to connect to control, it
sends its discoKey to other nodes it wants to communicate with over
TSMP. This disco key will be a newer key than the one control knows
about.
If the client that can connect to control gets a full netmap, ensure
that the disco key for the node not connected to control is not
overwritten with the stale key control knows about.
This is implemented by keeping track of mapSession and using it for
the disco key injection if it is available. This ensures that we are not
constantly resetting the wireguard connection when getting the wrong
keys from control.
This is implemented as:
- If the key is received via TSMP:
  - Set lastSeen for the peer to now()
  - Set online for the peer to false
- When processing new keys, only accept keys where either:
  - Peer is online
  - lastSeen is newer than existing last seen
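
The acceptance rule above can be sketched as follows. The types and function names are hypothetical, and reading "Peer is online" as the online flag carried by the incoming update is an assumption:

```go
package main

import (
	"fmt"
	"time"
)

// peerState is what we currently hold for a peer.
type peerState struct {
	discoKey string
	online   bool
	lastSeen time.Time
}

// keyUpdate is a candidate disco key, from control or from TSMP.
type keyUpdate struct {
	discoKey string
	online   bool
	lastSeen time.Time
}

// fromTSMP marks a key received over TSMP: lastSeen is now, online false.
func fromTSMP(key string, now time.Time) keyUpdate {
	return keyUpdate{discoKey: key, online: false, lastSeen: now}
}

// shouldAccept reports whether the update may replace the current key.
func shouldAccept(cur peerState, u keyUpdate) bool {
	return u.online || u.lastSeen.After(cur.lastSeen)
}

// apply returns the new state, leaving cur untouched if u is rejected.
func apply(cur peerState, u keyUpdate) peerState {
	if !shouldAccept(cur, u) {
		return cur
	}
	return peerState{discoKey: u.discoKey, online: u.online, lastSeen: u.lastSeen}
}

func main() {
	now := time.Now()
	cur := peerState{discoKey: "from-control", lastSeen: now.Add(-time.Hour)}
	cur = apply(cur, fromTSMP("from-tsmp", now))
	// A stale, offline entry from a full netmap does not overwrite it.
	cur = apply(cur, keyUpdate{discoKey: "from-control", lastSeen: now.Add(-time.Hour)})
	fmt.Println(cur.discoKey)
}
```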
If mapSession is not available, as in we are not yet connected to
control, punt down the disco key injection to magicsock.
Ideally, we will want to have mapSession be long lived at some point in
the near future so we only need to inject keys in one location and then
also use that for testing and loading the cache, but that is a yak for
another PR.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This populates UserProfile.Groups in the WhoIs response from the
local backend with the groups of the corresponding user in the
netmap.
This allows tsnet apps to see (and e.g. forward) which groups a
user making a request belongs to - as long as the tsnet app runs
on a node that has been granted the tailscale.com/visible-groups
capability via node attributes. If that's not the case or the
user doesn't belong to any groups allow-listed via the node
attribute, Groups won't be populated.
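
As a rough illustration of the filtering described above, assuming the node attribute boils down to a list of allow-listed group names (the function name and the attribute's actual encoding are assumptions):

```go
package main

import "fmt"

// visibleGroups returns the user's groups that are on the allow-list.
// It returns nil when nothing is visible, so Groups stays unpopulated.
func visibleGroups(userGroups, allowList []string) []string {
	allowed := make(map[string]bool, len(allowList))
	for _, g := range allowList {
		allowed[g] = true
	}
	var out []string
	for _, g := range userGroups {
		if allowed[g] {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	fmt.Println(visibleGroups(
		[]string{"group:eng", "group:ops"},
		[]string{"group:ops", "group:sales"},
	))
}
```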
Updates tailscale/corp#31529
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
This test was failing on Alpine's CI which had 'git' but wasn't in a git repo:
036b6a1262 (commitcomment-180001647)
Updates #12614
Change-Id: Ic1b8856aaf020788a2a57e48738851e13ea85a93
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If an NRPT rule lists more than one server, those servers should be separated by a semicolon (";"),
rather than a semicolon followed by a space ("; "). Otherwise, Windows fails to parse the created
registry value, and DNS resolution may fail.
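
In code terms the difference is just the separator passed to the join. A sketch, with a hypothetical function name:

```go
package main

import (
	"fmt"
	"strings"
)

// nrptServers formats the server list for an NRPT rule: a bare semicolon
// between entries, with no space after it.
func nrptServers(servers []string) string {
	return strings.Join(servers, ";")
}

func main() {
	fmt.Println(nrptServers([]string{"100.100.100.100", "fd7a:115c:a1e0::53"}))
}
```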
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-gpnrpt/06088ca3-4cf1-48fa-8837-ca8d853ee1e8
Fixes #19040
Updates #15404 (enabled MagicDNS IPv6 by default, adding a second server and triggering the issue)
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Two methods could deadlock during shutdown when closing the wrapper.
Ensure that the writers are aware of the wrapper being closed.
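
The general pattern, sketched with hypothetical names and not taken from the actual wrapper code: a writer selects on both the send and a closed channel, so it can never block forever once the wrapper is shut down.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosed = errors.New("wrapper closed")

// wrapper is a hypothetical stand-in with one outbound channel.
type wrapper struct {
	out       chan []byte
	closed    chan struct{}
	closeOnce sync.Once
}

func newWrapper() *wrapper {
	return &wrapper{out: make(chan []byte), closed: make(chan struct{})}
}

// write hands p to the reader, or gives up if the wrapper is closed.
func (w *wrapper) write(p []byte) error {
	select {
	case w.out <- p:
		return nil
	case <-w.closed:
		return errClosed
	}
}

// Close unblocks any pending and future writers. It is safe to call twice.
func (w *wrapper) Close() {
	w.closeOnce.Do(func() { close(w.closed) })
}

func main() {
	w := newWrapper()
	w.Close()
	fmt.Println(w.write([]byte("pkt"))) // returns instead of blocking
}
```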
Fixes #19037
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Currently the IP forwarding health check is done when sending MapRequests.
Move the IP forwarding check to the health service to gain the benefits
of the health tracker and periodic monitoring out of band from
the MapRequest path. ipnlocal now provides a closure to
the health service that checks whether forwarding is broken.
Removed `skipIPForwardingCheck` from controlclient/direct.go;
it wasn't being used as its comments describe, and that check
has moved to ipnlocal for the closure to the health tracker.
Updates #18976
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
These were previously swappable for historical reasons that are no
longer relevant.
Removing the indirection enables future inlining optimizations if we
simplify further.
Updates tailscale/corp#38703
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Introduce a datapathHandler that implements hooks that will
receive packets from the tstun.Wrapper. This commit does not wire
those up just yet.
Perform DNAT from Magic IP to Transit IP on outbound flows on clients,
and reverse SNAT in the reverse direction.
Perform DNAT from Transit IP to final destination IP on outbound flows
on connectors, and reverse SNAT in the reverse direction.
Introduce FlowTable to cache validated flows by 5-tuple for fast lookups
after the first packet.
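
A reduced sketch of the caching idea: the first packet of a flow goes through a slow validation path, and later packets with the same 5-tuple hit the map. All names besides FlowTable are hypothetical, and the real table presumably stores NAT actions as well, not just a validated marker:

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
)

// flowKey is the 5-tuple: protocol plus source and destination ip:port.
type flowKey struct {
	proto    uint8
	src, dst netip.AddrPort
}

// key builds a flowKey from "ip:port" strings. It panics on bad input and
// exists only to keep the example short.
func key(proto uint8, src, dst string) flowKey {
	return flowKey{proto, netip.MustParseAddrPort(src), netip.MustParseAddrPort(dst)}
}

// FlowTable caches flows that have already passed validation.
type FlowTable struct {
	mu       sync.Mutex
	flows    map[flowKey]struct{}
	validate func(flowKey) bool // slow path, run once per new flow
}

// Allowed reports whether the flow is valid, consulting the cache first.
func (ft *FlowTable) Allowed(k flowKey) bool {
	ft.mu.Lock()
	defer ft.mu.Unlock()
	if _, ok := ft.flows[k]; ok {
		return true
	}
	if !ft.validate(k) {
		return false
	}
	if ft.flows == nil {
		ft.flows = make(map[flowKey]struct{})
	}
	ft.flows[k] = struct{}{}
	return true
}

func main() {
	slow := 0
	ft := &FlowTable{validate: func(flowKey) bool { slow++; return true }}
	k := key(6, "100.64.0.1:40000", "100.64.0.2:443")
	ft.Allowed(k)
	ft.Allowed(k)
	fmt.Println("slow-path validations:", slow)
}
```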
Flow expiration is not covered, and is intended as future work before
the feature is officially released.
Fixes tailscale/corp#34249
Fixes tailscale/corp#35995
Co-authored-by: Fran Bull <fran@tailscale.com>
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Adds envknobs to override the netstack default TCP keepalive idle time
(~2h) and probe interval (75s) for forwarded connections.
When a tailnet peer goes away without closing its connections (pod
deleted, peer removed from the netmap, silent network partition), the
forwardTCP io.Copy goroutines block until keepalive fires: the
gvisor-side Read waits on a peer that will never send again, and the
backend-side Read waits on a backend that is alive and idle. With the
netstack default of 7200s idle + 9×75s probes, dead-peer detection
takes a little over two hours. Under high-churn forwarding — many
short-lived peers, or peers holding thousands of proxied connections
that drop at once — stuck goroutines accumulate faster than they clear.
The existing SetKeepAlive(true) at this site enables keepalive without
setting the timers; the TODO above it noted "a shorter default might
be better" and "might be a useful user-tunable". This makes both
timers tunable without changing the defaults: unset preserves the ~2h
behavior, which is the right trade-off for battery-powered peers.
The two knobs are independent — setting one leaves the other at the
netstack default. The options are set before SetKeepAlive(true) so the
timer arms with the configured values rather than the defaults —
matches the order in ipnlocal/local.go for SSH keepalive.
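
To make the arithmetic above concrete, a small sketch that computes the dead-peer detection time from the two timers and reads overrides from the environment. The environment variable names are placeholders and not the real envknob names; parsing with time.ParseDuration is also an assumption:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// deadPeerDetection is the idle time plus one interval per unanswered
// probe.
func deadPeerDetection(idle, interval time.Duration, probes int) time.Duration {
	return idle + time.Duration(probes)*interval
}

// durationFromEnv returns the named variable parsed as a duration, or def
// when it is unset or invalid, so that unset preserves the default.
func durationFromEnv(name string, def time.Duration) time.Duration {
	v := os.Getenv(name)
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil || d <= 0 {
		return def
	}
	return d
}

func main() {
	// Placeholder variable names; each knob falls back independently.
	idle := durationFromEnv("EXAMPLE_KEEPALIVE_IDLE", 7200*time.Second)
	interval := durationFromEnv("EXAMPLE_KEEPALIVE_INTERVAL", 75*time.Second)
	fmt.Println(deadPeerDetection(idle, interval, 9))
}
```

With the defaults this comes to 7200s + 9×75s = 7875s, or 2h11m15s. With, say, a 60s idle time and a 10s interval it drops to 2m30s.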
Updates #4522
Signed-off-by: Josef Bacik <josefbacik@anthropic.com>
After #18179 switched to L4 TCPForward, EnsureCertLoops found no
domains since it only checked service.Web entries. Certs were never
provisioned, leaving kube-apiserver ProxyGroups stuck at 0/N ready.
Fixes #19019
Signed-off-by: Raj Singh <raj@tailscale.com>
This pulls in commits related to on-demand configuration of peers.
These commits introduce new API surfaces that are currently unused.
Updates tailscale/tailscale#17858
Updates tailscale/corp#35603
Signed-off-by: Jordan Whited <jordan@tailscale.com>
When we are mapping a DNS response, if it is for a connector domain, swap
the source IP addresses for our magic IP addresses. This will allow
tailscaled to DNAT the traffic for the domain to the connector.
Updates tailscale/corp#34258
Signed-off-by: Fran Bull <fran@tailscale.com>
Before sending a fix for #18991, this adds an integration test that
locks in that the proxymap WhoIs code works with two nodes running as
different users, with the second node running a localhost service and
able to use its local tailscaled to identify a Tailscale connection
from the other tailscaled.
Updates #18991
Change-Id: I6fbb0810204d77d2ac558f0cc786b73e3248d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For the purpose of improved observability of UDP socket receive buffer
overflows on Linux.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Changed the mapping to store the transit IPs to be indexed by
peer IP rather than NodeID because the data path only has access
to the peer's IP. This change means that IPv4 transit IPs need to
be indexed by the peer's IPv4 address, and IPv6 transit IPs need to
be indexed by the peer's IPv6 address. It is an error if the peer
does not have an address of the same family as the transit IP.
It is also an error if the transit and destination IP families do
not match.
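
The family rules above, sketched with net/netip. The function, helper, and error names are hypothetical, and the real handler reports these cases via TransitIPResponse codes rather than Go errors:

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

var (
	errFamilyMismatch = errors.New("transit and destination IP families differ")
	errNoPeerAddr     = errors.New("peer has no address in the transit IP's family")
)

// indexKey picks the peer address under which a transit IP is stored:
// the peer's address of the same family as the transit IP.
func indexKey(peerAddrs []netip.Addr, transit, dest netip.Addr) (netip.Addr, error) {
	if transit.Is4() != dest.Is4() {
		return netip.Addr{}, errFamilyMismatch
	}
	for _, a := range peerAddrs {
		if a.Is4() == transit.Is4() {
			return a, nil
		}
	}
	return netip.Addr{}, errNoPeerAddr
}

// addrs parses a list of IP strings; it panics on bad input and exists
// only to keep the example short.
func addrs(ss ...string) []netip.Addr {
	out := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParseAddr(s))
	}
	return out
}

func main() {
	peer := addrs("100.64.0.5", "fd7a:115c:a1e0::5")
	ips := addrs("192.0.2.10", "198.51.100.8")
	fmt.Println(indexKey(peer, ips[0], ips[1]))
}
```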
Added a check to ensure that the TransitIPRequest.App matches a
configured app on the connector.
Added additional TransitIPResponse codes to identify the new errors
and changed the existing use of the Other code to use its own
specific code.
Added logging for the error cases, since they generally indicate that
a peer has constructed a bad request or that there is a config
mismatch between the peer and the local netmap.
Added a test framework for handleConnectorTransitIPRequest and moved
the existing tests into the framework and added new tests.
Fixes tailscale/corp#37143
Signed-off-by: George Jones <george@tailscale.com>
The e2e ingress test was very occasionally flaky. On looking at operator
logs from one failure, you can see the default ProxyClass was not ready
before the first reconcile loop for the exposed Service. The ProxyClass
became ready soon after, but no additional reconciles were triggered for
the exposed Service because we only triggered reconciles for Services
that explicitly named their ProxyClass.
This change adds additional list API calls for when it's the default
ProxyClass that's been updated in order to catch Services that use it by
default. It also adds indexes for the fields we need to search on to
ensure the list is efficient.
Fixes tailscale/corp#37533
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>