Add server-side per-client bandwidth enforcement using TCP backpressure.
When configured, the server calls WaitN after reading each DERP frame,
which delays the next read, fills the TCP receive buffer, shrinks
the TCP window, and naturally throttles the sender — no packets are dropped.
- Rate limiting is on the receive (inbound) side, which is what an abusive
client controls
- Mesh peers are exempt since they are trusted infrastructure
- The burst size is at least MaxPacketSize (64KB) to ensure a
single max-size frame can always be processed
Also refactors sclient to store a context.Context directly instead of a
done channel, which simplifies the rate limiter's WaitN call.
Flags added to cmd/derper:
--per-client-rate-limit (bytes/sec, default 0 = unlimited)
--per-client-rate-burst (bytes, default 0 = 2x rate limit)
Example for 10Mbps: --per-client-rate-limit=1250000
Updates #38509
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Replace byte-at-a-time ReadByte loops with Peek+Discard in the DERP
read path. Peek returns a slice into bufio's internal buffer without
allocating, and Discard advances the read pointer without copying.
Introduce util/bufiox with a BufferedReader interface and ReadFull
helper that uses Peek+copy+Discard as an allocation-free alternative
to io.ReadFull.
- derp.ReadFrameHeader: replace 5× ReadByte with Peek(5)+Discard(5),
reading the frame type and length directly from the peeked slice.
Remove now-unused readUint32 helper.
name old ns/op new ns/op speedup
ReadFrameHeader-8 24.2 12.4 ~2x
(0 allocs/op in both)
- key.NodePublic.ReadRawWithoutAllocating: replace 32× ReadByte with
bufiox.ReadFull. Addresses the "Dear future" comment about switching
away from byte-at-a-time reads once a non-escaping alternative exists.
name old ns/op new ns/op speedup
NodeReadRawWithoutAllocating-8 140 43.6 ~3.2x
(0 allocs/op in both)
- derpserver.handleFramePing: replace io.ReadFull with bufiox.ReadFull.
Updates tailscale/corp#38509
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignemnt for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
This also become cumbersome to simply keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the proejct maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Adds an observation point that may identify potentially abusive traffic
patterns at outlier values.
Updates tailscale/corp#24681
Signed-off-by: James Tucker <james@tailscale.com>
PR #17258 extracted `derp.Server` into `derp/derpserver.Server`.
This followup patch adds the following cleanups:
1. Rename `derp_server*.go` files to `derpserver*.go` to match
the package name.
2. Rename the `derpserver.NewServer` constructor to `derpserver.New`
to reduce stuttering.
3. Remove the unnecessary `derpserver.Conn` type alias.
Updates #17257
Updates #cleanup
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This exports a number of things from the derp (generic + client) package
to be used by the new derpserver package, as now used by cmd/derper.
And then enough other misc changes to lock in that cmd/tailscaled can
be configured to not bring in tailscale.com/client/local. (The webclient
in particular, even when disabled, was bringing it in, so that's now fixed)
Fixes#17257
Change-Id: I88b6c7958643fb54f386dd900bddf73d2d4d96d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/tailscale/cli: use client/local instead of deprecated client/tailscale
Updates tailscale/corp#22748
Signed-off-by: Alex Chan <alexc@tailscale.com>
* derp: use client/local instead of deprecated client/tailscale
Updates tailscale/corp#22748
Signed-off-by: Alex Chan <alexc@tailscale.com>
---------
Signed-off-by: Alex Chan <alexc@tailscale.com>
During a short period of packet loss, a TCP connection to the home DERP
may be maintained. If no other regions emerge as winners, such as when
all regions but one are avoided/disallowed as candidates, ensure that
the current home region, if still active, is not dropped as the
preferred region until it has failed two keepalives.
Relatedly apply avoid and no measure no home to ICMP and HTTP checks as
intended.
Updates tailscale/corp#12894
Updates tailscale/corp#29491
Signed-off-by: James Tucker <james@tailscale.com>
Add mesh key support to derpprobe for
probing derpers with verify set to true.
Move MeshKey checking to central point for code reuse.
Fix a bad error fmt msg.
Fixestailscale/corp#27294Fixestailscale/corp#25756
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
To authenticate mesh keys, the DERP servers used a simple == comparison,
which is susceptible to a side channel timing attack.
By extracting the mesh key for a DERP server, an attacker could DoS it
by forcing disconnects using derp.Client.ClosePeer. They could also
enumerate the public Wireguard keys, IP addresses and ports for nodes
connected to that DERP server.
DERP servers configured without mesh keys deny all such requests.
This patch also extracts the mesh key logic into key.DERPMesh, to
prevent this from happening again.
Security bulletin: https://tailscale.com/security-bulletins#ts-2025-003Fixestailscale/corp#28720
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This fixes the implementation and test from #15208 which apparently
never worked.
Ignore the metacert when counting the number of expected certs
presented.
And fix the test, pulling out the TLSConfig setup code into something
shared between the real cmd/derper and the test.
Fixes#15579
Change-Id: I90526e38e59f89b480629b415f00587b107de10a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was moved in f57fa3cbc3.
Updates tailscale/corp#22748
Change-Id: I19f965e6bded1d4c919310aa5b864f2de0cd6220
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The timeout still defaults to 2 seconds, but can now be changed via command-line flag.
Updates tailscale/corp#26045
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This interface is used both by the DERP client as well as the server.
Defining the interface in derp.go makes it clear that it is shared.
Updates tailscale/corp#26045
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Metrics currently exist for dropped packets by reason, and total
received packets by kind (e.g., `disco` or `other`), but relating these
two together to gleam information about the drop rate for specific
reasons on a per-kind basis is not currently possible.
Change `derp_packets_dropped` to use a `metrics.MultiLabelMap` to
track both the `reason` and `kind` in the same metric to allow for this
desired level of granularity.
Drop metrics that this makes unnecessary (namely `packetsDroppedReason`
and `packetsDroppedType`).
Updates https://github.com/tailscale/corp/issues/25489
Signed-off-by: Mario Minardi <mario@tailscale.com>
Use envknob to configure the per client send
queue depth for the derp server.
Fixestailscale/corp#24978
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
In f77821fd63 (released in v1.72.0), we made the client tell a DERP server
when the connection was not its ideal choice (the first node in its region).
But we didn't do anything with that information until now. This adds a
metric about how many such connections are on a given derper, and also
adds a bit to the PeerPresentFlags bitmask so watchers can identify
(and rebalance) them.
Updates tailscale/corp#372
Change-Id: Ief8af448750aa6d598e5939a57c062f4e55962be
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Write timeouts can be indicative of stalled TCP streams. Understanding
changes in the rate of such events can be helpful in an ops context.
Updates tailscale/corp#23668
Signed-off-by: Jordan Whited <jordan@tailscale.com>
In prep for upcoming flow tracking & mutex contention optimization
changes, this change refactors (subjectively simplifying) how the DERP
Server accounts for which peers have written to which other peers, to
be able to send PeerGoneReasonDisconnected messages to writes to
uncache their DRPO (DERP Return Path Optimization) routes.
Notably, this removes the Server.sentTo field which was guarded by
Server.mu and checked on all packet sends. Instead, the accounting is
moved to each sclient's sendLoop goroutine and now only needs to
acquire Server.mu for newly seen senders, the first time a peer sends
a packet to that sclient.
This change reduces the number of reasons to acquire Server.mu
per-packet from two to one. Removing the last one is the subject of an
upcoming change.
Updates #3560
Updates #150
Change-Id: Id226216d6629d61254b6bfd532887534ac38586c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And update a few callers as examples of motivation. (there are a
couple others, but these are the ones where it's prettier)
Updates #cleanup
Change-Id: Ic8c5cb7af0a59c6e790a599136b591ebe16d38eb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
73280595a8 for #2751 added a "clientSet" interface to
distinguish the two cases of a client being singly connected (the
common case) vs tolerating multiple connections from the client at
once. At the time (three years ago) it was kinda an experiment
and we didn't know whether it'd stop the reconnect floods we saw
from certain clients. It did.
So this promotes it to a be first-class thing a bit, removing the
interface. The old tests from 73280595a were invaluable in ensuring
correctness while writing this change (they failed a bunch).
But the real motivation for this change is that it'll permit a future
optimization to add flow tracking for stats & performance where we
don't contend on Server.mu for each packet sent via DERP. Instead,
each client can track its active flows and hold on to a *clientSet and
ask the clientSet per packet what the active client is via one atomic
load rather than a mutex. And if the atomic load returns nil, we'll
know we need to ask the server to see if they died and reconnected and
got a new clientSet. But that's all coming later.
Updates #3560
Change-Id: I9ccda3e5381226563b5ec171ceeacf5c210e1faf
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I couldn't convince myself the old way was safe and couldn't lose
writes.
And it seemed too complicated.
Updates tailscale/corp#21104
Change-Id: I17ba7c7d6fd83458a311ac671146a1f6a458a5c1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
sendMeshUpdates tries to write as much as possible without blocking,
being careful to check the bufio.Writer.Available size before writes.
Except that regressed in 6c791f7d60 which made those messages larger, which
meants we were doing network I/O with the Server mutex held.
Updates tailscale/corp#13945
Change-Id: Ic327071d2e37de262931b9b390cae32084811919
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was hex-ifying the String() form of key.NodePublic, which was already hex.
I noticed in some logs:
"client 6e6f64656b65793a353537353..."
And thought that 6x6x6x6x looked strange. It's "nodekey:" in hex.
Updates tailscale/corp#20844
Change-Id: Ib9f2d63b37e324420b86efaa680668a9b807e465
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The derp metrics got out of sync in 74eb99aed1 (2023-03).
They were fixed in 0380cbc90d (2024-05).
This adds some further guardrails (atop the previous fix) to make sure
they don't get out of sync again.
Updates #12288
Change-Id: I809061a81f8ff92f45054d0253bc13871fc71634
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Without changing behaviour, don't create a goroutine per connection that
sits and sleeps, but rather use a timer that wakes up and gathers
statistics on a regular basis.
Fixes#12127
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ibc486447e403070bdc3c2cd8ae340e7d02854f21
It's deprecated and using it gets us the old slow behavior
according to https://go.dev/blog/randv2.
> Having eliminated repeatability of the global output stream, Go 1.20
> was also able to make the global generator scale better in programs
> that don’t call rand.Seed, replacing the Go 1 generator with a very
> cheap per-thread wyrand generator already used inside the Go
> runtime. This removed the global mutex and made the top-level
> functions scale much better. Programs that do call rand.Seed fall
> back to the mutex-protected Go 1 generator.
Updates #7123
Change-Id: Ia5452e66bd16b5457d4b1c290a59294545e13291
Signed-off-by: Maisem Ali <maisem@tailscale.com>
So derpers can check an external URL for whether to permit access
to a certain public key.
Updates tailscale/corp#17693
Change-Id: I8594de58f54a08be3e2dbef8bcd1ff9b728ab297
Co-authored-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
See the field alignment lints for more information.
Reductions are 64->24 and 64->32 respectively.
Updates #self
Signed-off-by: James Tucker <james@tailscale.com>
Observed on one busy derp node, there were 600 goroutines blocked
writing to this channel, which represents not only more blocked routines
than we need, but also excess wake-ups downstream as the latent
goroutines writes represent no new work.
Updates #self
Signed-off-by: James Tucker <james@tailscale.com>