derp,types,util: use bufio Peek+Discard for allocation-free fast reads (#19067)

Replace byte-at-a-time ReadByte loops with Peek+Discard in the DERP
read path. Peek returns a slice into bufio's internal buffer without
allocating, and Discard advances the read pointer without copying.

Introduce util/bufiox with a BufferedReader interface and ReadFull
helper that uses Peek+copy+Discard as an allocation-free alternative
to io.ReadFull.
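
For illustration, the Peek+copy+Discard pattern can be sketched as
below. This is a minimal sketch, not the actual util/bufiox code: the
names peekReadFull and readPrefix are hypothetical, and it takes a
concrete *bufio.Reader rather than the BufferedReader interface. It
reads in chunks no larger than the reader's buffer so that Peek never
reports bufio.ErrBufferFull, and it mirrors io.ReadFull's EOF behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// peekReadFull fills buf from br using only Peek, copy, and Discard.
// Hypothetical sketch; the real bufiox.ReadFull may differ.
func peekReadFull(br *bufio.Reader, buf []byte) (int, error) {
	total := 0
	for total < len(buf) {
		// Never ask Peek for more than the internal buffer can hold.
		want := len(buf) - total
		if size := br.Size(); want > size {
			want = size
		}
		// p aliases bufio's internal buffer, so copy before discarding.
		p, err := br.Peek(want)
		n := copy(buf[total:], p)
		if _, derr := br.Discard(n); derr != nil {
			return total, derr
		}
		total += n
		if err != nil {
			// Match io.ReadFull: a partial read reports ErrUnexpectedEOF.
			if err == io.EOF && total > 0 {
				err = io.ErrUnexpectedEOF
			}
			return total, err
		}
	}
	return total, nil
}

// readPrefix is a demo helper that reads n bytes of s through a
// bufio.Reader with a 16-byte buffer, forcing multiple chunks.
func readPrefix(s string, n int) (string, error) {
	br := bufio.NewReaderSize(strings.NewReader(s), 16)
	buf := make([]byte, n)
	got, err := peekReadFull(br, buf)
	return string(buf[:got]), err
}

func main() {
	out, err := readPrefix("0123456789abcdefghijklmnopqrstuvwxyz", 20)
	fmt.Println(out, err)
}
```

The destination slice is supplied by the caller, so the helper itself
has nothing to allocate; whether the caller's buffer stays off the heap
still depends on escape analysis at the call site.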

  - derp.ReadFrameHeader: replace 5× ReadByte with Peek(5)+Discard(5),
    reading the frame type and length directly from the peeked slice.
    Remove now-unused readUint32 helper.

    name                  old ns/op  new ns/op  speedup
    ReadFrameHeader-8     24.2       12.4       ~2x
    (0 allocs/op in both)

  - key.NodePublic.ReadRawWithoutAllocating: replace 32× ReadByte with
    bufiox.ReadFull. Addresses the "Dear future" comment about switching
    away from byte-at-a-time reads once a non-escaping alternative exists.

    name                              old ns/op  new ns/op  speedup
    NodeReadRawWithoutAllocating-8    140        43.6       ~3.2x
    (0 allocs/op in both)

  - derpserver.handleFramePing: replace io.ReadFull with bufiox.ReadFull.
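
The Peek(5)+Discard(5) header read described above can be sketched as
follows. This is an illustrative sketch rather than the code in this
commit: it assumes the header is a 1-byte frame type followed by a
4-byte big-endian length, and the names readFrameHeader, headerOf, and
frameHeaderLen are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
)

// frameHeaderLen assumes 1 byte of frame type plus 4 bytes of length.
const frameHeaderLen = 5

// readFrameHeader decodes the header from the peeked slice, then
// advances the reader. Nothing is consumed if the header is incomplete.
func readFrameHeader(br *bufio.Reader) (typ byte, length uint32, err error) {
	hdr, err := br.Peek(frameHeaderLen)
	if err != nil {
		return 0, 0, err
	}
	// hdr is only valid until the next read, so extract values first.
	typ = hdr[0]
	length = binary.BigEndian.Uint32(hdr[1:])
	_, err = br.Discard(frameHeaderLen)
	return typ, length, err
}

// headerOf is a demo helper that parses a header from raw bytes.
func headerOf(raw []byte) (byte, uint32, error) {
	return readFrameHeader(bufio.NewReader(bytes.NewReader(raw)))
}

func main() {
	typ, n, err := headerOf([]byte{0x04, 0x00, 0x00, 0x01, 0x02, 0xff})
	fmt.Println(typ, n, err)
}
```

A single Peek replaces five bounds-checked ReadByte calls, which is
consistent with the roughly 2x improvement reported for
ReadFrameHeader.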

Updates tailscale/corp#38509

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Mike O'Driscoll
2026-03-24 10:52:20 -04:00
committed by GitHub
parent 1d0fde6fc2
commit 1403920367
17 changed files with 231 additions and 47 deletions
+4 -17
@@ -15,6 +15,7 @@ import (
 	"golang.org/x/crypto/curve25519"
 	"golang.org/x/crypto/nacl/box"
 	"tailscale.com/types/structs"
+	"tailscale.com/util/bufiox"
 )
 const (
@@ -242,28 +243,14 @@ func (k NodePublic) AppendTo(buf []byte) []byte {
 }
 // ReadRawWithoutAllocating initializes k with bytes read from br.
-// The reading is done ~4x slower than io.ReadFull, but in exchange is
-// allocation-free.
+// It uses [bufiox.ReadFull] to read without heap allocations.
 func (k *NodePublic) ReadRawWithoutAllocating(br *bufio.Reader) error {
 	var z NodePublic
 	if *k != z {
 		return errors.New("refusing to read into non-zero NodePublic")
 	}
-	// This is ~4x slower than io.ReadFull, but using io.ReadFull
-	// causes one extra alloc, which is significant for the DERP
-	// server that consumes this method. So, process stuff slower but
-	// without allocation.
-	//
-	// Dear future: if io.ReadFull stops causing stuff to escape, you
-	// should switch back to that.
-	for i := range k.k {
-		b, err := br.ReadByte()
-		if err != nil {
-			return err
-		}
-		k.k[i] = b
-	}
-	return nil
+	_, err := bufiox.ReadFull(br, k.k[:])
+	return err
 }
 // WriteRawWithoutAllocating writes out k as 32 bytes to bw.