util/eventbus: extract SubscriberFunc.dispatch loop to a non-generic helper

The (*SubscriberFunc[T]).dispatch method body — a ~40-line select
loop with slow-subscriber timer, snapshot handling, ctx-cancel
draining, and a CI stack-dump branch — was previously fully
duplicated by the Go compiler for every distinct GC shape of T.
None of that body actually depends on T except for the type
assertion and the user callback invocation.

This change moves the loop body into a non-generic dispatchFunc()
helper, leaving (*SubscriberFunc[T]).dispatch as a tiny wrapper
that:

  - performs the vals.Peek().Event.(T) type assertion
  - spawns the callback goroutine via `go runFuncCallback(s.read,
    t, callDone)` — a regular generic function call, not a closure,
    so that `go` binds the args to the goroutine's frame instead of
    allocating a closure on the heap. This preserves the
    zero-extra-allocation behavior of the original
    (*SubscriberFunc[T]).runCallback method.
  - resolves T's name via reflect.TypeFor[T]().String() (cached on
    the stack rather than recomputed on each %T formatting)
  - calls dispatchFunc with the callDone channel

The %T formatting in the original logf calls is replaced with %s
on the resolved name string, removing per-T fmt instantiations.

A new BenchmarkBasicFuncThroughput is added alongside the existing
BenchmarkBasicThroughput so per-event allocation behavior on the
SubscribeFunc dispatch path is covered by the benchmark suite.

Measured impact (util/eventbus/sizetest):

  SubscriberFunc per-flow attribution:
    linux/amd64:  912.5 B/flow -> 840.8 B/flow  (-71.7 B/flow)
    linux/arm64:  917.5 B/flow -> 849.9 B/flow  (-67.6 B/flow)

The total per-flow size delta on amd64 dropped from 3,096.6 B to
3,039.2 B (-57 B/flow). The arm64 total stayed at 3,145.7 B
because the linker's page-aligned section sizing absorbed the
improvement on this binary; the symcost-attributed per-receiver
number is the real signal.

Behavior is unchanged: BenchmarkBasicThroughput stays at 0
allocs/op and BenchmarkBasicFuncThroughput holds at the same 2
allocs/op, 144 B/op as the prior eventbus implementation. All
eventbus tests pass.

Updates #12614

Change-Id: I85f933f50f58cd25bbfe5cc46bdda7aab22f0bf7
Signed-off-by: James Tucker <james@tailscale.com>
This commit is contained in:
James Tucker
2026-05-04 21:01:15 +00:00
committed by James Tucker
parent 87a74c3aa2
commit 0def0f19bd
2 changed files with 91 additions and 16 deletions
+21
View File
@@ -39,6 +39,27 @@ func BenchmarkBasicThroughput(b *testing.B) {
bus.Close()
}
// BenchmarkBasicFuncThroughput is the SubscribeFunc analogue of
// BenchmarkBasicThroughput: one publisher and one SubscribeFunc
// callback, shoveling events as fast as they can through the
// plumbing. Useful for tracking per-event allocation behavior on the
// SubscribeFunc dispatch path.
func BenchmarkBasicFuncThroughput(b *testing.B) {
bus := eventbus.New()
pcli := bus.Client(b.Name() + "-pub")
scli := bus.Client(b.Name() + "-sub")
type emptyEvent [0]byte
pub := eventbus.Publish[emptyEvent](pcli)
eventbus.SubscribeFunc(scli, func(emptyEvent) {})
for b.Loop() {
pub.Publish(emptyEvent{})
}
bus.Close()
}
func BenchmarkSubsThroughput(b *testing.B) {
bus := eventbus.New()
pcli := bus.Client(b.Name() + "-pub")