Files
tailscale/util/eventbus
James Tucker e7415e6393 util/eventbus: unify Subscriber/SubscriberFunc cores; structural symmetry
Brings Subscriber[T] in line with the same non-generic-core pattern already
applied to SubscriberFunc[T] and Publisher[T]:

  - Renames subscriberFuncCore to subscriberCore and shares it between
    Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a
    *subscriberCore plus their respective per-T delivery state
    (Subscriber: chan T; SubscriberFunc: nothing, the user callback is
    captured in the dispatch closure).

  - The bus's outputs map and subscriber-interface itab key on
    *subscriberCore for both subscriber kinds, so adding a new Subscribe[T]
    call site no longer pays a per-T itab, dictionary, or equality function
    for the subscriber-interface side.

  - Subscribe[T] now hoists the non-generic constructor portion into
    newSubscriberCore (timer setup, core allocation, cached type/typeName,
    unregister method-value), matching SubscribeFunc.

The dispatch loop is intentionally NOT extracted to a non-generic helper for
Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send
'case s.read <- t:' must appear lexically inside the select; the only way to
lift it into a non-generic loop is to bridge typed and untyped via a per-event
goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep
dispatchTyped on the generic facade and accept the per-shape stencil cost as
the cheaper alternative.

Symbol-level effect on tailscaled (linux/amd64, measured via
`go tool nm -size`):

  Before:
    (*Subscriber[T]).dispatch
      2 shape stencils:        1,682 + 1,549 = 3,231 B
      3 thin per-T wrappers:   124 B each   =   372 B
      2 deferwrap1 helpers:    62 B each    =   124 B
      total:                                 3,727 B

  After:
    (*Subscriber[T]).dispatchTyped
      2 shape stencils:        1,678 + 1,582 = 3,260 B
      0 per-T wrappers (replaced by closure stored on core)
      2 deferwrap1 helpers:    62 B each    =   124 B
      total:                                 3,384 B

  dispatch path .text delta:                   -343 B (-9.2%)

Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) =
~2,700 B each on production tailscaled. The shape count matches before/after
(two distinct GC shapes for the Subscriber[T] event types in this binary).
What changes is that the per-T thin wrappers are eliminated because
Subscriber[T] no longer implements the subscriber interface directly.

Whole-binary section deltas:

  .text:        -2,304 B  (includes the dispatch savings plus other
                            small downstream effects)
  .rodata:        +512 B  (additional closure-type metadata)
  .gopclntab:   -2,981 B  (fewer per-T compiled functions => less metadata)

Stripped tailscaled (linux/amd64): no change at the file level (the savings
fall below the linker's section-alignment boundary). Unstripped builds shrink
by ~2,900 B.

Behavior is unchanged:
  BenchmarkBasicThroughput:       2,161 ns/op,  0 B/op,  0 allocs/op
  BenchmarkBasicFuncThroughput:   2,493 ns/op, 144 B/op, 2 allocs/op
  BenchmarkSubsThroughput:        3,727 ns/op,  0 B/op,  0 allocs/op

Updates #12614

Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-13 17:36:30 -07:00
..