120bfcf1cc
Two changes that share the same intent of reducing per-T duplication
in code that doesn't actually depend on T:
1. Hoist the non-generic portion of newSubscriberFunc[T] into a
newSubscriberFuncCore() helper. The hoisted work is the time
timer setup, the subscriberFuncCore allocation, and the
unregister closure (which captures only the non-generic
reflect.Type and *subscribeState). The generic body now does
only the two T-bound things it has to: compute reflect.TypeFor[T]
and create the dispatch closure.
Effect on the per-shape-stencil body of newSubscriberFunc[T]:
before: 523 B per shape (in synthetic test)
after: 293 B per shape (-230 B per shape; -56% on this body)
2. Cache reflect.Type.String() once at construction (in core.typeName)
instead of recomputing it every time the dispatch closure runs.
The dispatch closure also now takes the *subscriberFuncCore directly
rather than building an intermediate dispatchFuncState struct on
every call.
Effect on the dispatch closure body (newSubscriberFunc[T].func1):
before: 581 B per shape
after: 480 B per shape (-101 B per shape; -17%)
Combined effect on tailscaled (linux/amd64):
named-symbol savings via symcost: ~7 KB
stripped binary delta: -8 KB (page-quantized)
arm64 binary delta: 0 (page-quantized)
cumulative reduction from baseline (5167ff412):
linux/amd64: -110,592 bytes (-0.391%)
linux/arm64: -131,072 bytes (-0.499%)
Throughput is also improved by the typeName cache: BenchmarkBasic
goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot
path no longer allocates a string on every event.
Updates #12614
Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3
Signed-off-by: James Tucker <james@tailscale.com>