The previous algorithm used a map of all visited pointers.
The strength of this approach is that it quickly prunes any nodes
that we have ever visited before. The detriment of the approach
is that pruning is heavily dependent on the order that pointers
were visited. This is especially relevant for hashing a map
where map entries are visited in a non-deterministic manner,
which would cause the map hash to be non-deterministic
(which defeats the point of a hash).
This new algorithm uses a stack of all visited pointers,
similar to how github.com/google/go-cmp performs cycle detection.
When we visit a pointer, we push it onto the stack, and when
we leave a pointer, we pop it from the stack.
Before visiting a pointer, we first check whether the pointer exists
anywhere in the stack. If yes, then we prune the node.
The detriment of this approach is that we may hash a node more often
than before since we do not prune as aggressively.
The set of visited pointers up until any node is only the
path of nodes up to that node and not any other pointers
that may have been visited elsewhere. This provides us
deterministic hashing regardless of visit order.
We can now delete hashMapFallback and associated complexity,
which only exists because the previous approach was non-deterministic
in the presence of cycles.
This fixes a failure of the old algorithm where obviously different
values are treated as equal because the pruning was too aggresive.
See https://github.com/tailscale/tailscale/issues/2443#issuecomment-883653534
The new algorithm is slightly slower since it prunes less aggresively:
name old time/op new time/op delta
Hash-8 66.1µs ± 1% 68.8µs ± 1% +4.09% (p=0.000 n=19+19)
HashMapAcyclic-8 63.0µs ± 1% 62.5µs ± 1% -0.76% (p=0.000 n=18+19)
TailcfgNode-8 9.79µs ± 2% 9.88µs ± 1% +0.95% (p=0.000 n=19+17)
HashArray-8 643ns ± 1% 653ns ± 1% +1.64% (p=0.000 n=19+19)
However, a slower but more correct algorithm seems
more favorable than a faster but incorrect algorithm.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
A Go interface may hold any number of different concrete types.
Just because two underlying values hash to the same thing
does not mean the two values are identical if they have different
concrete types. As such, include the type in the hash.
Seed the hash upon first use with the current time.
This ensures that the stability of the hash is bounded within
the lifetime of one program execution.
Hopefully, this prevents future bugs where someone assumes that
this hash is stable.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.
Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Running hex.Encode(b, b) is a bad idea.
The first byte of input will overwrite the first two bytes of output.
Subsequent bytes have no impact on the output.
Not related to today's IPv6 bug, but...wh::ps.
This caused us to spuriously ignore some wireguard config updates.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
e66d4e4c81 added AppendTo methods
to some key types. Their marshaled form is longer than 64 bytes.
name old time/op new time/op delta
Hash-8 15.5µs ± 1% 14.8µs ± 1% -4.17% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
Hash-8 1.18kB ± 0% 0.47kB ± 0% -59.87% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 12.0 ± 0% 6.0 ± 0% -50.00% (p=0.000 n=10+10)
This is still a bit worse than explicitly handling the types,
but much nicer.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
All netaddr types that we are concerned with now implement AppendTo.
Use the AppendTo method if available, and remove all references to netaddr.
This is slower but cleaner, and more readily re-usable by others.
name old time/op new time/op delta
Hash-8 12.6µs ± 0% 14.8µs ± 1% +18.05% (p=0.000 n=8+10)
HashMapAcyclic-8 21.4µs ± 1% 21.9µs ± 1% +2.39% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
Hash-8 408B ± 0% 408B ± 0% ~ (p=1.000 n=10+10)
HashMapAcyclic-8 1.00B ± 0% 1.00B ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
Hash-8 6.00 ± 0% 6.00 ± 0% ~ (all equal)
HashMapAcyclic-8 0.00 0.00 ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Reduce to just a single external endpoint.
Convert from a variadic number of interfaces to a slice there.
name old time/op new time/op delta
Hash-8 14.4µs ± 0% 14.0µs ± 1% -3.08% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
Hash-8 873B ± 0% 793B ± 0% -9.16% (p=0.000 n=9+6)
name old allocs/op new allocs/op delta
Hash-8 18.0 ± 0% 14.0 ± 0% -22.22% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Slightly slower, but lots less garbage.
We will recover the speed lost in a follow-up commit.
name old time/op new time/op delta
Hash-8 13.5µs ± 1% 14.3µs ± 0% +5.84% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
Hash-8 1.46kB ± 0% 0.87kB ± 0% -40.10% (p=0.000 n=7+10)
name old allocs/op new allocs/op delta
Hash-8 43.0 ± 0% 18.0 ± 0% -58.14% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This requires changes to the Go toolchain.
The changes are upstream at https://golang.org/cl/320929.
They haven't been pulled into our fork yet.
No need to allocate new iteration scratch values for every map.
name old time/op new time/op delta
Hash-8 13.6µs ± 0% 13.5µs ± 0% -1.01% (p=0.008 n=5+5)
HashMapAcyclic-8 21.2µs ± 1% 21.1µs ± 2% ~ (p=0.310 n=5+5)
name old alloc/op new alloc/op delta
Hash-8 1.58kB ± 0% 1.46kB ± 0% -7.60% (p=0.008 n=5+5)
HashMapAcyclic-8 152B ± 0% 128B ± 0% -15.79% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Hash-8 49.0 ± 0% 43.0 ± 0% -12.24% (p=0.008 n=5+5)
HashMapAcyclic-8 4.00 ± 0% 2.00 ± 0% -50.00% (p=0.008 n=5+5)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
To get the benefit of this optimization requires help from the Go toolchain.
The changes are upstream at https://golang.org/cl/320929,
and have been pulled into the Tailscale fork at
728ecc58fd.
It also requires building with the build tag tailscale_go.
name old time/op new time/op delta
Hash-8 14.0µs ± 0% 13.6µs ± 0% -2.88% (p=0.008 n=5+5)
HashMapAcyclic-8 24.3µs ± 1% 21.2µs ± 1% -12.47% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Hash-8 2.16kB ± 0% 1.58kB ± 0% -27.01% (p=0.008 n=5+5)
HashMapAcyclic-8 2.53kB ± 0% 0.15kB ± 0% -93.99% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Hash-8 77.0 ± 0% 49.0 ± 0% -36.36% (p=0.008 n=5+5)
HashMapAcyclic-8 202 ± 0% 4 ± 0% -98.02% (p=0.008 n=5+5)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
setkey
The acyclic map code interacts badly with netaddr.IPs.
One of the netaddr.IP fields is an *intern.Value,
and we use a few sentinel values.
Those sentinel values make many of the netaddr data structures appear cyclic.
One option would be to replace the cycle-detection code with
a Floyd-Warshall style algorithm. The downside is that this will take
longer to detect cycles, particularly if the cycle is long.
This problem is exacerbated by the fact that the acyclic cycle detection
code shares a single visited map for the entire data structure,
not just the subsection of the data structure localized to the map.
Unfortunately, the extra allocations and work (and code) to use per-map
visited maps make this option not viable.
Instead, continue to special-case netaddr data types.
name old time/op new time/op delta
Hash-8 22.4µs ± 0% 14.0µs ± 0% -37.59% (p=0.008 n=5+5)
HashMapAcyclic-8 23.8µs ± 0% 24.3µs ± 1% +1.75% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Hash-8 2.49kB ± 0% 2.16kB ± 0% ~ (p=0.079 n=4+5)
HashMapAcyclic-8 2.53kB ± 0% 2.53kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
Hash-8 86.0 ± 0% 77.0 ± 0% -10.47% (p=0.008 n=5+5)
HashMapAcyclic-8 202 ± 0% 202 ± 0% ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Hash and xor each entry instead, then write final xor'ed result.
name old time/op new time/op delta
Hash-4 33.6µs ± 4% 34.6µs ± 3% +3.03% (p=0.013 n=10+9)
name old alloc/op new alloc/op delta
Hash-4 1.86kB ± 0% 1.77kB ± 0% -5.10% (p=0.000 n=10+9)
name old allocs/op new allocs/op delta
Hash-4 51.0 ± 0% 49.0 ± 0% -3.92% (p=0.000 n=10+10)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Yes, it printed, but that was an implementation detail for hashing.
And coming optimization will make it print even less.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Not that it matters, but we were missing a close parens.
It's cheap, so add it.
name old time/op new time/op delta
Hash-8 6.64µs ± 0% 6.67µs ± 1% +0.42% (p=0.008 n=9+10)
name old alloc/op new alloc/op delta
Hash-8 1.54kB ± 0% 1.54kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
Hash-8 37.0 ± 0% 37.0 ± 0% ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The struct field names don't change within a single run,
so they are irrelevant. Use the field index instead.
name old time/op new time/op delta
Hash-8 6.52µs ± 0% 6.64µs ± 0% +1.91% (p=0.000 n=6+9)
name old alloc/op new alloc/op delta
Hash-8 1.67kB ± 0% 1.54kB ± 0% -7.66% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 53.0 ± 0% 37.0 ± 0% -30.19% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
These show up a lot in our data structures.
name old time/op new time/op delta
Hash-8 11.5µs ± 1% 7.8µs ± 1% -32.17% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Hash-8 1.98kB ± 0% 1.67kB ± 0% -15.73% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 82.0 ± 0% 53.0 ± 0% -35.37% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The sha256 hash writer doesn't implement WriteString.
(See https://github.com/golang/go/issues/38776.)
As a consequence, we end up converting many strings to []byte.
Wrapping a bufio.Writer around the hash writer lets us
avoid these conversions by using WriteString.
Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe.
The reason is that the sha256 writer does internal buffering,
but doesn't do any when handed larger writers.
Using a bufio.Writer merely shifts the data copying from one buffer
to a different one.
Using a concrete type for Print and print cuts 10% off of the execution time.
name old time/op new time/op delta
Hash-8 15.3µs ± 0% 11.5µs ± 0% -24.84% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Hash-8 2.82kB ± 0% 1.98kB ± 0% -29.57% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Hash-8 140 ± 0% 82 ± 0% -41.43% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The new deepprint package just walks a Go data structure and writes to
an io.Writer. It's not pretty like go-spew, etc.
We then use it to replace the use of UAPI (which we have a TODO to
remove) to generate signatures of data structures to detect whether
anything changed (without retaining the old copy).
This was necessary because the UAPI conversion ends up trying to do
DNS lookups which an upcoming change depends on not happening.