Adds logic for containerboot to signal that it cannot authenticate, so
the operator can issue a replacement auth key. This only applies when
running with a config file and with a kube state store.

If the operator sees reissue_authkey in a state Secret, it will create a
new auth key iff the config has no auth key or its auth key matches the
value of reissue_authkey from the state Secret. This is to ensure we
don't reissue auth keys in a tight loop if the proxy is slow to start or
failing for some other reason. The reissue logic also uses a burstable
rate limiter to ensure there's no way a terminally misconfigured
or buggy operator can automatically generate new auth keys in a tight loop.
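
The reissue condition described above can be sketched as a small predicate. This is an illustration only, not the operator's code: the function name `shouldReissue` and its parameters are hypothetical, and the key values are made up.

```go
package main

import "fmt"

// shouldReissue reports whether a new auth key should be created.
// cfgKey is the auth key currently in the proxy's config ("" if none).
// marker is the reissue_authkey value read from the state Secret
// ("" if the proxy has not requested a reissue).
// Both names are hypothetical and exist only for this sketch.
func shouldReissue(cfgKey, marker string) bool {
	if marker == "" {
		// The proxy has not signalled an auth failure.
		return false
	}
	// Reissue only if the config has no key, or still holds the key the
	// proxy reported as failing. A different key in the config means a
	// replacement was already written and the proxy has not picked it up
	// yet, so issuing another one would risk a tight loop.
	return cfgKey == "" || cfgKey == marker
}

func main() {
	fmt.Println(shouldReissue("", "tskey-old"))          // true
	fmt.Println(shouldReissue("tskey-old", "tskey-old")) // true
	fmt.Println(shouldReissue("tskey-new", "tskey-old")) // false
	fmt.Println(shouldReissue("tskey-old", ""))          // false
}
```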
Additional implementation details (ChaosInTheCRD):
- Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that
  `n.Health` is populated when notifications are returned.
- On auth failure, containerboot:
  - Disconnects from control server
  - Sets reissue_authkey marker in state Secret with the failing key
  - Polls config file for new auth key (10 minute timeout)
  - Restarts after receiving new key to apply it
- Modified operator's reissue logic slightly:
  - Deletes old device from tailnet before creating new key
  - Rate limiting: 1 key per 30s with initial burst equal to replica count
  - In-flight tracking (authKeyReissuing map) prevents duplicate API calls
    across reconcile loops
Updates #14080
Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f
(cherry picked from commit 969927c47c3d4de05e90f5b26a6d8d931c5ceed4)
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
return fmt.Errorf("failed to watch tailscaled for updates: %w", err)
}
@@ -365,8 +372,23 @@ authLoop:
if isOneStepConfig(cfg) {
// This could happen if this is the first time tailscaled was run for this
// device and the auth key was not passed via the configfile.
return fmt.Errorf("invalid state: tailscaled daemon started with a config file, but tailscale is not logged in: ensure you pass a valid auth key in the config file.")
if hasKubeStateStore(cfg) {
log.Printf("Auth key missing or invalid (NeedsLogin state), disconnecting from control and requesting new key from operator")
return fmt.Errorf("failed to get a reissued authkey: %w", err)
}
log.Printf("Successfully received new auth key, restarting to apply configuration")
// we don't return an error here since we have handled the reissue gracefully.
return nil
}
return errors.New("invalid state: tailscaled daemon started with a config file, but tailscale is not logged in: ensure you pass a valid auth key in the config file")
}
if err := authTailscale(); err != nil {
return fmt.Errorf("failed to auth tailscale: %w", err)
}
@@ -384,6 +406,27 @@ authLoop:
log.Printf("tailscaled in state %q, waiting", *n.State)
}
}
if n.Health != nil {
// This can happen if the config has an auth key but it's invalid,
// for example if it was single-use and already got used, but the
tsoperator.SetProxyGroupCondition(pg, tsapi.ProxyGroupReady, metav1.ConditionFalse, reasonProxyGroupCreating, "the ProxyGroup's ProxyClass \"default-pc\" is not yet in a ready state, waiting...", 1, cl, zl.Sugar())
expectEqual(t, fc, pg)
expectProxyGroupResources(t, fc, pg, false, pc)
if kube.ProxyGroupAvailable(pg) {
if tsoperator.ProxyGroupAvailable(pg) {
t.Fatal("expected ProxyGroup to not be available")