cmd/{containerboot,k8s-operator}: reissue auth keys for broken proxies (#16450)

Adds logic for containerboot to signal that it can't auth, so the
operator can reissue a new auth key. This only applies when running with
a config file and with a kube state store.

If the operator sees reissue_authkey in a state Secret, it will create a
new auth key iff the config has no auth key or its auth key matches the
value of reissue_authkey from the state Secret. This is to ensure we
don't reissue auth keys in a tight loop if the proxy is slow to start or
failing for some other reason. The reissue logic also uses a burstable
rate limiter to ensure there's no way a terminally misconfigured
or buggy operator can automatically generate new auth keys in a tight loop.

Additional implementation details (ChaosInTheCRD):

- Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that
  `n.Health` is populated when notify's are returned.
- on auth failure, containerboot:
  - Disconnects from control server
  - Sets reissue_authkey marker in state Secret with the failing key
  - Polls config file for new auth key (10 minute timeout)
  - Restarts after receiving new key to apply it

- modified operator's reissue logic slightly:
  - Deletes old device from tailnet before creating new key
  - Rate limiting: 1 key per 30s with initial burst equal to replica count
  - In-flight tracking (authKeyReissuing map) prevents duplicate API calls
    across reconcile loops

Updates #14080

Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f


(cherry picked from commit 969927c47c3d4de05e90f5b26a6d8d931c5ceed4)

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
This commit is contained in:
Tom Proctor
2026-03-11 10:25:57 +00:00
committed by GitHub
parent 7a43e41a27
commit 95a135ead1
11 changed files with 875 additions and 156 deletions
+8 -8
View File
@@ -38,17 +38,17 @@ const (
// Keys that containerboot writes to state file that can be used to determine its state.
// fields set in Tailscale state Secret. These are mostly used by the Tailscale Kubernetes operator to determine
// the state of this tailscale device.
KeyDeviceID string = "device_id" // node stable ID of the device
KeyDeviceFQDN string = "device_fqdn" // device's tailnet hostname
KeyDeviceIPs string = "device_ips" // device's tailnet IPs
KeyPodUID string = "pod_uid" // Pod UID
// KeyCapVer contains Tailscale capability version of this proxy instance.
KeyCapVer string = "tailscale_capver"
KeyDeviceID = "device_id" // node stable ID of the device
KeyDeviceFQDN = "device_fqdn" // device's tailnet hostname
KeyDeviceIPs = "device_ips" // device's tailnet IPs
KeyPodUID = "pod_uid" // Pod UID
KeyCapVer = "tailscale_capver" // tailcfg.CurrentCapabilityVersion of this proxy instance.
KeyReissueAuthkey = "reissue_authkey" // Proxies will set this to the authkey that failed, or "no-authkey", if they can't log in.
// KeyHTTPSEndpoint is a name of a field that can be set to the value of any HTTPS endpoint currently exposed by
// this device to the tailnet. This is used by the Kubernetes operator Ingress proxy to communicate to the operator
// that cluster workloads behind the Ingress can now be accessed via the given DNS name over HTTPS.
KeyHTTPSEndpoint string = "https_endpoint"
ValueNoHTTPS string = "no-https"
KeyHTTPSEndpoint = "https_endpoint"
ValueNoHTTPS = "no-https"
// Pod's IPv4 address header key as returned by containerboot health check endpoint.
PodIPv4Header string = "Pod-IPv4"