tstest/natlab, .github/workflows: add opt-in natlab CI workflow

The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat
tests are gated behind --run-vm-tests because they need KVM and are
slow. Until now nothing in CI exercised them apart from a single
canary TestEasyEasy run on every PR.

Add .github/workflows/natlab-test.yml that runs the full opt-in suite
on demand (workflow_dispatch), on PRs labeled "natlab", and on main
every 12 hours via cron. The workflow has two phases:

  - "prepare" builds the gokrazy VM image, downloads the Ubuntu and
    FreeBSD cloud images once via the new natlabprep tool, and emits
    a dynamic JSON matrix of every TestX function it finds in the two
    opt-in packages.
  - "test" is a per-test matrix that depends on prepare. Each matrix
    job restores the shared caches and runs a single test, so a newly
    added TestFoo is picked up automatically on the next run without
    any workflow edits.

Rename the existing natlab-integrationtest.yml to natlab-basic.yml
since it's the small smoke variant (just TestEasyEasy on every PR);
the new natlab-test.yml is the bigger suite. The job inside is
renamed to EasyEasy for the same reason.

Move the macOS arm64 host check from vmtest.Env.Start into
vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips
immediately on a non-macOS host, and add an explicit
skipIfNotMacOSArm64 helper at the top of the two macOS-only tests
so the platform requirement is obvious to readers.
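A minimal sketch of the idea, assuming a simplified environment type: the host check runs as soon as a macOS node is requested, so the test skips before any VM setup. The skipIfNotMacOSArm64 name is from this commit; its body, plus Env, Node, OS, hostIsMacOSArm64, and the skipper interface, are hypothetical stand-ins, not vmtest's real definitions.

```go
package main

import "runtime"

// skipper is the subset of testing.TB this sketch needs; *testing.T satisfies it.
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// OS is a hypothetical stand-in for vmtest's node OS kinds.
type OS int

const (
	Linux OS = iota
	FreeBSD
	MacOS
)

// hostIsMacOSArm64 reports whether goos/goarch describe an Apple Silicon Mac.
func hostIsMacOSArm64(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

// skipIfNotMacOSArm64 skips the test unless the current host is macOS on arm64.
func skipIfNotMacOSArm64(t skipper) {
	t.Helper()
	if !hostIsMacOSArm64(runtime.GOOS, runtime.GOARCH) {
		t.Skipf("requires a macOS arm64 host; this is %s/%s", runtime.GOOS, runtime.GOARCH)
	}
}

// Node and Env are simplified, hypothetical versions of the vmtest types.
type Node struct {
	Name string
	OS   OS
}

type Env struct {
	t     skipper
	nodes []*Node
}

// AddNode checks the host when a macOS node is requested rather than in Start.
// With a real *testing.T, Skipf ends the test here (runtime.Goexit), so the
// node is never added.
func (e *Env) AddNode(name string, kind OS) *Node {
	if kind == MacOS {
		skipIfNotMacOSArm64(e.t)
	}
	n := &Node{Name: name, OS: kind}
	e.nodes = append(e.nodes, n)
	return n
}
```

Checking in AddNode means a skipped test never pays for image builds or VM boots that Start would otherwise trigger.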

Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default
(it was the overwhelming majority of bytes in CI logs, with no signal
in healthy runs) and replace it with a periodic "still waiting" line
that only fires after 10s, so a truly stuck agent connection still
surfaces.

Updates #13038

Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Brad Fitzpatrick
2026-05-06 20:07:45 +00:00
committed by Brad Fitzpatrick
parent 4eec4423b4
commit e062b46984
9 changed files with 454 additions and 78 deletions
+12 -7
@@ -173,16 +173,21 @@ func (e *Env) generateFreeBSDUserData(n *Node) string {
 		ud.WriteString(" - \"sysctl net.inet6.ip6.forwarding=1\"\n")
 	}
-	// Start tailscaled and tta in the background.
-	// Set PATH to include /usr/local/bin so that tta can find "tailscale"
-	// (TTA uses exec.Command("tailscale", ...) without a full path).
-	// --statedir provides a VarRoot so features like Taildrop have a directory.
+	// Start tailscaled and tta in the background. Redirect stdio to log
+	// files and away from /dev/null on stdin; otherwise nuageinit's runcmd
+	// executor keeps the backgrounded child's stdout/stderr pipes open and
+	// blocks waiting for them, so subsequent runcmd entries (including the
+	// tta launch below) never run. Linux cloud-init doesn't have this
+	// gotcha. Set PATH to include /usr/local/bin so that tta can find
+	// "tailscale" (TTA uses exec.Command("tailscale", ...) without a full
+	// path). --statedir provides a VarRoot so features like Taildrop have a
+	// directory.
 	ud.WriteString(" - \"mkdir -p /var/lib/tailscale\"\n")
-	ud.WriteString(" - \"export PATH=/usr/local/bin:$PATH && /usr/local/bin/tailscaled --state=mem: --statedir=/var/lib/tailscale &\"\n")
+	ud.WriteString(" - \"export PATH=/usr/local/bin:$PATH && /usr/local/bin/tailscaled --state=mem: --statedir=/var/lib/tailscale </dev/null >/var/log/tailscaled.log 2>&1 &\"\n")
 	ud.WriteString(" - \"sleep 2\"\n")
-	// Start tta (Tailscale Test Agent).
-	ud.WriteString(" - \"export PATH=/usr/local/bin:$PATH && /usr/local/bin/tta &\"\n")
+	// Start tta (Tailscale Test Agent), with the same stdio redirection.
+	ud.WriteString(" - \"export PATH=/usr/local/bin:$PATH && /usr/local/bin/tta </dev/null >/var/log/tta.log 2>&1 &\"\n")
 	return ud.String()
 }