tstest/natlab, .github/workflows: add opt-in natlab CI workflow

The natlab vmtest suite (tstest/natlab/vmtest) and the integration nat
tests are gated behind --run-vm-tests because they need KVM and are
slow. Until now nothing in CI exercised them apart from a single
canary TestEasyEasy run on every PR.

Add .github/workflows/natlab-test.yml that runs the full opt-in suite
on demand (workflow_dispatch), on PRs labeled "natlab", and on main
every 12 hours via cron. The workflow has two phases:

  - "prepare" builds the gokrazy VM image, downloads the Ubuntu and
    FreeBSD cloud images once via the new natlabprep tool, and emits
    a dynamic JSON matrix of every TestX function it finds in the two
    opt-in packages.
  - "test" is a per-test matrix that depends on prepare. Each matrix
    job restores the shared caches and runs a single test, so adding
    a new TestFoo is automatically picked up on the next run without
    any workflow edits.

Rename the existing natlab-integrationtest.yml to natlab-basic.yml
since it's the small smoke variant (just TestEasyEasy on every PR);
the new natlab-test.yml is the bigger suite. The job inside is
renamed to EasyEasy for the same reason.

Move the macOS arm64 host check from vmtest.Env.Start into
vmtest.Env.AddNode so a test that adds a vmtest.MacOS node skips
immediately on a non-macOS host, and add an explicit
skipIfNotMacOSArm64 helper at the top of the two macOS-only tests
so the platform requirement is obvious to readers.

Quiet the takeAgentConnOne miss log in tstest/natlab/vnet by default
(it was the overwhelming majority of bytes in CI logs, with no signal
in healthy runs) and replace it with a periodic "still waiting" line
that only fires after 10s, so a truly stuck agent connection still
surfaces.

Updates #13038

Change-Id: I4582098d8865200fd5a73a9b696942319ccf3bf0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit is contained in:
Brad Fitzpatrick
2026-05-06 20:07:45 +00:00
committed by Brad Fitzpatrick
parent 4eec4423b4
commit e062b46984
9 changed files with 454 additions and 78 deletions
+45
View File
@@ -0,0 +1,45 @@
# Run a single natlab smoke test on every PR. The full natlab suite
# is opt-in and lives in .github/workflows/natlab-test.yml.
# See https://github.com/tailscale/tailscale/issues/13038
name: "natlab-basic"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
on:
push:
branches:
- "main"
- "release-branch/*"
pull_request:
# all PRs on all branches
merge_group:
branches:
- "main"
jobs:
EasyEasy:
runs-on: ubuntu-latest
steps:
- name: Check out code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Enable KVM
run: |
echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm
- name: Install qemu
run: |
sudo rm -f /var/lib/man-db/auto-update
sudo apt-get -y update
sudo apt-get -y remove man-db
sudo apt-get install -y qemu-system-x86 qemu-utils
- name: Build VM image
# The test will build this if missing, but we do it explicitly
# to avoid cutting into the go test -timeout budget, and to
# fail earlier with a clearer error if the image build breaks.
run: |
make -C gokrazy natlab
- name: Run natlab integration tests
run: |
./tool/go test -v -run=^TestEasyEasy$ -timeout=3m -count=1 ./tstest/integration/nat --run-vm-tests