Conversation
Pull request overview
This PR introduces an AI-driven GitHub Actions workflow to select and run relevant LISA test cases for PR validation, along with a dedicated runbook, and a small tweak to provisioning smoke-test panic labeling.
Changes:
- Added a new GitHub Actions workflow (`ai-test-selection.yml`) that calls a selector script to choose test cases and then runs LISA in Azure.
- Added an AI test selection helper script (`.github/scripts/ai_select_cases.py`) and a new runbook (`lisa/microsoft/runbook/ai_selected.yml`) intended for running selected cases.
- Updated the provisioning smoke-test serial-console panic stage label in one failure path.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `lisa/microsoft/testsuites/core/provisioning.py` | Adjusts the `SerialConsole.check_panic()` stage string used when SSH port readiness fails. |
| `lisa/microsoft/runbook/ai_selected.yml` | Adds a runbook intended to run test cases by a variable-provided name pattern. |
| `.github/workflows/ai-test-selection.yml` | Adds a multi-job workflow to select tests via GitHub Models, run selected LISA tests on Azure, and report results back to the PR. |
| `.github/scripts/ai_select_cases.py` | Implements test enumeration, the GitHub Models call, and output of the selected case list plus a runbook fragment. |
```yaml
on:
  pull_request:
    branches:
      - main
```
This workflow runs on pull_request and uses Azure secrets/OIDC. For PRs from forks, secrets are not available (leading to failures), and more importantly, you generally don’t want to run privileged Azure jobs for untrusted contributors. Add a guard (job-level if) to only run the Azure portions when the PR is from the same repository (e.g., github.event.pull_request.head.repo.fork == false) and/or when the author is trusted, while still allowing the “select tests” and “report” steps to run safely.
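A possible job-level guard, sketched as a suggestion (job and output names follow this workflow; the exact condition may need adjusting to the repo's trust policy):

```yaml
run-lisa-tests:
  # Only run the privileged Azure portion for same-repo (non-fork) PRs;
  # selection and reporting can still run for everyone.
  if: >
    needs.ai-select-tests.outputs.case_count > 0 &&
    github.event.pull_request.head.repo.fork == false
```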
```yaml
permissions:
  contents: read
  pull-requests: write
  id-token: write
  models: read
```
The workflow grants pull-requests: write and id-token: write at the top level, but the ai-select-tests job checks out and executes PR-controlled code (.github/scripts/ai_select_cases.py). This combination is a security risk (a PR can modify the script to use the elevated token/OIDC). Reduce permissions to the minimum per-job (e.g., ai-select-tests: contents: read, pull-requests: read, models: read; no id-token), and only grant pull-requests: write to the report job; also consider checking out trusted code for the selector or setting persist-credentials: false.
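A per-job permissions layout along the lines suggested (a sketch, not a drop-in config; job names mirror this workflow):

```yaml
jobs:
  ai-select-tests:
    permissions:
      contents: read
      pull-requests: read
      models: read
      # no id-token: this job runs PR-controlled code
  report:
    permissions:
      pull-requests: write   # only the report job comments on the PR
```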
```shell
TOTAL=$(gh api /repos/${{ github.repository }}/commits/$SHA/check-runs \
  --jq '[.check_runs[] | select(.name | test("flake8|pylint|mypy|shellcheck|docs|test|example"; "i"))] | length')
if [ "$TOTAL" -gt 0 ]; then
```
The workflow calls the Checks API endpoint /commits/$SHA/check-runs, which typically requires checks: read permission. The workflow permissions block doesn't request checks: read, so this step may 403 and fail the job. Add checks: read (or switch to an endpoint covered by existing permissions) so the CI-status gate is reliable.
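For example (assuming the gate runs inside the selection job's permissions block):

```yaml
permissions:
  contents: read
  checks: read   # required for GET /repos/{owner}/{repo}/commits/{sha}/check-runs
  models: read
```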
```shell
fi

PR_NUMBER="${{ needs.ai-select-tests.outputs.pr_number }}"
gh pr comment $PR_NUMBER --body "$(echo -e "$BODY")"
```
If ai-select-tests fails before setting outputs (e.g., CI gate exits non-zero), report still runs (if: always()) but needs.ai-select-tests.outputs.pr_number will be empty and gh pr comment will fail. Make the report job re-determine the PR number (or guard when it's missing) and consider making the CI gate a soft-skip that sets case_count=0 rather than failing the job.
Suggested change:

```diff
-gh pr comment $PR_NUMBER --body "$(echo -e "$BODY")"
+if [ -z "$PR_NUMBER" ]; then
+  PR_NUMBER="${{ github.event.pull_request.number }}"
+fi
+if [ -z "$PR_NUMBER" ]; then
+  echo "No PR number available; skipping PR comment."
+  exit 0
+fi
+gh pr comment "$PR_NUMBER" --body "$(echo -e "$BODY")"
```
```yaml
# ── Step 2: Run selected LISA test cases on Azure ──
run-lisa-tests:
  name: Run LISA Tests
  needs: ai-select-tests
  if: needs.ai-select-tests.outputs.case_count > 0
  runs-on: ubuntu-latest
  timeout-minutes: 120

  permissions:
    id-token: write
    contents: read

  steps:
    - name: Checkout trusted code
      uses: actions/checkout@v4
      with:
        ref: ${{ github.event.repository.default_branch }}
        path: trusted

    - name: Azure Login (OIDC)
      uses: azure/login@v2
      with:
        client-id: ${{ secrets.AZURE_CLIENT_ID }}
        tenant-id: ${{ secrets.AZURE_TENANT_ID }}
        subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
```
run-lisa-tests uses Azure secrets and runs on the pull_request event. For PRs from forks, secrets won't be available and this job will fail (potentially blocking contributions). Add a guard to only run this job for same-repo PRs (or move the secret-using portion behind workflow_dispatch / an approval gate).
```yaml
name: AI Test Case Selection

on:
```
The PR description is still the template with the main Description/Type/Checklist sections left blank. Please add a short description of what this workflow/selector is intended to do and why it’s being added so reviewers can validate scope and risk.
```shell
export PR_CHANGED_FILES=$(cat /tmp/changed_files.txt)
export PR_DIFF=$(head -c 30000 /tmp/pr_diff.txt)
```
export PR_CHANGED_FILES=$(cat ...) and export PR_DIFF=$(head -c ...) are unquoted command substitutions. Because the diff/file list contains spaces and newlines, this will break the export command (word-splitting/globbing) and likely fail the workflow before the Python script runs. Quote these expansions (or pass file paths to the script and read the files in Python) to preserve the content correctly.
Suggested change:

```diff
-export PR_CHANGED_FILES=$(cat /tmp/changed_files.txt)
-export PR_DIFF=$(head -c 30000 /tmp/pr_diff.txt)
+export PR_CHANGED_FILES="$(cat /tmp/changed_files.txt)"
+export PR_DIFF="$(head -c 30000 /tmp/pr_diff.txt)"
```
```shell
# Delete resource groups created by LISA (prefixed with 'intlisa')
for rg in $(az group list --query "[?starts_with(name, 'intlisa')].name" -o tsv); do
```
The cancellation cleanup deletes resource groups with prefix intlisa, but there are no references in the codebase to LISA creating RGs with that prefix; LISA commonly generates names starting with lisa-. As written, this cleanup likely won't remove leaked resources on cancellation. Align the prefix/tag filter with actual RG naming (or capture the RG name from LISA outputs and delete that specific RG).
Suggested change:

```diff
-# Delete resource groups created by LISA (prefixed with 'intlisa')
-for rg in $(az group list --query "[?starts_with(name, 'intlisa')].name" -o tsv); do
+# Delete resource groups created by LISA (prefixed with 'lisa-')
+for rg in $(az group list --query "[?starts_with(name, 'lisa-')].name" -o tsv); do
```
```python
req = urllib.request.Request(url, data=data, headers=headers, method="POST")
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))
return result["choices"][0]["message"]["content"]
```
urllib.request.urlopen(req) is called without a timeout, so the workflow can hang until the job-level timeout if the Models endpoint stalls. Pass an explicit timeout (and consider catching URLError/HTTPError to fail gracefully) to keep the CI behavior predictable.
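A hedged sketch of the suggested fix (the helper name, timeout value, and return-None fallback are illustrative, not the script's actual API):

```python
import json
import urllib.error
import urllib.request

# Illustrative constant; tune to the job's budget.
MODELS_TIMEOUT_S = 60


def call_models_api(url, data, headers):
    """POST to the Models endpoint with an explicit timeout.

    Returns the model's message content, or None on network failure so the
    caller can fail gracefully instead of hanging until the job timeout.
    """
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=MODELS_TIMEOUT_S) as resp:
            result = json.loads(resp.read().decode("utf-8"))
        return result["choices"][0]["message"]["content"]
    except (urllib.error.URLError, TimeoutError) as exc:
        # URLError also covers HTTPError and wrapped socket errors.
        print(f"Models API call failed: {exc}")
        return None
```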
```python
return (
    "## PR Changed Files\n"
    f"{changed_files}\n\n"
    "## PR Diff (truncated to key changes)\n"
    f"{diff[:8000]}\n\n"
    "## Available Test Cases\n"
    f"{case_summary}\n\n"
    "Select the test cases that should run for this PR. "
    "IMPORTANT: Avoid selecting stress tests (e.g. stress_reboot, "
    "stress_*) unless the PR directly modifies stress test code or "
    "core functionality that stress tests specifically validate. "
    "Prefer lighter functional tests over stress tests.\n"
    "Return a JSON array of case names only, e.g. "
    '["smoke_test", "verify_cpu_count"].'
)
```
diff[:8000] is a hard-coded truncation limit that materially affects test selection quality/cost but isn’t documented or centralized. Consider making this a named constant with a brief rationale (and keep it consistent with the workflow’s head -c 30000 limit) to make future tuning safer.
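One way to centralize the limits (constant names are suggestions; the 30000 mirrors the workflow's `head -c` cap):

```python
# Illustrative constants, not the script's current code. Keeping
# PROMPT_DIFF_LIMIT <= WORKFLOW_DIFF_LIMIT ensures the prompt never expects
# more diff than the workflow actually exports via `head -c 30000`.
WORKFLOW_DIFF_LIMIT = 30_000  # bytes kept by the workflow's head -c step
PROMPT_DIFF_LIMIT = 8_000     # chars of diff included in the model prompt


def build_prompt_diff(diff: str) -> str:
    # Truncate once, in one documented place, instead of inline slicing.
    return diff[:PROMPT_DIFF_LIMIT]
```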
```shell
export PR_CHANGED_FILES=$(cat /tmp/changed_files.txt)
export PR_DIFF=$(head -c 30000 /tmp/pr_diff.txt)
```
The environment variable exports lose newlines and can mangle diff/file lists due to word-splitting (e.g., export PR_CHANGED_FILES=$(cat ...)). Quote the command substitutions so the Python script receives the intended content verbatim.
Suggested change:

```diff
-export PR_CHANGED_FILES=$(cat /tmp/changed_files.txt)
-export PR_DIFF=$(head -c 30000 /tmp/pr_diff.txt)
+export PR_CHANGED_FILES="$(cat /tmp/changed_files.txt)"
+export PR_DIFF="$(head -c 30000 /tmp/pr_diff.txt)"
```
```shell
  --jq '[.check_runs[] | select(.name | test("flake8|pylint|mypy|shellcheck|docs|test|example"; "i")) | select(.conclusion == "failure")] | length')
if [ "$FAILED" -gt 0 ]; then
  echo "CI has $FAILED failed checks, skipping AI test selection"
  exit 1
```
The CI-status gate says it’s “skipping AI test selection” when failures are detected, but it exits with code 1, which fails the job/workflow. If the intent is to skip selection (not fail), consider exiting 0 and setting outputs (case_count=0, case_names=) so downstream jobs and reporting behave predictably.
Suggested change:

```diff
-exit 1
+# Do not fail the workflow; indicate that no tests should be selected.
+if [ -n "${GITHUB_OUTPUT:-}" ]; then
+  echo "case_count=0" >> "$GITHUB_OUTPUT"
+  echo "case_names=" >> "$GITHUB_OUTPUT"
+fi
+exit 0
```
```yaml
- name: Checkout code
  uses: actions/checkout@v4
  with:
    fetch-depth: 0

- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: "3.12"

- name: Get PR diff
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    gh pr diff ${{ steps.get-pr.outputs.pr_number }} --name-only > /tmp/changed_files.txt
    gh pr diff ${{ steps.get-pr.outputs.pr_number }} > /tmp/pr_diff.txt
    echo "Changed files:"
    cat /tmp/changed_files.txt

- name: Run AI test case selector
  id: select
  env:
    GITHUB_TOKEN: ${{ github.token }}
    RUNBOOK_OUTPUT: ai_selected_cases.yml
  run: |
    export PR_CHANGED_FILES=$(cat /tmp/changed_files.txt)
    export PR_DIFF=$(head -c 30000 /tmp/pr_diff.txt)
    python .github/scripts/ai_select_cases.py
```
This workflow checks out and executes .github/scripts/ai_select_cases.py from the PR branch while granting pull-requests: write and models: read. That allows untrusted PR code to run with write permissions (and could be abused to spam/modify PR comments, etc.). Consider checking out trusted code (default branch) for the selector script (similar to the trusted/ approach used later), or switching to pull_request_target with a trusted checkout, and only using the PR diff/content via GitHub API.
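One way to run the trusted copy of the selector while still diffing the PR content (a sketch; step names and the `trusted/` path are illustrative):

```yaml
- name: Checkout trusted selector
  uses: actions/checkout@v4
  with:
    ref: ${{ github.event.repository.default_branch }}
    path: trusted
    persist-credentials: false

- name: Run AI test case selector (trusted copy)
  run: python trusted/.github/scripts/ai_select_cases.py
```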
✅ AI Test Selection — PASSED: 12 test case(s) selected (view run)
Test case details
```text
1. Reads the PR diff (changed files + patch) from environment variables
   set by the GitHub Actions workflow.
2. Enumerates all LISA test cases under microsoft/testsuites.
3. Calls the GitHub Models API (GPT-4o) to decide which test cases
   are relevant to the code changes.
```
Docstring says test cases are enumerated under microsoft/testsuites, but the script’s primary search path is lisa/microsoft/testsuites (with a fallback). Consider updating the docstring to match the actual directory layout to avoid confusion for maintainers.
```yaml
name: AI Test Case Selection

on:
```
The PR description template is still largely blank (no concrete description of what/why, and no test validation filled in). Please add a description explaining what this PR does and why.
```shell
  --jq '[.check_runs[] | select(.name | test("flake8|pylint|mypy|shellcheck|docs|test|example"; "i")) | select(.conclusion == "failure")] | length')
if [ "$FAILED" -gt 0 ]; then
  echo "CI has $FAILED failed checks, skipping AI test selection"
  exit 1
```
This step exits with status 1 when other CI checks have failed. That will mark the AI workflow itself as failed (adding another red check) even though the intent is to skip selection. Consider exiting 0 and setting outputs (e.g., case_count=0) or using continue-on-error so the workflow reports a clean skip instead of a failure.
Suggested change:

```diff
-exit 1
+exit 0
```
```yaml
run-lisa-tests:
  name: Run LISA Tests
  needs: ai-select-tests
  if: needs.ai-select-tests.outputs.case_count > 0
```
This workflow runs Azure OIDC login and uses repository secrets. On pull_request events from forks, secrets aren’t available and you generally don’t want to attempt Azure runs for untrusted PRs. Add an explicit guard (e.g., only run when github.event.pull_request.head.repo.fork == false) to avoid noisy failures and reduce exposure.
Suggested change:

```diff
-if: needs.ai-select-tests.outputs.case_count > 0
+if: >
+  needs.ai-select-tests.outputs.case_count > 0 &&
+  (github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork == false)
```
```python
import xml.etree.ElementTree as ET
import glob, os

junit_files = glob.glob("lisa-results/**/lisa.junit.xml", recursive=True)
if not junit_files:
    exit(0)

tree = ET.parse(junit_files[0])
root = tree.getroot()

rows = []
passed = failed = skipped = 0
for suite in root.iter("testsuite"):
    for tc in suite.iter("testcase"):
        name = tc.get("name", "unknown")
        time_s = tc.get("time", "0")
        failure = tc.find("failure")
        skip = tc.find("skipped")
        if failure is not None:
            status = "❌ FAILED"
            msg = (failure.get("message") or "")[:120]
            failed += 1
        elif skip is not None:
            status = "⏭️ SKIPPED"
            msg = (skip.get("message") or "")[:120]
            skipped += 1
        else:
            status = "✅ PASSED"
            msg = ""
            passed += 1
        rows.append((name, status, time_s, msg))

# Summary
```
The embedded Python here-doc is indented (the Python code starts with leading spaces, and the PYEOF terminator is also indented later). Bash requires the heredoc terminator to start at column 0 unless using <<-, and the current indentation will also trigger Python IndentationError. Left-align the heredoc content/terminator (or switch to python3 - <<'PYEOF' / <<- with tabs) so this block runs correctly.
Suggested change (heredoc body and `PYEOF` terminator left-aligned to column 0):

```python
import xml.etree.ElementTree as ET
import glob, os

junit_files = glob.glob("lisa-results/**/lisa.junit.xml", recursive=True)
if not junit_files:
    exit(0)

tree = ET.parse(junit_files[0])
root = tree.getroot()

rows = []
passed = failed = skipped = 0
for suite in root.iter("testsuite"):
    for tc in suite.iter("testcase"):
        name = tc.get("name", "unknown")
        time_s = tc.get("time", "0")
        failure = tc.find("failure")
        skip = tc.find("skipped")
        if failure is not None:
            status = "❌ FAILED"
            msg = (failure.get("message") or "")[:120]
            failed += 1
        elif skip is not None:
            status = "⏭️ SKIPPED"
            msg = (skip.get("message") or "")[:120]
            skipped += 1
        else:
            status = "✅ PASSED"
            msg = ""
            passed += 1
        rows.append((name, status, time_s, msg))

# Summary
```
❌ AI Test Selection — FAILED: 103 test case(s) selected (view run)
Test case details
johnsongeorge-w
left a comment
Please review the Copilot review recommendations above.
Description
Related Issue
Type of Change
Checklist
Test Validation
Key Test Cases:
Impacted LISA Features:
Tested Azure Marketplace Images:
Test Results