fix: community PRs + security hardening + E2E stability (v0.12.7.0) by garrytan · Pull Request #552 · garrytan/gstack

garrytan · 2026-03-27T04:31:43Z

Summary

7 community PRs merged, reviewed, and security-audited:

Security hardening: JSON validation gate in review-log (fix(security): validate JSON input in gstack-review-log #472), telemetry input sanitization (fix(security): sanitize telemetry JSONL inputs against injection #469), dotfile filtering in skill discovery (fix(security): skip hidden directories in skill template discovery #433)
Bug fixes: File path detection for ./ paths (fix: screenshot misidentifies relative dot-paths as CSS selectors #499), build chain resilience (fix: decouple doc generation from binary compilation in build script #525), update-check fall-through after upgrade (fix: auto-upgrade marker no longer masks newer remote versions #523)
Infrastructure: Host-specific co-author trailers for Codex/Claude (fix: Codex skills use Claude co-author in commit messages #521)
10 new security tests covering injection vectors, validation gates, and edge cases
E2E test stability: Fixed browse-basic, ship-base-branch, and dashboard-via tests by extracting relevant SKILL.md sections instead of copying full 1900-line files
Removed unreliable journey-think-bigger test (ambiguous routing signal, 10 other journey tests cover routing)

Test Coverage

All free tests pass. E2E evals verified:

browse-basic: PASS (was FAIL)
ship-base-branch: PASS (was flaky)
review-dashboard-via: PASS (was timeout)
10 new security unit tests all pass

Pre-Landing Review

No issues found. All changes are shell scripts, TypeScript utilities, and test files. No SQL, no LLM trust boundaries.

Test plan

bun test passes (free tests)
E2E evals pass for browse-basic, ship-base-branch, dashboard-via
Security tests pass for telemetry injection, review-log validation, dotfile filtering

🤖 Generated with Claude Code

discoverTemplates() scans subdirectories for SKILL.md.tmpl files but only skips node_modules, .git, and dist. Hidden directories like .claude/, .agents/, and .codex/ (which contain symlinked skill installs) were being scanned, allowing a malicious .tmpl in a symlinked skill to inject into the generation pipeline. Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips all dot-prefixed directories, matching the standard convention that hidden dirs are not source code.

SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly into printf %s for JSONL output. If any contain double quotes, backslashes, or newlines, the JSON breaks — or worse, injects arbitrary fields. Fix: strip quotes, backslashes, and control characters from all string fields before JSONL construction via json_safe() helper.

gstack-review-log appends its argument directly to a JSONL file with no validation. Malformed or crafted input could corrupt the review log or inject arbitrary content. Fix: validate input is parseable JSON via python3 before appending. Reject with exit 1 and stderr message if invalid.

Closes #495

Codex-generated skills hardcoded a Claude co-author trailer in commit messages. Users running gstack under Codex pushed commits attributed to the wrong AI assistant. Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer based on ctx.host: - claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> - codex: Co-Authored-By: OpenAI Codex <noreply@openai.com> Replace hardcoded trailers in ship/SKILL.md.tmpl and document-release/SKILL.md.tmpl with the resolver placeholder. Fixes #282. Fixes #383. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a just-upgraded-from marker persists across sessions, the update check would write UP_TO_DATE to cache and exit immediately — never fetching the remote VERSION. Users silently miss updates that landed after their last upgrade. Remove the early exit and premature cache write so the script falls through to the remote check after consuming the marker. This ensures JUST_UPGRADED is still emitted for the preamble, while also detecting any newer versions available upstream. Fixes #515

The build script chains gen:skill-docs and bun build --compile with &&, so a doc generation failure (e.g. missing Codex host config, template error) prevents the browse binary from being compiled. Users end up with a broken install where setup reports the binary is missing. Replace && with ; for the two gen:skill-docs steps so they run independently of the compilation chain. Doc generation errors are still visible in stderr, but no longer block binary compilation. Fixes #482

… PRs - Extend json_safe() to ERROR_CLASS and FAILED_STEP fields - Improve ERROR_MESSAGE escaping to handle backslashes and newlines - Replace python3 with bun for JSON validation in gstack-review-log - Add 7 telemetry injection prevention tests - Add 2 review-log JSON validation tests - Add 1 discover-skills hidden directory filtering test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…oard-via) browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction) ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md dashboard-via: extract dashboard section only + increase timeout 90s→180s Root cause: copying full SKILL.md files into test fixtures caused context bloat, leading to timeouts and flaky turn limits. Extracting only the relevant section cut dashboard-via from timing out at 240s to finishing in 38s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Never copy full SKILL.md files into E2E test fixtures. Extract only the section the test needs. Also: run targeted evals in foreground, never pkill and restart mid-run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use exact trigger phrases from plan-ceo-review skill description ("think bigger", "expand scope", "ambitious enough") instead of the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut cost per attempt ($0.12 vs $0.25). Test remains periodic tier since LLM routing is inherently non-deterministic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Never passed reliably. Tests ambiguous routing ("think bigger" → plan-ceo-review) but Claude legitimately answers directly instead of invoking a skill. The other 10 journey tests cover routing with clear, actionable signals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-27T04:42:09Z

E2E Evals: ✅ PASS

67/67 tests passed | $7.14 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-browse	7/7	✅	$0.34
e2e-deploy	6/6	✅	$1.04
e2e-design	3/3	✅	$0.51
e2e-plan	7/7	✅	$1.09
e2e-qa-workflow	3/3	✅	$1.06
e2e-review	6/6	✅	$1.05
e2e-workflow	4/4	✅	$0.51
llm-judge	25/25	✅	$0.5
e2e-deploy	6/6	✅	$1.04

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

…arrytan#552) * fix(security): skip hidden directories in skill template discovery discoverTemplates() scans subdirectories for SKILL.md.tmpl files but only skips node_modules, .git, and dist. Hidden directories like .claude/, .agents/, and .codex/ (which contain symlinked skill installs) were being scanned, allowing a malicious .tmpl in a symlinked skill to inject into the generation pipeline. Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips all dot-prefixed directories, matching the standard convention that hidden dirs are not source code. * fix(security): sanitize telemetry JSONL inputs against injection SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly into printf %s for JSONL output. If any contain double quotes, backslashes, or newlines, the JSON breaks — or worse, injects arbitrary fields. Fix: strip quotes, backslashes, and control characters from all string fields before JSONL construction via json_safe() helper. * fix(security): validate JSON input in gstack-review-log gstack-review-log appends its argument directly to a JSONL file with no validation. Malformed or crafted input could corrupt the review log or inject arbitrary content. Fix: validate input is parseable JSON via python3 before appending. Reject with exit 1 and stderr message if invalid. * fix: treat relative dot-paths as file paths in screenshot command Closes garrytan#495 * fix: use host-specific co-author trailer in /ship and /document-release Codex-generated skills hardcoded a Claude co-author trailer in commit messages. Users running gstack under Codex pushed commits attributed to the wrong AI assistant. Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer based on ctx.host: - claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> - codex: Co-Authored-By: OpenAI Codex <noreply@openai.com> Replace hardcoded trailers in ship/SKILL.md.tmpl and document-release/SKILL.md.tmpl with the resolver placeholder. Fixes garrytan#282. Fixes garrytan#383. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: auto-upgrade marker no longer masks newer remote versions When a just-upgraded-from marker persists across sessions, the update check would write UP_TO_DATE to cache and exit immediately — never fetching the remote VERSION. Users silently miss updates that landed after their last upgrade. Remove the early exit and premature cache write so the script falls through to the remote check after consuming the marker. This ensures JUST_UPGRADED is still emitted for the preamble, while also detecting any newer versions available upstream. Fixes garrytan#515 * fix: decouple doc generation from binary compilation in build script The build script chains gen:skill-docs and bun build --compile with &&, so a doc generation failure (e.g. missing Codex host config, template error) prevents the browse binary from being compiled. Users end up with a broken install where setup reports the binary is missing. Replace && with ; for the two gen:skill-docs steps so they run independently of the compilation chain. Doc generation errors are still visible in stderr, but no longer block binary compilation. Fixes garrytan#482 * fix: extend security sanitization + add 10 tests for merged community PRs - Extend json_safe() to ERROR_CLASS and FAILED_STEP fields - Improve ERROR_MESSAGE escaping to handle backslashes and newlines - Replace python3 with bun for JSON validation in gstack-review-log - Add 7 telemetry injection prevention tests - Add 2 review-log JSON validation tests - Add 1 discover-skills hidden directory filtering test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize flaky E2E tests (browse-basic, ship-base-branch, dashboard-via) browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction) ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md dashboard-via: extract dashboard section only + increase timeout 90s→180s Root cause: copying full SKILL.md files into test fixtures caused context bloat, leading to timeouts and flaky turn limits. Extracting only the relevant section cut dashboard-via from timing out at 240s to finishing in 38s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add E2E fixture extraction rule to CLAUDE.md Never copy full SKILL.md files into E2E test fixtures. Extract only the section the test needs. Also: run targeted evals in foreground, never pkill and restart mid-run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize journey-think-bigger routing test Use exact trigger phrases from plan-ceo-review skill description ("think bigger", "expand scope", "ambitious enough") instead of the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut cost per attempt ($0.12 vs $0.25). Test remains periodic tier since LLM routing is inherently non-deterministic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * remove: delete journey-think-bigger routing test Never passed reliably. Tests ambiguous routing ("think bigger" → plan-ceo-review) but Claude legitimately answers directly instead of invoking a skill. The other 10 journey tests cover routing with clear, actionable signals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.12.7.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com> Co-authored-by: bluzername <bluzer@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Greg Jackson <gregario@users.noreply.github.com>

HMAKT99 and others added 22 commits March 24, 2026 13:35

fix: treat relative dot-paths as file paths in screenshot command

e2b8567

Closes #495

Merge branch 'pr-433' into garrytan/merge-open-prs

8f298f7

Merge branch 'pr-469' into garrytan/merge-open-prs

f36bb26

Merge branch 'pr-472' into garrytan/merge-open-prs

fbd4786

Merge branch 'pr-525' into garrytan/merge-open-prs

ddaf622

Merge branch 'pr-499' into garrytan/merge-open-prs

0dc1899

Merge branch 'pr-523' into garrytan/merge-open-prs

6ac9d35

Merge branch 'pr-521' into garrytan/merge-open-prs

3fd14b8

Merge remote-tracking branch 'origin/main' into garrytan/merge-open-prs

eae56ff

Merge remote-tracking branch 'origin/main' into garrytan/merge-open-prs

35bc49e

chore: bump version and changelog (v0.12.7.0)

0ce5ea8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

garrytan merged commit b343ba2 into main Mar 27, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: community PRs + security hardening + E2E stability (v0.12.7.0)#552

fix: community PRs + security hardening + E2E stability (v0.12.7.0)#552
garrytan merged 22 commits intomainfrom
garrytan/merge-open-prs

garrytan commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

garrytan commented Mar 27, 2026

Summary

Test Coverage

Pre-Landing Review

Test plan

Uh oh!

github-actions bot commented Mar 27, 2026

E2E Evals: ✅ PASS

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants