Building a Two-Layer Automated Test System for a Go LLM Agent

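Testing an LLM agent poses a dilemma: calling the real API is slow and expensive, while mocking everything cannot verify the agent's behavior. The two-layer system records fixtures in live mode and then replays them in regression mode with zero LLM calls, which addresses both problems.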

Architecture overview

Story YAML
    │
    ▼
  Runner
    ├──→ [live mode]       ADK HTTP → JSONL log → fixture JSON
    └──→ [regression mode] ToolRegistry → DB → verify state

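Live mode: sends messages to a real ADK server, extracts the tool calls from the OTEL JSONL log, and saves them as a fixture
Regression mode: loads the fixture and calls ToolRegistry directly, without touching the LLM, then verifies DB state and response content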

Story YAML format

Each story entry describes one test scenario:

stories:
  - id: "create_order_basic"
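    description: "Create a simple order"
    depends_on: []          # prerequisite stories (run first)
    db_seed: "seeds/basic.sql"
    steps:
      - user_message: "幫我建立一筆訂單，客戶是王小明"  # "Create an order for me; the customer is 王小明"
        tool_calls:         # subsequence match; only relative order matters
          - name: "get_customer"
          - name: "create_order"
        response_contains:  # response must contain at least one of these strings
          - "訂單已建立"      # "Order created"
          - "order_id"
    db_state:               # DB assertions checked after the run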
      orders:
        count: 1

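tool_calls uses subsequence matching: a fixture passes as long as these tool calls appear in it in the listed order, and other calls may occur in between.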
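The subsequence rule can be sketched in a few lines of Go. This is a minimal illustration, not the runner's actual code: the function name matchSubsequence and the extra tool list_products are hypothetical, and only tool names are compared (arguments are ignored).

```go
package main

import "fmt"

// matchSubsequence reports whether every name in expected appears in actual
// in the same relative order. Unrelated calls in between are allowed.
func matchSubsequence(expected, actual []string) bool {
	next := 0 // index of the expected call we are still waiting for
	for _, name := range actual {
		if next < len(expected) && name == expected[next] {
			next++
		}
	}
	return next == len(expected)
}

func main() {
	// "list_products" is a made-up extra call recorded between the expected ones.
	recorded := []string{"get_customer", "list_products", "create_order"}

	fmt.Println(matchSubsequence([]string{"get_customer", "create_order"}, recorded)) // true
	fmt.Println(matchSubsequence([]string{"create_order", "get_customer"}, recorded)) // false
}
```

A single pass with one cursor is enough because each expected call only has to be found somewhere after the previous match.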

Fixture JSON format

Path convention: tests/fixtures/<module>/<story>/<scenario>.json

{
  "story_id": "create_order_basic",
  "recorded_at": "2025-03-20T10:30:00Z",
  "steps": [
    {
      "user_message": "幫我建立一筆訂單，客戶是王小明",
      "tool_calls": [
        {
          "name": "get_customer",
          "args": {"name": "王小明"},
          "result": {"id": 1, "name": "王小明"}
        },
        {
          "name": "create_order",
          "args": {"customer_id": 1},
          "result": {"order_id": 42}
        }
      ],
      "final_response": "訂單已建立，訂單編號 42"
    }
  ]
}
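One way to model this fixture in Go is with plain structs and encoding/json. The sketch below is an assumption about how a loader might look: the type names, parseFixture, fixturePath, and the sample path segments are all hypothetical; only the JSON field names come from the fixture above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
	"time"
)

// ToolCall keeps args and result as raw JSON so each tool can decode its own shape.
type ToolCall struct {
	Name   string          `json:"name"`
	Args   json.RawMessage `json:"args"`
	Result json.RawMessage `json:"result"`
}

type Step struct {
	UserMessage   string     `json:"user_message"`
	ToolCalls     []ToolCall `json:"tool_calls"`
	FinalResponse string     `json:"final_response"`
}

type Fixture struct {
	StoryID    string    `json:"story_id"`
	RecordedAt time.Time `json:"recorded_at"` // RFC 3339 timestamp
	Steps      []Step    `json:"steps"`
}

// fixturePath builds tests/fixtures/<module>/<story>/<scenario>.json.
func fixturePath(module, story, scenario string) string {
	return filepath.Join("tests", "fixtures", module, story, scenario+".json")
}

// parseFixture decodes one fixture document.
func parseFixture(data []byte) (Fixture, error) {
	var f Fixture
	if err := json.Unmarshal(data, &f); err != nil {
		return Fixture{}, fmt.Errorf("parse fixture: %w", err)
	}
	return f, nil
}

func main() {
	raw := []byte(`{"story_id":"create_order_basic","recorded_at":"2025-03-20T10:30:00Z",
	  "steps":[{"user_message":"hi","tool_calls":[{"name":"get_customer",
	  "args":{"name":"x"},"result":{"id":1}}],"final_response":"ok"}]}`)

	f, err := parseFixture(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.StoryID, len(f.Steps), f.Steps[0].ToolCalls[0].Name)
	// The module, story, and scenario names here are placeholders.
	fmt.Println(fixturePath("orders", "create_order", "basic"))
}
```

Keeping args and result as json.RawMessage avoids committing to one argument schema for every tool.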

Live mode flow

ResetAndSeed(db_seed)
  → CreateSession(ADK HTTP /sessions)
  → Send(user_message)
  → WaitForLogFlush(JSONL)         ← wait for the OTEL BatchProcessor to flush
  → ExtractToolCalls(JSONL)        ← event="gen_ai.choice"
  → MatchSubsequence(expected)     ← verify tool call order
  → SaveFixture(fixture_path)
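The article does not say how WaitForLogFlush decides that the batch has been written. One possible approach, shown here purely as an assumption, is to poll the log file until it has grown past its starting size and then stayed unchanged for a quiet period. The function name, parameters, and temp-file demo are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForLogFlush polls path until its size exceeds startSize and has not
// changed for at least quiet, or returns an error once timeout has elapsed.
// This is an assumed strategy, not necessarily what the real runner does.
func waitForLogFlush(path string, startSize int64, quiet, timeout time.Duration) (int64, error) {
	deadline := time.Now().Add(timeout)
	lastSize := startSize
	lastChange := time.Now()

	for time.Now().Before(deadline) {
		if info, err := os.Stat(path); err == nil {
			size := info.Size()
			if size != lastSize {
				// New data arrived: restart the quiet timer.
				lastSize, lastChange = size, time.Now()
			} else if size > startSize && time.Since(lastChange) >= quiet {
				return size, nil
			}
		}
		time.Sleep(quiet/4 + time.Millisecond)
	}
	return lastSize, fmt.Errorf("log %s did not settle within %s", path, timeout)
}

func main() {
	f, err := os.CreateTemp("", "adk-*.jsonl")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(f.Name())

	// Simulate a delayed batch flush from another goroutine.
	go func() {
		time.Sleep(30 * time.Millisecond)
		f.WriteString(`{"event":"gen_ai.choice"}` + "\n")
		f.Close()
	}()

	size, err := waitForLogFlush(f.Name(), 0, 50*time.Millisecond, 2*time.Second)
	fmt.Println(size, err)
}
```

The quiet period should be longer than the exporter's flush interval; otherwise a second batch could arrive after the function has already returned.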

Regression mode flow

ResetAndSeed(db_seed)
  → LoadFixture(fixture_path)
  → for each step:
      ToolRegistry.Execute(tool_calls)  ← call directly, skipping the LLM
  → CheckDBState(db_state assertions)
  → CheckResponse(response_contains)

No GOOGLE_API_KEY is required, and runs are 10-50x faster than live mode.
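The replay loop can be sketched as follows. The real ToolRegistry API is not shown in this article, so every type and function here (Handler, Register, Execute, Replay, containsAny, orderStore, newDemoRegistry) is a hypothetical stand-in, and an in-memory slice takes the place of the database.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Handler runs one tool against the backing store and returns its result.
type Handler func(args json.RawMessage) (any, error)

// ToolRegistry maps tool names to handlers (an assumed shape).
type ToolRegistry struct{ handlers map[string]Handler }

func NewToolRegistry() *ToolRegistry {
	return &ToolRegistry{handlers: make(map[string]Handler)}
}

func (r *ToolRegistry) Register(name string, h Handler) { r.handlers[name] = h }

func (r *ToolRegistry) Execute(name string, args json.RawMessage) (any, error) {
	h, ok := r.handlers[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return h(args)
}

// RecordedCall is the part of a fixture tool call needed for replay.
type RecordedCall struct {
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// Replay executes the recorded calls in order, with no LLM involved.
func Replay(r *ToolRegistry, calls []RecordedCall) error {
	for i, c := range calls {
		if _, err := r.Execute(c.Name, c.Args); err != nil {
			return fmt.Errorf("call %d (%s): %w", i, c.Name, err)
		}
	}
	return nil
}

// containsAny mirrors response_contains: one matching string is enough.
func containsAny(response string, wants []string) bool {
	for _, w := range wants {
		if strings.Contains(response, w) {
			return true
		}
	}
	return false
}

// orderStore is an in-memory stand-in for the orders table.
type orderStore struct{ orders []int }

func newDemoRegistry(s *orderStore) *ToolRegistry {
	r := NewToolRegistry()
	r.Register("get_customer", func(args json.RawMessage) (any, error) {
		return map[string]any{"id": 1}, nil
	})
	r.Register("create_order", func(args json.RawMessage) (any, error) {
		var in struct {
			CustomerID int `json:"customer_id"`
		}
		if err := json.Unmarshal(args, &in); err != nil {
			return nil, err
		}
		s.orders = append(s.orders, in.CustomerID)
		return map[string]any{"order_id": len(s.orders)}, nil
	})
	return r
}

func main() {
	store := &orderStore{}
	calls := []RecordedCall{
		{Name: "get_customer", Args: json.RawMessage(`{"name":"王小明"}`)},
		{Name: "create_order", Args: json.RawMessage(`{"customer_id":1}`)},
	}
	if err := Replay(newDemoRegistry(store), calls); err != nil {
		fmt.Println("replay failed:", err)
		return
	}
	fmt.Println("orders:", len(store.orders)) // stands in for db_state: orders.count
	fmt.Println("response ok:", containsAny("訂單已建立，訂單編號 42", []string{"訂單已建立", "order_id"}))
}
```

Because no model runs during replay, the response check in this sketch is applied to the final_response recorded in the fixture.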

JSONL parsing path

In the OTEL trace log, tool calls are nested under the following path:

event = "gen_ai.choice"
  body
    └─ content
         └─ parts[]
               └─ functionCall
                    ├─ name
                    └─ args

When parsing, keep only records where event == "gen_ai.choice", then recursively collect every functionCall.
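The filter-then-recurse step might look like the sketch below. It assumes each JSONL line is one JSON object with top-level "event" and "body" keys, which is inferred from the path above rather than from a documented log schema. The function names and the sample log lines are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"sort"
	"strings"
)

type FunctionCall struct {
	Name string
	Args map[string]any
}

// ExtractToolCalls reads JSONL, keeps "gen_ai.choice" records, and collects
// every functionCall found anywhere under body.
func ExtractToolCalls(r io.Reader) ([]FunctionCall, error) {
	var calls []FunctionCall
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // allow long log lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var rec map[string]any
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, fmt.Errorf("parse JSONL line: %w", err)
		}
		if rec["event"] != "gen_ai.choice" {
			continue
		}
		collect(rec["body"], &calls)
	}
	return calls, sc.Err()
}

// ExtractToolCallsFromString is a convenience wrapper for in-memory logs.
func ExtractToolCallsFromString(s string) ([]FunctionCall, error) {
	return ExtractToolCalls(strings.NewReader(s))
}

// collect walks objects and arrays depth-first. Array order is preserved;
// object keys are visited in sorted order to keep the output deterministic.
func collect(v any, out *[]FunctionCall) {
	switch node := v.(type) {
	case map[string]any:
		if fc, ok := node["functionCall"].(map[string]any); ok {
			name, _ := fc["name"].(string)
			args, _ := fc["args"].(map[string]any)
			*out = append(*out, FunctionCall{Name: name, Args: args})
		}
		keys := make([]string, 0, len(node))
		for k := range node {
			if k != "functionCall" {
				keys = append(keys, k)
			}
		}
		sort.Strings(keys)
		for _, k := range keys {
			collect(node[k], out)
		}
	case []any:
		for _, child := range node {
			collect(child, out)
		}
	}
}

func main() {
	// Two made-up log lines: the first is skipped, the second holds two calls.
	log := `{"event":"gen_ai.user.message","body":{"content":"hi"}}
{"event":"gen_ai.choice","body":{"content":{"parts":[{"functionCall":{"name":"get_customer","args":{"name":"x"}}},{"functionCall":{"name":"create_order","args":{"customer_id":1}}}]}}}`

	calls, err := ExtractToolCallsFromString(log)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range calls {
		fmt.Println(c.Name, c.Args)
	}
}
```

Walking the whole body instead of hard-coding body.content.parts keeps the extractor working if the nesting depth changes slightly between log versions.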

CI/CD setup

In GitHub Actions, regression mode needs no API key:

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: products
    ports:
      - 5432:5432
    options: >-
      --health-cmd pg_isready
      --health-interval 10s

steps:
  - name: Run regression tests
    env:
      POSTGRES_URL: postgresql://postgres:password@localhost:5432/products?sslmode=disable
      # GOOGLE_API_KEY is not needed
    run: go test ./tests/... -run TestRegression -v

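⚠️ Pitfall: `package main` cannot be imported

If the MCP server's tool logic lives in cmd/commerce-db-mcp/main.go (package main), tests cannot import it directly.

Fix: move the handler functions into internal/mcptools/ and leave main.go to do only the wiring. Both the tests and the regression runner then import internal/mcptools.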

internal/
  mcptools/
    products.go    ← tool implementations
    orders.go
    ...
cmd/
  commerce-db-mcp/
    main.go        ← only imports internal/mcptools and registers the tools