TypeScript SDK
YAML remains AgentV’s canonical, portable eval format. The SDK surfaces below are for cases where you want to generate YAML-shaped definitions in code, embed eval runs inside another application, or write executable graders and prompt templates.
AgentV currently provides two npm packages for programmatic use:
- `@agentv/eval`: custom assertions and code graders
- `@agentv/core`: programmatic evaluation API and typed configuration
Installation
```bash
# Assertion SDK (defineAssertion, defineCodeGrader)
npm install @agentv/eval

# Programmatic API (evaluate, defineConfig)
npm install @agentv/core
```

Choose a Surface
Use the simplest surface that matches the job:
- YAML / JSONL first for portable eval specs you want to run from the CLI, check into a repo, or share across TypeScript and Python workflows.
- `evaluate({ specFile })` when you want library control around an existing YAML suite.
- Inline `evaluate({ tests })` when the eval definition truly belongs inside application code. The programmatic API mirrors YAML, but uses current TypeScript naming such as `expectedOutput` and `assert`.
- `defineAssertion` / `defineCodeGrader` when the grading logic itself must execute code.
There is no separate first-party Python authoring SDK today. Python-facing workflows should either emit canonical YAML/JSONL or implement executable graders that consume the standard snake_case wire format.
Custom Assertions
Use `defineAssertion` from `@agentv/eval` to create reusable assertion types. Place them in `.agentv/assertions/`; they are auto-discovered by filename.
Pass/Fail Pattern
```typescript
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output }) => {
  const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length;
  const pass = wordCount >= 3;
  return {
    pass,
    assertions: [{ text: `Output has ${wordCount} words`, passed: pass }],
  };
});
```

Score Pattern
Return a `score` (0–1) instead of `pass` for graded evaluation:
```typescript
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output, traceSummary }) => {
  const hasContent = (output ?? '').length > 0 ? 0.5 : 0;
  const isEfficient = (traceSummary?.eventCount ?? 0) <= 10 ? 0.5 : 0;
  return {
    score: hasContent + isEfficient,
    reasoning: 'Checks content exists and is efficient',
  };
});
```

If only `pass` is given, `score` is 1 (pass) or 0 (fail).
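The pass-to-score rule stated above can be sketched as a small standalone function. This is an illustration only, not the SDK's implementation: `AssertionResultLike` and `effectiveScore` are hypothetical names, and letting an explicit `score` take precedence when both fields are present is an assumption.

```typescript
// Hypothetical result shape for illustration; the real SDK types may differ.
interface AssertionResultLike {
  pass?: boolean;
  score?: number;
}

// Derive a numeric score from a handler result.
// Assumption: an explicit score wins; otherwise pass maps to 1 and fail to 0.
function effectiveScore(result: AssertionResultLike): number {
  if (typeof result.score === 'number') {
    return result.score;
  }
  return result.pass ? 1 : 0;
}

console.log(effectiveScore({ pass: true }));
console.log(effectiveScore({ pass: false }));
console.log(effectiveScore({ score: 0.5 }));
```

Treating both patterns as a single number in this way would let pass/fail and graded assertions be aggregated together.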
Using in YAML
Convention-based discovery maps filename → assertion type:

```
.agentv/assertions/word-count.ts → type: word-count
.agentv/assertions/sentiment.ts  → type: sentiment
```

Reference directly in your eval file; no `command:` is needed:

```yaml
assertions:
  - type: word-count
  - type: contains
    value: "Hello"
```

Code Graders
Use `defineCodeGrader` from `@agentv/eval` for full control over scoring with an explicit `assertions` array:
```typescript
import { defineCodeGrader } from '@agentv/eval';

export default defineCodeGrader(({ output, traceSummary }) => ({
  score: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 5 ? 1.0 : 0.5,
  assertions: [
    { text: 'Answer is not empty', passed: (output ?? '').length > 0 },
    { text: 'Efficient tool usage', passed: (traceSummary?.eventCount ?? 0) <= 5 },
  ],
}));
```

`defineCodeGrader` graders are referenced in YAML with `type: code-grader` and `command: [bun, run, grader.ts]`. `defineAssertion` uses convention-based discovery instead: place the file in `.agentv/assertions/` and reference it by name.
For detailed patterns, input/output contracts, and language-agnostic examples, see Code Graders.
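To make the filename convention for `defineAssertion` files concrete, here is a minimal sketch of deriving a type name from a file path. `assertionTypeFromFile` is a hypothetical helper written for this illustration; how AgentV actually performs discovery is not shown in this page.

```typescript
// Hypothetical helper: drop the directory and the final extension from a path
// to obtain the assertion type name used in YAML.
function assertionTypeFromFile(filePath: string): string {
  const fileName = filePath.split('/').pop() ?? '';
  const dot = fileName.lastIndexOf('.');
  // Keep the whole name when there is no extension (or only a leading dot).
  return dot > 0 ? fileName.slice(0, dot) : fileName;
}

console.log(assertionTypeFromFile('.agentv/assertions/word-count.ts')); // word-count
console.log(assertionTypeFromFile('.agentv/assertions/sentiment.ts')); // sentiment
```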
Wire Format vs SDK Format
Raw grader stdin uses snake_case because it crosses a process boundary and may be consumed by Python, shell, jq, or external dashboards. The `@agentv/eval` SDK converts that payload to idiomatic TypeScript camelCase before calling your handler.
| Raw stdin | SDK handler field |
|---|---|
| `expected_output` | `expectedOutput` |
| `output_path` | `outputPath` |
| `trace_summary` | `traceSummary` |
| `token_usage` | `tokenUsage` |
| `cost_usd` | `costUsd` |
| `duration_ms` | `durationMs` |
| `workspace_path` | `workspacePath` |
`output` is already the final answer string in both formats. Transcript-aware code should read `messages`, `trace.messages`, or `trace.events`; answer-text graders should read `output`.
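The renaming in the table follows the usual snake_case to camelCase rule. The sketch below shows that rule on top-level keys only; `toCamelCase`, `camelizeKeys`, and the sample payload values are hypothetical, and the SDK's real conversion may also handle nested objects.

```typescript
// Convert one snake_case key to camelCase, e.g. "trace_summary" -> "traceSummary".
function toCamelCase(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Shallowly rename the top-level keys of a raw payload.
// Nested objects are left untouched in this sketch.
function camelizeKeys(raw: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(raw)) {
    out[toCamelCase(key)] = raw[key];
  }
  return out;
}

// Hypothetical payload using field names from the table; values are made up.
const rawPayload = {
  output: 'Hello there!',
  expected_output: 'Hello',
  cost_usd: 0.002,
  duration_ms: 1200,
};

const sdkInput = camelizeKeys(rawPayload);
console.log(Object.keys(sdkInput).join(', '));
```

A grader written in another language would skip this step and read the snake_case keys directly from stdin.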
Programmatic API
Use `evaluate()` from `@agentv/core` to run evaluations as a library. The most portable pattern is still to keep the suite in YAML and point `specFile` at it; inline tests are best when the eval is tightly coupled to application code.
Inline Test Definitions
```typescript
import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  tests: [
    {
      id: 'greeting',
      input: 'Say hello',
      expectedOutput: 'Hello there!',
      assert: [{ type: 'contains', value: 'Hello' }],
    },
  ],
});

console.log(`${summary.passed}/${summary.total} passed`);
```

Auto-discovers the default target from `.agentv/targets.yaml` and `.env` credentials.
File-Based via specFile
Point to an existing YAML eval instead of inlining tests:

```typescript
import { evaluate } from '@agentv/core';

const { results, summary } = await evaluate({
  specFile: './evals/my-eval.eval.yaml',
});
```

This is the recommended bridge when you want SDK control without creating a separate code-first eval surface.
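One common reason to wrap a YAML suite in library code is to gate a build on the outcome. The sketch below assumes only the `passed` and `total` fields that appear in the logging example; `SummaryLike`, `passRate`, and `meetsThreshold` are hypothetical names, and the real summary object may carry more fields.

```typescript
// Assumed summary shape: only the two fields used in the logging example.
interface SummaryLike {
  passed: number;
  total: number;
}

// Fraction of tests that passed; an empty run counts as 0 so it cannot pass a gate by accident.
function passRate(summary: SummaryLike): number {
  return summary.total === 0 ? 0 : summary.passed / summary.total;
}

// True when the run meets the required pass rate (a value between 0 and 1).
function meetsThreshold(summary: SummaryLike, minPassRate: number): boolean {
  return passRate(summary) >= minPassRate;
}

// Hypothetical numbers; in practice pass the `summary` returned by evaluate().
const gateOk = meetsThreshold({ passed: 9, total: 10 }, 0.8);
console.log(gateOk ? 'eval gate passed' : 'eval gate failed');
```

In a Node script, a failing gate could then set `process.exitCode = 1` so the surrounding CI job fails.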
Typed Configuration
Create `agentv.config.ts` at your project root for type-safe, validated configuration using `defineConfig()` from `@agentv/core`:

```typescript
import { defineConfig } from '@agentv/core';

export default defineConfig({
  execution: {
    workers: 5,
    maxRetries: 2,
    verbose: true,
    otelFile: '.agentv/results/otel-{timestamp}.json',
  },
  output: { dir: './results' },
  limits: { maxCostUsd: 10.0 },
});
```

The config file is auto-discovered by the CLI from your project root and validated with Zod at startup.
Observability Export
AgentV’s observability surface is OpenTelemetry. For post-run workflows:
- Use `agentv eval ... --otel-file traces/eval.otlp.json` to write OTLP JSON you can import into systems such as Opik.
- Use `agentv eval ... --export-otel --otel-backend <name>` for live export when a built-in or local resolver exists.
AgentV does not currently ship a dedicated Opik authoring facade or built-in opik backend resolver. Keep the eval definition in YAML and route observability through OTLP export.
Scaffold Commands
Bootstrap new assertions and eval files from the CLI:

```bash
# Create a new assertion type
agentv create assertion <name>  # → .agentv/assertions/<name>.ts

# Create a new eval with test cases
agentv create eval <name>       # → evals/<name>.eval.yaml + .cases.jsonl
```