Hand us your stack.
We send in the agents.
Share your frontend, backend, and a little data — we stand up an isolated sandbox of your product and run population-scale agent simulations through your real flows. No production access. No real PII.
Three things, that's it.
Frontend
A staging URL (or a build we can host) the agents can drive. Optional repo for deeper context, and a test account to log in.
Backend
Your API base URL plus an OpenAPI or GraphQL spec, or a container / docker-compose file we spin up in the sandbox. Source repo optional, for tracing.
Data
A schema plus an anonymized or synthetic seed so flows have realistic state. We require no real PII.
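If you have no anonymized export on hand, a fully synthetic seed works too. The sketch below is one possible way to generate one in Python; the column names (user_id, email, shipping_zip, cart_total_cents) are hypothetical and should be swapped for the columns in your own schema.

```python
import csv
import io
import random


def synthetic_rows(n, seed=0):
    """Generate n fake rows that contain no real-person data."""
    rng = random.Random(seed)  # seeded so the same seed file can be regenerated
    rows = []
    for i in range(n):
        rows.append({
            "user_id": f"user_{i:05d}",
            "email": f"user_{i:05d}@example.com",  # example.com is a reserved domain
            "shipping_zip": f"{rng.randint(0, 99999):05d}",  # 5-digit string, not a real address
            "cart_total_cents": rng.randint(500, 50000),
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Writing the result of to_csv(synthetic_rows(1000)) to a file gives you something you could point data.seed at.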
One file describes it all.
One matraix.yaml is the entire handoff. It points us at your frontend, backend, and data, defines the input each flow expects, and declares the output you want back, so the report drops straight into your pipeline. The file opens with a project slug and a contact, followed by six numbered blocks: frontend, backend, data, goals, segments, and report. Each run returns two core artifacts: a report (report.json and report.html) and machine-ready trajectories (trajectories.jsonl). Copy the file below, fill in your values, and send it over (or drop it in your repo root).
# matraix.yaml: describe your stack so the agents can run it
project: acme-checkout              # slug we use to name your sandbox
contact: "eng@acme.com"             # who we ping with questions
# ── 1. The surface the agents click through ──────────────
frontend:
  url: "https://staging.acme.com"   # reachable HTTPS URL the agents drive
  repo: "github.com/acme/web"       # optional, for deeper context
  auth:                             # how agents sign in
    type: test-account
    user: "bot@acme.com"
    secret: env:MATRAIX_PW          # reference a secret, never paste it
# ── 2. The API behind every click (the I/O contract) ─────
backend:
  api: "https://staging.acme.com/api"
  spec: "openapi.yaml"              # OpenAPI 3 / GraphQL SDL: request + response shapes
  sandbox: "docker-compose.yml"     # or a container image we spin up
  repo: "github.com/acme/api"       # optional source for tracing failures
# ── 3. Enough state to make flows realistic ──────────────
data:
  schema: "schema.sql"              # table / model definitions (.sql, prisma, JSON Schema)
  seed: "anonymized_sample.csv"     # synthetic or anonymized rows (.csv / .json / dump)
  pii: none                         # we require no real PII
# ── 4. What a successful run looks like, end to end ──────
goals:
  - flow: checkout
    entry: "/cart"                  # where the agent starts
    success: order_confirmed        # event/state that counts as a win
    inputs:                         # data the agent supplies, WITH format
      - { field: card, type: string, format: "test PAN, e.g. 4242 4242 4242 4242" }
      - { field: shipping_zip, type: string, format: "5-digit US ZIP" }
    output:                         # what you want captured per attempt
      schema: "order.schema.json"   # JSON Schema of the result we record
      metrics: [conversion rate, latency_ms, error rate]
# ── 5. Who the agents emulate ───────────────────────────
segments: [mobile, new_users, non-native English]
# ── 6. Where the report lands in your pipeline ──────────
report:
  format: [json, html]              # machine-readable + human-readable
  webhook: "https://acme.com/hooks/matraix"  # we POST results here when done
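Before sending the file, a quick local sanity check can catch the obvious slips, such as a pasted password. The Python sketch below is not an official validator. It assumes the file has already been parsed into a dict (for example with PyYAML's yaml.safe_load), and it treats every top-level key in the sample above as required, which is an assumption on our part.

```python
# Top-level keys taken from the sample file; treating all of them as
# required is an assumption, not a documented rule.
REQUIRED_BLOCKS = [
    "project", "contact", "frontend", "backend",
    "data", "goals", "segments", "report",
]


def check_config(cfg):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = [f"missing block: {name}" for name in REQUIRED_BLOCKS if name not in cfg]

    # Secrets should be references such as env:MATRAIX_PW, never literal values.
    secret = cfg.get("frontend", {}).get("auth", {}).get("secret")
    if secret is not None and not str(secret).startswith("env:"):
        problems.append("frontend.auth.secret should be an env: reference, not a literal")

    # The sample declares pii: none; flag anything else when a data block exists.
    if "data" in cfg and cfg["data"].get("pii") != "none":
        problems.append("data.pii should be 'none'")

    # Each goal in the sample names a flow, an entry point, and a success event.
    for i, goal in enumerate(cfg.get("goals", [])):
        for key in ("flow", "entry", "success"):
            if key not in goal:
                problems.append(f"goals[{i}] missing {key}")

    return problems
```

An empty list back from check_config means the file passed these basic checks only; it says nothing about whether the URLs are reachable or the spec is valid.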
From your repo to results.
01 Share
Send your matraix.yaml — frontend URL, backend spec, and an anonymized data seed. That's all we need to start.
02 Sandbox
We stand up an isolated, per-customer sandbox of your stack — network-restricted, no production access, torn down after the run.
03 Simulate
Millions of persona-weighted agents run your real flows end-to-end across 1,162 dimensions — the long tail included.
04 Report
You get a segmented report — friction, lift, and failures by segment — plus full replays and training-ready trajectories.
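To give a feel for how trajectories.jsonl might slot into a pipeline, here is a minimal Python sketch that computes a per-segment conversion rate from JSON Lines records. The record fields used here (segment, success) are hypothetical placeholders, since the actual trajectory layout is not described on this page.

```python
import json
from collections import defaultdict


def conversion_by_segment(lines):
    """Compute the share of successful attempts per segment from JSONL lines."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        seg = record["segment"]   # hypothetical field name
        totals[seg] += 1
        if record["success"]:     # hypothetical field name
            wins[seg] += 1
    return {seg: wins[seg] / totals[seg] for seg in totals}
```

The function takes any iterable of lines, so an open file handle works as well as a list of strings.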
Where it runs, and who can see it.
Q01 Where does our code run?
Q02 Do you need real production data?
Q03 Who can access it, and can we revoke it?
Q04 What if we can't share source code?
Ready when you are.
Send your matraix.yaml or just a staging link — we'll stand up the sandbox and walk through a first run on your flows.