
Upload via API

Upload interactions and queue AI evaluations via Voxjar's API

Upload API Reference (POST /api/v1/upload)

The canonical endpoint for getting interactions into Voxjar programmatically. One interaction per request. Handles audio, video, chat, email, and SMS, and can optionally queue an AI evaluation in the same call so you never have to chain upload → list scorecards → ai/queue.

This reference is derived directly from the endpoint implementation, so it reflects exactly what the code does today.

Authentication

Send a Bearer token in the Authorization header (an API token, user JWT, or session access token):

Authorization: Bearer YOUR_API_TOKEN

Generate API tokens under Settings → Developers. The account is taken from your token — you never pass an account ID.

Choosing a transport

Content-Type          Use it for
multipart/form-data   Uploading a raw audio/video file from disk
application/json      Everything else: downloadUrl, transcripts, chat, email, SMS

Multipart requests have two parts:

  • file — the raw audio/video bytes

  • payload — a JSON string containing all the fields below

Direct file uploads are bounded by the platform request limit (~32MB). For larger media, pass a downloadUrl instead. URL-fetched media is allowed up to 500MB.

Base64-encoded media in the JSON body is not supported. Use multipart or a downloadUrl.
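The choice above can be sketched as a small helper. This is a minimal illustration, not part of any Voxjar client: the function name `choose_transport` is hypothetical, and the ~32MB limit is approximated as 32 MiB for the local file alone (the real limit applies to the whole request).

```python
from typing import Optional

# Approximation of the ~32MB platform request limit quoted above (assumption: 32 MiB).
DIRECT_UPLOAD_LIMIT = 32 * 1024 * 1024


def choose_transport(local_file_bytes: Optional[int]) -> str:
    """Pick a Content-Type for an upload.

    local_file_bytes is the size of a local media file, or None when there is
    no local file (downloadUrl, transcript, chat, email, or SMS uploads).
    """
    if local_file_bytes is None:
        return "application/json"
    if local_file_bytes <= DIRECT_UPLOAD_LIMIT:
        return "multipart/form-data"
    # Too big for a direct upload: the file has to be hosted and sent as downloadUrl.
    raise ValueError("file exceeds the direct upload limit; pass a downloadUrl instead")
```

A caller that hits the `ValueError` would host the file somewhere publicly reachable and resend the request as JSON with `downloadUrl` set.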

Required fields

These three are always required, for every interaction type:

Field              Type                     Notes
sourceIdentifier   string                   Your unique ID for this interaction
direction          "INBOUND" | "OUTBOUND"   Anything else is rejected
timestamp          ISO 8601 string          When the interaction occurred (not upload time)
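Checking these three fields client-side before sending can save a round trip. The sketch below is an assumption about what a reasonable pre-flight check looks like; `check_required_fields` is a hypothetical helper, and the server's own validation may be stricter or looser.

```python
from datetime import datetime

VALID_DIRECTIONS = ("INBOUND", "OUTBOUND")


def check_required_fields(payload: dict) -> list[str]:
    """Return a list of problems with the three always-required fields."""
    problems = []

    source_id = payload.get("sourceIdentifier")
    if not isinstance(source_id, str) or not source_id:
        problems.append("sourceIdentifier must be a non-empty string")

    if payload.get("direction") not in VALID_DIRECTIONS:
        problems.append('direction must be "INBOUND" or "OUTBOUND"')

    timestamp = payload.get("timestamp")
    if not isinstance(timestamp, str):
        problems.append("timestamp must be an ISO 8601 string")
    else:
        try:
            # Older Python versions reject a trailing "Z", so normalize it first.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp must be an ISO 8601 string")

    return problems
```

An empty list means the payload passes this local check; it does not guarantee the server will accept it.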

Interaction type

interactionType may be one of AUDIO, VIDEO, CHAT, EMAIL, SMS. You can usually omit it — the server infers the type:

  • An email object present → EMAIL

  • A media file or downloadUrl → AUDIO or VIDEO, auto-detected from the bytes

  • Transcript segments present → AUDIO

  • Plain transcript text → SMS if short (≤ 320 chars, single line), otherwise CHAT

If the type can't be determined, the request returns 400 asking you to set it explicitly.

Content each type requires:

  • CHAT / SMS → must include transcript text

  • AUDIO / VIDEO → must include a file, a downloadUrl, or a transcript

  • EMAIL → provide an email object
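The inference rules can be mirrored client-side to predict what the server will decide. This is a sketch of the rules as listed above, not the server's code: the function name, the `has_media` flag, the `"AUDIO_OR_VIDEO"` marker, and the assumption that the rules are evaluated in the listed order are all illustrative.

```python
from typing import Optional

SMS_MAX_CHARS = 320  # "short" threshold quoted in the rules above


def infer_interaction_type(payload: dict, has_media: bool = False) -> Optional[str]:
    """Predict the interaction type; None means the server would likely return 400."""
    if payload.get("interactionType"):
        return payload["interactionType"]  # an explicit value always wins
    if payload.get("email"):
        return "EMAIL"
    if has_media or payload.get("downloadUrl"):
        # The server inspects the bytes to choose AUDIO or VIDEO; this marker
        # is a placeholder for "one of the two", not a real API value.
        return "AUDIO_OR_VIDEO"
    transcript = payload.get("transcript") or {}
    if transcript.get("segments"):
        return "AUDIO"
    text = transcript.get("text")
    if text:
        is_short = len(text) <= SMS_MAX_CHARS and "\n" not in text
        return "SMS" if is_short else "CHAT"
    return None
```

Because a short single-line chat message would be classified as SMS, setting `interactionType` explicitly is the safer choice whenever the distinction matters.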

Full payload reference

{
// --- required ---
"sourceIdentifier": "call-12345",
"direction": "INBOUND", // INBOUND | OUTBOUND
"timestamp": "2026-06-23T14:30:00Z", // ISO 8601, when it occurred

// --- optional ---
"interactionType": "AUDIO", // omit to auto-detect
"duration": 125000, // ms
"downloadUrl": "https://...", // http(s); private/internal hosts are blocked (SSRF)

"agent": { "name": "Jane", "email": "jane@example.com", "phone": "+1...", "sourceIdentifier": "agent-7" },
"customer": { "name": "Bob", "email": "bob@example.com", "phone": "+1...", "sourceIdentifier": "cust-9" },

"tags": ["VIP", "tag-id-or-name"], // matched by id OR name; must already exist on your account

// Filterable metadata
"customMetadata": {
"disposition": "sale", // STRING
"recordingConsent": true, // BOOLEAN
"dealValue": 50000 // INT (non-integer number → FLOAT)
},

"transcript": { /* see below */ },
"email": { /* see below */ },

// Auto-queue an AI evaluation once transcription completes:
"evaluation": { "scorecardId": "ACTIVE-scorecard-id" }
}
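The customMetadata comments in the payload imply a value-to-type mapping. The sketch below shows one way to preview it locally; `metadata_type` is a hypothetical helper, and treating a whole-valued float such as 2.0 as INT is an assumption, since the text only says that non-integer numbers become FLOAT.

```python
def metadata_type(value) -> str:
    """Guess the filterable type a customMetadata value will be stored as."""
    # bool must be tested before int: in Python, True and False are ints too.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        # Assumption: whole-valued floats count as integers.
        return "INT" if value.is_integer() else "FLOAT"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported customMetadata value: {value!r}")


custom_metadata = {"disposition": "sale", "recordingConsent": True, "dealValue": 50000}
predicted = {key: metadata_type(value) for key, value in custom_metadata.items()}
```

Nested objects and arrays raise `TypeError` here because the payload reference only shows scalar values.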

transcript object

Provide this when you already have a transcript — the system then skips speech-to-text and marks the transcript COMPLETED. Both forms below are accepted; you can send either or both.

"transcript": {
"text": "Agent: Hello\nCustomer: I need help", // plain-text fallback view
"language": "en", // EN/ES hint
"agentChannel": 0, // which channel is the agent
"segments": [ // Whisper format, e.g. OpenAI verbose_json
{
"start": 0.0, "end": 1.0, // seconds
"text": "Hello world", // used when `words` is absent
"channel": 0, // or "speaker"
"words": [{ "word": "Hello", "start": 0.0, "end": 0.5 }, { "word": "world", "start": 0.5, "end": 1.0 }]
}
]
}

Render priority in-app is segments first, then text. If you send segments without a top-level text, the plain-text fallback is derived from them automatically.
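How the fallback text is derived is not documented here. The sketch below shows one plausible derivation, assuming each segment becomes one line built from `words` when present and from the segment's `text` otherwise; `text_from_segments` is a hypothetical name, and the server's actual output may differ.

```python
def text_from_segments(segments: list[dict]) -> str:
    """Build a plain-text view from Whisper-style segments (illustrative only)."""
    lines = []
    for segment in segments:
        words = segment.get("words")
        if words:
            # Prefer word-level entries, mirroring "text is used when words is absent".
            line = " ".join(w["word"].strip() for w in words)
        else:
            line = segment.get("text", "").strip()
        if line:
            lines.append(line)
    return "\n".join(lines)
```

If the exact fallback wording matters to you, sending your own top-level `text` avoids relying on the derived version.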

email object

"email": {
"from": "jane@example.com",
"to": ["bob@example.com"], "cc": [], "bcc": [],
"subject": "Re: order",
"body": "...",
"threadId": "thread-1",
"sourceId": "msg-1",
"timeToReply": 3600, // seconds
"isReply": true
}

Billing and credits

Credits are only consumed when work actually runs — i.e. transcription is needed and/or an evaluation is queued. Providing your own transcript on audio/video means no transcription is billed.

  • Audio/video → billed per minute.

  • Transcript/text → billed by word count.

If your account lacks the credits, the request returns 402 and no interaction is created.

Response

{
"success": true,
"interactionId": "uuid",
"interactionType": "AUDIO",
"statusUrl": "/api/v1/interactions/<id>", // poll this for transcription/eval results
"processing": { "transcription": true, "evaluation": false },
"scoreSubmissionId": null // set when an evaluation was queued
}

The success response is immediate — it means the interaction was created and (if applicable) queued. Transcription and evaluation finish asynchronously afterward.

Getting results back

There are two ways to receive the final transcription/evaluation results:

1. Poll the status URL

GET the statusUrl from the response (/api/v1/interactions/<id>) until processing completes. Simple, but you have to keep checking.

2. Have results pushed to a webhook (recommended for evaluations)

Instead of polling, configure a webhook on the scorecard so Voxjar POSTs the completed evaluation to your endpoint as soon as it's scored. This pairs naturally with evaluation.scorecardId on upload: you upload once, and the results arrive at your URL when ready.

Set it up from your Scorecard editor → (your scorecard) → Actions:

  1. Add an Action (at the scorecard level to fire on every completed evaluation, or on a specific question / score threshold / pass-fail result to fire conditionally).

  2. Choose Post to a Webhook as the action type.

  3. Enter your Webhook URL. Optionally customize the JSON payload (you can include the audio and transcript).

  4. Save.

From then on, every evaluation queued via this endpoint against that scorecard will POST its results to your URL when scoring finishes — no polling required. See Scorecard Actions & Webhooks for the full payload and trigger options.
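For the polling option, a bounded loop avoids checking forever. The sketch below is deliberately generic: the shape of the status response is not described in this article, so both the fetch function and the "is it finished?" test are supplied by the caller. The name `poll_until_done` and its defaults are illustrative.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until is_done(status) is true or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_done(status):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait between checks, but not after the last one
    raise TimeoutError(f"processing not finished after {max_attempts} checks")
```

Here `fetch_status` would wrap an authenticated GET of the `statusUrl`, and `is_done` would inspect whichever fields of that response indicate completion.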

Examples

Upload a recording file and auto-evaluate (multipart)

curl -X POST https://app.voxjar.com/api/v1/upload \
-H "Authorization: Bearer $TOKEN" \
-F 'file=@recording.mp3' \
-F 'payload={
"sourceIdentifier": "call-12345",
"direction": "INBOUND",
"timestamp": "2026-06-23T14:30:00Z",
"agent": { "name": "Jane", "sourceIdentifier": "agent-7" },
"customer": { "name": "Bob", "phone": "+15551234" },
"evaluation": { "scorecardId": "sc_abc123" }
}'

Large recording via URL (JSON)

curl -X POST https://app.voxjar.com/api/v1/upload \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{
"sourceIdentifier": "call-67890",
"direction": "OUTBOUND",
"timestamp": "2026-06-23T14:30:00Z",
"downloadUrl": "https://storage.example.com/calls/67890.mp4"
}'
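The JSON request above translates to Python's standard library roughly as follows. This is a sketch rather than an official client: `build_json_upload` is a hypothetical helper, and it only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

UPLOAD_URL = "https://app.voxjar.com/api/v1/upload"  # taken from the curl examples


def build_json_upload(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON upload request."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_json_upload("YOUR_API_TOKEN", {
    "sourceIdentifier": "call-67890",
    "direction": "OUTBOUND",
    "timestamp": "2026-06-23T14:30:00Z",
    "downloadUrl": "https://storage.example.com/calls/67890.mp4",
})

# To actually send it:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so real code would catch that exception to read the error status.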

Error reference

Status   Meaning
400      Missing/invalid required field, malformed payload JSON, undeterminable type, missing content for the type, or bad/blocked downloadUrl
402      Not enough credits; interaction not created
404      scorecardId not found or not ACTIVE on your account
500      Server error
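A client can map these statuses to a next step. In the sketch below, the status meanings come from the table above, while the action labels and the idea that only a 500 is worth retrying unchanged are assumptions, not documented behavior.

```python
def classify_upload_status(status: int) -> str:
    """Translate an HTTP status from the upload endpoint into a suggested action."""
    if 200 <= status < 300:
        return "ok"
    actions = {
        400: "fix_request",      # bad field, malformed payload, or blocked downloadUrl
        402: "add_credits",      # nothing was created; top up and resend
        404: "check_scorecard",  # scorecardId missing or not ACTIVE
        500: "retry_later",      # assumption: a server error may be transient
    }
    return actions.get(status, "unexpected")
```

Retrying a 400, 402, or 404 without changing anything is unlikely to help, since each reflects the request or the account state rather than a transient failure.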

Gotchas worth knowing

  • timestamp is the event time, not now — and it's required.

  • Tags must already exist. They're connected by id or name, never created.

  • Participant emails are globally unique. If an email already belongs to a different participant, it's silently dropped from this upload rather than failing the whole request.

  • downloadUrl blocks private/internal hosts (localhost, link-local, RFC-1918 ranges, .internal) to prevent SSRF — only public http(s) URLs work.

  • Human-evaluator assignment isn't supported: evaluation only accepts scorecardId, for AI evaluations.
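The downloadUrl restriction can be approximated with a local pre-check before uploading. This sketch only covers the host categories named in the list above and does not resolve DNS names, so a hostname that points at a private address would still pass here and be rejected by the server. `looks_publicly_fetchable` is a hypothetical helper, not Voxjar's actual check.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_fetchable(url: str) -> bool:
    """Rough client-side guess at whether a downloadUrl would be accepted."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".internal"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; this sketch does not look it up
    # Covers loopback, link-local, and the RFC 1918 private ranges.
    return not (address.is_private or address.is_loopback or address.is_link_local)
```

A `False` result is a strong hint the upload would fail with 400; a `True` result is only a hint that it might succeed.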
