Upload API Reference (POST /api/v1/upload)
The canonical endpoint for getting interactions into Voxjar programmatically. One interaction per request. Handles audio, video, chat, email, and SMS, and can optionally queue an AI evaluation in the same call so you never have to chain upload → list scorecards → ai/queue.
This reference is derived directly from the endpoint implementation, so it reflects exactly what the code does today.
Authentication
Send a Bearer token in the Authorization header (an API token, user JWT, or session access token):
Authorization: Bearer YOUR_API_TOKEN
Generate API tokens under Settings → Developers. The account is taken from your token — you never pass an account ID.
Choosing a transport
Content-Type | Use it for |
multipart/form-data | Uploading a raw audio/video file from disk |
application/json | Everything else |
Multipart requests have two parts:
file: the raw audio/video bytes
payload: a JSON string containing all the fields below
Direct file uploads are bounded by the platform request limit (~32MB). For larger media, pass a downloadUrl instead. URL-fetched media is allowed up to 500MB.
Base64-encoded media in the JSON body is not supported. Use multipart or a downloadUrl.
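The two-part multipart body described above can be sketched with the Python standard library. This is an illustration only: the function name, the generated boundary, and the exact byte value used for the ~32MB limit are assumptions, and the part names file and payload are taken from the description above.

```python
import json
import uuid

# Approximate direct-upload ceiling from the text above (~32MB).
# The exact byte value is an assumption.
DIRECT_UPLOAD_LIMIT = 32 * 1024 * 1024


def build_multipart(file_bytes: bytes, filename: str, payload: dict) -> tuple[bytes, str]:
    """Return (body, content_type) for a two-part multipart/form-data request."""
    if len(file_bytes) > DIRECT_UPLOAD_LIMIT:
        raise ValueError("file too large for direct upload; send a downloadUrl instead")
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="payload"',
        b"",
        json.dumps(payload).encode("utf-8"),  # payload travels as a JSON string
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,  # raw media bytes, not base64
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type string goes in the Content-Type header and the body is sent as-is. For files above the limit, the sketch raises so the caller can switch to a JSON request with downloadUrl.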
Required fields
These three are always required, for every interaction type:
Field | Type | Notes |
sourceIdentifier | string | Your unique ID for this interaction |
direction | INBOUND or OUTBOUND | Anything else is rejected |
timestamp | ISO 8601 string | When the interaction occurred, not upload time |
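A client-side check for the three required fields might look like the sketch below. The function name and error strings are hypothetical, and Python's ISO 8601 parser may not accept exactly the same set of formats as the server, so treat a clean result as a hint rather than a guarantee.

```python
from datetime import datetime

REQUIRED_FIELDS = ("sourceIdentifier", "direction", "timestamp")
DIRECTIONS = {"INBOUND", "OUTBOUND"}


def check_required(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields look valid."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not payload.get(name)]
    direction = payload.get("direction")
    if direction and direction not in DIRECTIONS:
        problems.append(f"direction must be INBOUND or OUTBOUND, got {direction!r}")
    timestamp = payload.get("timestamp")
    if timestamp:
        try:
            # Older Python versions reject a trailing "Z", so normalise it first.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {timestamp!r}")
    return problems
```

Running this before the request avoids a round trip that would otherwise end in a 400.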
Interaction type
interactionType may be one of AUDIO, VIDEO, CHAT, EMAIL, SMS. You can usually omit it — the server infers the type:
An email object present → EMAIL
A media file or downloadUrl → AUDIO or VIDEO, auto-detected from the bytes
Transcript segments present → AUDIO
Plain transcript text → SMS if short (≤ 320 chars, single line), otherwise CHAT
If the type can't be determined, the request returns 400 asking you to set it explicitly.
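The inference rules above can be mirrored client-side to predict which type an upload will get. This is a sketch under assumptions: the rules are applied in the order listed (the actual precedence is not stated), the AUDIO_OR_VIDEO return value is a made-up placeholder because the server decides from the media bytes, and the way length and line breaks are measured may differ from the server.

```python
from typing import Optional


def infer_interaction_type(payload: dict, has_file: bool = False) -> Optional[str]:
    """Predict the interaction type; None means the server would likely return 400."""
    if payload.get("interactionType"):
        return payload["interactionType"]  # an explicit value always wins
    if payload.get("email"):
        return "EMAIL"
    if has_file or payload.get("downloadUrl"):
        return "AUDIO_OR_VIDEO"  # placeholder: detected server-side from the bytes
    transcript = payload.get("transcript") or {}
    if transcript.get("segments"):
        return "AUDIO"
    text = transcript.get("text")
    if text:
        single_line = "\n" not in text.strip()
        return "SMS" if len(text) <= 320 and single_line else "CHAT"
    return None
```

When the prediction is not the type you want (for example, a one-line chat message that would be classified as SMS), set interactionType explicitly.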
Content each type requires:
CHAT/SMS → must include transcript text
AUDIO/VIDEO → must include a file, a downloadUrl, or a transcript
EMAIL → provide an email object
Full payload reference
{
// --- required ---
"sourceIdentifier": "call-12345",
"direction": "INBOUND", // INBOUND | OUTBOUND
"timestamp": "2026-06-23T14:30:00Z", // ISO 8601, when it occurred
// --- optional ---
"interactionType": "AUDIO", // omit to auto-detect
"duration": 125000, // ms
"downloadUrl": "https://...", // http(s); private/internal hosts are blocked (SSRF)
"agent": { "name": "Jane", "email": "jane@example.com", "phone": "+1...", "sourceIdentifier": "agent-7" },
"customer": { "name": "Bob", "email": "bob@example.com", "phone": "+1...", "sourceIdentifier": "cust-9" },
"tags": ["VIP", "tag-id-or-name"], // matched by id OR name; must already exist on your account
// Filterable metadata
"customMetadata": {
"disposition": "sale", // STRING
"recordingConsent": true, // BOOLEAN
"dealValue": 50000 // INT (non-integer number → FLOAT)
},
"transcript": { /* see below */ },
"email": { /* see below */ },
// Auto-queue an AI evaluation once transcription completes:
"evaluation": { "scorecardId": "ACTIVE-scorecard-id" }
}
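The customMetadata comments in the payload above imply a mapping from JSON value to filter type. A small sketch of that mapping for Python values follows. The function name is hypothetical, rejecting other value kinds is an assumption (only the four types shown are documented here), and treating a whole-valued float such as 2.0 as INT is a reading of the "non-integer number → FLOAT" note rather than confirmed behavior.

```python
def metadata_type(value) -> str:
    """Name the customMetadata type a Python value would map to."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        # Assumption: only numbers with a fractional part become FLOAT.
        return "INT" if value.is_integer() else "FLOAT"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported customMetadata value: {value!r}")
```

Checking values this way before upload helps keep a given metadata key on one type across interactions.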
transcript object
Provide this when you already have a transcript — the system then skips speech-to-text and marks the transcript COMPLETED. Both forms below are accepted; you can send either or both.
"transcript": {
"text": "Agent: Hello\nCustomer: I need help", // plain-text fallback view
"language": "en", // EN/ES hint
"agentChannel": 0, // which channel is the agent
"segments": [ // Whisper format, e.g. OpenAI verbose_json
{
"start": 0.0, "end": 1.0, // seconds
"text": "Hello world", // used when `words` is absent
"channel": 0, // or "speaker"
"words": [{ "word": "Hello", "start": 0.0, "end": 0.5 }, { "word": "world", "start": 0.5, "end": 1.0 }]
}
]
}
Render priority in-app is segments → fullText. If you send segments without a top-level text, the plain-text fallback is derived from them automatically.
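If you prefer to send an explicit text alongside segments, it can be built from the segments yourself. The sketch below is a client-side illustration, not the server's derivation (which is not described here): it prefers word-level entries, falls back to each segment's text, and puts one segment per line. The function name and the one-line-per-segment layout are assumptions.

```python
def segments_to_text(segments: list[dict]) -> str:
    """Join segment text into one plain-text string, one segment per line."""
    lines = []
    for seg in segments:
        words = seg.get("words")
        if words:
            # Prefer word-level entries when they are present.
            line = " ".join(w["word"].strip() for w in words)
        else:
            # Otherwise fall back to the segment's own text.
            line = seg.get("text", "").strip()
        if line:
            lines.append(line)
    return "\n".join(lines)
```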
email object
"email": {
"from": "sender@example.com",
"to": ["recipient@example.com"], "cc": [], "bcc": [],
"subject": "Re: order",
"body": "...",
"threadId": "thread-1",
"sourceId": "msg-1",
"timeToReply": 3600, // seconds
"isReply": true
}
Billing and credits
Credits are only consumed when work actually runs — i.e. transcription is needed and/or an evaluation is queued. Providing your own transcript on audio/video means no transcription is billed.
Audio/video → billed per minute.
Transcript/text → billed by word count.
If your account lacks the credits, the request returns 402 and no interaction is created.
Response
{
"success": true,
"interactionId": "uuid",
"interactionType": "AUDIO",
"statusUrl": "/api/v1/interactions/<id>", // poll this for transcription/eval results
"processing": { "transcription": true, "evaluation": false },
"scoreSubmissionId": null // set when an evaluation was queued
}
The success response is immediate — it means the interaction was created and (if applicable) queued. Transcription and evaluation finish asynchronously afterward.
Getting results back
There are two ways to receive the final transcription/evaluation results:
1. Poll the status URL
GET the statusUrl from the response (/api/v1/interactions/<id>) until processing completes. Simple, but you have to keep checking.
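A polling loop with exponential backoff is one way to do this without hammering the API. The sketch below is transport-agnostic: fetch is any callable that GETs the status URL and returns the decoded JSON, and is_done is a predicate you supply, since the shape of the status response is not described here. All names and default intervals are assumptions.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until is_done(result) is true, doubling the wait each time."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("interaction still processing after timeout")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # cap the backoff
```

The sleep parameter exists so the loop can be exercised in tests without real waiting.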
2. Have results pushed to a webhook (recommended for evaluations)
Instead of polling, configure a webhook on the scorecard so Voxjar POSTs the completed evaluation to your endpoint as soon as it's scored. This pairs naturally with evaluation.scorecardId on upload: you upload once, and the results arrive at your URL when ready.
Set it up from your Scorecard editor → (your scorecard) → Actions:
Add an Action (at the scorecard level to fire on every completed evaluation, or on a specific question / score threshold / pass-fail result to fire conditionally).
Choose Post to a Webhook as the action type.
Enter your Webhook URL. Optionally customize the JSON payload (you can include the audio and transcript).
Save.
From then on, every evaluation queued via this endpoint against that scorecard will POST its results to your URL when scoring finishes — no polling required. See Scorecard Actions & Webhooks for the full payload and trigger options.
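On the receiving side, a webhook endpoint only needs to accept a JSON POST and answer quickly. The following is a minimal sketch using Python's built-in HTTP server. It assumes nothing about the payload fields, since the payload is customizable and documented elsewhere; the handler name, port, and response codes are illustrative choices, and a production endpoint would sit behind a proper web framework with HTTPS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a webhook POST body; raises ValueError unless it is a JSON object."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_webhook_body(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # unreadable body
            self.end_headers()
            return
        # Hand the event to your own processing here; this sketch only logs its keys.
        print("evaluation received, keys:", sorted(event))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block and serve webhook requests on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```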
Examples
Upload a recording file and auto-evaluate (multipart)
curl -X POST https://app.voxjar.com/api/v1/upload \
-H "Authorization: Bearer $TOKEN" \
-F 'file=@/path/to/recording.mp3' \
-F 'payload={
"sourceIdentifier": "call-12345",
"direction": "INBOUND",
"timestamp": "2026-06-23T14:30:00Z",
"agent": { "name": "Jane", "sourceIdentifier": "agent-7" },
"customer": { "name": "Bob", "phone": "+15551234" },
"evaluation": { "scorecardId": "sc_abc123" }
}'
Large recording via URL (JSON)
curl -X POST https://app.voxjar.com/api/v1/upload \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{
"sourceIdentifier": "call-67890",
"direction": "OUTBOUND",
"timestamp": "2026-06-23T14:30:00Z",
"downloadUrl": "https://storage.example.com/calls/67890.mp4"
}'
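The JSON request above translates directly to Python's standard library. This sketch uses the endpoint URL and headers from the curl example; the function names are hypothetical, and splitting request construction from sending is only a convenience for testing.

```python
import json
import urllib.request

UPLOAD_URL = "https://app.voxjar.com/api/v1/upload"


def build_upload_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON upload request."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def upload(token: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_upload_request(token, payload)) as resp:
        return json.load(resp)
```

Note that urlopen raises urllib.error.HTTPError for non-2xx responses, so a 400 or 402 surfaces as an exception whose code attribute holds the status.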
Error reference
Status | Meaning |
400 | Missing/invalid required field, malformed request |
402 | Not enough credits; interaction not created |
500 | Server error |
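A client can turn these statuses into a coarse next action. The sketch below covers only the cases this reference states (400 for invalid requests, 402 for insufficient credits) plus server errors; the action names are made up for illustration, and retrying on a server error is a common convention rather than documented behavior.

```python
def next_step(status: int) -> str:
    """Map an upload response status to a coarse client action."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "fix-request"  # invalid or missing field; resending unchanged will not help
    if status == 402:
        return "add-credits"  # nothing was created, so the upload can be resent later
    if status >= 500:
        return "retry"        # server-side failure; retry with backoff (assumption)
    return "inspect"          # any status not covered above
```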
Gotchas worth knowing
timestamp is the event time, not now, and it's required.
Tags must already exist. They're connected by id or name, never created.
Participant emails are globally unique. If an email already belongs to a different participant, it's silently dropped from this upload rather than failing the whole request.
downloadUrl blocks private/internal hosts (localhost, link-local, RFC-1918 ranges, .internal) to prevent SSRF. Only public http(s) URLs work.
Human-evaluator assignment isn't supported. evaluation only accepts scorecardId for AI evaluations.
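The downloadUrl restriction can be approximated before sending a request. The sketch below is a best-effort guess, not the server's actual check: it flags non-http(s) schemes, localhost, .internal names, and IP literals in private, loopback, or link-local ranges. It does not resolve DNS, so a public-looking hostname that points to a private address would pass here and may still be rejected. The function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def looks_blocked(url: str) -> bool:
    """Best-effort guess at whether a downloadUrl would be rejected as non-public."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or not host:
        return True
    if host == "localhost" or host.endswith(".internal"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; it is not resolved here
    return ip.is_private or ip.is_loopback or ip.is_link_local
```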