Upload API Reference (`POST /api/v1/upload`)

The canonical endpoint for getting interactions into Voxjar programmatically. One interaction per request. Handles audio, video, chat, email, and SMS, and can optionally queue an AI evaluation in the same call so you never have to chain upload → list scorecards → ai/queue.

This reference is derived directly from the endpoint implementation, so it reflects exactly what the code does today.

Authentication

Send a Bearer token in the Authorization header (an API token, user JWT, or session access token):

Authorization: Bearer YOUR_API_TOKEN

Generate API tokens under Settings → Developers. The account is taken from your token — you never pass an account ID.
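
As a rough illustration of the header described above, the sketch below prepares (but does not send) an authenticated request using only the Python standard library. The helper name `build_authorized_request` is hypothetical, and the host is taken from the curl examples later in this reference.

```python
import urllib.request

# Host and path as used in the curl examples in this reference.
UPLOAD_URL = "https://app.voxjar.com/api/v1/upload"


def build_authorized_request(token: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Prepare a POST to the upload endpoint carrying the Bearer token.

    Hypothetical helper: nothing is sent until the caller passes the
    returned object to urllib.request.urlopen().
    """
    return urllib.request.Request(
        UPLOAD_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
```

No account ID appears anywhere in the request, since the account is resolved from the token.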

Choosing a transport

Content-Type	Use it for
`multipart/form-data`	Uploading a raw audio/video file from disk
`application/json`	Everything else: `downloadUrl`, transcripts, chat, email, SMS

Multipart requests have two parts:

file — the raw audio/video bytes
payload — a JSON string containing all the fields below

Direct file uploads are bounded by the platform request limit (~32MB). For larger media, pass a downloadUrl instead. URL-fetched media is allowed up to 500MB.

Base64-encoded media in the JSON body is not supported. Use multipart or a downloadUrl.
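
To make the two-part multipart layout concrete, here is a minimal sketch that assembles the body by hand with the standard library. The function name, the generic `application/octet-stream` part type, and the exact byte threshold are assumptions for illustration; only the part names `file` and `payload` and the approximate 32MB limit come from this reference.

```python
import json
import uuid

# Approximate direct-upload limit quoted above; the real limit applies to the
# whole request, so this file-only check is a conservative early warning.
MAX_DIRECT_UPLOAD_BYTES = 32 * 1024 * 1024


def encode_multipart(file_bytes: bytes, filename: str, payload: dict) -> tuple:
    """Return (body, content_type) for a request with `file` and `payload` parts."""
    if len(file_bytes) > MAX_DIRECT_UPLOAD_BYTES:
        raise ValueError("file too large for direct upload; send a downloadUrl instead")
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,  # raw media bytes, not base64
        delimiter,
        b'Content-Disposition: form-data; name="payload"',
        b"",
        json.dumps(payload).encode("utf-8"),  # all fields travel as one JSON string
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

In practice an HTTP client library would usually build this body for you; the hand-rolled version is only meant to show what goes in each part.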

Required fields

These three are always required, for every interaction type:

Field	Type	Notes
`sourceIdentifier`	string	Your unique ID for this interaction
`direction`	`"INBOUND"` | `"OUTBOUND"`	Anything else is rejected
`timestamp`	ISO 8601 string	When the interaction occurred — not upload time
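
A client can catch the most common 400s before sending. The sketch below checks the three required fields locally; the function name is hypothetical, and the timestamp check relies on Python's `datetime.fromisoformat`, which accepts only a subset of ISO 8601 on older Python versions, so treat it as a sanity check rather than a mirror of the server's validation.

```python
from datetime import datetime

REQUIRED_FIELDS = ("sourceIdentifier", "direction", "timestamp")
VALID_DIRECTIONS = {"INBOUND", "OUTBOUND"}


def check_required_fields(payload: dict) -> list:
    """Return a list of human-readable problems; empty means the basics look fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            problems.append(f"missing {field}")

    direction = payload.get("direction")
    if direction and direction not in VALID_DIRECTIONS:
        problems.append("direction must be INBOUND or OUTBOUND")

    timestamp = payload.get("timestamp")
    if timestamp:
        try:
            # Map a trailing "Z" to an explicit UTC offset for older Pythons.
            datetime.fromisoformat(str(timestamp).replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    return problems
```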

Interaction type

interactionType may be one of AUDIO, VIDEO, CHAT, EMAIL, SMS. You can usually omit it — the server infers the type:

An email object present → EMAIL
A media file or downloadUrl → AUDIO or VIDEO, auto-detected from the bytes
Transcript segments present → AUDIO
Plain transcript text → SMS if short (≤ 320 chars, single line), otherwise CHAT

If the type can't be determined, the request returns 400 asking you to set it explicitly.

Content each type requires:

CHAT / SMS → must include transcript text
AUDIO / VIDEO → must include a file, a downloadUrl, or a transcript
EMAIL → must include an email object
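
The inference rules above can be approximated on the client to predict what the server will decide. This is a sketch under two assumptions: that the rules are applied in the order listed, and that an explicit `interactionType` always wins. The function name and the `AUDIO_OR_VIDEO` placeholder are hypothetical; the real audio/video split happens server-side from the media bytes.

```python
from typing import Optional

SMS_MAX_CHARS = 320  # "short" threshold from the rules above


def guess_interaction_type(payload: dict, has_file: bool = False) -> Optional[str]:
    """Predict the interaction type, or None when the server would return 400."""
    if payload.get("interactionType"):
        return payload["interactionType"]
    if payload.get("email"):
        return "EMAIL"
    if has_file or payload.get("downloadUrl"):
        # Placeholder value: the server inspects the bytes to pick AUDIO or VIDEO.
        return "AUDIO_OR_VIDEO"
    transcript = payload.get("transcript") or {}
    if transcript.get("segments"):
        return "AUDIO"
    text = transcript.get("text")
    if text:
        is_short_single_line = len(text) <= SMS_MAX_CHARS and "\n" not in text
        return "SMS" if is_short_single_line else "CHAT"
    return None
```

When the guess is `None`, or when a short chat message would otherwise be classified as SMS, setting `interactionType` explicitly avoids the ambiguity.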

Full payload reference

{
  // --- required ---
  "sourceIdentifier": "call-12345",
  "direction": "INBOUND",                // INBOUND | OUTBOUND
  "timestamp": "2026-06-23T14:30:00Z",   // ISO 8601, when it occurred

  // --- optional ---
  "interactionType": "AUDIO",            // omit to auto-detect
  "duration": 125000,                    // ms
  "downloadUrl": "https://...",          // http(s); private/internal hosts are blocked (SSRF)

  "agent":    { "name": "Jane", "email": "jane@example.com", "phone": "+1...", "sourceIdentifier": "agent-7" },
  "customer": { "name": "Bob",  "email": "bob@example.com",  "phone": "+1...", "sourceIdentifier": "cust-9" },

  "tags": ["VIP", "tag-id-or-name"],     // matched by id OR name; must already exist on your account

  // Filterable metadata 
  "customMetadata": {
    "disposition": "sale",               // STRING
    "recordingConsent": true,            // BOOLEAN
    "dealValue": 50000                   // INT (non-integer number → FLOAT)
  },

  "transcript": { /* see below */ },
  "email": { /* see below */ },

  // Auto-queue an AI evaluation once transcription completes:
  "evaluation": { "scorecardId": "ACTIVE-scorecard-id" }
}
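
The `customMetadata` comments in the payload above imply a value-to-type mapping. The following sketch reproduces that mapping on the client so you can see how a value is likely to be classified before uploading. The function name is hypothetical, and treating a whole-number float such as `5.0` as INT is an assumption based on the "non-integer number → FLOAT" note.

```python
def metadata_type(value) -> str:
    """Classify a customMetadata value as STRING, BOOLEAN, INT, or FLOAT."""
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        # Assumption: whole-number floats are treated as INT once serialized.
        return "INT" if value.is_integer() else "FLOAT"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported customMetadata value: {value!r}")
```

Checking types up front helps keep a given metadata key consistent across uploads, which matters if you filter on it later.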

`transcript` object

Provide this when you already have a transcript; the system then skips speech-to-text and marks the transcript COMPLETED. Two forms are accepted, plain text and timed segments, and you can send either or both.

"transcript": {
  "text": "Agent: Hello\nCustomer: I need help",  // plain-text fallback view
  "language": "en",                                // EN/ES hint
  "agentChannel": 0,                               // which channel is the agent
  "segments": [                                    // Whisper format, e.g. OpenAI verbose_json
    {
      "start": 0.0, "end": 1.0,                    // seconds
      "text": "Hello world",                       // used when `words` is absent
      "channel": 0,                                // or "speaker"
      "words": [{ "word": "Hello", "start": 0.0, "end": 0.5 }, { "word": "world", "start": 0.5, "end": 1.0 }]
    }
  ]
}

In-app rendering prefers segments and falls back to the plain text. If you send segments without a top-level text, the plain-text fallback is derived from them automatically.
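
If you want to preview what a plain-text view of your segments might look like, the sketch below flattens segments into text, preferring `words` and using the segment `text` when `words` is absent. How the server actually derives its fallback is not documented here, so the joining rules (spaces between words, one line per segment) are assumptions, as is the function name.

```python
def preview_fallback_text(segments: list) -> str:
    """Flatten Whisper-style segments into one line of text per segment."""
    lines = []
    for segment in segments:
        words = segment.get("words")
        if words:
            # Word-level timing is present: rebuild the line from the words.
            line = " ".join(w["word"].strip() for w in words)
        else:
            # No word list: fall back to the segment's own text.
            line = segment.get("text", "").strip()
        if line:
            lines.append(line)
    return "\n".join(lines)
```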

`email` object

"email": {
  "from": "agent@example.com",
  "to": ["customer@example.com"], "cc": [], "bcc": [],
  "subject": "Re: order",
  "body": "...",
  "threadId": "thread-1",
  "sourceId": "msg-1",
  "timeToReply": 3600,                  // seconds
  "isReply": true
}
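
As an illustration of filling in the reply-related fields, the sketch below computes `timeToReply` in seconds from two ISO 8601 timestamps. Interpreting `timeToReply` as the gap between the message being answered and the reply is an assumption, and both helper names are hypothetical; the address fields are left out for brevity.

```python
from datetime import datetime


def seconds_between(earlier_iso: str, later_iso: str) -> int:
    """Whole seconds from one ISO 8601 timestamp to a later one."""
    earlier = datetime.fromisoformat(earlier_iso.replace("Z", "+00:00"))
    later = datetime.fromisoformat(later_iso.replace("Z", "+00:00"))
    return int((later - earlier).total_seconds())


def build_reply_fields(thread_id: str, source_id: str, received_at: str, replied_at: str) -> dict:
    """Return the reply-related part of an `email` object."""
    return {
        "threadId": thread_id,
        "sourceId": source_id,
        "timeToReply": seconds_between(received_at, replied_at),  # seconds
        "isReply": True,
    }
```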

Billing and credits

Credits are only consumed when work actually runs — i.e. transcription is needed and/or an evaluation is queued. Providing your own transcript on audio/video means no transcription is billed.

Audio/video → billed per minute.
Transcript/text → billed by word count.

If your account lacks the credits, the request returns 402 and no interaction is created.
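
Before uploading in bulk, it can help to estimate which billable work a payload is likely to trigger. The sketch below applies the rules stated above: transcription runs only when media arrives without a transcript, and an evaluation runs only when a scorecard is named. It is a client-side guess with a hypothetical function name, not the server's billing logic.

```python
def expected_work(payload: dict, has_file: bool = False) -> dict:
    """Guess which processing steps (and therefore credits) an upload will trigger."""
    has_media = has_file or bool(payload.get("downloadUrl"))
    transcript = payload.get("transcript") or {}
    has_transcript = bool(transcript.get("text") or transcript.get("segments"))
    evaluation = payload.get("evaluation") or {}
    return {
        # Supplying your own transcript means speech-to-text is skipped.
        "transcription": has_media and not has_transcript,
        "evaluation": bool(evaluation.get("scorecardId")),
    }
```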

Response

{
  "success": true,
  "interactionId": "uuid",
  "interactionType": "AUDIO",
  "statusUrl": "/api/v1/interactions/<id>",   // poll this for transcription/eval results
  "processing": { "transcription": true, "evaluation": false },
  "scoreSubmissionId": null                    // set when an evaluation was queued
}

The success response is immediate — it means the interaction was created and (if applicable) queued. Transcription and evaluation finish asynchronously afterward.
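
A small sketch of reading that response: it resolves the relative `statusUrl` against the host used in the curl examples and lists which asynchronous steps are still pending. The function name and the shape of the returned summary are hypothetical.

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://app.voxjar.com"  # host from the curl examples in this reference


def summarize_upload_response(raw: str) -> dict:
    """Extract the fields a client typically needs after a successful upload."""
    data = json.loads(raw)
    processing = data.get("processing", {})
    return {
        "interaction_id": data["interactionId"],
        "status_url": urljoin(BASE_URL, data["statusUrl"]),
        # Names of the steps that were queued and will finish asynchronously.
        "awaiting": [name for name, queued in processing.items() if queued],
        "score_submission_id": data.get("scoreSubmissionId"),
    }
```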

Getting results back

There are two ways to receive the final transcription/evaluation results:

1. Poll the status URL

GET the statusUrl from the response (/api/v1/interactions/<id>) until processing completes. Simple, but you have to keep checking.
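
A polling loop along these lines might look like the sketch below. The body returned by the status URL is not described in this reference, so the completion check is left to a caller-supplied `is_done` function; `fetch`, `is_done`, the interval, and the attempt budget are all hypothetical parameters.

```python
import time
from typing import Callable


def poll_until(
    fetch: Callable[[], dict],
    is_done: Callable[[dict], bool],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until is_done(status) is true, waiting interval_s between tries."""
    for attempt in range(max_attempts):
        status = fetch()
        if is_done(status):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # no sleep after the final attempt
    raise TimeoutError("processing did not complete within the polling budget")
```

Here `fetch` would wrap an authenticated GET on the status URL and return the parsed JSON. Injecting `sleep` keeps the loop easy to test without real delays.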

2. Have results pushed to a webhook (recommended for evaluations)

Instead of polling, configure a webhook on the scorecard so Voxjar POSTs the completed evaluation to your endpoint as soon as it's scored. This pairs naturally with evaluation.scorecardId on upload: you upload once, and the results arrive at your URL when ready.

Set it up from your Scorecard editor → (your scorecard) → Actions:

Add an Action (at the scorecard level to fire on every completed evaluation, or on a specific question / score threshold / pass-fail result to fire conditionally).
Choose Post to a Webhook as the action type.
Enter your Webhook URL. Optionally customize the JSON payload (you can include the audio and transcript).
Save.

From then on, every evaluation queued via this endpoint against that scorecard will POST its results to your URL when scoring finishes — no polling required. See Scorecard Actions & Webhooks for the full payload and trigger options.

Examples

Upload a recording file and auto-evaluate (multipart)

curl -X POST https://app.voxjar.com/api/v1/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F 'file=@recording.mp3' \
  -F 'payload={
    "sourceIdentifier": "call-12345",
    "direction": "INBOUND",
    "timestamp": "2026-06-23T14:30:00Z",
    "agent": { "name": "Jane", "sourceIdentifier": "agent-7" },
    "customer": { "name": "Bob", "phone": "+15551234" },
    "evaluation": { "scorecardId": "sc_abc123" }
  }'

Large recording via URL (JSON)

curl -X POST https://app.voxjar.com/api/v1/upload \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{
    "sourceIdentifier": "call-67890",
    "direction": "OUTBOUND",
    "timestamp": "2026-06-23T14:30:00Z",
    "downloadUrl": "https://storage.example.com/calls/67890.mp4"
  }'

Error reference

Status	Meaning
`400`	Missing/invalid required field, malformed `payload` JSON, undeterminable type, missing content for the type, or bad/blocked `downloadUrl`
`402`	Not enough credits — interaction not created
`404`	`scorecardId` not found or not `ACTIVE` on your account
`500`	Server error
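
One way to act on these statuses in client code is a simple lookup, sketched below. The meanings come from the table above; the suggested next steps (in particular retrying later on a 500) are assumptions, and the names are hypothetical.

```python
# status -> (meaning from the table above, suggested next step: an assumption)
UPLOAD_ERRORS = {
    400: ("invalid request", "fix the payload or downloadUrl before resending"),
    402: ("not enough credits", "add credits; no interaction was created"),
    404: ("scorecard not found or not ACTIVE", "check evaluation.scorecardId"),
    500: ("server error", "retry later"),
}


def explain_upload_error(status: int) -> str:
    """Turn an HTTP status from the upload endpoint into a short log message."""
    meaning, next_step = UPLOAD_ERRORS.get(
        status, ("undocumented status", "inspect the response body")
    )
    return f"{status}: {meaning}; {next_step}"
```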

Gotchas worth knowing

timestamp is the event time, not now — and it's required.
Tags must already exist. They're connected by id or name, never created.
Participant emails are globally unique. If an email already belongs to a different participant, it's silently dropped from this upload rather than failing the whole request.
downloadUrl blocks private/internal hosts (localhost, link-local, RFC-1918 ranges, .internal) to prevent SSRF — only public http(s) URLs work.
Human-evaluator assignment isn't supported — evaluation only accepts scorecardId for AI evaluations.
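
The downloadUrl restriction above can be pre-checked on the client to catch obvious mistakes before a request is rejected. The sketch below covers only the cases named in the list (non-http schemes, localhost, `.internal`, and literal private, loopback, or link-local IPs). It does not resolve DNS, so a public-looking hostname that points at a private address would pass here yet could still be blocked by the server. The function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def looks_blocked(download_url: str) -> bool:
    """Return True when a downloadUrl is obviously non-public or not http(s)."""
    parsed = urlparse(download_url)
    if parsed.scheme not in ("http", "https"):
        return True
    host = (parsed.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".internal"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than an IP literal; this sketch does not resolve it.
        return False
    return address.is_private or address.is_loopback or address.is_link_local
```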
