File Management Guide

This guide covers uploading, processing, and managing files in Satori enclaves.

Supported File Types

Satori supports a wide variety of file types:

Documents

  • PDF: application/pdf
  • Text: text/plain, text/csv, text/tsv
  • Word: .docx, .doc
  • Excel: .xlsx, .xls
  • PowerPoint: .pptx, .ppt
  • OpenDocument: .odt, .ods, .odp
  • Other: JSON, XML, RTF

Images

  • JPEG, PNG, GIF, WebP, SVG, TIFF, BMP

Video (with transcription)

  • MP4, MPEG, AVI, MOV, WMV, WebM, MKV, FLV

Audio (with transcription)

  • MP3, WAV, OGG, M4A, AAC, MIDI

Archives

  • ZIP, RAR, 7Z, TAR, GZIP

Uploading Files

Basic Upload

curl -X POST "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -F "file=@/path/to/document.pdf"

Upload with Metadata

curl -X POST "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -F "file=@document.pdf" \
  -F 'metadata={"author": "John Doe", "category": "research", "date": "2025-01-15"}'

Upload with Webhook

curl -X POST "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -F "file=@document.pdf" \
  -F "webhook_url=https://your-server.com/webhook/file-processed"

Python Example

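import json

import requests

# BASE_URL, JWT_TOKEN, and tenant_id are placeholders defined elsewhere in your code.

def upload_file(file_path, enclave_id, metadata=None, webhook_url=None):
    url = f"{BASE_URL}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/"
    headers = {"Authorization": f"Bearer {JWT_TOKEN}"}
    data = {}

    if metadata:
        # Metadata is sent as a JSON string in a form field.
        data["metadata"] = json.dumps(metadata)
    if webhook_url:
        data["webhook_url"] = webhook_url

    # Open the file in a context manager so the handle is always closed.
    with open(file_path, "rb") as f:
        response = requests.post(url, headers=headers, files={"file": f}, data=data)
    response.raise_for_status()  # Surface 4xx/5xx errors instead of parsing an error body
    return response.json()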

# Usage
file_info = upload_file(
    "document.pdf",
    enclave_id,
    metadata={"author": "John Doe", "category": "research"},
    webhook_url="https://myapp.com/webhook"
)
print(f"File uploaded: {file_info['id']}, Status: {file_info['status']}")

JavaScript/TypeScript Example

// Assumes tenantId and token are defined in the enclosing scope.
async function uploadFile(
  file: File,
  enclaveId: string,
  metadata?: Record<string, any>,
  webhookUrl?: string
) {
  const formData = new FormData();
  formData.append("file", file);

  if (metadata) {
    formData.append("metadata", JSON.stringify(metadata));
  }
  if (webhookUrl) {
    formData.append("webhook_url", webhookUrl);
  }

  const response = await fetch(
    `/api/tenants/${tenantId}/enclaves/${enclaveId}/files/`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
      },
      body: formData,
    }
  );

  return await response.json();
}

File Processing Pipeline

Each file reports one of the following statuses. Statuses 1 to 6 occur in order; failed is the terminal status a file ends in when processing does not complete:

  1. pending → File uploaded, queued for processing
  2. processing → Content extraction in progress
  3. clearing_artifacts → Cleaning up temporary files
  4. building_artifacts → Creating vector embeddings
  5. classifying → AI classification (optional)
  6. ready → File ready for queries
  7. failed → Processing failed (check logs)
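
Client code can model the statuses above as a small enumeration. This is a minimal sketch: the FileStatus class and the is_terminal helper are hypothetical names, not part of the Satori API, and the status strings are copied from the list above.

```python
from enum import Enum


class FileStatus(str, Enum):
    """Status strings as listed in this guide."""
    PENDING = "pending"
    PROCESSING = "processing"
    CLEARING_ARTIFACTS = "clearing_artifacts"
    BUILDING_ARTIFACTS = "building_artifacts"
    CLASSIFYING = "classifying"
    READY = "ready"
    FAILED = "failed"


# Statuses after which no further processing is expected.
TERMINAL_STATUSES = {FileStatus.READY, FileStatus.FAILED}


def is_terminal(status: str) -> bool:
    """Return True once polling can stop; raises ValueError for unknown strings."""
    return FileStatus(status) in TERMINAL_STATUSES
```

Rejecting unknown strings with ValueError is a design choice for this sketch; a more lenient client might treat unknown statuses as non-terminal instead.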

Processing Times

  • Small PDFs (< 10MB): 30-60 seconds
  • Large PDFs (> 100MB): 2-5 minutes
  • Videos: 1-10 minutes (depends on length)
  • Audio: 30 seconds - 3 minutes
  • Images: 10-30 seconds
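
These ranges can be used to pick a polling timeout. The sketch below is an illustration only: the suggested_max_wait helper, the category keys, and the default safety factor of 2 are assumptions, and the upper bounds are taken from the list above.

```python
# Upper bounds in seconds, taken from the ranges listed above.
TYPICAL_MAX_SECONDS = {
    "small_pdf": 60,
    "large_pdf": 300,
    "video": 600,
    "audio": 180,
    "image": 30,
}


def suggested_max_wait(kind: str, safety_factor: float = 2.0) -> int:
    """Return a polling timeout: the listed upper bound times a safety factor."""
    if kind not in TYPICAL_MAX_SECONDS:
        raise KeyError(f"Unknown file kind: {kind!r}")
    return int(TYPICAL_MAX_SECONDS[kind] * safety_factor)
```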

Monitoring File Status

Check Single File Status

curl -X GET "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>"

List All Files

curl -X GET "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>"

Polling for Ready Status

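import time

import requests

# BASE_URL, JWT_TOKEN, and tenant_id are placeholders defined elsewhere in your code.

def wait_for_file_ready(file_id, enclave_id, max_wait=300, poll_interval=5):
    """Wait for file to be ready, with timeout."""
    # Same endpoint as "Check Single File Status" above.
    url = f"{BASE_URL}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}"
    headers = {"Authorization": f"Bearer {JWT_TOKEN}"}
    start_time = time.monotonic()  # Monotonic clock is unaffected by system time changes

    while time.monotonic() - start_time < max_wait:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        file = response.json()

        if file["status"] == "ready":
            return file
        elif file["status"] == "failed":
            raise RuntimeError(f"File processing failed: {file_id}")

        time.sleep(poll_interval)

    raise TimeoutError(f"File not ready within {max_wait} seconds")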

Webhooks

Webhooks notify your server when file processing completes.

Webhook Payload

{
  "event": "file.status_changed",
  "file_id": "850e8400-e29b-41d4-a716-446655440000",
  "status": "ready",
  "tenant_id": "550e8400-e29b-41d4-a716-446655440000",
  "enclave_id": "750e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-01-15T10:05:30Z",
  "metadata": {
    "file_name": "contract.pdf",
    "size_bytes": 245000
  }
}
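
A receiver may want to validate the payload before acting on it. The following sketch parses the JSON shown above into a typed object; the FileStatusEvent class and parse_event function are hypothetical names, and treating every top-level field except metadata as required is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

REQUIRED_FIELDS = ("event", "file_id", "status", "tenant_id", "enclave_id", "timestamp")


@dataclass
class FileStatusEvent:
    event: str
    file_id: str
    status: str
    tenant_id: str
    enclave_id: str
    timestamp: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_event(raw: str) -> FileStatusEvent:
    """Parse a webhook body, raising ValueError if a required field is missing."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"Webhook payload missing fields: {missing}")
    return FileStatusEvent(
        **{key: data[key] for key in REQUIRED_FIELDS},
        metadata=data.get("metadata") or {},
    )
```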

Webhook Implementation

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook/file-processed")
async def handle_file_webhook(request: Request):
    payload = await request.json()

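    # process_ready_file and handle_failed_upload are your own coroutines;
    # they are not provided by Satori or FastAPI.
    if payload["status"] == "ready":
        file_id = payload["file_id"]
        # File is ready - start querying
        await process_ready_file(file_id)
    elif payload["status"] == "failed":
        # Handle failure
        await handle_failed_upload(payload["file_id"])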

    return {"status": "received"}

Webhook Requirements

  • HTTPS only: Webhook URLs must use HTTPS
  • Retry logic: Satori retries failed webhooks (3 attempts with exponential backoff)
  • Response: Your endpoint should return 200 OK
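
Because of the HTTPS requirement, it can help to check a webhook URL on the client before uploading. This is a minimal sketch; is_valid_webhook_url is a hypothetical helper, and the server may apply stricter validation than this check.

```python
from urllib.parse import urlparse


def is_valid_webhook_url(url: str) -> bool:
    """Return True if the URL uses the https scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```

For example, is_valid_webhook_url("http://your-server.com/webhook") returns False, so the problem is caught locally rather than as a 400 response.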

File Metadata

Adding Metadata

Metadata is stored as JSON and can include any key-value pairs:

metadata = {
    "author": "John Doe",
    "date": "2025-01-15",
    "category": "research",
    "department": "engineering",
    "project": "project-alpha",
    "version": "1.0",
    "tags": ["important", "reviewed"]
}

Best Practices

  • Keep under 10KB: Large metadata can slow processing
  • Use searchable fields: Include fields you might want to filter by
  • Consistent structure: Use the same fields across similar files
  • Include timestamps: Track when files were created/uploaded
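
The size guideline can be enforced before upload. The sketch below assumes that 10KB means 10 × 1024 bytes and that size is measured on the UTF-8 encoded JSON string; check_metadata is a hypothetical helper, not part of the Satori API.

```python
import json

MAX_METADATA_BYTES = 10 * 1024  # Assumed interpretation of the 10KB guideline


def check_metadata(metadata: dict) -> int:
    """Return the serialized size in bytes, raising ValueError if over the limit."""
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError(
            f"Metadata is {size} bytes; keep it under {MAX_METADATA_BYTES} bytes"
        )
    return size
```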

Retrieving Metadata

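# BASE_URL, JWT_TOKEN, tenant_id, and enclave_id are placeholders defined elsewhere in your code.
response = requests.get(
    f"{BASE_URL}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}",
    headers={"Authorization": f"Bearer {JWT_TOKEN}"}
)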
file = response.json()
metadata = file.get("file_meta", {})
print(f"Author: {metadata.get('author')}")

File Limits

Size Limits

  • Maximum file size: 512MB
  • Recommended: Keep files under 100MB for faster processing
  • Large files: Consider splitting into multiple files
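
A pre-upload check against these limits might look like the sketch below. It assumes 1MB = 1024 × 1024 bytes; classify_size and classify_file are hypothetical helpers, and the labels they return are arbitrary.

```python
import os

MAX_UPLOAD_BYTES = 512 * 1024 * 1024   # Hard limit from this guide
RECOMMENDED_BYTES = 100 * 1024 * 1024  # Recommended ceiling for faster processing


def classify_size(size_bytes: int) -> str:
    """Return 'too_large', 'large', or 'ok' for a size in bytes."""
    if size_bytes > MAX_UPLOAD_BYTES:
        return "too_large"
    if size_bytes > RECOMMENDED_BYTES:
        return "large"
    return "ok"


def classify_file(path: str) -> str:
    """Classify a file on disk by its size."""
    return classify_size(os.path.getsize(path))
```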

Handling Large Files

import os

def split_large_pdf(file_path, max_size_mb=100):
    """Split large PDF into smaller chunks (splitting step left as a stub)."""
    file_size_mb = os.path.getsize(file_path) / (1024 * 1024)

    if file_size_mb > max_size_mb:
        # Use a PDF splitting library of your choice to write the chunks,
        # then upload each chunk separately.
        pass

Getting Transcripts

For video and audio files, retrieve transcripts:

curl -X GET "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}/transcript" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>"

Response:

{
  "file_id": "850e8400-e29b-41d4-a716-446655440000",
  "filename": "meeting_recording.mp4",
  "content_type": "video/mp4",
  "transcript": "Welcome everyone to today's meeting...",
  "keywords": ["quarterly results", "revenue increase"],
  "created_at": "2025-01-15T10:05:00Z",
  "updated_at": "2025-01-15T10:05:00Z"
}
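
The same request can be made from Python. This sketch uses only the standard library; transcript_url and fetch_transcript are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def transcript_url(api_host: str, tenant_id: str, enclave_id: str, file_id: str) -> str:
    """Build the transcript endpoint URL shown in the curl example."""
    return (
        f"{api_host.rstrip('/')}/api/tenants/{tenant_id}"
        f"/enclaves/{enclave_id}/files/{file_id}/transcript"
    )


def fetch_transcript(api_host, tenant_id, enclave_id, file_id, token, timeout=30):
    """GET the transcript and return the parsed JSON response as a dict."""
    request = urllib.request.Request(
        transcript_url(api_host, tenant_id, enclave_id, file_id),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```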

Deleting Files

⚠️ Warning: Deletion is permanent and cannot be undone.

curl -X DELETE "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>"

What gets deleted:

  • File record from database
  • Object storage object
  • Vector embeddings
  • Transcripts
  • All processing artifacts

Duplicate Handling

Files are deduplicated by SHA-256 hash:

  • Uploading the same file twice returns the existing file
  • Use a different file_id to force a new upload
  • Duplicate detection happens automatically
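
To spot duplicates locally before uploading, a client can compute the same kind of hash. This sketch assumes the server hashes the raw file bytes, which this guide does not state explicitly; sha256_of_file is a hypothetical helper.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two local files with the same digest have identical contents, so only one of them needs to be uploaded.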

Error Handling

Common Errors

413 Payload Too Large

  • File exceeds 512MB limit
  • Solution: Split or compress the file

415 Unsupported Media Type

  • File type not allowed
  • Solution: Check supported file types list

400 Bad Request

  • Invalid metadata JSON
  • Invalid webhook URL (must be HTTPS)
  • Missing required fields

404 Not Found

  • File doesn't exist
  • Check file_id and enclave_id
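
Client code can turn these status codes into actionable messages. The mapping below restates the list above; explain_upload_error is a hypothetical helper, and the wording of the fallback message is an assumption.

```python
# Hints restating the common errors listed in this guide.
ERROR_HINTS = {
    400: "Bad request: check the metadata JSON, the webhook URL (must be HTTPS), and required fields.",
    404: "Not found: check file_id and enclave_id.",
    413: "Payload too large: the file exceeds the 512MB limit; split or compress it.",
    415: "Unsupported media type: check the supported file types list.",
}


def explain_upload_error(status_code: int) -> str:
    """Return a hint for a known status code, or a generic message otherwise."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}")
```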

Best Practices

✅ DO:

  • Use webhooks for async processing
  • Add meaningful metadata
  • Monitor file processing status
  • Handle file size limits
  • Use appropriate file types

❌ DON'T:

  • Upload files without checking status
  • Upload duplicate files unnecessarily
  • Upload files larger than 512MB
  • Ignore failed processing status
  • Use insecure webhook URLs (HTTP)

Next Steps