File Management Guide
This guide covers uploading, processing, and managing files in Satori enclaves.
Supported File Types
Satori supports a wide variety of file types:
Documents
- PDF: application/pdf
- Text: text/plain, text/csv, text/tsv
- Word: .docx, .doc
- Excel: .xlsx, .xls
- PowerPoint: .pptx, .ppt
- OpenDocument: .odt, .ods, .odp
- Other: JSON, XML, RTF
Images
- JPEG, PNG, GIF, WebP, SVG, TIFF, BMP
Video (with transcription)
- MP4, MPEG, AVI, MOV, WMV, WebM, MKV, FLV
Audio (with transcription)
- MP3, WAV, OGG, M4A, AAC, MIDI
Archives
- ZIP, RAR, 7Z, TAR, GZIP
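A client-side pre-check can catch an unsupported file before an upload is attempted. The sketch below is illustrative only: the extension set is transcribed from the lists above, the mapping from format names to extensions (for example JPEG to .jpg/.jpeg, GZIP to .gz) is an assumption, and `is_supported` is a hypothetical helper rather than part of the Satori API. The server remains the authority on what it accepts.

```python
from pathlib import Path

# Extensions transcribed from the lists above. The mapping of format names
# to extensions (e.g. JPEG -> .jpg/.jpeg) is an assumption.
SUPPORTED_EXTENSIONS = {
    # Documents
    ".pdf", ".txt", ".csv", ".tsv", ".docx", ".doc", ".xlsx", ".xls",
    ".pptx", ".ppt", ".odt", ".ods", ".odp", ".json", ".xml", ".rtf",
    # Images
    ".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".tiff", ".tif", ".bmp",
    # Video
    ".mp4", ".mpeg", ".mpg", ".avi", ".mov", ".wmv", ".webm", ".mkv", ".flv",
    # Audio
    ".mp3", ".wav", ".ogg", ".m4a", ".aac", ".mid", ".midi",
    # Archives
    ".zip", ".rar", ".7z", ".tar", ".gz",
}


def is_supported(file_path: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive and looks only at the final extension, so `archive.tar.gz` is matched on `.gz`.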
Uploading Files
Basic Upload
curl -X POST "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
-F "file=@/path/to/document.pdf"
Upload with Metadata
curl -X POST "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
-F "file=@document.pdf" \
-F 'metadata={"author": "John Doe", "category": "research", "date": "2025-01-15"}'
Upload with Webhook
curl -X POST "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
-F "file=@document.pdf" \
-F "webhook_url=https://your-server.com/webhook/file-processed"
Python Example
import json

import requests

# BASE_URL, JWT_TOKEN, tenant_id, and enclave_id are assumed to be defined by your application.

def upload_file(file_path, enclave_id, metadata=None, webhook_url=None):
    url = f"{BASE_URL}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/"
    headers = {"Authorization": f"Bearer {JWT_TOKEN}"}
    data = {}
    if metadata:
        # Metadata is sent as a JSON-encoded form field
        data["metadata"] = json.dumps(metadata)
    if webhook_url:
        data["webhook_url"] = webhook_url
    # Open the file in a context manager so the handle is closed after the request
    with open(file_path, "rb") as f:
        response = requests.post(url, headers=headers, files={"file": f}, data=data)
    # Raise on 4xx/5xx instead of returning an error body as if it were a file record
    response.raise_for_status()
    return response.json()

# Usage
file_info = upload_file(
    "document.pdf",
    enclave_id,
    metadata={"author": "John Doe", "category": "research"},
    webhook_url="https://myapp.com/webhook"
)
print(f"File uploaded: {file_info['id']}, Status: {file_info['status']}")
JavaScript/TypeScript Example
// tenantId and token are assumed to be available in the enclosing scope.
async function uploadFile(
  file: File,
  enclaveId: string,
  metadata?: Record<string, any>,
  webhookUrl?: string
) {
  const formData = new FormData();
  formData.append("file", file);
  if (metadata) {
    // Metadata is sent as a JSON-encoded form field
    formData.append("metadata", JSON.stringify(metadata));
  }
  if (webhookUrl) {
    formData.append("webhook_url", webhookUrl);
  }
  const response = await fetch(
    `/api/tenants/${tenantId}/enclaves/${enclaveId}/files/`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
      },
      body: formData,
    }
  );
  return await response.json();
}
File Processing Pipeline
Files go through several processing stages:
- pending → File uploaded, queued for processing
- processing → Content extraction in progress
- clearing_artifacts → Cleaning up temporary files
- building_artifacts → Creating vector embeddings
- classifying → AI classification (optional)
- ready → File ready for queries
- failed → Processing failed (check logs)
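Client code usually only needs to know whether a file has reached a final state. The following is a minimal sketch that models the stages listed above as an enum; `FileStatus` and `is_terminal` are hypothetical names introduced for illustration, and treating `ready` and `failed` as the only terminal states is an assumption drawn from the list.

```python
from enum import Enum


class FileStatus(str, Enum):
    """Processing stages as listed above."""
    PENDING = "pending"
    PROCESSING = "processing"
    CLEARING_ARTIFACTS = "clearing_artifacts"
    BUILDING_ARTIFACTS = "building_artifacts"
    CLASSIFYING = "classifying"
    READY = "ready"
    FAILED = "failed"


# Assumption: processing stops only at these two states.
TERMINAL_STATUSES = {FileStatus.READY, FileStatus.FAILED}


def is_terminal(status: str) -> bool:
    """Return True once processing has finished, successfully or not.

    Raises ValueError for a status string that is not in the list above.
    """
    return FileStatus(status) in TERMINAL_STATUSES
```

Raising on an unknown status, rather than silently treating it as non-terminal, keeps a polling loop from spinning forever if the API ever adds a stage.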
Processing Times
- Small PDFs (< 10MB): 30-60 seconds
- Large PDFs (> 100MB): 2-5 minutes
- Videos: 1-10 minutes (depends on length)
- Audio: 30 seconds - 3 minutes
- Images: 10-30 seconds
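These ranges can inform how long a client waits before giving up. The sketch below takes the upper bound of each range and multiplies it by a safety factor; the category keys, the helper name `suggested_timeout`, and the default factor of 2 are all assumptions for illustration, not documented Satori behavior.

```python
# Upper bounds, in seconds, of the typical ranges listed above.
TYPICAL_MAX_SECONDS = {
    "small_pdf": 60,       # Small PDFs (< 10MB): 30-60 seconds
    "large_pdf": 5 * 60,   # Large PDFs (> 100MB): 2-5 minutes
    "video": 10 * 60,      # Videos: 1-10 minutes
    "audio": 3 * 60,       # Audio: 30 seconds - 3 minutes
    "image": 30,           # Images: 10-30 seconds
}


def suggested_timeout(kind: str, safety_factor: float = 2.0) -> float:
    """Return a polling timeout in seconds for the given file category."""
    return TYPICAL_MAX_SECONDS[kind] * safety_factor
```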
Monitoring File Status
Check Single File Status
curl -X GET "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>"
List All Files
curl -X GET "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>"
Polling for Ready Status
import time

import requests

def wait_for_file_ready(file_id, max_wait=300, poll_interval=5):
    """Wait for file to be ready, with timeout."""
    # Same endpoint as the single-file status check above;
    # BASE_URL, JWT_TOKEN, tenant_id, and enclave_id are assumed to be defined.
    url = f"{BASE_URL}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}"
    start_time = time.time()
    while time.time() - start_time < max_wait:
        response = requests.get(
            url,
            headers={"Authorization": f"Bearer {JWT_TOKEN}"}
        )
        file = response.json()
        if file["status"] == "ready":
            return file
        elif file["status"] == "failed":
            raise Exception(f"File processing failed: {file_id}")
        # Still in an intermediate stage; wait before polling again
        time.sleep(poll_interval)
    raise TimeoutError(f"File not ready within {max_wait} seconds")
Webhooks
Webhooks notify your server when file processing completes.
Webhook Payload
{
  "event": "file.status_changed",
  "file_id": "850e8400-e29b-41d4-a716-446655440000",
  "status": "ready",
  "tenant_id": "550e8400-e29b-41d4-a716-446655440000",
  "enclave_id": "750e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-01-15T10:05:30Z",
  "metadata": {
    "file_name": "contract.pdf",
    "size_bytes": 245000
  }
}
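Parsing the payload into a typed object makes missing fields fail early instead of deep inside a handler. The sketch below assumes every top-level field in the example payload is always present and that `metadata` may be omitted; `FileStatusEvent` and `parse_event` are hypothetical names, not part of any Satori SDK.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class FileStatusEvent:
    event: str
    file_id: str
    status: str
    tenant_id: str
    enclave_id: str
    timestamp: datetime
    metadata: dict[str, Any] = field(default_factory=dict)


def parse_event(payload: dict[str, Any]) -> FileStatusEvent:
    """Build a FileStatusEvent from a decoded webhook payload.

    Raises KeyError if a required field is missing.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    ts = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    return FileStatusEvent(
        event=payload["event"],
        file_id=payload["file_id"],
        status=payload["status"],
        tenant_id=payload["tenant_id"],
        enclave_id=payload["enclave_id"],
        timestamp=ts,
        metadata=payload.get("metadata", {}),
    )
```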
Webhook Implementation
from fastapi import FastAPI, Request

app = FastAPI()

# process_ready_file and handle_failed_upload are application-defined coroutines.

@app.post("/webhook/file-processed")
async def handle_file_webhook(request: Request):
    payload = await request.json()
    if payload["status"] == "ready":
        file_id = payload["file_id"]
        # File is ready - start querying
        await process_ready_file(file_id)
    elif payload["status"] == "failed":
        # Handle failure
        await handle_failed_upload(payload["file_id"])
    # Any other status is acknowledged without further action
    return {"status": "received"}
Webhook Requirements
- HTTPS only: Webhook URLs must use HTTPS
- Retry logic: Satori retries failed webhooks (3 attempts with exponential backoff)
- Response: Your endpoint should return 200 OK
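Because a non-HTTPS webhook URL is rejected, it can be worth validating the URL before sending the upload request. This is a minimal sketch using only the standard library; `is_valid_webhook_url` is a hypothetical helper, and the server may apply stricter rules than the two checks shown here.

```python
from urllib.parse import urlparse


def is_valid_webhook_url(url: str) -> bool:
    """Return True if the URL uses the https scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```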
File Metadata
Adding Metadata
Metadata is stored as JSON and can include any key-value pairs:
metadata = {
    "author": "John Doe",
    "date": "2025-01-15",
    "category": "research",
    "department": "engineering",
    "project": "project-alpha",
    "version": "1.0",
    "tags": ["important", "reviewed"]
}
Best Practices
- Keep under 10KB: Large metadata can slow processing
- Use searchable fields: Include fields you might want to filter by
- Consistent structure: Use the same fields across similar files
- Include timestamps: Track when files were created/uploaded
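The 10KB guideline can be checked before upload. The sketch below measures the metadata as UTF-8 encoded JSON and treats 10KB as 10 × 1024 bytes; both choices are assumptions, since the guide does not say how the service measures metadata size, and `check_metadata` is a hypothetical helper.

```python
import json

# Assumption: "10KB" means 10 * 1024 bytes of serialized JSON.
MAX_METADATA_BYTES = 10 * 1024


def metadata_size_bytes(metadata: dict) -> int:
    """Size of the metadata once serialized to UTF-8 JSON."""
    return len(json.dumps(metadata).encode("utf-8"))


def check_metadata(metadata: dict) -> None:
    """Raise ValueError if the serialized metadata exceeds the guideline."""
    size = metadata_size_bytes(metadata)
    if size > MAX_METADATA_BYTES:
        raise ValueError(
            f"Metadata is {size} bytes; keep it under {MAX_METADATA_BYTES} bytes"
        )
```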
Retrieving Metadata
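# Same endpoint as the single-file status check;
# BASE_URL, JWT_TOKEN, tenant_id, and enclave_id are assumed to be defined.
response = requests.get(
    f"{BASE_URL}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}",
    headers={"Authorization": f"Bearer {JWT_TOKEN}"}
)
file = response.json()
# Metadata is returned under the "file_meta" key
metadata = file.get("file_meta", {})
print(f"Author: {metadata.get('author')}")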
File Limits
Size Limits
- Maximum file size: 512MB
- Recommended: Keep files under 100MB for faster processing
- Large files: Consider splitting into multiple files
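The limits above can be applied locally before an upload is attempted. The following sketch assumes 1MB means 1024 × 1024 bytes; the labels `"ok"`, `"large"`, and `"too_large"` and the helper names are hypothetical.

```python
import os

# Assumption: 1MB = 1024 * 1024 bytes.
MAX_FILE_BYTES = 512 * 1024 * 1024      # hard limit
RECOMMENDED_BYTES = 100 * 1024 * 1024   # recommended ceiling for faster processing


def classify_size(size_bytes: int) -> str:
    """Classify a size against the limits: 'ok', 'large', or 'too_large'."""
    if size_bytes > MAX_FILE_BYTES:
        return "too_large"
    if size_bytes > RECOMMENDED_BYTES:
        return "large"
    return "ok"


def classify_file(file_path: str) -> str:
    """Classify a file on disk by its size."""
    return classify_size(os.path.getsize(file_path))
```

A result of `"large"` means the upload should succeed but may process slowly, so splitting is optional; `"too_large"` means the upload would be rejected.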
Handling Large Files
import os

def split_large_pdf(file_path, max_size_mb=100):
    """Split large PDF into smaller chunks."""
    file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
    if file_size_mb > max_size_mb:
        # Placeholder: the splitting step is left to the reader.
        # Use a PDF splitting library
        # Upload each chunk separately
        pass
Getting Transcripts
For video and audio files, retrieve transcripts:
curl -X GET "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}/transcript" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>"
Response:
{
  "file_id": "850e8400-e29b-41d4-a716-446655440000",
  "filename": "meeting_recording.mp4",
  "content_type": "video/mp4",
  "transcript": "Welcome everyone to today's meeting...",
  "keywords": ["quarterly results", "revenue increase"],
  "created_at": "2025-01-15T10:05:00Z",
  "updated_at": "2025-01-15T10:05:00Z"
}
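The same request can be made from Python. This sketch uses the standard library's `urllib` rather than `requests` so that it has no third-party dependencies; `transcript_url` and `fetch_transcript` are hypothetical helper names, and the response is assumed to be a JSON object shaped like the example above.

```python
import json
import urllib.request


def transcript_url(api_host: str, tenant_id: str, enclave_id: str, file_id: str) -> str:
    """Build the transcript endpoint URL shown in the curl example."""
    return (
        f"{api_host}/api/tenants/{tenant_id}"
        f"/enclaves/{enclave_id}/files/{file_id}/transcript"
    )


def fetch_transcript(api_host: str, tenant_id: str, enclave_id: str,
                     file_id: str, jwt_token: str) -> dict:
    """GET the transcript and return the decoded JSON body."""
    request = urllib.request.Request(
        transcript_url(api_host, tenant_id, enclave_id, file_id),
        headers={"Authorization": f"Bearer {jwt_token}"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The returned dictionary would then expose the text as `result["transcript"]` and the extracted terms as `result["keywords"]`, assuming the response shape shown above.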
Deleting Files
⚠️ Warning: Deletion is permanent and cannot be undone.
curl -X DELETE "{api_host}/api/tenants/{tenant_id}/enclaves/{enclave_id}/files/{file_id}" \
-H "Authorization: Bearer <YOUR_JWT_TOKEN>"
What gets deleted:
- File record from database
- Object storage object
- Vector embeddings
- Transcripts
- All processing artifacts
Duplicate Handling
Files are deduplicated by SHA-256 hash:
- Uploading the same file twice returns the existing file
- Use a different file_id to force a new upload
- Duplicate detection happens automatically
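Computing the hash locally makes it possible to recognize a file that has already been uploaded before sending it again. This sketch assumes the service hashes the raw file bytes, which the guide does not state explicitly; `sha256_of_file` is a hypothetical helper.

```python
import hashlib


def sha256_of_file(file_path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded fully into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

An application could keep a local map from digest to `file_id` and skip the upload when the digest is already present.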
Error Handling
Common Errors
413 Payload Too Large
- File exceeds 512MB limit
- Solution: Split or compress the file
415 Unsupported Media Type
- File type not allowed
- Solution: Check supported file types list
400 Bad Request
- Invalid metadata JSON
- Invalid webhook URL (must be HTTPS)
- Missing required fields
404 Not Found
- File doesn't exist
- Check file_id and enclave_id
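The status codes above can be mapped to actionable messages in client code. The sketch below is one way to do that; the hint text paraphrases the list above, and `explain_error` is a hypothetical helper rather than anything provided by the API.

```python
# Hints paraphrased from the common errors listed above.
ERROR_HINTS = {
    400: "Bad request: check the metadata JSON, the webhook URL (must be HTTPS), "
         "and required fields.",
    404: "Not found: check file_id and enclave_id.",
    413: "Payload too large: the file exceeds 512MB; split or compress it.",
    415: "Unsupported media type: check the supported file types list.",
}


def explain_error(status_code: int) -> str:
    """Return a human-readable hint for an HTTP error status."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}")
```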
Best Practices
✅ DO:
- Use webhooks for async processing
- Add meaningful metadata
- Monitor file processing status
- Handle file size limits
- Use appropriate file types
❌ DON'T:
- Upload files and assume they are ready without checking status
- Upload duplicate files unnecessarily
- Upload files larger than 512MB
- Ignore failed processing status
- Use insecure webhook URLs (HTTP)
Next Steps
- Querying Documents Guide - Query your uploaded files
- Best Practices Guide - General usage best practices
- API Reference - Full API documentation