Text Processing Guide

Satori provides AI-powered text processing tools for summarization, grammar fixing, expansion, simplification, and more.

Overview

The text processing endpoint (/api/tenants/{tenant_id}/tools/text) provides:

Summarization: Condense long content into key points
Grammar/Spelling Fix: Correct errors and improve clarity
Expansion: Add detail and depth to brief text
Simplification: Make complex text more accessible
Improvement: Enhance overall writing quality

Note: For custom prompts and advanced text generation with context-awareness, see the Text Completions API.

Available Actions

1. Summarize

Condense text into a concise summary:

curl -X POST "__API_HOST__/api/tenants/{tenant_id}/tools/text" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "summarize",
    "text": "Long document text here...",
    "use_knowledge_base": false,
    "use_general_knowledge": false,
    "enclave_id": "750e8400-e29b-41d4-a716-446655440000"
  }'

2. Fix

Correct spelling and grammatical errors:

curl -X POST "__API_HOST__/api/tenants/{tenant_id}/tools/text" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "fix",
    "text": "Text with speling and grammer errors.",
    "use_knowledge_base": false,
    "use_general_knowledge": false,
    "enclave_id": "750e8400-e29b-41d4-a716-446655440000"
  }'

3. Expand

Add detail, context, and depth to brief text:

curl -X POST "__API_HOST__/api/tenants/{tenant_id}/tools/text" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "expand",
    "text": "Short text that needs more detail.",
    "use_knowledge_base": true,
    "use_general_knowledge": false,
    "enclave_id": "750e8400-e29b-41d4-a716-446655440000"
  }'

4. Simplify

Make complex text more concise and easier to understand:

curl -X POST "__API_HOST__/api/tenants/{tenant_id}/tools/text" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "simplify",
    "text": "Complex technical explanation...",
    "use_knowledge_base": false,
    "use_general_knowledge": false,
    "enclave_id": "750e8400-e29b-41d4-a716-446655440000"
  }'

5. Improve

Enhance overall writing quality, tone, and clarity:

curl -X POST "__API_HOST__/api/tenants/{tenant_id}/tools/text" \
  -H "Authorization: Bearer <YOUR_JWT_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "improve",
    "text": "Text to be enhanced...",
    "use_knowledge_base": false,
    "use_general_knowledge": false,
    "enclave_id": "750e8400-e29b-41d4-a716-446655440000"
  }'

Migration to Text Completions API

The action: "prompt" parameter has been removed from the /tools/text endpoint. For custom prompts and advanced text generation features, migrate to the new Text Completions API:

Old (Deprecated):

curl -X POST "/api/tenants/{tenant_id}/tools/text" \
  -d '{
    "action": "prompt",
    "text": "Text to process",
    "prompt": "Translate this to Spanish",
    ...
  }'

New (Recommended):

curl -X POST "/api/tenants/{tid}/enclaves/{eid}/tools/text/completions/stream" \
  -d '{
    "prompt": "Translate this to Spanish",
    "context": {
      "highlightText": "Text to process"
    }
  }'

The new API provides: - Rich context parameters (ai_instructions, doc_context, highlight_text, placement_template) - Tool integration (knowledge_base, web_search) - Streaming and structured response modes - Better control over AI behavior

See the Text Completions API Guide for complete documentation.

Knowledge Base Integration

Using Knowledge Base

When use_knowledge_base: true, the processing incorporates information from documents in the specified enclave:

def process_with_knowledge_base(text: str, action: str, enclave_id: str):
    response = requests.post(
        f"{BASE_URL}/api/tenants/{tenant_id}/tools/text",
        headers={"Authorization": f"Bearer {JWT_TOKEN}"},
        json={
            "action": action,
            "text": text,
            "use_knowledge_base": True,
            "use_general_knowledge": False,
            "enclave_id": enclave_id
        }
    )
    return response.json()["text"]

Use Cases:

Summarize using context from your documents
Expand text with domain-specific information
Improve text using terminology from your knowledge base

Using General Knowledge

When use_general_knowledge: true, the AI uses its general knowledge:

def process_with_general_knowledge(text: str, action: str, enclave_id: str):
    response = requests.post(
        f"{BASE_URL}/api/tenants/{tenant_id}/tools/text",
        headers={"Authorization": f"Bearer {JWT_TOKEN}"},
        json={
            "action": action,
            "text": text,
            "use_knowledge_base": False,
            "use_general_knowledge": True,
            "enclave_id": enclave_id
        }
    )
    return response.json()["text"]

Use Cases:

General questions not specific to uploaded content
Explanations using common knowledge
Translations and general improvements

No Knowledge Base

When both are false, pure text processing without external context:

def simple_process(text: str, action: str, enclave_id: str):
    response = requests.post(
        f"{BASE_URL}/api/tenants/{tenant_id}/tools/text",
        headers={"Authorization": f"Bearer {JWT_TOKEN}"},
        json={
            "action": action,
            "text": text,
            "use_knowledge_base": False,
            "use_general_knowledge": False,
            "enclave_id": enclave_id
        }
    )
    return response.json()["text"]

Use Cases:

Grammar and spelling fixes
Formatting improvements
Simple text transformations

Response Formats

Streaming Response (Default)

Default format streams text as JSON events (AI SDK 6 Data Stream Protocol):

import json

def process_text_streaming(text: str, action: str, enclave_id: str):
    response = requests.post(
        f"{BASE_URL}/api/tenants/{tenant_id}/tools/text",
        headers={"Authorization": f"Bearer {JWT_TOKEN}"},
        json={
            "action": action,
            "text": text,
            "use_knowledge_base": False,
            "use_general_knowledge": False,
            "enclave_id": enclave_id,
            "response_format": "stream_text"
        },
        stream=True
    )

    full_text = ""
    for line in response.iter_lines():
        if line:
            text_line = line.decode('utf-8')
            if text_line.startswith('data: '):
                content = text_line[6:]
                if content == '[DONE]':
                    break

                try:
                    event = json.loads(content)
                    if event.get("type") == "text-delta":
                        chunk = event.get("delta", "")
                        full_text += chunk
                        print(chunk, end='', flush=True)
                    elif event.get("type") == "finish":
                        print(f"\n\nFinished: {event.get('finishReason')}")
                except json.JSONDecodeError:
                    pass

    return full_text

Stream Event Format:

data: {"type":"text-delta","delta":"The text"}
data: {"type":"text-delta","delta":" has been"}
data: {"type":"text-delta","delta":" processed"}
data: {"type":"finish","finishReason":"stop"}
data: [DONE]

JSON Response

Get complete text in a single response:

def process_text_json(text: str, action: str, enclave_id: str):
    response = requests.post(
        f"{BASE_URL}/api/tenants/{tenant_id}/tools/text",
        headers={"Authorization": f"Bearer {JWT_TOKEN}"},
        json={
            "action": action,
            "text": text,
            "use_knowledge_base": False,
            "use_general_knowledge": False,
            "enclave_id": enclave_id,
            "response_format": "json"
        }
    )
    return response.json()["text"]

Structured extraction: truncation handling

When requesting a structured response_schema from tools/text/completions/structured, the synthesis model can occasionally exceed its output-token budget on very large extractions. The response now degrades gracefully instead of returning malformed JSON:

The response always parses; any recoverable partial data is preserved.
error_message explains the cause, and a top-level synthesis_truncated: true flag is set so clients (e.g. the realtime dashboard) can attribute the gap to synthesis rather than retrieval.

An optional use_native_emitter: bool request flag (default false) routes the structured emit through OpenAI's native parse API with per-artifact chunking, which avoids truncation on large multi-artifact schemas. It is off by default pending live validation; enable per-request to opt in.

Python Helper Functions

Complete Text Processing Class

class TextProcessor:
    def __init__(self, base_url: str, tenant_id: str, jwt_token: str):
        self.base_url = base_url
        self.tenant_id = tenant_id
        self.jwt_token = jwt_token

    def process(
        self,
        text: str,
        action: str,
        enclave_id: str,
        use_kb: bool = False,
        use_general: bool = False,
        prompt: str = None,
        response_format: str = "json"
    ) -> str:
        """Process text with specified action."""
        data = {
            "action": action,
            "text": text,
            "use_knowledge_base": use_kb,
            "use_general_knowledge": use_general,
            "enclave_id": enclave_id,
            "response_format": response_format
        }

        if prompt:
            data["prompt"] = prompt

        response = requests.post(
            f"{self.base_url}/api/tenants/{self.tenant_id}/tools/text",
            headers={"Authorization": f"Bearer {self.jwt_token}"},
            json=data
        )

        if response_format == "json":
            return response.json()["text"]
        else:
            # Handle streaming
            return self._process_stream(response)

    def summarize(self, text: str, enclave_id: str, use_kb: bool = False) -> str:
        """Summarize text."""
        return self.process(text, "summarize", enclave_id, use_kb=use_kb)

    def fix(self, text: str, enclave_id: str) -> str:
        """Fix grammar and spelling."""
        return self.process(text, "fix", enclave_id)

    def expand(self, text: str, enclave_id: str, use_kb: bool = False) -> str:
        """Expand brief text."""
        return self.process(text, "expand", enclave_id, use_kb=use_kb)

    def simplify(self, text: str, enclave_id: str) -> str:
        """Simplify complex text."""
        return self.process(text, "simplify", enclave_id)

    def improve(self, text: str, enclave_id: str) -> str:
        """Improve writing quality."""
        return self.process(text, "improve", enclave_id)

    def _process_stream(self, response):
        """Process streaming response."""
        import json
        full_text = ""
        for line in response.iter_lines():
            if line:
                text_line = line.decode('utf-8')
                if text_line.startswith('data: '):
                    content = text_line[6:]
                    if content == '[DONE]':
                        break
                    try:
                        event = json.loads(content)
                        if event.get("type") == "text-delta":
                            full_text += event.get("delta", "")
                    except json.JSONDecodeError:
                        pass
        return full_text

# Usage
processor = TextProcessor(BASE_URL, tenant_id, JWT_TOKEN)

# Summarize
summary = processor.summarize(long_document, enclave_id)

# Fix grammar
corrected = processor.fix(draft_text, enclave_id)

# Expand with knowledge base
detailed = processor.expand(brief_note, enclave_id, use_kb=True)

# Improve writing
improved = processor.improve(draft_text, enclave_id)

Use Cases

Content Creation

# Draft blog post from bullet points
bullet_points = """
- New product launch
- Features include AI and automation
- Available Q2 2025
"""

expanded = processor.expand(bullet_points, enclave_id)
improved = processor.improve(expanded, enclave_id)

Document Processing

# Summarize research paper
summary = processor.summarize(research_paper_text, enclave_id, use_kb=True)

# Simplify technical documentation
simplified = processor.simplify(technical_docs, enclave_id)

Quality Assurance

# Fix user-generated content
corrected = processor.fix(user_submitted_text, enclave_id)

# Ensure consistent tone
improved = processor.improve(inconsistent_text, enclave_id)

Knowledge Work

# Enhance notes with context
enhanced = processor.expand(meeting_notes, enclave_id, use_kb=True)

# Simplify for different audiences
simple_explanation = processor.simplify(complex_concept, enclave_id)

Best Practices

✅ DO:

Use knowledge base for domain-specific processing
Use general knowledge for general questions
Use simple processing for grammar/fixing
Provide specific prompts for custom actions
Use streaming for better UX
Keep text under 10,000 words for best results

❌ DON'T:

Use knowledge base for simple grammar fixes
Ignore the enclave_id requirement (even when not using KB)
Make prompts too vague
Process extremely long texts without chunking
Mix different processing modes unnecessarily

Error Handling

Common Errors

400 Bad Request

Invalid action
Missing required fields (e.g., prompt for custom action)
Solution: Check request format and required fields

404 Not Found

Enclave not found (invalid enclave_id)
Solution: Verify enclave_id

500 Internal Server Error

Text processing error
Solution: Try simpler text or different action

Error Handling Example

def safe_process_text(text: str, action: str, enclave_id: str):
    try:
        processor = TextProcessor(BASE_URL, tenant_id, JWT_TOKEN)
        return processor.process(text, action, enclave_id)
    except requests.HTTPError as e:
        if e.response.status_code == 400:
            error_detail = e.response.json().get("detail", "Invalid request")
            raise ValueError(f"Bad request: {error_detail}")
        elif e.response.status_code == 404:
            raise ValueError("Enclave not found")
        elif e.response.status_code == 500:
            raise ValueError("Text processing error")
        else:
            raise

Next Steps

Querying Documents Guide - Query your documents
Chat & Agents Guide - Using AI agents
Best Practices Guide - General usage best practices
API Reference - Full API documentation