Chat Completions

Complete reference for chat completion requests with memory and RAG support.

Overview

The chat completions endpoint is the core of Super Agent Stack. It's fully compatible with the OpenAI API but adds powerful features like conversation memory, RAG-enhanced responses, and intelligent context management.

Endpoint

text
POST https://superagentstack.orionixtech.com/api/v1/chat/completions
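If you prefer not to use the OpenAI SDK, the endpoint can be called with plain fetch. A minimal sketch of building the request, assuming the same Bearer-token auth the OpenAI SDK uses for its apiKey and the superAgentKey header that the SDK example below configures; buildChatRequest is a hypothetical helper, not part of any SDK:

```typescript
// Sketch: construct a raw request for the chat completions endpoint.
const ENDPOINT =
  'https://superagentstack.orionixtech.com/api/v1/chat/completions';

function buildChatRequest(
  body: Record<string, unknown>,
  openRouterKey: string,
  superAgentKey: string,
): { url: string; method: string; headers: Record<string, string>; body: string } {
  return {
    url: ENDPOINT,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // The OpenAI SDK sends its apiKey as a Bearer token.
      Authorization: `Bearer ${openRouterKey}`,
      superAgentKey,
    },
    body: JSON.stringify(body),
  };
}

// Usage: const req = buildChatRequest({ model, messages }, orKey, saKey);
// const res = await fetch(req.url, req);
```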

Parameters

Standard OpenAI Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (e.g., "anthropic/claude-3-sonnet") |
| messages | array | Yes | Array of message objects with role and content |
| stream | boolean | No | Enable streaming responses (default: false) |
| temperature | number | No | Sampling temperature 0-2 (default: 1) |
| max_tokens | number | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling parameter 0-1 |

Super Agent Stack Extensions

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sessionId | string | No | Session identifier for conversation memory |
| saveToMemory | boolean | No | Save conversation to memory (default: true) |
| useRAG | boolean | No | Enable RAG search in uploaded files (default: true) |
| ragQuery | string | No | Custom query for RAG search (defaults to the user message) |
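The extension parameters and their documented defaults can be sketched as a TypeScript type; the withDefaults helper below is hypothetical and simply mirrors the defaults listed above:

```typescript
// Sketch: Super Agent Stack extension parameters as a type.
interface SuperAgentOptions {
  sessionId?: string;      // conversation memory session
  saveToMemory?: boolean;  // default: true
  useRAG?: boolean;        // default: true
  ragQuery?: string;       // defaults to the user message
}

// Fill in the documented defaults for any options the caller omits.
function withDefaults(opts: SuperAgentOptions = {}): SuperAgentOptions {
  return { saveToMemory: true, useRAG: true, ...opts };
}
```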

Basic Example

basic-chat.ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://superagentstack.orionixtech.com/api/v1',
  apiKey: process.env.OPENROUTER_KEY,
  defaultHeaders: {
    'superAgentKey': process.env.SUPER_AGENT_KEY,
  },
});

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is machine learning?' }
  ],
  temperature: 0.7,
  max_tokens: 1000,
});

console.log(completion.choices[0].message.content);

Using Session Memory

Enable conversation memory by providing a sessionId. The AI will remember previous messages in the same session.

Creating a Session

session-memory.ts
// First message - creates a new session
const response1 = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'My name is Sarah and I love Python programming.' }
  ],
  sessionId: 'user-sarah-123',  // Your custom session ID
  saveToMemory: true,            // Save this conversation
});

console.log(response1.choices[0].message.content);
// "Nice to meet you, Sarah! Python is a great language..."

// Second message - uses existing session
const response2 = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'What programming language do I like?' }
  ],
  sessionId: 'user-sarah-123',  // Same session ID
});

console.log(response2.choices[0].message.content);
// "You mentioned that you love Python programming!"

Session ID Format

Session IDs can be any string. Common patterns:
  • user-{userId}-{conversationId}
  • {userId}-chat-{timestamp}
  • {uuid}
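For example, a tiny helper following the first pattern (purely illustrative):

```typescript
// Sketch: build a session ID in the user-{userId}-{conversationId} pattern.
function makeSessionId(userId: string, conversationId: string): string {
  return `user-${userId}-${conversationId}`;
}

// Usage: makeSessionId('sarah', '123') matches the examples on this page.
```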

Controlling Memory Behavior

memory-control.ts
// Don't save this specific message to memory
const response = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'This is a test message' }
  ],
  sessionId: 'user-sarah-123',
  saveToMemory: false,  // Don't save this conversation
});

// The AI can still access previous memories, but this message won't be saved

RAG Integration

When you upload files to your knowledge base, the AI automatically searches them to provide grounded, accurate responses.

Automatic RAG

rag-automatic.ts
// RAG is enabled by default
const response = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'What does our privacy policy say about data retention?' }
  ],
  // useRAG: true is the default
});

// The AI will search your uploaded files and cite relevant information

Custom RAG Query

rag-custom-query.ts
// Use a different query for RAG search
const response = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'Can you summarize the key points?' }
  ],
  useRAG: true,
  ragQuery: 'data retention policy privacy',  // Custom search query
});

// The AI will search for "data retention policy privacy" in your files
// but respond to "Can you summarize the key points?"

Disabling RAG

rag-disabled.ts
// Disable RAG for general knowledge questions
const response = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ],
  useRAG: false,  // Don't search uploaded files
});

Combining Memory + RAG + Streaming

Use all features together for the most powerful experience:

combined-features.ts
const stream = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [
    { role: 'user', content: 'Based on our previous discussion and the uploaded docs, what should I do next?' }
  ],
  // Memory
  sessionId: 'user-sarah-123',
  saveToMemory: true,
  // RAG
  useRAG: true,
  // Streaming
  stream: true,
  // Standard parameters
  temperature: 0.7,
  max_tokens: 2000,
});

// Process stream
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

// The AI will:
// 1. Remember previous conversation (sessionId)
// 2. Search uploaded documents (useRAG)
// 3. Stream the response in real-time (stream)
// 4. Save this conversation to memory (saveToMemory)
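The streaming loop above can be factored into a reusable helper. A sketch that works over any async iterable of OpenAI-style chunks; the StreamChunk shape here is a minimal assumption, not the SDK's full type:

```typescript
// Minimal chunk shape: each chunk may carry a partial delta of content.
interface StreamChunk {
  choices: Array<{ delta?: { content?: string } }>;
}

// Accumulate streamed deltas into one string, optionally forwarding
// each piece to a callback (e.g. process.stdout.write) as it arrives.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  onDelta?: (text: string) => void,
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? '';
    if (delta) {
      full += delta;
      onDelta?.(delta);
    }
  }
  return full;
}
```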

Response Format

Standard OpenAI-compatible response with optional metadata:

json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "anthropic/claude-3-sonnet",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on your uploaded documentation and our previous discussion..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "_metadata": {
    "memory": {
      "sessionId": "user-sarah-123",
      "historyMessages": 5
    },
    "rag": {
      "enabled": true,
      "resultsFound": 3,
      "query": "documentation previous discussion"
    },
    "context": {
      "totalTokens": 150,
      "includedMessages": 5,
      "includedChunks": 3,
      "truncated": false
    }
  }
}

Metadata

The _metadata field provides insights into how memory and RAG were used in the response.
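Because _metadata is an extension field, the OpenAI SDK's response types won't know about it. A sketch of typing and extracting it safely; the field names are taken from the response example above, and getMetadata is a hypothetical helper:

```typescript
// Sketch: shape of the optional _metadata extension field.
interface CompletionMetadata {
  memory?: { sessionId: string; historyMessages: number };
  rag?: { enabled: boolean; resultsFound: number; query: string };
  context?: {
    totalTokens: number;
    includedMessages: number;
    includedChunks: number;
    truncated: boolean;
  };
}

// Read _metadata off an arbitrary response object, if present.
function getMetadata(response: unknown): CompletionMetadata | undefined {
  if (response && typeof response === 'object' && '_metadata' in response) {
    return (response as { _metadata?: CompletionMetadata })._metadata;
  }
  return undefined;
}
```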

Best Practices

  • Use consistent session IDs: Keep the same sessionId for related conversations
  • Set appropriate max_tokens: Prevent unexpectedly long responses
  • Use system messages: Set behavior and context with system role messages
  • Enable RAG selectively: Disable RAG for general knowledge questions
  • Monitor token usage: Track usage to stay within your plan limits
  • Handle errors gracefully: Implement proper error handling and retries
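The last point can be sketched as a generic retry helper with exponential backoff; this is illustrative, not part of the SDK, and you may want to retry only on specific status codes in practice:

```typescript
// Sketch: retry an async operation with exponential backoff,
// useful for transient failures such as rate limits.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Wait 1x, 2x, 4x, ... the base delay between attempts.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage: await withRetries(() => client.chat.completions.create({ ... }));
```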

Next Steps