Chat Completions
Complete reference for chat completion requests with memory and RAG support.
Overview
The chat completions endpoint is the core of Super Agent Stack. It is fully compatible with the OpenAI API and adds conversation memory, RAG-enhanced responses, and intelligent context management.
Endpoint
```text
POST https://superagentstack.orionixtech.com/api/v1/chat/completions
```

Parameters
Standard OpenAI Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g., "anthropic/claude-3-sonnet") |
| messages | array | Yes | Array of message objects with role and content |
| stream | boolean | No | Enable streaming responses (default: false) |
| temperature | number | No | Sampling temperature, 0-2 (default: 1) |
| max_tokens | number | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter, 0-1 |
Super Agent Stack Extensions
| Parameter | Type | Required | Description |
|---|---|---|---|
| sessionId | string | No | Session identifier for conversation memory |
| saveToMemory | boolean | No | Save the conversation to memory (default: true) |
| useRAG | boolean | No | Enable RAG search over uploaded files (default: true) |
| ragQuery | string | No | Custom query for the RAG search (defaults to the user message) |
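If you call the endpoint without the OpenAI SDK, the extension parameters go in the same JSON body as the standard ones. A minimal fetch sketch; note that the Authorization header here is an assumption based on how the OpenAI SDK transmits its apiKey, so confirm it against your authentication setup:

```typescript
// Sketch: raw HTTP request mixing standard and extension parameters.
// Assumption: the OpenRouter key travels as a Bearer token, mirroring
// what the OpenAI SDK sends; the superAgentKey header matches the
// SDK examples below.
const response = await fetch(
  'https://superagentstack.orionixtech.com/api/v1/chat/completions',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENROUTER_KEY}`,
      'superAgentKey': process.env.SUPER_AGENT_KEY!,
    },
    body: JSON.stringify({
      // Standard OpenAI parameters
      model: 'anthropic/claude-3-sonnet',
      messages: [{ role: 'user', content: 'Hello!' }],
      // Super Agent Stack extensions
      sessionId: 'user-sarah-123',
      saveToMemory: true,
      useRAG: true,
    }),
  },
);

const data = await response.json();
console.log(data.choices[0].message.content);
```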
Basic Example
basic-chat.ts
```typescript
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://superagentstack.orionixtech.com/api/v1',
apiKey: process.env.OPENROUTER_KEY,
defaultHeaders: {
'superAgentKey': process.env.SUPER_AGENT_KEY,
},
});
const completion = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is machine learning?' }
],
temperature: 0.7,
max_tokens: 1000,
});
console.log(completion.choices[0].message.content);
```

Using Session Memory
Enable conversation memory by providing a sessionId. The AI will remember previous messages in the same session.
Creating a Session
session-memory.ts
```typescript
// First message - creates a new session
const response1 = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'My name is Sarah and I love Python programming.' }
],
sessionId: 'user-sarah-123', // Your custom session ID
saveToMemory: true, // Save this conversation
});
console.log(response1.choices[0].message.content);
// "Nice to meet you, Sarah! Python is a great language..."
// Second message - uses existing session
const response2 = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'What programming language do I like?' }
],
sessionId: 'user-sarah-123', // Same session ID
});
console.log(response2.choices[0].message.content);
// "You mentioned that you love Python programming!"Session ID Format
Session IDs can be any string. Common patterns:
- `user-{userId}-{conversationId}`
- `{userId}-chat-{timestamp}`
- `{uuid}`
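If you generate session IDs in code, small helpers keep these patterns consistent. A sketch, where `userId` and `conversationId` are hypothetical identifiers from your own application:

```typescript
import { randomUUID } from 'crypto';

// Hypothetical helpers matching the patterns above; userId and
// conversationId are whatever identifiers your application already has.
const forConversation = (userId: string, conversationId: string) =>
  `user-${userId}-${conversationId}`;

const forNewChat = (userId: string) => `${userId}-chat-${Date.now()}`;

const anonymousSession = () => randomUUID();

console.log(forConversation('sarah', '123')); // "user-sarah-123"
```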
Controlling Memory Behavior
memory-control.ts
```typescript
// Don't save this specific message to memory
const response = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'This is a test message' }
],
sessionId: 'user-sarah-123',
saveToMemory: false, // Don't save this conversation
});
// The AI can still access previous memories, but this message won't be saved
```

RAG Integration
When you upload files to your knowledge base, the AI automatically searches them to provide grounded, accurate responses.
Automatic RAG
rag-automatic.ts
```typescript
// RAG is enabled by default
const response = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'What does our privacy policy say about data retention?' }
],
// useRAG: true is the default
});
// The AI will search your uploaded files and cite relevant information
```

Custom RAG Query
rag-custom-query.ts
```typescript
// Use a different query for the RAG search
const response = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'Can you summarize the key points?' }
],
useRAG: true,
ragQuery: 'data retention policy privacy', // Custom search query
});
// The AI will search for "data retention policy privacy" in your files
// but respond to "Can you summarize the key points?"
```

Disabling RAG
rag-disabled.ts
```typescript
// Disable RAG for general knowledge questions
const response = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'What is the capital of France?' }
],
useRAG: false, // Don't search uploaded files
});
```

Combining Memory + RAG + Streaming
Memory, RAG, and streaming can be combined in a single request:
combined-features.ts
```typescript
const stream = await client.chat.completions.create({
model: 'anthropic/claude-3-sonnet',
messages: [
{ role: 'user', content: 'Based on our previous discussion and the uploaded docs, what should I do next?' }
],
// Memory
sessionId: 'user-sarah-123',
saveToMemory: true,
// RAG
useRAG: true,
// Streaming
stream: true,
// Standard parameters
temperature: 0.7,
max_tokens: 2000,
});
// Process stream
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
// The AI will:
// 1. Remember previous conversation (sessionId)
// 2. Search uploaded documents (useRAG)
// 3. Stream the response in real-time (stream)
// 4. Save this conversation to memory (saveToMemory)
```

Response Format
Standard OpenAI-compatible response with optional metadata:
```json
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "anthropic/claude-3-sonnet",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Based on your uploaded documentation and our previous discussion..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 150,
"completion_tokens": 200,
"total_tokens": 350
},
"_metadata": {
"memory": {
"sessionId": "user-sarah-123",
"historyMessages": 5
},
"rag": {
"enabled": true,
"resultsFound": 3,
"query": "documentation previous discussion"
},
"context": {
"totalTokens": 150,
"includedMessages": 5,
"includedChunks": 3,
"truncated": false
}
}
}
```

Metadata
The `_metadata` field provides insight into how memory and RAG were used in the response.
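The OpenAI SDK's `ChatCompletion` type doesn't know about `_metadata`, so reading it in TypeScript takes a cast. A sketch, assuming the shape shown in the response example above and the `client` from the earlier examples:

```typescript
// Assumption: _metadata matches the response example above; the SDK's
// types don't include it, so cast through a loose type to read it.
interface SuperAgentMetadata {
  memory?: { sessionId: string; historyMessages: number };
  rag?: { enabled: boolean; resultsFound: number; query: string };
  context?: {
    totalTokens: number;
    includedMessages: number;
    includedChunks: number;
    truncated: boolean;
  };
}

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3-sonnet',
  messages: [{ role: 'user', content: 'Summarize the uploaded docs.' }],
  sessionId: 'user-sarah-123',
});

const meta = (completion as unknown as { _metadata?: SuperAgentMetadata })
  ._metadata;
if (meta?.rag?.enabled) {
  console.log(`RAG matched ${meta.rag.resultsFound} chunks for "${meta.rag.query}"`);
}
```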
Best Practices
- Use consistent session IDs: Keep the same sessionId for related conversations
- Set appropriate max_tokens: Prevent unexpectedly long responses
- Use system messages: Set behavior and context with system role messages
- Enable RAG selectively: Disable RAG for general knowledge questions
- Monitor token usage: Track usage to stay within your plan limits
- Handle errors gracefully: Implement error handling and retries, as sketched below
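As an illustration of the last two points, a sketch of a retry wrapper that also logs token usage from the standard usage field. Which status codes to retry and the backoff policy are assumptions to tune for your application, not part of the API:

```typescript
import type { ChatCompletionCreateParamsNonStreaming } from 'openai/resources/chat/completions';

// Sketch: retry transient failures (429 and 5xx is an assumption)
// with exponential backoff, and log token usage for monitoring.
// Reuses the `client` from the Basic Example.
async function createWithRetry(
  params: ChatCompletionCreateParamsNonStreaming & Record<string, unknown>,
  maxRetries = 3,
) {
  for (let attempt = 0; ; attempt++) {
    try {
      const completion = await client.chat.completions.create(params);
      console.log(`Tokens used: ${completion.usage?.total_tokens ?? 'n/a'}`);
      return completion;
    } catch (err: any) {
      const retriable = err?.status === 429 || err?.status >= 500;
      if (!retriable || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000)); // 1s, 2s, 4s...
    }
  }
}
```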