AI Chat Models

MachinaOs supports 10 LLM providers for chat completions, with models fetched dynamically from each provider’s API. The backend uses a hybrid architecture: a native SDK layer in server/services/llm/ for direct chat completions, and a LangChain + LangGraph path for agent tool-calling.

Available Providers

| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4.x, GPT-5.x, o1, o3, o4-mini | General purpose, reasoning |
| Anthropic | Claude 4.x Opus/Sonnet/Haiku | Coding, analysis, extended thinking |
| Google Gemini | Gemini 2.5 Pro/Flash, Gemini 3 | Multimodal, long context (1M+) |
| OpenRouter | 200+ models | Access multiple providers via single API |
| Groq | Llama 4 Scout, Qwen3, GPT-OSS | Ultra-fast inference (LPU) |
| Cerebras | Llama, Qwen | Ultra-fast on wafer-scale hardware |
| xAI | Grok models | OpenAI-compatible API |
| DeepSeek | deepseek-chat, deepseek-reasoner | V3 with always-on Chain-of-Thought |
| Kimi | kimi-k2.5, kimi-k2-thinking | 256K context, thinking mode |
| Mistral | mistral-large, codestral | Up to 256K context |

Adding API Keys

  1. Click the key icon in the toolbar
  2. Select the provider
  3. Enter your API key
  4. Click Validate to test
API keys are encrypted and stored locally. They’re never sent to MachinaOs servers.

OpenAI Chat Model

Models

| Model | Best For |
|---|---|
| gpt-4o | Most capable, multimodal |
| gpt-4-turbo | Fast, cost-effective GPT-4 |
| o1 | Complex reasoning tasks |
| o3 | Advanced reasoning |
| o4-mini | Fast, efficient reasoning |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | The model to use |
| prompt | string | required | The message to send. Supports template variables. |
| temperature | slider | 0.7 | Randomness (0 = deterministic, 1 = creative) |
| maxTokens | number | 1000 | Maximum response length |
| responseFormat | select | text | Output format: text or json_object |
| reasoningEffort | select | medium | For o-series models: minimal, low, medium, or high reasoning effort |

Output

{
  "response": "The AI's response text",
  "model": "gpt-4o",
  "thinking": "Reasoning process (o-series only)",
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  }
}
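As a sketch of how a raw chat-completions response maps onto the node output above (a hypothetical helper, not the actual MachinaOs code; the `reasoning` field name on the message is an assumption for o-series models):

```python
# Hypothetical sketch: normalize a raw OpenAI-style chat completion dict
# into the node output shape shown above. "thinking" only appears when the
# model exposed reasoning (o-series).
def normalize_openai_response(raw: dict) -> dict:
    message = raw["choices"][0]["message"]
    out = {
        "response": message.get("content", ""),
        "model": raw.get("model"),
        "usage": raw.get("usage", {}),
    }
    if message.get("reasoning"):  # assumed field name for o-series reasoning
        out["thinking"] = message["reasoning"]
    return out
```

Downstream nodes then read `response`, `model`, and `usage` regardless of which provider produced them.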

Anthropic Claude Model

Models

| Model | Best For |
|---|---|
| claude-3-5-sonnet-20241022 | Best for coding and complex tasks |
| claude-3-opus-20240229 | Most capable, detailed analysis |
| claude-3-haiku-20240307 | Fast responses, simple tasks |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Claude model to use |
| prompt | string | required | The message to send |
| systemPrompt | string | — | System instructions for the model |
| temperature | slider | 0.7 | Randomness (0-1) |
| maxTokens | number | 1000 | Maximum response length |
| thinkingEnabled | boolean | false | Enable extended thinking mode (Claude 3.5 Sonnet, Claude 3 Opus) |
| thinkingBudget | number | 2048 | Token budget for thinking (1024-16000). Shown when thinkingEnabled is true. |

Extended Thinking

Claude’s extended thinking mode shows the model’s reasoning process:
{
  "response": "Claude's final response",
  "thinking": "Let me analyze this step by step...",
  "model": "claude-3-5-sonnet-20241022",
  "stop_reason": "end_turn"
}
When thinking is enabled, max_tokens must be greater than thinkingBudget. Temperature is automatically set to 1.
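The two constraints above can be sketched as a parameter-normalization step (a hypothetical helper under the rules stated in this section, not the actual MachinaOs code):

```python
# Sketch: enforce the extended-thinking constraints described above.
# When thinking is on, max_tokens must exceed the thinking budget and
# temperature is pinned to 1.
def apply_thinking_constraints(params: dict) -> dict:
    p = dict(params)
    if p.get("thinkingEnabled"):
        budget = p.get("thinkingBudget", 2048)
        if p.get("maxTokens", 1000) <= budget:
            # leave headroom for the visible response after thinking
            p["maxTokens"] = budget + 1024
        p["temperature"] = 1  # required when extended thinking is enabled
    return p
```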

Google Gemini Model

Models

| Model | Best For |
|---|---|
| gemini-2.5-pro | Most intelligent, complex tasks |
| gemini-2.5-flash | Fast, frontier performance |
| gemini-2.0-flash-thinking | Reasoning with thinking output |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Gemini model to use |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-1) |
| maxTokens | number | 1000 | Maximum response length |
| safetySettings | select | default | Content safety level |
| thinkingEnabled | boolean | false | Enable thinking mode (Gemini 2.5 models, Flash Thinking) |

Output

{
  "response": "Gemini's response",
  "thinking": "Reasoning process (when enabled)",
  "model": "gemini-2.5-pro"
}

OpenRouter Model

OpenRouter provides access to 200+ models from multiple providers through a single API.

Features

  • Unified API: One API key for OpenAI, Anthropic, Google, Meta, Mistral, and more
  • Free Models: Some models available at no cost (marked with [FREE] prefix)
  • Fallback: Automatic model fallback if primary is unavailable

Models

Models are grouped by cost in the dropdown:
  • Free models: [FREE] prefix, no cost
  • Paid models: Standard pricing per provider
Popular models include:
  • openai/gpt-4o
  • anthropic/claude-3.5-sonnet
  • google/gemini-2.5-pro
  • meta-llama/llama-3.1-405b-instruct
  • mistralai/mixtral-8x22b-instruct

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Model in format: provider/model-name |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-1) |
| maxTokens | number | 1000 | Maximum response length |

Output

{
  "response": "Model's response",
  "model": "openai/gpt-4o",
  "provider": "openrouter"
}
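Parsing the `provider/model-name` format, including the `[FREE]` display prefix from the dropdown, can be sketched as follows (a hypothetical helper; the prefix is a UI marker and is assumed to be stripped before the API call):

```python
# Sketch: split an OpenRouter model id into provider and model name,
# handling the "[FREE] " dropdown prefix described above.
def parse_openrouter_model(model_id: str) -> dict:
    free = model_id.startswith("[FREE] ")
    clean = model_id.removeprefix("[FREE] ")
    provider, _, name = clean.partition("/")
    return {"provider": provider, "model": name, "free": free, "api_id": clean}
```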

Groq Model

Groq provides ultra-fast inference on custom LPU (Language Processing Unit) hardware.

Models

| Model | Best For |
|---|---|
| llama-3.1-70b-versatile | General purpose, fast |
| llama-3.1-8b-instant | Ultra-fast, simple tasks |
| mixtral-8x7b-32768 | Long context, reasoning |
| qwen3-32b | Reasoning with parsed output |
| qwq-32b | Advanced reasoning |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Groq model to use |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-1) |
| maxTokens | number | 1000 | Maximum response length |
| reasoningFormat | select | parsed | For Qwen3/QwQ models: "parsed" returns reasoning, "hidden" returns only the final answer |

Reasoning Output

Qwen3 and QwQ models support reasoning output:
{
  "response": "The final answer",
  "thinking": "Step-by-step reasoning process",
  "model": "qwen3-32b"
}
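The effect of the reasoningFormat setting can be sketched as a post-processing step (a hypothetical helper illustrating the parsed/hidden behavior described above, not the actual MachinaOs code):

```python
# Sketch: apply reasoningFormat to a Groq response carrying reasoning.
# "parsed" surfaces the reasoning as a separate "thinking" field;
# "hidden" drops it and keeps only the final answer.
def apply_reasoning_format(response: str, reasoning: str, fmt: str = "parsed") -> dict:
    out = {"response": response}
    if fmt == "parsed" and reasoning:
        out["thinking"] = reasoning
    return out
```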

Cerebras Model

Cerebras provides ultra-fast inference on custom wafer-scale AI hardware.

Models

| Model | Best For |
|---|---|
| llama3.1-8b | Fast, efficient |
| llama3.1-70b | Capable, balanced |
| qwen-2.5-32b | Reasoning tasks |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Cerebras model to use |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-1) |
| maxTokens | number | 1000 | Maximum response length |

Output

{
  "response": "Cerebras model response",
  "model": "llama3.1-70b"
}

xAI (Grok)

xAI’s Grok models are accessed through their OpenAI-compatible API. The backend routes xAI through the shared OpenAIProvider with base_url=https://api.x.ai/v1.

Models

| Model | Best For |
|---|---|
| grok-beta | Real-time knowledge, conversational |
| grok-vision-beta | Multimodal with image understanding |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Grok model to use |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-2) |
| maxTokens | number | 4096 | Maximum response length |

DeepSeek

DeepSeek V3 models, including the deepseek-reasoner variant with always-on Chain-of-Thought reasoning.

Models

| Model | Best For |
|---|---|
| deepseek-chat | General purpose, fast |
| deepseek-reasoner | Always-on Chain-of-Thought reasoning (128K context, up to 64K output) |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | DeepSeek model |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-2) |
| maxTokens | number | 8192 | Maximum response length (up to 64K for reasoner) |

Reasoner Output

The deepseek-reasoner model always returns reasoning in the reasoning_content field. MachinaOs maps this to the standard thinking field:
{
  "response": "The final answer",
  "thinking": "Step-by-step reasoning (reasoner only)",
  "model": "deepseek-reasoner"
}
DeepSeek Reasoner’s thinking is always on and cannot be disabled.
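The mapping from `reasoning_content` to the standard `thinking` field can be sketched like this (a hypothetical helper illustrating the rename described above, not the actual MachinaOs code):

```python
# Sketch: deepseek-reasoner returns its chain of thought in
# message.reasoning_content; rename it to the standard "thinking" field.
def normalize_deepseek(message: dict, model: str) -> dict:
    out = {"response": message.get("content", ""), "model": model}
    if message.get("reasoning_content"):
        out["thinking"] = message["reasoning_content"]
    return out
```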

Kimi (Moonshot AI)

Moonshot AI’s Kimi models offer a 256K context window and an optional thinking mode.

Models

| Model | Best For |
|---|---|
| kimi-k2.5 | General purpose, 256K context, 96K output |
| kimi-k2-thinking | Deep reasoning tasks |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Kimi model |
| prompt | string | required | The message to send |
| maxTokens | number | 4096 | Maximum response length (up to 96K) |

Kimi models use fixed temperatures: 0.6 for instant (k2.5), 1.0 for thinking (k2-thinking). User-set temperature is ignored for compatibility with LangGraph agent mode.
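The fixed-temperature rule amounts to a per-model-family lookup, which might be sketched as (hypothetical helper; the "thinking"-substring check is an assumed way to distinguish the families):

```python
# Sketch: Kimi ignores user-set temperature and uses a fixed value per
# model family, as described above.
def kimi_temperature(model: str) -> float:
    return 1.0 if "thinking" in model else 0.6
```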

Output

{
  "response": "Kimi response",
  "thinking": "Reasoning (k2-thinking only)",
  "model": "kimi-k2.5"
}

Mistral

Mistral AI models, including Large, Small, and Codestral for code tasks.

Models

| Model | Best For |
|---|---|
| mistral-large-latest | Most capable, general purpose |
| mistral-small-latest | Fast, cost-effective |
| codestral-latest | Code generation and completion |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | select | required | Mistral model |
| prompt | string | required | The message to send |
| temperature | slider | 0.7 | Randomness (0-1.5) |
| maxTokens | number | 8192 | Maximum response length (up to 131K) |

Output

{
  "response": "Mistral response",
  "model": "mistral-large-latest"
}
Mistral models support up to 256K context but do not have a thinking mode. Temperature range is 0-1.5 (not 0-2).

Native SDK vs LangChain Path

MachinaOs uses a hybrid architecture for LLM access:
  • Native SDK path (server/services/llm/): Used by execute_chat() for direct chat completions. Returns a normalized LLMResponse dataclass across all providers. Bypasses LangChain for OpenAI, Anthropic, Gemini, OpenRouter, xAI, DeepSeek, Kimi, and Mistral.
  • LangChain path: Used by execute_agent() and execute_chat_agent() for tool-calling agents via LangGraph. All 10 providers supported.
OpenAI-compatible providers (xAI, DeepSeek, Kimi, Mistral) reuse the OpenAIProvider class with base_url read from server/config/llm_defaults.json. Adding a new OpenAI-compatible provider is a pure config change with no new code.
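The exact shape of server/config/llm_defaults.json is not documented here; an entry for an OpenAI-compatible provider might look like the following (field names are illustrative, not the file's actual schema):

```json
{
  "xai": {
    "base_url": "https://api.x.ai/v1",
    "default_model": "grok-beta"
  },
  "deepseek": {
    "base_url": "https://api.deepseek.com",
    "default_model": "deepseek-chat"
  }
}
```

Under this scheme, onboarding another OpenAI-compatible provider means adding one such entry, with no provider-specific code.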

Thinking/Reasoning Modes

Several providers support extended thinking or reasoning modes that show the model’s internal reasoning process.
| Provider | Models | Parameter |
|---|---|---|
| Claude | Claude 4.x with thinking | thinkingBudget (tokens) |
| Gemini | 2.5 Pro/Flash, Gemini 3 | thinkingBudget (tokens) or thinking_level |
| OpenAI | o1, o3, o4 series, GPT-5 hybrid | reasoningEffort (low/medium/high) |
| Groq | Qwen3-32b | reasoningFormat (parsed/hidden) |
| Cerebras | Qwen-3-235b | reasoningFormat (parsed/hidden) |
| DeepSeek | deepseek-reasoner | Always on (not configurable) |
| Kimi | kimi-k2-thinking | On by default |

Using Thinking Output

The thinking field is available in the node output for downstream nodes:
{{openaiChatModel.thinking}}
{{anthropicChatModel.thinking}}
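Template variables like these resolve a dotted path against upstream node outputs; a minimal sketch of such a resolver (hypothetical, not the actual MachinaOs implementation) looks like:

```python
import re

# Sketch: resolve {{node.field}} template variables against a dict of
# upstream node outputs, following dotted paths.
def resolve_templates(text: str, outputs: dict) -> str:
    def lookup(match: re.Match) -> str:
        value = outputs
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, text)
```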

Comparing Providers

| Feature | OpenAI | Claude | Gemini | OpenRouter | Groq | Cerebras | xAI | DeepSeek | Kimi | Mistral |
|---|---|---|---|---|---|---|---|---|---|---|
| Speed | Fast | Medium | Fast | Varies | Ultra-fast | Ultra-fast | Fast | Fast | Fast | Fast |
| Reasoning | o-series, GPT-5 | Extended thinking | Thinking mode | Model-dependent | Qwen3 | Qwen-3 | No | Always-on CoT | K2-thinking | No |
| Context Window | 128K-1M | 200K-1M | 1M+ | Varies | 32K-131K | 128K | 128K | 128K | 256K | 256K |
| Multimodal | Yes | Yes | Yes | Model-dependent | No | No | Yes | No | No | No |
| JSON Mode | Yes | No | No | Model-dependent | No | No | Yes | Yes | Yes | Yes |

Common Use Cases

Text Generation

Prompt: Write a product description for: {{input.product_name}}
Temperature: 0.8

Data Extraction

Prompt: Extract the email and phone from: {{input.text}}
Response Format: json_object
Temperature: 0
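The extraction recipe above maps onto a request payload in the OpenAI chat-completions format; this sketch builds it and parses the reply (the payload structure is standard, but treat it as illustrative rather than MachinaOs internals):

```python
import json

# Sketch: build a data-extraction request with json_object output at
# temperature 0, per the recipe above, then parse the model's JSON reply.
def build_extraction_request(text: str) -> dict:
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": f"Extract the email and phone from: {text}"}
        ],
        "response_format": {"type": "json_object"},
        "temperature": 0,
    }

def parse_extraction(reply: str) -> dict:
    return json.loads(reply)
```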

Complex Reasoning (with thinking)

Model: claude-3-5-sonnet
Thinking Enabled: true
Thinking Budget: 4096
Prompt: Analyze this code and explain the bug: {{input.code}}

Tips

Use temperature 0 for deterministic outputs like data extraction.
Use temperature 0.7-0.9 for creative writing tasks.
Enable thinking mode for complex reasoning tasks that benefit from step-by-step analysis.
Use OpenRouter to experiment with different models without managing multiple API keys.
API calls cost money. Monitor your usage in your provider’s dashboard.

Error Handling

| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Check/update API key |
| 429 Rate Limited | Too many requests | Add delay, reduce frequency |
| 500 Server Error | Provider issue | Retry later |
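For 429 and transient 500 errors, the usual remedy is retry with exponential backoff; a generic sketch (hypothetical helper, with RuntimeError standing in for whatever exception your provider's SDK raises):

```python
import random
import time

# Sketch: retry a provider call with exponential backoff plus jitter,
# as suggested for 429/500 errors above.
def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for a rate-limit/server error
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```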

Related

  • AI Agent: Use models with memory and tools
  • AI Skills: Extend Chat Agent capabilities
  • AI Tools: Tool nodes for AI agents
  • AI Tutorial: Build an AI-powered workflow