AI Chat Models
MachinaOs supports 10 LLM providers for chat completions, with models fetched dynamically from each provider’s API. The backend uses a hybrid architecture: a native SDK layer in server/services/llm/ for direct chat completions, and a LangChain + LangGraph path for agent tool-calling.
Available Providers
| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4.x, GPT-5.x, o1, o3, o4-mini | General purpose, reasoning |
| Anthropic | Claude 4.x Opus/Sonnet/Haiku | Coding, analysis, extended thinking |
| Google Gemini | Gemini 2.5 Pro/Flash, Gemini 3 | Multimodal, long context (1M+) |
| OpenRouter | 200+ models | Access multiple providers via single API |
| Groq | Llama 4 Scout, Qwen3, GPT-OSS | Ultra-fast inference (LPU) |
| Cerebras | Llama, Qwen | Ultra-fast on wafer-scale hardware |
| xAI | Grok models | OpenAI-compatible API |
| DeepSeek | deepseek-chat, deepseek-reasoner | V3 with always-on Chain-of-Thought |
| Kimi | kimi-k2.5, kimi-k2-thinking | 256K context, thinking mode |
| Mistral | mistral-large, codestral | Up to 256K context |
Adding API Keys
- Click the key icon in the toolbar
- Select the provider
- Enter your API key
- Click Validate to test
API keys are encrypted and stored locally. They’re never sent to MachinaOs servers.
OpenAI Chat Model
Models
| Model | Best For |
|---|---|
| gpt-4o | Most capable, multimodal |
| gpt-4-turbo | Fast, cost-effective GPT-4 |
| o1 | Complex reasoning tasks |
| o3 | Advanced reasoning |
| o4-mini | Fast, efficient reasoning |
Parameters
- Model: the model to use
- Prompt: the message to send. Supports template variables.
- Temperature: randomness (0 = deterministic, 1 = creative)
- Max tokens: maximum response length
- Response format: text or json_object
- Reasoning effort: minimal, low, medium, or high (o-series models only)
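The template-variable syntax itself isn’t shown on this page. As a hedged sketch only, a `{{name}}`-style placeholder (the delimiter is an assumption, not the documented MachinaOs syntax) could be substituted before the prompt is sent:

```python
import re


def render_template(prompt: str, variables: dict) -> str:
    """Replace {{name}} placeholders with values from `variables`.

    The {{name}} delimiter is an illustrative assumption; check the
    MachinaOs template docs for the actual syntax.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders intact rather than failing.
        return str(variables.get(key, match.group(0)))

    return re.sub(r"\{\{(\w+)\}\}", substitute, prompt)


print(render_template("Summarize {{topic}} in {{count}} words.",
                      {"topic": "LPUs", "count": 50}))
# → Summarize LPUs in 50 words.
```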
Output
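The concrete output payload isn’t reproduced on this page. As an illustration only, a chat-model node’s output plausibly carries fields like the following (every key except thinking, which this page discusses under “Using Thinking Output”, is an assumption):

```python
# Illustrative only: a plausible node output payload. Key names other than
# "thinking" are assumptions, not the documented MachinaOs schema.
example_output = {
    "text": "The capital of France is Paris.",            # final answer
    "model": "gpt-4o",                                     # model that produced it
    "thinking": None,                                      # reasoning, when the model emits it
    "usage": {"prompt_tokens": 12, "completion_tokens": 9},
}
print(example_output["text"])
```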
Anthropic Claude Model
Models
| Model | Best For |
|---|---|
| claude-3-5-sonnet-20241022 | Best for coding and complex tasks |
| claude-3-opus-20240229 | Most capable, detailed analysis |
| claude-3-haiku-20240307 | Fast responses, simple tasks |
Parameters
- Model: Claude model to use
- Prompt: the message to send
- System prompt: system instructions for the model
- Temperature: randomness (0-1)
- Max tokens: maximum response length
- Thinking enabled: enable extended thinking mode (Claude 3.5 Sonnet, Claude 3 Opus)
- Thinking budget: token budget for thinking (1024-16000). Shown when thinkingEnabled is true.
Extended Thinking
Claude’s extended thinking mode shows the model’s reasoning process alongside the final answer.

Google Gemini Model
Models
| Model | Best For |
|---|---|
| gemini-2.5-pro | Most intelligent, complex tasks |
| gemini-2.5-flash | Fast, frontier performance |
| gemini-2.0-flash-thinking | Reasoning with thinking output |
Parameters
- Model: Gemini model to use
- Prompt: the message to send
- Temperature: randomness (0-1)
- Max tokens: maximum response length
- Safety: content safety level
- Thinking enabled: enable thinking mode (Gemini 2.5 models, Flash Thinking)
Output
OpenRouter Model
OpenRouter provides access to 200+ models from multiple providers through a single API.

Features
- Unified API: One API key for OpenAI, Anthropic, Google, Meta, Mistral, and more
- Free Models: Some models available at no cost (marked with [FREE] prefix)
- Fallback: Automatic model fallback if primary is unavailable
Models
Models are grouped by cost in the dropdown:

- Free models: [FREE] prefix, no cost
- Paid models: Standard pricing per provider
Example model IDs:

- openai/gpt-4o
- anthropic/claude-3.5-sonnet
- google/gemini-2.5-pro
- meta-llama/llama-3.1-405b-instruct
- mistralai/mixtral-8x22b-instruct
Parameters
- Model: in the format provider/model-name
- Prompt: the message to send
- Temperature: randomness (0-1)
- Max tokens: maximum response length
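Because OpenRouter model IDs are namespaced, it can be handy to split one into its provider and model parts. A minimal sketch (the helper name is an assumption, not a MachinaOs function):

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split an OpenRouter model ID of the form provider/model-name.

    Splits on the first "/" only, since model names may contain dots
    and dashes (e.g. meta-llama/llama-3.1-405b-instruct).
    """
    provider, _, model = model_id.partition("/")
    if not model:
        raise ValueError(f"expected 'provider/model-name', got {model_id!r}")
    return provider, model


print(split_model_id("anthropic/claude-3.5-sonnet"))
# → ('anthropic', 'claude-3.5-sonnet')
```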
Output
Groq Model
Groq provides ultra-fast inference on custom LPU (Language Processing Unit) hardware.

Models
| Model | Best For |
|---|---|
| llama-3.1-70b-versatile | General purpose, fast |
| llama-3.1-8b-instant | Ultra-fast, simple tasks |
| mixtral-8x7b-32768 | Long context, reasoning |
| qwen3-32b | Reasoning with parsed output |
| qwq-32b | Advanced reasoning |
Parameters
- Model: Groq model to use
- Prompt: the message to send
- Temperature: randomness (0-1)
- Max tokens: maximum response length
- Reasoning format: for Qwen3/QwQ models, “parsed” returns reasoning, “hidden” returns only the final answer
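As a sketch of what the “parsed” option might produce: Qwen-style reasoning models commonly wrap their chain of thought in `<think>` tags, so separating it from the final answer could look like this (the tag convention and function are assumptions, not Groq’s actual implementation):

```python
import re


def parse_reasoning(raw: str) -> dict:
    """Split a Qwen-style response into reasoning and final answer.

    Assumes the model wraps its chain of thought in <think>...</think>
    tags; a sketch of reasoningFormat="parsed", not Groq's real code.
    """
    match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # "hidden" would return only this part:
    answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
    return {"reasoning": reasoning, "content": answer}


print(parse_reasoning("<think>3 * 7 = 21</think>The answer is 21."))
# → {'reasoning': '3 * 7 = 21', 'content': 'The answer is 21.'}
```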
Reasoning Output
Qwen3 and QwQ models support parsed reasoning output.

Cerebras Model
Cerebras provides ultra-fast inference on custom wafer-scale AI hardware.

Models
| Model | Best For |
|---|---|
| llama3.1-8b | Fast, efficient |
| llama3.1-70b | Capable, balanced |
| qwen-2.5-32b | Reasoning tasks |
Parameters
- Model: Cerebras model to use
- Prompt: the message to send
- Temperature: randomness (0-1)
- Max tokens: maximum response length
Output
xAI (Grok)
xAI’s Grok models are available via their OpenAI-compatible API. The backend routes xAI through the shared OpenAIProvider with base_url=https://api.x.ai/v1.
Models
| Model | Best For |
|---|---|
| grok-beta | Real-time knowledge, conversational |
| grok-vision-beta | Multimodal with image understanding |
Parameters
- Model: Grok model to use
- Prompt: the message to send
- Temperature: randomness (0-2)
- Max tokens: maximum response length
DeepSeek
DeepSeek V3 models, including a reasoner variant with always-on Chain-of-Thought reasoning.

Models
| Model | Best For |
|---|---|
| deepseek-chat | General purpose, fast |
| deepseek-reasoner | Always-on Chain-of-Thought reasoning (128K context, up to 64K output) |
Parameters
DeepSeek model
The message to send
Randomness (0-2)
Maximum response length (up to 64K for reasoner)
Reasoner Output
The deepseek-reasoner model always returns reasoning in the reasoning_content field. MachinaOs maps this to the standard thinking field.
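A minimal sketch of that mapping, assuming `message` stands in for the assistant message returned by the chat-completions API (the function and output keys are illustrative, not the exact MachinaOs code):

```python
def normalize_deepseek(message: dict) -> dict:
    """Map DeepSeek's reasoning_content onto the common thinking field.

    Sketch only: `message` mimics an assistant message from the
    chat-completions API; the output keys mirror the normalized shape
    used elsewhere on this page.
    """
    return {
        "text": message.get("content", ""),
        # Always populated for deepseek-reasoner; absent for deepseek-chat.
        "thinking": message.get("reasoning_content"),
    }


print(normalize_deepseek({"content": "42", "reasoning_content": "6 * 7 = 42"}))
# → {'text': '42', 'thinking': '6 * 7 = 42'}
```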
DeepSeek Reasoner’s thinking is always on and cannot be disabled.
Kimi (Moonshot AI)
Moonshot’s Kimi models with a 256K context window and optional thinking mode.

Models
| Model | Best For |
|---|---|
| kimi-k2.5 | General purpose, 256K context, 96K output |
| kimi-k2-thinking | Deep reasoning tasks |
Parameters
- Model: Kimi model to use
- Prompt: the message to send
- Max tokens: maximum response length (up to 96K)
Output
Mistral
Mistral AI models, including Large, Small, and Codestral for code tasks.

Models
| Model | Best For |
|---|---|
| mistral-large-latest | Most capable, general purpose |
| mistral-small-latest | Fast, cost-effective |
| codestral-latest | Code generation and completion |
Parameters
- Model: Mistral model to use
- Prompt: the message to send
- Temperature: randomness (0-1.5)
- Max tokens: maximum response length (up to 131K)
Output
Mistral models support up to 256K context but do not have a thinking mode. Temperature range is 0-1.5 (not 0-2).
Native SDK vs LangChain Path
MachinaOs uses a hybrid architecture for LLM access:

- Native SDK path (server/services/llm/): used by execute_chat() for direct chat completions. Returns a normalized LLMResponse dataclass across all providers. Bypasses LangChain for OpenAI, Anthropic, Gemini, OpenRouter, xAI, DeepSeek, Kimi, and Mistral.
- LangChain path: used by execute_agent() and execute_chat_agent() for tool-calling agents via LangGraph. All 10 providers are supported.

OpenAI-compatible providers share the OpenAIProvider class, with base_url read from server/config/llm_defaults.json. Adding a new OpenAI-compatible provider is a pure config change with no new code.
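The LLMResponse dataclass itself isn’t shown on this page. A minimal sketch of what such a normalized wrapper could look like (the class name comes from the architecture notes above; the fields, apart from a thinking slot, are assumptions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMResponse:
    """Sketch of a provider-agnostic response. Field names are
    illustrative assumptions, not the actual MachinaOs definition."""
    text: str                      # final answer
    model: str                     # model that produced it
    provider: str                  # e.g. "openai", "deepseek"
    thinking: Optional[str] = None  # reasoning, when the model emits it


resp = LLMResponse(text="Paris", model="deepseek-reasoner",
                   provider="deepseek", thinking="France -> capital -> Paris")
print(resp.provider)
# → deepseek
```

The point of the normalization is that downstream nodes can read `resp.thinking` without caring whether the provider calls it reasoning_content (DeepSeek), extended thinking (Claude), or a parsed reasoning block (Groq).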
Thinking/Reasoning Modes
Several providers support extended thinking or reasoning modes that show the model’s internal reasoning process.

| Provider | Models | Parameter |
|---|---|---|
| Claude | Claude 4.x with thinking | thinkingBudget (tokens) |
| Gemini | 2.5 Pro/Flash, Gemini 3 | thinkingBudget (tokens) or thinking_level |
| OpenAI | o1, o3, o4 series, GPT-5 hybrid | reasoningEffort (low/medium/high) |
| Groq | Qwen3-32b | reasoningFormat (parsed/hidden) |
| Cerebras | Qwen-3-235b | reasoningFormat (parsed/hidden) |
| DeepSeek | deepseek-reasoner | Always on (not configurable) |
| Kimi | kimi-k2-thinking | On by default |
Using Thinking Output
The thinking field is available in the node output for downstream nodes.
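For example, a downstream node might branch on whether the upstream model produced reasoning at all (`node_output` below mimics the shape sketched earlier; the key names other than thinking are assumptions):

```python
# Stand-in for an upstream chat-model node's output.
node_output = {
    "text": "The answer is 21.",
    "thinking": "3 * 7 = 21, so the answer is 21.",
}

# Branch on whether reasoning is present (it is None or absent for
# non-reasoning models).
if node_output.get("thinking"):
    summary = "Reasoning + answer"
else:
    summary = "Answer only"
print(summary)
# → Reasoning + answer
```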
Comparing Providers
| Feature | OpenAI | Claude | Gemini | OpenRouter | Groq | Cerebras | xAI | DeepSeek | Kimi | Mistral |
|---|---|---|---|---|---|---|---|---|---|---|
| Speed | Fast | Medium | Fast | Varies | Ultra-fast | Ultra-fast | Fast | Fast | Fast | Fast |
| Reasoning | o-series, GPT-5 | Extended thinking | Thinking mode | Model-dependent | Qwen3 | Qwen-3 | - | Always-on CoT | K2-thinking | - |
| Context Window | 128K-1M | 200K-1M | 1M+ | Varies | 32K-131K | 128K | 128K | 128K | 256K | 256K |
| Multimodal | Yes | Yes | Yes | Model-dependent | No | No | Yes | No | No | No |
| JSON Mode | Yes | No | No | Model-dependent | No | No | Yes | Yes | Yes | Yes |
Common Use Cases
Text Generation
Data Extraction
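No extraction example survives in this section. As a sketch, data extraction typically combines the json_object output format with a strict prompt, then parses the reply; the model reply below is hard-coded for illustration rather than fetched from a provider:

```python
import json

prompt = (
    "Extract the name and email from the text below. "
    'Respond with JSON: {"name": ..., "email": ...}\n\n'
    "Reach out to Jane Doe at jane@example.com."
)

# Hard-coded stand-in for a model reply produced with the
# json_object output format.
model_reply = '{"name": "Jane Doe", "email": "jane@example.com"}'

record = json.loads(model_reply)
print(record["email"])
# → jane@example.com
```

With json_object mode the provider constrains the reply to valid JSON, so `json.loads` is generally safe; without it, the reply may include prose around the JSON and need more defensive parsing.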
Complex Reasoning (with thinking)
Tips
Error Handling
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Check/update API key |
| 429 Rate Limited | Too many requests | Add delay, reduce frequency |
| 500 Server Error | Provider issue | Retry later |
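For 429s, adding a delay usually means exponential backoff. A hedged sketch (real provider SDKs raise their own typed errors, so the `RuntimeError`-with-"429" convention here is an assumption to keep the example self-contained):

```python
import time


def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a provider call with exponential backoff on rate limits.

    In this sketch `call` signals a rate limit by raising
    RuntimeError("429 ..."); adapt the except clause to your SDK's
    actual rate-limit exception.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as err:
            if "429" not in str(err) or attempt == max_attempts - 1:
                raise  # not a rate limit, or out of attempts
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


attempts = {"n": 0}

def flaky():
    """Fails once with a 429, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("429 Rate Limited")
    return "ok"


print(with_retries(flaky))
# → ok
```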
Related
- AI Agent: use models with memory and tools
- AI Skills: extend Chat Agent capabilities
- AI Tools: tool nodes for AI agents
- AI Tutorial: build an AI-powered workflow