
Introduction

Everything you need to integrate NeXeonAI into your application

NeXeonAI provides a unified API to access GPT-5, Claude, Gemini, DeepSeek, and 50+ models from a single endpoint. Drop-in compatible with OpenAI and Anthropic SDKs—switch your base URL and you're done.

Base URL: https://api.nexeonai.com/v1

Authentication: Bearer token or x-api-key header

OpenAI: GPT-5, GPT-4o, o1, o3

Anthropic: Claude Opus 4.5, Sonnet 4.5, Haiku 4.5

Google: Gemini 2.5, Gemini 2.0

Quickstart

Get up and running in under 2 minutes

1. Get your API key

Create an account and generate an API key from your dashboard.

2. Install the SDK

Use the official OpenAI or Anthropic SDK—no custom packages needed.

bash
pip install openai
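
If you plan to use the Anthropic SDK instead, install it the same way:

bash
pip install anthropic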
3. Make your first request

Point the SDK to NeXeonAI and make a request.

python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Authentication

Secure your API requests with API keys

All API requests require authentication via your API key. Include it in the request headers.

Bearer Token (OpenAI-compatible)

bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..."

x-api-key Header (Anthropic-compatible)

bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..."

Keep your API keys secure. Never expose them in client-side code or public repositories.
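
As a minimal sketch of that advice, read the key from an environment variable rather than hardcoding it. The variable name NEXEONAI_API_KEY here is just an example, not a name the platform requires:

python
import os

from openai import OpenAI

# Keep the key out of source control by reading it from the environment.
# NEXEONAI_API_KEY is an illustrative name, not a required one.
client = OpenAI(
    api_key=os.environ["NEXEONAI_API_KEY"],
    base_url="https://api.nexeonai.com/v1",
)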

Models

List available models and their capabilities

GET /v1/models

Returns a list of all available models across all providers, including pricing and context window information.

Request

bash
curl https://api.nexeonai.com/v1/models \
  -H "Authorization: Bearer nex-..."

Response

json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5.2-chat-latest",
      "object": "model",
      "created": 1737921600,
      "owned_by": "openai"
    },
    {
      "id": "claude-sonnet-4-5-20250929",
      "object": "model",
      "created": 1735689600,
      "owned_by": "anthropic"
    },
    {
      "id": "claude-opus-4-5-20251101",
      "object": "model",
      "created": 1735689600,
      "owned_by": "anthropic"
    }
  ]
}
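
Because the endpoint is OpenAI-compatible, the same listing should work through the OpenAI Python SDK; a minimal sketch:

python
from openai import OpenAI

client = OpenAI(api_key="nex-...", base_url="https://api.nexeonai.com/v1")

# Iterate over every model the gateway exposes.
for model in client.models.list():
    print(model.id, "|", model.owned_by)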

Chat Completions

Generate conversational responses from any model

POST /v1/chat/completions

The primary endpoint for generating AI responses. Compatible with OpenAI's chat completions format and works with all supported models.

Request Body

Parameter | Type | Description
model (required) | string | Model ID (e.g., "gpt-5.2-chat-latest", "claude-sonnet-4-5-20250929")
messages (required) | array | Array of message objects with role and content
max_tokens | integer | Maximum tokens to generate
temperature | number | Sampling temperature (0-2)
stream | boolean | Enable streaming responses
top_p | number | Nucleus sampling parameter

Request

bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-chat-latest",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing briefly."}
    ],
    "max_tokens": 256
  }'

Response

json
{
  "id": "chatcmpl-nxn-abc123def456789",
  "object": "chat.completion",
  "created": 1737921600,
  "model": "gpt-5.2-chat-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits...",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152
  }
}
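
The same request through the OpenAI Python SDK, assuming the optional parameters from the table above behave as in the upstream API:

python
from openai import OpenAI

client = OpenAI(api_key="nex-...", base_url="https://api.nexeonai.com/v1")

response = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing briefly."},
    ],
    max_tokens=256,
    temperature=0.7,  # optional: sampling temperature (0-2)
)

print(response.choices[0].message.content)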

Responses API

OpenAI's new API for GPT-5+ reasoning models

POST /v1/responses

The Responses API provides 3% better reasoning performance and up to 80% improved cache utilization compared to Chat Completions.

Optimized for GPT-5 and newer reasoning models with built-in tools, multi-turn support, and better performance.

Request Body

Parameter | Type | Description
model (required) | string | Model ID (e.g., "gpt-5", "gpt-5.2-chat-latest")
input (required) | string or array | Input text or array of messages
instructions | string | System-level instructions
max_output_tokens | integer | Maximum tokens to generate
tools | array | Built-in tools (web_search, file_search, code_interpreter)
stream | boolean | Enable streaming responses

Request

bash
curl https://api.nexeonai.com/v1/responses \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "instructions": "You are a helpful assistant.",
    "input": "What is the weather in Tokyo?",
    "tools": [{"type": "web_search"}]
  }'

Response

json
{
  "id": "resp_nxn-abc123def456789",
  "object": "response",
  "created_at": 1737921600,
  "model": "gpt-5",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_nxn-xyz789",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Based on current data..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 48,
    "total_tokens": 63
  }
}
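
A sketch of the same call through the OpenAI Python SDK, assuming the gateway mirrors the upstream client.responses interface:

python
from openai import OpenAI

client = OpenAI(api_key="nex-...", base_url="https://api.nexeonai.com/v1")

response = client.responses.create(
    model="gpt-5",
    instructions="You are a helpful assistant.",
    input="What is the weather in Tokyo?",
    tools=[{"type": "web_search"}],
)

# output_text concatenates all output_text blocks in the response.
print(response.output_text)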

Messages API

Anthropic-compatible endpoint for Claude and all models

POST /v1/messages

Use the Anthropic SDK with any model. Native Claude support plus automatic format bridging for OpenAI, Gemini, and DeepSeek models.

Request Body

Parameter | Type | Description
model (required) | string | Model ID (works with any provider)
messages (required) | array | Messages in Anthropic format
max_tokens (required) | integer | Maximum tokens to generate
system | string | System prompt (separate from messages)
temperature | number | Sampling temperature
stream | boolean | Enable streaming responses

Request

bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Response

json
{
  "id": "msg_nxn-abc123def456789012345678",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-5-20250929",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}
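
The equivalent call with the Anthropic Python SDK. Note the base URL omits the /v1 suffix, matching the SDK example in the Extended Thinking section below, since the SDK appends the path itself:

python
from anthropic import Anthropic

client = Anthropic(
    api_key="nex-...",
    base_url="https://api.nexeonai.com",
)

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}],
)

# content is a list of blocks; here the first block is the text reply.
print(response.content[0].text)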

Reasoning Models

Advanced models with chain-of-thought reasoning capabilities

Reasoning models like OpenAI's o1, o3, o4-mini and Claude's extended thinking provide superior problem-solving capabilities by "thinking" through complex tasks step-by-step before responding.

Reasoning models may take longer to respond but provide significantly better results for complex tasks like coding, math, logic puzzles, and multi-step analysis.

Available Reasoning Models

Model | Provider | Best For
o1 | OpenAI | Complex reasoning, PhD-level tasks
o1-mini | OpenAI | Fast reasoning, coding tasks
o1-pro | OpenAI | Maximum reasoning depth
o3 | OpenAI | Next-gen reasoning
o3-mini | OpenAI | Fast next-gen reasoning
o4-mini | OpenAI | Latest compact reasoner
claude-opus-4-5-20251101 | Anthropic | Most capable, extended thinking
claude-sonnet-4-5-20250929 | Anthropic | Balanced reasoning + speed

Key Differences from Standard Models

Parameter | Type | Description
max_completion_tokens (required) | integer | Use this instead of max_tokens for OpenAI reasoning models (o1, o3, o4)
reasoning_effort | string | Control reasoning depth: "low", "medium", "high" (o1, o3 models)
temperature | number | Fixed at 1 for reasoning models; the parameter is ignored

o1/o3 Request

bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o1",
    "messages": [
      {"role": "user", "content": "Solve this step by step: If a train travels 120km in 2 hours, then stops for 30 minutes, then travels 90km in 1.5 hours, what is the average speed for the entire journey including the stop?"}
    ],
    "max_completion_tokens": 4096,
    "reasoning_effort": "high"
  }'

Response

json
{
  "id": "chatcmpl-nxn-abc123def456789",
  "object": "chat.completion",
  "created": 1737921600,
  "model": "o1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Let me solve this step by step...\n\nTotal distance: 120 + 90 = 210 km\nTotal time: 2 + 0.5 + 1.5 = 4 hours\nAverage speed: 210 / 4 = 52.5 km/h",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 256,
    "total_tokens": 304
  }
}

Reasoning models do not support system messages or temperature parameters. Place all instructions in the user message.
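
A sketch of the o1 request above through the OpenAI Python SDK, assuming a recent SDK version that accepts max_completion_tokens and reasoning_effort as shown:

python
from openai import OpenAI

client = OpenAI(api_key="nex-...", base_url="https://api.nexeonai.com/v1")

response = client.chat.completions.create(
    model="o1",
    # Put all instructions in the user message; system messages are unsupported.
    messages=[{"role": "user", "content": "Solve step by step: ..."}],
    max_completion_tokens=4096,  # use this instead of max_tokens
    reasoning_effort="high",     # "low", "medium", or "high"
)

print(response.choices[0].message.content)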

Streaming

Real-time token-by-token responses

Enable streaming for real-time responses. The API sends Server-Sent Events (SSE) as tokens are generated, reducing time-to-first-token significantly.

Non-streaming timeout: Requests without stream: true have a 2-minute timeout. For reasoning models or long responses, always use streaming.

Streaming is recommended for user-facing applications. It provides a much better UX as users see responses as they're generated.

SSE Keep-Alive

During streaming, the API sends periodic keep-alive comments (: NX) every 300ms to prevent CDN timeouts. These are standard SSE comments and are automatically filtered by compliant clients.

python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
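
If you're on the Anthropic SDK, a similar sketch using its streaming helper:

python
from anthropic import Anthropic

client = Anthropic(api_key="nex-...", base_url="https://api.nexeonai.com")

# stream() is the SDK's context-manager helper for SSE responses.
with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)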

Extended Thinking

Claude's chain-of-thought reasoning with visible thinking process

Extended thinking allows Claude models to work through complex problems step-by-step, showing their reasoning process. This is especially powerful for coding, math, analysis, and complex multi-step tasks.

Extended thinking is available on Claude Opus 4.5 and Sonnet 4.5 models via the Messages API. The thinking process is returned in a separate content block.

How Extended Thinking Works

1. Enable thinking

Set thinking.type to "enabled" with a budget_tokens value.

2. Claude thinks

The model reasons through the problem internally.

3. Get response

Receive both the thinking and the final answer.

Request Parameters

Parameter | Type | Description
thinking.type (required) | string | Set to "enabled" to activate extended thinking
thinking.budget_tokens (required) | integer | Maximum tokens for thinking (1024-32768); higher means deeper reasoning
max_tokens (required) | integer | Maximum tokens for the final response (separate from thinking budget)

Request with Extended Thinking

bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..." \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 8192,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to find the longest palindromic substring in O(n) time using Manacher algorithm. Explain your approach."
      }
    ]
  }'

Response with Thinking

json
{
  "id": "msg_nxn-abc123def456789012345678",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me think through Manacher's algorithm...\n\nThe key insight is that palindromes have mirror properties..."
    },
    {
      "type": "text",
      "text": "Here's an implementation of Manacher's algorithm...\n\n```python\ndef longest_palindrome(s: str) -> str:\n    t = '#' + '#'.join(s) + '#'\n    ..."
    }
  ],
  "model": "claude-opus-4-5-20251101",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 42,
    "output_tokens": 1847
  }
}

Python SDK Example

python
from anthropic import Anthropic

client = Anthropic(
    api_key="nex-...",
    base_url="https://api.nexeonai.com"
)

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # Allow up to 8k tokens for reasoning
    },
    messages=[
        {
            "role": "user",
            "content": "Analyze this code for security vulnerabilities and suggest fixes."
        }
    ]
)

# Access thinking and response separately
for block in response.content:
    if block.type == "thinking":
        print("=== Claude's Reasoning ===")
        print(block.thinking)
    elif block.type == "text":
        print("\n=== Final Answer ===")
        print(block.text)

For complex tasks, use higher budget_tokens (8000-16000). For simpler tasks where you just want better accuracy, 2000-4000 tokens is usually sufficient.

Extended thinking tokens are billed at the same rate as output tokens. Monitor your thinking budget to control costs.

Error Handling

Understand and handle API errors gracefully

The API uses standard HTTP status codes. Errors include a JSON body with details about what went wrong.

HTTP Status Codes

Code | Status | Description
200 | OK | Request succeeded
400 | Bad Request | Invalid request body or parameters
401 | Unauthorized | Invalid or missing API key
402 | Payment Required | Insufficient credits
403 | Forbidden | API key lacks permission
404 | Not Found | Model or resource not found
408 | Request Timeout | Non-streaming request exceeded the 2-minute limit
429 | Too Many Requests | Rate limit exceeded
500 | Server Error | Internal server error
502 | Bad Gateway | Upstream provider error

Error Response Format

json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Model 'unknown-model' is not available",
    "param": "model",
    "code": "model_not_found"
  }
}
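
A minimal sketch of handling these errors with the OpenAI SDK's exception types (the exception names come from the openai package; how each maps onto this gateway's responses is assumed):

python
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(api_key="nex-...", base_url="https://api.nexeonai.com/v1")

try:
    response = client.chat.completions.create(
        model="gpt-5.2-chat-latest",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except RateLimitError:
    # 429: back off and retry (see Rate Limits below).
    pass
except APIStatusError as e:
    # Any other non-2xx status; status_code and message carry the details.
    print(e.status_code, e.message)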

Non-Streaming Timeout

Non-streaming requests have a 2-minute timeout. For long-running requests (reasoning models, complex tasks), we recommend using streaming mode for better reliability.

json
{
  "detail": {
    "type": "timeout_error",
    "message": "Request timed out after 120 seconds. For long-running requests, we recommend using streaming mode (stream: true) for better reliability."
  }
}

Always use stream: true for reasoning models (o1, o3, o4) and extended thinking requests to avoid timeouts.

SSE Keep-Alive Comments

During streaming responses, the API sends periodic keep-alive comments (: NX) to prevent connection timeouts. These are standard SSE comments and should be ignored by your client.

text
: NX

data: {"id":"chatcmpl-nxn-abc123","object":"chat.completion.chunk",...}

: NX

data: {"id":"chatcmpl-nxn-abc123","object":"chat.completion.chunk",...}

data: [DONE]

Keep-alive comments are sent every 300ms during periods of inactivity to prevent CDN/proxy timeouts (e.g., Cloudflare 524 errors). SSE-compatible clients automatically filter these out.
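
If you parse the stream yourself rather than through an SDK, skip blank lines and comment lines before decoding each event; a sketch using httpx (the request body and headers are illustrative):

python
import json

import httpx

with httpx.stream(
    "POST",
    "https://api.nexeonai.com/v1/chat/completions",
    headers={"Authorization": "Bearer nex-..."},
    json={
        "model": "gpt-5.2-chat-latest",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    timeout=None,
) as response:
    for line in response.iter_lines():
        if not line or line.startswith(":"):
            continue  # blank lines and keep-alive comments like ": NX"
        if line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            chunk = json.loads(payload)
            # ...process the chunk...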

Rate Limits

Understand API usage limits

Rate limits are applied per account to ensure fair usage and platform stability.

Limit Type | Default
Requests per minute | 1,000 RPM

If you hit rate limits or need higher throughput for your application, reach out to our team and we can adjust your limits.
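
Until then, a simple exponential backoff on 429 responses is a reasonable client-side sketch:

python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(api_key="nex-...", base_url="https://api.nexeonai.com/v1")

def create_with_retry(max_retries: int = 5, **kwargs):
    """Retry on 429 with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

response = create_with_retry(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}],
)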

Need higher rate limits? Join our Discord and open a support ticket.