DeepSeek V3 API: The Complete Developer Guide

DeepSeek V3 marks a notable shift in the open-weight LLM market, delivering competitive reasoning and code generation results at a fraction of GPT-4o’s per-token cost (see the DeepSeek pricing page for current rates). For developers building JavaScript applications, the DeepSeek API offers a direct path to these capabilities, plus multi-turn conversation, at a lower price than GPT-4o or Claude 3.5 Sonnet.

This guide walks through everything needed to go from zero to a working application: environment setup, core API features, streaming, structured output, a complete code review tool, and migration from other providers.

How to Integrate the DeepSeek V3 API in a JavaScript Project

1. Create a DeepSeek account at platform.deepseek.com and generate an API key.
2. Store the key in a .env file and add it to .gitignore.
3. Install the OpenAI SDK and dotenv: npm install openai@4 dotenv@16.
4. Configure the SDK with DeepSeek’s base URL and your API key.
5. Send a chat completion request using the deepseek-chat model.
6. Enable streaming for lower time-to-first-token in user-facing apps.
7. Implement retry logic with exponential backoff for 429 and 5xx errors.
8. Monitor token usage via the response usage object for cost control.

By the end, you will have built a functional AI-powered code reviewer CLI tool and will understand the key surface area of the DeepSeek API as exposed through the OpenAI-compatible SDK.

You will need Node.js 18 or later, npm, a DeepSeek API key (covered below), and working familiarity with JavaScript and async/await patterns.

What’s New in DeepSeek V3

Architecture and Performance Improvements

DeepSeek V3 uses a Mixture-of-Experts (MoE) architecture with refined expert routing and a 64K-token context window. The model posts competitive scores on code generation benchmarks like HumanEval and MBPP, and performs well on mathematical reasoning tasks (GSM8K, MATH) and instruction following. Against frontier competitors, DeepSeek V3 positions itself in the same tier as GPT-4o and Gemini 2.5 Pro on these benchmarks while maintaining the open-weight availability that distinguishes the DeepSeek family.

API Changes and Compatibility

The API surface remains OpenAI-compatible, meaning any application built against the OpenAI REST API specification can switch to DeepSeek by changing the base URL and model identifier. The primary API model identifier is deepseek-chat, which currently resolves to the DeepSeek V3 model. DeepSeek’s pricing structure continues to undercut major competitors significantly on both input and output token costs, making it particularly attractive for high-volume applications. New parameters and structured output modes are available, detailed in the sections that follow.

Getting Started: API Key and Environment Setup

Creating Your DeepSeek Account and API Key

Registration begins at platform.deepseek.com. After creating an account, navigate to the API Keys section in the dashboard and generate a new key. Copy the key immediately; it will not be shown again.

Store the key in an environment variable. Never hardcode API keys in source files, commit them to version control, or expose them in client-side code.

Project Initialization

Set up a new Node.js project and install the required dependencies:

mkdir deepseek-v3-demo
cd deepseek-v3-demo
npm init -y
npm install openai@4 dotenv@16

This guide was tested with openai@4.x and dotenv@16.x. Pin versions to avoid breaking changes.

All example files use the .mjs extension to enable ES module syntax (including top-level await). Alternatively, add "type": "module" to package.json to use import in .js files.

Create a .env file in the project root:

DEEPSEEK_API_KEY=your_api_key_here
DEEPSEEK_BASE_URL=https://api.deepseek.com

Add .env to .gitignore to prevent accidental exposure:

echo ".env" >> .gitignore

Your First DeepSeek V3 API Call

Configuring the OpenAI SDK for DeepSeek

The OpenAI Node.js SDK accepts a baseURL constructor parameter. Pointing it at https://api.deepseek.com routes all requests to DeepSeek’s servers while preserving the exact same method signatures, request formats, and response shapes. You don’t need a wrapper library or adapter.

Create a file named basic.mjs:

import "dotenv/config";
import OpenAI from "openai";

const apiKey = process.env.DEEPSEEK_API_KEY;
const baseURL = process.env.DEEPSEEK_BASE_URL;

if (!apiKey || apiKey.trim() === "") {
  console.error("Error: DEEPSEEK_API_KEY is not set or is empty in your .env file.");
  process.exit(1);
}
if (!baseURL || baseURL.trim() === "") {
  console.error("Error: DEEPSEEK_BASE_URL is not set or is empty in your .env file.");
  process.exit(1);
}

const client = new OpenAI({
  baseURL,
  apiKey,
  timeout: 60_000,
  maxRetries: 0,
});

const response = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [
    { role: "system", content: "You are a helpful programming assistant." },
    { role: "user", content: "Explain the difference between map and flatMap in JavaScript." },
  ],
});

const choice = response.choices?.[0];
if (!choice || choice.finish_reason === "content_filter") {
  console.error(
    "No valid completion returned. finish_reason:",
    choice?.finish_reason ?? "no choices"
  );
  process.exit(1);
}
const content = choice.message?.content;
if (typeof content !== "string") {
  console.error("Unexpected response shape: missing message content.");
  process.exit(1);
}

console.log(content);
console.log("Token usage:", response.usage);

Run with node basic.mjs.

Understanding the Response Object

The response follows the OpenAI chat completion schema. response.choices is an array where each entry contains a message object with role and content fields. The finish_reason field indicates why generation stopped: "stop" for natural completion, "length" if the response hit the max_tokens cap, "tool_calls" if the model invoked a function, or "content_filter" if content filtering blocked the response. The usage object reports prompt_tokens, completion_tokens, and total_tokens, which map directly to billing. Monitoring these values is essential for cost tracking in production.
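
Because the usage counts map to billing, a small helper can turn them into a rough cost figure. The sketch below is illustrative only: estimateCostUSD is a hypothetical helper, and the per-million-token rates passed to it are made-up placeholders rather than DeepSeek's actual prices. Substitute the current values from the pricing page.

```javascript
// Rough cost estimate from a chat completion `usage` object.
// NOTE: the rates are caller-supplied placeholders, not real DeepSeek prices.
function estimateCostUSD(usage, { inputPerMillion, outputPerMillion }) {
  const promptTokens = usage?.prompt_tokens ?? 0;
  const completionTokens = usage?.completion_tokens ?? 0;
  const inputCost = (promptTokens / 1_000_000) * inputPerMillion;
  const outputCost = (completionTokens / 1_000_000) * outputPerMillion;
  return inputCost + outputCost;
}

// Example with made-up rates (USD per one million tokens).
const exampleUsage = { prompt_tokens: 1200, completion_tokens: 800, total_tokens: 2000 };
const exampleRates = { inputPerMillion: 1, outputPerMillion: 2 };
console.log("Estimated cost (USD):", estimateCostUSD(exampleUsage, exampleRates).toFixed(6));
```

Logging this value next to each request makes it easier to spot prompts that cost more than expected.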

Core API Features and Parameters

System Prompts and Multi-Turn Conversations

The messages array supports three primary roles: system (sets behavior and constraints), user (end-user input), and assistant (model responses from previous turns). A fourth role, tool, carries function-call results when tool use is enabled. The API is stateless, so multi-turn conversations require the developer to maintain this array and append to it across interactions.

Create multiturn.mjs:

import "dotenv/config";
import OpenAI from "openai";

const apiKey = process.env.DEEPSEEK_API_KEY;
const baseURL = process.env.DEEPSEEK_BASE_URL;

if (!apiKey || apiKey.trim() === "") {
  console.error("Error: DEEPSEEK_API_KEY is not set or is empty in your .env file.");
  process.exit(1);
}
if (!baseURL || baseURL.trim() === "") {
  console.error("Error: DEEPSEEK_BASE_URL is not set or is empty in your .env file.");
  process.exit(1);
}

const client = new OpenAI({
  baseURL,
  apiKey,
  timeout: 60_000,
  maxRetries: 0,
});

const conversationHistory = [
  { role: "system", content: "You are a senior JavaScript developer. Be concise and precise." },
];

const MAX_HISTORY_TURNS = 10; 

function appendAndTrim(history, role, content) {
  history.push({ role, content });
  
  const systemPrompt = history[0].role === "system" ? [history[0]] : [];
  const turns = history.slice(systemPrompt.length);
  const maxTurnMessages = MAX_HISTORY_TURNS * 2; 
  const trimmed = turns.slice(Math.max(0, turns.length - maxTurnMessages));
  history.length = 0;
  history.push(...systemPrompt, ...trimmed);
}

async function chat(userMessage) {
  appendAndTrim(conversationHistory, "user", userMessage);

  const response = await client.chat.completions.create({
    model: "deepseek-chat",
    messages: conversationHistory,
  });

  const choice = response.choices?.[0];
  if (!choice || choice.finish_reason === "content_filter") {
    throw new Error(
      `No valid completion returned. finish_reason: ${choice?.finish_reason ?? "no choices"}`
    );
  }
  const assistantMessage = choice.message?.content;
  if (typeof assistantMessage !== "string") {
    throw new Error("Unexpected response shape: missing message content.");
  }
  appendAndTrim(conversationHistory, "assistant", assistantMessage);

  return assistantMessage;
}

console.log(await chat("What is a closure in JavaScript?"));
console.log(await chat("Can you give me a practical example of one?"));
console.log(await chat("How does that relate to the module pattern?"));

Each call sends the accumulated history (trimmed to a sliding window), allowing the model to reference earlier turns without unbounded memory growth.
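
A turn-count window ignores message length, so a few very long messages can still exceed the context limit. One possible refinement, sketched below, trims by an approximate token budget instead. Both trimToTokenBudget and the four-characters-per-token heuristic are assumptions made for illustration; they do not reflect DeepSeek's actual tokenizer.

```javascript
// Very rough token estimate: ~4 characters per token.
// This is a heuristic assumption, not DeepSeek's tokenizer.
function approxTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the system prompt (if any) plus the most recent messages that fit the budget.
function trimToTokenBudget(history, maxTokens) {
  const hasSystem = history[0]?.role === "system";
  const systemPrompt = hasSystem ? [history[0]] : [];
  const turns = hasSystem ? history.slice(1) : history.slice();

  let used = systemPrompt.reduce((sum, m) => sum + approxTokens(m.content), 0);
  const kept = [];
  // Walk backwards so the newest messages are considered first.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = approxTokens(turns[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(turns[i]);
    used += cost;
  }
  return [...systemPrompt, ...kept];
}
```

Unlike appendAndTrim, this version returns a new array rather than mutating the history, so the result can be passed as messages while the full history is kept elsewhere.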

Key Parameters for Controlling Output

The API accepts several parameters for shaping generation. temperature (0 to 2) controls randomness; lower values produce more deterministic output (check the DeepSeek API docs for the current default). top_p (0 to 1) sets the nucleus sampling threshold, and max_tokens caps response length. frequency_penalty and presence_penalty (both -2 to 2 per the OpenAI-compatible spec) discourage repetition and encourage topic diversity, respectively; verify that the DeepSeek endpoint honors them, as behavior may differ from OpenAI. To halt generation at specific delimiter strings, pass them as an array via the stop parameter.
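
Out-of-range values are easier to catch locally than to debug from an API error. The following is a minimal sketch of client-side validation for some of these parameters; buildSamplingParams is a hypothetical helper, and the ranges it enforces are simply the ones quoted in this section.

```javascript
// Validate sampling parameters against the ranges described in this guide
// before sending a request. Only parameters that were provided are returned.
function buildSamplingParams({ temperature, top_p, max_tokens, stop } = {}) {
  const params = {};
  const inRange = (v, lo, hi) => typeof v === "number" && v >= lo && v <= hi;

  if (temperature !== undefined) {
    if (!inRange(temperature, 0, 2)) throw new RangeError("temperature must be between 0 and 2");
    params.temperature = temperature;
  }
  if (top_p !== undefined) {
    if (!inRange(top_p, 0, 1)) throw new RangeError("top_p must be between 0 and 1");
    params.top_p = top_p;
  }
  if (max_tokens !== undefined) {
    if (!Number.isInteger(max_tokens) || max_tokens < 1) {
      throw new RangeError("max_tokens must be a positive integer");
    }
    params.max_tokens = max_tokens;
  }
  if (stop !== undefined) {
    // Accept a single string or an array of strings; always send an array.
    params.stop = Array.isArray(stop) ? stop : [stop];
  }
  return params;
}

// Intended use: spread the result into the request object, e.g.
// { model: "deepseek-chat", messages, ...buildSamplingParams({ temperature: 0.2 }) }
```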

For structured output, set response_format: { type: "json_object" } and instruct the model in the system or user prompt to produce JSON. Where the endpoint supports it, this mode increases the likelihood of valid JSON output. Verify support in the DeepSeek API docs and always wrap JSON.parse() in a try/catch block.

Create jsonmode.mjs:

import "dotenv/config";
import OpenAI from "openai";

const apiKey = process.env.DEEPSEEK_API_KEY;
const baseURL = process.env.DEEPSEEK_BASE_URL;

if (!apiKey || apiKey.trim() === "") {
  console.error("Error: DEEPSEEK_API_KEY is not set or is empty in your .env file.");
  process.exit(1);
}
if (!baseURL || baseURL.trim() === "") {
  console.error("Error: DEEPSEEK_BASE_URL is not set or is empty in your .env file.");
  process.exit(1);
}

const client = new OpenAI({
  baseURL,
  apiKey,
  timeout: 60_000,
  maxRetries: 0,
});

const response = await client.chat.completions.create({
  model: "deepseek-chat",
  response_format: { type: "json_object" },
  messages: [
    {
      role: "system",
      content: "You are an API that returns JSON. Always respond with a valid JSON object.",
    },
    {
      role: "user",
      content: "List three common JavaScript array methods with their descriptions and return types.",
    },
  ],
});

const choice = response.choices?.[0];
if (!choice || choice.finish_reason === "content_filter") {
  console.error(
    "No valid completion returned. finish_reason:",
    choice?.finish_reason ?? "no choices"
  );
  process.exit(1);
}
const rawContent = choice.message?.content;
if (typeof rawContent !== "string") {
  console.error("Unexpected response shape: missing message content.");
  process.exit(1);
}

let parsed;
try {
  parsed = JSON.parse(rawContent);
} catch (e) {
  console.error("Failed to parse model response as JSON:", e.message);
  console.error("Raw response:", rawContent);
  process.exit(1);
}

const isValidObject = parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
console.log(isValidObject ? "Valid JSON object received" : "Unexpected format");
console.log(JSON.stringify(parsed, null, 2));
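
The parse-and-validate steps in jsonmode.mjs can be folded into one reusable helper that returns a result object instead of exiting the process. This is a sketch, not part of the SDK: parseJsonObject is a hypothetical name, and the fence-stripping step is a defensive assumption for cases where the model wraps its output in a Markdown code fence despite the instructions.

```javascript
const FENCE = "`".repeat(3); // three backticks, built at runtime
const OPENING_FENCE = new RegExp("^" + FENCE + "(?:json)?\\s*", "i");
const CLOSING_FENCE = new RegExp("\\s*" + FENCE + "$");

// Parse model output that is expected to be a JSON object.
// Returns { ok: true, value } or { ok: false, error } instead of throwing.
function parseJsonObject(raw) {
  if (typeof raw !== "string") {
    return { ok: false, error: "content is not a string" };
  }
  // Strip a surrounding Markdown code fence if the model added one anyway.
  const unfenced = raw.trim().replace(OPENING_FENCE, "").replace(CLOSING_FENCE, "");
  try {
    const value = JSON.parse(unfenced);
    if (value === null || typeof value !== "object" || Array.isArray(value)) {
      return { ok: false, error: "parsed value is not a JSON object" };
    }
    return { ok: true, value };
  } catch (e) {
    return { ok: false, error: e.message };
  }
}
```

Returning a result object lets the caller decide whether to retry the request, fall back to a default, or surface the error.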

Streaming Responses

Streaming reduces perceived latency by delivering tokens as they are generated, which is critical for user-facing applications where time-to-first-token (TTFT – the elapsed time between sending a request and receiving the first token of the response) matters more than total generation time.

Create streaming.mjs:

import "dotenv/config";
import OpenAI from "openai";

const apiKey = process.env.DEEPSEEK_API_KEY;
const baseURL = process.env.DEEPSEEK_BASE_URL;

if (!apiKey || apiKey.trim() === "") {
  console.error("Error: DEEPSEEK_API_KEY is not set or is empty in your .env file.");
  process.exit(1);
}
if (!baseURL || baseURL.trim() === "") {
  console.error("Error: DEEPSEEK_BASE_URL is not set or is empty in your .env file.");
  process.exit(1);
}

const client = new OpenAI({
  baseURL,
  apiKey,
  timeout: 60_000,
  maxRetries: 0,
});

const stream = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [
    { role: "user", content: "Write a brief explanation of event-driven architecture." },
  ],
  stream: true,
});

let fullResponse = "";

for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content ?? "";
  process.stdout.write(content);
  fullResponse += content;
}

console.log("\n\nFull response length:", fullResponse.length, "characters");

Each chunk contains a delta object with incremental content. The loop assembles the complete response while simultaneously writing to stdout.
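
The loop above does not record finish_reason, so a response truncated by max_tokens looks the same as a complete one. The sketch below shows one way to track it with a small reducer. It assumes the stream follows the OpenAI streaming format, where finish_reason is null on intermediate chunks and set on the last one; the sample chunks are hand-written stand-ins, not real API output.

```javascript
// Fold one streamed chunk into an accumulator of { content, finishReason }.
function applyChunk(state, chunk) {
  const choice = chunk?.choices?.[0];
  return {
    content: state.content + (choice?.delta?.content ?? ""),
    // finish_reason is null on intermediate chunks; keep the last non-null value.
    finishReason: choice?.finish_reason ?? state.finishReason,
  };
}

// Simulated chunks shaped like OpenAI-style streaming deltas (illustrative only).
const sampleChunks = [
  { choices: [{ delta: { role: "assistant", content: "" }, finish_reason: null }] },
  { choices: [{ delta: { content: "Hello" }, finish_reason: null }] },
  { choices: [{ delta: { content: ", world" }, finish_reason: null }] },
  { choices: [{ delta: {}, finish_reason: "length" }] },
];

const result = sampleChunks.reduce(applyChunk, { content: "", finishReason: null });
if (result.finishReason === "length") {
  console.warn("Response was truncated by max_tokens.");
}
```

Inside a real for await loop, the same function can be called once per chunk: state = applyChunk(state, chunk).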

Building a Complete Application: AI-Powered Code Reviewer

Application Architecture

This CLI tool reads a JavaScript file from disk, sends its contents to DeepSeek with a detailed code review system prompt, and requests structured JSON feedback. The application exercises DeepSeek V3’s code understanding, reasoning, and structured output capabilities in a single cohesive workflow.

Create review.mjs:

import "dotenv/config";
import OpenAI from "openai";
import { readFile } from "fs/promises";
import { resolve, extname } from "path";

const apiKey = process.env.DEEPSEEK_API_KEY;
const baseURL = process.env.DEEPSEEK_BASE_URL;

if (!apiKey || apiKey.trim() === "") {
  console.error("Error: DEEPSEEK_API_KEY is not set or is empty in your .env file.");
  process.exit(1);
}
if (!baseURL || baseURL.trim() === "") {
  console.error("Error: DEEPSEEK_BASE_URL is not set or is empty in your .env file.");
  process.exit(1);
}

const client = new OpenAI({
  baseURL,
  apiKey,
  timeout: 60_000,
  maxRetries: 0,
});

const filePath = process.argv[2];
if (!filePath) {
  console.error("Usage: node review.mjs <path-to-file>");
  process.exit(1);
}

const resolvedPath = resolve(filePath);
const allowedBase = resolve(process.cwd());
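// Require the file to sit inside the working directory.
// The "/" separator assumes POSIX paths; use path.sep for Windows support.
if (!resolvedPath.startsWith(allowedBase + "/") && resolvedPath !== allowedBase) {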
  console.error("Error: File must be within the current working directory.");
  process.exit(1);
}

const allowedExtensions = [".js", ".mjs", ".cjs", ".ts"];
if (!allowedExtensions.includes(extname(resolvedPath))) {
  console.error("Error: Only JavaScript/TypeScript files are supported.");
  process.exit(1);
}


const code = await readFile(resolvedPath, "utf-8");
if (Buffer.byteLength(code, "utf-8") > 100_000) {
  console.error("File too large (>100KB). Truncate or split before reviewing.");
  process.exit(1);
}

const systemPrompt = `You are a senior code reviewer. Analyze the provided JavaScript code and return a JSON object with the following structure:
{
  "summary": "Brief overall assessment",
  "issues": [
    {
      "severity": "high" | "medium" | "low",
      "line": <approximate line number, or null if unknown>,
      "description": "What the issue is",
      "suggestion": "How to fix it"
    }
  ],
  "strengths": ["List of things done well"],
  "score": <1-10 overall quality score>
}
Only return valid JSON. No markdown fences.`;

const response = await client.chat.completions.create({
  model: "deepseek-chat",
  response_format: { type: "json_object" },
  temperature: 0.3,
  max_tokens: 2048,
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: `Review this code:

${code}` },
  ],
});

const choice = response.choices?.[0];
if (!choice || choice.finish_reason === "content_filter") {
  console.error(
    "No valid completion returned. finish_reason:",
    choice?.finish_reason ?? "no choices"
  );
  process.exit(1);
}
const responseContent = choice.message?.content;
if (typeof responseContent !== "string") {
  console.error("Unexpected response shape: missing message content.");
  process.exit(1);
}

let review;
try {
  review = JSON.parse(responseContent);
} catch (e) {
  console.error("Failed to parse model response as JSON:", e.message);
  console.error("Raw response:", responseContent);
  process.exit(1);
}

console.log(`
📋 Code Review: ${filePath}`);
console.log(`Score: ${review.score}/10`);
console.log(`Summary: ${review.summary}
`);

if (review.issues?.length) {
  console.log("Issues:");
  review.issues.forEach((issue, i) => {
    const severityRaw = issue.severity;
    const severity =
      typeof severityRaw === "string" ? severityRaw.toUpperCase() : "UNKNOWN";
    const line = issue.line != null ? ` (line ~${issue.line})` : "";
    const description =
      typeof issue.description === "string" ? issue.description : "[no description]";
    const suggestion =
      typeof issue.suggestion === "string" ? issue.suggestion : "[no suggestion]";
    console.log(`  ${i + 1}. [${severity}]${line} ${description}`);
    console.log(`     Fix: ${suggestion}`);
  });
}

if (review.strengths?.length) {
  console.log("\nStrengths:");
  review.strengths.forEach((s) => console.log(`  ✓ ${s}`));
}

console.log(`
Tokens used: ${response.usage?.total_tokens ?? "unknown"}`);

Run against any JavaScript file: node review.mjs ./basic.mjs.
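
JSON mode makes valid JSON more likely but does not guarantee that the object matches the schema in the system prompt. A minimal sketch of a normalization step follows. normalizeReview is a hypothetical helper; the "unknown" severity bucket and the decision to clamp the score are design choices made for illustration, not part of the original tool.

```javascript
// Sort order for severities; "unknown" is an added bucket for unexpected values.
const SEVERITY_ORDER = { high: 0, medium: 1, low: 2, unknown: 3 };

function normalizeSeverity(value) {
  const s = typeof value === "string" ? value.toLowerCase() : "";
  return Object.hasOwn(SEVERITY_ORDER, s) ? s : "unknown";
}

// Coerce a parsed review into a predictable shape; unexpected fields are dropped.
function normalizeReview(review) {
  const rawIssues = Array.isArray(review?.issues) ? review.issues : [];
  const issues = rawIssues
    .filter((issue) => issue !== null && typeof issue === "object")
    .map((issue) => ({
      severity: normalizeSeverity(issue.severity),
      line: Number.isInteger(issue.line) ? issue.line : null,
      description: typeof issue.description === "string" ? issue.description : "",
      suggestion: typeof issue.suggestion === "string" ? issue.suggestion : "",
    }))
    // Most severe issues first.
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);

  const rawScore = review?.score;
  const hasScore = typeof rawScore === "number" && Number.isFinite(rawScore);
  return {
    summary: typeof review?.summary === "string" ? review.summary : "",
    issues,
    strengths: Array.isArray(review?.strengths)
      ? review.strengths.filter((s) => typeof s === "string")
      : [],
    // Clamp to the 1-10 range requested in the system prompt; null if not numeric.
    score: hasScore ? Math.min(10, Math.max(1, rawScore)) : null,
  };
}
```

Calling normalizeReview(review) right after JSON.parse would let the printing code drop most of its per-field type checks.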

Enhancing with Error Handling and Retries

Production API calls must account for rate limits (HTTP 429) and transient server errors (5xx). Check the DeepSeek API documentation for the rate-limit behavior and headers the endpoint currently returns. A retry wrapper with exponential backoff handles both cases.

Create retry.mjs:

export async function withRetry(fn, { maxRetries = 3, baseDelay = 1000 } = {}) {
  if (maxRetries < 1) throw new RangeError("maxRetries must be >= 1");

  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
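      const status = error?.status ?? error?.response?.status;

      // Retry only rate limits (429) and server errors (5xx). Errors with no
      // HTTP status (network failures, DNS errors) are treated as non-retryable.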
      const isRetryable =
        typeof status === "number" &&
        (status === 429 || (status >= 500 && status < 600));

      if (!isRetryable || attempt === maxRetries - 1) {
        throw error;
      }

      const delay = baseDelay * Math.pow(2, attempt) + Math.random() * 500;
      console.warn(
        `Retryable error (HTTP ${status}): ${error.message}. ` +
          `Attempt ${attempt + 1}/${maxRetries}. Waiting ${Math.round(delay)}ms...`
      );
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Not reached in practice (the loop always returns or throws); kept as a safeguard.
  throw lastError;
}

Example usage in another file:

import { withRetry } from "./retry.mjs";

const response = await withRetry(() => client.chat.completions.create({  }));

The function applies exponential backoff with jitter, retries only on 429 and 5xx HTTP status codes, and throws immediately on non-retryable errors (including non-HTTP errors like network failures or DNS resolution errors).
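
If the server sends a Retry-After header with a 429 response, honoring it is usually better than guessing. The sketch below separates the delay calculation into a pure function that prefers the header when present and otherwise falls back to the same backoff formula as withRetry. retryDelayMs and the maxDelay cap are additions for illustration, and whether DeepSeek sends Retry-After at all is an assumption to verify. Only the delta-seconds form of the header is handled, not HTTP dates.

```javascript
// Parse a Retry-After header given in seconds; returns null for anything else
// (missing header, HTTP-date form, negative or non-numeric values).
function parseRetryAfterSeconds(value) {
  if (typeof value !== "string" || value.trim() === "") return null;
  const seconds = Number(value);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds : null;
}

// Compute the wait in milliseconds before retrying after failed attempt `attempt` (0-based).
// `random` is injectable so the jitter can be made deterministic in tests.
function retryDelayMs(
  attempt,
  { baseDelay = 1000, maxDelay = 30_000, retryAfter, random = Math.random } = {}
) {
  const hinted = parseRetryAfterSeconds(retryAfter);
  if (hinted !== null) {
    // A server-provided hint wins, but is still capped.
    return Math.min(hinted * 1000, maxDelay);
  }
  const backoff = baseDelay * 2 ** attempt + random() * 500;
  return Math.min(backoff, maxDelay);
}
```

Within withRetry, the header value would have to come from the caught error. Assuming the SDK error exposes response headers as error.headers, that might look like retryDelayMs(attempt, { retryAfter: error.headers?.["retry-after"] }); confirm the property against the installed SDK version.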

Migration Guide: Switching from OpenAI or Other Providers

The Two-Line Migration

For applications already using the OpenAI Node.js SDK, switching to DeepSeek requires changing at minimum two values: the base URL and the model identifier. Review the behavioral differences section below before assuming a drop-in swap.


// Before: OpenAI
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// After: DeepSeek (also set model: "deepseek-chat" in each request)
const client = new OpenAI({
  baseURL: "https://api.deepseek.com",
  apiKey: process.env.DEEPSEEK_API_KEY,
});

The rest of the application code, including message formatting, parameter passing, and response parsing, remains identical.

Behavioral Differences to Watch For

Despite API compatibility, the models differ in subtle ways. Default temperature behavior, token limit defaults, and system prompt sensitivity vary between providers. DeepSeek V3 may produce noticeably different output for the same prompt; for example, it tends to generate shorter, more direct answers to open-ended questions than GPT-4o, and it can interpret ambiguous instructions more literally. DeepSeek V3 does support function calling (tool use) using the same OpenAI tool schema; however, DeepSeek has not confirmed parity for all OpenAI features (e.g., fine-tuning, certain response format modes), so verify the latest supported capabilities in the DeepSeek API documentation.

The recommended testing strategy: run existing prompt suites through both providers, compare output quality and structure, and adjust system prompts where DeepSeek V3’s behavior diverges.
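
Running the same prompt suite against both providers is simpler when the provider is selected by configuration rather than by code edits. The following is a minimal sketch of that idea. The PROVIDERS table, the resolveProvider function, and the LLM_PROVIDER variable name are all hypothetical; only the base URL, the key variable names, and the model identifiers come from this guide.

```javascript
// Hypothetical provider table. LLM_PROVIDER is an illustrative variable name,
// not an SDK convention.
const PROVIDERS = {
  openai: { baseURL: undefined, apiKeyEnv: "OPENAI_API_KEY", model: "gpt-4o" },
  deepseek: {
    baseURL: "https://api.deepseek.com",
    apiKeyEnv: "DEEPSEEK_API_KEY",
    model: "deepseek-chat",
  },
};

// Resolve client options and model name from environment variables.
function resolveProvider(env = process.env) {
  const name = (env.LLM_PROVIDER ?? "deepseek").toLowerCase();
  if (!Object.hasOwn(PROVIDERS, name)) {
    throw new Error(`Unknown provider: ${name}`);
  }
  const provider = PROVIDERS[name];
  const apiKey = env[provider.apiKeyEnv];
  if (!apiKey) {
    throw new Error(`${provider.apiKeyEnv} is not set`);
  }
  return {
    name,
    model: provider.model,
    // Options for `new OpenAI(...)`; baseURL is omitted so OpenAI's default endpoint is used.
    clientOptions: provider.baseURL ? { apiKey, baseURL: provider.baseURL } : { apiKey },
  };
}
```

A comparison script could then call resolveProvider twice with different LLM_PROVIDER values, build one client per result, and send each prompt to both.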

Best Practices and Optimization

Prompt Engineering Tips for DeepSeek V3

DeepSeek V3 responds well to structured instructions with explicit output format specifications. Chain-of-thought prompting (asking the model to reason step by step before answering) improves accuracy on math and multi-step reasoning tasks in DeepSeek’s published evaluations. Vague prompts and very long context windows without clear focus tend to degrade output quality; benchmark your specific use case, but treat quality degradation past roughly 40K tokens of context as a reasonable default assumption to validate.

Reducing Token Costs

Estimate token counts before sending requests using tokenizer libraries to avoid unexpected costs. Note that OpenAI’s tiktoken library uses a different tokenizer than DeepSeek V3, so token counts will be approximate; check DeepSeek’s documentation for a compatible tokenizer if precise estimates are needed. Cache responses for repeated or identical queries. Set max_tokens to the minimum necessary for each use case rather than relying on defaults. For simpler tasks like classification or short-form extraction, consider whether a lighter, cheaper model suffices before routing everything through DeepSeek V3.
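
Caching identical requests can be as simple as an in-memory map keyed by the request parameters. The sketch below is one possible shape, with assumptions stated up front: createResponseCache and stableStringify are hypothetical helpers, eviction is plain first-in-first-out, and nothing is persisted across process restarts. Caching is most meaningful for requests with low temperature, where repeated calls are expected to give similar answers anyway.

```javascript
// Build a deterministic key: object keys are sorted so property order does not matter.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .filter((k) => value[k] !== undefined)
      .map((k) => JSON.stringify(k) + ":" + stableStringify(value[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Small in-memory cache keyed by request parameters, with FIFO eviction.
function createResponseCache(maxEntries = 100) {
  const store = new Map(); // Map preserves insertion order
  return {
    get(params) {
      return store.get(stableStringify(params));
    },
    set(params, response) {
      const key = stableStringify(params);
      if (!store.has(key) && store.size >= maxEntries) {
        store.delete(store.keys().next().value); // evict the oldest entry
      }
      store.set(key, response);
    },
    get size() {
      return store.size;
    },
  };
}
```

A caller would check cache.get(params) before calling the API and call cache.set(params, response) afterwards, using the same params object that is passed to client.chat.completions.create.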

Implementation Checklist

Setup

DeepSeek account created and API key generated
API key stored in environment variable (never hardcoded)
OpenAI SDK installed and configured with DeepSeek base URL
Basic chat completion working

Testing

Error handling and retry logic implemented
Streaming implemented for user-facing features
JSON mode tested for structured outputs
Rate limit handling confirmed
Existing prompts tested and adapted for DeepSeek V3 behavioral differences

Production

Token usage monitoring in place
Cost estimates validated against pricing tiers
Production logging and monitoring configured

What Comes Next

Start with the code samples above, then explore the official DeepSeek API documentation for the latest on function calling support, fine-tuning capabilities, and rate limit specifications. DeepSeek V3 pairs low per-token cost with competitive benchmark results through a familiar API surface, so the migration cost is low for teams already on the OpenAI SDK.

The open-weight nature of DeepSeek models enables self-hosting, subject to the terms of the DeepSeek License Agreement, which may restrict certain uses. Review the license before self-hosting for production or commercial purposes.
