How to Set Up Continue.dev as a Local AI Coding Assistant
- Install Ollama for your operating system and start the service with ollama serve.
- Pull the required models: a 7B code model for autocomplete, a larger model for chat, and an embeddings model.
- Install the Continue.dev extension or plugin in VS Code, JetBrains, or Neovim.
- Create the ~/.continue/config.json file with your model provider, context providers, and slash commands.
- Configure tab autocomplete settings including debounce delay, max tokens, and temperature for low-latency suggestions.
- Enable context providers like @codebase, @file, and @terminal for codebase-aware prompts.
- Add custom slash commands for team workflows such as code review and test generation.
- Version-control your config.json in a dotfiles repository, omitting any API keys.
AI coding assistants have reshaped how developers write software, but they route proprietary source code through third-party cloud services and carry recurring subscription costs. Continue.dev offers a different path: an open-source, local-first AI coding assistant that works across VS Code, JetBrains IDEs, and Neovim. This article walks through a fully configured local AI coding assistant running in your preferred editor, tuned for JavaScript, React, and Node.js development.
Note: This guide assumes Continue.dev v0.9.x (verify your installed version) and Ollama v0.3.x or later. Config keys and behavior may differ across versions. Pin your versions to ensure reproducibility.
Why Local AI Coding Assistants Matter Now
AI coding assistants have reshaped how developers write software. Tools like GitHub Copilot, Cursor, and Codeium reduce boilerplate keystrokes and speed up common patterns, but they route proprietary source code through third-party cloud services and carry recurring subscription costs. For developers working under strict data governance policies, or those who simply want full control over their toolchain, the costs add up: data exposure risk, $10-20/month per seat, and vendor lock-in to a single provider’s model choices. Continue.dev offers a different path: an open-source, local-first AI coding assistant that works across VS Code, JetBrains IDEs, and Neovim.
Continue.dev acts as middleware between an editor and any large language model, whether running locally through Ollama or hosted in the cloud. It supports tab autocomplete, inline chat, contextual code retrieval, and custom slash commands, all governed by a single JSON configuration file. This delivers autocomplete, chat, and codebase-aware retrieval comparable to Copilot’s core features, without mandatory cloud dependencies.
This article walks through building a fully configured local AI coding assistant that runs in your preferred editor and is tuned for JavaScript, React, and Node.js development. Every configuration file shown is copy-paste ready.
What Is Continue.dev and How Does It Work?
Architecture Overview
Continue.dev operates across three layers. The outermost layer is the editor extension or plugin, which handles UI rendering, keybindings, and inline diff presentation. Beneath that sits the configuration layer, defined by a single config.json file that specifies which models to use, what context providers to enable, and which slash commands are available. The innermost layer is the model provider layer, which communicates with the actual LLM backend.
Continue.dev connects your editor to an LLM; it ships no model of its own. It routes prompts and code context from the editor to whatever model provider the developer configures: Ollama, LM Studio, llama.cpp, any OpenAI-compatible API endpoint, or cloud providers like Anthropic and OpenAI directly. To swap models or providers, change a few lines in the config file. Nothing needs reinstalling.
Key Features at a Glance
When you pause mid-line, Continue.dev offers tab autocomplete with inline ghost text suggestions, comparable to Copilot’s core experience. Highlighting code and pressing the chat keybinding lets you ask questions or request modifications without leaving the editor. Context providers such as @codebase, @file, @docs, @terminal, and @git inject relevant project context into prompts, which reduces hallucinated references and improves answer accuracy. Slash commands like /edit, /comment, and /share trigger predefined prompt templates. The entire system is extensible through custom model configurations, custom commands, and custom context providers.
Prerequisites and Local Model Setup with Ollama
Hardware Requirements
Running LLMs locally is feasible on most machines from 2020 onward with 8 GB or more of RAM, though with trade-offs. 8 GB of system RAM is the practical floor for 7B models in Q4 quantization; 16 GB is recommended for comfortable headroom. For chat-oriented tasks where response quality matters more, 13B to 34B parameter models generally produce better answers but require 32 GB of RAM or a dedicated GPU with sufficient VRAM. An NVIDIA GPU with at least 8 GB of VRAM cuts inference latency by roughly 5-10x compared to CPU-only execution. Apple Silicon Macs benefit from unified memory, which eliminates CPU-to-GPU copy overhead and makes most of the system memory available to the GPU for inference.
Ensure at least 20 GB of free disk space for the recommended model set (codellama:7b-code is approximately 4 GB, deepseek-coder-v2:16b approximately 9 GB, llama3.1:8b approximately 5 GB, plus the much smaller embeddings model).
For the setup described in this article, a 7B model handles autocomplete duties while a larger model handles chat. This split targets autocomplete suggestions arriving within 200-500 ms while preserving answer quality for interactive conversations.
Installing Ollama and Pulling Models
Ollama is the recommended local model runtime for Continue.dev. It manages model downloads, quantization, and serves models via a local HTTP API on port 11434.
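# macOS (Homebrew)
brew install ollama
# Linux: download the install script, review it, and print its SHA-256 hash before running it
curl -fsSL https://ollama.com/install.sh -o ollama_install.sh
less ollama_install.sh
sha256sum ollama_install.sh
sh ollama_install.sh
# Start the server (skip this if Ollama already runs as a background service)
ollama serve
# In a second terminal, confirm that the API responds
curl http://localhost:11434/api/tags
# Pull the autocomplete, chat, and embeddings models
ollama pull codellama:7b-code
ollama pull deepseek-coder-v2:16b
ollama pull llama3.1:8b
ollama pull nomic-embed-text
# Confirm that all four models are installed
ollama list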
After these commands complete, Ollama should be serving models at http://localhost:11434. The codellama:7b-code model is specifically trained for code completion with fill-in-the-middle capability, making it well suited for autocomplete. The deepseek-coder-v2:16b model provides stronger reasoning for chat-based code generation and refactoring tasks. The llama3.1:8b model is a smaller general-purpose model that serves as a faster chat alternative. The nomic-embed-text model is a small embedding model used for the @codebase semantic search feature.
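The verification step can also be scripted. The following is a minimal sketch, assuming Node.js 18 or later (for the global fetch) and assuming the /api/tags response has the shape { models: [{ name }] }; the function names missingModels and checkOllama are illustrative and not part of Ollama or Continue.dev.

```javascript
// A model pulled without an explicit tag is listed with the ":latest" tag.
function withDefaultTag(name) {
  return name.includes(":") ? name : `${name}:latest`;
}

// Given the parsed JSON body of GET /api/tags, return the required
// models that have not been pulled yet.
function missingModels(tagsResponse, required) {
  const installed = new Set((tagsResponse.models ?? []).map((m) => m.name));
  return required.filter((name) => !installed.has(withDefaultTag(name)));
}

// Query the local Ollama server and report which models are missing.
async function checkOllama(required, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return missingModels(await res.json(), required);
}

// Usage (with `ollama serve` running):
// checkOllama(["codellama:7b-code", "deepseek-coder-v2:16b", "llama3.1:8b", "nomic-embed-text"])
//   .then((missing) => console.log(missing.length ? `Missing: ${missing.join(", ")}` : "All models present"));
```

Keeping the comparison in a pure function makes it easy to test without a running server.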
Installation Across VS Code, JetBrains, and Neovim
VS Code Installation
In VS Code, open the Extensions panel (Ctrl+Shift+X, or Cmd+Shift+X on macOS), search for “Continue,” and install the extension published by Continue.dev. On first launch, Continue.dev presents a setup wizard that walks through selecting a model provider. For a local-only setup, select Ollama as the provider. The extension creates its configuration file at ~/.continue/config.json, which is the central artifact for all subsequent customization.
The Continue.dev sidebar panel appears in the left activity bar, providing access to the chat interface. Tab autocomplete activates automatically once a tabAutocompleteModel is defined in the configuration.
JetBrains Installation (IntelliJ, WebStorm, PyCharm)
In any JetBrains IDE, navigate to Settings → Plugins → Marketplace and search for “Continue.” Install the plugin and restart the IDE. The JetBrains integration reads from the same ~/.continue/config.json file as VS Code, meaning any configuration done for one editor applies to the other automatically.
JetBrains-specific quirks exist. The plugin’s tool window appears in the right sidebar by default rather than the left. Indexing-heavy operations in JetBrains IDEs can occasionally conflict with Continue.dev’s own file indexing for the @codebase context provider, leading to elevated memory usage during initial project opens. Disabling JetBrains’ built-in AI Assistant plugin, if installed, avoids keybinding conflicts.
Neovim Installation
Neovim integration requires a plugin manager. The following is an example lazy.nvim spec. Verify the exact repo path, build command, and function names against the official Continue.dev Neovim documentation before using, as the plugin packaging may change between versions. Node.js and npm are required for the build step.
return {
  "continuedev/continue",
  build = "make",
  lazy = false,
  config = function()
    -- Normal mode: open the chat panel
    vim.keymap.set("n", "<leader>cc", function()
      vim.cmd("ContinueChat")
    end, { desc = "Continue Chat" })
    -- Visual mode: send the selection to chat
    vim.keymap.set("v", "<leader>cs", function()
      vim.cmd("ContinueSendToChat")
    end, { desc = "Send to Continue Chat" })
    -- Visual mode: edit the selection inline
    vim.keymap.set("v", "<leader>ce", function()
      vim.cmd("ContinueInlineEdit")
    end, { desc = "Continue Inline Edit" })
    -- Insert mode: accept a suggestion if one is shown, otherwise insert a Tab
    vim.keymap.set("i", "<Tab>", function()
      if vim.fn["continue#has_suggestion"]() == 1 then
        return vim.fn["continue#accept_suggestion"]()
      else
        return vim.api.nvim_replace_termcodes("<Tab>", true, false, true)
      end
    end, { expr = true })
  end,
}
This configuration binds the core Continue.dev actions to leader-key combinations and wires Tab to accept autocomplete suggestions when they are available. The Neovim integration reads from the same ~/.continue/config.json as the other editors.
Unified Configuration: One Config, Every Editor
The single most powerful aspect of Continue.dev’s architecture is that ~/.continue/config.json is shared across VS Code, JetBrains, and Neovim. Model, context provider, and slash command settings apply everywhere, while editor-specific behavior (keybindings, sidebar placement, plugin-specific commands) varies per editor. Because the file is plain text, it is version-controllable: keeping it in a dotfiles repository means a developer’s AI assistant configuration travels with them across machines. Do not commit config.json to a public repository if it contains API keys for cloud providers.
Note: On Windows, the config path resolves to %USERPROFILE%\.continue\config.json.
Note: Continue.dev uses a JSONC parser that supports // comments in config.json. If you process this file with standard JSON tools (e.g., JSON.parse in Node.js), strip comments first or use a JSONC-aware parser. The reference configs in this article use JSONC syntax for readability.
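For scripts that need to read the file with JSON.parse, a small comment stripper is enough in most cases. The sketch below is illustrative only: stripJsonComments is a hypothetical helper, not part of Continue.dev, and it handles // and /* */ comments but not trailing commas.

```javascript
// Remove // and /* */ comments from JSONC text while leaving string
// contents (such as "http://localhost:11434") untouched.
function stripJsonComments(text) {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\" && i + 1 < text.length) {
        // Copy the escaped character verbatim so \" does not end the string.
        out += next;
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip up to, but not including, the newline.
      while (i < text.length && text[i] !== "\n") i += 1;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end if unclosed).
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// Usage:
// const fs = require("node:fs");
// const os = require("node:os");
// const raw = fs.readFileSync(`${os.homedir()}/.continue/config.json`, "utf8");
// const config = JSON.parse(stripJsonComments(raw));
```

Tracking whether the scanner is inside a string is the important design choice: a naive regular expression would also delete the // in URLs.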
Core Configuration: Connecting Continue.dev to Local Models
The config.json Anatomy
The configuration file has several top-level keys. models defines the chat models available in the sidebar. tabAutocompleteModel specifies the model used for inline ghost-text suggestions. contextProviders registers the context sources available via @ mentions. slashCommands defines built-in command shortcuts. customCommands allows teams to create their own prompt templates.
{
  "models": [
    {
      "title": "DeepSeek Coder V2",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.3,
        "maxTokens": 2048
      },
      "requestOptions": {
        "timeout": 60000
      }
    },
    {
      "title": "Llama 3.1 8B",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.5,
        "maxTokens": 1024
      },
      "requestOptions": {
        "timeout": 60000
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "contextLength": 4096,
    "completionOptions": {
      "maxTokens": 256,
      "temperature": 0.1,
      "stop": ["\n\n", "```"]
    }
  },
  "contextProviders": [
    { "name": "file" },
    { "name": "codebase" },
    { "name": "terminal" },
    { "name": "git" }
  ],
  "slashCommands": [
    { "name": "edit", "description": "Edit selected code" },
    { "name": "comment", "description": "Add comments to code" },
    { "name": "share", "description": "Export chat to markdown" }
  ]
}
This configuration gives the developer two chat models to switch between (DeepSeek Coder V2 for heavy reasoning tasks, Llama 3.1 for faster responses), CodeLlama for autocomplete, and a working set of context providers and slash commands. The requestOptions.timeout value of 60000 milliseconds accommodates the slower inference times typical of local model execution.
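Because every model entry names its provider and model, the list of models to pull can be derived from the config itself. This is a minimal sketch, assuming the config has already been parsed into an object; ollamaModelsInConfig is a hypothetical helper name, and the key names are the ones used in this article's config.

```javascript
// Collect the distinct Ollama model names referenced anywhere in a parsed
// Continue.dev config: chat models, the autocomplete model, and the
// embeddings provider (if present).
function ollamaModelsInConfig(config) {
  const entries = [
    ...(config.models ?? []),
    config.tabAutocompleteModel,
    config.embeddingsProvider,
  ].filter(Boolean);

  const names = entries
    .filter((entry) => entry.provider === "ollama" && typeof entry.model === "string")
    .map((entry) => entry.model);

  return [...new Set(names)];
}

const exampleConfig = {
  models: [
    { title: "DeepSeek Coder V2", provider: "ollama", model: "deepseek-coder-v2:16b" },
    { title: "Llama 3.1 8B", provider: "ollama", model: "llama3.1:8b" },
  ],
  tabAutocompleteModel: { title: "CodeLlama 7B", provider: "ollama", model: "codellama:7b-code" },
};

// Each name can then be passed to `ollama pull`.
console.log(ollamaModelsInConfig(exampleConfig));
```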
Configuring Tab Autocomplete
Developers trigger autocomplete more than any other assistant feature, so tuning it for local latency matters most. Getting it right for local models prevents the most common source of frustration: laggy or irrelevant suggestions.
{
  "tabAutocompleteOptions": {
    "debounceDelay": 500,
    "multilineCompletions": "auto",
    "maxPromptTokens": 1024,
    "disableInFiles": ["*.md", "*.txt"],
    "useCache": true
  },
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "contextLength": 4096,
    "completionOptions": {
      "maxTokens": 256,
      "temperature": 0.1,
      "stop": ["\n\n", "```"]
    }
  }
}
The debounceDelay of 500 milliseconds prevents the model from being invoked on every keystroke, which matters when inference runs on local hardware. Capping maxPromptTokens at 1024 limits how much surrounding code is sent with each request, which shortens prompt processing. Disabling autocomplete for Markdown and text files avoids wasted computation where code suggestions add no value. Keeping maxTokens at 256 bounds the length of each suggestion so that it arrives quickly. The low temperature of 0.1 produces near-deterministic, predictable completions rather than creative but potentially wrong ones. Setting useCache to true lets Continue.dev reuse recent completions when the context has not changed.
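To make the effect of debounceDelay concrete, the following sketch simulates which keystrokes would actually lead to a model request. It illustrates the general debounce technique under simplified assumptions and is not Continue.dev's implementation; debouncedRequestTimes is a hypothetical name.

```javascript
// Given keystroke timestamps (ms, ascending) and a debounce delay, return the
// times at which a completion request would fire. A keystroke only leads to a
// request if no further keystroke arrives within the delay.
function debouncedRequestTimes(keystrokeTimesMs, debounceDelayMs) {
  const fired = [];
  for (let i = 0; i < keystrokeTimesMs.length; i++) {
    const current = keystrokeTimesMs[i];
    const next = keystrokeTimesMs[i + 1];
    if (next === undefined || next - current > debounceDelayMs) {
      fired.push(current + debounceDelayMs);
    }
  }
  return fired;
}

// Four keystrokes, but only two requests: one after the pause that follows
// the burst at 0-200 ms, and one after the final keystroke.
console.log(debouncedRequestTimes([0, 100, 200, 900], 500)); // [700, 1400]
```

A shorter delay makes suggestions feel more responsive but triggers more inference calls, so the right value depends on how quickly the local model responds.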
Adding Context Providers
Context providers are what elevate Continue.dev from a simple chat wrapper to a codebase-aware assistant. The @codebase provider uses embeddings to perform semantic search across the entire project, allowing the model to answer questions about code it has never seen in the current prompt window.
Caution: The @terminal context provider injects terminal output into prompts. Avoid using it in sessions where secrets, API keys, or credentials may appear in stdout.
{
  "contextProviders": [
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 15,
        "nFinal": 5,
        "useReranking": true
      }
    },
    { "name": "file" },
    { "name": "docs" },
    { "name": "terminal" },
    { "name": "git" }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
The embeddingsProvider block configures Continue.dev to use the nomic-embed-text model through Ollama for generating embeddings locally. No data leaves the machine. The nRetrieve parameter controls how many code chunks are initially retrieved from the vector index, while nFinal limits how many are actually sent to the chat model after reranking. This keeps prompt sizes manageable for models with limited context windows. Note: useReranking uses Continue.dev’s built-in reranking logic. Verify against your installed version’s documentation whether additional configuration is required.
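The two-stage idea behind nRetrieve and nFinal can be sketched as follows. This is a toy illustration of retrieve-then-rerank with hand-written vectors and a keyword-based rerank function, both of which are assumptions made for the example; Continue.dev's actual indexing and reranking logic is not shown here.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stage 1: keep the nRetrieve chunks closest to the query embedding.
// Stage 2: re-score only those candidates and keep the best nFinal.
function retrieveThenRerank(queryVector, chunks, { nRetrieve, nFinal, rerank }) {
  const candidates = chunks
    .map((chunk) => ({ ...chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, nRetrieve);

  return candidates
    .map((chunk) => ({ ...chunk, rerankScore: rerank(chunk) }))
    .sort((x, y) => y.rerankScore - x.rerankScore)
    .slice(0, nFinal);
}

// Toy data: 2-dimensional "embeddings" stand in for real model output.
const chunks = [
  { id: "a", text: "logger setup", vector: [1, 0] },
  { id: "b", text: "auth middleware", vector: [0.9, 0.1] },
  { id: "c", text: "auth routes", vector: [0, 1] },
  { id: "d", text: "auth session", vector: [0.5, 0.5] },
];
const keywordRerank = (chunk) => (chunk.text.includes("auth") ? 1 : 0);

const finalChunks = retrieveThenRerank([1, 0], chunks, {
  nRetrieve: 3,
  nFinal: 2,
  rerank: keywordRerank,
});
console.log(finalChunks.map((chunk) => chunk.id)); // ["b", "d"]
```

Chunk "c" mentions auth but never reaches the reranker because it falls outside the first-stage cut, which is why nRetrieve is set well above nFinal.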
Hands-On: Using Continue.dev with a JavaScript/React/Node.js Project
Inline Chat for Code Generation
Highlighting code or placing the cursor in a file and pressing Cmd+L (macOS) or Ctrl+L (Windows/Linux) opens the inline chat panel. (This is the default keybinding; check your editor’s keybinding settings if it conflicts with an existing shortcut.) The model receives the selected code as context along with the developer’s natural language prompt.
import { useState, useEffect } from 'react';
function DebouncedSearchInput({ onSearch, delay = 300, placeholder = "Search..." }) {
  const [query, setQuery] = useState('');
  const [debouncedQuery, setDebouncedQuery] = useState('');
  useEffect(() => {
    const timer = setTimeout(() => {
      setDebouncedQuery(query);
    }, delay);
    return () => clearTimeout(timer);
  }, [query, delay]);
  useEffect(() => {
    onSearch(debouncedQuery);
  }, [debouncedQuery, onSearch]);
  return (
    <input
      type="text"
      value={query}
      onChange={(e) => setQuery(e.target.value)}
      placeholder={placeholder}
    />
  );
}
export default DebouncedSearchInput;
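The model produces a functional component using standard React hooks. Because onSearch appears in the second effect’s dependency array, the parent component should pass a stable callback (for example, one wrapped in useCallback); otherwise the effect reruns on every parent render. Code quality generally scales with model size: for example, the 16B DeepSeek Coder V2 correctly handles useCallback dependency arrays and avoids stale closure bugs that the 8B Llama 3.1 model tends to miss.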
Inline Editing with /edit
Selecting a function and invoking /edit triggers a targeted refactoring workflow. Continue.dev shows a diff of proposed changes that can be accepted or rejected inline.
// Before: promise chain with ad hoc error handling
app.get('/api/users/:id', (req, res) => {
  db.query('SELECT * FROM users WHERE id = ?', [req.params.id])
    .then(user => {
      if (!user) {
        return res.status(404).json({ error: 'User not found' });
      }
      res.json(user);
    })
    .catch(err => {
      console.log(err);
      res.status(500).send('Server error');
    });
});
// After: the handler as refactored with /edit
app.get('/api/users/:id', async (req, res, next) => {
  try {
    const id = parseInt(req.params.id, 10);
    if (!Number.isInteger(id) || id <= 0) {
      return res.status(400).json({ error: 'Invalid user ID' });
    }
    const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    if (!user) {
      return res.status(404).json({ error: 'User not found' });
    }
    res.json(user);
  } catch (err) {
    next(err);
  }
});
The refactored handler uses async/await, validates the id parameter before querying, wraps the logic in try/catch, and delegates errors to Express error-handling middleware via next(err) rather than logging and sending a generic response. This kind of structural refactoring is more convenient with /edit than a separate chat window because the diff appears inline, right where you are reading the code.
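The refactored handler relies on error-handling middleware being registered elsewhere in the application. A minimal sketch of such middleware is shown below; the name errorHandler, the response shape, and the use of an err.status property are assumptions for illustration rather than output from Continue.dev.

```javascript
// Express treats a middleware function with four parameters as an error handler.
function errorHandler(err, req, res, next) {
  // If part of the response was already sent, defer to Express's default handler.
  if (res.headersSent) {
    return next(err);
  }
  // Log the full error on the server; send only a safe message to the client.
  console.error(err);
  const status = Number.isInteger(err.status) ? err.status : 500;
  const message = status >= 500 ? "Internal server error" : err.message;
  res.status(status).json({ error: message });
}

// Register it after all routes so that next(err) calls reach it:
// app.use(errorHandler);
```

Centralizing the response in one place keeps individual route handlers short and avoids leaking internal error details for server-side failures.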
Using Context Providers for Codebase-Aware Answers
Typing @codebase in the chat prompt triggers semantic search across the project. A prompt like “Using @codebase, explain how authentication middleware flows in this project” causes Continue.dev to embed the query, search the local vector index, retrieve the most relevant code chunks, and include them in the prompt to the chat model. Typing @file followed by a filename injects that specific file’s contents into the prompt context.
Note that @codebase requires initial indexing on first use. Indexing a large project can take several minutes and spike CPU usage.
Custom Slash Commands for Team Workflows
Custom commands allow teams to standardize prompt patterns across developers.
{
  "customCommands": [
    {
      "name": "review",
      "description": "Perform a code review on the selected code",
      "prompt": "Review the following code for bugs, security issues, performance problems, and adherence to best practices. Provide specific, actionable feedback with line references:\n\n{{{ input }}}"
    },
    {
      "name": "testgen",
      "description": "Generate Jest tests for the selected code",
      "prompt": "Generate comprehensive Jest unit tests for the following code. Include edge cases, error scenarios, and mock external dependencies where appropriate. Use describe/it blocks:\n\n{{{ input }}}"
    }
  ]
}
Continue.dev replaces the {{{ input }}} placeholder with the currently selected code or whatever is included in the chat. (Verify this syntax against the Continue.dev custom commands documentation for your installed version, as placeholder syntax may vary.) Teams can commit these commands to a shared repository, ensuring every developer uses consistent review criteria and test generation patterns.
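Conceptually, the substitution works like the sketch below. This is a simplified stand-in written for illustration, not Continue.dev's templating engine, and renderPrompt is a hypothetical name.

```javascript
// Replace every {{{ input }}} placeholder in a prompt template with the
// selected code. Whitespace inside the braces is optional.
function renderPrompt(template, selectedCode) {
  // A replacer function is used so that sequences such as "$&" in the
  // selected code are inserted literally rather than being interpreted
  // as special replacement patterns.
  return template.replace(/\{\{\{\s*input\s*\}\}\}/g, () => selectedCode);
}

const reviewTemplate = "Review the following code for bugs:\n\n{{{ input }}}";
console.log(renderPrompt(reviewTemplate, "const add = (a, b) => a + b;"));
```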
Performance Optimization and Troubleshooting
Reducing Latency with Local Models
GPU offloading is the single largest factor in local inference speed. Ollama automatically uses available GPU resources when appropriate drivers are installed (CUDA for NVIDIA, ROCm for AMD, Metal for Apple Silicon). GPU layer offloading is configured via model parameters or Modelfile settings; consult the Ollama documentation for the current recommended approach. More layers on the GPU means faster inference at the cost of VRAM.
Model quantization directly affects the speed-quality trade-off. Q4_K_M quantization reduces model size and memory requirements significantly while retaining roughly 95-98% of FP16 quality, as measured by perplexity on code benchmarks (see llama.cpp quantization comparisons for methodology). Q8 quantization preserves more quality but requires roughly twice the memory of Q4. A 16B Q8 model requires approximately 16 GB of VRAM for its weights alone, which exceeds most consumer GPUs; on most consumer hardware, Q4_K_M is the practical maximum for this model size. For autocomplete with a 7B model, Q4_K_M is sufficient.
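The memory figures above follow from simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below uses nominal bit widths as an assumption; real quantization formats such as Q4_K_M store slightly more than 4 bits per weight, and the KV cache and runtime overhead come on top, so treat the results as lower bounds.

```javascript
// Approximate memory needed for model weights alone, in decimal gigabytes.
// paramsBillions: parameter count in billions; bitsPerWeight: nominal bit width.
function weightMemoryGB(paramsBillions, bitsPerWeight) {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

console.log(weightMemoryGB(16, 8)); // 16B model at 8 bits per weight
console.log(weightMemoryGB(16, 4)); // same model at 4 bits per weight
console.log(weightMemoryGB(7, 4));  // 7B autocomplete model at 4 bits per weight
```

Halving the bit width halves the weight memory, which is why a 16B model that does not fit on a consumer GPU at 8 bits may fit at 4 bits.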
Setting requestOptions.timeout to at least 60000 in the configuration prevents timeouts during longer generations on CPU-bound hardware.
Common Issues and Fixes
- Continue.dev cannot reach a model: verify that ollama serve is running and that http://localhost:11434/api/tags returns a model list. A common mistake is pulling a model but not having the Ollama service active.
- Slow autocomplete: this usually means the model is too large for the available hardware. Reduce contextLength and maxTokens in the tabAutocompleteModel block to compensate.
- Context window errors: these manifest as truncated or nonsensical responses, and they typically mean the contextLength value in the config exceeds the model’s actual trained context window. Match the config value to the model’s specification.
- Resource contention in JetBrains IDEs: disabling the built-in “AI Assistant” plugin and reducing the IDE’s background indexing frequency can resolve it.
Implementation Checklist: Your Complete Setup Reference
Setup Checklist
- Install Ollama for your operating system
- Run ollama serve to start the local model server
- Verify the service is running: curl http://localhost:11434/api/tags
- Pull codellama:7b-code for autocomplete
- Pull deepseek-coder-v2:16b (or llama3.1:8b) for chat
- Pull nomic-embed-text for local embeddings
- Verify all models with ollama list
- Install Continue.dev extension/plugin in your editor
- Create or edit ~/.continue/config.json with the reference config below
- Test tab autocomplete in a JavaScript file
- Test inline chat with Cmd/Ctrl+L
- Test @codebase context provider by asking a project-level question
- Add custom slash commands for your team’s workflow
- Version-control your config.json in your dotfiles repository (omit API keys)
Complete Reference config.json
{
  "models": [
    {
      "title": "DeepSeek Coder V2",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.3,
        "maxTokens": 2048
      },
      "requestOptions": {
        "timeout": 60000
      }
    },
    {
      "title": "Llama 3.1 8B",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "contextLength": 32768, // Capped below the model's 128k maximum to limit RAM usage; increase only if you have 64+ GB RAM
      "completionOptions": {
        "temperature": 0.5,
        "maxTokens": 1024
      },
      "requestOptions": {
        "timeout": 60000
      }
    }
  ],
  "tabAutocompleteModel": {
    "title": "CodeLlama 7B",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "contextLength": 4096,
    "completionOptions": {
      "maxTokens": 256,
      "temperature": 0.1,
      "stop": ["\n\n", "```"]
    }
  },
  "tabAutocompleteOptions": {
    "debounceDelay": 500,
    "multilineCompletions": "auto",
    "maxPromptTokens": 1024,
    "disableInFiles": ["*.md", "*.txt"],
    "useCache": true
  },
  "contextProviders": [
    {
      "name": "codebase",
      "params": {
        "nRetrieve": 15,
        "nFinal": 5,
        "useReranking": true
      }
    },
    { "name": "file" },
    { "name": "docs" },
    { "name": "terminal" },
    { "name": "git" }
  ],
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  },
  "slashCommands": [
    { "name": "edit", "description": "Edit selected code" },
    { "name": "comment", "description": "Add comments to code" },
    { "name": "share", "description": "Export chat to markdown" }
  ],
  "customCommands": [
    {
      "name": "review",
      "description": "Perform a code review on the selected code",
      "prompt": "Review the following code for bugs, security issues, performance problems, and adherence to best practices. Provide specific, actionable feedback with line references:\n\n{{{ input }}}"
    },
    {
      "name": "testgen",
      "description": "Generate Jest tests for the selected code",
      "prompt": "Generate comprehensive Jest unit tests for the following code. Include edge cases, error scenarios, and mock external dependencies where appropriate. Use describe/it blocks:\n\n{{{ input }}}"
    }
  ]
}
This single file delivers a fully functional local AI coding assistant. Copy it to ~/.continue/config.json. Model, context provider, and slash command settings are shared. Editor-specific behavior (keybindings, sidebar placement) varies per editor.
What’s Next: Cloud Hybrids, MCP, and the Continue.dev Roadmap
Continue.dev’s configuration model makes hybrid setups straightforward. A common pattern is running a local 7B model for low-latency autocomplete while routing complex chat queries to a cloud model like Claude or GPT-4. To add a cloud model, insert one entry in the models array with the appropriate provider and API key. Do not commit config.json to version control if it contains API keys. Use environment variable substitution where supported.
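A simple guard against committing secrets is to scan the parsed config for populated key fields before adding it to a repository. The sketch below assumes that cloud model entries store their credential in a field named apiKey; both that field name and the helper findApiKeyPaths are assumptions for illustration, so adjust them to match your installed version's schema.

```javascript
// Recursively walk a parsed config object and return the paths of all
// non-empty string fields named "apiKey".
function findApiKeyPaths(value, path = "") {
  if (Array.isArray(value)) {
    return value.flatMap((item, index) => findApiKeyPaths(item, `${path}[${index}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) => {
      const childPath = path ? `${path}.${key}` : key;
      if (key === "apiKey" && typeof child === "string" && child.length > 0) {
        return [childPath];
      }
      return findApiKeyPaths(child, childPath);
    });
  }
  return [];
}

// Usage in a pre-commit script (config parsed elsewhere):
// const leaks = findApiKeyPaths(config);
// if (leaks.length > 0) {
//   console.error(`Refusing to commit: API keys found at ${leaks.join(", ")}`);
//   process.exit(1);
// }
```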
Model Context Protocol (MCP) support is an active area of development, enabling Continue.dev to connect to external tools and data sources beyond the editor. This opens possibilities for integrating with databases, documentation systems, and CI/CD pipelines directly from the chat interface.
Continue.dev remains fully open source under the Apache 2.0 license. The project’s GitHub repository and Discord server are the primary venues for community contributions and support. For developers who want AI-assisted coding without compromising on data sovereignty, Continue.dev stands out among alternatives like Tabby, Cody, and Codeium’s self-hosted tier as an actively maintained, local-first option.

