How to add local LLM source code auto-completion to VS Code
Some LLMs are trained for source code generation. The goal here is to run such a model locally, which allows private, unlimited, “free” use. The trade-off is that a local model is generally smaller than commercial ones and thus not as “smart.”
We will outline the installation steps for VS Code. Future models will be smarter, and this setup lets you swap them in as drop-in replacements.
LLM backend #
- Install and run LM Studio.
- Turn on Developer mode: click the “Developer” button on the bottom status bar.
- One source code generation LLM is Qwen2.5-Coder. Search for the qwen2.5-coder-7b-instruct model and download the 4-bit model.
- If you are on an Apple Silicon Mac, consider using an MLX variant of the model rather than GGUF. Both will work, but the MLX variant will be more performant.
- For verification, open a chat window (similar to ChatGPT’s) and load the model we just downloaded. Then enter the following prompt:
Write a Python function to calculate the factorial of a number
The result will be similar to below:
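The exact code varies from run to run, but a typical reply looks something like this sketch (not the verbatim output from the screenshot):

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))  # 120
```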

Note the tokens per second and time to first token at the bottom of the screenshot; these metrics are reported because Developer mode is on. The latter metric gives you a sense of auto-completion latency. Bigger models will generally have longer latency.
Tokens per second, the former metric, is the speed at which the AI writes. In chatbot mode, we want the LLM’s writing speed to be faster than our reading speed. Auto-completion, however, generally involves a shorter amount of text, so writing speed probably does not matter that much; a shorter latency is more desirable.
VS Code #
Install VS Code, or VSCodium, a community-driven, freely-licensed binary distribution of Microsoft’s VS Code editor.
Install the Continue extension. Here is the official installation guide.
- Their installation guide has steps for JetBrains IDE, but I have not tested it.
Since we are running a local model, no account creation is needed for Continue.
Find the local config setting and edit config.yaml as follows.
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: qwen2.5-coder-7b-instruct for autocomplete
    provider: lmstudio
    model: qwen2.5-coder-7b-instruct
    apiBase: http://localhost:1234/v1/
    roles:
      - autocomplete
Note model: qwen2.5-coder-7b-instruct. This identifier comes from the model we downloaded in LM Studio. To get it, go back to the factorial chat window in LM Studio, right-click on the model name at the top center, and choose “Copy Default Identifier”.
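To confirm that the apiBase above is reachable before wiring up Continue, you can query LM Studio’s OpenAI-compatible server directly. Below is a minimal sketch, assuming LM Studio is serving on its default port 1234; the listed identifiers should include the one used in config.yaml.

```python
import json
import urllib.request

# Ask LM Studio's OpenAI-compatible endpoint which models it knows about
# (assumes the local server is running on the default port 1234).
with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    models = json.load(resp)

# One of the printed identifiers should be "qwen2.5-coder-7b-instruct",
# matching the "model:" field in config.yaml.
for entry in models.get("data", []):
    print(entry.get("id"))
```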
- Verification. Create a new file and save it as test.py.

Notice the for i in range(n): in italics? That is the auto-completion from the LLM. My cursor (not visible in the screenshot) is just before for.
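Pressing Tab (the default accept key) inserts the suggestion at the cursor. Accepting a few completions in a row typically yields something like the sketch below; the exact code the model proposes will vary from session to session.

```python
def factorial(n):
    result = 1
    for i in range(n):        # suggested by the model as ghost text
        result *= i + 1       # i runs 0..n-1, so multiply by i + 1
    return result
```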
Tweaks #
- In Continue, use the “Disable autocomplete in files” setting to skip files where completion is not wanted, e.g. **/*.{txt,md} for plain-text and Markdown files.