
Local LLM

Planted 02025-07-08

On macOS, Ollama makes it easy to run LLMs locally.

As of writing, small LLMs capable of running in 16 GB of RAM have limited coding ability; compared with new models from Anthropic, they are not worth using for coding work. However, as a form of text compression, small LLMs are very useful to have, and they have the added benefit of working offline.

Download Ollama

brew install ollama

Start Ollama server

ollama serve
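
To check that the server is up (it listens on port 11434 by default), query its API; it should return a JSON list of locally installed models:

curl http://localhost:11434/api/tags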

Download a model

ollama pull llama3.2:3b
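
Confirm the download by listing installed models:

ollama list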

Run a model in the command line

ollama run llama3.2:3b
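
For the text-compression use case above, a one-shot prompt is often enough. As a rough example (notes.txt is just a placeholder file name):

ollama run llama3.2:3b "Summarize the following notes in five bullet points: $(cat notes.txt)"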

Opencode

For actual use, you will want to change Ollama's default prompt limits (in particular the context window) by using a Modelfile.

Example Modelfile (the file can be named anything)

FROM llama3.2:3b

# Increase context window for larger codebases (32K tokens)
PARAMETER num_ctx 32768

# Lower temperature for more focused, deterministic generation
PARAMETER temperature 0.1

# Reduce repetition
PARAMETER repeat_penalty 1.1

# Look back further to avoid repetitive patterns
PARAMETER repeat_last_n 128

# More conservative sampling
PARAMETER top_p 0.9
PARAMETER top_k 40

# System message
SYSTEM """Insert your system prompt here"""
Create the model from the Modelfile

ollama create <new_model_name> -f <path_to_model_file>
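
For example, if the Modelfile above is saved as ./Modelfile and you want to call the new model llama3.2-code (both names are arbitrary):

ollama create llama3.2-code -f ./Modelfile
ollama run llama3.2-code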

Connect a model to opencode

{
	"$schema": "https://opencode.ai/config.json",
	"provider": {
		"ollama": {
			"npm": "@ai-sdk/openai-compatible",
			"options": {
				"baseURL": "http://localhost:11434/v1"
			},
			"models": {
				"llama3.2": {}
			}
		}
	}
}
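
Before wiring up opencode, a quick way to sanity-check the OpenAI-compatible endpoint it will use is a curl request (the model name must be one you have pulled or created):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Say hello"}]}'

If you built a custom model from the Modelfile, list that name (e.g. llama3.2-code from the example above) under "models" instead of the base model.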

Making the most out of small models

Your prompt > LLM prompt optimization > search engine query with the optimized prompt > LLM summarization of the results / tool calls.
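
A minimal sketch of that pipeline, assuming llama3.2:3b is installed and leaving the actual search step to whatever tool you prefer:

# Hypothetical example: turn a question into a search query, then summarize the fetched results
QUESTION="how do I raise the context window in ollama"
QUERY=$(ollama run llama3.2:3b "Rewrite this question as a short web search query. Output only the query: $QUESTION")
# ...run $QUERY through your search tool, save the output to results.txt, then:
ollama run llama3.2:3b "Using these search results, answer the original question: $(cat results.txt)"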

See How to fix your context.

Prompts

Strategies

Claude models have "think" directives.