Local LLM
Planted 02025-07-08
On macOS, Ollama makes it easy to run LLMs locally.
As of writing, small LLMs capable of running in 16GB of RAM have limited coding ability - compared to current models from Anthropic, they are not worth using for real coding work. However, as a form of text compression, small LLMs are very useful to have - with the added benefit of working offline.
Download Ollama
brew install ollama
Start Ollama server
ollama serve
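Alternatively, since Ollama here was installed through Homebrew, it can be run as a background service so you don't need to keep a terminal open:
brew services start ollama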
Download a model
ollama pull llama3.2:3b
Run a model in the command line
ollama run llama3.2:3b
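Ollama also exposes an HTTP API on localhost:11434, which is what editor integrations such as opencode talk to. A quick smoke test, assuming the llama3.2:3b model pulled above:
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello in five words.", "stream": false}'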
Opencode
For actual use, you will want to override Ollama's default parameters (most importantly the context window) by using a Modelfile.
Example Modelfile (the file itself can be named anything)
FROM llama3.2:3b
# Increase context window for larger codebases (32K tokens)
PARAMETER num_ctx 32768
# Lower temperature for more focused, deterministic generation
PARAMETER temperature 0.1
# Reduce repetition
PARAMETER repeat_penalty 1.1
# Look back further to avoid repetitive patterns
PARAMETER repeat_last_n 128
# More conservative sampling
PARAMETER top_p 0.9
PARAMETER top_k 40
# System message
SYSTEM """Insert your system prompt here"""
ollama create <new_model_name> -f <path_to_model_file>
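For example, with the Modelfile above saved as ./Modelfile (the model name here is just a placeholder):
ollama create llama3.2-coding -f ./Modelfile
The new model then shows up in ollama list and runs like any other:
ollama run llama3.2-coding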
Connect a model to opencode
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "llama3.2": {}
      }
    }
  }
}
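This goes in opencode's config file (typically opencode.json in the project root, or the global config under ~/.config/opencode/). The keys under models must match the names Ollama knows the models by, so if you created a custom model from the Modelfile above (e.g. llama3.2-coding), list that name here instead of the base llama3.2 - otherwise the larger context window and other parameters won't apply.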
Making the most out of small models
A useful pipeline: your prompt > LLM prompt optimization > LLM search engine query results > LLM summarization / tool calls.
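A minimal sketch of that pipeline against Ollama's HTTP API, assuming the llama3.2:3b model from earlier, jq installed, and a results.txt file produced by whatever search step you use - the prompts and file name are placeholders:
QUERY=$(curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "stream": false, "prompt": "Rewrite this question as a short web search query: how do I profile memory usage in a Rust program?"}' \
  | jq -r .response)

# ...run $QUERY through a search engine and save the results to results.txt...

jq -n --arg model llama3.2:3b --rawfile results results.txt \
  '{model: $model, stream: false, prompt: ("Summarize these search results in three bullet points:\n" + $results)}' \
  | curl -s http://localhost:11434/api/generate -d @- \
  | jq -r .response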
Prompts
Strategies
Claude models have "think" directives.