June 30, 2025
I’ve long had a weird compulsion to learn about the various tools used by an industry or area of work and internalize the landscape they make up.
When I got into video editing years ago, the simple thing would have been to stick with the first tool I came across and get to work. Instead, off the top of my head, I can still recall important feature sets, differences, and pain points between
And in design
And in marketing
And in development… well that’s a bigger rabbit hole.
I had a plumbing issue recently and learned all about manual augers, drum augers, flat tape augers, power augers… even after I knew I was only going to buy a drum auger I wanted to know the landscape.
“Conway’s Law” is the observation that you can understand systems by understanding the shape of the organizations behind them.
Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
— Melvin E. Conway, How Do Committees Invent?
I think I am drawn to exploring the tool space because it seems clear to me that you can understand work by understanding the tools used to create it.
Tools which create work are constrained to produce outputs that are copies of the structure, logic, and limitations of the tools themselves.
It seems knowledge about tools is a bus ticket of mine — I collect it and keep the distinctions just for the love of it.
June 30, 2025
What if improving AI task performance didn’t depend on hand-tuned prompting strategies, one-off optimizations, and the quirks of specific models?
No more tending a garden of “do this, not that” edge cases. No more “prompt engineering.”
Just program logic and optimization metrics.
Enter: DSPy
import openai  # assumes OPENAI_API_KEY is set in the environment

prompt = "You are a helpful assistant. Answer this math question step by step: Two dice are tossed. What is the probability that the sum equals two?"

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)

# Yes, a dumb case. But string-based prompting performance is brittle.
# Declarative, structured approach
import dspy
# Configure your language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
# Define behavior through signatures
math = dspy.ChainOfThought("question -> answer: float")
# Use the module - DSPy handles prompting automatically
result = math(question="Two dice are tossed. What is the probability that the sum equals two?")
# Get structured output with reasoning
print(result.reasoning) # Step-by-step explanation
print(result.answer) # 0.0277776
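If you're curious what DSPy actually sent to the model, you can peek at the last call:

dspy.inspect_history(n=1)  # prints the most recent LM call, including the generated prompt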
Okay, this example didn’t mean much to me when I first saw it. Let’s see a more real-world use case: extracting information from emails.
import re

# call_openai is assumed to be a thin wrapper around the chat completions call shown earlier
def process_email_old_way(subject, body, sender):
    # Separate prompts for each task - brittle and hard to maintain

    # Email classification
    classify_prompt = f"""
Classify this email as one of: order_confirmation, support_request, meeting_invitation, newsletter, promotional, invoice, shipping_notification, other

Subject: {subject}
Body: {body}
Sender: {sender}

Classification:"""
    classification = call_openai(classify_prompt)

    # Entity extraction - different prompt structure
    extract_prompt = f"""
Extract the following from this email:
- Financial amounts (format: $X.XX)
- Important dates (format: MM/DD/YYYY)
- Contact information
- Action items

Email: {subject} {body}

Extracted info:"""
    entities = call_openai(extract_prompt)

    # Urgency detection - yet another prompt
    urgency_prompt = f"""
Rate the urgency of this email from 1-4:
1=low, 2=medium, 3=high, 4=critical

Consider: {subject}

Urgency level:"""
    urgency = call_openai(urgency_prompt)

    # Manual parsing hell
    try:
        # Hope the LLM returned exactly what we expected...
        classification = classification.strip().lower()
        urgency_num = int(urgency.strip())

        # Parse entities with regex and prayer
        amounts = re.findall(r'\$[\d,]+\.?\d*', entities)
        dates = re.findall(r'\d{1,2}/\d{1,2}/\d{4}', entities)

        return {
            'type': classification,
            'urgency': urgency_num,
            'amounts': amounts,
            'dates': dates
        }
    except:
        # When it inevitably breaks...
        return {'error': 'Parsing failed'}
# Problems:
# - 4 separate API calls (slow, expensive)
# - Fragile string parsing
# - No consistency between outputs
# - Breaks when switching models
# - Manual prompt engineering for each task
# - No systematic way to improve accuracy
Want to optimize?
# When accuracy is poor, you manually add examples:
classify_prompt = f"""
Examples:
"Server down" -> support_request, critical
"Order confirmed" -> order_confirmation, low
"Meeting tomorrow" -> meeting_invitation, medium
Now classify: {subject}
"""
# Still brittle, still manual...
import dspy

# ClassifyEmail, ExtractEntities, GenerateActionItems, and SummarizeEmail are
# dspy.Signature classes (defined elsewhere) declaring each step's input/output fields.

class EmailProcessor(dspy.Module):
    def __init__(self):
        super().__init__()
        # Define WHAT you want, not HOW to prompt for it
        self.classifier = dspy.ChainOfThought(ClassifyEmail)
        self.entity_extractor = dspy.ChainOfThought(ExtractEntities)
        self.action_generator = dspy.ChainOfThought(GenerateActionItems)
        self.summarizer = dspy.ChainOfThought(SummarizeEmail)

    def forward(self, email_subject, email_body, sender):
        # Compose modules together - DSPy handles the prompting
        classification = self.classifier(
            email_subject=email_subject,
            email_body=email_body,
            sender=sender
        )
        entities = self.entity_extractor(
            email_content=f"{email_subject}\n{email_body}",
            email_type=classification.email_type
        )

        # Get structured, typed outputs automatically
        return dspy.Prediction(
            email_type=classification.email_type,
            urgency=classification.urgency,
            financial_amount=entities.financial_amount,  # Proper float
            important_dates=entities.important_dates,    # Proper list
            action_required=(classification.urgency == "critical")
        )
# Usage - clean and simple
processor = EmailProcessor()
result = processor(
"URGENT: Server Down",
"Production is offline, need immediate help",
"[email protected]"
)
print(result.email_type) # EmailType.SUPPORT_REQUEST
print(result.urgency) # UrgencyLevel.CRITICAL
print(result.financial_amount) # None (properly typed)
Want to optimize?
# Load your email dataset
emails = load_historical_emails() # 1000 labeled emails
# Define success metric
def email_accuracy(example, prediction, trace=None):
    return (example.email_type == prediction.email_type and
            example.urgency == prediction.urgency)
# Optimize the ENTIRE pipeline automatically
optimizer = dspy.MIPROv2(metric=email_accuracy)
optimized_processor = optimizer.compile(processor, trainset=emails)
# Optimized prompts for each module
# Handles edge cases automatically
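From there you can measure the lift and keep the result. A minimal sketch, assuming you hold out part of the labeled emails as a dev split:

evaluator = dspy.Evaluate(devset=emails[800:], metric=email_accuracy, display_progress=True)
evaluator(processor)             # baseline accuracy
evaluator(optimized_processor)   # accuracy after MIPROv2
optimized_processor.save("email_processor.json")  # reload later with .load()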
You should probably be using DSPy.
Signatures specify the input/output behavior of a DSPy module. Any valid variable names work; the DSPy compiler optimizes around the keywords you choose.
For example, for summarization, “document -> summary”, “text -> gist”, or “long_context -> tldr” all invoke summarization.
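A quick sketch, using the same LM configured above:

# Three equivalent ways to ask for a summary; only the field names differ
summarize = dspy.ChainOfThought("document -> summary")
gist = dspy.ChainOfThought("text -> gist")
tldr = dspy.ChainOfThought("long_context -> tldr")

print(summarize(document="DSPy decouples program logic from prompt strings.").summary)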
Modules are building blocks that handle signatures and prompt configuration and can be composed into bigger modules.
These are taken directly from https://dspy.ai/learn/programming/modules/
math = dspy.ChainOfThought("question -> answer: float")
math(question="Two dice are tossed. What is the probability that the sum equals two?")
# Prediction(
# reasoning='When two dice are tossed, each die has 6 faces, resulting in a total of 6 x 6 = 36 possible outcomes. The sum of the numbers on the two dice equals two only when both dice show a 1. This is just one specific outcome: (1, 1). Therefore, there is only 1 favorable outcome. The probability of the sum being two is the number of favorable outcomes divided by the total number of possible outcomes, which is 1/36.',
# answer=0.0277776
# )
def search(query: str) -> list[str]:
    """Retrieves abstracts from Wikipedia."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
rag = dspy.ChainOfThought('context, question -> response')
question = "What's the name of the castle that David Gregory inherited?"
rag(context=search(question), question=question)
# Prediction(
# reasoning='The context provides information about David Gregory, a Scottish physician and inventor. It specifically mentions that he inherited Kinnairdy Castle in 1664. This detail directly answers the question about the name of the castle that David Gregory inherited.',
# response='Kinnairdy Castle'
# )
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
from typing import Literal

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""

    sentence: str = dspy.InputField()
    sentiment: Literal['positive', 'negative', 'neutral'] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")
# Prediction(
# sentiment='positive',
# confidence=0.75
# )
text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."
module = dspy.Predict("text -> title, headings: list[str], entities_and_metadata: list[dict[str, str]]")
response = module(text=text)
print(response.title)
print(response.headings)
print(response.entities_and_metadata)
# Apple Unveils iPhone 14
# ['Introduction', 'Key Features', "CEO's Statement"]
# [{'entity': 'Apple Inc.', 'type': 'Organization'}, {'entity': 'iPhone 14', 'type': 'Product'}, {'entity': 'Tim Cook', 'type': 'Person'}]
def evaluate_math(expression: str) -> float:
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str) -> str:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])
pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)
# 5761.328
Check out my AI tools & resources reference
June 29, 2025
Late one night, a friend mentioned something that would consume me for the next day: survey marks.
Survey markers, also called survey marks, survey monuments, or geodetic marks, are objects placed to mark key survey points on the Earth’s surface. They are used in geodetic and land surveying. A benchmark is a type of survey marker that indicates elevation (vertical position). Horizontal position markers used for triangulation are also known as triangulation stations. Benchmarking is the hobby of “hunting” for these marks.
Since 1807, NOAA’s National Geodetic Survey (NGS) and its predecessor agencies have placed permanent survey marks or monuments throughout the United States so we can know exact locations and elevations on the surface of the Earth. A typical mark is a brass, bronze, or aluminum disk (or rod), but marks might also be prominent objects like water towers or church spires. The National Geodetic Survey’s database contains information on over 1.5 million survey disks, each with a detailed datasheet describing its exact position and physical characteristics.
The National Geodetic Survey Map is an ArcGIS Online Web Map Application that enables users to view multiple datasets provided by the National Geodetic Survey.
The Mark Recovery Dashboard displays mark recoveries that have been submitted to NGS.
An app about hunting these marks and tracking progress in a region would give my friends and me a reason to explore places — some nerds need nerdy nudges to navigate nature.
Geocaching.com seems to have had this feature at some point but has since removed the dataset (or maybe they just removed the mark page).
Benchmark Hunter is an iOS app, released in 2021, for hunting NGS survey marks.
This seemed like the perfect excuse to answer a bigger question: In June 2025, what does AI-assisted development look like for a solo developer building something real?
As a nod to Pokemon Go, I made a new directory, geodetic-go.
Information about survey monuments (aka “marks”) stored in the National Geodetic Survey’s Integrated Database (NGS IDB) may be retrieved and displayed in a variety of methods. One standard is known as a datasheet, an ASCII text file consisting of rigorously formatted lines of 80 columns.
The NGS provides datasheets at the state-level.
I chucked the DSDATA format spec into Google Gemini Chat and asked it to write a parser focused on extracting latitude, longitude, and marker type. First I had it write TypeScript — as the rest of the codebase would be — but it kept producing non-working stuff. Then I asked again without specifying a language and it started writing Python, but I didn’t want to deal with the venv stuff. So I told it to write it in Go and it worked on the first try.
type Datasheet struct {
    PID               string `parquet:"pid"`
    Designation       string `parquet:"designation"`
    State             string `parquet:"state"`
    County            string `parquet:"county"`
    Latitude          string `parquet:"latitude"`
    Longitude         string `parquet:"longitude"`
    OrthometricHeight string `parquet:"orthometric_height"`
    EllipsoidHeight   string `parquet:"ellipsoid_height"`
    MarkerType        string `parquet:"marker_type"`
    RawText           string `parquet:"raw_text"`
}
I knew I wanted to parse the data and store it in Cloudflare R2 because I enjoy the product and the pricing. My first idea was SQLite, but once I realized the raw text I want to display would total gigabytes — and I didn’t want to send gigabytes down to a client — I changed my approach. Given the importance of compression and the write-once, read-many nature of the data, I chose Apache Parquet files.
NGS provides DataSheets at the State level. However, to minimize data requirements I partitioned by county as well.
The pipeline is:
- download the state-level DataSheet files (datasheet-downloader)
- parse them into Datasheet records and write county-partitioned Parquet files (datasheet-parser)
- upload the Parquet files to R2
To upload into R2, I prompted Claude Code to write a script to use Rclone.
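The real parser is Go, but the partition-by-county idea is easy to sketch. Roughly, in Python with pyarrow (hypothetical sample rows, columns mirroring the Datasheet struct above):

import pyarrow as pa
import pyarrow.dataset as ds

# A couple of parsed rows, keyed the same way as the Go struct's parquet tags
table = pa.table({
    "pid": ["AB1234", "AB1235"],
    "state": ["CA", "CA"],
    "county": ["ALAMEDA", "MARIN"],
    "latitude": ["37 46 29.9", "38 04 12.1"],
    "longitude": ["122 25 09.9", "122 48 30.2"],
    "marker_type": ["DB", "DB"],
})

# One Parquet directory per (state, county) so the client only fetches the slice it needs from R2
ds.write_dataset(
    table,
    base_dir="out/datasheets",
    format="parquet",
    partitioning=["state", "county"],
    existing_data_behavior="overwrite_or_ignore",
)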
I knew I wanted to use React Router v7 SPA mode — I’ve used Remix for years and have used React Router v7 in various other projects. Vite comes with Tailwind support, but I just had Claude Code write a STYLE.md file, and it starts like this:
Terminal Design Features
Visual Style:
- Classic green-on-black terminal color scheme
- JetBrains Mono monospace font throughout
- Terminal-style borders and panels
- Animated blinking cursor effect
- CRT-style scan line animation
- Subtle screen grain effect
[…]
I didn’t start with this, though; early on I told Claude Code to redesign the frontend in this style and it did a great job.
AI models are great at writing terrible React code. Terrible, terrible React code. I imagine this will get better over time — and there are certainly prompting improvements I could make — but wow is it annoying.
I knew I wanted to use Cloudflare Workers for the backend if I could — there are a lot of limitations if you choose the Workers runtime, but when it works it’s great and the platform is great. I chose Hono as the web framework and copied the Hono Stacks markdown documentation for Claude Code to use.
Hono’s RPC feature allows you to share API specs with little change to your code. The client generated by hc will read the spec and access the endpoint type-safely.
I love this feature but interestingly enough Claude Code wrote 100% of the backend and client API code in this project. Not without needing adjustments though.
I have yet to use Claude Code in an unchained manner — I either tell it exactly what to do or I tell it to think about how to do something, review that, and then tell it to do it and approve/deny every step of the way. With AI tools, you can adjust the input and in some cases you can poke the black-box a bit to change the output — but at the end of the day the output is still non-deterministic.
From what I’ve seen, if you do not technically understand the output you will immediately shoot yourself in the foot. If you cannot tell that the output is bad, you cannot adjust it, and you just dig a deeper hole in which shit is flung. Even in this project, by choosing React, deciding to move fast, and not having proper guides set up beforehand, I let several useEffects and useStates of genuinely bad code slide! AI will produce many egregious suggestions, but if you are knowledgeable you can catch and fix them.
I used Repomix to pack the documentation for Hono.js and React Router into their own markdown files so I could tell Claude Code to search the file on how to use a certain thing. I also copy-paste specific documentation from Cloudflare into markdown files — their site has a copy-as-markdown button and it works beautifully — and tell Claude Code to read that file.
80% of the time I went “Read [x] [y] [z] and think about how to implement [a]”. There are certainly better ways of going about it, but this works pretty well.
We recommend using the word “think” to trigger extended thinking mode, which gives Claude additional computation time to evaluate alternatives more thoroughly. These specific phrases are mapped directly to increasing levels of thinking budget in the system: “think” < “think hard” < “think harder” < “ultrathink.” Each level allocates progressively more thinking budget for Claude to use.
AI amplifies specific development practices. Good practices become superpowers; bad practices become disasters. The two best practices you can adopt right now are
packages
- backend - Hono/Cloudflare Workers API
- datasheet-downloader - Go downloader for NGS DataSheets
- datasheet-parser - Go parser for NGS DataSheets → Parquet files
- frontend - React Router web application

Check out my AI tools & resources reference