AI Programs
Planted 02025-06-30
DSPy
What if you didn’t have to depend on hand-tuned prompting strategies, one-off optimizations, and specific models to improve AI task performance?
No more watering the garden squares of “do this, not that” edge cases. No more “prompt engineering.”
Just program logic and optimization metrics.
Enter: DSPy
Probability example
Before DSPy
prompt = "You are a helpful assistant. Answer this math question step by step: Two dice are tossed. What is the probability that the sum equals two?"
response = openai.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
# Yes a dumb case. But string-based prompting performance is brittle.
After DSPy
# Declarative, structured approach
import dspy
# Configure your language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
# Define behavior through signatures
math = dspy.ChainOfThought("question -> answer: float")
# Use the module - DSPy handles prompting automatically
result = math(question="Two dice are tossed. What is the probability that the sum equals two?")
# Get structured output with reasoning
print(result.reasoning) # Step-by-step explanation
print(result.answer) # 0.0277776
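Curious what DSPy actually sent to the model? You can print the most recent LM call to see the generated prompt and completion:

# Show the last prompt/response pair DSPy produced under the hood
dspy.inspect_history(n=1)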
Okay, this example didn’t mean much to me when I first saw it. Let’s see a more real-world use case: extracting information from emails.
A Customer Support Email Classifier
Before DSPy
import re
import openai

# Helper assumed by the original snippet - a thin wrapper around a raw chat call
def call_openai(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def process_email_old_way(subject, body, sender):
    # Separate prompts for each task - brittle and hard to maintain

    # Email classification
    classify_prompt = f"""
    Classify this email as one of: order_confirmation, support_request, meeting_invitation, newsletter, promotional, invoice, shipping_notification, other
    Subject: {subject}
    Body: {body}
    Sender: {sender}
    Classification:"""
    classification = call_openai(classify_prompt)

    # Entity extraction - different prompt structure
    extract_prompt = f"""
    Extract the following from this email:
    - Financial amounts (format: $X.XX)
    - Important dates (format: MM/DD/YYYY)
    - Contact information
    - Action items
    Email: {subject} {body}
    Extracted info:"""
    entities = call_openai(extract_prompt)

    # Urgency detection - yet another prompt
    urgency_prompt = f"""
    Rate the urgency of this email from 1-4:
    1=low, 2=medium, 3=high, 4=critical
    Consider: {subject}
    Urgency level:"""
    urgency = call_openai(urgency_prompt)

    # Manual parsing hell
    try:
        # Hope the LLM returned exactly what we expected...
        classification = classification.strip().lower()
        urgency_num = int(urgency.strip())

        # Parse entities with regex and prayer
        amounts = re.findall(r'\$[\d,]+\.?\d*', entities)
        dates = re.findall(r'\d{1,2}/\d{1,2}/\d{4}', entities)

        return {
            'type': classification,
            'urgency': urgency_num,
            'amounts': amounts,
            'dates': dates
        }
    except Exception:
        # When it inevitably breaks...
        return {'error': 'Parsing failed'}
# Problems:
# - 4 separate API calls (slow, expensive)
# - Fragile string parsing
# - No consistency between outputs
# - Breaks when switching models
# - Manual prompt engineering for each task
# - No systematic way to improve accuracy
Want to optimize?
# When accuracy is poor, you manually add examples:
classify_prompt = f"""
Examples:
"Server down" -> support_request, critical
"Order confirmed" -> order_confirmation, low
"Meeting tomorrow" -> meeting_invitation, medium
Now classify: {subject}
"""
# Still brittle, still manual...
After DSPy
import dspy
from typing import Literal, Optional

# The signatures below aren't defined in the original snippet;
# these are minimal sketches of what they might look like:
class ClassifyEmail(dspy.Signature):
    """Classify an email and rate its urgency."""
    email_subject: str = dspy.InputField()
    email_body: str = dspy.InputField()
    sender: str = dspy.InputField()
    email_type: Literal['order_confirmation', 'support_request',
                        'meeting_invitation', 'newsletter', 'promotional',
                        'invoice', 'shipping_notification', 'other'] = dspy.OutputField()
    urgency: Literal['low', 'medium', 'high', 'critical'] = dspy.OutputField()

class ExtractEntities(dspy.Signature):
    """Extract structured entities from an email."""
    email_content: str = dspy.InputField()
    email_type: str = dspy.InputField()
    financial_amount: Optional[float] = dspy.OutputField()
    important_dates: list[str] = dspy.OutputField()

class EmailProcessor(dspy.Module):
    def __init__(self):
        super().__init__()
        # Define WHAT you want, not HOW to prompt for it
        self.classifier = dspy.ChainOfThought(ClassifyEmail)
        self.entity_extractor = dspy.ChainOfThought(ExtractEntities)
        # The same pattern extends to more steps, e.g.:
        # self.action_generator = dspy.ChainOfThought(GenerateActionItems)
        # self.summarizer = dspy.ChainOfThought(SummarizeEmail)

    def forward(self, email_subject, email_body, sender):
        # Compose modules together - DSPy handles the prompting
        classification = self.classifier(
            email_subject=email_subject,
            email_body=email_body,
            sender=sender
        )
        entities = self.entity_extractor(
            email_content=f"{email_subject}\n{email_body}",
            email_type=classification.email_type
        )
        # Get structured, typed outputs automatically
        return dspy.Prediction(
            email_type=classification.email_type,
            urgency=classification.urgency,
            financial_amount=entities.financial_amount,  # proper float (or None)
            important_dates=entities.important_dates,    # proper list
            action_required=(classification.urgency == "critical")
        )
# Usage - clean and simple
# (assumes dspy.configure(lm=...) was called, as in the first example)
processor = EmailProcessor()
result = processor(
    "URGENT: Server Down",
    "Production is offline, need immediate help",
    "[email protected]"
)

print(result.email_type)        # 'support_request'
print(result.urgency)           # 'critical'
print(result.financial_amount)  # None (properly typed)
Want to optimize?
# Load your email dataset
emails = load_historical_emails()  # 1000 labeled emails

# Define success metric (DSPy metrics also receive an optional trace argument)
def email_accuracy(example, prediction, trace=None):
    return (example.email_type == prediction.email_type and
            example.urgency == prediction.urgency)

# Optimize the ENTIRE pipeline automatically
optimizer = dspy.MIPROv2(metric=email_accuracy)
optimized_processor = optimizer.compile(processor, trainset=emails)

# Optimized prompts for each module
# Handles edge cases automatically
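Once compiled, you can save the optimized prompts and demonstrations to disk and reload them later instead of re-running the optimizer. A minimal sketch (the filename is arbitrary):

# Persist the optimized prompts/demos...
optimized_processor.save("email_processor.json")

# ...and restore them into a fresh instance later
processor = EmailProcessor()
processor.load("email_processor.json")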
You should probably be using DSPy.
DSPy Modules
Signatures specify the input/output behavior of a DSPy module. Any valid variable names work; the DSPy compiler will optimize the keywords.
For example, “document -> summary”, “text -> gist”, and “long_context -> tldr” all invoke summarization.
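An inline string signature is shorthand for a class-based signature, which gives you room for types and docstrings. A quick sketch of the two equivalent forms:

import dspy

# Inline shorthand...
summarize = dspy.ChainOfThought("document -> summary")

# ...or the equivalent class-based form
class Summarize(dspy.Signature):
    """Summarize the document."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

summarize = dspy.ChainOfThought(Summarize)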
Modules are building blocks that handle signatures and prompt configuration, and they can be composed into bigger modules.
Core modules
- dspy.Predict: Basic predictor. Does not modify the signature. Handles the key forms of learning (i.e., storing the instructions and demonstrations and updates to the LM).
- dspy.ChainOfThought: Teaches the LM to think step-by-step before committing to the signature’s response.
- dspy.ProgramOfThought: Teaches the LM to output code, whose execution results will dictate the response. (Runs Python programs; requires Deno to be installed.)
- dspy.ReAct: An agent that can use tools to implement the given signature.
- dspy.MultiChainComparison: Can compare multiple outputs from the ChainOfThought module to produce a final prediction.
More modules
- dspy.BestOfN: Runs a module up to N times with different temperatures and returns the best prediction out of N attempts, or the first prediction that passes the threshold (see the sketch after this list).
- dspy.CodeAct: Uses a code interpreter and predefined tools to solve the problem.
- dspy.Refine: Refines a module by running it up to N times with different temperatures, returning the best prediction.
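The examples below don’t cover BestOfN, so here’s a minimal sketch, assuming an LM is already configured. The one-word reward function is made up for illustration:

import dspy

qa = dspy.ChainOfThought("question -> answer")

# Hypothetical reward: prefer one-word answers
def one_word_answer(args, pred):
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

# Try up to 3 runs at different temperatures; stop early once the threshold is met
best_of_3 = dspy.BestOfN(module=qa, N=3, reward_fn=one_word_answer, threshold=1.0)
print(best_of_3(question="What is the capital of Belgium?").answer)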
A few examples
These are taken directly from https://dspy.ai/learn/programming/modules/
Math
math = dspy.ChainOfThought("question -> answer: float")
math(question="Two dice are tossed. What is the probability that the sum equals two?")
# Prediction(
# reasoning='When two dice are tossed, each die has 6 faces, resulting in a total of 6 x 6 = 36 possible outcomes. The sum of the numbers on the two dice equals two only when both dice show a 1. This is just one specific outcome: (1, 1). Therefore, there is only 1 favorable outcome. The probability of the sum being two is the number of favorable outcomes divided by the total number of possible outcomes, which is 1/36.',
# answer=0.0277776
# )
Retrieval-Augmented Generation
def search(query: str) -> list[str]:
    """Retrieves abstracts from Wikipedia."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
rag = dspy.ChainOfThought('context, question -> response')
question = "What's the name of the castle that David Gregory inherited?"
rag(context=search(question), question=question)
# Prediction(
# reasoning='The context provides information about David Gregory, a Scottish physician and inventor. It specifically mentions that he inherited Kinnairdy Castle in 1664. This detail directly answers the question about the name of the castle that David Gregory inherited.',
# response='Kinnairdy Castle'
# )
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Classification
from typing import Literal
class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""
    sentence: str = dspy.InputField()
    sentiment: Literal['positive', 'negative', 'neutral'] = dspy.OutputField()
    confidence: float = dspy.OutputField()
classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")
# Prediction(
# sentiment='positive',
# confidence=0.75
# )
Information Extraction
text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."
module = dspy.Predict("text -> title, headings: list[str], entities_and_metadata: list[dict[str, str]]")
response = module(text=text)
print(response.title)
print(response.headings)
print(response.entities_and_metadata)
# Apple Unveils iPhone 14
# ['Introduction', 'Key Features', "CEO's Statement"]
# [{'entity': 'Apple Inc.', 'type': 'Organization'}, {'entity': 'iPhone 14', 'type': 'Product'}, {'entity': 'Tim Cook', 'type': 'Person'}]
Agents
def evaluate_math(expression: str) -> float:
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])
pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)
# 5761.328
Resources
Check out my AI tools & resources reference