AI Programs
Planted 02025-06-30
DSPy
What if you didn’t have to depend on hand-tuned prompting strategies, one-off optimizations, and specific models to improve AI task performance?
No more watering the garden squares of “do this, not that” edge cases. No more “prompt engineering.”
Just program logic and optimization metrics.
Enter: DSPy
Probability example
Before DSPy
prompt = "You are a helpful assistant. Answer this math question step by step: Two dice are tossed. What is the probability that the sum equals two?"
response = openai.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
# Yes a dumb case. But string-based prompting performance is brittle.
After DSPy
# Declarative, structured approach
import dspy
# Configure your language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
# Define behavior through signatures
math = dspy.ChainOfThought("question -> answer: float")
# Use the module - DSPy handles prompting automatically
result = math(question="Two dice are tossed. What is the probability that the sum equals two?")
# Get structured output with reasoning
print(result.reasoning) # Step-by-step explanation
print(result.answer) # 0.0277776
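Curious what DSPy actually sent to the model? You can print the most recent LM call to see the generated prompt and completion:

# Show the last prompt/response pair DSPy produced under the hood
dspy.inspect_history(n=1)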
Okay, this example didn’t mean much to me when I first saw it. Let’s see a more real-world use case: extracting information from emails.
A Customer Support Email Classifier
Before DSPy
import re
import openai

# Helper assumed by the original snippet - a thin wrapper around a raw chat call
def call_openai(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def process_email_old_way(subject, body, sender):
    # Separate prompts for each task - brittle and hard to maintain

    # Email classification
    classify_prompt = f"""
    Classify this email as one of: order_confirmation, support_request, meeting_invitation, newsletter, promotional, invoice, shipping_notification, other
    Subject: {subject}
    Body: {body}
    Sender: {sender}
    Classification:"""
    classification = call_openai(classify_prompt)

    # Entity extraction - different prompt structure
    extract_prompt = f"""
    Extract the following from this email:
    - Financial amounts (format: $X.XX)
    - Important dates (format: MM/DD/YYYY)
    - Contact information
    - Action items
    Email: {subject} {body}
    Extracted info:"""
    entities = call_openai(extract_prompt)

    # Urgency detection - yet another prompt
    urgency_prompt = f"""
    Rate the urgency of this email from 1-4:
    1=low, 2=medium, 3=high, 4=critical
    Consider: {subject}
    Urgency level:"""
    urgency = call_openai(urgency_prompt)

    # Manual parsing hell
    try:
        # Hope the LLM returned exactly what we expected...
        classification = classification.strip().lower()
        urgency_num = int(urgency.strip())

        # Parse entities with regex and prayer
        amounts = re.findall(r'\$[\d,]+\.?\d*', entities)
        dates = re.findall(r'\d{1,2}/\d{1,2}/\d{4}', entities)

        return {
            'type': classification,
            'urgency': urgency_num,
            'amounts': amounts,
            'dates': dates
        }
    except Exception:
        # When it inevitably breaks...
        return {'error': 'Parsing failed'}
# Problems:
# - 4 separate API calls (slow, expensive)
# - Fragile string parsing
# - No consistency between outputs
# - Breaks when switching models
# - Manual prompt engineering for each task
# - No systematic way to improve accuracy
Want to optimize?
# When accuracy is poor, you manually add examples:
classify_prompt = f"""
Examples:
"Server down" -> support_request, critical
"Order confirmed" -> order_confirmation, low
"Meeting tomorrow" -> meeting_invitation, medium
Now classify: {subject}
"""
# Still brittle, still manual...
After DSPy
import dspy
from typing import Literal, Optional

# The signatures below aren't defined in the original snippet;
# these are minimal sketches of what they might look like:
class ClassifyEmail(dspy.Signature):
    """Classify an email and rate its urgency."""
    email_subject: str = dspy.InputField()
    email_body: str = dspy.InputField()
    sender: str = dspy.InputField()
    email_type: Literal['order_confirmation', 'support_request',
                        'meeting_invitation', 'newsletter', 'promotional',
                        'invoice', 'shipping_notification', 'other'] = dspy.OutputField()
    urgency: Literal['low', 'medium', 'high', 'critical'] = dspy.OutputField()

class ExtractEntities(dspy.Signature):
    """Extract structured entities from an email."""
    email_content: str = dspy.InputField()
    email_type: str = dspy.InputField()
    financial_amount: Optional[float] = dspy.OutputField()
    important_dates: list[str] = dspy.OutputField()

class EmailProcessor(dspy.Module):
    def __init__(self):
        super().__init__()
        # Define WHAT you want, not HOW to prompt for it
        self.classifier = dspy.ChainOfThought(ClassifyEmail)
        self.entity_extractor = dspy.ChainOfThought(ExtractEntities)
        # The same pattern extends to more steps, e.g.:
        # self.action_generator = dspy.ChainOfThought(GenerateActionItems)
        # self.summarizer = dspy.ChainOfThought(SummarizeEmail)

    def forward(self, email_subject, email_body, sender):
        # Compose modules together - DSPy handles the prompting
        classification = self.classifier(
            email_subject=email_subject,
            email_body=email_body,
            sender=sender
        )
        entities = self.entity_extractor(
            email_content=f"{email_subject}\n{email_body}",
            email_type=classification.email_type
        )
        # Get structured, typed outputs automatically
        return dspy.Prediction(
            email_type=classification.email_type,
            urgency=classification.urgency,
            financial_amount=entities.financial_amount,  # proper float (or None)
            important_dates=entities.important_dates,    # proper list
            action_required=(classification.urgency == "critical")
        )
# Usage - clean and simple
# (assumes dspy.configure(lm=...) was called, as in the first example)
processor = EmailProcessor()
result = processor(
    "URGENT: Server Down",
    "Production is offline, need immediate help",
    "[email protected]"
)

print(result.email_type)        # 'support_request'
print(result.urgency)           # 'critical'
print(result.financial_amount)  # None (properly typed)
Want to optimize?
# Load your email dataset
emails = load_historical_emails()  # 1000 labeled emails

# Define success metric (DSPy metrics also receive an optional trace argument)
def email_accuracy(example, prediction, trace=None):
    return (example.email_type == prediction.email_type and
            example.urgency == prediction.urgency)

# Optimize the ENTIRE pipeline automatically
optimizer = dspy.MIPROv2(metric=email_accuracy)
optimized_processor = optimizer.compile(processor, trainset=emails)

# Optimized prompts for each module
# Handles edge cases automatically
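Once compiled, you can save the optimized prompts and demonstrations to disk and reload them later instead of re-running the optimizer. A minimal sketch (the filename is arbitrary):

# Persist the optimized prompts/demos...
optimized_processor.save("email_processor.json")

# ...and restore them into a fresh instance later
processor = EmailProcessor()
processor.load("email_processor.json")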
You should probably be using DSPy.
DSPy Modules
Signatures specify the input/output behavior of a DSPy module. Any valid variable names work; the DSPy compiler will optimize the keywords.
For example, “document -> summary”, “text -> gist”, and “long_context -> tldr” all invoke summarization.
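An inline string signature is shorthand for a class-based signature, which gives you room for types and docstrings. A quick sketch of the two equivalent forms:

import dspy

# Inline shorthand...
summarize = dspy.ChainOfThought("document -> summary")

# ...or the equivalent class-based form
class Summarize(dspy.Signature):
    """Summarize the document."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

summarize = dspy.ChainOfThought(Summarize)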
Modules are building blocks that handle signatures and prompt configuration, and they can be composed into bigger modules.
Core modules
- dspy.Predict: Basic predictor. Does not modify the signature. Handles the key forms of learning (i.e., storing the instructions and demonstrations and updates to the LM).
- dspy.ChainOfThought: Teaches the LM to think step-by-step before committing to the signature’s response.
- dspy.ProgramOfThought: Teaches the LM to output code, whose execution results will dictate the response. (Runs Python programs; requires Deno to be installed.)
- dspy.ReAct: An agent that can use tools to implement the given signature.
- dspy.MultiChainComparison: Can compare multiple outputs from the ChainOfThought module to produce a final prediction.
More modules
- dspy.BestOfN: Runs a module up to N times with different temperatures and returns the best prediction out of N attempts, or the first prediction that passes the threshold (see the sketch after this list).
- dspy.CodeAct: Uses a code interpreter and predefined tools to solve the problem.
- dspy.Refine: Refines a module by running it up to N times with different temperatures, returning the best prediction.
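The examples below don’t cover BestOfN, so here’s a minimal sketch, assuming an LM is already configured. The one-word reward function is made up for illustration:

import dspy

qa = dspy.ChainOfThought("question -> answer")

# Hypothetical reward: prefer one-word answers
def one_word_answer(args, pred):
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

# Try up to 3 runs at different temperatures; stop early once the threshold is met
best_of_3 = dspy.BestOfN(module=qa, N=3, reward_fn=one_word_answer, threshold=1.0)
print(best_of_3(question="What is the capital of Belgium?").answer)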
A few examples
These are taken directly from https://dspy.ai/learn/programming/modules/
Math
math = dspy.ChainOfThought("question -> answer: float")
math(question="Two dice are tossed. What is the probability that the sum equals two?")
# Prediction(
# reasoning='When two dice are tossed, each die has 6 faces, resulting in a total of 6 x 6 = 36 possible outcomes. The sum of the numbers on the two dice equals two only when both dice show a 1. This is just one specific outcome: (1, 1). Therefore, there is only 1 favorable outcome. The probability of the sum being two is the number of favorable outcomes divided by the total number of possible outcomes, which is 1/36.',
# answer=0.0277776
# )
Retrieval-Augmented Generation
def search(query: str) -> list[str]:
    """Retrieves abstracts from Wikipedia."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
rag = dspy.ChainOfThought('context, question -> response')
question = "What's the name of the castle that David Gregory inherited?"
rag(context=search(question), question=question)
# Prediction(
# reasoning='The context provides information about David Gregory, a Scottish physician and inventor. It specifically mentions that he inherited Kinnairdy Castle in 1664. This detail directly answers the question about the name of the castle that David Gregory inherited.',
# response='Kinnairdy Castle'
# )
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Classification
from typing import Literal
class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""
    sentence: str = dspy.InputField()
    sentiment: Literal['positive', 'negative', 'neutral'] = dspy.OutputField()
    confidence: float = dspy.OutputField()
classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")
# Prediction(
# sentiment='positive',
# confidence=0.75
# )
Information Extraction
text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."
module = dspy.Predict("text -> title, headings: list[str], entities_and_metadata: list[dict[str, str]]")
response = module(text=text)
print(response.title)
print(response.headings)
print(response.entities_and_metadata)
# Apple Unveils iPhone 14
# ['Introduction', 'Key Features', "CEO's Statement"]
# [{'entity': 'Apple Inc.', 'type': 'Organization'}, {'entity': 'iPhone 14', 'type': 'Product'}, {'entity': 'Tim Cook', 'type': 'Person'}]
Agents
def evaluate_math(expression: str) -> float:
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])
pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)
# 5761.328
Resources
Check out my AI tools & resources reference