AI Recipe Parsing: How We Taught Claude to Understand Sourdough

Doughflow Team

February 15, 2025 · 12 min read

The Problem: Recipes Are Unstructured Chaos

If you have ever tried to parse sourdough recipes programmatically, you know the pain. Recipe authors write for humans, not machines. They assume context, use inconsistent terminology, embed critical timing information in casual asides, and structure their instructions in wildly varying formats.

Consider what a typical recipe looks like in the wild:

Day 1 - Evening (around 9pm)
Mix your levain: 20g starter, 50g bread flour, 50g whole wheat, 100g water. 
Let it hang out overnight - you want it nice and bubbly by morning, probably 
8-10 hours depending on how warm your kitchen is.
 
Day 2 - Morning
When your levain has at least doubled and smells sweet/yeasty, mix the main dough. 
I usually do this around 7am but honestly whenever you wake up is fine as long as 
the levain is ready...

This snippet contains at least six distinct pieces of structured data: a preparation phase, ingredient quantities with units, a duration range, a temperature-dependent condition, a visual readiness indicator, and a flexible time window. Extracting this reliably from free-form text is a non-trivial NLP problem.

Traditional approaches using regex and keyword matching fail here. The phrase "8-10 hours" could mean fermentation time, or it could appear in "I first tried this recipe 8-10 hours after reading about it online." Context matters.
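
To make the failure concrete, here is a small sketch of a context-blind matcher. The function name `findHourRanges` is hypothetical and is not part of our pipeline; it only illustrates that a pattern sees the same thing in both sentences.

```typescript
interface HourRange {
  minHours: number;
  maxHours: number;
}

// Naive matcher: finds "N-M hours" anywhere in the text, with no notion of context.
function findHourRanges(text: string): HourRange[] {
  const pattern = /(\d+)\s*-\s*(\d+)\s*hours?/g;
  const found: HourRange[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found.push({ minHours: Number(match[1]), maxHours: Number(match[2]) });
  }
  return found;
}

const fermentationText =
  "You want it nice and bubbly by morning, probably 8-10 hours depending on how warm your kitchen is.";
const anecdoteText =
  "I first tried this recipe 8-10 hours after reading about it online.";

// Both calls return the same range, although only the first is a fermentation time.
const fromRecipe = findHourRanges(fermentationText);
const fromAnecdote = findHourRanges(anecdoteText);
```

The regex is not wrong about the text it matched. It simply has no way to tell which match describes the dough.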

Why LLMs Changed Everything

Large language models brought something qualitatively different to this problem: the ability to interpret text in context. An LLM does not match text against a fixed set of hand-written rules. It interprets each phrase in light of the surrounding text and uses that interpretation to extract structured information.

When Claude reads "let it hang out overnight," it understands that:

"It" refers to the levain mentioned in the previous sentence
"Hang out" is a colloquial way of saying "ferment at room temperature"
"Overnight" typically means 8-12 hours in the context of bread baking
This is a passive waiting step, not an active task

A regex cannot capture this, because none of it is visible in the surface form of the text. The model's training on large volumes of cooking and baking text has encoded the implicit knowledge that recipe authors assume their readers possess.

The Architecture of Recipe Parsing

Our parsing pipeline operates in three stages:

Raw Recipe Text
      |
      v
+------------------+
|  Segmentation    |  Split into logical sections
+------------------+
      |
      v
+------------------+
|  Extraction      |  Pull structured data from each section
+------------------+
      |
      v
+------------------+
|  Normalization   |  Convert to consistent units and formats
+------------------+
      |
      v
Structured Schedule

Each stage leverages Claude's understanding differently. Segmentation requires recognizing narrative boundaries. Extraction requires domain knowledge about baking terminology. Normalization requires understanding equivalences like "overnight" equals "8-12 hours" and "room temperature" equals "68-75F unless otherwise specified."
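
The two equivalences mentioned for the normalization stage could be held in a small lookup table. This is a minimal sketch that assumes exact phrase matching; the names `durationPhrases`, `temperaturePhrases`, and `normalizePhrase` are hypothetical, and in practice the model does most of this work.

```typescript
interface NumericRange {
  min: number;
  max: number;
}

// Durations in seconds: "overnight" equals 8-12 hours.
const durationPhrases = new Map<string, NumericRange>([
  ["overnight", { min: 8 * 3600, max: 12 * 3600 }],
]);

// Temperatures in Fahrenheit: "room temperature" equals 68-75F unless otherwise specified.
const temperaturePhrases = new Map<string, NumericRange>([
  ["room temperature", { min: 68, max: 75 }],
]);

// Looks up a phrase after trimming and lowercasing; returns undefined when unknown.
function normalizePhrase(
  phrase: string,
  table: Map<string, NumericRange>
): NumericRange | undefined {
  return table.get(phrase.trim().toLowerCase());
}

const overnight = normalizePhrase(" Overnight ", durationPhrases);
// overnight is { min: 28800, max: 43200 }
```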

Challenge One: Timing Extraction

Timing is where the complexity explodes. Sourdough recipes express duration in at least a dozen different ways:

Explicit absolute times:

"Bulk ferment from 10am to 2pm"
"Mix at 8:00 AM"

Explicit relative durations:

"Let rise for 4 hours"
"Proof 45 minutes to 1 hour"

Implicit relative durations:

"Overnight" (8-12 hours by convention)
"Until doubled" (1-4 hours depending on conditions)
"While you sleep" (6-10 hours typically)

Condition-based durations:

"Until the dough passes the windowpane test"
"When the levain floats in water"
"Once you see bubbles throughout"

Environment-dependent durations:

"3-4 hours in a warm kitchen, longer if cold"
"Faster in summer, slower in winter"

Compound expressions:

"Every 30 minutes for the first 2 hours"
"Three or four stretch-and-folds at 45-minute intervals"

The model must recognize all of these patterns and convert them to machine-readable representations. We describe the target structure as a TypeScript interface, and the model's JSON output must match it:

interface TimingData {
  type: 'absolute' | 'relative' | 'condition' | 'interval';
  minDuration?: number;  // seconds
  maxDuration?: number;  // seconds
  condition?: string;
  repeatCount?: number;
  repeatInterval?: number;  // seconds
  temperatureDependence?: {
    baseTemp: number;
    adjustment: number;  // seconds per degree
  };
}
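
To show how downstream code might consume two of these fields, here is a hedged sketch. It assumes that the first repetition happens one interval after the phase starts and that temperature dependence is linear, with a negative `adjustment` shortening a step as the kitchen warms up. Neither assumption is fixed by the schema itself, and the function names are hypothetical.

```typescript
// Minimal subsets of the TimingData fields used here (all durations in seconds).
interface IntervalTiming {
  repeatCount: number;
  repeatInterval: number;
}

interface TemperatureDependence {
  baseTemp: number;
  adjustment: number; // seconds per degree
}

// Assumption: repetition i happens i intervals after the phase starts.
function expandInterval(timing: IntervalTiming): number[] {
  const offsets: number[] = [];
  for (let i = 1; i <= timing.repeatCount; i++) {
    offsets.push(i * timing.repeatInterval);
  }
  return offsets;
}

// Assumption: linear adjustment around baseTemp, never below zero.
function adjustForTemperature(
  baseSeconds: number,
  dependence: TemperatureDependence,
  actualTemp: number
): number {
  const adjusted =
    baseSeconds + dependence.adjustment * (actualTemp - dependence.baseTemp);
  return Math.max(0, adjusted);
}

// "Every 30 minutes for the first 2 hours" becomes four offsets.
const foldOffsets = expandInterval({ repeatCount: 4, repeatInterval: 1800 });
// foldOffsets is [1800, 3600, 5400, 7200]

// Illustrative numbers only: a 4-hour step, 10 minutes faster per degree above 75F.
const warmKitchen = adjustForTemperature(4 * 3600, { baseTemp: 75, adjustment: -600 }, 78);
// warmKitchen is 12600 (3.5 hours)
```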

The prompt engineering here is critical. We found that asking Claude to "extract timing information" produced inconsistent results. Asking it to "identify all temporal expressions and classify each according to this schema" with explicit examples produced dramatically better output.
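
A prompt along these lines could be assembled as follows. This sketch is not our production prompt: `buildTimingPrompt`, the `FewShotExample` shape, and the closing instruction are all assumptions made for illustration.

```typescript
interface FewShotExample {
  recipeText: string;
  expectedJson: string;
}

// Abbreviated copy of the schema text that is pasted into the prompt.
const timingSchemaText = [
  "interface TimingData {",
  "  type: 'absolute' | 'relative' | 'condition' | 'interval';",
  "  minDuration?: number;  // seconds",
  "  maxDuration?: number;  // seconds",
  "  condition?: string;",
  "}",
].join("\n");

// Combines the classification instruction, the schema, worked examples, and the recipe.
function buildTimingPrompt(recipeText: string, examples: FewShotExample[]): string {
  const exampleBlocks = examples.map(
    (example, index) =>
      `Example ${index + 1} input:\n${example.recipeText}\n` +
      `Example ${index + 1} output:\n${example.expectedJson}`
  );
  return [
    "Identify all temporal expressions and classify each according to this schema:",
    timingSchemaText,
    ...exampleBlocks,
    "Recipe:",
    recipeText,
    "Respond with a JSON array of TimingData objects.",
  ].join("\n\n");
}

const samplePrompt = buildTimingPrompt("Let rise for 4 hours.", [
  {
    recipeText: "Proof 45 minutes to 1 hour",
    expectedJson: '[{"type":"relative","minDuration":2700,"maxDuration":3600}]',
  },
]);
```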

Handling Ambiguity

Some timing expressions are genuinely ambiguous, and pretending otherwise produces garbage output. When a recipe says "proof until ready," we cannot manufacture a number. Instead, we extract the condition ("ready") and flag the step for user confirmation.

Our schema includes a confidence field:

interface ParsedStep {
  action: string;
  timing: TimingData;
  confidence: 'high' | 'medium' | 'low';
  rawText: string;  // Original text for user verification
}

Low-confidence extractions surface to the user for validation. This is not a failure mode. It is an acknowledgment that recipe text is often underspecified and human judgment is required.
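
Surfacing those steps can be as simple as filtering on the confidence field and showing the original text. A minimal sketch, using a reduced version of ParsedStep; the function name `reviewQueue` and the wording of the message are hypothetical.

```typescript
type Confidence = 'high' | 'medium' | 'low';

// Reduced version of ParsedStep with only the fields needed for review.
interface ReviewableStep {
  action: string;
  confidence: Confidence;
  rawText: string;
}

// Returns one confirmation message per low-confidence step, quoting the source text.
function reviewQueue(steps: ReviewableStep[]): string[] {
  return steps
    .filter((step) => step.confidence === 'low')
    .map((step) => `Please confirm "${step.action}": ${step.rawText}`);
}

const pendingReview = reviewQueue([
  { action: "Bulk ferment", confidence: 'high', rawText: "Bulk ferment 3-5 hours at room temp" },
  { action: "Final proof", confidence: 'low', rawText: "proof until ready" },
]);
// pendingReview is ['Please confirm "Final proof": proof until ready']
```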

Challenge Two: Step Classification

Not all recipe steps are equal from a scheduling perspective. We classify steps into distinct categories that affect how they appear in a baking schedule:

Active steps require the baker's attention and hands:

Mixing ingredients
Performing stretch-and-folds
Shaping the loaf
Scoring before baking

Passive steps happen without intervention:

Bulk fermentation
Proofing
Levain maturation
Cold retard

Flex steps can be shortened, extended, or repositioned:

Cold retard (8-72 hours typically acceptable)
Autolyse (30 minutes to 2 hours)
Some bulk fermentation periods

Critical steps have narrow timing windows:

Baking (cannot pause once started)
Shaping (dough degasses if you wait too long)
Scoring (must happen immediately before oven)

Classification requires understanding the underlying baking science, not just the words used. "Let the dough rest" could be autolyse (passive, flex), bench rest (passive, somewhat critical), or final proof (passive, critical). The model must infer from context which applies.
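
The "let the dough rest" example suggests that a classification has two independent axes: who does the work, and how much the timing can move. One possible representation is sketched below. The type names and the `canReposition` helper are hypothetical and may differ from the labels in our actual output.

```typescript
type Attention = 'active' | 'passive';
type TimingFlexibility = 'flex' | 'somewhat-critical' | 'critical';

interface StepClassification {
  attention: Attention;
  flexibility: TimingFlexibility;
}

// The three readings of "let the dough rest" described above.
const restInterpretations: Record<'autolyse' | 'benchRest' | 'finalProof', StepClassification> = {
  autolyse: { attention: 'passive', flexibility: 'flex' },
  benchRest: { attention: 'passive', flexibility: 'somewhat-critical' },
  finalProof: { attention: 'passive', flexibility: 'critical' },
};

// Only flex steps can be stretched or moved when fitting a schedule.
function canReposition(step: StepClassification): boolean {
  return step.flexibility === 'flex';
}
```

All three readings are passive, so a scheduler that looked only at active versus passive would treat them identically. The second axis is what separates them.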

We improved classification accuracy by including domain knowledge in the system prompt:

You are analyzing a sourdough bread recipe. Key domain knowledge:
- Autolyse: flour and water rest before adding starter/salt. Flexible timing.
- Bulk fermentation: primary rise with starter. Time-critical at room temp.
- Cold retard: refrigerated rest. Highly flexible, 8-72 hours typical.
- Final proof: shaped dough rising before bake. Time-critical.
- Bench rest: brief shaped rest. Typically 15-30 minutes.

This context injection reduced misclassification errors by 40% compared to the model operating without explicit domain knowledge.

Challenge Three: Temperature Parsing

Temperature appears in recipes in several contexts:

Ingredient temperatures:

"Use room temperature water"
"Ice cold water to slow fermentation"
"Water at 90F to compensate for cold flour"

Environment temperatures:

"Keep dough in a warm spot, around 78-80F"
"If your kitchen is cold, ferment in the oven with the light on"

Process temperatures:

"Bake at 475F with steam for 20 minutes"
"Drop to 425F and bake until internal temp reaches 205F"

Calculated temperatures (desired dough temperature, or DDT):

"Target dough temperature of 78F"
"Desired dough temp: 76F"

Each type has different implications for scheduling. Ingredient temperature affects how you prepare. Environment temperature affects fermentation timing. Process temperature is informational for the baking step. DDT is a target from which the required water temperature is calculated.

The model must also handle unit conversion (Celsius vs Fahrenheit), approximate expressions ("lukewarm," "hand-hot," "blood temperature"), and implicit defaults ("room temperature" when no number is given).

We normalize all temperatures to a common representation:

interface Temperature {
  value: number;        // Always stored in Fahrenheit
  source: 'explicit' | 'inferred' | 'default';
  context: 'ingredient' | 'environment' | 'process' | 'target';
  rawText?: string;
}
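
As an illustration of the normalization step, the sketch below handles explicit values, ranges, and the room-temperature default. It makes several assumptions: ranges collapse to their midpoint, 72F stands in for "room temperature" (a value inside the 68-75F range mentioned earlier), and `normalizeTemperature` is a hypothetical name. Approximate expressions such as "lukewarm" are left out because they require the model's judgment.

```typescript
type TemperatureSource = 'explicit' | 'inferred' | 'default';
type TemperatureContext = 'ingredient' | 'environment' | 'process' | 'target';

interface Temperature {
  value: number; // Always stored in Fahrenheit
  source: TemperatureSource;
  context: TemperatureContext;
  rawText?: string;
}

function celsiusToFahrenheit(celsius: number): number {
  return (celsius * 9) / 5 + 32;
}

// Assumed default, chosen from within the 68-75F room-temperature range.
const ROOM_TEMPERATURE_F = 72;

function normalizeTemperature(
  rawText: string,
  context: TemperatureContext
): Temperature | undefined {
  // Matches "90F", "25 C", or a range such as "78-80F".
  const explicit = /(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*°?\s*([CF])\b/i.exec(rawText);
  if (explicit) {
    const low = Number(explicit[1]);
    const high = explicit[2] !== undefined ? Number(explicit[2]) : low;
    const midpoint = (low + high) / 2; // Assumption: ranges collapse to their midpoint
    const unit = (explicit[3] ?? '').toUpperCase();
    const fahrenheit = unit === 'C' ? celsiusToFahrenheit(midpoint) : midpoint;
    return { value: Math.round(fahrenheit), source: 'explicit', context, rawText };
  }
  if (/room temp/i.test(rawText)) {
    return { value: ROOM_TEMPERATURE_F, source: 'default', context, rawText };
  }
  return undefined; // Anything else is left for the model to interpret
}
```

For example, `normalizeTemperature("I like 78-80F for my kitchen", 'ingredient')` yields a value of 79, and `normalizeTemperature("water at 25C", 'ingredient')` yields 77.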

Real-World Example: Messy Recipe to Structured Schedule

Here is an actual recipe snippet (anonymized) and what the parsing pipeline produces:

Input:

TARTINE COUNTRY LOAF - my tweaked version
 
Levain build (do this before bed)
Mix 1 tbsp starter with 200g flour (I use half bread half WW) and 200g water. 
Cover and leave on counter til morning - should be domed and active, 
smelling sweet not sour.
 
The next morning when you wake up (I usually start around 6:30-7) mix your dough:
1000g bread flour
200g whole wheat  
780g water (I like 78-80F for my kitchen which runs cold)
200g levain
22g salt
 
First mix everything except salt. Let it autolyze 30min-1hr while you have 
coffee, then add salt and pinch it in. Mix with your hands until salt is 
incorporated (2-3 min).
 
Bulk ferment 3-5 hours at room temp, doing stretch and folds every half 
hour for the first 2 hours (so 4 sets total). After that just leave it 
alone. You're looking for 50% volume increase, jiggly when you shake 
the container.

Output (simplified):

{
  "phases": [
    {
      "name": "Levain Build",
      "steps": [
        {
          "action": "Mix levain",
          "ingredients": [
            {"name": "starter", "quantity": 1, "unit": "tbsp"},
            {"name": "bread flour", "quantity": 100, "unit": "g"},
            {"name": "whole wheat flour", "quantity": 100, "unit": "g"},
            {"name": "water", "quantity": 200, "unit": "g"}
          ],
          "timing": {
            "type": "relative",
            "minDuration": 28800,
            "maxDuration": 43200,
            "condition": "domed and active, smelling sweet"
          },
          "classification": "passive",
          "confidence": "high"
        }
      ],
      "suggestedStart": "21:00",
      "suggestedStartDescription": "before bed"
    },
    {
      "name": "Dough Mixing",
      "steps": [
        {
          "action": "Mix flour, water, and levain (no salt)",
          "ingredients": [
            {"name": "bread flour", "quantity": 1000, "unit": "g"},
            {"name": "whole wheat flour", "quantity": 200, "unit": "g"},
            {"name": "water", "quantity": 780, "unit": "g", "temperature": 79},
            {"name": "levain", "quantity": 200, "unit": "g"}
          ],
          "timing": {"type": "relative", "minDuration": 180, "maxDuration": 300},
          "classification": "active",
          "confidence": "high"
        },
        {
          "action": "Autolyse",
          "timing": {
            "type": "relative",
            "minDuration": 1800,
            "maxDuration": 3600
          },
          "classification": "passive-flex",
          "confidence": "high"
        },
        {
          "action": "Add salt and incorporate",
          "ingredients": [{"name": "salt", "quantity": 22, "unit": "g"}],
          "timing": {"type": "relative", "minDuration": 120, "maxDuration": 180},
          "classification": "active",
          "confidence": "high"
        }
      ]
    },
    {
      "name": "Bulk Fermentation",
      "steps": [
        {
          "action": "Stretch and fold",
          "timing": {
            "type": "interval",
            "repeatCount": 4,
            "repeatInterval": 1800,
            "minDuration": 60,
            "maxDuration": 120
          },
          "classification": "active",
          "confidence": "high"
        },
        {
          "action": "Continue bulk fermentation (hands off)",
          "timing": {
            "type": "condition",
            "minDuration": 3600,
            "maxDuration": 10800,
            "condition": "50% volume increase, jiggly when shaken"
          },
          "classification": "passive",
          "confidence": "medium"
        }
      ]
    }
  ],
  "metadata": {
    "assumedEnvironmentTemp": 68,
    "authorNote": "kitchen runs cold",
    "hydration": 65
  }
}
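
The hydration figure in the metadata is consistent with dividing the dough water by the dough flour and ignoring the levain: 780 / (1000 + 200) = 65%. The sketch below reproduces that arithmetic under that assumption. `hydrationPercent` is a hypothetical helper, and matching flour by name is a simplification.

```typescript
interface Ingredient {
  name: string;
  quantity: number;
  unit: string;
}

// Water weight as a percentage of flour weight, counting only gram quantities.
function hydrationPercent(ingredients: Ingredient[]): number {
  const gramsOf = (matches: (name: string) => boolean): number =>
    ingredients
      .filter((item) => item.unit === "g" && matches(item.name))
      .reduce((total, item) => total + item.quantity, 0);

  const flour = gramsOf((name) => name.includes("flour"));
  const water = gramsOf((name) => name === "water");
  if (flour === 0) {
    throw new Error("Cannot compute hydration without flour");
  }
  return Math.round((water / flour) * 100);
}

const doughIngredients: Ingredient[] = [
  { name: "bread flour", quantity: 1000, unit: "g" },
  { name: "whole wheat flour", quantity: 200, unit: "g" },
  { name: "water", quantity: 780, unit: "g" },
  { name: "levain", quantity: 200, unit: "g" },
];

const doughHydration = hydrationPercent(doughIngredients);
// doughHydration is 65
```

Counting the 200g of levain as 100g flour and 100g water would instead give 880 / 1300, or about 68%.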

The model correctly extracted:

Multi-phase structure spanning two days
Ingredient quantities and units, including the "half bread half WW" flour split into 100g each
Water temperature from an embedded parenthetical
Duration ranges with condition-based endpoints
Interval-based repeated actions (4 stretch-and-folds at 30-minute intervals)
Classification of active vs passive steps
Implicit timing (overnight, morning)
Author's environmental context (cold kitchen)

This structured output feeds directly into schedule generation. Given a target completion time, we can work backward through the phases, respecting timing constraints and flexibility windows.
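
Working backward can be sketched in a few lines. This version assumes each step has already been reduced to a single duration (for example, a point chosen between minDuration and maxDuration) and ignores flexibility windows. `scheduleBackward` and its types are hypothetical names, not our scheduler.

```typescript
interface TimedStep {
  action: string;
  durationSeconds: number;
}

interface PlannedStep {
  action: string;
  start: Date;
  end: Date;
}

// Walks the steps from last to first, so the final step ends exactly at finishBy.
function scheduleBackward(steps: TimedStep[], finishBy: Date): PlannedStep[] {
  const planned: PlannedStep[] = [];
  let cursor = finishBy.getTime();
  steps
    .slice()
    .reverse()
    .forEach((step) => {
      const start = cursor - step.durationSeconds * 1000;
      planned.unshift({
        action: step.action,
        start: new Date(start),
        end: new Date(cursor),
      });
      cursor = start;
    });
  return planned;
}

// Illustrative durations: 30 minutes of autolyse, then 4 hours of bulk fermentation.
const plan = scheduleBackward(
  [
    { action: "Autolyse", durationSeconds: 1800 },
    { action: "Bulk fermentation", durationSeconds: 4 * 3600 },
  ],
  new Date(Date.UTC(2025, 1, 15, 18, 0, 0))
);
// plan[0] starts at 2025-02-15T13:30:00.000Z and plan[1] at 2025-02-15T14:00:00.000Z
```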

Prompt Engineering Lessons

Several non-obvious techniques improved our parsing accuracy:

Explicit schema definition: Providing a TypeScript interface or JSON schema in the prompt dramatically improves output consistency compared to free-form extraction requests.

Few-shot examples: Including 2-3 complete input/output examples covers edge cases that instructions alone cannot specify. We found examples from real messy recipes performed better than synthetic clean examples.

Chain of thought for ambiguity: For genuinely unclear passages, asking the model to "explain your reasoning before outputting the final JSON" catches errors that silent extraction misses.

Validation loops: Parsing the model's JSON output and sending validation errors back for correction handles malformed output gracefully. We use a maximum of two retry cycles before falling back to human review.
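
The validation loop might look roughly like this. The model call is passed in as a function so that the sketch makes no assumptions about any particular SDK. `parseWithRetries`, `ModelCall`, `Validator`, and the wording of the correction message are all hypothetical.

```typescript
type ModelCall = (prompt: string) => Promise<string>;
type Validator = (value: unknown) => string[]; // Returns validation error messages

type ParseOutcome =
  | { status: 'ok'; value: unknown; attempts: number }
  | { status: 'needs-human-review'; errors: string[]; attempts: number };

// One initial attempt plus up to maxRetries corrections, then fall back to human review.
async function parseWithRetries(
  prompt: string,
  callModel: ModelCall,
  validate: Validator,
  maxRetries = 2
): Promise<ParseOutcome> {
  let currentPrompt = prompt;
  let errors: string[] = [];
  const maxAttempts = maxRetries + 1;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(currentPrompt);
    try {
      const value: unknown = JSON.parse(raw);
      errors = validate(value);
      if (errors.length === 0) {
        return { status: 'ok', value, attempts: attempt };
      }
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors = [`Invalid JSON: ${reason}`];
    }
    // Feed the problems back so the next attempt can correct them.
    currentPrompt =
      `${prompt}\n\nYour previous output had these problems:\n` +
      `${errors.join("\n")}\nReturn corrected JSON only.`;
  }
  return { status: 'needs-human-review', errors, attempts: maxAttempts };
}
```

Injecting `callModel` also makes the loop easy to exercise with a stub that returns canned responses.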

Error Handling in Production

Real recipes contain errors. Authors forget to specify salt quantities. They contradict themselves between different sections. They assume knowledge that novice bakers do not have.

Our pipeline distinguishes between:

Parse failures: Text that the model cannot interpret at all. These surface immediately as errors requiring human recipe input or correction.

Incomplete extractions: Missing data that has reasonable defaults. Salt omitted from ingredients but mentioned later? Assume 2% of flour weight. No temperature specified for water? Default to 78F.

Inconsistencies: Contradictory information in the source text. Total flour does not match sum of flour additions. Timing in summary differs from timing in steps. These flag for user attention but do not block schedule creation.

Implausible values: Extracted values that fall outside expected ranges. 18 hours of stretch-and-folds? 500g of salt? These trigger re-parsing attempts before surfacing as errors.
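
Two of these categories translate naturally into code: defaults for incomplete extractions and range checks for implausible values. In the sketch below the 2% salt default and the 78F water default come from the text above, while the plausibility thresholds are illustrative assumptions, as are all of the names.

```typescript
// Defaults described above.
const DEFAULT_WATER_TEMP_F = 78;

function defaultSaltGrams(totalFlourGrams: number): number {
  return Math.round(totalFlourGrams * 0.02); // 2% of flour weight
}

// Illustrative thresholds only; real limits would need tuning.
const MAX_SALT_FRACTION = 0.05;
const MAX_FOLD_WINDOW_SECONDS = 6 * 3600;

interface ExtractedValues {
  flourGrams: number;
  saltGrams: number;
  foldWindowSeconds: number;
}

// Returns a description of each value that falls outside the expected range.
function findImplausibleValues(values: ExtractedValues): string[] {
  const problems: string[] = [];
  if (values.saltGrams > values.flourGrams * MAX_SALT_FRACTION) {
    problems.push(`Salt of ${values.saltGrams}g is too high for ${values.flourGrams}g flour`);
  }
  if (values.foldWindowSeconds > MAX_FOLD_WINDOW_SECONDS) {
    problems.push(`Stretch-and-fold window of ${values.foldWindowSeconds / 3600} hours is too long`);
  }
  return problems;
}

const saltFallback = defaultSaltGrams(1200);
// saltFallback is 24

const flagged = findImplausibleValues({
  flourGrams: 1200,
  saltGrams: 500,
  foldWindowSeconds: 18 * 3600,
});
// flagged contains two problems, which would trigger a re-parse
```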

What Makes This Hard

After building this system, we have clarity on what makes recipe parsing a genuinely difficult problem:

Language variation: The same instruction can be expressed hundreds of ways. "Let it rise," "allow to proof," "bulk ferment," and "first rise" can all refer to the same step, depending on context.

Implicit knowledge: Recipes assume you know that "room temperature" means something different in a sourdough context than in a chocolate chip cookie context.

Author errors: Source data is often wrong or ambiguous in the original. No parser can extract correct information from incorrect input.

Context-dependence: The meaning of a phrase depends on what came before. "Fold the dough" means something different in laminated doughs than in standard sourdough.

Temporal reasoning: Understanding recipe sequences requires reasoning about time and dependencies in ways that simple extraction cannot capture.

What This Enables

Accurate recipe parsing is the foundation that makes Doughflow useful. When we can reliably convert any recipe text into structured timing data, we can:

Generate schedules that work backward from your target completion time
Adjust timing based on your actual kitchen temperature
Identify conflicts between recipe steps and your calendar
Send notifications at the right moments for each step
Compare different recipes' actual time demands

Try It Yourself

We built this parsing system so you do not have to think about it. Paste any sourdough recipe into Doughflow and let us handle the extraction.

Tell us when you want fresh bread on the table. We will calculate the schedule, accounting for your kitchen conditions and the recipe's actual requirements.

No more manual timeline calculations. No more missed stretch-and-folds because you lost track of time.

Create your free account and let the AI handle the parsing while you focus on the baking.
