Request entry and the agent chain

Every agent system has an unglamorous job before it gets to be clever: turn user input into a run. A prompt from the frontend needs an identity, a lifecycle, runtime context, and an event stream before the model says anything. This stop follows the DeerFlow Gateway as it accepts that prompt, records it as a run, installs context, and streams execution events back as they happen.

Keep one idea in view: DeerFlow has to be a runtime before it can be an agent. The Gateway is that runtime host. It owns intake, run status, cancellation, context injection, streaming, and completion. The actual reasoning starts later.

flowchart TD
U["Frontend / IM / LangGraph SDK"] --> EP["POST /threads/{id}/runs/stream"]
EP --> SR["stream_run() · start_run()"]
SR --> RM["RunManager.create_or_reject()"]
RM --> CFG["build_run_config() · compat shim"]
CFG --> TASK["create_task(run_agent)"]
TASK -. background .-> RA["run_agent()"]
RA --> RT["Runtime(context=...)"]
RT --> MK["make_lead_agent(config)"]
MK --> ASM["agent.astream(...)"]
ASM --> SB["StreamBridge"]
SR --> SSE["sse_consumer() · SSE events"]
SB --> SSE

One prompt in: how it becomes a background run with a live stream

Three runtime names, three different jobs

The easiest trap here is collapsing these names into one idea:

LangGraph         = the execution kernel agents use (how a graph runs)
LangGraph Server  = the optional official HTTP runtime
DeerFlow Gateway  = the current primary HTTP runtime, LangGraph-API compatible

backend/langgraph.json still registers lead_agent, which keeps the official LangGraph Server, Studio, and CLI path working. On the main product path, however, the Gateway imports and calls the graph factory directly, with no official server in between.

The four parts that land a request

Open thread_runs.py and services.py and you’ll keep meeting four roles. Don’t memorize the fields — remember which question each one answers.

RunCreateRequest: the shape of a run request. It carries input, config, context, metadata, stream_mode, plus control bits like interrupt_before / interrupt_after, multitask_strategy, and on_disconnect. It is the caller’s declaration of “how this run should run.”
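As a rough illustration of that shape, here is a minimal sketch using a plain dataclass. The field names come from the description above; the types, the defaults, and the ToyRunCreateRequest and from_payload names are assumptions made for the example, not DeerFlow's actual model.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class ToyRunCreateRequest:
    """Illustrative shape only; types and defaults are assumptions."""
    input: Optional[dict[str, Any]] = None
    config: dict[str, Any] = field(default_factory=dict)
    context: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)
    stream_mode: list[str] = field(default_factory=lambda: ["values"])
    interrupt_before: list[str] = field(default_factory=list)
    interrupt_after: list[str] = field(default_factory=list)
    multitask_strategy: str = "reject"   # assumed default
    on_disconnect: str = "cancel"        # assumed default

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "ToyRunCreateRequest":
        # Keep only the keys this sketch knows about; drop everything else.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```

The point of the sketch is the grouping: payload fields (input, config, context, metadata), a streaming choice, and a handful of control switches that the later stages consult.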

RunManager: the run registry. create_or_reject() decides whether to create this run at all. If the same thread already has a run in progress, for example, the request may be rejected depending on multitask_strategy. The manager then handles set_status, cancel, and run metadata persistence.
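The create-or-reject decision can be sketched with an in-memory registry. This is a toy under stated assumptions: ToyRunManager, RunRecord, RunRejected, the status strings, and the handling of the "reject" and "interrupt" strategies are all hypothetical, and the real RunManager also persists metadata, which is omitted here.

```python
import uuid
from dataclasses import dataclass, field


class RunRejected(Exception):
    """Raised when a thread already has an active run and the strategy is 'reject'."""


@dataclass
class RunRecord:
    thread_id: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"


class ToyRunManager:
    """Illustrative in-memory registry; not DeerFlow's RunManager."""

    ACTIVE = {"pending", "running"}  # assumed status names

    def __init__(self) -> None:
        self._runs: dict[str, RunRecord] = {}

    def _active_runs(self, thread_id: str) -> list[RunRecord]:
        return [r for r in self._runs.values()
                if r.thread_id == thread_id and r.status in self.ACTIVE]

    def create_or_reject(self, thread_id: str,
                         multitask_strategy: str = "reject") -> RunRecord:
        active = self._active_runs(thread_id)
        if active and multitask_strategy == "reject":
            raise RunRejected(f"thread {thread_id} already has an active run")
        if active and multitask_strategy == "interrupt":
            # Assumption: 'interrupt' ends the in-flight run and admits the new one.
            for run in active:
                run.status = "interrupted"
        record = RunRecord(thread_id=thread_id)
        self._runs[record.run_id] = record
        return record

    def set_status(self, run_id: str, status: str) -> None:
        self._runs[run_id].status = status
```

Keeping the admission check and the status bookkeeping in one object means the HTTP handler never has to reason about concurrent runs on its own; it either gets a record back or an exception it can turn into an error response.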

StreamBridge: a decoupling point. The background graph produces events; the HTTP side consumes them; the two do not call each other directly. run_agent() publishes into the bridge, and sse_consumer() reads from it and formats SSE events. If the frontend disconnects, the background run does not fail with it.
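The producer/consumer split can be sketched with one asyncio.Queue per run. Everything here is hypothetical scaffolding (ToyStreamBridge, toy_worker, toy_sse_consumer, format_sse, the "messages" event name); only the idea that the worker and the SSE side meet at a bridge comes from the text.

```python
import asyncio
import json
from typing import Any, AsyncIterator

_END = object()  # sentinel marking the end of a run's stream


class ToyStreamBridge:
    """Illustrative per-run queue; not DeerFlow's StreamBridge."""

    def __init__(self) -> None:
        self._queues: dict[str, asyncio.Queue] = {}

    def _queue(self, run_id: str) -> asyncio.Queue:
        return self._queues.setdefault(run_id, asyncio.Queue())

    async def publish(self, run_id: str, event: str, data: Any) -> None:
        await self._queue(run_id).put((event, data))

    async def close(self, run_id: str) -> None:
        await self._queue(run_id).put(_END)

    async def subscribe(self, run_id: str) -> AsyncIterator[tuple[str, Any]]:
        queue = self._queue(run_id)
        while True:
            item = await queue.get()
            if item is _END:
                return
            yield item


def format_sse(event: str, data: Any) -> str:
    """Render one event in the text/event-stream wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def toy_worker(bridge: ToyStreamBridge, run_id: str) -> None:
    """Stands in for run_agent(): it knows the bridge, not the HTTP side."""
    try:
        for chunk in ("Hello", "world"):
            await bridge.publish(run_id, "messages", {"text": chunk})
    finally:
        await bridge.close(run_id)  # always signal end-of-stream


async def toy_sse_consumer(bridge: ToyStreamBridge, run_id: str) -> list[str]:
    """Stands in for sse_consumer(): it knows the bridge, not the graph."""
    return [format_sse(e, d) async for e, d in bridge.subscribe(run_id)]


async def demo_stream() -> list[str]:
    bridge = ToyStreamBridge()
    task = asyncio.create_task(toy_worker(bridge, "run-1"))
    frames = await toy_sse_consumer(bridge, "run-1")
    await task
    return frames
```

Calling asyncio.run(demo_stream()) yields two SSE frames. Because the worker only ever talks to the queue, a consumer that goes away does not raise inside the worker; what happens to the run afterwards is a separate policy decision.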

run_agent(): the background worker that actually does the work (worker.py). Note the boundary: it is not the agent. It is the worker that drives graph execution.

`configurable`, or `context`?

build_run_config() does a translation job: it turns the fields in the HTTP request into the RunnableConfig shape LangGraph understands. And here sits the one piece of compatibility debt most worth remembering:

config["configurable"]   the older runtime-options channel
config["context"]        the newer LangGraph runtime-context channel

In compatibility mode, DeerFlow writes key fields into both channels. Whitelisted fields like model_name, mode, thinking_enabled, reasoning_effort, is_plan_mode, subagent_enabled, agent_name, and is_bootstrap get copied across. Why the apparent waste? Older DeerFlow code reads from configurable; newer ToolRuntime.context consumers read from context. Write only one side and a whole class of consumers sees nothing.
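A minimal sketch of the dual-write, assuming a plain dict config. The field list is the one named above; the dual_write helper, the DUAL_WRITE_FIELDS name, and the rule that context wins when both sides set a field are assumptions for illustration, not DeerFlow's build_run_config().

```python
from typing import Any

# Field names taken from the text above; the exact whitelist in DeerFlow may differ.
DUAL_WRITE_FIELDS = (
    "model_name", "mode", "thinking_enabled", "reasoning_effort",
    "is_plan_mode", "subagent_enabled", "agent_name", "is_bootstrap",
)


def dual_write(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with whitelisted fields present in both channels."""
    configurable = dict(config.get("configurable") or {})
    context = dict(config.get("context") or {})
    for key in DUAL_WRITE_FIELDS:
        if key in context:
            # Assumption: when both channels set a field, context wins.
            configurable[key] = context[key]
        elif key in configurable:
            context[key] = configurable[key]
    # Non-whitelisted keys stay in whichever channel they arrived in.
    return {**config, "configurable": configurable, "context": context}
```

For example, a request that sets model_name only under configurable comes out with the same value under context, while a non-whitelisted key such as thread_id is left where it was.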

Then run_agent() builds a LangGraph Runtime, packs run-scoped data (thread_id / run_id / user_id / app_config) into runtime.context, and tucks it back into config["configurable"]["__pregel_runtime"] so downstream middleware and tools can all reach it through LangGraph’s runtime API.
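The installation step can be sketched as follows. To keep the example runnable without LangGraph installed, ToyRuntime is a stand-in for LangGraph's Runtime class; install_runtime and read_run_id are hypothetical helpers, and merging the request's context into the runtime context is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyRuntime:
    """Stand-in for LangGraph's Runtime so the sketch runs without the library."""
    context: dict[str, Any]


def install_runtime(config: dict[str, Any], *, thread_id: str, run_id: str,
                    user_id: str, app_config: Any) -> dict[str, Any]:
    """Return a copy of config carrying run-scoped data under __pregel_runtime."""
    context = {
        **(config.get("context") or {}),  # assumption: request context is kept
        "thread_id": thread_id,
        "run_id": run_id,
        "user_id": user_id,
        "app_config": app_config,
    }
    configurable = {
        **(config.get("configurable") or {}),
        "__pregel_runtime": ToyRuntime(context=context),
    }
    return {**config, "configurable": configurable}


def read_run_id(config: dict[str, Any]) -> str:
    """How a downstream tool might reach run-scoped data in this sketch."""
    return config["configurable"]["__pregel_runtime"].context["run_id"]
```

The design choice worth noticing is that run-scoped data travels inside the config rather than through globals, so anything that receives the config can find the runtime without knowing who built it.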

Design rationale

Why does the Gateway reimplement a LangGraph-compatible API instead of just using the official LangGraph Server?

Because that’s how DeerFlow gets to own the application-level concerns: auth, thread/run metadata, the run event store, the SSE bridge, rollback, frontend compatibility, hot-reload boundaries, its own persistence. None of these are “how the graph runs” — they’re “how a product serves the graph.”

The cost is a layer of compatibility glue: request fields must be translated into LangGraph-flavored config, stream modes mapped, runtime context installed by hand. The lesson rhymes with the next stop — funnel the compatibility awkwardness into the entry layer, and let the graph execution below stay clean. Kernel runs; host absorbs the compatibility.

Own the host · Gateway

owns auth / metadata / rollback / persistence
one intake for frontend, IM, and SDK hosts
full control over a run’s lifecycle

Cost · compat glue

fields must be translated into LangGraph config
configurable / context dual-write, historical debt
run_agent() carries too much in one function

Where it will trip you up

configurable and context are historical compatibility debt. A field written to only one side may read empty for old code or for newer ToolRuntime.context consumers. Adding a field? First decide which whitelist it belongs to and whether it needs the dual-write.
run_agent() does too much in one place. It mixes graph invocation, persistence, stream publishing, rollback, tracing, and cleanup in a single path — powerful, but heavy. Know which part you’re touching before you touch it.
run_agent() runs as a background task. Disconnect, cancellation, rollback, and final checkpoint serialization all need care, or you may return “half-finished state” as success.
The Gateway doesn’t mutate ThreadState directly. Run records, thread metadata, and stream events are written by the Gateway; the agent’s graph state is written by graph execution. When state changes, first identify which layer wrote it.
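The background-task pitfall above can be made concrete with a small asyncio sketch. The run_with_status wrapper, the status strings, and demo_cancel are hypothetical; the sketch only shows one way to make sure a cancelled or failed run is never recorded as a success.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_status(work: Callable[[], Awaitable[None]],
                          set_status: Callable[[str], None]) -> None:
    """Drive `work` and record a terminal status that matches how it ended."""
    set_status("running")
    try:
        await work()
    except asyncio.CancelledError:
        set_status("interrupted")
        raise  # re-raise so asyncio still sees the task as cancelled
    except Exception:
        set_status("error")
        raise
    else:
        set_status("success")  # reached only when work() ran to completion


async def demo_cancel() -> list[str]:
    statuses: list[str] = []

    async def slow_graph() -> None:
        await asyncio.sleep(10)  # placeholder for a long graph execution

    task = asyncio.create_task(run_with_status(slow_graph, statuses.append))
    await asyncio.sleep(0)  # yield once so the task starts running
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return statuses
```

Running asyncio.run(demo_cancel()) records "running" followed by "interrupted", and never "success". The same structure leaves room for rollback or checkpoint handling in the cancellation branch before the exception is re-raised.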

The entry layer turns user input into a running graph, streams it back, and then gets out of the way. It has caught the request, filed the run, and installed context. Next, the run reaches make_lead_agent(), where the request stops being intake data and becomes a runnable graph.