This page describes how to run the tutoring simulation experiments from the repository root with `real_run3.py`: batch conditions, learners, LLM calls via OpenRouter, and JSON outputs under `simulation_logs/`.
For what each metric means, see Metrics & baselines. For equations and turn logic, see Formal mechanics.
Python environment with project dependencies (from the repo root):

```shell
uv sync
```
The simulation uses langchain-openai (declared in pyproject.toml) to call OpenRouter’s OpenAI-compatible API.
OpenRouter API key (required for every LLM turn). Create a key at openrouter.ai and export it:

```shell
export OPENROUTER_API_KEY="sk-or-..."
```
Optional: model choice

```shell
export OPENROUTER_MODEL="openai/gpt-4o-mini"
```
Use any OpenRouter model id you have access to (e.g. anthropic/claude-3.5-haiku, meta-llama/llama-3.3-70b-instruct).
Optional OpenRouter headers (for their rankings dashboard):

```shell
export OPENROUTER_HTTP_REFERER="https://your-site.example"
export OPENROUTER_APP_TITLE="Dialograph experiments"
```
From the repository root (where `real_run3.py` lives):

```shell
uv run python real_run3.py
```

Or, with a shell that already has `OPENROUTER_API_KEY` set:

```shell
python real_run3.py
```
Learners: `fragile`, `misconception`, `guesser` (see `FragileCorrectLearner`, `MisconceptionLearner`, `OverconfidentGuesser` in `real_run3.py`). `FragileCorrectLearner` always answers correctly, so learning curves and error rate stay trivial for that archetype; use `misconception` or `guesser` when you need variability for plots and tables.

Conditions: `full_dialograph`, `no_policy`, `no_temporal`, `single_node`, `llm_baseline`, plus three KT controllers: `kt_heuristic_baseline` (SimpleKT), `kt_bkt_baseline` (BKT), and `kt_dkt_style_baseline` (DKTStyle).

Turns per run: `DEFAULT_SIMULATION_TURNS` (50 by default).

Each combination calls the LLM once per turn, so the full batch amounts to 3 × 9 × 50 = 1350 completions, billed at your chosen model's pricing.
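Before launching a batch, it can help to size it first. A minimal back-of-envelope sketch (the 3/9/50 figures are the defaults described above; adjust them to your run):

```python
# Batch size = learners x conditions x turns; one LLM completion per turn.
learners, conditions, turns = 3, 9, 50
completions = learners * conditions * turns
print(completions)  # → 1350
```

Multiply `completions` by your model's per-completion price to estimate cost before running.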
For what to claim in the paper about KT vs Dialograph, see Paper: KT positioning & claims.
All files go to simulation_logs/ (created automatically):
| File pattern | Content |
|---|---|
| `{learner}_{condition}_log.json` | Per-turn trace (condition, action, policy, confidence, LLM instruction/response, …) |
| `{learner}_{condition}_metrics.json` | Result of `compute_metrics(log)` (curves, stability, premature advancement, …) |

Example: `misconception_full_dialograph_metrics.json`.
The script also prints a one-line summary per run to stdout.
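To inspect a run afterwards, the JSON files can be loaded with the standard library. A sketch, assuming only the file-naming pattern above (the exact keys inside depend on `compute_metrics`):

```python
import json
from pathlib import Path

# Load one metrics file and list its top-level keys.
path = Path("simulation_logs") / "misconception_full_dialograph_metrics.json"
if path.exists():
    metrics = json.loads(path.read_text())
    print(sorted(metrics))
else:
    print(f"run real_run3.py first: {path} not found")
```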
Learners take an explicit seed, e.g. `MisconceptionLearner("name", rng_seed=...)`. For strict reproducibility, also log the model id, temperature (if exposed later), and timestamps.

You can run a subset or change the horizon without editing the `if __name__ == "__main__"` block by importing from `real_run3`:
```python
from real_run3 import (
    MisconceptionLearner,
    SimulationRunConfig,
    run_simulation,
    compute_metrics,
    DialographAgentLLM,
)

learner = MisconceptionLearner("misconception", rng_seed=42)
cfg = SimulationRunConfig(
    "full_dialograph",
    graphs_on=True,
    policy_mode="dialograph",
    temporal_on=True,
    navigation_on=True,
)
agent = DialographAgentLLM()  # uses OPENROUTER_* env vars
log = run_simulation(learner, cfg, turns=30, agent=agent)
metrics = compute_metrics(log)
```
For offline tests, pass a mock agent with a `next_action(self, instruction: str) -> str` method instead of `DialographAgentLLM()`.
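A minimal mock agent can look like the sketch below. `MockAgent` is a hypothetical name; the only contract assumed is the `next_action(instruction) -> str` method mentioned above, so no API key or network access is needed:

```python
class MockAgent:
    """Offline stand-in for DialographAgentLLM (hypothetical sketch)."""

    def __init__(self, canned_response: str = "advance"):
        self.canned_response = canned_response
        self.instructions: list[str] = []  # record prompts for assertions

    def next_action(self, instruction: str) -> str:
        # Return a fixed action instead of calling OpenRouter.
        self.instructions.append(instruction)
        return self.canned_response

agent = MockAgent()
print(agent.next_action("Review fractions with a hint."))  # → advance
```

Passing `agent=MockAgent()` to `run_simulation` lets tests exercise the turn loop deterministically and free of charge.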
| `SimulationRunConfig.name` | Role |
|---|---|
| `full_dialograph` | Full system |
| `no_policy` | Graph + temporal, naive advance/hint only |
| `no_temporal` | No retention / memory-strength updates |
| `single_node` | One node, no prerequisite navigation |
| `llm_baseline` | No graph; scripted tutor + same simulated learner |
| `kt_heuristic_baseline` | SimpleKT (heuristic scalar) |
| `kt_bkt_baseline` | BKT (Bayesian KT, fixed hyperparameters) |
| `kt_dkt_style_baseline` | DKTStyle (untrained latent; not full DKT) |
| Symptom | Likely cause / fix |
|---|---|
| `OpenRouter API key missing` | `OPENROUTER_API_KEY` not set in the environment |
| `ModuleNotFoundError: langchain_openai` | Run `uv sync` from the repo root |
| Very slow or rate limits | Reduce turns, run fewer conditions, or pick a faster/cheaper model |
| Empty or sparse metrics for graph-off rows | Expected for fields that need confidence / `learner_correct`; the LLM baseline still logs learner signals (see the metrics doc) |
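To catch the first symptom before any paid calls go out, a small preflight check can run at the top of a driver script. A sketch (the `preflight` helper is hypothetical; the environment variable names match those used earlier on this page):

```python
def preflight(env) -> str:
    """Fail fast before any LLM calls; env is a mapping like os.environ."""
    if not env.get("OPENROUTER_API_KEY"):
        raise SystemExit("set OPENROUTER_API_KEY before running real_run3.py")
    # Fall back to the default model suggested above.
    return env.get("OPENROUTER_MODEL", "openai/gpt-4o-mini")

print(preflight({"OPENROUTER_API_KEY": "sk-or-test"}))  # → openai/gpt-4o-mini
```

In a real driver you would call `preflight(os.environ)` once, before constructing `DialographAgentLLM`.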