dialograph

Paper addendum: knowledge tracing (KT) baselines and positioning

Use this note when revising the 2AI / camera-ready manuscript: what the code implements, what to write, and what not to overclaim.


What the repository implements (real_run3.py)

Baseline (config name) | Controller | Credible for reviewers as…
kt_heuristic_baseline  | SimpleKT   | Cheap heuristic; good as internal contrast only.
kt_bkt_baseline        | BKT        | Classical BKT update (observe + learn) with fixed p_learn, p_slip, p_guess — not fitted to your logs or a public dataset.
kt_dkt_style_baseline  | DKTStyle   | DKT-inspired scalar dynamics without neural training or sequence data.
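The BKT row above (Bayesian observe update followed by a learning transition, with fixed parameters) can be sketched as follows. This is a minimal illustration of the standard BKT step, not the repository code; the default parameter values here are hypothetical and may differ from those in real_run3.py.

```python
def bkt_update(p_mastery: float, correct: bool,
               p_learn: float = 0.1, p_slip: float = 0.1,
               p_guess: float = 0.2) -> float:
    """One classical BKT step: Bayesian observe update, then learn transition.

    Parameters are fixed (not fitted), mirroring the baseline's limitation.
    """
    # Observe: condition mastery belief on the observed response.
    if correct:
        num = p_mastery * (1.0 - p_slip)
        den = num + (1.0 - p_mastery) * p_guess
    else:
        num = p_mastery * p_slip
        den = num + (1.0 - p_mastery) * (1.0 - p_guess)
    posterior = num / den
    # Learn: unlearned probability mass transitions to learned with p_learn.
    return posterior + (1.0 - posterior) * p_learn
```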

Formal equations: Formal mechanics §6.



Core positioning sentence (adapt for abstract / intro)

Knowledge tracing models focus on estimating latent mastery from interaction sequences. Dialograph targets pedagogical decision-making: a temporal graph of concepts plus explicit policy rules that choose tutorial actions, with an LLM handling surface realization. We compare against BKT and a DKT-style scalar baseline, both of which also map belief to coarse actions (review / practice / advance), so that differences reflect policy and structure rather than merely the presence of a latent trace.
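The belief-to-coarse-action mapping mentioned above might look like the sketch below. The threshold values and function name are hypothetical, chosen only to illustrate the review / practice / advance split; the actual mapping lives in the repository's controllers.

```python
def belief_to_action(p_mastery: float,
                     review_thresh: float = 0.3,
                     advance_thresh: float = 0.8) -> str:
    """Map a scalar mastery belief to one of three coarse tutorial actions."""
    if p_mastery < review_thresh:
        return "review"     # belief too low: revisit the concept
    if p_mastery < advance_thresh:
        return "practice"   # intermediate belief: keep practicing
    return "advance"        # high belief: move to the next concept
```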


What to add in the paper (checklist)

  1. Methods (one subsection)
    • Define BKT observe + learn step (copy from formal mechanics or standard BKT text).
    • List fixed hyperparameters used in code (p_init, p_learn, p_slip, p_guess).
    • Define DKTStyle as an untrained latent with gated updates; clarify that it is not a trained DKT network.
  2. Experiments
    • Table or figure: Dialograph (full) vs no_policy vs LLM baseline vs BKT vs DKTStyle (and optional heuristic KT) on simulation metrics (e.g. premature advancement, concept stability, time-to-mastery, intervention effectiveness — see metrics doc).
    • State simulated learners and horizon (e.g. 50 turns).
  3. Limitations (honest)
    • BKT parameters are not fit to data.
    • DKTStyle is not SOTA deep KT.
    • Simulation results are no replacement for human studies or public benchmark datasets unless those are added later.
  4. Language to avoid
    • “We implement DKT” / “we beat GKT” without qualification.
    • Prefer: “BKT baseline with fixed parameters,” “DKT-style untrained latent,” “graph KT cited for related work.”
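One way to illustrate the "DKT-style untrained latent with gated updates" phrasing recommended above is the sketch below. The gating form and gain value are hypothetical stand-ins, not the repository's DKTStyle implementation; the point is a scalar latent nudged toward the observed outcome, with no neural training involved.

```python
def dkt_style_update(h: float, correct: bool, gain: float = 0.3) -> float:
    """Untrained scalar latent with a gated update toward the outcome.

    The gate shrinks the step as the latent approaches the target,
    loosely echoing DKT dynamics without any learned weights.
    """
    target = 1.0 if correct else 0.0
    gate = gain * (1.0 - abs(h - target))  # smaller step when near target
    h_new = h + gate * (target - h)
    return min(max(h_new, 0.0), 1.0)       # clamp latent to [0, 1]
```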

Optional extensions (future work / appendix)