## 1 Scope (real_run3.py)

This note answers reproducibility concerns with exact equations, pseudocode, and conflict-resolution semantics as implemented in the simulation script real_run3.py, rather than hand-wavy "Ebbinghaus-inspired" prose alone.
Math below uses standard LaTeX. Render with a Markdown engine that supports math (e.g. MkDocs + Arithmatex, Pandoc, or a LaTeX preprint).
## 2 Temporal state and updates

For each semantic node \(v\), the temporal state is

\[ \mathbf{s}_v = \bigl(c_v,\, n_v,\, \tau^{\mathrm{last}}_v,\, S_v,\, R_v\bigr), \]
where:

| Symbol | Code field | Role |
|---|---|---|
| \(c_v \in [0,1]\) | `confidence` | Reported learner confidence (epistemic / calibration proxy) |
| \(n_v \in \mathbb{Z}_{\ge 0}\) | `activation_count` | Number of times \(v\) has been activated |
| \(\tau^{\mathrm{last}}_v\) | `last_activated_turn` | Turn index of the last activation before the current turn's updates |
| \(S_v > 0\) | `memory_strength` | Scale in the exponential decay (larger \(\Rightarrow\) slower forgetting in \(R_v\)) |
| \(R_v \in (0,1]\) | `retention` | Decay-based accessibility before re-activation this turn |
Initialization when a node is added (`add_node`):

\(c_v \leftarrow 0.4\), \(n_v \leftarrow 0\), \(\tau^{\mathrm{last}}_v \leftarrow 0\), \(S_v \leftarrow 1.5\), \(R_v \leftarrow 1.0\).
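The state and its initialization can be sketched as a plain dataclass (the class name is illustrative; the field names and defaults are those listed above):

```python
from dataclasses import dataclass

@dataclass
class TemporalNodeState:          # name is illustrative; fields match the table above
    confidence: float = 0.4       # c_v
    activation_count: int = 0     # n_v
    last_activated_turn: int = 0  # tau^last_v
    memory_strength: float = 1.5  # S_v
    retention: float = 1.0        # R_v

state = TemporalNodeState()       # defaults reproduce the add_node initialization
```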
### 2.1 Retention decay at turn start (`update_retention`)

At the start of a graph-on turn for active node \(v\) at global turn \(\tau\), if temporal updates are enabled (`temporal_on`):

\[ \Delta t_v = \tau - \tau^{\mathrm{last}}_v, \qquad R_v \;\leftarrow\; \exp\!\left(-\,\frac{\Delta t_v}{S_v}\right). \]

This is exponential decay in the discrete turn lag \(\Delta t_v\), with time constant \(S_v\) in turns (not wall-clock seconds in this script). It has the same functional form as a continuous-time Ebbinghaus-style curve \(e^{-t/S}\), with \(t\) instantiated as \(\Delta t_v\).

Reference (code): `Dialograph.update_retention` uses `dt = current_turn - state.last_activated_turn` and `retention = exp(-dt / memory_strength)`.
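A minimal sketch of this decay step (the `SimpleNamespace` node is a stand-in for the real node state):

```python
import math
from types import SimpleNamespace

def update_retention(state, current_turn):
    # Discrete turn lag, then exponential decay R_v = exp(-dt / S_v)
    dt = current_turn - state.last_activated_turn
    state.retention = math.exp(-dt / state.memory_strength)
    return state.retention

node = SimpleNamespace(last_activated_turn=3, memory_strength=1.5, retention=1.0)
update_retention(node, current_turn=5)   # dt = 2 -> exp(-2 / 1.5)
```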
### 2.2 Memory-strength update (`update_memory_strength`)

After the learner produces \((y, c^{\mathrm{new}}_v, \eta)\) with correctness \(y \in \{\mathrm{true},\mathrm{false}\}\) and explanation tag \(\eta \in \{\texttt{shallow},\texttt{fluent},\texttt{confident-but-wrong},\ldots\}\), if temporal updates are enabled:

\[
S_v \;\leftarrow\;
\begin{cases}
0.9\,S_v, & y = \mathrm{false}, \\[4pt]
S_v + 0.3, & y = \mathrm{true} \;\wedge\; \eta = \texttt{shallow}, \\[4pt]
S_v + 0.6, & y = \mathrm{true} \;\wedge\; \eta = \texttt{fluent}, \\[4pt]
S_v, & \text{otherwise (correct but other } \eta\text{)}.
\end{cases}
\]

Then clamp:

\[ S_v \;\leftarrow\; \mathrm{clip}(S_v,\, 0.5,\, 8.0). \]

Reference: `update_memory_strength`.
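The piecewise update and clamp can be sketched as follows (constants exactly as above; the state object is a stand-in):

```python
from types import SimpleNamespace

def update_memory_strength(state, correct, tag):
    # Piecewise update on S_v, then clamp to [0.5, 8.0]
    if not correct:
        state.memory_strength *= 0.9
    elif tag == "shallow":
        state.memory_strength += 0.3
    elif tag == "fluent":
        state.memory_strength += 0.6
    # correct with any other tag: S_v unchanged
    state.memory_strength = min(max(state.memory_strength, 0.5), 8.0)

node = SimpleNamespace(memory_strength=2.0)
update_memory_strength(node, correct=False, tag="fluent")  # 2.0 -> 1.8 (tag ignored when wrong)
```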
### 2.3 Activation (`activate`)

After the policy chooses an action and the LLM step runs, `activate` executes:

\[ n_v \leftarrow n_v + 1, \qquad \tau^{\mathrm{last}}_v \leftarrow \tau, \qquad c_v \leftarrow c^{\mathrm{new}}_v \ \ (\text{if provided}). \]
Order of operations in one turn (graph-on, `temporal_on`):

(1) update \(R_v\) via §2.1 → (2) sample learner → (3) update \(S_v\) via §2.2 → (4) policy / action → (5) LLM → (6) `activate`, §2.3.

When `temporal_on` is false, steps (1) and (3) are skipped; \(R_v, S_v\) keep their previous values (although `activate` still increments \(n_v\) and updates \(c_v\)).
## 3 Dialograph policy

Dialograph policies are a fixed ordered list of rules \(\{\mathcal{R}_\rho\}_{\rho=1}^{K}\). Each rule \(\mathcal{R}_\rho\) is a predicate \(P_\rho(\mathbf{s}_v, y, \eta)\) mapped to an action \(a_\rho\) and a label \(\ell_\rho\).
### 3.1 Conflict resolution

Conflict resolution is first-match wins (deterministic, no continuous blending). Define one-hot weights

\[
\omega_\rho =
\begin{cases}
1, & \rho = \min\{\rho' : P_{\rho'} = \text{true}\}, \\
0, & \text{otherwise}.
\end{cases}
\]
Then

\[ a^\star = \sum_{\rho=1}^{K} \omega_\rho\, a_\rho, \qquad \ell^\star = \sum_{\rho=1}^{K} \omega_\rho\, \ell_\rho \quad (\text{only one term nonzero}). \]
If no predicate holds before the fallback, the last rule always fires (default `advance`). This is the precise sense in which \(\rho\) is a priority and \(\omega\) is the deterministic winner indicator, not a learned mixture.
### 3.2 Rule table (`policy_decision` order)

Let \(c=c_v\), \(n=n_v\), \(R=R_v\), \(S=S_v\). Rules are listed in ascending \(\rho\) (evaluated top-to-bottom in code).
| \(\rho\) | Predicate \(P_\rho\) | Action \(a_\rho\) | Label \(\ell_\rho\) |
|---|---|---|---|
| 1 | \(c < 0.6 \land y\) | `ask_why` | Fragile Knowledge |
| 2 | \(\neg y \land c > 0.7\) | `challenge` | Overconfident Error |
| 3 | \(\neg y \land n > 2\) | `restructure` | Persistent Misconception |
| 4 | \(y \land \eta \in \{\texttt{fluent},\texttt{shallow}\}\) | `elicit_explanation` | Guessing / Low Engagement |
| 5 | \(\eta = \texttt{shallow}\) | `give_hint` | Illusion of Mastery |
| 6 | \(R < 0.3\) | `review` | Forgetting / Spaced Reactivation |
| 7 | \(\neg y\) | `give_hint` | Productive Struggle |
| 8 | \(y \land c \ge 0.7 \land S > 2.0\) | `advance` | Mastery-Based Advancement |
| 9 | \(c < 0.4\) | `simplify` | Cognitive Overload |
| 10 | \(y \land S > 3.0\) | `transfer` | Transfer Readiness |
| 11 | (fallback, always true) | `advance` | Default Policy |
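The table transcribes directly into an ordered list of (predicate, action, label) triples with first-match selection. A sketch (dict-based state for brevity; the real code reads these fields from the node state and learner response):

```python
RULES = [
    (lambda s: s["c"] < 0.6 and s["y"],                      "ask_why",            "Fragile Knowledge"),
    (lambda s: not s["y"] and s["c"] > 0.7,                  "challenge",          "Overconfident Error"),
    (lambda s: not s["y"] and s["n"] > 2,                    "restructure",        "Persistent Misconception"),
    (lambda s: s["y"] and s["eta"] in ("fluent", "shallow"), "elicit_explanation", "Guessing / Low Engagement"),
    (lambda s: s["eta"] == "shallow",                        "give_hint",          "Illusion of Mastery"),
    (lambda s: s["R"] < 0.3,                                 "review",             "Forgetting / Spaced Reactivation"),
    (lambda s: not s["y"],                                   "give_hint",          "Productive Struggle"),
    (lambda s: s["y"] and s["c"] >= 0.7 and s["S"] > 2.0,    "advance",            "Mastery-Based Advancement"),
    (lambda s: s["c"] < 0.4,                                 "simplify",           "Cognitive Overload"),
    (lambda s: s["y"] and s["S"] > 3.0,                      "transfer",           "Transfer Readiness"),
    (lambda s: True,                                         "advance",            "Default Policy"),
]

def first_matching_rule(s):
    # Deterministic conflict resolution: lowest rho whose predicate holds
    for predicate, action, label in RULES:
        if predicate(s):
            return action, label

# Wrong answer (y = False) with c > 0.7 skips rule 1 and fires rule 2
first_matching_rule({"c": 0.8, "n": 1, "R": 0.9, "S": 1.0,
                     "y": False, "eta": "confident-but-wrong"})
# -> ("challenge", "Overconfident Error")
```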
## 4 Remarks for reproducibility

### 4.1 Retrieval / navigation (also under `policy_mode = none`)

There is no separate retrieval model that ranks arbitrary nodes by embedding similarity. Navigation after `advance` is purely graph topology:

\[ \mathrm{Succ}(v) = \bigl[ u : (v \to u) \in E \bigr] \]

ordered as edges appear in `graph.semantic_edges`. The next node is
\[
v^+ =
\begin{cases}
\mathrm{Succ}(v)[0], & \mathrm{Succ}(v) \neq \emptyset \;\wedge\; \text{navigation enabled}, \\
v, & \text{otherwise}.
\end{cases}
\]
So "retrieval scoring" in the reviewer's sense is degenerate: the first outgoing prerequisite edge wins (deterministic). If you need a scored retrieval extension for a paper, define a separate function \(f(u \mid v, \mathbf{s})\) and replace \(v^+\) by \(\arg\max_{u \in \mathrm{Succ}(v)} f(u \mid v,\mathbf{s})\); that extension is not present in real_run3.py today.
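A sketch of this degenerate navigation, assuming `semantic_edges` is an ordered list of (source, target) pairs:

```python
def next_node(semantic_edges, v, action, navigation_on):
    # Deterministic: first outgoing edge in list order, else stay put
    if action == "advance" and navigation_on:
        successors = [u for (src, u) in semantic_edges if src == v]
        return successors[0] if successors else v
    return v

edges = [("a", "b"), ("a", "c"), ("b", "c")]
next_node(edges, "a", "advance", True)    # -> "b" (first outgoing edge wins)
next_node(edges, "c", "advance", True)    # -> "c" (no successors: stay)
next_node(edges, "a", "give_hint", True)  # -> "a" (only advance navigates)
```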
## 5 Per-turn pseudocode

```text
Input: graph G, active node v, turn τ, learner L,
       flags temporal_on, policy_mode, navigation_on

s ← G.temporal_nodes[v]
if temporal_on:
    Δt ← τ - s.last_activated_turn
    s.retention ← exp(-Δt / s.memory_strength)          // §2.1
(y, c_new, η) ← L.respond(semantic_node(v), s)
if temporal_on:
    update_memory_strength(s, y, η)                     // §2.2
if policy_mode == dialograph:
    (action, label) ← first_matching_rule(s, y, η)      // §3.2
else if policy_mode == none:
    (action, label) ← (advance if y else give_hint, None)
else:
    KT.update(y); action ← KT.decide(); label ← KT policy tag (e.g. KT_BKT)
instruction ← UPPER(action) + ": " + content(v)
response ← LLM(instruction)
activate(G, v, c_new, τ)                                // §2.3
if action == advance and navigation_on:
    Succ ← outgoing_targets(G, v)
    v_next ← Succ[0] if Succ non-empty else v
else:
    v_next ← v
return (action, v_next)
```
## 6 Knowledge-tracing baselines

All three baselines share the same interface: `update(y)` after observing correctness \(y\), then `decide()` maps a scalar belief to `review` / `practice` / `advance`. The log field `kt_mastery` stores the belief after each turn.
### 6.1 SimpleKT (`policy_mode = kt`)

Scalar mastery \(m \in (0,1)\):
\[
m \leftarrow
\begin{cases}
m + 0.1\,(1-m), & y, \\
m - 0.1\,m, & \neg y,
\end{cases}
\qquad
m \leftarrow \mathrm{clip}(m,\,0.01,\,0.99).
\]
Thresholds: \(\texttt{review}\) if \(m<0.5\), \(\texttt{practice}\) if \(0.5 \le m < 0.8\), else \(\texttt{advance}\).
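A sketch with the constants and thresholds quoted above; note the initial value `m0` is not stated here, so the default below is a placeholder assumption:

```python
class SimpleKT:
    def __init__(self, m0=0.5):   # m0 is an assumption, not taken from the script
        self.m = m0

    def update(self, correct):
        # Fixed-rate move toward 1 (correct) or 0 (incorrect), then clamp
        if correct:
            self.m += 0.1 * (1 - self.m)
        else:
            self.m -= 0.1 * self.m
        self.m = min(max(self.m, 0.01), 0.99)

    def decide(self):
        if self.m < 0.5:
            return "review"
        if self.m < 0.8:
            return "practice"
        return "advance"

kt = SimpleKT()
kt.update(True)    # m: 0.5 -> 0.55
kt.decide()        # -> "practice"
```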
### 6.2 BKT (`policy_mode = kt_bkt`)

One skill; \(P(\text{know}) = p \in (0,1)\). Fixed parameters \(p_{\mathrm{learn}}, p_{\mathrm{slip}}, p_{\mathrm{guess}}\) (defaults \(0.2, 0.1, 0.2\); initial \(p \leftarrow 0.3\)).
Bayesian observe given response \(y \in \{\text{correct},\text{incorrect}\}\):

\[ p \leftarrow \frac{P(\text{know})\,P(y \mid \text{know})}{P(y)}, \qquad P(y) = P(\text{know})\,P(y\mid\text{know}) + P(\neg\text{know})\,P(y\mid\neg\text{know}), \]

with \(P(\text{correct}\mid\text{know}) = 1-p_{\mathrm{slip}}\), \(P(\text{correct}\mid\neg\text{know}) = p_{\mathrm{guess}}\), \(P(\text{incorrect}\mid\text{know}) = p_{\mathrm{slip}}\), \(P(\text{incorrect}\mid\neg\text{know}) = 1-p_{\mathrm{guess}}\).
Learning transition (same step as code):

\[ p \leftarrow p + (1-p)\, p_{\mathrm{learn}}, \qquad p \leftarrow \mathrm{clip}(p,\,0.01,\,0.99). \]
Thresholds: \(\texttt{review}\) if \(p<0.5\), \(\texttt{practice}\) if \(0.5 \le p < 0.8\), else \(\texttt{advance}\).
Paper note: parameters are not fit to data in this repo; state that explicitly or report a sensitivity table.
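The observe-then-learn step can be sketched as follows (defaults as stated above):

```python
class BKT:
    def __init__(self, p0=0.3, p_learn=0.2, p_slip=0.1, p_guess=0.2):
        self.p = p0
        self.p_learn, self.p_slip, self.p_guess = p_learn, p_slip, p_guess

    def update(self, correct):
        # Bayesian observe: posterior P(know | y)
        if correct:
            num = self.p * (1 - self.p_slip)
            den = num + (1 - self.p) * self.p_guess
        else:
            num = self.p * self.p_slip
            den = num + (1 - self.p) * (1 - self.p_guess)
        self.p = num / den
        # Learning transition, then clamp
        self.p += (1 - self.p) * self.p_learn
        self.p = min(max(self.p, 0.01), 0.99)

    def decide(self):
        if self.p < 0.5:
            return "review"
        if self.p < 0.8:
            return "practice"
        return "advance"

bkt = BKT()
bkt.update(True)   # posterior 0.27/0.41 ~ 0.659, then transition -> ~0.727
bkt.decide()       # -> "practice"
```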
### 6.3 DKTStyle (`policy_mode = kt_dkt_style`)

Untrained latent \(h \in [0,1]\) (stands in for a hidden state without LSTM training):
\[
h \leftarrow
\begin{cases}
h + 0.1\,(1-h), & y, \\
h - 0.1\,h, & \neg y,
\end{cases}
\qquad
h \leftarrow \mathrm{clip}(h,\,0,\,1).
\]
Thresholds: \(\texttt{review}\) if \(h<0.4\), \(\texttt{practice}\) if \(0.4 \le h < 0.7\), else \(\texttt{advance}\).
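Structurally this matches SimpleKT apart from the thresholds and clamp range; again the initial value `h0` is a placeholder assumption, not quoted above:

```python
class DKTStyle:
    def __init__(self, h0=0.5):   # h0 is an assumption, not stated in the note
        self.h = h0

    def update(self, correct):
        # Same fixed-rate update as SimpleKT, but clamped to [0, 1]
        if correct:
            self.h += 0.1 * (1 - self.h)
        else:
            self.h -= 0.1 * self.h
        self.h = min(max(self.h, 0.0), 1.0)

    def decide(self):
        # Thresholds differ from SimpleKT: 0.4 / 0.7 instead of 0.5 / 0.8
        if self.h < 0.4:
            return "review"
        if self.h < 0.7:
            return "practice"
        return "advance"

dkt = DKTStyle()
dkt.update(False)   # h: 0.5 -> 0.45
dkt.decide()        # -> "practice"
```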
You can point reviewers here and state explicitly that this note separates what the code does from optional future extensions (e.g. weighted policy mixtures or learned retrieval).