The Math → Theoretical AI Journal
Status: Inception
Objective: Mathematical maturity and provable AI
The goal is not to compute, but to see. To move from the “how” of calculation to the “why” of existence and convergence. If I cannot prove why an algorithm generalizes, I do not understand it.
Foundations
- The Language of Truth: Real Analysis I
Course: MIT 18.100C (Real Analysis)
Focus: Epsilon-delta reasoning and writing full proofs.
Milestone: Write a clear proof of the Bolzano–Weierstrass theorem in my notebook.
Note to self: Limits and convergence show up everywhere. If I do not understand this, I do not understand learning dynamics.
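The bisection argument behind Bolzano–Weierstrass can be played with numerically. A minimal sketch of my own, using the bounded sequence a_n = sin(n) and a finite-sample proxy ("many of the first N terms") for the proof's "infinitely many terms":

```python
import math

# Numerical companion to the Bolzano-Weierstrass bisection proof (evidence,
# not a proof): a_n = sin(n) is bounded in [-1, 1]; repeatedly keep a
# half-interval that still contains "many" of the first N terms.
N = 100_000
terms = [math.sin(n) for n in range(1, N + 1)]

lo, hi = -1.0, 1.0
for _ in range(30):  # 30 halvings shrink the interval width below 1e-8
    mid = (lo + hi) / 2
    left = sum(1 for x in terms if lo <= x <= mid)
    right = sum(1 for x in terms if mid < x <= hi)
    if left >= right:      # keep the half with more terms
        hi = mid
    else:
        lo = mid

# The nested intervals pin down a candidate subsequential limit point.
print(f"candidate limit point in [{lo:.9f}, {hi:.9f}]")
```

The nested-interval step is exactly the proof's skeleton; the code only replaces "infinitely many" with a counting heuristic.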
- The Skeleton of Data: Linear Algebra
Course: MIT 18.06 (Strang) and MIT 18.700 (rigorous linear algebra)
Focus: Vector spaces, linear maps, eigenstructure, SVD.
Milestone: Explain SVD geometrically and analyze a simple PCA implementation.
Note to self: Most model operations are linear maps in high dimensions. I need to see them that way.
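One way to run the PCA check in code, as a sketch assuming NumPy and a synthetic data matrix of my own choosing:

```python
import numpy as np

# PCA via SVD: center the data, take the SVD, and check that the singular
# values of the centered matrix match the covariance eigenvalues via
# s_i^2 / (n - 1), which is the geometric content of the SVD-PCA link.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0, 0], [1, 1, 0], [0, 0, 0.1]])

Xc = X - X.mean(axis=0)                  # center
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
assert np.allclose(s**2 / (len(Xc) - 1), eigvals)

# Projecting onto the top right singular vectors gives the principal components.
scores = Xc @ Vt[:2].T
print("explained variance fractions:", s[:2] ** 2 / (s**2).sum())
```

Geometrically: the right singular vectors are the axes of the data ellipsoid, and the squared singular values measure spread along each axis.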
- The Logic of Machines: Discrete Mathematics
Course: MIT 6.042J (Mathematics for Computer Science)
Focus: Induction, graph theory, combinatorics, discrete probability.
Milestone: Write full solutions to induction and graph-based proof problems.
Note to self: This is the language of algorithms. Fluency is non-negotiable.
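Two spot-checks in the spirit of the problem sets, written as a quick sketch (evidence for the statements, not proofs of them):

```python
import itertools
import random

# Spot-check 1: the handshake lemma (sum of degrees = 2|E|) on random graphs.
random.seed(0)
for _ in range(100):
    n = random.randint(2, 12)
    edges = {e for e in itertools.combinations(range(n), 2) if random.random() < 0.5}
    degree = [sum(1 for e in edges if v in e) for v in range(n)]
    assert sum(degree) == 2 * len(edges)

# Spot-check 2: the induction step for 1 + 2 + ... + n = n(n+1)/2,
# i.e. S(n) + (n+1) equals the closed form at n+1.
for n in range(1, 1000):
    assert n * (n + 1) // 2 + (n + 1) == (n + 1) * (n + 2) // 2
print("all checks passed")
```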
Math 55 Core
- The Geometry of Symmetry: Abstract Algebra
Course: MIT 18.701 (Algebra I)
Focus: Groups, rings, fields, homomorphisms.
Milestone: Rewrite key proofs in my own words without notes.
Reflection: I want to understand how symmetry and equivariance show up in model design.
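A tiny illustration of that reflection, in a sketch of my own: circular convolution commutes with the action of the cyclic group C_8 on signals, which is exactly the equivariance property convolutional layers are built around.

```python
import numpy as np

# Equivariance check: for the cyclic-shift group acting on signals over Z_8,
# conv(shift(x)) == shift(conv(x)) for circular convolution.
rng = np.random.default_rng(1)
x = rng.normal(size=8)  # signal on Z_8
k = rng.normal(size=8)  # convolution kernel

def circ_conv(x, k):
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(n)) for i in range(n)])

for shift in range(8):
    lhs = circ_conv(np.roll(x, shift), k)
    rhs = np.roll(circ_conv(x, k), shift)
    assert np.allclose(lhs, rhs)  # the group action commutes with the map
print("circular convolution is C_8-equivariant")
```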
- The Topology of Reality: Real Analysis II
Course: MIT 18.100B / 18.101
Focus: Metric spaces, compactness, uniform approximation.
Milestone: Prove a version of the Stone–Weierstrass theorem and connect it to neural network approximation.
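The constructive side of the Weierstrass approximation theorem (which Stone–Weierstrass generalizes) can be watched numerically: Bernstein polynomials B_n(f) converge uniformly to a continuous f on [0, 1]. A sketch with a test function of my own choosing:

```python
import math

# Bernstein polynomial B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k);
# uniform convergence means the sup-norm error over [0, 1] shrinks with n.
f = lambda x: abs(x - 0.5)  # continuous but not smooth at 0.5

def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

grid = [i / 200 for i in range(201)]
errors = []
for n in (5, 20, 80):
    sup_err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    errors.append(sup_err)
    print(f"n={n:3d}  sup error ~ {sup_err:.4f}")
```

The worst error sits at the kink x = 0.5, and it still goes to zero, which is the point: uniform approximation does not need smoothness, only continuity.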
Engine of Uncertainty
- The Measure of Doubt: Probability Theory
Course: MIT 18.440 (Probability) / MIT 18.175 (Theory of Probability)
Focus: Measure-based probability, CLT, martingales, concentration.
Milestone: Re-derive a concentration inequality and apply it to a learning bound.
Note to self: Generalization is a probability statement. Precision matters.
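A sanity sketch to accompany the milestone (checking the inequality numerically, not re-deriving it): for n i.i.d. fair coin flips, Hoeffding gives P(|mean − 1/2| ≥ t) ≤ 2·exp(−2nt²), the same bound that becomes a generalization statement when the variables are per-example losses.

```python
import math
import random

# Compare the empirical deviation probability of a sample mean of fair
# coin flips against the Hoeffding bound 2 exp(-2 n t^2).
random.seed(0)
n, t, trials = 500, 0.05, 2000

exceed = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n
    if abs(mean - 0.5) >= t:
        exceed += 1

empirical = exceed / trials
bound = 2 * math.exp(-2 * n * t**2)
print(f"empirical {empirical:.4f} <= Hoeffding bound {bound:.4f}")
```

The bound is loose here (it has to hold for every bounded distribution), which is itself a useful thing to internalize before using it in learning bounds.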
- The Search for Minima: Optimization
Course: Convex optimization material from MIT OCW
Focus: Convex analysis, Lagrangians, KKT conditions, gradient dynamics.
Milestone: Derive the dual of an SVM and observe convergence in a simple gradient experiment.
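The gradient half of that milestone can be sketched in a few lines (my own toy setup, not course material): gradient descent on a strongly convex quadratic converges linearly, at a rate governed by the condition number κ = L/μ.

```python
import numpy as np

# Gradient descent on f(x) = 1/2 x^T A x with step size 1/L.
# Theory: the function gap contracts by at least (1 - mu/L) per step.
A = np.diag([1.0, 10.0])  # mu = 1, L = 10, so kappa = 10
L, mu = 10.0, 1.0
x = np.array([1.0, 1.0])
step = 1.0 / L

gaps = []
for _ in range(100):
    gaps.append(0.5 * x @ A @ x)  # f(x) - f(x*), since x* = 0
    x = x - step * (A @ x)        # gradient step

ratio = gaps[-1] / gaps[-2]
print(f"per-step contraction ~ {ratio:.3f}  (theory: <= {1 - mu / L:.3f})")
```

Plotting log(gaps) against the iteration count gives a straight line, which is what "linear convergence" looks like in practice.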
Theoretical Frontier
- Statistical Learning Theory
Course: MIT 18.657 (Mathematics of Machine Learning)
Focus: PAC learning, kernels, regularization, double descent.
Milestone: Write a short note on double descent: assumptions, results, and limitations.
- Provable Algorithms
Course: MIT 18.409 (Algorithmic Aspects of Machine Learning)
Focus: Tensor methods, spectral algorithms, provable approaches to nonconvex problems.
Milestone: Reproduce and document one provable algorithm from the course notes.
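As a warm-up in the same spirit (my own example, not one of the course's tensor or spectral algorithms), power iteration is a classic provable algorithm: it recovers the top eigenvector of a symmetric matrix at a geometric rate governed by the eigengap λ₂/λ₁.

```python
import numpy as np

# Power iteration on a symmetric matrix with a deliberate eigengap
# (top eigenvalue 3.0, the rest at most 1.5, so lambda2/lambda1 = 0.5).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
eigs = np.concatenate([np.linspace(0.1, 1.5, 49), [3.0]])
A = Q @ np.diag(eigs) @ Q.T

v = rng.normal(size=50)
v /= np.linalg.norm(v)
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)  # normalize to avoid overflow

top = Q[:, -1]              # eigenvector for the largest eigenvalue
alignment = abs(v @ top)    # |cos angle| to the true top eigenvector
rayleigh = v @ A @ v        # should approach the top eigenvalue 3.0
print(f"alignment={alignment:.6f}  Rayleigh quotient={rayleigh:.6f}")
```

The provable part is the rate: the tangent of the angle to the top eigenvector shrinks by a factor of λ₂/λ₁ per iteration, so 100 iterations at ratio 0.5 is vast overkill here.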
Operating Rules
- The proof journal
I keep a physical notebook. A theorem is not done until I can reproduce its proof from memory.
- No lurking
Watching lectures is not progress. Solving problem sets is progress.
- Implementation check
I implement small pieces of theory in code to see how the math behaves.