Priyanshu Sah

Was reading through world model / model-based RL papers and wanted to actually get my hands dirty instead of just reading. Started small, using Pokémon Red as a quick sandbox, with an eye toward eventually working up to properly replicating something like DIAMOND (https://lnkd.in/g-_qGZ6a). This first pass is nowhere close to that, and that's expected: I wanted to feel the actual failure modes firsthand before reading about how the bigger papers solve them.

What I tried: a VAE + recurrent dynamics model that imagines future game frames without running the emulator, plus a frozen PPO checkpoint steered toward targets without retraining (a rough System 1 / System 2 split). I also started training a discrete RSSM at native resolution, but it's only a few epochs in.
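For anyone who hasn't seen one, here is a rough sketch of what "imagining without the emulator" looks like as a loop: encode one real frame, then keep stepping the dynamics model on its own latent output. This is toy code, not what's in the repo. `imagine`, `toy_encode`, `toy_step`, and `toy_decode` are made-up names, and the stand-ins are simple arithmetic instead of a trained VAE and recurrent network (a real recurrent model would also carry a hidden state).

```python
from typing import Callable, List, Sequence

Frame = List[float]   # toy stand-in for a game frame
Latent = List[float]  # toy stand-in for the VAE latent


def imagine(
    encode: Callable[[Frame], Latent],
    step: Callable[[Latent, int], Latent],
    decode: Callable[[Latent], Frame],
    first_frame: Frame,
    actions: Sequence[int],
) -> List[Frame]:
    """Open-loop rollout: only the first frame comes from the real game."""
    z = encode(first_frame)
    frames: List[Frame] = []
    for action in actions:
        z = step(z, action)       # the model consumes its own previous latent
        frames.append(decode(z))  # decode only to look at the imagined frame
    return frames


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_encode(frame: Frame) -> Latent:
    return [sum(frame) / len(frame)]


def toy_step(z: Latent, action: int) -> Latent:
    return [z[0] + action]


def toy_decode(z: Latent) -> Frame:
    return [z[0]] * 4


imagined = imagine(toy_encode, toy_step, toy_decode, [1.0, 1.0, 1.0, 1.0], [1, 2, 3])
```

The point of the structure is that nothing inside the loop touches the emulator, so any error in `step` gets fed straight back into the next call.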
What didn't work / what I learned:
→ Imagined rollouts degrade fast. Compounding error means predictions drift away from reality within a handful of steps, which is the core challenge this whole model class exists to solve.
→ Scheduled sampling (occasionally feeding the model its own predictions during training, instead of always ground truth) measurably reduced that drift. It's the one result I'd call a real finding rather than a guess.
→ A few epochs on a discrete RSSM isn't enough to get clean reconstructions. Visual fidelity needs real training time that I haven't put in yet.
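A toy sketch of the scheduled sampling idea, assuming a one-step model over plain numbers (again, not the repo's code; `scheduled_sampling_pass`, `p_own`, and the deliberately biased `biased_model` are hypothetical names). At each step the input is either the ground-truth value or, with probability `p_own`, the model's own previous prediction. In practice `p_own` is usually ramped up over training rather than fixed.

```python
import random
from typing import Callable, List, Sequence, Tuple


def scheduled_sampling_pass(
    model_step: Callable[[float], float],
    truth: Sequence[float],
    p_own: float,
    rng: random.Random,
) -> Tuple[List[float], List[bool]]:
    """One pass over a sequence; returns predictions and which inputs were self-fed."""
    preds: List[float] = []
    used_own: List[bool] = []
    prev_pred = None
    for t in range(len(truth) - 1):
        # The first step always uses ground truth: there is no prediction yet.
        use_own = prev_pred is not None and rng.random() < p_own
        x = prev_pred if use_own else truth[t]
        pred = model_step(x)  # prediction for truth[t + 1]
        preds.append(pred)
        used_own.append(use_own)
        prev_pred = pred
    return preds, used_own


def l1_error(preds: Sequence[float], truth: Sequence[float]) -> float:
    """Sum of absolute errors between each prediction and the next true value."""
    return sum(abs(p - y) for p, y in zip(preds, truth[1:]))


def biased_model(x: float) -> float:
    return x + 2  # the true sequence below increases by 1, so this is off by 1


truth = [0, 1, 2, 3]
tf_preds, tf_used = scheduled_sampling_pass(biased_model, truth, 0.0, random.Random(0))
own_preds, own_used = scheduled_sampling_pass(biased_model, truth, 1.0, random.Random(0))
tf_error = l1_error(tf_preds, truth)    # always fed ground truth
own_error = l1_error(own_preds, truth)  # always fed its own output after step 0
```

With `p_own = 0.0` the model only ever sees clean inputs and its per-step error stays constant. With `p_own = 1.0` the same small bias stacks up step after step, which is the compounding drift described above. Training on a mix exposes the model to its own mistakes, so the loss actually penalizes that drift.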

No polished demo here, just the thinking behind a quick first pass and where it broke. Next: actually train a policy inside the imagined rollouts instead of just visiting them, give the RSSM proper training time, and keep working through the papers (PlaNet → Dreamer → DIAMOND) that this small experiment is helping me actually understand instead of just skimming.

Repo: https://lnkd.in/gJB_zib4

#reinforcementlearning #worldmodels #machinelearning
