🧠
AI Pricing Simulator
🎯 Competitive market: RL vs APO vs Fixed-price — 3 players, shared demand, 1 winner
Educational Demo
💰 Best Revenue (RL)
Run the simulation to see results
📦 Units Sold
out of 1,500
💹 RL vs Fixed
revenue uplift
⏱️ Sold Out In
market rounds
How do pricing algorithms learn to win?
Three lightweight approaches — no GPU, no neural nets — that still beat hand-crafted rules
🔍
Exploration vs. Exploitation
Every algorithm here faces the same core dilemma: try something new (explore) or stick with what works (exploit). Each solves it differently.
Simulated Annealing starts hot — accepts even bad moves to escape local optima — then cools down and converges on the best policy it found.
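The "accept even bad moves" rule is the classic Metropolis criterion. A minimal sketch — the cooling rate and reward scale here are illustrative assumptions, not the simulator's actual values:

```python
import math
import random

def accept(delta, temperature):
    """Metropolis rule: always accept improvements; accept a worse move
    (delta < 0) with probability exp(delta / T), which shrinks as T cools."""
    if delta >= 0:  # delta = new_reward - old_reward
        return True
    return random.random() < math.exp(delta / temperature)

# Geometric cooling: hot early (bad moves often accepted), nearly greedy late.
temperature = 1.0
for step in range(1000):
    temperature *= 0.995
```

At high temperature almost any move passes; once T is small, `exp(delta / T)` for a negative delta is effectively zero, so only improvements survive.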
Thompson Sampling maintains a probability distribution over each price's expected reward. Uncertainty drives exploration naturally: untested prices have wide distributions and are sampled often; well-tested prices are exploited confidently.
Q-Learning starts fully random (ε = 100%) and gradually reduces exploration as its Q-table fills with experience.
📊
Three Algorithms, One Problem
Simulated Annealing (SA) searches for an optimal price table: one price per (inventory level × time remaining) cell. It perturbs one cell at a time and uses a temperature schedule to accept worse solutions early on. Pure numpy — trains in ~3 s on any CPU.
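The perturb-one-cell-and-accept loop can be sketched in a few lines of numpy. The price grid, table shape, and toy `revenue` function are illustrative assumptions — the real simulator would score a candidate table by rolling out episodes against the demand model:

```python
import numpy as np

rng = np.random.default_rng(0)
PRICES = np.array([8.0, 10.0, 12.0, 14.0])    # discrete price grid (assumed)

def revenue(table):
    """Toy stand-in for simulated revenue: favours mid-range prices."""
    return -np.sum((table - 11.0) ** 2)

table = rng.choice(PRICES, size=(5, 10))      # 5 inventory levels x 10 rounds
best, best_score = table.copy(), revenue(table)
T = 1.0
for step in range(2000):
    cand = table.copy()
    i, j = rng.integers(5), rng.integers(10)  # perturb one random cell
    cand[i, j] = rng.choice(PRICES)
    delta = revenue(cand) - revenue(table)
    if delta >= 0 or rng.random() < np.exp(delta / T):
        table = cand                          # Metropolis accept
    if revenue(table) > best_score:
        best, best_score = table.copy(), revenue(table)
    T *= 0.997                                # cool down each step
```

Tracking `best` separately matters: late in the schedule the walker may still drift, but the best table it ever visited is what gets deployed.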
Thompson Sampling (TS) treats each price as a "bandit arm" per market context. After each step it updates a Gaussian posterior: mean reward and uncertainty per price. No gradients, no episodes — purely Bayesian updating.
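A self-contained sketch of that Gaussian-posterior loop, with a single market context. The candidate prices, prior width, and hidden reward means are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
PRICES = [8.0, 10.0, 12.0, 14.0]        # candidate prices (assumed)
TRUE_MEAN = np.array([5.0, 9.0, 7.0, 2.0])  # hidden env: price 10 is best

# Gaussian posterior per arm: running mean + play count for uncertainty.
mean = np.zeros(len(PRICES))
count = np.zeros(len(PRICES))

for t in range(2000):
    # One draw per arm: untested arms have wide posteriors (sigma ~ 10),
    # well-tested arms have narrow ones -- exploration falls out for free.
    sigma = 10.0 / np.sqrt(count + 1.0)
    theta = rng.normal(mean, sigma)
    i = int(np.argmax(theta))           # play the arm that sampled highest
    r = TRUE_MEAN[i] + rng.normal(0.0, 1.0)
    count[i] += 1
    mean[i] += (r - mean[i]) / count[i]  # incremental mean update
```

No gradients, no episodes: each step is a sample, an argmax, and a running-mean update, exactly the pattern described above.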
Q-Learning (QL) stores Q(state, action) values in a lookup table and updates them via the Bellman equation after every step. An ensemble of 4 runs with averaged Q-tables reduces variance from random initialisation.
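A single-run sketch of the tabular update on a toy sell-down problem (the demo averages 4 such runs). The prices, sale probabilities, horizon, and learning rates are assumptions chosen to keep the example tiny:

```python
import numpy as np

rng = np.random.default_rng(2)
PRICES = [10.0, 14.0]      # two candidate prices (assumed)
SELL_P = [0.8, 0.4]        # chance of a sale at each price (assumed)
INV, T = 4, 6              # start with 4 units and 6 rounds

# Q-table indexed by (inventory, rounds_left, action).
Q = np.zeros((INV + 1, T + 1, len(PRICES)))
alpha, gamma = 0.02, 1.0
eps = 1.0                  # start fully random, as in the demo

for episode in range(30000):
    inv, t = INV, T
    while inv > 0 and t > 0:
        # Epsilon-greedy action selection over the two prices.
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[inv, t]))
        sold = rng.random() < SELL_P[a]
        r = PRICES[a] if sold else 0.0
        inv2, t2 = inv - int(sold), t - 1
        # Bellman update: bootstrap from the best next-state action value.
        target = r + gamma * np.max(Q[inv2, t2])
        Q[inv, t, a] += alpha * (target - Q[inv, t, a])
        inv, t = inv2, t2
    eps = max(0.05, eps * 0.999)   # decay exploration across episodes
```

With one round left and stock to spare, the table learns to take the fast-selling low price; states where inventory is scarce relative to time tend toward the higher price — the (inventory × time) structure is doing real work.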
🏢
Business Applications
The same framework maps directly onto real business problems:
  • E-commerce: Dynamic pricing for flash sales & limited stock
  • Hotels & Airlines: Yield management — price rooms and seats at the right level as capacity fills up
  • Advertising: Bid optimisation in real-time auctions
  • Supply Chain: Inventory replenishment and markdown decisions
  • Ride-sharing: Surge pricing based on real-time demand signals
  • SaaS: Optimal discount depth for conversion vs. lifetime value
All three algorithms here run on a single CPU core with no GPU — making them deployable in cost-constrained environments without sacrificing learning quality.