Three lightweight approaches — no GPU, no neural nets — that still beat hand-crafted rules
🔍 Exploration vs. Exploitation
Every algorithm here faces the same core dilemma: try something new (explore) or
stick with what works (exploit). Each solves it differently.
Simulated Annealing starts hot — accepts even bad moves to escape local optima —
then cools down and converges on the best policy it found.
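The acceptance rule behind that hot-to-cold behaviour is the Metropolis criterion. A minimal sketch (the temperature values in the comments are illustrative, not from the article):

```python
import math
import random

def accept_move(delta_reward: float, temperature: float) -> bool:
    """Metropolis criterion: always accept improvements; accept a
    worse move with probability exp(delta / T), which shrinks as T cools."""
    if delta_reward >= 0:
        return True
    return random.random() < math.exp(delta_reward / temperature)

# Hot (T = 10): a move costing 1.0 reward is accepted ~90% of the time.
# Cold (T = 0.1): the same move is accepted ~0.005% of the time.
```

The same -1.0 move goes from almost always accepted to almost never accepted purely through the cooling schedule, which is what lets the search roam early and settle late.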
Thompson Sampling maintains a probability distribution over each price's
expected reward. Uncertainty drives exploration naturally: untested prices have wide
distributions and are sampled often; well-tested prices are exploited confidently.
Q-Learning starts fully random (ε = 100%) and gradually reduces
exploration as its Q-table fills with experience.
📊 Three Algorithms, One Problem
Simulated Annealing (SA) searches for an optimal price table:
one price per (inventory level × time remaining) cell. It perturbs one cell at a time
and uses a temperature schedule to accept worse solutions early on.
Pure NumPy — trains in ~3 s on any CPU.
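The search loop can be sketched in a few lines of NumPy. Everything problem-specific here is an assumption for illustration — the toy linear demand curve, the price grid, and the schedule constants — but the structure (perturb one cell, accept via temperature, track the best table) matches the description above:

```python
import numpy as np

rng = np.random.default_rng(42)

N_INV, N_TIME = 5, 10                 # inventory levels x time-remaining buckets
PRICES = np.array([5.0, 7.5, 10.0, 12.5, 15.0])

def revenue(table: np.ndarray) -> float:
    """Toy objective (an assumption): mean revenue under a linear
    price-sensitive demand curve, in place of the real simulator."""
    demand = np.clip(1.0 - table / 20.0, 0.0, 1.0)   # higher price -> less demand
    return float((table * demand).mean())

# Start from a random table: one price per (inventory, time) cell
table = PRICES[rng.integers(0, len(PRICES), size=(N_INV, N_TIME))]
best, best_rev = table.copy(), revenue(table)

T, cooling = 2.0, 0.995               # temperature schedule (assumed values)
for step in range(2000):
    cand = table.copy()
    i, j = rng.integers(N_INV), rng.integers(N_TIME)
    cand[i, j] = rng.choice(PRICES)               # perturb a single cell
    delta = revenue(cand) - revenue(table)
    if delta >= 0 or rng.random() < np.exp(delta / T):
        table = cand                              # may accept a worse move, early on
        if revenue(table) > best_rev:
            best, best_rev = table.copy(), revenue(table)
    T *= cooling                                  # cool down
```

Under this demand curve the per-cell optimum is a price of 10.0, so the best table found drifts toward a grid of 10s as the temperature drops.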
Thompson Sampling (TS) treats each price as a "bandit arm" per market context.
After each step it updates a Gaussian posterior: mean reward and uncertainty per price.
No gradients, no episodes — purely Bayesian updating.
Q-Learning (QL) stores Q(state, action) values in a lookup table and
updates them via the Bellman equation after every step. An ensemble of 4 runs with
averaged Q-tables reduces variance from random initialisation.
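The tabular update and the 4-run ensemble can be sketched as follows. The toy sell-probability environment, the episode count, and the ε floor are assumptions; the Bellman step and the averaged Q-tables follow the description above:

```python
import numpy as np

rng = np.random.default_rng(7)

N_INV, N_TIME = 4, 6
PRICES = np.array([4.0, 10.0, 16.0])    # candidate actions (example values)

def step_env(inv: int, a: int):
    """Toy environment (an assumption): a unit sells with probability
    1 - price/20; the reward is the price when it sells."""
    sold = inv > 0 and rng.random() < 1.0 - PRICES[a] / 20.0
    return inv - int(sold), (PRICES[a] if sold else 0.0)

def train(episodes=3000, gamma=1.0):
    # Random initialisation -- the source of run-to-run variance
    Q = rng.normal(0.0, 0.1, size=(N_INV + 1, N_TIME + 1, len(PRICES)))
    Q[:, 0, :] = 0.0                     # no value at the horizon
    visits = np.zeros_like(Q)
    eps = 1.0                            # start fully random
    for _ in range(episodes):
        inv = N_INV
        for t in range(N_TIME, 0, -1):
            if rng.random() < eps:
                a = int(rng.integers(len(PRICES)))   # explore
            else:
                a = int(np.argmax(Q[inv, t]))        # exploit
            inv2, r = step_env(inv, a)
            visits[inv, t, a] += 1
            alpha = 1.0 / visits[inv, t, a]          # decaying learning rate
            # Bellman update: move Q toward reward + best next-state value
            target = r + gamma * Q[inv2, t - 1].max()
            Q[inv, t, a] += alpha * (target - Q[inv, t, a])
            inv = inv2
        eps = max(0.05, eps * 0.999)                 # gradually reduce exploration
    return Q

# Ensemble: average the Q-tables of 4 independent runs to damp the
# variance left over from random initialisation and exploration noise
Q_avg = np.mean([train() for _ in range(4)], axis=0)
```

In this toy setting the middle price maximises expected per-step revenue, so the averaged table's greedy action at the starting state settles on it.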
🏢 Business Applications
This exact framework applies to real business problems:
- E-commerce: Dynamic pricing for flash sales & limited stock
- Hotels & Airlines: Yield management — sell seats at the right price as the flight fills up
- Advertising: Bid optimisation in real-time auctions
- Supply Chain: Inventory replenishment and markdown decisions
- Ride-sharing: Surge pricing based on real-time demand signals
- SaaS: Optimal discount depth for conversion vs. lifetime value
All three algorithms here run on a single CPU core with no GPU — making them
deployable in cost-constrained environments without sacrificing learning quality.