Mathematical models for lake size distributions derived from first principles in Mathematical Geoenergy (GEM), Chapter 15 (Latent Energy: Hydrological Cycle).
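The lake size distribution is an archetypal example of a dispersive aggregation problem driven by the Principle of Maximum Entropy (MaxEnt). The derivation in GEM Ch. 15 starts from two elementary observations: the rate $r$ at which water flows into a basin is known only through its mean $g$, so MaxEnt assigns it an exponential distribution (Eq. 15-4); and the depth $x$ that a basin offers for accumulation is known only through its mean $H$, so MaxEnt assigns it an exponential distribution as well (Eq. 15-8).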
Combining these two facts by integrating over all possible depths yields the Lomax (Pareto Type II) distribution for lake size with shape parameter $\kappa = 1$. This matches rank-order data for Northern Quebec lakes, Amazon lakes, and global lake inventories over several orders of magnitude with only a single free parameter: the median lake area $A_m$.
Eq. 15-1 — Lake size CDF (generic form)
\[P(\text{Size}) = \frac{1}{1 + \dfrac{a}{\text{Size}}}\]

General form of the CDF; the scale parameter $a$ equals the median.
Eq. 15-2 / 15-11 — Lake size CDF (median form)
\[P(\text{Size}\,|\,\text{Median}) = \frac{1}{1 + \dfrac{\text{Median}}{\text{Size}}} = \frac{\text{Size}}{\text{Size} + \text{Median}}\]

The single parameter $\text{Median} = A_m = k H$ (collection efficiency times mean basin depth) fully determines the distribution.
PDF (derived by differentiation):
\[p(A) = \frac{A_m}{(A + A_m)^2}\]

Complementary CDF (fraction of lakes with area $> A$):

\[\bar{P}(A) = \frac{A_m}{A + A_m} \to \frac{A_m}{A} \quad (A \gg A_m)\]

The $1/A$ power-law tail at large sizes is the signature seen in ranked lake-size plots (GEM Figs. 15-1, 15-2).
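The three expressions above can be sanity-checked numerically. The following is a minimal sketch in plain Python, not the repository's `lake_numerical.py`; the function names and the value of `A_M` are hypothetical choices made for illustration.

```python
def lake_cdf(area, median_area):
    """Fraction of lakes with area <= `area` (median form of the CDF)."""
    return area / (area + median_area)


def lake_pdf(area, median_area):
    """Density obtained by differentiating the CDF with respect to area."""
    return median_area / (area + median_area) ** 2


def lake_ccdf(area, median_area):
    """Fraction of lakes with area > `area`."""
    return median_area / (area + median_area)


A_M = 2.5  # hypothetical median lake area, arbitrary units

# By construction, half of all lakes are smaller than the median.
median_check = lake_cdf(A_M, A_M)

# Compare a central finite difference of the CDF with the analytic PDF
# at areas spanning five decades.
max_rel_err = 0.0
for exponent in range(-2, 4):
    area = 10.0 ** exponent
    step = 1e-6 * area
    finite_diff = (lake_cdf(area + step, A_M) - lake_cdf(area - step, A_M)) / (2 * step)
    max_rel_err = max(max_rel_err, abs(finite_diff / lake_pdf(area, A_M) - 1.0))

# Far in the tail, the complementary CDF approaches A_m / A.
far_area = 1e6 * A_M
tail_ratio = lake_ccdf(far_area, A_M) / (A_M / far_area)

print(median_check, max_rel_err, tail_ratio)
```

The tail ratio tends to 1 as the area grows, which is the reciprocal power law read off ranked plots.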
Eq. 15-4 — MaxEnt dispersal-rate distribution
\[p(r \mid g) = \frac{1}{g}\cdot e^{-r/g}, \quad r \geq 0\]

Applying MaxEnt with only a known mean rate $g$ yields an exponential distribution for the water-inflow rate $r$.
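For readers who want the intermediate step, here is a short sketch of the standard MaxEnt argument behind Eq. 15-4. The Lagrange multipliers $\lambda_0$ and $\lambda_1$ are notation introduced here for illustration and are not taken from GEM. Maximize the entropy subject to normalization and a fixed mean:

\[S[p] = -\int_0^\infty p(r)\ln p(r)\,dr, \quad \int_0^\infty p(r)\,dr = 1, \quad \int_0^\infty r\,p(r)\,dr = g\]

Setting the functional derivative of the constrained objective to zero gives

\[-\ln p(r) - 1 - \lambda_0 - \lambda_1 r = 0 \;\Rightarrow\; p(r) = e^{-1-\lambda_0}\,e^{-\lambda_1 r}\]

Normalization fixes $e^{-1-\lambda_0} = \lambda_1$, and the mean constraint gives $1/\lambda_1 = g$, which recovers $p(r \mid g) = \frac{1}{g}\,e^{-r/g}$.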
Eq. 15-5 — Distance PDF after change of variables $x = r t$
\[p(x \mid g, t) = \frac{1}{g t}\cdot e^{-x/(gt)}\]

At fixed time $t$, the diffusion distance $x = r t$ inherits the same exponential shape, rescaled by $t$.
Eq. 15-6 / 15-7 — Exceedance probability
\[P(x > x_0 \mid g, t) = \int_{x_0}^{\infty} p(x \mid g, t)\,dx = e^{-x_0/(gt)}\]

Alternative derivation (Eq. 15-7), integrating the rate PDF over all rates $r > x_0/t$:

\[P(x > x_0 \mid g, t) = \int_{r = x_0/t}^{\infty} p(r \mid g)\,dr = e^{-x_0/(gt)}\]

Both routes give the same result.
Eq. 15-8 — Basin-depth / height-capacity distribution
\[f(x \mid H) = \frac{1}{H}\cdot e^{-x/H}\]

The depth $x$ at which water can accumulate also follows a MaxEnt exponential distribution, with mean $H$.
Eq. 15-9 — Integration over basin depth → CDF
\[C(t \mid H) = \int_0^{\infty} f(x \mid H) \cdot e^{-x/(gt)}\,dx = \frac{1}{1 + H/(g t)}\]

The product of two exponentials integrates analytically to a rational function, the Lomax CDF.
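The chain from Eq. 15-5 through Eq. 15-9 can be checked symbolically. The sketch below uses SymPy and is an independent illustration, not a copy of `lake_symbolic.py`; the variable names are assumptions, and all symbols are declared positive so the improper integrals converge.

```python
import sympy as sp

x, x0, r, g, t, H = sp.symbols("x x_0 r g t H", positive=True)

rate_pdf = sp.exp(-r / g) / g                  # Eq. 15-4
distance_pdf = sp.exp(-x / (g * t)) / (g * t)  # Eq. 15-5
depth_pdf = sp.exp(-x / H) / H                 # Eq. 15-8

# Change of variables x = r*t: p_x(x) = p_r(x/t) / t should match Eq. 15-5.
change_of_variables_gap = sp.simplify(rate_pdf.subs(r, x / t) / t - distance_pdf)

# Eq. 15-6 and Eq. 15-7: two routes to the same exceedance probability.
exceedance = sp.exp(-x0 / (g * t))
route_distance = sp.integrate(distance_pdf, (x, x0, sp.oo), conds="none")
route_rate = sp.integrate(rate_pdf, (r, x0 / t, sp.oo), conds="none")
route_distance_gap = sp.simplify(route_distance - exceedance)
route_rate_gap = sp.simplify(route_rate - exceedance)

# Eq. 15-9: average the exceedance probability over the depth distribution.
cdf_time = sp.integrate(depth_pdf * sp.exp(-x / (g * t)), (x, 0, sp.oo), conds="none")
lomax_gap = sp.simplify(cdf_time - 1 / (1 + H / (g * t)))

print(change_of_variables_gap, route_distance_gap, route_rate_gap, lomax_gap)
```

Each printed difference should simplify to zero if the corresponding identity holds.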
Eq. 15-10 — CDF in terms of collected volume $W = g t k$
\[C(W \mid H) = \frac{1}{1 + k H / W}\]

Setting $W = A$ (lake area) and $A_m = k H$ recovers Eq. 15-2.
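The integral in Eq. 15-9 equals the probability that an exponential rate draw, multiplied by $t$, exceeds an independent exponential depth draw, so it can also be estimated by sampling. The following Monte Carlo sketch uses only the Python standard library; the parameter values and the seed are arbitrary assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameter values, arbitrary units.
g, t, H, k = 0.8, 5.0, 3.0, 2.0
n_draws = 200_000

hits = 0
for _ in range(n_draws):
    rate = rng.expovariate(1.0 / g)   # exponential with mean g (Eq. 15-4)
    depth = rng.expovariate(1.0 / H)  # exponential with mean H (Eq. 15-8)
    if rate * t > depth:
        hits += 1

mc_estimate = hits / n_draws

# Closed forms: Eq. 15-9 in terms of t, Eq. 15-10 in terms of W = g*t*k.
cdf_time = 1.0 / (1.0 + H / (g * t))
W = g * t * k
cdf_volume = 1.0 / (1.0 + k * H / W)

print(mc_estimate, cdf_time, cdf_volume)
```

The two closed forms agree exactly because $kH/W = H/(gt)$, and the sampled estimate should agree with them to within Monte Carlo noise.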
| File | Purpose |
|---|---|
| `lake_symbolic.py` | Symbolic derivation of all equations using SymPy |
| `lake_numerical.py` | Numerical implementation, validation, and composite figure |
| `lake_model_output.png` | Output figure (6 panels) generated by `lake_numerical.py` |
Install dependencies (from models/requirements.txt):
pip install -r ../requirements.txt
Run symbolic derivation (all assertions print ✓):
python lake_symbolic.py
Run numerical model and generate figure:
MPLBACKEND=Agg python lake_numerical.py
Single-parameter model: the Lomax distribution has only one free parameter ($A_m$, the median lake area), which is set by observed data. There is no curve-fitting beyond this one number.
Two MaxEnt priors combine to give a Lomax: the exponential exceedance probability that follows from the rate distribution (Eq. 15-4), averaged over an exponential depth distribution (Eq. 15-8), integrates to a rational CDF (Eq. 15-9). This contrasts with the wind super-statistics (GEM Ch. 11, Eq. 11-15), where two exponentials combine to give a Bessel-$K$ distribution.
Power-law tail with exponent 1: for large areas $\bar{P}(A) \approx A_m/A$, giving the observed “reciprocal power law” in rank plots (GEM Ch. 15: “exponent usually appearing arbitrarily close to one”).
Universal character: the same Lomax CDF fits Northern Quebec lakes, Amazon lakes, and global lake inventories. All that is required is a median value for lake area (GEM Ch. 15).
No finite mean: for Lomax with $\kappa = 1$, $E[A]$ diverges, consistent with the observed heavy tails (a few very large lakes like the Great Lakes dominate the aggregate area).
Clouds and oil reservoirs share the same model: the identical mathematical structure governs cloud sizes (Eq. 15-3) and underground oil-reservoir volumes (GEM Ch. 14), demonstrating the universal character of the entropic dispersive-aggregation argument.
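Several of the points above (one free parameter, the $1/A$ tail, and the divergent mean) can be illustrated with synthetic data. The sketch below draws areas by inverting the CDF, $A = A_m\,u/(1-u)$ for uniform $u$, and uses only the Python standard library. The median value, sample size, seed, and function names are hypothetical; no real lake inventory is used.

```python
import math
import random
import statistics

A_M = 2.5          # hypothetical median lake area, arbitrary units
N_LAKES = 200_000  # hypothetical sample size
rng = random.Random(1)


def sample_area(median_area):
    """Inverse-CDF draw: solve u = A / (A + A_m) for A, with u in [0, 1)."""
    u = rng.random()
    return median_area * u / (1.0 - u)


areas = [sample_area(A_M) for _ in range(N_LAKES)]

# The single free parameter is recovered as the sample median.
median_estimate = statistics.median(areas)

# Fraction of lakes above a threshold, compared with A_m / (A + A_m).
threshold = 10.0 * A_M
empirical_ccdf = sum(a > threshold for a in areas) / N_LAKES
model_ccdf = A_M / (threshold + A_M)


def partial_mean(cutoff, median_area):
    """Integral of A * p(A) from 0 to cutoff, evaluated in closed form."""
    ratio = cutoff / median_area
    return median_area * (math.log1p(ratio) - cutoff / (cutoff + median_area))


# Raising the cutoff 100-fold adds roughly A_m * ln(100) each time,
# so the integral grows without bound and E[A] does not exist.
cutoffs = [1e2 * A_M, 1e4 * A_M, 1e6 * A_M]
partial_means = [partial_mean(c, A_M) for c in cutoffs]

print(median_estimate, empirical_ccdf, model_ccdf, partial_means)
```

Because the mean diverges, the sample mean of `areas` is dominated by the few largest draws and changes substantially from seed to seed, whereas the sample median is stable.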