Lake Size Distribution — GEM Chapter 15

Mathematical models for lake size distributions derived from first principles in Mathematical Geoenergy (GEM), Chapter 15 (Latent Energy: Hydrological Cycle).

Overview

The lake size distribution is an archetypal example of a dispersive aggregation problem driven by the Principle of Maximum Entropy (MaxEnt). The derivation in GEM Ch. 15 starts from two elementary observations:

Water accumulates in basins at a stochastic dispersal rate: the inflow rate $r$ is uncertain, and MaxEnt yields an exponential distribution for it when only the mean rate $g$ is known.
Basin depth (capacity) is also exponentially distributed: applying MaxEnt to the depth $x$ at which water can accumulate, with only its mean $H$ known, yields a second exponential.

Combining these two facts by integrating over all possible depths yields the Lomax (Pareto Type II) distribution for lake size with shape parameter $\kappa = 1$. This matches rank-order data for Northern Quebec lakes, Amazon lakes, and global lake inventories over several orders of magnitude with only a single free parameter: the median lake area $A_m$.

Equations

Part 1 — Lake Size Distribution

Eq. 15-1 — Lake size CDF (generic form)

\[P(\text{Size}) = \frac{1}{1 + \dfrac{a}{\text{Size}}}\]

General form of the CDF. The scale parameter $a$ equals the median, since $P = 1/2$ when $\text{Size} = a$.

Eq. 15-2 / 15-11 — Lake size CDF (median form)

\[P(\text{Size} \mid \text{Median}) = \frac{1}{1 + \dfrac{\text{Median}}{\text{Size}}} = \frac{\text{Size}}{\text{Size} + \text{Median}}\]

The single parameter $\text{Median} = A_m = k H$ (collection efficiency times mean basin depth) fully determines the distribution.
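
As a minimal sketch of Eq. 15-2 and its inverse, assuming plain Python floats: the function names `lake_cdf` and `lake_quantile` and the value of `A_m` are illustrative choices, not names from the repository scripts.

```python
def lake_cdf(area: float, median: float) -> float:
    """Eq. 15-2: fraction of lakes with area <= `area`."""
    if area < 0 or median <= 0:
        raise ValueError("area must be >= 0 and median must be > 0")
    return area / (area + median)


def lake_quantile(p: float, median: float) -> float:
    """Inverse of Eq. 15-2: the area below which a fraction p of lakes fall."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return median * p / (1.0 - p)


A_m = 2.0  # hypothetical median lake area, arbitrary units

# At area == median the CDF is one half by construction.
print("CDF at the median:", lake_cdf(A_m, A_m))

# Solving p = A / (A + A_m) for A gives A = A_m * p / (1 - p),
# so the 90th percentile sits near 9 * A_m.
print("90th percentile area:", lake_quantile(0.9, A_m))
```

The quantile function is handy for inverse-transform sampling, because a uniform random number passed through it yields a lake area that follows Eq. 15-2.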

PDF (derived by differentiation):

\[p(A) = \frac{A_m}{(A + A_m)^2}\]

Complementary CDF (fraction of lakes with area $> A$):

\[\bar{P}(A) = \frac{A_m}{A + A_m} \to \frac{A_m}{A} \quad (A \gg A_m)\]

The $1/A$ power-law tail at large sizes is the signature seen in ranked lake-size plots (GEM Figs. 15-1, 15-2).
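
The relations among the CDF, PDF, and complementary CDF can be checked numerically. This is a sketch under assumed values (`a_m = 1.0`, evaluation point `a = 3.0`, step `h`), none of which come from GEM. It compares a central finite difference of the CDF with the closed-form PDF, then shows how quickly the tail approaches $A_m/A$.

```python
def cdf(a, a_m):
    """Eq. 15-2 with Size = a and Median = a_m."""
    return a / (a + a_m)


def pdf(a, a_m):
    """Closed-form derivative of the CDF."""
    return a_m / (a + a_m) ** 2


def ccdf(a, a_m):
    """Fraction of lakes with area greater than a."""
    return a_m / (a + a_m)


a_m = 1.0   # assumed median area (arbitrary units)
a = 3.0     # assumed evaluation point
h = 1e-6    # finite-difference step

# A central difference of the CDF should reproduce the PDF.
numeric_pdf = (cdf(a + h, a_m) - cdf(a - h, a_m)) / (2.0 * h)
print("finite difference:", numeric_pdf, " closed form:", pdf(a, a_m))

# Ratio of the exact CCDF to the asymptote A_m / A; it tends to 1 for A >> A_m.
tail_ratios = {}
for a_big in (10.0, 100.0, 1000.0):
    tail_ratios[a_big] = ccdf(a_big, a_m) / (a_m / a_big)
    print(f"A = {a_big:7.1f}  CCDF / (A_m / A) = {tail_ratios[a_big]:.4f}")
```

The ratio equals $A/(A + A_m)$, so the pure power law is already a good approximation once $A$ exceeds the median by a factor of about ten.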

Part 2 — Derivation Steps

Eq. 15-4 — MaxEnt dispersal-rate distribution

\[p(r \mid g) = \frac{1}{g}\cdot e^{-r/g}, \quad r \geq 0\]

Applying MaxEnt with only a known mean rate $g$ yields an exponential distribution for the water-inflow rate $r$.

Eq. 15-5 — Distance PDF after change of variables $x = r t$

\[p(x \mid g, t) = \frac{1}{g t}\cdot e^{-x/(gt)}\]

At fixed time $t$, the dispersal distance $x = r t$ inherits the same exponential shape, with the mean rescaled from $g$ to $g t$.
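
A quick Monte Carlo check of this rescaling, offered as a sketch only: the values of `g` and `t`, the seed, and the sample size are arbitrary illustration choices, not values from GEM. Note that `random.expovariate` takes the rate parameter, which is the reciprocal of the mean.

```python
import math
import random

g = 2.0        # assumed mean dispersal rate (arbitrary units)
t = 3.0        # assumed elapsed time
n = 200_000    # sample size, chosen only to keep sampling noise small

rng = random.Random(0)  # fixed seed for reproducibility

# Eq. 15-4: r is exponential with mean g (expovariate expects 1 / mean).
rates = [rng.expovariate(1.0 / g) for _ in range(n)]

# Change of variables x = r * t at fixed t.
distances = [r * t for r in rates]

mean_x = sum(distances) / n
median_x = sorted(distances)[n // 2]

# Eq. 15-5 is an exponential with mean g * t, whose median is g * t * ln 2.
print(f"sample mean   = {mean_x:.3f}   predicted {g * t:.3f}")
print(f"sample median = {median_x:.3f}   predicted {g * t * math.log(2):.3f}")
```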

Eq. 15-6 / 15-7 — Exceedance probability

\[P(x > x_0 \mid g, t) = \int_{x_0}^{\infty} p(x \mid g, t)\,dx = e^{-x_0/(gt)}\]

Alternative derivation (Eq. 15-7), integrating the rate PDF over all rates $r > x_0/t$:

\[P(x > x_0 \mid g, t) = \int_{x_0/t}^{\infty} p(r \mid g)\,dr = e^{-x_0/(gt)}\]

Both routes give the same result.
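
The agreement of the two routes can be confirmed by numerical quadrature. The sketch below uses a hand-written composite midpoint rule (the helper `midpoint_integral` is hypothetical, not part of the repository) and assumed values for `g`, `t`, and `x0`. The infinite upper limits are truncated at 50 scale lengths, which drops a tail of relative size about $e^{-50}$.

```python
import math


def midpoint_integral(f, lo, hi, n=200_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


g, t, x0 = 2.0, 3.0, 4.0   # assumed illustration values
scale = g * t              # mean of the distance distribution


def p_x(x):
    """Eq. 15-5: distance PDF at fixed time t."""
    return math.exp(-x / scale) / scale


def p_r(r):
    """Eq. 15-4: rate PDF with mean g."""
    return math.exp(-r / g) / g


# Route 1 (Eq. 15-6): integrate the distance PDF above x0.
route_distance = midpoint_integral(p_x, x0, x0 + 50.0 * scale)

# Route 2 (Eq. 15-7): integrate the rate PDF above x0 / t.
route_rate = midpoint_integral(p_r, x0 / t, x0 / t + 50.0 * g)

closed_form = math.exp(-x0 / scale)

print("route 1 (distance PDF):", route_distance)
print("route 2 (rate PDF):    ", route_rate)
print("closed form:           ", closed_form)
```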

Eq. 15-8 — Basin-depth / height-capacity distribution

\[f(x \mid H) = \frac{1}{H}\cdot e^{-x/H}\]

The depth $x$ at which water can accumulate also follows MaxEnt with mean $H$.

Eq. 15-9 — Integration over basin depth → CDF

\[C(t \mid H) = \int_0^{\infty} f(x \mid H) \cdot e^{-x/(gt)}\,dx = \frac{1}{1 + H/(g t)}\]

The integrand is a product of two exponentials in $x$, so the integral evaluates in closed form to $\frac{1/H}{1/H + 1/(gt)}$, a rational function of $t$: the Lomax CDF.
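
One way to read Eq. 15-9 is as the probability that the dispersal distance $r t$ exceeds a basin depth $x$ drawn from Eq. 15-8. Under that reading, a Monte Carlo estimate should land near the closed form. This is a sketch with assumed values for `g`, `t`, and `H`; the seed and sample size are arbitrary.

```python
import random

g, t, H = 2.0, 3.0, 4.0   # assumed mean rate, elapsed time, mean depth
n = 200_000               # sample size (illustrative)

rng = random.Random(1)    # fixed seed for reproducibility

filled = 0
for _ in range(n):
    depth = rng.expovariate(1.0 / H)   # Eq. 15-8: exponential with mean H
    rate = rng.expovariate(1.0 / g)    # Eq. 15-4: exponential with mean g
    if rate * t > depth:               # distance r * t exceeds the depth
        filled += 1

mc_estimate = filled / n
closed_form = 1.0 / (1.0 + H / (g * t))   # Eq. 15-9

print(f"Monte Carlo estimate: {mc_estimate:.4f}")
print(f"Eq. 15-9 closed form: {closed_form:.4f}")
```

With these assumed values $H/(gt) = 2/3$, so the closed form is $0.6$; the sampled fraction should agree to within sampling noise of order $10^{-3}$.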

Eq. 15-10 — CDF in terms of collected volume $W = g t k$

\[C(W \mid H) = \frac{1}{1 + k H / W}\]

Setting $W = A$ (lake area) and $A_m = k H$ recovers Eq. 15-2.
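
The chain of substitutions can be followed with concrete numbers. In this sketch the function names and the values of `g`, `t`, `k`, and `H` are assumptions made for illustration; the point is only that the three forms return the same probability.

```python
import math


def cdf_time(t, g, H):
    """Eq. 15-9: CDF as a function of elapsed time t."""
    return 1.0 / (1.0 + H / (g * t))


def cdf_volume(W, k, H):
    """Eq. 15-10: the same CDF in terms of collected volume W = g * t * k."""
    return 1.0 / (1.0 + k * H / W)


def cdf_median(A, A_m):
    """Eq. 15-2: the same CDF in terms of area A and median A_m."""
    return A / (A + A_m)


g, t, k, H = 2.0, 3.0, 0.5, 4.0   # assumed illustration values

W = g * t * k    # collected volume, here 3.0
A_m = k * H      # median, here 2.0

c_time = cdf_time(t, g, H)
c_volume = cdf_volume(W, k, H)
c_median = cdf_median(W, A_m)   # setting A = W

print(c_time, c_volume, c_median)   # all three agree up to rounding
```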

Repository Files

| File | Purpose |
| --- | --- |
| `lake_symbolic.py` | Symbolic derivation of all equations using SymPy |
| `lake_numerical.py` | Numerical implementation, validation, and composite figure |
| `lake_model_output.png` | Output figure (6 panels) generated by `lake_numerical.py` |

Usage

Install dependencies (from models/requirements.txt):

pip install -r ../requirements.txt

Run symbolic derivation (all assertions print ✓):

python lake_symbolic.py

Run numerical model and generate figure:

MPLBACKEND=Agg python lake_numerical.py

Key Physical Insights

Single-parameter model: the Lomax distribution has only one free parameter ($A_m$, the median lake area), which is set by observed data. There is no curve-fitting beyond this one number.
Two MaxEnt priors combine to give a Lomax: the exceedance probability that follows from the exponential rate distribution (Eqs. 15-4 to 15-7) is averaged over the exponential depth distribution (Eq. 15-8), and the integral evaluates to a rational CDF (Eq. 15-9). This is a mixture (marginalization over depth), not a convolution. It contrasts with the wind super-statistics (GEM Ch. 11, Eq. 11-15), where two exponentials combine to give a Bessel-$K$ distribution.
Power-law tail with exponent 1: for large areas $\bar{P}(A) \approx A_m/A$, giving the observed “reciprocal power law” in rank plots (GEM Ch. 15: “exponent usually appearing arbitrarily close to one”).
Universal character: the same Lomax CDF fits Northern Quebec lakes, Amazon lakes, and global lake inventories. All that is required is a median value for lake area (GEM Ch. 15).
No finite mean: for Lomax with $\kappa = 1$, $E[A]$ diverges, consistent with the observed heavy tails (a few very large lakes like the Great Lakes dominate the aggregate area).
Clouds and oil reservoirs share the same model: the identical mathematical structure governs cloud sizes (Eq. 15-3) and underground oil-reservoir volumes (GEM Ch. 14), demonstrating the universal character of the entropic dispersive-aggregation argument.
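
Several of the points above (the median as the only parameter, the $1/A$ tail, and the divergent mean) can be illustrated by sampling. The sketch below is not taken from the repository scripts. It draws synthetic lake areas by inverse-transform sampling of Eq. 15-2 with an assumed `A_m = 1.0`. The helper `truncated_mean` evaluates $\int_0^L A\,p(A)\,dA = A_m\left[\ln\frac{L + A_m}{A_m} + \frac{A_m}{L + A_m} - 1\right]$, an expression worked out here from the PDF rather than quoted from GEM.

```python
import math
import random

A_m = 1.0      # assumed median lake area (arbitrary units)
n = 100_000    # number of synthetic lakes (illustrative)

rng = random.Random(42)   # fixed seed for reproducibility

# Inverse-transform sampling: solve u = A / (A + A_m) for A.
# random() returns values in [0, 1), so 1 - u is never zero.
areas = [A_m * u / (1.0 - u) for u in (rng.random() for _ in range(n))]

# The sample median should sit close to the single parameter A_m.
sample_median = sorted(areas)[n // 2]

# The fraction of lakes larger than 100 * A_m should be close to 1 / 101.
tail_fraction = sum(a > 100.0 * A_m for a in areas) / n
tail_predicted = A_m / (100.0 * A_m + A_m)


def truncated_mean(L, a_m):
    """Contribution to E[A] from lakes with area below L."""
    return a_m * (math.log((L + a_m) / a_m) + a_m / (L + a_m) - 1.0)


print(f"sample median = {sample_median:.3f}   (A_m = {A_m})")
print(f"P(A > 100 A_m): sampled {tail_fraction:.4f}, predicted {tail_predicted:.4f}")

# The truncated mean grows like A_m * ln(L / A_m), so E[A] has no finite limit.
for L in (1e3, 1e6, 1e9):
    print(f"L = {L:.0e}  truncated mean = {truncated_mean(L, A_m):.3f}")
```

Because the truncated mean grows logarithmically with the cutoff, the sample mean of any finite inventory is dominated by its largest few lakes, while the sample median stays stable.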

References

Jaynes, E. T. (1957). Information Theory and Statistical Mechanics. Physical Review, 106(4), 620–630.
Pukite, P., Challou, D., & Coyne, C. (2019). Mathematical Geoenergy: Discovery, Depletion, and Renewal. Wiley/IEEE Press. ISBN 978-1-119-43429-0 (Chapter 15).
Downing, J. A. et al. (2006). The global abundance and size distribution of lakes, ponds, and impoundments. Limnology and Oceanography, 51(5), 2388–2397.
Hamilton, S. K., Melack, J. M., Goodchild, M. F., & Lewis, W. (1992). Estimation of the fractal dimension of terrain from lake size distributions. Lowland Floodplain Rivers, Wiley, 145–163.
Cael, B. B. & Seekell, D. A. (2016). The size-distribution of Earth’s lakes. Scientific Reports, 6, 29633.