112. A Linear Model of Unemployment#
GPU
This lecture was built using a machine with access to a GPU — although it will also run without one.
Google Colab has a free tier with GPUs that you can access as follows:
Click on the “play” icon top right
Select Colab
Set the runtime environment to include a GPU
In addition to what’s in Anaconda, this lecture needs the following libraries.
We first install numpyro and jax:
!pip install numpyro jax
We also install pandas_datareader, which we use to download data from FRED:
!pip install pandas_datareader
112.1. Overview#
This lecture applies Bayesian estimation to a linear AR(1) model of the US unemployment rate.
We met the machinery in Posterior Distributions for AR(1) Parameters, but there the data were simulated and the focus was theoretical.
Here we work carefully through real data, organizing the lecture around a question that divided macroeconomists in the 1980s and 1990s: is unemployment a random walk?
In the process, we will observe an asymmetry the model cannot capture.
This will motivate a sequel lecture, titled Asymmetry and Large Shocks in Unemployment.
As in Posterior Distributions for AR(1) Parameters and Non-Conjugate Priors, we estimate by sampling posteriors with the NUTS sampler in NumPyro.
(See the introduction to NUTS in Non-Conjugate Priors for a brief account of how it works.)
Let’s start with some imports.
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import data as web
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS
112.2. The natural rate versus hysteresis#
In the early 1980s, Nelson and Plosser [1982] launched the unit-root literature.
They examined 14 macroeconomic time series and found that they could reject the random walk for only one of them.
Roughly speaking, this meant that, for most macroeconomic series, the effects of shocks looked permanent rather than transitory.
For unemployment, that view found expression in the hysteresis hypothesis of [Blanchard and Summers, 1986].
This hypothesis states that a shock to unemployment can be more or less permanent, because spells of joblessness erode skills and attachment to the labor force.
In contrast, the natural rate hypothesis of [Friedman, 1968] holds that unemployment fluctuates around a stable equilibrium rate — and hence shocks are transitory rather than permanent.
The debate matters because the two views imply different policies.
In particular, if shocks are permanent, a deep recession leaves a lasting scar, encouraging remedial action.
Here we reexamine the question with Bayesian estimation.
Note
There is an irony in the history described above.
While Nelson and Plosser [1982] fueled the unit root debate, and hence the hysteresis hypothesis, the one macroeconomic series they rejected the unit root for was the unemployment rate.
112.3. The data#
We use the US civilian unemployment rate, series UNRATE from FRED, monthly and seasonally adjusted.
start, end = dt.datetime(1948, 1, 1), dt.datetime(2024, 12, 31)
unrate = web.DataReader("UNRATE", "fred", start, end)["UNRATE"]
The COVID-19 spike of 2020 is an extreme outlier driven by events our models know nothing about, so we drop it.
pre_covid = unrate[unrate.index < "2020-01-01"]
u_monthly = pre_covid.to_numpy()
print(f"{len(u_monthly)} monthly observations")
864 monthly observations
Here is the monthly series.
fig, ax = plt.subplots()
ax.plot(pre_covid.index, u_monthly, lw=2)
ax.set_xlabel('year')
ax.set_ylabel('unemployment rate (%)')
plt.show()
Fig. 112.1 US monthly unemployment rate, 1948–2019#
As Fig. 112.1 shows, the rate rises sharply in recessions and drifts down in recoveries, but it always stays inside a band — roughly 3% to 11% over the whole post-war period.
(Keep that band in mind, since it connects back to the unit root debate, as we discuss below.)
112.4. A linear model of unemployment#
Let’s now set up and estimate our model, using monthly data.
112.4.1. The model#
We let unemployment be pulled back toward a normal level \(\bar u\):
with \(0 \le \phi < 1\).
This is a linear AR(1) model, written so that \(\bar u\) is the level the series reverts to and \(\phi\) measures persistence.
The closer \(\phi\) is to one, the more slowly the series returns to \(\bar u\) after a shock; the random walk is the limiting case \(\phi = 1\).
112.4.2. Priors#
We treat \(\bar u\), \(\phi\) and \(\sigma\) as unknown and place weakly informative priors on them.
We give \(\phi\) a uniform prior on \([0, 1)\).
The upper endpoint is excluded deliberately, since doing so confines us to the stationary region.
This is necessary, as we’ll see below.
At the same time, the prior still lets \(\phi\) approach one as closely as the data demand.
We center \(\bar u\) on a plausible natural rate with a fairly wide normal prior, and give the shock scale \(\sigma\) a half-normal prior.
We write the model as a NumPyro function: each numpyro.sample introduces a random variable, and the keyword obs= ties the last one to the data, supplying the likelihood.
def linear_model(u):
ubar = numpyro.sample("ubar", dist.Normal(5.5, 2.0)) # Natural rate prior
φ = numpyro.sample("phi", dist.Uniform(0.0, 1.0)) # Persistence prior
σ = numpyro.sample("sigma", dist.HalfNormal(1.0)) # Volatility prior
μ = ubar + φ * (u[:-1] - ubar)
numpyro.sample("u_obs", dist.Normal(μ, σ), obs=u[1:])
The vector μ holds the conditional means \(\bar u + \phi(u_t - \bar u)\), and obs=u[1:] says that each next value is drawn from \(N(\mu_t, \sigma^2)\).
This one statement encodes the whole likelihood. (See Non-Conjugate Priors for more on writing NumPyro models.)
112.4.3. Estimation#
We sample the posterior with NUTS, running four chains so that we can check convergence.
We use chain_method="vectorized", which evaluates all chains together on a single device, so the same code runs unchanged on a CPU or a GPU.
def run_nuts(model, data, seed=0, num_warmup=1000, num_samples=2000, num_chains=4):
"Sample a NumPyro model with the NUTS sampler."
mcmc = MCMC(NUTS(model),
num_warmup=num_warmup, num_samples=num_samples,
num_chains=num_chains, chain_method="vectorized",
progress_bar=False)
mcmc.run(random.PRNGKey(seed), jnp.asarray(data))
return mcmc
We fit the model to the monthly data.
mcmc_monthly = run_nuts(linear_model, u_monthly)
Now we inspect the output.
mcmc_monthly.print_summary()
mean std median 5.0% 95.0% n_eff r_hat
phi 0.99 0.00 0.99 0.99 1.00 3607.29 1.00
sigma 0.21 0.01 0.21 0.20 0.22 4086.14 1.00
ubar 5.66 1.17 5.69 3.72 7.51 3250.21 1.00
Number of divergences: 0
Each row summarizes the posterior for one parameter.
The mean, median, and 5.0%/95.0% columns give the posterior mean, median, and a 90% credible interval; std is the posterior standard deviation.
The last two columns are convergence diagnostics: n_eff is the effective number of independent draws, and r_hat compares variation within and across chains — a value very close to \(1.0\) means the chains agree and the sampler has converged.
Here r_hat is essentially one and n_eff is large, so we can trust the draws.
The number that matters for us is the posterior for \(\phi\): its mass crowds right up against one.
In particular, the mean and median are very close to one, while the standard deviation is very small.
In other words, at a monthly frequency, US unemployment is almost a random walk.
This is the hysteresis boundary.
Thus, we find that the natural-rate and hysteresis views are nearly indistinguishable in the monthly data — the estimate is consistent with both.
This is why the unit-root tests of the era struggled to settle the debate [Røed, 1997].
112.5. A random walk would wander off#
While the estimation above seems to give reasonable support to the unit root hypothesis, there is a good reason to view it as false.
To see the argument, suppose that unemployment really is a pure random walk, with \(\phi = 1\):
Then \(u_t = u_0 + \sum_{s=1}^t \varepsilon_s\), so its variance grows without bound: \(\operatorname{Var}(u_t) = t\sigma^2\).
The distribution spreads out forever, and eventually probability mass leaves every bounded interval.
We can see this by simulating many random-walk paths, using the observed one-month changes to set the shock size, and watching them fan out.
rng = np.random.default_rng(0)
T = len(u_monthly)
σ_rw = np.diff(u_monthly).std()
paths = u_monthly[0] + np.cumsum(rng.normal(0, σ_rw, size=(400, T)), axis=1)
u_min, u_max = u_monthly.min(), u_monthly.max()
fig, ax = plt.subplots()
ax.plot(paths[:60].T, color='C0', lw=0.5, alpha=0.3)
ax.axhspan(u_min, u_max, color='C1', alpha=0.15, label='observed range')
ax.axhline(u_min, color='C1', ls='--', lw=1.5)
ax.axhline(u_max, color='C1', ls='--', lw=1.5)
ax.set_xlabel('months since 1948')
ax.set_ylabel('unemployment rate (%)')
ax.legend()
plt.show()
Fig. 112.2 Random walks leave the observed range#
In Fig. 112.2 the dashed lines mark the lowest and highest unemployment rates ever seen in the data, and the shaded region is the band between them.
The simulated paths fan out like \(\sqrt{t}\) and quickly spread far beyond this band, including into negative rates.
A random walk has no anchor, but unemployment clearly does — it has stayed inside a narrow band for seventy years.
So we can already rule out an exact random walk.
This will become even clearer when we examine annual data.
112.6. Monthly versus annual#
The monthly data leave \(\phi\) pinned against one.
Does a different frequency allow us to see something more?
We form an annual series from end-of-year values and fit the same model to it.
u_annual = pre_covid.resample("YE").last().to_numpy()
print(f"{len(u_annual)} annual observations")
72 annual observations
We fit the same model to this shorter series.
mcmc_annual = run_nuts(linear_model, u_annual)
Now we compare the posterior for \(\phi\) across the two frequencies.
φ_m = np.asarray(mcmc_monthly.get_samples()["phi"])
φ_a = np.asarray(mcmc_annual.get_samples()["phi"])
fig, ax = plt.subplots()
ax.hist(φ_m, bins=50, density=True, alpha=0.6, label='monthly')
ax.hist(φ_a, bins=50, density=True, alpha=0.6, label='annual')
ax.set_xlabel('$\\phi$')
ax.legend()
plt.show()
Fig. 112.3 Posterior for the persistence parameter#
Fig. 112.3 illustrates our main finding: the annual posterior for \(\phi\) sits well below one, with clear reversion, while the monthly posterior pushes up against the boundary.
This is not a contradiction.
If the monthly persistence is \(\phi\), then for end-of-year values the persistence is about \(\phi^{12}\), and raising a number near one to the twelfth power pulls it appreciably below one — in line with our annual estimate.
112.7. What the model misses#
Looking again at Fig. 112.1, we notice that unemployment jumps up quickly in recessions and drifts down slowly in recoveries.
This is an asymmetry that our current model cannot replicate.
To see why, we look closely at the one part of the model that is random — the shocks.
We proceed in three steps: state what the model assumes about the shocks, recover them from the data, and compare the two.
112.7.1. What the model assumes about the shocks#
Given last period’s rate \(u_t\) and the parameters, the next rate is a deterministic conditional mean plus a shock:
Rearranging, the shock is just the gap between what happened and what the model expected:
The model makes a strong and testable claim about these shocks: they are independent draws from a symmetric normal distribution.
If that claim holds, the shocks we recover from the data should look like a bell curve.
If it fails, the way it fails will tell us what the model is missing.
112.7.2. Recovering the shocks#
We cannot read the shocks off directly, because we do not know the parameters \(\bar u\) and \(\phi\).
So we estimate them, plugging a single representative value in for each.
We use the posterior medians — a reasonable choice for a quick diagnostic.
med = {k: np.median(np.asarray(mcmc_monthly.get_samples()[k]))
for k in ("ubar", "phi", "sigma")}
resid = u_monthly[1:] - (med["ubar"] + med["phi"] * (u_monthly[:-1] - med["ubar"]))
The resid array holds our estimated shocks, one for each month-to-month transition — the model’s residuals.
The slicing is what lines up each month with the one before it.
Each entry of resid is \(u_{t+1} - \big(\hat{\bar u} + \hat\phi\,(u_t - \hat{\bar u})\big)\), the model’s one-step-ahead forecast error, with hats denoting the median estimates.
(Because our estimate \(\hat\phi\) is so close to one, this is nearly just the monthly change \(u_{t+1} - u_t\).)
112.7.3. Comparing with the Gaussian#
Now we ask whether these residuals look Gaussian.
We overlay a normal density whose standard deviation we set equal to that of the residuals themselves.
This is deliberate: the residuals already have mean near zero, so matching the variance makes the mean and the spread agree.
Anything left over is then a difference in shape — which is what we want to isolate.
One way to measure the shape is the skewness, the third standardized moment:
This measure is zero for any symmetric distribution and positive when the right tail is the longer one.
We now plot the residuals and compute their skewness:
def skewness(x):
x = x - x.mean()
return (x**3).mean() / x.std()**3
fig, ax = plt.subplots()
ax.hist(resid, bins=60, density=True, alpha=0.6, label='residuals')
grid = np.linspace(resid.min(), resid.max(), 200)
gauss = np.exp(-grid**2 / (2 * resid.std()**2)) / (resid.std() * np.sqrt(2 * np.pi))
ax.plot(grid, gauss, 'C1', lw=2, label='symmetric Gaussian')
ax.set_xlabel('one-month change not explained by the model')
ax.legend()
plt.show()
print(f"residual skewness = {skewness(resid):.2f}")
Fig. 112.4 Model residuals are right-skewed#
residual skewness = 0.39
The residuals in Fig. 112.4 depart from the Gaussian in two ways.
They are heavy-tailed and sharply peaked: more tiny changes, and more large ones, than a bell curve allows.
Moreover, they are right-skewed, with the largest surprises on the upside (recessions).
The symmetric Gaussian (orange) can match neither feature: it treats upward and downward shocks as equally likely and has no room for the occasional very large jump.
So our model is bound to misread the data, seeing the rare jump up and the long gentle slide down as the same kind of shock.
112.7.4. A note on the plug-in#
A Bayesian purist would object that there is no single residual series here.
Every posterior draw of \((\bar u, \phi)\) implies its own, and we simply chose the medians.
That is fair, and the fully Bayesian version of this check — simulating whole datasets from the posterior and comparing a summary statistic — is what we do in Asymmetry and Large Shocks in Unemployment.
This plug-in check is a quick preview of that fuller test.
Capturing the asymmetry is the task of that next lecture, where we model the shocks themselves rather than the reversion curve.
112.8. Exercises#
The lecture Forecasting an AR(1) Process forecasts an AR(1) process by simulating future paths, considering both a predictive distribution that conditions on fixed parameter values and one that integrates over their posterior uncertainty.
The following exercises apply these ideas to our fitted unemployment model.
Exercise 112.1
Using the fitted annual model, forecast unemployment over the next \(H = 15\) years, starting from the last observed value, in two ways:
plug-in: hold the parameters at their posterior medians and simulate many future paths;
extended: for each future path, draw a fresh \((\bar u, \phi, \sigma)\) from the posterior.
Plot the 90% predictive band for each on the same axes and compare.
Which band is wider, and why?
Solution
We use the annual posterior draws and simulate forward from the last observation.
We work with the annual model because its persistence \(\phi\) is far less certain than at the monthly frequency, so parameter uncertainty has more to say.
post = mcmc_annual.get_samples()
ubar_s = np.asarray(post["ubar"])
φ_s = np.asarray(post["phi"])
σ_s = np.asarray(post["sigma"])
def sim_future(u_last, ubar, φ, σ, H, rng):
"Simulate H steps of the linear model forward from u_last."
u = np.empty(H)
prev = u_last
for h in range(H):
prev = ubar + φ * (prev - ubar) + rng.normal(0, σ)
u[h] = prev
return u
H, N = 15, 2000
u_last = u_annual[-1]
rng = np.random.default_rng(0)
# plug-in: parameters fixed at posterior medians
ub0, φ0, σ0 = np.median(ubar_s), np.median(φ_s), np.median(σ_s)
plug = np.array([sim_future(u_last, ub0, φ0, σ0, H, rng) for _ in range(N)])
# extended: a fresh posterior draw for each path
idx = rng.integers(0, len(φ_s), N)
ext = np.array([sim_future(u_last, ubar_s[i], φ_s[i], σ_s[i], H, rng)
for i in idx])
fig, ax = plt.subplots()
horizon = np.arange(1, H + 1)
for data, c, lab in [(plug, 'C0', 'plug-in'), (ext, 'C1', 'extended')]:
lo, hi = np.percentile(data, [5, 95], axis=0)
ax.fill_between(horizon, lo, hi, alpha=0.3, color=c, label=lab)
ax.axhline(u_last, color='k', lw=0.5)
ax.set_xlabel('years ahead')
ax.set_ylabel('unemployment rate (%)')
ax.legend()
plt.show()
Both forecasts drift up from the low starting value toward the reversion level \(\bar u\), with bands that widen as we look further ahead.
The extended band is clearly wider, because it adds uncertainty about the parameters on top of the uncertainty from future shocks — and here the persistence \(\phi\) and the level \(\bar u\) are both genuinely uncertain.
Exercise 112.2
Following Wecker (see Forecasting an AR(1) Process), we can also form a predictive distribution for a path statistic — a nonlinear function of the whole future path.
Take the maximum unemployment rate over the next \(H = 8\) years as the statistic: a simple measure of how bad the next several years might get.
Using the extended simulation (a posterior draw per path), compute this maximum for each path and plot its predictive distribution.
What is the posterior predictive probability that unemployment reaches at least \(7\%\) — recession territory — at some point over the next eight years?
Solution
We reuse sim_future and the posterior draws from the previous exercise.
H2, M = 8, 5000
idx = rng.integers(0, len(φ_s), M)
peak = np.array([sim_future(u_last, ubar_s[i], φ_s[i], σ_s[i], H2, rng).max()
for i in idx])
fig, ax = plt.subplots()
ax.hist(peak, bins=50, density=True, alpha=0.6)
ax.axvline(u_last, color='C3', lw=2, label='current rate')
ax.set_xlabel('maximum unemployment over next 8 years (%)')
ax.legend()
plt.show()
prob = (peak > 7.0).mean()
print(f"P(unemployment reaches 7% within 8 years) = {prob:.2f}")
P(unemployment reaches 7% within 8 years) = 0.32
The predictive distribution summarizes the plausible “worst case” over the next several years, integrating over both shocks and parameter uncertainty.
Starting from a cyclical low, the model sees a substantial chance of a return to recession-level unemployment within the horizon.