112. A Linear Model of Unemployment#

GPU

This lecture was built using a machine with access to a GPU — although it will also run without one.

Google Colab has a free tier with GPUs that you can access as follows:

  1. Click on the “play” icon top right

  2. Select Colab

  3. Set the runtime environment to include a GPU

In addition to what’s in Anaconda, this lecture needs the following libraries.

We first install numpyro and jax:

!pip install numpyro jax

Hide code cell output

Requirement already satisfied: numpyro in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (0.21.0)
Requirement already satisfied: jax in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (0.10.2)
Requirement already satisfied: jaxlib>=0.7.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (0.10.2)
Requirement already satisfied: multipledispatch in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (1.0.0)
Requirement already satisfied: numpy in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (2.4.6)
Requirement already satisfied: tqdm in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from numpyro) (4.68.2)
Requirement already satisfied: ml_dtypes>=0.5.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from jax) (0.5.4)
Requirement already satisfied: opt_einsum in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from jax) (3.4.0)
Requirement already satisfied: scipy>=1.14 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from jax) (1.17.1)

We also install pandas_datareader, which we use to download data from FRED:

!pip install pandas_datareader

Hide code cell output

Collecting pandas_datareader
  Downloading pandas_datareader-0.11.1-py3-none-any.whl.metadata (3.8 kB)
Collecting lxml (from pandas_datareader)
  Downloading lxml-6.1.1-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.metadata (3.5 kB)
Requirement already satisfied: pandas>=2.1.4 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas_datareader) (3.0.3)
Requirement already satisfied: requests>=2.19.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas_datareader) (2.34.2)
Requirement already satisfied: setuptools in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas_datareader) (82.0.1)
Requirement already satisfied: numpy>=1.26.0 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas>=2.1.4->pandas_datareader) (2.4.6)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from pandas>=2.1.4->pandas_datareader) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas>=2.1.4->pandas_datareader) (1.17.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from requests>=2.19.0->pandas_datareader) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from requests>=2.19.0->pandas_datareader) (3.18)
Requirement already satisfied: urllib3<3,>=1.26 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from requests>=2.19.0->pandas_datareader) (2.7.0)
Requirement already satisfied: certifi>=2023.5.7 in /home/runner/miniconda3/envs/quantecon/lib/python3.13/site-packages (from requests>=2.19.0->pandas_datareader) (2026.5.20)
Downloading pandas_datareader-0.11.1-py3-none-any.whl (65 kB)
Downloading lxml-6.1.1-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl (5.2 MB)
?25l   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/5.2 MB ? eta -:--:--
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 130.8 MB/s  0:00:00
?25h
Installing collected packages: lxml, pandas_datareader
?25l
   ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 1/2 [pandas_datareader]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/2 [pandas_datareader]

Successfully installed lxml-6.1.1 pandas_datareader-0.11.1

112.1. Overview#

This lecture applies Bayesian estimation to a linear AR(1) model of the US unemployment rate.

We met the machinery in Posterior Distributions for AR(1) Parameters, but there the data were simulated and the focus was theoretical.

Here we work carefully through real data, organizing the lecture around a question that divided macroeconomists in the 1980s and 1990s: is unemployment a random walk?

In the process, we will observe an asymmetry the model cannot capture.

This will motivate a sequel lecture, titled Asymmetry and Large Shocks in Unemployment.

As in Posterior Distributions for AR(1) Parameters and Non-Conjugate Priors, we estimate by sampling posteriors with the NUTS sampler in NumPyro.

(See the introduction to NUTS in Non-Conjugate Priors for a brief account of how it works.)

Let’s start with some imports.

import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from pandas_datareader import data as web

import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

112.2. The natural rate versus hysteresis#

In the early 1980s, Nelson and Plosser [1982] launched the unit-root literature.

They examined 14 macroeconomic time series and found that they could reject the random walk for only one of them.

Roughly speaking, this meant that, for most macroeconomic series, the effects of shocks looked permanent rather than transitory.

For unemployment, that view found expression in the hysteresis hypothesis of [Blanchard and Summers, 1986].

This hypothesis states that a shock to unemployment can be more or less permanent, because spells of joblessness erode skills and attachment to the labor force.

In contrast, the natural rate hypothesis of [Friedman, 1968] holds that unemployment fluctuates around a stable equilibrium rate — and hence shocks are transitory rather than permanent.

The debate matters because the two views imply different policies.

In particular, if shocks are permanent, a deep recession leaves a lasting scar, encouraging remedial action.

Here we reexamine the question with Bayesian estimation.

Note

There is an irony in the history described above.

While Nelson and Plosser [1982] fueled the unit root debate, and hence the hysteresis hypothesis, the one macroeconomic series they rejected the unit root for was the unemployment rate.

112.3. The data#

We use the US civilian unemployment rate, series UNRATE from FRED, monthly and seasonally adjusted.

start, end = dt.datetime(1948, 1, 1), dt.datetime(2024, 12, 31)
unrate = web.DataReader("UNRATE", "fred", start, end)["UNRATE"]

The COVID-19 spike of 2020 is an extreme outlier driven by events our models know nothing about, so we drop it.

pre_covid = unrate[unrate.index < "2020-01-01"]
u_monthly = pre_covid.to_numpy()
print(f"{len(u_monthly)} monthly observations")
864 monthly observations

Here is the monthly series.

fig, ax = plt.subplots()
ax.plot(pre_covid.index, u_monthly, lw=2)
ax.set_xlabel('year')
ax.set_ylabel('unemployment rate (%)')
plt.show()
_images/824c2ef789f77338e27deb01d71f70fa9e4763f4407d8f9f78a4a1e25facb346.png

Fig. 112.1 US monthly unemployment rate, 1948–2019#

As Fig. 112.1 shows, the rate rises sharply in recessions and drifts down in recoveries, but it always stays inside a band — roughly 3% to 11% over the whole post-war period.

(Keep that band in mind, since it connects back to the unit root debate, as we discuss below.)

112.4. A linear model of unemployment#

Let’s now set up and estimate our model, using monthly data.

112.4.1. The model#

We let unemployment be pulled back toward a normal level \(\bar u\):

(112.1)#\[ u_{t+1} = \bar u + \phi\,(u_t - \bar u) + \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, \sigma^2), \]

with \(0 \le \phi < 1\).

This is a linear AR(1) model, written so that \(\bar u\) is the level the series reverts to and \(\phi\) measures persistence.

The closer \(\phi\) is to one, the more slowly the series returns to \(\bar u\) after a shock; the random walk is the limiting case \(\phi = 1\).

112.4.2. Priors#

We treat \(\bar u\), \(\phi\) and \(\sigma\) as unknown and place weakly informative priors on them.

We give \(\phi\) a uniform prior on \([0, 1)\).

The upper endpoint is excluded deliberately, since doing so confines us to the stationary region.

This is necessary, as we’ll see below.

At the same time, the prior still lets \(\phi\) approach one as closely as the data demand.

We center \(\bar u\) on a plausible natural rate with a fairly wide normal prior, and give the shock scale \(\sigma\) a half-normal prior.

We write the model as a NumPyro function: each numpyro.sample introduces a random variable, and the keyword obs= ties the last one to the data, supplying the likelihood.

def linear_model(u):
    ubar = numpyro.sample("ubar",  dist.Normal(5.5, 2.0))      # Natural rate prior
    φ    = numpyro.sample("phi",   dist.Uniform(0.0, 1.0))     # Persistence prior
    σ    = numpyro.sample("sigma", dist.HalfNormal(1.0))       # Volatility prior
    μ = ubar + φ * (u[:-1] - ubar)
    numpyro.sample("u_obs", dist.Normal(μ, σ), obs=u[1:])

The vector μ holds the conditional means \(\bar u + \phi(u_t - \bar u)\), and obs=u[1:] says that each next value is drawn from \(N(\mu_t, \sigma^2)\).

This one statement encodes the whole likelihood. (See Non-Conjugate Priors for more on writing NumPyro models.)

112.4.3. Estimation#

We sample the posterior with NUTS, running four chains so that we can check convergence.

We use chain_method="vectorized", which evaluates all chains together on a single device, so the same code runs unchanged on a CPU or a GPU.

def run_nuts(model, data, seed=0, num_warmup=1000, num_samples=2000, num_chains=4):
    "Sample a NumPyro model with the NUTS sampler."
    mcmc = MCMC(NUTS(model),
                num_warmup=num_warmup, num_samples=num_samples,
                num_chains=num_chains, chain_method="vectorized",
                progress_bar=False)
    mcmc.run(random.PRNGKey(seed), jnp.asarray(data))
    return mcmc

We fit the model to the monthly data.

mcmc_monthly = run_nuts(linear_model, u_monthly)

Now we inspect the output.

mcmc_monthly.print_summary()
                mean       std    median      5.0%     95.0%     n_eff     r_hat
       phi      0.99      0.00      0.99      0.99      1.00   3607.29      1.00
     sigma      0.21      0.01      0.21      0.20      0.22   4086.14      1.00
      ubar      5.66      1.17      5.69      3.72      7.51   3250.21      1.00

Number of divergences: 0

Each row summarizes the posterior for one parameter.

The mean, median, and 5.0%/95.0% columns give the posterior mean, median, and a 90% credible interval; std is the posterior standard deviation.

The last two columns are convergence diagnostics: n_eff is the effective number of independent draws, and r_hat compares variation within and across chains — a value very close to \(1.0\) means the chains agree and the sampler has converged.

Here r_hat is essentially one and n_eff is large, so we can trust the draws.

The number that matters for us is the posterior for \(\phi\): its mass crowds right up against one.

In particular, the mean and median are very close to one, while the standard deviation is very small.

In other words, at a monthly frequency, US unemployment is almost a random walk.

This is the hysteresis boundary.

Thus, we find that the natural-rate and hysteresis views are nearly indistinguishable in the monthly data — the estimate is consistent with both.

This is why the unit-root tests of the era struggled to settle the debate [Røed, 1997].

112.5. A random walk would wander off#

While the estimation above seems to give reasonable support to the unit root hypothesis, there is a good reason to view it as false.

To see the argument, suppose that unemployment really is a pure random walk, with \(\phi = 1\):

\[ u_{t+1} = u_t + \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, \sigma^2). \]

Then \(u_t = u_0 + \sum_{s=1}^t \varepsilon_s\), so its variance grows without bound: \(\operatorname{Var}(u_t) = t\sigma^2\).

The distribution spreads out forever, and eventually probability mass leaves every bounded interval.

We can see this by simulating many random-walk paths, using the observed one-month changes to set the shock size, and watching them fan out.

rng = np.random.default_rng(0)
T = len(u_monthly)
σ_rw = np.diff(u_monthly).std()
paths = u_monthly[0] + np.cumsum(rng.normal(0, σ_rw, size=(400, T)), axis=1)

u_min, u_max = u_monthly.min(), u_monthly.max()

fig, ax = plt.subplots()
ax.plot(paths[:60].T, color='C0', lw=0.5, alpha=0.3)
ax.axhspan(u_min, u_max, color='C1', alpha=0.15, label='observed range')
ax.axhline(u_min, color='C1', ls='--', lw=1.5)
ax.axhline(u_max, color='C1', ls='--', lw=1.5)
ax.set_xlabel('months since 1948')
ax.set_ylabel('unemployment rate (%)')
ax.legend()
plt.show()
_images/6573459b9655f09653250fc08fd2f20e3613f4580b954504dcba0bf8a2d03530.png

Fig. 112.2 Random walks leave the observed range#

In Fig. 112.2 the dashed lines mark the lowest and highest unemployment rates ever seen in the data, and the shaded region is the band between them.

The simulated paths fan out like \(\sqrt{t}\) and quickly spread far beyond this band, including into negative rates.

A random walk has no anchor, but unemployment clearly does — it has stayed inside a narrow band for seventy years.

So we can already rule out an exact random walk.

This will become even clearer when we examine annual data.

112.6. Monthly versus annual#

The monthly data leave \(\phi\) pinned against one.

Does a different frequency allow us to see something more?

We form an annual series from end-of-year values and fit the same model to it.

u_annual = pre_covid.resample("YE").last().to_numpy()
print(f"{len(u_annual)} annual observations")
72 annual observations

We fit the same model to this shorter series.

mcmc_annual = run_nuts(linear_model, u_annual)

Now we compare the posterior for \(\phi\) across the two frequencies.

φ_m = np.asarray(mcmc_monthly.get_samples()["phi"])
φ_a = np.asarray(mcmc_annual.get_samples()["phi"])

fig, ax = plt.subplots()
ax.hist(φ_m, bins=50, density=True, alpha=0.6, label='monthly')
ax.hist(φ_a, bins=50, density=True, alpha=0.6, label='annual')
ax.set_xlabel('$\\phi$')
ax.legend()
plt.show()
_images/928e2320e64184a0740f25752218870522e55177a829cbf506e4a27c2cf7789d.png

Fig. 112.3 Posterior for the persistence parameter#

Fig. 112.3 illustrates our main finding: the annual posterior for \(\phi\) sits well below one, with clear reversion, while the monthly posterior pushes up against the boundary.

This is not a contradiction.

If the monthly persistence is \(\phi\), then for end-of-year values the persistence is about \(\phi^{12}\), and raising a number near one to the twelfth power pulls it appreciably below one — in line with our annual estimate.

112.7. What the model misses#

Looking again at Fig. 112.1, we notice that unemployment jumps up quickly in recessions and drifts down slowly in recoveries.

This is an asymmetry that our current model cannot replicate.

To see why, we look closely at the one part of the model that is random — the shocks.

We proceed in three steps: state what the model assumes about the shocks, recover them from the data, and compare the two.

112.7.1. What the model assumes about the shocks#

Given last period’s rate \(u_t\) and the parameters, the next rate is a deterministic conditional mean plus a shock:

\[ u_{t+1} = \underbrace{\bar u + \phi\,(u_t - \bar u)}_{\text{conditional mean}} + \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, \sigma^2). \]

Rearranging, the shock is just the gap between what happened and what the model expected:

\[ \varepsilon_{t+1} = u_{t+1} - \big(\bar u + \phi\,(u_t - \bar u)\big). \]

The model makes a strong and testable claim about these shocks: they are independent draws from a symmetric normal distribution.

If that claim holds, the shocks we recover from the data should look like a bell curve.

If it fails, the way it fails will tell us what the model is missing.

112.7.2. Recovering the shocks#

We cannot read the shocks off directly, because we do not know the parameters \(\bar u\) and \(\phi\).

So we estimate them, plugging a single representative value in for each.

We use the posterior medians — a reasonable choice for a quick diagnostic.

med = {k: np.median(np.asarray(mcmc_monthly.get_samples()[k]))
       for k in ("ubar", "phi", "sigma")}
resid = u_monthly[1:] - (med["ubar"] + med["phi"] * (u_monthly[:-1] - med["ubar"]))

The resid array holds our estimated shocks, one for each month-to-month transition — the model’s residuals.

The slicing is what lines up each month with the one before it.

Each entry of resid is \(u_{t+1} - \big(\hat{\bar u} + \hat\phi\,(u_t - \hat{\bar u})\big)\), the model’s one-step-ahead forecast error, with hats denoting the median estimates.

(Because our estimate \(\hat\phi\) is so close to one, this is nearly just the monthly change \(u_{t+1} - u_t\).)

112.7.3. Comparing with the Gaussian#

Now we ask whether these residuals look Gaussian.

We overlay a normal density whose standard deviation we set equal to that of the residuals themselves.

This is deliberate: the residuals already have mean near zero, so matching the variance makes the mean and the spread agree.

Anything left over is then a difference in shape — which is what we want to isolate.

One way to measure the shape is the skewness, the third standardized moment:

\[ \text{skew} = \frac{\frac1n \sum_i (\varepsilon_i - \bar\varepsilon)^3}{\Big(\frac1n \sum_i (\varepsilon_i - \bar\varepsilon)^2\Big)^{3/2}}. \]

This measure is zero for any symmetric distribution and positive when the right tail is the longer one.

We now plot the residuals and compute their skewness:

def skewness(x):
    x = x - x.mean()
    return (x**3).mean() / x.std()**3

fig, ax = plt.subplots()
ax.hist(resid, bins=60, density=True, alpha=0.6, label='residuals')
grid = np.linspace(resid.min(), resid.max(), 200)
gauss = np.exp(-grid**2 / (2 * resid.std()**2)) / (resid.std() * np.sqrt(2 * np.pi))
ax.plot(grid, gauss, 'C1', lw=2, label='symmetric Gaussian')
ax.set_xlabel('one-month change not explained by the model')
ax.legend()
plt.show()

print(f"residual skewness = {skewness(resid):.2f}")
_images/87157e2fee7892f738ad49a45a06c818778ee9e301a82becec94958c2262b0b5.png

Fig. 112.4 Model residuals are right-skewed#

residual skewness = 0.39

The residuals in Fig. 112.4 depart from the Gaussian in two ways.

They are heavy-tailed and sharply peaked: more tiny changes, and more large ones, than a bell curve allows.

Moreover, they are right-skewed, with the largest surprises on the upside (recessions).

The symmetric Gaussian (orange) can match neither feature: it treats upward and downward shocks as equally likely and has no room for the occasional very large jump.

So our model is bound to misread the data, seeing the rare jump up and the long gentle slide down as the same kind of shock.

112.7.4. A note on the plug-in#

A Bayesian purist would object that there is no single residual series here.

Every posterior draw of \((\bar u, \phi)\) implies its own, and we simply chose the medians.

That is fair, and the fully Bayesian version of this check — simulating whole datasets from the posterior and comparing a summary statistic — is what we do in Asymmetry and Large Shocks in Unemployment.

This plug-in check is a quick preview of that fuller test.

Capturing the asymmetry is the task of that next lecture, where we model the shocks themselves rather than the reversion curve.

112.8. Exercises#

The lecture Forecasting an AR(1) Process forecasts an AR(1) process by simulating future paths, considering both a predictive distribution that conditions on fixed parameter values and one that integrates over their posterior uncertainty.

The following exercises apply these ideas to our fitted unemployment model.

Exercise 112.1

Using the fitted annual model, forecast unemployment over the next \(H = 15\) years, starting from the last observed value, in two ways:

  1. plug-in: hold the parameters at their posterior medians and simulate many future paths;

  2. extended: for each future path, draw a fresh \((\bar u, \phi, \sigma)\) from the posterior.

Plot the 90% predictive band for each on the same axes and compare.

Which band is wider, and why?

Exercise 112.2

Following Wecker (see Forecasting an AR(1) Process), we can also form a predictive distribution for a path statistic — a nonlinear function of the whole future path.

Take the maximum unemployment rate over the next \(H = 8\) years as the statistic: a simple measure of how bad the next several years might get.

Using the extended simulation (a posterior draw per path), compute this maximum for each path and plot its predictive distribution.

What is the posterior predictive probability that unemployment reaches at least \(7\%\) — recession territory — at some point over the next eight years?