Estimating Parameters





Kerry Back

Overview

  • Using past samples to estimate means, standard deviations, and correlations is hazardous.
  • Especially if there are many assets.
  • Need long (stationary) time series of returns
  • And/or some models. Models limit the degrees of freedom and can help avoid overfitting.
  • And/or some “penalization” – constraints, etc.
  • This session: past samples.

Sampling Distributions

  • Suppose returns \(r_{1},\cdots,r_{n}\) are independent draws from a normal \((\mu,\sigma^2)\) distribution.

  • Let \(m =\) sample mean and \(s =\) sample std dev = \(\sqrt{\sum_{i=1}^n \frac{(r_{i}-m)^2}{n-1}}\)

  • Then, \(m\) is normal \((\mu,\sigma^2/n)\) and

  • \((n-1)s^2/\sigma^2\) is \(\chi^2(n-1)\).
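A quick Monte Carlo check of both results follows. It is a sketch under illustrative assumptions (\(\mu=0.12\), \(\sigma=0.30\), \(n=25\), 100,000 simulated samples). It uses the fact that \(\chi^2(n-1)\) has mean \(n-1\) and variance \(2(n-1)\):

```python
import numpy as np

# illustrative parameters (assumed for this check, not estimated from data)
mu, sigma, n = 0.12, 0.30, 25
n_sims = 100_000

rng = np.random.default_rng(0)
r = rng.normal(mu, sigma, size=(n_sims, n))  # each row is one sample of n returns

m = r.mean(axis=1)                 # sample means
s = r.std(axis=1, ddof=1)          # sample std devs with the n-1 divisor
stat = (n - 1) * s**2 / sigma**2   # should be chi-square with n-1 degrees of freedom

print(f"mean of m:    {m.mean():.4f}  (theory {mu})")
print(f"sd of m:      {m.std():.4f}  (theory {sigma / np.sqrt(n):.4f})")
print(f"mean of stat: {stat.mean():.2f}  (theory {n - 1})")
print(f"var of stat:  {stat.var():.2f}  (theory {2 * (n - 1)})")
```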

Confidence Intervals

  • Example: \(n=25\), \(m=0.12\), \(s=0.30\).
  • The estimated std dev (std error) of \(m\) is \(s/\sqrt{n} = 0.30/\sqrt{25}=0.06\).
  • An approximate \(95\%\) confidence interval for \(\mu\) is

\[0.12 \pm 1.96 \times 0.06 = [0.002,\ 0.238]\]
  • The \(\chi^2\) distribution implies a wide confidence interval for \(\sigma\) as well: roughly \([0.23, 0.42]\).
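Both intervals can be computed directly from the example's \(n\), \(m\), and \(s\). The sketch below uses SciPy. The interval for \(\sigma\) inverts the \(\chi^2(n-1)\) result (the usual equal-tailed interval):

```python
import numpy as np
from scipy.stats import norm, chi2

n, m, s = 25, 0.12, 0.30   # example values from the slide

se = s / np.sqrt(n)        # standard error of the sample mean
z = norm.ppf(0.975)        # about 1.96
mu_ci = (m - z * se, m + z * se)

# (n-1)s^2/sigma^2 ~ chi2(n-1)  =>  sigma^2 lies in [(n-1)s^2/q_hi, (n-1)s^2/q_lo]
q_lo, q_hi = chi2.ppf([0.025, 0.975], df=n - 1)
sigma_ci = (np.sqrt((n - 1) * s**2 / q_hi), np.sqrt((n - 1) * s**2 / q_lo))

print("95%% CI for mu:    [%.3f, %.3f]" % mu_ci)
print("95%% CI for sigma: [%.3f, %.3f]" % sigma_ci)
```

Using a \(t_{24}\) quantile (`scipy.stats.t.ppf(0.975, 24)`, about 2.06) instead of 1.96 would widen the interval for \(\mu\) slightly, because \(s\) is itself estimated.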

  • We can sample more frequently to get better estimates of standard deviations and correlations.
  • But it doesn’t help for means.
  • The problems with standard deviations and correlations are:
    • Std devs vary over time (turbulent and calm markets).
    • Correlations also increase in turbulent markets.
    • There are too many correlations: \(n(n-1)/2\).
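The sketch below shows how fast the count grows and why a short history cannot pin down a large covariance matrix. The helper `n_params` and the simulated panel (500 assets, 300 monthly observations, i.i.d. normal returns) are hypothetical illustrations:

```python
import numpy as np

def n_params(n_assets: int) -> dict:
    """Hypothetical helper: count means, std devs, and pairwise correlations."""
    return {
        "means": n_assets,
        "std devs": n_assets,
        "correlations": n_assets * (n_assets - 1) // 2,
    }

for k in (10, 100, 500):
    print(k, n_params(k))

# Hypothetical panel: 500 assets, 25 years of monthly returns (300 observations)
rng = np.random.default_rng(0)
T, N = 300, 500
X = rng.normal(0.01, 0.08, size=(T, N))
S = np.cov(X, rowvar=False)                # 500 x 500 sample covariance matrix
print("rank:", np.linalg.matrix_rank(S))   # at most T - 1 = 299, so S is singular
```

With 500 assets there are 124,750 correlations. A sample covariance matrix built from 300 observations has rank at most 299, so it cannot even be inverted.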

Sampling frequency

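If we sample monthly, weekly, \(\ldots\) instead of annually, we get more data points over the same span, so the per-period estimates are more precise.

When we scale to annual parameters, the accuracy gain vanishes for the mean but not for the standard deviation. The annualized mean is (ignoring compounding) the sum of all returns divided by the number of years, so its precision depends only on the length of the history, not on how finely it is sampled.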
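The following short derivation makes this explicit. It assumes \(k\) periods per year, \(T\) years (\(n = kT\) observations), and i.i.d. per-period returns with mean \(\mu/k\) and variance \(\sigma^2/k\) (compounding ignored). It also uses the large-sample approximation \(\operatorname{SE}(s) \approx \sigma_p/\sqrt{2(n-1)}\) for normal data with per-period std dev \(\sigma_p\):

\[
\hat\mu = k\,m, \qquad
\operatorname{SE}(\hat\mu) = k \cdot \frac{\sigma/\sqrt{k}}{\sqrt{kT}} = \frac{\sigma}{\sqrt{T}}
\]

\[
\hat\sigma = \sqrt{k}\,s, \qquad
\operatorname{SE}(\hat\sigma) \approx \sqrt{k} \cdot \frac{\sigma/\sqrt{k}}{\sqrt{2(kT-1)}} = \frac{\sigma}{\sqrt{2(kT-1)}}
\]

With \(\sigma = 0.30\) and \(T = 25\), the standard error of the annualized mean is \(0.06\) whether \(k=1\) or \(k=12\). The standard error of \(\hat\sigma\), by contrast, falls from about \(0.043\) (annual data) to about \(0.012\) (monthly data).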


Simulation

To illustrate the effect of sampling frequency:

  • Simulate 5,000 25-year histories of monthly returns.
  • Compound monthly returns to get annual returns.
  • Compute sampling distributions (across 5,000 samples) of monthly and annual statistics.

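import numpy as np
from scipy.stats import norm

# monthly parameters: 1% mean, 0.30/sqrt(12) std dev (30% annualized)
mu, sigma = 0.01, 0.3/np.sqrt(12)

# 5,000 simulated 25-year histories, axes = (month, year, simulation)
mrets = norm.rvs(loc=mu, scale=sigma, size=(12, 25, 5000), random_state=0)

# sample mean and std dev (n-1 divisor) of the 300 monthly returns in each history
mmeans = np.mean(mrets, axis=(0,1))
msds = np.std(mrets, axis=(0,1), ddof=1)

# compound the 12 monthly returns in each year: arets has shape (25, 5000)
arets = np.prod(1+mrets, axis=0) - 1
ameans = np.mean(arets, axis=0)
asds = np.std(arets, axis=0, ddof=1)

# scale monthly statistics to annual units for comparison with ameans and asds
ann_mmeans = 12*mmeans
ann_msds = np.sqrt(12)*msds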

Means – sampling monthly doesn’t help
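The dispersion of the two mean estimates across simulations can be summarized numerically. This is a minimal sketch that assumes the same parameters as the simulation (monthly mean 1%, annualized std dev 30%, 25 years, 5,000 histories) with its own random seed:

```python
import numpy as np

# assumed parameters, matching the simulation above
mu, sigma = 0.01, 0.3 / np.sqrt(12)
years, n_sims = 25, 5000

rng = np.random.default_rng(1)
mrets = rng.normal(mu, sigma, size=(12, years, n_sims))   # (month, year, simulation)
arets = np.prod(1 + mrets, axis=0) - 1                   # (year, simulation)

ann_from_monthly = 12 * mrets.mean(axis=(0, 1))   # annualized monthly sample mean
ann_from_annual = arets.mean(axis=0)              # mean of compounded annual returns

print("sd of estimate, monthly data:", ann_from_monthly.std())
print("sd of estimate, annual data: ", ann_from_annual.std())
```

The two estimators target slightly different quantities, because compounding makes the expected annual return about 12.7% rather than 12%. Even so, both vary by roughly 0.06 to 0.07 across simulations. The annual figure is a little larger because compounding also raises the dispersion of annual returns.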

Standard deviations – monthly is better
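The same comparison can be done for standard deviations. The sketch below again assumes the simulation's parameters. The monthly sample std dev is annualized by \(\sqrt{12}\), and the annual one uses only 25 observations:

```python
import numpy as np

mu, sigma = 0.01, 0.3 / np.sqrt(12)   # assumed monthly parameters, as in the simulation
years, n_sims = 25, 5000

rng = np.random.default_rng(2)
mrets = rng.normal(mu, sigma, size=(12, years, n_sims))   # (month, year, simulation)
arets = np.prod(1 + mrets, axis=0) - 1                   # (year, simulation)

# annualize the sd of 300 monthly returns by sqrt(12); annual sd uses 25 returns
sd_from_monthly = np.sqrt(12) * mrets.reshape(12 * years, n_sims).std(axis=0, ddof=1)
sd_from_annual = arets.std(axis=0, ddof=1)

print("sd of estimate, monthly data:", sd_from_monthly.std())
print("sd of estimate, annual data: ", sd_from_annual.std())
```

Here the estimate based on monthly data should vary several times less across simulations than the estimate based on annual data.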

Sampling at a higher frequency also gives more precise estimates of correlations, covariances, and betas.
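For betas, the sketch below uses a hypothetical one-factor setup. It assumes a market std dev of 16% per year, a true beta of 1.2, and idiosyncratic std dev of 20% per year, all i.i.d. normal. For simplicity, annual returns are formed by summing monthly returns rather than compounding them, which keeps the true annual beta exactly 1.2:

```python
import numpy as np

# hypothetical one-factor model (all parameters assumed for illustration)
beta_true = 1.2
sig_m, sig_e = 0.16 / np.sqrt(12), 0.20 / np.sqrt(12)   # monthly std devs
years, n_sims = 25, 2000

rng = np.random.default_rng(3)
mkt = rng.normal(0.008, sig_m, size=(n_sims, years, 12))
stock = beta_true * mkt + rng.normal(0.0, sig_e, size=(n_sims, years, 12))

def ols_beta(y, x):
    """OLS slope of y on x along the last axis (one estimate per simulation)."""
    xc = x - x.mean(axis=-1, keepdims=True)
    yc = y - y.mean(axis=-1, keepdims=True)
    return (xc * yc).sum(axis=-1) / (xc**2).sum(axis=-1)

b_monthly = ols_beta(stock.reshape(n_sims, -1), mkt.reshape(n_sims, -1))  # 300 obs
b_annual = ols_beta(stock.sum(axis=2), mkt.sum(axis=2))                   # 25 obs

print("sd of beta estimate, monthly data:", b_monthly.std())
print("sd of beta estimate, annual data: ", b_annual.std())
```

Both estimators are centered on the true beta, but the one based on 300 monthly observations is much tighter.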