
Formulas and Mathematical Detail
================================

Seemingly Unrelated Regression (SUR/SURE)
-----------------------------------------

Seemingly unrelated regression is a system regression estimator which
jointly estimates multiple models. This allows joint hypothesis testing
of parameters across models since the parameter covariance is robust to
correlation of residuals across models. It can also lead to more precise
parameter estimates if some residuals are conditionally homoskedastic
and regressors differ across equations.

Basic Notation
~~~~~~~~~~~~~~

Assume :math:`K` series are observed for :math:`N` time periods. Denote
the stacked observations of series :math:`i` as :math:`Y_{i}` and its
corresponding regressor matrix :math:`X_{i}`. The complete model
spanning all series can be specified as

.. math::

   \begin{aligned}
   Y & =X\beta+\epsilon\\
   \left[\begin{array}{c}
   Y_{1}\\
   Y_{2}\\
   \vdots\\
   Y_{K}
   \end{array}\right] & =\left[\begin{array}{cccc}
   X_{1} & 0 & 0 & 0\\
   0 & X_{2} & 0 & 0\\
   \vdots & \vdots & \ddots & \vdots\\
   0 & 0 & 0 & X_{N}
   \end{array}\right]\left[\begin{array}{c}
   \beta_{1}\\
   \beta_{2}\\
   \vdots\\
   \beta_{K}
   \end{array}\right]+\left[\begin{array}{c}
   \epsilon_{1}\\
   \epsilon_{2}\\
   \vdots\\
   \epsilon_{K}
   \end{array}\right].\end{aligned}

The OLS estimator of :math:`\beta` is then

.. math:: \hat{\beta}=\left(X^{\prime}X\right)^{-1}X^{\prime}Y.

A GLS estimator can be similarly defined as

.. math:: \hat{\beta}_{GLS}=\left(X^{\prime}\Omega^{-1}X\right)^{-1}X^{\prime}\Omega^{-1}Y

where :math:`\Omega=I_{N}\otimes\Sigma` is the joint covariance of the
residuals. In practice :math:`\Sigma` is not known as so a feasible GLS
(FGLS) is implemented in two steps. The first uses OLS to estimate
:math:`\hat{\epsilon}` and then estimates the residual covariance as

.. math::

   \hat{\Sigma}=N^{-1}\left[\begin{array}{cccc}
   \hat{\epsilon}_{1} & \hat{\epsilon}_{2} & \ldots & \hat{\epsilon}_{N}\end{array}\right]^{\prime}\left[\begin{array}{cccc}
   \hat{\epsilon}_{1} & \hat{\epsilon}_{2} & \ldots & \hat{\epsilon}_{N}\end{array}\right].

The feasible GLS estimator is then

.. math:: \hat{\beta}_{FGLS}=\left(X^{\prime}\hat{\Omega}^{-1}X\right)^{-1}X^{\prime}\hat{\Omega}^{-1}Y

where :math:`\hat{\Omega}=\hat{\Sigma}\otimes I_{N}`.

Covariance Estimation
~~~~~~~~~~~~~~~~~~~~~

**Homoskedastic Data**

When residuals are assumed to be homoskedastic, the covariance can be
consistently estimated by

.. math:: \left(X^{\prime}\Delta^{-1}X\right)^{-1}\left(X^{\prime}\Delta^{-1}\hat{\Omega}\Delta^{-1}X\right)\left(X^{\prime}\Delta^{-1}X\right)^{-1}

where :math:`\Delta` is the weighting matrix used in the parameter
estimator. For example, in OLS :math:`\Delta=I_{NK}` while in
FGLS\ :math:`\Delta=\hat{\Omega}`. The estimator supports using FGLS
with an assumption that :math:`\hat{\Sigma}` is diagonal or using a
user-specified value of :math:`\Sigma`. When the FGLS estimator is used,
this simplifies to

.. math:: \left(X^{\prime}\Delta^{-1}X\right)^{-1}

**Heteroskedastic Data**

When residuals are heteroskedastic, the covariance can be consistently
estimated by

.. math:: \left(X^{\prime}\Delta^{-1}X\right)^{-1}\hat{S}\left(X^{\prime}\Delta^{-1}X\right)^{-1}

where :math:`\hat{S}` is a covariance estimator that is clustered by
time. Define the matrix of scores as

.. math::

   \hat{g}=\left[\begin{array}{c}
   \hat{g}_{1}\\
   \hat{g}_{2}\\
   \vdots\\
   \hat{g}_{N}
   \end{array}\right]=\Delta^{-\frac{1}{2}}X\odot\left(\hat{\epsilon}\iota_{M}^{\prime}\right)

where :math:`\iota_{M}` is a vector of ones with :math:`M` elements
where :math:`M` is the number of columns in :math:`X` and :math:`\odot`
is element by element multiplication. The clustered-by-time score
covariance estimator is then

.. math:: \hat{S}=N^{-1}\sum_{i=1}^{N}\Psi_{i}^{\prime}\Psi_{i}

where

.. math:: \Psi_{i}=\sum_{j=1}^{K}\hat{g}_{ij}

is the sum of the scores across models :math:`\left(j\right)` that have
the same observation index :math:`i`. This estimator allows arbitrary
correlation of residuals with the same observation index.

**Debiased Estimators**

When the debiased flag is set, a small sample adjustment if applied so
that element :math:`ij` of :math:`\hat{\Sigma}` is scaled by

.. math:: \frac{T}{\sqrt{\left(T-P_{i}\right)\left(T-P_{j}\right)}}.

Other Statistics
~~~~~~~~~~~~~~~~

**Goodness of fit**

The reported :math:`R^{2}` is always for the data, or weighted data is
weights are used, for either OLS or GLS. The means that the reported
:math:`R^{2}` for the GLS estimator may be negative

**F statistics**

When the debiased covariance estimator is used (small sample adjustment)
the reported :math:`F` statistics use :math:`K\left(T-\bar{P}\right)`
where :math:`\bar{P}=K^{-1}\sum_{i=1}^{K}P_{i}` where :math:`P_{i}` is
the number of variables including the constant in model :math:`i`. When
models include restrictions it may be the case that the covariance is
singular. When this occurs, the :math:`F` statistic cannot be
calculated.

Memory efficient calculations
-----------------------------

The parameter estimates in the SUR model can are

.. math:: \left(X^{\prime}\Omega^{-1}X\right)^{-1}\left(X^{\prime}\Omega^{-1}Y\right)^{-1}

where

.. math::

   \Omega^{-1}=\left[\begin{array}{cccc}
   \sigma^{11}I_{T} & \sigma^{12}I_{T} & \ldots & \sigma^{1k}I_{T}\\
   \sigma^{21}I_{T} & \sigma^{22}I_{T} & \dots & \sigma^{2k}I_{T}\\
   \vdots & \vdots & \ddots & \vdots\\
   \sigma^{k2}I_{T} & \sigma^{k2}I_{T} & \dots & \sigma^{kk}I_{T}
   \end{array}\right]

where :math:`\sigma^{ij}=\left[\Sigma^{-1}\right]_{ij}`. Since :math:`X`
is block diagonal,

.. math::

   X^{\prime}\Omega^{-1}=\left[\begin{array}{cccc}
   \sigma^{11}X_{1}^{\prime} & \sigma^{12}X_{1}^{\prime} & \ldots & \sigma^{1k}X_{1}^{\prime}\\
   \sigma^{21}X_{2}^{\prime} & \sigma^{22}X_{2}^{\prime} & \dots & \sigma^{2k}X_{2}^{\prime}\\
   \vdots & \vdots & \ddots & \vdots\\
   \sigma^{k2}X_{k}^{\prime} & \sigma^{k2}X_{k}^{\prime} & \dots & \sigma^{kk}X_{k}^{\prime}
   \end{array}\right]

and

.. math::

   X^{\prime}\Omega^{-1}X=\left[\begin{array}{cccc}
   \sigma^{11}X_{1}^{\prime}X_{1} & \sigma^{12}X_{1}^{\prime}X_{2} & \ldots & \sigma^{1k}X_{1}^{\prime}X_{k}\\
   \sigma^{21}X_{2}^{\prime}X_{1} & \sigma^{22}X_{2}^{\prime}X_{2} & \dots & \sigma^{2k}X_{2}^{\prime}X_{k}\\
   \vdots & \vdots & \ddots & \vdots\\
   \sigma^{k2}X_{k}^{\prime}X_{1} & \sigma^{k2}X_{k}^{\prime}X_{2} & \dots & \sigma^{kk}X_{k}^{\prime}X_{k}
   \end{array}\right]

Similarly

.. math::

   X^{\prime}\Omega^{-1}Y=\left[\begin{array}{c}
   \sigma^{11}X_{1}^{\prime}Y_{1}+\sigma^{12}X_{1}^{\prime}Y_{2}+\ldots\sigma^{1k}X_{1}^{\prime}Y_{k}\\
   \sigma^{21}X_{2}^{\prime}Y_{1}+\sigma^{22}X_{2}^{\prime}Y_{2}+\dots+\sigma^{2k}X_{2}^{\prime}Y_{k}\\
   \vdots\\
   \sigma^{k2}X_{k}^{\prime}Y_{1}+\sigma^{k2}X_{k}^{\prime}Y_{2}+\dots+\sigma^{kk}X_{k}^{\prime}Y_{k}
   \end{array}\right]

This suggests that constructing the high dimensional :math:`\Omega^{-1}`
can be avoided and many needless multiplications can be avoided by
directly computing these two components and the solution can be found
using ``solve``.

Three Stage Least Squares (3SLS)
--------------------------------

Three-stage least squares extends SUR to the case of endogenous
variables. It is the system extension of two-stage least squares (2SLS).
Like SUR, 3SLS jointly estimates multiple models which allows joint
hypothesis testing of parameters. It can also lead to more precise
parameter estimates if some residuals are conditionally homoskedastic
and regressors differ across equations.

Basic Notation
~~~~~~~~~~~~~~

Assume :math:`K` series are observed for :math:`N` time periods. Denote
the stacked observations of series :math:`i` as :math:`Y_{i}` and its
corresponding regressor matrix :math:`X_{i}`. The complete model
spanning all series can be specified as

.. math::

   \begin{aligned}
   Y & =X\beta+\epsilon\\
   \left[\begin{array}{c}
   Y_{1}\\
   Y_{2}\\
   \vdots\\
   Y_{K}
   \end{array}\right] & =\left[\begin{array}{cccc}
   X_{1} & 0 & 0 & 0\\
   0 & X_{2} & 0 & 0\\
   \vdots & \vdots & \ddots & \vdots\\
   0 & 0 & 0 & X_{N}
   \end{array}\right]\left[\begin{array}{c}
   \beta_{1}\\
   \beta_{2}\\
   \vdots\\
   \beta_{K}
   \end{array}\right]+\left[\begin{array}{c}
   \epsilon_{1}\\
   \epsilon_{2}\\
   \vdots\\
   \epsilon_{K}
   \end{array}\right].\end{aligned}

where :math:`X_{i}` contains both exogenous and endogenous variables.
For each equation denote the available instruments as :math:`Z_{i}`. The
instrument include both exogenous regressors and excluded instrumental
variables. Define the fitted values of the exogenous variables as

.. math:: \hat{X}_{i}=Z\left(Z^{\prime}Z\right)^{-1}Z^{\prime}X.

The IV estimator of :math:`\beta` is then

.. math:: \hat{\beta}_{IV}=\left(\hat{X}^{\prime}\hat{X}\right)^{-1}\hat{X}^{\prime}Y.

where

.. math::

   \hat{X}=\left[\begin{array}{cccc}
   \hat{X}_{1} & 0 & 0 & 0\\
   0 & \hat{X}_{2} & 0 & 0\\
   \vdots & \vdots & \ddots & \vdots\\
   0 & 0 & 0 & \hat{X}_{N}
   \end{array}\right]

 A GLS estimator can be similarly defined as

.. math:: \hat{\beta}_{GLS}=\left(\hat{X}^{\prime}\Omega^{-1}\hat{X}\right)^{-1}\hat{X}^{\prime}\Omega^{-1}Y
