# Set up environment -----------------------------------------------------------
# Load package that implements this method
library("PCovR")
# Load package for a measure of similarity (TuckerCoef)
library("RegularizedSCA")
# Load package for Tucker congruence between vectors
library("psych")
# Load example data from PCovR package
data(alexithymia)
# Subset data
X_raw <- alexithymia$X
y_raw <- alexithymia$Y[, 1, drop = FALSE]
Introduction
Principal covariates regression (PCovR) is a method to analyze the relationship between sets of multivariate data in the presence of highly collinear variables. It modifies the optimization criterion behind PCA to include a supervision aspect: PCovR looks for a low-dimensional representation of \(X\) that accounts for the maximum amount of variation in both \(X\) and \(y\). Compared to regular principal component regression (PCR), PCovR extracts components that account for much of the variability in a set of \(X\) variables and that, at the same time, correlate well with a set of \(Y\) variables. For more information, I recommend reading Vervloet et al. (2015) and De Jong and Kiers (1992). In this post, you can find my R code notes on this method. In these notes, I show the computations used by the PCovR R package to perform the method.
To understand how PCovR differs from classical PCR we need to complicate the notation. Consider the following decomposition of the data:
\[ \begin{align} \mathbf{T} &= \mathbf{X} \mathbf{W} \\ \mathbf{X} &= \mathbf{T} \mathbf{P}_X + \mathbf{E}_X \\ y &= \mathbf{T} \mathbf{P}_y + \mathbf{e}_y \end{align} \]
where \(\mathbf{T}\) is the matrix of PCs defined above, \(\mathbf{W}\) is a \(p \times q\) matrix of component weights defining which linear combinations of the columns of \(\mathbf{X}\) are used to compute the components, and \(\mathbf{P}_X\) and \(\mathbf{P}_y\) are the \(q \times p\) and \(q \times 1\) loading matrices relating the variables in \(\mathbf{X}\) and \(y\) to the component scores in \(\mathbf{T}\), respectively. \(\mathbf{E}_X\) and \(\mathbf{e}_y\) are the reconstruction errors, the information lost by using \(\mathbf{T}\) as a summary of \(\mathbf{X}\).
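To keep the dimensions straight, here is a toy sketch; the sizes and the objects X_toy, W_toy, and T_toy are made up for illustration, with the weights taken from an SVD just to have a valid \(\mathbf{W}\):
# Toy dimension check for the decomposition above (arbitrary sizes)
set.seed(20230114)
n <- 20; p <- 6; q <- 2
X_toy <- scale(matrix(rnorm(n * p), n, p))
W_toy <- svd(X_toy)$v[, 1:q] # p x q matrix of component weights
T_toy <- X_toy %*% W_toy     # n x q matrix of component scores
dim(T_toy)                   # n rows, q columns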
Classical PCA can be formulated as the task of finding the \(\mathbf{W}\) and \(\mathbf{P}_X\) that minimize the reconstruction error \(\mathbf{E}_X\):
\[ \begin{equation} (\mathbf{W}, \mathbf{P}_X) = \underset{\mathbf{W}, \mathbf{P}_X}{\operatorname{argmin}} \lVert \mathbf{X} - \mathbf{XWP}_X \rVert^2 \end{equation} \]
subject to the constraint \(\mathbf{P}_X \mathbf{P}_X' = \mathbf{I}\). PCovR can be formulated as the task of minimizing a weighted combination of both \(\mathbf{E}_X\) and \(\mathbf{e}_y\):
\[ \begin{equation}\label{eq:PCovR} (\mathbf{W}, \mathbf{P}_X, \mathbf{P}_y) = \underset{\mathbf{W}, \mathbf{P}_X, \mathbf{P}_y}{\operatorname{argmin}} \; \alpha \frac{\lVert \mathbf{X} - \mathbf{XWP}_X \rVert^2}{\lVert \mathbf{X} \rVert^2} + (1 - \alpha) \frac{\lVert y - \mathbf{XWP}_y \rVert^2}{\lVert y \rVert^2} \end{equation} \]
where each term is normalized by the total sum of squares of the corresponding block, matching the weighting used by the PCovR package (and by the \(\mathbf{G}\) matrix computed below), and the solution is identified by taking orthonormal component scores, \(\mathbf{T}'\mathbf{T} = \mathbf{I}\).
The parameter \(\alpha\) defines which reconstruction error is prioritized. When \(\alpha = 1\), the emphasis is placed exclusively on reconstructing \(\mathbf{X}\), reducing PCovR to PCR. When \(\alpha = 0.5\), \(\mathbf{X}\) and \(y\) are weighted equally, a case that resembles partial least squares (PLS). In practice, the value of \(\alpha\) can be found by cross-validation or by a sequential procedure based on maximum likelihood principles (Vervloet et al. 2013). In particular, \[ \begin{equation}\label{eq:aml} \alpha_{ML} = \frac{\lVert \mathbf{X} \rVert^2}{\lVert \mathbf{X} \rVert^2 + \lVert y \rVert^2 \frac{\hat{\sigma}_{\mathbf{E}_X}^2}{\hat{\sigma}_{e_y}^2}} \end{equation} \]
where \(\hat{\sigma}_{\mathbf{E}_X}^2\) can be estimated as the variance of \(\mathbf{X}\) left unexplained by components computed according to classical PCA, and \(\hat{\sigma}_{e_y}^2\) can be estimated as the variance of \(y\) left unexplained by the linear model regressing \(y\) on \(\mathbf{X}\).
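Before moving on, here is a small sketch of this trade-off in action. It fits PCovR at a few values of \(\alpha\) and reports how much variance the components account for in \(\mathbf{X}\) and in \(y\). It relies on the scaled data X and y and on the npcs setting defined in the next section, and the R-squared values are computed manually from the pcovr_est output rather than taken from the package:
# Sketch: the alpha trade-off (uses X, y, and npcs as defined in the next section)
sapply(c(.1, .5, .9), function(a) {
  est <- PCovR::pcovr_est(X = X, Y = y, a = a, r = npcs)
  c(
    alpha = a,
    # Share of the variance of X accounted for by the components
    R2_X = 1 - sum((X - est$Te %*% est$Px)^2) / sum(X^2),
    # Share of the variance of y accounted for by the components
    R2_y = 1 - sum((y - est$Te %*% est$Py)^2) / sum(y^2)
  )
})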
Learn by coding
To understand PCovR and its relationship with PCR we will use the R package PCovR, the data alexithymia, and the package RegularizedSCA to compute a measure of similarity between matrices (Tucker's congruence). The alexithymia data reports the scores of 122 Belgian psychology students on three scales:
- 20-item Toronto Alexithymia Scale (TAS-20) measuring the inability to recognize emotions.
- Center for Epidemiological Studies Depression Scale (CES-D) measuring depression.
- Rosenberg Self-Esteem Scale (RSE) measuring self-esteem.
The main objective of the data collection was to assess how well TAS-20 can predict depression and self-esteem¹. We collect all of the predictors in the X_raw object and the measure of depressive symptomatology (CES-D) in y_raw.
The first thing we’ll do is explore the data’s measures of center and spread.
# Check variable means and variances
data.frame(
  mean = c(
    X = colMeans(X_raw),
    y = colMeans(y_raw)
  ),
  variance = c(
    X = apply(X_raw, 2, var),
    y = apply(y_raw, 2, var)
  )
)
mean variance
X.confused 1.8196721 1.388701
X.right words 1.7950820 1.635348
X.sensations 0.5983607 1.151402
X.describe 2.2213115 1.479542
X.analyze problems 2.5081967 1.161089
X.upset 1.6065574 1.678634
X.puzzled 1.1393443 1.294472
X.let happen 1.2622951 1.302534
X.identify 1.6393443 1.620919
X.essential 2.7131148 1.231066
X.feel about people 1.7213115 1.475410
X.describe more 1.0081967 1.363569
X.going on 0.9836066 1.470803
X.why angry 1.2540984 1.496884
X.daily activities 1.6639344 1.514226
X.entertainment 1.6967213 1.601477
X.reveal feelings 1.6475410 2.097886
X.close 2.8442623 1.240008
X.useful 2.4672131 1.193131
X.hidden meanings 1.2131148 1.045116
y.CES-D 16.4672131 118.350156
We will need to scale the data before performing any form of dimensionality reduction.
# Scale data
X <- scale(X_raw) # * (n - 1) / n
y <- scale(y_raw) # * (n - 1) / n
Once the data has been mean-centered and standardized, we can perform PCovR on it. First, let's assume that both \(\alpha\) and the number of PCs we want to compute are given.
# Estimation -------------------------------------------------------------------
# Define fixed parameters
alpha <- .5
npcs <- 5
Then we can use the PCovR::pcovr_est() function to estimate the loadings and weights, and to compute the component scores.
# > Estimation with package ----------------------------------------------------
# Estimate with PCovR function
package <- PCovR::pcovr_est(
  X = X,
  Y = y,
  a = alpha, # fixed alpha
  r = npcs   # fixed number of components
)
Inside the object package we can find all of the desired matrices. For example, we can extract the component scores:
# Extract PC scores
T_p <- package$Te
Let’s also compute the component scores manually based on the procedure described by Vervloet et al. (2016):
# > Estimation with Vervloet approach (manual) ---------------------------------
# Compute the hat matrix
Hx <- X %*% solve(t(X) %*% X) %*% t(X)

# Compute the G matrix
G <- alpha * X %*% t(X) / sum(X^2) + (1 - alpha) * Hx %*% y %*% t(y) %*% Hx / sum(y^2)

# Take the eigendecomposition of the G matrix
EG <- eigen(G)

# Take the first npcs eigenvectors of G as the PC scores
T_m <- EG$vectors[, 1:npcs]
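One detail worth flagging: \(\mathbf{G}\) is symmetric, so eigen() returns orthonormal eigenvectors and the manual component scores satisfy \(\mathbf{T}'\mathbf{T} = \mathbf{I}\). A quick check:
# The eigenvectors of the symmetric G matrix are orthonormal: T'T = I
round(crossprod(T_m), digits = 10)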
We can now compare the results obtained with the two approaches. To keep the section slim and easy to read, we will use the Tucker congruence coefficient to compare matrices. This is a measure of similarity between matrices (or vectors) that ranges between -1 and +1; a congruence greater than 0.95 is conventionally taken to mean the two are virtually identical.
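As a minimal illustration, here is the coefficient computed for two vectors; tucker_phi is a hypothetical helper written for this example, not a function from the loaded packages:
# Hypothetical helper implementing Tucker's congruence between two vectors
tucker_phi <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
set.seed(20230114)
v <- rnorm(10)
tucker_phi(v, 2 * v) # 1: insensitive to (positive) rescaling
tucker_phi(v, -v)    # -1: reflection flips the sign
With the coefficient in hand, notice first that the PC scores obtained by the two approaches are the same: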
# Compare results --------------------------------------------------------------
# T scores
TuckerCoef(T_p, T_m)$tucker_value
[1] 1
# Also interesting
TuckerCoef(T_p, X %*% package$W)$tucker_value
[1] 1
Given the component scores, we can compute all of the other matrices of interest and compare them with the results obtained with the R package PCovR:
# W (weights)
W_p <- package$W
W_m <- solve(t(X) %*% X) %*% t(X) %*% T_m
TuckerCoef(W_p, W_m)$tucker_value
[1] 1
# Px
Px_p <- package$Px
Px_m <- t(W_m) %*% t(X) %*% X
TuckerCoef(t(Px_p), t(Px_m))$tucker_value
[1] 1
# Py
Py_p <- drop(package$Py)
Py_m <- drop(t(W_m) %*% t(X) %*% y) # W'X'y
Py_m <- t(T_m) %*% y                # T'y
factor.congruence(x = Py_p, y = Py_m)
CES-D
[1,] 0.98
# B
B_p <- drop(package$B)
B_m <- drop(W_m %*% Py_m)                  # W Py
B_m <- drop(W_m %*% t(W_m) %*% t(X) %*% y) # W W'X'y
factor.congruence(B_p, B_m)
[,1]
[1,] 1
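Since \(\mathbf{B}\) maps \(\mathbf{X}\) directly into predictions of \(y\), and the regression weights are invariant to rotations of the components, another quick check is to compare the fitted values implied by the two estimates (a small sketch, assuming the objects defined above are still in the workspace):
# Fitted values based on the two estimates of B should be numerically identical
y_hat_p <- X %*% B_p # predictions from the package estimate
y_hat_m <- X %*% B_m # predictions from the manual estimate
max(abs(y_hat_p - y_hat_m)) # expected to be negligibly small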
Reverting to PCA
I mentioned before that when \(\alpha = 1\), PCovR reduces to PCR. Let’s see that in action. First, we set the desired value for \(\alpha\):
# Reverting to PCA -------------------------------------------------------------
# Use alpha 1
alpha <- 1
then, we can perform PCovR
# Estimate PCovR
package <- PCovR::pcovr_est(
  X = X,
  Y = y,
  a = alpha,
  r = npcs # fixed number of components
)
and classical PCA following the approach described by Guerra-Urzola et al. (2021):
# Classical PCA
uSVt <- svd(X)
U <- uSVt$u
D <- diag(uSVt$d)
V <- uSVt$v
I <- nrow(X) # Define the number of rows
P_hat <- (I - 1)^{-1 / 2} * V %*% D       # Component loadings
W_hat <- (I - 1)^{1 / 2} * V %*% solve(D) # Component weights
T_hat <- (I - 1)^{1 / 2} * U              # Component scores
T_hat <- X %*% W_hat
We can now compare the results again:
# The scores obtained with PCovR are the same as the ones obtained with PCA
TuckerCoef(package$Te, T_hat[, 1:npcs])$tucker_value
[1] 1
# The loadings obtained with PCovR are the same as the ones obtained with PCA
TuckerCoef(t(package$Px), P_hat[, 1:npcs])$tucker_value
[1] 1
# The weights obtained with PCovR are the same as the ones obtained with PCA
TuckerCoef(package$W, W_hat[, 1:npcs])$tucker_value
[1] 1
# P is now proportional to W
TuckerCoef(package$W, t(package$Px))$tucker_value
[1] 1
Maximum Likelihood estimation of \(\alpha\)
The value of \(\alpha\) is not usually chosen arbitrarily. One could cross-validate it or compute it with a closed-form solution that relies on the maximum likelihood approach (Vervloet et al. 2016). Here, I show how to use this latter approach (a manual cross-validation sketch is given at the end of this section). First, let's fit PCovR using the PCovR::pcovr() function with the model selection argument modsel set to "seq". As explained in the help file, this "implies a sequential procedure in which the weighting value is determined on the basis of maximum likelihood principles", exactly what we want.
# Maximum likelihood tuning of alpha -------------------------------------------
# Fit PCovR
package <- pcovr(
  X = X_raw,
  Y = y_raw,
  rot = "none",
  R = npcs,      # fixed number of components
  modsel = "seq" # fastest option
)
Then, we can compute the maximum likelihood value of \(\alpha\) by first computing the error terms and taking their ratio.
# Compute error ratio components
# Unexplained variance of y based on the linear model regressing y on X
lm_mod <- lm(y ~ -1 + X)
ery <- 1 - summary(lm_mod)$r.squared

# Unexplained variance of X based on a scree-ratio test on the PCA solution
Rmin <- npcs
Rmax <- npcs
sing <- svd(X)
vec <- Rmin:Rmax
vec <- c(vec[1] - 1, vec, vec[length(vec)] + 1)
VAF <- c(0, cumsum(sing$d^2) / sum(sing$d^2)) # variance accounted for by 0, ..., p components
VAF <- VAF[vec + 1]
scr <- array(NA, c(1, length(vec)))
for (u in 2:(length(vec) - 1)) {
  scr[, u] <- (VAF[u] - VAF[u - 1]) / (VAF[u + 1] - VAF[u]) # scree ratios
}
erx <- 1 - VAF[which.max(scr)]
We could have computed the error ratio directly with the ErrorRatio() R function:
# Compute error ratio with function
err <- ErrorRatio(
  X = X,
  Y = y,
  Rmin = npcs,
  Rmax = npcs
)

# Is it the same?
err - erx / ery
[1] 1.110223e-15
With this value, it is now easy to compute \(\alpha\):
# Find alpha ML
alpha_ML <- sum(X^2) / (sum(X^2) + sum(y^2) * erx / ery)

# Compare to one found by package
package$a - alpha_ML
[1] 0
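As mentioned at the start of this section, cross-validation is the main alternative to the maximum likelihood solution. Below is a minimal manual K-fold sketch over an arbitrary grid of \(\alpha\) values; it is illustrative only (it is not the package's own model selection routine, and scaling y on the full sample is a shortcut):
# Manual 10-fold cross-validation of alpha (illustrative sketch)
set.seed(20230114)
alphas <- seq(.1, .9, by = .2) # arbitrary grid
K <- 10
fold <- sample(rep_len(1:K, nrow(X))) # random fold assignment
cv_err <- sapply(alphas, function(a) {
  mean(sapply(1:K, function(k) {
    # Estimate on the training folds with a fixed number of components
    est <- PCovR::pcovr_est(
      X = X[fold != k, ],
      Y = y[fold != k, , drop = FALSE],
      a = a,
      r = npcs
    )
    # Out-of-fold mean squared prediction error
    mean((y[fold == k, ] - X[fold == k, ] %*% est$B)^2)
  }))
})
# Alpha value with the smallest cross-validated prediction error
alphas[which.min(cv_err)]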
TL;DR, just give me the code!
# Set up environment -----------------------------------------------------------
# Load package that implements this method
library("PCovR")
# Load package for a measure of similarity (TuckerCoef)
library("RegularizedSCA")
# Load package for Tucker congruence between vectors
library("psych")
# Load example data from PCovR package
data(alexithymia)
# Subset data
X_raw <- alexithymia$X
y_raw <- alexithymia$Y[, 1, drop = FALSE]

# Check variable means and variances
data.frame(
  mean = c(
    X = colMeans(X_raw),
    y = colMeans(y_raw)
  ),
  variance = c(
    X = apply(X_raw, 2, var),
    y = apply(y_raw, 2, var)
  )
)

# Scale data
X <- scale(X_raw) # * (n - 1) / n
y <- scale(y_raw) # * (n - 1) / n

# Estimation -------------------------------------------------------------------
# Define fixed parameters
alpha <- .5
npcs <- 5

# > Estimation with package ----------------------------------------------------
# Estimate with PCovR function
package <- PCovR::pcovr_est(
  X = X,
  Y = y,
  a = alpha, # fixed alpha
  r = npcs   # fixed number of components
)

# Extract PC scores
T_p <- package$Te

# > Estimation with Vervloet approach (manual) ---------------------------------
# Compute the hat matrix
Hx <- X %*% solve(t(X) %*% X) %*% t(X)

# Compute the G matrix
G <- alpha * X %*% t(X) / sum(X^2) + (1 - alpha) * Hx %*% y %*% t(y) %*% Hx / sum(y^2)

# Take the eigendecomposition of the G matrix
EG <- eigen(G)

# Take the first npcs eigenvectors of G as the PC scores
T_m <- EG$vectors[, 1:npcs]

# Compare results --------------------------------------------------------------
# T scores
TuckerCoef(T_p, T_m)$tucker_value

# Also interesting
TuckerCoef(T_p, X %*% package$W)$tucker_value

# W (weights)
W_p <- package$W
W_m <- solve(t(X) %*% X) %*% t(X) %*% T_m
TuckerCoef(W_p, W_m)$tucker_value

# Px
Px_p <- package$Px
Px_m <- t(W_m) %*% t(X) %*% X
TuckerCoef(t(Px_p), t(Px_m))$tucker_value

# Py
Py_p <- drop(package$Py)
Py_m <- drop(t(W_m) %*% t(X) %*% y) # W'X'y
Py_m <- t(T_m) %*% y                # T'y
factor.congruence(x = Py_p, y = Py_m)

# B
B_p <- drop(package$B)
B_m <- drop(W_m %*% Py_m)                  # W Py
B_m <- drop(W_m %*% t(W_m) %*% t(X) %*% y) # W W'X'y
factor.congruence(B_p, B_m)

# Reverting to PCA -------------------------------------------------------------
# Use alpha 1
alpha <- 1

# Estimate PCovR
package <- PCovR::pcovr_est(
  X = X,
  Y = y,
  a = alpha,
  r = npcs # fixed number of components
)

# Classical PCA
uSVt <- svd(X)
U <- uSVt$u
D <- diag(uSVt$d)
V <- uSVt$v
I <- nrow(X) # Define the number of rows
P_hat <- (I - 1)^{-1 / 2} * V %*% D       # Component loadings
W_hat <- (I - 1)^{1 / 2} * V %*% solve(D) # Component weights
T_hat <- (I - 1)^{1 / 2} * U              # Component scores
T_hat <- X %*% W_hat

# The scores obtained with PCovR are the same as the ones obtained with PCA
TuckerCoef(package$Te, T_hat[, 1:npcs])$tucker_value

# The loadings obtained with PCovR are the same as the ones obtained with PCA
TuckerCoef(t(package$Px), P_hat[, 1:npcs])$tucker_value

# The weights obtained with PCovR are the same as the ones obtained with PCA
TuckerCoef(package$W, W_hat[, 1:npcs])$tucker_value

# P is now proportional to W
TuckerCoef(package$W, t(package$Px))$tucker_value

# Maximum likelihood tuning of alpha -------------------------------------------
# Fit PCovR
package <- pcovr(
  X = X_raw,
  Y = y_raw,
  rot = "none",
  R = npcs,      # fixed number of components
  modsel = "seq" # fastest option
)

# Compute error ratio components
lm_mod <- lm(y ~ -1 + X)
ery <- 1 - summary(lm_mod)$r.squared

Rmin <- npcs
Rmax <- npcs
sing <- svd(X)
vec <- Rmin:Rmax
vec <- c(vec[1] - 1, vec, vec[length(vec)] + 1)
VAF <- c(0, cumsum(sing$d^2) / sum(sing$d^2))
VAF <- VAF[vec + 1]
scr <- array(NA, c(1, length(vec)))
for (u in 2:(length(vec) - 1)) {
  scr[, u] <- (VAF[u] - VAF[u - 1]) / (VAF[u + 1] - VAF[u])
}
erx <- 1 - VAF[which.max(scr)]

# Compute error ratio with function
err <- ErrorRatio(
  X = X,
  Y = y,
  Rmin = npcs,
  Rmax = npcs
)

# Is it the same?
err - erx / ery

# Find alpha ML
alpha_ML <- sum(X^2) / (sum(X^2) + sum(y^2) * erx / ery)

# Compare to one found by package
package$a - alpha_ML
References
Footnotes
¹ Check the helpfile for the alexithymia data in the PCovR package for more information.