Dear Matt,
I will not answer your question directly, but I can suggest another way to
deal with missing values in the framework of PCA. It is possible to modify
PCA algorithms so that they can handle missing values. In my lab, we work on
this topic and have developed an R package named missMDA, which is dedicated
to handling missing values in principal component methods such as principal
component analysis.
The rationale of the proposed method is as follows.
First, an EM algorithm (named EM-PCA) is used to obtain estimates of the
scores and loadings despite the missing values. The algorithm alternates
between two steps: one step estimates the parameters via PCA, and the other
imputes the missing values using the PCA model (also called the
reconstruction formula). Consequently, at the end of the algorithm, a
completed data set is obtained together with the scores and loadings (if you
perform a PCA on the completed data set, you obtain the same loadings and
scores).
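To make the two alternating steps concrete, here is a minimal NumPy sketch of
iterative (EM-style) PCA imputation. It is not the missMDA code: the function
names em_pca_impute and _pca are hypothetical, the number of components
ncomp is assumed known, the variables are centred but not scaled, and no
regularisation is applied (missMDA's imputePCA offers further options).

```python
import numpy as np


def _pca(Xc, ncomp):
    """PCA of a complete matrix via SVD: column means, scores, loadings."""
    mu = Xc.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
    scores = U[:, :ncomp] * s[:ncomp]   # n x ncomp
    loadings = Vt[:ncomp].T             # p x ncomp
    return mu, scores, loadings


def em_pca_impute(X, ncomp=2, max_iter=1000, tol=1e-10):
    """Hypothetical sketch of EM-PCA imputation (np.nan marks missing cells).

    Returns the completed matrix and the scores/loadings of its PCA.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise the missing cells with the column means of the observed values.
    Xc = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Step 1: estimate scores and loadings by PCA of the current completed data.
        mu, scores, loadings = _pca(Xc, ncomp)
        # Step 2: reconstruction formula mu + F V^T, applied to missing cells only;
        # observed cells are always kept at their original values.
        X_new = np.where(missing, mu + scores @ loadings.T, X)
        converged = np.sum((X_new - Xc) ** 2) < tol
        Xc = X_new
        if converged:
            break
    # Scores and loadings of the final completed data set, so that a PCA of
    # the returned matrix gives exactly these scores and loadings.
    mu, scores, loadings = _pca(Xc, ncomp)
    return Xc, scores, loadings
```

For data that follow an exact one-component structure with a few cells
removed, em_pca_impute(X, ncomp=1) recovers the removed values; with real
data, the choice of ncomp matters and should be made with care.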
Then we proposed a multiple imputation procedure based on the PCA model.
To visualize the different plausible imputations on the PCA maps, we
proposed confidence areas around the positions of the individuals and the
variables. References can be found in the following article: Josse, Julie;
Pagès, Jérôme; Husson, François (2011). Multiple imputation in principal
component analysis. /Advances in Data Analysis and Classification/: 1-16,
March 6, 2011.
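The idea of generating several plausible imputations from a PCA model can be
illustrated with a deliberately simplified sketch: starting from a
reconstruction of the whole matrix by a fitted PCA model (for example an
EM-PCA fit), each imputed data set adds Gaussian noise to the reconstructed
values in the missing cells, with the noise level estimated from the
residuals on the observed cells. The function name pca_multiple_imputation
and the Gaussian noise model are assumptions for illustration. A proper
multiple imputation procedure must also reflect the uncertainty in the
estimated scores and loadings; this sketch does not, so it understates the
variability between imputations.

```python
import numpy as np


def pca_multiple_imputation(X, recon, n_imp=20, seed=None):
    """Hypothetical sketch: n_imp completed copies of X (np.nan = missing).

    recon is a full reconstruction of X from a fitted PCA model.
    """
    X = np.asarray(X, dtype=float)
    recon = np.asarray(recon, dtype=float)
    missing = np.isnan(X)
    # Residual standard deviation on the observed cells (no df correction).
    resid = X[~missing] - recon[~missing]
    sigma = np.sqrt(np.mean(resid ** 2))
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(n_imp):
        # Noisy reconstruction; only the missing cells are replaced.
        draw = recon + rng.normal(0.0, sigma, size=X.shape)
        completed.append(np.where(missing, draw, X))
    return completed
```

Each returned data set can then be analysed separately, and the spread of
the results across data sets shows how much the missing values matter.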
J. Josse
On 07/04/2011 02:03, matthew-c.johnson(a)ubs.com wrote:
Dear All,
I am using Amelia to fill in some gaps in national accounts data (and
similar panels of data); because of the structure of my panel, there is
no 'cs' parameter, just 'ts'.
I intend to use the EM algorithm to complete my panel and then to
extract factors via PCA; as a result, I have two related questions.
1/ I would like to use the tscsPlot command (or similar) to plot the
observed values and the imputations (mean + 95% confidence bands).
Is this possible?
2/ What is the best way to use the output from the imputations to
generate the factors in the PCA? I have considered two methods, but am
unsure which is best (most valid):
a/ Fill the gaps in the panel with the mean of the imputations, use the
single data set to extract the eigenvalues and eigenvectors of the
associated covariance matrix, and then use these weights and the
mean-filled data set to generate the factors.
b/ Stack the imputation panels, use the stacked panel to generate the
eigenvalues and eigenvectors of the associated covariance matrix, and
then fill the gaps in my original panel with the mean values from the
imputation runs and apply the weights from the stacked panel?
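Whatever plotting tool is used, the quantities in both questions can be
computed directly from the list of completed panels. The NumPy sketch below
is an illustration under assumptions, not Amelia's API: each imputation is
an n x p array, observed cells are identical across imputations (so the
cell-wise mean changes only the gaps), the 95% band is taken as the 2.5% and
97.5% percentiles across imputations, and the function names are
hypothetical. The two factor functions implement the first method (PCA of
the mean-filled panel) and the second method (weights from the stacked
panels, applied to the mean-filled panel).

```python
import numpy as np


def imputation_summary(imputations):
    """Cell-wise mean and 95% percentile band across the completed panels."""
    cube = np.stack(imputations)                     # shape (M, n, p)
    mean = cube.mean(axis=0)
    lower, upper = np.percentile(cube, [2.5, 97.5], axis=0)
    return mean, lower, upper


def _eig_desc(data, ncomp):
    """Leading eigenvalues/eigenvectors of the covariance matrix of data."""
    cov = np.cov(data, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)                 # ascending order
    order = np.argsort(vals)[::-1][:ncomp]
    return vals[order], vecs[:, order]


def factors_method_1(imputations, ncomp=1):
    """Weights from the PCA of the mean-filled panel."""
    mean_filled = np.mean(np.stack(imputations), axis=0)
    vals, vecs = _eig_desc(mean_filled, ncomp)
    return vals, (mean_filled - mean_filled.mean(axis=0)) @ vecs


def factors_method_2(imputations, ncomp=1):
    """Weights from the PCA of the stacked panels, applied to the mean-filled panel."""
    mean_filled = np.mean(np.stack(imputations), axis=0)
    vals, vecs = _eig_desc(np.vstack(imputations), ncomp)
    return vals, (mean_filled - mean_filled.mean(axis=0)) @ vecs
```

The practical difference is that averaging first removes the
between-imputation variability from the covariance matrix, whereas stacking
keeps it; when all imputations agree, the two methods give the same factors
(up to sign).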
thanks and best regards
Matt Johnson