Dear Matt,

I cannot answer your question directly, but I can suggest another way to deal with missing values in the framework of PCA. It is possible to modify the PCA algorithm so that it can handle missing values. In my lab we work on this topic, and we have developed an R package named missMDA. This package is dedicated to handling missing values in principal component methods such as principal component analysis.

The rationale of the proposed method is the following. First, an EM algorithm (named EM-PCA) is used to obtain estimates of the scores and of the loadings despite the missing values. The algorithm alternates two steps: one step estimates the parameters via PCA, and the other imputes the missing values using the PCA model (also called the reconstruction formula). Consequently, at the end of the algorithm, a completed data set is obtained as well as the scores and loadings (if you perform your PCA on the completed data set, you find the same loadings and scores).

Then we have proposed a multiple imputation procedure using the PCA model. To visualize the different plausible imputations on the PCA maps, we have proposed confidence areas around the positions of the individuals and the variables.

References can be found in the following article: Josse, Julie; Pagès, Jérôme; Husson, François (2011). Multiple imputation in principal component analysis. /Advances in Data Analysis and Classification/: 1-16, March 06, 2011.

J. Josse
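
For readers who want to see the two alternating steps of EM-PCA in concrete form, here is a minimal NumPy sketch. It is not the missMDA implementation: the function names `em_pca` and `_pca`, the column-mean starting values, and the stopping rule are all assumptions made for illustration, and the sketch omits any regularisation.

```python
import numpy as np


def _pca(data, n_components):
    """PCA via SVD of the centred data; returns mean, scores, loadings."""
    mean = data.mean(axis=0)
    U, s, Vt = np.linalg.svd(data - mean, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return mean, scores, loadings


def em_pca(X, n_components=2, max_iter=500, tol=1e-8):
    """Illustrative EM-style PCA for a matrix whose missing cells are NaN."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: start from the observed column means.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Step 1: estimate the PCA parameters on the current completed data.
        mean, scores, loadings = _pca(filled, n_components)
        # Step 2: impute the missing cells with the reconstruction formula.
        fitted = mean + scores @ loadings.T
        updated = np.where(missing, fitted, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    # A final PCA on the completed data gives the reported scores and loadings.
    _, scores, loadings = _pca(filled, n_components)
    return filled, scores, loadings
```

The observed cells are never modified; only the missing cells are updated at each pass. The number of components has to be chosen by the user, and too many components can lead to overfitting of the imputed values.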
Dear All,
I am using Amelia to fill in some gaps in national accounts data (and similar panels of data); as a result of the structure of my panel, there is no 'cs' parameter -- just 'ts'.
I intend to use the EM algorithm to complete my panel, and then to extract factors via PCA -- as a result I have two related questions.
1/ I would like to use the tscsPlot command (or similar) to plot the observed values and the imputations (mean + 95% confidence bands) -- is this possible?
2/ What's the best way to use the output from the imputations to generate the factors in the PCA? I have considered two methods, but am unsure which is best (most valid) --
1/ fill the gaps in the panel with the mean of the imputations and use the single data set to extract the eigenvalues and eigenvectors of the associated covariance matrix - and then use these weights and the 'mean-filled in' data set to generate the factors.
2/ stack the imputation panels and use the stacked panel to generate the eigenvalues and eigenvectors of the associated covariance matrix - and then use the mean values from the imputation runs to fill the gaps in my original panel, and the weights from the stacked panel?
thanks and best regards
Matt Johnson
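
To make the two options in question 2 concrete, here is a rough NumPy sketch. It assumes the imputed panels have already been exported from Amelia as equally sized arrays (time periods in rows, variables in columns); the function names `pca_weights`, `factors_mean_filled` and `factors_stacked` are hypothetical. It only shows what each option computes and does not settle which one is more valid.

```python
import numpy as np


def pca_weights(data):
    """Eigenvalues and eigenvectors of the covariance matrix, largest first."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def factors_mean_filled(imputations, n_factors):
    """Option 1: average the imputations, then run PCA on that single panel."""
    mean_panel = np.mean(imputations, axis=0)
    _, weights = pca_weights(mean_panel)
    centred = mean_panel - mean_panel.mean(axis=0)
    return centred @ weights[:, :n_factors]


def factors_stacked(imputations, n_factors):
    """Option 2: weights from the stacked panels, applied to the mean-filled panel."""
    stacked = np.concatenate(imputations, axis=0)
    _, weights = pca_weights(stacked)
    mean_panel = np.mean(imputations, axis=0)
    centred = mean_panel - mean_panel.mean(axis=0)
    return centred @ weights[:, :n_factors]
```

The only difference between the two functions is the covariance matrix from which the weights are taken: option 1 uses the averaged panel, whose imputed cells vary less than any single imputation, while option 2 keeps the between-imputation variability in the covariance estimate.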