My use case is to build a model on a training data set and then
demonstrate its performance on a separate test data set; both data sets
may contain missing data.
Amelia seems to assume that missing values will be imputed on the same
data set used to estimate the imputation model itself. Is there an
interface, or a reasonably discrete section of the code in the package,
which allows an imputation model developed from a training set to be
used to impute missing values in a test set?
--
Paul Dunmore
100 Marine Parade
Paraparaumu 5032
New Zealand
Hi Stuart,
Unfortunately, we don't have any code for implementing PCA with Amelia
output. I was offering more of a high-level idea of how one could implement
this. If separate PCA analyses don't work, then maybe combine the data
first and then run PCA on the stacked data (weighting each row by 1/64).
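
A rough sketch of the stacked approach in base R, offered as an
illustration only and not as Amelia functionality. The `imputations` list
is simulated toy data standing in for `a.out$imputations`, and all object
names are made up. Because every stacked row gets the same weight, the
weighted PCA here should coincide with an unweighted PCA of the stacked
data; the weights are kept only to mirror the description above.

```r
# Toy stand-in for a.out$imputations: m completed data frames with the
# same rows and columns (simulated here, not real Amelia output).
set.seed(1)
m <- 4
n <- 50
base_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
imputations <- lapply(seq_len(m), function(i) {
  d <- base_df
  d$x3[1:10] <- d$x3[1:10] + rnorm(10, sd = 0.3)  # pretend these cells were imputed
  d
})

# Stack all imputations and remember which imputation each row came from.
stacked <- do.call(rbind, imputations)
imp_number <- rep(seq_len(m), each = n)

# Weighted correlation matrix with weight 1/m per row (cov.wt rescales the
# weights to sum to 1), then PCA via its eigendecomposition.
w <- rep(1 / m, nrow(stacked))
wc <- cov.wt(stacked, wt = w, cor = TRUE)
eig <- eigen(wc$cor)
loadings <- eig$vectors
rownames(loadings) <- colnames(stacked)

# Score every row with the shared loadings, then split back by imputation.
z <- scale(stacked, center = wc$center, scale = sqrt(diag(wc$cov)))
scores <- z %*% loadings
colnames(scores) <- paste0("PC", seq_len(ncol(scores)))
scores_by_imp <- split(as.data.frame(scores), imp_number)
```

Each element of `scores_by_imp` holds the component scores for one
imputed data set, all computed from the same loadings.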
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Mon, Oct 12, 2020 at 10:18 PM Dr Stuart Reece <asreece(a)bigpond.net.au>
wrote:
> Thanks Matt.
>
> Can you please provide code to work out the PCA in each imputed dataset?
>
> Actually, I went through and did this by hand in all 64 imputed datasets
> (for 50% missing data), and then the code for analyzing it would not work
> at all.
>
> Extremely frustrating.
>
> I tried this with missMDA and FactoMineR and PCA, but it only gave one
> dataset at the end and the results were not robust.
>
> But I really liked the Amelia framework and wanted to use it, but could
> not make the code run after constructing PCAs in each dataset as noted
> earlier.
>
> Thanks for your advice,
>
> Stuart.
>
> *From:* Matt Blackwell [mailto:mblackwell@gov.harvard.edu]
> *Sent:* Tuesday, 13 October 2020 11:59 AM
> *To:* stuart.reece(a)bigpond.com
> *Cc:* amelia(a)lists.gking.harvard.edu; Gary King; James Honaker; Stuart
> Reece
> *Subject:* Re: Principal Components of Amelia Datasets
>
> Hi Stuart,
>
> Probably the most straightforward way to do this would be to apply PCA to
> each of the imputed data sets and then use those in whatever analysis
> models you want. As an alternative, you could use the stacked dataset of
> all imputations (see my earlier email) and run PCA giving each row of the
> stacked data a weight of 1/m, where m is the number of imputed datasets.
> This would ensure that all of the imputed data sets use the same factor
> loadings.
>
> Cheers,
> Matt
>
> ~~~~~~~~~~~
> Matthew Blackwell
> Associate Professor of Government
> Harvard University
> url: http://www.mattblackwell.org
>
> On Fri, Oct 9, 2020 at 10:38 PM <stuart.reece(a)bigpond.com> wrote:
>
> Hi Amelia Users.
>
> I was wondering if anyone could advise how I can add principal components
> to imputed datasets, and how to correctly combine them from all the
> imputations?
>
> I was not able to find anything on this online.
>
> Thanks so much,
>
> Stuart Reece.
Hi Stuart,
Sorry, no. ameliabind works like the combine.output I described: it
combines multiple runs of Amelia into one output object of class
"amelia". Hope that helps!
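
For illustration, a minimal sketch of that behaviour using the `africa`
example data that ships with Amelia. The object names are arbitrary, and
`p2s = 0` is used only to silence console output.

```r
library(Amelia)
data(africa)

# Two separate Amelia runs on the same data.
a.out1 <- amelia(africa, cs = "country", ts = "year", m = 3, p2s = 0)
a.out2 <- amelia(africa, cs = "country", ts = "year", m = 2, p2s = 0)

# ameliabind() merges the runs into one "amelia" object; it does not
# row-bind the imputed data frames into a single data frame.
a.all <- ameliabind(a.out1, a.out2)
class(a.all)
length(a.all$imputations)  # imputations pooled from both runs
```

To get one stacked data frame instead, `do.call(rbind, a.all$imputations)`
is still the way to go.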
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Mon, Oct 12, 2020 at 10:13 PM Dr Stuart Reece <asreece(a)bigpond.net.au>
wrote:
> Thanks Matt.
>
> Yes, I have used the do.call(rbind, ...) code many times.
>
> But I thought ameliabind was to do.call(rbind, ...) what dplyr's bind_rows
> is to do.call(rbind, ...).
>
> ameliabind doesn't work that way?
>
> I could not find the syntax listed anywhere online, but I don't mind using
> do.call(rbind, ...).
>
> Thank you so much,
>
> Stuart.
>
> *From:* Matt Blackwell [mailto:mblackwell@gov.harvard.edu]
> *Sent:* Tuesday, 13 October 2020 11:56 AM
> *To:* stuart.reece(a)bigpond.com
> *Cc:* amelia(a)lists.gking.harvard.edu; Gary King; James Honaker; Stuart
> Reece
> *Subject:* Re: Query on Amelia::combine.output()
>
> Hi Stuart,
>
> Ah, `combine.output()` actually just takes multiple Amelia runs that were
> done separately and combines them into one object, as if you had run them
> all together. This is helpful if you want to run additional imputations
> after a first batch. Here is some code that will take Amelia output and
> create a stacked data frame of all imputations with a column for imputation
> numbers:
>
> library(Amelia)
> data(africa)
> imps <- 5
> a.out <- amelia(africa, cs = "country", ts = "year", m = imps)
>
> stacked_df <- do.call(rbind, a.out$imputations)
> stacked_df$imp_number <- rep(1:imps, each = nrow(africa))
>
> Having said all of that, you probably don't want to do this. Instead, you
> probably want to apply your analysis model to each of the imputed data sets
> and then combine the coefficients/model parameters using the Rubin rules
> described in the various Amelia papers.
>
> Cheers,
> Matt
>
> ~~~~~~~~~~~
> Matthew Blackwell
> Associate Professor of Government
> Harvard University
> url: http://www.mattblackwell.org
>
> On Fri, Oct 9, 2020 at 10:33 PM <stuart.reece(a)bigpond.com> wrote:
>
> Hi Amelia Users.
>
> I am running a Windows computer i9-9900K CPU 3.6GHz, 64MB RAM, 64-bit
> system.
>
> I have RStudio 1.3.1093 based on R 4.0.2, just re-installed today.
>
> Amelia works on my data and runs models very nicely. The parallel
> routines work really well, which I very much appreciate on my 16 CPUs.
>
> However, I use complex geospatial models and would love to model the
> complete imputed geospatial data in R::splm.
>
> So combining all the imputations into one df would be a fantastic
> help.
>
> I think combine.output should do this very nicely.
>
> I think the syntax for combine.output is probably like that of ameliabind,
> really simple, but I can't find the syntax online.
>
> But whenever I run combine.output, with whatever syntax, I always get
> the same error message, which reads:
>
> CombAmelia1616 <- Amelia::combine.output(a.r.CS.Raw.ETOPFA.LIR.02.16,
> a.r.CS.Raw.ETOPFA.LIR.02.16e)
>
> Error: 'combine.output' is not an exported object from 'namespace:Amelia'
>
> Could you please tell me if something is wrong?
>
> Also, could someone please confirm that the correct syntax for
> combine.output is the same as for ameliabind, i.e. super simple?
>
> Thanks so much,
>
> Stuart Reece.
Hi Amelia Users,
I was also wondering how one makes graphs from the multiple imputed Amelia
datasets? Or even converts a variable to a factor across all datasets?
Or is this best done prior to running Amelia?
I mostly use ggplot2.
Thanks again,
Stuart Reece.
Hi Stuart,
Probably the most straightforward way to do this would be to apply PCA to
each of the imputed data sets and then use those in whatever analysis
models you want. As an alternative, you could use the stacked dataset of
all imputations (see my earlier email) and run PCA giving each row of the
stacked data a weight of 1/m, where m is the number of imputed datasets.
This would ensure that all of the imputed data sets use the same factor
loadings.
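
As a hedged illustration of the first option, here is a minimal base-R
sketch that fits a separate PCA in each imputed data set and attaches the
first component to it. The `imputations` list is simulated toy data
standing in for `a.out$imputations`; the object names are invented for the
example.

```r
# Toy stand-in for a.out$imputations (simulated, not real Amelia output).
set.seed(3)
m <- 3
n <- 40
base_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
imputations <- lapply(seq_len(m), function(i) {
  d <- base_df
  d$x2[1:8] <- d$x2[1:8] + rnorm(8, sd = 0.3)  # pretend these cells were imputed
  d
})

# Fit a separate PCA within each imputed data set ...
pcas <- lapply(imputations, prcomp, center = TRUE, scale. = TRUE)

# ... and attach the first component's scores to that data set.
with_pc <- Map(function(d, p) {
  d$PC1 <- p$x[, 1]
  d
}, imputations, pcas)
```

Note that with separate fits the sign (and loadings) of a component can
differ from one imputation to the next, which is what the shared-loadings
alternative avoids.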
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Fri, Oct 9, 2020 at 10:38 PM <stuart.reece(a)bigpond.com> wrote:
> Hi Amelia Users.
>
> I was wondering if anyone could advise how I can add principal components
> to imputed datasets, and how to correctly combine them from all the
> imputations?
>
> I was not able to find anything on this online.
>
> Thanks so much,
>
> Stuart Reece.
Hi Stuart,
Ah, `combine.output()` actually just takes multiple Amelia runs that were
done separately and combines them into one object, as if you had run them
all together. This is helpful if you want to run additional imputations
after a first batch. Here is some code that will take Amelia output and
create a stacked data frame of all imputations with a column for imputation
numbers:
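library(Amelia)
data(africa)

# Run m = imps imputations on the example panel data
imps <- 5
a.out <- amelia(africa, cs = "country", ts = "year", m = imps)

# Stack the imputed data frames and label each row with its imputation number
stacked_df <- do.call(rbind, a.out$imputations)
stacked_df$imp_number <- rep(1:imps, each = nrow(africa))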
Having said all of that, you probably don't want to do this. Instead, you
probably want to apply your analysis model to each of the imputed data sets
and then combine the coefficients/model parameters using the Rubin rules
described in the various Amelia papers.
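
A minimal sketch of that workflow in base R, assuming a simple linear
model as the analysis model. The `imputations` list is simulated toy data
standing in for `a.out$imputations`, and the variable names are invented.
Amelia also ships `mi.meld()` for the combining step; the formulas are
written out here only to show what is being computed.

```r
# Toy stand-in for a.out$imputations (simulated, not real Amelia output).
set.seed(2)
m <- 5
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
imputations <- lapply(seq_len(m), function(i) {
  d <- data.frame(x = x, y = y)
  d$x[1:20] <- d$x[1:20] + rnorm(20, sd = 0.5)  # mimic imputation noise
  d
})

# Fit the analysis model separately in each imputed data set.
fits <- lapply(imputations, function(d) lm(y ~ x, data = d))
q  <- t(sapply(fits, coef))                              # m x k estimates
se <- t(sapply(fits, function(f) sqrt(diag(vcov(f)))))   # m x k std. errors

# Rubin's rules: average the estimates; total variance is the mean
# within-imputation variance plus (1 + 1/m) times the between-imputation
# variance of the estimates.
q_bar    <- colMeans(q)
within   <- colMeans(se^2)
between  <- apply(q, 2, var)
total_se <- sqrt(within + (1 + 1 / m) * between)

cbind(estimate = q_bar, std.error = total_se)
```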
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Fri, Oct 9, 2020 at 10:33 PM <stuart.reece(a)bigpond.com> wrote:
> Hi Amelia Users.
>
> I am running a Windows computer i9-9900K CPU 3.6GHz, 64MB RAM, 64-bit
> system.
>
> I have RStudio 1.3.1093 based on R 4.0.2, just re-installed today.
>
> Amelia works on my data and runs models very nicely. The parallel
> routines work really well, which I very much appreciate on my 16 CPUs.
>
> However, I use complex geospatial models and would love to model the
> complete imputed geospatial data in R::splm.
>
> So combining all the imputations into one df would be a fantastic
> help.
>
> I think combine.output should do this very nicely.
>
> I think the syntax for combine.output is probably like that of ameliabind,
> really simple, but I can't find the syntax online.
>
> But whenever I run combine.output, with whatever syntax, I always get
> the same error message, which reads:
>
> CombAmelia1616 <- Amelia::combine.output(a.r.CS.Raw.ETOPFA.LIR.02.16,
> a.r.CS.Raw.ETOPFA.LIR.02.16e)
>
> Error: 'combine.output' is not an exported object from 'namespace:Amelia'
>
> Could you please tell me if something is wrong?
>
> Also, could someone please confirm that the correct syntax for
> combine.output is the same as for ameliabind, i.e. super simple?
>
> Thanks so much,
>
> Stuart Reece.