Hi Stuart,
Ah, `combine.output()` actually just takes multiple Amelia runs that were
done separately and combines them into one object, as if you had run them
all together. This is helpful if you want to run additional imputations
after a first batch. Here is some code that will take Amelia output and
create a stacked data frame of all imputations with a column for the
imputation number:
library(Amelia)
data(africa)
imps <- 5
a.out <- amelia(africa, cs = "country", ts = "year", m = imps)
# a.out$imputations is a list of data frames, one per imputation
stacked_df <- do.call(rbind, a.out$imputations)
# label each block of rows with its imputation number
stacked_df$imp_number <- rep(1:imps, each = nrow(africa))
Having said all of that, you probably don't want to do this. Instead, you
probably want to apply your analysis model to each of the imputed data sets
and then combine the coefficients/model parameters using the Rubin rules
described in the various Amelia papers.
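For reference, the combining step can be sketched in base R. This is a
minimal example with made-up estimates and standard errors (in practice,
Amelia's mi.meld() does the same pooling for you):

```r
# Rubin's rules for combining estimates across m imputations.
# q: vector of point estimates, se: their standard errors.
rubin_combine <- function(q, se) {
  m <- length(q)
  qbar  <- mean(q)                # pooled point estimate
  ubar  <- mean(se^2)             # average within-imputation variance
  b     <- var(q)                 # between-imputation variance
  t_var <- ubar + (1 + 1/m) * b   # total variance
  c(estimate = qbar, se = sqrt(t_var))
}

# Made-up coefficients from m = 5 imputed analyses:
q  <- c(0.52, 0.48, 0.55, 0.50, 0.49)
se <- c(0.10, 0.11, 0.10, 0.12, 0.10)
rubin_combine(q, se)
```

The pooled point estimate is just the mean across imputations; the pooled
variance adds the between-imputation spread to the average
within-imputation variance.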
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Fri, Oct 9, 2020 at 10:33 PM <stuart.reece(a)bigpond.com> wrote:
> Hi Amelia Users.
>
> I am running a Windows computer: i9-9900K CPU at 3.6GHz, 64GB RAM, 64-bit
> system.
>
> I have RStudio 1.3.1093 based on R 4.0.2, just re-installed today.
>
> Amelia works on my data and runs models very nicely. The parallel
> routines work really well, which I very much appreciate on my 16 CPUs.
>
> However, I use complex geospatial models and would love to model the
> complete imputed geospatial data in R::splm.
>
> So combining all the imputations into one data frame would be a fantastic
> help.
>
> I think combine.output should do this very nicely.
>
> I think the syntax for combine.output is probably like that of ameliabind
> (really simple), but I can't find the syntax online.
>
> But whenever I run combine.output, with whatever syntax, I always get the
> same error message, which reads:
>
> CombAmelia1616 <- Amelia::combine.output(a.r.CS.Raw.ETOPFA.LIR.02.16,
> a.r.CS.Raw.ETOPFA.LIR.02.16e)
>
> Error: 'combine.output' is not an exported object from 'namespace:Amelia'
>
> I was wondering if something is wrong?
>
> Also, could someone please confirm that the correct syntax for
> combine.output is the same as for ameliabind?
>
> Thanks so much,
>
> Stuart Reece.
>
Hi Nick, the way to think about it is that if you omit a variable, you're
implicitly including the variable and restricting its coefficient (in any
regression) to 0, unless that relationship is picked up by other variables
you do include.
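A tiny simulated illustration of that point (hypothetical data; the zero
offset makes the restriction explicit):

```r
# Simulated data: y depends on both x and z.
set.seed(1)
x <- rnorm(200); z <- rnorm(200)
y <- x + 2 * z + rnorm(200)

# Omitting z ...
b_omit <- coef(lm(y ~ x))["x"]
# ... gives exactly the same fit as including z with its coefficient
# restricted to 0 (here, via an offset of 0 * z).
b_restrict <- coef(lm(y ~ x + offset(0 * z)))["x"]
all.equal(b_omit, b_restrict)  # TRUE
```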
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary
<https://twitter.com/kinggary> - 617-500-7570 - Assistant
<king-assist(a)iq.harvard.edu>: 617-495-9271
On Mon, Jul 13, 2020 at 7:28 PM Nick Eubank <nick(a)nickeubank.com> wrote:
> Hi All,
>
> I'm working with panel data (state-years) and a difference-in-difference
> design. I'm looking to fill some values in the dependent variable time
> series.
>
> My main specification is thus just two-way fixed effects (state FEs and
> year FEs) and a treatment variable.
>
> I've gotten the best results modeling the missing data by running a local
> polynomial regression (a la Honaker and King 2010) for each state, and then
> using those predicted values as the predictor in Amelia. (i.e., I run a
> local polynomial of my DV against time for state S, fill those values in to
> my predictor variable for state S, and repeat for all states.)
>
> Everything I've read says I should definitely include all variables I plan
> to use in my analysis in Amelia, but I worry about failing to meet
> multi-variate normality conditions with the FEs, and running the model with
> and without them, I'm not sure they're adding much.
>
> Do I *need* to include all those FEs (i.e. will I introduce some weird
> bias in my subsequent analysis if I don't)? And if I do, is there anything
> I can do to deal with them definitely not being multi-variate normal (or do
> I not need to worry about that)?
>
> Thanks!
>
> Nick
>
Dear all,
I have installed AmeliaView, but I am hitting a bug. When I start it, a
terminal window appears and closes quickly. I tried running it as
administrator, and it showed an error dialog saying "Can not find script
file C:\WINDOWS\System32\amelia.vbs." I tried copying that file from
AmeliaView's folder to System32, and it returned another error message:
"Script: C:\WINDOWS\System32\amelia.vbs
"Script: C:\WINDOWS\System32\amelia.vbs
Line: 5
Char: 1
Error: File not found
Code: 800A0035
Source: Microsoft VBScript runtime error"
Could someone help me to solve this?
Thanks in advance!
Gustavo.
--
*|MSc. Gustavo Simões Libardi - "Napister"*
*|*Biologist and Master of Sciences (Universidade de São Paulo/Brazil)
*|*CV Lattes: http://lattes.cnpq.br/8451514538020691
|CRBio: 72563/01-D
Hi Matthew, a few notes below...
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary
<https://twitter.com/kinggary> - 617-500-7570 - Assistant
<king-assist(a)iq.harvard.edu>: 617-495-9271
On Mon, May 25, 2020 at 7:29 PM Matthew Simonson <
simonson.m(a)northeastern.edu> wrote:
> Hello Amelia Team,
>
> Three questions about a dataset I'm working with.
>
> 1) My dataset consists of a 10-wave survey in which some survey questions
> were only asked in the final 2 waves. As a block matrix it looks like this
>
> A M
> B C
>
> where all values in block M are structurally missing and not of interest.
> I want to impute the missing values in blocks A, B, and C. I could
> either a) run amelia on the full dataset, b) split it into two datasets,
> one with A and B, the other with only C, or c) split it into two
> overlapping datasets, one with A and B, the other with B and C. I'm
> hesitant to use the full dataset because including block M increases the
> missingness from 25% to 55%, but I don't know if there are theoretical
> objections to the other two approaches. What do you suggest?
>
There isn't a correct answer here, but your suggestions are reasonable. I'd
mainly just make sure that the imputation model fits the data; that will
enable you to pick an approach. (You might also have a look at this
<https://gking.harvard.edu/files/abs/not-abs.shtml> somewhat related paper,
but I'd still use Amelia rather than coding it up separately.)
>
> 2) In my analysis, I examine the interaction between treatment and
> ideology. I run treatment*(ideology>4) in one regression, but I also try other
> models with ideology>5, 6, etc. to see if the cutoff makes a difference.
> For Amelia, would it be sufficient to include a continuous
> treatment*ideology column in my data, or do I need to dichotomize this
> column in Amelia as well (and hence run Amelia multiple times, once for
> each cutoff)?
>
You definitely want to include at least as much information in the
imputation model as in your analysis model.
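For example (a hypothetical sketch with invented column names), one could
add both the continuous interaction and each dichotomized version used in
an analysis model to the data frame before calling amelia:

```r
# Toy data; column names are invented for illustration.
df <- data.frame(treatment = c(0, 1, 1, 0, 1),
                 ideology  = c(3, 6, 5, 2, NA))

# Continuous interaction for the imputation model:
df$treat_x_ideo <- df$treatment * df$ideology
# Dichotomized versions matching each analysis-model cutoff:
df$treat_x_gt4 <- df$treatment * as.numeric(df$ideology > 4)
df$treat_x_gt5 <- df$treatment * as.numeric(df$ideology > 5)
```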
>
>
> 3) In order to get bootstrapped confidence intervals in my analysis, I
> bootstrap the original data 1000 times and then run Amelia (with m=5) on
> each bootstrapped dataset before analyzing. Although the original dataset
> works just fine, the bootstrapped versions throw errors about 1/3 of the
> time: first a few hundred "chol(): given matrix is not symmetric" warnings
> followed by a "inv_sympd(): matrix is singular or not positive definite"
> error. Usually all 5 imputations fail for a given bootstrapped data set,
> but sometimes only some of them do. Suggestions?
>
I'd hunt this down. I'm guessing that you have included some dummy
variables that don't exclude the baseline (so they all sum to 1), or
something close to that. You probably have either too small an n or
perfect collinearity. I'd track this down since it might be a data error
that affects everything else too.
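One quick way to check a (bootstrapped) design matrix for that kind of rank
deficiency is a QR rank test; here is a small self-contained sketch where
dummies for all factor levels are included alongside an intercept:

```r
# Toy example: an intercept plus dummies for ALL factor levels.
df <- data.frame(g = factor(c("a", "b", "c", "a", "b")))
X <- cbind(intercept = 1, model.matrix(~ g - 1, data = df))

# The level dummies sum to the intercept column, so X is rank deficient:
qr(X)$rank < ncol(X)  # TRUE signals perfect collinearity
```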
Best of luck with your work.
Gary
>
> Thanks,
>
> Matthew Simonson
> Doctoral Student, Northeastern University, Boston
> * The COVID-19 Consortium for Understanding the Public’s Policy
> Preferences Across States *
> *Research Areas: Networks, Civil Wars, COVID-19, Causal Inference*
> *www.msimonson.com <http://www.msimonson.com/>*
> *www.covidstates.org <http://www.covidstates.org>*
>
>
>
Hi All,
For my analysis, I have a number of event count variables (22) I want to
add up to make a composite.
My original plan was to square-root transform the counts during imputation
and then add them up to make the composite after imputation. But
alternatively, I could add them up prior to imputation and then square-root
transform the sum during imputation.
Which is better? (Sorry if it's a bad question; I have little experience
with MI.)
--
Brandon McCormick
Doctoral Student
Clinical Psychology - Psychology and Law
The University of Alabama <https://www.ua.edu/>
101 McMillan
Tuscaloosa, AL 35401
Phone 205-460-8678
bfmccormick(a)crimson.ua.edu
Hi Matt,
Thanks again for your help. I am running into another issue with Amelia
imputation and I hope you can point me in the right direction.
I am using Amelia to impute historical CCP party membership and
population at the county level. I have used the following Amelia
imputation command, which specifies the TSCS structure and adds lags,
leads, and squared terms to improve the imputation results.
a.out <- amelia(data, m = 10,
                idvars = c("province", "prefecture", "county", "prov_id",
                           "pref_id", "sgn_base2", "js_base2", "jcj_base2",
                           "base1", "base2"),
                ts = "year", cs = "county_id", empri = 100,
                lags  = c("population", "ccp_member"),
                leads = c("population", "ccp_member"),
                logs  = c("population", "ccp_member", "population_sq",
                          "ccp_member_sq", "ccp_branch", "ccp_comission",
                          "ccp_branch_sq", "ccp_comission_sq"),
                polytime = 2, incheck = TRUE, max.resample = 1000,
                tolerance = 0.01)
However, the imputed data have very high variance across years, which
defies the patterns we observe in the non-missing data. For example, the
original party membership (with missing values) and the imputed values for
one county look like the following:
Year   Original   Imputed
1938   75         75
1939   588        588
1940              25.15989
1941              29.70148
1942              17.14454
1943              5.593282
1944              35.50387
1945              288.5483
1946              248.8093
1947              82.34441
1948              124.5035
1949   358        358
What worries me is that the imputed party membership jumps up and down so
much between 1940 and 1948. Having seen other counties with complete
time-trend data, I don't think the imputed values in this case are in any
way close to reality. What can I do to smooth the time trend of the
imputed values? Any suggestions would be much appreciated.
Meanwhile, I also have an unrelated question: how can I analyze only a
subset of the imputed data? The "subset" command seems to be incompatible
with the imputed datasets, as I cannot use the following command:
"a.out.1945 <- subset(a.out$imputations$year<=1945)"
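For context, a.out$imputations is a list of data frames, so the condition
has to be applied within each element of the list; a base-R sketch with
stand-in data:

```r
# Stand-in for a.out$imputations (hypothetical values):
imputations <- list(imp1 = data.frame(year = 1938:1949, ccp_member = 1:12),
                    imp2 = data.frame(year = 1938:1949, ccp_member = 13:24))

# Subset each imputed data frame, keeping the list structure for pooling:
imps_1945 <- lapply(imputations, function(d) d[d$year <= 1945, ])
```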
Thanks a ton in advance!
Best,
Xiaobo
--
Xiaobo Lü
Associate Professor
Department of Government
University of Texas at Austin
Tel: (512) 232-7257
Fax: (512) 471-1061
Website: www.xiaobolu.com
Hi there,
This may sound like a stupid question, so I hope someone can help me out.
I have created five imputed datasets using Amelia II in R, and I am using
Zelig to analyze the data. I would like to create a new variable based on
the imputed variables in these imputed datasets. How can I do this, given
that the data live inside "a.out"?
Thanks a ton in advance!
Best,
Xiaobo
When exactly one case is missing in a data column, plot.amelia includes
that column among those to be plotted. However, if compare=TRUE (the
default), the call to compare.density() fails with the fatal error message
"need at least 2 points to select a bandwidth automatically". Now that I
know, I can work around it; but perhaps either this case could be avoided,
or Amelia could provide its own, more informative message?
Cheers, Paul.