Hi,
I am trying to use the cem function in R followed by the k2k function, because I have a
very large dataset with comparatively few treated respondents (about 20,000) and roughly
ten times as many untreated respondents (200,000). For this reason I would like to reduce
the dataset so that each treated respondent is matched to one untreated respondent who is
similar with respect to certain control variables. However, the k2k function does not seem
to work, and trying out the vignette suggests that the problem is not specific to my
application.
When I run the simple example from the vignette, k2k does not work either, and the
subsequent estimation commands simply reproduce the original cem results, as if k2k had
never been applied:
data(LL)
# cem match: automatic bin choice
mat <- cem(treatment="treated", data=LL, drop="re78")
mat
mat$k2k
# ATT estimate
att(mat, re78 ~ treated, data=LL)
# transform the match into k2k
mat2 <- k2k(mat, LL, "euclidean", 1)
mat2
mat2$k2k
# ATT estimate after k2k
att(mat2, re78 ~ treated, data=LL)
When I run it, the k2k() call does not stop with an error but reports: There were 50 or
more warnings (use warnings() to see the first 50)
The warnings() function then shows the same warning 50 times:
1: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf
…
50: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf
I also replaced "euclidean" with method = NULL, and the 50 warnings read the same:
1: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf
...
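For reference, the warning text itself comes from base R: min() emits it whenever it is called on input with no non-missing values and then returns Inf. A minimal sketch, independent of cem, that reproduces the message is below. That k2k() is taking a minimum over an empty set of values somewhere is only my assumption; I have not confirmed it in the cem source.

```r
# Capture the warning that base R's min() raises on an empty input.
msg <- NULL
res <- withCallingHandlers(
  min(numeric(0), na.rm = TRUE),
  warning = function(w) {
    msg <<- conditionMessage(w)      # store the warning text
    invokeRestart("muffleWarning")   # keep the console quiet
  }
)

res  # Inf
msg  # "no non-missing arguments to min; returning Inf"
```

The same warning appears when every element is NA and na.rm = TRUE, for example min(c(NA, NA), na.rm = TRUE).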
The concrete output is:
data(LL)
# cem match: automatic bin choice
mat <- cem(treatment="treated", data=LL, drop="re78")
mat
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134
mat$k2k
[1] FALSE
# ATT estimate
att(mat, re78 ~ treated, data=LL)
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134
Linear regression model on CEM matched data:
SATT point estimate: 550.962564 (p.value=0.368242)
95% conf. interval: [-647.777701, 1749.702830]
# transform the match into k2k
mat2 <- k2k(mat, LL, "euclidean", 1)
There were 50 or more warnings
(use warnings() to see the first 50)
mat2
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134
mat2$k2k
[1] FALSE
# ATT estimate after k2k
att(mat2, re78 ~ treated, data=LL)
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134
Linear regression model on CEM matched data:
SATT point estimate: 550.962564 (p.value=0.368242)
95% conf. interval: [-647.777701, 1749.702830]
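To check more directly whether k2k changed anything, the two match objects can be compared component by component rather than by eye. The sketch below uses a hypothetical helper, same_match(), and assumes that a cem match object stores a logical vector `matched` and a weight vector `w`. Stand-in lists are used so that it runs without the cem package.

```r
# Hypothetical helper (not part of cem): TRUE if two match objects select
# the same units with the same weights. Assumes components `matched` and `w`.
same_match <- function(a, b) {
  identical(a$matched, b$matched) && isTRUE(all.equal(a$w, b$w))
}

# Stand-in objects with made-up values, only to illustrate the comparison.
m_before <- list(matched = c(TRUE, TRUE, FALSE, TRUE), w = c(1, 0.5, 0, 1))
m_pruned <- list(matched = c(TRUE, FALSE, FALSE, TRUE), w = c(1, 0, 0, 1))

unchanged <- same_match(m_before, m_before)   # identical objects
changed   <- !same_match(m_before, m_pruned)  # a unit was dropped
```

With the objects from the transcript, same_match(mat, mat2) returning TRUE would be consistent with k2k having left the match untouched.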
I also had a friend try the vignette on his MacBook, and he got the same results. I have a
MacBook too, if that is of any importance.
I would be very glad for help with this problem.
Kind regards,
Merlin
--
Dr. Merlin Schaeffer
Wissenschaftszentrum Berlin für Sozialforschung (WZB)
Department "Migration, Integration, Transnationalization"
Reichpietschufer 50
10785 Berlin, Germany
Phone: + 49 30 25491-459
Fax: + 49 30 25491-452
http://www.wzb.eu/en/persons/merlin-schaeffer