Hi,

I am trying to use the cem() function in R followed by k2k(), because I have a very large dataset with relatively few treated respondents (about 20,000) and roughly ten times as many untreated respondents (about 200,000). I would therefore like to reduce the dataset so that each treated respondent is matched to one untreated respondent who is similar with respect to certain control variables. However, k2k() does not seem to work, and trying out the vignette suggests that the problem is not specific to my application.

When I run the simple example from the vignette, k2k() does not work either, and the subsequent estimation commands simply reproduce the plain cem results, as if k2k() had never been applied:

data(LL)
 
# cem match: automatic bin choice
mat <- cem(treatment="treated", data=LL, drop="re78")
mat
mat$k2k
 
# ATT estimate
att(mat, re78 ~ treated, data=LL)
 
 
# transform the match into k2k
mat2 <- k2k(mat, LL, "euclidean", 1)
mat2
mat2$k2k
 
# ATT estimate after k2k
att(mat2, re78 ~ treated, data=LL)


When I run it, k2k() does not stop with an error, but it prints the following message: There were 50 or more warnings (use warnings() to see the first 50)

Calling warnings() then shows the same warning 50 times:
1: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf
50: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf

I also replaced "euclidean" with method = NULL, and the 50 warnings read the same:
1: In min(x, na.rm = TRUE) : no non-missing arguments to min; returning Inf
...
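
For what it is worth, the same warning can be reproduced in base R whenever min() receives an empty vector, or only NA values together with na.rm = TRUE. So my guess, and it is only a guess, is that somewhere inside k2k() a minimum is being taken over an empty set of values. A minimal sketch that does not use cem at all (the names `msg` and `res` are just mine):

```r
# min() over an empty vector returns Inf and raises the warning quoted above.
msg <- NULL
res <- withCallingHandlers(
  min(numeric(0), na.rm = TRUE),
  warning = function(w) {
    msg <<- conditionMessage(w)      # remember the warning text
    invokeRestart("muffleWarning")   # and keep it off the console
  }
)
res   # the value returned by min()
msg   # the captured warning text

# The same happens when every value is NA and na.rm = TRUE drops them all.
suppressWarnings(min(c(NA_real_, NA_real_), na.rm = TRUE))
```

This only shows where such a warning can come from in general; it does not tell me which step of k2k() triggers it.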


The full console output is:

> data(LL)
>  
> # cem match: automatic bin choice
> mat <- cem(treatment="treated", data=LL, drop="re78")
> mat
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

> mat$k2k
[1] FALSE
>  
> # ATT estimate
> att(mat, re78 ~ treated, data=LL)

           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Linear regression model on CEM matched data:

SATT point estimate: 550.962564 (p.value=0.368242)
95% conf. interval: [-647.777701, 1749.702830]

>  
>  
> # transform the match into k2k
> mat2 <- k2k(mat, LL, "euclidean", 1)
There were 50 or more warnings (use warnings() to see the first 50)
> mat2
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

> mat2$k2k
[1] FALSE
>  
> # ATT estimate after k2k
> att(mat2, re78 ~ treated, data=LL)

           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Linear regression model on CEM matched data:

SATT point estimate: 550.962564 (p.value=0.368242)
95% conf. interval: [-647.777701, 1749.702830]
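
To make clear what I expected instead: after k2k, every stratum should contain as many controls as treated units. Below is a minimal base-R sketch of that idea on made-up data. The data frame `toy` and the helper `k2k_random` are hypothetical names of mine and not part of the cem package, and the sketch prunes at random within strata rather than by a distance such as "euclidean".

```r
# Toy data: 'stratum' stands in for the CEM stratum id, 'treated' is 0/1.
# All names here are hypothetical and not part of the cem package.
set.seed(1)
toy <- data.frame(
  id      = 1:12,
  stratum = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  treated = c(1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0)
)

# Within each stratum keep k = min(#treated, #controls) units from each
# group, chosen at random, so treated and controls are balanced one to one.
k2k_random <- function(d) {
  keep <- lapply(split(d, d$stratum), function(s) {
    tr <- s$id[s$treated == 1]
    co <- s$id[s$treated == 0]
    k  <- min(length(tr), length(co))
    if (k == 0) return(integer(0))   # stratum has only one group: drop it
    # Index via sample.int() to avoid sample()'s special case for length-1 input
    c(tr[sample.int(length(tr), k)], co[sample.int(length(co), k)])
  })
  d[d$id %in% unlist(keep), ]
}

pruned <- k2k_random(toy)
table(pruned$stratum, pruned$treated)   # equal counts per row
```

In this toy example each stratum has at least one unit in each group, so every stratum is kept with equal numbers of treated and control units. That is the kind of table I hoped to see for mat2.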



I also had a friend try the vignette on his MacBook, and he got the same results. I use a MacBook too, in case that matters.
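
In case version details are useful, this is a small sketch of how they could be collected on either machine. The list name `info` is just mine, and the cem entry is NA if the package is not installed.

```r
# Collect version details to go with a bug report.
info <- list(
  r   = R.version.string,
  os  = Sys.info()[["sysname"]],
  # packageVersion() fails if the package is missing, so check first
  cem = if (requireNamespace("cem", quietly = TRUE)) {
    as.character(utils::packageVersion("cem"))
  } else {
    NA_character_
  }
)
str(info)
```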

I would be very grateful for any help with this problem.

Kind regards,


Merlin


-- 
Dr. Merlin Schaeffer

Wissenschaftszentrum Berlin für Sozialforschung (WZB)
Department "Migration, Integration, Transnationalization"

Reichpietschufer 50
10785 Berlin, Germany
Phone: + 49 30 25491-459
Fax: + 49 30 25491-452

http://www.wzb.eu/en/persons/merlin-schaeffer