I am familiarizing myself with CEM using R.
So far I have used spacegraph and cem, and I notice that when the bins are chosen
automatically, they always start and end at the smallest and largest values
of the data; only the number of divisions changes. Why are other bin
configurations not explored?
Setting aside bin configurations that are not equally spaced, have you considered
configurations that are translated, so that they do not necessarily begin and
end at the limits of the range but still cover the data? Like so:
MakeRndmBins <- function(data, n) {
  lo <- min(data)
  hi <- max(data)
  caliper <- (hi - lo) / (n + 1)          # caliper, i.e. bin width
  # Break points 1 to n + 2 split [lo, hi] into n + 1 equal bins
  # (n interior divisions); break point n + 3 adds one extra bin
  # on the right side.
  outVect <- lo + caliper * (0:(n + 2))
  # Random translation to the left by less than one bin width,
  # so the shifted bins still cover the data.
  outVect <- outVect - runif(1, 0, caliper)
  return(outVect)
}
MakeRndmBins(c(17,55),3)
[1] 15.08076 24.58076 34.08076 43.58076 53.08076 62.58076
MakeRndmBins(c(0,1),3)
[1] -0.06772855  0.18227145  0.43227145  0.68227145  0.93227145  1.18227145
MakeRndmBins(c(0,1),3)
[1] -0.1023989  0.1476011  0.3976011  0.6476011  0.8976011  1.1476011
MakeRndmBins(c(0,1),3)
[1] -0.04485268  0.20514732  0.45514732  0.70514732  0.95514732  1.20514732
One might argue this is exactly like changing the size of the first and last bins.
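
To check that argument on a toy case, here is a minimal sketch in base R. It is only an illustration under my own assumptions: make_shifted_bins is a hypothetical variant of the function above that takes the shift as an argument instead of drawing it with runif, so the result is reproducible, and the data vector x is made up. It uses cut() to show which bin each observation falls into before and after the shift.

```r
# Hypothetical helper (not part of cem): same construction as MakeRndmBins,
# but the leftward shift is supplied explicitly instead of drawn at random.
make_shifted_bins <- function(x, n, shift) {
  lo <- min(x)
  hi <- max(x)
  width <- (hi - lo) / (n + 1)
  stopifnot(shift >= 0, shift < width)   # keep the data covered
  # n + 3 break points: n + 1 bins over [lo, hi] plus one extra bin on the right
  seq(lo, hi + width, length.out = n + 3) - shift
}

x <- c(0, 0.2, 0.3, 0.55, 0.8, 1)        # made-up data for illustration

b0 <- make_shifted_bins(x, 3, shift = 0)     # 0.00 0.25 0.50 0.75 1.00 1.25
b1 <- make_shifted_bins(x, 3, shift = 0.1)   # every break moved left by 0.1

# Left-closed bins [a, b), so the minimum and maximum both land in a bin.
g0 <- cut(x, breaks = b0, right = FALSE)
g1 <- cut(x, breaks = b1, right = FALSE)

as.integer(g0)   # 1 1 2 3 4 5 : 0.2 shares a bin with 0
as.integer(g1)   # 1 2 2 3 4 5 : 0.2 now shares a bin with 0.3
```

In this toy case the shift moves the interior boundaries as well, so 0.2 changes partners, which is not what happens when only the first and last bins are resized. If I read the cem documentation correctly, break points like these could be supplied per variable through the cutpoints argument of cem, but I have not verified how cem treats values that fall exactly on a break.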