I am familiarizing myself with CEM using R.
I’ve used spacegraph and cem so far. I notice that when the bins are chosen automatically, they always start and end at the smallest and largest values of the data, and only the number of divisions changes. Why are other bin configurations not explored?
Setting aside bin configurations that are not equally spaced, have you considered configurations that are translated, not necessarily beginning and ending at the limits of the range, but still covering the data? Like so:
MakeRndmBins <- function(data, n) {
  lo <- min(data)                    # avoid shadowing min() and max()
  hi <- max(data)
  caliper <- (hi - lo) / (n + 1)     # caliper, or bin width
  outVect <- numeric(n + 3)          # n + 3 cutpoints define n + 2 bins
  outVect[1] <- lo                   # cutpoints 1:(n + 2) span lo to hi
  outVect[n + 2] <- hi               # in n + 1 bins, i.e. n divisions
  for (i in seq_len(n)) {
    outVect[i + 1] <- caliper * i + lo
  }
  outVect[n + 3] <- hi + caliper     # create one extra bin on the right side
  # perform a random translation to the left, by at most one bin width
  outVect <- outVect - runif(1, 0, caliper)
  return(outVect)
}
> MakeRndmBins(c(17,55),3)
[1] 15.08076 24.58076 34.08076 43.58076 53.08076 62.58076
> MakeRndmBins(c(0,1),3)
[1] -0.06772855 0.18227145 0.43227145 0.68227145 0.93227145 1.18227145
> MakeRndmBins(c(0,1),3)
[1] -0.1023989 0.1476011 0.3976011 0.6476011 0.8976011 1.1476011
> MakeRndmBins(c(0,1),3)
[1] -0.04485268 0.20514732 0.45514732 0.70514732 0.95514732 1.20514732
One might argue this is exactly like changing the size of the first and last bins.
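To make that comparison concrete, here is a minimal sketch in base R only (it does not call cem). The helper `shifted_breaks` is hypothetical, and the data, the seed, and the shift of 4 are made up for illustration. It bins the same values once on a grid pinned to the range and once on a translated grid of the same width, then cross-tabulates the two assignments.

```r
# Hypothetical helper (not part of the cem package): equally spaced
# cutpoints of width (max - min) / (n + 1), with one extra bin on the
# right, translated left by `shift` (0 <= shift <= width).
shifted_breaks <- function(x, n, shift) {
  lo <- min(x)
  width <- (max(x) - lo) / (n + 1)
  stopifnot(shift >= 0, shift <= width)
  lo + (0:(n + 2)) * width - shift
}

set.seed(1)                  # arbitrary seed, for reproducibility only
x <- runif(200, 17, 55)      # made-up data

# n = 3 divisions: 4 bins pinned to the range vs. 5 translated bins
fixed <- seq(min(x), max(x), length.out = 5)
moved <- shifted_breaks(x, n = 3, shift = 4)

# Assign every observation to a bin under each grid.
bins_fixed <- cut(x, breaks = fixed, include.lowest = TRUE)
bins_moved <- cut(x, breaks = moved, include.lowest = TRUE)

# Rows are the pinned bins, columns the translated bins. A row with
# counts in two columns is a pinned bin that the translated grid splits.
print(table(bins_fixed, bins_moved))
```

Because the interior cutpoints move along with the end bins, observations near every boundary can change strata, not only those in the first and last bins. As far as I understand it, a vector such as `moved` could be supplied for a variable through the `cutpoints` argument of `cem()`, though I have not tested that here.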