I am familiarizing myself with CEM using R.

 

I’ve used spacegraph and cem so far. I notice that when the bins are chosen automatically, they always start and end at the smallest and largest values of the data, and only the number of divisions changes. Why are other bin configurations not explored?

 

Setting aside bin configurations that are not equally spaced, have you considered configurations that are translated, so that they do not necessarily begin and end at the limits of the range but still cover the data? For example:

 

MakeRndmBins <- function(data, n) {
  lo <- min(data)
  hi <- max(data)
  caliper <- (hi - lo) / (n + 1)   # bin width: n divisions give n + 1 bins on [lo, hi]
  # n + 3 equally spaced cutpoints: lo, the n interior divisions, hi,
  # plus one extra bin on the right side
  outVect <- lo + caliper * (0:(n + 2))
  # random translation to the left by less than one bin width,
  # so the shifted bins still cover the data
  outVect <- outVect - runif(1, 0, caliper)
  return(outVect)
}

 

> MakeRndmBins(c(17,55),3)
[1] 15.08076 24.58076 34.08076 43.58076 53.08076 62.58076
> MakeRndmBins(c(0,1),3)
[1] -0.06772855  0.18227145  0.43227145  0.68227145  0.93227145  1.18227145
> MakeRndmBins(c(0,1),3)
[1] -0.1023989  0.1476011  0.3976011  0.6476011  0.8976011  1.1476011
> MakeRndmBins(c(0,1),3)
[1] -0.04485268  0.20514732  0.45514732  0.70514732  0.95514732  1.20514732

 

One might argue this is exactly like changing the size of the first and last bins.
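
To make that argument concrete, here is a minimal sketch in base R that bins some made-up values with a shifted grid and then clips the grid to the range of the data. The helper name shifted_breaks, the sample vector x and the variable clipped are hypothetical and only for illustration. Nothing here relies on how cem picks its bins internally.

```r
# Hypothetical helper: the same idea as MakeRndmBins, written compactly.
shifted_breaks <- function(x, n) {
  width <- (max(x) - min(x)) / (n + 1)   # width of each bin
  shift <- runif(1, 0, width)            # random shift, less than one bin
  min(x) + width * (0:(n + 2)) - shift   # n + 3 equally spaced cutpoints
}

set.seed(1)                      # only to make the sketch reproducible
x <- c(17, 23, 31, 40, 48, 55)   # made-up values, purely illustrative

shifted <- shifted_breaks(x, n = 3)

# cut() returns NA for values outside the breaks,
# so the absence of NA means the shifted bins cover all the data.
bins_shifted <- cut(x, breaks = shifted, include.lowest = TRUE)
print(table(bins_shifted))
print(anyNA(bins_shifted))

# Clip the cutpoints to the range of the data. Within that range the
# interior bins keep the full width, while the first and last bins
# are narrower, and their two widths add up to one full bin.
clipped <- pmin(pmax(shifted, min(x)), max(x))
print(diff(clipped))
```

Seen this way, the translation does resemble resizing the two end bins, but it also moves every interior cutpoint relative to the data, which changes which observations share a bin.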