  • Fastxtile: Binning observations when number of observations is less than the bin number

    I don't understand the algorithm fastxtile() uses when a group has fewer observations than the requested number of bins.
    One may ask why anyone would do this in the first place, but grouping by a variable can produce groups that have fewer observations than the number of bins.

    When I tested several bin numbers, fastxtile() behaved in a seemingly regular pattern, but I can't work out what that pattern is.
    I used code similar to the following:

    egen temp_bin = fastxtile(mv), by(time_var) n(5)

    The following are sample cases for a given number of bins (bin_num) and number of observations in the group (num_obs), showing the bins that fastxtile() assigned to the observations.

    bin_num=5:
    # num_obs=5: 1,2,3,4,5
    # num_obs=4: 1,2,3,4
    # num_obs=3: 1,2,4
    # num_obs=2: 1,3
    # num_obs=1: 1
    So, for example, in the above case, when there were 3 observations and bin_num=5, the observations were binned as bin 1, bin 2 and bin 4.

    bin_num=7:
    # num_obs=7: 1,2,3,4,5,6,7
    # num_obs=6: 1,2,3,4,5,6
    # num_obs=5: 1,2,3,5,6
    # num_obs=4: 1,2,4,6
    # num_obs=3: 1,3,5
    # num_obs=2: 1,4
    # num_obs=1: 1

    bin_num=9:
    # num_obs=9: 1,2,3,4,5,6,7,8,9
    # num_obs=8: 1,2,3,4,5,6,7,8
    # num_obs=7: 1,2,3,4,6,7,8
    # num_obs=6: 1,2,3,5,7,8
    # num_obs=5: 1,2,4,6,8
    # num_obs=4: 1,3,5,7
    # num_obs=3: 1,4,7
    # num_obs=2: 1,5
    # num_obs=1: 1
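
    One way to probe the pattern is to test a candidate rule against the cases listed above. The Python sketch below tries the guess that the j-th smallest of num_obs distinct values goes to bin floor((j - 1) * bin_num / num_obs) + 1. This rule and the helper name guessed_bins are assumptions fitted to the tables, not something taken from the fastxtile() source. The guess reproduces every listed case except bin_num=9 with num_obs=6, where it gives 1,2,4,5,7,8 instead of the reported 1,2,3,5,7,8. Whether that difference comes from rounding inside fastxtile() or from something else is not known.

```python
# Hypothetical helper: a guessed rule fitted to the tables in the post,
# NOT taken from the fastxtile() source code.
def guessed_bins(num_obs, bin_num):
    """Guessed bin for the j-th smallest of num_obs distinct values, j = 1..num_obs."""
    # Integer floor division keeps the arithmetic exact (no floating point).
    return [(j - 1) * bin_num // num_obs + 1 for j in range(1, num_obs + 1)]


# Cases copied from the post: observed[bin_num][num_obs] = reported bins.
observed = {
    5: {5: [1, 2, 3, 4, 5], 4: [1, 2, 3, 4], 3: [1, 2, 4], 2: [1, 3], 1: [1]},
    7: {7: [1, 2, 3, 4, 5, 6, 7], 6: [1, 2, 3, 4, 5, 6], 5: [1, 2, 3, 5, 6],
        4: [1, 2, 4, 6], 3: [1, 3, 5], 2: [1, 4], 1: [1]},
    9: {9: [1, 2, 3, 4, 5, 6, 7, 8, 9], 8: [1, 2, 3, 4, 5, 6, 7, 8],
        7: [1, 2, 3, 4, 6, 7, 8], 6: [1, 2, 3, 5, 7, 8], 5: [1, 2, 4, 6, 8],
        4: [1, 3, 5, 7], 3: [1, 4, 7], 2: [1, 5], 1: [1]},
}

# Collect every (bin_num, num_obs) pair where the guess disagrees with the post.
mismatches = [
    (bin_num, num_obs)
    for bin_num, cases in observed.items()
    for num_obs, bins in cases.items()
    if guessed_bins(num_obs, bin_num) != bins
]
print(mismatches)
```

    For the worked example above, guessed_bins(3, 5) returns [1, 2, 4], matching bins 1, 2 and 4.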

    I'd appreciate any form of feedback.

    Also, this is my first time posting a question here, so I'd be happy to clarify the question further if needed.