  • Ebalance - weights above 1, how to force to be below?

    Hi,

    I am analysing the effect of disasters on income using tax data, which I can only access through a virtual desktop, so I cannot share an extract or example dataset. I am employing standard difference-in-differences with pooled, propensity score matching and entropy balancing. With propensity score matching I am using nearest neighbour, so there is no problem. With ebalance, however, my control group has only one third as many observations as my treatment group, and the weights are all above 1. Is there a function or option with which I can force ebalance to assign only weights below 1?

    Thank you very much.
    Merve

  • #2
    -ebalance- is on SSC.

    Can you not simply rescale the weights by the maximum value? For example:

    Code:
    . webuse cattaneo2,clear
    (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)
    
    . qui ebalance mbsmoke bweight mage fage
    
    . tabstat _w, by(mbsmoke) stat(min mean max)
    
    Summary for variables: _webal
         by categories of: mbsmoke (1 if mother smoked)
    
      mbsmoke |       min      mean       max
    ----------+------------------------------
    nonsmoker |  .0398066  .2286924  2.019622
       smoker |         1         1         1
    ----------+------------------------------
        Total |  .0398066  .3722533  2.019622
    -----------------------------------------
    
    . tabstat bweight mage fage [aw = _w], by(mbsmoke)
    
    Summary statistics: mean
      by categories of: mbsmoke (1 if mother smoked)
    
      mbsmoke |   bweight      mage      fage
    ----------+------------------------------
    nonsmoker |  3137.825  25.16858  24.74609
       smoker |   3137.66  25.16667  24.74306
    ----------+------------------------------
        Total |  3137.742  25.16762  24.74457
    -----------------------------------------
    
    . sum _w if mbsmoke == 0
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          _webal |      3,778    .2286924    .1456553   .0398066   2.019622
    
    . replace _w = _w/r(max) if mbsmoke == 0
    (3,778 real changes made)
    
    . tabstat _w, by(mbsmoke) stat(min mean max)
    
    Summary for variables: _webal
         by categories of: mbsmoke (1 if mother smoked)
    
      mbsmoke |       min      mean       max
    ----------+------------------------------
    nonsmoker |  .0197099  .1132352         1
       smoker |         1         1         1
    ----------+------------------------------
        Total |  .0197099  .2782858         1
    -----------------------------------------
    
    . tabstat bweight mage fage [aw = _w], by(mbsmoke)
    
    Summary statistics: mean
      by categories of: mbsmoke (1 if mother smoked)
    
      mbsmoke |   bweight      mage      fage
    ----------+------------------------------
    nonsmoker |  3137.825  25.16858  24.74609
       smoker |   3137.66  25.16667  24.74306
    ----------+------------------------------
        Total |  3137.714   25.1673  24.74406
    -----------------------------------------
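
    The reason the weighted covariate means in the log above do not move after rescaling can be written out in one line. This is only a sketch of the arithmetic, not part of the -ebalance- documentation; the symbols w_i (control weight), x_i (covariate value) and c (the maximum control weight, r(max) in the log) are notation introduced here for illustration.

    ```latex
    % Weighted mean of a covariate x in the control group, weights w_i > 0
    \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}

    % Divide every control weight by the same constant c = \max_j w_j > 0
    \frac{\sum_i (w_i / c)\, x_i}{\sum_i (w_i / c)}
      = \frac{\tfrac{1}{c}\sum_i w_i x_i}{\tfrac{1}{c}\sum_i w_i}
      = \bar{x}_w
    ```

    The constant cancels, so balance within the control group is preserved, while every weight shrinks by the factor c. In the log, the mean control weight goes from .2286924 to .2286924 / 2.019622 = .1132352. The Total row changes slightly because it pools both groups, and the controls now carry less total weight relative to the smokers.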


    • #3
      My problem is not exactly that the weights exceed one. If that were the only issue, the proposed solution would solve it. The problem is that if ebalance is allowed to use weights above 1 during the matching, then for a treatment group of, let's say, 2,000,000 I will have only around 700,000 observations/individuals in the control group. What I want is to restrict ebalance so that, in the end, the number of individuals in the control group is greater than or equal to the number in the treatment group. I hope I have expressed myself more clearly this time. I was advised to search online for ways to restrict the ebalance or psmatch2 functions used for matching, but I could not find anything, which is why I wanted to ask here.


      • #4
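        I guess one way would be to draw a random sample of 700,000 observations from the treatment group, run -ebalance-, and then rescale the weights.
        Code:
        * Load the example data and check the group sizes
        webuse cattaneo2,clear
        tab mbsmoke
        * Arbitrary seed (not in the original post) so the random draw is reproducible
        set seed 12345
        * Keep 864 randomly chosen observations per group (864 = number of smokers)
        sample 864, by(mbsmoke) count
        * Entropy balance on the covariates; weights are stored in _webal
        qui ebalance mbsmoke  mage fage
        * Rescale the control weights so that the largest equals 1
        sum _w if mbsmoke == 0
        replace _w = _w/r(max) if mbsmoke == 0
        * Check that the weighted covariate means still match across groups
        tabstat mage fage [aw = _w], by(mbsmoke)

        I am not too familiar with Leuven and Sianesi's -psmatch2- (ssc desc psmatch2), but another way would be to use Gary King's coarsened exact matching (ssc desc cem) with the k2k option, so that the treated and control groups end up with the same number of observations. Then, as before, use -ebalance- to derive the weights.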

        Code:
        .  webuse cattaneo2,clear
        (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)
        
        . cem fage mage, treat(mbsmoke) k2k
        (using the scott break method for imbalance)
        
        Matching Summary:
        -----------------
        Number of strata: 106
        Number of matched strata: 74
        
                      0     1
              All  3778   864
          Matched   850   850
        Unmatched  2928    14
