Ebalance - weights above 1, how to force to be below?

Merve Kucuk

Join Date: Dec 2018

Posts: 39
#1

03 Mar 2020, 19:14

Hi,

I am analysing the effect of disasters on income using tax data, which I can only access through a virtual desktop, so I cannot share an extract or example dataset. I am employing standard difference-in-differences with pooled, propensity score matching, and entropy balancing. With propensity score matching I am using nearest neighbour, so there is no problem. However, with ebalance my control group has one third as many observations as my treatment group, and the weights are all above 1. Is there an option with which I can force ebalance to assign only weights below 1?

Thank you very much.
Merve

Scott Merryman

Join Date: Mar 2014
Posts: 895

03 Mar 2020, 20:05

-ebalance- is on SSC.

Can you not simply rescale the weights by the maximum value? For example:

Code:

. webuse cattaneo2,clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. qui ebalance mbsmoke bweight mage fage

. tabstat _w, by(mbsmoke) stat(min mean max)

Summary for variables: _webal
     by categories of: mbsmoke (1 if mother smoked)

  mbsmoke |       min      mean       max
----------+------------------------------
nonsmoker |  .0398066  .2286924  2.019622
   smoker |         1         1         1
----------+------------------------------
    Total |  .0398066  .3722533  2.019622
-----------------------------------------

. tabstat bweight mage fage [aw = _w], by(mbsmoke)

Summary statistics: mean
  by categories of: mbsmoke (1 if mother smoked)

  mbsmoke |   bweight      mage      fage
----------+------------------------------
nonsmoker |  3137.825  25.16858  24.74609
   smoker |   3137.66  25.16667  24.74306
----------+------------------------------
    Total |  3137.742  25.16762  24.74457
-----------------------------------------

. sum _w if mbsmoke == 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      _webal |      3,778    .2286924    .1456553   .0398066   2.019622

. replace _w = _w/r(max) if mbsmoke == 0
(3,778 real changes made)

. tabstat _w, by(mbsmoke) stat(min mean max)

Summary for variables: _webal
     by categories of: mbsmoke (1 if mother smoked)

  mbsmoke |       min      mean       max
----------+------------------------------
nonsmoker |  .0197099  .1132352         1
   smoker |         1         1         1
----------+------------------------------
    Total |  .0197099  .2782858         1
-----------------------------------------

. tabstat bweight mage fage [aw = _w], by(mbsmoke)

Summary statistics: mean
  by categories of: mbsmoke (1 if mother smoked)

  mbsmoke |   bweight      mage      fage
----------+------------------------------
nonsmoker |  3137.825  25.16858  24.74609
   smoker |   3137.66  25.16667  24.74306
----------+------------------------------
    Total |  3137.714   25.1673  24.74406
-----------------------------------------
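
A short note on why the group means above are identical before and after rescaling: within a group, a weighted mean does not change when every weight is multiplied by the same positive constant. Here w_i stands for the -ebalance- weight of control observation i, x_i for a covariate, and c for 1/r(max); these symbols are introduced only for illustration.

```latex
\[
\frac{\sum_i (c\,w_i)\,x_i}{\sum_i c\,w_i}
  = \frac{c \sum_i w_i x_i}{c \sum_i w_i}
  = \frac{\sum_i w_i x_i}{\sum_i w_i},
  \qquad c = \frac{1}{\max_i w_i} > 0 .
\]
```

Only the Total row moves (3137.742 versus 3137.714 for bweight), because rescaling one group changes its share of the combined weight.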


Merve Kucuk

Join Date: Dec 2018

Posts: 39
#3

03 Mar 2020, 20:23

My problem is not exactly that the weights should be less than one. If that were the only issue, the proposed solution would solve it. The problem is that when ebalance is allowed to use weights above 1, then for a treatment group of, let's say, 2,000,000, I end up with around 700,000 observations/individuals in the control group. What I want is to restrict ebalance so that the number of individuals in the control group is greater than or equal to the number in the treatment group. I hope I have expressed myself more clearly this time. I was advised to search online for ways to restrict the ebalance or psmatch2 commands used for matching, but I could not find anything, which is why I am asking here.

Scott Merryman

Join Date: Mar 2014
Posts: 895

04 Mar 2020, 20:24

I guess one way would be to draw a random sample of 700,000 observations from the treatment group, run -ebalance-, and then rescale the weights.

Code:

webuse cattaneo2,clear
tab mbsmoke
sample 864, by(mbsmoke) count
qui ebalance mbsmoke  mage fage
sum _w if mbsmoke == 0
replace _w = _w/r(max) if mbsmoke == 0
tabstat mage fage [aw = _w], by(mbsmoke)

I am not too familiar with Leuven and Sianesi's -psmatch2- (ssc desc psmatch2) but another way would be to use Gary King's coarsened exact matching (ssc desc cem) with the k2k option so that both treated and control groups will have the same number of observations. And then, as before, use -ebalance- to derive the weights.

Code:

.  webuse cattaneo2,clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. cem fage mage, treat(mbsmoke) k2k
(using the scott break method for imbalance)

Matching Summary:
-----------------
Number of strata: 106
Number of matched strata: 74

              0     1
      All  3778   864
  Matched   850   850
Unmatched  2928    14
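
To complete the last step described above, here is a minimal sketch of running -ebalance- on the k2k-matched sample and rescaling the control weights. It was not run for this post. It assumes that -cem- leaves its match indicator in a variable named cem_matched, that -ebalance- accepts an if qualifier, and that the weights are stored in _webal as in the earlier output. Please check all three against the help files.

```stata
webuse cattaneo2, clear
cem fage mage, treat(mbsmoke) k2k

* assumption: cem_matched == 1 flags the observations kept by the k2k match
tab mbsmoke if cem_matched == 1

* entropy balance within the matched sample only
ebalance mbsmoke mage fage if cem_matched == 1

* rescale the control weights so that the largest one equals 1
summarize _webal if mbsmoke == 0
replace _webal = _webal/r(max) if mbsmoke == 0

* check balance; observations with missing weights are dropped automatically
tabstat mage fage [aw = _webal], by(mbsmoke)
```

The by-group means from the final tabstat should agree across smokers and nonsmokers if the balance constraints were met.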
