  • Aweight vs. fweight vs. pweight

    Dear All,
    I am trying to estimate a treatment effect using an aggregated difference-in-differences linear regression. I have collapsed an individual-level panel into two groups only, treated and control. The population sizes of the treated and control groups are drastically different. I believe I should weight my regression by population size to account for this, but I am not sure how to incorporate population size as the weight. Would population size be an aweight, an fweight, or a pweight?

    Many thanks,
    Sumedha.

  • #2
    It would definitely not be a -pweight-.

    Whether it would be an aweight or an fweight depends on exactly how you -collapsed- your data. Please show a sample of the original data, using the -dataex- command, and the exact code you used to collapse the data, and your -xtset- command if you have used one. If you don't already have the -dataex- command, get it by running -ssc install dataex-, and then run -help dataex- to read the instructions for using it. Be sure to post the code used for collapsing the data between code delimiters (see FAQ #12 if you are not familiar with these) so things will be maximally readable.



    • #3
      Here is a dataex example:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(pat scriptnumber Milligrams treated male week)
       1 100 100 0 1 1
       1  55  10 0 1 1
       2  27  10 0 0 1
       2  54  25 0 0 2
       2  34  50 0 0 4
       3 961  25 0 1 3
       3  10  75 0 1 4
       3  51 100 0 1 5
       4  76 500 0 1 2
       5  23 350 0 0 4
       6   8  40 0 0 2
       6   2  65 0 0 3
       6 107  15 0 0 4
       6 321  25 0 0 5
       7  49  50 0 1 1
       8  40 600 1 1 1
       8  28 100 1 1 2
       8  44  50 1 1 5
       9  85  10 1 0 1
      10 111  25 1 0 5
      end
      Then I try to identify unique patients each week:

      Code:
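      *****************************************
      * number of unique patients per week
      *****************************************

      * flag the first record of each patient within a treated-by-week cell
      by treated week pat, sort: gen nvals = _n == 1
      * keep pat only on that first record, so (count) pats counts unique patients
      gen pats = pat
      replace pats = . if nvals == 0
      drop nvals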
      Then I try to collapse the data to create counts, sums and means of different variables:
      Code:
      gen MME=Milligrams
      collapse (count) pats male scriptnumber ///
                    (mean)  MME ///
                    (sum) Milligrams , by(treated week)
      Then I try to run the diff-in-diff regression on the collapsed data:
      Code:
      * post = 1 from week 3 onward; did = 1 for treated cells in the post period
      gen byte post = week > 2
      gen byte did  = week > 2 & treated == 1
      
      
      eststo: reg MME did i.treated post i.week c.week#c.treated  male  [aweight=pats], cluster(treated)
      eststo: reg Milligrams did i.treated post i.week c.week#c.treated male [aweight=pats], cluster(treated)
      Thank you so much for your help.



      • #4
        OK. Where the outcome is MME, [aweight = pats] is correct because MME is in fact the mean of pats observations, and the increased weight assigned as pats increases appropriately reflects the decreasing sampling error of MME.
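
        As a rough check on this point, here is a minimal sketch on simulated data (not the poster's data). The variable names cell, x, n and the cell sizes are hypothetical. It illustrates that, when the regressor is constant within a cell, regressing cell means with aweights equal to the cell sizes reproduces the coefficient from the individual-level regression.

        ```stata
        * Simulated illustration; all names and values here are hypothetical
        clear
        set seed 2024
        set obs 6
        gen cell = _n
        gen x = mod(_n, 3)            // regressor, constant within a cell
        gen pats = 5 * _n             // cell sizes: 5, 10, ..., 30
        expand pats                   // one row per individual in each cell
        gen y = 1 + 2*x + rnormal()   // individual-level outcome

        * individual-level regression
        regress y x
        scalar b_micro = _b[x]

        * collapse to cell means, keeping the cell size
        collapse (mean) y (count) n = y, by(cell x)

        * regression on cell means, weighted by cell size
        regress y x [aweight = n]
        display "individual-level b = " b_micro "   cell-level b = " _b[x]
        ```

        The two coefficients should agree up to rounding. The standard errors will not, because the collapsed regression has only six observations.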

        Where the outcome is Milligrams, it is incorrect, because Milligrams is a sum, not a mean of pats observations. In fact, using [aweight = pats] here actually drives things in the wrong direction! The sampling error of Milligrams actually increases as pats increases, so assigning greater weight to observations with higher pats serves to increase rather than decrease the heteroskedasticity of the data and decrease the efficiency of the model. You need a different model here. Consider Poisson or negative binomial for this one.
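
        The contrast between means and sums can be seen in a small simulation. This is only a sketch under assumed values: the cell sizes (10 and 100), the outcome distribution, and the names cell, ybar, ysum are all hypothetical. With the same individual-level variance everywhere, the spread of cell means shrinks as cells get larger, while the spread of cell sums grows.

        ```stata
        * Simulated illustration; all names and values here are hypothetical
        clear
        set seed 2024
        set obs 1000
        gen cell = _n
        gen pats = cond(_n <= 500, 10, 100)   // 500 small cells, 500 large cells
        expand pats                           // one row per individual
        gen y = rnormal(50, 20)               // same variance in every cell

        * one row per cell: the cell mean and the cell sum of y
        collapse (mean) ybar = y (sum) ysum = y, by(cell pats)

        * spread of cell means and cell sums across cells, by cell size
        tabstat ybar ysum, by(pats) statistics(sd)
        ```

        In theory the standard deviation of a cell mean is sigma/sqrt(pats) and that of a cell sum is sigma*sqrt(pats), which is why weighting a sum by pats pushes in the wrong direction.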



        • #5
          Hi, Professor Clyde,

          I have survey data collected by stratified sampling. I constructed a weight equal to the inverse of each observation's probability of inclusion, so when I calculate means or run regressions I should use "pweight". But pweight cannot be used to calculate a standard deviation, so how should I calculate it? (I use "collapse" to calculate the mean/median/sd.)
          Thank you!
