Aweight vs. fweight vs. pweight

Sumedha Gupta

Join Date: May 2016

Posts: 289
#1

23 May 2017, 20:45

Dear All,
I am trying to estimate a treatment effect using an aggregated difference-in-differences linear regression. I have collapsed an individual-level panel into two groups only, treated and control. The population sizes of the treated and control groups are drastically different. I believe I should weight my regression by population size to account for this, but I am not sure how to specify the weight. Would population size be an aweight, an fweight, or a pweight?

Many thanks,
Sumedha.
Clyde Schechter

Join Date: Apr 2014

Posts: 30194
#2

23 May 2017, 22:24

It would definitely not be a -pweight-.

Whether it would be an aweight or an fweight depends on exactly how you -collapsed- your data. Please show a sample of the original data, using the -dataex- command, and the exact code you used to collapse the data, and your -xtset- command if you have used one. If you don't already have the -dataex- command, get it by running -ssc install dataex-, and then run -help dataex- to read the instructions for using it. Be sure to post the code used for collapsing the data between code delimiters (see FAQ #12 if you are not familiar with these) so things will be maximally readable.
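As a rough sketch of those steps, assuming Stata's bundled auto dataset as a stand-in for the real data (the variable names below belong to that example dataset, not to this thread):

```stata
* one-time install of the user-written -dataex- command from SSC
ssc install dataex

* read the instructions
help dataex

* stand-in data: Stata's bundled auto dataset (an assumption for illustration)
sysuse auto, clear

* print a copy-and-paste-ready extract of three variables, first 10 observations
dataex make price mpg in 1/10
```

The output of the last command is what gets pasted into the post between code delimiters.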

Sumedha Gupta

Join Date: May 2016
Posts: 289
#3

24 May 2017, 08:01

Here is a dataex example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(pat scriptnumber Milligrams treated male week)
 1 100 100 0 1 1
 1  55  10 0 1 1
 2  27  10 0 0 1
 2  54  25 0 0 2
 2  34  50 0 0 4
 3 961  25 0 1 3
 3  10  75 0 1 4
 3  51 100 0 1 5
 4  76 500 0 1 2
 5  23 350 0 0 4
 6   8  40 0 0 2
 6   2  65 0 0 3
 6 107  15 0 0 4
 6 321  25 0 0 5
 7  49  50 0 1 1
 8  40 600 1 1 1
 8  28 100 1 1 2
 8  44  50 1 1 5
 9  85  10 1 0 1
10 111  25 1 0 5
end

Then I try to identify unique patients each week:

Code:

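*****************************************
* number of unique patients per week
*****************************************

   * flag the first row of each treated-week-patient combination
   by treated week pat, sort: gen nvals = _n == 1
   gen pats = pat
   * keep the patient id only on the flagged row so that (count) counts each patient once
   replace pats = . if nvals == 0
   drop nvals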

Then I try to collapse the data to create counts, sums and means of different variables:

Code:

gen MME=Milligrams
collapse (count) pats male scriptnumber ///
              (mean)  MME ///
              (sum) Milligrams , by(treated week)

Then I try to run the diff-in-diff regression on the collapsed data:

Code:

* post-period indicator: weeks after week 2
gen post = week > 2

* diff-in-diff interaction: treated group in the post period
gen did = (week > 2 & treated == 1)


eststo: reg MME did i.treated post i.week c.week#c.treated  male  [aweight=pats], cluster(treated)
eststo: reg Milligrams did i.treated post i.week c.week#c.treated male [aweight=pats], cluster(treated)

Thank you so much for your help.


Clyde Schechter

Join Date: Apr 2014

Posts: 30194
#4

24 May 2017, 09:45

OK. Where the outcome is MME, [aweight = pats] is correct because MME is in fact the mean of pats observations, and the increased weight assigned as pats increases appropriately reflects the decreasing sampling error of MME.

Where the outcome is Milligrams, it is incorrect, because Milligrams is a sum, not a mean of pats observations. In fact, using [aweight = pats] here actually drives things in the wrong direction! The sampling error of Milligrams actually increases as pats increases, so assigning greater weight to observations with higher pats serves to increase rather than decrease the heteroskedasticity of the data and decrease the efficiency of the model. You need a different model here. Consider Poisson or negative binomial for this one.
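
The contrast between the two outcomes can be sketched with a short derivation. It assumes, as a simplification not stated in the thread, that the n = pats observations y_1, ..., y_n within a cell are independent with a common variance σ²:

```latex
\begin{align*}
\operatorname{Var}(\bar{y})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
   = \frac{\sigma^2}{n}, \\
\operatorname{Var}\!\left(\sum_{i=1}^{n} y_i\right)
  &= n\,\sigma^2 .
\end{align*}
```

Under that assumption, an aweight of n treats the error variance as proportional to 1/n. That matches the mean (MME), whose variance shrinks as n grows, but not the sum (Milligrams), whose variance grows with n.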
Stephen Wee

Join Date: Jul 2019

Posts: 9
#5

21 Feb 2020, 23:30

Hi, Professor Clyde,

I have survey data collected through stratified sampling. I have set a weight equal to the inverse of the probability that the observation is included in the sample. Therefore, when I calculate a mean or run a regression, I should use "pweight". But pweight can't be used to calculate a standard deviation, so how should I calculate it? (I use "collapse" to calculate the mean, median, and sd.)
Thank you!