pweight and kernel density

Zahid Khan

Join Date: Mar 2019

Posts: 16
#1

04 Apr 2020, 05:14

Hi all,

I am trying to draw kernel density graphs, but I cannot get them to account for my sample weights, which are pweights. I have gone through several threads, including (1) https://www.stata.com/statalist/arch.../msg01383.html, which suggests that aweight is equivalent to pweight here, so one can use aweight, because Stata does not allow pweight with kdensity.

The other thread I found somewhat relevant is (2) https://www.stata.com/statalist/arch.../msg00426.html, but I think the solution suggested there does not apply to my case, because my weights are probability weights whereas the weights in that thread are frequency weights.
Yet another one, (3) https://www.stata.com/statalist/arch.../msg00530.html, seems closely related to my problem, but I cannot work out from it how to construct the weight variable MY_W used in this code:

Code:

kdensity income [aw=MY_W]

Should I alter my weight variable before using aweight, as suggested in thread (3)? If so, how, given that I cannot follow that thread clearly? Or should I just use aweight directly, as suggested in thread (1)? Or is there any other way to account for pweights when drawing a kdensity graph?
Many thanks in advance.
FernandoRios

Join Date: Apr 2014

Posts: 2431
#2

04 Apr 2020, 05:50

Hi Zahid
To understand the suggestions in the posts you cited, it helps to know what kdensity does.
In essence, kdensity estimates weighted averages of a transformation of your variable of interest. Specifically, it uses a kernel function as the transformation.
So, for each point of reference (kdensity uses 50 points of reference by default, if I'm not mistaken), it estimates:

Code:

* x0 = point of reference, h = bandwidth (placeholder locals; set them first)
gen kfden = normalden(income, `x0', `h')
sum kfden [aw=weight]
gen fden = r(mean) if income == `x0'

So, because this is a simple weighted average, any of the weight types that kdensity accepts (that is, all except pweights) can be used, and ALL of them will give you the same point estimate.
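A minimal sketch of the weighted-average idea above, under stated assumptions: the shipped auto.dta stands in for the real data (price plays the role of income, and the vehicle weight variable plays the role of the sampling weight), the bandwidth of 500 and the three points of reference are arbitrary, and a Gaussian kernel is requested explicitly so that kdensity matches normalden(). The two columns listed at the end should agree up to storage precision.

```stata
* Sketch only: auto.dta is a stand-in for the real data
sysuse auto, clear
local h = 500                      // arbitrary bandwidth, for illustration

* three arbitrary points of reference, stored in the first three rows
gen double x0 = .
replace x0 = 4000 in 1
replace x0 = 6000 in 2
replace x0 = 9000 in 3

* manual estimate: weighted mean of the Gaussian kernel at each point
gen double fmanual = .
forvalues i = 1/3 {
    quietly gen double kfden = normalden(price, x0[`i'], `h')
    quietly summarize kfden [aw=weight], meanonly
    quietly replace fmanual = r(mean) in `i'
    drop kfden
}

* same points through kdensity: Gaussian kernel, same bandwidth
kdensity price [aw=weight], kernel(gaussian) bwidth(`h') at(x0) generate(fkd) nograph
list x0 fmanual fkd in 1/3
```

Note that kdensity's default kernel is Epanechnikov, so without kernel(gaussian) the two columns would not be expected to match.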
Perhaps the only place where things may vary is the estimation of the bandwidth. If you look at the PDF manual, the default bandwidth is a function of the sample size.
When aweights are used, the weighted "sample size" remains the same, but when fweights or iweights are used, the weighted sample size increases, which may give you very small bandwidths.
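The bandwidth point can be checked with a short sketch, again assuming auto.dta as a stand-in (price for income, and the integer-valued vehicle weight variable as an arbitrary weight, which is why it can also be passed as an fweight). It only compares the default bandwidth that kdensity returns in r(bwidth) under the two weight types.

```stata
* Sketch only: auto.dta is a stand-in for the real data
sysuse auto, clear

* default bandwidth when the weights are treated as aweights
quietly kdensity price [aw=weight], nograph
scalar bw_aw = r(bwidth)

* default bandwidth when the same weights are treated as fweights
quietly kdensity price [fw=weight], nograph
scalar bw_fw = r(bwidth)

display "bandwidth with aweights: " bw_aw
display "bandwidth with fweights: " bw_fw
```

Under fweights the sum of the weights is treated as the number of observations, so the second bandwidth should come out noticeably smaller than the first.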

Bottom line: kdensity does not allow pweights because it is a simple weighted-average estimator, for which plain aweights will give you the estimates you need.
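On the question of whether the weight variable needs to be altered before use: since Stata rescales aweights internally to sum to the number of observations, multiplying the weights by a constant should not change the result. A minimal sketch of that check, assuming auto.dta as a stand-in and two hypothetical weight variables, w_raw and w_scaled, that differ only by a constant factor:

```stata
* Sketch only: auto.dta is a stand-in for the real data
sysuse auto, clear

* two hypothetical versions of the same weights, differing by a constant
gen double w_raw    = weight
gen double w_scaled = weight / 1000

* estimate the density with each version on the default grid
kdensity price [aw=w_raw],    nograph generate(x1 d1)
kdensity price [aw=w_scaled], nograph generate(x2 d2)

* the grids and the estimated densities should coincide
assert reldif(x1, x2) < 1e-6 if !missing(x1)
assert reldif(d1, d2) < 1e-6 if !missing(d1)
```

If both assert lines pass silently, the rescaling made no difference to the estimate.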
HTH
Fernando
Nick Cox

Join Date: Mar 2014

Posts: 35446
#3

04 Apr 2020, 06:24

Not the question, but I have never seen income distributions that weren't better treated on logarithmic scale. Particular worry should be focused on estimates for very low incomes -- and for very high incomes too.