Finding Median for weighted survey data

Wendi Elkins

Join Date: Dec 2021

Posts: 2
#1

21 Dec 2021, 13:40

Good afternoon,

I am using survey data, and because I need to account for the sampling design and pweights, I am using the svy prefix for my analysis. I am wondering how to obtain median values correctly. Currently I am using epctile (sample code below), which lets me apply the pweights, but the median I obtain falls outside the 95% CI range I get for the weighted mean from the svy command. This seems odd, though one option I thought of would be to report the 95% CIs of the mean and the median separately. Any insights would be incredibly welcome!

. svy, subpop(if analytical_pop==1 & first_cancer==wave): mean percent_asset_change1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 51 Number of obs = 15,736
Number of PSUs = 102 Population size = 71,582,508
Subpop. no. obs = 263
Subpop. size = 892,926
Design df = 51

-----------------------------------------------------------------------
| Linearized
| Mean Std. Err. [95% Conf. Interval]
----------------------+------------------------------------------------
percent_asset_change1 | .0203422 .5888416 -1.161807 1.202491
-----------------------------------------------------------------------
Note: 20 strata omitted because they contain no subpopulation members.

. epctile percent_asset_change1 if analytical_pop==1 & first_cancer==wave [pweight=pre_rwtresp], p(50)

Percentile estimation
------------------------------------------------------------------------------
percent_as~1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p50 | -.0451306 .0387276 -1.17 0.244 -.1210354 .0307741
------------------------------------------------------------------------------

David Radwin

Join Date: Mar 2014
Posts: 368

21 Dec 2021, 15:02

First, as specified in Statalist FAQ 12.1, "If you are using community-contributed (also known as user-written) commands, explain that and say where they came from." epctile was created by Stas Kolenikov and can be found with findit epctile.

Second, I think you want to use the svy option with epctile (see below).

Third, your output will be more readable if you use the code tags in your post (see below).

Finally, there's no reason a priori to expect the standard errors of the median and the mean of a distribution to be the same, weighted or not. I think the SE of a median is generally about 30% larger than the SE of a mean, but that is just a rule of thumb approximation. Below is an example where the SE of the median is 65% larger than the SE of the mean. (Also, the mean value differs from the median value, but that is also to be expected in most cases.)

In any case, if appropriate, I encourage you to report the SEs of both the mean and median.
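As a rough check on the rule of thumb, here is a minimal simulation sketch. It assumes normally distributed data and simple random sampling, so it says nothing about a complex survey design or a skewed variable. The program name meanmed, the sample size, and the number of replications are arbitrary choices made for illustration. For normal data the ratio should come out near sqrt(pi/2), about 1.25, which is the asymptotic ratio of the two standard errors.

```stata
* Sketch: compare the sampling variability of the mean and the median
* for normal data under simple random sampling (not a survey design)
clear all
set seed 2021

program define meanmed, rclass
    drop _all
    set obs 500                          // hypothetical sample size
    generate double x = rnormal()        // standard normal draws
    quietly summarize x, detail
    return scalar mean   = r(mean)
    return scalar median = r(p50)
end

* Draw 2,000 samples and keep the mean and median of each
simulate m_mean=r(mean) m_median=r(median), reps(2000) nodots: meanmed

quietly summarize m_mean
scalar sd_mean = r(sd)
quietly summarize m_median
scalar sd_median = r(sd)

display "SD of simulated medians / SD of simulated means = " %5.3f sd_median/sd_mean
display "Asymptotic value for normal data, sqrt(pi/2)    = " %5.3f sqrt(_pi/2)
```

With skewed or heavy-tailed data the ratio can be quite different, and it can even fall below 1, so the simulation is only a reference point for the normal case.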

Code:

. webuse nhanes2

. svyset

Sampling weights: finalwgt
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. svy: mean age
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31            Number of obs   =      10,351
Number of PSUs   = 62            Population size = 117,157,513
                                 Design df       =          31

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
         age |   42.25264   .3026691      41.63534    42.86994
--------------------------------------------------------------


. epctile age, p(50) svy
(running mean on estimation sample)

Survey: Mean estimation

Number of strata = 31            Number of obs   =      10,351
Number of PSUs   = 62            Population size = 117,157,513
                                 Design df       =          31

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
    __000006 |  -.0152412   .0089141     -.0334216    .0029393
--------------------------------------------------------------

Percentile estimation
------------------------------------------------------------------------------
             |             Linearized
         age | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         p50 |         40         .5    80.00   0.000     39.02002    40.97998
------------------------------------------------------------------------------
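For an analysis restricted to a subpopulation, as in the original question, the restriction would normally be passed through subpop() rather than an if qualifier, so that the full design is still used for variance estimation. The sketch below uses the same example dataset. The indicator variable older is hypothetical and stands in for a condition such as analytical_pop==1 & first_cancer==wave. The first command is official Stata; the second assumes that the installed version of epctile accepts subpop() together with svy, which should be confirmed with help epctile.

```stata
webuse nhanes2, clear

* Hypothetical subpopulation indicator (1 = member, 0 = not a member)
generate byte older = (age >= 50) if !missing(age)

* Official Stata: subpopulation mean with design-based standard errors
svy, subpop(older): mean weight

* Assumption: epctile supports subpop() with svy; check -help epctile-
epctile weight, p(50) svy subpop(older)
```

If the subpop() option is not available in a given version, the point of the sketch still holds: an if restriction and a subpopulation estimate are not the same thing for variance estimation.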

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him


Wendi Elkins

Join Date: Dec 2021

Posts: 2
#3

29 Dec 2021, 08:24

This was very helpful! Really appreciate the thoughtful response. I'll be sure to report the CI for both.
Vivian Jenkins

Join Date: Nov 2023

Posts: 5
#4

21 Nov 2023, 13:17

Hi David,
Thanks for weighing in on the question above, which set me on the right path as well.
Is there a way to modify the code to get the median, interquartile range (IQR), and statistical significance instead of the median and confidence intervals?

I am currently using the CDC's NHANES, a complex survey design, to compare several variables across quintiles of an exposure variable, e.g., to compare the distribution of age across quintiles of sleep duration. However, none of the variables is normally distributed, so I am more inclined to report median (interquartile range) instead of mean ± standard deviation. How can I get the median and also check for statistical significance?

I used Stas Kolenikov's epctile command across various categories. Please see my output below and kindly advise how to get the IQR with p-values instead. Thanks for your help!
David Radwin

Join Date: Mar 2014

Posts: 368
#5

21 Nov 2023, 13:31

You can get the lower and upper values of the IQR, and their corresponding p-values, by using p(25) and p(75) in addition to p(50). p(50) is the 50th percentile, which is the same as the median value. Is that what you mean?

I don't know if there's a way to get a standard error or p-value of the IQR itself. I have never seen that reported, but that doesn't mean it's not possible. If that's what you seek, I recommend starting a new thread as recommended by FAQ extra 1.5.
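A minimal sketch of the quartile approach, using the nhanes2 example data from earlier in the thread rather than the poster's own data. The epctile calls follow the syntax shown in the earlier example, with one call per percentile. The _pctile lines are official Stata and give weighted point estimates only, with no standard errors, so they serve as a cross-check on the quartiles and nothing more.

```stata
webuse nhanes2, clear

* Design-based quartile estimates, one call per percentile
* (epctile is community-contributed: findit epctile)
epctile age, p(25) svy
epctile age, p(50) svy
epctile age, p(75) svy

* Weighted point estimates from official Stata (no standard errors)
_pctile age [pweight=finalwgt], percentiles(25 50 75)
display "Q1 = " r(r1) "  median = " r(r2) "  Q3 = " r(r3)
display "IQR width = " r(r3) - r(r1)
```

Note that the z statistic and p-value printed by epctile test each percentile against zero (in the age example above, 40/.5 = 80.00), so they are not a test of differences across groups.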

Vivian Jenkins

Join Date: Nov 2023

Posts: 5
#6

21 Nov 2023, 15:26

Hi David. First of all, THANK YOU so much for your quick response and the tip on getting the IQR. It was very helpful, but I had to run p(25) and p(75) as separate commands from p(50) in order to get the IQR.
And you are right: I have not seen p-values reported for the IQR either. (I had used mean values with their associated p-values, but one member of the research team thought median and IQR would be better.)

I will start a new thread right away and see if anyone has a different thought. Thanks once again.
