  • SE or SD of the mean for descriptive statistics

    I am preparing a descriptive statistics table. When using the command summarize, I can obtain the standard deviation. But I have seen some papers report the SE of the mean instead. I wonder which is more appropriate and how to obtain the SE of the mean. Thank you.

  • #2
    It can be obtained using -tabstat-:
    Code:
    sysuse auto, clear
    tabstat mpg weight length, stat(semean)
    As for which one is better, my answer is "the well-labeled one is better." As long as it's clearly labeled, either one can be shown.

    In practice, I often work with rather large datasets, which make the SE very small, so I generally prefer the SD when it's available (in some settings, such as data with complex survey weights, it may be difficult to get the SD). If I have to show the SE, I usually go one step further and show confidence intervals, which are perhaps easier to interpret.

    It can also be field-dependent. Identify a few peer-reviewed articles or journals from your field and use them as a guideline.
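    A minimal sketch with the same auto data, showing that the SE of the mean reported by -tabstat- is simply the SD divided by the square root of the number of observations, so either one can be recovered from the other (r(sd) and r(N) are the results stored by -summarize-):

```stata
* SD versus SE of the mean for mpg in the auto data
sysuse auto, clear

* summarize stores the SD in r(sd) and the number of observations in r(N)
quietly summarize mpg
display "SD of mpg                   = " r(sd)
display "SE of the mean (SD/sqrt(N)) = " r(sd)/sqrt(r(N))

* tabstat reports both directly; semean is computed as sd/sqrt(n)
tabstat mpg, stat(n mean sd semean)
```

    Whichever one goes in the table, the column header should say which it is.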


    • #3
      Thank you very much for your detailed answer. I really appreciate it. I have large data, and someone in my field who used the same data reported the standard deviation. I guess I will use the standard deviation too, although I am using panel data with complex survey weights. I wonder how to apply the weight and select a survey wave in the same command. I only need the mean of one variable, so I guess the syntax will be like:
      Code:
      sum variable if wave==1, pweight=weightvariable
      Is that correct?

      The rest of my variables are categorical. I wonder if I could do something like:
      Code:
      tabstat X1 X2 X3, by (wave) rows (variables) pweight=weightvariable
      Thank you.


      • #4
        For the first command,
        Code:
        sum variable if wave==1 [iweight=weightvar]
        worked. For the second command, Stata returned the error "option rows not allowed".
        Last edited by Meng Yu; 07 Jul 2021, 12:31.


        • #5
          What does the data source's documentation say? If you're working with a complex survey design (-svyset-), then there are -svy: mean- and -svy: table- that you can use.
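          A minimal sketch of the -svy- route, using the auto data as a stand-in because the poster's data are not available: -foreign- plays the role of wave and -displacement- is an arbitrary positive variable standing in for the survey weight. With real data, the PSU, strata, and weight variables should come from the data source's documentation; the names in the commented-out -svyset- line are hypothetical.

```stata
* Stand-in data: foreign ~ wave, displacement ~ survey weight (illustration only)
sysuse auto, clear

* Declare the survey design; with real data you would add PSU and strata, e.g.
* svyset psu [pweight=weightvar], strata(stratum)   // hypothetical names
svyset [pweight=displacement]

* Weighted mean of mpg within each group, with design-based standard errors
svy: mean mpg, over(foreign)

* Standard deviations of mpg within each group, based on the results above
estat sd
```

          The point estimates from -svy: mean- are the same weighted means that -summarize- gives with the same variable used as an aweight; what -svy- adds is design-based standard errors.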


          • #6
            Neither command in #3 will work as written, because the weight goes before the comma; see the help files:
            Code:
            help summ
            help tabstat
            Also, my 2 cents: the SE is an inferential statistic, not a descriptive one; the SD is descriptive (but custom may differ in your field).
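            For reference, a sketch of the two commands from #3 with the weight placed before the comma, run on the auto data as a stand-in (an assumption for illustration: -foreign- stands in for wave, -displacement- for the weight variable, and mpg, price, and trunk for the poster's variables). Per the help files, -summarize- accepts aweights, fweights, and iweights but not pweights, and -tabstat- accepts aweights and fweights. -tabstat- has no rows() option; columns(statistics) puts the statistics in columns, so the variables run down the rows.

```stata
* Stand-in data: foreign ~ wave, displacement ~ weight variable (illustration only)
sysuse auto, clear

* Weighted mean and SD for one group; the weight follows the varlist
summarize mpg [aweight=displacement] if foreign==0

* Several variables by group; weight before the comma, variables as rows
tabstat mpg price trunk [aweight=displacement], by(foreign) stat(mean sd) columns(statistics)
```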


            • #7
              Thank you both. I agree that the SE is inferential. I use xtset to set up my data. I will try svyset to see whether it works with pweights.


              • #8
                I found this FAQ and the quote below in it. It seems I can use summarize with aweights.
                https://www.stata.com/support/faqs/s...ry-statistics/
                "First, let me show that summarize with aweights gives the same result as estat sd"


                • #9
                  If you have a variable X, the standard deviation of X is the standard deviation of X. The standard error is the standard deviation of mean(X). Therefore:
                  1. If you are interested in X, rather than mean(X), you should report the standard deviation, not the standard error.
                  2. Standard error = (Standard deviation)/sqrt(Sample size), and therefore the standard error converges to 0 as the sample size grows without bound. I do not find objects that converge to 0 very interesting as a description of variables or of the population. (They are interesting for inference purposes for the given sample, as was mentioned above.)
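                  To illustrate point 2, a small simulation on made-up data (standard normal draws; the seed and sample sizes are arbitrary choices): the SD stays near 1 as the sample grows, while the SE = SD/sqrt(N) keeps shrinking toward 0.

```stata
* Simulated x ~ N(0,1) at increasing sample sizes (illustration only)
set seed 2021
foreach n of numlist 100 10000 1000000 {
    quietly {
        clear
        set obs `n'
        generate double x = rnormal()
        summarize x
    }
    * r(sd) stays roughly stable; r(sd)/sqrt(r(N)) falls about 10-fold per step
    display "N = " %9.0f r(N) "   SD = " %6.4f r(sd) "   SE = " %8.6f r(sd)/sqrt(r(N))
}
```

                  Each step multiplies N by 100, so the SE shrinks by a factor of about sqrt(100) = 10, while the SD describes the same underlying spread throughout.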

                  You might find the following FAQ useful; it discusses weights and svy jointly: https://www.stata.com/support/faqs/s...ry-statistics/



                  • #10
                    Meng:
                    as an aside to the previous excellent replies, the standard deviation does not make any assumptions about the theoretical probability distribution the sample under investigation comes from: it simply describes the dispersion of your data around a measure of central tendency (the mean).
                    The standard error can be read as the standard deviation of the sampling distribution of the mean; as such, it implies a reference theoretical probability distribution. Having to do with parameters, it is an inferential tool.
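                    One way to see the link between the two, assuming independent draws X_1, ..., X_n with common variance sigma^2 (an assumption the descriptive SD itself does not need): the variance of the sample mean shrinks with n, and the SE estimates its square root by plugging in the sample SD s.

$$
\operatorname{Var}(\bar X)
= \operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{\sigma^2}{n},
\qquad
\operatorname{SE}(\bar X) = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar x\right)^2}.
$$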
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Thank you both. I think the link Joro sent is the same as the one I sent. It also shows that using sum with aweights gives the same result as using svy: mean and estat sd.
