  • Statistical significance of descriptive statistics and split of dataset

    I was wondering whether it is possible to check the statistical significance of descriptive statistics such as the mean or median. Is there a specific command I could use?

    Moreover, I have a dataset that I want to split into two. Is there a command that does this, or should I do it manually?

    I am new to Stata, so any advice is welcome.

  • #2
    What do you mean by statistical significance of descriptive statistics? That is a contradiction in terms. Descriptive statistics are used to describe a sample of data that you actually have. Statistical significance is used to make inferences about a whole population based on a sample when only the sample is available. Perhaps you can explain in greater detail what data you have and what kind of questions you are trying to resolve by using it, and you might get a more helpful response.

    As for splitting a data set into two, again, what do you mean? What are you trying to accomplish? Do you want to split it into two random subsets? Or are you trying to separate out two definable subgroups such as men and women, or young and old, or some other criterion-based split? Again, a more specific question can elicit a more helpful response.
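    For the random case, one common approach is to attach a uniform random number to each observation, sort on it, and flag the first half of the shuffled data. A minimal sketch, assuming Stata's bundled auto dataset as stand-in data; the variable names u and half are hypothetical:

    ```stata
    * Sketch: exact random 50/50 split (auto data used only as a stand-in)
    sysuse auto, clear
    set seed 12345                       // make the random draw reproducible
    generate double u = runiform()       // one uniform(0,1) draw per observation
    sort u                               // put the observations in random order
    generate byte half = (_n <= _N/2)    // first half of the shuffled data gets 1, the rest 0
    tabulate half
    ```

    With 74 observations in the auto data, each half gets 37. For a criterion-based split, the random variable is not needed; the criterion itself defines the groups.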

    For more general information about how to get the most out of posting on Statalist, do read the FAQ; it is full of very helpful advice on how to ask questions and how best to use the Forum software to maximize your chances of getting a useful answer.



    • #3
      For the first question, I have to replicate the table in the attached file (table.pdf). I cannot understand how its statistical significance was checked.

      Regarding the second question, I have to split my dataset into small and large firms, which are identified by a dummy variable, so I would say they are definable subgroups.



      • #4
        So you are trying to test the hypothesis that the medians in the small-firm and large-firm populations are equal. Then this is not about descriptive statistics. The -median- command, which performs a nonparametric test of equality of medians, will do this for you:

        Code:
        median my_variable, by(large_small_indicator)
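        If the table also reports means, or uses a rank-based test rather than a median test, the same small-versus-large comparison can be run with other built-in commands. A minimal sketch, assuming Stata's bundled auto dataset, where the 0/1 variable foreign stands in for large_small_indicator and price for my_variable; which test matches your table depends on what its notes say:

        ```stata
        * Sketch: group-comparison tests (auto data; foreign plays the role of the dummy)
        sysuse auto, clear
        median price, by(foreign)            // Pearson chi2 test of equal medians
        ranksum price, by(foreign)           // Wilcoxon rank-sum (Mann-Whitney) test
        ttest price, by(foreign)             // two-sample t test of equal means (add -unequal- if variances differ)
        ```

        Each command prints its test statistic and p-value, which is what significance stars in a published table are usually based on.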

        If you need to split your data set into separate small-firm and large-firm data sets based on a "dummy" variable (let's call it large_small_indicator, coded 0 for small and 1 for large), you can do this with:

        Code:
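        use my_data, clear
        preserve                              // keep a copy of the full data set in memory
        keep if large_small_indicator == 0    // small firms only
        save small_data, replace
        restore                               // bring back the full data set
        keep if large_small_indicator == 1    // large firms only
        save large_data, replace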
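        Often the split is not needed at all: most Stata commands accept an if qualifier or a by prefix, so both groups can be analyzed from the one data set. A minimal sketch, again assuming the bundled auto dataset with foreign standing in for large_small_indicator:

        ```stata
        * Sketch: group-wise statistics without saving separate files (auto data as stand-in)
        sysuse auto, clear
        tabstat price mpg, by(foreign) statistics(mean median n)   // one row per group
        bysort foreign: summarize price, detail                    // full summary for each group
        summarize price if foreign == 0                            // restrict any command with -if-
        ```

        Keeping one file avoids having two copies of the data drift apart if you later clean or recode variables.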



        • #5
          I managed to get the results I was looking for. Your advice was very helpful. Thanks.

