  • Descriptive statistics: add percentages to summary statistics (matrix)

    Hi Statalisters,

    I have a question regarding descriptive statistics. I have been working on code that collects the -summarize- output for my variables of interest in a single table. However, some of the variables are not continuous, so I would like to have those expressed as percentages. It would be great if you could have a quick look at what I have done so far!

    Code:
    //4. List variables for which you'd like to report summary statistics
    
    local vars age_1 ownsphone school schoolagedchildren_1 childrenattendschool partner childmortality
    
    //5. Create an empty matrix m with J rows and K columns based on the number of variables of interest and
    //     the number of summary statistic scalars that you wish to report
    
    local j_rows=7
    local k_columns=4
    
    mat m = J(`j_rows',`k_columns',.)
    
    //6. Initialize counter variable y to start at 1
    
    local y=1
    
    //7. Generate matrix of summary statistics from the pre-specified list of variables
    
    foreach var of varlist age_1 ownsphone school schoolagedchildren_1 childrenattendschool partner childmortality {
        sum `var' if round==0, d
        mat m[`y',1] = round(r(mean),.1)
        sum `var' if round==0, d
        mat m[`y',2] = round(r(sd),.1)
        sum `var' if round==100, d
        mat m[`y',3] = round(r(mean),.1)
        sum `var' if round==100, d
        mat m[`y',4] = round(r(sd),.1)
        local y=`y'+1
    }
    
    //8. Name Columns and Rows
    mat colnames m = "Control Mean" "Control SD" "Treatment Mean"  "Treatment SD"
    mat rownames m = "age_1" "ownsphone" "school" "schoolagedchildren_1" "childrenattendschool" "partner" "childmortality"
    
    //9. Save matrix to spreadsheet
    mat2txt, matrix(m) saving (FILE LOCATION) replace
    Have a good day!

    Linda

  • #2
    I think that summary statistics make sense for all kinds of variables. E.g., for a binary (0/1) variable the mean is the proportion of 1s, which becomes the percentage once multiplied by 100.

    Your code seems fine. There are some redundancies, but if speed is not an issue here, it seems to me the code does what it is supposed to do.

    (Only one call of each type is needed, and you do not need the detail option of -summarize- because you are retrieving only the mean and standard deviation:
    sum `var' if round==0 and
    sum `var' if round==100 )
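
A minimal sketch of the streamlined loop described above, with one -summarize- call per group and no detail option. The toy data are made up purely so the block runs on its own. The variable names and the coding of round (0 = control, 100 = treatment) follow the first post, and the short column names are an assumption chosen for brevity.

```stata
* Toy data (hypothetical values) so the sketch runs on its own
clear
input round age_1 ownsphone
0   30 1
0   34 0
0   32 1
0   28 1
100 35 1
100 31 0
100 36 0
100 34 0
end

* Size the matrix from the variable list instead of hard-coding the row count
local vars age_1 ownsphone
local j_rows : word count `vars'
mat m = J(`j_rows', 4, .)

local y = 1
foreach var of local vars {
    * One call per group: r(mean) and r(sd) come from the same -summarize-
    quietly summarize `var' if round == 0
    mat m[`y', 1] = round(r(mean), .1)
    mat m[`y', 2] = round(r(sd), .1)
    quietly summarize `var' if round == 100
    mat m[`y', 3] = round(r(mean), .1)
    mat m[`y', 4] = round(r(sd), .1)
    local y = `y' + 1
}

mat rownames m = `vars'
mat colnames m = c_mean c_sd t_mean t_sd
mat list m

* For a 0/1 variable, 100 * mean is the percentage of 1s
quietly summarize ownsphone if round == 0
display "ownsphone, control: " %5.1f (100 * r(mean)) " percent"
```

Taking the row count from the variable list keeps the matrix dimensions and the row names in step when variables are added or dropped.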



    • #3
      Hi Joro,

      Thank you for your swift reply. I am struggling a bit with this, as the statistics for some of the variables then seem to have limited explanatory value, e.g. school attendance, partner and religion, because they have value labels from 1 to 7. These will be my descriptive statistics before (PS) matching, so I want to give a good overview of the differences before matching.

      Code:
                              Control Mean   Control SD   Treatment Mean   Treatment SD
      Age                             31.7          5.2             33.9            5.4
      School                             1           .2               .9             .2
      School attendance                1.4           .7              1.2             .6
      School completed                 1.3           .9              1.4              1
      School aged children               1          1.2              1.4              1
      partner                          1.2           .8              1.3             .9
      childmortality                    .1           .3               .2             .4
      religion                         1.3           .8              1.2             .7
      Hope this is clear!



      • #4
        I think I understand now, Linda. My earlier statement was too general, and it is incorrect at that level of generality.

        There are variables which we call categorical variables, and calculating summary statistics for them does not make sense, because the numbers we attach to the different categories do not have a measurable, numerical meaning. For example, if the variable religion is 1 for Muslim, 2 for Christian and 3 for Jewish, calculating means and standard deviations for it makes no sense, because nothing is being measured: the only meaning we attach is that 1 is different from 2, and we arbitrarily chose 1 for Muslims and 2 for Christians. Similarly, we cannot use categorical variables directly in regression analysis; we need to expand them into a set of dummies.

        The way we do the expansion is to create a new dummy variable Muslim, which is equal to 1 if Muslim and 0 otherwise; a new dummy variable Christian, equal to 1 if Christian and 0 otherwise; and so on. The means of Muslim and Christian then do make sense, as they tell us the proportion of each religion in the sample.
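
The expansion into dummies can be sketched as follows, assuming the example coding above (1 = Muslim, 2 = Christian, 3 = Jewish) and the round coding from the first post (0 = control, 100 = treatment). The data values and the stub name rel are made up for illustration only.

```stata
* Toy data (hypothetical): religion coded 1 = Muslim, 2 = Christian, 3 = Jewish
clear
input round religion
0   1
0   1
0   2
0   3
100 2
100 2
100 1
100 3
end

* -tabulate, generate()- creates one 0/1 dummy per category: rel1, rel2, rel3
quietly tabulate religion, generate(rel)

* The mean of each dummy is the share of that category within the group
foreach d of varlist rel1 rel2 rel3 {
    quietly summarize `d' if round == 0
    display "`d', control:   " %5.1f (100 * r(mean)) " percent"
    quietly summarize `d' if round == 100
    display "`d', treatment: " %5.1f (100 * r(mean)) " percent"
}
```

Because each dummy is a 0/1 variable, its mean can go straight into the same kind of matrix as the means of the continuous variables, either as a proportion or multiplied by 100.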

        I cannot say anything about the variables you have highlighted because I do not know their nature. E.g., partner might be the number of partners, in which case it is a measurement and you can calculate summary statistics for it. Or it might be a categorical variable coded quite arbitrarily as 1 for single, 2 for married, 3 for divorced, etc., in which case you cannot meaningfully calculate summary statistics for it and first need to expand it into a set of dummies. Or just tabulate it.

        To summarise: for variables that are measurements, we can take summary statistics and they make sense. For categorical variables, we either need to expand them into a set of dummies, or we need to tabulate them.
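
For the tabulation route, a minimal sketch might look like this. It assumes partner is categorical with the hypothetical coding 1 = single, 2 = married, 3 = divorced, and that round is 0 for control and 100 for treatment as in the first post. The data values, the label name partner_lbl and the matrix name freq are made up for illustration.

```stata
* Toy data (hypothetical values and coding)
clear
input round partner
0   1
0   1
0   2
0   3
100 2
100 2
100 2
100 1
end

label define partner_lbl 1 "single" 2 "married" 3 "divorced"
label values partner partner_lbl

* Column percentages: distribution of partner within each group
tabulate partner round, column nofreq

* The cell counts can also be stored in a matrix for a custom table
quietly tabulate partner round, matcell(freq)
mat list freq
```

The stored counts have one row per category and one column per group, so column percentages can be computed from them and exported in the same way as the summary statistics matrix.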



        • #5
          Thank you Joro, will start working on it!
