  • Generating a table of means of covariates used in the regression:

    I am looking for suggestions on how to export a table of the means of the covariates used in a regression to Word.
    Code:
     probit depvar [indepvars] [if] [in] [weight] [, options]
    This is for a pooled probit regression. I would like the table to have one column of means for each year (assuming the data cover two years).
    I would also like a second table of the covariate means by the binary dependent variable (i.e., one column for p=1, one for p=0, and a third for the entire sample).

    Is there any direct way of doing this or do I have to type them manually? Thanks a lot!

  • #2
    As you do not provide example data, I will illustrate the approach using the built-in auto.dta.

    Code:
    clear*
    sysuse auto
    
    local depvar foreign
    local indvars price mpg headroom
    
    probit `depvar' `indvars'
    
    frame create means str32 variable float(mean_0 mean_1 mean_all)
    foreach v of varlist `indvars' {
        local topost ("`v'")
        forvalues i = 0/1 {
            summ `v' if e(sample) & `depvar' == `i', meanonly
            local topost `topost' (`r(mean)')
        }
        summ `v' if e(sample), meanonly
        local topost `topost' (`r(mean)')
        frame post means `topost'
    }
    At the end of this code, the dataset in frame means is what you are looking for.

    Note: Because the code uses frames, it requires Stata version 16 or later. If you are using an earlier version, the code can be modified to use a -tempfile- instead.
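
    For readers on an earlier version, the following is a minimal sketch of that -tempfile- variant, assuming -postfile- is used in place of -frame post-. The names handle and means are arbitrary placeholders chosen for this illustration, not part of the original code.

    Code:
    clear*
    sysuse auto

    local depvar foreign
    local indvars price mpg headroom

    probit `depvar' `indvars'

    // handle and means are illustrative temporary names
    tempname handle
    tempfile means
    postfile `handle' str32 variable float(mean_0 mean_1 mean_all) using `means'
    foreach v of varlist `indvars' {
        local topost ("`v'")
        forvalues i = 0/1 {
            summ `v' if e(sample) & `depvar' == `i', meanonly
            local topost `topost' (`r(mean)')
        }
        summ `v' if e(sample), meanonly
        local topost `topost' (`r(mean)')
        post `handle' `topost'
    }
    postclose `handle'

    // Load the posted results; this replaces the data in memory
    use `means', clear
    list
    Unlike the frame version, the results here replace the working dataset when loaded, so wrap the last step in -preserve-/-restore- if the original data are still needed.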


    • #3
      Awesome! As always, extremely helpful! Thank you so much, Clyde Schechter! I haven't used this type of data management before, but I will use this technique (data frames) to produce the table of means. I ran your code, followed it with the tabulation below, and then exported the result to a Word document, unless you can suggest a more efficient way of doing this.
      Code:
      tabstat mean_0 mean_1 mean_all, by(variable) stat(mean)


      • #4
        Hend She, you'll want to use the user-written esttab/estpost commands (part of the estout package, available from SSC) for exporting to a Word document.


        • #5
          Clyde, does the relatively complicated method you showed in #2 have any advantages over a simple -tabstat-, like this?

          Code:
          clear
          sysuse auto
          local depvar foreign
          local indvars price mpg headroom
          quietly probit `depvar' `indvars'
          tabstat `indvars' if e(sample), stat(mean) by(foreign)

          Here is the output from your method:

          Code:
          . frame change means
          
          . list
          
               +-------------------------------------------+
               | variable     mean_0     mean_1   mean_all |
               |-------------------------------------------|
            1. |    price   6072.423   6384.682   6165.257 |
            2. |      mpg   19.82692   24.77273    21.2973 |
            3. | headroom   3.153846   2.613636   2.993243 |
               +-------------------------------------------+
          And here is the output from the simple -tabstat- method:

          Code:
          . tabstat `indvars' if e(sample), stat(mean) by(foreign)
          
          Summary statistics: mean
            by categories of: foreign (Car type)
          
           foreign |     price       mpg  headroom
          ---------+------------------------------
          Domestic |  6072.423  19.82692  3.153846
           Foreign |  6384.682  24.77273  2.613636
          ---------+------------------------------
             Total |  6165.257   21.2973  2.993243
          ----------------------------------------
          --
          Bruce Weaver
          Version: Stata/MP 18.5 (Windows)


          • #6
            "Clyde, does the relatively complicated method you showed in #2 have any advantages over a simple -tabstat-, like this?"
            I proposed that method for a couple of reasons, in decreasing order of importance.

            1. O.P. wants to export the results somewhere. Output from -tabstat- does not readily lend itself to that. By creating a data set of results, you can readily export it to spreadsheets, word processing documents, text files, other statistical packages and databases. If O.P. had just wanted to list the means to the Results window, I would have recommended -tabstat-. I suspect that, in version 17, the use of -collect- could give us another way, but using -collect- is very complicated and I'm barely getting comfortable with it myself, so I'm not trying to instruct others in using it yet.

            2. This method is completely flexible: it can be modified to calculate anything that is a function of the data in memory, in r(), and in e() and create a data set that organizes it by variable. -tabstat- does the basic descriptive statistics, but that's all.
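
            As a rough sketch of the export step in point 1, assuming Stata 16 or later and using -export excel- and -putdocx- as two possible routes: the file names means.xlsx and means.docx and the table name t1 are placeholders for illustration only.

            Code:
            clear*
            sysuse auto
            local depvar foreign
            local indvars price mpg headroom
            quietly probit `depvar' `indvars'

            // Rebuild the results frame from #2
            frame create means str32 variable float(mean_0 mean_1 mean_all)
            foreach v of varlist `indvars' {
                local topost ("`v'")
                forvalues i = 0/1 {
                    summ `v' if e(sample) & `depvar' == `i', meanonly
                    local topost `topost' (`r(mean)')
                }
                summ `v' if e(sample), meanonly
                local topost `topost' (`r(mean)')
                frame post means `topost'
            }

            // Export from inside the results frame; file names are placeholders
            frame means {
                export excel using "means.xlsx", firstrow(variables) replace
                putdocx begin
                putdocx table t1 = data("variable mean_0 mean_1 mean_all"), varnames
                putdocx save "means.docx", replace
            }
            Because the exports run inside the frame means block, the working dataset in the current frame is left untouched.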


            • #7
              Okay, fair enough, Clyde. How about using -collapse- then? Something like this?

              Code:
              clear
              sysuse auto
              local depvar foreign
              local indvars price mpg headroom
              quietly probit `depvar' `indvars'
              preserve
              collapse `indvars' if e(sample), by(foreign)
              // Export to another format if you like
              list
              restore
              Rather than using -preserve- and -restore-, one could copy the working dataset to a new frame if one wished. But for this toy example, at least, -preserve- and -restore- seemed more than adequate.
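
              A minimal sketch of that frame-based alternative, assuming Stata 16 or later. The frame name collapsed and the marker variable insample are made up for this illustration; the estimation sample is marked with an ordinary variable first so that the copy does not rely on e(sample) being available in the new frame.

              Code:
              clear
              sysuse auto
              local depvar foreign
              local indvars price mpg headroom
              quietly probit `depvar' `indvars'
              // Mark the estimation sample before copying the data
              generate byte insample = e(sample)
              // Copy the working dataset to a new frame (name is illustrative)
              frame copy `c(frame)' collapsed, replace
              frame collapsed {
                  collapse (mean) `indvars' if insample, by(`depvar')
                  // Export to another format if you like
                  list
              }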

              Happy New Year! ;-)
              --
              Bruce Weaver
              Version: Stata/MP 18.5 (Windows)


              • #8
                Yes, that's simpler for this problem.


                • #9
                  Thank you all for the very helpful suggestions!


                  • #10
                    I just tried both approaches on my own (non-built-in) data. Clyde Schechter's code worked very well for me with auto.dta, but with my actual data the resulting frame contained zero observations. It must be a mistake on my part, but I haven't figured out why yet.

                    For the tabstat suggestion, just to clarify: when I run this code, I see the means before the generated marginal effects:
                    Code:
                    gen sample=0
                    replace sample=1 if e(sample)
                     probit `depvar' `indvars' [pw=hweight] if head==0 & sample==1, robust
                     margins, dydx(*) atmeans  post
                    **prediction
                    Expression: Pr(nocl), predict()
                    At: amount             = 7.663864 (mean)
                        age                = 42.82338 (mean)
                        age2               = 1995.185 (mean)
                        married            = .2318454 (mean)
                    
                    **Marginal effects table here
                    Code:
                     tabstat `indvars' if e(sample), stat(mean) by(nocl)
                    
                    Summary statistics: Mean
                    Group variable: nocl
                    nocl                             amount          age        age2
                    0                                 11.42          38.24       1597
                    1                                  8.53          38.00       1587
                    For the tabstat results table, I listed above only the first three variables. I am confused: is the difference here because of how the marginal effects are defined (atmeans)?


                    • #11
                      Somehow while Clyde Schechter's code worked very well for me before using the auto.dta, while working on my actual data, the data frame generated contained zero observations. It must be a mistake from my side, but I didn't figure out yet why.
                      Well, you have a number of other solutions to your problem proposed in this thread, so I imagine you have no pressing need to resolve this. But if you would like, for learning purposes, to figure out what went wrong when you tried to use my code, post back with the exact code you tried and an example data set (use -dataex-, of course) that reproduces this problem, and I'll try to troubleshoot it.

                      For the tabstat results table, I only listed above the example of the first three variables by mean. I got confused, is the difference here because of the marginal effects defined (atmeans)?
                      No, you are comparing apples to oranges here. The means of the -at()- variables that -margins- shows before the marginal effects are means across the entire estimation sample, whereas the results you are getting from -tabstat- are disaggregated into separate means for nocl = 0 and nocl = 1. That's one thing. Another thing is that the -probit- command is using -pweights-, and -margins- follows along with that, whereas your -tabstat- command is unweighted, so the results would be different anyway.
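
                      To make the two sets of means comparable, a sketch along the following lines may help. It assumes that -mean- with the same -pweight- is an acceptable way to get weighted means. Because no example data were posted, it uses auto.dta, with the variable weight (vehicle weight) standing in for a sampling weight purely for illustration; insample is a made-up marker variable.

                      Code:
                      clear
                      sysuse auto
                      // weight is NOT a real sampling weight; illustration only
                      probit foreign price mpg headroom [pw = weight]
                      // Mark the estimation sample before running -margins-
                      generate byte insample = e(sample)
                      margins, dydx(*) atmeans
                      // Weighted means over the whole estimation sample
                      mean price mpg headroom [pw = weight] if insample
                      // Weighted means separately for each outcome group
                      mean price mpg headroom [pw = weight] if insample, over(foreign)
                      The first -mean- command should be the one to compare with the "At:" values that -margins- reports; the second gives the weighted counterpart of the by-group table.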
                      Last edited by Clyde Schechter; 05 Jan 2022, 14:21. Reason: @Rich Goldstein kindly pointed out that I said "apples to origins." Correcting that error.
