Generating a table of means of covariates used in the regression:

Hend She

Join Date: Jul 2020

Posts: 70
#1

30 Dec 2021, 09:08

I am looking for suggestions on how to export a table of the means of the covariates used in a regression to Word.

Code:

probit depvar [indepvars] [if] [in] [weight] [, options]

In the case of a pooled probit regression, I would like the table to have one column of means for each year (assuming the data cover two years).
I would also like a second table of the covariate means by the binary dependent variable (i.e., one column for p=1, one for p=0, and a third column for the means of the entire sample).

Is there any direct way of doing this or do I have to type them manually? Thanks a lot!

Clyde Schechter

Join Date: Apr 2014
Posts: 29799

30 Dec 2021, 14:19

As you do not provide example data, I will illustrate the approach using the built-in auto.dta.

Code:

clear*
sysuse auto

local depvar foreign
local indvars price mpg headroom

probit `depvar' `indvars'

frame create means str32 variable float(mean_0 mean_1 mean_all)
foreach v of varlist `indvars' {
    local topost ("`v'")
    forvalues i = 0/1 {
        summ `v' if e(sample) & `depvar' == `i', meanonly
        local topost `topost' (`r(mean)')
    }
    summ `v' if e(sample), meanonly
    local topost `topost' (`r(mean)')
    frame post means `topost'
}

At the end of this code, the dataset in frame means is what you are looking for.

Note: Because the code uses frames, it requires Stata version 16 or later. If you are using an earlier version, the code can be modified to use a -tempfile- instead.
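A minimal sketch of that -tempfile- variant for Stata versions before 16, assuming the same auto.dta example. The names `means` and `handle` are chosen here for illustration and are not from the post above; -postfile-, -post- and -postclose- take the place of -frame create- and -frame post-.

```stata
clear*
sysuse auto

local depvar foreign
local indvars price mpg headroom

probit `depvar' `indvars'

* Temporary results file and posting handle (illustrative names)
tempfile means
tempname handle
postfile `handle' str32 variable float(mean_0 mean_1 mean_all) using `means'

foreach v of varlist `indvars' {
    local topost ("`v'")
    * Means within the estimation sample for depvar == 0 and depvar == 1
    forvalues i = 0/1 {
        summ `v' if e(sample) & `depvar' == `i', meanonly
        local topost `topost' (`r(mean)')
    }
    * Mean over the whole estimation sample
    summ `v' if e(sample), meanonly
    local topost `topost' (`r(mean)')
    post `handle' `topost'
}
postclose `handle'

* Load the results; this replaces the data currently in memory
use `means', clear
list
```

Because -use- replaces the data in memory, wrap the last two lines in -preserve- and -restore- if the original data are still needed afterwards.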


Hend She

Join Date: Jul 2020

Posts: 70
#3

31 Dec 2021, 07:51

Awesome! As always, extremely helpful! Thank you so much, Clyde Schechter! I haven't used this type of data management before, but I will use this technique (data frames) to produce the table of means. I ran your code, followed it with the tabulation below, and will then export the result to a Word document, unless you can suggest a more efficient way of doing this.

Code:

tabstat mean_0 mean_1 mean_all , by( variable ) stat( mean)
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#4

31 Dec 2021, 08:26

Hend She, you'll want to use the user-written esttab/estpost commands for exporting to a Word document.
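A rough sketch of how that could look with the auto.dta example used earlier in the thread. It assumes the estout package is installed (ssc install estout). The variable `insample` and the file name means.rtf are made up for illustration.

```stata
clear
sysuse auto
local depvar foreign
local indvars price mpg headroom
quietly probit `depvar' `indvars'

* Mark the estimation sample as a variable before estpost replaces e()
generate byte insample = e(sample)

* Covariate means by outcome group, plus a Total column
estpost tabstat `indvars' if insample, by(`depvar') statistics(mean) columns(statistics)

* Write an RTF file, which Word opens directly (file name is illustrative)
esttab using means.rtf, main(mean) unstack nostar noobs nonote nomtitle nonumber replace
```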

Bruce Weaver

Join Date: May 2014
Posts: 1109

31 Dec 2021, 09:36

Clyde, does the relatively complicated method you showed in #2 have any advantages over a simple -tabstat-, like this?

Code:

clear
sysuse auto
local depvar foreign
local indvars price mpg headroom
quietly probit `depvar' `indvars'
tabstat `indvars' if e(sample), stat(mean) by(foreign)

Here is the output from your method:

Code:

. frame change means

. list

     +-------------------------------------------+
     | variable     mean_0     mean_1   mean_all |
     |-------------------------------------------|
  1. |    price   6072.423   6384.682   6165.257 |
  2. |      mpg   19.82692   24.77273    21.2973 |
  3. | headroom   3.153846   2.613636   2.993243 |
     +-------------------------------------------+

And here is the output from the simple -tabstat- method:

Code:

. tabstat `indvars' if e(sample), stat(mean) by(foreign)

Summary statistics: mean
  by categories of: foreign (Car type)

 foreign |     price       mpg  headroom
---------+------------------------------
Domestic |  6072.423  19.82692  3.153846
 Foreign |  6384.682  24.77273  2.613636
---------+------------------------------
   Total |  6165.257   21.2973  2.993243
----------------------------------------

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)


Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#6

31 Dec 2021, 11:09

"Clyde, does the relatively complicated method you showed in #2 have any advantages over a simple -tabstat-, like this?"

I proposed that method for a couple of reasons, in decreasing order of importance.

1. O.P. wants to export the results somewhere. Output from -tabstat- does not readily lend itself to that. By creating a data set of results, you can readily export it to spreadsheets, word processing documents, text files, other statistical packages and databases. If O.P. had just wanted to list the means to the Results window, I would have recommended -tabstat-. I suspect that, in version 17, the use of -collect- could give us another way, but using -collect- is very complicated and I'm barely getting comfortable with it myself, so I'm not trying to instruct others in using it yet.

2. This method is completely flexible: it can be modified to calculate anything that is a function of the data in memory, in r(), and in e() and create a data set that organizes it by variable. -tabstat- does the basic descriptive statistics, but that's all.
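As an illustration of point 1, here is a minimal sketch of exporting the results data set. It assumes the frame named means from #2 already exists and is filled; the file names are hypothetical, and -putdocx- requires Stata 15 or later.

```stata
* Run after the code in #2 has filled the frame named means
frame means {
    * Excel workbook with the variable names in the first row
    export excel using "means.xlsx", firstrow(variables) replace

    * Word document containing the same table
    putdocx begin
    putdocx table tbl = data("variable mean_0 mean_1 mean_all"), varnames
    putdocx save "means.docx", replace
}
```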
Bruce Weaver

Join Date: May 2014

Posts: 1109
#7

31 Dec 2021, 16:48

Okay, fair enough, Clyde. How about using -collapse- then? Something like this?

Code:

clear
sysuse auto
local depvar foreign
local indvars price mpg headroom
quietly probit `depvar' `indvars'
preserve
collapse `indvars' if e(sample), by(foreign)
// Export to another format if you like
list
restore

Rather than using -preserve- and -restore-, one could copy the working dataset to a new frame if one wished. But for this toy example, at least, -preserve- and -restore- seemed more than adequate.
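A sketch of the frame-based alternative, under the assumption that Stata 16 or later is available. The frame name `collapsed` and the marker variable `insample` are illustrative. The estimation sample is saved as a variable first so that it travels with the copied data.

```stata
clear
sysuse auto
local depvar foreign
local indvars price mpg headroom
quietly probit `depvar' `indvars'

* Flag the estimation sample as a variable so the copy carries it along
generate byte insample = e(sample)

* Copy the working data to a new frame and collapse the copy only
frame copy `c(frame)' collapsed, replace
frame collapsed {
    collapse `indvars' if insample, by(`depvar')
    // Export to another format if you like
    list
}
```

The working data set in the original frame is left untouched, so no -restore- is needed.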

Happy New Year! ;-)

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#8

31 Dec 2021, 18:36

Yes, that's simpler for this problem.
Hend She

Join Date: Jul 2020

Posts: 70
#9

04 Jan 2022, 12:36

Thank you all for the very helpful suggestions!

Hend She

Join Date: Jul 2020
Posts: 70

#10

05 Jan 2022, 12:38

I just tried both approaches on my own (non-built-in) data. Clyde Schechter's code worked very well with auto.dta, but when I ran it on my actual data, the generated data frame contained zero observations. It must be a mistake on my side, but I haven't figured out why yet.

For the tabstat suggestion, just to clarify, when I run this code, I see the means before the generated marginal effects:

Code:

gen sample=0
replace sample=1 if e(sample)
probit `depvar' `indvars' [pw=hweight] if head==0 & sample==1, robust
margins, dydx(*) atmeans post
**prediction
Expression: Pr(nocl), predict()
At: amount             = 7.663864 (mean)
    age                = 42.82338 (mean)
    age2               = 1995.185 (mean)
    married            = .2318454 (mean)

**Marginal effects table here

Code:

 tabstat `indvars' if e(sample), stat(mean) by(nocl)

Summary statistics: Mean
Group variable: nocl
nocl                             amount          age        age2
0                                 11.42          38.24       1597
1                                  8.53          38.00       1587

For the tabstat results table, I have listed only the first three variables as an example. I am confused: is the difference here because the marginal effects are computed with atmeans?


Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#11

05 Jan 2022, 13:47

"Clyde Schechter's code worked very well with auto.dta, but when I ran it on my actual data, the generated data frame contained zero observations. It must be a mistake on my side, but I haven't figured out why yet."

Well, you have a number of other solutions to your problem proposed in this thread, so I imagine you have no pressing need to resolve this. But if you would like, for learning purposes, to figure out what went wrong when you tried to use my code, post back with the exact code you tried and an example data set (use -dataex-, of course) that reproduces this problem, and I'll try to troubleshoot it.

"For the tabstat results table, I have listed only the first three variables as an example. I am confused: is the difference here because the marginal effects are computed with atmeans?"

No, you are comparing apples to oranges here. The means of the -at()- variables that -margins- shows before the marginal effects are means across the entire estimation sample, whereas the results you are getting from -tabstat- are disaggregated into separate means for nocl = 0 and nocl = 1. That's one thing. Another thing is that the -probit- command is using -pweights-, and -margins- follows along with that, whereas your -tabstat- command is unweighted, so the results would be different anyway.
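A minimal sketch of a like-for-like comparison, assuming the locals and the weight variable hweight from #10. Run it in the same do-file directly after the weighted -probit-, before -margins, post-, so that e(sample) still refers to the probit estimation sample. It also assumes `indvars' contains plain variable names, not factor-variable notation.

```stata
* Weighted means over the whole estimation sample; -mean- accepts pweights
mean `indvars' [pw=hweight] if e(sample)

* -tabstat- does not accept pweights, but aweights yield the same
* point estimates of the means (no by() option, so nothing is disaggregated)
tabstat `indvars' [aw=hweight] if e(sample), stat(mean)
```

These weighted, whole-sample means are the ones that should line up with the "(mean)" values -margins- reports under atmeans.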

Last edited by Clyde Schechter; 05 Jan 2022, 14:21. Reason: @Rich Goldstein kindly pointed out that I said "apples to origins." Correcting that error.
