Number of observations for each category when regressing

Nicole Wardrop

Join Date: Jun 2018
Posts: 3


26 Jun 2018, 22:55

Hi,

I have a long do-file that runs a series of regressions and writes the output to CSV files. For each regression, I want to include the number of observations in each category, for example:

Model 1
Factor		n	Beta coefficient
Education level		n = 649	test of trend: p = 0.10
	Less than Year 10	28	Reference
	Year 10 or Year 11	75	1.8
	Year 12 or equivalent	177	2.5
	Trade/ Certificate	389	2.8
	Bachelor degree	247	3.3
	Postgraduate	106	3.8

I've used e(N) to get the overall number of observations for the regression (n = 649):

Code:

regress age i.education
local n = e(N)

However, the per-category counts currently come from simply tabulating education, which does not account for the records dropped from the regression because of missing data. Hence the individual categories do not add up to 649.
I know I could do the following to get a table with the right numbers for the example above:

Code:

tab education if age != ., matcell(tabx)

But in reality the regression includes a long list of variables stored in local macros, and I don't want to unpack them and write an if condition by hand for each one. Is there a way to do this automatically? Please let me know if anything above isn't clear.

Thanks,
Nicole


Nicole Wardrop

Join Date: Jun 2018
Posts: 3

27 Jun 2018, 01:20

I've solved my problem but I'll leave it up in case anyone else wants an answer.

Code:

local outcome "age" //dependent variable
local factor "education" //independent variable
local adj "sex weight height income" //extra variables included in regression
local adjc : subinstr local adj " " ",", all //adding commas to the variable list to fit in with missing() syntax
tab `factor' if !missing(`outcome', `adjc'), matcell(tabx)

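This stores the result in tabx, a matrix holding the number of observations in each category of the independent variable, counted only over the records with no missing values in the outcome or adjustment variables, i.e. the records included in the regression.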
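For anyone who wants to check this approach, here is a minimal sketch on Stata's bundled auto dataset. The variable choices (price, rep78, mpg, weight) and the scalar name total are only illustrative assumptions, not my real data. It compares the sum of the counts in tabx with e(N) from the matching regression.

```stata
* Sketch only: auto.dta and the variables below are illustrative stand-ins
sysuse auto, clear

local outcome "price"        // dependent variable
local factor  "rep78"        // categorical predictor (has some missing values)
local adj     "mpg weight"   // adjustment variables
local adjc : subinstr local adj " " ",", all

regress `outcome' i.`factor' `adj'
local n = e(N)

* Counts per category among complete cases; tabulate drops missing `factor' itself
tabulate `factor' if !missing(`outcome', `adjc'), matcell(tabx)

* Sum the cell counts and compare with e(N)
mata: st_numscalar("total", sum(st_matrix("tabx")))
display "sum of category counts = " scalar(total) ", e(N) = `n'"
```

Note that the subinstr step assumes the variable names in adj are separated by single spaces; a double space would produce an empty argument inside missing().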


Richard Williams

Join Date: Apr 2014

Posts: 4992
#3

27 Jun 2018, 04:53

After each regression, you could do something like

Code:

tab education if e(sample)

That will limit the analysis to the cases that were used in the regression.
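As a rough sketch of how this could be automated over several predictors, something like the following might work. It uses the bundled auto dataset, and the macro names (outcome, adj, factors) and matrix names (counts, levels) are just placeholders I made up for illustration.

```stata
* Sketch only: auto.dta and the variable lists are illustrative assumptions
sysuse auto, clear

local outcome "price"
local adj     "mpg weight"
local factors "rep78 foreign"   // hypothetical list of categorical predictors

foreach f of local factors {
    quietly regress `outcome' i.`f' `adj'
    local n = e(N)

    * e(sample) marks the observations the regression actually used
    quietly tabulate `f' if e(sample), matcell(counts) matrow(levels)

    display _newline "`f': n = `n'"
    forvalues i = 1/`=rowsof(counts)' {
        display "  level " levels[`i', 1] ": " counts[`i', 1]
    }
}
```

Because e(sample) is set by the most recent estimation command, the tabulate has to run before the next regression replaces it.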

Another nice command for some purposes is estat summarize (estat sum for short), e.g.

Code:

reg y x1 x2 x3
estat sum

"estat summarize summarizes the variables used by the command and automatically restricts the sample to the estimation sample; it also summarizes the weight variable and cluster structure, if specified."

For more info, type

Code:

help estat summarize

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Nicole Wardrop

Join Date: Jun 2018

Posts: 3
#4

27 Jun 2018, 18:00

Great! Thanks Richard. e(sample) is exactly what I was looking for.