
  • Number of observations for each category when regressing

    Hi,

    I have a very long do-file that runs a series of regressions and writes the output to CSV files. For each regression, I want to include the number of observations in each category of the factor, for example:
    Model 1
    Factor                     n          Beta coefficient
    Education level            n = 649    test of trend: p = 0.10
      Less than Year 10        28         Reference
      Year 10 or Year 11       75         1.8
      Year 12 or equivalent    177        2.5
      Trade/Certificate        389        2.8
      Bachelor degree          247        3.3
      Postgraduate             106        3.8

    I've used e(N) to get the overall number of observations for the regression (n = 649):
    Code:
    regress age i.education
    local n = e(N)
    However, the number of observations for each category currently comes from simply tabulating education, which does not account for the records dropped from the regression because of missing data. Hence the individual categories do not add up to 649.
    I know I could do the following to get a table with the right numbers for the example above:
    Code:
    tab education if age != ., matcell(tabx)
    But in reality the regressions include a long list of variables stored in local macros, and I don't want to unpack each list and manually write an if condition for every variable. Is there a way to do this? Please let me know if anything above isn't clear.

    Thanks,
    Nicole


  • #2
    I've solved my problem, but I'll leave the thread up in case anyone else wants the answer.
    Code:
    local outcome "age" //dependent variable
    local factor "education" //independent variable
    local adj "sex weight height income" //extra variables included in regression
    local adjc : subinstr local adj " " ",", all //adding commas to the variable list to fit in with missing() syntax
    tab `factor' if !missing(`outcome', `adjc'), matcell(tabx)
    This produces tabx, a matrix holding the number of observations in each category of the independent variable, counted only over records with no missing values in the outcome or the adjustment variables.
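
    As a possible alternative to building a comma-separated list for missing(), the same restriction could be sketched with egen's rowmiss() function, which accepts an ordinary space-separated varlist. This is only an illustration on Stata's bundled auto dataset: the variables price, foreign, mpg and rep78 are stand-ins for the real outcome, factor and adjustment variables, and nmiss is a hypothetical helper variable. Like the missing() approach, it assumes the macros hold plain variable names rather than factor-variable notation such as i.sex.

    ```stata
    * Sketch on Stata's bundled auto data; these variable names are stand-ins
    sysuse auto, clear

    local outcome "price"      // dependent variable
    local factor  "foreign"    // independent variable of interest
    local adj     "mpg rep78"  // extra variables included in the regression

    * Count missing values across the outcome and adjustment variables in each row
    egen byte nmiss = rowmiss(`outcome' `adj')

    * Tabulate the factor among rows with no missing values; counts go to tabx
    tabulate `factor' if nmiss == 0, matcell(tabx)
    matrix list tabx
    ```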



    • #3
      After each regression, you could do something like

      Code:
      tab education if e(sample)
      That will limit the analysis to the cases that were used in the regression.
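
      A minimal sketch of how e(sample) might be combined with the matcell() option from earlier in the thread, so that the per-category counts land in a matrix and can be checked against e(N). It uses Stata's bundled auto dataset purely for illustration; the variable choices and the matrix names counts and levels are assumptions, not part of the original poster's code.

      ```stata
      * Sketch using Stata's bundled auto data; variable names are illustrative only
      sysuse auto, clear

      local outcome "price"
      local factor  "rep78"
      local adj     "mpg weight"

      regress `outcome' i.`factor' `adj'
      local n = e(N)

      * Per-category counts restricted to the estimation sample
      tabulate `factor' if e(sample), matcell(counts) matrow(levels)

      * The cell counts should sum to e(N)
      mata: st_numscalar("total", sum(st_matrix("counts")))
      assert scalar(total) == `n'
      ```

      Because e(sample) is set by the estimation command itself, the macros holding the variable lists never need to be unpacked. The tabulate line has to run before the next estimation command replaces e(sample).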

      Another nice command for some purposes is estat summarize (estat sum for short), e.g.

      Code:
      reg y x1 x2 x3
      estat sum
      "estat summarize summarizes the variables used by the command and automatically restricts the sample to the estimation sample; it also summarizes the weight variable and cluster structure, if specified."

      For more info, type

      Code:
      help estat summarize
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Great! Thanks Richard. e(sample) is exactly what I was looking for.

