
  • Table (automated) on how many observations a regression with different control variables has

    Hello

    I have a regression with 10 covariates, some of which are factor variables such as ib10.variablep.

    Some of my variables have missing values, so Stata's regress drops the affected observations entirely (listwise deletion).

    A reviewer wants a table showing all possible combinations of covariates and how many observations would be lost to missing values in each combination.

    Obviously, this is not feasible by hand, so I am looking for a Stata-based solution.



    I don't know if I need a data example for this, because it's independent of any data - I just need to find the right program/code.

    But assume

    GDP = ib1.country + population + shareofsomething + attitudinalscale + ib10.variablep (and 5 others I omit for ease) and my code is

    reg GDP ib1.country population shareofsomething attitudinalscale ib10.variablep, robust
    local nobs = e(N)

    Now, if I leave one covariate out, I get a different number of observations (nobs). Is it possible to automate this and get a table from it? (I know I could code it from scratch, but maybe something is preinstalled or available.)




    Thank you!
    Last edited by Andrea Maier; 29 Apr 2022, 05:42. Reason: typo in title corrected

  • #2
    Please use dataex to show us your data.


    EDIT: I'm also unclear on what's meant by all possible combinations. Of variables? If I have three predictor variables, do they want the first one, the first two, all three, two and three, and one and three? Because if so, this doesn't make sense.
    Last edited by Jared Greathouse; 29 Apr 2022, 05:55.



    • #3
      Thank you Jared (if I may).
      I don't think the question depends on my data (I am looking for code).

      Yeah, it is a peer review comment, so I don't have much authority over what makes sense. I understood it as follows (using your three-variable example again, if I may):

      n_all: for prediction based on all variables (here: 3)
      n_all-1: in your example this would be y= x1+x2, y=x2+x3, y=x1+x3
      n_all-2: in your example this would be y=x1, y=x2, y=x3.

      As I have 10 variables this would give me a table with 100 values, so I cannot do it manually.



      • #4
        Trust me, this will be much easier if you just show me what your data look like. By your 50th post here, you should know there's basically never a good reason to not provide a data example, unless of course the data you're working with are confidential. Any code I attempt to provide you will be meaningless if I can't see the exact variables you're using. There's a good reason the FAQ asks that we show both data and code.

        Either way, you're likely interested in the user-written "tuples" command by Nick Cox, daniel klein and Joseph Luchman. I think it does what you want.



        • #5
          Originally posted by Andrea Maier
          Yeah, it is a peer review comment, so I don't have much authority over what makes sense.
          Well, if a reviewer comment does not seem to make sense, explain in your response letter why you think so and suggest a more useful alternative.


          Originally posted by Andrea Maier
          As I have 10 variables this would give me a table with 100 values
          No. While 3 variables result in the 2^3-1 = 7 possible combinations you show, 10 variables imply 2^10-1 = 1,023 possible combinations ...
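
The arithmetic above can be checked directly in Stata. This is only a minimal sketch using display and the built-in comb() function; it does not depend on any data:

```stata
* Number of non-empty subsets of k covariates is 2^k - 1
display 2^3 - 1      // 3 covariates
display 2^10 - 1     // 10 covariates

* Number of ways to leave exactly one of 10 covariates out
display comb(10, 9)
```

The last line shows why "leave one out" alone gives only 10 models, whereas every possible subset gives far more.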



          • #6
            Jared Greathouse Thank you. The thing is, my data are highly confidential: they concern individuals' income and policy attitudes. So I would need to make something up completely from scratch just to show that there are variables with missing values in no particular pattern. I just want to save the "n"s of a set of regressions that omit all possible combinations of the independent variables.

            (Anyway I should have mentioned why I cannot show the exact data I am working with, that's my bad, excuse me please.)

            The thing is my code is in there.


            Thank you, I will look at tuples, thank you!

            Daniel, yeah, I actually did not do the math and just guessed. Thanks. (I don't think that reviewer 2 wanted all combinations, but at the moment I wouldn't be able to show the reviewer even, say, 7 combinations, so I am trying to solve the problem.)



            • #7
              I am now using -tuples- (from: Joseph N. Luchman) and it works really well (though slowly, but that might just be my computer).

              The code is:



              tuples x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 , view

              (you will see all the tuples)


              set obs (to largest number displayed)


              gen n_obs = .
              gen predictors = ""



              quietly forval i = 1/largest number displayed {
              regress nextcarisbev `tuple`i''
              replace n_obs = e(N) in `i'
              replace predictors = "`tuple`i''" in `i'

              }

              Last edited by Andrea Maier; 29 Apr 2022, 08:35.



              • #8
                It's almost always nice to be mentioned, but the order of authors of tuples has meaning to those authors in that I gave birth to the command but Joseph and Daniel now oversee its care and maintenance.

                Naively I am still trying to understand the question. If it's the effect of including or excluding any of 10 covariates I make that 2^10 possibilities (minus 1 for the uninteresting case of them all being excluded) -- so 1024 or 1023 -- and does the reviewer really want to see a table that long?

                (EDIT This was in draft form over my lunch time and so overlaps by accident with other posts not visible at the time, esp. #5.)
                Last edited by Nick Cox; 29 Apr 2022, 08:43.



                • #9
                  Details are not relevant here but they usually are. Note that

                  Originally posted by Andrea Maier
                  tuples x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 , view
                  will issue an invalid syntax error. The correct option name is (and always has been) display. Please always copy and paste the exact code that you use.

                  Originally posted by Andrea Maier
                  set obs (to largest number displayed)
                  is not necessary, as the number of tuples is returned in local macro ntuples. This is explained in the help file. Thus,

                  Code:
                  set obs `ntuples'
                  is more flexible code.
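
Putting these corrections together, one possible version of the loop might look like the sketch below. It is not the original poster's exact code: y and x1-x10 are placeholder variable names, nobs_by_tuple is a hypothetical file name, and postfile is swapped in for the set obs/replace approach so that the results are written to a separate file rather than into rows of the analysis data. It assumes tuples has been installed from SSC.

```stata
* Build all tuples; the display option lists them,
* and the count is returned in local macro ntuples
tuples x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, display

* Open a results file with one row per tuple (file name is hypothetical)
tempname results
postfile `results' str2045 predictors long n_obs using nobs_by_tuple, replace

forvalues i = 1/`ntuples' {
    * Fit the model on this subset of covariates; e(N) is the estimation sample size
    quietly regress y `tuple`i''
    post `results' ("`tuple`i''") (e(N))
}
postclose `results'

* Caution: this replaces the data in memory with the results table
use nobs_by_tuple, clear
list, noobs
```

Because the loop bound comes from `ntuples', the same code should work unchanged if covariates are added or removed from the list.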

                  Regarding speed, tuples (from SSC, I suppose) should be able to handle 10 items in less than a second, unless you are on a very slow machine or use the naive Stata-based method, which we do not recommend. Even then, tuples should not take longer than a couple of seconds. As for obtaining the number of missing values, there are faster methods than fitting a regression model. The mark and count commands come to mind.
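
A rough sketch of the mark/count route mentioned above, again with placeholder names y and x1-x10 and assuming tuples is installed. It counts complete cases directly instead of fitting a model. As far as I know, markout expects plain variable names, so a factor-variable term such as ib10.variablep would be listed as variablep here (an assumption worth checking against the help file).

```stata
tuples x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

forvalues i = 1/`ntuples' {
    * Flag all observations, then unflag those with a missing value
    * in the outcome or in any covariate of this tuple
    tempvar touse
    mark `touse'
    markout `touse' y `tuple`i''

    * Count the complete cases and show them next to the covariate list
    quietly count if `touse'
    display "`tuple`i''" _col(60) r(N)

    drop `touse'
}
```

For a plain regress call without weights or if/in restrictions, this count should match e(N), since the estimation sample is the set of observations with no missing values in the model variables.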

                  The usefulness of the approach has now been questioned three times.

