  • Optimizing number of observations in a regression analysis

    The number of observations used in a regression analysis depends on which of the included variables have missing values. With the goal of maximizing the number of non-missing observations in a regression, one approach is to run the analysis adding or dropping one variable at a time to see which variables cause a substantial drop in the sample size. However, the combination of variables included is also relevant. Can a subset of variables be identified a priori from a list of candidate predictors that would have a relatively high number or percentage of non-missing observations when included in a regression analysis? We could specify the number or percentage as a goal.
    Or, more generally, is a command or procedure available that lists the number of non-missing observations for each combination of variables from a list?
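
The one-variable-at-a-time approach described in the question can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses Stata's shipped auto dataset, and the candidate list is hypothetical. It relies on the official -mark- and -markout- commands to flag observations that are complete on a set of variables.

```stata
* Sketch: complete cases lost to each candidate predictor (leave-one-out)
sysuse auto, clear

* Hypothetical candidate list; replace with your own predictors
local candidates price mpg rep78 weight length

* Baseline: observations with no missing value on any candidate
mark all_ok
markout all_ok `candidates'
quietly count if all_ok
local n_all = r(N)
display "all candidates" _col(25) `n_all'

* Drop one candidate at a time and recount the complete cases
foreach v of local candidates {
    local rest : list candidates - v
    tempvar ok
    mark `ok'
    markout `ok' `rest'
    quietly count if `ok'
    local n = r(N)
    display "without `v'" _col(25) `n' _col(35) "gain: " (`n' - `n_all')
}
```

In the auto data only rep78 has missing values (5 of 74 observations), so dropping it is the only change that recovers observations.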

  • #2
    Hillel:
    a tentative reply rests on the -egen- function -rowmiss()-.
    That said, as missingness is a ubiquitous plague, the best approach is probably to diagnose the mechanism underlying the missing values and treat them accordingly (via -mi-, if feasible).
    Last edited by Carlo Lazzaro; 20 Jan 2022, 03:12.
    Kind regards,
    Carlo
    (StataNow 18.5)
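
A minimal sketch of the -rowmiss()- idea mentioned in #2, assuming the shipped auto dataset and a hypothetical candidate list. The new variable nmiss (a name chosen here for illustration) holds the number of missing values per observation across the candidates, so observations with nmiss == 0 are the ones a regression on all candidates would keep.

```stata
* Sketch: count missing values per observation with egen's rowmiss()
sysuse auto, clear

* Hypothetical candidate list; replace with your own predictors
local candidates price mpg rep78 weight length

* Number of missing values in each observation across the candidates
egen nmiss = rowmiss(`candidates')

* Distribution of missingness; nmiss == 0 marks the complete cases
tabulate nmiss

quietly count if nmiss == 0
display "Complete cases: " r(N) " of " _N
```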


    • #3
      Thank you, Carlo. I have since found the command mvpatterns. It might do what I am looking for, although I still need to understand the output.


      • #4
        Hillel:
        thanks for pointing me to the community-contributed module -mvpatterns-, as I was not aware of it.
        It seems to summarize the missing-value status of each variable in the varlist, along with the patterns of missing values across them:
        Code:
        . sysuse auto.dta
        (1978 automobile data)
        
        . mvpatterns rep78
        Variable     | type     obs   mv   variable label
        -------------+---------------------------------------
        rep78        | int       69    5   Repair record 1978
        -----------------------------------------------------
        
        Patterns of missing values
        
          +------------------------+
          | _pattern   _mv   _freq |
          |------------------------|
          |        +     0      69 |
          |        .     1       5 |
          +------------------------+
        
        .
        Obviously, how to deal with missing values remains the main issue.
        Kind regards,
        Carlo
        (StataNow 18.5)
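
For readers who prefer not to install a community-contributed command, official Stata's -misstable- reports similar information. The sketch below is an assumption about what may serve as a substitute for -mvpatterns-, not something proposed in the thread; the variable list is hypothetical.

```stata
* Sketch: missing-value summaries with official Stata commands
sysuse auto, clear

* Per-variable counts of missing values
* (only variables that have missing values are listed)
misstable summarize price mpg rep78 weight length

* Joint missingness patterns, reported as frequencies instead of percentages
misstable patterns price mpg rep78 weight length, frequency
```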


        • #5
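          Alternatively, you can use tuples (also available from SSC) to loop over every combination of variables from a list. Note that mi() is true when any of its arguments is missing, so the count reported for each combination is the number of observations with at least one missing value; the number of non-missing (complete) observations is _N minus that count.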
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(v1 v2 v3 v4 v5)
          . . . . .
          1 . . . .
          1 1 . . .
          1 1 1 . .
          1 1 1 1 .
          end
          
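          * build every combination of v1-v5 (requires -tuples- from SSC)
          tuples v*, varlist
          qui forv i = 1/`ntuples' {
              * turn "v1 v2" into "v1,v2" so it can be passed to mi()
              local x = subinstr("`tuple`i''"," ",",",.)
              * observations with any missing value in this combination
              count if mi(`x')
              noi di "`x'" _col(20) r(N)
          }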
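
The original question also asked about setting a target number or percentage of non-missing observations. A minimal sketch of that variation follows, assuming -tuples- is already installed from SSC. It counts complete cases with !missing() instead of mi(), and lists only the combinations that meet a hypothetical 40% goal; the macro names target_pct and n_ok are chosen here for illustration.

```stata
* Sketch: list variable combinations that keep at least a target share of
* observations. Requires the community-contributed -tuples- from SSC.
clear
input float(v1 v2 v3 v4 v5)
. . . . .
1 . . . .
1 1 . . .
1 1 1 . .
1 1 1 1 .
end

* Hypothetical goal: keep at least 40% of the observations
local target_pct = 40
local n_ok = 0

tuples v*, varlist
forvalues i = 1/`ntuples' {
    * turn "v1 v2" into "v1,v2" so it can be passed to missing()
    local x = subinstr("`tuple`i''", " ", ",", .)
    * observations with no missing value on any variable in this combination
    quietly count if !missing(`x')
    * integer comparison avoids floating-point issues with percentages
    if 100 * r(N) >= `target_pct' * _N {
        local ++n_ok
        display "`x'" _col(20) r(N) _col(28) %5.1f (100 * r(N) / _N) "%"
    }
}
display "`n_ok' of `ntuples' combinations meet the goal"
```

With this toy data, only combinations drawn from v1, v2 and v3 keep at least two of the five observations, so those are the ones listed.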


          • #6
            tuples looks interesting, but the tuples command is unrecognized after installing dataex.


            • #7
              Hillel:
              Øyvind's helpful example works for me in Stata/SE 17.0:
              Code:
              . input float(v1 v2 v3 v4 v5)
              
                          v1         v2         v3         v4         v5
                1. . . . . .
                2. 1 . . . .
                3. 1 1 . . .
                4. 1 1 1 . .
                5. 1 1 1 1 .
                6. end
              
              . tuples v*, varlist
              
              . qui forv i = 1/`ntuples' {
              . local x = subinstr("`tuple`i''"," ",",",.)
              . count if mi(`x')
              . noi di "`x'" _col(20) r(N)
              . }
              v5                 5
              v4                 4
              v3                 3
              v2                 2
              v1                 1
              v4,v5              5
              v3,v5              5
              v3,v4              4
              v2,v5              5
              v2,v4              4
              v2,v3              3
              v1,v5              5
              v1,v4              4
              v1,v3              3
              v1,v2              2
              v3,v4,v5           5
              v2,v4,v5           5
              v2,v3,v5           5
              v2,v3,v4           4
              v1,v4,v5           5
              v1,v3,v5           5
              v1,v3,v4           4
              v1,v2,v5           5
              v1,v2,v4           4
              v1,v2,v3           3
              v2,v3,v4,v5        5
              v1,v3,v4,v5        5
              v1,v2,v4,v5        5
              v1,v2,v3,v5        5
              v1,v2,v3,v4        4
              v1,v2,v3,v4,v5     5
              
              .
              Kind regards,
              Carlo
              (StataNow 18.5)


              • #8
                Thank you. I am using Stata 17.0. How exactly do you install it, please?


                • #9
                  tuples must be installed from SSC,
                  Code:
                  ssc install tuples
                  sorry for the confusion
                  Last edited by Øyvind Snilsberg; 20 Jan 2022, 07:00. Reason: crossed with #8


                  • #10
                    If you want to use tuples, you must install it separately.

                    Code:
                    ssc install tuples
                    See also several answers at https://www.statalist.org/forums/for...ns-by-variable


                    • #11
                      I also forgot to mention that -tuples- is a community-contributed module that can be installed from SSC (it can be found by typing -search tuples-).
                      Kind regards,
                      Carlo
                      (StataNow 18.5)


                      • #12
                        No problem! Thank you, both.
