
  • Complete Case Analysis

    Hi,

    I have one DV and 20 IVs.
    I conducted a series of univariable regressions and multivariable regressions, using my full data set.

    The issue is that some individuals have data available on certain variables, but are missing data on other variables.

    I have been told I need to now conduct a Complete Case Analysis, where analysis is run only on individuals who have data available for all variables.

    Is there a particular command for this? I am completely lost as to how to do it.






  • #2
    You may run a regression with all the variables and then either add -if e(sample)- to subsequent commands, or run -gen complete = 1 if e(sample)- and restrict later analyses to observations where complete equals 1.
    Best regards,

    Marcos



    • #3
      Alan:
      as an aside to Marcos' helpful comment, please note that if missing data in numeric variables are coded as dots (.), Stata applies listwise deletion by default: observations with missing data on any of the variables (dependent and/or independent) are excluded from your regression.
      However, the main issue is why some units have missing data and, even more important when choosing an approach to deal with them, the mechanism and the pattern underlying their missingness.
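Before choosing an approach, it can help to look at how much is missing and in which combinations. A minimal sketch, assuming Stata's shipped auto dataset as a stand-in for the thread's variables (there, rep78 is missing for 5 of the 74 cars); substitute your own A to G:

```stata
* Stand-in data: auto dataset, where rep78 is partly missing (assumption)
sysuse auto, clear

* How many values are missing in each variable
misstable summarize price mpg rep78 weight

* Which variables tend to be missing together
misstable patterns price mpg rep78 weight

* How many observations have no missing value in any listed variable
count if !missing(price, mpg, rep78, weight)
display "complete cases: " r(N) " of " _N
```

-misstable patterns- only describes the pattern; it cannot tell you the mechanism (why values are missing), which still has to be argued from how the data were collected.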
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Alan Jeddi:

        Thanks, Marcos, for that advice.
        I basically have 10 variables: a dependent variable called A and 9 independent variables, B, C, D, E, F, G.

        The tricky part is that B has only been measured on 10% of people; C on 15% of people; D on 85% of people.

        I want to run a linear regression of A on B, adjusting for C, D, E, F, G, but I only want to include individuals on whom all of B, C, D, E, F, G have been measured.

        I have tried doing the following:
        keep A if B= .
        keep A if C = .

        But that doesn't really seem to work.
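The -keep- attempt above fails for a few reasons: -keep- accepts either a varlist or an -if- condition but not both, equality tests use == rather than =, and a condition like B == . would select the observations where B is missing, the opposite of what is wanted. A minimal sketch of the intended filter, again assuming the auto dataset as a stand-in, with rep78 playing the role of a partly measured predictor:

```stata
* Stand-in data: auto dataset (assumption; use A B C D E F G in practice)
sysuse auto, clear

* keep takes either a varlist or an if condition, not both.
* !missing(a, b, ...) is true only when none of the arguments is missing;
* missing() also catches extended missing values (.a to .z), which == . does not
keep if !missing(price, mpg, rep78, weight)

* 69 of the original 74 observations remain
count
```

The equivalent -drop if missing(A, B, C, D, E, F, G)- works the same way. Either command discards observations from memory, so run it on a copy or wrap it in -preserve-/-restore- if you still need the full data.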



        • #5
          Listwise deletion (aka complete case analysis) is what regress and most estimation commands do by default. You should not need to do any keep commands beforehand.

          It does concern me that one variable is costing you 90% of your cases. Are you sure you need it?
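To see that no -keep- is needed, one can compare the number of complete cases with the sample size -regress- reports in e(N). A minimal sketch, assuming the auto dataset as a stand-in for A to G:

```stata
* Stand-in data: auto dataset (assumption)
sysuse auto, clear

* Count observations with every model variable present
count if !missing(price, mpg, rep78, weight)
local ncomplete = r(N)

* regress drops any observation with a missing value in the model (listwise deletion)
regress price mpg rep78 weight

* e(N) is the estimation sample size; it equals the complete-case count
display "obs used: " e(N) "   complete cases: " `ncomplete'
assert e(N) == `ncomplete'
```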
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Alan:
            you can use -egen- with the -rowmiss()- function to select complete cases:
            Code:
            * replace the fake names within brackets with the real names of your first and last variables, keeping the hyphen in between (the range must cover the dependent variable and all predictors)
            egen flag = rowmiss(Alfa-Omega)
            reg A <indepvars> if flag==0
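A runnable version of the -rowmiss()- idea, assuming the auto dataset as a stand-in and an explicit varlist instead of a first-last range (a hyphenated range depends on the variable order in your dataset):

```stata
* Stand-in data: auto dataset (assumption)
sysuse auto, clear

* rowmiss() counts, for each observation, how many listed variables are missing;
* listing the dependent variable too makes flag==0 mean "complete case"
egen flag = rowmiss(price mpg rep78 weight)
tabulate flag

* Restricting on flag==0 keeps the complete-case sample even when rep78 is left out
regress price mpg weight if flag == 0
```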
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
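              If you want to limit all of your analyses to only the cases that have nonmissing data on every variable, you can do something like

              Code:
              * Fit the full model; regress keeps only complete cases
              reg y x1-x19
              * Flag the observations used (1 = in the estimation sample)
              gen mysample = e(sample)
              * Restrict later models to that same sample
              reg y x1 x2 x3 if mysample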
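The point of saving e(sample) shows up when a smaller model is fit without it: the smaller model then uses every row where its own variables are present, so the sample changes from model to model. A minimal sketch, assuming the auto dataset as a stand-in:

```stata
* Stand-in data: auto dataset (assumption)
sysuse auto, clear

* Full model: listwise deletion leaves only the complete cases
regress price mpg rep78 weight
gen byte mysample = e(sample)

* Without the restriction, the smaller model also uses the rows missing rep78
regress price mpg weight
display "unrestricted N = " e(N)

* With the restriction, it uses the same complete cases as the full model
regress price mpg weight if mysample
display "restricted N = " e(N)
```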
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology



              • #8
                A discussion of basic and advanced techniques for handling missing data can be found at

                https://www3.nd.edu/~rwilliam/stats3/MD01.pdf

                https://www3.nd.edu/~rwilliam/stats3/MD02.pdf
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
