Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Missing data Treatment in STATA - Zero or Blank

    Dear Statalist,

    I tried to find the thread of missing data treatment in STATA but I could not find it. If someone has had the same question about this before, please kindly help me to post the link and happy to delete this thread.
    I have questions regarding the missing data treatment in STATA for unbalanced panel data. I have unbalanced panel data for the period 2006 - 2017 with 5 variables. There are some years of some individual data that only have less than 5 variables, in that case:
    1. how do I treat the missing data whether I need to put zero or leave it blank?
    2. Whether put zero or leave it blank affect the number of observations?

    What is the suggestion to treat this missing data? Here I list some of my datasets. The blank spot is the missing data.
    Some years of ID has no data, do I need to take it off from my data or leave it blank or zero?

    Code:
     
    ID Year ROA RD Intensity Leverage SIZE
    ATM.AX 2017 0.05018 0.00593 0.38395 9.45276
    ATM.AX 2016 0.02428 0.00573 0.38600 9.48925
    ATM.AX 2015 0.00314 0.00360 0.39662 9.48102
    ATM.AX 2014 0.03123 0.00312 0.45238 9.33767
    ATM.AX 2013 0.05913 0.00711 0.41489 9.30480
    ATM.AX 2012 0.08257 0.00865 0.34890 9.29429
    ATM.AX 2011 0.18975 0.00940 0.29137 9.21509
    ATM.AX 2010 0.23416 0.01478 0.21568 9.12382
    ATM.AX 2009 0.13495 0.00739 0.17878 9.06997
    ATM.AX 2008 0.24983 0.01419 0.21297 9.12548
    ATM.AX 2007 0.60715 0.00504 0.27347 9.16574
    ATM.AX 2006 0.39480 0.00567 0.41285 9.01221
    CGM.AX 2017 - 0.35130 3.24652 6.26208
    CGM.AX 2016 - 0.52461 4.40949 6.01259
    CGM.AX 2015 - 2.31306 2.22907 6.19635
    CGM.AX 2014 - 0.43015 0.50321 6.69391
    CGM.AX 2013 - 0.19008 0.78025 6.72938
    CGM.AX 2012 - 0.02078 0.23537 7.23147
    CGM.AX 2011 0.43734 0.20769 7.13459
    CGM.AX 2010 - 0.15499 0.26446 0.22807 6.97070
    CGM.AX 2009 - 0.13175 0.32719 6.94133
    CGM.AX 2008 - 0.02141 0.63914 0.22383 6.94586
    CGM.AX 2007 - 0.14034 10.80597 0.08406 6.92174
    CGM.AX 2006 0.42961 0.37287 0.08025 6.85449
    CDT.AX 2017 - 1.34157 109.61333 0.94394 5.53619
    CDT.AX 2016 - 1.24766 95.94615 2.34886 5.38998
    CDT.AX 2015 - 0.78376 60.82037 0.90781 5.67502
    CDT.AX 2014 - 0.33934 40.87309 0.18513 5.94838
    CDT.AX 2013 - 0.21600 17.59932 0.07845 6.36504
    CDT.AX 2012 - 0.12898 11.43324 0.14042 6.62093
    CDT.AX 2011 - 0.01636 10.52734 0.11270 6.94857
    CDT.AX 2010 - 0.07897 15.32129 0.11668 6.89607
    CDT.AX 2009 - 0.74828 19.32075 0.11954 5.93952
    CDT.AX 2008 - 0.16633 11.89941 0.07615 6.17522
    CDT.AX 2007 - 0.16301 13.50769 0.05610 6.39094
    CDT.AX 2006 - 0.04072 10.54762 0.08028 6.53631
    ACS.AX 2017 - 0.07611 0.70817 6.80636
    ACS.AX 2016 - 0.07500 0.33179 0.49811 6.82299
    ACS.AX 2015 - 0.13106 31.48739 0.04227 6.68935
    ACS.AX 2014 - 0.03868 14.71475 0.01227 7.27406
    ACS.AX 2013 - 0.03229 0.01363 7.33274
    ACS.AX 2012 - 0.07118 0.01272 7.34673
    ACS.AX 2011 - 0.03968 0.00872 7.34515
    ACS.AX 2010 - 0.08017 0.06501 0.03051 7.23339
    ACS.AX 2009 - 0.04066 2.17783 0.01876 7.08472
    ACS.AX 2008 - 0.05734 0.24290 0.05078 7.14192
    ACS.AX 2007 - 0.10324 3.48837 0.02731 6.65360
    ACS.AX 2006 - 0.06944 0.50633 0.04601 6.67560
    MTB.AX 2017 - 0.46659 4.31521 5.91632
    MTB.AX 2016 - 1.48764 18.66129 5.25057
    MTB.AX 2015 - 33.69375 156.65259 4.27207
    MTB.AX 2014 - 33.54684 97.25403 4.36078
    MTB.AX 2013 - 1.49375 3.09969 5.71634
    MTB.AX 2012 - 0.05620 0.08268 7.17762
    MTB.AX 2011 - 0.06945 359.63636 0.04685 7.17107
    MTB.AX 2010 - 0.05665 0.03956 7.16164
    MTB.AX 2009 - 0.05762 0.01507 7.17598
    MTB.AX 2008 - 0.06749 0.01717 7.18021
    MTB.AX 2007 - 0.08237 0.00842 7.13152
    MTB.AX 2006 - 0.07369 0.01274 7.12516
    RBX.AX 2017 - 0.67787 0.57170 1.41164 6.05710
    RBX.AX 2016 - 0.53093 1.06794 6.07142
    RBX.AX 2015 - 0.27682 0.61718 6.26085
    RBX.AX 2014 0.59806 0.65958 6.55228
    RBX.AX 2013 0.03015 0.82058 6.61416
    RBX.AX 2012 - 0.07778 0.17023 6.98062
    RBX.AX 2011 - 0.06555 0.17739 6.84316
    RBX.AX 2010 - 0.07915 0.18446 6.86146
    RBX.AX 2009 - 0.18015 0.18693 6.82210
    RBX.AX 2008 - 0.13902 0.15665 6.88098
    RBX.AX 2007 - 0.14644 0.39943 6.24428
    RBX.AX 2006
    AKN.AX 2017 - 2.21257 9.37860 0.65464 5.71814
    AKN.AX 2016 - 0.42054 10.84888 0.04111 6.35441
    AKN.AX 2015 - 0.24696 8.49135 0.07540 6.67257
    AKN.AX 2014 - 0.05663 5.50973 0.06334 7.16439
    AKN.AX 2013 - 0.09448 1.93775 0.19924 7.30691
    AKN.AX 2012 - 0.11980 0.67456 0.06165 7.23711
    AKN.AX 2011 - 0.06847 0.36629 0.02397 7.28684
    AKN.AX 2010 - 0.15152 0.99484 0.05889 6.86693
    AKN.AX 2009 - 0.11908 0.08859 6.79574
    AKN.AX 2008 - 0.06196 0.04093 6.82752
    AKN.AX 2007 - 0.05565 0.00892 5.65072
    AKN.AX 2006 - 0.02909 0.01091 5.43933
    MRR.AX 2017 - 0.41977 12.77106 0.03207 6.06567
    MRR.AX 2016 - 0.86230 7.58261 0.09964 5.95268
    MRR.AX 2015 - 0.28658 7.83351 0.01687 6.19640
    MRR.AX 2014 - 0.18326 0.01228 6.36798
    MRR.AX 2013 - 0.10291 0.02852 0.01068 6.44262
    MRR.AX 2012 - 0.07261 2.82760 0.00668 6.48504
    MRR.AX 2011 0.03515 5.24366
    MRR.AX 2010
    MRR.AX 2009
    MRR.AX 2008
    MRR.AX 2007
    MRR.AX 2006
    TRM.AX 2017 - 0.00422 0.09956 6.91081
    TRM.AX 2016 - 0.00851 0.06882 0.08899 6.89308
    TRM.AX 2015 - 0.02591 1.28617 0.11003 6.88303
    TRM.AX 2014 - 0.03620 0.09821 6.87623
    TRM.AX 2013 - 0.02435 0.62122 0.03535 6.86844
    TRM.AX 2012 0.00418 0.06506 0.04045 6.87061
    TRM.AX 2011 0.04336 0.04025 6.82295
    TRM.AX 2010 0.05312 0.16829 0.05570 6.62582
    TRM.AX 2009 - 0.10229 0.59130 0.04418 6.52697
    TRM.AX 2008 - 0.06111 1.16660 0.03922 6.56055
    TRM.AX 2007 - 0.07484 0.00379 6.51358
    TRM.AX 2006 - 0.21166 0.00648 5.66558
    SHK.AX 2017 - 0.14767 3.02297 7.23899
    SHK.AX 2016 - 0.07280 2.12392 7.35471
    SHK.AX 2015 - 0.11958 1.94563 7.34713
    SHK.AX 2014 - 0.40082 1.68499 7.36349
    SHK.AX 2013 - 0.24143 0.15239 1.13386 7.32706
    SHK.AX 2012 - 0.14063 0.01212 0.55371 7.42744
    SHK.AX 2011 0.10056 0.00545 0.65543 7.49603
    SHK.AX 2010 - 0.14991 0.02313 0.25847 7.54994
    SHK.AX 2009 - 0.06999 14.89601 0.06047 7.25310
    SHK.AX 2008 - 0.03764 0.01818 7.27437
    SHK.AX 2007 - 0.01666 0.01383 7.22076
    SHK.AX 2006 0.04023 0.05745 7.06288
    MMG.AX 2017 0.13883 0.91809 10.27781
    MMG.AX 2016 0.06050 0.93234 10.32446
    MMG.AX 2015 0.02870 0.95453 10.30430
    MMG.AX 2014 0.05715 0.87500 10.21790
    MMG.AX 2013 0.16039 0.65408 9.72055
    MMG.AX 2012 0.16209 0.66398 9.64239
    MMG.AX 2011 0.26854 0.02872 0.58436 9.52860
    MMG.AX 2010 0.29721 0.02891 0.86241 9.53121
    MMG.AX 2009 0.12214 0.00612 0.63062 9.53984
    MMG.AX 2008 0.04663 0.31626 9.21098
    MMG.AX 2007 0.10341 0.27585 9.10573
    MMG.AX 2006 0.20649 0.47994 9.12186
    KGD.AX 2017 - 0.07527 0.00800 6.99974
    KGD.AX 2016 - 0.01410 0.05045 7.55888
    KGD.AX 2015 - 0.01947 0.01003 7.62825
    KGD.AX 2014 - 0.02176 0.00987 7.84639
    KGD.AX 2013 - 0.01517 0.02718 8.06246
    KGD.AX 2012 - 0.02362 52.75198 0.01679 8.05634
    KGD.AX 2011 - 0.00976 0.02900 8.14750
    KGD.AX 2010 - 0.03316 0.03259 8.08258
    KGD.AX 2009 0.02380 7.78115
    KGD.AX 2008 - 0.01001 0.02954 7.79492
    KGD.AX 2007
    KGD.AX 2006
    E3S.AX 2017 48.76815 4.38453
    E3S.AX 2016 5.54055 5.43772
    E3S.AX 2015 0.00324 2.35849 1.83557 5.88703
    E3S.AX 2014 0.06687 26.14904 0.92771 5.91031
    E3S.AX 2013 1.38957 40.73786 0.52644 5.99307
    E3S.AX 2012 0.18993 0.23310 0.03875 6.35597
    E3S.AX 2011 0.05319 6.56573
    E3S.AX 2010 0.24573 6.39863
    E3S.AX 2009 - 0.27949 0.02500 0.17656 6.43616
    E3S.AX 2008 - 0.08384 17.55000 0.09748 6.80489
    E3S.AX 2007 - 0.06442 6.37931 0.12511 6.75089
    E3S.AX 2006 - 0.03660 0.16215 6.95236
    Look forward to helping me with this from the STATAlist forum.

    Thanks,

    Annur

  • #2
    While it is possible to represent missing values with 0 (or, for that matter, any number you wish) in Stata, it is almost always a terrible idea to do so.

    You show a listing of some data, that comes from some source other than a Stata data set -- I know that because RD Intensity is not a legal variable name in Stata. There are various Stata commands, and for that matter, other file conversion programs, that can create a Stata data set from many types of non-Stata original sources. You will need to do that before you can do anything else with your data in Stata. All the ones that I'm familiar with will automatically fill in the "blanks" with Stata's internal missing value representation--in fact I don't even know of any that give you the option of doing anything else. If you have some really good reason for overwriting that with a numerical value, once the data are imported into Stata you can do that--but as I have already said, and cannot emphasize strongly enough, it is very rarely a good idea to do that. Leave the Stata missing values alone unless you have a compelling reason to do otherwise.

    When Stata then uses your data for analysis, it will appropriately handle missing values for you. In regression models, any observation (row of the data matrix) that contains a missing value for any variable mentioned in the regression is omitted from the regression calculations. The handling of missing values by Stata, if you leave them alone, is quite natural and produces sensible and useful results. (Well, some people think that in the context of logical expressions interpreting missing value as "true" is not so sensible and useful, but other than that...) By contrast, if you tamper with the data to replace missing values with zero, those values will be treated just like a real zero--which means you will be using false data and get false results. In short, unless you know that in the creation of the source data set its creators, for some peculiar reason, chose to leave out observations with value zero and just leave those blank, replacing them with zeroes will just contaminate your data set with errors and make everything you do with it wrong.

    Comment


    • #3
      Hi Clyde,

      Thanks for your comment.
      regarding:
      You show a listing of some data, that comes from some source other than a Stata data set
      Yes, I copy the data from my excel spreadsheet instead of the STATA dataset. I was confused whether I leave it blank or zero if put it zero whether it affects the number of observations or not.

      There are various Stata commands,...... All the ones that I'm familiar with will automatically fill in the "blanks" with Stata's internal missing value representation
      I imported from Excel to STATA and found the blank one put as a dot (.) I assume it is blank. That is the reason whether I need to change to zero or leave it as black before importing to STATA. Whether the changes affect the number of observations?

      Look forward to your reply

      Regards,

      Annur

      Comment


      • #4
        Annur:
        1) if the value of your observation is zero, Stata considers it as an observed value and the related observation is included in the -e(sample)- (if you do not impose clauses that rules it out, such as -if-. qualifier);
        2) if the values is reported as missing, Stata will apply listwise deletion and rule out the observation from -e(sample)-.

        That said, your approach does not seem to consider the missing mechanisms underlying unobserved values (missing completely at random; missing at random; missing not at random) that are informative to decide how to deal with missing values (see -mi- entries in Stata .pdf manual).
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Hi Carlo,

          Thank you for your comment. It is a crystal clear explanation.

          Cheers

          Annur

          Comment


          • #6
            I imported from Excel to STATA and found the blank one put as a dot (.) I assume it is blank.
            The dot you refer to is the way Stata displays missing values of numeric variables when it shows you the data. You should leave those alone. For string variables, missing values are displayed as blank.

            Comment

            Working...
            X