  • #16
    Please show the code you used with the wages data. As it is I can't tell what y and x are.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 18.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #17
      You need to use the errorinv and constinv options with xtdpdml. Or, add time dummies to xtreg and do not use constinv with xtdpdml.

      Code:
      use https://www3.nd.edu/~rwilliam/statafiles/wages
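      * Option 1: no time dummies in xtreg; use both errorinv and constinv in xtdpdml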
      xtreg lwage union, fe
      xtdpdml lwage union,  ylag(0) errorinv constinv
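      * Option 2: add time dummies to xtreg and drop constinv from xtdpdml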
      xtreg lwage union i.t, fe
      xtdpdml lwage union,  ylag(0) errorinv
      So long as there is no missing data and panels are balanced, xtreg and xtdpdml will produce pretty much identical results. If some data are missing and/or panels are unbalanced, adding the fiml option to xtdpdml may still get you identical results.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 18.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #18
        xtdpdml was developed with the idea that it would be used for models with lagged y. But it may sometimes be preferable to xtreg because you can use fiml to deal with missing data, and you can include time-invariant variables in the model.
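        For example, a minimal sketch (the time-invariant variable names ed, fem, and blk are assumed to exist in the wages data used above; convergence is not guaranteed):
        Code:
        use https://www3.nd.edu/~rwilliam/statafiles/wages, clear
        * ed, fem, blk assumed time-invariant; inv() includes them, which xtreg, fe cannot do
        xtdpdml lwage union, inv(ed fem blk) ylag(0) errorinv fiml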
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 18.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #19
          Hello Richard,

          This was indeed very helpful!
          The results of xtreg can now be reproduced using xtdpdml. This is a big step forward.

          However, as you mentioned, missing values and unbalanced panels are still a problem. I tried to go ahead and reproduce the results with sem, using the code provided by the show option of xtdpdml.
          I am working on a strongly unbalanced panel dataset with many missing values. Using the method(mlmv) option in sem treats those missings as missing at random and still uses them for the analysis, causing considerable changes in the estimates. On the other hand, with the default method(ml) option, every id with at least one missing value in any time period or any variable is dropped entirely from the analysis.
          xtreg uses every non-missing observation, whereas sem requires the dataset to be reshaped and thereby causes problems with missings (see the reshape sketch below).
          At the end of this post you will find sample data created with dataex. Furthermore, I get the "initial values not feasible" error. Maybe you can quickly see what the problem is here; I can't.
          So one could ask: why use sem at all if we have xtdpdml? Because with sem we are able to customize the equations (my initial aim).
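          (For reference, a minimal sketch of the reshape step that sem needs, using the variable names from the sample data at the end of this post; after reshaping, a missing value anywhere affects the whole id rather than a single observation:)
          Code:
          keep id time y x1 x2 x3
          reshape wide y x1 x2 x3, i(id) j(time)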

          First, here is the complete thing using the wages dataset:
          Code:
          use https://www3.nd.edu/~rwilliam/statafiles/wages, clear
          *simple FE estimator:
          xtreg wks lwage union i.t, fe
          *rebuilding using xtdpdml
          xtdpdml wks lwage union,  ylag(0) errorinv fiml show
          Next, I set some random data points to missing:
          Code:
          replace wks = . in 12
          replace union = . in 12
          replace lwage = . in 12
          replace union = . in 13
          replace lwage = . in 13
          replace lwage = . in 14
          Then re-running the same code from above yields different results for all three estimators:
          Code:
          xtreg wks lwage union i.t, fe
          xtdpdml wks lwage union,  ylag(0) errorinv fiml show
          xtdpdml wks lwage union,  ylag(0) errorinv show
          Using xtdpdml, we could simply drop incomplete observations before estimating, but not if we go for sem, because of the reshaping (see the first example in the xtdpdml help file).
          In this case, unbalanced panels and missings are a big problem, at least if the panel is strongly unbalanced and missings are frequent.
          So my initial plan to estimate this with sem does not seem easy (inappropriate?) with a messy dataset :-(
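          (To illustrate what the default method(ml) effectively does after reshaping, here is a sketch that drops every id with any missing value, using the sample-data variable names; anymiss is just a helper variable:)
          Code:
          * flag ids with at least one missing value in any wave or variable
          bysort id: egen anymiss = max(missing(y, x1, x2, x3))
          drop if anymiss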


          My sample data (it actually looks even worse in practice):
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double id float(y x1 x2 x3 time)
           1 11.786157  9.707103  18.12841 10.074078 1
           1 11.798973  9.449182   18.1515 10.205018 2
           1  11.93375   9.82331  18.25295 10.984213 3
           1  12.01614  9.530828 18.360538 11.013732 4
           1 12.049327  9.487215  18.35673  10.99555 5
           1  12.11031  9.694295 18.388964  10.99722 6
           1 12.057282  9.838154 18.406801  11.02168 7
           1   12.0577  9.622662 18.392174 11.158607 8
           2 11.411223 11.071633 16.308449  9.640743 1
           2 11.311464 10.977267 16.254854  9.563765 2
           2 11.363195 11.061585  16.24614  9.515488 3
           2 11.432908  11.13877 16.240402   9.45038 4
           2 11.516933 11.228243  16.29362  9.383066 5
           2 11.549325 11.261554 16.302336   9.31944 6
           2  11.56419         .         .  9.254298 7
           2         .         .         .         . 8
           3 11.358405 11.069918 16.130608  9.655519 1
           3  11.29853  10.97543  16.22441  9.599789 2
           3 11.351833  11.05995 16.207598   9.58542 3
           3 11.424236 11.137315 16.200321  9.528213 4
           3 11.509698  11.22703  16.25329   9.45392 5
           3 11.541402 11.260614 16.261984  9.385996 6
           3 11.555724         .         .  9.314897 7
           3         .         .         .         . 8
           4 12.235353 11.443337 17.604631 10.545232 1
           4 12.184057 11.406787 17.545788 10.461857 2
           4 12.076385 11.216512 17.471231 10.406956 3
           4 12.064768  11.24987  17.45307 10.006947 4
           4  12.01538 11.249166 17.410418  9.807766 5
           4         .         .         .         . 6
           4         .         .         .         . 7
           4         .         .         .         . 8
           5 11.750606 10.529304 17.828917 10.153934 1
           5 11.880262 10.718668 17.928793 10.168695 2
           5   11.8876 10.758957 17.960762 10.187243 3
           5 11.887003  10.70569 17.961767 10.187914 4
           5 11.974668 10.848852  18.02596 10.174105 5
           5 12.007523  10.78065 18.008638  10.22694 6
           5 11.969925 10.856668 18.117105  10.18356 7
           5         .         .         .         . 8
           6  12.73509         . 18.155552 10.389715 1
           6 12.665876 11.991199  18.12661 10.066955 2
           6  12.52977 11.872853 18.031593  9.954779 3
           6 12.404465 11.760746 17.886177  9.866408 4
           6 12.352062         . 17.977362  9.733624 5
           6 11.596765         . 17.355982  9.012803 6
           6 11.528192         .  17.21696  8.642078 7
           6         .         .         .         . 8
           7  9.152067  8.126499 15.127016  5.985201 1
           7  9.342045  8.494836 15.134694  5.913996 2
           7   9.27257         . 15.178274   6.10463 3
           7  9.264415         . 15.219153  6.165418 4
           7  9.325574         .         .  6.189817 5
           7  9.315297         .         .  6.082311 6
           7  9.330119    8.2441 15.238265  6.268924 7
           7         .         .         .         . 8
           8  8.395161  7.100257 14.849283  7.400521 1
           8  9.138773   8.69511 15.054854  7.327986 2
           8         .         .         .         . 3
           8         .         .         .         . 4
           8         .         .         .         . 5
           8         .         .         .         . 6
           8         .         .         .         . 7
           8         .         .         .         . 8
           9 14.843973  12.74147  20.40905 14.637725 1
           9 14.866098 12.671604  20.41227 14.333522 2
           9 14.814757  12.56319  20.50106 14.337147 3
           9 14.929997 12.444302  20.49115 14.679866 4
           9 14.977988  12.46708 20.541845 14.940574 5
           9  14.83565         .         . 15.076637 6
           9  14.84675         .         .   15.0244 7
           9 14.924575  12.12603  20.63342  15.18818 8
          10  10.57446  9.604995  16.10194  8.449834 1
          10  10.57012  9.595175 16.112501  9.313366 2
          10 10.426833  9.579705 16.114246 9.3232355 3
          10 10.598034         . 16.121668  9.300638 4
          10  10.73912         . 16.347372  9.284231 5
          10         .         .         .         . 6
          10         .         .         .         . 7
          10         .         .         .         . 8
          end



          • #20
            This is making my head spin. ;-)

            We know that xtdpdml works pretty well with strongly balanced panels with no missing data. We know that FIML can handle some missing data problems.

            I think you could check the number of groups each program reported, to get a feel for how many cases are surviving all these problems.
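            (A rough sketch of how to do that with the wages example, assuming xtdpdml passes through sem's e(N), which in the reshaped data is the number of panels:)
            Code:
            xtreg wks lwage union i.t, fe
            display e(N_g)    // number of groups xtreg actually used
            xtdpdml wks lwage union, ylag(0) errorinv fiml
            display e(N)      // number of panels used by xtdpdml/sem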

            I will look more closely later on. In the meantime, if you can just fill in all that missing data this will be a lot easier to handle. ;-)
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 18.5 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam



            • #21
              In your wages example, xtdpdml with fiml comes closest to reproducing the results from before some values were set to missing. That probably isn't guaranteed to happen. But my guess is that fiml would be better in this case and better than xtreg -- xtreg will totally discard the records with missing values, whereas xtdpdml will use whatever nonmissing information is present.

              We have a short handout comparing multiple imputation and fiml approaches using xtdpdml:

              https://www3.nd.edu/~rwilliam/dynamic/mi_xtdpdml.pdf

              In our examples, fiml came out best and was certainly easier than MI. But we left open the possibility that there might be situations where MI would work better.

              I'll try your data a little later.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 18.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam



              • #22
                I can't get xtdpdml to work with your data. But, as is, N = 10 and T = 8, and N = 10 is really much too small.

                I gave the command
                Code:
                keep if time < 3
                and xtdpdml and xtreg both ran and gave similar results. That gives me hope that xtdpdml will have a fighting chance in the full data set. Unless, of course, N really is only 10.
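                (Roughly, one possible rendering of that check, using the variable names from the posted dataex -- the exact options aren't shown above:)
                Code:
                xtset id time
                keep if time < 3
                xtreg y x1 x2 x3 i.time, fe
                xtdpdml y x1 x2 x3, ylag(0) errorinv fiml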
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 18.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam



                • #23
                  Of course N is much larger, but I did not want to provide too much data in this thread.
                  I am also still experimenting. I have an idea which I am trying to implement: if we could artificially replace every missing data point with a predicted value exactly on the regression line, then the results should not change with or without missings. This would then allow us to use sem after reshaping.
                  I will come back to that later. It is a little cumbersome with fixed effects.



                  • #24
                    I'd rather use multiple imputation than regression estimates. Regression estimates screw up the standard errors. See pp. 4-5 of

                    https://www3.nd.edu/~rwilliam/xsoc73994/MD01.pdf

                    I'm not clear on why you think FIML is inadequate. But if it is, I would consider MI as discussed in https://www3.nd.edu/~rwilliam/dynamic/mi_xtdpdml.pdf
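                    (A generic MI sketch for the wages example, assuming wks, lwage, and union have had some values set to missing as in the earlier posts; the imputation models and the number of imputations are arbitrary, and the handout covers the xtdpdml-specific workflow:)
                    Code:
                    mi set wide
                    mi xtset id t
                    mi register imputed wks lwage union
                    * impute with chained equations, then combine xtreg results across imputations
                    mi impute chained (regress) wks lwage (logit) union = i.t, add(10) rseed(1234)
                    mi estimate: xtreg wks lwage union i.t, fe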
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 18.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://www3.nd.edu/~rwilliam



                    • #25
                      Well, you are right about the standard errors from linear prediction. I totally overlooked this. So this will not solve my problem :-(

                      I have never used MI; do you think this will help? I just wanted to impute values so that I could use sem instead of xtdpdml to build a better xtivreg.

                      Also, if I set some more values to missing, xtreg and xtdpdml do not coincide. I do not see why the latter should be better. If it (randomly) imputes values, then we surely have more observations, but they are generated artificially.
                      Try this and you will see considerable differences between xtreg and xtdpdml:

                      Code:
                      use https://www3.nd.edu/~rwilliam/statafiles/wages, clear
                      keep wks lwage union id t
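                      * set lwage to missing in periods 4 and 6 for ids below 300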
                      replace lwage=. if inlist(t,4,6) & id<300
                      xtset id t
                      xtreg wks lwage union i.t, fe
                      xtdpdml wks lwage union,  ylag(0) errorinv show fiml
                      Last edited by Tim Grünebaum; 14 Aug 2019, 08:31.



                      • #26
                        The differences between xtreg and xtdpdml are not that large, especially when you consider that standard errors are pretty large in your example.

                        fiml isn't randomly imputing values. That would be more like MI. FIML uses maximum likelihood to come up with the best estimates given the available data. Records with some missing data get totally dropped by xtreg; FIML uses the information from the nonmissing values that are present in those records.

                        It may be that fiml is inappropriate given the data. But your argument is sort of like saying that MI or FIML are always wrong because something is being "made up". There is strong evidence that MI and FIML often work very well and are superior to a listwise deletion approach. In the example I gave in the earlier handout, fiml worked extremely well, almost perfectly replicating the estimates that were produced before some values were changed to missing.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 18.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam



                        • #27
                          Put another way, it isn't like MI or FIML are just making wild guesses about what to do with missing data. They are educated guesses, and, at least under the right conditions, those educated guesses can work very well.
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 18.5 MP (2 processor)

                          EMAIL: [email protected]
                          WWW: https://www3.nd.edu/~rwilliam



                          • #28
                            OK, so xtdpdml with fiml is, as you say, in most cases superior to xtreg, fe. Hm, your package then gives us completely new opportunities for estimating a fixed-effects model.
                            I guess I should have a closer look at what fiml really does before drawing a conclusion for a specific case. For example, Stata's impute command replaces missing values with predictions based on correlations with the other (nonmissing) variables in the dataset. I believe this is not always adequate, as it could generate artificial correlation across covariates, causing near multicollinearity and in some cases endogeneity problems. But maybe this is not what fiml or MI do. I will check.

                            However, if we assume xtdpdml with fiml to be adequate for missing data, then we could simply rebuild it with the corresponding sem command, adding method(mlmv) as an option. Doing this for my data still yields "initial values not feasible" :-( *Looking for problems*



                            • #29
                              I can't say it is always or even usually superior. With balanced panels, xtdpdml seems to work as well as or better than xtreg. Unbalanced panels can sometimes cause the program grief. And xtreg is usually much quicker; xtreg should also be able to handle bigger T better.

                              We weren't really planning for xtdpdml to be an xtreg substitute, but FIML and the ability to handle time-invariant variables may indeed sometimes make it an attractive alternative.

                              Without having your data, I can't say much about why you are having problems. I am guessing the unbalanced panels are contributing to the problem. If you are free to email your data and code to me, I can take a look.
                              -------------------------------------------
                              Richard Williams, Notre Dame Dept of Sociology
                              StataNow Version: 18.5 MP (2 processor)

                              EMAIL: [email protected]
                              WWW: https://www3.nd.edu/~rwilliam



                              • #30
                                Everything's fine now; there was just something wrong with my xtset command. I am now able to reproduce a simple FE model using xtdpdml for my data, an unbalanced panel with N~40,000 and T~8.
                                Surely this was not the idea of the package, but I think it is a pretty nice side effect, for instance since it includes built-in missing-data handling.

                                I will now try to simulate an xtivreg with customized IV equations using the corresponding sem command. This would be an even nicer side effect :-)
                                Indeed, if the panel is highly unbalanced with many missings, the fiml option lengthens the estimation a lot. Without fiml we can simply reproduce xtreg; the sem version might suffer from the missing data. I will have a look now.

                                By the way: can we also estimate system GMM with your command? So far I have been using xtabond2, and maybe those results can also be reproduced with maximum likelihood.
                                Last edited by Tim Grünebaum; 15 Aug 2019, 02:59.

