Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop for replacing missing value by previous or next value in panel data

    Good morning,
    I'm trying to replace missing value with loop, but it doesn't work well. It just replaces the first missing value and let the others unchanged. Here is my code

    Code:
    foreach v of varlist GDP-Export {
          {
          gen `v'_new = `v'
                  if `v'_new[_n-1]!=.{
                   by id (time), sort: replace `v'_new= `v'_new[_n-1] if missing(`v'_new)
                                          }
                                    else{
                   by id (time), sort: replace `v'_new= `v'_new[_n+1] if missing(`v'_new)
                                         }
            }
    }
    Thank you in advance for your help

  • #2
    You are confusing the -if- command with the -if- qualifier. Stata has both and they do different things. In particular, the -if- command that you have used is not carried out observation by observation. It is carried out once and only once, at the time it is encountered. The reference to `v'_new[_n-1] there is understood to be carried out with _n = 1, and, of course there is no such thing as `v'_new[0], so the condition is never met, and everything it guards is then skipped. The -else- part is applied (but, again, only once) and then we go to the next iteration of the -foreach v of varlist GDP-Export- loop.

    So the -if- command is intended for conditioning execution on some global state of the data/system, it does not select which observations the commands are applied to. Selecting observations is accomplished using the -if- qualifier added to the command itself.

    Actually, what you want to do is even simpler using the -cond()- function. What I think you want is:
    Code:
    gen `v'_new = `v'
    by id (time), sort: replace `v'_new = cond(!missing(`v'[_n-1], `v'[_n-1], `v'[_n+1]) if missing(`v')

    Comment


    • #3
      Jojo:
      as an aside to Clyde's excellent advice, I was wondering whether you have sound methodological reasons to replace missing values with the previous or next observed ones (see, if interested, https://missingdata.lshtm.ac.uk/).
      Kind regards,
      Carlo
      (StataNow 18.5)

      Comment


      • #4
        Good morning,
        Thank you very much (to you both) for helping me.
        But, Clyde there is a problem with the code
        Code:
        gen `v'_new = `v' by id (time), sort: replace `v'_new = cond(!missing(`v'[_n-1], `v'[_n-1], `v'[_n+1]) if missing(`v')
        It returns error : "unknown function ()"

        Comment


        • #5
          References to time of day can look pretty odd in an international forum.

          More seriously, your code when reformatted

          Code:
          gen `v'_new = `v'
          by id (time), sort: replace `v'_new = cond(!missing(`v'[_n-1], `v'[_n-1], `v'[_n+1]) if missing(`v')
          looks fine to me.The problem may be trying to execute code with local macro references line by line. It must all be run in the same namespace.

          Comment


          • #6
            Thank you very much.
            I wish you an happy new year!!!

            Comment


            • #7
              Hi everyone,

              Thank you for posting this thread, I am trying to do the same thing and replace the observations of several variables with the first nonmissing observation. I used the above code for my database and I am running into the same issue that this only works for the first observation/ID (I understand the description about "if" command). Here is a sample of my database and the codes. What am I doing wrong?


              Thank you,


              foreach v of varlist rbbb twipr epsilon tad twiinf qrsfinf {
              gen `v'_new = `v'
              by newid (date), sort: replace `v'_new = cond(!missing(`v'[_n-1]), `v'[_n-1], `v'[_n+1]) if missing(`v')
              }



              . dataex newid date rbbb twipr epsilon tad twiinf qrsfinf

              ----------------------- copy starting from the next line -----------------------
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte newid int date byte(rbbb twipr epsilon tad twiinf qrsfinf)
              1 17195 . . . . . .
              1 17391 0 4 0 0 0 1
              1 17643 0 4 0 0 0 1
              1 18077 0 5 0 0 0 2
              1 18281 0 4 0 0 0 1
              1 18465 . . . . . .
              1 18742 0 4 0 0 0 2
              1 18854 . . . . . .
              1 18938 . . . . . .
              1 19026 . . . . . .
              1 19246 . . . . . .
              1 19260 . . . . . .
              1 19285 . . . . . .
              1 19296 0 4 0 0 0 3
              1 19297 . . . . . .
              1 19428 . . . . . .
              1 19456 . . . . . .
              1 19457 . . . . . .
              1 19486 0 4 0 0 0 3
              1 19501 . . . . . .
              1 19590 . . . . . .
              1 20177 0 5 0 0 0 3
              1 21108 0 4 0 0 0 3
              2 18018 . . . . . .
              2 18037 . . . . . .
              2 18038 0 1 0 0 0 0
              2 18455 0 1 0 0 1 0
              2 18770 0 4 0 0 2 0
              2 19303 0 4 0 0 2 0
              2 19470 0 4 0 0 2 0
              2 20079 0 3 0 0 2 0
              2 20478 0 4 0 0 2 0
              2 20541 0 4 0 0 2 0
              2 21073 0 3 0 0 1 0
              2 21227 0 4 0 0 2 0
              2 21738 0 3 0 0 2 0
              3 15985 . . . . . .
              3 17867 . . . . . .
              3 18044 1 6 0 1 0 0
              3 18045 . . . . . .
              3 18047 . . . . . .
              3 18051 . . . . . .
              3 18154 . . . . . .
              3 19170 . . . . . .
              3 19288 . . . . . .
              3 19330 8 8 8 8 8 8
              3 19451 8 8 8 8 8 8
              3 19928 . . . . . .
              4 18324 . . . . . .
              4 18358 . . . . . .
              4 18368 0 3 0 0 0 0
              4 18784 . . . . . .
              4 19421 0 4 0 0 0 0
              4 19960 0 3 0 0 0 0
              4 20576 0 4 0 1 0 3
              4 20583 . . . . . .
              4 20703 . . . . . .
              4 20898 0 4 0 1 0 3
              4 21112 . . . . . .
              4 21143 0 3 0 1 0 3
              4 21507 0 3 0 1 0 3
              4 21627 . . . . . .
              4 21636 . . . . . .
              4 21693 0 4 0 1 0 3
              4 21747 0 3 0 1 0 3
              5 17801 . . . . . .
              5 18007 0 6 0 0 3 0
              5 18350 0 6 0 0 3 0
              5 18508 . . . . . .
              5 18700 0 6 0 0 3 0
              5 18921 . . . . . .
              5 19015 . . . . . .
              5 19106 . . . . . .
              5 19227 . . . . . .
              5 19316 0 6 0 0 3 0
              5 19417 . . . . . .
              5 19498 0 6 0 0 3 0
              5 19499 . . . . . .
              5 19589 . . . . . .
              5 19666 . . . . . .
              5 19750 0 6 0 0 3 0
              5 19757 . . . . . .
              5 19933 . . . . . .
              5 20128 . . . . . .
              5 20139 . . . . . .
              5 20310 0 6 0 0 3 0
              5 20412 . . . . . .
              5 20506 0 6 0 0 3 0
              5 20524 . . . . . .
              5 20614 . . . . . .
              5 20699 . . . . . .
              5 20716 . . . . . .
              5 20717 . . . . . .
              5 20730 . . . . . .
              5 20779 . . . . . .
              5 20828 . . . . . .
              5 20865 0 6 0 0 3 0
              5 20888 . . . . . .
              5 20912 0 4 0 0 3 0
              5 21038 . . . . . .
              end
              format %tdnn/dd/CCYY date
              ------------------ copy up to and including the previous line ------------------

              Comment


              • #8
                To Clarify, I am trying to "carry backward", replacing the missing values of a variable with the first nonmissing

                Comment


                • #9
                  I think I found another way to do this, by sorting date from most recent to oldest and creating an obs variable. This seems to work for my dataset replacing the missing observations with the first nonmissing.

                  gsort newid -date
                  by newid: gen obs=_n
                  order obs newid date
                  foreach v of varlist rbbb twipr epsilon tad twiinf qrsfinf {
                  by newid (obs), sort: replace `v' = `v'[_n-1] if missing(`v')
                  }
                  sort newid date obs










                  Comment


                  • #10
                    Brookline:
                    are you aware that NOCB (Next Observation Carried Backward) approach can severely (and unpredictably) bias your subsequent statistical analyses (see, if interested, https://missingdata.lshtm.ac.uk/)?
                    Kind regards,
                    Carlo
                    (StataNow 18.5)

                    Comment


                    • #11
                      See also https://www.stata.com/support/faqs/d...issing-values/ as a complement to @Carlo Lazzaro's good advice.

                      Comment

                      Working...
                      X