Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing data with mean of previous and next years data

    Hello everybody,

    I have a dataset where some data are missing for the year 2011.
    Now I want to replace this missing data (with missing data I mean the "NA" once) with the mean of the 2010 and 2012 values.

    For example
    2010 --------- 10
    2011 --------- "NA"
    2012 --------- 14

    (so 2011 should be 12)

    I tried this in every variation, with and without space, destring... but I get the error code 109 (mostly).
    The code I tried e.g. is

    replace revenue = (revenue[_n-1]+revenue[_n+1])/2 if revenue=="NA"&year==2011

    even this does not work and the restriction that this should only happen if 2010 and 2012 have no "NA" in their data
    (should be something like this but does not work revenue[_n-1]>="."&year==2010 & revenue[_n+1]>="."&year==2012

    I am very grateful for any advice!




  • #2
    You cannot do calculations using strings. You need to create a numerical variable using destring.

    Code:
    destring revenue, force gen(Revenue)
    xtset firm year
    replace Revenue= (L.Revenue + F.Revenue)/2 if year==2011 & missing(Revenue)
    Using time series operators (L. and F.) ensures that the previous and next years are 2010 and 2012, respectively.

    Comment


    • #3
      Lena:
      you can also consider-ipolate- .
      Andrew's helpful advice of converting -string-variables into their numeric counterparts before running any calculation in Stata still holds.
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment


      • #4
        Gentlemen, I have a similar problem, but I can't solve it with the xtset command because I have a time variable (datadate) repeated in the dataset. I would like to fill the missing values contained in double variables with the content of the next observation (not missing). How can I do this?

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long(gvkey datadate) double(ebit intc irei isgr isgu pll secu) long cik
         1619 14609        . . . . .       . 248.849 3133
         1619 14609 1169.369 . . . . 165.626       . 3133
         1619 14975        . . . . .       .  107.55 3133
         1619 14975  900.537 . . . .   227.6       . 3133
         1619 15340        . . . . .       .  20.179 3133
         1619 15340 1002.107 . . . .   187.1       . 3133
         1619 15705        . . . . .       . -66.842 3133
         1619 15705 1083.354 . . . .  213.55       . 3133
         1619 16070        . . . . .       .  36.326 3133
         1619 16070 1103.763 . . . .   173.7       . 3133
         1619 16436        . . . . .       .  56.834 3133
         1619 16436 1152.139 . . . .  127.75       . 3133
         1619 16801        . . . . .       . 129.571 3133
         1619 16801 1180.843 . . . .   93.95       . 3133
        11842 14609        . . . . .       .  38.782 7789
        11842 14609  315.111 . . . .  19.243       . 7789
        11842 14975        . . . . .       . -15.581 7789
        11842 14975  314.313 . . . .  20.206       . 7789
        11842 15340        . . . . .       . -51.513 7789
        11842 15340  332.662 . . . .   28.21       . 7789
        end
        format %d datadate

        Kind regards,
        Lucas

        Comment


        • #5
          Lucas
          welcome to this forum.
          Have you taken a look at -ipolate-?
          Kind regards,
          Carlo
          (Stata 19.0)

          Comment


          • #6
            Gentlemen
            Please don't ever do this. You're trying to be polite, but "Gentlemen" in English implies males only, which is not appropriate here.

            That said, I can't see that you're reading the problem with your data correctly. You don't need replacement of missing values or interpolation at all.


            Code:
            collapse ebit intc irei isgr isgu pll secu cik, by(gvkey datadate) 
            
            l, sepby(gvkey)
            
                 +-------------------------------------------------------------------------------------+
                 | gvkey    datadate       ebit   intc   irei   isgr   isgu       pll      secu    cik |
                 |-------------------------------------------------------------------------------------|
              1. |  1619   31dec1999   1169.369      .      .      .      .   165.626   248.849   3133 |
              2. |  1619   31dec2000    900.537      .      .      .      .     227.6    107.55   3133 |
              3. |  1619   31dec2001   1002.107      .      .      .      .     187.1    20.179   3133 |
              4. |  1619   31dec2002   1083.354      .      .      .      .    213.55   -66.842   3133 |
              5. |  1619   31dec2003   1103.763      .      .      .      .     173.7    36.326   3133 |
              6. |  1619   31dec2004   1152.139      .      .      .      .    127.75    56.834   3133 |
              7. |  1619   31dec2005   1180.843      .      .      .      .     93.95   129.571   3133 |
                 |-------------------------------------------------------------------------------------|
              8. | 11842   31dec1999    315.111      .      .      .      .    19.243    38.782   7789 |
              9. | 11842   31dec2000    314.313      .      .      .      .    20.206   -15.581   7789 |
             10. | 11842   31dec2001    332.662      .      .      .      .     28.21   -51.513   7789 |
                 +-------------------------------------------------------------------------------------+
            Last edited by Nick Cox; 25 Feb 2020, 02:38.

            Comment


            • #7
              Dear Nick, my mistake. Sorry. Thanks for your prompt reply. Worked perfectly!

              Comment

              Working...
              X