Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • accounting for missing values when generating 3-year average variables

    Hello,

    I am combining 2 panel datasets to estimate a cross-lagged panel model. The variable ori is the unit identifier and year is the time identifier. One dataset has variables from six waves of data collection that occurred during the years 2000, 2003, 2007, 2013, 2016, and 2020 and the other has variables for every year between 2000-2020. In the dataex example below, the numftsworn variable comes from the 6-wave dataset and the crime variables (murder, assault, and robbery) come from the yearly dataset. With the yearly crime dataset (2000-2020), I am trying to create 3-year averages for each wave of the first dataset. For instance, the new 2000 value for the crime variables will equal (the 2000 value + the 2001 value + the 2002 values)/3. The new 2003 value will equal (the 2002 value + the 2003 value + the 2004 value)/3 and so on. I pasted the code I wrote below to create these variables. I am hoping someone can help me with accounting for missing data. If all 3 years are missing I am hoping the value on `var'_r will be missing. If 2 years are missing I am hoping it will use the 1 nonmissing year and divide by 1. If 1 year is missing I am hoping it will use the 2 nonmissing values and divide by 2. Thank you very much for any help you can provide.


    Code:
    foreach var in `srsvars' {
        bysort ori (year) : gen `var'_r = (`var'[_n] + `var'[_n+1] + `var'[_n+2])/3 if year==2000
        bysort ori (year) : replace `var'_r = (`var'[_n] + `var'[_n+1] + `var'[_n-1])/3 if year==2003
        bysort ori (year) : replace `var'_r = (`var'[_n] + `var'[_n+1] + `var'[_n-1])/3 if year==2007
        bysort ori (year) : replace `var'_r = (`var'[_n] + `var'[_n+1] + `var'[_n-1])/3 if year==2013
        bysort ori (year) : replace `var'_r = (`var'[_n] + `var'[_n+1] + `var'[_n-1])/3 if year==2016
        bysort ori (year) : replace `var'_r = (`var'[_n] + `var'[_n-1] + `var'[_n-2])/3 if year==2020
    }

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str11 ori int(year actual_murder) long(actual_assault_total actual_robbery_total numftsworn)
    "AL0011200" 2000  3  208  70 127
    "AL0011200" 2001  .    .   .   .
    "AL0011200" 2002  1  182  82   .
    "AL0011200" 2003  0  192 113 136
    "AL0011200" 2004  3   84  97   .
    "AL0011200" 2005  .    .   .   .
    "AL0011200" 2006  2  822  81   .
    "AL0011200" 2007  0  880  71 156
    "AL0011200" 2008  2  878  62   .
    "AL0011200" 2009  2  848  44   .
    "AL0011200" 2010  1  876  47   .
    "AL0011200" 2011  0  779  35   .
    "AL0011200" 2012  2  858  46   .
    "AL0011200" 2013  2  740  34 158
    "AL0011200" 2014  1  706  50   .
    "AL0011200" 2015  3  786  51   .
    "AL0011200" 2016  3  819  40 160
    "AL0011200" 2017  2  726  33   .
    "AL0011200" 2018  2  406  18   .
    "AL0011200" 2019  4  796  27   .
    "AL0011200" 2020  6  699  19 173
    "AL0020000" 2000  7  773  45 148
    "AL0020000" 2001  4  668  50   .
    "AL0020000" 2002  5  647  48   .
    "AL0020000" 2003  4  651  45 146
    "AL0020000" 2004  2  672  30   .
    "AL0020000" 2005  6  574  42   .
    "AL0020000" 2006  5  579  63   .
    "AL0020000" 2007  7  644  80 160
    "AL0020000" 2008  9  621  84   .
    "AL0020000" 2009  5  445  27   .
    "AL0020000" 2010  5 1954  30   .
    "AL0020000" 2011  6 2187  43   .
    "AL0020000" 2012  6 2095  30   .
    "AL0020000" 2013  5 1875  40 162
    "AL0020000" 2014  3 1627  15   .
    "AL0020000" 2015  4 1557  21   .
    "AL0020000" 2016  9 1736  26 153
    "AL0020000" 2017  5 1552  21   .
    "AL0020000" 2018  1 1395  23   .
    "AL0020000" 2019  3  836  16   .
    "AL0020000" 2020  7  646   4 179
    "AL0020100" 2000 20 5969 896 552
    "AL0020100" 2001 42 7194 840   .
    "AL0020100" 2002 40 9301 752   .
    "AL0020100" 2003 24 6469 739 529
    "AL0020100" 2004 27 8966 673   .
    "AL0020100" 2005 35  459 597   .
    "AL0020100" 2006 34 9303 685   .
    "AL0020100" 2007 38 8924 750 526
    "AL0020100" 2008 42 8513 875   .
    "AL0020100" 2009 24 8319 857   .
    "AL0020100" 2010 25 4877 653   .
    "AL0020100" 2011 30  904 637   .
    "AL0020100" 2012 32  777 460   .
    "AL0020100" 2013 29 7511 459 541
    "AL0020100" 2014 31 7809 428   .
    "AL0020100" 2015 24 7521 402   .
    "AL0020100" 2016 44 7586 438 493
    "AL0020100" 2017 50 8490 440   .
    "AL0020100" 2018 26 8166 378   .
    "AL0020100" 2019  .    .   .   .
    "AL0020100" 2020  9 2018  49 491
    "AL0030000" 2000  2  542  16 122
    "AL0030000" 2001  2  495  18   .
    "AL0030000" 2002  1  528  21   .
    "AL0030000" 2003  3  531  18 121
    "AL0030000" 2004  1  490   6   .
    "AL0030000" 2005  .    .   .   .
    "AL0030000" 2006  .    .   .   .
    "AL0030000" 2007  0  487  16 128
    "AL0030000" 2008  0  476   9   .
    "AL0030000" 2009  0  545   9   .
    "AL0030000" 2010  0  488   5   .
    "AL0030000" 2011  3  462  10   .
    "AL0030000" 2012  0  378   6   .
    "AL0030000" 2013  0   69   3 125
    "AL0030000" 2014  0   50   7   .
    "AL0030000" 2015  0   75   6   .
    "AL0030000" 2016  1  348   8 121
    "AL0030000" 2017  0  327  14   .
    "AL0030000" 2018  0  336  10   .
    "AL0030000" 2019  2  193   8   .
    "AL0030000" 2020  1  187   3 119
    "AL0380100" 2000  7 1784 103 159
    "AL0380100" 2001  3  929  83   .
    "AL0380100" 2002  1 1234  90   .
    "AL0380100" 2003  4 1518 110 155
    "AL0380100" 2004  9 2430 124   .
    "AL0380100" 2005  8 2266 156   .
    "AL0380100" 2006  7 2196 189   .
    "AL0380100" 2007  8 2425 190 148
    "AL0380100" 2008  3 2133 189   .
    "AL0380100" 2009  1 2322 168   .
    "AL0380100" 2010  1 2274 118   .
    "AL0380100" 2011  2 2232 106   .
    "AL0380100" 2012  5 2409 100   .
    "AL0380100" 2013  4 1996  78 158
    "AL0380100" 2014  1 1414  45   .
    "AL0380100" 2015  1 1770  64   .
    end
    label values numftsworn FTSWORN

  • #2
    I think this does what you want:
    Code:
    gen lower = 2000 if year == 2000
    gen upper = 2002 if year == 2000
    foreach y of numlist 2003 2007 2013 2016 {
        replace lower = `y' - 1 if year == `y'
        replace upper = `y' + 1 if year == `y'
    }
    replace lower = 2018 if year == 2020
    replace upper = 2020 if year == 2020
    
    foreach v of varlist actual_murder - actual_robbery_total {
        rangestat (mean) `v'_r = `v', by(ori) interval(year lower upper)
        replace `v'_r = . if missing(lower, upper)
    }
    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

    Comment


    • #3
      I am happy to endorse @Clyde Schechter's advice.

      Backing up, the problem here is that missing values contaminate any sum and that the divisor of 3 needs to be corrected if any are present. It could be done directly if long-windedly.

      Here is one way. Each term in the numerator needs to be more like
      Code:
       
       cond(missing(`var'[_n]), 0, `var`[_n]) 
      and instead of 3 the denominator needs to have 3 terms like
      Code:
      !missing(`var'[_n])

      Comment


      • #4
        Thank you both very much for your time and assistance. I went with rangestat and it worked perfectly. It is great to learn these things!

        Comment


        • #5
          You may also want to check out Stata's inbuilt command
          Code:
          tssmooth ma
          for creating moving averages. The command accounts for missing observations in the way you want. You might do something like this:

          Code:
          encode ori, gen(ori_num) // this is because xtset can't work with a string variable
          xtset ori_num year
          
          foreach var of varlist actual_murder - actual_robbery_total {
              tssmooth ma `var'_base = `var', window(1 1 1)
              tssmooth ma `var'_lead = `var', window(0 1 2)
              tssmooth ma `var'_lag = `var', window(2 1 0)
              gen `var'_r = `var'_base if inlist(year, 2003, 2007, 2013, 2016)
              replace `var'_r = `var'_lead if year == 2000
              replace `var'_r = `var'_lag if year == 2020
              drop `var'_base `var'_lead `var'_lag
          }
          
          drop ori_num

          Comment

          Working...
          X