
  • How to Produce Linear Regressions for Time Series Data

    Hello, I am a fairly novice Stata user with some knowledge of linear regression and basic Stata commands. I am analyzing time series rainfall data for Ethiopian villages (each village is identified by ea_id2). The trouble is that there are roughly 600 villages, each with 35 years of data, for a total of 19,000 observations.

    I used a simple -tabstat- command with the by() option to get the rainfall for each village averaged over the 35 years.

    ea_id2 | mean
    --------------+--------------
    10101088 | 69.40667
    10102088 | 72.355
    10103010 | 71.81194
    10103088 | 72.39486
    10105088 | 78.20653
    10106088 | 97.18694
    10107010 | 74.47611
    10201088 | 63.62139
    10202020 | 48.58444
    10202088 | 52.20139
    10203088 | 51.30889
    10204088 | 56.76083
    10206088 | 75.1075
    10207010 | 67.77722
    10208088 | 55.24208
    10209088 | 53.89843
    10212010 | 56.98778
    10301088 | 41.93361
    10303010 | 46.71806
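
    For reference, a table like the one above can be produced with a command along these lines. This is only a sketch, and it assumes the rainfall variable is named chirps_yr_mean, the name used for it later in this thread.

    ```stata
    * Mean annual rainfall per village (one row per value of ea_id2)
    tabstat chirps_yr_mean, by(ea_id2) statistics(mean)
    ```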


    In addition to producing the mean, I want to produce simple linear regressions for each village as a function of time. Do I create a new variable to use in tabstat? Or do I need to use matrices or moving averages? I have no experience with either, so any assistance would be greatly appreciated.

  • #2
    In addition to producing the mean, I want to produce simple linear regressions for each village as a function of time.
    What does this mean? Other than that you want to do one regression of some kind for each village, I can't tell what you want here. What is the outcome variable of the regression? What are the predictor variables? Which results are you interested in retaining and what do you want to do with them? What does this have to do with -tabstat-?

    When responding to these questions, please include an example of your data. The data should include perhaps three or four villages, and probably 10 years of data for each of those would be sufficient. Be sure to use -dataex- to do this.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
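
    As an illustration, a request like the one above could be met with something along these lines. It is only a sketch: the variable names ea_id2, year, and chirps_yr_mean are the ones used elsewhere in this thread, and village_no is a hypothetical helper variable.

    ```stata
    * Only needed if -dataex- is not already installed (see above)
    ssc install dataex

    * Number the villages 1, 2, 3, ... so that a few can be picked easily
    egen village_no = group(ea_id2)

    * Show the years up to 1990 for the first 3 villages
    sort ea_id2 year
    dataex ea_id2 year chirps_yr_mean if village_no <= 3 & year <= 1990
    ```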



    Last edited by Clyde Schechter; 05 Nov 2018, 19:02.



    • #3
      Hi Clyde,

      Thank you for the response, and my apologies for the confusion! As I noted, I am pretty new to Stata and certainly to the forum.

      Here's an example of the data from dataex:
      ea_id2 year chirps_yr_mean
      10101088801600 1981 93.59
      10101088801600 1982 62.17
      10101088801600 1983 67.03
      10101088801600 1984 59.13
      10101088801600 1985 78.28
      10101088801600 1986 78.89
      10101088801600 1987 65.68
      10101088801600 1988 74.94
      10101088801600 1989 57.04
      10101088801600 1990 49.9
      10101088801600 1991 61.64
      10101088801600 1992 59.54
      10101088801600 1993 72.99
      10101088801600 1994 68.34
      10101088801600 1995 69.14
      10101088801600 1996 68.66
      10101088801600 1997 73.66
      10101088801600 1998 73.85
      10101088801600 1999 72.54
      10101088801600 2000 73.64
      10101088801600 2001 65.16
      10101088801600 2002 61.44
      10101088801600 2003 70.85
      10101088801600 2004 64.07
      10101088801600 2005 70.41
      10101088801600 2006 72.5
      10101088801600 2007 84.04
      10101088801600 2008 82.38
      10101088801600 2009 63.87
      10101088801600 2010 74.94
      10101088801600 2011 68.89
      10101088801600 2012 79.61
      10101088801600 2013 61.19
      10101088801600 2014 68.48
      10101088801600 2015 63.94
      10101088801600 2016 66.22

      There are about 600 other values of ea_id2 (10102088801400, 10103010100100, and so on).

      Originally posted by Clyde Schechter
      What does this mean? Other than that you want to do one regression of some kind for each village, I can't tell what you want here. What is the outcome variable of the regression? What are the predictor variables? Which results are you interested in retaining and what do you want to do with them? What does this have to do with -tabstat-?
      My goal is to obtain a linear regression of mean annual rainfall against time for each village. In other words, for each village I want to see how average rainfall changes over the 35 years.
      Last edited by Tyler Clark; 06 Nov 2018, 18:38.



      • #4
        Well, your -dataex- output was truncated at both the beginning and the end. Next time please pay careful attention to the instructions that -dataex- displays that explain where to begin and end your copy/paste operation. Also if you want to do something repetitively by village, when showing a data example it would be sensible to show data from at least 2 different villages.

        And you don't say which regression results you are interested in or what you want to do with them. It is an easy one-liner to just run village-specific regressions, but that will generate an enormous amount of output, too much to process by eye and brain, while leaving no usable electronic compendium of results. So I'll just assume you want the regression coefficient of year and its standard error, along with the number of observations and the R-squared from each regression, and that you want them stored in the original data set next to the observations themselves.

        Code:
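        capture program drop one_village
        program define one_village
            // runs once per village, on that village's observations only
            regress chirps_yr_mean year
            gen b = _b[year]        // slope: change in rainfall per year
            gen se = _se[year]      // standard error of the slope
            gen n_obs = e(N)        // number of observations used
            gen r2 = e(r2)          // R-squared
            exit
        end

        // apply the program to each village and keep the results in the data
        runby one_village, by(ea_id2)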
        You need to install the -runby- program to do this. It is written by Robert Picard and me, and is available from SSC.
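
        For anyone following along, installation is a single command, run before the code above. The second command in this sketch is the "easy one-liner" mentioned earlier, shown only for contrast: it prints a full regression table for every village and stores nothing in the data.

        ```stata
        * Install -runby- from SSC (one time only)
        ssc install runby

        * One regression per village, output to the Results window only
        bysort ea_id2: regress chirps_yr_mean year
        ```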




        • #5
          Another way to do it is with rangestat (SSC):

          Code:
          rangestat (reg) chirps_yr_mean year, int(ea_id2  0 0)
          The only limit on this I can see is if your identifier is string, in which case it's two lines not one.


          Code:
          egen n_ea_id2 = group(village) 
          rangestat (reg) chirps_yr_mean year, int(n_ea_id2  0 0)



          • #6
            village above should be ea_id2
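
            With that correction in place, the string-identifier version would read as in the sketch below. The result variables named in the last comment (b_year, se_year, reg_nobs) are the ones used later in this thread; describe is a quick way to confirm what was actually created.

            ```stata
            * Map the string identifier to integers 1, 2, 3, ...
            egen n_ea_id2 = group(ea_id2)

            * The interval (n_ea_id2 0 0) restricts each regression to observations
            * with the same identifier, i.e. one regression per village
            rangestat (reg) chirps_yr_mean year, int(n_ea_id2 0 0)

            * Inspect what was added (b_year, se_year, reg_nobs, ...)
            describe b_* se_* reg_*
            ```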



            • #7
              Thank you for the assistance; rangestat is exactly what I needed.

              Final question: is there a way to use rangestat to t-test the regression coefficient?



              • #8
                You cannot do that within -rangestat-. But it is simple enough to do post hoc:

                Code:
                gen t_year = b_year/se_year
                gen p_year = 2*t(reg_nobs - 2, -abs(t_year))
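
                Because the results are repeated on every observation of a village, it can help to look at just one observation per village. A minimal sketch, where first_obs is a hypothetical variable name:

                ```stata
                * Flag exactly one observation per village
                egen byte first_obs = tag(ea_id2)

                * Distribution of slopes, t statistics and p-values across villages
                summarize b_year t_year p_year if first_obs
                ```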



                • #9
                  The spirit of rangestat is loosely that we produce basic statistics and not those easily calculated from those basic statistics. The letter of rangestat is that it is often what its authors care about most....

                  Under a quite different heading: I would sit very, very loose to P-values for linear trends in time in rainfall. Serial dependence of errors could mess up all such P-values all too easily, especially as a linear trend is at best a crude first approximation to any overall pattern. Also, it is far from obvious that even annual rainfall is Gaussian conditionally on a mean function.

                  The sample series in #3 yields a regression of 69.44 + 0.0244 (year - 2000) with R-sq of 0.1%, so that's a non-starter anyway. (Units are perhaps inches rather than mm?)

                  (For why one should shift origin, as above, see Section 2 of https://www.stata-journal.com/sjpdf....iclenum=st0394)
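
                  A sketch of the shifted-origin fit for all villages. It assumes it is run in place of (not after) the earlier rangestat call, since result variables such as reg_nobs would otherwise already exist. The name year2000 is made up for illustration.

                  ```stata
                  * Measure time from 2000 so that the intercept is fitted rainfall in 2000
                  gen year2000 = year - 2000

                  * One regression per village; the slope is unchanged by the shift
                  rangestat (reg) chirps_yr_mean year2000, int(ea_id2 0 0)
                  ```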
                  Last edited by Nick Cox; 09 Nov 2018, 13:25.



                  • #10
                    I would not use P-values here, but rather the intercept (estimated rainfall in 2000) and the slope (for convenience perhaps multiplied by 10 or 100, as linear change over a decade or century).
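
                    With the variables left behind by the rangestat call in #5, that could look like the sketch below. It assumes the intercept is stored as b_cons; the names rain_2000 and slope_decade are invented here.

                    ```stata
                    * Fitted rainfall in 2000 from the unshifted regression on year
                    gen double rain_2000 = b_cons + 2000 * b_year

                    * Linear change per decade
                    gen double slope_decade = 10 * b_year
                    ```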

