Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate average within-unit correlation of variable across time

    I have a panel dataset at the firm-year level. I would first like to calculate, within each firm, the correlation of variable x across years. I would then like to calculate the average of that correlation across all firms. I am unsure how best to accomplish this. I have considered calculating a correlation matrix for each firm, then taking the average of each matrix. Something like the below code (taken from this discussion):
    Code:
    levelsof firm, local(levls)
    foreach i of local levels {
        corr demand2012 demand2013 demand2014 demand2015 demand2016 demand2017 demand2018 if firm == `i'
    mata { 
        C = st_matrix("r(C)")
        mean(abs(select(vech(C), vech(C) :< 1)))
    }
    Running this displays the average of the correlation matrix below the diagnol for each firm in the results window. However, I am unfamiliar with mata and thus am not sure how to store the results as variable values.

    I am wondering a) if it would be possible to add to the above code to store the results as variable values or b) if there is a more efficient/accurate way of calculating the within-firm correlation of variable x across years.

    I have gone through documentation and the usual search routes and haven't yet found a solution. I am using Stata 15.

  • #2
    When you say "store the results as variables", how would you want the variables to be arranged? Perhaps instead of using Mata, you could use Stata's -svmat- or -svmat2- or even -svmatsv-. (The last two of these are in collections published by Nick Cox on SSC.) If you use -dataex- to give us a data fragment on which your code runs, we could perhaps be more helpful.

    Comment


    • #3
      Thanks to Mead Over for the mention. However, svmat2 and svmatsv have never been posted on SSC but rather at

      Code:
      net describe dm79, from(http://www.stata.com/stb/stb56)
      STB-56 means Stata Technical Bulletion 56, which appeared in July 2000. The issue can be seen at https://www.stata.com/products/stb/journals/stb56.pdf

      A different solution to saving correlations as data can be found in corrci from the Stata Journal

      . search corrci, sj

      Search of official help files, FAQs, Examples, and Stata Journals

      SJ-20-4 pr0041_3 . . . . . . . . . . . . . . . . . Software update for corrci
      (help corrci, corrcii if installed) . . . . . . . . . . . . N. J. Cox
      Q4/20 SJ 20(4):1028--1030
      corrects code for a bias correction used if (and only if) the
      fisher option is specified

      SJ-17-3 pr0041_2 . . . . . . . . . . . . . . . . . Software update for corrci
      (help corrci, corrcii if installed) . . . . . . . . . . . . N. J. Cox
      Q3/17 SJ 17(3):779
      new options added

      SJ-10-4 pr0041_1 . . . . . . . . . . . . . . . . . Software update for corrci
      (help corrci, corrcii if installed) . . . . . . . . . . . . N. J. Cox
      Q4/10 SJ 10(4):691
      update to fix corrci so that it always saves r-class results

      SJ-8-3 pr0041 . Speaking Stata: Corr. with confidence, Fisher's z revisited
      (help corrci, corrcii if installed) . . . . . . . . . . . . N. J. Cox
      Q3/08 SJ 8(3):413--439
      reviews Fisher's z transformation and its inverse, the
      hyperbolic tangent, and reviews their use in inference
      with correlations


      I am not, however, especially optimistic about the value of averaged correlations.

      Comment


      • #4
        Thank you Mead Over and Nick Cox for your responses. Here is a fragment of my dataset:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str76 employer_name double(demand2012 demand2013 demand2014 demand2015 demand2016 demand2017 demand2018)
        `""EXCELLENTMPUTINGDISTRIBUTORS""'                      0 1 0  0  0  0  0
        `""IHAVEADREAM"FOUNDATION"'                             1 0 1  0  0  0  0
        `""K"LINEAMERICA"'                                      5 0 1  4  0  0  0
        `""K"LINELOGISITCS(USA)"'                               1 0 0  0  0  0  0
        "'K'LINELOGISTICS"                                      2 0 0  1  1  0  0
        "'TKLOKHUIS"                                            0 0 0  0  0  0  1
        "(AMBI)DEXTROASSET"                                     0 1 0  0  1  0  0
        "(BRCONSULTANTS)"                                       0 0 0  1  0  0  0
        "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)"                  0 1 1  1  0  0  0
        "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)"                  1 0 0  0  0  0  0
        "(INC)"                                                 0 0 0  0  3  0  0
        "(USA)"                                                 3 4 1  5  1  6  4
        "+421FOUNDATION"                                        0 0 0  0  1  0  0
        "+VANTAGE"                                              0 0 0  1  1  0  0
        "/IISOFTWARE"                                           0 0 0  0  0  0  1
        "/RECRUITERS"                                           2 0 0  0  0  0  0
        "004"                                                   0 0 1  0  0  0  0
        "012"                                                   1 1 0  2  0  1  1
        "015"                                                   1 1 0  0  0  0  0
        "01INTERACTIVE"                                         0 0 0  0  0  0  8
        "03091776GEORGIA"                                       0 1 0  0  2  0  0
        "05"                                                    0 1 0  0  0  0  1
        "0956588BC"                                             0 0 0  0  0  4  2
        "0965688BC"                                             0 0 0  1  2  0  1
        "0CHA"                                                  0 0 0  0  0  0  4
        "0XDATA"                                                1 5 4  1  0  0  0
        "1"                                                     0 0 0  2  0  0  0
        "1/0MORTGAGE"                                           0 0 0  1  0  0  0
        "10"                                                    0 0 1  0  0  0  0
        I am also not especially optimistic about the value of averaged correlations. Unfortunately, it is the best solution I can think of for deriving a single correlation statistic over time. Using corrci, I wrote the following:
        Code:
        // average within-firm correlation of demand across years
        *create numeric firm identifier
        
        *calculate correlations across years for each firm and save results to tempfiles
        gen emp_id = _n
        #d ;
        local m = 1 ;
        forval y = 1/185085 { ;
            tempfile temp`m' ;
            corrci demand2012 demand2013 demand2014 demand2015 demand2016 demand2017 demand2018
            if emp_id == `y', 
            saving(`temp`n'', replace) ;
            local m = `m' + 1 ;
        } ;
        #d cr
        
        *collapse each tempfile to get each firm's average demand across all years
        forval y = 1/`n' {
            use `temp`y'', clear
            gen emp_id = `y'
            collapse (mean) r, by(emp_id)
            save temp1`y'
        }
        
        *append the averages into one dataset
        use `temp11', clear
        forval y = 2/`n' {
            append using `temp1`y''
        }
        
        *average across all firms
        sum r
        Unfortunately, the first loop returns the error code "insufficient observations" for all firms. This is most likely because n = 7 for each firm and also because for many firms it is the case that x = 0 for almost all years. I run into the same issue if I instead try to use corr.

        At this point I think it is clear that trying to calculate averaged correlations will not be viable or useful. It would still be helpful to learn if there are alternative methods to deriving a single correlation statistic for a variable across time. Otherwise, I have no further questions.

        Comment


        • #5
          So using your example data, i reshaped it like this:

          Code:
          reshape long demand, i(emp) j(yr)
          sort emp yr
          yielding:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str36 employer_name int yr byte demand
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2012 0
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2013 1
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2014 0
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2015 0
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2016 0
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2017 0
          `""EXCELLENTMPUTINGDISTRIBUTORS""'     2018 0
          `""IHAVEADREAM"FOUNDATION"'            2012 1
          `""IHAVEADREAM"FOUNDATION"'            2013 0
          `""IHAVEADREAM"FOUNDATION"'            2014 1
          `""IHAVEADREAM"FOUNDATION"'            2015 0
          `""IHAVEADREAM"FOUNDATION"'            2016 0
          `""IHAVEADREAM"FOUNDATION"'            2017 0
          `""IHAVEADREAM"FOUNDATION"'            2018 0
          `""K"LINEAMERICA"'                     2012 5
          `""K"LINEAMERICA"'                     2013 0
          `""K"LINEAMERICA"'                     2014 1
          `""K"LINEAMERICA"'                     2015 4
          `""K"LINEAMERICA"'                     2016 0
          `""K"LINEAMERICA"'                     2017 0
          `""K"LINEAMERICA"'                     2018 0
          `""K"LINELOGISITCS(USA)"'              2012 1
          `""K"LINELOGISITCS(USA)"'              2013 0
          `""K"LINELOGISITCS(USA)"'              2014 0
          `""K"LINELOGISITCS(USA)"'              2015 0
          `""K"LINELOGISITCS(USA)"'              2016 0
          `""K"LINELOGISITCS(USA)"'              2017 0
          `""K"LINELOGISITCS(USA)"'              2018 0
          "'K'LINELOGISTICS"                     2012 2
          "'K'LINELOGISTICS"                     2013 0
          "'K'LINELOGISTICS"                     2014 0
          "'K'LINELOGISTICS"                     2015 1
          "'K'LINELOGISTICS"                     2016 1
          "'K'LINELOGISTICS"                     2017 0
          "'K'LINELOGISTICS"                     2018 0
          "'TKLOKHUIS"                           2012 0
          "'TKLOKHUIS"                           2013 0
          "'TKLOKHUIS"                           2014 0
          "'TKLOKHUIS"                           2015 0
          "'TKLOKHUIS"                           2016 0
          "'TKLOKHUIS"                           2017 0
          "'TKLOKHUIS"                           2018 1
          "(AMBI)DEXTROASSET"                    2012 0
          "(AMBI)DEXTROASSET"                    2013 1
          "(AMBI)DEXTROASSET"                    2014 0
          "(AMBI)DEXTROASSET"                    2015 0
          "(AMBI)DEXTROASSET"                    2016 1
          "(AMBI)DEXTROASSET"                    2017 0
          "(AMBI)DEXTROASSET"                    2018 0
          "(BRCONSULTANTS)"                      2012 0
          "(BRCONSULTANTS)"                      2013 0
          "(BRCONSULTANTS)"                      2014 0
          "(BRCONSULTANTS)"                      2015 1
          "(BRCONSULTANTS)"                      2016 0
          "(BRCONSULTANTS)"                      2017 0
          "(BRCONSULTANTS)"                      2018 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2012 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2013 1
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2014 1
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2015 1
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2016 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2017 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEDJAURY)" 2018 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2012 1
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2013 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2014 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2015 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2016 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2017 0
          "(HOUSEHOLDOFSANDRASIRUGOJORGEEJAURY)" 2018 0
          "(INC)"                                2012 0
          "(INC)"                                2013 0
          "(INC)"                                2014 0
          "(INC)"                                2015 0
          "(INC)"                                2016 3
          "(INC)"                                2017 0
          "(INC)"                                2018 0
          "(USA)"                                2012 3
          "(USA)"                                2013 4
          "(USA)"                                2014 1
          "(USA)"                                2015 5
          "(USA)"                                2016 1
          "(USA)"                                2017 6
          "(USA)"                                2018 4
          "+421FOUNDATION"                       2012 0
          "+421FOUNDATION"                       2013 0
          "+421FOUNDATION"                       2014 0
          "+421FOUNDATION"                       2015 0
          "+421FOUNDATION"                       2016 1
          "+421FOUNDATION"                       2017 0
          "+421FOUNDATION"                       2018 0
          "+VANTAGE"                             2012 0
          "+VANTAGE"                             2013 0
          "+VANTAGE"                             2014 0
          "+VANTAGE"                             2015 1
          "+VANTAGE"                             2016 1
          "+VANTAGE"                             2017 0
          "+VANTAGE"                             2018 0
          "/IISOFTWARE"                          2012 0
          "/IISOFTWARE"                          2013 0
          end
          Then I encoded the employer name variable with

          Code:
          encode employer_name, gen(emp)
          
          . tab demand
          
               demand |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |        146       71.92       71.92
                    1 |         37       18.23       90.15
                    2 |          7        3.45       93.60
                    3 |          2        0.99       94.58
                    4 |          6        2.96       97.54
                    5 |          3        1.48       99.01
                    6 |          1        0.49       99.51
                    8 |          1        0.49      100.00
          ------------+-----------------------------------
                Total |        203      100.00
          Now I am in the position to wonder what you mean by a correlation coefficient for each employer. Perhaps you mean "On a average across all employers, what is the correlation between the current year's demand and the previous year's demand?" One approach to answer thi question is to estimate the regression of current year's demand as a function of the previous year's demand for the same employer. If you impose the same constant term across all employers, you could estimate this relationship by:

          Code:
          . reg demand L.demand, noemptycells noomitted
          
                Source |       SS           df       MS      Number of obs   =       174
          -------------+----------------------------------   F(1, 172)       =     12.56
                 Model |  17.5182482         1  17.5182482   Prob > F        =    0.0005
              Residual |  239.838074       172  1.39440741   R-squared       =    0.0681
          -------------+----------------------------------   Adj R-squared   =    0.0627
                 Total |  257.356322       173   1.4876088   Root MSE        =    1.1809
          
          ------------------------------------------------------------------------------
                demand |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                demand |
                   L1. |   .2919708   .0823738     3.54   0.001     .1293772    .4545644
                       |
                 _cons |   .3827502   .0985403     3.88   0.000     .1882462    .5772543
          ------------------------------------------------------------------------------
          This result suggests there is positive autocorrelation over time for the average employer. But if you allow each employer to have its own constant term, the autocorrelation disappears in this data extract containing only 29 employers:

          Code:
          . areg demand L.demand, noemptycells noomitted a(emp)
          
          Linear regression, absorbing indicators         Number of obs     =        174
          Absorbed variable: emp                          No. of categories =         29
                                                          F(   1,    144)   =       0.75
                                                          Prob > F          =     0.3874
                                                          R-squared         =     0.3197
                                                          Adj R-squared     =     0.1827
                                                          Root MSE          =     1.1027
          
          ------------------------------------------------------------------------------
                demand |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                demand |
                   L1. |   -.085676   .0988234    -0.87   0.387     -.281008    .1096559
                       |
                 _cons |   .5715737   .0971046     5.89   0.000     .3796391    .7635082
          ------------------------------------------------------------------------------
          F test of absorbed indicators: F(28, 144) = 1.902             Prob > F = 0.008
          (Perhaps on your complete sample, the autocorrelation would remain statistically significant even after absorbing a constant term for each employer.)

          Or perhaps you mean "How is the demand for one employer correlated with the demand for other employers in the same year? There are a variety of ways to address that question. One would need to start by considering whether the firms are perfectly competitive with one another or only imperfectly competitive. (Or you might be able to group the firms into "markets" within which the firms products are substitutable with one another. Remembering that the demand for good i is positively correlated with the demand for a complement and negatively correlated with the demand for a substitute might help to structure the groups.) Whether or not one groups the firms into "product categories", one might want to model each individual employer as a Stackleberg leader or, alternatively as a Stackleberg follower. These assumptions would guide the construction of the model and thence the estimation approach.

          One more note, which perhaps should have been my first and only comment: As you point out in #4 and is illustrated by the above tabulation on your data extract, your demand variable is very sparse. Therefore it contains very little information. One wonders if demand could not be measured with more granularity. Or perhaps the proper way to think about your demand variable is that it is the observed manifestation of an unobservable (or "latent") demand which must be at least equal to 1.0 before it is measured. Once you have sorted out the question you want to ask and the theoreticlal framework for your model, you might think about a hurdle model, like -heckman- or -twopm-, but you would a few exogenous variables to identify such a model. Type "search hurdle" or "search "heckman" or "search twopm" or "search cmp" from inside Stata to see many different estimators.
          Last edited by Mead Over; 15 Jul 2021, 15:38.

          Comment


          • #6
            Thank you Mead Over for all of this. I will consult with my collaborators and consider your suggestions.

            Comment

            Working...
            X