  • Can I generate a cross-moment matrix and then use it in repeated regressions?

    Is it possible to compute the cross-moment matrix of a set of variables and then tell the -regress- command to use that matrix instead of recalculating it? This has two benefits. The first is performance; the second is that it ensures that any regression using a SUBSET of the variables in the cross-moment matrix is run over *precisely* the same sample. For example, if I build this data set:
    Code:
    cls
    freduse UNRATE GDPC1, clear
    rename UNRATE unemp
    rename GDPC1 gdp
    gen t = qofd(daten)
    collapse (mean) unemp gdp, by(t) fast
    tsset t, q
    gen lgdp = log(gdp)
    and run two regressions:
    Code:
    regress gdp L4.gdp L4.unemp
    regress gdp L4.gdp L3.unemp
    these two regressions repeat some calculations internally and also use slightly different samples (because L4.unemp has one fewer non-missing observation than L3.unemp).

    Note that the example in the documentation for -matrix accum- ( [P] matrix accum ) isn't helpful here because it's just the basic calculation of
    Code:
    syminv(XX)*Xy
    not the more complex (and more useful) calculation that first calculates the cross-moment matrix and then uses it to estimate a regression involving a SUBSET of the variables in it instead of the entire matrix.

    For example, in RATS, I can compute a cross-moment matrix using the CMOMENT command, and then tell LINREG to use that matrix when performing its calculation.
    Code:
    cmoment(noprint)
    # gdp{0 to 4} unemp{1 to 4} constant
    
    linreg(cmom, print) gdp
    # gdp{4} unemp{4} constant
    
    linreg(cmom, print) gdp
    # gdp{4} unemp{3} constant
    This guarantees that the regressions are run on *exactly* the same sample, and the cross-moment matrix isn't calculated multiple times.

    Is this at all possible in Stata? This is a useful feature for applied time series work, especially in econometrics, where model selection methods often require running similar regressions repeatedly while guaranteeing that changes in the results are driven only by changes in the variables included, NOT by changes in the sample.
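
    As a rough sketch of the subset calculation described above (not an option of -regress-, and coefficients only, without standard errors): -matrix accum- can build the cross-moment matrix in one pass, and Mata can then solve the normal equations for whichever rows and columns a given model needs. The variable names L4gdp, L4unemp, and L3unemp are made up for this illustration, and the sketch assumes that -matrix accum- omits any observation with a missing value in the listed variables, so every model drawn from the matrix shares one sample.

    ```stata
    * Store the lags as ordinary variables (hypothetical names)
    generate L4gdp   = L4.gdp
    generate L4unemp = L4.unemp
    generate L3unemp = L3.unemp

    * One pass over the data: M = X'X for all listed variables.
    * _cons is appended as the last row and column.
    * Assumption: observations with any missing value are omitted.
    matrix accum M = gdp L4gdp L4unemp L3unemp
    display "observations used: " r(N)

    mata:
    // column order of M: 1 gdp, 2 L4gdp, 3 L4unemp, 4 L3unemp, 5 _cons
    M = st_matrix("M")

    // model 1: gdp on L4gdp, L4unemp, _cons
    x1 = (2, 3, 5)
    b1 = invsym(M[x1, x1]) * M[x1, 1]

    // model 2: gdp on L4gdp, L3unemp, _cons
    x2 = (2, 4, 5)
    b2 = invsym(M[x2, x2]) * M[x2, 1]

    b1'
    b2'
    end
    ```

    Both coefficient vectors come from the same matrix M, so they rest on the same observations, and the data are read only once.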
    Last edited by Michael Anbar; 03 Dec 2015, 15:01.

  • #2
    Also posted on Stack Overflow.



    • #3
      Michael also mentioned this in the Wish List for Stata 15 thread. But, if I understand the Q, the wish has already been fulfilled. See pages 8-10 of

      http://www3.nd.edu/~rwilliam/stats2/OLS-Stata9.pdf

      for a discussion of how to analyze means, correlations, and standard deviations using the corr2data command.

      As Clyde pointed out in the Wish List thread, you can also use sem with ssd. Michael responded

      -sem- doesn't support factor-variable notation (according to the linear regression example in [sem] intro 6, Structural models 1). I can bypass this by using -xi-, but as the documentation states, factor variables are the recommended method (unless, of course, the command doesn't support them). Since -gsem- supports them, though, maybe that's where I should look.
      But if you are analyzing covariances you can't use factor variables anyway. Students who don't read my notes carefully are always asking questions like "how come gender has codes like -3.219, 2.72, etc.?" There are an infinite number of ways to create data that reproduce the correlation matrix; corr2data will reproduce the correlations but it won't reproduce the original data set.

      In the other thread Michael also raises concerns about massive data sets. Again, not an issue once you have created the covariances. If you have lots and lots of variables in the model, that may slow sem down, but the N of the data set is not an issue if you are using the summary statistics.

      Whether Michael can actually use summary statistics is another matter. If he is always changing his models, adding new variables, modifying his sample, or whatever, he may have to break down and use the original data all the time.

      One other trick with corr2data: let's say you have 100 million cases. You could just have corr2data create a data set with 100 cases and then, on your regress command, say fw=1000000. You don't need a lot of cases for corr2data to accurately reproduce a covariance matrix.
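
      A minimal sketch of that trick follows. The correlations, means, and standard deviations are hypothetical values chosen only for illustration, as are the variable names y, x1, and x2; in practice they would come from the real data.

      ```stata
      * Hypothetical summary statistics for y, x1, x2 (illustration only)
      matrix C  = (1, .5, .3 \ .5, 1, .2 \ .3, .2, 1)
      matrix mu = (10, 5, 2)
      matrix sd = (2, 1, 3)

      * Create 100 artificial cases that reproduce those statistics
      corr2data y x1 x2, n(100) corr(C) means(mu) sds(sd) clear

      * Each artificial case stands in for 1,000,000 real ones:
      * 100 x 1,000,000 = 100 million
      regress y x1 x2 [fw=1000000]
      ```

      The artificial values themselves carry no meaning; only the moments they reproduce do.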

      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Also, I suspect you would have to create lagged vars, e.g.

        Code:
        gen L4gdp = L4.gdp
        and then use them when creating the covariance matrix.
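
        One way this could look, as a sketch rather than a tested recipe: the lag variable names are hypothetical, and the sketch assumes the tsset data from the original post and that -matrix accum- omits observations with any missing value, so the covariance matrix rests on a single common sample.

        ```stata
        * Lags as ordinary variables (hypothetical names)
        generate L4gdp   = L4.gdp
        generate L4unemp = L4.unemp
        generate L3unemp = L3.unemp

        * Deviation cross-products and means from one pass over the data
        matrix accum A = gdp L4gdp L4unemp L3unemp, deviations noconstant means(mu)
        local N = r(N)

        * Convert cross-products to a covariance matrix
        matrix V = A/(`N' - 1)

        * Artificial data with the same N, means, and covariances
        corr2data gdp L4gdp L4unemp L3unemp, n(`N') cov(V) means(mu) clear

        * Both models now use the same N observations
        regress gdp L4gdp L4unemp
        regress gdp L4gdp L3unemp
        ```

        The data left in memory are artificial: they reproduce the means and covariances, not the original observations.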
