

  • Convert MATA matrix into STATA variables

    I am trying to run a linear probability model. In the model, the dependent variable takes the value 0 or 1 for each observation. My goal is to estimate the probability of falling within each unique value of the dependent variable, after adjusting for the independent variables. My code is as follows:

    forvalues i = 1/100 {
        use "E:\data\sample`i'.dta", clear
        egen group = group($y)              // y is the dependent variable
        tempname max
        sum group
        scalar `max' = r(max)
        // one indicator per level of group: y_a = 1 if group <= a
        forvalues a = 1(1)`=`max'' {
            gen y_`a' = 0
            replace y_`a' = 1 if group <= `a'
        }
        global group "y_*"
        mata: y = st_data(., "$group")
        mata: X = st_data(., "$xs")         // xs holds the independent variables
        mata: X = X, J(rows(X),1,1)         // append a column of ones for the constant
        mata: b = invsym(X'*X)*X'*y
        mata: yhat = X*b
        drop $group
        // (continued)
    Because my data sets are very large, I do the estimation in Mata and then convert the resulting matrix into Stata variables. The code below does that conversion, but it takes a very long time, so I would like to know a more efficient way to convert a Mata matrix into Stata variables. I have tried st_addvar() and st_store(), but could not get them to work.

        // (continued)
        tempname max
        su group
        scalar `max' = r(max)
        // copy each column of yhat into its own Stata variable p_a
        forvalues a = 1(1)`=`max'' {
            mata: p_`a' = yhat[., `a']
            getmata p_`a', force
        }
    }

    Thank you in advance!

  • #2
    Two things:
    • I don't understand what you ultimately want to achieve; knowing what your real problem is would allow a better answer. As it stands, your code seems very convoluted, with multiple loops over samples and so on. That seems odd, because -regress- will almost always be faster than the Mata approach, and because, as you said, -y- should be a single variable.
    • For your more immediate question, check help mf_st_store . The st_store() function in Mata might be a bit faster than your getmata approach, although probably not by much.
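
    A minimal sketch of the st_addvar() plus st_store() route, in case it helps. It uses the auto data shipped with Stata as a stand-in, so the variables price, mpg, weight, and length, and the names p_1, p_2, are assumptions for illustration, not the variables from post #1. The idea is to create all target variables at once and write the whole matrix in a single call instead of one getmata call per column.

    ```stata
    sysuse auto, clear

    mata:
    // Stand-ins for the matrices in post #1 (assumed variables from auto.dta)
    X    = st_data(., ("weight", "length")), J(st_nobs(), 1, 1)
    y    = st_data(., ("price", "mpg"))
    yhat = X * invsym(X'*X) * X'*y      // same textbook formula as in post #1

    // Build the names p_1 ... p_k, create the variables, then fill them in one call
    names = "p_" :+ strofreal(1..cols(yhat))
    idx   = st_addvar("double", names)
    st_store(., idx, yhat)
    end

    summarize p_1 p_2
    ```

    Note that st_store(., idx, yhat) needs yhat to have exactly one row per observation in memory, and st_addvar() fails if a variable with the same name already exists.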


    • #3
      A couple of comments:
      • When speeding up an algorithm you need to take into account the time spent writing the algorithm. You have already spent quite a lot of time writing what you have so far. Is the speed-up really going to make up for that? Almost always the answer is no.
      • You can cut a bit of overhead by using _regress instead of regress.
      • \( \mathbf{(X'X)^{-1}X'Y} \) is the textbook formula for computing the coefficients, but it is not what modern computer programs do, as it is not the most stable way of doing that computation.
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz


      • #4
        I will add to Maarten's comments:

        1. Cross-products such as X'X are computed more efficiently with cross() or quadcross(). Transposing is actually a costly operation.
        2. Numerically speaking, it is better to use a solver such as qrsolve() to compute the coefficient vector beta than to use the textbook formula. OLS is generally solved using the QR decomposition.
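
        A short sketch of both points, again on the auto data as a stand-in (the variable choice is an assumption for illustration only). It computes the coefficients once from cross-products and once with a QR-based solve, then reports how far apart the two results are.

        ```stata
        sysuse auto, clear

        mata:
        X = st_data(., ("weight", "length")), J(st_nobs(), 1, 1)
        y = st_data(., ("price", "mpg"))

        // Point 1: cross-products without explicitly forming the transpose
        XX      = quadcross(X, X)
        Xy      = quadcross(X, y)
        b_cross = invsym(XX) * Xy

        // Point 2: least-squares solution of X*b = y via the QR decomposition
        b_qr = qrsolve(X, y)

        // Largest relative difference between the two sets of coefficients
        mreldif(b_cross, b_qr)
        end
        ```

        On a well-conditioned problem like this one the two answers should agree closely; the QR route matters more when the regressors are nearly collinear.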


        • #5
          Thank you for your suggestions and comments!

          For my analysis, I have to use a loop at least once, as shown in my code. So folding the multiple loops into a single loop and running a separate regress within that loop may be more efficient than using Mata with multiple loops.
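
          A rough sketch of what that single-loop version could look like, assuming rep78 stands in for $y and weight and length stand in for $xs (both are placeholders taken from the auto data, not my real variables):

          ```stata
          sysuse auto, clear

          egen group = group(rep78)           // placeholder for group($y)
          summarize group, meanonly
          local max = r(max)

          forvalues a = 1/`max' {
              // indicator for group <= a; left missing where group is missing,
              // unlike the gen/replace code in post #1, which codes those cases as 0
              generate byte y_`a' = group <= `a' if !missing(group)
              quietly regress y_`a' weight length
              predict double p_`a', xb        // fitted values of the linear probability model
              drop y_`a'
          }

          summarize p_*
          ```

          Here predict writes the fitted values straight into Stata variables, so no conversion from Mata is needed.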

