Convert MATA matrix into STATA variables

Sungchul Park

Join Date: Apr 2016

Posts: 26
#1


03 Oct 2016, 11:42

I am trying to run a linear probability model in which the dependent variable for each observation takes the value 0 or 1. My goal is to estimate, after adjusting for the independent variables, the probability of lying within each unique value of the dependent variable. My code is as follows:

Code:

forvalues i=1/100 {
    use "E:\data\sample`i'.dta", clear
    // y is the dependent variable
    egen group = group($y)
    tempname max
    sum group
    scalar `max' = r(max)
    forvalues a = 1(1)`=`max'' {
        gen y_`a' = 0
        replace y_`a' = 1 if group <= `a'
    }
    global group "y_*"
    mata: y = st_data(., "$group")
    // xs is the list of independent variables
    mata: X = st_data(., "$xs")
    mata: X = X, J(rows(X),1,1)
    mata: b = invsym(X'*X)*X'*y
    mata: yhat = X*b
    drop $group
    // (continued)
}

Since my data sets are very large, I do the computation in Mata and then convert the resulting matrix into Stata variables. The code below does the conversion, but it takes a very long time, so I would like to know a more efficient way to convert a Mata matrix into Stata variables. I have tried st_addvar and st_store, but they did not work.

Code:

    // (continued, inside the forvalues i loop)
    tempname max
    su group
    scalar `max' = r(max)
    forvalues a = 1(1)`=`max'' {
        mata: p_`a' = yhat[., `a']
        getmata p_`a', force
    }

Thank you in advance!
Sergio Correia

Join Date: Apr 2014

Posts: 420
#2

03 Oct 2016, 12:02

Two things:
I don't understand what you ultimately want to achieve; knowing your real problem would allow a better answer. As it stands, your code seems very convoluted, with multiple loops over samples and so on. That seems odd, because -regress- will almost always be faster than the Mata approach, and because, as you said, -y- should be a single variable.

For your more immediate question, check help mf_st_store . The st_store() function in Mata might be a bit faster than your getmata approach, although probably not by much.
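
As a rough illustration of the st_store() route, here is a minimal sketch. It assumes the variables must first be created with st_addvar(), since st_store() only writes to variables that already exist; the auto dataset, the placeholder matrix, and the names p_1, p_2, ... are stand-ins chosen for the example, not taken from the original post.

Code:

sysuse auto, clear
mata:
// placeholder for the n x k matrix of fitted values (assumption: k = 3)
yhat = runiform(st_nobs(), 3)

// build the new variable names p_1 ... p_k
names = "p_" :+ strofreal(1..cols(yhat))

// create all k variables in one call, then fill them in one call
idx = st_addvar("double", names)
st_store(., idx, yhat)
end

This writes every column in a single call instead of looping over getmata once per column. Whether it is noticeably faster on large data would need to be timed.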
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#3

04 Oct 2016, 01:45

A couple of comments:
When speeding up an algorithm you need to take into account the time spent writing it. You have already lost quite a lot of time on what you have written so far. Is the speed-up really going to make up for that? Almost always the answer is no.

You can cut a bit of overhead by using _regress instead of regress.

$ \mathbf{(X'X)^{-1}X'Y} $ is the textbook formula for computing the coefficients, but it is not what modern computer programs do, as it is not the most stable way of doing that computation.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Christophe Kolodziejczyk

Join Date: Mar 2014

Posts: 377
#4

04 Oct 2016, 04:49

I will add to Maarten's comments:

1. Cross-products such as x'x are computed more efficiently with cross() or quadcross(). Transposing is actually a costly operation.
2. Numerically speaking, it is better to compute the coefficient vector beta with a solver such as qrsolve() than with the textbook formula. OLS is generally solved using the QR decomposition.
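
The two points above can be sketched as follows. This is only an illustration on the auto dataset; price, mpg and weight are stand-ins for the actual variables, and the names b_cross and b_qr are made up for the example.

Code:

sysuse auto, clear
mata:
y = st_data(., "price")
X = st_data(., "mpg weight")
X = X, J(rows(X), 1, 1)          // append the constant

// textbook formula, with cross() replacing the explicit transposes
b_cross = invsym(cross(X, X)) * cross(X, y)

// least-squares solution via the QR decomposition
b_qr = qrsolve(X, y)

// relative difference between the two solutions; should be close to zero
mreldif(b_cross, b_qr)
end

Both forms also accept a y with several columns, as in the original post, in which case each column of the result holds the coefficients for one dependent variable.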
Sungchul Park

Join Date: Apr 2016

Posts: 26
#5

04 Oct 2016, 15:21

Thank you for your suggestions and comments!

For my analysis, I necessarily have to use a loop at least once, as shown in my code. Thus, merging the multiple loops into a single loop and running a separate regress within that loop may be more efficient than using Mata with multiple loops.
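
A minimal sketch of what that single loop might look like for one sample file, assuming the same construction of the indicator variables as in post #1. The auto dataset and the choices of rep78, mpg and weight are stand-ins so that the example runs on its own; in the real analysis this would sit inside the loop over sample files.

Code:

sysuse auto, clear
global y  "rep78"          // stand-in for the dependent variable
global xs "mpg weight"     // stand-in for the independent variables

egen group = group($y)
quietly summarize group, meanonly
local max = r(max)

forvalues a = 1/`max' {
    // indicator: 1 if group <= a, else 0 (as in post #1)
    generate byte y_`a' = group <= `a'
    quietly regress y_`a' $xs
    // linear prediction from the model just fitted
    predict double p_`a', xb
    drop y_`a'
}

This creates, fits, and predicts for each indicator in one pass, so no matrix has to be moved between Mata and Stata.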