(I have asked a similar question on StackOverflow, but I think it is better to post it on the Mata forum. Once my question is solved, the duplicate will be deleted.)
Naive Stata users like me, who only intend to do some basic estimation, are often quite unfamiliar with Stata's matrix features and Mata syntax. However, in some cases these techniques are needed. For example, I recently wanted to use the “High-dimensional Lasso IV” approach proposed by Belloni et al. (2014) (the bottom paragraph of page 40). Their method includes a “Preliminary Data Cleaning” step:
(1) Create interaction terms between every pair of variables, and higher-order terms for many variables.
(2) Calculate the correlation matrix and, among variables that are very highly correlated with each other (above 0.99 in absolute value in the Matlab code), keep only one and drop the rest.
(3) Identify the variables with very small standard deviations (below 1e-6 in the Matlab code) and drop them.
Finally, keep the remaining variables and use them as instrumental variables.
The original code is written in Matlab. The first step is clearly easy enough in Stata (suggestions for the simplest method are also welcome). The remaining two steps, however, are essentially a logical-indexing task and require some knowledge of matrix manipulation in Mata (I have over 800 observations).
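For step (1), a minimal sketch of what I have in mind is below. It uses the auto dataset shipped with Stata and three of its variables purely as stand-ins for my own data; the _sq and _X_ suffixes are just a naming convention I made up for illustration.

```stata
* Sketch only: price, mpg and weight stand in for the real baseline variables
sysuse auto, clear
local base price mpg weight

local k : word count `base'
forvalues i = 1/`k' {
    local a : word `i' of `base'
    * second-order term of variable i
    generate double `a'_sq = `a'^2
    * interactions with every later variable (loop is skipped when i == k)
    forvalues j = `=`i'+1'/`k' {
        local b : word `j' of `base'
        generate double `a'_X_`b' = `a' * `b'
    }
}
```

With three baseline variables this should create three squared terms and three pairwise interactions. Variable names must stay within Stata's 32-character limit, so longer names would need abbreviating.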
What I would like to know is how to adjust my code to achieve this goal, or whether there is a more convenient way to do the same work. Diving into Mata may be the eventual solution, but I don't have much time at the moment. Thank you for your help.
In my very limited understanding, the outline of the code for this problem should look like the following (pseudocode only; details and corrections needed):
Code:
local varlist var_1 var_2 ······ var_n
quietly correlate `varlist'
matrix C = r(C)
mata:
// corr receives the correlation matrix; names holds its column names
corr   = st_matrix("C")
stripe = st_matrixcolstripe("C")
names  = stripe[., 2]'
// flag each variable whose |correlation| with an earlier variable exceeds 0.99
drop = rowsum(abs(lowertriangle(corr, 0)) :> 0.99) :> 0
// recover the varlist associated with the variables that are kept
st_local("keepvars", invtokens(select(names, !drop')))
end
display "`keepvars'"
As an aside, the relevant Matlab code snippet is below.
Code:
xxinv = inv(x'*x);
My = full(y - x*xxinv*(x'*y)); %#ok<*MINV>
Md = full(d - x*xxinv*(x'*d));
Mz = full(z - x*xxinv*(x'*z));
I = find(std(Mz) > 1e-6);
Mz = Mz(:,I); % Identify those instruments of large enough sd. and keep them (P40)
IND_dem = IND_dem(I);
namez = namez(I);
[I,J] = find(abs(tril(corr(Mz)-eye(size(Mz,2)))) > .99); % Identify those instruments of high bivariate correlated and drop them (P40)
drop = unique(I);
Mz(:,drop) = [];
IND_dem(drop) = [];
namez(drop) = [];
n = size(Mz,1);
p = size(Mz,2);
kx = size(x,2);
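My reading of the std(Mz) > 1e-6 line is that it keeps only the instruments with non-negligible variation. A rough Stata sketch of that filter, without any Mata, might look like this. The auto dataset and the constant variable const are hypothetical stand-ins, and the check is applied to the raw variables, whereas the Matlab code applies it to Mz, the instruments after partialling out x.

```stata
* Sketch only: drop candidates whose standard deviation is (near) zero
sysuse auto, clear
generate byte const = 1              // hypothetical zero-variance candidate
local cand price mpg weight const

local keep
foreach v of local cand {
    quietly summarize `v'
    * same threshold as std(Mz) > 1e-6; also guard against a missing sd
    if r(sd) > 1e-6 & !missing(r(sd)) {
        local keep `keep' `v'
    }
}
display "`keep'"
```

Here the display line should list price, mpg and weight, with const filtered out. The resulting local could then be passed to correlate for the high-correlation step.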