Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to do a loop to open files

    I am very new to Stata and would like to write a loop code to open different files, run a regression, and then store the outputs into a results file.
    Is this possible in Stata? I have previous experience with Matlab which works fine, but Matlab doesn't have the statistical test I want and therefore I need to learn Stata.

    There are 195 files, three columns each, the name of each file is HS1 to HS195.

    Please let me know what would be the simplest way to do this.

    Thank you very much.

  • #2
    Well, here is an outline of the loop approach:

    Code:
    forvalues i = 1/195 {
        use HS`i', clear
        /* code for regression
           and tests here
        */
        /* code to save/store results here */
    }
    When you speak of saving the outputs to a results file, there are different kinds of outputs you may have and different kinds of layouts and file types you might want to save them in. Stata has a variety of commands for these various purposes. I suggest you post a first draft of your code, along with an example of the data from one of your files, and a more detailed description of just what results you want and in what way you would like to save them, and then you can get more specific advice about those aspects of your question.

    -forvalues-, and its cousin -foreach- (which itself comes in several flavors) are basic loop constructs for programming in Stata. Do read the chapters on these in the [P] users' manual: you will not get far in Stata without mastering these.

    Please be sure to install the -dataex- command (-ssc install dataex-) and use it to post your example data so that those who would like to help you can easily reproduce your situation faithfully. (-dataex- is very easy to use, see -help dataex- after you install it.) When you post example code, please be sure to do that using code delimiters. (See FAQ #12 if you don't know how those works.)

    Comment


    • #3
      Thank you so much for this - I am in awe of how easier this can be with Stata in comparison to Matlab.

      Code:
      forvalues i = 1/2     {
          insheet using HS`i'.csv, clear
          reg v3 v5 v6
          hettest
          /* code to save/store results here */
      }

      HTML Code:
      
      . forvalues i = 1/2        {
        2.     insheet using HS`i'.csv, clear
        3.     reg v3 v5 v6
        4.     hettest
        5.     /* code to save/store results here */
      . }
      (8 vars, 17 obs)
      
            Source |       SS       df       MS              Number of obs =      17
      -------------+------------------------------           F(  2,    14) =   20.84
             Model |  7.1256e+19     2  3.5628e+19           Prob > F      =  0.0001
          Residual |  2.3930e+19    14  1.7093e+18           R-squared     =  0.7486
      -------------+------------------------------           Adj R-squared =  0.7127
             Total |  9.5186e+19    16  5.9491e+18           Root MSE      =  1.3e+09
      
      ------------------------------------------------------------------------------
                v3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                v5 |   1.23e+10   2.18e+09     5.65   0.000     7.64e+09    1.70e+10
                v6 |   7.19e+09   3.60e+09     2.00   0.066    -5.30e+08    1.49e+10
             _cons |  -3.27e+10   1.07e+10    -3.05   0.009    -5.57e+10   -9.67e+09
      ------------------------------------------------------------------------------
      
      Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
               Ho: Constant variance
               Variables: fitted values of v3
      
               chi2(1)      =     1.02
               Prob > chi2  =   0.3120
      (8 vars, 17 obs)
      
            Source |       SS       df       MS              Number of obs =      17
      -------------+------------------------------           F(  2,    14) =  172.47
             Model |  3.4893e+21     2  1.7446e+21           Prob > F      =  0.0000
          Residual |  1.4162e+20    14  1.0116e+19           R-squared     =  0.9610
      -------------+------------------------------           Adj R-squared =  0.9554
             Total |  3.6309e+21    16  2.2693e+20           Root MSE      =  3.2e+09
      
      ------------------------------------------------------------------------------
                v3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                v5 |  -5.74e+09   1.98e+10    -0.29   0.776    -4.81e+10    3.66e+10
                v6 |   6.99e+10   5.30e+09    13.20   0.000     5.86e+10    8.13e+10
             _cons |  -2.56e+11   1.68e+10   -15.25   0.000    -2.92e+11   -2.20e+11
      ------------------------------------------------------------------------------
      
      Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
               Ho: Constant variance
               Variables: fitted values of v3
      
               chi2(1)      =     0.00
               Prob > chi2  =   0.9663
      
      .
      end of do-file
      
      
      Now I need to figure out how to save a table for each result with 5 columns, the coef. of V6, t of V6, and P>|t| of V6, the chi2(1), and Prob > chi2.

      I hope I posted the question and accompanying code correctly.

      Thank you very much for your kind advice


      Comment


      • #4
        You might want to look into -esttab- and -estout-, available from ssc (ssc install commandname).

        Comment


        • #5
          Thank you for that suggestion, it helped and I now have the following:

          Code:
          forvalues i = 1/2     {
              insheet using HS`i'.csv, clear
              reg v3 v5 v6
              hettest
              
              
              estimates store HS`i', title(HS`i')
          }
          
          estout HS1 HS2, cells(b(star fmt(3)) se(par fmt(2))) ///
             legend label varlabels(_cons Constant)
          and in addition to the previous output, I now have:

          HTML Code:
          . estout HS1 HS2, cells(b(star fmt(3)) se(par fmt(2))) ///
          >    legend label varlabels(_cons Constant)
          
          ----------------------------------------------------
                                        HS1             HS2  
                                       b/se            b/se  
          ----------------------------------------------------
          v5                      1.233e+10***   -5.738e+09  
                                 (2.18e+09)      (1.98e+10)  
          v6                      7.195e+09       6.992e+10***
                                 (3.60e+09)      (5.30e+09)  
          Constant               -3.267e+10**    -2.559e+11***
                                 (1.07e+10)      (1.68e+10)  
          ----------------------------------------------------
          * p<0.05, ** p<0.01, *** p<0.001
          
          .
          end of do-file
          
          1) I'm not sure how to loop estout HS`i' - I have 195 files so would prefer not to write them
          2) Not sure how to include the results from the httest in the table - ereturn list doesn't seem to have this there

          Thank you very much for your kind consideration and help.

          Comment


          • #6
            You can use estout HS*, which would give you all the HS-results in one table, though that's probably not what you want? That said, I'm not sure how you'd display 195 results in a way that is readable... If you have an idea, we can see if we can translate that into code. As for the hettest, you can use
            Code:
            ...
            hettest
            estadd scalar hettest_chi2 = r(chi2)
            estadd scalar hettest_p = r(p)
            
            estimates store HS`i', title(HS`i')
            ... 
             estout HS1 HS2, cells(b(star fmt(3)) se(par fmt(2))) ///    legend label varlabels(_cons Constant) scalars(hettest_chi2 hettest_p)
            Where the parts in bold are new. Let me know if it works!

            Comment


            • #7
              I just want 5 columns with, file number, coefficient, p-value, h-test chi, h-test p-value

              I tried the additional code, it gives me the following error:

              HTML Code:
              . estout HS1 HS2, cells(b(star fmt(3)) se(par fmt(2))) ///
              >    legend label varlabels(_cons Constant) ///
              >    scalars(hettest_chi2 hettest_p)
              option scalars() not allowed
              r(198);
              
              end of do-file
              

              Comment


              • #8
                I think the option in -estout- is -stats()-, not -scalars()-. But I'm not sure this will get you what you want, as you may end up with all the -hettest- results at the bottom of the table, rather than in parallel with the coefficients. There's probably a way to do this with -esttab- or -estout-, but I am not a big fan of those programs precisely because they are so complicated and have so many options. This is, in turn, a consequence of their being quite flexible in accommodating many different output layouts. But it makes them difficult to use and learn. I find it easier to just "roll my own" with -postfile-s customized to each problem. So something like this:

                Code:
                capture postutil clear
                tempfile results
                postfile handle int filenum v5_coeff v6_coeff het_chi2 het_p using `results'
                
                forvalues i = 1/195 {
                    insheet using HS`i'.csv, clear
                    regress v3 v5 v6
                    local topost (`i') (_b[v5]) (_b[v6])
                    hettest
                    local topost `topost' (r(chi2)) (r(p))
                    post handle `topost'
                }
                postclose handle
                
                use `results', clear
                
                list, noobs clean
                Notes:
                1. Not tested. Beware of typos, unbalanced parentheses or braces, etc.
                2. You can at the end of this code, save these results in a Stata data set, or export them to .csv, or Excel, or do whatever you like with them if you will need them again in the future.
                3. If you are using a current version of Stata, -insheet- has been replaced by -import delimited- (though -insheet- still runs). See the -help import delimited- for details on that command.
                4. Also, if you don't like the formatting ot the results you get from -list-, you can apply your preferred display formats to the variables before the -list- command and then add the -format- option to -list-.
                Last edited by Clyde Schechter; 12 Sep 2016, 11:09.

                Comment


                • #9
                  This works very well and I get the table but I have trouble tweaking it to capture the " P>|t | " values.
                  I tried different names but can't figure it out, any ideas?

                  Thank you so much !!

                  Comment


                  • #10
                    Sorry. For some reason, I thought you only wanted the coefficients of v5 and v6, along with the results of -estat hettest-. Maybe I got this by focusing on the output you showed in #5. I failed to notice that you didn't ask for anything about v5, and you did ask about the t and p-value for v6. So, there are a couple of ways you can do it. Probably the simplest is this.

                    Code:
                    capture postutil clear
                    tempfile results
                    postfile handle int filenum v6_coeff v6_t v6_p het_chi2 het_p using `results'
                    
                    forvalues i = 1/195 {
                        insheet using HS`i'.csv, clear
                        regress v3 v5 v6
                        matrix M = r(table) // MUST HAVE STATA 12 OR LATER
                        local topost (`i') (M[1,2]) (M[3, 2]) (M[4, 2])
                        hettest
                        local topost `topost' (r(chi2)) (r(p))
                        post handle `topost'
                    }
                    postclose handle
                    
                    use `results', clear
                    
                    list, noobs clean
                    Notes:
                    1, Changes from prior version are in bold face.
                    2. This method requires Stata version 12 or later; r(table) does not exist before that. You are supposed to tell us if you are not using current (version 14) Stata, so I am assuming that this won't be a problem for you. If you are, in fact, using Stata 11 or earlier, post back and I'll show you a different way to get these results.
                    3. In the line -local topost (`i') (M[1,2]) (M[3, 2]) (M[4, 2])-, the first subscripts 1, 3, and 4 represent the coefficient, t-statistic- and p-value rows of the r(table) matrix. The second coefficient, 2, refers to the column of r(table) for the second predictor variable in the regression (here, v6). I give you this information in case you need to modify this code to get other statistics, or to get information about other variables. To see the full layout provided by r(table) you can -matrix list M- at the end of the code. (You can do it in the loop, too, but then you will get it listed 195 times.)


                    Last edited by Clyde Schechter; 12 Sep 2016, 18:48. Reason: Correct typo.

                    Comment


                    • #11
                      You have solved my problem, I really appreciate your help on this !!!!

                      Result output of the above code:

                      HTML Code:
                      . capture postutil clear
                      
                      . tempfile results
                      
                      . postfile handle int filenum v6_coeff v6_t v6_p het_chi2 het_p using `results'
                      
                      .
                      . forvalues i = 1/2 {
                        2.     insheet using HS`i'.csv, clear
                        3.     regress v3 v5 v6
                        4.     matrix M = r(table) // MUST HAVE STATA 12 OR LATER
                        5.     local topost (`i') (M[1,2]) (M[3, 2]) (M[4, 2])
                        6.     hettest
                        7.     local topost `topost' (r(chi2)) (r(p))
                        8.     post handle `topost'
                        9. }
                      (8 vars, 17 obs)
                      
                            Source |       SS       df       MS              Number of obs =      17
                      -------------+------------------------------           F(  2,    14) =   20.84
                             Model |  7.1256e+19     2  3.5628e+19           Prob > F      =  0.0001
                          Residual |  2.3930e+19    14  1.7093e+18           R-squared     =  0.7486
                      -------------+------------------------------           Adj R-squared =  0.7127
                             Total |  9.5186e+19    16  5.9491e+18           Root MSE      =  1.3e+09
                      
                      ------------------------------------------------------------------------------
                                v3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                                v5 |   1.23e+10   2.18e+09     5.65   0.000     7.64e+09    1.70e+10
                                v6 |   7.19e+09   3.60e+09     2.00   0.066    -5.30e+08    1.49e+10
                             _cons |  -3.27e+10   1.07e+10    -3.05   0.009    -5.57e+10   -9.67e+09
                      ------------------------------------------------------------------------------
                      
                      Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
                               Ho: Constant variance
                               Variables: fitted values of v3
                      
                               chi2(1)      =     1.02
                               Prob > chi2  =   0.3120
                      (8 vars, 17 obs)
                      
                            Source |       SS       df       MS              Number of obs =      17
                      -------------+------------------------------           F(  2,    14) =  172.47
                             Model |  3.4893e+21     2  1.7446e+21           Prob > F      =  0.0000
                          Residual |  1.4162e+20    14  1.0116e+19           R-squared     =  0.9610
                      -------------+------------------------------           Adj R-squared =  0.9554
                             Total |  3.6309e+21    16  2.2693e+20           Root MSE      =  3.2e+09
                      
                      ------------------------------------------------------------------------------
                                v3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                                v5 |  -5.74e+09   1.98e+10    -0.29   0.776    -4.81e+10    3.66e+10
                                v6 |   6.99e+10   5.30e+09    13.20   0.000     5.86e+10    8.13e+10
                             _cons |  -2.56e+11   1.68e+10   -15.25   0.000    -2.92e+11   -2.20e+11
                      ------------------------------------------------------------------------------
                      
                      Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
                               Ho: Constant variance
                               Variables: fitted values of v3
                      
                               chi2(1)      =     0.00
                               Prob > chi2  =   0.9663
                      
                      . postclose handle
                      
                      .
                      . use `results', clear
                      
                      .
                      . list, noobs clean
                      
                          filenum   v6_coeff       v6_t       v6_p   het_chi2      het_p  
                                1   7.19e+09   1.997522   .0655835   1.022158   .3120076  
                                2   6.99e+10   13.19625   2.74e-09   .0017884    .966268  
                      
                      .
                      end of do-file
                      
                      .
                      
                      

                      Thank you so much

                      Comment


                      • #12
                        Hi,

                        I was running a loop but it shows an error: [beta] not found
                        post: above message corresponds to expression 1, variable coeff


                        tempfile coefficients
                        capture postutil clear
                        postfile handle coeff using `coefficients'
                        forvalues i = 1/4 {
                        use country`i', clear
                        reg lncon hhsize i.region,robust
                        post handle (_b[beta])
                        }
                        postclose handle

                        Where am I going wrong? Beginner to stata

                        Comment


                        • #13
                          So, -post handle (_b[beta])- tells Stata to add the coefficient of beta from the most recent regression to a new observation in the postfile. But your regression command has no variable named beta. Replace beta with the name of the variable whose coefficient you are trying to collect. (Or, perhaps there is a variable named beta in your data and it is supposed to be in the regression. In that case, leave the -post handle- command alone and add beta to the regression command.

                          Comment


                          • #14
                            Thank you. This was really helpful.
                            I have another doubt. Can I have RMSE and R square values as well like we record the beta coefficients?
                            How can I do it in a loop?

                            Comment


                            • #15
                              My loop at the moment is this:

                              capture postutil clear
                              tempfile results
                              postfile handle int filenum hhsize_coeff region_coeff using `results'
                              forvalues i = 1/4 {
                              use country`i', clear
                              reg lncon hhsize region
                              matrix M=r(table)
                              local topost(`i') (_b[hhsize]) (_b[region])
                              post handle `topost'
                              }
                              postclose handle
                              preserve
                              use `results', clear


                              I want to include in this RMSE and R sqaured as well while posting

                              Comment

                              Working...
                              X