  • Running regressions in loop

    Stata Users

    I have 1,324 observations, and I am trying to regress on each of 16 variables.

    I believe that I should get 21,184 regression outputs (i.e. 1,324 * 16).

    However, when I run the loop it gives me only 1,324 regression outputs.

    The code I am using is as follows:

    Code:
    use evtstudydata, clear
    egen obs = group (gtdid)
    sort obs
    local vars atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100
    forvalues i = 1/1324 {
    preserve
    keep if obs == `i'
    reg car `var'
    restore
    }
    I'd be very grateful if you could kindly provide some insights on what I am doing wrong.

    Thank you very much for the help.

  • #2
    If you want to regress car on 16 different independent variables, one at a time, you will have to loop over the contents of your local macro vars.
    Code:
    foreach var of local vars {
    reg car `var'
    }
    Right now, since the local macro var is undefined, your regression command is effectively
    Code:
    reg car
    which regresses your dependent variable on only the constant term.



    • #3
      Parvesh,

      If you are trying to generate a regression for each of the 16 variables separately, you would have to have another loop. I would also suggest a modification that will be more efficient than preserve/restore:

      Code:
      forvalues i=1/1324 {
      foreach var of varlist atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
        regress car `var' if obs==`i'
      }
      }
      Regards,
      Joe



      • #4
        Yes, but all of this may miss something even bigger. -regress car `var' if obs == `i'- will carry out a regression only on those observations where obs == `i', which is one group defined by variable gtdid. If each such group consists of just a single observation, then the regressions will all fail because you cannot regress on a single observation (unless there are no predictors and it's constant only). More likely each group actually contains many observations, but in that case there aren't going to be 1,324 groups. So what is needed is:

        Code:
        summ obs
        forvalues i = 1/`r(max)' {
            foreach var of varlist.....{
                regress car `var' if obs == `i'
            }
        }
        By the way, naming a group variable obs is not really a good idea: the name obs suggests that it is identifying individual observations in the data. That's confusing to somebody who is approaching the code fresh, as I am. And it will be equally confusing to you if you have to review this code in a few months after being away from it. It is better to give variables names that suggest what they really are.
        Last edited by Clyde Schechter; 17 Jul 2017, 16:15.
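        A minimal sketch of how the advice in #2 to #4 could be combined while also saving one row of results per regression. It runs on small fake data (5 events, 3 index variables) standing in for the poster's file; the names event_id, car_regs, xvar, b and se are hypothetical and not from the thread.

        ```stata
        * fake data standing in for the poster's file (assumption)
        clear
        set seed 2017
        set obs 5
        gen gtdid = 1000 + _n
        expand 16
        foreach v in car atx bel20 dax30 {
            gen `v' = runiform()
        }

        * group identifier with a descriptive name instead of obs
        egen event_id = group(gtdid)
        summarize event_id, meanonly
        local n_events = r(max)

        * one row per (event, regressor) pair is posted to car_regs.dta
        tempname results
        postfile `results' int event_id str10 xvar double b double se using car_regs, replace
        forvalues i = 1/`n_events' {
            foreach var of varlist atx bel20 dax30 {
                * capture lets the loop continue if a regression fails
                capture regress car `var' if event_id == `i'
                if _rc == 0 {
                    post `results' (`i') ("`var'") (_b[`var']) (_se[`var'])
                }
            }
        }
        postclose `results'

        use car_regs, clear
        list in 1/6
        ```

        Posting to a results file keeps the loop output in a dataset rather than in the log, which may be easier to work with when there are thousands of regressions.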



        • #5
          Hi guys

          Thanks for the prompt response.

          I have conducted an event study of 1,324 events and their effect on 16 stock market indices.

          I have calculated the CARs and now wish to regress them on specific variables.

          My dataset is in long format and as follows:
          [Image: Stata.png, screenshot of the dataset]

          As you can see, the first column lists the CAR: there are 16 CARs for the same date, one for each of my 16 stock indices. This pattern repeats for all 1,324 events.

          So my Y is the CAR and my X variables are the columns starting from ATX.

          The regression I wanted to run was, for example, the CARs on 04 Jan 2005 (16 CARs, one per index) regressed on each of the columns (16 rows per date) from ATX onwards.

          I then want to loop the same process over all the events in my sample.



          • #6
            Please follow the advice in the FAQ on screenshots and how to post data examples.

            I posted a reply before noticing that each X variable is constant within an event. Unless I'm missing something, you have to fit the regression without a constant term in this case, because an X that is constant within the event is collinear with the intercept. With just one independent variable, the slope then reduces to the per-event mean of car divided by that X.

            Code:
            * create fake data
            clear all
            set seed 123321
            set obs 1324
            gen event_date = mdy(1,4,2005) + (_n-1) * 7
            format %td event_date
            foreach v in atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
                gen `v' = runiform()
            }
            expand 16
            bysort event_date: gen Index = _n
            bysort event_date: gen car = runiform()
            
            
            * By event, regress car on each X vars
            bysort event_date: egen car_mean = mean(car)
            foreach v of varlist atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
                gen b_`v' = car_mean / `v'
            }
            
            * spot check results
            reg car atx if event_date == mdy(1,4,2005), nocons
            list b_atx if event_date == mdy(1,4,2005)
            
            reg car bel20 if event_date == mdy(1,11,2005), nocons
            list b_bel20 if event_date == mdy(1,11,2005)
            and the spot check results
            Code:
            . * spot check results
            . reg car atx if event_date == mdy(1,4,2005), nocons
            
                  Source |       SS           df       MS      Number of obs   =        16
            -------------+----------------------------------   F(1, 15)        =     64.55
                   Model |  4.83779567         1  4.83779567   Prob > F        =    0.0000
                Residual |  1.12411439        15   .07494096   R-squared       =    0.8115
            -------------+----------------------------------   Adj R-squared   =    0.7989
                   Total |  5.96191007        16  .372619379   Root MSE        =    .27375
            
            ------------------------------------------------------------------------------
                     car |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     atx |   2.609326   .3247612     8.03   0.000     1.917114    3.301538
            ------------------------------------------------------------------------------
            
            . list b_atx if event_date == mdy(1,4,2005)
            
                   +----------+
                   |    b_atx |
                   |----------|
                1. | 2.609326 |
                2. | 2.609326 |
                3. | 2.609326 |
                4. | 2.609326 |
                5. | 2.609326 |
                   |----------|
                6. | 2.609326 |
                7. | 2.609326 |
                8. | 2.609326 |
                9. | 2.609326 |
               10. | 2.609326 |
                   |----------|
               11. | 2.609326 |
               12. | 2.609326 |
               13. | 2.609326 |
               14. | 2.609326 |
               15. | 2.609326 |
                   |----------|
               16. | 2.609326 |
                   +----------+
            
            . 
            . reg car bel20 if event_date == mdy(1,11,2005), nocons
            
                  Source |       SS           df       MS      Number of obs   =        16
            -------------+----------------------------------   F(1, 15)        =    103.86
                   Model |  6.60277261         1  6.60277261   Prob > F        =    0.0000
                Residual |  .953594935        15  .063572996   R-squared       =    0.8738
            -------------+----------------------------------   Adj R-squared   =    0.8654
                   Total |  7.55636754        16  .472272971   Root MSE        =    .25214
            
            ------------------------------------------------------------------------------
                     car |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   bel20 |   3.704455   .3634943    10.19   0.000     2.929686    4.479225
            ------------------------------------------------------------------------------
            
            . list b_bel20 if event_date == mdy(1,11,2005)
            
                   +----------+
                   |  b_bel20 |
                   |----------|
               17. | 3.704455 |
               18. | 3.704455 |
               19. | 3.704455 |
               20. | 3.704455 |
               21. | 3.704455 |
                   |----------|
               22. | 3.704455 |
               23. | 3.704455 |
               24. | 3.704455 |
               25. | 3.704455 |
               26. | 3.704455 |
                   |----------|
               27. | 3.704455 |
               28. | 3.704455 |
               29. | 3.704455 |
               30. | 3.704455 |
               31. | 3.704455 |
                   |----------|
               32. | 3.704455 |
                   +----------+
            
            .
            Last edited by Robert Picard; 17 Jul 2017, 19:34.
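            The reduction stated in #6 follows from the formula for the slope in a regression without a constant. A short derivation, using notation that is not in the original post: $y_i$ is car, $x_e \neq 0$ is the value of the regressor shared by all $n_e$ observations of event $e$, and $\bar{y}_e$ is the mean of car within that event.

            ```latex
            \hat{b}_e
              = \frac{\sum_{i=1}^{n_e} x_e \, y_i}{\sum_{i=1}^{n_e} x_e^{2}}
              = \frac{x_e \sum_{i=1}^{n_e} y_i}{n_e \, x_e^{2}}
              = \frac{\bar{y}_e}{x_e}
            ```

            This matches the line gen b_`v' = car_mean / `v' in the code of #6.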



            • #7
              Hi

              I have a sample of firms from 10 industries for 20 years.

              I need to run a regression in a loop by year (for one model) and by industry (for another model), and I need to save the residuals of these regressions (for one model) and the coefficients of these regressions (for the other model).

              Can anyone help me with the Stata commands for this?

              I would appreciate it.

              Ana Siqueira
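              The question in #7 was not answered in the thread. One possible approach, sketched on fake data: a loop over the values of year that stores residuals, and statsby for the coefficients by industry. The variable names y, x, year, industry and resid and the file name industry_coefs are assumptions made for illustration.

              ```stata
              * fake panel: 10 industries, 20 years, 3 firms per cell (assumption)
              clear
              set seed 2017
              set obs 600
              gen industry = mod(_n - 1, 10) + 1
              gen year = 1998 + mod(floor((_n - 1) / 10), 20)
              gen x = runiform()
              gen y = 1 + 2 * x + rnormal()

              * (a) residuals from a separate regression for each year
              gen double resid = .
              levelsof year, local(years)
              foreach yr of local years {
                  quietly regress y x if year == `yr'
                  predict double r_tmp if e(sample), residuals
                  quietly replace resid = r_tmp if e(sample)
                  drop r_tmp
              }

              * (b) coefficients and standard errors from a separate regression
              *     for each industry, saved as one row per industry
              statsby _b _se, by(industry) saving(industry_coefs, replace): regress y x
              ```

              The data in memory keep their original number of observations; the coefficients go to a separate file with one observation per industry.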



              • #8
                Hi Robert

                Thanks for the reply and for pointing out that the X was constant for each event.

                In fact I made a massive mistake when merging my files to construct my panel data for the regressions. The X shouldn't have been constant!

                I have now corrected it and using all the advice I got on this post, I have been able to get my regressions!

                A massive thanks to William, Joe, Clyde and Robert for your help. I highly appreciate it guys.

                Regards
                Parvesh



                • #9
                  Hi guys

                  New problem!

                  When I am exporting my results using the outreg2 function, I end up with the coefficients and standard errors only.

                  Is there a way to get the t-stat and significance level as well?

                  The code I'm using is:

                  Code:
                  use Model_1, clear
                  egen obs = group (gtdid)
                  summ obs
                  sort event_date obs
                  forvalues i = 1/5 {
                  regress car LIQ if obs == `i'
                  outreg2 using 6dayresults.xls, append
                  }
                  Thanks.

                  Parvesh



                  • #10
                    #9 is a new problem, as said, with the user-written outreg2 command (not function!). That is from SSC, as you are asked to explain (FAQ Advice #12).

                    outreg2 is a popular download but its author is not a member here and questions on it are often not answered, partly because it is used by few of the most active people here who answer lots of questions.

                    Regardless of that, I suggest starting a new thread flagging outreg2 in the title. That's the best way to try to catch attention from people who use it (as implied, not including me).



                    • #11
                      re #8, glad you found a mistake because what you were trying to do did not make much sense in my mind. I'll repost my original solution that shows how to perform all these regressions efficiently using rangestat (from SSC). The whole thing runs in less than 2 seconds on my computer.

                      Code:
                      * create fake data
                      clear all
                      set seed 123321
                      set obs 1324
                      gen event_date = mdy(1,4,2005) + (_n-1) * 7
                      format %td event_date
                      expand 16
                      bysort event_date: gen Index = _n
                      foreach v in car atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
                          gen `v' = runiform()
                      }
                      
                      * regress car per event on each of a set of independent variables
                      foreach v of varlist atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
                       rangestat (reg) car `v', interval(event_date 0 0)
                       rename reg_nobs n_`v'
                       rename b_cons a_`v'
                       drop reg_r2 reg_adj_r2 se_`v' se_cons
                      }
                      
                      * spot check results
                      reg car atx if event_date == mdy(1,4,2005)
                      list n_atx b_atx a_atx if event_date == mdy(1,4,2005)
                      
                      reg car bel20 if event_date == mdy(1,11,2005)
                      list n_bel20 b_bel20 a_bel20 if event_date == mdy(1,11,2005)



                      • #12
                        Hi guys.

                        This is related to my regressions, and I'd be very grateful if you could help.

                        I am running my event study using the following code:

                        Code:
                        clear
                        capture cd "XXX"
                        set obs 1
                        g fake = .
                        save evday_car, replace
                        * cleaning events file
                        import delimited using GTD.csv, clear
                        drop city perpetrator1 guncertain1 perpetrator2 guncertain2 perpetrator3 guncertain3 targettype1 targettype2 targettype3 region attacktype1 attacktype2 attacktype3 weapontype1 weapontype2 weapontype3 weapontype4
                        rename date date_string
                        g date = date(date_string,"DMY")
                        format date %td
                        sort date
                        drop date_string
                        g date_id = _n
                        tsset date_id
                        * Drop events occurring on non-trading days
                        gen dow = dow(date)
                        drop if dow(date)==0 | dow(date)==6
                        drop dow
                        sort date
                        rename date event_date
                        g nnn = 1
                        g obs = _n
                        save eventsdates, replace
                        * Calculating market returns using SP500 as proxy for market portfolio
                        import delimited using sp500.csv, clear
                        rename date date_string
                        rename sp500 market
                        generate date = date(date_string,"DMY")
                        format date %td
                        sort date
                        g date_id = _n
                        keep market date_id date
                        drop if market==.
                        tsset date_id
                        generate returnmarket = ln(market) - ln(L.market)
                        sort date
                        order date, first
                        save marketret, replace
                        * Calculating indices returns and merging with market returns file
                        import delimited using indices.csv, clear
                        rename date date_string
                        generate date = date(date_string,"DMY")
                        format date %td
                        sort date
                        drop date_string
                        g date_id = _n
                        tsset date_id
                        local vars atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100
                        foreach var of local vars {
                        gen return_`var' = ln(`var') - ln(L.`var')
                        }
                        sort date
                        merge 1:1 date using marketret
                        drop _merge market
                        sort date
                        g nnn = 1
                        drop atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100
                        save allreturns, replace
                        * merging events file with returns file
                        use eventsdates, clear
                        drop date_id
                        forvalues i = 1320/1324 {
                        preserve
                        keep if obs == `i'
                        joinby nnn using allreturns
                        sort date
                        drop date_id
                        g date_id = _n
                        gen day_cnt = date_id
                        gen target_day = day_cnt if date==event_date
                        egen max_target_day = max(target_day)
                        gen evday = day_cnt-max_target_day
                        drop day_cnt target_day max_target_day
                        sort evday
                        gen evt_window=1 if evday>=0 & evday<=6
                        gen est_window=1 if evday<=-11 & evday>=-30
                        drop if evt_window==. & est_window==.
                        foreach var of local vars {
                        reg return_`var' returnmarket if est_window==1
                        estimates store ols_dum
                        gen rmse_`var' = e(rmse)
                        predict phat_`var'
                        gen ar_`var' = return_`var' - phat_`var' if evt_window==1
                        drop phat_`var'
                        }
                        drop if evt_window==.
                        drop est_window nnn
                        ***************************************************
                        *Display the CAR and its Test Statistic
                        foreach var of local vars {
                        egen car_`var' = sum(ar_`var')
                        gen tstat_`var' = car_`var'/(rmse_`var'*sqrt(_N))
                        drop return_`var' rmse_`var' ar_`var'
                        }
                        drop returnmarket date_id evday evt_window date
                        * DO EVENT analysis, generate CAR in 1/1
                        keep in 1/1
                        append using evday_car
                        save evday_car, replace
                        restore
                        }
                        use evday_car, clear
                        order event_date, first
                        sort event_date
                        I'd like to:

                        1) put a dummy variable in my regressions such that it identifies an event and assign a value of 0 on event date and 1 otherwise.
                        2) have several dummy variables for each of the event date in my sample.

                        Any ideas how I should proceed with that?

                        The use of the dummy variable is to help me identify which events have a significant impact on a sample of 16 stock market indices.

                        A summary of dataset: 1) Returns and market return data - daily observations from Jan 04 to Dec 16 & 2) Events list - contains 1,324 events

                        Thank you for your help.

                        Regards Parvesh
                        Last edited by Parvesh Seeballack; 20 Jul 2017, 08:04.



                        • #13
                          Hi!
                          I'm using Stata 16 and I want to know the code for a specific logit regression. I spent a while looking for related posts or answers, and this thread is the closest to my problem that I found.
                          First, my data is:

                          Code:
                          clear
                          input float(sick improv_water toilet interview tp20852 t2m20852 tp20853 t2m20853 tp20854 t2m20854)
                          0 1 1 20968  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          0 1 1 20968  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          0 1 1 20966  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          . 1 1 20966  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          . 1 1 20966  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          1 1 1 20938  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          0 1 1 20938  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          0 1 1 20965  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          1 1 1 20939  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          0 1 1 20938  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          1 1 1 20964  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          0 1 1 20964  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          1 1 1 20936  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
                          1 1 1 20926  77.49091  20.90748 24.808275 21.219606 37.740124  20.88153
                          . 1 1 20978  85.28074  21.28524 20.338476 21.566307 14.748202  21.32637
                          0 1 1 20978  85.28074  21.28524 20.338476 21.566307 14.748202  21.32637
                          end
                          format %tdDD/NN/CCYY interview
                          The variables with the prefixes "tp" and "t2m" are followed by five digits which, as I understand it, are dates in Stata's daily date format.
                          The variable interview holds date values.

                          I want to run a regression of the following form:
                          logit sick improv_water toilet tp* if interview == * - (4 days ago)
                          Where: * is the date


                          It is possible that existing posts have already answered my query, but I have not found them yet. If you know of one, please share the link or help me resolve my query.

                          Any help will be appreciated



                          • #14
                            It's pretty unclear what you mean by your command and the remark "where * is the date". My best attempt at reading your mind is that you have tp* variables where tp is suffixed by a date (as you described) and that you want to include an observation in the regression if the value of the variable interview is exactly four days earlier than one of those tp suffixes, and, if so, to use only that particular tp variable as a predictor.

                            If that's what you want to do, then:
                            Code:
                            gen long obs_no = _n
                            reshape long tp t2m, i(obs_no) j(the_date)
                            format the_date %tdDD/NN/CCYY
                            
                            logit sick improv_water toilet tp if interview == the_date - 4
                            Note that in your example data, the interview date is never actually four days earlier than any of those suffixed dates in the tp* variables, so this code just exits with a "no observations" error message. Hopefully, that is not the case in your real data.



                            • #15
                              Clyde,
                              thanks for your answer. The code runs well, but unfortunately reshaping the data increases the number of observations (it is tripled).

                              Further details of the data:
                              • It's survey data; each observation provides information about an individual within a household.
                              • The sample of tp* variables now includes date suffixes that also occur as values of interview.
                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input str15 household_key str18 member_key float(sick improv_water toilet interview tp20880 t2m20880 tp20881 t2m20881 tp20882 t2m20882 tp21170 t2m21170 tp21171 t2m21171)
                              "      000500401" "      000500401-4"  0 1 1 20880  63.53304 14.855492  65.21365 15.039074  191.2181 14.646427  51.39142 15.769538   77.8743  15.67886
                              "      000510101" "      000510101-5"  0 1 1 20881  63.53304 14.855492  65.21365 15.039074  191.2181 14.646427  51.39142 15.769538   77.8743  15.67886
                              "      000511001" "      000511001-3"  0 1 1 20882  63.53304 14.855492  65.21365 15.039074  191.2181 14.646427  51.39142 15.769538   77.8743  15.67886
                              "      034006501" "      034006501-8"  1 1 1 20880 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
                              "      034008301" "      034008301-7"  1 1 1 20881 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
                              "      034009201" "      034009201-4"  1 1 1 20881 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
                              "      034009201" "      034009201-5"  0 1 1 20881 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
                              end
                              format %tdDD/NN/CCYY interview
                              The regression I want to perform is what you described before:
                              My best attempt at reading your mind is that you have tp* variables where tp is suffixed by a date (as you described) and that you want to include an observation in the regression if the value of the variable interview is exactly four days earlier than one of those tp suffixes, and, if so, to use only that particular tp variable as a predictor.
                              but I do not want to change the number of observations.
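                              The thread ends without a reply to #15. One way to keep the number of observations unchanged might be to skip the reshape and instead copy, for each observation, the value of the one tp variable whose date suffix is exactly four days after interview. This is a sketch under that reading of the request, meant to be run with the example data from #15 in memory; the name matched_tp is hypothetical.

                              ```stata
                              * assumes the -dataex- example from #15 is in memory
                              local n_before = _N

                              gen double matched_tp = .
                              foreach v of varlist tp* {
                                  * the digits after "tp" are a Stata daily date
                                  local d = substr("`v'", 3, .)
                                  replace matched_tp = `v' if interview == `d' - 4
                              }

                              * the number of observations is unchanged
                              assert _N == `n_before'

                              * observations with no matching tp variable have a missing
                              * predictor and are dropped from the estimation sample
                              count if !missing(matched_tp)
                              if r(N) > 0 {
                                  logit sick improv_water toilet matched_tp
                              }
                              ```

                              In the example data of #15 no interview date falls four days before any of the suffix dates, so matched_tp stays missing throughout and the logit step is skipped; the real data would need at least some matches for the model to be estimated.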
