Running regressions in loop

Parvesh Seeballack

Join Date: May 2017

Posts: 41
#1

17 Jul 2017, 15:05

Stata Users

I have 1,324 observations, and I am trying to run a regression upon 16 variables.

I believe that I should have 21,184 regression outputs (i.e. 1,324 * 16).

However, when I run the loop, it gives me only 1,324 regression outputs.

The code I am using is as follows:

Code:

use evtstudydata, clear
egen obs = group (gtdid)
sort obs
local vars atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100
forvalues i = 1/1324 {
    preserve
    keep if obs == `i'
    reg car `var'
    restore
}

I'd be very grateful if you could kindly provide some insights on what I am doing wrong.

Thank you very much for the help.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

17 Jul 2017, 15:26

If you want to regress car on 16 different independent variables, one at a time, you will have to create a loop across the values of your local macro vars.

Code:

foreach var of local vars {
    reg car `var'
}

Right now, since the local macro var is undefined, your regression command is effectively

Code:

reg car

which regresses your dependent variable on only the constant term.
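The behaviour described here is easy to reproduce. The sketch below is only an illustration: it uses Stata's shipped auto dataset and the variables price, mpg and weight as stand-ins (none of these are in the original poster's data) to show that an undefined local macro expands to nothing, and that looping over the list fixes it.

```stata
* Illustration on Stata's shipped auto data (not the poster's dataset)
sysuse auto, clear
local vars mpg weight

* The local macro var has not been defined, so it expands to nothing
display "[`var']"

* This is therefore a constant-only regression: zero model degrees of freedom
quietly regress price `var'
local df_const = e(df_m)
display "model df without the loop: `df_const'"

* Looping over the list defines the macro on each pass
foreach var of local vars {
    quietly regress price `var'
    display "`var': b = " _b[`var'] "  model df = " e(df_m)
}
```

Stata does not raise an error for an undefined macro, which is why the original loop ran without complaint.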
Joe Canner

Join Date: Mar 2014

Posts: 580
#3

17 Jul 2017, 15:29

Parvesh,

If you are trying to generate a regression for each of the 16 variables separately, you would have to have another loop. I would also suggest a modification that will be more efficient than preserve/restore:

Code:

forvalues i=1/1324 {
    foreach var of varlist atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
        regress car `var' if obs==`i'
    }
}

Regards,
Joe
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#4

17 Jul 2017, 16:09

Yes, but all of this may miss something even bigger. -regress car `var' if obs == `i'- will carry out a regression only on those observations where obs == `i', which is one group defined by variable gtdid. If each such group consists of just a single observation, then the regressions will all fail because you cannot regress on a single observation (unless there are no predictors and it's constant only). More likely each group actually contains many observations, but in that case there aren't going to be 1,324 groups. So what is needed is:

Code:

summ obs
forvalues i = 1/`r(max)' {
    foreach var of varlist.....{
        regress car `var' if obs == `i'
    }
}

By the way, naming a group variable obs is not really a good idea: the name obs suggests that it is identifying individual observations in the data. That's confusing to somebody who is approaching the code fresh, as I am. And it will be equally confusing to you if you have to review this code in a few months after being away from it. It is better to give variables names that suggest what they really are.
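As a rough illustration of both points (loop only up to the actual number of groups, and use a descriptive name), here is a minimal sketch on Stata's shipped auto dataset. The names grp and ngroups, the grouping variable rep78, and the cutoff of 3 observations are all assumptions made for the example, not part of the original data.

```stata
* Illustration on Stata's shipped auto data (not the poster's dataset)
sysuse auto, clear

* grp is a hypothetical, descriptive name for the group identifier
egen grp = group(rep78)

* Find out how many groups there really are before looping
summarize grp, meanonly
local ngroups = r(max)
tabulate grp

forvalues i = 1/`ngroups' {
    * Skip groups too small to support a regression (cutoff is arbitrary)
    quietly count if grp == `i'
    if r(N) < 3 {
        display "group `i': only " r(N) " observations, skipped"
        continue
    }
    quietly regress price mpg if grp == `i'
    display "group `i': b = " _b[mpg] "  N = " e(N)
}
```

Tabulating the group variable first makes it obvious whether each group holds one observation or many.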

Last edited by Clyde Schechter; 17 Jul 2017, 16:15.
Parvesh Seeballack

Join Date: May 2017

Posts: 41
#5

17 Jul 2017, 17:53

Hi guys

Thanks for the prompt response.

Well, I have conducted an event study on 1,324 events and their effect on 16 stock market indices.

I have calculated the CAR and now I wish to regress these CAR on specific variables.

My dataset is in long format and as follows:

As you can see, the first column lists the CARs: there are 16 CARs for the same date, i.e. one for each of my 16 stock indices. This goes on, and I have 1,324 events in total.

So my Y is the CAR and the X is the columns starting from ATX and so on.

The regression I wanted to run was, for example, the CARs on 04 Jan 2005 (16 CARs, one for each index) regressed on each of the columns (16 rows per date) from ATX onwards.

And loop the same process for all of my events in my sample.

Robert Picard

Join Date: Mar 2014
Posts: 1536
#6

17 Jul 2017, 19:01

Please follow the advice in the FAQ on screenshots and how to post data examples.

I posted a reply before noticing that, within each event, every X variable is constant. Unless I'm missing something, you have to fit the regression without a constant term in this case. With just one independent variable, this reduces to taking the mean of car per event and dividing it by that X.
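The reduction follows from the usual no-constant least-squares formula. In the notation below (introduced here only for illustration), y_i are the n = 16 values of car within one event and x is the common, nonzero value of the regressor within that event:

```latex
\hat{\beta}
  = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^{2}}
  = \frac{x \sum_{i=1}^{n} y_i}{n x^{2}}
  = \frac{\bar{y}}{x},
  \qquad x_i = x \neq 0 \ \text{for all } i .
```

If a constant term were included, a regressor that does not vary within the event would be collinear with it, so the slope would not be identified.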

Code:

* create fake data
clear all
set seed 123321
set obs 1324
gen event_date = mdy(1,4,2005) + (_n-1) * 7
format %td event_date
foreach v in atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
    gen `v' = runiform()
}
expand 16
bysort event_date: gen Index = _n
bysort event_date: gen car = runiform()


* By event, regress car on each X vars
bysort event_date: egen car_mean = mean(car)
foreach v of varlist atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
    gen b_`v' = car_mean / `v'
}

* spot check results
reg car atx if event_date == mdy(1,4,2005), nocons
list b_atx if event_date == mdy(1,4,2005)

reg car bel20 if event_date == mdy(1,11,2005), nocons
list b_bel20 if event_date == mdy(1,11,2005)

and the spot check results

Code:

. * spot check results
. reg car atx if event_date == mdy(1,4,2005), nocons

      Source |       SS           df       MS      Number of obs   =        16
-------------+----------------------------------   F(1, 15)        =     64.55
       Model |  4.83779567         1  4.83779567   Prob > F        =    0.0000
    Residual |  1.12411439        15   .07494096   R-squared       =    0.8115
-------------+----------------------------------   Adj R-squared   =    0.7989
       Total |  5.96191007        16  .372619379   Root MSE        =    .27375

------------------------------------------------------------------------------
         car |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         atx |   2.609326   .3247612     8.03   0.000     1.917114    3.301538
------------------------------------------------------------------------------

. list b_atx if event_date == mdy(1,4,2005)

       +----------+
       |    b_atx |
       |----------|
    1. | 2.609326 |
    2. | 2.609326 |
    3. | 2.609326 |
    4. | 2.609326 |
    5. | 2.609326 |
       |----------|
    6. | 2.609326 |
    7. | 2.609326 |
    8. | 2.609326 |
    9. | 2.609326 |
   10. | 2.609326 |
       |----------|
   11. | 2.609326 |
   12. | 2.609326 |
   13. | 2.609326 |
   14. | 2.609326 |
   15. | 2.609326 |
       |----------|
   16. | 2.609326 |
       +----------+

. 
. reg car bel20 if event_date == mdy(1,11,2005), nocons

      Source |       SS           df       MS      Number of obs   =        16
-------------+----------------------------------   F(1, 15)        =    103.86
       Model |  6.60277261         1  6.60277261   Prob > F        =    0.0000
    Residual |  .953594935        15  .063572996   R-squared       =    0.8738
-------------+----------------------------------   Adj R-squared   =    0.8654
       Total |  7.55636754        16  .472272971   Root MSE        =    .25214

------------------------------------------------------------------------------
         car |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bel20 |   3.704455   .3634943    10.19   0.000     2.929686    4.479225
------------------------------------------------------------------------------

. list b_bel20 if event_date == mdy(1,11,2005)

       +----------+
       |  b_bel20 |
       |----------|
   17. | 3.704455 |
   18. | 3.704455 |
   19. | 3.704455 |
   20. | 3.704455 |
   21. | 3.704455 |
       |----------|
   22. | 3.704455 |
   23. | 3.704455 |
   24. | 3.704455 |
   25. | 3.704455 |
   26. | 3.704455 |
       |----------|
   27. | 3.704455 |
   28. | 3.704455 |
   29. | 3.704455 |
   30. | 3.704455 |
   31. | 3.704455 |
       |----------|
   32. | 3.704455 |
       +----------+

.

Last edited by Robert Picard; 17 Jul 2017, 19:34.


Ana Siqueira

Join Date: Jul 2017

Posts: 2
#7

17 Jul 2017, 21:04

Hi

I have a sample of firms from 10 industries for 20 years.

I need to run a regression in a loop by year (for one model) and by industry (for another model), and I need to save the residuals of these regressions (for one model) and the coefficients of these regressions (for the other model).

Can anyone help me with this Stata command?

I would appreciate it.

Ana Siqueira
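One possible way to approach this request is sketched below on simulated data. Everything here is an assumption for illustration: the variable names y, x, year and industry, the single-predictor model, and the names of the result variables resid, b_x and b_cons.

```stata
* Simulated firm data with hypothetical variable names
clear all
set seed 2017
set obs 2000
gen industry = ceil(10 * runiform())
gen year = 1997 + ceil(20 * runiform())
gen x = rnormal()
gen y = 1 + 0.5 * x + rnormal()

* Model 1: one regression per year, saving the residuals
gen double resid = .
levelsof year, local(years)
foreach yr of local years {
    quietly regress y x if year == `yr'
    quietly predict double tmp_resid if e(sample), residuals
    quietly replace resid = tmp_resid if e(sample)
    drop tmp_resid
}

* Model 2: one regression per industry, saving the coefficients
gen double b_x = .
gen double b_cons = .
levelsof industry, local(inds)
foreach k of local inds {
    quietly regress y x if industry == `k'
    quietly replace b_x = _b[x] if e(sample)
    quietly replace b_cons = _b[_cons] if e(sample)
}
```

Restricting each replace to e(sample) keeps the saved values aligned with the observations actually used in that regression.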
Parvesh Seeballack

Join Date: May 2017

Posts: 41
#8

19 Jul 2017, 04:41

Hi Robert

Thanks for the reply and for pointing out that the X was constant for each event.

In fact I made a massive mistake when merging my files to construct my panel data for the regressions. The X shouldn't have been constant!

I have now corrected it and using all the advice I got on this post, I have been able to get my regressions!

A massive thanks to William, Joe, Clyde and Robert for your help. I highly appreciate it guys.

Regards
Parvesh
Parvesh Seeballack

Join Date: May 2017

Posts: 41
#9

19 Jul 2017, 07:26

Hi guys

New problem!

When I am exporting my results using the outreg2 function, I end up with the coefficients and standard errors only.

Is there a way to get the t-stat and significance level as well?

The code I'm using is:

Code:

use Model_1, clear
egen obs = group (gtdid)
summ obs
sort event_date obs
forvalues i = 1/5 {
    regress car LIQ if obs == `i'
    outreg2 using 6dayresults.xls, append
}

Thanks.

Parvesh
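Independently of outreg2, the t statistic and p-value can be computed from results that regress leaves behind and collected with official Stata commands. The sketch below is an assumption-laden illustration, not the poster's setup: it uses the shipped auto dataset, a hypothetical group variable grp, and a hypothetical output file reg_results.

```stata
* Illustration on Stata's shipped auto data (not the poster's dataset)
sysuse auto, clear
egen grp = group(foreign)

* Collect one row of results per regression in a new dataset
tempname results
postfile `results' grp b se t p using reg_results, replace
summarize grp, meanonly
forvalues i = 1/`r(max)' {
    quietly regress price mpg if grp == `i'
    local t = _b[mpg] / _se[mpg]
    * Two-sided p-value from the t distribution with residual df
    local p = 2 * ttail(e(df_r), abs(`t'))
    post `results' (`i') (_b[mpg]) (_se[mpg]) (`t') (`p')
}
postclose `results'

* Inspect the collected results and write them to a spreadsheet
use reg_results, clear
list
export excel using reg_results.xlsx, firstrow(variables) replace
```

The resulting dataset has one row per group with the coefficient, standard error, t statistic and p-value side by side.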
Nick Cox

Join Date: Mar 2014

Posts: 35444
#10

19 Jul 2017, 07:55

#9 is a new problem, as said, with the user-written outreg2 command (not function!). That is from SSC, as you are asked to explain (FAQ Advice #12).

outreg2 is a popular download but its author is not a member here and questions on it are often not answered, partly because it is used by few of the most active people here who answer lots of questions.

Regardless of that, I suggest starting a new thread flagging outreg2 in the title. That's the best way to try to catch attention from people who use it (as implied, not including me).

Robert Picard

Join Date: Mar 2014
Posts: 1536

#11

19 Jul 2017, 08:01

re #8, glad you found a mistake because what you were trying to do did not make much sense in my mind. I'll repost my original solution that shows how to perform all these regressions efficiently using rangestat (from SSC). The whole thing runs in less than 2 seconds on my computer.

Code:

* create fake data
clear all
set seed 123321
set obs 1324
gen event_date = mdy(1,4,2005) + (_n-1) * 7
format %td event_date
expand 16
bysort event_date: gen Index = _n
foreach v in car atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
    gen `v' = runiform()
}

* regress car per event with a bunch of independent variables
foreach v of varlist atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100 {
 rangestat (reg) car `v', interval(event_date 0 0)
 rename reg_nobs n_`v'
 rename b_cons a_`v'
 drop reg_r2 reg_adj_r2 se_`v' se_cons
}

* spot check results
reg car atx if event_date == mdy(1,4,2005)
list n_atx b_atx a_atx if event_date == mdy(1,4,2005)

reg car bel20 if event_date == mdy(1,11,2005)
list n_bel20 b_bel20 a_bel20 if event_date == mdy(1,11,2005)
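For readers who prefer to stay with official Stata commands, a similar per-event regression can be sketched with statsby. This is a swapped-in alternative, not the rangestat approach shown above, and it will generally be slower. The fake data below use only 50 events and a single regressor to keep the example short; those choices are assumptions for illustration.

```stata
* create fake data, smaller than in the example above
clear all
set seed 123321
set obs 50
gen event_date = mdy(1,4,2005) + (_n-1) * 7
format %td event_date
expand 16
gen car = runiform()
gen atx = runiform()

* one regression per event; the results replace the data in memory
statsby _b _se, by(event_date) clear: regress car atx

* t statistic and two-sided p-value (16 observations - 2 parameters = 14 df)
gen t_atx = _b_atx / _se_atx
gen p_atx = 2 * ttail(14, abs(t_atx))
list in 1/5
```

After statsby with the clear option, the data in memory hold one row per event, with coefficients in _b_* and standard errors in _se_*.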


Parvesh Seeballack

Join Date: May 2017
Posts: 41

#12

20 Jul 2017, 07:52

Hi guys.

Something related to my regressions... and I'd be very grateful if you could help.

I am running my event study using the following code:

Code:

clear
capture cd "XXX"
set obs 1
g fake = .
save evday_car, replace
* cleaning events file
import delimited using GTD.csv, clear
drop city perpetrator1 guncertain1 perpetrator2 guncertain2 perpetrator3 guncertain3 targettype1 targettype2 targettype3 region attacktype1 attacktype2 attacktype3 weapontype1 weapontype2 weapontype3 weapontype4
rename date date_string
g date = date(date_string,"DMY")
format date %td
sort date
drop date_string
g date_id = _n
tsset date_id
* Drop events occurring on non-trading days
gen dow = dow(date)
drop if dow(date)==0 | dow(date)==6
drop dow
sort date
rename date event_date
g nnn = 1
g obs = _n
save eventsdates, replace
* Calculating market returns using SP500 as proxy for market portfolio
import delimited using sp500.csv, clear
rename date date_string
rename sp500 market
generate date = date(date_string,"DMY")
format date %td
sort date
g date_id = _n
keep market date_id date
drop if market==.
tsset date_id
generate returnmarket = ln(market) - ln(L.market)
sort date
order date, first
save marketret, replace
* Calculating indices returns and merging with market returns file
import delimited using indices.csv, clear
rename date date_string
generate date = date(date_string,"DMY")
format date %td
sort date
drop date_string
g date_id = _n
tsset date_id
local vars atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100
foreach var of local vars {
gen return_`var' = ln(`var') - ln(L.`var')
}
sort date
merge 1:1 date using marketret
drop _merge market
sort date
g nnn = 1
drop atx bel20 omxc20 omxh cac40 dax30 athex iseq mib aex obx psi20 ibex35 omxs30 smi ftse100
save allreturns, replace
* merging events file with returns file
use eventsdates, clear
drop date_id
forvalues i = 1320/1324 {
preserve
keep if obs == `i'
joinby nnn using allreturns
sort date
drop date_id
g date_id = _n
gen day_cnt = date_id
gen target_day = day_cnt if date==event_date
egen max_target_day = max(target_day)
gen evday = day_cnt-max_target_day
drop day_cnt target_day max_target_day
sort evday
gen evt_window=1 if evday>=0 & evday<=6
gen est_window=1 if evday<=-11 & evday>=-30
drop if evt_window==. & est_window==.
foreach var of local vars {
reg return_`var' returnmarket if est_window==1
estimates store ols_dum
gen rmse_`var' = e(rmse)
predict phat_`var'
gen ar_`var' = return_`var' - phat_`var' if evt_window==1
drop phat_`var'
}
drop if evt_window==.
drop est_window nnn
***************************************************
*Display the CAR and its Test Statistic
foreach var of local vars {
egen car_`var' = sum(ar_`var')
gen tstat_`var' = car_`var'/(rmse_`var'*sqrt(_N))
drop return_`var' rmse_`var' ar_`var'
}
drop returnmarket date_id evday evt_window date
* DO EVENT analysis, generate CAR in 1/1
keep in 1/1
append using evday_car
save evday_car, replace
restore
}
use evday_car, clear
order event_date, first
sort event_date

I'd like to:

1) put a dummy variable in my regressions such that it identifies an event and assign a value of 0 on event date and 1 otherwise.
2) have several dummy variables for each of the event date in my sample.

Any ideas how I should proceed with that?

The use of the dummy variable is to help me identify which events have a significant impact on a sample of 16 stock market indices.

A summary of dataset: 1) Returns and market return data - daily observations from Jan 04 to Dec 16 & 2) Events list - contains 1,324 events

Thank you for your help.

Regards Parvesh

Last edited by Parvesh Seeballack; 20 Jul 2017, 08:04.
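A minimal sketch of how such dummies could be built is given below. It is only a guess at what is wanted: the daily data are simulated, the three event dates are invented, and the variable names ret, notev1 to notev3 and not_event are hypothetical. The coding (0 on the event date, 1 otherwise) follows the request as stated.

```stata
* Simulated daily returns: 100 consecutive days (hypothetical data)
clear all
set seed 42
set obs 100
gen date = mdy(1,3,2005) + _n - 1
format date %td
gen ret = rnormal(0, 0.01)

* Three invented event dates, stored as Stata daily dates
local events `=mdy(1,10,2005)' `=mdy(2,14,2005)' `=mdy(3,21,2005)'

* One dummy per event: 0 on that event date, 1 otherwise
local k = 0
foreach d of local events {
    local ++k
    gen byte notev`k' = (date != `d')
}

* A single dummy: 0 on any event date, 1 otherwise
egen byte not_event = rowmin(notev*)

* The event-specific dummies can then enter a regression together
regress ret notev1 notev2 notev3
```

With 1,324 events, building the list of dates from the events file rather than typing it would be the natural extension.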


Brian Yalle

Join Date: Nov 2015
Posts: 22

#13

31 Jan 2021, 13:48

Hi!
I'm using Stata 16 and I want to know the code to perform a specific logit regression. I spent a while looking for related posts or answers, and this thread is the closest to my problem that I found.
First, my data is:

Code:

clear
input float(sick improv_water toilet interview tp20852 t2m20852 tp20853 t2m20853 tp20854 t2m20854)
0 1 1 20968  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
0 1 1 20968  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
0 1 1 20966  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
. 1 1 20966  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
. 1 1 20966  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
1 1 1 20938  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
0 1 1 20938  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
0 1 1 20965  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
1 1 1 20939  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
0 1 1 20938  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
1 1 1 20964  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
0 1 1 20964  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
1 1 1 20936  75.77503 15.405326  92.61058 15.548793  55.14777 15.510856
1 1 1 20926  77.49091  20.90748 24.808275 21.219606 37.740124  20.88153
. 1 1 20978  85.28074  21.28524 20.338476 21.566307 14.748202  21.32637
0 1 1 20978  85.28074  21.28524 20.338476 21.566307 14.748202  21.32637
end
format %tdDD/NN/CCYY interview

There are variables that share the common prefixes "tp" and "t2m", followed by five numeric digits, which are dates expressed (as I understand it) in Stata's date format. So the digits after "tp" or "t2m" are dates.
The variable interview holds date values.

I want to run a regression with the following characteristics:

logit sick improv_water toilet tp* if interview == * - (4 days ago)

Where: * is the date

It is possible that existing posts have already answered my query, but I have not found them yet. If you know of one, please send me the link or help me resolve my query.

Any help will be appreciated


Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#14

31 Jan 2021, 14:02

It's pretty unclear what you mean by your command and the remark "where * is the date." My best attempt at reading your mind is that you have tp* variables where tp is suffixed by a date (as you described) and that you want to include an observation in the regression if the value of the variable interview is exactly four days earlier than one of those tp suffixes, and, if so, to use only that particular tp variable as a predictor.

If that's what you want to do, then:

Code:

gen long obs_no = _n
reshape long tp t2m, i(obs_no) j(the_date)
format the_date %tdDD/NN/CCYY
logit sick improv_water toilet tp if interview == the_date - 4

Note that in your example data, the interview date is never actually four days earlier than any of those suffixed dates in the tp* variables, so this code just exits with a "no observations" error message. Hopefully, that is not the case in your real data.

Brian Yalle

Join Date: Nov 2015
Posts: 22

#15

31 Jan 2021, 16:15

Clyde,
thanks for your answer. The code runs well, but unfortunately when the data is reshaped the number of observations increases (it is tripled).

Further details of data:

It's survey data; in this case, it provides information about an individual within a household.
Now, the sample of tp* variables includes dates as suffixes, which also occur as values of interview.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 household_key str18 member_key float(sick improv_water toilet interview tp20880 t2m20880 tp20881 t2m20881 tp20882 t2m20882 tp21170 t2m21170 tp21171 t2m21171)
"      000500401" "      000500401-4"  0 1 1 20880  63.53304 14.855492  65.21365 15.039074  191.2181 14.646427  51.39142 15.769538   77.8743  15.67886
"      000510101" "      000510101-5"  0 1 1 20881  63.53304 14.855492  65.21365 15.039074  191.2181 14.646427  51.39142 15.769538   77.8743  15.67886
"      000511001" "      000511001-3"  0 1 1 20882  63.53304 14.855492  65.21365 15.039074  191.2181 14.646427  51.39142 15.769538   77.8743  15.67886
"      034006501" "      034006501-8"  1 1 1 20880 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
"      034008301" "      034008301-7"  1 1 1 20881 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
"      034009201" "      034009201-4"  1 1 1 20881 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
"      034009201" "      034009201-5"  0 1 1 20881 .00590086  25.71922 1.0614395 26.345863  .5013794 26.261665  2.453223 19.083393 .46007335  19.35805
end
format %tdDD/NN/CCYY interview

The regression I want to perform is what you mentioned before:

My best attempt at reading your mind is that you have tp* variables where tp is suffixed by a date (as you described) and that you want to include an observation in the regression if the value of the variable interview is exactly four days earlier than one of those tp suffixes, and, if so, to use only that particular tp variable as a predictor.

but, I do not want to change the number of observations.
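One way to keep the data in wide form, so that the number of observations does not change, is to loop over the tp* variables and copy the matching value into a single new variable. The sketch below is a hedged illustration on simulated data: the variable name matched_tp is hypothetical, the tp suffixes are invented so that matches exist, and the four-day offset follows the reading of the question given in #14.

```stata
* Simulated survey data with hypothetical values
clear all
set seed 2021
set obs 300
gen interview = 20880 + mod(_n, 3)
format interview %tdDD/NN/CCYY
gen improv_water = runiform() < 0.5
gen toilet = runiform() < 0.5
gen sick = runiform() < 0.3
forvalues d = 20884/20886 {
    gen tp`d' = 100 * runiform()
}

* Pick, for each observation, the tp value whose date suffix is
* exactly four days after the interview date
gen double matched_tp = .
foreach v of varlist tp* {
    * the characters after the "tp" prefix are the date
    local d = substr("`v'", 3, .)
    quietly replace matched_tp = `v' if interview == `d' - 4
}

* The data stay wide: one row per person, one predictor column
logit sick improv_water toilet matched_tp
```

Observations with no matching tp variable keep a missing matched_tp and are dropped from the estimation sample by logit, but remain in the dataset.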
