Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Regression Double FOR Loop Error

    Hi all,

    I have a table which contains a date column, company name and a few other variables (see below). What I need to achieve is a regression for each year-company name level between year 2000 and 2007 (inclusive). For example, Micrografx-2000, and then Micrografx-2001 etc.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long date float year double(spindx sprtrn) str36 COMNAM double RET
    14612 2000 1455.22   -.009549090000000001 "MICROGRAFX INC"                     -.02142857201397419
    14613 2000 1399.42             -.03834472 "GEOWORKS CORP"                      -.12820513546466827
    14614 2000 1402.11             .001922225 "MUNIVEST NEW JERSEY FUND INC"                         0
    14615 2000 1403.45   .0009557025000000001 "FAIR ISAAC & CO INC"               -.027499999850988388
    14616 2000 1441.47              .02709038 "VIASOFT INC"                        .010309278033673763
    14619 2000  1457.6              .01118997 "WESTERFED FINANCIAL CORP"           .008547008968889713
    14620 2000 1438.56             -.01306257 "1ST SOURCE CORP"                   -.028795812278985977
    14621 2000 1432.25            -.004386331 "U S WIRELESS CORP"                              .078125
    14622 2000 1449.68              .01216966 "WESTPAC BANKING CORP"               .018656715750694275
    14623 2000 1465.15              .01067132 "SANTA ISABEL S A"                   .032258063554763794
    14627 2000 1455.14            -.006832065 "OAKLEY INC"                                           0
    14628 2000  1455.9            .0005222865 "AMGEN INC"                          .042533937841653824
    14629 2000 1445.57            -.007095268 "TOFUTTI BRANDS INC"                                   0
    14630 2000 1441.36            -.002912346 "GLENAYRE TECHNOLOGIES INC"          -.02150537632405758
    14633 2000 1401.53             -.02763362 "CORTECH INC"                         .08510638028383255
    14634 2000 1410.03    .006064801000000001 "PUTNAM PREMIER INCOME TRUST"                          0
    14635 2000 1404.09   -.004212677000000001 "NATIONAL STEEL CORP"                -.12389380484819412
    14636 2000 1398.56            -.003938494 "CHAMPIONS SPORTS INC"                                .b
    14637 2000 1360.16             -.02745681 "ENVIRONMENTAL SAFEGUARDS INC"        -.0416666679084301
    14640 2000 1394.46              .02521762 "SPECTRUM SIGNAL PROCESSING INC"     -.03896103799343109
    14641 2000 1409.28              .01062777 "WESCO INTERNATIONAL INC"           -.007462686393409967
    14642 2000 1409.12           -.0001135332 "SUMMIT FINANCIAL CORP"                                0
    14643 2000 1424.97              .01124815 "M C N ENERGY GROUP INC"              .02469135820865631
    14644 2000 1424.37 -.00042106150000000003 "JEFFERSON SAVINGS BANCORP INC"      .005494505632668734
    14647 2000 1424.24          -.00009126842 "HEALTH MANAGEMENT ASSOCIATES INC" -.0043290043249726295
    14648 2000 1441.72              .01227321 "CROWN CRAFTS INC"                   -.02777777798473835
    14649 2000  1411.7             -.02082235 "E Z EM INC"                                           0
    14650 2000 1416.83   .0036339170000000004 "KEY TECHNOLOGY INC"                -.012820512987673283
    14651 2000 1387.12             -.02096935 "LOUISIANA PACIFIC CORP"           -.0052083334885537624
    14654 2000 1389.94   .0020329890000000002 "GILEAD SCIENCES INC"                 .07374301552772522
    14655 2000 1402.05             .008712607 "PROGRESSIVE CORP OH"                 .07314148545265198
    14656 2000 1387.67             -.01025641 "UNIVERSAL MFG CO"                  -.016949152573943138
    14657 2000 1388.26            .0004251731 "P S GROUP HOLDINGS INC"             -.01104972418397665
    14658 2000 1346.09             -.03037615 "INNSUITES HOSPITALITY TRUST"        -.02857142873108387
    14662 2000 1352.17    .004516785000000001 "GENERAL MOTORS CORP"              -.0016949152341112494
    14663 2000 1360.69   .0063009830000000005 "I T GROUP INC"                       .08474576473236084
    14664 2000 1353.43            -.005335528 "VESTAUR SECURITIES INC"             .005076142027974129
    14665 2000 1333.36             -.01482899 "DREYFUS MUNICIPAL INCOME INC"                         0
    14668 2000 1348.05              .01101728 "MERCHANTS NEW YORK BANCORP INC"     .003846153849735856
    14669 2000 1366.42              .01362709 "F F Y FINANCIAL CORP"              -.011363636702299118
    14670 2000 1379.19    .009345589000000001 "PACIFIC AMERICAN INCOME SHS INC"    .005235602147877216
    14671 2000 1381.76   .0018634130000000002 "AUSTRIA FUND INC"                      .043824702501297
    14672 2000 1409.17              .01983702 "STARWOOD HOTELS & REST WLDWD INC"  -.034383952617645264
    14675 2000 1391.28             -.01269542 "YOCREAM INTERNATIONAL INC"         -.043478261679410934
    14676 2000 1355.62             -.02563107 "ARVIN INDUSTRIES INC"               .006734006572514772
    14677 2000  1366.7             .008173382 "UNION FINANCIAL BANCSHARES INC"                       0
    14678 2000 1401.69              .02560182 "CORPORATE HIGH YIELD FD INC"       -.006666666828095913
    14679 2000 1395.07  -.0047228700000000005 "FORT DEARBORN INCOME SECS INC"      .004975124262273312
    14682 2000 1383.62   -.008207474000000001 "MEDICORE INC"                      -.014705882407724857
    14683 2000 1359.15             -.01768549 "A M R CORP DEL"                     .008724099956452847
    14684 2000 1392.15              .02427988 "AMERICAN INDUSTRIAL PPYS REIT NE"   -.02247191034257412
    14685 2000 1458.47              .04763854 "HYBRIDON INC"                                        .b
    14686 2000 1464.47               .0041139 "SUPERIOR ENERGY SERVICES INC"         .0317460335791111
    14689 2000 1456.63            -.005353473 "COMMUNITY BANK SYSTEM INC"         .0030487803742289543
    14690 2000 1493.87              .02556586 "MISSION WEST PPTYS INC MD"         -.013888888992369175
    14691 2000 1500.64    .004531853000000001 "BLACKROCK HIGH YIELD TRUST"         .005050505045801401
    14692 2000 1527.35              .01779907 "DIAGEO PLC"                        -.004444444552063942
    14693 2000 1527.46  .00007202015999999999 "I C O INC NEW"                       .03846153989434242
    14696 2000 1523.86  -.0023568540000000002 "L H S GROUP INC"                   -.014066495932638645
    14697 2000 1507.73   -.010584960000000001 "ATCHISON CASTING CORP"             -.024793388321995735
    14698 2000 1508.52            .0005239665 "ULTIMATE SOFTWARE GROUP"            -.05325443670153618
    14699 2000 1487.92   -.013655770000000001 "AMERICAN STRATEGIC INCOME PT II"    .005780346691608429
    14700 2000 1498.58             .007164364 "CELGENE CORP"                        .14193548262119293
    14703 2000 1505.97             .004931335 "F F P MARKETING CO"                                   0
    14704 2000 1494.73   -.007463628000000001 "GLOBAL INCOME FUND INC"                               0
    14705 2000 1487.37            -.004923966 "HAWKER PACIFIC AEROSPACE"           .007874015718698502
    14706 2000 1501.34             .009392417 "MESA AIR GROUP INC NEV"             .030612245202064514
    14707 2000 1516.35             .009997736 "LUCOR INC"                          -.05263157933950424
    14710 2000 1504.46            -.007841198 "RUSSELL CORP"                      -.007968127727508545
    14711 2000 1500.59            -.002572352 "INTERNATIONAL BUSINESS MACHS COR"  -.025588536635041237
    14712 2000 1467.17             -.02227124 "DUKE WEEKS REALTY CORP"              .01277955248951912
    14713 2000 1440.51             -.01817104 "PERMA FIX ENVIRONMENTAL SVCS INC"  -.043478261679410934
    14714 2000 1356.56             -.05827797 "LINC CAPITAL INC"                   -.15000000596046448
    14717 2000 1401.44              .03308368 "CYMER INC"                           .08746355772018433
    14718 2000 1441.61              .02866337 "QUIXOTE CORP"                        .05649717524647713
    14719 2000 1427.47            -.009808478 "VISTA BANCORP INC"                  .004219409078359604
    14720 2000 1434.54    .004952819000000001 "IVEX PACKAGING CORP DEL"                              0
    14724 2000 1429.86             -.00326237 "WESTPAC BANKING CORP"              .0058139534667134285
    14725 2000 1477.44    .033275980000000004 "UNITED HERITAGE CORP"                -.0833333358168602
    14726 2000 1460.99             -.01113412 "LIGHTBRIDGE INC"                    -.10211267322301865
    14727 2000 1464.92             .002689957 "N N BALL & ROLLER INC"              -.01923076994717121
    14728 2000 1452.43            -.008526063 "CINEMASTAR LUXURY THEATERS INC"      .03703703731298447
    14731 2000 1468.25              .01089209 "FORT DEARBORN INCOME SECS INC"                        0
    14732 2000 1446.29             -.01495658 "NATIONAL GOLF PROPERTIES INC"        .01607717014849186
    14733 2000  1415.1             -.02156552 "PENTEGRA DENTAL GROUP INC"          -.05000000074505806
    14734 2000 1409.57            -.003907851 "MARTIN INDUSTRIES INC"              -.10204081982374191
    14735 2000 1432.63    .016359600000000002 "PERMA FIX ENVIRONMENTAL SVCS INC"   -.04444444552063942
    14738 2000 1424.17            -.005905223 "DALLAS SEMICONDUCTOR CORP"          -.06355932354927063
    14739 2000 1412.14            -.008447025 "TYLER TECHNOLOGIES INC"             -.10256410390138626
    14740 2000 1383.05             -.02059994 "ALBERTSONS INC"                      .01876172609627247
    14741 2000 1407.81    .017902460000000002 "CHESAPEAKE BIOLOGICAL LABS INC"                       0
    14742 2000 1420.96             .009340749 "I N G GROEP N V"                    .015819208696484566
    14745 2000 1452.36              .02209774 "REPTRON ELECTRONICS INC"                              0
    14746 2000 1466.04    .009419153000000001 "QLOGIC CORP"                      -.0009930486558005214
    14747 2000  1447.8             -.01244168 "KEMPER STRATEGIC INCOME TRUST"     -.004524887073785067
    14748 2000 1437.21   -.007314546000000001 "BERGSTROM CAPITAL CORP"             .006713176146149635
    14749 2000 1406.95             -.02105468 "ELITE PHARMACEUTICALS INC"         -.032786883413791656
    14752 2000 1400.72   -.004428018000000001 "POPE RESOURCES"                     -.04117647185921669
    14753 2000 1373.86             -.01917585 "RAMCO GERSHENSON PROPERTIES TR"      -.0357142873108387
    14754 2001 1399.05               .0183352 "HERBALIFE INTERNATIONAL INC"                          0
    end
    format %td date
    What I am thinking is probably a double for loop. The top loop going from year 2000 to 2007, and the second loop going through each company name in the variable for that year. Is this possible? I did this myself - while I am able to iterate through the years, I am struggling to iterate through each company name variables. Am I doing anything wrongly?

    In addition, I am also trying to append this value as a new variable in the dataset for each row.

    Code:
    forval y= 2000/2007 & foreach x of varlist COMNAM {
    reg RET sprtrn if year==`y' & COMNAM== `x'
    }
    Thank you!


    Last edited by Kye Reen; 26 Jul 2021, 12:59.

  • #2
    Possible yes, but only three companies in 2000 have repeated cases (2 each) and all the rest appeared only once. Wouldn't that cause a problem to fit a regression to a subset with just 1 or 2 cases?

    Comment


    • #3
      Hi Ken, thanks for replying! The data above is just a small sample of my dataset. The original dataset has more that 14,000,000 observations.

      How would you suggest I write the for loop code?

      Are there anything wrong with the code I wrote above?

      Thanks again!

      Comment


      • #4
        Looping over levels of categorical variables (year, company name) is seldom appropriate in Stata. When you are thinking of doing that, you should always consider alternatives that are usually better: -runby-, or in this case perhaps -rangestat-.

        -runby- and -rangestat- are community contributed commands, available from SSC. -rangestat- will be simpler to use for this application if all you need to do is get the regression coefficients, standard errors, number of observations in the estimation sample, and R2. If you need to do some other things with the regression results, then it might (or might not) be easier to use -runby- which offers more flexibility at the expense of being a bit more demanding to work with.

        I suggest you check out the help files for -runby- and -rangestat- to see which of these might be better. Nested loops, as you proposed, are a possible approach*, but they are very slow in a data set with over 14,000,000 observations. If you want the results during your lifetime, that's probably not a good bet. :-)

        *Conceptually--the code actually shown in #1 will produce only syntax errors.
        Last edited by Clyde Schechter; 26 Jul 2021, 14:45.

        Comment


        • #5
          Hi Clyde, I tried using -rangestat- but it says "no result for all obs: reg RET sprtrn varlist required". This is the code I used:

          Code:
          rangestat (reg) RET sprtrn, interval(year 2000 2007) by(COMNAM)
          As I am new to this, could you help review where I did wrong? I would really appreciate it.

          Thank you!

          Comment


          • #6
            What the error message is telling you is that there are no COMNAMs in your data set that have enough observations to support a regression. That is certainly true in your example data, where there are only 3 COMNAMs that appear more than once, and each of those appears only twice. But with two observations, -rangestat- will not perform a regression. You need more data per COMNAM. Have you tried running this on your full data set. When I used your example data and just randomly changed the names of some observations to create COMNAM's with more observations, it runs without an error message.

            That said, your command will not get you what you wanted. You appear to be misunderstanding the -interval()- option in -rangestat-. It does not provide a range of values to iterate over for separate regressions. It restricts the range of values within which the regressions will be carried out. Since you want each company to have a separate regression in each year, you have to put year into the -by()- option along with COMNAM. The question remains, then, what to put in the -interval()- option. You don't really want to restrict the range of anything here: you want all observations for a given combination of COMNAM and year to be included in that combination's regression. That is where the use of missing values in the -interval()- option comes into play. So I think the command that will work for you, if you have sufficient observations to support regressions in it, is:

            Code:
            rangestat (reg) RET sprtrn, by(COMNAM year) interval(year . .)

            Comment


            • #7
              Hi Clyde, it works now! Thank you so much!

              Comment


              • #8
                Sorry, I have a new problem regarding this. I am now running stata in a text-mode on Unix, which I connect to the server using an SSH program such as SSH Secure Shell. It works the same as Windows Stata.The only difference is that there is no variables window, review window, or any of the menus.

                The problem I have right now is that when I tried using the code in #6, it says that "command rangestat is unrecognized". I thought maybe this is because I have not installed rangestat yet. However, when I tried installing, this happens:

                Code:
                ssc install rangestat
                
                remote connection failed -- see help r(677) for troubleshooting
                http://fmwww.bc.edu/repec/bocode/r/ either
                1)  is not a valid URL, or
                2)  could not be contacted, or
                3)  is not a Stata download site (has no stata.toc file).
                r(677);

                Comment


                • #9
                  The most likely explanation for this is that the SSC website was down or busy at the time. Try again later.

                  If the problem occurs repeatedly over the course of many hours, then it may be that you lack privileges to access that website on that machine/network configuration. You may have to ask your system administrator to do the installation for you, or upgrade your access privileges. In the course of my own work, I run Stata on servers on networks provided by several different organizations, each with its own policies. On some of them, I can do anything I want. On some, I can install ados from SSC and similar sites but not run Stata updates. On some I can't even access the internet at all.
                  Last edited by Clyde Schechter; 28 Jul 2021, 10:42.

                  Comment


                  • #10
                    I was told to use loops since both -rangestat- and -runby- can't be used in this case. Since the code in #1 would result in a syntax error, what are the codes for loops would your recommend?

                    Comment


                    • #11
                      Code:
                      gen n_obs = .
                      gen b_sprtrn = .
                      gen se_sprtrn = .
                      gen r2 = .
                      gen int error = .
                      quietly levelsof COMNAM, local(companies)
                      foreach c of local companies {
                          quietly levelsof year if COMNAM == `"`c'"', local(years)
                          foreach y of local years {
                              capture regress RET sprtrn if year == `y' & COMNAM == `"`c'"'
                              if c(rc) == 0 { // SUCCESSFUL REGRESSION
                                  replace n_obs = e(N) if COMNAM == `"`c'"' & year == `y'
                                  replace b_sprtrn = _b[sprtrn] if COMNAM == `"`c'"' & year == `y'
                                  replace se_sprtrn = _se[sprtrn] if COMNAM == `"`c'"' & year == `y'
                                  replace r2 = e(r2) if COMNAM == `"`c'"' & year == `y'
                              }
                              else if !inlist(c(rc), 2000, 2001) { // UNEXPECTED ERROR
                                  replace error = c(rc) if COMNAM == `"`c'"' & year == `y'
                              }
                          }
                      }
                      It's unfortunate that you are being made to operate in such a restrictive environment. With more than 14,000,000 observations this is going to be very slow. Be patient!
                      Last edited by Clyde Schechter; 28 Jul 2021, 11:15.

                      Comment


                      • #12
                        Hi Clyde, yes it did take quite some time but it works! However, it got stuck at the line "if c(rc) == 0 {". I added a variable into the regression and renamed some of the variable names. This is what the error said:

                        Code:
                        gen n_obs = .
                        gen b_sprtrn = .
                        gen b_twexb = .
                        quietly levelsof hcomnam, local(companies)
                        foreach c of local companies {
                          2. quietly levelsof year if hcomnam == `"`c'"', local(years)
                          3. foreach y of local years {
                          4. capture regress RET sprtrn twexb if year == `y' & hcomnam == `"`c'"'
                          5. if c(rc) == 0 {
                          6. replace n_obs = e(N) if hcomnam == `"`c'"' & year == `y'
                          7.             replace b_sprtrn = _b[sprtrn] if hcomnam == `"`c'"' & year == `y'
                          8.             replace b_twexb = _b[twexb] if hcomnam == `"`c'"' & year == `y'
                          9. }
                         10.  else if !inlist(c(rc), 2000, 2001) {
                         11.             replace error = c(rc) if hcomnam == `"`c'"' & year == `y'
                         12. }
                         13. }
                         14. }
                        
                        5. is not a valid command name
                        r(199);
                        Last edited by Kye Reen; 28 Jul 2021, 14:46.

                        Comment


                        • #13
                          Are you showing the code you ran, followed by the error message you got? Or is this just all from the Results window?

                          If the former: those numbers 2. through 14. should not be in your code. Get rid of them. But, if this is the situation, I don't know how you even got that far: it should have broken at 2.

                          If the latter, then it is hard to know where this came from. Please post back with example data that triggers this problem, along with the exact code you are using, and I will try to troubleshoot.

                          I do see a problem in the code, that I think will cause it to not perform exactly as intended, but not in the way you show.

                          Code:
                          else if !inlist(c(rc), 1925, 1926)
                          is definitely wrong. 1925 and 1926 are not Stata error codes and will never arise. I don't know how they got there: in the code I wrote, the values 2000, and 2001, which are the error codes for "no observations" and "insufficient observations," respectively, appeared there. Now, what will happen with this code is that if (and in a data set that's large, that's almost certainly when) you come across some combination of company and year for which there are insufficient observations to support your regression, Stata will throw a 2000 or 2001 error code at the regress command. That will then fail the -if c(rc) == 0- test and proceed to the -else if !inlist(c(rc), 1925, 1926)- branch. There, it will pass the test since 2000 and 2001 are both not 1925, nor 1926. So it will record 2000 or 2001, respectively, in the variable error for those observations, and then move on to the next company-year combination, which is not the intended behavior. I suppose the result is not actually harmful, though.

                          Comment


                          • #14
                            Yes, that is the code I ran, followed by the error message.

                            Oh my bad, I thought 2000 and 2001 were years that's why I changed it to 1925 and 1926. Nevertheless, I changed it back to 2000 and 2001 because I decided to drop the observations before 2000 as the dataset was way too large. I am running stata on Terminal (a SSH client program that runs on your local computer) and when I copy and paste the code onto Terminal, it will show those numbers 2. through 14. I guess that's why I managed to get that far.

                            I now managed to get past the error showed on #12. The solution to that is to copy and paste the code directly into terminal instead on pasting it line-by-line. However, another error occurred, and this time it says:

                            Code:
                            foreach c of local companies {
                              2.     quietly levelsof year if hcomnam == `"`c'"', local(years)
                              3.     foreach y of local years {
                              4.         capture regress RET sprtrn twexb if year == `y' & hcomnam == `"`c'"'
                              5.         if c(rc) == 0 {
                              6.             replace n_obs = e(N) if hcomnam == `"`c'"' & year == `y'
                              7.             replace b_sprtrn = _b[sprtrn] if hcomnam == `"`c'"' & year == `y'
                              8.             replace b_twexb = _b[twexb] if hcomnam == `"`c'"' & year == `y'
                              9.      
                            .         }
                             10.         else if !inlist(c(rc), 2000, 2001) {
                             11.             replace error = c(rc) if hcomnam == `"`c'"' & year == `y'
                             12.         }
                             13.     }
                             14. }
                            
                            variable error not found
                            
                            r(111);
                            I presumed this error could be due to the line:

                            Code:
                            else if !inlist(c(rc), 2000, 2001) {
                            replace error = c(rc) if hcomnam == `"`c'"' & year == `y'
                            I am now running the code without these two lines and so far it is working (still loading). I'm not sure how the outcome will look like as it is still loading. I will keep you updated to which what happens and if there are any errors that might occur afterwards. Thank you!
                            Last edited by Kye Reen; 28 Jul 2021, 15:33.

                            Comment


                            • #15
                              I see from #10 that you omitted the line -gen error = .- from the code. That is why you are getting this problem. Re-insert that line before the loop and it will work properly.

                              The purpose of that variable is this: regressions can be funky. Sometimes errors arise when you try to perform them. In an application like yours where you are trying to run a long series of them, you almost certainly will encounter some. Stata's default behavior when an error arises is to terminate execution. Well, in this context, you really would prefer to go on to the next case and keep on truckin'. Then at the end, you can review any error cases and figure out what, if anything, to do about them. That's what the -capture- and -if c(rc)- stuff is all about. It tells Stata, if you get an error during the regression, don't stop--keep going. But record the error code in the data set so that the user can check it out later.

                              The reason for making an exception out of error codes 2000 and 2001 is that they just come up so often, and there is usually nothing to be done about them: if there isn't any data there isn't any data. So those we don't even bother recording.

                              But after you've run the whole thing, you should check the data for any observations where the variable error has a non-missing value. If you find any, the values there are the error codes that Stata threw when it tried to do that regression. You should then find out what that error code means and investigate further why it happened and what, if anything, to do about it.

                              Comment

                              Working...
                              X