Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Looping through a matrix until a condition is satisfied

    The goal of my code here is to select the ideal number of principal components. According to this website which details the method for Stata, we essentially want to select the number of components where the average is at its lowest and begins to increase after that. Note my use of the user written crossfold prefix. To be concrete, here's my code:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(year gdpcap15 pred1 pred2 pred3 pred4 pred5 pred6 pred7 pred8 pred9)
    1955  3.853184630005267 -5.6359681692686685   -.2796493308499519    .14076150844533958  .031033665292261155   -.15112918386469865   .01822882385195665    -.008124304083252737  -.0008164576572658733  .0002865166184990442
    1956 3.9456582961508766   -4.80336876033442  -.20493869698996953    .10893913008301967   .03810150847092765  .0002628159616413356 -.013868786188177551    -.006429933515285535    .004279014881456822   -.00028320987191921
    1957  4.033561734872626  -3.984097559990941  -.15062350735637864    .07413980200961344   .03615715202065638    .14698408848701347 -.044033646814635424   -.0032634351303035825    .004594817232454029 -.0007239532350134525
    1958  4.023421896896646 -3.7909617209543813  -.21861309305703375   -.04366336241116735 -.025601833127530615     .0751777915675255 -.001905947589295215   -.0008236563661104535   -.008400687934551607  .0017323740685059597
    1959  4.013781968405232 -3.5960191421690597   -.2875851019749088   -.16428162980925726   -.0946026919255327 -.0006760918780774616   .03819602222038847   -.0020393501911901285    -.02575956769585215 -.0015446361508075906
    1960  4.285918396222732   -2.40520552130941 -.022971459410171417    -.1621336397189589  -.04263588859617323   -.01744175533074968  .022601202686801186      .00613835442027974     .04103215499285389  .0008280633007628213
    1961  4.574336095797406 -1.3303790071789818    .5309343857741721   -.09294057379692988    .1413480762231831  -.017890179062513074 .0010103711436177006      .05044703538868318   -.012004946934204473  .0024779905452485124
    1962  4.898957353563045  .04228213763346514    .5412352518357144   -.06031624412421429  .061584272382696835  -.017632543983173845 -.016308012125437872    -.015943505697339336     .00196428212353659  -.005299340712424142
    1963  5.197014981629133  1.3818361379201034    .5112547238876536  -.010022155165356008 -.038436535244485114   -.01853674225273269  -.02088969728520911      -.0708273493439845   -.005548450291739709  -.001131061857750193
    1964 5.3389029787527225  1.9611439921091653    .4226917985221951    .07081727130789066 -.055061427587660666      .015383650681875    .0179673876598935    -.013718314773666251   .0005227960766076845   .008660399468473308
    1965  5.465153005251848   2.512032836587504   .30387269313558785     .1466876568856701  -.07180058997304965   .039132084392686554  .057970807460515444     .037219005780505854 -.00016834621438262948 -.0018753329397272062
    1966  5.545915627064143  3.2785311878240764  .027584452000544053    .05577526983594565  -.05756744855911607   -.00868821871424516 -.010843189126278005       .0274838487765844    .003844001145529616  -.006352518350698069
    1967  5.614895726639487    4.02807594553564   -.2627014003748662    -.0261268019214803  -.06720710738888518  -.059637885581341164   -.0805153379207898      .02669785039648495   -.007198561000218567  .0019571102831647536
    1968 5.8521849330715785   5.445083506715171  -.38111146454733813  -.022780059389372984  .034014966033170654   -.01183480815152145  -.02182652234407402   -.0006919939231301575    .006452130851168247  .0018004998243573156
    1969 6.0814054173695915    6.89701413688074   -.5293792505952488  -.014856172230743148   .11067388197953773   .026526977728309448  .054216524370719235    -.026124251738275844  -.0027921795753868273  -.000532900990673435
    1970   6.17009424134957   8.038199924990405    -.779090200321954 -.0018207213227130759   .12695146249770906     .1955640050140302   .09036882537098212      .05438415156659038     .08471243381069524   -.06810464939518154
    1971  6.283633404546246   9.145831123757889   -.9581022929408727   .017284947847694132   .11449644613898544    .36138691243717824   .14376089776334022       .1331524970023707     .15367048597972616   -.14412247783215515
    1972 6.5555553986528405  10.983920152860067   -.9224148612405898    .13741117390715954     .245384955266249     .5872056509280951   .26958023328948866      .07744013991540999     .21669144214265626   -.15132619386978885
    1973  6.810768561103078  12.797898389916256   -.9188239314052387     .2597731099547717    .3572339027620418     .8054123560884386   .40500328959475046     .016267002212098647      .2504152654263079   -.16684961304785007
    1974  7.105184302810804  13.483436916330389   -.6912008846040067    .28920628417672295   .14529159677906842     .7418375140135989   .45178882518772734     -.14565015366024225     .22189576108402365   -.23540893913360122
    1975  7.377891682175629  14.129962334361577  -.48910478275826064    .32139331002556326    -.068432647396291     .6771019665423366     .502216641352912      -.3144457664927772     .17822165303778037    -.2978813916241576
    1976  7.232933621922754  14.655724797410553   -.8195517558088924    .26024964665872374   -.3163213335059729     .7952233480881981    .6995804758293602      -.3200185575710963     .07668854994396535    -.3872792441288567
    1977  7.089831372119127  15.167983198809889  -1.1452589771500479     .2188386890861237   -.5715493203851785     .9057103652666365    .8945935592653703      -.3328679210115676     -.0197146767761191   -.47157613190480197
    1978  6.786703607144611  15.246498160982224  -1.5832636509535927    .22551363384770295   -.7152678695111958      .957237911150079    .8379966416176281      -.2432014360926641     -.2847825631831978   -.45359152166116035
    1979 6.6398173868571035  15.282575725478978  -1.8033324476838084    .24964799981841534   -.6306966220394404    1.0691825799505508     .598332032776653     -.25245212207980755    -.28343486092719855   -.33770168385929183
    1980  6.562839171369564  15.481585723352795  -1.7898147574012568    .16304890403427746   -.3846219297832588    1.2248834214840256    .3111588530473055     -.16289279236345866    -.15255742392170735   -.21394982926744094
    1981   6.50078545499277   15.83232897218898  -1.7530443413968633  .0029852721653992886   -.1523853234088376     1.409805175418291   .04795807738992236     -.10564646169219682   .0009063780809936972  -.056123793728779736
    1982  6.545058606999563  16.309593606525535   -1.784468040316007    -.0726225266832779   -.3377780502558716    1.4013275464652095   .15133987275970073 -.000053022571553817954     .07272124169259164  -.051658536760092244
    1983  6.595329801139407  16.818135788104822  -1.8086859669856088    -.1571816084016392   -.5320848509254261    1.3939288004168728    .2550847467631834      .10542367660701446     .13244122904770378  -.027459190827567814
    1984  6.761496750091492  17.631311010318974  -1.8338991911875449   -.14492183674859432   -.5292312090822835     1.395928637503537    .2710251943297569      .17052067085894462      .1387845069763436   -.08029841987964442
    1985  6.937160671727721  18.482714837431047  -1.8663538287215398   -.14652742073317787   -.5327547376820521    1.3966905041997464   .28842298383402054      .24143251557977286      .1244133748715433    -.1162693874289108
    1986  7.332191151300521  20.892869429520093  -1.7475666762165913   -.13909364881807473   -.5506734053689885    1.7850752708803073    .3647781877540621      .43439427377534434     .14344758997858148   -.26621243848354903
    1987  7.742788123594152  23.329675833037232  -1.6339961910462066   -.11048057343516143   -.6291803374256126    2.1625926284644375   .45589124349554333       .6112319347274746     .14721949164321602   -.41005372527654516
    1988   8.12053664075889  25.666380884615926   -1.623124252105773  -.024982883199390127   -.8412939241332308    2.4714871099866884    .6037780941976278       .5484206501680482     .19720849028273124     -.505166091967399
    1989  8.509711162324157   28.00935858726377  -1.6270558649915419    .07714389394393995  -1.0689817172597456    2.7992441225289584    .7641988311988135       .5069535199084416      .2186007035341937    -.5746335620073871
    1990  8.776777889074104   29.21914765014005   -1.615727337035631     .1764241804855109  -1.5291831551065178    2.8537456687094784    .9311299698192872       .6194858367631093     .16545387960640268    -.5461770419550871
    1991   9.02527866619582  30.409858803872673   -1.566357132663052     .2323741917230142  -1.9438485588235395    2.8705228767967435    1.093448814239629       .7305398867925607      .1346147572802343    -.5508702936327751
    1992  8.873892824706335  29.566071506714888  -1.6418149885222633    .48775876767527426  -2.0171636139907068    2.8624876012920994    1.099338293986793       .8854525641695832     .01689859152810802    -.5518044941246998
    1993  8.718223539089278   28.72833388460929  -1.7139242650504576     .7402916614293393  -2.0968489496131446     2.852399408805832   1.1080889010104502      1.0489654461608868    -.08842898572482327    -.5600206940772299
    1994  9.018137849286365  30.023375376148266  -1.6216341006721546      .520095804113017  -2.1118696608999152    2.8595663671585525    .8238863268288168      1.2045879912620205    -.07684458007656847    -.4100289494428013
    1995  9.440873861653367  31.312531772654815  -1.4532279796489098     .4309155176565326   -2.260235052254674    2.8446523366154746    .6876126485134306       1.129642678084732    .027223013634157245    -.3848566318605062
    1996   9.68651813767495   32.85221669474676  -1.6749058326253756      .625399108002632   -2.497716504314482    3.2912265949405093    .7858230737927119      1.4317549486919998     -.2274248843919251    -.6434678836236071
    1997 10.170665872808662   35.00403089746422   -1.549512124557728     .6428125484515941  -2.6900621400867153    3.5104278648230483    .7612511825986432      1.5787132481189765     -.2957021623046656    -.7591290137769755
    end
    format %ty year
    
    cls
    qui ds pred*
    
    loc v : di word("`r(varlist)'", `c(k)')
    
    loc vars `r(varlist)'
    
    local nwords :  word count `r(varlist)'
    
    cls
    set seed 1000
    
    tokenize `vars'  
    local predictors 
    
    qui forval i = 1/`nwords' {
    
    loc a: word `i' of `c(alpha)'
    
    local predictors  `predictors' ``i''
    
    crossfold: reg gdpcap15 `predictors' if year < 1975, k(10)
    
    mat `a' = r(est)
    
    svmat double `a', name(model`a')
    
    }
    
    mean model*
    The result we get is
    Code:
    Mean estimation                             Number of obs = 10
    
    --------------------------------------------------------------
                 |       Mean   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
         modela1 |    .657484    .037877      .5718001    .7431678
         modelb1 |   .5696208   .0614974       .430504    .7087376
         modelc1 |   .5784063   .0485904      .4684871    .6883254
         modeld1 |   1.342622   .1057873      1.103315     1.58193
         modele1 |   .3425936   .0374513      .2578728    .4273145
         modelf1 |   .3795547   .0326934      .3055972    .4535122
         modelg1 |   .3340048   .0401717      .2431302    .4248794
         modelh1 |   .3870031   .0443458      .2866858    .4873203
         modeli1 |   .3789193   .0671177      .2270885      .53075
    --------------------------------------------------------------
    So in this case (and after inspecting the screeplot from PCA), the ideal number of components is 2.

    Conceptually, the way I think that makes the most sense to think about this is by starting with the second mean, .569. Then, move to the next one, and ask if .5784063 is less than .5696208. In this case it obviously isn't, so we'd select 2 as the number of principal components. But if it weren't (if mean 3 were less than mean 2), we'd move on to mean 3, and ask if mean 4 was less than mean 3. And if true, we'd use 3 components in our regression.

    How would I implement this programmatically, though? My intuition tells me that looping through the columns of the r(table) matrix would be a good first start with a few assert statements, but how would I actually go about this?

  • #2
    Originally posted by Jared Greathouse View Post
    The goal of my code here is to select . . where the average is at its lowest and begins to increase after that.. . .How would I implement this programmatically, though? My intuition tells me that looping through the columns of the r(table) matrix would be a good first start with a few assert statements, but how would I actually go about this?
    Wouldn't it just be a loop where you look ahead at the next row, compare and then decide whether to keep looking or exit with the findings? Would something like the following be close to what you're looking for? (It's not thoroughly tested or optimized, but it might be a start.)
    Code:
    version 17.0
    
    clear *
    
    program define findMin, rclass
        version 17.0
        syntax , m(name)
    
        tempname delta
        local found 0
        local target_row .n
        forvalues this_row = 1/`=rowsof(`m')-1' {
            local next_row = `this_row' + 1
            scalar define `delta' = `m'[`next_row', 1] - `m'[`this_row', 1]
    // display in smcl as text "Rows " `this_row', `next_row', "Delta " `delta'
            if `delta' > 0 {
                local found 1
                local target_row `this_row'
                continue, break
            }
        }
        return scalar found = `found'
        return scalar target_row = `target_row'
    end
    
    * Minimum at 2
    matrix input M = (1 0 1 2)
    matrix define M = M'
    findMin, m(M)
    return list
    
    * Minimum at 3
    matrix input M = (1 0.5 0 1)
    matrix define M = M'
    findMin, m(M)
    return list
    
    * Minimum at 1
    matrix input M = (0 0.5 1 2)
    matrix define M = M'
    findMin, m(M)
    return list
    
    * Minimum at last (no minimum followed by a higher value, same as no minimum)
    matrix input M = (1 1 1 0)
    matrix define M = M'
    findMin, m(M)
    return list
    
    * No minimum at all
    matrix input M = (1 1 1 1)
    matrix define M = M'
    findMin, m(M)
    return list
    
    exit
    You could probably do something with the Mata function minindex(), too, and it would probably be more elegant.

    Comment


    • #3
      Thanks for this. We'll see, it's 4:30am where I am and I'm not up/at my computer yet, so after I've had/while I'm drinking coffee, we'll see if we can't optimize this.

      Comment


      • #4
        Yeah this pretty much worked. I had to contextualize it to my code, but this does the job. Thanks so much!

        Comment


        • #5
          Good. I'm not so sure now that what's returned by the Mata function lends itself well to this problem, but at least the Stata code can be cleaned up a little.
          Code:
          program define findMin, rclass
              version 17.0
              syntax , m(name)
          
              return scalar target_row = .n
              forvalues this_row = 1/`=rowsof(`m')-1' {
                  if `m'[`= `this_row' + 1', 1] > `m'[`this_row', 1] {
                      return scalar target_row = `this_row'
                      continue, break
                  }
              }
          end
          Last edited by Joseph Coveney; 11 Apr 2022, 19:16.

          Comment

          Working...
          X