
  • Semi-parametric regression with figure for large dataset

    Dear all,

    Greetings. I used the semipar command in Stata, with the graph option, on 426,000 observations to estimate a semiparametric regression: the dependent variable is regressed on several independent variables that enter linearly and on one variable that enters with a flexible functional form (the nonparametric component). I started the command 48 hours ago and it is still running, and I do not know when I will get the output. I also tried the same regression on a subsample of 1,000 observations, but there was no output after 30 minutes, so I canceled it. For reference, my command and a data sample are below. Could you please help me by identifying my mistake or suggesting a more efficient way to handle this problem? Thank you.

    Code:
    semipar mflow alpha mflowL logTNA logAGE EXPR REAR, nonpar(FSB) xtitle(FSB) ci
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(mflow alpha mflowL logTNA logAGE) double EXPR byte REAR float FSB
        .06142427  -.00007701437     .09345506  5.805526 1.2039728  .007 1   -.0005415289
        .02658373     .005997835    -.04395232  5.362667 1.2039728 .0079 1     .005259412
     .00018048484     .000999924    .023868304  4.417635  1.446919     0 1   -.0016591768
      -.035596263    .0016930813   .0026768625  3.795399  2.957511 .0069 1   -.0020691568
        .08475412    -.000187979     .06142427  5.895565 1.2286655  .007 1   -.0011507465
        .05474074     .008758709     .02658373  5.435581 1.2286655 .0079 1     .004928627
        .01248491    .0015635238   -.035596263  3.825201  2.961831 .0069 1   -.0044297883
        .06776663     .001072005  .00018048484 4.4886365  1.466337     0 1   -.0012998166
        .01606508    .0014608257     .01248491 3.8657696  2.966132 .0069 1    -.002229583
       .030364953     .007016675     .05474074  5.483921  1.252763 .0079 1    .0005581158
        .08002505    .0009161719     .06776663 4.5736794 1.4853852     0 1    -.001498271
         .1008181   -.0004140712     .08475412  6.009427  1.252763  .007 1   -.0009622335
        .04361444      .01112596    .030364953    5.5418 1.2762934 .0079 1   .00017001225
         .0813513    .0008834972     .08002505 4.6577625 1.5040774     0 1   -.0012104682
       .010070225    .0020960646     .01606508  3.895548 2.9704144 .0069 1   -.0006541223
        .10232854   -.0007423428      .1008181  6.124427 1.2762934  .007 1    .0005680091
        .11509213    .0010447002      .0813513  4.772378 1.5224266     0 1   -.0014547526
        .10861416   -.0009437219     .10232854  6.236493  1.299283  .007 1     .001206447
       .035726815     .014936568     .04361444  5.598429  1.299283 .0079 1   -.0000724135
       .002450603    .0022881522    .010070225  3.908898  2.974679 .0069 1    -.000651091
        .02406323     .018815147    .035726815  5.631749  1.321756 .0079 1    -.000399879
      .0002104824     .003690503    .002450603  3.918303  2.978925 .0069 1   -.0009268375
        .09328988  -.00017967848     .10861416  6.335515  1.321756  .007 1      .00128309
      -.022381427    .0010966233     .11509213 4.7570324  1.540445     0 1   -.0014206193
      .0008084829     .002905097   .0002104824  3.954546 2.9831536 .0069 1   -.0009438918
      -.005911467     .001314592   -.022381427 4.7613187 1.5581446     0 1    -.001390465
      .0005135556      .01891317     .02406323  5.651362 1.3437347 .0079 1   -.0004524317
         .0797721  -5.512099e-06     .09328988  6.438484 1.3437347  .007 1    .0013454867
        -.1496443    .0009976532   -.005911467 4.6021657 1.5755364     0 1   -.0012228353
     -.0030045186    .0031624115   .0008084829 3.9402995  2.987364 .0068 1   -.0003094229
         .1498355  -.00026522044      .0797721  6.561479  1.365241  .009 1     .002032796
        .05693629       .0210311   .0005135556  5.724588  1.365241 .0094 1    -.001676965
        .12017477   -.0009618956      .1498355  6.680723 1.3862944  .009 1   .00027987495
        .15430064    .0032259275  -.0030045186 4.0898175  2.991557 .0068 1   -.0013707738
        -.1023339    .0011437794     -.1496443  4.497585 1.5926307     0 1   -.0010388659
       .021618327     .017394518     .05693629  5.761507 1.3862944 .0094 1   -.0038289665
      -.006155239     .002595417     .15430064  4.078334  2.995732 .0068 1   -.0019615197
       .021909716    .0009664675     -.1023339 4.5196123  1.609438     0 1    -.001289533
        .07571987   -.0010825517     .12017477  6.748365 1.4069136  .009 1     .000951598
        .04003801     .013780753    .021618327   5.80599 1.4069136 .0094 1   -.0042301677
       .007516535    .0022012626   -.006155239  4.092309   2.99989 .0068 1   -.0025436056
     -.0030387016    .0006912753    .021909716   4.52396 1.6259673     0 1   -.0018465634
        .19640027     .011309516     .04003801  5.989282 1.4271163 .0094 1    -.005588527
        .06781927   -.0011145149     .07571987  6.820099 1.4271163  .009 1   .00024893033
      -.006190598    .0008545409  -.0030387016  4.522875 1.6422276     0 1   -.0009915882
    -.00006084931    .0013775495      -.005789  5.001931 3.1098046 .0124 1    -.009085847
        .06344786   -.0011151778     .06781927  6.897247  1.446919  .009 1   .00015062357
     .00027926321    .0015498564    .007516535  4.115127  3.004031 .0068 1    -.002954414
       -.11934887       .0101219     .19640027  5.874001  1.446919 .0094 1    -.006163663
       -.03606929    .0009498876   -.006190598 4.4897594  1.658228     0 1    -.001352241
      -.014404282   .00004046849    .002966798  6.489283  .9162908  .004 1   -.0016162507
       -.00804203     .001571646 -.00006084931  5.005958 3.1135154 .0124 1   -.0083020525
        .02720734     .001945871    .004736708 3.7395725  .9162908 .0033 1    -.002140493
         .0442791   -.0012927772     .06344786  6.954967  1.466337  .009 1  -.00007768975
        .03870124     .011073073    -.11934887  5.921289  1.466337 .0094 1    -.006003893
     -.0019808675     .001604684  .00027926321 4.1314316  3.008155 .0068 1   -.0020233255
      .0027059496    .0019753443  -.0019808675 4.1669908 3.0122616 .0068 1    -.001682181
         .0688294     .000771197    -.03606929 4.5591264 1.6739764     0 1    -.001737805
        .03265522     .010504276     .03870124  5.974219 1.4853852 .0094 1    -.004340083
     -.0001340596    .0011294313    -.00804203  5.028475  3.117212 .0124 1     -.00814419
        .07761426   .00006282002   -.014404282  6.567595  .9490805  .004 1     -.00153382
         .2501607    .0021982067     .02720734  3.980485  .9490805 .0033 1   -.0020693238
        .06638545    -.001253826      .0442791   7.04108 1.4853852  .009 1   -.0004652972
        .03656212     .010578724     .03265522  6.025277 1.5040774 .0094 1    -.004802327
       .002739633    .0023882026   .0027059496   4.17841 3.0163515 .0068 1    .0007547074
      -.006602606    .0014091296  -.0001340596  5.031744 3.1208954 .0124 1    -.007417512
       -.25867513  .000032680757     .07761426  6.126347  .9808293  .004 1   -.0013472944
       -.10021084    .0007533834      .0688294 4.4601445 1.6894805     0 1    -.002370697
        .00838021     .002551545      .2501607  3.996456  .9808293 .0033 1   -.0014210194
        .05404568   -.0008240653     .06638545  7.102348 1.5040774  .009 1    .0011466013
       -.02436311      .00082957    -.10021084  4.441474  1.704748     0 1    -.002380928
         .2471334 -.000015242078    -.25867513  6.349926  1.011601  .004 1    -.001354269
       .004450624    .0022587993     .00838021  4.020106  1.011601 .0033 1   -.0014581778
        .05190275   -.0008692344     .05404568  7.166162 1.5224266  .009 1    .0011411711
        .04491969       .0107694     .03656212  6.081511 1.5224266 .0094 1   -.0046173367
       -.01415376    .0021733474    .002739633  4.175617  3.020425 .0068 1    .0006252178
      -.015732674    .0016351067   -.006602606  5.025852  3.124565 .0124 1      -.0074384
         .7812297  .000015771011      .2471334  6.928217 1.0414538  .004 1    -.001261789
       -.01049875     .001428472   -.015732674  4.997212 3.1282215 .0124 1    -.007963448
       -.08099536    .0018958178    -.01415376  4.074805  3.024482 .0068 1 -.000020186097
        .03917046   -.0007660271     .05190275  7.190769  1.540445  .009 1     .001305357
       -.04821699     .010680643     .04491969  6.014485  1.540445 .0094 1     -.00557392
      -.065162584     .001205248     -.0227588  2.580444  .9162908  .002 1  -.00052613067
       -.04136892    .0005180608    -.03039349  2.482654  .9162908  .023 1    -.003348782
       -.07791165    .0009104254    -.02436311  4.352855  1.719786     0 1   -.0022730334
         .0334033     .002722006    .004450624  4.036009 1.0414538 .0033 1   -.0005900306
       -.02422193   -.0004587063    -.04136892 2.4550486  .9490805  .023 1    -.004382292
       .014255114   -.0005865431     .03917046  7.200355 1.5581446  .009 1   -.0002275679
        .04469506     .005719021    -.04821699  6.078034 1.5581446 .0094 1    -.008123649
       -.25867513   -.0000416555      .7812297  6.395076 1.0704415  .004 1   -.0013842816
        -.0956372    .0011813188   -.065162584 2.4767065  .9490805  .002 1   -.0019714439
       .003021754    .0015701263    -.08099536 4.0757394  3.028522 .0068 1   -.0010070017
     -.0034670064   -.0002723711    -.07791165 4.3463993  1.734601     0 1   -.0034285414
     -.0040416485   .00033518884    -.01049875  4.998563  3.131864 .0124 1    -.008548275
        .02394063    .0013275605      .0334033 4.0552573 1.0704415 .0033 1     -.00137038
       -.01041108    .0012512715    .003021754 4.0816307  3.032546 .0068 1   -.0018487276
      -.015636599    .0006142295  -.0040416485  5.005958  3.135494 .0124 1   -.0084052365
        .13667588  -.00012694816    -.25867513  6.525782 1.0986123  .004 1   -.0014079948
        .05260683   -.0007623637  -.0034670064  4.405499    1.7492     0 1    -.003968108
      -.005927357    -.000978984    -.02422193  2.458306  .9808293  .023 1    -.005293334
    end

  • #2
    Hi Anisul,
    In my experience, semipar is not very efficient with large datasets. I am surprised, though, that your command did not work with the 1,000-observation sample.
    The problem (at least with the full dataset) is that Stata is technically running about 400k regressions in the background, so it is not surprising that it takes a very long time to get the results you want.
    For the smaller sample, if you are only restricting the estimation with an "if" qualifier, the same problem may still apply.
    My question would be: are you interested in the nonparametric component or the parametric one?
    Depending on which, perhaps I can suggest a feasible option.
    Fernando
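
    One way to check whether the "if" qualifier is the issue is to keep only a random subsample in memory and time the command on it. This is a minimal sketch, not something tested on the poster's data: the seed and the subsample size are arbitrary, and the semipar call is copied from post #1.

    Code:
    * Sketch only: the seed and subsample size are arbitrary choices
    preserve
    set seed 12345
    * keep 1,000 randomly chosen observations in memory
    sample 1000, count
    timer clear 1
    timer on 1
    semipar mflow alpha mflowL logTNA logAGE EXPR REAR, nonpar(FSB) xtitle(FSB) ci
    timer off 1
    * report the elapsed time in seconds
    timer list 1
    restore

    Timing a few subsample sizes this way gives a rough idea of how the run time grows before committing to the full dataset.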



    • #3
      Dear FernandoRios, I get output instantly for the 100 observations I shared in my post, but for the full 426k observations semipar has now been running for 60 hours. I am interested in the nonparametric component (a single variable). However, I need to control for the other variables at the same time, so that I get the orthogonalized effect of that single independent variable on the dependent variable. Thanks.
      Last edited by Anisul Islam; 25 Jul 2020, 11:03.



      • #4
        So one option could be to run your model assuming the nonparametric component can be approximated with splines.
        Say z is the original variable; you can create a simple cubic spline with knots at its quartiles:
        Code:
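        * f_able (from SSC) provides fgen
        ssc install f_able
        * knots at the quartiles of z
        sum z, d
        local p25=r(p25)
        local p50=r(p50)
        local p75=r(p75)
        
        * cubic spline terms built from z
        fgen z1=z^2
        fgen z2=z^3
        fgen z3=max(z-`p25',0)^3
        fgen z4=max(z-`p50',0)^3
        fgen z5=max(z-`p75',0)^3
        
        * linear controls plus the spline in z
        reg y x1 x2 x3 z z1 z2 z3 z4 z5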
        The z terms (z and z1 to z5) will capture your nonlinear component. For the marginal effects or marginal means, you could then do:

        Code:
        f_able, nl(z1 z2 z3 z4 z5) 
        ** for marginal effects
        margins, dydx(z) at(z=(list of values)) nose nochain
        *for marginal means
        margins, at(z=(list of values)) nose nochain
        Perhaps this will be helpful.
        Otherwise, you may need to implement semipar by hand, using binned regressions rather than a regression for every value of z (z being your FSB variable).
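
        A by-hand binned version could look roughly like the sketch below. It follows the double-residual idea in Robinson (1988), but replaces the kernel regressions on z with simple within-bin means, so it is an approximation and not what semipar itself runs. The names zbin, m_*, r_* and ypart are hypothetical, the choice of 100 bins is arbitrary, and the variable list is taken from post #1. The band that lpoly draws ignores the fact that the linear coefficients were estimated first.

        Code:
        * Sketch only: zbin, m_*, r_* and ypart are hypothetical names; 100 bins is arbitrary
        local y mflow
        local x alpha mflowL logTNA logAGE EXPR REAR
        local z FSB
        
        * 1. cut z into 100 quantile bins
        xtile zbin = `z', nquantiles(100)
        
        * 2. subtract within-bin means (a crude stand-in for E[.|z]) from y and every x
        foreach v of varlist `y' `x' {
            egen double m_`v' = mean(`v'), by(zbin)
            generate double r_`v' = `v' - m_`v'
        }
        
        * 3. linear part: residualised y on residualised x, no constant
        local rx
        foreach v of local x {
            local rx `rx' r_`v'
        }
        regress r_`y' `rx', noconstant
        
        * 4. nonparametric part: y net of the linear part, smoothed on z over a 100-point grid
        generate double ypart = `y'
        foreach v of local x {
            replace ypart = ypart - _b[r_`v']*`v'
        }
        lpoly ypart `z', degree(1) n(100) ci noscatter

        Because the smooth in step 4 is evaluated on a fixed grid rather than at every observation, the cost grows roughly linearly with the sample size.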

        HTH
        Fernando



        • #5
          Dear Fernando, I was able to follow the first part of your suggestion and create the splines; the output is below. However, I could not run the second part, because it reports a numlist error. Could you please help me understand this output? I am not familiar with this approach, and it would be great if you could guide me through what the output says. I was looking for a semiparametric regression to test for nonlinearity between flow (y) and FSB (x). Thank you very much.


          Code:
          . reg mflow alpha alpha2 mflowL logTNA logAGE EXPR REAR FSB FSB1 FSB2 FSB3 FSB4 FSB5
          note: FSB4 omitted because of collinearity
          
                Source |       SS           df       MS      Number of obs   =   425,836
          -------------+----------------------------------   F(12, 425823)   =   1421.27
                 Model |  118.667025        12  9.88891874   Prob > F        =    0.0000
              Residual |   2962.7895   425,823  .006957796   R-squared       =    0.0385
          -------------+----------------------------------   Adj R-squared   =    0.0385
                 Total |  3081.45653   425,835  .007236269   Root MSE        =    .08341
          
          ------------------------------------------------------------------------------
                 mflow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 alpha |   .5087217   .0255469    19.91   0.000     .4586505    .5587929
                alpha2 |  -.4084067   .0325539   -12.55   0.000    -.4722114   -.3446021
                mflowL |   .1501596   .0015137    99.20   0.000     .1471928    .1531263
                logTNA |   .0010007   .0000611    16.37   0.000     .0008809    .0011205
                logAGE |  -.0130837   .0002119   -61.75   0.000    -.0134989   -.0126684
                  EXPR |  -.3529388    .027012   -13.07   0.000    -.4058815   -.2999961
                  REAR |  -.0024057   .0002997    -8.03   0.000    -.0029931   -.0018182
                   FSB |  -.0012607   .0002303    -5.47   0.000    -.0017121   -.0008092
                  FSB1 |  -.0001385   .0000274    -5.05   0.000    -.0001922   -.0000847
                  FSB2 |  -1.26e-06   2.95e-07    -4.29   0.000    -1.84e-06   -6.86e-07
                  FSB3 |    .000108   .0000381     2.84   0.005     .0000334    .0001826
                  FSB4 |          0  (omitted)
                  FSB5 |  -.0001061   .0000383    -2.77   0.006    -.0001811   -.0000311
                 _cons |   .0323965   .0005366    60.37   0.000     .0313447    .0334482
          ------------------------------------------------------------------------------



          • #6
            Can you show me exactly what error you get? Also, it may be wise to change the scale of FSB. I think the numbers are too small for f_able to work correctly.
            Fernando
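
            Rescaling could be done along these lines before rebuilding the spline terms. This is a sketch only: the factor 100 and the name FSBs are arbitrary, hypothetical choices, and the quartile knots mirror the earlier suggestion in post #4.

            Code:
            * Sketch only: the factor 100 and the name FSBs are hypothetical choices
            generate double FSBs = FSB*100
            * check the new scale and pick up the quartiles for the knots
            summarize FSBs, detail
            local p25=r(p25)
            local p50=r(p50)
            local p75=r(p75)
            * rebuild the spline terms on the rescaled variable
            fgen FSBs1=FSBs^2
            fgen FSBs2=FSBs^3
            fgen FSBs3=max(FSBs-`p25',0)^3
            fgen FSBs4=max(FSBs-`p50',0)^3
            fgen FSBs5=max(FSBs-`p75',0)^3

            With this scaling, a one-unit change in FSBs corresponds to a change of 0.01 in the original FSB.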



            • #7
              I got the following output from the marginal effect and marginal mean commands. Both report "not estimable".
              Code:
              . margins, dydx(FSB) at(FSB=(-1.545921 -.695437 -.4325278 -.166775 -.0059189 0 .2235194 .635494 .9895068 2.040177)) nose nochain
              
              Average marginal effects                        Number of obs     =    425,836
              
              Expression   : Fitted values, predict()
              dy/dx w.r.t. : FSB
              
              1._at        : FSB             =   -1.545921
              
              2._at        : FSB             =    -.695437
              
              3._at        : FSB             =   -.4325278
              
              4._at        : FSB             =    -.166775
              
              5._at        : FSB             =   -.0059189
              
              6._at        : FSB             =           0
              
              7._at        : FSB             =    .2235194
              
              8._at        : FSB             =     .635494
              
              9._at        : FSB             =    .9895068
              
              10._at       : FSB             =    2.040177
              
              ------------------------------------------------------------------------------
                           |      dy/dx
              -------------+----------------------------------------------------------------
              FSB          |
                       _at |
                        1  |          .  (not estimable)
                        2  |          .  (not estimable)
                        3  |          .  (not estimable)
                        4  |          .  (not estimable)
                        5  |          .  (not estimable)
                        6  |          .  (not estimable)
                        7  |          .  (not estimable)
                        8  |          .  (not estimable)
                        9  |          .  (not estimable)
                       10  |          .  (not estimable)
              ------------------------------------------------------------------------------
              
              . 
              end of do-file
              
              . do "C:\Users\HP\AppData\Local\Temp\STDbb4_000000.tmp"
              
              . margins, at(FSB=(-1.545921 -.695437 -.4325278 -.166775 -.0059189 0 .2235194 .635494 .9895068 2.040177)) nose nochain
              
              Predictive margins                              Number of obs     =    425,836
              
              Expression   : Fitted values, predict()
              
              1._at        : FSB             =   -1.545921
              
              2._at        : FSB             =    -.695437
              
              3._at        : FSB             =   -.4325278
              
              4._at        : FSB             =    -.166775
              
              5._at        : FSB             =   -.0059189
              
              6._at        : FSB             =           0
              
              7._at        : FSB             =    .2235194
              
              8._at        : FSB             =     .635494
              
              9._at        : FSB             =    .9895068
              
              10._at       : FSB             =    2.040177
              
              ------------------------------------------------------------------------------
                           |     Margin
              -------------+----------------------------------------------------------------
                       _at |
                        1  |          .  (not estimable)
                        2  |          .  (not estimable)
                        3  |          .  (not estimable)
                        4  |          .  (not estimable)
                        5  |          .  (not estimable)
                        6  |          .  (not estimable)
                        7  |          .  (not estimable)
                        8  |          .  (not estimable)
                        9  |          .  (not estimable)
                       10  |          .  (not estimable)
              ------------------------------------------------------------------------------



              • #8
                Try again, adding the option noestimcheck,
                and let me know if that works.



                • #9
                  This time it works well; the output is below. I plotted the marginal means against the FSB values in Excel and found only a slightly nonlinear line, whereas my parametric regression output suggests that the relationship should be clearly nonlinear. I wish I could run semipar to fit the semiparametric partially linear regression on my 500K+ observations, because the article I am following uses Robinson's (1988) approach to estimating semiparametric regression, which I think is what semipar implements. I want a plot of the fitted line with a confidence interval from this regression, but I do not know how to get that from your spline approach. I also tried npregress kernel, but it takes far too long as well: I left my PC running for a full 48 hours and it still had not finished. Thanks.

                  Code:
                  . f_able, nl(FSB1 FSB2 FSB3 FSB4 FSB5) 
                  
                  . ** for marginal effects
                  . margins, dydx(FSB) at(FSB=(-1.545921 -.695437    -.4325278 -.166775 -.0059189 0 .2235194 .635494 .9895068 2.040177)) nose nochain noesti
                  > mcheck
                  
                  Average marginal effects    Number of obs     =    425,836
                  
                  Expression   : Fitted values, predict()
                  dy/dx w.r.t. : FSB
                  
                  1._at        : FSB             =   -1.545921
                  
                  2._at        : FSB             =    -.695437
                  
                  3._at        : FSB             =   -.4325278
                  
                  4._at        : FSB             =    -.166775
                  
                  5._at        : FSB             =   -.0059189
                  
                  6._at        : FSB             =           0
                  
                  7._at        : FSB             =    .2235194
                  
                  8._at        : FSB             =     .635494
                  
                  9._at        : FSB             =    .9895068
                  
                  10._at       : FSB             =    2.040177
                  
                      
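                  ------------------------------------------------------------------------------
                               |      dy/dx
                  -------------+----------------------------------------------------------------
                  FSB          |
                           _at |
                            1  |  -.0008416
                            2  |  -.0010699
                            3  |  -.0011416
                            4  |  -.0012146
                            5  |  -.0012506
                            6  |  -.0012516
                            7  |  -.0012734
                            8  |  -.0012837
                            9  |   -.001292
                           10  |  -.0013138
                  ------------------------------------------------------------------------------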
                      
                  
                  . *for marginal means
                  . margins, at(FSB=(-1.545921 -.695437 -.4325278    -.166775 -.0059189 0 .2235194 .635494 .9895068 2.040177)) nose nochain noestimcheck
                  
                  Predictive margins    Number of obs     =    425,836
                  
                  Expression   : Fitted values, predict()
                  
                  1._at        : FSB             =   -1.545921
                  
                  2._at        : FSB             =    -.695437
                  
                  3._at        : FSB             =   -.4325278
                  
                  4._at        : FSB             =    -.166775
                  
                  5._at        : FSB             =   -.0059189
                  
                  6._at        : FSB             =           0
                  
                  7._at        : FSB             =    .2235194
                  
                  8._at        : FSB             =     .635494
                  
                  9._at        : FSB             =    .9895068
                  
                  10._at       : FSB             =    2.040177
                  
                      
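                  ------------------------------------------------------------------------------
                               |     Margin
                  -------------+----------------------------------------------------------------
                           _at |
                            1  |   .0051981
                            2  |   .0043857
                            3  |    .004095
                            4  |   .0037819
                            5  |   .0035834
                            6  |    .003576
                            7  |   .0032932
                            8  |   .0027665
                            9  |   .0023106
                           10  |   .0009413
                  ------------------------------------------------------------------------------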
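
                  For a fitted line with a confidence band from the spline approach, one possibility is to rerun margins without nose and pass the result to marginsplot. This is a sketch under assumptions: it presumes the spline regression is still the active estimation result, that f_able produces usable standard errors with these options, and the grid -1.5(0.25)2 is an arbitrary choice within the range of FSB values used above. Computing standard errors on 425,836 observations may take a while.

                  Code:
                  * Sketch only: the evaluation grid is an arbitrary choice
                  f_able, nl(FSB1 FSB2 FSB3 FSB4 FSB5)
                  * nose is left out so that margins reports standard errors
                  margins, at(FSB=(-1.5(0.25)2)) nochain noestimcheck
                  * fitted line with a shaded 95% confidence band
                  marginsplot, recast(line) recastci(rarea)

                  This keeps the whole plot inside Stata, so the marginal means do not need to be copied into Excel.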



                  • #10
                    I have a question about combining graphs from Verardi & Debarsy's semipar package.
                    As an example, what would be the best way to display the following two graphs in one? I do not mean graph combine, which puts two separate plots in one figure, but a single plot with both estimated lines.
                    Code:
                    use https://www.stata-press.com/data/r18/auto
                    semipar price weight if foreign==0, nonpar(mpg) ci
                    semipar price weight if foreign==1, nonpar(mpg) ci
                    I know how to capture (generate) the point estimates, which I could use to draw other graphs. But how can I capture the estimated CI (or s.e.) for each case, so that I can add the confidence bands? Or how might I go about merging the results into one graph?
                    Many thanks in advance for any hints or help!
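
                    One possible route, sketched below, is to save the partialled-out dependent variable from each semipar call and then smooth it with lpoly, which can store both the fit and its standard error. Several things here are assumptions rather than confirmed behaviour: that semipar's partial() option stores price net of the estimated linear part, that nograph suppresses its plot, and that a degree-1 lpoly with default settings is close to the smooth semipar draws. All new variable names (p0, p1, f0, f1, se0, se1, lo0, hi0, lo1, hi1) are hypothetical, and the bands are pointwise and ignore the first-stage estimation.

                    Code:
                    * Sketch only: all new variable names are hypothetical
                    use https://www.stata-press.com/data/r18/auto, clear
                    * partial() is assumed to store price net of the estimated linear part
                    semipar price weight if foreign==0, nonpar(mpg) partial(p0) nograph
                    semipar price weight if foreign==1, nonpar(mpg) partial(p1) nograph
                    * smooth each partialled-out variable on mpg, saving fit and standard error
                    lpoly p0 mpg if foreign==0, degree(1) at(mpg) generate(f0) se(se0) nograph
                    lpoly p1 mpg if foreign==1, degree(1) at(mpg) generate(f1) se(se1) nograph
                    * pointwise 95% bands from the stored standard errors
                    forvalues g = 0/1 {
                        generate double lo`g' = f`g' - invnormal(0.975)*se`g'
                        generate double hi`g' = f`g' + invnormal(0.975)*se`g'
                    }
                    * one plot with both lines and both bands
                    sort mpg
                    twoway (rarea lo0 hi0 mpg if foreign==0, color(navy%30)) ///
                           (line f0 mpg if foreign==0, lcolor(navy)) ///
                           (rarea lo1 hi1 mpg if foreign==1, color(maroon%30)) ///
                           (line f1 mpg if foreign==1, lcolor(maroon)), ///
                           legend(order(2 "Domestic" 4 "Foreign")) xtitle("mpg")

                    The colour opacity syntax (navy%30) needs Stata 15 or later, and the block should be run from a do-file because of the /// line continuations.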

