Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • I would like Stata to develop some options for programming options for Mata to "easily" develop programs that can use multiple processors at the same time.


    Comment


    • Originally posted by FernandoRios View Post
      I would like Stata to develop some options for programming options for Mata to "easily" develop programs that can use multiple processors at the same time.

      That would be awesome. Something like parallised loops would be a great thing!

      Comment


      • This is very basic, but:

        Robust ANOVA options, like Welch's F-test or Brown-Forsythe's F-test.

        Psychology has a fixation with ANOVA and it's still pretty normative for reviewers to request these. I <think> these as add-ons to the oneway or anova command as options would be terrific.

        Comment


        • David Speed, I agree with your suggestions in #513. Meanwhile, here is some code I cobbled together earlier this year to compute Welch's F. Maybe it will help in the meanwhile.

          Code:
          // Date:  15-Jun-2021
          // Use equations shown in Andy Field's document to compute
          // Welch's F-test.
          // https://www.discoveringstatistics.com/docs/welchf.pdf
          // The data AF uses come from this document:
          // https://www.discoveringstatistics.com/repository/contrasts.pdf
          
          clear
          input byte group y
          1 3  
          1 2  
          1 1  
          1 1  
          1 4
          2 5  
          2 2  
          2 4  
          2 2  
          2 3
          3 7  
          3 4  
          3 5  
          3 3  
          3 6
          end
          
          preserve
          drop if missing(group)
          statsby Ybar=r(mean) Var=r(Var) n=r(N), by(group) clear:  summarize y
          list
          quietly {
          generate double  Wt        = n/Var
          generate double  WtMean    = Wt*Ybar
          egen     double  SumWt     = total(Wt)
          egen     double  SumWtMean = total(WtMean)
          generate double  WelchGM   = SumWtMean/SumWt
          generate double  WtSqDev   = Wt*(Ybar-WelchGM)^2
          egen     double  SSmodel   = total(WtSqDev)
          generate byte    k         = _N // # of groups   
          generate double  MSmodel   = SSmodel/(k-1)
          generate double  Lterm     = (1-Wt/SumWt)^2 / (n-1)
          egen     double  lambda    = total(Lterm)
          replace          lambda    = lambda*3/(k^2-1)
          generate double  MSerror   = 1+2*lambda*(k-2)/3
          generate double  WelchF    = MSmodel/MSerror
          generate byte    df1       = k-1
          generate double  df2       = 1/lambda
          generate double  p         = Ftail(df1,df2,WelchF)
          }
          list WelchF-p in 1
          display               _newline ///
          "From SPSS ONEWAY:"   _newline ///
          "Welch F = "4.320451  _newline ///
          "    df1 = " 2              _newline ///
          "    df2 = " 7.943375     _newline ///    
          "      p = " .053738
          
          restore
          
          // Date:  17-Aug-2021
          // Compare Welch's F-test to multilevel modeling approach
          // in post #3 here:
          // https://www.statalist.org/forums/forum/general-stata-discussion/general/1427959-anova-rejection-of-bartlett-test
          // Note that I have added the dfmethod(anova) option.
          
          mixed y i.group, ml residuals(independent, by(group)) ///
          nolrtest nolog dfmethod(anova)
          
          margins group
          margins group, pwcompare(effects) mcompare(noadjust)
          
          // Compare to -regress- with vce(robust)
          regress y i.group, vce(robust)
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          Comment


          • I'd guess that if this was technically feasible it would have been done a long time ago but I'll put in a pitch for "on the fly" use of variable transformations in estimation commands, e.g.
            Code:
            reg log(y) x1 x2
            poisson y x x^2 x^3
            etc.
            A subset of RHS transformations can be effected using factor variable notation, of course.

            Comment


            • John Mullahy;n1686463]I'd guess that if this was technically feasible it would have been done a long time ago but I'll put in a pitch for "on the fly" use of variable transformations in estimation commands, e.g.
              Code:
              reg log(y) x1 x2
              poisson y x x^2 x^3
              etc.
              A subset of RHS transformations can be effected using factor variable notation, of course.[/QUOTE]

              I think it must be easily implementable since first differences in the time domain are possible and that's a simple mathematical operations.
              Even from a coding perspective it should be easy to implement. Neither (, ^ or other mathematical operators are allowed in variablenames.

              Comment


              • Interesting.
                I think someone else already did something like that some time ago. I also wrote a small script that works at least with regress -fgreg- (function for regress).
                It actually does what John suggests, with the caveat that it "recognizes" functions based on spaces.
                fgreg y x1 x2 x1*x2 x1^2
                is valid, but this:
                freg y x1 x2 x1 ^ 2
                is not.
                Something i haven't thought about it. I could technically combine this with -f_able- to still be able to estimate marginal effects!
                Attached Files

                Comment


                • John Mullahy #515 , indeed Jeroen Weesie contributed -exprcmd- and -expr- to the STB way back in 1998. I still use it for quick exploration.
                  His -expr:- can be used as a prefix as follows.

                  Code:
                  . sysuse auto
                  (1978 automobile data)
                  
                  . expr: reg log(price) weight rep78
                  
                  Expression _Expr0 := log(price)
                  -> reg  _Expr0 weight rep78  
                  
                        Source |       SS           df       MS      Number of obs   =        69
                  -------------+----------------------------------   F(2, 66)        =     21.24
                         Model |  4.00927365         2  2.00463683   Prob > F        =    0.0000
                      Residual |  6.22817243        66  .094366249   R-squared       =    0.3916
                  -------------+----------------------------------   Adj R-squared   =    0.3732
                         Total |  10.2374461        68  .150550678   Root MSE        =    .30719
                  
                  ------------------------------------------------------------------------------
                        _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                        weight |    .000333   .0000513     6.50   0.000     .0002307    .0004354
                         rep78 |   .1272241   .0410658     3.10   0.003     .0452337    .2092146
                         _cons |   7.196296   .2500145    28.78   0.000     6.697125    7.695466
                  ------------------------------------------------------------------------------
                  
                  . expr: poisson price weight weight^2 rep78
                  
                  Expression _Expr0 := weight^2
                  -> poisson  price weight _Expr0 rep78  
                  
                  Iteration 0:   log likelihood = -21667.605  
                  Iteration 1:   log likelihood = -21655.464  
                  Iteration 2:   log likelihood = -21655.463  
                  
                  Poisson regression                                    Number of obs =       69
                                                                        LR chi2(2)    = 36753.75
                                                                        Prob > chi2   =   0.0000
                  Log likelihood = -21655.463                           Pseudo R2     =   0.4591
                  
                  ------------------------------------------------------------------------------
                         price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                        weight |  -.0005043   .0000145   -34.90   0.000    -.0005326    -.000476
                        _Expr0 |   1.36e-07   2.20e-09    61.86   0.000     1.32e-07    1.40e-07
                         rep78 |   .0955141   .0018534    51.53   0.000     .0918815    .0991466
                         _cons |   8.555204    .025696   332.94   0.000      8.50484    8.605567
                  -
                  Alternatively, issuing his -exprcmd- once turns on his interpreter until it is subsequently turned off like this:

                  Code:
                  . sysuse auto
                  (1978 automobile data)
                  
                  . exprcmd
                  
                  (expr) . reg log(price) weight rep78
                  
                  Expression _Expr0 := log(price)
                  -> reg  _Expr0 weight rep78  
                  
                        Source |       SS           df       MS      Number of obs   =        69
                  -------------+----------------------------------   F(2, 66)        =     21.24
                         Model |  4.00927365         2  2.00463683   Prob > F        =    0.0000
                      Residual |  6.22817243        66  .094366249   R-squared       =    0.3916
                  -------------+----------------------------------   Adj R-squared   =    0.3732
                         Total |  10.2374461        68  .150550678   Root MSE        =    .30719
                  
                  ------------------------------------------------------------------------------
                        _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                        weight |    .000333   .0000513     6.50   0.000     .0002307    .0004354
                         rep78 |   .1272241   .0410658     3.10   0.003     .0452337    .2092146
                         _cons |   7.196296   .2500145    28.78   0.000     6.697125    7.695466
                  ------------------------------------------------------------------------------
                  
                  (expr) . poisson price weight weight^2 rep78
                  
                  Expression _Expr0 := weight^2
                  -> poisson  price weight _Expr0 rep78  
                  
                  Iteration 0:   log likelihood = -21667.605  
                  Iteration 1:   log likelihood = -21655.464  
                  Iteration 2:   log likelihood = -21655.463  
                  
                  Poisson regression                                    Number of obs =       69
                                                                        LR chi2(2)    = 36753.75
                                                                        Prob > chi2   =   0.0000
                  Log likelihood = -21655.463                           Pseudo R2     =   0.4591
                  
                  ------------------------------------------------------------------------------
                         price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                        weight |  -.0005043   .0000145   -34.90   0.000    -.0005326    -.000476
                        _Expr0 |   1.36e-07   2.20e-09    61.86   0.000     1.32e-07    1.40e-07
                         rep78 |   .0955141   .0018534    51.53   0.000     .0918815    .0991466
                         _cons |   8.555204    .025696   332.94   0.000      8.50484    8.605567
                  ------------------------------------------------------------------------------
                  
                  (expr) . .
                  His package also includes a cute utility -diest- to display the variable definitions of the temporary variables constructed by -expr- or -exprcmd-:


                  Code:
                  . sysuse auto
                  (1978 automobile data)
                  
                  . expr: reg log(price) weight weight^2 rep78
                  
                  Expression _Expr0 := log(price)
                  Expression _Expr1 := weight^2
                  -> reg  _Expr0 weight _Expr1 rep78  
                  
                        Source |       SS           df       MS      Number of obs   =        69
                  -------------+----------------------------------   F(3, 65)        =     18.23
                         Model |  4.67772488         3  1.55924163   Prob > F        =    0.0000
                      Residual |   5.5597212        65  .085534172   R-squared       =    0.4569
                  -------------+----------------------------------   Adj R-squared   =    0.4319
                         Total |  10.2374461        68  .150550678   Root MSE        =    .29246
                  
                  ------------------------------------------------------------------------------
                        _Expr0 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                        weight |    -.00067   .0003621    -1.85   0.069    -.0013932    .0000532
                        _Expr1 |   1.60e-07   5.74e-08     2.80   0.007     4.58e-08    2.75e-07
                         rep78 |   .0963816   .0406237     2.37   0.021     .0152506    .1775127
                         _cons |   8.768601   .6107286    14.36   0.000     7.548892     9.98831
                  ------------------------------------------------------------------------------
                  
                  . diest
                  ------------------------------------------------------------------------------
                    _Expr0 log(price)                      |    Coef.   Std. Err.     t    P>|t|
                  -----------------------------------------+------------------------------------
                    weight Weight (lbs.)                   |   -.00067  .0003621  -1.850   0.069
                    _Expr1 weight^2                        |  1.60e-07  5.74e-08   2.796   0.007
                     rep78 Repair record 1978              |  .0963816  .0406237   2.373   0.021
                     _cons                                 |  8.768601  .6107286  14.358   0.000
                  ------------------------------------------------------------------------------
                  It would be nice if these commands were updated to accommodate some of the capabilities of newer Stata versions.

                  Comment


                  • A minor but nice option would be to allow the xline and yline options in twoway to have suboptions that would result in the line(s) appearing atop the plotted information rather than beneath it.

                    In the following code I've used the standard approach and then generated the desired result using twoway function to draw the lines atop of the plotted information.

                    Code:
                    cap preserve
                    cap drop _all
                    
                    sysuse auto
                    
                    qui sum mpg
                    loc m=r(mean)
                    
                    tw hist mpg, plotr(marg(zero)) xli(`m', lco(red)) saving(gsl1, replace)
                    
                    tw (hist mpg, plotr(marg(zero)) leg(off)) ///
                       (function y=`m', horiz ra(0 .1) lco(red) lpa(solid) saving(gsl2, replace))
                       
                       
                    drop _all
                    
                    set seed 2345
                    set obs 2000
                    
                    gen x=rnormal(0 1)
                    gen y=rnormal(0 1)
                    qui sum x
                    loc mx=r(mean)
                    loc mnx=r(min)
                    loc mxx=r(max)
                    
                    qui sum y
                    loc my=r(mean)
                    loc mny=r(min)
                    loc mxy=r(max)
                    
                    tw scatter y x, plotr(marg(zero)) xli(`mx', lco(red)) yli(`my',lco(red)) xla(-2(1)5) saving(gsl3, replace)
                    
                    tw (scatter y x, plotr(marg(zero)) xsc(r(`mnx' `mxx')) xla(-2(1)5) ysc(r(`mny' `mxy')) leg(off)) ///
                       (function y=`my', ra(`mnx' `mxx') lco(red) lpa(solid) ) ///
                       (function y=`mx', horiz ra(`mny' `mxy') lpa(solid) lco(red) saving(gsl4, replace))
                    
                    cap restore

                    Comment


                    • I haven't checked if anyone else asked for this.
                      I've just started using frames, and it would be of great help if -frame put- had an option replace and an option change.
                      Kind regards

                      nhb

                      Comment


                      • LASSO for survival analysis (Cox)? Having the capability to do Tukey post hoc tests after ANOVA would be great too.

                        Comment


                        • According to the Stata/MP performance report, many functions benefit from having multiple cores, at the exception of mgarch and mswitch; the problem with mgarch and mswitch is that they can take a very long time to run; my suggestion for Stata 18 would be to parallelize those, but I would like to ask the community first if it's even possible, does the algorithm underneath allows parallelization for those two commands?

                          Comment


                          • Bruce Weaver Thanks for the code in #514!

                            Comment


                            • I would love to see some built-in JSON functionality , despite its flexibility and non-rectangular philosophy. Currently , we depend on python libraries, via SFI, or community-packages, -jsonio- (https://wbuchanan.github.io/StataJSON/) and -insheetjson- (http://fmwww.bc.edu/RePEc/bocode/i/insheetjson.html)

                              A drop-down command in File section menu to convert it to a mata matrix or associative arrays, would be a possible first step towards.

                              Comment


                              • Bruce Weaver

                                Hi Bruce,

                                I also had to use Field's equations to figure out what was going on with the robust versions. I've wrote code awhile back to produce Brown-Forsythe's and Welch's F statistics, but it's much clunkier than yours - I'd never seen the statsby command before!

                                Cheers,

                                David.

                                Comment

                                Working...
                                X