  • generalized additive model: problem is too big for gamfit.exe r(2002)


    I used to use R for GAM.
    I tried to use Stata for GAM but encountered the error message in the title,
    "problem is too big for gamfit.exe r(2002)".

    . gam labyef age,df(3)
    [Approximate problem size: 907000 reals. Available:70000 reals.]
    problem is too big for c:\ado\gamfit.exe
    r(2002);

    How can I deal with this?

    The dataset is:
    . sum labyef age

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          labyef |      19744    7.438359    3.326544          1         13
             age |      19744    24.11066    30.11334  -.0027397   101.9753


  • #2
    I would try npregress.


    I think you're alluding to

    STB-42 sg79 . . . . . . . . . . . . . . . . . . Generalized additive models
    . . . . . . . . . . . . . . . . . . . . . . . P. Royston and G. Ambler
    3/98 pp.38--43; STB Reprints Vol 7, pp.217--224
    interface between Stata and a slightly modified version of
    the FORTRAN program GAMFIT, written by T. J. Hastie and
    R. J. Tibshirani

    which was a major contribution at the time (1998). Official Stata so far has not tried to implement generalized additive models as such, but for an outcome and a single predictor, there are many loosely similar commands.

    What kind of age do you have that runs over [-.0027397, 101.9753] ?
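
    As a rough illustration of the "loosely similar commands" for one outcome and one predictor, here is a minimal sketch using the variable names from the first post. None of these is a generalized additive model, and the options shown (degree, number of knots, the stub name agesp) are illustrative assumptions rather than recommendations. With nearly 20,000 observations, npregress kernel may take a while to choose its bandwidth.

    ```stata
    * Kernel-based nonparametric regression (official Stata 15 or later)
    npregress kernel labyef age

    * Local-linear smooth, drawn without the scatter of raw points
    lpoly labyef age, degree(1) noscatter

    * Restricted cubic spline in age with 4 knots, fitted by ordinary regression
    * (agesp is a hypothetical stub name for the generated spline variables)
    mkspline agesp = age, cubic nknots(4)
    regress labyef agesp*
    ```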




    • #3
      According to its help file, ". . . a model with a constant and a single predictor (i.e. #V = 2) the biggest problem that can be fit is N = 783495." So you shouldn't be having a problem with only 19744 observations. See below for illustration of that. (I've recast the response variable to double as a worst case.)

      Make sure that you're using the current version, which is available on SSC and on Patrick Royston's website. (The older version on James Hardin's directory on StataCorp's website has two executables, one for smaller datasets and another for larger datasets.)

      . 
      . version 17.0

      . 
      . clear *

      . 
      . // seedem
      . set seed 1181128936

      . 
      . quietly set obs 19744

      . 
      . generate double age = runiform(-.0027397, 101.9753)

      . 
      . generate byte labyef = runiformint(1, 13)

      . 
      . global GAMDIR `c(pwd)'\

      . 
      . recast double labyef

      . 
      . gam labyef age, family(gaussian) link(identity) df(3)

      19744 records merged.

      Generalized Additive Model with family gauss, link ident.

      Model df     =     4.000                           No. of obs =     19744
      Deviance     =    278617                           Dispersion =   14.1144
      -------------------------------------------------------------------------
            labyef |   df    Lin. Coef.  Std. Err.      z        Gain    P>Gain
      -------------+-----------------------------------------------------------
               age |  3.004   -.000137   .0009102    -0.151     9.521    0.0086
             _cons |      1    6.99326    .026737   261.557         .         .
      -------------------------------------------------------------------------
      Total gain (nonlinearity chisquare) =     9.521 (2.004 df), P = 0.0086

      . 
      . exit

      end of do-file


      .



      • #4
        Originally posted by Nick Cox
        I would try npregress.


        I think you're alluding to

        STB-42 sg79 . . . . . . . . . . . . . . . . . . Generalized additive models
        . . . . . . . . . . . . . . . . . . . . . . . P. Royston and G. Ambler
        3/98 pp.38--43; STB Reprints Vol 7, pp.217--224
        interface between Stata and a slightly modified version of
        the FORTRAN program GAMFIT, written by T. J. Hastie and
        R. J. Tibshirani

        which was a major contribution at the time (1998). Official Stata so far has not tried to implement generalized additive models as such, but for an outcome and a single predictor, there are many loosely similar commands.

        What kind of age do you have that runs over [-.0027397, 101.9753] ?

        Thank you very much for your response.
        Age is for human subjects. The minimum, -0.0027397, is one or two days before delivery.



        • #5
          Originally posted by Joseph Coveney
          According to its help file, ". . . a model with a constant and a single predictor (i.e. #V = 2) the biggest problem that can be fit is N = 783495." So you shouldn't be having a problem with only 19744 observations. See below for illustration of that. (I've recast the response variable to double as a worst case.)

          Make sure that you're using the current version, which is available on SSC and on Patrick Royston's website. (The older version on James Hardin's directory on StataCorp's website has two executables, one for smaller datasets and another for larger datasets.)

          Thank you very much for your reply.
          Following your advice, I downloaded gam.zip from
          https://ideas.repec.org/c/boc/bocode/s428701.html

          (I did not know where to find gam on SSC or Patrick Royston's official download site.)

          gam.zip contained gam.exe and some other files. I extracted these files into c:\ado

          I ran the gam command from Stata but received the same error message.

          . gam boxcox_igg age yearm,df(3)
          problem is too big for c:\ado\gamfit.exe
          r(2002);

          So I deleted gamfit.exe, and tried again:

          . gam boxcox_igg age yearm,df(3)
          GAMFIT failure, c:\ado\gamfit.exe not found
          [Approximate problem size: 984000 reals. Available:70000 reals.]
          problem is too big for c:\ado\gamfit.exe
          r(2002);

          Anyway, I will use R for GAM for the time being. Thank you.
          Last edited by Yoshiro Nagao; 14 Feb 2022, 20:03. Reason: additional information is necessary



          • #6
            I will write to Patrick Royston to ask how to install his gam in Stata.



            • #7
              Originally posted by Yoshiro Nagao
              I ran gam command from stata but received the same error message.
              It's a different dataset from the one that you summarized in the first post.

              But the error message is puzzling in that it claims that you have "907000 reals" while the output of -summarize- (in the first post above) indicates that you have fewer than 20 000 nonmissing observations in the two relevant variables.

              I see that you've already deleted the executable file, rendering the command unusable (see the new error message), and gone back to R.

              But if you decide to try again, then maybe you can trim the dataset down to only those variables and mutually nonmissing observations needed for fitting the model, that is
              Code:
              keep boxcox_igg age yearm
              keep if !missing(boxcox_igg, age, yearm)
              compress
              count
              before invoking the command.



              • #8
                Yoshiro, I had the same issue that you had after downloading the gam package from https://ideas.repec.org/c/boc/bocode/s428701.html
                The problem is simply that the file is zipped, so Stata can't read it. Find the actual package in your user ado folder, extract it, and everything should work fine. Cheers, Jessie
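
                After extracting, it may help to confirm what Stata is actually finding. The following is only a sketch: it assumes the files were extracted to c:\ado as described earlier in the thread, and that gam locates its executable through the GAMDIR global, as the do-file in post #3 suggests. Adjust the folder to wherever gamfit.exe really sits.

                ```stata
                * Show which gam.ado Stata picks up, and from which folder
                which gam

                * List the ado directories that Stata searches
                adopath

                * Search the ado-path for the executable (stops with an error if not found)
                findfile gamfit.exe

                * Point gam at the folder holding the executable, as post #3 does with GAMDIR
                * (c:\ado\ is an assumption taken from this thread; note the trailing backslash)
                global GAMDIR c:\ado\
                ```

                If which gam reports a different folder from the one holding the freshly extracted files, an older copy is probably still being used.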
