  • Help! Large sample-out of memory-

    Hello!

    I really need your help!

    I have a dataset with 122,401 firm-year observations. I run fixed effects models and I want to use firm fixed effects. However, because of the large number of firms, Stata does not run the regression,

    xi:xtreg la1 ma size1 cr1 prof1 i.gvkey, fe vce(cluster gvkey)

    and shows the following message:

    "no room to add more variables
    Up to 5,000 variables are currently allowed, although you could reset the maximum using set maxvar; see help memory.
    r(900);"

    I tried to change maxvar, by typing " set maxvar 32767", because I am using STATA/MP which by default has maxvar of 5,000, but then another message appears:

    ". set maxvar 32767
    no; data in memory would be lost
    r(4);"


    Could you please help me?

    Thank you very much in advance!

    Kind Regards,
    Stavroula

  • #2
    First, you are not running out of memory here. You are simply exceeding the number of variables currently allowed in your setup.

    Looking at the code, you are making one error, and doing one thing poorly.

    The error is that before you can change the setting for the maximum number of variables, you have to clear out whatever is in memory. So
    Code:
    clear
    set maxvar 32767
    Then reload your data set and start over.

    The thing you are doing poorly is using xi:. It's an old command that is nowadays only needed in rare situations, most of which pertain to archaic commands that have been superseded by newer ones that carry out the same functions. So it is probably best if you more or less forget you ever heard of xi. Instead, use factor-variable notation. In this case all you need to do is drop the -xi:- from the front of your -xtreg- command and you have it in factor-variable notation. Stata will know that you want to create indicator (dummy) variables for those variables prefixed with i., and it will do so for you automatically. Best of all, those variables are "virtual" variables that do not permanently occupy any memory and that do not count against the limit on the number of variables. So if you do this, you won't even need to reset the maximum number of variables allowed. See -help fvvarlist- for more information on factor variables.
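    As a rough illustration of the "virtual variables" point, here is a minimal sketch using the auto data that ships with Stata (not the poster's data; the local macro names are just for this example). It counts the variables in memory with c(k) before and after each approach:

    ```stata
    * Illustration with Stata's bundled auto data, not the poster's dataset
    sysuse auto, clear
    local k_before = c(k)

    * Old style: -xi:- adds real _Irep78_* variables to the dataset
    xi: regress price mpg i.rep78
    local k_xi = c(k)

    * Factor-variable notation: the indicators are virtual, nothing is added
    drop _I*
    regress price mpg i.rep78
    local k_fv = c(k)

    display "before: `k_before'   after xi: `k_xi'   after i.: `k_fv'"
    ```

    The count should go up after -xi:- and be back at its starting value after the factor-variable version, which is why only the -xi:- route can run into the maxvar limit.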

    Now, on top of that, your -xtreg- command looks strange. How did you -xtset- your data? I don't know anything about finance, but just from participating in this Forum, I have learned that gvkey is an identifier for firms or securities or something, and usually when it's in the data, it's the panel variable in the -xtset- command. If that's the case, having i.gvkey in your -xtreg- command is pointless: the -fe- part already handles gvkey-level fixed effects, so there is no need to explicitly include i.gvkey in the model. And, in fact, if you did -xtset gvkey-, the i.gvkey variables will all be collinear with the automatically generated fixed effects and Stata will omit them anyway.
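    To see that -fe- alone already gives the firm-level fixed effects, here is a small sketch on simulated data. The variables firm, year, x and y are made up for the example, and only 50 firms are used so that the explicit-dummy version stays cheap:

    ```stata
    * Simulated panel: 50 hypothetical firms, 10 years each
    clear
    set seed 2024
    set obs 500
    gen int firm = ceil(_n/10)
    gen int year = 2000 + mod(_n - 1, 10)
    gen double x = rnormal()
    gen double y = 0.5*x + firm/10 + rnormal()

    xtset firm year

    * Firm fixed effects via -fe-; no i.firm in the variable list
    xtreg y x, fe
    scalar b_fe = _b[x]

    * Same model with explicit firm indicators
    regress y x i.firm
    scalar b_lsdv = _b[x]

    display "xtreg, fe: " b_fe "   regress with i.firm: " b_lsdv
    ```

    The two slope estimates should agree up to rounding, so with gvkey as the panel variable the i.gvkey term adds nothing except thousands of extra columns.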

    It only makes sense to have i.gvkey in there if the panel variable used in -xtset- is something else. And in that case I'm wondering what the something else is, and thinking that rather than using -xtreg- you might want to take a look at -reghdfe- (available from SSC), which does regression with multiple fixed effects.



    • #3
      Mr Clyde Schechter,

      Thank you very much!! I changed maxvar, but now the problem is matsize: it is already set at 11000, and that is still not enough. Anyway!

      Regarding xtreg: since I am pretty new to Stata, I did not know that when a variable is the panel variable (xtset gvkey year) there is no need to include it in the regression. I do not want to check for multiple fixed effects (at the moment), so there is no need for reghdfe.

      I would like to ask you another question: since year is included in xtset, does xtreg capture year fixed effects as well, or do I need to include i.year in the regression?

      Thank you very much for your help!



      • #4
        I don't know why Stata is complaining about -matsize- here. Did you follow my advice to ditch -xi:- and go with factor variable notation, and omit i.gvkey from the list of predictors?

        Here's an example that shows that even with a puny -matsize- of 400, I can run a fixed effects regression with 20,000 groups:

        Code:
        . clear*
        
        . 
        . display c(matsize)
        400
        
        . 
        . set obs 200000
        number of observations (_N) was 0, now 200,000
        
        . gen int group = mod(_n, 20000)
        
        . 
        . set seed 1234
        
        . 
        . gen y = rnormal()
        
        . gen x = rnormal()
        
        . 
        . xtset group
               panel variable:  group (balanced)
        
        . xtreg y x, fe
        
        Fixed-effects (within) regression               Number of obs     =    200,000
        Group variable: group                           Number of groups  =     20,000
        
        R-sq:                                           Obs per group:
             within  = 0.0000                                         min =         10
             between = 0.0000                                         avg =       10.0
             overall = 0.0000                                         max =         10
        
                                                        F(1,179999)       =       0.76
        corr(u_i, Xb)  = 0.0003                         Prob > F          =     0.3848
        
        ------------------------------------------------------------------------------
                   y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                   x |   -.002049   .0023574    -0.87   0.385    -.0066695    .0025715
               _cons |  -.0049627   .0022355    -2.22   0.026    -.0093443   -.0005812
        -------------+----------------------------------------------------------------
             sigma_u |  .31838577
             sigma_e |  .99974825
                 rho |  .09208159   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(19999, 179999) = 1.01               Prob > F = 0.0896

        "I would like to ask you another question: since year is included in xtset, does xtreg capture year fixed effects as well, or do I need to include i.year in the regression?"
        -xtreg- will not automatically include the year effects. You need to include i.year in the regression if you want year effects.
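        A minimal sketch of what that looks like, again on simulated data (firm, year, x and y are hypothetical names, and the built-in trend in y is only there so the year indicators have something to pick up):

        ```stata
        * Simulated panel: 50 hypothetical firms, 10 years each
        clear
        set seed 2024
        set obs 500
        gen int firm = ceil(_n/10)
        gen int year = 2000 + mod(_n - 1, 10)
        gen double x = rnormal()
        gen double y = 0.5*x + 0.2*(year - 2000) + rnormal()

        xtset firm year

        * Firm effects come from -fe-; year effects need i.year explicitly
        xtreg y x i.year, fe vce(cluster firm)

        * Joint test of the year indicators
        testparm i.year
        ```

        With ten years there are nine year indicators (one year is the base), which fits comfortably within any -matsize-, unlike thousands of firm indicators.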



        • #5
          I do not know what happened earlier, but now I did the whole (same) process again. Thank you very much!!



          • #6
            I did the process again and it works!
