
  • How to add fixed effects in sureg in Stata?

    Hi everyone,

    I have a question about how to add fixed effects in sureg, and I wonder if any of you knows how to do this in Stata.

    I have six equations that I want to estimate simultaneously, so I use sureg for the six equations. However, I also want to add fixed effects to the model. Does anyone know if I can just add ID dummies and date dummies in sureg? I have already tried this, but Stata reports an error saying it is unable to allocate matrix rows and columns. I'm not sure if that's because I have too many data points. I have 10,000 IDs and 400 dates.

    Please let me know if you have any thoughts on this.

    Thanks a lot!

  • #2
    It's not the number of data points that's biting here, it's the number of IDs and equations. Each ID (except one) adds a row and column to the matrix that -sureg- has to create. And then you have to multiply that by 6. That blows way past the maximum matrix size for Stata SE (11,000 x 11,000). Now, Stata MP is more generous, allowing the dimension to go up to 65,534x65,534, so depending on how many other variables you are working with, you might just barely fit this in Stata MP.
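
    As a rough check of that arithmetic, the sketch below counts the indicator columns implied by the numbers in the question (10,000 IDs, 400 dates, 6 equations). The scalar names are made up for illustration, and the count ignores the constants and the other regressors, so the real dimension would be somewhat larger.

    ```stata
    * Back-of-the-envelope count of indicator columns (illustrative scalar names)
    scalar n_id   = 10000    // number of IDs, from the question
    scalar n_date = 400      // number of dates, from the question
    scalar n_eq   = 6        // number of equations in -sureg-

    * One level of each factor is omitted as the base category
    scalar k_dummies = n_eq * ((n_id - 1) + (n_date - 1))

    display "indicator columns across all equations: " k_dummies
    display "exceeds the Stata SE limit of 11,000?   " (k_dummies > 11000)
    display "exceeds the Stata MP limit of 65,534?   " (k_dummies > 65534)
    ```

    With these numbers the count is 6 x 10,398 = 62,388, which is far beyond the SE limit and only a little under the MP limit.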

    When you run -xtreg, fe- after -xtset ID-, you don't encounter this kind of problem because Stata does not add ID indicators ("dummies") to the model. Instead, behind the scenes, it de-means everything within the groups defined by ID and then runs the regression on the demeaned data. You can try this approach. The -xtdata- command will make the demeaning simple. See -help xtdata-. Then run -sureg- with the demeaned variables.
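
    To make the demeaning idea concrete, here is a minimal sketch that does the within transformation by hand and compares it with -xtreg, fe-. It assumes StataCorp's nlswork example data rather than the data in the question, and the mean_ and dm_ variable names are made up for illustration.

    ```stata
    clear all
    webuse nlswork
    * Keep complete cases so the group means use the estimation sample
    keep if !missing(ln_wage, hours, tenure)
    xtset idcode

    * Reference: the built-in fixed-effects estimator
    xtreg ln_wage hours tenure, fe
    scalar b_fe_hours = _b[hours]

    * Manual within transformation: subtract each ID's mean from every variable
    foreach v of varlist ln_wage hours tenure {
        bysort idcode: egen double mean_`v' = mean(`v')
        generate double dm_`v' = `v' - mean_`v'
    }

    * OLS on the demeaned data, with no indicator variables for ID
    regress dm_ln_wage dm_hours dm_tenure, noconstant
    display "difference in hours coefficient: " _b[dm_hours] - b_fe_hours
    ```

    The coefficients should agree up to rounding. The standard errors from -regress- will be a little smaller, because it does not know that one degree of freedom per ID was used up by the demeaning.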



    • #3
      Hi Clyde,

      Thank you very much for your reply. The explanation makes everything clearer to me. I'll demean the variables first and then try sureg.

      Thanks a lot! It really helps me out.



      • #4
        Hi Clyde,

        I tried the method you mentioned and I have one additional question.

        Suppose the sureg regression includes categorical/factor variables such as user_status (which takes the values 1, 2, 3, 4). Should I demean that variable using xtdata first and then drop the i. prefix in the subsequent sureg? Or should I leave the factor variable out of the demeaning and then add it to sureg directly?

        For instance, if originally I have sureg (y1 x1 x2 i.x3) (y2 x1 x2 i.x3):
        version 1: xtdata y1 y2 x1 x2 x3, followed by sureg (y1 x1 x2 x3) (y2 x1 x2 x3)
        version 2: xtdata y1 y2 x1 x2, followed by sureg (y1 x1 x2 i.x3) (y2 x1 x2 i.x3)

        Thank you and look forward to your reply!




        • #5
          Good question. I should have anticipated that. To handle categorical variables, you have to first create indicator ("dummy") variables for all the levels of those variables. The mostly obsolete -xi- command is one easy way to do that. Then include the indicators in the demeaning process, and run -sureg- on the demeaned variables without using factor variable notation. Here's an illustration of how it works using the StataCorp nlswork.dta:

          Code:
          clear*
          webuse nlswork
          xtset idcode year
          
          //  REAL xtreg, fe
          xtreg ln_wage hours tenure i.ind_code, fe
          
          //  EMULATING
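          xi i.ind_code                               // creates the _Iind_code_* indicators
          xtdata ln_wage hours tenure _I*, fe clear   // -clear- is required because -xi- changed the data in memory
          regress ln_wage hours tenure _I*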
          I just used one equation with -regress- to illustrate the process, but it will work the same way with multiple equations and -sureg-.
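
          A sketch of the same steps carried through to -sureg- might look like the following. It again assumes the nlswork example data, and the choice of wks_work as a second outcome is arbitrary and only for illustration.

          ```stata
          clear all
          webuse nlswork
          xtset idcode year
          * Keep complete cases so every variable is demeaned over the same observations
          keep if !missing(ln_wage, wks_work, hours, tenure, ind_code)

          * Indicators for the categorical regressor, then the within transformation
          xi i.ind_code
          xtdata ln_wage wks_work hours tenure _I*, fe clear

          * Two-equation SUR on the demeaned data, without factor-variable notation
          sureg (ln_wage hours tenure _I*) (wks_work hours tenure _I*)
          ```

          As with -regress-, the standard errors reported here do not account for the degrees of freedom absorbed by the ID means, so treat them as slightly optimistic.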



          • #6
            Hi Clyde,

            Thank you so much. This makes the picture complete.

            However, I still encounter the error that Stata is unable to allocate columns or rows to the computation.

            My confusion is that before adding fixed effects, sureg (y1 x1 x2 i.x3) (y2 x1 x2 i.x3) produced results, which means that Stata could allocate enough space for the computation even though x3 has many values (around 7,000).

            If my understanding is correct, when I demean everything first and then run sureg (y1 x1 x2 _I*) (y2 x1 x2 _I*), the computational burden should be similar to that of the previous regression. Why does it seem to hit the limit this time? It gives the same error as sureg (y1 x1 x2 i.x3 i.ID i.date) (y2 x1 x2 i.x3 i.ID i.date).

            I think due to a lack of understanding of the underlying logic it's very difficult for me to figure out why this happened. I wonder if you have any thoughts on that?

            Thanks a lot and look forward to your reply.



            • #7
              You're not demeaning the ID variable, are you? You're not supposed to. It should be left alone, and it will disappear from the data set after -xtdata-.
              Code:
              xtset ID
              xi i.x3 i.date
              xtdata y* x1 x2 _I*, fe clear   // -clear- lets -xtdata- run after -xi- has changed the data
              sureg (y1 x1 x2 _I*) (y2 x1 x2 _I*)
              I can't think of any reason that this would require any different size matrix than just the -sureg- on the original data omitting the ID fixed effect.



              • #8
                Hi Clyde,

                Thank you so much for your reply. I didn't demean the ID variable.

                I'll check more closely to see if I missed anything. Thanks a lot!



                • #9
                  One other thought. In my code I used _I* to cover the variables created by xi. But if there are already other variables in your data set whose names begin with _I, those would be inappropriately swept along with it. So if that's the case, -drop-ping them might make a difference.

                  I know, I'm grasping at straws here, but I really don't see what's going wrong. I didn't have the patience to create a full-sized example and run it on my setup, but with some smaller data sets I have verified that the size of the matrix that -sureg- generates using my approach is the same as the size of the matrix that -sureg- generates when given the same equations (and no ID fixed effect).
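
                  Both points can be checked with a short sketch along these lines. It assumes the nlswork example data again, uses the prefix() option of -xi- with a made-up prefix (_F) so that no existing _I variables can be swept in, and compares the number of coefficients -sureg- estimates before and after the demeaning.

                  ```stata
                  clear all
                  webuse nlswork
                  xtset idcode year
                  keep if !missing(ln_wage, wks_work, hours, tenure, ind_code)

                  * A custom prefix keeps the new indicators apart from any existing _I* variables
                  xi, prefix(_F) i.ind_code

                  * Same equations, no ID fixed effect
                  sureg (ln_wage hours tenure _F*) (wks_work hours tenure _F*)
                  scalar k_plain = colsof(e(b))

                  * Same equations after the within transformation
                  xtdata ln_wage wks_work hours tenure _F*, fe clear
                  sureg (ln_wage hours tenure _F*) (wks_work hours tenure _F*)
                  scalar k_within = colsof(e(b))

                  display "coefficients without ID fixed effect: " k_plain
                  display "coefficients after demeaning:         " k_within
                  ```

                  If the two counts match, the demeaning step has not enlarged the matrix that -sureg- has to build.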



                  • #10
                    Hi Clyde,

                    Thanks a lot for your help. I just ran the code you gave me on our Slurm-managed computing cluster, which has more memory and higher speed than my laptop, and this time I got the output successfully using your method.

                    My earlier sureg results without fixed effects were also obtained on the cluster; Stata on my own laptop could not run them. I don't think memory is the reason that Stata's error message gave, but the good news is that it is finally solved!!

                    Thank you so much!



                    • #11
                      I'm happy both to hear that you have successfully run what you need to, and to have confirmation that I was not missing something in my reasoning about this problem.
