  • Problem with Stata 14

    Dear all,
    I have encountered a problem while running Stata and have not found any other post related to it. To let others replicate the problem, I'm including a toy dataset drawn from my main data, for which the problem persists.

    Code:
    sum
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             sex |      4,319    1.463302    .4987092          1          2
         citizen |      4,319           0           0          0          0
             wtf |      4,319    3125.397    1704.387   270.8347   13390.37
           tran4 |      4,319    .0754804    .2641956          0          1
    
    .  reg tran4 i.citizen sex  [pw=wtf]
    (sum of wgt is   1.3499e+07)
    
    Linear regression                               Number of obs     =      4,319
                                                    F(1, 4316)        =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.0003
                                                    Root MSE          =      .2612
    
    ------------------------------------------------------------------------------
                 |               Robust
           tran4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         citizen |
            NIU  |   .0091304   .0016563     5.51   0.000     .0058833    .0123775
             sex |  -.0084173   .0090477    -0.93   0.352    -.0261555    .0093208
           _cons |   .0767286   .0073079    10.50   0.000     .0624013    .0910559
    ------------------------------------------------------------------------------
    As you can see, when I use i.citizen, the regression reports a coefficient for the variable instead of dropping it, as it should, since there is no variation in citizen.

    While I can avoid the problem by simply excluding this variable, I think it requires some attention.
    Thank you
    Fernando
    DATA:
    Data_small.dta
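
A minimal sketch of the workaround mentioned in this post (leaving the constant variable out), assuming synthetic stand-in data rather than the attached Data_small.dta. The simulated values and the seed are hypothetical; only the variable names come from the post. It flags regressors with no variation before they are passed to regress:

```stata
* Synthetic stand-in data (hypothetical values): citizen is constant by construction
clear
set seed 12345
set obs 200
generate byte citizen = 0
generate byte sex = 1 + (runiform() < 0.5)
generate double wtf = 1000 + 5000*runiform()
generate byte tran4 = runiform() < 0.08

* Flag any regressor whose minimum equals its maximum, i.e. no variation
foreach v of varlist citizen sex {
    quietly summarize `v'
    if r(min) == r(max) {
        display as text "`v' is constant at " r(min) "; consider leaving it out of the model"
    }
}
```

This check does not depend on how regress handles collinearity, so it can be run before any estimation command.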

  • #2
    I can duplicate your results. I think you should report this to technical support.

    The weights seem to be confusing Stata somehow: if we drop the pweights, then it runs as expected and citizen gets omitted.


    • #3
      Thank you, will do.


      • #4
        I have problems too but at least the coefficient for citizen is not coming up as significant:

        Code:
        . reg tran4 i.citizen sex  [pw=wtf]
        (sum of wgt is   1.3499e+07)
        
        Linear regression                               Number of obs     =      4,319
                                                        F(1, 4316)        =          .
                                                        Prob > F          =          .
                                                        R-squared         =     0.0003
                                                        Root MSE          =      .2612
        
        ------------------------------------------------------------------------------
                     |               Robust
               tran4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             citizen |
                NIU  |  -.0654856     786432    -0.00   1.000     -1541811     1541811
                 sex |  -.0084173   .0090477    -0.93   0.352    -.0261555    .0093208
               _cons |   .1513446     786432     0.00   1.000     -1541811     1541811
        ------------------------------------------------------------------------------
        That to me suggests some sort of collinearity problem, where the program calculates a coefficient as being slightly different than zero when it should be zero.

        Interestingly, glm does what you would expect it to do:

        Code:
        . glm tran4 i.citizen sex  [pw=wtf]
        note: 0.citizen omitted because of collinearity
        
        Iteration 0:   log pseudolikelihood =  -48594314  
        
        Generalized linear models                         No. of obs      =      4,319
        Optimization     : ML                             Residual df     =      4,317
                                                          Scale parameter =   213.1883
        Deviance         =  920333.7181                   (1/df) Deviance =   213.1883
        Pearson          =  920333.7181                   (1/df) Pearson  =   213.1883
        
        Variance function: V(u) = 1                       [Gaussian]
        Link function    : g(u) = u                       [Identity]
        
                                                          AIC             =   22502.58
        Log pseudolikelihood = -48594313.89               BIC             =   884197.1
        
        ------------------------------------------------------------------------------
                     |               Robust
               tran4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             citizen |
                NIU  |          0  (omitted)
                 sex |  -.0084173   .0090456    -0.93   0.352    -.0261464    .0093118
               _cons |    .085859   .0144293     5.95   0.000     .0575781    .1141399
        ------------------------------------------------------------------------------
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam


        • #5
          I got the right answer (citizen omitted because of collinearity). What version/update of Stata are you using, Fernando?


          • #6
            For me it was:

            Code:
            . about
            
            Stata/MP 14.1 for Windows (64-bit x86-64)
            Revision 19 May 2016
            Copyright 1985-2015 StataCorp LP
            
            Total physical memory:     8269900 KB
            Available physical memory: 2238232 KB
            
            Single-user 2-core Stata perpetual license:
                   Serial number:  [redacted]
                     Licensed to:  Clyde Schechter
                                   Albert Einstein College of Medicine
            
            . update query
            (contacting http://www.stata.com)
            
            Update status
                Last check for updates:  08 Jun 2016
                New update available:    none         (as of 08 Jun 2016)
                Current update level:    19 May 2016  (what's new)
            
            Possible actions
            
                Do nothing; all files are up to date.


            • #7
              I dug out my old copy of Stata 11 and it worked fine:

              Code:
              . reg tran4 i.citizen sex  [pw=wtf]
              (sum of wgt is   1.3499e+07)
              note: 0.citizen omitted because of collinearity
              
              Linear regression                                      Number of obs =    4319
                                                                     F(  1,  4317) =    0.87
                                                                     Prob > F      =  0.3522
                                                                     R-squared     =  0.0003
                                                                     Root MSE      =  .26117
              
              ------------------------------------------------------------------------------
                           |               Robust
                     tran4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                 0.citizen |  (omitted)
                       sex |  -.0084173   .0090467    -0.93   0.352    -.0261534    .0093188
                     _cons |    .085859    .014431     5.95   0.000     .0575669    .1141511
              ------------------------------------------------------------------------------
              But in 12, 13 and 14 I had problems.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              Stata Version: 17.0 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam


              • #8
                Dear Joe,
                I was using Stata 14, updated to 30 March 2016.
                Code:
                Stata/MP 14.1 for Windows (64-bit x86-64)
                Revision 30 Mar 2016
                Copyright 1985-2015 StataCorp LP
                
                Total physical memory:     8345288 KB
                Available physical memory: 3450016 KB
                
                Single-user 4-core Stata perpetual license:
                I did notice that when I use logit, 0.citizen is dropped before the model starts estimating. What is particularly annoying is that this problem appears seemingly at random.
                It should also be noted that in this example the affected coefficient is the constant, although the problem did affect other coefficients in my larger models.
                Fernando
                Last edited by FernandoRios; 08 Jun 2016, 15:16.


                • #9
                  This seems to work:

                  Code:
                  . svyset [pw = wtf]
                  
                        pweight: wtf
                            VCE: linearized
                    Single unit: missing
                       Strata 1: <one>
                           SU 1: <observations>
                          FPC 1: <zero>
                  
                  . svy: reg tran4 i.citizen sex
                  (running regress on estimation sample)
                  
                  Survey: Linear regression
                  
                  Number of strata   =         1                 Number of obs     =       4,319
                  Number of PSUs     =     4,319                 Population size   =  13,498,590
                                                                 Design df         =       4,318
                                                                 F(   1,   4318)   =        0.87
                                                                 Prob > F          =      0.3521
                                                                 R-squared         =      0.0003
                  
                  ------------------------------------------------------------------------------
                               |             Linearized
                         tran4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                       citizen |
                          NIU  |          0  (omitted)
                           sex |  -.0084173   .0090456    -0.93   0.352    -.0261513    .0093167
                         _cons |    .085859   .0144293     5.95   0.000     .0575702    .1141479
                  ------------------------------------------------------------------------------
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  Stata Version: 17.0 MP (2 processor)

                  EMAIL: [email protected]
                  WWW: https://www3.nd.edu/~rwilliam


                  • #10
                    Dear All,

                    I had a similar problem in 2013 and Miguel Dorta from StataCorp told me the following:

                    ...since factor variables were first introduced with Stata 11, -regress- and -_rmcoll- switch the numeric tolerance for omitting nearly perfect collinear variables when factor variables operators are used.
                    I guess this may be the source of the problem and it may also explain the following behavior (there is no problem if we do not use factor notation):

                    Code:
                    . reg tran4 citizen sex  [pw=wtf]
                    (sum of wgt is   1.3499e+07)
                    note: citizen omitted because of collinearity
                    
                    Linear regression                               Number of obs     =      4,319
                                                                    F(1, 4317)        =       0.87
                                                                    Prob > F          =     0.3522
                                                                    R-squared         =     0.0003
                                                                    Root MSE          =     .26117
                    
                    ------------------------------------------------------------------------------
                                 |               Robust
                           tran4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         citizen |          0  (omitted)
                             sex |  -.0084173   .0090467    -0.93   0.352    -.0261534    .0093188
                           _cons |    .085859    .014431     5.95   0.000     .0575669    .1141511
                    ------------------------------------------------------------------------------
                    At the time I also discussed this with Jeff Pitblado, because I find it very annoying that Stata is not consistent in the way it treats collinearity, but the problems persist. Ideally, the user should have some control over the threshold used to exclude variables. Hopefully this additional example will help persuade StataCorp to do something about this.

                    All the best,

                    Joao
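
One way to inspect the collinearity check directly is to call _rmcoll with the same terms and weights as the model. The following is a sketch on synthetic stand-in data (the values and seed are hypothetical, not the posted dataset), and it assumes that _rmcoll accepts pweights and the expand option as documented for Stata 11 and later. Since the quoted explanation names _rmcoll as part of the tolerance switch, its answer may be subject to the same behavior and should be treated as a diagnostic, not as ground truth:

```stata
* Synthetic stand-in data (hypothetical values): citizen is constant by construction
clear
set seed 2016
set obs 200
generate byte citizen = 0
generate byte sex = 1 + (runiform() < 0.5)
generate double wtf = 1000 + 5000*runiform()

* Ask _rmcoll which terms it would flag, using the same weights as the model
_rmcoll i.citizen sex [pweight = wtf], expand

* r(varlist) holds the expanded term list; r(k_omitted) counts flagged terms
display "terms after the collinearity check: `r(varlist)'"
display "number flagged as omitted: " r(k_omitted)
```

Running the same call with plain citizen instead of i.citizen would show whether the two notations are treated differently on a given dataset.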


                    • #11
                      This is specific to regress and occurs when pweights, aweights, or iweights are specified.

                      We have found the source of the problem and hope to have it fixed in the next Stata 14 update.
                      Last edited by Jeff Pitblado (StataCorp); 09 Jun 2016, 00:28.
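
Until an update is installed, the alternatives reported earlier in this thread (glm, svy: regress, and plain variable notation without the i. prefix) can be collected in one place. This is a sketch on synthetic stand-in data with hypothetical values and seed, so it illustrates the syntax only and is not expected to reproduce the faulty output:

```stata
* Synthetic stand-in data (hypothetical values): citizen is constant by construction
clear
set seed 2016
set obs 200
generate byte citizen = 0
generate byte sex = 1 + (runiform() < 0.5)
generate double wtf = 1000 + 5000*runiform()
generate byte tran4 = runiform() < 0.08

* Alternative 1: glm with its default Gaussian family and identity link
glm tran4 i.citizen sex [pweight = wtf]

* Alternative 2: declare the weights once and use the svy prefix
svyset [pweight = wtf]
svy: regress tran4 i.citizen sex

* Alternative 3: enter citizen without factor-variable notation
regress tran4 citizen sex [pweight = wtf]
```

In each case it is worth confirming in the output that citizen is reported as omitted before relying on the remaining coefficients.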
