  • "Warning: *variance matrix is nonsymmetric or highly singular" when absorbing year/month FE

    I have data at the physician-hospital-ym level (ym=year/month). A given physician may work in multiple hospitals at the same year/month.

    I started off by running FE regressions (using the -reghdfe- command) absorbing physician FE as well as hospital-ym FE. I could estimate the model but thought I was absorbing too much of the variation. I then ran the same specification absorbing hospital FE and ym FE instead of hospital-ym FE. To my surprise, I got the message "Warning: variance matrix is nonsymmetric or highly singular" and Stata did not report the standard errors of the coefficients. I thought the first model was more restrictive than the second, so I did not understand what happened.

    Then I ran the same regression model including only one FE at a time. I got no warning with physician FE alone or with hospital FE alone, but I did get it with ym FE alone. When I run this same model using -areg- instead of -reghdfe-, the model is estimated with no warning message.

    Among the explanatory variables in my model, I have a categorical variable (physician age brackets) for which I would like the coefficients to be estimated so I specify it as i.age. What is also strange is that when I use -reghdfe- and absorb this variable together with the ym FE, I don't get the warning anymore.

    I have read that this message "is most likely due to one or more sparse indicator variables" (source: https://www.stata.com/statalist/arch.../msg00980.html). In my context, none of these variables seem sparse.

    Code:
    . tab age_int, m
    
            age_int |      Freq.     Percent        Cum.
    ----------------+-----------------------------------
     15 to 20 years |         12        0.00        0.00
     20 to 25 years |     59,820        0.77        0.77
     25 to 30 years |    705,128        9.09        9.87
     30 to 35 years |  1,349,585       17.41       27.27
     35 to 40 years |  1,301,711       16.79       44.06
     40 to 45 years |    992,859       12.81       56.87
     45 to 50 years |    838,942       10.82       67.68
     50 to 55 years |    770,032        9.93       77.62
     55 to 60 years |    679,585        8.76       86.38
     60 to 65 years |    540,148        6.97       93.35
     65 to 70 years |    298,706        3.85       97.20
     70 to 75 years |    100,118        1.29       98.49
     75 to 80 years |     32,761        0.42       98.91
     80 to 85 years |      8,130        0.10       99.02
     85 to 90 years |      1,718        0.02       99.04
     90 to 95 years |        232        0.00       99.04
    95 to 100 years |         63        0.00       99.04
                  . |     74,099        0.96      100.00
    ----------------+-----------------------------------
              Total |  7,753,649      100.00
    
    . tab ym, m
    
             ym |      Freq.     Percent        Cum.
    ------------+-----------------------------------
         2012m7 |     80,083        1.03        1.03
         2012m8 |     84,123        1.08        2.12
         2012m9 |     84,122        1.08        3.20
        2012m10 |     84,856        1.09        4.30
        2012m11 |     83,738        1.08        5.38
        2012m12 |     82,253        1.06        6.44
         2013m1 |     82,590        1.07        7.50
         2013m2 |     82,954        1.07        8.57
         2013m3 |     85,525        1.10        9.68
         2013m4 |     86,280        1.11       10.79
         2013m5 |     86,475        1.12       11.90
         2013m6 |     86,116        1.11       13.01
         2013m7 |     86,593        1.12       14.13
         2013m8 |     87,516        1.13       15.26
         2013m9 |     87,089        1.12       16.38
        2013m10 |     87,210        1.12       17.51
        2013m11 |     86,015        1.11       18.62
        2013m12 |     84,463        1.09       19.71
         2014m1 |     84,895        1.09       20.80
         2014m2 |     86,245        1.11       21.91
         2014m3 |     86,835        1.12       23.03
         2014m4 |     87,393        1.13       24.16
         2014m5 |     87,926        1.13       25.30
         2014m6 |     87,029        1.12       26.42
         2014m7 |     87,814        1.13       27.55
         2014m8 |     88,418        1.14       28.69
         2014m9 |     87,829        1.13       29.82
        2014m10 |     87,942        1.13       30.96
        2014m11 |     86,755        1.12       32.08
        2014m12 |     85,190        1.10       33.17
         2015m1 |     84,384        1.09       34.26
         2015m2 |     84,763        1.09       35.36
         2015m3 |     87,585        1.13       36.49
         2015m4 |     86,593        1.12       37.60
         2015m5 |     86,715        1.12       38.72
         2015m6 |     86,351        1.11       39.83
         2015m7 |     86,634        1.12       40.95
         2015m8 |     86,882        1.12       42.07
         2015m9 |     86,259        1.11       43.19
        2015m10 |     86,020        1.11       44.29
        2015m11 |     85,106        1.10       45.39
        2015m12 |     83,765        1.08       46.47
         2016m1 |     83,209        1.07       47.55
         2016m2 |     84,800        1.09       48.64
         2016m3 |     86,612        1.12       49.76
         2016m4 |     86,360        1.11       50.87
         2016m5 |     86,778        1.12       51.99
         2016m6 |     87,197        1.12       53.11
         2016m7 |     86,779        1.12       54.23
         2016m8 |     86,804        1.12       55.35
         2016m9 |     86,256        1.11       56.47
        2016m10 |     85,978        1.11       57.57
        2016m11 |     85,318        1.10       58.67
        2016m12 |     84,445        1.09       59.76
         2017m1 |     84,406        1.09       60.85
         2017m2 |     85,309        1.10       61.95
         2017m3 |     87,622        1.13       63.08
         2017m4 |     87,101        1.12       64.21
         2017m5 |     88,483        1.14       65.35
         2017m6 |     88,306        1.14       66.49
         2017m7 |     88,013        1.14       67.62
         2017m8 |     88,518        1.14       68.76
         2017m9 |     87,554        1.13       69.89
        2017m10 |     87,883        1.13       71.03
        2017m11 |     87,257        1.13       72.15
        2017m12 |     86,551        1.12       73.27
         2018m1 |     85,759        1.11       74.37
         2018m2 |     85,897        1.11       75.48
         2018m3 |     88,152        1.14       76.62
         2018m4 |     88,081        1.14       77.75
         2018m5 |     88,533        1.14       78.90
         2018m6 |     88,131        1.14       80.03
         2018m7 |     88,612        1.14       81.18
         2018m8 |     89,467        1.15       82.33
         2018m9 |     88,535        1.14       83.47
        2018m10 |     89,151        1.15       84.62
        2018m11 |     88,416        1.14       85.76
        2018m12 |     86,829        1.12       86.88
         2019m1 |     86,534        1.12       88.00
         2019m2 |     87,314        1.13       89.12
         2019m3 |     88,371        1.14       90.26
         2019m4 |     88,608        1.14       91.41
         2019m5 |     89,513        1.15       92.56
         2019m6 |     88,294        1.14       93.70
         2019m7 |     88,680        1.14       94.84
         2019m8 |     88,586        1.14       95.98
         2019m9 |     87,608        1.13       97.11
        2019m10 |     84,472        1.09       98.20
        2019m11 |     77,577        1.00       99.20
        2019m12 |     61,661        0.80      100.00
    ------------+-----------------------------------
          Total |  7,753,649      100.00
    Could anyone help me understand what is going on?

    Many thanks
    Paula



  • #2
    Paula:
    could you please share with interested listers the outcome of the problematic models along with the one from -areg-? Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Carlo: thank you for your reply.

      All the models have the same outcome: it is a measure of average cost among all the direct peers of a given physician working in a given hospital at a given month. It is, therefore, a continuous variable.

      More explicitly, every row in the dataset is identified by a physician id working in a given hospital at a given month. For every row, I have information on (i) the average cost of all hospitalizations performed by physician i at the given hospital-month as well as (ii) the average among all of her colleagues' average hospitalization cost in the given hospital-month, where colleagues = all the other physicians with whom physician i works at the given hospital-month. The outcome I am analyzing here is (ii).

      The outcome, therefore, varies at the physician-hospital-month level, but it should vary little across different physicians within a hospital-month, especially in large hospitals (this variable is essentially a leave-out mean, so excluding a given physician from the average of all physicians within a hospital-month has a smaller effect in larger groups). I don't know how to show this with the data. Below I computed the SD and mean within hospital-month to obtain the coefficient of variation among physicians within a given hospital-month. Then I summarized the coefficient of variation across hospital-months. Is there a more useful way of investigating the variability across physicians within a hospital-month?

      Code:
      . isid physician_id hosp_id ym // each row is identified by physician-hospital-month
      
      . 
      . bys hosp_id ym: egen sd_within_hosp_mon = sd(avg_peer_cost) // computing SD among physicians within hospital-m
      > onth
      (67,409 missing values generated)
      
      . bys hosp_id ym: egen mean_within_hosp_mon = mean(avg_peer_cost) // computing mean among physicians within hosp
      > ital-month
      (67,409 missing values generated)
      
      . gen cv_within_hosp_mon = sd_within_hosp_mon/mean_within_hosp_mon // computing coefficient of variation among p
      > hysicians within hospital-month
      (67,482 missing values generated)
      
      . 
      . bys hosp_id ym: gen hosp_ym_first = _n==1 // tagging first obs within hospital-ym
      
      . 
      . su cv_within_hosp_mon if hosp_ym_first, d // extent to which SD varies across different hospital-months
      
                           cv_within_hosp_mon
      -------------------------------------------------------------
            Percentiles      Smallest
       1%     .0037104              0
       5%     .0086899              0
      10%     .0122923              0       Obs             345,161
      25%     .0214667              0       Sum of Wgt.     345,161
      
      50%     .0415179                      Mean           .0801513
                              Largest       Std. Dev.        .11824
      75%      .087763       1.414214
      90%     .1812275       1.414214       Variance       .0139807
      95%     .2834875       1.414214       Skewness       4.381938
      99%     .6075859       1.414214       Kurtosis       29.97145
      Last edited by Paula de Souza Leao Spinola; 25 Jul 2022, 10:52.
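
      One possible complement to the coefficient of variation, as a minimal sketch: compare the total variance of avg_peer_cost with the variance left after subtracting hospital-month means. The variable names cell_mean and within_dev are hypothetical; hosp_id, ym and avg_peer_cost are taken from the code above. A share close to zero would suggest that hospital-ym FE absorb almost all of the variation in the outcome.

      ```stata
      * Sketch only: cell_mean and within_dev are hypothetical names.
      * Mean of the outcome within each hospital-month cell
      bysort hosp_id ym: egen double cell_mean = mean(avg_peer_cost)

      * Deviation of each physician's value from the cell mean
      generate double within_dev = avg_peer_cost - cell_mean

      * Total variance versus the variance remaining within cells
      quietly summarize avg_peer_cost
      scalar var_total = r(Var)
      quietly summarize within_dev
      scalar var_within = r(Var)

      display "Share of variance within hospital-month cells: " var_within / var_total
      ```

      Cells with a single physician contribute a deviation of zero, so it may be worth repeating the comparison after excluding them.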



      • #4
        Here is what I referred to with using -reghdfe- vs -areg-.

        I get the warning when using -reghdfe- to absorb year/month FE while estimating age bracket FEs. When I do the exact same with -areg-, I get no warning message. I also get no warning message using -reghdfe- when I absorb both year/month FE and age bracket FEs.

        Code:
        . ********************* I get the warning message when using -reghdfe-
        . reghdfe avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age i.age_int, absorb(ym) vce(robust)
        (MWFE estimator converged in 1 iterations)
        Warning:  variance matrix is nonsymmetric or highly singular
        
        HDFE Linear regression                            Number of obs   =  7,148,998
        Absorbing 1 HDFE group                            F(  21,7148887) =   10378.63
                                                          Prob > F        =     0.0000
                                                          R-squared       =     0.0330
                                                          Adj R-squared   =     0.0330
                                                          Within R-sq.    =     0.0292
                                                          Root MSE        =  2096.0537
        
        ----------------------------------------------------------------------------------
                         |               Robust
           avg_peer_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
                  iv_age |  -31.59436          .        .       .            .           .
                  iv_fem |    411.207          .        .       .            .           .
                  iv_uni |   738.8013          .        .       .            .           .
                 pat_fem |  -386.0751          .        .       .            .           .
                 pat_age |   10.79922          .        .       .            .           .
                         |
                 age_int |
         20 to 25 years  |  -1180.342          .        .       .            .           .
         25 to 30 years  |  -1009.705          .        .       .            .           .
         30 to 35 years  |  -795.4957          .        .       .            .           .
         35 to 40 years  |  -708.4765          .        .       .            .           .
         40 to 45 years  |  -698.3601          .        .       .            .           .
         45 to 50 years  |   -755.683          .        .       .            .           .
         50 to 55 years  |  -831.5416          .        .       .            .           .
         55 to 60 years  |  -909.2266          .        .       .            .           .
         60 to 65 years  |  -979.1657          .        .       .            .           .
         65 to 70 years  |  -935.7501          .        .       .            .           .
         70 to 75 years  |  -788.7044          .        .       .            .           .
         75 to 80 years  |  -607.2163          .        .       .            .           .
         80 to 85 years  |  -902.8926          .        .       .            .           .
         85 to 90 years  |  -975.3231          .        .       .            .           .
         90 to 95 years  |  -864.3475          .        .       .            .           .
        95 to 100 years  |  -177.4297          .        .       .            .           .
                         |
                   _cons |   3718.518          .        .       .            .           .
        ----------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
                  ym |        90           0          90     |
        -----------------------------------------------------+
        
        . 
        .********************* The exact same specification can be estimated with -areg- with no warning
        . areg avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age i.age_int, absorb(ym) vce(robust) 
        
        Linear regression, absorbing indicators         Number of obs     =  7,148,998
        Absorbed variable: ym                           No. of categories =         90
                                                        F(  21,7148887)   =   10378.61
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.0330
                                                        Adj R-squared     =     0.0330
                                                        Root MSE          =  2096.0537
        
        ----------------------------------------------------------------------------------
                         |               Robust
           avg_peer_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
                  iv_age |  -31.59436   .2484534  -127.16   0.000    -32.08132    -31.1074
                  iv_fem |    411.207   8.263385    49.76   0.000      395.011    427.4029
                  iv_uni |   738.8013   4.091539   180.57   0.000      730.782    746.8205
                 pat_fem |  -386.0751   2.196373  -175.78   0.000    -390.3799   -381.7703
                 pat_age |   10.79922   .0335185   322.19   0.000     10.73353    10.86492
                         |
                 age_int |
         20 to 25 years  |  -1180.342   71.35806   -16.54   0.000    -1320.201   -1040.482
         25 to 30 years  |  -1009.705   71.09587   -14.20   0.000     -1149.05   -870.3597
         30 to 35 years  |  -795.4956   71.09138   -11.19   0.000    -934.8322   -656.1591
         35 to 40 years  |  -708.4764   71.09684    -9.96   0.000    -847.8237   -569.1291
         40 to 45 years  |  -698.3601   71.10212    -9.82   0.000    -837.7177   -559.0025
         45 to 50 years  |   -755.683   71.10238   -10.63   0.000    -895.0411   -616.3249
         50 to 55 years  |  -831.5416   71.10488   -11.69   0.000    -970.9046   -692.1786
         55 to 60 years  |  -909.2266   71.11088   -12.79   0.000    -1048.601   -769.8518
         60 to 65 years  |  -979.1656   71.11159   -13.77   0.000    -1118.542   -839.7895
         65 to 70 years  |    -935.75   71.16914   -13.15   0.000    -1075.239   -796.2611
         70 to 75 years  |  -788.7043   71.39923   -11.05   0.000    -928.6443   -648.7644
         75 to 80 years  |  -607.2163   73.32153    -8.28   0.000    -750.9239   -463.5087
         80 to 85 years  |  -902.8925    74.7852   -12.07   0.000    -1049.469   -756.3162
         85 to 90 years  |  -975.3231    87.3742   -11.16   0.000    -1146.573   -804.0728
         90 to 95 years  |  -864.3475   139.6753    -6.19   0.000    -1138.106   -590.5889
        95 to 100 years  |  -177.4297   149.1099    -1.19   0.234    -469.6797    114.8204
                         |
                   _cons |   3718.518   72.30979    51.42   0.000     3576.794    3860.243
        ----------------------------------------------------------------------------------
        
        . 
        .********************* I don't get the warning with -reghdfe- any longer when I absorb the coefficients of variable age_int
        . reghdfe avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age, absorb(ym age_int) vce(robust)
        (MWFE estimator converged in 4 iterations)
        
        HDFE Linear regression                            Number of obs   =  7,148,998
        Absorbing 2 HDFE groups                           F(   5,7148887) =   39389.47
                                                          Prob > F        =     0.0000
                                                          R-squared       =     0.0330
                                                          Adj R-squared   =     0.0330
                                                          Within R-sq.    =     0.0260
                                                          Root MSE        =  2096.0537
        
        ------------------------------------------------------------------------------
                     |               Robust
        avg_peer_c~t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              iv_age |  -31.59436   .2484534  -127.16   0.000    -32.08132    -31.1074
              iv_fem |    411.207   8.263385    49.76   0.000      395.011    427.4029
              iv_uni |   738.8013   4.091539   180.57   0.000      730.782    746.8205
             pat_fem |  -386.0751   2.196373  -175.78   0.000    -390.3799   -381.7703
             pat_age |   10.79922   .0335185   322.19   0.000     10.73353    10.86492
               _cons |   2902.447   12.48775   232.42   0.000     2877.971    2926.922
        ------------------------------------------------------------------------------
        
        Absorbed degrees of freedom:
        -----------------------------------------------------+
         Absorbed FE | Categories  - Redundant  = Num. Coefs |
        -------------+---------------------------------------|
                  ym |        90           0          90     |
             age_int |        17           1          16     |
        -----------------------------------------------------+
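
        A possible check, sketched under assumptions: in the -areg- output above, the age-bracket coefficients mostly share nearly the same standard error (about 71), close to that of _cons. That pattern is what one would expect when the omitted base bracket is very thin (here "15 to 20 years", with 12 observations in the full data). The sketch below tabulates the brackets in the estimation sample and re-runs the -reghdfe- specification with the most frequent bracket as the base. Whether this removes the warning is not established in this thread, and whether -reghdfe- honors -fvset- settings is an assumption to verify.

        ```stata
        * Sketch only: the thin-base-category diagnosis is an assumption,
        * not a confirmed cause of the warning.

        * Fit the model that runs cleanly, only to define e(sample)
        quietly areg avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age i.age_int, absorb(ym) vce(robust)

        * Observations per age bracket in the estimation sample
        tabulate age_int if e(sample), missing

        * Use the most frequent bracket as the base level instead of the first
        fvset base frequent age_int

        * Re-run the specification that produced the warning
        reghdfe avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age i.age_int, absorb(ym) vce(robust)
        ```

        Changing the base only reparameterizes the age-bracket coefficients (each is now a contrast with the new base), so the other coefficients and the fit should be unchanged.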



        • #5
          Paula:
          1) -areg- and -reghdfe- are different beasts (with some differences in the way SE are estimated);
          2) more substantively, you invoked robust with -reghdfe-. From -reghdfe- helpfile:
          Warning: in a FE panel regression, using robust will lead to inconsistent standard errors if for every fixed effect, the other dimension is fixed. For instance, in an standard panel with individual
          and time fixed effects, we require both the number of individuals and time periods to grow asymptotically. If that is not the case, an alternative may be to use clustered errors, which as discussed
          below will still have their own asymptotic requirements. For a discussion, see Stock and Watson, "Heteroskedasticity-robust standard errors for fixed-effects panel-data regression," Econometrica 76
          (2008): 155-174
          I do not know if this issue is actually the culprit but a careful investigation is called for.
          In the same fashion, the -robust- option of -areg- also accommodates heteroskedasticity only (and not autocorrelation), as you can see from the following toy example:
          Code:
          . use "https://www.stata-press.com/data/r17/nlswork.dta"
          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
          
          . areg ln_wage c.age##c.age, abs(idcode) robust
          
          Linear regression, absorbing indicators            Number of obs     =  28,510
          Absorbed variable: idcode                          No. of categories =   4,710
                                                             F(2, 23798)       = 1088.29
                                                             Prob > F          =  0.0000
                                                             R-squared         =  0.6659
                                                             Adj R-squared     =  0.5998
                                                             Root MSE          =  0.3025
          
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0539076   .0032564    16.55   0.000      .047525    .0602903
                       |
           c.age#c.age |  -.0005973   .0000544   -10.97   0.000     -.000704   -.0004906
                       |
                 _cons |    .639913   .0472256    13.55   0.000     .5473479    .7324781
          ------------------------------------------------------------------------------
          
          . areg ln_wage c.age##c.age, abs(idcode) vce(cluster idcode)
          
          Linear regression, absorbing indicators             Number of obs     = 28,510
          Absorbed variable: idcode                           No. of categories =  4,710
                                                              F(2, 4709)        = 423.60
                                                              Prob > F          = 0.0000
                                                              R-squared         = 0.6659
                                                              Adj R-squared     = 0.5998
                                                              Root MSE          = 0.3025
          
                                       (Std. err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0539076   .0047139    11.44   0.000     .0446661    .0631492
                       |
           c.age#c.age |  -.0005973   .0000788    -7.58   0.000    -.0007517   -.0004429
                       |
                 _cons |    .639913   .0683166     9.37   0.000     .5059806    .7738454
          ------------------------------------------------------------------------------
          
          .
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thanks a lot, Carlo.

            Thanks for clarifying that there are differences in how -areg- and -reghdfe- compute SE. I will stick to -reghdfe-.

            I tried running the same regression without robust standard errors and indeed I did not get the warning message. However, this is not an option for me because I expect heteroskedasticity. When clustering at the physician level (which I would think is the best specification, given that the error term is expected to be correlated over time for the same physician), I still get the warning message. Can you think of any solution for this?

            As I mentioned above, I do not get the warning message when I absorb the age dummies instead of estimating their coefficients, regardless of how I specify the standard errors (robust or clustered at the physician level). Would you know why? Is it safe to use these results?



            • #7
              Paula:
              are you sure that you cannot consider another option, such as a two-way -fe- via -xtreg,fe-?
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Hello Carlo - I am not too sure how to proceed. In the models above I did not consider two way FE: there is only time FE. The whole issue was with including a categorical variable (age brackets) among the explanatory variables. It is fine when I absorb them, but I get the warning when I specify them among my explanatory variables.

                As regards standard errors, I would like them to be robust. I am still assessing whether to cluster them at the physician level.



                • #9
                  Paula:
                  1) you already have one fixed effect (that is, the -panelid-); the time fixed effect is the second one. I would try to code this up via -xtreg,fe-, including your categorical variable on the right-hand side of your regression equation (by the way, you're surely already aware that categorizing a continuous predictor such as age comes with some issues; see, if interested, https://pubmed.ncbi.nlm.nih.gov/16217841/).
                  2) With 4,710 panels, I would -cluster-.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
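
                  The -xtreg,fe- suggestion could be sketched roughly as follows, assuming physician_id is the panel identifier and reusing the variable names from the earlier posts. No time variable is declared in -xtset- because a physician can appear in several hospitals in the same month, so physician_id and ym together do not uniquely identify rows. This is an illustration of the suggestion, not a tested solution to the warning.

                  ```stata
                  * Sketch only: physician_id as panel identifier is an assumption.
                  * Declare the panel variable without a time variable
                  xtset physician_id

                  * Physician FE via -fe-, year/month FE as explicit dummies,
                  * standard errors clustered at the physician level
                  xtreg avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age i.age_int i.ym, fe vce(cluster physician_id)
                  ```

                  Any regressor that does not vary within physician would be dropped by -xtreg,fe-.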



                  • #10
                    Hi Paula, did you solve this problem in the end? I have run into the same problem.

