  • with or without weight

    Dear Prof and colleagues,

    I ran two estimations, one with weights and one without. The coefficient on shr_immg is larger in the unweighted estimation (1.353) than in the weighted one (0.765). This difference makes me question whether the chosen weights are appropriate. Should I conclude that the weights are wrong? Which estimation should I treat as the reliable/efficient one?
    I have not shown the entire output because of its length.
    Code:
    . reg lwage shr_immg i.year i.sk_rat_quartile i.Expgroup i.Expgroup#i.sk_rat_quartile i.Expgroup#i.year i.sk_rat_quartil
    > e#i.year [aw= wieght],robust
    (sum of wgt is 12,312,886)
    
    Linear regression                               Number of obs     =        320
                                                    F(131, 188)       =    4100.09
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.9983
                                                    Root MSE          =     .01257
    
    ------------------------------------------------------------------------------------------
                             |               Robust
                       lwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------------------+----------------------------------------------------------------
                    shr_immg |   .7652276   .4006971     1.91   0.058    -.0252125    1.555668
                             |
                        year |
                       2011  |  -.0247768   .0195741    -1.27   0.207    -.0633898    .0138363
                       2012  |   -.039046   .0174316    -2.24   0.026    -.0734328   -.0046593
                       2013  |  -.0569614   .0175539    -3.24   0.001    -.0915892   -.0223335
                       2014  |  -.0429618   .0191181    -2.25   0.026    -.0806753   -.0052482
                       2015  |  -.0377884   .0200167    -1.89   0.061    -.0772746    .0016977
                       2016  |  -.0172464   .0163993    -1.05   0.294    -.0495967    .0151039
                       2017  |   .0030977   .0195128     0.16   0.874    -.0353945    .0415898
                       2018  |    .055643   .0173336     3.21   0.002     .0214497    .0898364
                       2019  |   .0946934   .0193115     4.90   0.000     .0565983    .1327885
                             |
             sk_rat_quartile |
                          2  |   .0789576   .0084858     9.30   0.000      .062218    .0956972
                          3  |    .059397   .0150838     3.94   0.000     .0296418    .0891523
                          4  |   .4939467   .0197351    25.03   0.000      .455016    .5328775
                             |
    Without weight:
    Code:
      reg lwage shr_immg i.year i.sk_rat_quartile i.Expgroup i.Expgroup#i.sk_rat_quartile i.Expgroup#i.year i.sk_rat_quarti
    > le#i.year ,robust
    
    Linear regression                               Number of obs     =        320
                                                    F(131, 188)       =    4013.33
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.9981
                                                    Root MSE          =     .01405
    
    ------------------------------------------------------------------------------------------
                             |               Robust
                       lwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------------------+----------------------------------------------------------------
                    shr_immg |   1.353097   .4160027     3.25   0.001     .5324644     2.17373
                             |
                        year |
                       2011  |  -.0213799   .0190928    -1.12   0.264    -.0590435    .0162837
                       2012  |   -.029565   .0169838    -1.74   0.083    -.0630684    .0039384
                       2013  |  -.0462414   .0173351    -2.67   0.008    -.0804376   -.0120451
                       2014  |  -.0317497    .019167    -1.66   0.099    -.0695597    .0060603
                       2015  |  -.0277666   .0202777    -1.37   0.173    -.0677676    .0122344
                       2016  |  -.0088269   .0166888    -0.53   0.597    -.0417483    .0240945
                       2017  |   .0113036   .0191693     0.59   0.556     -.026511    .0491182
                       2018  |   .0542644   .0177676     3.05   0.003     .0192151    .0893138
                       2019  |   .0776262   .0211254     3.67   0.000      .035953    .1192994
                             |
             sk_rat_quartile |
                          2  |   .0686419   .0104295     6.58   0.000      .048068    .0892158
                          3  |   .0560242   .0160501     3.49   0.001     .0243629    .0876856
                          4  |   .5036946   .0187571    26.85   0.000     .4666933     .540696
                             |
    Any ideas appreciated.

    Cheers,
    Paris

  • #2
    Weighted versus unweighted estimates are naturally going to be different. We don’t have enough information to guide you as to which is more appropriate.

    What are the weights for? For example, are they cluster weights, something else? It’s likely not frequency weights or you would have used fweights instead.

    What is the question you are trying to answer (in words, not syntax or math)? Some questions are more naturally answered by weighting.
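
    As a rough illustration of how the weight types differ in Stata, here is a minimal sketch on the auto dataset shipped with Stata. The variables weight (car weight) and rep78 serve purely as stand-in weight variables; that is an assumption for illustration only and has no substantive meaning.

    ```stata
    sysuse auto, clear

    * Analytic weights: each row is treated as a mean over w underlying units
    regress price mpg [aweight = weight]

    * Sampling (probability) weights: robust standard errors are reported automatically
    regress price mpg [pweight = weight]

    * Frequency weights must be integers: each row stands for w identical observations
    * (rows with missing rep78 are dropped)
    regress price mpg [fweight = rep78]
    ```

    The aweight and pweight runs should give the same point estimates; what changes is how the standard errors are computed.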



    • #3
      Dear Leonardo,

      The regression is weighted by the sample size, i.e. the number of workers. The dependent variable is the log wage of workers and the independent variable of interest is the share of immigrants. My question is: can the heteroskedasticity issue be solved by vce(robust) alone? If so, why do papers apply weights?
      Code:
      . hettest   shr_immg
      
      Breusch–Pagan/Cook–Weisberg test for heteroskedasticity 
      Assumption: Normal error terms
      Variable: shr_immg
      
      H0: Constant variance
      
          chi2(1) =   3.29
      Prob > chi2 = 0.0699
      So I use both robust and weights to correct for heteroskedasticity. However, I noticed that the unweighted coefficient is larger, so I need to explain why this happens. Should I drop the weights, or is the correct coefficient the weighted one?
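
      One common rationale for weighting by cell size can be sketched with simulated data. This is a minimal sketch, assuming each observation is a cell mean whose error variance is inversely proportional to the number of workers in the cell. The names n_workers, x and y and all parameter values are hypothetical.

      ```stata
      clear
      set seed 2024
      set obs 320

      * n_workers: hypothetical cell size (workers behind each cell mean)
      generate n_workers = ceil(1000 * runiform())
      generate x = runiform()

      * Error variance of a cell mean shrinks with cell size: Var(e) = 1/n_workers
      generate y = 1 + 0.5 * x + rnormal() / sqrt(n_workers)

      * Unweighted fit with robust standard errors
      regress y x, vce(robust)

      * Weighted fit: larger cells count more
      regress y x [aweight = n_workers], vce(robust)
      ```

      Under this assumed data-generating process both fits estimate the same slope, and the weights mainly affect precision. A large gap between weighted and unweighted coefficients would then have to come from something else, such as sampling variability, effects that differ across cells, or misspecification.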



      • #4
        Paris:
        I would add what follows to Leonardo's helpful suggestions:
        1) you have a sky-rocketing R_sq. Therefore, you are possibly overfitting your regression;
        2) yes, heteroskedasticity can be dealt with via -robust- standard errors (see the excellent teaching notes on this topic by Richard Williams at Heteroskedasticity (nd.edu));
        3) once you have applied -robust- there's no need to re-run -hettest-, as the null of no heteroskedasticity would be rejected again, because -robust- changes standard errors, not residuals.
        Kind regards,
        Carlo
        (StataNow 18.5)
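
        A minimal sketch of the point about -robust-, using the auto dataset shipped with Stata (the choice of price, mpg and weight as variables is only for illustration): the coefficients are the same with and without -vce(robust)-, and only the standard errors change.

        ```stata
        sysuse auto, clear

        * Conventional OLS, then the Breusch-Pagan/Cook-Weisberg test
        regress price mpg weight
        estat hettest
        estimates store ols

        * Same model with heteroskedasticity-robust standard errors
        regress price mpg weight, vce(robust)
        estimates store rob

        * Side-by-side comparison: identical coefficients, different standard errors
        estimates table ols rob, se
        ```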



        • #5
          Prof Carlo,

          You are right, the R-squared is pretty high, but I am following the literature, much of which uses the same wage equation. If I drop any variables or change the specification, I must explain why. Is "because of the high R-squared" an acceptable answer to referees?
          Because the data are organized through a complex stratification (i.e. education-experience groups etc.), the model might need weights ...



          • #6
            In order for a sample to say something about a population, the sample needs to be representative of that population. A "simple" way of achieving that is to draw the sample randomly. The problem is that a simple random sample is in practice anything but simple. So in practice, not everybody in the population gets the same chance of being included in the sample, and as a result the sample no longer represents the population. Weights are typically employed to correct that bias. But, as always, it is not that easy. You could also have weights to correct for intentional over-sampling of groups, or weights to correct for non-response. You need to decide what your unit of analysis is, e.g. households or individuals. This again influences what weight you want to use. For international comparisons: do you want to give each country the same weight or do you want to weight by population, and if so, what population?
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              Prof Buis,

              The sample covers all workers of firms in a country. I am trying to estimate the impact of immigrants on native wages. The weight is the number of native workers, who are categorized into education-experience groups: 4 education groups and 8 experience groups, over 10 years. So the design is based entirely on common attributes of workers. Workers in each of the 4 groups have the same characteristics. I believe weighting is appropriate, as it is customary in this field. The different results from the weighted and unweighted estimations made me wonder why this happens.
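
              One way to check whether the gap between weighted and unweighted coefficients is more than noise is an augmented regression in the spirit of the DuMouchel-Duncan test: add the weight and its interaction with the regressor, then test them jointly. The sketch below uses simulated data; the names w, x, y, xw and all parameter values are hypothetical, and the slope is made to depend on w on purpose so that the two fits differ.

              ```stata
              clear
              set seed 2024
              set obs 320

              * w: hypothetical weight (e.g. cell size); x: regressor of interest
              generate w = ceil(1000 * runiform())
              generate x = runiform()

              * The slope on x rises with w, so the two fits average it differently
              generate y = 1 + (0.5 + 0.001 * w) * x + rnormal(0, 0.1)

              regress y x
              regress y x [aweight = w]

              * Augmented unweighted regression with the weight and weight-by-x terms
              generate xw = x * w
              regress y x w xw

              * Joint test: rejection suggests weighting changes the estimates systematically
              test w xw
              ```

              If the joint test does not reject, the difference between the two sets of estimates is compatible with sampling variability under this setup.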



              • #8
                Paris:
                did you perform -linktest- after -regress-?
                Kind regards,
                Carlo
                (StataNow 18.5)



                • #9
                  Here it is:
                  Code:
                  
                  . linktest
                  
                        Source |       SS           df       MS      Number of obs   =       320
                  -------------+----------------------------------   F(2, 317)       =  82235.76
                         Model |  19.2571094         2   9.6285547   Prob > F        =    0.0000
                      Residual |   .03711587       317  .000117085   R-squared       =    0.9981
                  -------------+----------------------------------   Adj R-squared   =    0.9981
                         Total |  19.2942253       319  .060483465   Root MSE        =    .01082
                  
                  ------------------------------------------------------------------------------
                         lwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          _hat |   .9788981   .1199436     8.16   0.000      .742912    1.214884
                        _hatsq |   .0015595   .0088621     0.18   0.860    -.0158765    .0189954
                         _cons |   .0712918   .4054803     0.18   0.861    -.7264809    .8690645
                  ------------------------------------------------------------------------------



                  • #10
                    Paris:
                    your model seems OK.
                    I do not see any issues other than the very high R_sq.
                    Kind regards,
                    Carlo
                    (StataNow 18.5)
