  • Weird dfbetas after regress with case specific dummies

    Hello Statalisters,

    I noticed some strange behaviour of dfbeta after regress. When I adjust for outliers with a single categorical variable (entered with the i. prefix in the regression), the dfbetas for these outliers are sometimes very strange: I would expect them to be zero or missing, but in some cases they are quite large (e.g. 31.8 in one of the examples below). The behaviour changes with the seed, so it looks random.
    If I create separate dummies instead, the dfbetas behave as expected. See the code below for an example.
    Does anyone have an idea of what is going wrong?

    Code:
    clear all
    set obs 50
    set seed 1
    gen x=rnormal()
    gen y = 1+1.2*x+rnormal()
    
    replace y =3*y if _n==1 | _n==3 |_n==5 |_n==7 //create some extreme values --> outliers
    
    gen outliers=0
    replace outliers=_n if _n==1 | _n==3 |_n==5 |_n==7 //get outlier indicators
    * Create separate dummies for outliers
    gen D1=0
    replace D1=1 in 1
    gen D3=0
    replace D3=1 in 3
    gen D5=0
    replace D5=1 in 5
    gen D7=0
    replace D7=1 in 7
    
    reg y x i.outliers //regression with extreme-case dummies
    
    dfbeta //weird??? dfbeta's 
    /* 
    for obs 1 : all zero
    for obs 3: 4 of the 5 dfbetas small but not zero, HOWEVER: _dfbeta_3 = 31.8 ??
    for obs 5: all missing 
    for obs 7: all missing
    */
    reg y x D1 D3 D5 D7 // same regression with separate dummies
    dfbeta // all dfbeta's for obs 1, 3, 5 and 7 missing (what I would expect)
    
    ***
    * Same procedure with different seed
    
    clear all
    set obs 50
    set seed 2
    gen x=rnormal()
    gen y = 1+1.2*x+rnormal()
    
    replace y =3*y if _n==1 | _n==3 |_n==5 |_n==7 //create some extreme values --> outliers
    
    gen outliers=0
    replace outliers=_n if _n==1 | _n==3 |_n==5 |_n==7 //get outlier indicators
    * Create separate dummies for outliers
    gen D1=0
    replace D1=1 in 1
    gen D3=0
    replace D3=1 in 3
    gen D5=0
    replace D5=1 in 5
    gen D7=0
    replace D7=1 in 7
    
    reg y x i.outliers //regression with extreme-case dummies
    
    dfbeta //weird??? dfbeta's 
    /* Dfbeta's 
    for obs 1 : one missing, 3 small but not zero, HOWEVER: _dfbeta_2 = 4 ??
    for obs 3: all missing 
    for obs 5: all missing 
    for obs 7: 1 missing and 4 zeroes
    */
    reg y x D1 D3 D5 D7 // same regression with separate dummies
    dfbeta // all dfbeta's for obs 1, 3, 5 and 7 missing (what I would expect)
    ***
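
    A possible diagnostic, offered only as a sketch: an observation with its own dummy is fitted exactly, so mathematically its residual is 0 and its leverage is 1. The DFBETA formula divides by a term involving (1 - leverage), so the result for such an observation is effectively 0/0, and tiny floating-point differences might then produce zero, missing, or a large value. Whether this is what drives the difference between i.outliers and the separate dummies is an assumption, not something confirmed here. The variable names h, e and rs are made up for this illustration.

    ```stata
    * Sketch: inspect leverage and residuals for the dummied-out cases
    clear all
    set obs 50
    set seed 1
    gen x = rnormal()
    gen y = 1 + 1.2*x + rnormal()
    replace y = 3*y if inlist(_n, 1, 3, 5, 7)           // same outliers as above
    gen outliers = cond(inlist(_n, 1, 3, 5, 7), _n, 0)  // same indicator as above

    regress y x i.outliers
    predict double h, leverage     // diagonal of the hat matrix
    predict double e, residuals    // raw residuals
    predict double rs, rstudent    // studentized residuals
    format h e rs %18.0g           // show full precision
    list outliers h e rs if outliers > 0
    ```

    If h is listed as exactly 1 for some of the four cases and as slightly below 1 for others, that would be consistent with the mixed pattern of zero, missing and large dfbetas described above.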

    Thank you very much,
    Mike