
  • mlogit predict probability

    Hello Stata community

    I run the following mlogit:

    Code:
    mlogit(y) = a + b1*x1 + b2*x2 + b3*x3 + i.year + i.indus
    where y takes three values: 0, 1, and 2.

    I want to do the following two things:

    1. Estimate the predicted probability of each outcome (y = 1 or y = 2) with x1, x2, and x3 held at their means while year and indus vary, and store it in a new variable.

    This means firms in the same industry and the same year should have the same predicted probability.

    2. Estimate the predicted probability of each outcome (y = 1 or y = 2) over the observed values of each of x1, x2, and x3 in turn, holding the other two variables at their means, and store it in a new variable.

    For example, for outcome y = 1: p(y = 1 | x1) = invlogit(a + b1*x1 + b2*mean(x2) + b3*mean(x3) + year + indus)
    and so on for x2 and x3.
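
    As a sketch of what mlogit would compute here, the probability for outcome k is a ratio over all three outcomes rather than a single invlogit. This assumes y = 0 is the base outcome, so its coefficients are normalized to zero (Stata uses the most frequent outcome as the base unless baseoutcome() is specified). The symbols \eta_k, a_k, b_{jk}, \gamma and \delta are notation introduced for illustration only:

```latex
\[
\Pr(y = k \mid x_1) = \frac{\exp(\eta_k)}{1 + \exp(\eta_1) + \exp(\eta_2)}, \qquad k \in \{1, 2\},
\]
\[
\eta_k = a_k + b_{1k}\,x_1 + b_{2k}\,\bar{x}_2 + b_{3k}\,\bar{x}_3
         + \gamma_{k,\text{year}} + \delta_{k,\text{indus}} .
\]
```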

    I am not quite sure how to do this efficiently, since
    Code:
    predict
    does not have an option (or none that I know of) for explicitly choosing the input values.


    My temporary solution so far:

    Code:
    use https://www.stata-press.com/data/r18/lbw, clear
    
    gen y  = low
    gen x1 = age
    gen x2 = smoke
    gen x3 = ptl
    
    foreach var of varlist x1 x2 x3 {
        summarize `var', meanonly
        gen m_`var' = r(mean)
        gen ori_`var' = `var'
    }
    
    mlogit y x1 x2 x3
    
    * Step 2: Loop to replace, predict, and restore
    foreach var of varlist x1 x2 x3 {
        * Replace x(i) with its mean
        replace `var' = m_`var'
        
        * Predict outcome probabilities with x(i) set to its mean
        predict y1_p`var', pr outcome(1)
        predict y2_p`var', pr outcome(2)
        
        * Restore original values of x(i)
        replace `var' = ori_`var'
    }

    I hope someone can help me.

    Thank you
    Last edited by Truong Quoc Phan; 04 Apr 2024, 20:21.

  • #2
    You did not give any example data to work with, so I created a toy example data set to develop and test the code.

    Code:
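    clear*
    set obs 1000
    set seed 1234
    
    * Toy data: three covariates, a three-level outcome, one observation per indus-year
    forvalues i = 1/3 {
        gen x`i' = runiform()
    }
    gen y = runiformint(1, 3)
    gen int indus = floor(_n/20)
    gen year = 2000 + mod(_n, 20)
    isid indus year, sort
    
    mlogit y x1 x2 x3 i.year i.indus
    
    * Restrict to the estimation sample and create an ID for linking frames later
    keep if e(sample)
    gen `c(obs_t)' obs_no = _n
    
    * Store the estimation-sample mean of each x in a local macro (x1_mean, x2_mean, x3_mean)
    foreach v of varlist x* {
        summ `v', meanonly
        local `v'_mean `r(mean)'
    }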
    
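    * Loop over the outcome (y = 1, then y = 2) whose probability is predicted
    forvalues i = 1/2 {
        * Task 1: all three x's at their means; year and indus as observed
        frame put _all, into(working)
        frame working {
            foreach v of varlist x* {
                replace `v' = ``v'_mean'
            }
            predict phat_y`i'_m1_m2_m3, outcome(`i')
        }
        * Copy the prediction back to the original frame, then clean up
        frlink 1:1 obs_no, frame(working)
        frget phat_y`i'_m1_m2_m3, from(working)
        drop working
        frame drop working
        * Task 2: keep one x as observed and set the other two to their means
        forvalues j = 1/3 {
            frame put _all, into(working)
            frame working {
                * Build the variable-name suffix: _x for observed values, _m for the mean
                local suffix
                forvalues k = 1/3 {
                    if `k' != `j' {
                        replace x`k' = `x`k'_mean'
                        local suffix `suffix'_m`k'
                    }
                    else {
                        local suffix `suffix'_x`k'
                    }
                }
                predict phat_y`i'`suffix', outcome(`i')
            }
            frlink 1:1 obs_no, frame(working)
            frget phat_y`i'`suffix', from(working)
            drop working
            frame drop working
        }
    }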
    The predicted value variables' names all begin with phat. They are followed by y1 or y2, according to which value of y's probability is predicted, and then some combination of x1, x2, x3, m1, m2, and m3. In this notation xj denotes that the observed values of xj were used for this prediction, and mj denotes that the mean of xj was used for this prediction.
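
    A quick way to sanity-check the result, as a sketch that assumes the code above has just been run in the same session so that the phat_* variables exist: the all-means prediction should be constant within each indus-year cell, and every prediction should lie between 0 and 1.

```stata
* Sketch: sanity checks on the predictions created by the code above
describe phat_*

* With x1-x3 at their means, the prediction can depend only on indus and year,
* so the smallest and largest value within each cell must coincide
foreach i in 1 2 {
    bysort indus year (phat_y`i'_m1_m2_m3): ///
        assert phat_y`i'_m1_m2_m3[1] == phat_y`i'_m1_m2_m3[_N]
}

* Every predicted probability should lie in [0, 1]
foreach v of varlist phat_* {
    assert inrange(`v', 0, 1)
}
```

    In the toy data each indus-year cell holds a single observation, so the first check passes trivially; it becomes informative on real data with several firms per industry and year.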


    • #3
      Clyde Schechter, thank you so much for your help.
