
  • How to remove the dummy variable representing the base category of a factor variable?

    Consider the following example.
    Code:
    // load data
    . webuse nlswork, clear
    
    // specify a variable list, where  "race" is a factor variable
    . local X = "age ib3.race"
    
    // expand the factor variable "race", and then identify the base category indicator
    // I do this due to the collinearity issue
    . _rmcoll `X', expand
    
    // check the updated local macro
    . display "`X'"
    
    // check whether st_data() returns a matrix without the base category column
    . mata: st_data(., st_local("X"))
    That is, I am trying to identify and remove the variable causing the perfect collinearity (in the example above, the dummy variable for the third race category).

    But, as the example shows, the _rmcoll command only identifies the base category, so st_data() returns a matrix in which the column for the third race category is replaced with zeros.

    What I want is a matrix that does not contain the column of zeros.

    How can I do that?

  • #2
    Omitting a variable in an estimation and excluding it from the variable list are two completely different things. As long as you are using factor variable notation, base levels are indicated by the letter "b" preceding the period, i.e., "#b.varname". Since periods are not legal characters in Stata variable names, this provides a unique way of identifying factor variables and, in this case, base levels in a variable list.

    Code:
    webuse nlswork, clear
    fvexpand ib3.race i.occ_code
    display "`r(varlist)'"
    local wanted= trim(itrim(ustrregexra("`r(varlist)'", "\b(\d+[b][\.][a-zA-Z\d\_]+)\b", "")))
    display "`wanted'"
    Result:

    Code:
    . display "`r(varlist)'"
    1.race 2.race 3b.race 1b.occ_code 2.occ_code 3.occ_code 4.occ_code 5.occ_code 6.occ_code 7.occ_code 8.occ_code 9.occ_code 10.occ_code 11.occ_code 12.occ_code 13.occ_code
    
    . 
    . local wanted= trim(itrim(ustrregexra("`r(varlist)'", "\b(\d+[b][\.][a-zA-Z\d\_]+)\b", "")))
    
    . 
    . display "`wanted'"
    1.race 2.race 2.occ_code 3.occ_code 4.occ_code 5.occ_code 6.occ_code 7.occ_code 8.occ_code 9.occ_code 10.occ_code 11.occ_code 12.occ_code 13.occ_code
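
    To connect this back to the matrix asked about in #1, the filtered list can be passed to Mata. The following is a minimal sketch, not a tested solution: it assumes st_data() accepts factor-variable terms in its string argument (as the call in #1 suggests), it reuses the regular expression above unchanged, and the Mata matrix name X is only illustrative.

    ```stata
    webuse nlswork, clear

    // expand the full list; non-factor variables such as age pass through unchanged
    fvexpand age ib3.race

    // drop every term flagged as a base level (#b.varname)
    local wanted = trim(itrim(ustrregexra("`r(varlist)'", "\b(\d+[b][\.][a-zA-Z\d\_]+)\b", "")))
    display "`wanted'"

    // build the matrix from the remaining terms and check its dimensions
    mata: X = st_data(., st_local("wanted"))
    mata: rows(X), cols(X)
    ```

    The pattern only targets the "b" marker. Terms that Stata flags as omitted with "o" (for example "3o.race") would need a similar pattern of their own.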


  • #3
    Andrew Musau I got the point! Thank you for the detailed answer.
