  • Coding dummy variables

    Hi, I have a categorical variable "Source of Funding" with 4 categories: 1. Institutional Sources (IS), 2. Non-Institutional Sources (NIS), 3. Both IS and NIS, 4. No funding. How do I code this as dummy variables? Normally I would create 3 dummy variables: one for IS, one for NIS and one for both. My question is whether I can use just 2 dummies (one for IS and one for NIS), both coded 1 when the firm is funded by both IS and NIS. If so, how do I interpret the results from a regression model?


  • #2
    Denila:
    you may want to try something along the following lines:
    Code:
    set obs 10
    g Source_of_Funding=1 in 1/3
    replace Source_of_Funding=2 in 4/6
    replace Source_of_Funding=3 in 7/8
    replace Source_of_Funding=4 in 9/10
    label define Source_of_Funding 1 "Institutional Sources (IS)" 2 "Non-Institutional Sources (NIS)" 3 "Both IS and NIS" 4 "No funding"
    label val Source_of_Funding Source_of_Funding
    tab Source_of_Funding
    You can group levels of a categorical variable, provided that doing so makes sense in your research field and/or that some level has so few observations that it becomes practically immaterial.
    If you recode the categorical predictor, its contribution to explaining the variation in the conditional mean of the dependent variable changes accordingly (you do not provide further details, so this is unavoidably general advice).
    Kind regards,
    Carlo
    (StataNow 18.5)
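If you want explicit 0/1 indicators rather than factor-variable notation, one option is the generate() option of -tabulate-. The sketch below rebuilds the simulated data from #2 so that it runs on its own; the stub name D is an arbitrary choice made here, and D4 ("No funding") is the indicator you would leave out as the base.

```stata
* Rebuild the simulated data from #2 so this block is self-contained
clear
set obs 10
generate Source_of_Funding = 1 in 1/3
replace Source_of_Funding = 2 in 4/6
replace Source_of_Funding = 3 in 7/8
replace Source_of_Funding = 4 in 9/10
label define Source_of_Funding 1 "Institutional Sources (IS)" 2 "Non-Institutional Sources (NIS)" 3 "Both IS and NIS" 4 "No funding"
label values Source_of_Funding Source_of_Funding

* One indicator per level, in ascending order of the codes:
* D1 = IS, D2 = NIS, D3 = Both, D4 = No funding (stub name D is arbitrary)
tabulate Source_of_Funding, generate(D)

* Exactly one indicator is 1 in every row, so only 3 of the 4
* (e.g. D1-D3, with D4 as the base) can enter a model that has a constant
assert D1 + D2 + D3 + D4 == 1
list Source_of_Funding D1-D4, sepby(Source_of_Funding)
```

The -assert- line is the "number of categories minus 1" rule in action: including all four indicators alongside the constant would make them perfectly collinear.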



    • #3
      Hello Denila. I see you're a new member of Statalist, so it may be that you are also a (relatively) new user of Stata. If so, I wonder if you have read about factor variables yet. I suspect that whatever it is you want to do with your set of indicator variables can be achieved more efficiently via factor variable notation. To get started, type the following in the Command window and hit Enter:

      Code:
      help fvvarlist
      Pay attention to the i. prefix and the ib#. prefix, which can be used to specify a base level (a.k.a. a reference category).

      Here's a simple (and silly) example using the auto dataset that comes with Stata.

      Code:
      clear *
      sysuse auto
      tab rep78
      regress mpg i.rep78   // 1st level as referent (the default)
      regress mpg ib3.rep78 // 3rd level as the referent
      regress mpg ib5.rep78 // 5th level as the referent
      HTH.


      --
      Bruce Weaver
      Version: Stata/MP 18.5 (Windows)
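Applied to the funding variable, the ib#. prefix lets "No funding" serve as the referent directly. The sketch below assumes hypothetical data: 40 firms, 10 per category, and a simulated outcome y (the seed and distribution are arbitrary). It also checks that factor-variable notation gives the same fit as hand-made dummies.

```stata
* Hypothetical data: 40 firms, 10 per funding category, simulated outcome y
clear
set obs 40
set seed 12345
generate Source_of_Funding = ceil(_n/10)        // codes 1-4, 10 firms each
label define Source_of_Funding 1 "IS" 2 "NIS" 3 "Both IS and NIS" 4 "No funding"
label values Source_of_Funding Source_of_Funding
generate y = rnormal(10, 2)                     // hypothetical outcome

* Factor-variable notation: ib4. makes "No funding" the base level
regress y ib4.Source_of_Funding
scalar r2_fv = e(r2)
scalar b3_fv = _b[3.Source_of_Funding]          // Both vs. No funding

* The same model with hand-made dummies, omitting D4 ("No funding")
tabulate Source_of_Funding, generate(D)
regress y D1 D2 D3
display "R-squared difference: " e(r2) - r2_fv
display "Both vs. No funding difference: " _b[D3] - b3_fv
```

Both differences should be zero up to rounding: the factor-variable version simply builds the three indicators for you.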



      • #4
        Dear Bruce, Thank you for your reply. I am aware of this option. Let me try to explain my problem more clearly.
        If the data are coded as you suggest, there will be 3 dummy variables, say D1 for IS, D2 for NIS and D3 for both, with category 4 (no funding) as the base.
        My question is: when a firm borrows from both sources (IS and NIS), can the IS dummy and the NIS dummy both be coded 1, leaving out D3?
        So there are 2 ways I can code this data. When a firm borrows from both, one is
        D1=0; D2=0; D3=1
        The other is
        D1=1; D2=1
        The second leaves out D3 completely, yet it still indicates that the firm has borrowed from both IS and NIS.

        My concern is that I have been taught that the number of dummy variables should be the number of categories minus 1, so I should actually have 3 dummies. But my PhD supervisor suggests that I follow the second way of coding to keep it simple. My question is whether this is technically correct and, if it is, how I should interpret the results.

        I hope this makes things a little clearer.
        Last edited by Denila Jinny; 16 Jul 2018, 00:47.
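The difference between the two codings can be written out in terms of the four group means. In this sketch, μ_none, μ_IS, μ_NIS and μ_both denote the mean outcome for firms with no funding, IS only, NIS only and both sources; the β and γ labels are introduced here for illustration.

```latex
\begin{aligned}
&\text{Coding 1: } E[Y] = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3 \\
&\quad \mu_{\text{none}} = \beta_0,\;
  \mu_{\text{IS}} = \beta_0 + \beta_1,\;
  \mu_{\text{NIS}} = \beta_0 + \beta_2,\;
  \mu_{\text{both}} = \beta_0 + \beta_3 \\[4pt]
&\text{Coding 2: } E[Y] = \gamma_0 + \gamma_1\,\mathit{IS} + \gamma_2\,\mathit{NIS} \\
&\quad \mu_{\text{none}} = \gamma_0,\;
  \mu_{\text{IS}} = \gamma_0 + \gamma_1,\;
  \mu_{\text{NIS}} = \gamma_0 + \gamma_2,\;
  \mu_{\text{both}} = \gamma_0 + \gamma_1 + \gamma_2
\end{aligned}
```

Coding 1 has four free parameters for four means. Coding 2 has only three, so it forces μ_both − μ_none to equal (μ_IS − μ_none) + (μ_NIS − μ_none). That is the same as imposing β3 = β1 + β2 on Coding 1, which amounts to assuming that using both sources has no effect beyond the sum of the separate effects.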



        • #5
          Hi Denila. Sorry for not reading #1 carefully enough the first time around. Given that IS and NIS are not mutually exclusive, I think you would need to include their interaction to account for the possibility of both occurring. E.g.,

          Code:
          * Mimic the IS & NIS data
          clear *
          sysuse auto
          tab rep78
          generate rep45 = rep78 > 3 if !missing(rep78)   // missing rep78 would otherwise count as > 3
          rename (mpg foreign rep45) (Y IS NIS)
          
          * Regression model with IS, NIS and their interaction
          regress Y IS##NIS
          * Use -margins- to show the cell means & simple main effects
          margins IS#NIS
          margins IS@NIS, contrast(nowald effects)
          margins NIS@IS, contrast(nowald effects)
          HTH.
          --
          Bruce Weaver
          Version: Stata/MP 18.5 (Windows)
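One way to see what the interaction buys is to compare the fitted "both" cell with its raw mean. This sketch assumes the same auto-data stand-ins as #5 (IS = foreign, NIS = rep78 > 3). The scalar names are arbitrary choices made here.

```stata
* Stand-ins for IS and NIS built from the auto data, as in #5
sysuse auto, clear
generate byte IS  = foreign
generate byte NIS = rep78 > 3 if !missing(rep78)   // keep missing rep78 missing

* Raw mean of mpg in each of the 4 IS x NIS cells
tabulate IS NIS, summarize(mpg) means
summarize mpg if IS == 1 & NIS == 1, meanonly
scalar m11 = r(mean)

* With the interaction: the fitted "both" cell reproduces its raw mean
regress mpg IS##NIS
scalar both_int = _b[_cons] + _b[1.IS] + _b[1.NIS] + _b[1.IS#1.NIS]

* Without the interaction: the "both" cell is forced to be
* constant + IS effect + NIS effect
regress mpg i.IS i.NIS
scalar both_add = _b[_cons] + _b[1.IS] + _b[1.NIS]

display "raw: " m11 "   with interaction: " both_int "   additive: " both_add
```

In general the additive value will not match the raw mean for the "both" cell, because that model has no parameter left over for it.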



          • #6
            Hello Bruce, I have information on whether the firms have borrowed from both sources. Should I still use interactions?



            • #7
              Hi Denila. Yes, you need to include the interaction to account for all 4 possibilities: neither, IS only, NIS only, both.
              --
              Bruce Weaver
              Version: Stata/MP 18.5 (Windows)
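A sketch of why the interaction covers all four possibilities. It assumes IS and NIS are each 1 whenever a firm uses that source, so both are 1 for firms using both. The α labels are introduced here for illustration.

```latex
\begin{aligned}
E[Y] &= \alpha_0 + \alpha_1\,\mathit{IS} + \alpha_2\,\mathit{NIS} + \alpha_3\,(\mathit{IS}\times\mathit{NIS}) \\
\mu_{\text{none}} &= \alpha_0, \qquad
\mu_{\text{IS only}} = \alpha_0 + \alpha_1, \qquad
\mu_{\text{NIS only}} = \alpha_0 + \alpha_2 \\
\mu_{\text{both}} &= \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 \\
\alpha_3 &= (\mu_{\text{both}} - \mu_{\text{none}})
          - (\mu_{\text{IS only}} - \mu_{\text{none}})
          - (\mu_{\text{NIS only}} - \mu_{\text{none}})
\end{aligned}
```

With four parameters for four means, this model fits the same group means as a coding with three mutually exclusive dummies (IS only, NIS only, both) against a "none" base. The two are reparameterizations of each other: the "both" dummy's coefficient equals α1 + α2 + α3. The interaction coefficient α3 is the part of the "both" effect that is not simply the sum of the separate IS and NIS effects.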



              • #8
                Bruce, thank you for patiently replying to my queries.

                I do not understand what difference an interaction term would make when I already have a separate dummy indicating whether the firm has borrowed from both sources. Can you please elaborate? Thank you.

                A glimpse of the data:

                Firm ID   IS   NIS   BOTH
                1         1    0     0
                2         0    1     0
                3         0    0     1
                4         0    0     0

                Firm 4 does not borrow from any external source; this category is treated as the base.


                Can you please elaborate a bit more on why you suggest that I include an interaction term? Thank you so much.



                • #9
                  By including the interaction of the two dichotomous explanatory variables, you are able to separate the effects of the two, and get 2*2 = 4 cell means. Did you try the example in #5? The first -margins- command displays the 4 cell means for that example. If you do not include the interaction, you will not separate the effects of the two dichotomous variables. Here is a cleaned up version of the demo in #5 with some more comments added. Perhaps it will help to clarify things.

                  Code:
                  * Mimic the IS & NIS data
                  clear *
                  sysuse auto
                  tab rep78
                  * Compute new variables to eliminate labels
                  generate Y = mpg
                  generate byte IS = foreign
                  generate byte NIS = rep78 > 3 if !missing(rep78)  // keep missing rep78 missing
                  tabulate IS NIS  // Show the n for each of the 4 cells
                  
                  * The rest of the code mimics your analysis.
                  
                  * Regression model with IS, NIS and their interaction
                  regress Y IS##NIS
                  * Use -margins- to show the cell means & simple main effects
                  margins IS#NIS
                  * This table shows the means of the 4 cells obtained by
                  * the factorial combination of the two dichotomous variables (IS & NIS).
                  margins IS@NIS, contrast(nowald effects)
                  * Compare "(1 vs base) 0" to "1.IS" in regression output
                  margins NIS@IS, contrast(nowald effects)
                  * Compare "(1 vs base) 0" to "1.NIS" in regression output
                  HTH.
                  --
                  Bruce Weaver
                  Version: Stata/MP 18.5 (Windows)
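To connect this with the table in #8, the sketch below fits both codings to the same stand-in data. It assumes the auto-data variables from #9, and the names IS_only, NIS_only and BOTH are introduced here to mirror Denila's mutually exclusive dummies. If the two are reparameterizations of the same saturated model, the R-squared values should match. The BOTH coefficient should equal the sum of the IS, NIS and interaction coefficients.

```stata
* Stand-ins built from the auto data, as in #9
sysuse auto, clear
generate Y = mpg
generate byte IS  = foreign                         // firm used IS (at all)
generate byte NIS = rep78 > 3 if !missing(rep78)    // firm used NIS (at all)

* Mutually exclusive dummies, as in the table in #8 (base: neither)
generate byte IS_only  = IS == 1 & NIS == 0 if !missing(NIS)
generate byte NIS_only = IS == 0 & NIS == 1 if !missing(NIS)
generate byte BOTH     = IS == 1 & NIS == 1 if !missing(NIS)

* Coding A: IS, NIS and their interaction
regress Y IS##NIS
scalar r2_int   = e(r2)
scalar is_int   = _b[1.IS]
scalar both_int = _b[1.IS] + _b[1.NIS] + _b[1.IS#1.NIS]

* Coding B: three exclusive dummies
regress Y IS_only NIS_only BOTH
display "R-squared difference: " e(r2) - r2_int
display "BOTH minus (1.IS + 1.NIS + 1.IS#1.NIS): " _b[BOTH] - both_int
```

Either coding is fine. What matters is that the model has a separate parameter for the "both" group, which the two-dummy coding without an interaction lacks.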



                  • #10
                    Hello Bruce, Thank you for patiently helping me out. I finally understand what you say. Thank you so much once again.
