  • Counting the number of times a dummy variable changes value

    Hello,

    I have an unbalanced panel dataset covering 1999 to 2019, with gaps (I am not sure whether that matters).

    I have a variable called public_dum, which is equal to 1 if the individual works in the public sector and 0 if he/she works in the private sector.

    Code:
    tab public_dum
    
     public_dum |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      8,816       82.65       82.65
              1 |      1,851       17.35      100.00
    ------------+-----------------------------------
          Total |     10,667      100.00
    My goal is to count the number of individuals who move from the private to the public sector over the years. For simplicity, I would stop counting at the first occurrence (even though some individuals might switch several times).

    So ideally, I would create a variable called switch that takes the value 1 if the individual switched (at least once) from private to public, and 0 if she/he never switched.

    I am not very experienced in Stata and am having difficulty getting started, so I would appreciate it if someone could shed some light on this.

    Here is a subset of the three variables concerned:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long idpers int year float public_dum
     11101 1999 .
     21101 2001 0
     26102 1999 .
     42101 2000 .
     42101 2001 0
     42101 2002 0
     42101 2003 0
     42101 2004 0
     42101 2005 0
     42101 2006 0
     42101 2007 .
     42101 2008 0
     42101 2009 0
     42101 2010 0
     42101 2011 0
     42101 2012 0
     42101 2013 0
     42101 2014 0
     42101 2015 0
     42101 2016 .
     42101 2017 0
     42102 2000 0
     42102 2001 0
     42102 2002 1
     42102 2003 1
     42102 2004 1
     42102 2005 1
     42102 2016 1
     45101 1999 0
     45101 2000 0
     45101 2001 0
     45101 2007 .
     45101 2008 .
     45101 2009 .
     45101 2010 0
     45101 2011 .
     45101 2012 .
     45101 2013 0
     45101 2014 .
     45101 2015 .
     45101 2016 0
     45101 2017 0
     45101 2018 0
     45102 1999 0
     45102 2000 0
     45102 2001 0
     45102 2007 0
     45102 2008 0
     45102 2009 0
     45102 2010 0
     45102 2011 0
     45102 2012 0
     45102 2013 0
     45102 2014 0
     45102 2015 0
     45102 2016 0
     45102 2017 0
     45102 2018 0
     45102 2019 0
     45103 2015 .
     45103 2016 .
     45103 2017 .
     45103 2018 .
     45103 2019 .
    103101 1999 0
    103101 2000 0
    103101 2001 0
    103101 2009 0
    103101 2010 0
    103104 2001 .
    103104 2009 .
    103104 2010 .
    103104 2011 .
    103104 2012 .
    103104 2013 .
    103104 2014 .
    103105 2000 0
    117101 1999 0
    117101 2000 0
    117101 2001 0
    117101 2002 0
    117101 2003 0
    117101 2004 0
    117101 2005 0
    117101 2006 0
    117101 2007 0
    117101 2008 0
    117101 2009 0
    117101 2010 0
    117101 2011 0
    117101 2012 0
    117101 2013 0
    117101 2014 0
    117101 2015 0
    117101 2016 0
    117101 2017 .
    117101 2018 .
    117102 1999 .
    117102 2000 .
    117102 2001 .
    end
    label values idpers IDPERS


  • #2
    Zsolt:
    you may want to try:
    Code:
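    * running count of public-sector observations within each person, in year order
    bysort idpers (year): gen wanted = sum(public_dum)

    * flag observations where that running count equals 1
    gen switch = 1 if wanted == 1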
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      What do the missing values represent?
      Also, in your data example a switch happens only once, for one person, so the example does not reflect the data you describe.

      A rudimentary code below:


      Code:
      xtset idpers year
      gen switch = D.public_dum
      The first line declares your data to be a panel dataset.
      The second line takes the first difference of the values [t - (t-1)] within each ID (that is, it won't subtract the last value of the previous idpers from the first value of the next idpers).

      Where
      Code:
       switch == 1
      is where the switch happens. You can make your conditions more specific on the actual data, especially if you are only looking for the first occurrence of a switch.

      HTH!
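
      A minimal sketch of one way to narrow this down to a per-person indicator, assuming only private-to-public moves matter. The names d_public, to_public and ever_to_public are made up for illustration. Note that with xtset and yearly data, D. compares year t with year t-1 exactly, so a change across a gap in years yields a missing difference and is not flagged.

      Code:
      xtset idpers year
      * first difference: 1 = private to public, -1 = public to private,
      * missing if either year is missing or absent
      gen d_public = D.public_dum
      * 1 in the year of a private-to-public move, 0 otherwise
      gen byte to_public = d_public == 1
      * 1 on every observation of a person who ever made such a move
      bysort idpers: egen ever_to_public = max(to_public)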



      • #4
        Thank you for your answer, Carlo.

        Actually, I was thinking about doing it in this spirit:

        Code:
        gen switch=0
        bys idpers: replace switch=1 if public_dum[_n]=!public_dum[_n-1]
        That way I could also count individuals who switch from public to private, but Stata warns me of "invalid syntax".

        Do you happen to see what is wrong?

        Thank you
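
        On the syntax error: Stata's "not equal" operator is written != (or ~=); =! is not an operator, which is the likely source of the error message. A minimal sketch of the corrected idea follows, with two assumptions that go beyond the post: the data are sorted by year within idpers, and a change is counted only when both the current and the previous observation are nonmissing (otherwise the first observation of each person, where public_dum[_n-1] is missing, would count as a change).

        Code:
        gen switch = 0
        * flag any change of sector, in either direction, between consecutive observed rows
        bysort idpers (year): replace switch = 1 if public_dum != public_dum[_n-1] ///
            & !missing(public_dum, public_dum[_n-1])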



        • #5
          Thank you, Asjad. I will try your code.

          Missing values refer to a particular kind of public employees that I am not interested in.



          • #6

            Code:
            bysort idpers (year) : gen switch = public_dum == 1 & public_dum[_n-1] == 0

            by idpers : egen ever_switch = max(switch)

            egen tag = tag(idpers)

            count if tag & ever_switch
            is another technique. See the help on egen and count for more.



            • #7
              Thank you, Nick. I understand the first line. Can you please briefly explain what the others do, if you don't mind? Thank you.



              • #8
                The use of egen, max() here matches https://www.stata.com/support/faqs/d...ble-recording/

                tag() is explained tersely by the help for egen. The original 1999 publication has a bit more at dm70 in https://www.stata.com/products/stb/journals/stb50.pdf

                90% of the understanding of count comes from understanding the command name. The manual entry has links to other papers.
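
                A rough walk-through of the code in #6, offered as one reading of it rather than the poster's own explanation. On the example data in #1, only idpers 42102 moves from 0 (2001) to 1 (2002), so the final count should be 1.

                Code:
                * 1 where this row is public and the previous observed row of the same person is private, else 0
                bysort idpers (year): gen switch = public_dum == 1 & public_dum[_n-1] == 0
                * spread the maximum of switch over all rows of a person: 1 if a switch ever occurred
                by idpers: egen ever_switch = max(switch)
                * mark exactly one row per person with 1 so that each person is counted once
                egen tag = tag(idpers)
                * number of persons with at least one private-to-public switch
                count if tag & ever_switch

                Because [_n-1] refers to the previous row rather than the previous calendar year, a move across a gap in years is still caught, but a missing public_dum between a 0 and a 1 hides the move.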
