  • Generating dummy for switching and reverting back on other side of threshold

    Dear all,

    I have a few questions concerning the informative posts on this thread: https://www.statalist.org/forums/for...-periods/page2

    - In #7, Nick Cox posted the following link: https://www.stata.com/support/faqs/d...ions-in-group/. This webpage explains that if one runs the code showcased on that same webpage (after proper sorting):
    Code:
     
     by eid (egenotype), sort: gen diff = egenotype[1] != egenotype[_N] 
    one will obtain a dummy assuming the value of 1 if the first and last values of egenotype differ, correct? However, in the example, within eid=2, egenotype goes from being equal to ww to vv and then back to ww. Dr. Cox nonetheless specifies that the dummy created will still assume the value of 1 for eid=2 as there is variation within eid=2, correct?

    - Take a similar example, but suppose that ww=0 and vv=1 and that there is a threshold of 0.5. We generate a diff dummy that takes the value 1 if, within a given eid, egenotype crosses that threshold, and then run code nearly identical to that above to generate a variable flagging switchers. How would a researcher generate a dummy that takes the value 1 if a unit switches back across the threshold (above or below) after having first switched from one side of the threshold to the other?

  • #2
    one will obtain a dummy assuming the value of 1 if the first and last values of egenotype differ, correct? However, in the example, within eid=2, egenotype goes from being equal to ww to vv and then back to ww. Dr. Cox nonetheless specifies that the dummy created will still assume the value of 1 for eid=2 as there is variation within eid=2, correct?
    No, not correct. The code compares the values of egenotype in the first and last observations after the observations have been sorted on egenotype itself (within eid). The original order of the data is irrelevant. The logic is simply this: if there is only a single value of egenotype in all observations, then (in any order) the first and last will be the same. If there are multiple values of egenotype, sorting by egenotype (within eid) makes the (new) first observation the smallest such value and the last observation the largest, so they will be different.
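A minimal sketch of this logic on made-up data (the values below are hypothetical, chosen to mimic the FAQ's eid=2 case in which egenotype goes ww, vv, ww). Because the by prefix sorts on egenotype within eid, the original ww, vv, ww order plays no role:

```stata
clear
input eid str2 egenotype
1 "ww"
1 "ww"
2 "ww"
2 "vv"
2 "ww"
end

* Sort on egenotype within eid, then compare the smallest and largest values
by eid (egenotype), sort: gen diff = egenotype[1] != egenotype[_N]

list, sepby(eid)
```

Here diff should be 0 for eid 1 (a single distinct value) and 1 for eid 2 (two distinct values), regardless of the order in which the observations were entered.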

    How would a researcher generate a dummy that takes the value 1 if a unit switches back across the threshold (above or below) after having first switched from one side of the threshold to the other?
    Code:
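    * Running count of threshold crossings within id; the first observation has no predecessor, so it contributes 0
    by id (date), sort: gen n_spells = ///
        sum(cond(_n == 1, 0, (unit > threshold) != (unit[_n-1] > threshold)))
    * Flag observations from the second crossing (the switch back) onward
    by id (date): gen byte wanted = n_spells >= 2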
    Note: Untested, as no example data was provided.
    Last edited by Clyde Schechter; 25 Jan 2022, 15:31. Reason: Correct error in code
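As an illustration only, the approach above can be tried on a small made-up panel. The data, the local macro threshold, and the variable names n_switches and ever_back are assumptions introduced for this sketch. They are not from the thread.

```stata
clear
input id date unit
1 1 0.2
1 2 0.7
1 3 0.3
2 1 0.2
2 2 0.8
2 3 0.9
3 1 0.6
3 2 0.6
3 3 0.6
end

local threshold 0.5

* Running count of crossings within id, in date order
by id (date), sort: gen n_switches = ///
    sum(cond(_n == 1, 0, (unit > `threshold') != (unit[_n-1] > `threshold')))

* 1 from the observation at which the unit has crossed back
by id (date): gen byte wanted = n_switches >= 2

* Optional: the same flag held constant across all observations of an id
by id (date): gen byte ever_back = n_switches[_N] >= 2

list, sepby(id)
```

In this sketch id 1 crosses up and then back down, so wanted becomes 1 at its third observation. id 2 crosses only once and id 3 never crosses, so both stay at 0. Note that Stata treats numeric missing values as larger than any number, so a missing unit would count as above the threshold unless it is handled separately.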

    • #3
      Thank you very much for your prompt, clear and concise response. I wanted to first ask a general question on a publicly available example before delving deeper into my specific case.

      Here are my data:

      Code:
      input float(occ_zipcode_prog_year switch_high_share_compbased share_compbased_occzipyear)
      529 1   .3333333
      529 1   .3333333
      529 1   .3333333
      530 1          1
      531 1  .14285715
      531 1  .14285715
      531 1  .14285715
      531 1  .14285715
      531 1  .14285715
      531 1  .14285715
      531 1  .14285715
      532 1   .3333333
      532 1   .3333333
      532 1   .3333333
      533 1          1
      534 1          1
      535 1          1
      536 1          1
      537 1          1
      538 1          1
      539 1          1
      540 1          1
      541 1          1
      542 1          1
      543 1          1
      544 1          1
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      546 1  .05882353
      end
      Applying your code to my data (occ_zipcode_prog_year is the group variable):

      Code:
         
       by occ_zipcode_prog_year, sort: gen n_spells = ///
           sum(cond(_n == 1, 0, (share_compbased_occzipyear > 0.009) != (share_compbased_occzipyear[_n-1] > 0.009)))
       by occ_zipcode_prog_year: gen byte wanted = n_spells >= 2
      The new corrected code fulfills its function perfectly: it takes the value 1 whenever a unit that has switched from one side of the threshold to the other switches back. Many thanks again!