  • Generate a dummy variable attributed to all observations of an ID if one of the observations meets a criterion

    Hi, I haven't had much experience with Stata and was wondering whether one could generate a dummy variable that is attributed to all observations of an ID, provided that one specific observation among that ID's observations meets a certain criterion.

    Here is an example of data:
    name_id  year  value_share  dummy variable I want
    city1    1     0.127        1
    city1    2     0            1
    city2    1     0.0530       0
    city2    2     0.275        0
    city3    1     0            0
    city3    2     0.235        0
    city4    1     0.874        1
    city4    2     0.573        1
    I'd like to create a dummy variable that takes the value 1 for both observations (years) of a city, provided that its value_share in year 1 is >= 0.1 (value_share in year 2 is irrelevant to whether the criterion is met).

    This seems rather rudimentary, but I can't think of a way of doing it, ideally with a single command (one line). Also, I don't want to drop any IDs from the data (e.g., those that don't meet the criterion), create a subset, or combine years 1 and 2 into a single row per city.

    Out of interest, how would the command differ if one needed to generate a dummy variable attributed to all observations of an ID provided that at least one (any) of the ID's observations meets a certain criterion?

  • #2

    Code:
    bysort name_id (value_share) : gen wanted = value_share[_N] >= 0.1
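    That is fragile if any value_share is missing: Stata sorts missing values last and treats them as larger than any number, so value_share[_N] would be missing and the comparison would return 1. In that case, use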

    Code:
    bysort name_id : egen wanted = max(value_share)
    replace wanted = wanted >=  0.1 if wanted <  .
    Greek lesson: a criterion; two criteria
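A one-line alternative for the "at least one observation" question in #1 is sketched below. It is an illustration, not the answer given above: the variable name any_high is hypothetical, and the example data from #1 are typed in with input so that the block runs on its own. The expression inside egen's max() evaluates to 1 or 0 in each observation, so its maximum within name_id is 1 exactly when some nonmissing value_share in that ID is at least 0.1.

```stata
* Example data from #1 (value_share stored as double)
clear
input str5 name_id byte year double value_share
city1 1 0.127
city1 2 0
city2 1 0.0530
city2 2 0.275
city3 1 0
city3 2 0.235
city4 1 0.874
city4 2 0.573
end

* any_high (hypothetical name) = 1 if ANY nonmissing value_share in the ID
* is >= 0.1, else 0. The "< ." part stops missing values, which Stata
* treats as larger than any number, from counting as meeting the criterion.
bysort name_id : egen any_high = max(value_share >= 0.1 & value_share < .)
list, sepby(name_id)
```

On the example data every city gets 1, because each city reaches 0.1 in at least one year. Unlike the two-line version above, this gives 0 rather than missing for an ID whose value_share is missing in every year; which behaviour is preferable depends on how you want such IDs treated.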



    • #3
      Nick Cox, thank you. That partially worked, but I'm still not getting exactly what I'd like: both years for an ID now take the value 1 even when value_share >= 0.1 occurs only in year 2. I need both observations (years) of an ID to take the value 1 only if value_share >= 0.1 in year 1 (whether it also holds in year 2 doesn't matter for the criterion). I imagine the command will be similar to the ones you've provided, but with an additional condition on year 1?

      And thanks: I am indeed missing some values of value_share, so the second set of commands helps!



      • #4
        Working backwards, I wonder if you are confusing missing (an observation is in the dataset but a value on a key variable is missing) with absent (an observation that might have been in the dataset is not in fact present).

        I see what you want. I took "one" to mean "any", but you meant one specific observation, and you did ask for that:


        Code:
         bysort name_id (year) : gen wanted = value_share[1] >= 0.1 if value_share[1] < .
        In #2 I answered the last question in #1, which you did indicate was a different question.
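A possible variant of the year-1 condition is sketched below, assuming year is coded 1 and 2 as in #1 (the name y1_high is hypothetical). It folds the condition on year into the same egen max() idiom, so it does not depend on year 1 being the first observation after sorting: an ID with no year-1 observation (absent, in the sense of #4) gets 0 rather than a value taken from whichever year happens to come first.

```stata
* Example data from #1 (value_share stored as double)
clear
input str5 name_id byte year double value_share
city1 1 0.127
city1 2 0
city2 1 0.0530
city2 2 0.275
city3 1 0
city3 2 0.235
city4 1 0.874
city4 2 0.573
end

* y1_high (hypothetical name) = 1 for every observation of an ID whose
* year-1 value_share is nonmissing and >= 0.1; otherwise 0.
* Only the year-1 observation can make the expression inside max() true.
bysort name_id : egen y1_high = max(year == 1 & value_share >= 0.1 & value_share < .)
list, sepby(name_id)
```

On the example data this reproduces the dummy shown in #1: 1 for city1 and city4, 0 for city2 and city3. It gives 0 (not missing) when the year-1 value_share is missing, whereas the line above leaves the dummy missing in that case.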



        • #5
          Nick Cox, that worked perfectly, thank you!
