Generate dummy variable attributed to all observations of an ID if one of the observations meets criteria

Liv Flett

Join Date: May 2020

Posts: 3
#1

03 May 2020, 07:50

Hi, I haven't had much experience using Stata and was wondering whether one could generate a dummy variable that is attributed to all observations of an ID, provided one specific observation out of the ID's observations meets a certain criteria.

Here is an example of data:

name_id  year  value_share  dummy variable I want
city1    1     0.127        1
city1    2     0            1
city2    1     0.0530       0
city2    2     0.275        0
city3    1     0            0
city3    2     0.235        0
city4    1     0.874        1
city4    2     0.573        1

I'd like to create a dummy variable taking the value 1 for both observations/years of each city, provided value_share in year 1 is >= 0.1 (value_share in year 2 is irrelevant to whether the condition is met).

This seems rather rudimentary, but I can't think of a way of doing this, ideally with a single command (one line). Also, I don't wish to drop any IDs from the data (such as those that don't meet the criteria), create a subset, or combine years 1 and 2 into a single row per city.

Out of interest, how would the command differ if one needed to generate a dummy variable attributed to all observations of an ID provided at least one observation (non-specific) out of the ID's observations meets a certain criteria?
Tags: data, dummy variable, panel data
Nick Cox

Join Date: Mar 2014

Posts: 35810
#2

03 May 2020, 08:07

Code:

bysort name_id (value_share) : gen wanted = value_share[_N] >= 0.1

That is fragile if any value_share is missing, in which case

Code:

bysort name_id : egen wanted = max(value_share)
replace wanted = wanted >= 0.1 if wanted < .
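To illustrate why the first command is fragile, here is a minimal sketch on made-up data (the cities, values, and the variable names fragile and robust are hypothetical, not from the thread). Stata sorts missing values last and treats them as larger than any number, so value_share[_N] >= 0.1 is true whenever the largest sorted value is missing.

```stata
clear
input str5 name_id year value_share
"cityA" 1 0.02
"cityA" 2 .
"cityB" 1 0.30
"cityB" 2 0.05
end

* Approach 1: missing sorts last, and missing >= 0.1 evaluates to true
bysort name_id (value_share) : gen fragile = value_share[_N] >= 0.1

* Approach 2: egen's max() ignores missing values within each group
bysort name_id : egen robust = max(value_share)
replace robust = robust >= 0.1 if robust < .

list, sepby(name_id)
```

In this sketch cityA never reaches 0.1, yet fragile flags it because of the missing value, while robust does not.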

Greek lesson: a criterion; two criteria
Liv Flett

Join Date: May 2020

Posts: 3
#3

03 May 2020, 08:46

Nick Cox thank you - that partially worked, but unfortunately I'm still not getting exactly what I'd like: both years for an ID now take the value 1 even if "value_share" >= 0.1 occurs in year 2 rather than year 1. I need both observations/years for an ID to take the value 1 only if "value_share" >= 0.1 occurs in year 1 (whether it also occurs in year 2 doesn't matter for the criterion* to be met). I imagine the command will be somewhat similar to the ones you've provided, but additionally conditioning on year 1 somehow?

And thanks, I am indeed missing some values for the "value_share" variable, so the second set of commands helps!
Nick Cox

Join Date: Mar 2014

Posts: 35810
#4

03 May 2020, 09:07

Working backwards, I wonder if you are confusing missing (an observation is in the dataset but a value on a key variable is missing) with absent (an observation that might have been in the dataset is not in fact present).
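The distinction can be sketched on a small made-up dataset (the cities, values, and the variable name has_year1 are hypothetical). Here cityA has a missing value in year 2, while cityB's year 1 observation is absent altogether.

```stata
clear
input str5 name_id year value_share
"cityA" 1 0.127
"cityA" 2 .
"cityB" 2 0.275
end

* Missing: the observation exists, but value_share has no value
count if missing(value_share)

* Absent: cityB has no observation at all for year 1
bysort name_id : egen has_year1 = max(year == 1)
list name_id year if has_year1 == 0
```

Counting missing values finds only the first kind of problem; a per-ID check such as has_year1 is one way to find the second.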

I see what you want. I took "one" to mean "any" but you meant this and did ask for it:

Code:

bysort name_id (year) : gen wanted = value_share[1] >= 0.1 if value_share[1] < .
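As a quick check, the command can be run against the example data from #1. This is only a sketch: the column named expected is a hypothetical name for the hand-coded "dummy variable I want" column.

```stata
clear
input str5 name_id year value_share expected
"city1" 1 0.127  1
"city1" 2 0      1
"city2" 1 0.0530 0
"city2" 2 0.275  0
"city3" 1 0      0
"city3" 2 0.235  0
"city4" 1 0.874  1
"city4" 2 0.573  1
end

* Sort by year within city so that [1] is the earliest year present
bysort name_id (year) : gen wanted = value_share[1] >= 0.1 if value_share[1] < .

* Stops with an error if any observation disagrees with the hand-coded column
assert wanted == expected
```

Note that value_share[1] refers to the earliest year present for each city. If year 1 could be absent for some city, adding & year[1] == 1 to the if qualifier is one possible guard, leaving wanted missing for those cities.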

I answered the last question in #1 which you did imply to be a different question.
Liv Flett

Join Date: May 2020

Posts: 3
#5

03 May 2020, 09:50

Nick Cox worked perfectly, thank you!