Making a group of observations all share the same value for a variable?

Johnathan Athanasius

Join Date: Jun 2022
Posts: 9

28 Sep 2022, 18:01

Hello,

I wanted to start off with a disclaimer that I can't post an example of my data with dataex because my Stata is on a server that purposely does not support installing new commands.

However, I made this table to describe it.
All the observations have a unique ID. The respective case ID just associates the controls with their respective cases. I made the matchgroup variable because the real ID numbers in the data are really long (like 15 characters) so having a matchgroup number ranging from 1 to around 2000 was much easier.

Essentially what I'm trying to do is fill in all the missing values for timing of exposure for the controls so that they share the value with their respective cases.
So I basically want Table A to turn into Table B.

TABLE A

ID	Respective Case ID	subject_type	matchgroup	timing of exposure
42	42	case	1	4
43	42	control	1	-
44	42	control	1	-
45	45	case	2	11
46	45	control	2	-
47	45	control	2	-
48	48	case	3	1
49	48	control	3	-
50	48	control	3	-
51	51	case	4	20
52	51	control	4	-
53	51	control	4	-

TABLE B

ID	Respective Case ID	subject_type	matchgroup	timing of exposure
42	42	case	1	4
43	42	control	1	4
44	42	control	1	4
45	45	case	2	11
46	45	control	2	11
47	45	control	2	11
48	48	case	3	1
49	48	control	3	1
50	48	control	3	1
51	51	case	4	20
52	51	control	4	20
53	51	control	4	20

Is there a command I can use to do this?

Thank you,
Jon

Last edited by Johnathan Athanasius; 28 Sep 2022, 18:05. Reason: Added short blurb on what the variables are.

Tags: None

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1197
#2

28 Sep 2022, 18:15

Something like this:

Code:

bysort respective_case_id: egen timing = max(timing_of_exposure)
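A minimal sketch of the same idea on the first two match groups of Table A, for anyone who wants to try it. The variable names (`id`, `case_id`, `timing_of_exposure`) are assumed here because the real names were not posted. It relies on each group having exactly one non-missing value; with several different non-missing values in a group, `max()` would keep only the largest.

```stata
clear
input int(id case_id) str7 subject_type byte(matchgroup timing_of_exposure)
42 42 "case"    1 4
43 42 "control" 1 .
44 42 "control" 1 .
45 45 "case"    2 11
46 45 "control" 2 .
47 45 "control" 2 .
end

* egen's max() ignores missing values, so every row in a group
* receives the single non-missing value held by the case
bysort case_id: egen timing = max(timing_of_exposure)

* Alternative: sort missing values last within each group and copy the first row
bysort case_id (timing_of_exposure): gen timing2 = timing_of_exposure[1]

list, noobs sepby(case_id)
```

Both new variables should agree whenever the one-value-per-group assumption holds, which makes the second a cheap cross-check on the first.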
1 like
Comment
Johnathan Athanasius

Join Date: Jun 2022

Posts: 9
#3

28 Sep 2022, 18:54

Brilliant!

Thank you so much, Hemanshu, that worked perfectly!
You also really helped me with my last question from September - I really appreciate that.

Thank you,
Jon
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29773
#4

28 Sep 2022, 22:12

Originally posted by Johnathan Athanasius

I wanted to start off with a disclaimer that I can't post an example of my data with dataex because my Stata is on a server that purposely does not support installing new commands.

I appreciate the difficulties that IT people can impose upon their system users. But I do need to point out that -dataex- is part of official Stata in versions 17, 16, 15.1, and 14.2. So unless you are also being made to run a pretty ancient version of Stata, you don't have to install anything new to use it: it's already there.
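For readers who have not used it, a short illustration of -dataex- on the auto data that ships with Stata. The choice of variables and the five-row range are arbitrary; in practice you would pick the variables relevant to your question.

```stata
* Load a dataset shipped with Stata
sysuse auto, clear

* Write a copy-and-paste data example for three variables, first five rows
dataex make price mpg in 1/5
```

The output in the Results window can be pasted directly into a forum post.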
1 like
Comment

Eunhye Kang

Join Date: Jun 2017
Posts: 31
#5

27 May 2024, 02:41

Hello. I am desperately asking for your help.
I have a household-based survey and need to create a "number of children" variable, which I have failed to do over the past several weeks.

Here is what the data look like.
The survey covered all cohabiting family members aged 10 or older within a household.
It also records whether the household has any additional members under age 10.
With these data, I want to create a "total number of children" variable to see how many children the parents have.

This table shows an example of a household with 4 family members: father (aged 46), mother (aged 45), child 1 (aged 13), and child 2 (under age 10).
I want to code the parents as having two children.
Could you help me generate this variable?

I have tried to solve this myself for several weeks without success. Any advice would be appreciated!

hhld id key	within hhld id key	head of hhld	total n. of hhld	n. of family members under age 10	age	relationship with hhld head
2316	1	Y	4	1	46	oneself
2316	2	N	4	1	45	spouse
2316	2	N	4	1	45	spouse
2316	3	N	4	1	13	children
2316	1	Y	4	1	46	oneself
2316	3	N	4	1	13	children

Comment

Hemanshu Kumar

Join Date: Mar 2015
Posts: 1197
#6

27 May 2024, 03:26

Here is a way of doing this with relatively simple commands:

Code:

clear
input int hhld_id_key byte within_hhld_id_key str1 head_of_hhld byte(total_n_of_hhld n_of_family_members_under_age_10 age) str8 relationship_with_hhld_head
2316 1 "Y" 4 1 46 "oneself"
2316 2 "N" 4 1 45 "spouse"  
2316 2 "N" 4 1 45 "spouse"  
2316 3 "N" 4 1 13 "children"
2316 1 "Y" 4 1 46 "oneself"
2316 3 "N" 4 1 13 "children"
end

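* Remember the original row order so it can be restored at the end
gen `c(obs_t)' x = _n
sort hhld_id_key within_hhld_id_key
* Flag the first row of each member, so duplicated rows are counted once
by hhld_id_key: gen byte new_member = (within_hhld_id_key != within_hhld_id_key[_n-1])
* Running count of distinct members aged 10 or older listed as children
by hhld_id_key: gen int num_over_10 = sum(new_member == 1 & relationship_with_hhld_head == "children")
* Household total: children aged 10 or older plus members under age 10
by hhld_id_key: gen int total_n_of_children = num_over_10[_N] + n_of_family_members_under_age_10[_N]
sort x
drop x new_member num_over_10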

which produces:

Code:

. list , noobs sepby(hhld_id_key)

  +----------------------------------------------------------------------------------+
  | hhld_i~y   within~y   head_o~d   total_~d   n_of_~10   age   relati~d   total_~n |
  |----------------------------------------------------------------------------------|
  |     2316          1          Y          4          1    46    oneself          2 |
  |     2316          2          N          4          1    45     spouse          2 |
  |     2316          2          N          4          1    45     spouse          2 |
  |     2316          3          N          4          1    13   children          2 |
  |     2316          1          Y          4          1    46    oneself          2 |
  |     2316          3          N          4          1    13   children          2 |
  +----------------------------------------------------------------------------------+
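An alternative sketch with -egen-, in case it is easier to read. It makes the same assumptions as the code above: every household member under age 10 is a child of the household head, and "children" is recorded relative to the head. The new variable names (`one_per_member`, `children_10_plus`, `n_children_alt`) are made up for illustration.

```stata
clear
input int hhld_id_key byte within_hhld_id_key str1 head_of_hhld byte(total_n_of_hhld n_of_family_members_under_age_10 age) str8 relationship_with_hhld_head
2316 1 "Y" 4 1 46 "oneself"
2316 2 "N" 4 1 45 "spouse"
2316 2 "N" 4 1 45 "spouse"
2316 3 "N" 4 1 13 "children"
2316 1 "Y" 4 1 46 "oneself"
2316 3 "N" 4 1 13 "children"
end

* tag() marks exactly one row per household member, so duplicated rows count once
egen byte one_per_member = tag(hhld_id_key within_hhld_id_key)

* Per household, count the tagged members who are listed as children (aged 10+)
bysort hhld_id_key: egen int children_10_plus = total(one_per_member & relationship_with_hhld_head == "children")

* Add the members under age 10, assumed here to be children of the head
gen int n_children_alt = children_10_plus + n_of_family_members_under_age_10

list hhld_id_key within_hhld_id_key relationship_with_hhld_head n_children_alt, noobs sepby(hhld_id_key)
```

Note that -bysort- leaves the data sorted by household, so the original row order is not preserved as it is in the code above.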

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35173
#7

27 May 2024, 04:23

See also e.g.

1536867X1101100210 (sagepub.com)

https://www.stata.com/support/faqs/d...ng-properties/
1 like
Comment
Eunhye Kang

Join Date: Jun 2017

Posts: 31
#8

28 May 2024, 20:12

Thank you very much for your responses. I was able to generate the variables I needed!
I accidentally posted my question in someone else's thread, but your fast replies were very helpful!
Comment
Mariama Kam

Join Date: Aug 2024

Posts: 10
#9

26 Aug 2024, 13:47

Hi
Please help.
I am combining two labor force datasets, LB2016 and LB2019. The variables are not uniquely identified. I am redefining variables in LB2016 to match the corresponding variables in the master file, LB2019. The variable "BRANCH" has 19 values in its value label in LB2019, but the same variable has more than 50 values in its label in LB2016. I would like to redefine "BRANCH" in LB2016 so that it matches the label values in LB2019. I have attempted to do this through manage label but STATA is not allowing me to assign the same label value to multiple observations. Is there a command that would make this easier? Many thanks for your assistance.
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1197
#10

26 Aug 2024, 14:00

Originally posted by Mariama Kam

I have attempted to do this through manage label but STATA is not allowing me to assign the same label value to multiple observations.

I am not sure what you mean by this. Can you show a data extract and the code you are using?

You may also want to install elabel (by daniel klein; via the Stata Journal) using

Code:

net sj 21-2 dm0101_1

and then check, for instance

Code:

help elabel_recode
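If installing community-contributed commands is not an option, official -recode- can also map several old codes onto one new code and attach a value label in the same step. The sketch below uses invented codes and category names ("Agriculture", "Services") purely for illustration; the real mapping from the 2016 categories to the 19 categories of 2019 has to come from the survey documentation.

```stata
clear
input byte BRANCH
1
2
3
4
5
end

* Hypothetical mapping: old codes 1-2 become new code 1, old codes 3-5 become new code 2
recode BRANCH (1 2 = 1 "Agriculture") (3/5 = 2 "Services"), gen(BRANCH19) label(branch19)

* Cross-tabulate old against new codes to check the mapping
tabulate BRANCH BRANCH19
```

Generating a new variable rather than overwriting BRANCH keeps the original codes available for checking.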
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29773
#11

26 Aug 2024, 14:08

Attempting to directly modify the labels to make them consistent is going to be, at best, tedious and error prone. In my opinion, inconsistencies across data sets that will be combined should be fixed in the individual data sets before combining them. So I would convert BRANCH back to a string variable in both data sets (-decode-), then combine the data sets, and finally -encode- the string variable: this will guarantee an internally consistent value label for BRANCH. It will not, in general, agree with the labeling in the original data sets; however, that is typically not necessary if analysis will be carried out in the combined data only. (And it is clearly impossible to do this in a way that will agree with both of the original data sets.)

Code:

use LB2016, clear
decode BRANCH, gen(branch)
drop BRANCH
tempfile lb2016
save `lb2016'

use LB2019, clear
decode BRANCH, gen(branch)
drop BRANCH
append using `lb2016'
encode branch, gen(BRANCH)
drop branch
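To see what this does, here is a toy version with two invented files in which the same numeric code means different things. The label names (`br16`, `br19`) and the categories "Trade" and "Mining" are made up. The approach only lines categories up when their label text matches exactly (spelling, case, and spaces); categories that differ in granularity between years would still need to be mapped first.

```stata
* Toy stand-in for the 2016 file: code 1 means "Trade"
clear
input byte BRANCH
1
2
end
label define br16 1 "Trade" 2 "Mining"
label values BRANCH br16
decode BRANCH, gen(branch)
drop BRANCH
tempfile y2016
save `y2016'

* Toy stand-in for the 2019 file: code 1 means "Mining"
clear
input byte BRANCH
1
2
end
label define br19 1 "Mining" 2 "Trade"
label values BRANCH br19
decode BRANCH, gen(branch)
drop BRANCH

* Combine on the text, then build one consistent numeric coding
append using `y2016'
encode branch, gen(BRANCH)
tabulate branch BRANCH
```

Because -encode- numbers the distinct strings in alphabetical order, each text value ends up with a single code in the combined data regardless of its code in either source file.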

Added: Crossed with #2.
Comment
Mariama Kam

Join Date: Aug 2024

Posts: 10
#12

27 Aug 2024, 13:26

Many Thanks Clyde. Here are the restricted versions of the two datasets I am merging. I keep receiving the following message: variables BRANCH21_E1 HH5 M5A Milieu_new do not uniquely identify observations in the master data

Attached Files

LB2016r and LB2019r.xlsx (15.3 KB, 1 view)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29773
#13

27 Aug 2024, 13:58

Attachments are discouraged here. I am one of many Forum members who will not download files from people I do not know. The helpful way to show example data here is to load your Stata data set and then use the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Also, for help with a problem like this, you will need to explain what the variables you are talking about are, and also show what command(s) gave you the error message in question.
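In the meantime, a message saying that variables "do not uniquely identify observations" can be investigated with official commands. This is only a sketch on invented data: `hh` and `member` are hypothetical stand-ins for whatever key variables are used in the merge.

```stata
clear
input int hh byte member
1 1
1 2
1 2
2 1
end

* isid exits with an error when the variables do not uniquely identify observations
capture isid hh member
display "isid return code: " _rc

* Summarize surplus copies, then tag them so the offending rows can be inspected
duplicates report hh member
duplicates tag hh member, gen(dup)
list if dup > 0, noobs sepby(hh)
```

Whether the tagged rows are true duplicates to drop, or a sign that the key is missing a variable, depends on the data.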
Comment
Mariama Kam

Join Date: Aug 2024

Posts: 10
#14

27 Aug 2024, 20:10

Many thanks Clyde. I have used the code you provided and the datasets have been amended. However, I noticed that many observations were dropped (missing).
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29773
#15

28 Aug 2024, 10:18

Again, without seeing example data that illustrates the problem, there isn't anything I can do to help.
Comment
