  • Making a group of observations all share the same value for a variable?

    Hello,

    I wanted to start off with a disclaimer that I can't post an example of my data with dataex because my Stata is on a server that purposely does not support installing new commands.

    However, I made this table to try to describe it.
    Every observation has a unique ID. The respective case ID associates each control with its case. I made the matchgroup variable because the real ID numbers in the data are really long (about 15 characters), so a matchgroup number ranging from 1 to around 2000 is much easier to work with.

    Essentially what I'm trying to do is fill in all the missing values for timing of exposure for the controls so that they share the value with their respective cases.
    So I basically want Table A to turn into Table B.

    TABLE A
    ID   Respective Case ID   subject_type   matchgroup   timing of exposure
    42   42                   case           1            4
    43   42                   control        1            -
    44   42                   control        1            -
    45   45                   case           2            11
    46   45                   control        2            -
    47   45                   control        2            -
    48   48                   case           3            1
    49   48                   control        3            -
    50   48                   control        3            -
    51   51                   case           4            20
    52   51                   control        4            -
    53   51                   control        4            -

    TABLE B
    ID   Respective Case ID   subject_type   matchgroup   timing of exposure
    42   42                   case           1            4
    43   42                   control        1            4
    44   42                   control        1            4
    45   45                   case           2            11
    46   45                   control        2            11
    47   45                   control        2            11
    48   48                   case           3            1
    49   48                   control        3            1
    50   48                   control        3            1
    51   51                   case           4            20
    52   51                   control        4            20
    53   51                   control        4            20

    Is there a command I can use to do this?


    Thank you,
    Jon
    Last edited by Johnathan Athanasius; 28 Sep 2022, 18:05. Reason: Added short blurb on what the variables are.

  • #2
    Something like this:
    Code:
    bysort respective_case_id: egen timing = max(timing_of_exposure)
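
    This works because -egen-'s max() ignores missing values, so within each matched set the only non-missing value (the case's) is copied to every row. Below is a minimal sketch on the first two matched sets from Table A, assuming the variable names shown (the real names in the data may differ) and assuming each matched set has exactly one non-missing timing. The -assert- line checks that assumption before the fill, and the sketch groups on matchgroup rather than the long case ID.

    ```stata
    clear
    input id respective_case_id str7 subject_type matchgroup timing_of_exposure
    42 42 "case"    1 4
    43 42 "control" 1 .
    44 42 "control" 1 .
    45 45 "case"    2 11
    46 45 "control" 2 .
    47 45 "control" 2 .
    end

    * Check the assumption: exactly one non-missing timing per matched set
    bysort matchgroup: egen n_known = count(timing_of_exposure)
    assert n_known == 1
    drop n_known

    * max() ignores missing values, so the case's timing is spread to
    * every member of the matched set
    bysort matchgroup: egen timing = max(timing_of_exposure)

    list, sepby(matchgroup) noobs
    ```

    If a matched set had no non-missing timing at all, max() would simply return missing for that set.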



    • #3
      Brilliant!

      Thank you so much, Hemanshu, that worked perfectly!
      You also really helped me with my last question from September - I really appreciate that.

      Thank you,
      Jon



      • #4
        "I wanted to start off with a disclaimer that I can't post an example of my data with dataex because my Stata is on a server that purposely does not support installing new commands."
        I appreciate the difficulties that IT people can impose upon their system users. But I do need to point out that -dataex- is part of official Stata in versions 17, 16, 15.1, and 14.2. So unless you are also being made to run a pretty ancient version of Stata, you don't have to install anything new to use it: it's already there.



        • #5
          Hello. I am desperately asking for your help.
          I have a household-based survey and need to create a "number of children" variable, which I have kept failing to do over the past weeks.

          Here is what the data look like.
          The survey covered all cohabiting family members aged 10 or older within a household.
          It also records the number of additional household members under age 10.
          With this data, I want to create a "total number of children" variable to see how many children parents have.

          This table shows an example of a household that has 4 family members: father (aged 46), mother (aged 45), child 1 (aged 13), and child 2 (under age 10).
          I want to record for the parents that they have two children.
          Could you help me with generating this variable?

          I have tried to solve it by myself over several weeks but failed. Any kind of advice would be appreciated!

          hhld id key   within hhld id key   head of hhld   total n. of hhld   n. of family members under age 10   age   relationship with hhld head
          2316          1                    Y              4                  1                                   46    oneself
          2316          2                    N              4                  1                                   45    spouse
          2316          2                    N              4                  1                                   45    spouse
          2316          3                    N              4                  1                                   13    children
          2316          1                    Y              4                  1                                   46    oneself
          2316          3                    N              4                  1                                   13    children



          • #6
            Here is a way of doing this with relatively simple commands:

            Code:
            clear
            input int hhld_id_key byte within_hhld_id_key str1 head_of_hhld byte(total_n_of_hhld n_of_family_members_under_age_10 age) str8 relationship_with_hhld_head
            2316 1 "Y" 4 1 46 "oneself"
            2316 2 "N" 4 1 45 "spouse"  
            2316 2 "N" 4 1 45 "spouse"  
            2316 3 "N" 4 1 13 "children"
            2316 1 "Y" 4 1 46 "oneself"
            2316 3 "N" 4 1 13 "children"
            end
            
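            * Remember the original sort order so it can be restored at the end
            gen `c(obs_t)' x = _n
            sort hhld_id_key within_hhld_id_key
            * Flag the first row for each person (some people appear more than once)
            by hhld_id_key: gen byte new_member = (within_hhld_id_key != within_hhld_id_key[_n-1])
            * Running count of distinct surveyed children (aged 10 or older) in the household
            by hhld_id_key: gen int num_over_10 = sum(new_member == 1 & relationship_with_hhld_head == "children")
            * Household total: surveyed children plus members under age 10
            by hhld_id_key: gen int total_n_of_children = num_over_10[_N] + n_of_family_members_under_age_10[_N]
            sort x
            drop x new_member num_over_10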
            which produces:

            Code:
            . list , noobs sepby(hhld_id_key)
            
              +----------------------------------------------------------------------------------+
              | hhld_i~y   within~y   head_o~d   total_~d   n_of_~10   age   relati~d   total_~n |
              |----------------------------------------------------------------------------------|
              |     2316          1          Y          4          1    46    oneself          2 |
              |     2316          2          N          4          1    45     spouse          2 |
              |     2316          2          N          4          1    45     spouse          2 |
              |     2316          3          N          4          1    13   children          2 |
              |     2316          1          Y          4          1    46    oneself          2 |
              |     2316          3          N          4          1    13   children          2 |
              +----------------------------------------------------------------------------------+
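
            For comparison, here is a sketch of the same calculation using -egen-, which avoids sorting and restoring the order. The shortened variable names (n_under_10, relationship) are only for illustration. Like the code above, it assumes that every member under age 10 is a child of the household head and that rows marked "children" are the head's children. Neither assumption is checked here.

            ```stata
            clear
            input int hhld_id_key byte(within_hhld_id_key n_under_10) str8 relationship
            2316 1 1 "oneself"
            2316 2 1 "spouse"
            2316 2 1 "spouse"
            2316 3 1 "children"
            2316 1 1 "oneself"
            2316 3 1 "children"
            end

            * Tag exactly one row per person within each household
            egen byte first_row = tag(hhld_id_key within_hhld_id_key)

            * Count distinct surveyed children per household (tagged rows only)
            egen int n_child_10plus = total(first_row & relationship == "children"), by(hhld_id_key)

            * Add the household members under age 10
            gen int total_n_of_children = n_child_10plus + n_under_10
            drop first_row n_child_10plus

            list, noobs sepby(hhld_id_key)
            ```

            As in the original code, the total is attached to every row of the household, not only to the parents' rows.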



            • #7
              See also e.g.

              1536867X1101100210 (sagepub.com)

              https://www.stata.com/support/faqs/d...ng-properties/



              • #8
                Thank you very much for your responses. I was able to generate variables I needed!
                I accidentally uploaded my posting here following someone else's threads, but your fast reply was indeed helpful!



                • #9
                  Hi
                  Please help.
                  I am combining two labor force datasets, LB2016 and LB2019. The variables are not uniquely identified. I am redefining variables from the dataset LF2016 to match some variables in the master file LF2019. I have a variable named "BRANCH" with 19 values in its label in the dataset LB2019. The same variable "BRANCH" has more than 50 values in its label in the second dataset, LF2016. I would like to redefine the variable "BRANCH" in LF2016 to match the label values for LB2019. I have attempted to do this through manage label but STATA is not allowing me to assign the same label value to multiple observations. Is there any command to ease this procedure? Many thanks for your assistance.



                  • #10
                    Originally posted by Mariama Kam:
                    "I have attempted to do this through manage label but STATA is not allowing me to assign the same label value to multiple observations."
                    I am not sure what you mean by this. Can you show a data extract and the code you are using?

                    You may also want to install elabel (by daniel klein; via the Stata Journal) using
                    Code:
                    net sj 21-2 dm0101_1
                    and then check, for instance
                    Code:
                    help elabel_recode



                    • #11
                      Attempting to directly modify the labels to make them consistent is going to be, at best, tedious and error prone. In my opinion, inconsistencies across data sets that will be combined should be fixed in the individual data sets before combining them. So I would convert BRANCH back to a string variable in both data sets (-decode-). Then combine the data sets and, finally, -encode- the string variable: this will guarantee an internally consistent value label for BRANCH. It will not, in general, agree with the labeling in the original data sets; however, that is typically not necessary if analysis will be carried out in the combined data only. (And it is clearly impossible to do this in a way that will agree with both of the original data sets.)

                      Code:
                      use LB2016, clear
                      decode BRANCH, gen(branch)
                      drop BRANCH
                      tempfile lb2016
                      save `lb2016'
                      
                      use LB2019, clear
                      decode BRANCH, gen(branch)
                      drop BRANCH
                      
                      append using `lb2016'
                      encode branch, gen(BRANCH)
                      drop branch
                      Added: Crossed with #2.
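
                      One thing worth checking before relying on this approach: -decode- produces an empty string for any value of BRANCH that has no label attached, and -encode- then turns those empty strings into missing values. A minimal sketch of the check on toy data follows. The label name (branchlbl) and the label texts are made up for illustration.

                      ```stata
                      clear
                      input byte BRANCH
                      1
                      2
                      3
                      end
                      * Hypothetical value label that covers only two of the three values
                      label define branchlbl 1 "Agriculture" 2 "Mining"
                      label values BRANCH branchlbl

                      decode BRANCH, gen(branch)

                      * Non-missing values of BRANCH that decoded to an empty string
                      * have no label and would be lost after -encode-
                      count if missing(branch) & !missing(BRANCH)
                      list BRANCH if missing(branch) & !missing(BRANCH)
                      ```

                      If the count is not zero, those values need labels (or some other handling) in the original data set before the -decode- step.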



                      • #12
                        Many thanks, Clyde. Here are the restricted versions of the two datasets I am merging. I keep receiving the following message: "variables BRANCH21_E1 HH5 M5A Milieu_new do not uniquely identify observations in the master data".
                        Attached Files



                        • #13
                          Attachments are discouraged here. I am one of many Forum members who will not download files from people I do not know. The helpful way to show example data here is to load your Stata data set and then use the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                          Also, for help with a problem like this, you will need to explain what the variables you are talking about are, and also show what command(s) gave you the error message in question.
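
                          In the meantime, a message like that generally means that some combination of the key variables occurs in more than one observation. Here is a sketch of how one might look for such observations, using the variable names from the error message on made-up values. Which data set to run it on, and what the duplicates mean, depends on details not shown here.

                          ```stata
                          clear
                          input byte(BRANCH21_E1 HH5 M5A Milieu_new)
                          1 1 1 1
                          1 1 1 1
                          2 1 1 2
                          end

                          * -isid- exits with an error if the variables do not uniquely
                          * identify the observations; -capture noisily- shows the message
                          * and lets the do-file continue
                          capture noisily isid BRANCH21_E1 HH5 M5A Milieu_new

                          * Summarize how many observations share each key combination
                          duplicates report BRANCH21_E1 HH5 M5A Milieu_new

                          * Mark the observations involved so they can be inspected
                          duplicates tag BRANCH21_E1 HH5 M5A Milieu_new, gen(dup)
                          list if dup > 0
                          ```

                          Whether the duplicates are errors or legitimate repeated records determines what to do next, which is why example data and the exact command are needed.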



                          • #14
                            Many thanks Clyde. I have used the code you provided and the datasets have been amended. However, I noticed that many observations were dropped (missing).



                            • #15
                              Again, without seeing example data that illustrates the problem, there isn't anything I can do to help.
