Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Datasets - Observations to Variables

    Hi,
    I have a question that I think is related to merging datasets.
    I have one dataset with many observations about various topics. Then, I have a second dataset with a few observations about owning a business. The respondents are the same, although not all of them are in the business dataset, as this only includes the respondents that own a business. Some respondents own multiple businesses, and in this case, the business dataset has them registered as a separate observation. Hence, there may be multiple observations per respondent in this dataset.

    I know that I can merge these with a one to many merge. However, I do not want multiple observation lines for the same respondent. I would like to have one observation line per respondent, including all there businesses and questions about those as variables.

    Thus: E.g.:

    Dataset 1:
    Respondent
    1
    2
    3
    4

    Dataset 2:
    Respondent
    2
    2
    3

    Etc.

    I would like:
    Merged Data:
    Respondent Business 1 Question about Business 1 Business 2
    1 No ... No
    2 Yes ... Yes
    3 Yes ... No
    4 No ... No
    Etc.

    Would anyone know how to get this done in Stata?

    I hope my question was clear.
    Many thanks in advance!
    Best regards,
    Gin

  • #2
    Context is important here. Why are the IDs in the second dataset repeated? Are they duplicates? Given that the variables are the same, is there any chance that a respondent answers differently on the same question? If so, how do we resolve that? If these are simply duplicates, the procedure is simple.

    Code:
    *DATASET 2
    clear
    input float respondent str3 business
    2 "Yes"
    2 "Yes"
    3 "No"
    end
    
    duplicates drop *, force
    rename business business2
    tempfile 2
    save `2'
    
    *DATASET 1
    clear
    input float respondent str3 business
    1 "No"
    2 "Yes"
    3 "Yes"
    4 "No"
    end
    rename business business1
    merge 1:1 respondent using `2', nogen
    replace business2= "No" if missing(business2)
    Res.:

    Code:
    . l
    
         +--------------------------------+
         | respon~t   busine~1   busine~2 |
         |--------------------------------|
      1. |        1         No         No |
      2. |        2        Yes        Yes |
      3. |        3        Yes         No |
      4. |        4         No         No |
         +--------------------------------+

    Comment


    • #3
      Hi,
      Thanks for your reply.
      No, they are not duplicates. Every business is registered as a separate observation, so if a respondent is repeated, this means that they own multiple business; and thus that the answers to all the follow-up questions on that specific business are also different. Does that make sense?

      Thanks,
      Gin

      Comment


      • #4
        One way to handle this is to create a variable for each business that an individual owns. Presumably, there is a variable that identifies which question relates to which business. You achieve this by reshaping wide the dataset. If an individual does not own multiple businesses, then whether it makes sense to replace his/her observations with "No" for a particular question is something that you need to decide.

        Code:
        *DATASET 2
        clear
        input float respondent str3 business float question
        2 "Yes" 1
        2 "No" 2
        3 "No" 1
        end
        
        rename business business2_q
        reshape wide business2_q, i(respondent) j(question)
        tempfile 2
        save `2'
        
        *DATASET 1
        clear
        input float respondent str3 business
        1 "No"
        2 "Yes"
        3 "Yes"
        4 "No"
        end
        rename business business1
        merge 1:1 respondent using `2', nogen
        foreach q of varlist business2_q*{
            replace `q'= "No" if missing(`q')
        }
        Res.:

        Code:
        . l, abbrev(12)
        
             +------------------------------------------------------+
             | respondent   business1   business2_q1   business2_q2 |
             |------------------------------------------------------|
          1. |          1          No             No             No |
          2. |          2         Yes            Yes             No |
          3. |          3         Yes             No             No |
          4. |          4          No             No             No |
             +------------------------------------------------------+
        Alternatively, you can aggregate the data by rephrasing the question. For example:

        1. Q: Did you make any losses in the last quarter of 2023 on your main business? A: No
        2. Q: Did you make any losses in the last quarter of 2023 on any of your other businesses? A: Yes
        can be rephrased to:

        Q: Did you make any losses in any of your businesses in the last quarter of 2023? A: Yes
        The aggregation thus becomes "Yes" if there is at least one "Yes" response, "No" otherwise.


        Code:
        *DATASET 2
        clear
        input float respondent str3 business float question
        2 "Yes" 1
        2 "No" 2
        3 "No" 1
        end
        
        bysort respondent: egen business2= max(business=="Yes")
        contract respondent business2
        drop _freq
        tempfile 2
        save `2'
        
        *DATASET 1
        clear
        input float respondent str3 business
        1 "No"
        2 "Yes"
        3 "Yes"
        4 "No"
        end
        rename business business1
        merge 1:1 respondent using `2', nogen
        replace business2= 0 if missing(business2)
        lab def business2 1 "Yes" 0 "No"
        lab values business2 business2
        Res.:

        Code:
        . l
        
             +--------------------------------+
             | respon~t   busine~1   busine~2 |
             |--------------------------------|
          1. |        1         No         No |
          2. |        2        Yes        Yes |
          3. |        3        Yes         No |
          4. |        4         No         No |
             +--------------------------------+

        Comment

        Working...
        X