Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to merge without unique ID


    Hi group members
    I want to merge two data sets: one is on household level and the other is on individual level. The household level data has details about expenditure on around 42 items and individual data set have sociodemographic information individuals in a particular household. Both data sets have household id (hhid) but this variable is not unique due to which the data is not merging. Please suggest how can I merge these files:
    .Data set 1
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str38 hhid int item_code
    "HCES2022655621010121713017 101111 301" 139
    "HCES2022655621010121713017 101111 310" 139
    "HCES2022655621010121713017 101111 311" 139
    "HCES2022655621010121713017 101111 313" 139
    "HCES2022655371010122023014 201111 202" 139
    "HCES2022655261010122023016 101112 201" 139
    "HCES2022655261010122023016 101112 301" 139
    "HCES2022655261010122023016 101112 303" 139
    "HCES2022655261010122023016 101112 306" 139
    "HCES2022655261010122023016 101112 307" 139
    "HCES2022655261010122023016 101112 308" 139
    "HCES2022655261010122023016 101112 309" 139
    "HCES2022655261010122023016 101112 310" 139
    "HCES2022655261010122023016 101112 311" 139
    Data set 2
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str38 hhid float person_no
    "HCES2022310002282831212025 228111 301" 1
    "HCES2022310002282831212025 228111 302" 5
    "HCES2022310002282831212025 228111 302" 3
    "HCES2022310002282831212025 228111 302" 4
    "HCES2022310002282831212025 228111 302" 2
    "HCES2022310002282831212025 228111 302" 1
    "HCES2022310002282831212025 228111 303" 2
    "HCES2022310002282831212025 228111 303" 5
    "HCES2022310002282831212025 228111 303" 4
    "HCES2022310002282831212025 228111 303" 1
    "HCES2022310002282831212025 228111 303" 3
    "HCES2022310002282831212025 228111 304" 2
    "HCES2022310002282831212025 228111 304" 1
    "HCES2022310002282831212025 228111 305" 1
    "HCES2022310002282831212025 228111 306" 1
    "HCES2022310002282831212025 228111 307" 1


  • #2
    Hi Jyoti,

    You can drop the duplicates in the household data by:
    Code:
    duplicates drop id_var, force
    then merge it with individuals (if your individuals' data contains duplicates, it is better to run the previous code for them at first):
    Code:
    merge 1:m hhid using ind_data

    Comment


    • #3
      Jyoti, you would need clarity on what your objective is. Here are two possibilities:

      If, for instance, you want to ultimately run regressions at the individual level, and you would want each individual's observation to have information on all household level characteristics (including expenditure on all 42 items), then you should use reshape wide to first change the household level data so that you have just one observation per household, and that observation contains information on all expenditures. After this, you can use merge 1:m to merge the two datasets.

      Another possibility is that you want to conduct household level regressions, which need information on all individuals as well as all expenditures. In this case, you should first reshape wide both the above datasets, so that both have a unique observation for a household, and then you can merge the two datasets using merge 1:1.

      Of course, if you do not need data on multiple individuals and/or multiple expenditures at the household level, you can also simply drop duplicates as suggested in #2, but my sense is this may not be what you want.
      Last edited by Hemanshu Kumar; 05 Jul 2024, 07:27.

      Comment

      Working...
      X