  • Problem with merging data files into one file

    Hi everyone, I am preparing a panel dataset for the analysis in my research project. The data files come separately and I have to combine them into one. I have done this with the "merge" command in Stata version 14 (my do-file is below). The merged dataset looks OK, yet my programming is hazy; I am sure there is a better method, and I need to perfect my do-file. Any contributions will be most helpful. The commands and the dataset follow.


    //COMBINING WAVE 1 PANEL STUDY
    clear
    cap log close
    set more off
    cd "C:\Users\MCFADDEN\Documents\DISSERTATION WRITING 2019\NIDS DATASETS\WAVE 1\Stata14"
    log using "National Incomes Data Preparation.log" , text replace

    //IMPORTING AND PREPARING WAVE 1 DATA FILES AND MERGING THEM INTO ONE FILE
    //USING HOUSEHOLD QUESTIONNAIRE AS MASTER DATASET AND MERGE USING HOUSEHOLD ROSTER
    use "HHQuestionnaire_W1_Anon_V7.0.0.dta", clear
    duplicates report w1_hhid
    // the "if w1_hhid>1" qualifier filtered on the id value, not on the number of copies, so it is dropped
    duplicates drop w1_hhid, force
    sort w1_hhid
    merge 1:m w1_hhid using "HouseholdRoster_W1_Anon_V7.0.0.dta", update
    drop _merge
    save householdfile.dta, replace

    //ADDING THE HOUSEHOLD DERIVED FILE
    clear
    use "hhderived_W1_Anon_V7.0.0.dta", clear
    sort w1_hhid
    merge 1:m w1_hhid using "householdfile.dta", update
    drop _merge
    save "household_hhderived.dta", replace

    //ADDING THE ADMIN FILE
    clear
    use "Admin_W1_Anon_V7.0.0.dta", clear
    duplicates drop w1_hhid, force
    merge 1:m w1_hhid using "household_hhderived.dta", update
    drop _merge
    save household_admin_roster.dta, replace

    //ADDING THE PROXY FILE
    clear
    use "Proxy_W1_Anon_V7.0.0.dta", clear
    sort pid w1_hhid
    merge 1:m pid using "household_admin_roster.dta", update
    drop _merge
    save "household_admin_roster_adult_child_indder_proxy.dta", replace

    //ADDING THE ADULT QUESTIONNAIRE (CONTINUES WITH THE PROXY-MERGED DATA STILL IN MEMORY)
    duplicates drop pid, force
    sort pid
    merge 1:m pid using "Adult_W1_Anon_V7.0.0.dta", update
    drop _merge
    save household_admin_roster_adult.dta, replace

    //ADDING THE CHILD DATASET
    clear
    use "Child_W1_Anon_V7.0.0.dta", clear
    sort pid
    merge 1:m pid using "household_admin_roster_adult.dta", update
    drop _merge
    save household_admin_roster_adult_child.dta, replace


    //ADDING THE DERIVED DATA FILE
    clear
    use "indderived_W1_Anon_V7.0.0.dta", clear
    sort pid w1_hhid
    duplicates drop pid w1_hhid, force
    merge 1:m pid using "household_admin_roster_adult_child.dta", update
    drop _merge
    save "nids_wave1.dta", replace

    Please follow this link to access the panel dataset from my drive, as I was not able to upload it here: https://www.icloud.com/iclouddrive/0...v7.0.0-stata14


  • #2
    Frankly, I didn't see any problem with the code.
    Best regards,

    Marcos



    • #3
      Let me add to Marcos' comment. If you have code that does what you need then there really is no great benefit to asking us to help you make it more elegant or efficient unless you plan to run this thousands of times.

      There are elegant and efficient ways in Stata to do a wide variety of things, and if one finds them that's fine, but in terms of actually getting research done many of us will use less elegant solutions and get on with our work rather than spend a substantial amount of time looking for elegance. In your case, one might have imagined a loop over files with the merges as the contents of the loop, but you merge on different keys (w1_hhid for the household files, pid for the person files), which makes such a loop less easy and less useful.
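
      For what it's worth, a loop of that kind might look roughly like the sketch below, restricted to the person-level files because they share the key pid. It is not a drop-in replacement for the posted do-file. It assumes household_admin_roster.dta has already been built as in post #1, that each listed file holds at most one row per pid (merge m:1 stops with an error otherwise), and it keeps the household file as master throughout, so with the update option it is the master's nonmissing values that are kept where variables overlap.

      ```stata
      // Sketch only: loop over the person-level files, all keyed on pid.
      // Assumption: each using file is unique on pid; merge m:1 errors out if not.
      use "household_admin_roster.dta", clear
      foreach f in Proxy Adult Child indderived {
          merge m:1 pid using "`f'_W1_Anon_V7.0.0.dta", update
          tabulate _merge    // inspect the match pattern before discarding it
          drop _merge
      }
      save "nids_wave1.dta", replace
      ```

      The household-level merges on w1_hhid would still have to be written out separately, which is why the loop buys relatively little here.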



      • #4
        Thank you, Marcos Almeida and Phil Bromiley. You see, I am still new to Stata and I came up with the code after reading quite a number of books to get it right, so I was not sure whether the output dataset was correct. This is why I posted the code and the dataset, so that anyone who runs the code on the dataset and examines the output might point out anything I have missed. However, if the existing code is OK, I will work with it as it is. And again, thank you so much.
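
        A few checks along the following lines might help confirm whether the output dataset is correct. This is only a sketch: it assumes nids_wave1.dta sits in the working directory and that pid and w1_hhid are the person and household identifiers, as in the do-file above.

        ```stata
        // Sketch only: sanity checks on the merged file.
        use "nids_wave1.dta", clear
        describe, short              // number of observations and variables
        duplicates report pid        // how many rows share the same pid
        count if missing(pid)        // rows that matched no person record
        count if missing(w1_hhid)    // rows that matched no household record
        ```

        Running tabulate _merge before each drop _merge in the do-file would also show how many records matched at every step, and since duplicates drop with the force option discards rows silently, it is worth running duplicates report first to see what is being removed.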
