
  • Can someone tell me why the variable 'id' can't be found please?

    sum HH1 HH2 LN // HH1=cluster number, HH2=household number, LN=line number (child in household)
    gen idnumber = string(HH1) + " " + string(HH2) + " " + string(LN)
    gen id = string(HH1) + " " + string(HH2) // use this to merge with hh data below
    save ch.dta, replace // this will save to current working directory

    use "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/House hold.dta"
    // 28479 observations, 177 variables
    // this also has the same HH1 HH2 variables:
    sum HH1 HH2
    gen id = string(HH1) + " " + string(HH2) // to merge on household id with child data

    merge 1:m id using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/childrens .dta"

    Can someone please tell me why, after the code above, Stata responds with this when I try to merge the data: variable id not found r(111);

    Many thanks

  • #2
    Did you use the right using dataset to merge? Further up, you saved it as "ch.dta", is this the corresponding one to "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/childrens .dta"?



    • #3
      Thank you for trying to help. Yes I see what you mean, but it makes no difference and still doesn't work:


      use "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/ch.dta"
      // "ch Original data .dta"
      // 19285 observations, 303 variables

      /* merge 1:1 _n using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/hh.dta"
      merging needs to be done on a unique id number that matches the records of each dataset.
      _n is the row number so the hh is unlikely to be matched to the
      hh in the child dataset with this merge command, try: */

      // it seems there is not an unique id number in the ch dataset? I am creating one here first:

      sum HH1 HH2 LN // HH1=cluster number, HH2=household number, LN=line number (child in household)
      gen idnumber = string(HH1) + " " + string(HH2) + " " + string(LN)
      gen id = string(HH1) + " " + string(HH2) // use this to merge with hh data below
      save ch.dta, replace // this will save to current working directory

      use "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/hh.dta"
      // 28479 observations, 177 variables
      // this also has the same HH1 HH2 variables:
      sum HH1 HH2
      gen id = string(HH1) + " " + string(HH2) // to merge on household id with child data

      merge 1:m id using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/hh.dta"
      // need to merge one-to-many (1:m) as one household to many children

      When I try to merge the data, Stata's response is: variable id not found r(111);



      • #4
        The message means there is no 'id' variable in at least one of the two datasets: the one currently 'used' or the one you are trying to merge.

        Let's see:

        * you 'use' ch.dta, create an id variable and save ch.dta.
        * you 'use' hh.dta, create an id variable, you don't save
        * you merge with hh.dta, which is the old copy of hh.dta, without the id variable.

        So Stata is correct.
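
        A minimal sketch of the point above, on a toy file rather than the real data: `hh_demo` and its values are made up for illustration. It relies on `describe using` with the `simple` option, which lists the variable names stored in a file on disk without loading it, so you can see that a variable generated in memory only reaches the file once you save.

        ```stata
        * Toy stand-in for hh.dta (hypothetical file name and values)
        clear
        input HH1 HH2
        1 1
        1 2
        end
        save hh_demo, replace

        * Create id in memory, but do not save yet
        gen id = string(HH1) + " " + string(HH2)

        * The copy on disk still has only HH1 and HH2
        describe using hh_demo, simple

        * After saving, the copy on disk has id as well
        save hh_demo, replace
        describe using hh_demo, simple
        ```

        Running the same `describe using` check on the file named after `using` in a merge shows whether the key variable is really there.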



        • #5
          Thank you for your help. I never doubted Stata was correct! So where in the list of commands should I save the data?



          • #6
            Currently you are trying to merge hh.dta with a copy of itself.

            I don't know what you are trying to do, but since you created an id variable in ch.dta beforehand, I suppose you wanted to merge hh.dta with ch.dta, and then this means you really wanted to type:

            merge 1:m id using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/ch.dta"

            But that's just a guess.



            • #7
              OK, so now I've totally ruined my dataset and I've no idea where I've gone wrong or how to correct it!



              • #8
                Please clarify. You can only 'ruin a dataset' by saving it on disk and replacing the original. Something that should never, ever be done on an 'original' dataset unless you have a safe copy somewhere. As long as the original is safe, you just have to run your program again and find out where it goes wrong.

                Here the only saved modifications are adding variables to ch.dta, and it's easy to undo: drop them. If you tried to rerun your program as is, obviously Stata complained that ch.dta already has 'id' and 'idnumber' and can't generate them (a very convenient safeguard provided by Stata).
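
                A rough sketch of that safeguard and of the undo step, using made-up toy values instead of the real ch.dta. The `capture` prefix is used here so the example keeps running after the refused `gen`; dropping each variable in its own `capture drop` is a choice made for this sketch so that one missing name does not stop the other from being dropped.

                ```stata
                * Toy stand-in for the children data (hypothetical values)
                clear
                input HH1 HH2 LN
                1 1 3
                1 1 4
                end
                gen id = string(HH1) + " " + string(HH2)
                gen idnumber = string(HH1) + " " + string(HH2) + " " + string(LN)

                * A second gen with the same name is refused; capture lets the do-file continue
                capture noisily gen id = string(HH1) + " " + string(HH2)
                display "return code from the second gen: " _rc

                * Undo: drop each helper variable if it exists, then rebuild it
                capture drop id
                capture drop idnumber
                gen id = string(HH1) + " " + string(HH2)
                ```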

                Let's get back to basics: what's in your datasets, what are you trying to do? What do you think is 'ruined' and why?



                • #9
                  Yes, I think that's what I've done: saved the data and rerun it too many times. However, I've just downloaded the original datasets again and rerun the commands once, and something still isn't right, as my variables are missing. I'm very grateful for your help; this is my first project using Stata and I'm learning!

                  I'm using household survey data: the ch file is the children data, the hh file is the household data.



                  • #10
                    I'm trying to merge the ch and hh files so that each child's record is matched to its household.



                    • #11
                      Maybe something like the following. I just copied your code and simplified it a bit. Of course you will have to check that the merge is correct (it provides diagnostics, and you can browse unmatched data if necessary; for instance, try "br if _merge ~= 3").

                      Stata may still complain:
                      * the 'id' variable must not already exist in those datasets, but HH1 and HH2 must exist.
                      * the 'id' variable must be unique in the 'hh_temp' dataset (one row per household) or merge will fail.
                      * if there are common variables (apart from 'id') only the values in hh_temp are kept. If it's a problem, rename variables in hh_temp or ch_temp before the merge.

                      Code:
                      cd "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/"

                      * Add id to household data
                      use "House hold", clear
                      gen id = string(HH1) + " " + string(HH2)
                      save hh_temp, replace

                      * Add id to children data
                      use childrens, clear
                      gen id = string(HH1) + " " + string(HH2)
                      save ch_temp, replace

                      use hh_temp, clear
                      * one household to many children, so 1:m with the household file as master
                      merge 1:m id using ch_temp
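
                      To see how the key check and the merge diagnostics behave, here is a small sketch on toy data. The file names `hh_toy` and `ch_toy`, the variable `hhsize`, and all values are invented for illustration. It assumes, as post #3 describes, one household to many children, so the household file is the master in a 1:m merge. `isid` stops with an error if the key is not unique in the data in memory.

                      ```stata
                      * Toy household file: one row per household (hypothetical values)
                      clear
                      input HH1 HH2 hhsize
                      1 1 4
                      1 2 6
                      2 1 3
                      end
                      gen id = string(HH1) + " " + string(HH2)
                      isid id                       // errors if id does not uniquely identify households
                      save hh_toy, replace

                      * Toy children file: several children can share one household id
                      clear
                      input HH1 HH2 LN
                      1 1 3
                      1 1 4
                      2 1 2
                      end
                      gen id = string(HH1) + " " + string(HH2)
                      save ch_toy, replace

                      * Household file is the master, children file is the using data
                      use hh_toy, clear
                      merge 1:m id using ch_toy
                      tabulate _merge               // matched vs. unmatched records
                      list id LN hhsize _merge, sepby(id)
                      ```

                      In this toy example household "1 2" has no children, so it shows up as master-only in `_merge`; those are the records that "br if _merge ~= 3" would let you inspect.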

