Can someone tell me why the variable 'id' can't be found please?

justine gosling

Join Date: Sep 2019

Posts: 19
#1

21 Nov 2019, 03:21

sum HH1 HH2 LN // HH1=cluster number, HH2=household number, LN=line number (child in household)
gen idnumber = string(HH1) + " " + string(HH2) + " " + string(LN)
gen id = string(HH1) + " " + string(HH2) // use this to merge with hh data below
save ch.dta, replace // this will save to current working directory

use "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/House hold.dta"
// 28479 observations, 177 variables
// this also has the same HH1 HH2 variables:
sum HH1 HH2
gen id = string(HH1) + " " + string(HH2) // to merge on household id with child data

merge 1:m id using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/childrens .dta"

Can someone please tell me why, when I try to merge the data after running the code above, Stata responds: variable id not found r(111);

Many thanks
Barbara Voe

Join Date: Nov 2019

Posts: 6
#2

21 Nov 2019, 04:46

Did you use the right using dataset in the merge? Further up, you saved it as "ch.dta"; is this the same file as "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/childrens .dta"?
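One way to answer that question is to look inside the file on disk rather than at the data in memory. The sketch below is a minimal, self-contained illustration with made-up toy data stored in a tempfile (the variable hhsize and all values are hypothetical, not from the MICS files). `describe using` lists the variables saved in a file without loading it, and `confirm variable` run on that file returns r(111) when id is absent, which is the same code merge reported.

```stata
* Hypothetical toy data standing in for the real household file
clear
input HH1 HH2 hhsize
1 1 4
1 2 3
end
tempfile hh
save `hh'

* Create id in memory only; the copy saved in `hh' is unchanged
gen id = string(HH1) + " " + string(HH2)

* List the variables stored in the file on disk (id is not among them)
describe using `hh'

* Check programmatically whether id exists in the saved file
preserve
use `hh', clear
capture confirm variable id
local rc = _rc        // 111 means "variable id not found"
restore               // back to the in-memory data, which does have id
display "return code: `rc'"
```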
justine gosling

Join Date: Sep 2019

Posts: 19
#3

21 Nov 2019, 05:04

Thank you for trying to help. Yes, I see what you mean, but it makes no difference and it still doesn't work:

use "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/ch.dta"
// "ch Original data .dta"
// 19285 observations, 303 variables

// merge 1:1 _n using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/hh.dta"
/* merging needs to be done on a unique id number that matches the records of each dataset.
_n is the row number, so the hh is unlikely to be matched to the
hh in the child dataset with this merge command. Try: */

// it seems there is not a unique id number in the ch dataset? I am creating one here first:

sum HH1 HH2 LN // HH1=cluster number, HH2=household number, LN=line number (child in household)
gen idnumber = string(HH1) + " " + string(HH2) + " " + string(LN)
gen id = string(HH1) + " " + string(HH2) // use this to merge with hh data below
save ch.dta, replace // this will save to current working directory

use "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/hh.dta"
// 28479 observations, 177 variables
// this also has the same HH1 HH2 variables:
sum HH1 HH2
gen id = string(HH1) + " " + string(HH2) // to merge on household id with child data

merge 1:m id using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/hh.dta"
// need to merge one-to-many (1:m) as one household to many children

When I try to merge the data, Stata responds: variable id not found r(111);
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#4

21 Nov 2019, 05:14

The message means that the 'id' variable is missing from either the dataset currently in memory (the one you 'used') or the dataset you are trying to merge.

Let's see:

* you 'use' ch.dta, create an id variable and save ch.dta.
* you 'use' hh.dta and create an id variable, but you don't save it.
* you merge with hh.dta, i.e. the copy on disk, which does not have the id variable.

So Stata is correct.
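The three steps above can be reproduced with toy data. This is a sketch under stated assumptions: the tiny household and child datasets below (including the variable hhsize) are hypothetical stand-ins held in tempfiles, not the MICS files. It shows that an id created in memory does not exist in the saved file, so merging with that file fails with r(111), while merging with a file that was saved after id was generated works.

```stata
* Hypothetical toy household data, saved WITHOUT id
clear
input HH1 HH2 hhsize
1 1 4
1 2 3
2 1 5
end
tempfile hh
save `hh'

* Hypothetical toy child data, saved WITH id
clear
input HH1 HH2 LN
1 1 3
1 1 4
1 2 3
2 1 3
end
gen id = string(HH1) + " " + string(HH2)
tempfile ch
save `ch'

* Load households and create id; it exists in memory only
use `hh', clear
gen id = string(HH1) + " " + string(HH2)

* Reproduces the error: the file `hh' on disk has no id
capture merge 1:m id using `hh'
local rc = _rc        // 111: variable id not found

* Merging with the child file that was saved after generating id works
merge 1:m id using `ch'
tabulate _merge
```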
justine gosling

Join Date: Sep 2019

Posts: 19
#5

21 Nov 2019, 05:19

Thank you for your help. I never doubted Stata was correct! So where in the list of commands should I save the data?
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#6

21 Nov 2019, 05:22

Currently you are trying to merge hh.dta with a copy of itself.

I don't know what you are trying to do, but since you created an id variable in ch.dta beforehand, I suppose you wanted to merge hh.dta with ch.dta, in which case you really meant to type:

merge 1:m id using "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/ch.dta"

But that's just a guess.
justine gosling

Join Date: Sep 2019

Posts: 19
#7

21 Nov 2019, 05:24

OK, so now I've totally ruined my dataset, and I have no idea where I went wrong or how to correct it!
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#8

21 Nov 2019, 05:39

Please clarify. You can only 'ruin a dataset' by saving it on disk and replacing the original. Something that should never, ever be done on an 'original' dataset unless you have a safe copy somewhere. As long as the original is safe, you just have to run your program again and find out where it goes wrong.

Here the only saved modifications are adding variables to ch.dta, and it's easy to undo: drop them. If you tried to rerun your program as is, obviously Stata complained that ch.dta already has 'id' and 'idnumber' and can't generate them (a very convenient safeguard provided by Stata).
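The safeguard described above can be seen in a small sketch with hypothetical toy data: generating id a second time fails with r(110), "variable id already defined". One rerun-safe pattern, offered as a suggestion rather than the only option, is to drop the variable first with `capture drop`, and to save modified data under a new file name instead of replacing the original.

```stata
* Hypothetical toy child data
clear
input HH1 HH2 LN
1 1 3
1 2 3
end
gen id = string(HH1) + " " + string(HH2)

* Running the same line again fails, as happens when the saved ch.dta is reloaded
capture gen id = string(HH1) + " " + string(HH2)
local rc = _rc        // 110: variable id already defined

* Rerun-safe variant: drop id first if it is already there (no error if absent)
capture drop id
gen id = string(HH1) + " " + string(HH2)

* Save under a new name so the original file is never overwritten;
* a tempfile is used here, with real data a new name such as ch_temp.dta would do
tempfile ch_temp
save `ch_temp'
```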

Let's get back to basics: what's in your datasets, what are you trying to do? What do you think is 'ruined' and why?
justine gosling

Join Date: Sep 2019

Posts: 19
#9

21 Nov 2019, 06:13

Yes, I think that's what I've done: saved the data and rerun the code too many times. However, I've just downloaded the original datasets again and rerun the code once, and something still isn't right, as my variables are missing. I'm very grateful for your help; this is my first project using Stata and I'm learning!

I'm using household survey data: the ch file is the children's data and the hh file is the household data.
justine gosling

Join Date: Sep 2019

Posts: 19
#10

21 Nov 2019, 06:14

I'm trying to merge the ch and hh files so that the records match.
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#11

21 Nov 2019, 06:54

Maybe something like the following. I just copied your code and simplified it a bit. Of course, you will have to check that the merge is correct (merge provides diagnostics, and you can browse unmatched observations if necessary; for instance, try "br if _merge ~= 3").

Stata may still complain:
* the 'id' variable must not already exist in those datasets, but HH1 and HH2 must exist.
* the 'id' variable must be unique in the 'hh_temp' dataset (one record per household), or merge will fail.
* if there are common variables (apart from 'id'), only the values in hh_temp are kept. If that's a problem, rename variables in hh_temp or ch_temp before the merge.

Code:

cd "/Users/justinegosling/Desktop/Malawi_MICS5_Datasets/Malawi MICS 2013-14 SPSS Datasets/"

* Add id to household data
use "House hold", clear
gen id = string(HH1) + " " + string(HH2)
save hh_temp, replace

* Add id to children data
use childrens, clear
gen id = string(HH1) + " " + string(HH2)
save ch_temp, replace

* One household to many children: id is unique in hh_temp, not in ch_temp
use hh_temp, clear
merge 1:m id using ch_temp
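Two checks may help before running the real merge. The sketch below uses hypothetical toy data in tempfiles (hhsize and all values are made up): `isid` confirms that HH1 and HH2 identify each household record, `duplicates report` shows that several children can share a household, and `merge` is given HH1 and HH2 directly as key variables, which avoids building a string id at all. Afterwards, `tabulate _merge` summarises the match and the `list` command shows unmatched records.

```stata
* Hypothetical toy household data: one record per household
clear
input HH1 HH2 hhsize
1 1 4
1 2 3
2 1 5
3 1 2
end
isid HH1 HH2                  // passes: HH1 HH2 identify households
tempfile hh
save `hh'

* Hypothetical toy child data: several children per household
clear
input HH1 HH2 LN
1 1 3
1 1 4
1 2 3
2 1 3
end
duplicates report HH1 HH2     // shows households with more than one child
capture isid HH1 HH2
local ch_unique = (_rc == 0)  // 0 here: HH1 HH2 do not identify children
tempfile ch
save `ch'

* merge accepts several key variables, so no string id is needed
use `hh', clear
merge 1:m HH1 HH2 using `ch'
tabulate _merge
list HH1 HH2 hhsize _merge if _merge != 3   // here: household 3 1, no children
```

With the real files, households that have no child records would be expected to appear with _merge == 1 (master only) in a 1:m household-to-children merge.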