How to find controls in cross sectional data

Akif Alig

Join Date: Feb 2020

Posts: 24
#1

How to find controls in cross sectional data

25 Feb 2020, 07:01

Hello

I am working with cross-sectional data (Demographic and Health Survey data). The data set includes a district where a uranium mine is located, and I want to see the effect of radiation on the health of the people of that district. I am considering the people of that district as cases (let us say there are 100 individuals from that district in our sample). Now I have to find 100 controls from the data, i.e. individuals who have the same background characteristics (religion, wealth, education, etc.). This can be done by one-to-one matching, but I am not able to do it in Stata. If anyone knows how to do that, please help. Many thanks.
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

25 Feb 2020, 11:01

What you describe is somewhat confusing, perhaps due to what may be an ambiguous use of "case" and "control." What defines a "case" in your situation?

I'm not certain from your description that you actually have a case-control study. If you are sure that you do, search the StataList archive for such things as /case control match/, as questions about how to match controls to cases in Stata have been very frequently asked and answered on StataList.
Akif Alig

Join Date: Feb 2020

Posts: 24
#3

25 Feb 2020, 11:36

Yes, I understand.
It is really hard to explain the whole scenario, so let me go through it step by step.

My data set has a sample of 5000 children.
Among them, 50 are exposed to radiation (because they live in an area where a uranium mine is located),
so 4950 are not exposed to radiation.
I can generate a binary variable "radiation" (1 = yes, 0 = no).
I want to see the effect of radiation on the nutritional status of children, i.e. I want to compare nutritional status in two groups: children exposed to radiation and children not exposed to radiation.

Now I want 50 children from the non-exposed group who have the same characteristics as the exposed children.
For example, if the 1st child in the exposed group was born in a rich Muslim family, and the mother has higher education and is a housewife,
then I want a child from the non-exposed group who was born in a rich Muslim family whose mother has higher education and is a housewife.

Similarly for the 2nd child, the 3rd child, ... and the 50th child.
This way I will have 50 children in the exposed group (radiation) and 50 children in the non-exposed group, and both groups will be balanced (same characteristics).
Then I can see the effect of radiation on nutritional status, controlling for the other background characteristics.

I can do this manually,
but
I wanted to know whether there is a method for this in Stata.

Thanks a lot
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#4

25 Feb 2020, 11:52

Yes, there is a method, and it's not all that complicated. But it's not packaged as a ready-made program. It has to be customized to your actual data organization, which is only possible if you show a data example. Please show example data from your Stata data set; be sure your example includes observations for both exposed and unexposed children, and that it includes some potential matches and others that do not match. You also have to state exactly what variables in your data set you want to match on.

To show your example data, be sure to use the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Please do not attempt to do this manually. You should never do manual calculations for any serious work. To be credible, all serious work should be done using reliable methods and should have complete documentation of what was done. Manual calculations fail on both of these criteria.
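
Until example data is posted, only the general shape of a scripted approach can be illustrated. The following is a minimal sketch, not the customized solution referred to above: it uses toy data and hypothetical variable names (id, radiation, religion, wealth, meduc, mocc) and assumes exact one-to-one matching without replacement on all four background variables. Within each combination of those variables, exposed and unexposed children are put in random order and the k-th exposed child is paired with the k-th unexposed child.

```stata
* Sketch only: toy data; every variable name and value here is hypothetical
clear
input id radiation religion wealth meduc mocc
1 1 1 3 2 0
2 1 1 3 2 0
3 1 2 1 1 1
4 0 1 3 2 0
5 0 1 3 2 0
6 0 1 3 2 0
7 0 2 2 1 1
8 0 1 1 0 0
end

* Random order, with a fixed seed so the selection is reproducible
set seed 12345
gen double shuffle = runiform()

* One stratum per combination of the matching variables;
* egen group() returns missing if any matching variable is missing
egen stratum = group(religion wealth meduc mocc)
drop if missing(stratum)

* Number the children 1, 2, ... in random order within stratum and exposure group
bysort stratum radiation (shuffle): gen pair_no = _n

* Keep all exposed children and, per stratum, at most as many unexposed
bysort stratum: egen n_exposed = total(radiation == 1)
keep if radiation == 1 | pair_no <= n_exposed

* A pair is complete only if it has one exposed and one unexposed child
bysort stratum pair_no: gen byte matched = (_N == 2)
keep if matched
egen pair_id = group(stratum pair_no)

sort pair_id radiation
list pair_id id radiation religion wealth meduc mocc, sepby(pair_id)
```

In this toy example, exposed child 3 has no unexposed child with the same characteristics and is dropped, so the matched sample can end up smaller than the exposed group. With many matching variables that is likely to happen in real data as well, which is one reason the variables to match on need to be chosen carefully.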
Akif Alig

Join Date: Feb 2020

Posts: 24
#5

25 Feb 2020, 22:17

Thank you, sir Clyde.

Actually the data set is very big,
and the code generated by dataex is very long, so the Results window does not show the whole code.

I can mail you the data set (if you are comfortable with that).

By the way, thank you very much for your response.
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#6

26 Feb 2020, 07:50

Looking at -help dataex-, you will see that this command allows you to select a subset of variables by specifying a -varlist-. You would only need to specify a few variables to offer a good example, one that includes your variable for radiation exposure (yes/no) and a few of the potentially confounding variables. The -help- also indicates that -dataex- by default lists 100 observations, although you can specify more or fewer. Using this information will permit you to list a reasonably sized data example.
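
As an illustration of that advice, a short -dataex- call might look like the sketch below. The toy data and the variable names (id, radiation, religion, wealth, meduc) are hypothetical stand-ins for the real data set; the -count()- option caps the number of observations listed, and the -if- qualifier is used here so that both exposed and unexposed children appear in the example.

```stata
* Sketch only: toy data with hypothetical variable names
clear
input id radiation religion wealth meduc
1 1 1 3 2
2 1 2 1 1
3 0 1 3 2
4 0 1 3 2
5 0 2 2 1
6 0 1 1 0
end

* A few exposed children, listing only the variables relevant to the question
dataex id radiation religion wealth meduc if radiation == 1, count(2)

* A few unexposed children, including possible matches and non-matches
dataex id radiation religion wealth meduc if radiation == 0, count(3)
```

The output of each call can be copied from the Results window and pasted directly into a post.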