Dear Members,
I designed a questionnaire to elicit respondents' willingness to get vaccinated against COVID-19 via a discrete choice experiment. I relied on a company specialized in political opinion polls and market research to administer the survey. The company computed a weight for each respondent based on 1) the geographical area where the respondent lives (five macro-areas of Italy), 2) whether the respondent has a bachelor's degree or not, and 3) the age group to which she/he belongs (five classes are considered).
The sum of the weights is equal to the number of individuals in the dataset. Individuals in the age classes 30-39 and 40-49 are oversampled, at our request (related to a research hypothesis), so the share of these two classes in the sample is larger than their actual share in the Italian population. The weights are computed to account for this feature and to make the sample representative of the Italian population with respect to these characteristics.
The company uses the method described at the following URL to calculate the weights:
http://mrdcsoftware.com/blog/what-is-rim-weighting-with-free-excel-working-model
I will use the data to estimate a logit model, multinomial logit models and mixed logit models.
The issue I am facing is how to properly declare the nature of the weight. I have no experience in using Stata to deal with this issue.
I am using Stata 17 on a PC with Windows 10 Pro 64 bit.
I read the following forum threads:
https://www.statalist.org/forums/forum/forum-help/sandbox/1526810-svyset-confusion
https://www.statalist.org/forums/forum/general-stata-discussion/general/1445985-accounting-for-sample-weights-in-analyses-just-add-the-weighted-variables-in-the-regression
https://www.statalist.org/forums/forum/general-stata-discussion/general/303670-proper-survey-weight-specification
I read parts of the following manual:
https://www.stata.com/manuals/svysvyset.pdf
I watched this video by StataCorp, which I found particularly useful:
https://www.youtube.com/watch?v=XYjWCL7IEKU
At minute 04:39 the video shows the commands generated by the selections made in the menu window.
I also consulted the help for "weight" from the command line.
Combining the information from the video, the svyset manual entry and the output of the help for "weight", I tried to work out the most appropriate solution.
In the video by StataCorp, Chuck Huber shows an example in which the Main tab of the "svyset - Declare design for dataset" dialog has four stages. This is the part of the menu that I do not know how to adapt to my case. I am inclined to think that the number of stages I have to select is 1, but I am not sure whether, in the section "Primary sampling units", I have to enter the variable "ID", which I used to identify my respondents, or leave it blank. I am inclined to think I should leave the "Strata" and "Finite pop. correction" fields empty. As for "Sampling weight", I think I should leave it blank as well.
Please see below what I mean (screenshot Stata_image_1.PNG).
Looking at the tab to the right of "Main", the other relevant part I should specify relates to "Weights" (screenshot Stata_image_2.PNG).
Here I supposed that I should enter in "Sampling weight variable" the weight variable provided by the company, which I named "pesoco".
However, I am not convinced that I am doing the right thing.
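To make the description of the weights concrete, these are the checks I could run on the weight variable before declaring it. This is only a sketch: pesoco is my actual weight variable, while age_class is a hypothetical name for the five-class age variable, and I assume one row per respondent (with several rows per respondent, the sum of pesoco over all rows would exceed the number of individuals).

```stata
* Sketch only: pesoco is the weight variable from this post;
* age_class is a HYPOTHETICAL name for the five-class age variable.
* Assumes one row per respondent.

* The sum of the weights should equal the number of individuals,
* so the mean weight should be 1
quietly summarize pesoco
display "Sum of weights = " r(sum)
display "Observations   = " r(N)
display "Mean weight    = " r(mean)

* Unweighted age distribution (classes 30-39 and 40-49 oversampled)
tabulate age_class

* Weighted age distribution (should be closer to the population shares)
tabulate age_class [aweight = pesoco]
```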
I am not sure whether I should specify "ID" as "Primary sampling units" or leave that field blank. In the latter case, the command generated by the menus (Stata's output, command included, is in screenshot Stata_image_3.PNG) is the following:
Code:
svyset _n [pweight=pesoco], vce(linearized) singleunit(missing)
Conversely, in the former case, the command generated by the menus (Stata's output, command included, is in screenshot Stata_image_4.PNG) is the following:
Code:
svyset ID [pweight=pesoco], vce(linearized) singleunit(missing)
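For reference, this is a sketch of how I understand either declaration would be used afterwards. ID and pesoco are my variables; choice, cost and efficacy are hypothetical placeholders for the outcome and the attributes. As far as I understand, if ID takes a different value on every row, the two declarations describe the same design, whereas if each respondent contributes several rows (as is common with choice data in long form) the second one treats all rows of a respondent as one cluster.

```stata
* Sketch only. ID and pesoco come from this post;
* choice, cost and efficacy are HYPOTHETICAL variable names.

* Variant A: no PSU variable, every observation is its own sampling unit
svyset _n [pweight = pesoco], vce(linearized) singleunit(missing)
svydescribe

* Variant B: the respondent identifier is the primary sampling unit
svyset ID [pweight = pesoco], vce(linearized) singleunit(missing)
svydescribe

* Estimation commands prefixed with svy: use whichever design
* was declared last (here Variant B)
svy: logit choice cost efficacy
```

Comparing the number of PSUs reported by svydescribe under the two variants should show whether ID repeats across rows in the data.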
I would be very grateful if any of the Members could kindly give me some insight into which path I should follow.
Many thanks.
Marco