  • Choose the appropriate way to deal with weights in svyset

    Dear Members,

    I designed a questionnaire to gather respondents' willingness to get vaccinated against COVID-19 via a discrete choice experiment. I relied on a company specialized in political opinion polls and market research to administer the survey. The company computed a weight for each respondent based on 1) the geographical area where the respondent lives (five macro-areas of Italy), 2) whether the respondent has a bachelor's degree or not, and 3) the age group to which she/he belongs (five classes are considered).

    The sum of the weights is equal to the number of individuals in the database. Individuals in the age classes 30-39 and 40-49 are oversampled, at our request (related to a research hypothesis), so the proportion of these two classes in the sample is larger than their actual proportion in the Italian population. The weights are computed to account for this feature and to make the sample representative of the Italian population.
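
    Both properties can be checked directly in Stata. The following is only a minimal sketch, assuming the company's weight is stored in pesoco and the age group in a variable called ageclass (a hypothetical name):

    ```stata
    * The sum of the weights should equal the number of respondents
    quietly summarize pesoco
    display "sum of weights = " r(sum) "   N = " r(N)

    * Unweighted vs. weighted age-class shares
    * (ageclass is a hypothetical variable name)
    tabulate ageclass
    tabulate ageclass [aweight=pesoco]
    ```

    If the weights do what the company describes, the weighted shares of the 30-39 and 40-49 classes should be smaller than the unweighted ones.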

    The company uses the method described at the following URL to calculate the weights:

    HTML Code:
    http://mrdcsoftware.com/blog/what-is-rim-weighting-with-free-excel-working-model
    I will use the data to estimate a logit model, multinomial logit models and mixed logit models.

    My problem is how to declare the nature of the weight properly. I have no experience in using Stata for this.

    I am using Stata 17 on a PC with Windows 10 Pro 64 bit.

    I read the following forums:

    HTML Code:
    https://www.statalist.org/forums/forum/forum-help/sandbox/1526810-svyset-confusion
    HTML Code:
    https://www.statalist.org/forums/forum/general-stata-discussion/general/1445985-accounting-for-sample-weights-in-analyses-just-add-the-weighted-variables-in-the-regression
    HTML Code:
    https://www.statalist.org/forums/forum/general-stata-discussion/general/303670-proper-survey-weight-specification
    I read parts of the following manual:

    HTML Code:
    https://www.stata.com/manuals/svysvyset.pdf
    I watched this video:

    HTML Code:
    https://www.youtube.com/watch?v=XYjWCL7IEKU
    by StataCorp, which I found particularly useful. At minute 04:39 the video shows the commands generated by the selections made in the menu window.

    I also consulted the help for "weight" from the command line.

    Combining the information from the video, the svyset manual entry and the help for "weight", I tried to work out the most appropriate solution.

    In the video by StataCorp, Chuck Huber provides an example in which the Main tab of the "svyset - Declare design for dataset" dialog has four stages. This is the part of the menu that I do not know how to adapt to my case. I am inclined to think that the number of stages I should select is 1, but I am not sure whether, under "Primary sampling units", I should enter the variable "ID", which identifies my respondents, or leave the field blank. I am inclined to leave the "Strata" and "Finite pop. correction" fields empty. As far as "Sampling weight" is concerned, I should leave it blank as well.

    Please see below what I mean:
    [Image: Stata_image_1.PNG]




    In the tab to the right of "Main", the other relevant part I should specify relates to "Weights". The relevant image is as follows:
    [Image: Stata_image_2.PNG]



    As I reported above, I supposed that I should enter in "Sampling weight variable" the weight variable provided by the company, which I named "pesoco".

    However, I am not convinced that I am doing the right thing.

    In particular, as I indicated above, I am not sure whether I should enter "ID" under "Primary sampling units" or leave it blank. In the latter case, the command generated is the following:

    Code:
    svyset _n [pweight=pesoco], vce(linearized) singleunit(missing)
    and Stata returns the following outcome (command included):
    [Image: Stata_image_3.PNG]


    Conversely, in the former case, the command generated using the menus is the following:

    Code:
    svyset ID [pweight=pesoco], vce(linearized) singleunit(missing)
    and Stata returns the following outcome (command included):
    [Image: Stata_image_4.PNG]



    I would be very grateful if any of the Members could kindly give me some insight into which path I should follow.

    Many thanks.

    Marco





    Last edited by Marco Giansoldati; 14 Apr 2022, 11:30.

  • #2
    Cross-posted at https://stackoverflow.com/questions/...vyset-in-stata



    • #3
      Dear Dr. Cox,
      Thank you very much for indicating the URL of the cross-posting. I cross-posted to give the question more visibility. I will alert the Members here if I get a reply on Stack Overflow.

      Many thanks.
      Marco
      Last edited by Marco Giansoldati; 15 Apr 2022, 04:54.



      • #4
        Thanks Dr Marco for the detailed information on weighting!
        I am using a large panel survey dataset (a longitudinal study in which different surveys are undertaken at different points in time during the survey period), for which different weights were used for the individual surveys. I want to merge the individual surveys together along with their combined weights. However, I am having difficulty combining the weights. How can I do this, please?
        Thanks in advance!



        • #5
          Dear Tariku Tesfaye, I think you may use the command -merge- but I am not familiar with the use of weights. You may have a look at page 194 of the following manual

          HTML Code:
          https://www.stata.com/manuals/svy.pdf
          It seems to address your issue.

          Best wishes
          Marco
          Last edited by Marco Giansoldati; 17 Apr 2022, 03:02.



          • #6
            Have you tried your two svysettings that you suggested? I would think it wouldn't make any difference whether you used _n or ID. If it does make a difference, write back, showing how the output differs between the two approaches.
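
            A minimal sketch of that comparison, assuming a binary outcome called choice and covariates x1 and x2 (all hypothetical names; substitute your own):

            ```stata
            * Hypothetical variable names: choice (0/1 outcome), x1 and x2 (covariates)

            * Declaration 1: each observation is its own PSU
            svyset _n [pweight=pesoco], vce(linearized) singleunit(missing)
            svy: logit choice x1 x2
            estimates store psu_n

            * Declaration 2: respondent identifier as PSU
            svyset ID [pweight=pesoco], vce(linearized) singleunit(missing)
            svy: logit choice x1 x2
            estimates store psu_id

            * Coefficients and standard errors side by side
            estimates table psu_n psu_id, b se
            ```

            If ID takes a distinct value in every row, both declarations treat each observation as its own PSU, so the two columns should match.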

            I wonder if the company is failing to give you the info you need to improve the svysetting. Did it cluster by region, apply postweights? It sounds like maybe they did, but all they are giving you is a weight. If so, I don't know what else you can do here. Your point estimates are hopefully right, but the standard errors may be off.

            I don't think this is an uncommon practice though. I've seen other studies where they give you a weight, but not information on clustering or stratification.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 18.5 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam



            • #7
              Dear Prof. Richard Williams,
              Thank you very much for your kind suggestion. I tried both svysettings and ran the same logit command, and the results were identical.

              The company provided the weight for each respondent, stating that it is based on 1) the geographical area where the respondent lives (five macro-areas of Italy), 2) whether the respondent has a bachelor's degree or not, and 3) the age group to which she/he belongs (five classes are considered). The company stated that the methodology used to compute the weight is the one reported here, as indicated in my first post:

              HTML Code:
                http://mrdcsoftware.com/blog/what-is-rim-weighting-with-free-excel-working-model
              I will check whether I can get more information on rim weighting beyond what is provided at the link above and get back to you.

              Thank you very much for your kind help.

              Best wishes
              Marco



              • #8
                Again, I don't think it is unusual to provide only a weight. The point estimates should be right, but the standard errors will be off because you aren't including information on clustering and stratification.

                But, providing that information may go over the heads of many users, and could be difficult to implement if you don't have the right software.

                If you do contact the data providers, and they are able and willing to provide additional sampling information, ask if they can tell you what the exact svyset command should be in Stata.
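
                For illustration only, a fuller declaration might look like the sketch below. The variable names psu_id and stratum are hypothetical, and whether the design actually has clusters or strata is something only the provider can confirm:

                ```stata
                * Hypothetical: psu_id = cluster identifier, stratum = sampling stratum
                svyset psu_id [pweight=pesoco], strata(stratum) vce(linearized) singleunit(missing)

                * Summarize the declared design (strata and PSU counts)
                svydescribe
                ```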

                I'm not an expert on sampling. I vaguely remember Clyde Schechter making a similar comment once when only weights were provided. Perhaps he can comment on whether my comments seem right or not.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 18.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam



                • #9
                  Dear Prof. Richard Williams,

                  I do thank you very much for your kind reply and for your suggestions.

                  When I asked for more details on how the weights were computed I was directed to the URL on rim-weighting. However, I will try to contact the company again and check if I can get more details.

                  Many thanks for your time and suggestions.

                  Best wishes
                  Marco
