  • Generate adjusted prevalence variable

    I wonder if someone could help me with a problem I'm having. I have a dataset of 10,000 people in 77 villages with a binary disease outcome (1/0). I'm trying to generate a variable which gives me the mean prevalence of disease per village, adjusted for age category, sex, and wealth quintile. Does anyone have any idea how to do this? I've looked on several STATA help forums but haven't found anything useful so far.

  • #2
    Eimear:
    did you consider -poisson-?
    Kind regards,
    Carlo
    (Stata 19.0)
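
    A minimal sketch of one way to act on this suggestion, assuming hypothetical variable names (disease, village, agecat, sex, wealthq) and a numeric village identifier. It fits a Poisson model with robust standard errors, then uses -margins- to obtain each village's predicted prevalence averaged over the covariate distribution of the whole sample. This is an illustration only, not necessarily what Carlo had in mind.

    ```stata
    * Hypothetical names: disease (0/1), village (numeric id), agecat, sex, wealthq
    poisson disease i.village i.agecat i.sex i.wealthq, vce(robust)

    * Villages that actually entered the model
    levelsof village if e(sample), local(villages)

    * Adjusted prevalence per village: predictions averaged over the observed
    * age, sex, and wealth distribution of the estimation sample
    margins village, post

    * Copy each village's margin into a new variable
    generate double adj_prev = .
    foreach v of local villages {
        quietly replace adj_prev = _b[`v'.village] if village == `v'
    }
    ```

    Poisson predictions for a binary outcome are not bounded at 1, so the same -margins- step after -logit- may be a safer choice when prevalence is high.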



    • #3
      Hello Eimear,

      Welcome to the Stata Forum!

      Apart from Carlo's excellent suggestion, you could try to -svyset- your data and then apply the -svy: proportion- command, also "playing" with -subpop- and -over-, to obtain the desired result.

      Please take a look at this text (http://www.cdc.gov/nchs/tutorials/NH...ics/Task4c.htm) to check whether the examples relate to your needs.
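
      As a rough sketch of the commands mentioned above, assuming the data really come from a survey design. The design variables (psu, stratum, wt) and analysis variables (disease, village, sex) are hypothetical names, not taken from the original post.

      ```stata
      * Declare the survey design (psu, stratum, and wt are hypothetical names)
      svyset psu [pweight = wt], strata(stratum)

      * Prevalence of disease within each village
      svy: proportion disease, over(village)

      * The same, restricted to a subpopulation (here sex == 1) without
      * dropping observations, so that the design-based standard errors stay valid
      svy, subpop(if sex == 1): proportion disease, over(village)
      ```

      Note that -over()- and -subpop()- give stratified estimates rather than regression-adjusted ones, so they adjust only for the variables used to define the groups.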

      Best,

      Marcos



      • #4
        Related to Marcos' reply, if you want to calculate directly standardized prevalences, you can also look into the -dstdize- and -distrate- (SSC) commands.
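
        A minimal sketch of how -dstdize- might be used here, assuming one record per person and hypothetical variable names (disease, village, agecat, sex, wealthq). With no -using()- or -base()- option, the standard population is taken from the data themselves.

        ```stata
        * Hypothetical names: disease (0/1), village, agecat, sex, wealthq
        * With one record per person, each person contributes 1 to the population
        generate byte pop = 1

        * Directly standardized prevalence per village, standardizing over
        * the strata formed by age category, sex, and wealth quintile
        dstdize disease pop agecat sex wealthq, by(village)
        ```

        With 77 villages and three stratifying variables, many strata within a village may be empty or very small, which makes directly standardized estimates unstable; collapsing categories may be needed.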
        Last edited by Andrea Discacciati; 23 Feb 2016, 09:12.



        • #5
          Eimear Cleary, I almost forgot: please write "Stata", as recommended in the FAQ.

          Thanks.
          Best regards,

          Marcos



          • #6
            Dear expert,

            I am a new member; pleased to meet you. I have a dataset of 50 variables and approximately a million observations from a basic health research survey. The survey samples come from 440 districts/cities (of 456 in total) spread across 33 provinces. The unit of analysis is household members of all ages, and the study sample is restricted to rural regions. The survey covered 973,657 respondents in urban and rural areas combined. The population in this study is all household members residing in rural areas (620,025), and the research sample is all respondents residing in rural endemic areas in 11 provinces (161,872).

            The survey used a two-stage sampling design. My project describes the determinants of the prevalence of one communicable disease (a binary outcome, 0/1) among respondents who keep livestock in the rural endemic areas, using both -svy: proportion- and -xi: svy: logistic-. The study design is cross-sectional.

            These are the commands I ran.
            In the 1st stage:
            -svy: proportion- on the disease outcome (binary, 0/1) and on several independent (categorical) variables

            Survey: Proportion estimation

            Number of strata = 126        Number of obs   = 161872
            Number of PSUs   = 2783       Population size = 159828
            Design df        = 2657

                       |             Linearized
                       | Proportion  Std. Err.   [95% Conf. Interval]
            -----------+---------------------------------------------
            malaria    |
              no       | .9649451    .0011241    .962741     .9671493
              yes      | .0350549    .0011241    .0328507    .037259
            -----------+---------------------------------------------
            b1r1       |
              _prop_3  | .1947366    .0014533    .1918869    .1975864
              jambi    | .1001543    .0009097    .0983705    .1019382
              _prop_5  | .1696909    .0017459    .1662674    .1731144
              bengkulu | .0859729    .0010122    .0839881    .0879577
              _prop_7  | .0502815    .0008057    .0487016    .0518613
              _prop_8  | .0157476    .0003684    .0150253    .01647
              _prop_9  | .0828369    .0008273    .0812147    .0844592
              _prop_10 | .1243242    .0012131    .1219455    .1267029
              _prop_11 | .0739918    .0010545    .0719241    .0760595
              _prop_12 | .029523     .001297     .0269799    .0320662
              papua    | .0727402    .0015528    .0696955    .075785
            and so on


            In the 2nd stage
            Multivariate survey logistic regression

            To obtain a parsimonious regression model that can explain the relationship between the independent and dependent variables in the population, I first selected variables by bivariate analysis of each independent variable against the dependent variable. Variables with a bivariate p-value < 0.25 were entered into the multivariate model. I then fitted the full model for the disease outcome (binary, 0/1) on the selected independent (categorical) variables.

            Survey: Logistic regression

            Number of strata = 126        Number of obs   = 161872
            Number of PSUs   = 2783       Population size = 159828.48
            Design df        = 2657
            F( 12, 2646)     = 17.14
            Prob > F         = 0.0000

                              |             Linearized
            malaria           | Odds Ratio  Std. Err.   t        P>t     [95% Conf. Interval]
            ------------------+----------------------------------------------------------------
            age               | .7635662    .0349214    -5.90    0.000   .6980709    .8352065
            gender            | .8111374    .0248119    -6.84    0.000   .7639152    .8612787
            _Ijob_1           | 1.089732    .0472513     1.98    0.048   1.000909    1.186438
            _Ijob_2           | .9832968    .0583158    -0.28    0.776   .8753463    1.10456
            kindofmedanimals  | 1.74521     .3497826     2.78    0.005   1.178063    2.585395
            kindofargeanimals | 2.117317    .2983352     5.32    0.000   1.606182    2.79111
            _Imed_anima_1     | 1.085587    .3233135     0.28    0.783   .6053981    1.946651
            _Imed_anima_2     | .9285925    .1877122    -0.37    0.714   .6247107    1.380294
            _Imed_anima_3     | .8684384    .3590308    -0.34    0.733   .386077     1.953458
            _Imed_anima_4     | 1           (omitted)
            _Ilarge_ani_1     | .3151109    .1097948    -3.31    0.001   .1591264    .6240002
            _Ilarge_ani_2     | .3434502    .0617429    -5.94    0.000   .241419     .488603
            _Ilarge_ani_3     | .2476288    .1702481    -2.03    0.042   .0643164    .9534115
            _Ilarge_ani_4     | 1           (omitted)
            _cons             | .0396555    .0016659    -76.83   0.000   .0365198    .0430605


            Lastly, I fitted the final model to identify the dominant factors affecting the disease outcome.
            In the multivariate analysis, the independent variables below were significantly associated with the disease outcome, while one of them acts as a confounding variable.

                              |             Linearized
            malaria           | Odds Ratio  Std. Err.   t        P>t     [95% Conf. Interval]
            ------------------+----------------------------------------------------------------
            age               | .7274964    .0250229    -9.25    0.000   .6800482    .7782551
            gender            | .797493     .0234836    -7.68    0.000   .7527493    .8448963
            kindofmedanimals  | 1.665189    .1634349     5.20    0.000   1.373668    2.018577
            kindofargeanimals | 2.123207    .2960397     5.40    0.000   1.615306    2.790807
            _Ilarge_ani_1     | .3239441    .1111568    -3.28    0.001   .1652949    .6348641
            _Ilarge_ani_2     | .3422465    .0610071    -6.02    0.000   .2412898    .485444
            _Ilarge_ani_3     | .2377812    .1546599    -2.21    0.027   .0664169    .8512876
            _Ilarge_ani_4     | 1           (omitted)
            _cons             | .0417156    .0014397    -92.05   0.000   .0389859    .0446364

            estat gof
            Logistic model for malaria, goodness-of-fit test

            F(9,2649) = 2.31
            Prob > F = 0.0140
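
            For readers who want to reproduce this kind of final model, a hedged sketch follows. It assumes the data are already -svyset- with the two-stage design, uses factor-variable notation (i.) in place of -xi:-, and restricts estimation with -subpop()- rather than by dropping observations. The names rural_endemic and large_ani are hypothetical, since the output above shows only the truncated -xi- stub _Ilarge_ani_; the remaining names are copied from the output.

            ```stata
            * Assumes the data are already -svyset- with the two-stage design.
            * rural_endemic (0/1 flag for the study sample) and large_ani are
            * hypothetical names; the other names are taken from the output above.
            svy, subpop(rural_endemic): logistic malaria age gender ///
                kindofmedanimals kindofargeanimals i.large_ani

            * Goodness-of-fit test for the survey-weighted logistic model
            estat gof
            ```

            The /// line continuation works in a do-file, not in the Command window. A small p-value from -estat gof- is evidence of lack of fit, so it is worth checking before interpreting the odds ratios.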


            Please give me your suggestions.

