
  • Model selection with age-standardization of survey data

    Dear Statalisters,

    Please forgive my ignorance, but is there a way to determine the best model (e.g., by AIC) or to assess goodness of fit (GoF) with a test? This is for survey data reporting age-standardized rates. I cannot use dstdize or istdize because this is survey data, and I cannot figure out how to do it with code like this:
    Code:
    svy:  mean outcome, stdize(agecat) stdweight(std_wgt)
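    For reference, the direct standardization requested with stdize() and stdweight() can be written as below. This is a sketch in notation introduced here, not Stata's own: for age groups k = 1 to K, s_k is the value of std_wgt for group k, v_i is respondent i's survey weight, y_i is the 0/1 outcome, and the standard weights are assumed to be rescaled to sum to 1, as in the usual direct method.

```latex
\hat{p}_{\mathrm{std}} = \sum_{k=1}^{K} w_k \, \hat{p}_k,
\qquad
w_k = \frac{s_k}{\sum_{j=1}^{K} s_j},
\qquad
\hat{p}_k = \frac{\sum_{i \in k} v_i \, y_i}{\sum_{i \in k} v_i}
```

    In this form the standardized estimate is a weighted average of survey-weighted group prevalences, not a fitted regression model.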
    Last edited by Ruth-Alma Turkson-Ocran; 23 Nov 2019, 02:46.

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex. I'm not even sure what you've estimated.




    • #3
      Thank you, Phil. I will provide more clarification:

      How do you select the best model (e.g., the equivalent of a GoF [Wald] test or AIC/BIC) for a model of the prevalence of, for example, hypertension that uses age standardization and survey data (e.g., NHANES/NHIS) in Stata?

      To further clarify, per NHANES guidelines, I used:
      Code:
       svy: mean outcome, stdize(agecat) stdweight(std_wgt)
      to find the prevalence of hypertension. Then, based on the literature and on significance in Table 1, I adjusted for other factors such as marital status, education, and income. The estimates did not differ, so I am choosing to go with the unadjusted model. Statistically, however, how do you know that you have selected the best-fitting model? As far as I know, svy: mean does not work with estat gof, or with estat ic to obtain AIC/BIC figures.

      I suspect that, because this is survey data and I used age standardization, some model-selection principles may not apply or may not translate well, but I am unsure about this.

      Given all this, how does one correctly specify the model in these circumstances? How do I know that my model is correct? And, if it can be done, how do I do that in Stata?
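      One common design-based substitute for an AIC/BIC comparison is sketched below, under stated assumptions: the full data file (not just the excerpt) is loaded, the variable names are those in the dataex listing (htn, agecat, married, poor2, edu_cat, wtfa7yr, nstratum, npsu), and the model is a survey logistic regression. It swaps AIC/BIC for an adjusted Wald test of whether the extra covariates add anything beyond age, plus the F-adjusted mean residual goodness-of-fit test that estat gof provides after svy: logistic. Note that i.agecat here is regression adjustment for age, which is not the same thing as direct standardization with std_wgt, and this is an illustration rather than what the NHANES/NHIS guidelines prescribe.

```stata
* Sketch only: assumes the full data set is in memory with the
* variables shown in the dataex listing

* Declare the survey design (PSU first, then the probability weight)
svyset npsu [pweight=wtfa7yr], strata(nstratum)

* Fuller model: age category plus the candidate adjustment covariates
svy: logistic htn i.agecat i.married i.poor2 i.edu_cat

* Design-based adjusted Wald F test that the added covariates are
* jointly zero; a non-significant result favors the age-only model
testparm i.married i.poor2 i.edu_cat

* F-adjusted mean residual goodness-of-fit test, available after
* svy: logistic (unlike estat ic)
estat gof
```

      If the joint test is not significant and the GoF test shows no evidence of lack of fit, that is one defensible, design-consistent basis for preferring the simpler specification.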

      Also, here is a sample of the data using dataex:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(htn std_wgt agecat wtfa7yr nstratum npsu married poor2 edu_cat)
      0 .2660267 2  587.5714  230   2 1 3 3
      1 .0600966 5 1043.8572  177   1 0 2 3
      1 .2639372 3       835  231   2 0 1 2
      0 .2660267 2  52.14286  114   1 1 3 2
      0 .3396116 1 1159.2858   78   1 0 3 1
      1 .2639372 3 1312.5714   45   2 0 3 1
      1 .0600966 5  423.2857 1122  14 1 3 3
      0 .2639372 3  729.5714 1117  34 1 3 3
      1 .3396116 1 350.85715  138   1 1 3 2
      1 .0703279 4  379.5714  275   2 0 3 1
      0 .0703279 4 198.14285  281   1 0 3 1
      1 .2660267 2 379.14285   67   2 0 3 1
      0 .0703279 4  958.7143  141   1 1 3 3
      1 .2639372 3 347.14285  165   2 1 2 1
      0 .2660267 2 318.85715   54   1 1 3 3
      0 .2660267 2  540.8571   64   1 1 3 2
      1 .0703279 4  653.7143   24   1 1 3 1
      0 .3396116 1  353.5714   82   1 0 3 3
      1 .0703279 4 115.28571  111   1 1 3 1
      0 .3396116 1  271.7143   20   1 0 3 3
      0 .3396116 1 100.14286  289   1 1 2 1
      0 .2639372 3  785.8571   38   2 1 3 3
      1 .0600966 5  769.1429  125   1 0 3 1
      0 .2639372 3  734.8571   26   2 0 3 3
      0 .2660267 2  302.7143  272   2 0 3 2
      0 .2660267 2  416.2857   82   2 0 2 3
      0 .2639372 3  39.71429 1105 150 1 2 3
      0 .2639372 3        93  124   1 0 1 3
      0 .2639372 3  805.8571  108   1 1 2 2
      0 .2660267 2 260.14285  208   2 0 1 2
      0 .2660267 2 443.85715  294   2 0 3 2
      1 .2639372 3  996.2857  105   2 0 3 2
      0 .2660267 2  178.7143  150   2 0 3 3
      0 .2639372 3 28.857143  114   1 0 2 2
      0 .0600966 5       458  196   1 0 3 1
      0 .2660267 2  548.8571  271   1 1 3 3
      0 .2639372 3 347.85715   20   2 0 3 1
      0 .2639372 3  691.8571  114   1 0 3 2
      0 .2639372 3 205.42857  275   2 0 3 3
      0 .0703279 4 106.28571   44   1 1 3 3
      0 .2660267 2  336.7143  244   1 0 1 2
      0 .2660267 2  581.4286  106   1 0 1 1
      0 .2639372 3       738  112   1 0 3 3
      0 .2639372 3 245.85715 1109   2 1 3 1
      0 .2639372 3       393 1149  28 1 2 2
      0 .0600966 5  720.2857  191   1 0 3 1
      0 .2660267 2 247.42857   83   1 1 3 3
      0 .2660267 2  638.8571  100   2 1 3 3
      1 .2639372 3 506.85715 1113   3 0 2 1
      0 .2660267 2 290.14285 1100  93 0 3 1
      end
      label values htn htn
      label def htn 0 "0: No History of Hypertension", modify
      label def htn 1 "1: History of Hypertension", modify
      label values agecat agecat
      label def agecat 1 "1: < 25 years", modify
      label def agecat 2 "2: 25 - 44 years", modify
      label def agecat 3 "3: 45 - 64 years", modify
      label def agecat 4 "4: 65 - 74 years", modify
      label def agecat 5 "5: 75+ years", modify
      label values married married
      label def married 0 "0: Not Married", modify
      label def married 1 "1: Currently Married", modify
      label values poor2 poor2
      label def poor2 1 "1: Poor", modify
      label def poor2 2 "2: Near Poor", modify
      label def poor2 3 "3: Not Poor/Near Poor", modify
      label values edu_cat edu_cat
      label def edu_cat 1 "1: ≤ High School Graduate", modify
      label def edu_cat 2 "2: Some College", modify
      label def edu_cat 3 "3: ≥ Bachelors Degree", modify
      Use the following to declare the survey design:
      Code:
      svyset npsu [pweight=wtfa7yr], strata(nstratum)
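      A sketch of how the pieces fit together, assuming the dataex excerpt above is in memory and that the weight variable is wtfa7yr, the name used in the listing. svydescribe tabulates the number of PSUs in each stratum; in this 50-row excerpt most strata contain a single PSU, so standard errors will be missing and the excerpt is only useful for checking syntax. The over(married) line is an added illustration of comparing age-standardized prevalence across groups.

```stata
* Declare the design (modern syntax: PSU variable before the weight)
svyset npsu [pweight=wtfa7yr], strata(nstratum)

* Show strata and the number of PSUs in each, to spot single-PSU strata
svydescribe

* Age-standardized prevalence of hypertension, overall
svy: mean htn, stdize(agecat) stdweight(std_wgt)

* Age-standardized prevalence of hypertension by marital status
svy: mean htn, stdize(agecat) stdweight(std_wgt) over(married)
```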
