Model selection with age-standardization of survey data

Ruth-Alma Turkson-Ocran

Join Date: Mar 2018

Posts: 31
#1

23 Nov 2019, 02:38

Dear Statalisters,

Please forgive my ignorance, but is there a way to determine the best model (e.g., using AIC) or to assess goodness of fit with a test? This is for survey data used to report age-standardized rates. I cannot use the dstdize or istdize commands because the data come from a survey, and I cannot figure out how to do it with code like this:

Code:

svy: mean outcome, stdize(agecat) stdweight(std_wgt)

Last edited by Ruth-Alma Turkson-Ocran; 23 Nov 2019, 02:46.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

25 Nov 2019, 09:14

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. I'm not even sure what you've estimated.

Ruth-Alma Turkson-Ocran

Join Date: Mar 2018
Posts: 31

25 Nov 2019, 15:55

Thank you Phil. I will provide more clarification:

How do you select the best model (e.g., using the equivalent of a goodness-of-fit [Wald] test or AIC/BIC) when estimating prevalence (of hypertension, for example) with age-standardization and survey data (e.g., NHANES/NHIS) in Stata?

To further clarify, per NHANES guidelines, I used:

Code:

svy: mean outcome, stdize(agecat) stdweight(std_wgt)

to find the prevalence of hypertension. Then, based on the literature and the significance of variables in Table 1, I adjusted for other factors such as marital status, education, and income. The estimates did not differ, so I am choosing to go with the unadjusted model. Statistically, however, how do you know that you have selected the best-fitting model? As far as I know, the

Code:

svy: mean

syntax does not work with

Code:

estat gof

or

Code:

estat ic

to obtain AIC/BIC figures. Because this is survey data and I used age-standardization, I suspect there may be some principles that are not accounted for, or that do not translate well, when determining the best model, but I am unsure about this.

Given all this, how does one correctly specify the model in these circumstances? How do I know that my model is correct? And if it can be checked, how do I do that in Stata?
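
To make the question concrete, here is a rough sketch of one way such a check might be set up. It is an assumption for illustration, not an established answer. The idea is to recast the comparison as a survey-weighted logistic regression, where an adjusted Wald test and an F-adjusted mean residual goodness-of-fit test are available. It uses the variable names from the dataex excerpt in this thread and presumes the design has already been declared with svyset. It models the odds of hypertension rather than the age-standardized prevalence itself, so it would complement the svy: mean estimate rather than replace it.

Code:

* Assumes the survey design was declared beforehand with svyset
svy: logistic htn i.agecat i.married i.poor2 i.edu_cat

* Adjusted Wald test that the added covariates are jointly zero
testparm i.married i.poor2 i.edu_cat

* F-adjusted mean residual goodness-of-fit test for the fitted model
estat gof

The svy estimators maximize a pseudolikelihood rather than a true likelihood, which is presumably why estat ic is not offered after them, so a design-based Wald test is the closer analogue to comparing models by AIC/BIC here.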

Also, here is a sample of the data using dataex:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(htn std_wgt agecat wtfa7yr nstratum npsu married poor2 edu_cat)
0 .2660267 2  587.5714  230   2 1 3 3
1 .0600966 5 1043.8572  177   1 0 2 3
1 .2639372 3       835  231   2 0 1 2
0 .2660267 2  52.14286  114   1 1 3 2
0 .3396116 1 1159.2858   78   1 0 3 1
1 .2639372 3 1312.5714   45   2 0 3 1
1 .0600966 5  423.2857 1122  14 1 3 3
0 .2639372 3  729.5714 1117  34 1 3 3
1 .3396116 1 350.85715  138   1 1 3 2
1 .0703279 4  379.5714  275   2 0 3 1
0 .0703279 4 198.14285  281   1 0 3 1
1 .2660267 2 379.14285   67   2 0 3 1
0 .0703279 4  958.7143  141   1 1 3 3
1 .2639372 3 347.14285  165   2 1 2 1
0 .2660267 2 318.85715   54   1 1 3 3
0 .2660267 2  540.8571   64   1 1 3 2
1 .0703279 4  653.7143   24   1 1 3 1
0 .3396116 1  353.5714   82   1 0 3 3
1 .0703279 4 115.28571  111   1 1 3 1
0 .3396116 1  271.7143   20   1 0 3 3
0 .3396116 1 100.14286  289   1 1 2 1
0 .2639372 3  785.8571   38   2 1 3 3
1 .0600966 5  769.1429  125   1 0 3 1
0 .2639372 3  734.8571   26   2 0 3 3
0 .2660267 2  302.7143  272   2 0 3 2
0 .2660267 2  416.2857   82   2 0 2 3
0 .2639372 3  39.71429 1105 150 1 2 3
0 .2639372 3        93  124   1 0 1 3
0 .2639372 3  805.8571  108   1 1 2 2
0 .2660267 2 260.14285  208   2 0 1 2
0 .2660267 2 443.85715  294   2 0 3 2
1 .2639372 3  996.2857  105   2 0 3 2
0 .2660267 2  178.7143  150   2 0 3 3
0 .2639372 3 28.857143  114   1 0 2 2
0 .0600966 5       458  196   1 0 3 1
0 .2660267 2  548.8571  271   1 1 3 3
0 .2639372 3 347.85715   20   2 0 3 1
0 .2639372 3  691.8571  114   1 0 3 2
0 .2639372 3 205.42857  275   2 0 3 3
0 .0703279 4 106.28571   44   1 1 3 3
0 .2660267 2  336.7143  244   1 0 1 2
0 .2660267 2  581.4286  106   1 0 1 1
0 .2639372 3       738  112   1 0 3 3
0 .2639372 3 245.85715 1109   2 1 3 1
0 .2639372 3       393 1149  28 1 2 2
0 .0600966 5  720.2857  191   1 0 3 1
0 .2660267 2 247.42857   83   1 1 3 3
0 .2660267 2  638.8571  100   2 1 3 3
1 .2639372 3 506.85715 1113   3 0 2 1
0 .2660267 2 290.14285 1100  93 0 3 1
end
label values htn htn
label def htn 0 "0: No History of Hypertension", modify
label def htn 1 "1: History of Hypertension", modify
label values agecat agecat
label def agecat 1 "1: < 25 years", modify
label def agecat 2 "2: 25 - 44 years", modify
label def agecat 3 "3: 45 - 64 years", modify
label def agecat 4 "4: 65 - 74 years", modify
label def agecat 5 "5: 75+ years", modify
label values married married
label def married 0 "0: Not Married", modify
label def married 1 "1: Currently Married", modify
label values poor2 poor2
label def poor2 1 "1: Poor", modify
label def poor2 2 "2: Near Poor", modify
label def poor2 3 "3: Not Poor/Near Poor", modify
label values edu_cat edu_cat
label def edu_cat 1 "1: ≤ High School Graduate", modify
label def edu_cat 2 "2: Some College", modify
label def edu_cat 3 "3: ≥ Bachelors Degree", modify

Use the following to declare the survey design and weights:

Code:

svyset [pweight=wtfa7yr], strata(nstratum) psu(npsu)
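
Putting the pieces together, a minimal sketch of the full sequence on the excerpt above might look like the following. It uses wtfa7yr, the weight variable that actually appears in the dataex listing. The singleunit(centered) option is only an assumption, added so that the 50-row excerpt, in which many strata hold a single PSU, can still return standard errors; it should not normally be needed on the full data set.

Code:

* Declare the design; listing npsu first is equivalent to psu(npsu)
* singleunit(centered) is an assumption for the small excerpt only
svyset npsu [pweight=wtfa7yr], strata(nstratum) singleunit(centered)

* Age-standardized prevalence of hypertension
svy: mean htn, stdize(agecat) stdweight(std_wgt)

* The same estimate within subgroups, here by marital status
svy: mean htn, stdize(agecat) stdweight(std_wgt) over(married)

The over() run is included only to show how subgroup estimates can be set beside the overall one; with so few rows the subgroup figures from the excerpt are not meaningful.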
