Graphical inspection of a categorical and a continuous variable

Julian Pritsch

Join Date: Apr 2014

Posts: 80
#1

08 Apr 2014, 10:17

Dear Stata-forum,

I am using Stata 13.1 for Windows 7.

I have a question about a graphical solution. I have answers to a question, recorded as an ordinal variable coded 0-3. I also have a z-standardized variable. How can I show graphically how the two are related? In theory, most cases that score high (+1 SD/+2 SD) on my continuous variable should also score high (2/3) on my categorical variable.

Is there a way of showing that with a graph?

Thanks,
Julian
Tags: categorical, continous, graphics
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

08 Apr 2014, 16:38

This is not the forum for substantive statistical questions, but for trying out the new Forum tools. If that's what you are doing, ignore this post. Otherwise re-post in the "General" forum.

Steve Samuels
Statistical Consulting

Stata 14.2
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

05 Jul 2017, 23:12

Without a reference to the specific LQAS manual, it's difficult to know what the local consultant did. However, it is apparent that the consultant ignored the multistage character of the design.

What is not clear is whether your target population is 1) all people in the 21 districts served by the provider organizations or 2) all people in the 21 districts who might meet the criteria for service by a PO.

To address your questions:

1. svyset statement
As only one sub-county is selected per district, you are right to omit a sub-county stage. You have also omitted a respondent stage, which would ordinarily mean that all respondents are analyzed (fpc = 1). I would modify the statement to:

Code:

svyset District [pweight=sampweights], fpc(fpc1) || PO, fpc(fpc2) || _n

Technically, this is valid only for a design in which respondents were randomly selected with replacement. This, of course, is not true, but it is a way to avoid the assumption that fpc = 1 at the respondent stage.

2. Sampling weight

Add a factor for the number of respondents in each PO to your formula. This treats the response rate in the PO as a sampling rate, so it can be considered a non-response adjustment.

sampweight = (21/6) * (# of sub-counties in district/1) * (# of POs in sub-county/3) * (# of members in PO/# responding in PO)
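As a rough sketch, the formula above could be computed as follows. The count variables (n_subcounties, n_pos, n_members, n_resp) are hypothetical names, not taken from your data, and the result is called sampweights only to match the svyset statement.

```stata
* Hypothetical variable names; replace with those in your data.
*   n_subcounties : number of sub-counties in the district
*   n_pos         : number of POs in the selected sub-county
*   n_members     : number of members in the PO
*   n_resp        : number of members of the PO who responded
generate double sampweights = (21/6) * (n_subcounties/1) * (n_pos/3) * (n_members/n_resp)
label variable sampweights "Sampling weight with PO-level non-response factor"

* Rough check: the sum of the weights is an estimate of the population size
summarize sampweights
display "Sum of weights = " r(sum)
```

If the sum of the weights is far from any plausible count of PO members in the 21 districts, one of the count variables is probably mis-coded.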

3. Finite population corrections
fpc1 = 6/21
fpc2 for a PO = 3/(number of POs in the selected sub-county)

Equivalently, giving population counts instead of sampling rates:
fpc1 = 21
fpc2 for the selected sub-county = (number of POs in the selected sub-county)
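A minimal sketch of creating these two variables, assuming a hypothetical variable n_pos that holds the number of POs in the selected sub-county:

```stata
* Option A: fpc as sampling rates (values between 0 and 1)
generate double fpc1 = 6/21
generate double fpc2 = 3/n_pos      // n_pos is a hypothetical variable name

* Every rate must lie in [0, 1]; this also fails if n_pos is missing
assert inrange(fpc2, 0, 1)

* Option B: fpc as population counts; use instead of Option A, not both
* generate double fpc1 = 21
* generate double fpc2 = n_pos
```

Within each stage, use either rates or counts for every observation, not a mixture of the two.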


4. Post-stratification

I'm not sure what you meant by postweights to "account for non-response across districts". There is potentially a different response rate in each PO. The last term in the sampling weight definition automatically corrects for differential response. (It does not remove bias caused by differences between responders and non-responders.)

I think what you are seeking is how to use external information so that the sample better represents the population. When this is done to, say, match the age distribution of the sample to that of the population, you use the poststrata() and postweight() options of svyset. However, you may have external information on several factors for all people served by a PO in all 21 districts, for example:

• male-female percentages
• whether the PO serves a rural area or a more urban area
• the number of people served by each PO (this can be a rough count)
• whether a district is "large" or "small"

These last two are particularly important. If there are a few "large" districts or "large" POs in a subcounty, a simple random sample is apt to miss them. The preferred method for sampling units of different sizes is sampling with probability proportional to size (PPS).

If the weighted sample distributions for these factors differ much from the external information, you can try to apply post-stratification techniques. For a single classification, you can use the poststrata() and postweight() options of svyset, as mentioned. To simultaneously post-stratify on several factors, use ipfweight by Michael Bergmann or survwgt rake by Nick Winter, both at SSC. John D'Souza's calibrate (followed by calibest) (SSC) can control for differences between sample and population means of quantitative characteristics.
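For the single-classification case, a sketch along these lines may help. The variables agegrp and agepop are hypothetical, and the design is cut down to a single stage for brevity, so the fpc and later-stage specifications discussed above are left out.

```stata
* Hypothetical variables:
*   agegrp : age-group code for each respondent (the post-stratum identifier)
*   agepop : known population count for the respondent's age group
* Simplified to one stage for brevity; not the full design discussed above.
svyset District [pweight=sampweights], poststrata(agegrp) postweight(agepop)

* Check the post-stratified distribution of the classification variable
svy: tabulate agegrp
```

The postweight() variable must be constant within each post-stratum, since it holds the population count for that group.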

Reference:

Battaglia, M. P., Hoaglin, D. C., & Frankel, M. R. (2013). Practical considerations in raking survey data. Survey Practice, 2(5).
(This illustrates the method used by ipfweight and survwgt rake.)

http://www.surveypractice.org/index..../view/176/html

and an earlier version with examples at
http://www.abtassoc.net/presentation...data_2_JOS.pdf

Steve Samuels
Statistical Consulting

Stata 14.2