Comparing means of subgroups

Alex McIntosh

Join Date: Mar 2022
Posts: 17


01 Apr 2022, 15:17

Hi all,

I am trying to build logits to measure gender employment gaps among lone parents in Canada in Feb 2020, in the Labour Force Survey data. I've created a dummy variable for employment (1=employed, or absent; 0=unemployed or not in labour force), and want to measure the difference (gap) in mean employment rates between subgroups.

What syntax can I use to capture the difference in proportions of employment between male and female lone parents? How can I compare this gender gap among lone parents, e.g., within two sets of (ordinal) sub-groups: 1) older or younger child (<6 or 6-12) and 2) education?

Please find below an excerpt from dataex:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float lfs byte sex float(loneyg edu)
0 1 . 1
1 1 . 1
1 1 . 0
1 1 . 2
1 1 . 1
0 1 . 2
1 1 . 0
1 1 . 1
1 1 . 1
1 1 . 0
1 1 . 1
1 1 . 2
1 1 . 1
1 1 . 2
1 1 . 2
0 1 . 0
1 1 . 0
1 1 . 0
1 1 . 1
1 1 . 0
1 1 . 1
1 1 . 2
1 1 . 2
0 1 . 1
1 1 . 1
0 1 . 0
1 1 . 1
1 1 . 0
1 1 . 1
1 1 . 2
1 1 . 1
0 1 . 1
1 1 . 1
1 1 . 1
1 1 . 0
1 1 . 0
1 1 . 1
1 1 . 1
1 1 . 1
0 1 . 0
1 1 . 0
1 1 . 2
1 1 . 2
1 1 . 1
1 1 . 0
1 1 . 0
0 1 . 0
1 1 . 1
1 1 . 1
0 1 . 0
1 1 . 1
1 1 . 0
1 1 . 2
1 1 . 0
1 1 . 1
1 1 . 0
1 1 . 1
1 1 . 1
1 1 . 0
1 1 . 1
1 1 1 1
1 1 . 2
1 1 . 0
0 1 . 0
1 1 . 1
0 1 . 0
1 1 . 2
1 1 . 1
1 1 . 1
1 1 . 1
1 1 . 1
1 1 . 2
1 1 . 2
1 1 . 0
1 1 . 2
1 1 . 2
1 1 . 1
1 1 . 1
1 1 . 1
0 1 . 0
1 1 . 2
1 1 . 2
1 1 . 0
1 1 . 1
0 1 . 1
1 1 . 1
1 1 . 1
1 1 . 0
1 1 . 2
1 1 . 1
1 1 . 1
1 1 . 1
1 1 . 2
1 1 . 0
0 1 . 2
1 1 . 0
1 1 . 2
1 1 . 2
1 1 . 1
1 1 . 0
end
label values lfs lfs
label def lfs 0 "not", modify
label def lfs 1 "Employed", modify
label values sex SEX
label def SEX 1 "Male", modify
label values loneyg loneyg
label def loneyg 1 "Lone parents, yg child", modify
label values edu edu
label def edu 0 "(<)HS", modify
label def edu 1 "some uni/college deg/trades", modify
label def edu 2 "BA degree+", modify

Last edited by Alex McIntosh; 01 Apr 2022, 16:09. Reason: Edited to exclude mention of analysis "over time"


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

01 Apr 2022, 15:53

Well, your example data is not suitable for this kind of analysis. You have no variable indicating children's age, and all of the participants are male. There is also no time variable, so there is no way to speak of what is happening "over time." On the assumption that these problems do not plague your full data set, and calling the variable indicating children's age children_age, you would do something like this:

Code:

logistic lfs i.sex##i.edu##i.children_age if loneyg == 1
margins edu#children_age, dydx(sex)

Here I do not deal with the "over time" aspect of your question because I don't want to try to guess what you have in mind in the absence of better information.
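
To see what this pair of commands produces, here is a minimal sketch on simulated data. Nothing in it comes from the Labour Force Survey: the sample size, the coefficients, the seed, and the coding sex 1 = male, 2 = female are all assumptions made only so the commands run, and the restriction to lone parents is left out because every simulated row is treated as a lone parent.

```stata
* Simulated data (assumed values, not LFS data)
clear
set seed 12345
set obs 2000
generate byte sex = 1 + (runiform() < 0.8)        // assumed coding: 1 = male, 2 = female
generate byte edu = floor(3*runiform())           // 0, 1, 2 as in the edu value label
generate byte children_age = runiform() < 0.5     // hypothetical 0/1 child-age indicator
* Employment probability with an assumed gender gap that widens in one child-age group
generate byte lfs = runiform() < invlogit(0.8 - 0.4*(sex==2) + 0.3*edu - 0.3*(sex==2)*children_age)

* Fully interacted logistic model
logistic lfs i.sex##i.edu##i.children_age

* Female-minus-male difference in predicted employment probability,
* reported separately for each edu x children_age cell
margins edu#children_age, dydx(sex)
```

Because sex enters as a factor variable, dydx(sex) is reported as the discrete difference in predicted probability between the two sexes rather than as a derivative, so each row of the margins table is a gender gap in percentage-point terms for one subgroup.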

As an aside, you have this variable loneyg which is coded 1/missing. While this is often useful and commonly done in spreadsheets, this is a setup for errors in Stata. Dichotomous variables should almost always be coded 1 = true, 0 = false. Before you go astray, I suggest you recode loneyg accordingly.
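
A minimal sketch of that recoding, assuming loneyg really is a 1/missing flag as it appears in the dataex excerpt. The five toy rows and the new variable name loneyg01 are made up for illustration.

```stata
* Toy data: a flag stored as 1/missing (hypothetical rows)
clear
input float loneyg
1
.
.
1
.
end

* A logical expression evaluates to 1 or 0, so missing becomes 0 here
generate byte loneyg01 = (loneyg == 1)
tabulate loneyg01, missing

* The pitfall with 1/missing coding: Stata treats missing as larger than
* any number, so this condition is also true for the missing rows
count if loneyg > 0     // counts all 5 rows, not just the 2 flagged ones
```

If missing instead means "not applicable" rather than "false", the recode would be wrong, and an explicit `if !missing(loneyg)` qualifier is the safer route.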
Alex McIntosh

Join Date: Mar 2022

Posts: 17
#3

01 Apr 2022, 16:43

I'm sorry about my lack of proficiency with dataex. I'm clearly a novice in many aspects of Stata.

I was originally trying to use these logits with an appended dataset (Feb to May 2020, with a variable "survmnth" for month), but I realized I first have to build meaningful logits for any given month, and so am trying to start a bit simpler. (I'm also perusing Chapter 18 on programming, in the documentation).

The variable "loneyg" is meant to capture whether the child of a lone parent is under 6 years old (coded 0) or 6-12 (coded 1). So it is 0/1 among lone parents and missing for everyone else, but I'm not sure if this is a misuse of a dichotomous variable. In any case, this group is only n=1,887 for Feb, in a sample of n=45,708, so most cases are "missing" in my bungled dataex. In the analytical sample I'm trying to build (from the full, appended dataset) lone parents are 6,814 of 168,792 observations.
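
One consequence of this coding is worth checking: Stata's estimation commands drop any observation with a missing value on a model variable, so putting loneyg in the model restricts the estimation sample to lone parents without an explicit if qualifier. A minimal sketch on simulated data, where the sample size, the 20% share of lone parents, and the seed are assumptions chosen only for illustration:

```stata
* Simulated data (assumed values): loneyg is 0/1 for lone parents, missing otherwise
clear
set seed 2020
set obs 1000
generate byte sex = 1 + (runiform() < 0.5)
generate byte loneyg = cond(runiform() < 0.2, runiform() < 0.5, .)
generate byte lfs = runiform() < 0.7

* How many rows have a nonmissing loneyg?
count if !missing(loneyg)
local n_lone = r(N)

* Rows with missing loneyg are excluded from estimation automatically
logistic lfs i.sex##i.loneyg
display "nonmissing loneyg: `n_lone'   estimation sample: " e(N)
```

The two numbers displayed at the end should match, which is a quick way to confirm who is actually in the model.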

When I use loneyg in the suggested code (with the appended dataset, from February to May, i.e. "2" to "5" in the variable survmnth):

Code:

logistic lfs i.sex##i.edu##i.loneyg
margins edu#loneyg, dydx(sex)
marginsplot

I get: [marginsplot graph; image not included]

I am wondering what syntax I need to stratify these marginal effects of sex by subgroups (like education, younger or older child), but with the x-axis corresponding to survmnth?
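
One possible way to approach this, offered as a sketch rather than a confirmed answer from the thread: interact survmnth with the other factors, list it in the margins specification, and tell marginsplot to put it on the x-axis with xdimension(). The data below are simulated; the sample size, coefficients, seed, and the coding sex 1 = male, 2 = female are assumptions, and edu is left out to keep the example short.

```stata
* Simulated appended data for months 2 to 5 (assumed values, not LFS data)
clear
set seed 4321
set obs 8000
generate byte survmnth = 2 + floor(4*runiform())   // 2 = Feb, ..., 5 = May
generate byte sex = 1 + (runiform() < 0.8)         // assumed coding: 1 = male, 2 = female
generate byte loneyg = runiform() < 0.5            // 0 = child under 6, 1 = child 6-12
* Assumed pattern: the gender gap widens from April for parents of younger children
generate byte lfs = runiform() < invlogit(1 - 0.3*(sex==2) - 0.2*(survmnth>=4) - 0.3*(sex==2)*(survmnth>=4)*(loneyg==0))

* Let the gender gap vary by child-age group and by month
logistic lfs i.sex##i.loneyg##i.survmnth

* Gender gap in predicted probability for each month x child-age cell
margins survmnth#loneyg, dydx(sex)

* Month on the x-axis, one plotted line per child-age group
marginsplot, xdimension(survmnth)
```

Education could be handled the same way by adding i.edu to the interaction and to the margins list, with marginsplot's bydimension() option giving one panel per education level. Whether the data support a four-way interaction is a separate question that depends on the cell sizes.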