Suppose that, in a study of the effects of radiation, individuals are classified as dead or not dead (stage 1).
The nature of the study requires that deaths are classified as "due to cancer" or "not due to cancer" (stage 2).
Deaths from cancer are either "leukemia deaths" or "deaths from other cancers" (stage 3).
The four mutually exclusive groups (also depicted in the attached figure from the McCullagh GLM book, p. 161) are:
1. alive
2. death from causes other than cancer
3. death from cancers other than leukemia
4. death from leukemia
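To make the three stages concrete, here is how I understand the nested structure in terms of probabilities. The symbols are my own notation, not taken from the book: \pi_1, ..., \pi_4 are the probabilities of groups 1 to 4 above, x is the covariate vector (age, gender), and \beta_1, \beta_2, \beta_3 are stage-specific coefficient vectors.

```latex
\begin{align*}
\text{Stage 1 (dead vs.\ alive):}\quad
  & \operatorname{logit}\bigl(\pi_2 + \pi_3 + \pi_4\bigr) = x^\top \beta_1 \\
\text{Stage 2 (cancer vs.\ other, given dead):}\quad
  & \operatorname{logit}\!\left(\frac{\pi_3 + \pi_4}{\pi_2 + \pi_3 + \pi_4}\right) = x^\top \beta_2 \\
\text{Stage 3 (leukemia vs.\ other cancer, given cancer death):}\quad
  & \operatorname{logit}\!\left(\frac{\pi_4}{\pi_3 + \pi_4}\right) = x^\top \beta_3
\end{align*}
```

Written this way, each group probability is a product of the stage-wise conditional probabilities (for example, \pi_4 is the product of the three "success" probabilities), so each stage is a binary comparison on a progressively smaller subset of individuals.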
Say that age and gender are the two factors we want to examine at each stage. My [incorrect] attempt using nested logistic regression, on a simulated example (data attached), is shown below.
What is the correct syntax for fitting this model? Thanks.
______________________________________________________
// viz data
. tab choice_c4
  choice_c4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         18       18.00       18.00   (alive)
          1 |         33       33.00       51.00   (death from causes other than cancer)
          2 |         33       33.00       84.00   (death from cancers other than leukemia)
          3 |         16       16.00      100.00   (death from leukemia)
------------+-----------------------------------
      Total |        100      100.00
. list id choice_c4 age sex_c2 in f/5,noobs
+-----------------------------------+
| id choice~4 age sex_c2 |
|-----------------------------------|
| 1 3 27.66556 1 |
| 2 2 45.42451 0 |
| 3 1 53.2578 0 |
| 4 3 41.86436 1 |
| 5 2 35.91204 1 |
+-----------------------------------+
.
. // create choice indicators
. generate choice0 = (choice_c4 == 0)
. generate choice1 = (choice_c4 == 1)
. generate choice2 = (choice_c4 == 2)
. generate choice3 = (choice_c4 == 3)
.
. // format to long
. reshape long choice, i(id) j(myclass)
(j = 0 1 2 3)
Data Wide -> Long
-------------------------------------------------------------------
> ----------
Number of observations 100 -> 400
Number of variables 8 -> 6
j variable (4 values) -> myclass
xij variables:
choice0 choice1 ... choice3 -> choice
-------------------------------------------------------------------
> ----------
. drop choice_c4
. list id myclass age sex_c2 choice in f/10,noobs
+-------------------------------------------+
| id myclass age sex_c2 choice |
|-------------------------------------------|
| 1 0 27.66556 1 0 |
| 1 1 27.66556 1 0 |
| 1 2 27.66556 1 0 |
| 1 3 27.66556 1 1 |
| 2 0 45.42451 0 0 |
|-------------------------------------------|
| 2 1 45.42451 0 0 |
| 2 2 45.42451 0 1 |
| 2 3 45.42451 0 0 |
| 3 0 53.2578 0 0 |
| 3 1 53.2578 0 1 |
+-------------------------------------------+
.
. // we will produce the tree architecture for the nested logistic
> regression
.
. nlogitgen top = myclass(A: 0, BCD: 1|2|3)
New variable top is generated with 2 groups
label list lb_top
lb_top:
1 A
2 BCD
. nlogitgen middle = myclass(B: 1, CD: 2|3)
New variable middle is generated with 2 groups
label list lb_middle
lb_middle:
1 B
2 CD
. nlogitgen bottom = myclass(C: 2, D: 3)
New variable bottom is generated with 2 groups
label list lb_bottom
lb_bottom:
1 C
2 D
.
. // run the nested logistic regression model
. nlogit choice age sex_c2 || top: || middle: || bottom:, case(id)
no cases remain after removing invalid observations
r(2000);
end of do-file
r(2000);
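For comparison, one alternative that is sometimes used for a nested response scale like this is to fit each stage as its own binary logit on the relevant subset. The following is only a minimal sketch under assumptions, not a fix for the nlogit call above: it assumes the original wide data (one row per id, before the reshape and before choice_c4 is dropped), and the variable names dead, cancer and leukemia are hypothetical.

```stata
* Assumes wide data: one row per id, with choice_c4, age and sex_c2.
* The new variables dead, cancer and leukemia are hypothetical names.

* Stage 1: dead (1) vs alive (0), all individuals
generate byte dead = (choice_c4 > 0) if !missing(choice_c4)
logit dead age sex_c2

* Stage 2: cancer death (1) vs non-cancer death (0), deaths only
generate byte cancer = (choice_c4 >= 2) if inlist(choice_c4, 1, 2, 3)
logit cancer age sex_c2

* Stage 3: leukemia (1) vs other cancer (0), cancer deaths only
generate byte leukemia = (choice_c4 == 3) if inlist(choice_c4, 2, 3)
logit leukemia age sex_c2
```

Each stage-specific outcome is set to missing outside its subset, so logit drops those rows automatically and each fit uses only the individuals at risk at that stage. Whether three separate fits are appropriate here, rather than a single nlogit model, is a separate question that this sketch does not settle.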