Dear all, I have calculated a “Diversity Index” (DI) for a given population. Per the Census website: “the DI tells us the chance that two people chosen at random will be from different racial and ethnic groups ... The DI is bounded between 0 and 1, with a zero-value indicating that everyone in the population has the same racial and ethnic characteristics, while a value close to 1 indicates that everyone in the population has different characteristics.” (The full equation is given below.)
I’m running a fractional logit model with the DI as the dependent variable and year as the independent variable. I’d like to plot the trend line and 95% CIs around the trend. Here is my code. First, I use dataex to show the data; then I show the model with continuous year; then the model with categorical year.
Questions:
1) Are there any obvious problems with this approach? In particular, I am not sure whether I need to adjust the fractional logit code for the fact that this is the same group of individuals over time, or whether I should use a different approach altogether.
2) Is it better to include year as c.year_r or i.year_r? The plots look quite different.
I am using Stata 14.
Thank you!!!
----------------------------------------------------------------------------------------------------------
FYI, DIVERSITY INDEX EQUATION BELOW:
Diversity Index Equation
DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²)
H is the proportion of the population who are Hispanic or Latino.
W is the proportion of the population who are White alone, not Hispanic or Latino.
B is the proportion of the population who are Black or African American alone, not Hispanic or Latino.
AIAN is the proportion of the population who are American Indian and Alaska Native alone, not Hispanic or Latino.
Asian is the proportion of the population who are Asian alone, not Hispanic or Latino.
NHPI is the proportion of the population who are Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino.
SOR is the proportion of the population who are Some Other Race alone, not Hispanic or Latino.
Multi is the proportion of the population who are Two or More Races, not Hispanic or Latino.
Source: https://www.census.gov/library/visua...20-census.html
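To make the formula concrete, here is a minimal sketch in Stata that computes the DI for one population. The eight shares below are made up purely for illustration (they are not my data); the only requirement is that they sum to 1.

```stata
* Hypothetical group shares, invented for illustration only (they sum to 1)
clear
input float(H W B AIAN Asian NHPI SOR Multi)
.20 .55 .12 .01 .06 .01 .01 .04
end

* DI = 1 minus the sum of the squared shares
generate double DI = 1 - (H^2 + W^2 + B^2 + AIAN^2 + Asian^2 + NHPI^2 + SOR^2 + Multi^2)
list DI
```

With these made-up shares the sum of squares is .3624, so the DI comes out at roughly 0.64.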
Code:
******************************DATA
dataex di_rev year_r

----------------------- copy starting from the next line -----------------------
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(di_rev year_r)
.34123 1
.35147 2
.36345 3
.37255 4
.39094 5
.39714 6
.39895 7
end
------------------ copy up to and including the previous line ------------------

Listed 7 out of 7 observations
Code:
************************************OPTION 1: WITH CONTINUOUS YEAR
. fracreg logit di_rev c.year_r

Iteration 0:   log pseudolikelihood = -5.3012582
Iteration 1:   log pseudolikelihood = -4.6198733
Iteration 2:   log pseudolikelihood = -4.6196722
Iteration 3:   log pseudolikelihood = -4.6196722

Fractional logistic regression                  Number of obs     =          7
                                                Wald chi2(1)      =     163.74
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -4.6196722               Pseudo R2         =     0.0014

------------------------------------------------------------------------------
             |               Robust
      di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      year_r |   .0446101   .0034862    12.80   0.000     .0377772     .051443
       _cons |  -.6959255   .0097576   -71.32   0.000    -.7150501   -.6768008
------------------------------------------------------------------------------

. quietly margins, at(year_r=(1(1)7))

. marginsplot

  Variables that uniquely identify margins: year_r
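For reference, my understanding is that the points plotted by marginsplot here are the inverse logit of the linear predictor at each year. A rough sketch that recomputes them by hand from the OPTION 1 coefficients printed above (typed in manually, so the results are rounded):

```stata
* Predicted DI under the continuous-year model: invlogit(_cons + b*year_r)
* Coefficients are copied by hand from the OPTION 1 output
forvalues t = 1/7 {
    display "year_r = `t': " %6.5f invlogit(-.6959255 + .0446101*`t')
}
```

This only reproduces the point estimates; the confidence intervals still come from margins.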
***********************************OPTION 2: WITH CATEGORICAL YEAR:
Code:
. fracreg logit di_rev i.year_r
note: 7.year_r omitted because of collinearity

Iteration 0:   log pseudolikelihood = -5.3011755
Iteration 1:   log pseudolikelihood = -4.6196655
Iteration 2:   log pseudolikelihood = -4.6194615
Iteration 3:   log pseudolikelihood = -4.6194615

Fractional logistic regression                  Number of obs     =          7
                                                Wald chi2(0)      =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -4.6194615               Pseudo R2         =     0.0015

------------------------------------------------------------------------------
             |               Robust
      di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      year_r |
          2  |   .0452339   1.05e-11  4.3e+09   0.000     .0452339    .0452339
          3  |   .0973966   6.01e-11  1.6e+09   0.000     .0973966    .0973966
          4  |    .136525   1.20e-10  1.1e+09   0.000      .136525     .136525
          5  |   .2144551   2.28e-10  9.4e+08   0.000     .2144551    .2144551
          6  |   .2404216   2.45e-10  9.8e+08   0.000     .2404216    .2404216
          7  |   .2479757   2.47e-10  1.0e+09   0.000     .2479757    .2479757
             |
       _cons |  -.6578177   1.96e-13 -3.4e+12   0.000    -.6578177   -.6578177
------------------------------------------------------------------------------

. quietly margins i.year_r

. marginsplot

  Variables that uniquely identify margins: year_r
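One thing I noticed while checking this output: with 7 observations and 7 estimated parameters (the constant plus six year indicators), the categorical model appears to fit each year exactly, which may be why the standard errors are essentially zero. A quick sketch that back-transforms two of the printed coefficients for comparison with the raw data:

```stata
* Back-transform OPTION 2 estimates to the DI scale
* Coefficients are copied by hand from the output above
display invlogit(-.6578177)              // year 1: compare with di_rev = .34123
display invlogit(-.6578177 + .2479757)   // year 7: compare with di_rev = .39895
```

If the two displayed values match the data, the i.year_r plot is just the observed series and its confidence intervals carry no real information.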