Hi everyone!
I am posting some questions here in the hope of getting some help, because I have not been able to find answers to them despite everything that has been posted on quantile regression and related topics.
First, what I want to study: given the increasing demand for organic foods, driven mainly by health and environmental concerns, I want to study whether the sociodemographic characteristics of households have a different impact on households with a higher share of organic/bio food products in their annual shopping basket than on those with a lower share, and in which product groups these differences are observed. This information would be useful for designing targeted policies and strategies to promote organic food consumption among specific sociodemographic groups and/or product types.
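Comparing low-share and high-share households is what the check (pinball) loss of quantile regression is built for: rho_tau(u) = u * (tau - 1[u < 0]). With a constant-only model, the minimiser of this loss is the sample tau-quantile; with covariates, each coefficient describes a shift in the tau-th conditional quantile, so an effect can differ between the lower and upper parts of the distribution. The Python sketch below is only an illustration: the shares are made up (not the data in this post), and the brute-force search stands in for the linear-programming solvers that quantile regression software actually uses.

```python
from __future__ import annotations


def check_loss(u: float, tau: float) -> float:
    """Check loss: positive residuals weighted by tau, negative ones by (1 - tau)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(c: float, ys: list[float], tau: float) -> float:
    """Sum of check losses when every observation is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in ys)


def best_constant(ys: list[float], tau: float) -> float:
    """Constant minimising the total check loss, searched over the observed values.

    The objective is convex and piecewise linear with kinks at the data points,
    so a minimiser is always attained at one of the observed values.
    """
    return min(ys, key=lambda c: total_loss(c, ys, tau))


# Made-up organic shares, for illustration only.
shares = [0.0, 0.02, 0.05, 0.10, 0.12, 0.30, 0.37, 0.51, 0.82]

for tau in (0.25, 0.5, 0.75):
    print(tau, best_constant(shares, tau))  # the sample 25th, 50th and 75th percentiles
```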
Now, my data: I have a sample of roughly 1,800 households observed annually over 4 years (2016-2019), with data on the share of ecological/bio food products in their annual shopping basket for several food groups (e.g., fruits, meats, grains, dairy) and regions (e.g., Region 1, Region 2, Region 3, ...). The panel is unbalanced.
My dependent variable, ECOsh, is the share of ecological/bio products in the household's annual shopping basket; it is continuous but bounded between 0 and 1.
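Because the share is bounded, a plain log transform is problematic: log(0) is undefined, so any zero share would be lost. A common alternative is the log-odds (logit) transform, which maps (0, 1) onto the whole real line but still needs the boundary values 0 and 1 to be clipped. The Python sketch below contrasts the two on illustrative values; the clipping constant eps = 1e-4 is an arbitrary, hypothetical choice.

```python
import math


def log_share(s: float) -> float:
    """Natural log of a share; math.log raises ValueError when s == 0."""
    return math.log(s)


def logit_share(s: float, eps: float = 1e-4) -> float:
    """Log-odds of a share after clipping it to [eps, 1 - eps] so 0 and 1 stay finite.

    eps is a hypothetical choice; results for shares exactly at 0 or 1 depend on it.
    """
    s = min(max(s, eps), 1.0 - eps)
    return math.log(s / (1.0 - s))


for s in (0.0, 0.004051853, 0.5, 0.8186947):
    try:
        lg = log_share(s)
    except ValueError:  # log(0) is undefined
        lg = float("nan")
    print(f"share={s:<12} log={lg:8.3f} logit={logit_share(s):8.3f}")
```

One property that makes the logit scale convenient for quantile regression in particular: conditional quantiles are equivariant to monotone increasing transformations, so quantiles fitted on the logit scale can in principle be mapped back with the inverse logit, 1 / (1 + exp(-z)), and stay inside (0, 1).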
The explanatory variables are the sociodemographic characteristics of the household, coded as categorical variables plus one dummy variable. For example:
- age of the shopper: < 30; 30 <= x < 60; >= 60;
- household size (members): 1-2, 3, 4+;
- activity status (dummy): employed/unemployed;
- number of children at home: 0, 1, 2, 3+;
- annual income level: < 20000; 20000 <= x < 30000; 30000 <= x < 40000; 40000 <= x < 50000; >= 50000.
The preliminary statistical analysis points to clear heterogeneity in the data across each sociodemographic variable.
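Categorical regressors like these enter a regression as a set of 0/1 indicators with one omitted reference category, so each coefficient is the difference from that reference group (in a quantile regression, the difference in the tau-th conditional quantile). Stata's factor-variable notation, for example i.id_incomelevel, builds these indicators automatically. The Python sketch below only illustrates the coding; the assumption that id_incomelevel runs from 1 (< 20000) to 5 (>= 50000) is hypothetical.

```python
from __future__ import annotations


def dummy_code(code: int, levels: list[int], base: int, prefix: str) -> dict[str, int]:
    """Expand one categorical code into k-1 indicators, omitting the base (reference) level."""
    if code not in levels:
        raise ValueError(f"unknown category code: {code}")
    return {f"{prefix}{lvl}": int(code == lvl) for lvl in levels if lvl != base}


# Hypothetical coding: 1 = < 20000 (reference group), ..., 5 = >= 50000.
INCOME_CODES = [1, 2, 3, 4, 5]

print(dummy_code(3, INCOME_CODES, base=1, prefix="inc_"))  # one indicator switched on
print(dummy_code(1, INCOME_CODES, base=1, prefix="inc_"))  # reference group: all zeros
```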
Below is an excerpt of my data, followed by my questions.
```stata
* Example generated by -dataex-. For more info, type help dataex
clear
input float(EXPsh id_foodgroup id_region id_activity id_hhsize id_incomelevel id_age id_numchild) int Año
    .3711231 1 4 1 1 1 2 1 2019
    .8186947 2 5 1 3 2 1 1 2017
   .50880945 2 5 1 2 3 3 0 2019
    .3795943 4 1 1 2 2 3 1 2016
    .1115448 5 5 1 2 2 1 1 2018
    .0852778 5 7 2 2 3 1 0 2019
    .3037719 5 7 1 1 2 2 1 2019
   .02551874 3 4 1 3 1 2 3 2018
  .123654897 3 6 2 1 2 2 1 2017
  .066647395 4 3 2 1 1 1 1 2016
  .004051853 5 3 1 3 3 1 2 2017
end
```
1. Is a panel quantile regression approach adequate for the purpose of this study? I looked at the book by Hao and Naiman (2007) and found an example with income, a categorical variable (education groups by years of schooling) and a dummy variable (race group) in a cross-sectional setting, but nothing in a panel setting.
2. Given that the analysis is conducted by food group, could -xtqreg- be applied here, even though the number of years is small (only 4)? Or would it be more appropriate to run -qreg- year by year?
3. If the answer to question 2 is positive, would it be necessary to apply a jackknife (or bootstrap) bias correction?
4. Given that the dependent variable is a rate (a share bounded between 0 and 1), would it be advisable to use it in logs rather than as a share?
5. If the answer to question 2 is negative, and quantile regression is therefore not the best approach, what alternative model might work for this type of data to answer the main question? Note that the explanatory variables may vary very little within households over the sample period.
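The last question notes that the explanatory variables may barely change over 2016-2019. One way to quantify this, in the spirit of Stata's -xtsum-, is to split each regressor's standard deviation into a between-household and a within-household part. The Python sketch below does this for a small hypothetical unbalanced panel; the household ids and income codes are made up (the data excerpt in this post has no household identifier).

```python
from __future__ import annotations

import math
from collections import defaultdict


def sd(xs: list[float]) -> float:
    """Sample standard deviation with an n - 1 denominator."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def within_between(panel: list[tuple[int, int, float]]) -> dict[str, float]:
    """Overall, between and within SDs of x from (household_id, year, x) rows.

    between: SD of the household means; within: SD of x_it - mean_i + grand mean,
    i.e. the variation left after removing each household's own average.
    """
    by_hh: dict[int, list[float]] = defaultdict(list)
    for hh, _year, x in panel:
        by_hh[hh].append(x)
    xs = [x for _, _, x in panel]
    grand = sum(xs) / len(xs)
    means = {hh: sum(v) / len(v) for hh, v in by_hh.items()}
    return {
        "overall": sd(xs),
        "between": sd(list(means.values())),
        "within": sd([x - means[hh] + grand for hh, _, x in panel]),
    }


def years_per_household(panel: list[tuple[int, int, float]]) -> dict[int, int]:
    """Number of observed years per household (unequal in an unbalanced panel)."""
    counts: dict[int, int] = defaultdict(int)
    for hh, _year, _x in panel:
        counts[hh] += 1
    return dict(counts)


# Hypothetical unbalanced panel: (household id, year, income-level code), made-up values.
panel = [
    (1, 2016, 2), (1, 2017, 2), (1, 2018, 2), (1, 2019, 2),
    (2, 2016, 3), (2, 2017, 3), (2, 2018, 4),
    (3, 2017, 1), (3, 2018, 1), (3, 2019, 1),
    (4, 2018, 5), (4, 2019, 5),
]

print(years_per_household(panel))
print(within_between(panel))
```

In this toy panel only household 2 changes income level, so the within SD is small relative to the between SD. When a regressor's within SD is close to zero, estimators that rely only on within-household changes (such as fixed-effects approaches) have very little variation left to identify its effect.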
Best,
Hugo