Dear Statalist members,
I am working with food consumption data. Each participant describes every food that he/she consumed during two consecutive days, "int_num" is coded 21 for day 1 and 22 for day 2.
Each food is entered as a new row in the database and is categorised by a food group, which appears as the variable "group". There are 18 groups, coded from 1 to 18. A quantity ("cons_qty") is also associated with every food consumed.
As for now, if a food group was not consumed, there is no row in the database with an associated 0 value for "cons_qty", it simply doesn't exist.
Here is an example of my dataset, participant 1 is from lines 1-70, participant 2 lines 70-100:
I want to describe mean consumptions of the different food groups, in different subsets: 1) only the participants that consumed that food group and 2) all participants taking into account also those who did not consume that food group.
I have generated new variables using the command tag: "tag_people" to identify uniquely one participant, "tag_recall" one day and "tag_group" one group (see example dataset). I am working with the average consumption per food group per participant over the two days, the variable "mean_qty_per_group_per_person".
As participants who did not consume the food do not appear there, I face some issues to generate mean consumptions for all participants. I've tried using a loop where I manually enter the total number of participants to obtain the mean:
This works but if there is a change in the database this could lead to errors. Plus, I would need data concerning participants who did not consume a food group for further statistical analysis.
My idea was then to create a new row for each group that was not consumed and to allocate a value of 0 for "cons_qty". The conditions would be to identify for a given "patient_id" if there is a number missing between 1 and 18 for the variable "group". If a number is missing, create a new row with the patient_id, indicate the missing number for the variable "group" and generate a value of 0 for the variable "cons_qty". For example, for participant 1 in the dataset above, the group 3 was not consumed. I would like to create a new row with patient_id= 01, group= 3 and cons_qty=0.
I've checked Statalist and found this interesting post: https://www.statalist.org/forums/for...w-observations, but I did not manage to translate it to my case and have also looked the expand command.
Would you have any suggestions on how to implement the above? or other approaches that would work?
Thank you,
Constance
I am working with food consumption data. Each participant describes every food that he/she consumed during two consecutive days, "int_num" is coded 21 for day 1 and 22 for day 2.
Each food is entered as a new row in the database and is categorised by a food group, which appears as the variable "group". There are 18 groups, coded from 1 to 18. A quantity ("cons_qty") is also associated with every food consumed.
As for now, if a food group was not consumed, there is no row in the database with an associated 0 value for "cons_qty", it simply doesn't exist.
Here is an example of my dataset, participant 1 is from lines 1-70, participant 2 lines 70-100:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte(int_num tag_recall tag_people tag_number_recalls_consumed) float number_recalls_consumed byte group float(cons_qty qty_per_group_per_rec mean_qty_per_group_per_person) 21 0 0 1 2 1 114 114 179.5 22 0 0 0 2 1 245 245 179.5 21 0 0 0 . 2 62 . . 22 0 0 0 . 2 23.179415 . . 21 0 0 1 2 2 24.886913 243.4164 321.4293 22 0 0 0 . 2 .25754905 . . 22 0 0 0 . 2 42.92484 . . 21 0 0 0 . 2 40 . . 22 0 0 0 . 2 18 . . 22 0 0 0 . 2 200 . . 22 0 0 0 2 2 9.954765 399.4421 321.4293 22 0 0 0 . 2 8.71042 . . 22 1 0 0 . 2 9.577605 . . 22 0 0 0 . 2 29.864294 . . 22 0 0 0 . 2 24.886913 . . 21 0 0 0 . 2 9.954765 . . 21 0 0 0 . 2 68 . . 21 0 0 0 . 2 29.864294 . . 21 1 1 0 . 2 8.71042 . . 22 0 0 0 . 2 9.65809 . . 22 0 0 0 . 2 22.42823 . . 22 0 0 0 2 4 117 373.5 373.5 21 0 0 0 . 4 100.5 . . 22 0 0 0 . 4 156 . . 22 0 0 0 . 4 100.5 . . 21 0 0 0 . 4 156 . . 21 0 0 1 2 4 117 373.5 373.5 22 0 0 0 . 5 68.67975 . . 21 0 0 0 . 5 30 . . 21 0 0 0 . 5 150 . . 22 0 0 0 . 5 14.487134 . . 22 0 0 0 . 5 150 . . 22 0 0 0 2 5 150 383.1669 356.5834 21 0 0 1 2 5 150 330 356.5834 22 0 0 0 . 6 9.65809 . . 22 0 0 0 . 6 98.72713 . . 22 0 0 0 . 6 20.5 . . 21 0 0 0 . 6 60 . . 21 0 0 0 . 6 49.77383 . . 22 0 0 0 2 6 49.77383 178.65906 154.46645 21 0 0 1 2 6 20.5 130.27383 154.46645 21 0 0 0 . 7 24.886913 . . 21 0 0 0 . 7 40 . . 22 0 0 0 . 7 150 . . 21 0 0 1 2 7 75 239.8869 235.28806 22 0 0 0 . 7 24.886913 . . 22 0 0 0 2 7 55.80229 230.6892 235.28806 21 0 0 0 . 7 100 . . 21 0 0 1 1 9 51.92 51.92 25.96 22 0 0 0 . 10 3.219363 . . 22 0 0 0 2 10 6.438726 9.658089 15.089045 21 0 0 1 2 10 20.52 20.52 15.089045 21 0 0 1 1 12 75 75 37.5 22 0 0 0 2 13 3200 3200 3250 21 0 0 1 2 13 3300 3300 3250 21 0 0 0 . 15 20 . . 22 0 0 0 . 15 .04292484 . . 21 0 0 0 . 15 5.972859 . . 22 0 0 0 . 15 5.972859 . . 22 0 0 0 . 15 .04292484 . . 21 0 0 1 2 15 4.75 167.92287 94.12877 22 0 0 0 2 15 .53656054 20.334673 94.12877 22 0 0 0 . 15 11.4 . . 21 0 0 0 . 15 13.2 . . 22 0 0 0 . 15 .8584968 . . 22 0 0 0 . 15 .53656054 . . 22 0 0 0 . 15 .04292484 . . 22 0 0 0 . 15 .8584968 . . 21 0 0 0 . 15 124 . . 22 0 0 0 . 15 .04292484 . . 22 0 0 1 1 1 160 160 80 22 0 0 0 . 2 3.356522 . . 22 0 0 1 1 2 16.782608 20.13913 10.069565 21 0 0 1 2 4 75 75 161.7 22 0 0 0 2 4 248.4 248.4 161.7 22 0 0 0 2 5 8.391304 8.391304 66.69565 21 0 0 1 2 5 125 125 66.69565 22 0 0 0 . 6 59.598 . . 22 0 0 0 2 6 58.05 182.26105 140.33052 21 0 0 1 2 6 98.4 98.4 140.33052 22 0 0 0 . 6 64.613045 . . 22 0 0 0 2 7 52.86522 52.86522 126.4326 21 0 0 1 2 7 200 200 126.4326 22 0 0 0 . 10 30.208696 . . 22 0 0 0 2 10 12.8 43.0087 22.25435 21 0 0 1 2 10 1.5 1.5 22.25435 22 0 0 0 . 11 40 . . 22 0 0 0 2 11 46 86 67 21 0 0 1 2 11 48 48 67 22 0 0 1 1 12 80 80 40 22 0 0 0 2 13 75 2087.5 2037.5 21 0 0 0 . 13 600 . . 22 0 0 0 . 13 600 . . 21 1 1 0 . 13 300 . . 21 0 0 0 . 13 75 . . 22 0 0 0 . 13 300 . . 21 0 0 0 . 13 37.5 . . 21 0 0 0 . 13 900 . . 22 0 0 0 . 13 300 . . 22 0 0 0 . 13 300 . . end
I have generated new variables using the command tag: "tag_people" to identify uniquely one participant, "tag_recall" one day and "tag_group" one group (see example dataset). I am working with the average consumption per food group per participant over the two days, the variable "mean_qty_per_group_per_person".
As participants who did not consume the food do not appear there, I face some issues to generate mean consumptions for all participants. I've tried using a loop where I manually enter the total number of participants to obtain the mean:
Code:
foreach var of local food_pattern { capt drop `var' gen `var'=0 replace `var'= mean_qty_per_group_per_person if group==`i' & tag_number_recalls_consumed==1 capt drop meanqty_`var' egen meanqty_`var' = total(`var'/123--> the total number of participants) local i=`i'+1 }
My idea was then to create a new row for each group that was not consumed and to allocate a value of 0 for "cons_qty". The conditions would be to identify for a given "patient_id" if there is a number missing between 1 and 18 for the variable "group". If a number is missing, create a new row with the patient_id, indicate the missing number for the variable "group" and generate a value of 0 for the variable "cons_qty". For example, for participant 1 in the dataset above, the group 3 was not consumed. I would like to create a new row with patient_id= 01, group= 3 and cons_qty=0.
I've checked Statalist and found this interesting post: https://www.statalist.org/forums/for...w-observations, but I did not manage to translate it to my case and have also looked the expand command.
Would you have any suggestions on how to implement the above? or other approaches that would work?
Thank you,
Constance
Comment