  • Creating new rows in a database

    Dear Statalist members,

    I am working with food consumption data. Each participant describes every food that he/she consumed during two consecutive days; "int_num" is coded 21 for day 1 and 22 for day 2.

    Each food is entered as a new row in the database and is categorised by a food group, which appears as the variable "group". There are 18 groups, coded from 1 to 18. A quantity ("cons_qty") is also associated with every food consumed.

    As of now, if a food group was not consumed, there is no row in the database with an associated 0 value for "cons_qty"; the row simply doesn't exist.

    Here is an example of my dataset; participant 1 is on lines 1-70 and participant 2 on lines 70-100:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(int_num tag_recall tag_people tag_number_recalls_consumed) float number_recalls_consumed byte group float(cons_qty qty_per_group_per_rec mean_qty_per_group_per_person)
    21 0 0 1 2  1       114       114     179.5
    22 0 0 0 2  1       245       245     179.5
    21 0 0 0 .  2        62         .         .
    22 0 0 0 .  2 23.179415         .         .
    21 0 0 1 2  2 24.886913  243.4164  321.4293
    22 0 0 0 .  2 .25754905         .         .
    22 0 0 0 .  2  42.92484         .         .
    21 0 0 0 .  2        40         .         .
    22 0 0 0 .  2        18         .         .
    22 0 0 0 .  2       200         .         .
    22 0 0 0 2  2  9.954765  399.4421  321.4293
    22 0 0 0 .  2   8.71042         .         .
    22 1 0 0 .  2  9.577605         .         .
    22 0 0 0 .  2 29.864294         .         .
    22 0 0 0 .  2 24.886913         .         .
    21 0 0 0 .  2  9.954765         .         .
    21 0 0 0 .  2        68         .         .
    21 0 0 0 .  2 29.864294         .         .
    21 1 1 0 .  2   8.71042         .         .
    22 0 0 0 .  2   9.65809         .         .
    22 0 0 0 .  2  22.42823         .         .
    22 0 0 0 2  4       117     373.5     373.5
    21 0 0 0 .  4     100.5         .         .
    22 0 0 0 .  4       156         .         .
    22 0 0 0 .  4     100.5         .         .
    21 0 0 0 .  4       156         .         .
    21 0 0 1 2  4       117     373.5     373.5
    22 0 0 0 .  5  68.67975         .         .
    21 0 0 0 .  5        30         .         .
    21 0 0 0 .  5       150         .         .
    22 0 0 0 .  5 14.487134         .         .
    22 0 0 0 .  5       150         .         .
    22 0 0 0 2  5       150  383.1669  356.5834
    21 0 0 1 2  5       150       330  356.5834
    22 0 0 0 .  6   9.65809         .         .
    22 0 0 0 .  6  98.72713         .         .
    22 0 0 0 .  6      20.5         .         .
    21 0 0 0 .  6        60         .         .
    21 0 0 0 .  6  49.77383         .         .
    22 0 0 0 2  6  49.77383 178.65906 154.46645
    21 0 0 1 2  6      20.5 130.27383 154.46645
    21 0 0 0 .  7 24.886913         .         .
    21 0 0 0 .  7        40         .         .
    22 0 0 0 .  7       150         .         .
    21 0 0 1 2  7        75  239.8869 235.28806
    22 0 0 0 .  7 24.886913         .         .
    22 0 0 0 2  7  55.80229  230.6892 235.28806
    21 0 0 0 .  7       100         .         .
    21 0 0 1 1  9     51.92     51.92     25.96
    22 0 0 0 . 10  3.219363         .         .
    22 0 0 0 2 10  6.438726  9.658089 15.089045
    21 0 0 1 2 10     20.52     20.52 15.089045
    21 0 0 1 1 12        75        75      37.5
    22 0 0 0 2 13      3200      3200      3250
    21 0 0 1 2 13      3300      3300      3250
    21 0 0 0 . 15        20         .         .
    22 0 0 0 . 15 .04292484         .         .
    21 0 0 0 . 15  5.972859         .         .
    22 0 0 0 . 15  5.972859         .         .
    22 0 0 0 . 15 .04292484         .         .
    21 0 0 1 2 15      4.75 167.92287  94.12877
    22 0 0 0 2 15 .53656054 20.334673  94.12877
    22 0 0 0 . 15      11.4         .         .
    21 0 0 0 . 15      13.2         .         .
    22 0 0 0 . 15  .8584968         .         .
    22 0 0 0 . 15 .53656054         .         .
    22 0 0 0 . 15 .04292484         .         .
    22 0 0 0 . 15  .8584968         .         .
    21 0 0 0 . 15       124         .         .
    22 0 0 0 . 15 .04292484         .         .
    22 0 0 1 1  1       160       160        80
    22 0 0 0 .  2  3.356522         .         .
    22 0 0 1 1  2 16.782608  20.13913 10.069565
    21 0 0 1 2  4        75        75     161.7
    22 0 0 0 2  4     248.4     248.4     161.7
    22 0 0 0 2  5  8.391304  8.391304  66.69565
    21 0 0 1 2  5       125       125  66.69565
    22 0 0 0 .  6    59.598         .         .
    22 0 0 0 2  6     58.05 182.26105 140.33052
    21 0 0 1 2  6      98.4      98.4 140.33052
    22 0 0 0 .  6 64.613045         .         .
    22 0 0 0 2  7  52.86522  52.86522  126.4326
    21 0 0 1 2  7       200       200  126.4326
    22 0 0 0 . 10 30.208696         .         .
    22 0 0 0 2 10      12.8   43.0087  22.25435
    21 0 0 1 2 10       1.5       1.5  22.25435
    22 0 0 0 . 11        40         .         .
    22 0 0 0 2 11        46        86        67
    21 0 0 1 2 11        48        48        67
    22 0 0 1 1 12        80        80        40
    22 0 0 0 2 13        75    2087.5    2037.5
    21 0 0 0 . 13       600         .         .
    22 0 0 0 . 13       600         .         .
    21 1 1 0 . 13       300         .         .
    21 0 0 0 . 13        75         .         .
    22 0 0 0 . 13       300         .         .
    21 0 0 0 . 13      37.5         .         .
    21 0 0 0 . 13       900         .         .
    22 0 0 0 . 13       300         .         .
    22 0 0 0 . 13       300         .         .
    end
    I want to describe the mean consumption of the different food groups in two subsets: 1) only the participants who consumed that food group, and 2) all participants, including those who did not consume it.

    I have generated new variables using egen's tag() function: "tag_people" uniquely identifies one participant, "tag_recall" one day, and "tag_group" one group (see the example dataset). I am working with the average consumption per food group per participant over the two days, the variable "mean_qty_per_group_per_person".
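    For reference, tag variables of this kind can be generated with egen's tag() function along these lines (assuming a participant identifier called person_id, which is not shown in the excerpt above):

    Code:
    * Sketch: one-per-person, one-per-person-day, and one-per-person-group markers
    egen tag_people = tag(person_id)
    egen tag_recall = tag(person_id int_num)
    egen tag_group  = tag(person_id group)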

    Since participants who did not consume a food group do not appear in the data, I have trouble generating mean consumptions over all participants. I've tried a loop where I manually enter the total number of participants to obtain the mean:

    Code:
    local i = 1
    foreach var of local food_pattern {

        capture drop `var'
        gen `var' = 0
        replace `var' = mean_qty_per_group_per_person if group == `i' & tag_number_recalls_consumed == 1

        capture drop meanqty_`var'
        * 123 is the total number of participants, entered manually
        egen meanqty_`var' = total(`var'/123)

        local i = `i' + 1
    }
    This works, but any change in the database could lead to errors. Plus, I will need data for participants who did not consume a food group for further statistical analysis.
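    One way to make the participant count robust to changes in the database would be to compute it rather than hard-code 123, for example (again assuming a participant identifier called person_id):

    Code:
    * Count participants dynamically instead of hard-coding 123
    egen byte _tagp = tag(person_id)
    count if _tagp
    local n_people = r(N)
    drop _tagp
    * then divide by `n_people' instead of 123 in the egen total() above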

    My idea was then to create a new row for each group that was not consumed and to allocate a value of 0 to "cons_qty". The conditions would be: for a given "patient_id", identify whether a number between 1 and 18 is missing for the variable "group". If a number is missing, create a new row with that patient_id, the missing number for the variable "group", and a value of 0 for the variable "cons_qty". For example, for participant 1 in the dataset above, group 3 was not consumed; I would like to create a new row with patient_id = 01, group = 3 and cons_qty = 0.

    I've checked Statalist and found this interesting post: https://www.statalist.org/forums/for...w-observations, but I did not manage to translate it to my case. I have also looked at the expand command.

    Would you have any suggestions on how to implement the above, or other approaches that would work?

    Thank you,
    Constance




  • #2
    If I understand you correctly, you want to have a data set in which there is an observation ("row") for each person on each day for each food group. And you want the cons_qty variable set to zero in those observations that, at present, simply don't exist in the data.

    That is not possible to do with the data you show in your examples, because there needs to be a variable that identifies which person each observation refers to. Your tag_people variable is presumably somehow related to that, but it does not provide the required information. I will assume that somewhere in your real data there is a variable that identifies individuals, and that it is called person_id. With that, you can do:

    Code:
    fillin person_id int_num group
    replace cons_qty = 0 if _fillin
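
    To make the behaviour concrete: fillin creates an observation for every combination of the listed variables that does not already exist, fills the other variables with missing, and marks the created observations with _fillin == 1, which is why the replace targets them. A toy sketch (hypothetical variable names):

    Code:
    * Two people, two days, groups 1-3, with most combinations absent
    clear
    input byte(person_id int_num group) float cons_qty
    1 21 1 100
    1 22 2  50
    2 21 3  75
    end
    fillin person_id int_num group
    replace cons_qty = 0 if _fillin
    * all 12 person x day x group combinations now exist; the 9 new ones have cons_qty == 0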



    • #3
      Dear Clyde, yes, I indeed have a variable id_num that allows me to identify individuals. Thank you, the code worked really well!



      • #4
        I have another question related to the dataset in #1. I hope it is okay to post it on the same thread, let me know if I should create a new thread instead.

        Each group is further divided into subgroups. The different groups don't have the same number of subgroups; for example, group 4 has 5 subgroups whereas group 7 has 8. I want to create new observations for subgroups that were not consumed, along the same lines as for the groups (see #1).

        I tried using:

        Code:
        fillin person_id int_num group subgroup
        replace cons_qty = 0 if _fillin
        but the problem is that it creates 8 subgroups for each group (8 being the highest number of subgroups within any group in my dataset), regardless of the group concerned.

        I saw that the option by() is not allowed with fillin, and I also tried using if to distinguish between the different groups, but it didn't work either. I checked mipolate but, as I understand it, it doesn't do what I am looking for.

        My fallback would be to use fillin on subsets of my data, keeping only one group at a time, and then append all the subsets afterwards. Yet this doesn't seem like the most efficient way to do it. Would you have another approach to suggest?
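
        For what it's worth, a sketch of that workaround, assuming variables person_id, group (coded 1-18, each appearing at least once in the data) and subgroup:

        Code:
        * fillin within each group separately, then stack the results
        tempfile base stacked
        save `base'
        clear
        save `stacked', emptyok
        forvalues g = 1/18 {
            use if group == `g' using `base', clear
            fillin person_id int_num subgroup
            replace group = `g' if _fillin
            replace cons_qty = 0 if _fillin
            append using `stacked'
            save `stacked', replace
        }
        use `stacked', clear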



        • #5
          Question in #4 was resolved here: https://www.statalist.org/forums/forum/general-stata-discussion/general/1584339-creating-new-observations-by-group

