Hi,
I have a large, long dataset (about 3 million observations) containing timing data about how long participants took on various sections, and questions in a survey. A small example of the dataset is posted below. I want to add up how long each participant took on a given section in the survey, as well as how long it took participants to reach the first question of each new section starting from when they began the survey. However, many of the section names are not all clumped together despite occurring in chronological order – (ex. Participant BBZJE answers some questions in section DS below, and then several in HG, before answering a couple more in section DS. Participant BCBFX has a similar issue with VC and BD sections as well). Furthermore, despite the 'spilt' questions being part of the same section, I need to treat each new clump of questions as if it were a new section. Stated another way, I need to sort by Timestamp before sorting by SectionName to keep it in chronological order. Is there a way to keep Timestamp sorted in chronological order, but sum the times by the name of the section?
David
I have a large, long dataset (about 3 million observations) containing timing data about how long participants took on various sections, and questions in a survey. A small example of the dataset is posted below. I want to add up how long each participant took on a given section in the survey, as well as how long it took participants to reach the first question of each new section starting from when they began the survey. However, many of the section names are not all clumped together despite occurring in chronological order – (ex. Participant BBZJE answers some questions in section DS below, and then several in HG, before answering a couple more in section DS. Participant BCBFX has a similar issue with VC and BD sections as well). Furthermore, despite the 'spilt' questions being part of the same section, I need to treat each new clump of questions as if it were a new section. Stated another way, I need to sort by Timestamp before sorting by SectionName to keep it in chronological order. Is there a way to keep Timestamp sorted in chronological order, but sum the times by the name of the section?
David
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str7 ID str2 SectionName str23 QuestionName str19 Timestamp "BBZJE" "DS" "DS009[20]" "2019-01-29 12:13:12" "BBZJE" "DS" "DS008[21]" "2019-01-29 12:13:16" "BBZJE" "DS" "DS009[21]" "2019-01-29 12:13:18" "BBZJE" "DS" "DS008[21]" "2019-01-29 12:13:21" "BBZJE" "DS" "DS009[21]" "2019-01-29 12:13:24" "BBZJE" "DS" "DS008[23]" "2019-01-29 12:13:28" "BBZJE" "DS" "DS009[23]" "2019-01-29 12:13:30" "BBZJE" "DS" "DS008[31]" "2019-01-29 12:13:34" "BBZJE" "DS" "DS009[31]" "2019-01-29 12:13:36" "BBZJE" "HG" "HG001" "2019-01-29 12:13:40" "BBZJE" "HG" "HG003" "2019-01-29 12:13:42" "BBZJE" "HG" "HG003_other" "2019-01-29 12:13:42" "BBZJE" "HG" "HG004" "2019-01-29 12:13:44" "BBZJE" "HG" "HG006" "2019-01-29 12:13:46" "BBZJE" "HG" "HG201" "2019-01-29 12:13:53" "BBZJE" "DS" "DS001" "2019-01-29 12:13:55" "BBZJE" "DS" "DS003" "2019-01-29 12:13:58" "BBZJE" "LK" "LK001" "2019-01-29 12:14:00" "BBZJE" "FA" "FA001" "2019-01-29 12:14:05" "BBZJE" "la" "lastquestion" "2019-01-29 12:14:07" "BBZWR" "VC" "VC001_form1" "2019-02-09 13:21:19" "BBZWR" "HR" "HR001" "2019-02-09 13:21:43" "BBZWR" "HR" "HR014" "2019-02-09 13:21:45" "BBZWR" "HR" "HR013[7]" "2019-02-09 13:21:47" "BBZWR" "HR" "HR009" "2019-02-09 13:22:10" "BBZWR" "HR" "HR010" "2019-02-09 13:22:14" "BBZWR" "HR" "HR012" "2019-02-09 13:22:18" "BCBFX" "VC" "VC002_form2" "2019-01-09 13:34:38" "BCBFX" "VC" "VC205" "2019-01-09 13:34:46" "BCBFX" "VC" "VC206" "2019-01-09 13:34:46" "BCBFX" "VC" "VC207" "2019-01-09 13:34:46" "BCBFX" "VC" "VC208" "2019-01-09 13:34:46" "BCBFX" "VC" "VC209" "2019-01-09 13:34:46" "BCBFX" "VC" "VC210" "2019-01-09 13:34:46" "BCBFX" "BD" "BD_intro" "2019-01-09 13:35:01" "BCBFX" "BD" "BD001_month" "2019-01-09 13:35:16" "BCBFX" "BD" "BD001_year" "2019-01-09 13:35:16" "BCBFX" "BD" "BD201" "2019-01-09 13:35:29" "BCBFX" "BD" "BD201_other" "2019-01-09 13:35:29" "BCBFX" "BD" "BD202" "2019-01-09 13:35:32" "BCBFX" "BD" "BD209_sh" "2019-01-09 13:35:35" "BCBFX" "BD" "BD209_en" "2019-01-09 13:35:37" "BCBFX" "BD" "BD003" "2019-01-09 13:35:52" "BCBFX" "VC" "VC211" "2019-01-09 13:36:04" "BCBFX" "VC" "VC212" "2019-01-09 13:36:09" "BCBFX" "BD" "BD019" "2019-01-09 13:36:19" "BCBFX" "BD" "BD034" "2019-01-09 13:36:30" end
Comment