How to reshape/convert a household data set in the wide format to long format

Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#1

How to reshape/convert a household data set in the wide format to long format

14 Jun 2022, 15:11

Hi family,
I have a data set that the individual household member's information is presented in the wide format instead of the expected long format(row per every household member with repeated hhID if same household).
I tried my best by using the command reshape wide mid1, i(otherhh_id) j(hhnumber) but it didn't work out.
Any help?.
I attached the dataset for your perusal.

Thank you.
Sham
Attached Files

roster.dta (2.19 MB, 1 view)
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

14 Jun 2022, 15:42

Let me recommend that you take a moment to read the Forum FAQ for advice on ways to optimize your Statalist experience. (Click on FAQ underneath the Statalist banner at the top of the page.) A few particular points you will find there:

1. Attachments are discouraged. This is because many people are reluctant to download attachments from strangers. There is an easy, and very helpful, way to show example data: use the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

2. Don't say that something "didn't work." There are many ways in which something can fail to work as expected. Always give the particulars about how it didn't work. Did Stata crash or hang? Did you get an error message? If so, show the error message exactly as Stata gave it to you. Did it appear to run, but produce results that were not in line with what you wanted to get? If so, show the results you got and, unless it is blatantly obvious to anybody without knowledge of your particular problem, explain or show how the desired results would differ.

If you post back bearing these suggestions in mind, you have a much better chance of getting a timely and helpful response.
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#3

14 Jun 2022, 17:17

Waoo!
Thank you soo much for your advice. I am most grateful for the direction. All issues and comments are well noted.
Will do the dataex and revert with the questions.
Am most grateful.
Sorry for my choice of words.
Comment

Shamsudini Amidu

Join Date: Jun 2022
Posts: 61

15 Jun 2022, 01:18

Hi family once again,
I have a household data set that comes in the form of wide instead of long,. each household member's information is presented in the column istead of row form.
So I need a help on how to convert such dataset from wide to long.
I have tried typing this command

Code:

reshape long mid1 s1q2 s1q3 s1q4dd s1q4mm, i(hhid) j(nhhid)

but Stata generated an error message as showed below;

variable nhhid contains all missing values

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float hhid byte(mid1_01 mid1_02 s1q2_01 s1q2_02 s1q3_01 s1q3_02 s1q4dd_01 s1q4dd_02 s1q4mm_01 s1q4mm_02)
111 1 2 1 2 1 2  1  1 1 1
222 1 2 1 2 1 2 99 99 1 1
end
label values s1q2_01 S1Q2
label values s1q2_02 S1Q2
label def S1Q2 1 "Male", modify
label def S1Q2 2 "Female", modify
label values s1q3_01 S1Q3
label values s1q3_02 S1Q3
label def S1Q3 1 "Head", modify
label def S1Q3 2 "Spouse (Wife/Husband)", modify
label values s1q4dd_01 S1Q4DD
label values s1q4dd_02 S1Q4DD
label def S1Q4DD 99 "DK", modify
label values s1q4mm_01 S1Q4MM
label values s1q4mm_02 S1Q4MM
label def S1Q4MM 1 "January", modify

------------------ copy up to and including the previous line ------------------

Listed 2 out of 2 observations

Any help?

Thank you.
Amidu Shamsudini

Last edited by Shamsudini Amidu; 15 Jun 2022, 01:29.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#5

16 Jun 2022, 12:19

If you take any of your variable sets, e.g. mid1_01 and mid1_02, and remove the mid1, that leaves _01 and _02. But _01 and _02 are not numeric, they are strings because of the underscore character that starts them off. So, to get this command to run you have to specify the -string- option:

Code:

reshape long mid1 s1q2 s1q3 s1q4dd s1q4mm, i(hhid) j(nhhid) string

Now, this may not produce exactly what you were hoping for. I suspect that you would like nhhid to be numeric. If so, follow the above with -destring nhhid, ignore("_") replace-.
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#6

07 Sep 2022, 14:10

Hi Clyde Schechter,
Waooo! thank you so much for your guide.
It works now.
Sorry for the late confirmation I went to the field where there is no net work.
I am most grateful.
Sham
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#7

07 Sep 2022, 14:42

Good evening Family,
I have a longitudinal data set from wave1,2 and 3 append into 1 data set.
Now what I wanted is to drop observation(FPrimaryhhid) that did not appear in all the waves(1,2,3). So for me to keep the observation your key (FPrimaryhhid) must reflect in w1,2 and3
I mean you qualify if only you responded in wave1,2 and 3.

Below is the data structure.
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 fprimaryhhid str9 fprimary byte hhmid long(depressionlevel attendedschl) byte wave
"1010010021" "101001002" 1 3 2 1
"1010010021" "101001002" 1 2 2 2
"1010010021" "101001002" 1 2 2 3
"1010010022" "101001002" 2 4 1 1
"1010010022" "101001002" 2 1 2 2
"1010010022" "101001002" 2 3 2 3
"1010010023" "101001002" 3 4 1 1
"1010010031" "101001003" 1 1 2 1
"1010010031" "101001003" 1 1 2 2
"1010010031" "101001003" 1 3 2 3
"1010010032" "101001003" 2 2 2 2
"1010010032" "101001003" 2 3 2 3
"1010010041" "101001004" 1 1 2 1
"1010010041" "101001004" 1 2 2 2
"1010010042" "101001004" 2 2 1 2
"1010010091" "101001009" 1 1 2 1
"1010010091" "101001009" 1 3 2 2
"1010010091" "101001009" 1 2 2 3
"1010010092" "101001009" 2 1 1 1
end

The command I use is
drop FPrimaryhhid if wave !=1 & wave !=2 & wave !=3
the error I got is;
invalid syntax

I need your help please.

Thank you for reading.
Sham.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#8

07 Sep 2022, 15:31

Stata is a programming language, not a natural language. And Stata cannot figure out your meaning when you do not adhere to its strict syntax rules. The -drop- command allows two different syntaxes:

Code:

drop list_of_variables_to_be_dropped // NOTE: NO if PERMITTED drop if condition_defining_which_observations_to_be_dropped // NOTE: NO VARIABLES CAN BE MENTIONED

In fact, in Stata there are no commands a tall that will drop a variable only in certain observations. A variable either stays or is dropped in its entirety.

Moreover, -if wave!=1 & wave !=2 & wave !=3- would cause Stata only to seek out observations in which the value of the variable wave is neither 1, nor 2, nor 3. And, since wave can only be 1, 2, or 3, there aren't any such observations.

But those things are not what you meant anyway. What you meant is "drop every observations for a given FPrimaryhhid if it does not appear in conjunction with all three values of the variable wave." Here's how you do that:

Code:

assert inlist(wave, 1, 2, 3) // VERIFY wave CAN ONLY BE 1, 2, or 3 by fprimaryhhid (wave), sort: drop if _N < 3
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#9

07 Sep 2022, 16:26

Hi Clyde Schechter,
It worked.
Thank you so much for educating me more about the drop command.
I have learn a lot from that.
God bless you.
Sham
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#10

07 Sep 2022, 16:56

Base on your guide,
I have also try an alternative and it worked.

See command below;

bysort fprimaryhhid : gen respoappearance=_N
drop if hhsize<3

Can you can you confirmed if the above can be considered an alternative.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#11

07 Sep 2022, 18:15

No. The last one should be -drop if respoappearance < 3-. Your command drops all household sizes < 3, which is not what you want.
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#12

08 Sep 2022, 00:27

Sorry!
bysort fprimaryhhid : gen respoappearance=_N
drop if respoappearance<3

Is that correct? and can that be an alternative to the answer you provided earlier .
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#13

08 Sep 2022, 09:32

Yes, that will work, provided you are certain that the variable wave is always 1, 2, or 3--no other values, and no missing values--and fprimaryhhid and wave jointly identify unique observations.

I'm curious why you prefer it to the solution in #8. It creates the variable respoappearance, and in other circumstances that variable could be useful. But if it is really true that wave is always 1, 2, or 3, and each household has only one observation per wave, then after the -drop if respoappearance <3-, the only possible value for repoappearance among the remaining observations will be 3, so it no longer carries any information. It seems odd to create a variable whose information will immediately be destroyed, when you can accomplish the task without doing so.
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#14

08 Sep 2022, 10:52

Alright.
Now I get it .
It is safer to stick to this;
assert inlist(wave, 1, 2, 3) // VERIFY wave CAN ONLY BE 1, 2, or 3 by fprimaryhhid (wave), sort: drop if _N < 3 to avoid any distortion. Thank you for throwing more light... I appreciate your time. May God continue to bless you for making me improve my skills in Stata. Sham
Comment
Shamsudini Amidu

Join Date: Jun 2022

Posts: 61
#15

08 Sep 2022, 21:59

Hello,
Please I need a clarification on the difference between 1:1 merging and m:m merging.

I have two(2) data set that I want to merge together.

Dataset1:
data set one has the following data set
input long(hhid_memid hhid) byte(memid wave) long depression
1010010021 101001002 1 1 3
1010010022 101001002 2 1 4
1010010023 101001002 3 1 4
1010010031 101001003 1 1 1
1010010041 101001004 1 1 1
1010010091 101001009 1 1 1
1010010092 101001009 2 1 1
1010010101 101001010 1 1 4
1010010102 101001010 2 1 1
1010010121 101001012 1 1 1
1010010122 101001012 2 1 3
1010010131 101001013 1 1 1
1010010132 101001013 2 1 1
1010010151 101001015 1 1 4
1010010152 101001015 2 1 4
1010010153 101001015 3 1 3
1010010221 101001022 1 1 3
1010010222 101001022 2 1 1
1010010231 101001023 1 1 1
1010010232 101001023 2 1 2
end

Dataset2:
input long(hhid_memid hhid) byte memid
1010010021 101001002 1
1010010022 101001002 2
1010010023 101001002 3
1010010031 101001003 1
1010010041 101001004 1
1010010091 101001009 1
1010010092 101001009 2
1010010101 101001010 1
1010010102 101001010 2
1010010121 101001012 1
1010010122 101001012 2
1010010123 101001012 3
1010010131 101001013 1
1010010132 101001013 2
1010010151 101001015 1
1010010152 101001015 2
1010010153 101001015 3
1010010154 101001015 4
1010010155 101001015 5
1010010156 101001015 6
end

variable: hhid_memid //This is what identifies the individual in the household( combined/concatenated household number and member id within a household roster)

variable: hhid // This is what identifies each household in the survey

variable: memid// member id within a household roster

Now when I used 1:1 using the key variable (hhid_memid) in both file this is the merge results I got;

merge 1:1 hhid_memid using "C:\Users\USER\Desktop\opn_stata_merged\Kpadam_Pro ject\Append approach\Dependet_
> independent_merge\Individuals_dep_inde\Pannelquest ion\wave1_education.dta"
(note: variable memid was byte, now long to accommodate using data's values)

Result # of obs.
-----------------------------------------
not matched 8
from master 4 (_merge==1)
from using 4 (_merge==2)

matched 16 (_merge==3)
-----------------------------------------

On the other hand, when I used m:m on the key variable(hhid ) this is the results I got as well;

. merge m:m hhid using "C:\Users\USER\Desktop\opn_stata_merged\Kpadam_Pro ject\Append approach\Dependet_indepe
> ndent_merge\Individuals_dep_inde\Pannelquestion\wa ve1_education.dta"

Result # of obs.
-----------------------------------------
not matched 4
from master 4 (_merge==1)
from using 0 (_merge==2)

matched 20 (_merge==3)
-----------------------------------------
Which of the merge method should I best used to merge the individual response from both files together because, all the two command gave me 2 different matched results.
I will be very glad for your clarity.
Sham.
Thank you.
Comment

Announcement