reshape issue

Euslaner

Join Date: Apr 2014

Posts: 180
#1

13 Mar 2024, 16:37

I have a data set that looks like this:

variable name    type   format  value label              variable label
-----------------------------------------------------------------------------------------------------------------
globalcountry~w  long   %8.0g   globalcountryaveragenew  Global Country Average
argentinanew     long   %8.0g   argentinanew             Argentina
australianew     long   %8.0g   australianew             Australia
belgiumnew       long   %8.0g   belgiumnew               Belgium
brazilnew        long   %8.0g   brazilnew                Brazil
canadanew        long   %8.0g   canadanew                Canada
chilenew         long   %8.0g   chilenew                 Chile
colombianew      long   %8.0g   colombianew              Colombia
francenew        long   %8.0g   francenew                France
germanynew       long   %8.0g   germanynew               Germany
greatbritainnew  long   %8.0g   greatbritainnew          Great Britain
hungarynew       long   %8.0g   hungarynew               Hungary
indianew         long   %8.0g   indianew                 India
indonesianew     long   %8.0g   indonesianew             Indonesia
italynew         long   %8.0g   italynew                 Italy
japannew         long   %8.0g   japannew                 Japan
malaysianew      long   %8.0g   malaysianew              Malaysia
mexiconew        long   %8.0g   mexiconew                Mexico
netherlandsnew   long   %8.0g   netherlandsnew           Netherlands
newzealandnew    long   %8.0g   newzealandnew            New Zealand
perunew          long   %8.0g   perunew                  Peru
polandnew        long   %8.0g   polandnew                Poland
singaporenew     long   %8.0g   singaporenew             Singapore
southafricanew   long   %8.0g   southafricanew           South Africa
southkoreanew    long   %8.0g   southkoreanew            South Korea
spainnew         long   %8.0g   spainnew                 Spain
swedennew        long   %8.0g   swedennew                Sweden
thailandnew      long   %8.0g   thailandnew              Thailand
turkeynew        long   %8.0g   turkeynew                Turkey
unitedstatesnew  long   %8.0g   unitedstatesnew          United States
question         str28  %28s

I typed:

reshape long globalcountrynew argentinanew australianew belgiumnew brazilnew canadanew chilenew colombianew ///
    francenew germanynew greatbritainnew hungarynew indianew indonesianew italynew japannew malaysianew mexiconew ///
    netherlandsnew newzealandnew perunew polandnew singaporenew southafricanew southkoreanew spainnew swedennew ///
    thailandnew turkeynew unitedstatesnew, i(question) j(question)
variable question already exists
Data are already long

I am doing something wrong, but I can't figure out what the issue is. (The country names are variable names, with the values in 4 rows beneath them. The variable question is at the end and is a string with an abbreviated name for the values in the 4 rows.) Any help would be appreciated.
Thanks, Ric Uslaner
alejoforero

Join Date: Sep 2014

Posts: 49
#2

13 Mar 2024, 16:41

You cannot use the same variable for both i() and j().
Nick Cox

Join Date: Mar 2014

Posts: 35208
#3

13 Mar 2024, 17:38

Further, reshape long in its simple form needs to be fed shared prefixes, not full variable names. You could probably make progress with the @ syntax but I am not at any computer to experiment.
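
As an illustration of the prefix (stub) form, here is a minimal sketch on made-up data. The names id, inc2019, inc2020 and year are hypothetical and do not come from the data set in #1. The stub inc is the part the wide variable names share, i() names an existing identifier, and j() names a new variable that receives the part of each name that differs, which is also why i() and j() cannot be the same variable.

```stata
* Hypothetical wide data: one row per id, one income variable per year
clear
input float(id inc2019 inc2020)
1 10 12
2 20 25
end

* inc is the shared prefix (stub); year does not exist yet and is
* created by reshape from the numeric suffixes 2019 and 2020
reshape long inc, i(id) j(year)

list, sepby(id)
```

When the shared part is a suffix instead, as with the *new variables here, the @ marks where the varying part sits in the name, and the string option is needed because the varying part is text rather than a number.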
Euslaner

Join Date: Apr 2014

Posts: 180
#4

13 Mar 2024, 17:45

I changed the i and j variables and retyped it:

reshape long globalcountrynew argentinanew australianew belgiumnew brazilnew canadanew chilenew colombianew ///
    francenew germanynew greatbritainnew hungarynew indianew indonesianew italynew japannew malaysianew mexiconew ///
    netherlandsnew newzealandnew perunew polandnew singaporenew southafricanew southkoreanew spainnew swedennew ///
    thailandnew turkeynew unitedstatesnew, i(question) j(country)

and received the following message from Stata:

no xij variables found

Any help appreciated.

Nick Cox

Join Date: Mar 2014
Posts: 35208
#5

13 Mar 2024, 18:39

I guess you posted #4 before you could read #3 but it explains your main problem.

There is no data example for us to work on but this concocted example should point in a good direction.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(question argentinanew australianew belgiumnew brazilnew)
1 6 1 6 10
2 5 6 9  6
3 3 2 4  7
end

reshape long @new, i(question) j(country) string 

list 

     +----------------------------+
     | question     country   new |
     |----------------------------|
  1. |        1   argentina     6 |
  2. |        1   australia     1 |
  3. |        1     belgium     6 |
  4. |        1      brazil    10 |
  5. |        2   argentina     5 |
     |----------------------------|
  6. |        2   australia     6 |
  7. |        2     belgium     9 |
  8. |        2      brazil     6 |
  9. |        3   argentina     3 |
 10. |        3   australia     2 |
     |----------------------------|
 11. |        3     belgium     4 |
 12. |        3      brazil     7 |
     +----------------------------+


Clyde Schechter

Join Date: Apr 2014

Posts: 29792
#6

13 Mar 2024, 18:46

The thing you are missing here is what Nick pointed out in #3: "reshape long in its simple form needs to be fed shared prefixes, not full variable names."

Try this:

Code:

reshape long @new, i(question) j(country) string

In this case, it is shared suffixes rather than prefixes, but the principle is the same.

And then you may want to rename new to something more informative or evocative of what the variable contains.

Caution: You may encounter a labeling problem with the end result here. From the -describe- output you show in #1, each of these *new variables is a value labeled integer. And each of the variables has its own distinct value label. Now, it may be that those value labels are actually just copies of each other under different names. But if they are actually different number <=> label text associations, you will have a problem, because the final variable, new, will be labeled with just one of these--I don't know how to predict which one. But this would mean that all of the values of new that came from original variables with different labels will be mis-labeled. So be on the lookout for this.
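
One way to check for this before reshaping is sketched below. It assumes the value labels carry the names shown in the -describe- output in #1 (argentinanew, australianew, and so on); only three are listed here, and the choice to detach the labels is just one possible way out, not necessarily the right one for these data.

```stata
* Show the value-to-text mappings of a few of the value labels so that
* they can be compared by eye (label names taken from -describe- in #1)
label list argentinanew australianew belgiumnew

* If the mappings differ, one option is to detach the labels before
* reshaping, so that the long variable holds unlabeled numeric codes
label values *new .
```

If the mappings turn out to be identical copies, nothing needs to be done and the reshaped variable can keep whichever label it ends up with.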
Euslaner

Join Date: Apr 2014

Posts: 180
#7

14 Mar 2024, 08:21

Sorry, here is what the data look like. The top row contains variable names, followed by 5 rows of data, one per survey question, with the variable question describing the survey question. I don't have a variable for country and will create one after reshaping, but this means that I have only one variable (question) to use for i and j, so the command reshape long @new, can have either i(question) or j(question). How should I handle this, please?

globalcountryaverage argentina australia belgium brazil canada chile colombia france germany greatbritain hungary india indonesia italy japan malaysia mexico netherlands newzealand peru poland singapore southafrica southkorea spain sweden thailand turkey unitedstates question globalcountryaveragenew argentinanew australianew belgiumnew brazilnew canadanew chilenew colombianew francenew germanynew greatbritainnew hungarynew indianew indonesianew italynew japannew malaysianew mexiconew netherlandsnew newzealandnew perunew polandnew singaporenew southafricanew southkoreanew spainnew swedennew thailandnew turkeynew unitedstatesnew
0.741 0.803 0.785 0.695 0.802 0.837 0.732 0.772 0.727 0.725 0.841 0.633 0.606 0.710 0.747 0.725 0.700 0.821 0.799 0.866 0.764 0.733 0.553 0.796 0.545 0.848 0.839 0.748 0.606 0.746 allowrefugees 0.741 0.803 0.785 0.695 0.802 0.837 0.732 0.772 0.727 0.725 0.841 0.633 0.606 0.710 0.747 0.725 0.700 0.821 0.799 0.866 0.764 0.733 0.553 0.796 0.545 0.848 0.839 0.748 0.606 0.746
0.582 0.640 0.498 0.626 0.460 0.448 0.688 0.644 0.561 0.615 0.525 0.584 0.583 0.731 0.527 0.375 0.733 0.630 0.606 0.350 0.779 0.527 0.619 0.771 0.493 0.492 0.522 0.686 0.710 0.464 foreignersnotrefugrees 0.582 0.640 0.498 0.626 0.460 0.448 0.688 0.644 0.561 0.615 0.525 0.584 0.583 0.731 0.527 0.375 0.733 0.630 0.606 0.350 0.779 0.527 0.619 0.771 0.493 0.492 0.522 0.686 0.710 0.464
0.488 0.636 0.626 0.308 0.660 0.619 0.411 0.480 0.346 0.405 0.537 0.289 0.575 0.595 0.515 0.255 0.578 0.525 0.425 0.663 0.440 0.567 0.408 0.585 0.287 0.498 0.395 0.709 0.238 0.567 refugeeswantbettereco 0.488 0.636 0.626 0.308 0.660 0.619 0.411 0.480 0.346 0.405 0.537 0.289 0.575 0.595 0.515 0.255 0.578 0.525 0.425 0.663 0.440 0.567 0.408 0.585 0.287 0.498 0.395 0.709 0.238 0.567
0.453 0.454 0.661 0.307 0.609 0.636 0.440 0.408 0.388 0.400 0.559 0.266 0.534 0.470 0.476 0.291 0.322 0.439 0.490 0.698 0.379 0.452 0.360 0.410 0.240 0.565 0.551 0.554 0.210 0.576 refugeesintegrate 0.453 0.454 0.661 0.307 0.609 0.636 0.440 0.408 0.388 0.400 0.559 0.266 0.534 0.470 0.476 0.291 0.322 0.439 0.490 0.698 0.379 0.452 0.360 0.410 0.240 0.565 0.551 0.554 0.210 0.576
0.434 0.290 0.364 0.493 0.283 0.289 0.594 0.328 0.407 0.477 0.368 0.461 0.566 0.479 0.443 0.285 0.720 0.360 0.463 0.210 0.620 0.258 0.481 0.518 0.391 0.292 0.500 0.523 0.756 0.353 refugeespositivecontribution 0.434 0.290 0.364 0.493 0.283 0.289 0.594 0.328 0.407 0.477 0.368 0.461 0.566 0.479 0.443 0.285 0.720 0.360 0.463 0.210 0.620 0.258 0.481 0.518 0.391 0.292 0.500 0.523 0.756 0.353

Andrew Musau

Join Date: Oct 2014
Posts: 9944
#8

14 Mar 2024, 09:05

It appears that your questions are identified by the suffix attached to the country name. You can reshape everything to long and then create a variable identifying the question based on the suffix, then reshape wide.

Code:

clear
input float(globalcountryaverage argentina australia belgium brazil globalcountryaveragenew argentinanew australianew belgiumnew brazilnew)
0.741 0.803 0.785 0.695 0.802  0.255 0.578 0.525 0.425 0.663  
0.582 0.640 0.498 0.626 0.460  0.375 0.733 0.630 0.606 0.350
0.488 0.636 0.626 0.308 0.660  0.255 0.578 0.525 0.425 0.440
0.453 0.454 0.661 0.307 0.609  0.375 0.733 0.630 0.606 0.350
0.434 0.290 0.364 0.493 0.283  0.255 0.578 0.525 0.663 0.440
end

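* prefix every variable name with "value" so reshape has a common stub
rename * value=
gen obsno=_n
reshape long value, i(obsno) j(country) string
* which = 2 for the *new series, 1 for the original series
gen which= cond(regexm(country, "new$"), 2, 1)
* strip the trailing "new" to leave the bare country name
replace country= ustrregexra(country, "(.*)(new$)", "$1")
reshape wide value, i(country obsno) j(which)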

Res.:

Code:

. l, sepby(country)

     +------------------------------------------------+
     | obsno                country   value1   value2 |
     |------------------------------------------------|
  1. |     1              argentina     .803     .578 |
  2. |     2              argentina      .64     .733 |
  3. |     3              argentina     .636     .578 |
  4. |     4              argentina     .454     .733 |
  5. |     5              argentina      .29     .578 |
     |------------------------------------------------|
  6. |     1              australia     .785     .525 |
  7. |     2              australia     .498      .63 |
  8. |     3              australia     .626     .525 |
  9. |     4              australia     .661      .63 |
 10. |     5              australia     .364     .525 |
     |------------------------------------------------|
 11. |     1                belgium     .695     .425 |
 12. |     2                belgium     .626     .606 |
 13. |     3                belgium     .308     .425 |
 14. |     4                belgium     .307     .606 |
 15. |     5                belgium     .493     .663 |
     |------------------------------------------------|
 16. |     1                 brazil     .802     .663 |
 17. |     2                 brazil      .46      .35 |
 18. |     3                 brazil      .66      .44 |
 19. |     4                 brazil     .609      .35 |
 20. |     5                 brazil     .283      .44 |
     |------------------------------------------------|
 21. |     1   globalcountryaverage     .741     .255 |
 22. |     2   globalcountryaverage     .582     .375 |
 23. |     3   globalcountryaverage     .488     .255 |
 24. |     4   globalcountryaverage     .453     .375 |
 25. |     5   globalcountryaverage     .434     .255 |
     +------------------------------------------------+


Euslaner

Join Date: Apr 2014

Posts: 180
#9

14 Mar 2024, 09:33

Thanks, what should I write for the reshape command to get:

country allowrefugees foreignersnotrefugrees refugeeswantbettereco refugeesintegrate refugeespositivecontribution

where the top entries are in the variable question and the rows are one (from the *new variables) for each country?
Clyde Schechter

Join Date: Apr 2014

Posts: 29792
#10

14 Mar 2024, 10:38

The data shown in #7 are peculiar. We have variables named globalcountryaverage through unitedstates, and then another series globalcountryaveragenew through unitedstatesnew. The countries mentioned in each series are the same, and the numerical values for the corresponding variables are also the same. So, somehow this data set has two copies of the same data. It makes life simpler if we just discard one set of the redundant values. From there it is fairly simple:

Code:

drop *new
rename (globalcountryaverage-unitedstates) _=
reshape long _, i(question) j(country) string
reshape wide _, i(country) j(question) string
rename _* *
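
The same sequence can be tried on a small made-up extract before running it on the full data. The sketch below uses three of the countries and two of the questions from #7, with values copied from that post. It starts after the drop step, and it spells out the new names in -rename- instead of using the range and = shorthand, which is purely a choice made for this illustration.

```stata
* Small extract modeled on #7: three country variables, two questions
clear
input float(argentina australia belgium) str28 question
0.803 0.785 0.695 "allowrefugees"
0.454 0.661 0.307 "refugeesintegrate"
end

* Give the country variables a common prefix so reshape has a stub
rename (argentina australia belgium) (_argentina _australia _belgium)

* Long: one row per question-country pair, country taken from the names
reshape long _, i(question) j(country) string

* Wide again, this time with one variable per question
reshape wide _, i(country) j(question) string

* Drop the temporary prefix
rename _* *

list
```

The result should have one row per country and one variable per question, which is the layout asked for in #9.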
Euslaner

Join Date: Apr 2014

Posts: 180
#11

14 Mar 2024, 12:48

Thanks much.