How can I create groups of observations in a panel data?

Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#1


08 Sep 2016, 11:08

Hello,

I have panel data for multiple countries at a quarterly frequency. I would like to group the countries into "high income countries", "middle income countries" and "lower income countries" in order to compare the regression results across groups of countries.
Can someone help me with the code?
Thank you in advance.
Tags: data, panel data, regression

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17603

08 Sep 2016, 11:42

Ana:
are you looking for something along the following lines?

Code:

set obs 3
g income=100000 in 1
replace income=100000/2 in 2
replace income=100000/3 in 3
g income_flag=1 if income>=100000
replace income_flag=2 if income <100000 & income>=50000
replace income_flag=3 if income <50000
label define income_flag 1 "high income" 2 "middle income" 3 "low income"
label val income_flag income_flag
tab income_flag

Kind regards,
Carlo
(StataNow 18.5)


Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

09 Sep 2016, 11:35

To extend what Carlo provided but for regression, you start by making the indicator for the three groups:
g income_flag=1 if income>=100000
replace income_flag=2 if income <100000 & income>=50000
replace income_flag=3 if income <50000

Then you just interact that indicator with your x variables:
regress y i.income_flag##(c.x1 c.x2)

Then you get the labels by issuing:
regress ,coefl

Now you can just use the coefl labels to do whatever tests you want.

test _b[1b.income_flag#c.x1]=_b[2.income_flag#c.x1]
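
A minimal sketch of the steps above on Stata's bundled auto dataset, since the thread's own data is not shown. The grouping variable grp (price terciles) and the regressors mpg, weight and length are stand-ins chosen only for illustration; they are assumptions, not the original poster's variables.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear

* Hypothetical three-level grouping variable (price terciles)
xtile grp = price, nq(3)

* Interact the group indicator with the continuous regressors
regress mpg i.grp##(c.weight c.length)

* Replay the results showing the coefficient names to use in tests
regress, coeflegend

* Test whether the weight slope shift is the same in groups 2 and 3
test _b[2.grp#c.weight] = _b[3.grp#c.weight]

* Joint test that the weight slope does not differ across groups
testparm i.grp#c.weight
```

The names shown by coeflegend are the ones to paste into test, which avoids guessing how Stata labels the interaction terms.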
Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#4

10 Sep 2016, 07:21

Hello,

Thank you for your suggestion. What I need to do is group the countries using the name of the country (not the value of income, as you suggested before). For example, I have data on:

- Australia (high income economy)
- Austria (high income economy)
- Italy (high income economy)
- Japan (high income economy)
- Malta (high income economy)
- Argentina (upper middle income economy)
- Brazil (upper middle income economy)
- Chile (upper middle income economy)
- Angola (lower middle income economy)
- Bolivia (lower middle income economy)
- Bangladesh (low income economy)
- Burkina Faso (low income economy)

Now, using the name of the countries, I need to group them by "high income economies", "upper middle income economies", "lower middle income economies", "low income economies".
Can you help me with the code?

Thank you in advance.

Alfonso Sánchez-Peñalver

Join Date: Mar 2014
Posts: 432

10 Sep 2016, 08:14

I believe the following ought to work.

Code:

gen groups = 1 * inlist(country, "Australia", "Austria", "Italy", Japan", "Malta") + ///
             2 * inlist(country, "Argentina", "Brazil", "Chile") + ///
             3 * inlist(country, "Angola", "Bolivia", "Bangladesh", "Burkina Faso")
label def gnames ///
       1 "High Income"  ///
       2 "Upper Income"  ///
       3 "Low Income"
label val groups gnames
label var groups "Country income groups"

Alfonso Sanchez-Penalver


Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#6

20 Sep 2016, 16:08

Thank you for your suggestion, Alfonso. However, when I apply the code above, I get the following error: "expression too long", r(130).
Can someone help me with this error?
Thank you in advance.

Last edited by Ana Vasconcelos; 20 Sep 2016, 16:15.

Nick Cox

Join Date: Mar 2014
Posts: 35211

20 Sep 2016, 16:15

There is a small typo in Alfonso's code; "Japan" should appear thus; otherwise it appears quite unproblematic and far from any length limits.

This runs fine:

Code:

clear 
set obs 1
gen country = "Australia" 
gen groups = 1 * inlist(country, "Australia", "Austria", "Italy", "Japan", "Malta") + ///
             2 * inlist(country, "Argentina", "Brazil", "Chile") + ///
             3 * inlist(country, "Angola", "Bolivia", "Bangladesh", "Burkina Faso")
label def gnames ///
       1 "High Income"  ///
       2 "Upper Income"  ///
       3 "Low Income"
label val groups gnames
label var groups "Country income groups"

So I have to guess that you are trying quite different code. Reporting an implausible error with code you don't show us poses an unanswerable question.

FAQ Advice #12 applies!


Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#8

20 Sep 2016, 16:40

Thank you for your suggestion, Nick Cox. The code that I applied was:

gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS", "BAHRAIN", ///
"BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN", "IRELAND", "ISRAEL", "ITALY", "GREECE", "HONG KONG", ///
"HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA", "LITHUANIA", "LUXEMBOURG", "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", ///
"QATAR", "SAUDI_ARABIA", "SINGAPORE", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO", "UNITED_KINGDOM", "UNITED_STATES", "URUGUAY") + ///
2 * inlist (country, "ALBANIA", "ALGERIA", "ANGOLA", "AZERBAIJAN", "BELARUS", "BOTSWANA", "BRAZIL", "BULGARIA", "CHINA", "COLOMBIA", "COSTA_RICA", "CUBA", "DOMINICA_REPUBLIC", ///
"ECUADOR", "GABON", "GUYANA", "IRAN", "IRAQ", "JAMAICA", "JORDAN", "KAZAKHSTAN", "LEBANON", "LIBYA", "MALAYSIA", "MEXICO", "NAMIBIA", "PANAMA", "PARAGUAY", "PERU", "ROMANIA", ///
"RUSSIA", "SERBIA", "SOUTH_AFRICA", "SURINAME", "THAILAND", "TURKEY", "VENEZUELA") + ///
3 * inlist (country, "COTE_DIVOIRE", "CONGO", "ARMENIA", "BANGLADESH", "BOLIVIA", "CAMEROON", "EGYPT", "EL_SALVADOR", "GHANA", "GUATEMALA", "HONDURAS", "INDIA", "INDONESIA", "KENYA", ///
"MOLDOVA", "MONGOLIA", "MOROCCO", "MYANMAR", "NICARAGUA", "NIGERIA", "PAKISTAN", "PAPUA_NEW_GUINEA", "PHILIPPINES", "SRI_LANKA", "SUDAN", "SYRIA", "TUNISIA", "UKRAINE", "VIETNAM", ///
"YEMEN", "ZAMBIA") + ///
4 * inlist (country, "BURKINA_FASO", "CONGO_DR", "ETHIOPIA", "GAMBIA", "GUINEA", "GUINEA_BISSAU", "HAITI", "KOREA_DPR", "LIBERIA", "MADAGASCAR", "MALAWI", "MALI", "MOZAMBIQUE", "NIGER", ///
"SENEGAL", "SIERRA_LEONE", "SOMALIA", "TANZANIA", "TOGO", "UGANDA", "ZIMBABWE")

label def gnames ///
1 "High Income" ///
2 "Upper Middle Income" ///
3 "Lower Middle Income" ///
4 "Low Income"

label val groups gnames
label var groups "Country income groups"

However, in the end I get the following message: "expression too long", r(130).
Can someone help me with the code?
Thank you very much.

Alfonso Sánchez-Peñalver

Join Date: Mar 2014
Posts: 432

20 Sep 2016, 17:15

See help inlist. It has a limit of 10 arguments if they are strings. I think the first inlist has many more than that. You can always break it up into as many calls as you need, since the countries are mutually exclusive. For example:

Code:

gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS", "BAHRAIN", "BELGIUM") + ///
    1 * inlist(country, "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN", "IRELAND", "ISRAEL") + ///
    1 * inlist(country, "ITALY", "GREECE", "HONG KONG", "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA", "LITHUANIA", "LUXEMBOURG") + ///
    1 * inlist(country,  "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", "QATAR", "SAUDI_ARABIA", "SINGAPORE") + ///
    1 * inlist(country, "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO", "UNITED_KINGDOM", "UNITED_STATES", "URUGUAY")

Do that for every category that has more than 10 countries. I think I have 10 strings in each of the inlist functions in the code above. If I have more, I apologize; if I have fewer, it will still work.

Alfonso Sanchez-Penalver


Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#10

21 Sep 2016, 13:49

Thank you for your help, Alfonso. I broke up the arguments in the way that you suggested. However, I still get the same error: "expression too long".
Can someone help me with this error?

I use the following code:

gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS", "BAHRAIN") + ///
1 * inlist(country, "BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN", "IRELAND") + ///
1 * inlist(country, "ISRAEL", "ITALY", "GREECE", "HONG KONG", "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA", "LITHUANIA") + ///
1 * inlist(country, "LUXEMBOURG", "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", "QATAR", "SAUDI_ARABIA") + ///
1 * inlist(country, "SINGAPORE", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO", "UNITED_KINGDOM") + ///
1 * inlist(country, "UNITED_STATES", "URUGUAY") + ///
2 * inlist(country, "ALBANIA", "ALGERIA", "ANGOLA", "AZERBAIJAN", "BELARUS", "BOTSWANA", "BRAZIL", "BULGARIA", "CHINA", "COLOMBIA") + ///
2 * inlist(country, "COSTA_RICA", "CUBA", "DOMINICA_REPUBLIC", "ECUADOR", "GABON", "GUYANA", "IRAN", "IRAQ", "JAMAICA", "JORDAN") + ///
2 * inlist(country, "KAZAKHSTAN", "LEBANON", "LIBYA", "MALAYSIA", "MEXICO", "NAMIBIA", "PANAMA", "PARAGUAY", "PERU", "ROMANIA") + ///
2 * inlist(country, "RUSSIA", "SERBIA", "SOUTH_AFRICA", "SURINAME", "THAILAND", "TURKEY", "VENEZUELA") + ///
3 * inlist(country, "COTE_DIVOIRE", "CONGO", "ARMENIA", "BANGLADESH", "BOLIVIA", "CAMEROON", "EGYPT", "EL_SALVADOR", "GHANA", "GUATEMALA") + ///
3 * inlist(country, "HONDURAS", "INDIA", "INDONESIA", "KENYA","MOLDOVA", "MONGOLIA", "MOROCCO", "MYANMAR", "NICARAGUA", "NIGERIA") + ///
3 * inlist(country, "PAKISTAN", "PAPUA_NEW_GUINEA", "PHILIPPINES", "SRI_LANKA", "SUDAN", "SYRIA", "TUNISIA", "UKRAINE", "VIETNAM") + ///
3 * inlist(country, "YEMEN", "ZAMBIA") + ///
4 * inlist(country, "BURKINA_FASO", "CONGO_DR", "ETHIOPIA", "GAMBIA", "GUINEA", "GUINEA_BISSAU", "HAITI", "KOREA_DPR", "LIBERIA", "MADAGASCAR") + ///
4 * inlist(country, "MALAWI", "MALI", "MOZAMBIQUE", "NIGER","SENEGAL", "SIERRA_LEONE", "SOMALIA", "TANZANIA", "TOGO") + ///
4 * inlist(country, "UGANDA", "ZIMBABWE")

label def gnames ///
1 "High Income" ///
2 "Upper Middle Income" ///
3 "Lower Middle Income" ///
4 "Low Income"

label val groups gnames
label var groups "Country income groups"

Thank you in advance.

Last edited by Ana Vasconcelos; 21 Sep 2016, 14:02.
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#11

21 Sep 2016, 14:12

Reduce the number of country names to 9 per inlist expression. The help says that it takes 10 strings, and I thought that meant 10 strings to compare against. But it seems that it is 10 strings in total: country is the first string, plus 9 comparison strings.
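
A minimal sketch of that limit in practice, assuming a string variable country as in the thread. The three-row toy dataset and the variable name high are made up for illustration. Each inlist() call below holds country plus 9 literals, and the calls are joined with | so the result is a 0/1 indicator.

```stata
* Toy data: three country names taken from the thread
clear
input str20 country
"UAE"
"BELGIUM"
"BRAZIL"
end

* Each call has 10 string arguments in total: country + 9 literals.
* The /// line continuation only works in a do-file, not at the prompt.
gen byte high = inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", ///
                       "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS") | ///
                inlist(country, "BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", ///
                       "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN")

list country high
```

Chaining many such calls into one expression can still run into other limits on expression size, which is why shorter separate commands are the safer route for long lists.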

Alfonso Sanchez-Penalver
Nick Cox

Join Date: Mar 2014

Posts: 35211
#12

21 Sep 2016, 16:08

Alternatively, create a minimal dataset containing the classification and then merge datasets. Explained at http://www.stata.com/support/faqs/da...ets/index.html

To get a dataset with just the country names,

Code:

bysort country: keep if _n == 1
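
A minimal sketch of the merge approach, under stated assumptions: the four-country classification, the panel values, and the names classes, quarter and y are all invented for illustration. Only country, groups and the label text come from the thread.

```stata
* Hypothetical classification dataset: one row per country
clear
input str20 country byte groups
"AUSTRALIA"    1
"BRAZIL"       2
"BOLIVIA"      3
"BURKINA_FASO" 4
end
label define gnames 1 "High Income" 2 "Upper Middle Income" ///
                    3 "Lower Middle Income" 4 "Low Income"
label values groups gnames
label variable groups "Country income groups"
tempfile classes
save `classes'

* Hypothetical panel: several quarters per country
clear
input str20 country int quarter double y
"AUSTRALIA"    1 1.2
"AUSTRALIA"    2 1.4
"BRAZIL"       1 0.7
"BRAZIL"       2 0.9
"BOLIVIA"      1 0.3
"BURKINA_FASO" 1 0.1
end

* Many panel rows per country, one classification row per country
merge m:1 country using `classes'

* Any _merge value other than 3 flags a country without a match
tab _merge
```

With this layout the classification lives in a small dataset that can be edited in the Data Editor or a spreadsheet, so no long inlist() expression is needed.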
Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#13

22 Sep 2016, 07:54

Hello. Thank you for your help. I still have a problem with the code. I tried to run the command to generate "groups" and got an error saying "too many literals". I can't find a solution for this problem. Can someone help me?

I used the following code:

gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS") + ///
1 * inlist(country, "BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN") + ///
1 * inlist(country, "ISRAEL", "ITALY", "GREECE", "HONG KONG", "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA") + ///
1 * inlist(country, "LUXEMBOURG", "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", "QATAR") + ///
1 * inlist(country, "SINGAPORE", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO") + ///
1 * inlist(country, "UNITED_STATES", "URUGUAY", "BAHRAIN", "IRELAND", "LITHUANIA", "SAUDI_ARABIA", "UNITED_KINGDOM") + ///
2 * inlist(country, "ALBANIA", "ALGERIA", "ANGOLA", "AZERBAIJAN", "BELARUS", "BOTSWANA", "BRAZIL", "BULGARIA", "CHINA") + ///
2 * inlist(country, "COSTA_RICA", "CUBA", "DOMINICA_REPUBLIC", "ECUADOR", "GABON", "GUYANA", "IRAN", "IRAQ", "JAMAICA") + ///
2 * inlist(country, "KAZAKHSTAN", "LEBANON", "LIBYA", "MALAYSIA", "MEXICO", "NAMIBIA", "PANAMA", "PARAGUAY", "PERU") + ///
2 * inlist(country, "RUSSIA", "SERBIA", "SOUTH_AFRICA", "SURINAME", "THAILAND", "TURKEY", "VENEZUELA") + ///
2 * inlist(country,"COLOMBIA", "JORDAN", "ROMANIA") + ///
3 * inlist(country, "COTE_DIVOIRE", "CONGO", "ARMENIA", "BANGLADESH", "BOLIVIA", "CAMEROON", "EGYPT", "EL_SALVADOR", "GHANA") + ///
3 * inlist(country, "HONDURAS", "INDIA", "INDONESIA", "KENYA","MOLDOVA", "MONGOLIA", "MOROCCO", "MYANMAR", "NICARAGUA") + ///
3 * inlist(country, "PAKISTAN", "PAPUA_NEW_GUINEA", "PHILIPPINES", "SRI_LANKA", "SUDAN", "SYRIA", "TUNISIA", "UKRAINE") + ///
3 * inlist(country, "YEMEN", "ZAMBIA", "GUATEMALA", "NIGERIA", "VIETNAM") + ///
4 * inlist(country, "BURKINA_FASO", "CONGO_DR", "ETHIOPIA", "GAMBIA", "GUINEA", "GUINEA_BISSAU", "HAITI", "KOREA_DPR", "LIBERIA") + ///
4 * inlist(country, "MALAWI", "MALI", "MOZAMBIQUE", "NIGER","SENEGAL", "SIERRA_LEONE", "SOMALIA", "TANZANIA") + ///
4 * inlist(country, "UGANDA", "ZIMBABWE", "MADAGASCAR", "TOGO")

label def gnames ///
1 "High Income" ///
2 "Upper Middle Income" ///
3 "Lower Middle Income" ///
4 "Low Income"

label val groups gnames
label var groups "Country income groups"

Thank you in advance.
Nick Cox

Join Date: Mar 2014

Posts: 35211
#14

22 Sep 2016, 08:19

You already have a quite different solution.

Alternatively to the alternatively, split the command into several:

Code:

generate
replace
replace
replace
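
A minimal sketch of that split, assuming the string variable country and the four group labels from earlier in the thread. The toy data and the shortened country lists are only for illustration. Each command carries one short inlist() call, so no single expression gets long.

```stata
* Toy data: a few country names taken from the thread
clear
input str20 country
"UAE"
"BRAZIL"
"BOLIVIA"
"UGANDA"
"ZAMBIA"
end

* Start with missing, then fill one short list at a time
generate byte groups = .
replace groups = 1 if inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE")
replace groups = 2 if inlist(country, "ALBANIA", "ALGERIA", "ANGOLA", "BRAZIL")
replace groups = 3 if inlist(country, "ARMENIA", "BANGLADESH", "BOLIVIA", "ZAMBIA")
replace groups = 4 if inlist(country, "ETHIOPIA", "GAMBIA", "UGANDA", "ZIMBABWE")

label define gnames 1 "High Income" 2 "Upper Middle Income" ///
                    3 "Lower Middle Income" 4 "Low Income"
label values groups gnames
label variable groups "Country income groups"

* Countries left unclassified show up as missing
tab groups, missing
```

For a group with more than 9 countries, repeat the replace line for that group with the next batch of names. Starting from missing rather than 0 makes any unclassified or misspelled country easy to spot.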
Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#15

27 Sep 2016, 16:10

Hello. I split the command into generate and replace and it worked.
Thank you for your suggestion.

Last edited by Ana Vasconcelos; 27 Sep 2016, 16:15.