  • How can I create groups of observations in panel data?

    Hello,

    I have panel data for multiple countries at a quarterly frequency. I would like to group the countries into "high income countries", "middle income countries" and "lower income countries" in order to compare the regression results across groups of countries.
    Can someone help me with the code?
    Thank you in advance.

  • #2
    Ana:
    are you looking for something along the following lines?
    Code:
    set obs 3
    g income=100000 in 1
    replace income=100000/2 in 2
    replace income=100000/3 in 3
    g income_flag=1 if income>=100000
    replace income_flag=2 if income <100000 & income>=50000
    replace income_flag=3 if income <50000
    label define income_flag 1 "high income" 2 "middle income" 3 "low income"
    label val income_flag income_flag
    tab income_flag
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      To extend what Carlo provided but for regression, you start by making the indicator for the three groups:
      g income_flag=1 if income>=100000
      replace income_flag=2 if income <100000 & income>=50000
      replace income_flag=3 if income <50000

      Then you just interact that indicator with your x variables:
      regress y i.income_flag##(c.x1 c.x2)

      Then you get the labels by issuing:
      regress ,coefl

      Now you can just use the coefl labels to do whatever tests you want.

      test _b[1b.income_flag#c.x1]=_b[2.income_flag#c.x1]
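
      For anyone who wants to run the steps above end to end, here is a minimal sketch on Stata's bundled auto data. It is only an illustration under assumptions: foreign stands in for income_flag, and mpg and weight stand in for x1 and x2; none of these variables come from the original question.

      ```stata
      * Hypothetical stand-in data: Stata's shipped auto dataset
      sysuse auto, clear

      * Group indicator fully interacted with the continuous regressors
      regress price i.foreign##(c.mpg c.weight)

      * Replay the results, showing the _b[] name of every coefficient
      regress, coeflegend

      * Does the mpg slope differ between the two groups?
      test _b[1.foreign#c.mpg] = 0

      * Joint test that both slopes are equal across groups
      testparm i.foreign#c.mpg i.foreign#c.weight
      ```

      With three or four income groups the same pattern applies; the coefficient legend then lists one interaction term per non-base group.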



      • #4
        Hello,

        Thank you for your suggestion. What I need to do is group the countries using the name of the country (not the value of income, as you suggested before). For example, I have data on:


        - Australia (high income economy)
        - Austria (high income economy)
        - Italy (high income economy)
        - Japan (high income economy)
        - Malta (high income economy)
        - Argentina (upper middle income economy)
        - Brazil (upper middle income economy)
        - Chile (upper middle income economy)
        - Angola (lower middle income economy)
        - Bolivia (lower middle income economy)
        - Bangladesh (low income economy)
        - Burkina Faso (low income economy)

        Now, using the name of the countries, I need to group them by "high income economies", "upper middle income economies", "lower middle income economies", "low income economies".
        Can you help me with the code?

        Thank you in advance.



        • #5
          I believe the following ought to work.

          Code:
          gen groups = 1 * inlist(country, "Australia", "Austria", "Italy", Japan", "Malta") + ///
                       2 * inlist(country, "Argentina", "Brazil", "Chile") + ///
                       3 * inlist(country, "Angola", "Bolivia", "Bangladesh", "Burkina Faso")
          label def gnames ///
                 1 "High Income"  ///
                 2 "Upper Income"  ///
                 3 "Low Income"
          label val groups gnames
          label var groups "Country income groups"
          Alfonso Sanchez-Penalver



          • #6
            Thank you for your suggestion, Alfonso. However, when I apply the code above, the following error appears: "expression too long" r(130).
            Can someone help me with this error?
            Thank you in advance.
            Last edited by Ana Vasconcelos; 20 Sep 2016, 16:15.



            • #7
              There is a small typo in Alfonso's code; "Japan" should appear thus; otherwise it appears quite unproblematic and far from any length limits.

              This runs fine:

              Code:
              clear 
              set obs 1
              gen country = "Australia" 
              gen groups = 1 * inlist(country, "Australia", "Austria", "Italy", "Japan", "Malta") + ///
                           2 * inlist(country, "Argentina", "Brazil", "Chile") + ///
                           3 * inlist(country, "Angola", "Bolivia", "Bangladesh", "Burkina Faso")
              label def gnames ///
                     1 "High Income"  ///
                     2 "Upper Income"  ///
                     3 "Low Income"
              label val groups gnames
              label var groups "Country income groups"
              So I have to guess that you are trying quite different code. Reporting an implausible error with code you don't show us poses an unanswerable question.

              FAQ Advice #12 applies!



              • #8
                Thank you for your suggestion, Nick Cox. The code that I applied was:


                gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS", "BAHRAIN", ///
                "BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN", "IRELAND", "ISRAEL", "ITALY", "GREECE", "HONG KONG", ///
                "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA", "LITHUANIA", "LUXEMBOURG", "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", ///
                "QATAR", "SAUDI_ARABIA", "SINGAPORE", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO", "UNITED_KINGDOM", "UNITED_STATES", "URUGUAY") + ///
                2 * inlist (country, "ALBANIA", "ALGERIA", "ANGOLA", "AZERBAIJAN", "BELARUS", "BOTSWANA", "BRAZIL", "BULGARIA", "CHINA", "COLOMBIA", "COSTA_RICA", "CUBA", "DOMINICA_REPUBLIC", ///
                "ECUADOR", "GABON", "GUYANA", "IRAN", "IRAQ", "JAMAICA", "JORDAN", "KAZAKHSTAN", "LEBANON", "LIBYA", "MALAYSIA", "MEXICO", "NAMIBIA", "PANAMA", "PARAGUAY", "PERU", "ROMANIA", ///
                "RUSSIA", "SERBIA", "SOUTH_AFRICA", "SURINAME", "THAILAND", "TURKEY", "VENEZUELA") + ///
                3 * inlist (country, "COTE_DIVOIRE", "CONGO", "ARMENIA", "BANGLADESH", "BOLIVIA", "CAMEROON", "EGYPT", "EL_SALVADOR", "GHANA", "GUATEMALA", "HONDURAS", "INDIA", "INDONESIA", "KENYA", ///
                "MOLDOVA", "MONGOLIA", "MOROCCO", "MYANMAR", "NICARAGUA", "NIGERIA", "PAKISTAN", "PAPUA_NEW_GUINEA", "PHILIPPINES", "SRI_LANKA", "SUDAN", "SYRIA", "TUNISIA", "UKRAINE", "VIETNAM", ///
                "YEMEN", "ZAMBIA") + ///
                4 * inlist (country, "BURKINA_FASO", "CONGO_DR", "ETHIOPIA", "GAMBIA", "GUINEA", "GUINEA_BISSAU", "HAITI", "KOREA_DPR", "LIBERIA", "MADAGASCAR", "MALAWI", "MALI", "MOZAMBIQUE", "NIGER", ///
                "SENEGAL", "SIERRA_LEONE", "SOMALIA", "TANZANIA", "TOGO", "UGANDA", "ZIMBABWE")



                label def gnames ///
                1 "High Income" ///
                2 "Upper Middle Income" ///
                3 "Lower Middle Income" ///
                4 "Low Income"

                label val groups gnames
                label var groups "Country income groups"


                However, in the end the following message appears: "expression too long" r(130).
                Can someone help me with the code?
                Thank you very much.



                • #9
                  See help inlist. It has a limit of 10 arguments if they are strings. I think the first inlist has many more than that. You can always break it up into as many pieces as you need, since the countries are mutually exclusive. For example:
                  Code:
                  gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS", "BAHRAIN", "BELGIUM") + ///
                      1 * inlist(country, "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN", "IRELAND", "ISRAEL") + ///
                      1 * inlist(country, "ITALY", "GREECE", "HONG KONG", "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA", "LITHUANIA", "LUXEMBOURG") + ///
                      1 * inlist(country,  "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", "QATAR", "SAUDI_ARABIA", "SINGAPORE") + ///
                      1 * inlist(country, "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO", "UNITED_KINGDOM", "UNITED_STATES", "URUGUAY")
                  Do that for every category that has more than 10 countries. I think that I have 10 strings in each of the inlist functions in the code above. If I have more I apologize; if I have fewer it will still work.
                  Alfonso Sanchez-Penalver



                  • #10
                    Thank you for your help, Alfonso. I broke up the arguments in the way that you suggested; however, I still get the same error: "expression too long".
                    Can someone help me with the error?

                    I use the following code:

                    gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS", "BAHRAIN") + ///
                    1 * inlist(country, "BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN", "IRELAND") + ///
                    1 * inlist(country, "ISRAEL", "ITALY", "GREECE", "HONG KONG", "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA", "LITHUANIA") + ///
                    1 * inlist(country, "LUXEMBOURG", "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", "QATAR", "SAUDI_ARABIA") + ///
                    1 * inlist(country, "SINGAPORE", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO", "UNITED_KINGDOM") + ///
                    1 * inlist(country, "UNITED_STATES", "URUGUAY") + ///
                    2 * inlist(country, "ALBANIA", "ALGERIA", "ANGOLA", "AZERBAIJAN", "BELARUS", "BOTSWANA", "BRAZIL", "BULGARIA", "CHINA", "COLOMBIA") + ///
                    2 * inlist(country, "COSTA_RICA", "CUBA", "DOMINICA_REPUBLIC", "ECUADOR", "GABON", "GUYANA", "IRAN", "IRAQ", "JAMAICA", "JORDAN") + ///
                    2 * inlist(country, "KAZAKHSTAN", "LEBANON", "LIBYA", "MALAYSIA", "MEXICO", "NAMIBIA", "PANAMA", "PARAGUAY", "PERU", "ROMANIA") + ///
                    2 * inlist(country, "RUSSIA", "SERBIA", "SOUTH_AFRICA", "SURINAME", "THAILAND", "TURKEY", "VENEZUELA") + ///
                    3 * inlist(country, "COTE_DIVOIRE", "CONGO", "ARMENIA", "BANGLADESH", "BOLIVIA", "CAMEROON", "EGYPT", "EL_SALVADOR", "GHANA", "GUATEMALA") + ///
                    3 * inlist(country, "HONDURAS", "INDIA", "INDONESIA", "KENYA","MOLDOVA", "MONGOLIA", "MOROCCO", "MYANMAR", "NICARAGUA", "NIGERIA") + ///
                    3 * inlist(country, "PAKISTAN", "PAPUA_NEW_GUINEA", "PHILIPPINES", "SRI_LANKA", "SUDAN", "SYRIA", "TUNISIA", "UKRAINE", "VIETNAM") + ///
                    3 * inlist(country, "YEMEN", "ZAMBIA") + ///
                    4 * inlist(country, "BURKINA_FASO", "CONGO_DR", "ETHIOPIA", "GAMBIA", "GUINEA", "GUINEA_BISSAU", "HAITI", "KOREA_DPR", "LIBERIA", "MADAGASCAR") + ///
                    4 * inlist(country, "MALAWI", "MALI", "MOZAMBIQUE", "NIGER","SENEGAL", "SIERRA_LEONE", "SOMALIA", "TANZANIA", "TOGO") + ///
                    4 * inlist(country, "UGANDA", "ZIMBABWE")




                    label def gnames ///
                    1 "High Income" ///
                    2 "Upper Middle Income" ///
                    3 "Lower Middle Income" ///
                    4 "Low Income"

                    label val groups gnames
                    label var groups "Country income groups"


                    Thank you in advance.
                    Last edited by Ana Vasconcelos; 21 Sep 2016, 14:02.



                    • #11
                      Reduce the number of country names to 9 per inlist expression. The help says that it takes 10 strings, and I thought that meant 10 strings to compare against. But it seems the limit is 10 strings in total: country is the first string, which leaves 9 comparison strings.
                      Alfonso Sanchez-Penalver
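
                      To make the counting concrete, here is a small sketch, assuming a toy dataset with a string variable country (the four observations and the indicator name high are made up for illustration; the country names are taken from the thread). The first inlist() uses country plus nine literals, which is 10 string arguments; a longer list is continued in a second inlist() joined with | (logical or).

                      ```stata
                      * Toy data: hypothetical observations, names as spelled in the thread
                      clear
                      input str20 country
                      "UAE"
                      "GERMANY"
                      "BELGIUM"
                      "ALBANIA"
                      end

                      * country + 9 literals = 10 string arguments in the first call;
                      * the remaining names go into a second call joined with |
                      generate byte high = ///
                          inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", ///
                                 "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS") | ///
                          inlist(country, "BELGIUM", "BRUNEI", "CANADA")

                      list country high
                      ```

                      The /// line continuation works in a do-file, not when typed interactively in the Command window.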



                      • #12
                        Alternatively, create a minimal dataset containing the classification and then merge datasets. Explained at http://www.stata.com/support/faqs/da...ets/index.html

                        To get a dataset with just the country names,

                        Code:
                        bysort country: keep if _n == 1
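
                        A rough sketch of the merge route, under assumptions: the panel values, the variable names quarter and y, and the three-country classification are invented for illustration, and tempfiles stand in for whatever files you would actually save. Run it from a do-file so the local macros created by tempfile are available.

                        ```stata
                        * Hypothetical panel: three countries, two quarters each
                        clear
                        input str20 country quarter y
                        "AUSTRALIA" 1 1.2
                        "AUSTRALIA" 2 1.4
                        "BRAZIL"    1 0.7
                        "BRAZIL"    2 0.9
                        "UGANDA"    1 0.3
                        "UGANDA"    2 0.4
                        end
                        tempfile panel
                        save `panel'

                        * Minimal classification dataset: one row per country
                        clear
                        input str20 country byte groups
                        "AUSTRALIA" 1
                        "BRAZIL"    2
                        "UGANDA"    4
                        end
                        label define gnames 1 "High Income" 2 "Upper Middle Income" ///
                            3 "Lower Middle Income" 4 "Low Income"
                        label values groups gnames
                        tempfile classes
                        save `classes'

                        * Attach the group to every country-quarter observation
                        use `panel', clear
                        merge m:1 country using `classes'
                        tabulate country groups
                        ```

                        After the merge, tabulating _merge is a quick check for country names that are spelled differently in the two datasets.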



                        • #13
                          Hello. Thank you for your help. I still have a problem with the code: when I tried to run the command to generate "groups", an error appeared saying "too many literals". I cannot find a solution to this problem. Can someone help me?

                          I used the following code:

                          gen groups = 1 * inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE", "ESTONIA", "ARGENTINA", "AUSTRALIA", "AUSTRIA", "BAHAMAS") + ///
                          1 * inlist(country, "BELGIUM", "BRUNEI", "CANADA", "CHILE", "CROATIA", "CYPRUS", "CZECH_REPUBLIC", "DENMARK", "JAPAN") + ///
                          1 * inlist(country, "ISRAEL", "ITALY", "GREECE", "HONG KONG", "HUNGARY", "ICELAND", "KOREA_SOUTH", "KUWAIT", "LATVIA") + ///
                          1 * inlist(country, "LUXEMBOURG", "MALTA", "NETHERLANDS", "NEW_ZEALAND", "NORWAY", "OMAN", "POLAND", "PORTUGAL", "QATAR") + ///
                          1 * inlist(country, "SINGAPORE", "SLOVAKIA", "SLOVENIA", "SPAIN", "SWEDEN", "SWITZERLAND", "TAIWAN", "TRINIDAD_TOBAGO") + ///
                          1 * inlist(country, "UNITED_STATES", "URUGUAY", "BAHRAIN", "IRELAND", "LITHUANIA", "SAUDI_ARABIA", "UNITED_KINGDOM") + ///
                          2 * inlist(country, "ALBANIA", "ALGERIA", "ANGOLA", "AZERBAIJAN", "BELARUS", "BOTSWANA", "BRAZIL", "BULGARIA", "CHINA") + ///
                          2 * inlist(country, "COSTA_RICA", "CUBA", "DOMINICA_REPUBLIC", "ECUADOR", "GABON", "GUYANA", "IRAN", "IRAQ", "JAMAICA") + ///
                          2 * inlist(country, "KAZAKHSTAN", "LEBANON", "LIBYA", "MALAYSIA", "MEXICO", "NAMIBIA", "PANAMA", "PARAGUAY", "PERU") + ///
                          2 * inlist(country, "RUSSIA", "SERBIA", "SOUTH_AFRICA", "SURINAME", "THAILAND", "TURKEY", "VENEZUELA") + ///
                          2 * inlist(country,"COLOMBIA", "JORDAN", "ROMANIA") + ///
                          3 * inlist(country, "COTE_DIVOIRE", "CONGO", "ARMENIA", "BANGLADESH", "BOLIVIA", "CAMEROON", "EGYPT", "EL_SALVADOR", "GHANA") + ///
                          3 * inlist(country, "HONDURAS", "INDIA", "INDONESIA", "KENYA","MOLDOVA", "MONGOLIA", "MOROCCO", "MYANMAR", "NICARAGUA") + ///
                          3 * inlist(country, "PAKISTAN", "PAPUA_NEW_GUINEA", "PHILIPPINES", "SRI_LANKA", "SUDAN", "SYRIA", "TUNISIA", "UKRAINE") + ///
                          3 * inlist(country, "YEMEN", "ZAMBIA", "GUATEMALA", "NIGERIA", "VIETNAM") + ///
                          4 * inlist(country, "BURKINA_FASO", "CONGO_DR", "ETHIOPIA", "GAMBIA", "GUINEA", "GUINEA_BISSAU", "HAITI", "KOREA_DPR", "LIBERIA") + ///
                          4 * inlist(country, "MALAWI", "MALI", "MOZAMBIQUE", "NIGER","SENEGAL", "SIERRA_LEONE", "SOMALIA", "TANZANIA") + ///
                          4 * inlist(country, "UGANDA", "ZIMBABWE", "MADAGASCAR", "TOGO")




                          label def gnames ///
                          1 "High Income" ///
                          2 "Upper Middle Income" ///
                          3 "Lower Middle Income" ///
                          4 "Low Income"

                          label val groups gnames
                          label var groups "Country income groups"

                          Thank you in advance.



                          • #14
                            You already have a quite different solution.

                            Alternatively to the alternatively, split the command into several:

                            Code:
                            generate 
                            
                            replace 
                            
                            replace 
                            
                            replace
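
                            Filled in, that skeleton might look like the sketch below. It is an illustration only: the four-observation dataset is made up, and each list is cut down to a handful of the names from #13 to keep it short. Each command carries its own inlist(), so no single expression holds too many string literals.

                            ```stata
                            * Toy data: one hypothetical country per group, names as spelled in #13
                            clear
                            input str20 country
                            "UAE"
                            "BRAZIL"
                            "YEMEN"
                            "TOGO"
                            end

                            * One short command per group; every inlist() stays within 10 string arguments
                            generate byte groups = 1 if inlist(country, "UAE", "GERMANY", "FINLAND", "FRANCE")
                            replace groups = 2 if inlist(country, "ALBANIA", "ALGERIA", "ANGOLA", "BRAZIL")
                            replace groups = 3 if inlist(country, "YEMEN", "ZAMBIA", "GUATEMALA", "NIGERIA")
                            replace groups = 4 if inlist(country, "UGANDA", "ZIMBABWE", "MADAGASCAR", "TOGO")

                            label define gnames 1 "High Income" 2 "Upper Middle Income" ///
                                3 "Lower Middle Income" 4 "Low Income"
                            label values groups gnames

                            * Countries matched by none of the lists stay missing and show up here
                            tabulate groups, missing
                            ```

                            For a group with more than nine countries, repeat the replace line with the next batch of names. Unlike the single summed expression, unmatched countries end up missing rather than 0, which makes misspelled names easier to spot.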



                            • #15
                              Hello. I split the command into generate and replace and it worked.
                              Thank you for your suggestion.
                              Last edited by Ana Vasconcelos; 27 Sep 2016, 16:15.

