  • Dividing countries into low and middle income and high income through income percentiles

    Hello everyone,

    I am studying the relationship between financial development and economic growth in a panel of 135 countries over 1960-2015. I am using two-step system GMM and have already run my baseline regressions on the full panel. Now I want to split the sample into two groups, high-income countries and middle- and low-income countries, based on GDP per capita. My idea is to do this with percentiles: countries with GDP per capita below the 50th percentile get a dummy equal to 1, and all others get a dummy equal to 0.

    I tried the following command to split the countries by percentile. However, as the output shows, it is not the right command, since GDP per capita cannot be 0.06 or 0.9. I would greatly appreciate it if somebody could suggest a command that does what I describe above, or offer any other suggestions.

    pshare estimate rgdpc, nquantiles (2)

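    Percentile shares (proportion)        Number of obs   =      1,388

    --------------------------------------------------------------
           rgdpc |      Coef.   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
            0-50 |    .066562   .0028395      .0609918    .0721321
          50-100 |    .933438   .0028395      .9278679    .9390082
    --------------------------------------------------------------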

    Thank you!

  • #2
    I have not used the community-contributed -pshare- command, but its description (-ssc describe pshare-) shows that it does not calculate percentiles at all. Instead it calculates "percentile shares," that is, the share of the total of some variable held by the observations in different percentile ranges.
    The 0.067 and 0.933 would therefore be the shares (proportions) of the rgdpc total held by the observations below and above the 50th percentile, respectively (presuming your data set is appropriate), and each has to be between 0 and 1. When the output of a command confuses you, looking at the Stata help for it is a good thing to do.

    There are many built-in Stata commands to obtain percentile values, including the 50th. -summarize YourVariable, detail- is one. -centile- is another useful command here.
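
    A minimal sketch of the -summarize- route, assuming a pooled cutoff over all country-year observations. The toy data are made up for illustration, and the names med and lowincome are placeholders, not anything from the original post:

    ```stata
    * Toy data, made up purely for illustration (not real GDP figures)
    clear
    input str1 country int year double rgdpc
    "A" 2000 500
    "A" 2005 700
    "B" 2000 4000
    "B" 2005 4500
    "C" 2000 20000
    "C" 2005 .
    end

    * Pooled median over all country-year observations
    quietly summarize rgdpc, detail
    scalar med = r(p50)
    display "pooled median = " med

    * -centile rgdpc, centile(50)- is an alternative; it leaves the value in r(c_1)

    * 1 = below the median, 0 = at or above; missing rgdpc stays missing
    generate byte lowincome = rgdpc < scalar(med) if !missing(rgdpc)
    tabulate lowincome, missing
    ```

    The -if !missing(rgdpc)- qualifier matters because Stata treats missing values as larger than any number, so without it a missing GDP would silently be coded 0.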

    Setting this aside: Taking a continuous variable like GDP and turning it into a dummy variable for use as a predictor is not something most people would recommend, as it discards a tremendous amount of useful information. I don't think you'd want to treat a country at the 49th percentile of GDP as categorically different from one at the 51st percentile, or that those two countries are as different in GDP as are a pair ranked at the 5th and 95th percentiles. There are situations in which categorization like this is useful, but I don't think this is one of them.

    • #3
      I agree with Mike Lacy that it is usually not a good idea to categorize a continuous variable, but I think it can be useful in some cases. For example, when you suspect that an effect differs between countries in different income groups, it can be useful to create a categorical variable and interact it with the variable of interest. Of course, how to draw the line between income groups is always debatable. In any case, I would warn against dividing by percentiles, because the resulting categories depend on the income distribution of the countries that happen to be in your sample. If that distribution is skewed, your categories will not be very meaningful.

      Something I would suggest looking at is the income groups defined by the World Bank, which has very clear GNI per capita thresholds for each group.
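
      As a rough sketch of the interaction idea, assuming a group dummy already exists (from World Bank groups or any other rule). The variable names growth, findev and lowincome and the toy data are made up, and -regress- only stands in for whatever estimator you actually use:

      ```stata
      * Toy data, made up for illustration only
      clear
      input str1 country int year double growth double findev byte lowincome
      "A" 2000 2.1  20 1
      "A" 2005 2.9  30 1
      "B" 2000 3.5  25 1
      "B" 2005 4.4  40 1
      "C" 2000 1.8  80 0
      "C" 2005 2.0  95 0
      "D" 2000 1.5  70 0
      "D" 2005 1.9 110 0
      end

      * Interaction built by hand so it can be passed to any estimator
      generate double findev_low = findev * lowincome

      * Coefficient on findev: effect in the reference (lowincome == 0) group
      * Coefficient on findev_low: how much the effect differs in the other group
      regress growth findev lowincome findev_low

      * Total effect of findev in the lowincome == 1 group
      lincom findev + findev_low
      ```

      This keeps the whole sample in one regression instead of splitting it, so the difference between groups can be tested directly through findev_low.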

      • #4
        Having panel data, as you do, raises questions about how you want to implement your classification:

        1. Do you want to classify countries in the initial year, 1960, and then to assign this classification over all years?

        2. Do you want to treat this as pooled data, which you seem to be doing in your attempt in #1?

        3. Do you want to classify them year by year, where you use the GDP distribution for the given year as a reference?

        The most useful tool for you is probably the -pctile()- function of -egen-, which can calculate the cutoff point for you, e.g., to do option 3 above:

        (pseudo code follows because you have not shown your data using -dataex-)

        Code:
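        * median GDP within each year (the cutoff for that year)
        egen mediangdp = pctile(gdp), by(year) p(50)
        * 1 = above that year's median; left missing where gdp is missing
        gen highincome = gdp > mediangdp if !missing(gdp)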
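
        For option 1, a sketch along the same lines, assuming every country has a GDP value in the base year (in an unbalanced panel that may not hold, and such countries are left unclassified here). The toy data and the names gdp0, med0 and highincome0 are made up for illustration:

        ```stata
        * Toy data, made up for illustration only
        clear
        input str1 country int year double gdp
        "A" 1960 300
        "A" 1965 450
        "B" 1960 900
        "B" 1965 1500
        "C" 1960 2500
        "C" 1965 2400
        "D" 1960 6000
        "D" 1965 9000
        end

        * Each country's GDP in the base year, copied to all of its rows
        egen double gdp0 = max(cond(year == 1960, gdp, .)), by(country)

        * Median of base-year GDP across countries (only 1960 rows enter)
        egen double med0 = pctile(cond(year == 1960, gdp, .)), p(50)

        * Fixed classification; countries with no base-year GDP stay missing
        generate byte highincome0 = gdp0 > med0 if !missing(gdp0)
        tabulate country highincome0, missing
        ```

        Because gdp0 is constant within a country, the classification cannot change over time, which is the difference from the year-by-year version above.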

        • #5
          Thank you for your answers. What I want to do is look at how financial development (for which I use a proxy, credit to the private sector / GDP) affects economic growth in countries at different income levels.
