

  • identifying id's for panel data

    Dear all,

    I am currently working on bilateral trade data for Germany. I have export and import values for Germany with 25 other countries, broken down by sector, and there are almost 900 sectors. I want to regress bilateral German exports on the exchange rate and GDP of the partner country, so I need a unique id for each country-sector pair. For example, I have data on the apparel sector in 2001 for Austria, data on the same sector in the same year for Turkey, and so on. To deal with this, I encoded the countries as numbers, so that AUS is 1 and USA is 25. Sectors are identified by numbers, some of them with 6 digits. I therefore multiplied the country number by 1000000 and added the sector number. However, it turns out that Stata rounds big numbers. For example, I have sector 610339 in 2001 for Turkey, and Stata calculated its id as 24610340. I also have sector 610341 in 2001 for TUR, and Stata again calculated 24610340. Please help me.

  • #2
    egen id = group(country sector)
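
As a sketch of how the group id could then be used, the lines below declare the panel and fit a fixed-effects model. The variable names year, exports, exrate, and gdp are assumptions for illustration; they do not appear in post #1, and the model shown is only one possible specification.

```stata
* Assumed variable names: year, exports, exrate, gdp (hypothetical)
egen id = group(country sector)

* Confirm that id and year together identify each observation
isid id year

* Declare the panel structure: id is the unit, year is the time variable
xtset id year

* One possible fixed-effects regression on the declared panel
xtreg exports exrate gdp, fe
```

If isid reports an error, some country-sector-year combination occurs more than once and the data would need checking before xtset.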


    • #3
      Thank you! It works. After a few hours of browsing the Internet and trying different approaches, I am finally one little step forward.


      • #4
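        Backing up to post #1, the problem with the approach you took is one of precision, not of Stata rounding: generate creates a float unless told otherwise, and a float cannot hold every integer of that size exactly.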

        For a quick explanation, see the FAQ at

        For more, see the output from help precision.

        Here is an example that uses what you learn from those sources to make your approach work.
        . set obs 1
        Number of observations (_N) was 0, now 1.
        . generate country = 24
        . generate sector = 610339
        . generate float id_f = country*1000000 + sector
        . generate long id_l = country*1000000 + sector
        . generate double id_d = country*1000000 + sector
        . format %10.0f id*
        . list, noobs
          | country   sector       id_f       id_l       id_d |
          |      24   610339   24610340   24610339   24610339 |
        Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

        type    bits                   minimum                  maximum
        byte       7                      -127                      100
        int       15                   -32,767                   32,740
        long      31            -2,147,483,647            2,147,483,620
        float     24               -16,777,216               16,777,216
        double    53    -9,007,199,254,740,992    9,007,199,254,740,992
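
To apply these limits to the construction from post #1, a minimal sketch would store the id as a long. With 25 countries and 6-digit sectors the largest possible id is 25,999,999, which is well inside the long range but beyond 16,777,216, the point up to which a float holds every integer exactly. The variable name year is an assumption; it does not appear in the posts above.

```stata
* float() rounds its argument to float precision; 24610339 is not
* representable as a float, so this should display 24610340, as id_f did
display %10.0f float(24610339)

* Store the combined id as a long so every value is held exactly
generate long id = country*1000000 + sector
format %10.0f id

* Check that id and year uniquely identify observations
* (year is an assumed variable name)
isid id year
```

A double would work equally well here; long is enough for ids of this size and uses half the storage.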

