Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to transform my dataset to have a dataset at the municipal level?

    Hi Statalist,

    I would like to change the structure of my dataset, but I don't know how to proceed. Currently, my dataset looks as the following (let's call it Real Dataset):


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 URL2 double Power_MW int Submission_Year long(cod_Municipality_1 cod_Municipality_2)
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2017-4860"  20 16770 25 10
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2017-4859" 9.9 16958 25 10
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2017-4858" 9.9 16964 25 10
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2018-14107" 120 19318  8  5
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2018-14108"  74 19318  8 21
    end
    format %tdnn/dd/CCYY Submission_Year
    label values cod_Municipality_1 cod_Municipality_1
    label def cod_Municipality_1 8 "Calzadilla de los Barros", modify
    label def cod_Municipality_1 25 "La Puebla de Arganzón", modify
    label values cod_Municipality_2 cod_Municipality_2
    label def cod_Municipality_2 5 "Bienvenida", modify
    label def cod_Municipality_2 10 "Condado de Treviño", modify
    label def cod_Municipality_2 21 "Medina de las Torres", modify
    Basically, I want to change this dataset into one at the "municipality level", in other terms something like:
    Municipalities Project_1 Project_2 Project_3 ... Cumul_Pro Power_MW_Pro1 Power_MW_Pro2 Power_MW_Pro3 ... Cumul_Power_MW_Pro Others_Municipalities_Involved
    Calzadilla de los Barros First Submission_Year appearing, in chronological way Second Submission_Year, if any, in chronological way Third Submission_Year, if any, in chronological way 3 (the cumulative count of the three projects, if three), etc. The Power in Mega Watts of Project 1 The power in Mega Watts of Project 2, if this municipality has a second project The power in Mega Watts of Project 3, if this municipality has a third project, etc. The cumulative power in Mega Watts of the projects installed in this municipality The count of the number of municipalities involved, appart the one listed in Municipalities
    We can define as a different project if the extraction's URL differs, for the same municipality.
    The main municipality (the one that appears in -Municipalities- should be the variable -Municipality_1- for each line in our Real Dataset. We then include the other municipalities participating in the project in Others_Municipalities_Involved.

    I hope it's clear what I want to achieve. Any suggestions you have for improvement are welcome!
    Thank you in advance for your help. I'm really stuck and don't really know if what I'm doing is really feasible.

    Best,

    Michael

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 URL2 double Power_MW int Submission_Year long(cod_Municipality_1 cod_Municipality_2)
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2017-4860"  20 16770 25 10
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2017-4859" 9.9 16958 25 10
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2017-4858" 9.9 16964 25 10
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2018-14107" 120 19318  8  5
    "https://www.boe.es/diario_boe/txt.php?id=BOE-A-2018-14108"  74 19318  8 21
    end
    format %tdnn/dd/CCYY Submission_Year
    label values cod_Municipality_1 cod_Municipality_1
    label def cod_Municipality_1 8 "Calzadilla de los Barros", modify
    label def cod_Municipality_1 25 "La Puebla de Arganzón", modify
    label values cod_Municipality_2 cod_Municipality_2
    label def cod_Municipality_2 5 "Bienvenida", modify
    label def cod_Municipality_2 10 "Condado de Treviño", modify
    label def cod_Municipality_2 21 "Medina de las Torres", modify
    
    bys cod_Municipality_1 (Submission_Year): gen which=_n
    drop URL2
    reshape wide Power_MW Submission_Year cod_Municipality_2, i(cod_Municipality_1) j(which)
    order cod_Municipality_1 Sub* Power* cod_Municipality_*
    egen count_municipalities=rownonmiss( cod_Municipality_*)
    replace count_municipalities= count_municipalities-1
    Res.:

    Code:
    . l cod_Municipality_1 Sub* Power* cod_Municipality_21 cod_Municipality_22 count
    
         +--------------------------------------------------------------------------------------------------------------------------------------------------------+
         |       cod_Municipality_1   Submissi~1   Submissi~2   Submiss~3   Power_~1   Power_~2   Power_~3    cod_Municipality_21   cod_Municipalit~22   count_~s |
         |--------------------------------------------------------------------------------------------------------------------------------------------------------|
      1. | Calzadilla de los Barros   11/21/2012   11/21/2012           .         74        120          .   Medina de las Torres           Bienvenida          2 |
      2. |    La Puebla de Arganzón   11/30/2005     6/6/2006   6/12/2006         20        9.9        9.9     Condado de Treviño   Condado de Treviño          3 |
         +--------------------------------------------------------------------------------------------------------------------------------------------------------+

    Comment


    • #3
      Hi Andrew Musau,

      Happy to read you again, Andrew!
      Thank you very much for your suggestion! It is exactly what I was looking for.


      I wish you all the best.
      Michael

      Comment


      • #4
        Hi again,

        I need some help please. I'm encountering an error:

        Code:
        // Beautiful Code provided by Prof. Andrew Musau, Molde University College. I modified some parts. All errors are mine:
        * Extract:
        format submission_year %td
        gen quarter_year = qofd(submission_year)
        format quarter_year %tq
        
        bysort cod_municipality_1 (quarter_year): gen which=_n
        drop URL2
        reshape wide Power_MW Quarter_Year cod_municipality_*, i(cod_municipality_1) j(which)
        Error obtained:

        Code:
        variable cod_municipality_11 already defined
        r(110);
        Or when I run this:

        Code:
        * [...]
        reshape wide power_mw quarter_year cod_municipality_2 cod_municipality_3 cod_municipality_4 ... cod_municipality_22, i(cod_municipality_1) j(which)
        I encounter the following error:

        Code:
        variable cod_municipality_6 not constant within cod_municipality_1
        variable cod_municipality_7 not constant within cod_municipality_1
        variable cod_municipality_8 not constant within cod_municipality_1
        variable cod_municipality_9 not constant within cod_municipality_1
            Your data are currently long. You are performing a reshape wide. You typed something like
        
                . reshape wide a b, i(cod_municipality_1) j(which)
        
            There are variables other than a, b, cod_municipality_1, which in your data. They must be constant within cod_municipality_1 because that is the only way they can fit
            into wide data without loss of information.
        
            The variable or variables listed above are not constant within cod_municipality_1. Perhaps the values are in error. Type reshape error for a list of the problem
            observations.
        
            Either that, or the values vary because they should vary, in which case you must either add the variables to the list of xij variables to be reshaped, or drop them.
        r(9);
        Does anyone know what is going on here, please?
        Thank you very much in advance!
        Last edited by Michael Duarte Goncalves; 22 Apr 2024, 08:22.

        Comment


        • #5
          Originally posted by Michael Duarte Goncalves View Post
          [CODE]

          Error obtained:

          Code:
          variable cod_municipality_11 already defined
          r(110);
          This can be resolved by starting with

          Code:
          rename cod_municipality_1 target
          drop if missing(cod_municipality_*)
          bysort target (quarter_year): gen which=_n
          reshape wide power_mw quarter_year cod_municipality_*, i(target) j(which)



          Comment


          • #6
            Hi again Andrew,

            Thanks for your time and help! I obtained another error now. I'm sorry:

            Code:
            format Submission_Year %td
            gen quarter_year = qofd(Submission_Year)
            format quarter_year %tq
            
            // Now, let's reshape everything:
            rename cod_Municipality_1 target
            drop if missing(cod_Municipality_*)
            Code:
            cod_Municipality_* invalid name
            r(198);
            I have stata version 17. I don't know if the error is because of my stata's version.
            Thank you again for your time.

            Michael
            Last edited by Michael Duarte Goncalves; 22 Apr 2024, 09:31.

            Comment


            • #7
              I have updated my code as follows, and it seems to work correctly:

              Code:
              format Submission_Year %td
              gen quarter_year = qofd(Submission_Year)
              format quarter_year %tq
              
              egen id = group(cod_Municipality_1)
              
              // Now, let's reshape everything:
              
              drop URL2
              bys id (quarter_year): gen which = _n
              rename (cod_Municipality_*) (=_)
              reshape wide Power_MW Quarter_Year Submission_Year cod_Municipality_*, i(id) j(which)
              Many thanks for your patience, time and help. I'm really grateful for the people (often experts!) who spend many hours answering questions from stressed-out users about their bugs.

              On this thread, thank you Andrew Musau !


              All the best, and hope to benefit from your help in the future.

              Michael
              Last edited by Michael Duarte Goncalves; 22 Apr 2024, 09:29.

              Comment


              • #8
                Sorry, I forgot to mention my updated post#7 is inspired by William Lisowski's answer in another thread. See below:

                Error code r(110) variable XXX already defined


                Thank you all.
                Michael

                Comment

                Working...
                X