Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data Cleaning and manipulation

    Hello Statlist,
    I have an issue with my data that I have been trying to clean and would greatly appreciate your help on this.

    I have a dataset as follows:
    donor_country recipient_country year sector value
    Austria Brazil 2002 Biodiversity 0.5
    Austria Brazil 2002 Policy 0.2
    Austria Mexico 2004 Biodiversity 0.1
    Austria Mexico 2004 Research 0.3
    Austria Colombia 2002 Preservation 0.2
    Austria Colombia 2002 Education 0.5

    I would like to create a new variable called biodiversity that is equal to value if the sector has biodiversity in it, but I want this value to repeat as long as the donor_country, recipient_country, & year don't change. So that it looks like this:

    donor_country recipient_country year sector value biodiversity
    Austria Brazil 2002 Biodiversity 0.5 0.5
    Austria Brazil 2002 Policy 0.2 0.5
    Austria Mexico 2004 Biodiversity 0.1 0.1
    Austria Mexico 2004 Research 0.3 0.1
    Austria Colombia 2002 Preservation 0.2 0
    Austria Colombia 2002 Education 0.5 0

    I tried using the following STATA command:

    sort donor_country recipient_country year sector
    egen biodiversity = total(value) if sector == "Biodiversity", by(donor_country recipient_country year)
    bysort donor_country recipient_country year: replace biodiversity = biodiversity[1] if missing(biodiversity)

    However I have noted that it works on some observations but not my all observations, Do you have any idea what I can do to fix it?

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 donor_country str8 recipient_country int year str12 sector float value
    "Austria" "Brazil"   2002 "Biodiversity" .5
    "Austria" "Brazil"   2002 "Policy"       .2
    "Austria" "Mexico"   2004 "Biodiversity" .1
    "Austria" "Mexico"   2004 "Research"     .3
    "Austria" "Colombia" 2002 "Preservation" .2
    "Austria" "Colombia" 2002 "Education"    .5
    end
    
    by donor_country recipient_country year (sector), sort: ///
        egen wanted = max(cond(sector == "Biodiversity", value, .))
    When you use the -if- qualifier, it restricts everything to those observations that meet the condition specified. For something that needs to involve the entire cluster of observations but selects one or some for inclusion in a calculation, you have to use the -cond()- function.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Added: Note that the code assumes, but does not verify, that within each group of observations for the same pair of countries and year there is at most one observation with sector == "Biodiversity". If that is not true, two things follow. 1. your request is undefined, and 2. the code will produce an incorrect answer. Probably before running this code you should verify this. One way to do it is to verify the stricter condition that this is true of every different sector: -isid donor_country recipient_country year sector-.
    Last edited by Clyde Schechter; 24 Feb 2024, 17:29.

    Comment


    • #3
      Thank you so much for your input on this, It worked perfectly!

      Comment

      Working...
      X