Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace value in many observations and variables

    Dear all
    I have a database of around 50 variables and 200 observations. I would like to replace with the value 0, in those observations that says in the all variables the word 'quantity'.
    For example, from the following database, I would like to replace the word quantity with the value 0 in variable 1 and 2.
    Observation Var 1 Var 2
    1 400 200
    2 quantity 800
    3 300 quantity
    4 quantity 1000
    The question is, what command can I use to do it in the 50 variables that I have and in the 200 observations, in a simple way?
    Thank you very much!

  • #2
    The tableau you show is missing crucial information for solving your problem. It also cannot possibly be your real data, because Var 1 and Var 2 are not legal variable names (spaces are not permitted). So I'm going to assume the real variable names are Var1 and Var2 (and up through Var50), and that the variables are strings, not value-labeled numerics. If I have guessed wrong, then this code will not work and we will both have wasted our time. In the future, be sure to use the -dataex- command to post actual example Stata data instead of leaving it to people to guess what you are working with.

    Code:
    forvalues i = 1/50 {
        replace Var`i' = "0" if Var`i' == "quantity"
    }
    I also imagine that ultimately you want to change these variables to numeric and do some calculations with them. So use the -destring- command for that. (-help destring- if you are not familiar with it.)

    If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    Comment


    • #3
      Greetings-
      I have a similar situation and tried to apply the same code to my situation and received an varlist required error and not sure how to address it. I am using census data from 9 African countries, 2 of which don't include data on the urban/rural variable so I have done my research using the geographic level codes provided in the data to determine which observations would count as rural vs urban so I want to replace missing values in the urban variable column with either 1 for rural or 2 for urban. My initial attempt is here: THese observations (269314 through 277472) all live in an urban district in Botswana so I want to change the current missing designation to 2.

      forvalues i = 269314/277472 {
      replace `urban' = "2" if `urban' == "."
      }

      varlist required
      r(100);

      I need to do this for 93 districts (740,000 observations within these 2 countries) If there is a faster/better way, I would love to hear it!
      Thank you

      Comment


      • #4
        first, you never refer to "`i'" within your loop so it is not what you want

        second, there appears to be no need for a loop here; try the following:
        Code:
        replace urban=2 if urban==. in 269314/277472
        added in edit: just realized that your query in #3 is written as though the variable is a string variable and I answered as though it is numeric - in my code, just surround the 2 and the . with double quote marks as in your #3 if your variable "urban" is actually a string variable

        Comment


        • #5
          Thank you Rich!! I figured there must be an easier way!

          Comment


          • #6
            Hello,

            I have a problem I believe is similar to the original question. I have a series of variables representing high school graduation rates. They are currently string variables. Although most are numerical, when the cohort size gets too small, the observations are suppressed by utilizing ranges (e.g. "85-89," "GE90"). I'd like to replace the ranges with their midpoint where ever possible, but would like to find an easier way than going variable by variable and manually replacing the data. I thought loops might be the answer. Here's an example of what I'm working with.

            Code:
            clear
            input str5(all_rate_1819 mwh_rate_1819 mbl_rate_1819 mhi_rate_1819)
            "88"    "89"    "70-79" "85-89"
            "88"    "93"    "80"    "90-94"
            "79"    "81"    "60-69" "GE50" 
            "83"    "85"    "70-74" "75-79"
            "85-89" "85-89" "GE90"  "PS"   
            end
            Any suggestions would be greatly appreciated.

            Comment


            • #7
              See -help destring- to learn about converting valid numeric strings to numeric variables.
              Code:
              clear
              input str5(all_rate_1819 mwh_rate_1819 mbl_rate_1819 mhi_rate_1819)
              "88"    "89"    "70-79" "85-89"
              "88"    "93"    "80"    "90-94"
              "79"    "81"    "60-69" "GE50"
              "83"    "85"    "70-74" "75-79"
              "85-89" "85-89" "GE90"  "PS"   
              end
              //
              foreach v of varlist all_rate_1819 mwh_rate_1819 mbl_rate_1819 mhi_rate_1819 {
                 // Valid ones
                 destring `v', gen(temp) force
                 rename temp `v'_num
                 // Odd ones
                 replace `v'_num = (real(substr(`v', 1, 2)) + real(substr(`v', 4,2)))/2 ///
                    if missing(`v'_num)
                 // GE halfway to 100
                 replace `v'_num = ( real(substr(`v', 3, 2)) + 100)/2 ///
                    if substr(`v', 1, 2) == "GE"
              }

              Comment


              • #8
                Thank you, Mike! I'm familiar with -destring-, loops just always give me trouble. I appreciate the help.

                Comment


                • #9
                  I have a follow up Q. Re: the "PS" observation in the above code, what is best to do with these? I tried to turn them into missing variables using:
                  Code:
                  foreach v of varlist all_rate_1819 mwh_rate_1819 mbl_rate_1819 mhi_rate_1819 {
                      replace `v' = . if `v' == "PS"
                  }
                  end
                  but that didn't work. Is my best bet to just start a new variable?

                  Comment


                  • #10
                    Never mind my last question, I figured it out. Thanks all!

                    Comment


                    • #11
                      Hello, I have a result of tabstat of age by (currentgrad) s(m min max) as
                      current grade max
                      p1 21
                      p2 33
                      p3 31
                      and so on
                      I am supposed to use a state command that would change the max age
                      to "7" for "p1" and "8" for p2 and so on

                      I need help

                      Comment

                      Working...
                      X