Shifting values over one variable in a row

Jane Evans

Join Date: Sep 2017

Posts: 13
#1

Shifting values over one variable in a row

26 Jun 2018, 02:26

Hi Everyone,

We are cleaning data that was entered manually into a csv and have some issues where in several observations, all entries were somehow shifted over one. For example, a subset of our data might look like this:

ID Gender Language Income v25

1 F 3 5000

2 M 3 300

3 F 2 100

4 F 1 3000

where the values for ID 2 need to be shifted one to the left. I am wondering if there is a way to automate this shifting of values without having to replace values in variables one by one. I read in another post that you can reshape long and then use by and subscripting, but I am not sure how to reshape this dataset as there are no common variable stub names.

Any advice would be appreciated!

Thanks!
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

26 Jun 2018, 05:04

If I were confronted with this problem, my likely approach would be to open the original CSV file in Excel and deal with the problems in Excel, deleting cells so that Excel can then shift the rest of the row left. The problem with using a Stata program is that your data really requires careful review for other problems hidden by the effects of the problem you have detected. Using a program may well keep you from that careful review.

Having imported the flawed file, you now have what should be a numeric variable (Language) treated as a string variable by Stata because of the "M" for ID 2. So along with getting things to line up correctly, you then have to destring variables suffering from that problem.

Last edited by William Lisowski; 26 Jun 2018, 05:41.
1 like
Comment

eric_a_booth

Join Date: Apr 2014
Posts: 288

26 Jun 2018, 06:37

Code:

clear
input ID       Gender str5(Language)    Income    v25
. 1    "F"    3    5000    
. 2    "M"    3    300
. 3    "F"    2    100    
. 4    "F"    1    3000
end
l


tostring *, replace 
    

ds  //select everything; edit this to select subset of data
token `"`r(varlist)'"'

while `"`1'"' != "" & `"`2'"' != "" {
    tr: replace `1' = `2' if mi(`1') | `1'=="." 
    mac shift
    }
    
    destring *, replace
    drop v25
    l

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

Comment

Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#4

26 Jun 2018, 08:54

1. I do agree with William that it would be "safer" if Jane could be dealing with the problem within original excel file. Any suggested code in Stata, even it might be helpful, still requires very carefully rechecking and some destring works.

2. The code of Eric is very insightful, but it seems to me that the issue of Jane is a little different than the example within the code. Particularly, if I might be correct, the original data of Jane would looks like:

Code:

clear input byte id str1(gender language) int(income v25) 1 "F" "3" 5000 . 2 "" "M" 3 300 3 "F" "2" 100 . 4 "F" "1" 3000 . end list, noobs +--------------------------------------+ | id gender language income v25 | |--------------------------------------| | 1 F 3 5000 . | | 2 M 3 300 | | 3 F 2 100 . | | 4 F 1 3000 . | +--------------------------------------+

wherein, v25 is unexpected variables and Jane only want to "move 1 left step" for id==2 (where gender is missing, making all the info be pushed wrongly in 1 "right step").

Other relevant issue is that: There still might be some missing values with in the original variables (language, income), which are still expected to be moved left when gender is "wrongly" missing.

3. Then Jane, please notice FAQ, particularly the question 12.2, and use dataex to give out here a small sample of your data. Upon your sample, the specific suggestion for your issue could be provided effectively. And quick, I believe.

Cheers.
Comment

Announcement

Shifting values over one variable in a row

Comment

Comment

Comment