Combining two cross-sectional datasets with Stata 18

Berli Machado

Join Date: Feb 2024

Posts: 4
#1

27 Feb 2024, 17:11

Good evening everyone,

I want to run a repeated cross-sectional analysis in Stata 18 using two cross-sectional datasets, from 2017 and 2021, each with a different sample of individuals. I combined both datasets with the append command and now have all variables together. I want to run a multiple linear regression. Does anyone know whether I need to make further changes to the combined dataset, or can I use it as it is?

I use Stata 18 and have already removed the missing values.

This is my first post, so I apologize if I make any mistakes.

Thank you so much for your help!

Last edited by Berli Machado; 27 Feb 2024, 17:18.
Tags: cross sectional, regression
George Ford

Join Date: Aug 2014

Posts: 3039
#2

27 Feb 2024, 18:09

If you've appended the data and both sets had the same variable names (which should be apparent in data view), then you're ready to go.

You may want to mark the years if there's no variable for year.
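
One way to mark the years at the moment of appending is sketched below. It is only an illustration: the toy values, the tempfile name wave2021 and the variable name wave are assumptions, not taken from the actual survey files. The generate() option of -append- records which file each observation came from, and a year variable can then be built from it.

```stata
* Sketch only: toy data stand in for the two survey files
clear
input byte polinteresse
3
4
1
end
tempfile wave2021
save `wave2021'

clear
input byte polinteresse
2
5
end

* The 2017 data are in memory (master); the 2021 file is appended below them.
* generate(wave) creates a marker: 0 = master file, 1 = appended file
append using `wave2021', generate(wave)

* Turn the marker into a year variable
generate int year = cond(wave == 0, 2017, 2021)
tabulate year
```

If each file already contains a year variable, this step is not needed.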
Berli Machado

Join Date: Feb 2024

Posts: 4
#3

28 Feb 2024, 07:34

George, thank you for your response!

What you said about the variable names confused me a little. What do you mean by the same variable names? The variable names were not the same in the two datasets, but since this is not a panel, that should not be a problem for the analysis, or am I wrong?

The variables for 2017 have missing values for every 2021 observation, and likewise the variables for 2021 are missing for every 2017 observation. Is that correct?

I also renamed the variables so that I could type them more easily, for example income17 and income21, but they are separate variables for 2017 and for 2021.

Last edited by Berli Machado; 28 Feb 2024, 07:43.
Bruce Weaver

Join Date: May 2014

Posts: 1109
#4

28 Feb 2024, 07:53

Hello Berli Machado. I think it would help a lot if you used two -dataex- commands to show readers what the two datasets look like--see Section 12 of the FAQ. Ten observations per dataset should be sufficient. Thanks.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
George Ford

Join Date: Aug 2014

Posts: 3039
#5

28 Feb 2024, 08:11

-append- stacks the data by variable name, so you have one cross section on top of another. If the variable names are different, then you get a bunch of missings on some variables for period 0 and on the others for period 1.

If you want to do analysis on the entire dataset, you have to homogenize the variable names prior to append.

Say you wanted to know if Y increased between period 0 and 1.

reg y x1 x2 period1

The coefficient on period1 (a dummy equal to 1 for the second period) is a direct test of that. To run that model, you have to have y, x1, x2 in both periods.
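
A minimal sketch of that model on simulated data follows. The names y, x1, x2 and period1 come from the line above; the sample size, the seed and the data-generating values are purely illustrative assumptions.

```stata
* Sketch only: simulated data, 100 observations per wave
clear
set seed 12345
set obs 200
generate int year = cond(_n <= 100, 2017, 2021)
generate x1 = rnormal()
generate x2 = rnormal()

* Build y so that it is 0.4 higher in 2021, other things equal
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.4*(year == 2021) + rnormal()

* Period dummy: 1 for the 2021 wave, 0 for the 2017 wave
generate byte period1 = (year == 2021)
regress y x1 x2 period1

* Same model with factor-variable notation instead of a hand-made dummy
regress y x1 x2 i.year
```

In both versions the period coefficient estimates the difference in mean y between the two waves, holding x1 and x2 fixed, and its t test is the test described above.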
Berli Machado

Join Date: Feb 2024

Posts: 4
#6

28 Feb 2024, 09:03

Hello guys, maybe I should have stated the purpose of my analysis. I want to analyse whether a person's education (measured in years) affects how much political interest they have (from 1, not really interested, to 5, very interested). For that I will use two surveys from the last two national elections in Germany. For education I was thinking of creating a new variable with low, middle and high education, and controlling for household income, age and the father's education.

Should the variables have the same names in that case?
Berli Machado

Join Date: Feb 2024

Posts: 4
#7

28 Feb 2024, 10:00

Bruce Weaver, this is what I got from the command you suggested. If there is a particular way to upload code, I would be glad to change it, but here is the code in text format from my log file:

[CODE]
* Example generated by -dataex-. For more info, type help dataex
clear
input int year byte(polinteresse17 polinteresse21)
2021 . 3
2021 . 3
2021 . 2
2021 . 1
2021 . 3
2021 . 3
2021 . 4
2021 . 1
2021 . 2
2021 . 3
2021 . 3
end
[/CODE]

This is the result after I combined both datasets with append. The variable "polinteresse17" holds the values for 2017, which is why it is missing for every 2021 observation. Likewise, "polinteresse21" is missing for every 2017 observation.
Bruce Weaver

Join Date: May 2014

Posts: 1109
#8

28 Feb 2024, 10:35

Hi Berli Machado. As George Ford suggested earlier, I think you need to rename polinteresse17 and polinteresse21 to polinteresse in both files before you use -append-. If you do that, you'll end up with a file that looks something like this:

Code:

year polinteresse
2017 3
2017 3
2017 2
2017 1
2017 3
2021 3
2021 4
2021 1
2021 2
2021 3
2021 3
etc.
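
The rename-then-append steps could look roughly like the sketch below. The variable names polinteresse17 and polinteresse21 are taken from #7; the toy values and the tempfile name wave2021 are assumptions standing in for the real files.

```stata
* Sketch only: toy data stand in for the two survey files

* 2021 file: give the variable its common name, mark the year, save
clear
input byte polinteresse21
3
4
1
end
rename polinteresse21 polinteresse
generate int year = 2021
tempfile wave2021
save `wave2021'

* 2017 file: same common name, then stack the 2021 file underneath
clear
input byte polinteresse17
3
2
1
end
rename polinteresse17 polinteresse
generate int year = 2017

append using `wave2021'
list year polinteresse, sepby(year)
```

If the files have already been appended under different names, as in #7, a possible alternative to redoing the append is to combine the two columns afterwards, for example with generate byte polinteresse = cond(year == 2017, polinteresse17, polinteresse21).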

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)