  • Combining two cross-sectional datasets with Stata 18

    Good evening everyone,

    I want to run a repeated cross-sectional analysis in Stata 18 with two cross-sectional datasets, from 2017 and 2021, each with a different sample of individuals. I combined both datasets with the append command and now have all variables together. I want to run a multiple linear regression. Does anyone know whether I need to make further changes to the combined dataset, or can I use it as it is?

    I use Stata 18 and have already dropped the missing values.

    This is my first post, so I apologize if I make any mistakes.

    Thank you so much for your help!
    Last edited by Berli Machado; 27 Feb 2024, 17:18.

  • #2
    If you've appended the data and both sets had the same variable names (which should be apparent in data view), then you're ready to go.

    You may want to mark the years if there's no variable for year.
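
    As a minimal sketch of marking the years when neither file has a year variable: the -generate()- option of -append- records which file each row came from, and a year variable can be built from that marker. The tempfile name, the marker name source, and the toy values are assumptions for illustration only.

    ```stata
    * Toy 2021 file (hypothetical values), saved so it can be appended later
    clear
    input byte polinteresse
    3
    2
    4
    end
    tempfile wave2021
    save `wave2021'

    * Toy 2017 file (hypothetical values), left in memory as the master data
    clear
    input byte polinteresse
    1
    3
    5
    end

    * generate() adds a marker: 0 = data in memory (2017), 1 = appended file (2021)
    append using `wave2021', generate(source)
    generate int year = cond(source == 0, 2017, 2021)
    tabulate year
    ```

    If each file already has its own year variable, this step is not needed.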



    • #3
      George, thank you for your response!

      What you said about the variable names confused me a little. What do you mean by the same variable names? The variable names were not the same in the two datasets, but since it's not a panel, that should not be a problem for the analysis, or am I wrong?

      The 2017 variables are missing for every 2021 observation, and likewise the 2021 variables are missing for every 2017 observation. Is that correct?

      I also renamed the variables so that I could type them more easily. For example, I used income17 and income21, but these are separate variables for 2017 and 2021.



      Last edited by Berli Machado; 28 Feb 2024, 07:43.



      • #4
        Hello Berli Machado. I think it would help a lot if you used two -dataex- commands to show readers what the two datasets look like--see Section 12 of the FAQ. Ten observations per dataset should be sufficient. Thanks.
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)



        • #5
          -append- stacks the data by variable name, so you have one cross section on top of the other. If the variable names differ, you get missing values on some variables for period 0 and on the others for period 1.

          If you want to do analysis on the entire dataset, you have to homogenize the variable names prior to append.
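
          If the files have already been appended under different names, one possible repair is to fold the two year-specific variables into one afterwards. This is an alternative to renaming before -append-, sketched here with the income17 and income21 names from #3 and made-up values; it assumes a year variable is present.

          ```stata
          * Toy version of the appended file: each variable is missing in the other year
          clear
          input int year float(income17 income21)
          2017 2500 .
          2017 3100 .
          2021 . 2800
          2021 . 3300
          end

          * Use the 2017 variable on 2017 rows and the 2021 variable on all other rows
          generate float income = cond(year == 2017, income17, income21)
          drop income17 income21
          list, clean
          ```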

          Say you wanted to know if Y increased between period 0 and 1.

          reg y x1 x2 period1

          The coefficient on period1 is a direct test of that. To run that model, you have to have y, x1, x2 in both periods.
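
          A minimal sketch of that model on simulated data, assuming a year variable coded 2017/2021. The values, the seed, and the coefficients used to simulate y are invented for illustration; only the names y, x1, x2, and period1 come from the command above.

          ```stata
          * Hypothetical pooled data: y, x1, x2 observed in both years (simulated values)
          clear
          set seed 12345
          set obs 200
          generate int year = cond(_n <= 100, 2017, 2021)
          generate double x1 = rnormal()
          generate double x2 = rnormal()
          generate byte period1 = (year == 2021)    // 1 for the later cross section
          generate double y = 1 + 0.5*x1 - 0.3*x2 + 0.4*period1 + rnormal()

          * Coefficient on period1: difference in y between years, holding x1 and x2 fixed
          regress y x1 x2 period1
          ```

          Using factor-variable notation, -regress y x1 x2 i.year- should give the same fit without creating period1 by hand.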



          • #6
            Hello guys, maybe I should have stated the purpose of my analysis. I want to analyse whether a person's education (measured in years) affects how much political interest they have (from 1 = not really interested to 5 = very interested). For that I will use two surveys from the last two national elections in Germany. For education, I was thinking of creating a new variable with low, middle and high education, and controlling for household income, age and the father's education.

            Should the variables have the same names in that case?



            • #7
              Bruce Weaver, this is what I got from the command you suggested. If there is a proper way to upload the code, I would be glad to change it, but here is the code in text format from my log file:

              [CODE]
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int year byte(polinteresse17 polinteresse21)
              2021 . 3
              2021 . 3
              2021 . 2
              2021 . 1
              2021 . 3
              2021 . 3
              2021 . 4
              2021 . 1
              2021 . 2
              2021 . 3
              2021 . 3
              end
              [/CODE]

              This is the result after I combined both datasets with append. The variable "polinteresse17" holds the 2017 observations, which is why it is missing for every 2021 observation. The same is true of "polinteresse21" for the 2017 observations.



              • #8
                Hi Berli Machado. As George Ford suggested earlier, I think you need to rename polinteresse17 and polinteresse21 to polinteresse in both files before you use -append-. If you do that, you'll end up with a file that looks something like this:

                Code:
                    year   polinteresse  
                    2017              3  
                    2017              3  
                    2017              2  
                    2017              1  
                    2017              3  
                    2021              3  
                    2021              4  
                    2021              1  
                    2021              2  
                    2021              3  
                    2021              3  
                etc.
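
                The rename-then-append steps could be sketched roughly as follows. The tempfile name and the toy values are assumptions for illustration; with real data you would -use- each survey file instead of typing values with -input-.

                ```stata
                * Toy 2017 file (hypothetical values)
                clear
                input int year byte polinteresse17
                2017 3
                2017 2
                end
                rename polinteresse17 polinteresse    // common name in both files
                tempfile wave2017
                save `wave2017'

                * Toy 2021 file (hypothetical values)
                clear
                input int year byte polinteresse21
                2021 4
                2021 1
                end
                rename polinteresse21 polinteresse

                * Same variable names now, so -append- stacks the values into one column
                append using `wave2017'
                sort year
                list, clean
                ```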
                --
                Bruce Weaver
                Email: [email protected]
                Version: Stata/MP 18.5 (Windows)

