  • Which model to use, logit regression panel data

    Dear all,

    I have a question regarding which model and commands I should use for my dataset. My dependent variable is categorical (it only takes the values 1 or 2), and so are my two independent variables (also taking the values 1 or 2). I observed 200 individuals (ID=200), all answering 12 questions. Therefore, I have a balanced panel dataset containing 200 people (ID) over 12 periods of time (the 12 questions). The variable Group indicates to which group the participant belongs, according to a certain personal characteristic.
    Here's an example of my dataset, where ID is the personal ID of the participants, and Class1, Anti1 and Class2 are three of their answers given to each of the 12 questions:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(Group ID Class1 Anti1 Class2)
    1 1 1 2 1
    2 1 2 2 2
    2 1 1 2 1
    2 1 2 2 2
    2 1 1 2 1
    1 1 1 1 1
    1 1 1 1 1
    1 1 2 1 2
    1 1 1 1 1
    1 1 2 1 2
    2 1 1 2 1
    2 1 2 1 2
    2 2 2 2 2
    2 2 1 1 1
    2 2 1 2 2
    2 2 1 1 1
    1 2 2 2 1
    1 2 1 1 1
    1 2 2 2 1
    1 2 1 2 1
    1 2 1 2 1
    1 2 2 1 2
    1 2 1 1 2
    1 2 2 1 1
    1 3 1 1 1
    2 3 2 1 2
    2 3 1 2 1
    1 3 2 2 2
    2 3 1 1 1
    1 3 1 1 1
    2 3 2 1 1
    1 3 2 2 2
    2 3 1 2 1
    1 3 2 2 2
    1 3 1 1 1
    2 3 2 1 2
    end
    I want to evaluate whether a person is more likely to answer C2 with either 1 or 2 when the answer to Anti1 was 1 or 2, respectively, controlling for the answer to C1. In other words, I want to use a logit model with C2 as the dependent variable, and Anti1 and C1 as independent variables.
    I am familiar with using the xtlogit command, and using either fixed or random effects. However, I want to evaluate per person whether the Anti1 coefficient is significant. More specifically, I want to find out whether in one of the two groups there are more 'significant Anti1' cases than in the other group. For this reason, I don't want aggregated results, but rather results clustered by the group.
    Is there any way that I can get this result so that I can see the exact number of people for which the Anti1 coefficient was significant in both of the groups?

    Thanks in advance, if anything is unclear please let me know.

  • #2
    More specifically, I want to find out whether in one of the two groups there are more 'significant Anti1' cases than in the other group. For this reason, I don't want aggregated results, but rather results clustered by the group.
    This sounds like you want to test for an interaction between the group variable and the Anti1 effect on the C2 response, controlling for C1.

    The first thing you should do is recode the data from 1/2 to 0/1 (mandatory for C2, optional for the others, but probably less confusing to do it for all.) That's because Stata's dichotomous variable model estimation commands interpret the outcome variable as 0 = false, non-zero = true, so 1 and 2 would both be considered the same response.

    Code:
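    * Recode all 1/2 variables to 0/1
    recode Group Class* Anti1 (1 = 0)(2 = 1)

    * Declare ID as the panel variable
    xtset ID
    * Fixed-effects logit, reported as odds ratios
    xtlogit Class2 i.Group##i.Anti1 Class1, fe or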
    The coefficient of the 1.Group#1.Anti1 interaction term will tell you the extent (in the odds ratio metric) to which Group membership influences the Anti1 effect on the Class2 response, adjusted for Class1 response. The coefficient itself is the ratio of odds ratios (ROR), which is analogous to a difference in differences.
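
    As a minimal sketch of how the group-specific Anti1 odds ratios could be read off after the model above: this assumes the -xtlogit- command has just been run and that all variables are coded 0/1, so that Group == 0 is the reference level. -lincom- is a standard postestimation command; which original group ends up as level 0 depends on the recode.

    ```stata
    * Assumes the xtlogit model above is the active estimation result
    * and that all variables are coded 0/1.

    * Odds ratio for Anti1 in the reference group (Group == 0)
    lincom 1.Anti1, or

    * Odds ratio for Anti1 in the other group (Group == 1):
    * main effect plus interaction, exponentiated
    lincom 1.Anti1 + 1.Group#1.Anti1, or

    * Ratio of the two odds ratios, i.e. the interaction term itself
    lincom 1.Group#1.Anti1, or
    ```

    The third line should reproduce the interaction odds ratio shown in the -xtlogit- output; the first two put the two groups' Anti1 effects side by side.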

    Is there any way that I can get this result so that I can see the exact number of people for which the Anti1 coefficient was significant in both of the groups?
    Well, you could run -logit Class2 Anti1 Class1- under -statsby:- to get a separate regression for each individual and then count up results within each group. But I strongly recommend not doing that. Each of those regressions will be small (based on only 12 observations) and you are just going to be mining noise out of your data. Moreover this approach would rely on the distinction between "significant" results and "non-significant" results in two different estimations, which is well known to be utterly meaningless.
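
    Purely to make concrete what the per-person approach described above would involve (not as a recommendation), a rough sketch might look like the following. It assumes the data have been recoded to 0/1 and that Group is constant within ID; the names b_anti, se_anti, z and sig are hypothetical, chosen only for this illustration.

    ```stata
    * Illustration only: the per-person approach that is advised against above.
    * Assumes 0/1 coding and that Group does not vary within ID.
    * b_anti, se_anti, z and sig are hypothetical names for this sketch.
    preserve

    * One logit per person, keeping the Anti1 coefficient and its standard error
    statsby b_anti=_b[Anti1] se_anti=_se[Anti1], by(ID Group) clear: ///
        logit Class2 Anti1 Class1

    * Wald z statistic; left missing when the coefficient or SE is unusable
    gen double z = b_anti / se_anti

    * Flag "significant at the 5% level", skipping missing z values
    gen byte sig = abs(z) > invnormal(0.975) if !missing(z)

    * Cross-tabulate the flag by group with Fisher's exact test
    tabulate Group sig, exact

    restore
    ```

    With only 12 observations per person, many of these regressions can be expected to fail or to drop Anti1 altogether, which is one practical symptom of the instability described above.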








    • #3
      Thank you very much Clyde, those are really good ideas. I forgot about the re-coding, so I will adjust this immediately.

      Secondly, I was actually thinking to use a non-parametric test (Chi-square test/ Fisher's exact test, depending on the expected cell value) in order to find out whether in one group there are more cases of a significant Anti1 than in the other group.
      I thought this was the correct way to go as my data is all ordinal, my sample size (per ID) is quite small (only 12 obs), and I am not sure if all my error terms are independent (when using the whole panel-data set).

      This is the reason why I wanted to apply the -statsby- prefix and then use a non-parametric test.
      I hope this is possible, but if I am doing something terribly wrong in my thought process, please let me know.



      • #4
        I thought this was the correct way to go as my data is all ordinal, my sample size (per ID) is quite small (only 12 obs), and I am not sure if all my error terms are independent (when using the whole panel-data set).
        Except that your "ordinal" variables are actually dichotomous, so the nominal vs ordinal distinction disappears. The small sample size is actually a reason not to do what you're proposing: the count of "statistically significant" Anti1's will be wildly unstable and your results will not be remotely reproducible.* While the non-independence of error terms is a fair concern, you can overcome this with the -vce(cluster clustering_variable)- option in a logistic model. If you were to do this count of significant Anti1's approach in a publication, I think the reviewers would rip you to shreds.

        *There is no ideal solution for small sample sizes. But with a total of 200 subjects each answering the 12 questions, I think you are actually in reasonable shape from a sample-size perspective. But to the extent that the limitation of only 12 questions influences the analysis choice, it argues for using the logistic model and against counting significant p-values.
        Last edited by Clyde Schechter; 15 Sep 2017, 08:23.
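
        A minimal sketch of the -vce(cluster ...)- suggestion in #4, assuming the variables have been recoded to 0/1 and that the person identifier ID is the appropriate clustering variable (an assumption here; the post leaves the clustering variable generic):

        ```stata
        * Pooled logit with standard errors clustered on the person identifier.
        * Assumes 0/1 coding; using ID as the cluster variable is an assumption.
        logit Class2 i.Group##i.Anti1 i.Class1, vce(cluster ID) or
        ```

        The clustered standard errors allow the 12 answers from the same person to be correlated, while the interaction term still carries the group comparison.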



        • #5
          Dear all,
          I am working on the causes of violent conflicts, using data on 195 countries. My dependent variable is binary, i.e., 1 = country has a violent conflict and 0 = no conflict. My data form a panel of 195 countries over 18 years, 2000-2018. The independent variables include binary, continuous and scale variables such as political terror scale, unemployment, poverty, religious discrimination, economic growth, etc.
          My question is where I should start in Stata, and whether I can use fixed- and random-effects models with binary panel logit or probit models.
          Can anybody provide a link that explains the binary panel logistic model in Stata?



          • #6
            Sayd:
            welcome to this forum.
            I would start from the -xtlogit- entry and related references in the Stata .pdf manual.
            Kind regards,
            Carlo
            (StataNow 18.5)
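
            As a rough starting point for the question in #5, a sketch along these lines might help. All variable names (country_id, year, conflict, terror_scale, unemployment, poverty, growth) are hypothetical placeholders, not taken from the poster's data. One relevant fact: -xtlogit- offers both fe and re, whereas -xtprobit- has no fixed-effects option.

            ```stata
            * Hypothetical variable names: country_id, year, conflict (0/1),
            * terror_scale, unemployment, poverty, growth.
            xtset country_id year

            * Random-effects logit
            xtlogit conflict terror_scale unemployment poverty growth, re
            estimates store re

            * Fixed-effects (conditional) logit; countries whose outcome never
            * changes over the period drop out of the estimation
            xtlogit conflict terror_scale unemployment poverty growth, fe
            estimates store fe

            * Hausman test comparing the two sets of estimates
            hausman fe re

            * Random-effects probit; xtprobit has no fe option
            xtprobit conflict terror_scale unemployment poverty growth, re
            ```

            Which specification is appropriate depends on the data; the sketch only shows the mechanics of fitting and comparing the models.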
