Fixed effects estimation for different movies

Jean Jacques

Join Date: Sep 2020

Posts: 97
#1

Fixed effects estimation for different movies

21 Sep 2021, 05:01

Hi guys! I'm unsure about a procedure I'm doing with fixed effects and I would like to check whether what I'm doing is correct.

I have people watching different movies (the variable movie runs from 1 to n, where each number is a different movie), and I have other characteristics of each person, such as country of origin and gender. My dependent variable is score, the score that each person gave to the movie. So each movie was watched by different individuals, from different countries and of different genders, and I want to see whether the score a person gives to the same movie differs by gender or country of origin (i.e. whether the same movie is rated differently by women and men).

Just to give an idea of what my data look like:
movie  country_of_individual  gender  score
1      US                     0       10
1      UK                     1       8
1      GER                    1       7
2      UK                     0       2
2      GER                    1       1
3      UK                     1       8
3      US                     0       6

What I'm doing in Stata is

Code:

xtset room
xtreg score_assigned i.country gender, fe vce(cluster room)

So my question is quite simple: are these 2 lines of code correct for what I want to do (assess whether country or gender explains the scores that a movie receives, i.e. whether the same movie is rated differently by males and females)?

Thanks!
Tags: fixed effects
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

21 Sep 2021, 05:17

Jean Jacques:
your dataset does not look like a proper panel dataset (to me, at least), as I do not see any -timevar- in your example.
That said, I would go -regress- with interaction and standard errors clustered on -movie-:

Code:

regress score i.gender##i.country, vce(cluster movie)

Kind regards,
Carlo
(Stata 19.0)
Jean Jacques

Join Date: Sep 2020

Posts: 97
#3

21 Sep 2021, 05:26

Hi Carlo, thanks for the answer

The idea I want to test is something like "there is unobserved heterogeneity among individuals coming from different countries: people from the UK are more generous in their scores than people from the US". That's why I was thinking of something like a "movie fixed effect".

To do that I can't just take the average across individuals by country, because they're also watching different movies. Perhaps they give different scores because people in the UK watch better movies than people in the US, so they're not more generous but are rating different items. Let me know if I'm not clear. Thanks!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

21 Sep 2021, 07:15

Jean Jacques:
the fixed effect you mention is related to -panelid- (movies in your case) and not to an independent variable (such as people coming from different countries who go to cinemas).
What you mention as unobserved heterogeneity may also be interpreted as endogeneity: that is, a latent variable lurking within the residuals that is correlated with both -people- and the regressand. If that were the case, you would face the issue of finding the right instruments to deal with that nuisance.

Kind regards,
Carlo
(Stata 19.0)
Jean Jacques

Join Date: Sep 2020

Posts: 97
#5

21 Sep 2021, 08:06

Hi Carlo, thanks for the answer.

I think I'm not understanding. The point is that people from different countries may, because of their cultures for example, like different types of movies, so they rate movie == 1 differently. But perhaps they also go to watch different movies for exactly the same reason. That's why I think that, with a movie fixed effect, I can control for those unobserved cultural factors. And that's why my proposed solution is:

Code:

xtset room
xtreg score_assigned i.country gender, fe vce(cluster room)

I hope this is a bit clearer.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

21 Sep 2021, 08:10

Jean Jacques:
1) what is -room-?
2) how many waves of data is your dataset composed of?

Kind regards,
Carlo
(Stata 19.0)
Jean Jacques

Join Date: Sep 2020

Posts: 97
#7

21 Sep 2021, 08:23

1) Sorry! I was thinking of changing the example to hotel rooms (different people staying in different hotel rooms and rating them, so it would be the same setup), but then I found it would be too messy. In the code, room plays the role of movie.

2) Indeed I don't have waves. What I have is movies (they could be rooms, products purchased on Amazon, etc.) and I want to see whether the scores differ according to the country of origin of the person who provides the score. Again, the idea is that culture makes individuals' scores different but may also make individuals go to watch different movies (or book different rooms or buy different products on Amazon). That's why I'm thinking of something like a "movie fixed effect": within a movie (or room, or product purchased), how much does the score of UK people differ from that of German people? With that I would be able to say "ok, UK people are more generous in their ratings than German ones".

I hope the point that I want to prove is clearer now.
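
The "movie fixed effect" idea in this post can be written down roughly as follows. This is only a sketch: the symbols (i for a person, m for a movie, c for a country, and the Greek letters) are notation assumed here and do not come from the thread.

```latex
% Person i rates movie m; alpha_m is a movie-specific intercept (the "movie fixed effect").
% The sum runs over all countries except an omitted base country.
\[
\text{score}_{im} = \alpha_m + \beta\,\text{gender}_i
  + \sum_{c} \gamma_c\,\mathbf{1}[\text{country}_i = c] + \varepsilon_{im}
\]
% Subtracting movie-level means removes alpha_m, so beta and gamma_c are
% identified only from differences between raters of the same movie.
\[
\text{score}_{im} - \overline{\text{score}}_{m}
  = \beta\,\bigl(\text{gender}_i - \overline{\text{gender}}_{m}\bigr)
  + \sum_{c} \gamma_c\,\bigl(\mathbf{1}[\text{country}_i = c] - \bar{d}_{cm}\bigr)
  + \bigl(\varepsilon_{im} - \bar{\varepsilon}_{m}\bigr)
\]
```

Here \bar{d}_{cm} stands for the share of movie m's raters who come from country c. Under this reading, \gamma_c measures how much more (or less) generously raters from country c score the same movie relative to the base country.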
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#8

21 Sep 2021, 08:47

Jean Jacques:
with one wave of data, a panel structure is out of the question (as 2 or more waves of data are needed for a panel dataset).
As your regressand is continuous (-score-), -regress- is the way to go:

Code:

regress score i.gender##i.country i.movie, vce(cluster movie)

That said, the main issue rests on whether your regression specification considers all the predictors (and interactions) that can give a fair and true view of the data generating process you're investigating. For instance, you may experience an endogeneity issue if, say, the average national income is correlated with both -score- (other things being equal, more affluent countries give, on average, higher score to the same movie, because people are more relaxed) and -movie- (other things being equal, more affluent countries receive, on average, a higher volume of higher quality movies, because there is a higher chance that people go to cinema vs. less affluent nations).

Kind regards,
Carlo
(Stata 19.0)
Tom Scott

Join Date: Apr 2019

Posts: 266
#9

21 Sep 2021, 08:52

Jean, you say that you want to see whether there is a difference in the score that a person gives to the same movie depending on their country or gender. That's a descriptive, not a causal, research question, so you shouldn't need anything fancy to answer it. You don't have the right type of data for a fixed effects regression. Maybe you are thinking of adding a movie fixed effect by controlling for the movie id? The term fixed effect can mean a lot of different things - https://statmodeling.stat.columbia.e...hy_i_dont_use/

I think you can answer your question with the following code.

Code:

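* string country variable, named as in the data example in #1
encode country_of_individual, gen(country_num)
levelsof movie, local(levels)
foreach l of local levels {
    * one regression per movie, using only that movie's ratings
    quietly regress score i.country_num##i.gender if movie == `l', vce(robust)
    estimates store movie`l'
}
* compare the stored per-movie results side by side
estimates table _all, star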

Last edited by Tom Scott; 21 Sep 2021, 09:03.
Jean Jacques

Join Date: Sep 2020

Posts: 97
#10

21 Sep 2021, 09:47

Thanks guys! Still not super convinced about why not doing

Code:

xtset room
xtreg score_assigned i.country gender, fe vce(cluster room)

but I'll think about that! Thanks
Tom Scott

Join Date: Apr 2019

Posts: 266
#11

21 Sep 2021, 12:28

Jean Jacques, because you have cross-sectional data, not repeated measures data. The xt series of commands provides tools for analyzing panel data (also known as longitudinal or repeated measures data). Your unit of observation is the individual, and each individual only gave one response at one time point. So, no panel data means no xt commands.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#12

21 Sep 2021, 12:33

Jean Jacques:
I regret to let you down once more, but panel data analysis is out of the question with cross-sectional data.
If you're not super convinced by the previous replies (which is perfectly legitimate, obviously), just take a look at any decent textbook on panel data econometrics.

Kind regards,
Carlo
(Stata 19.0)
Jean Jacques

Join Date: Sep 2020

Posts: 97
#13

22 Sep 2021, 02:09

Thanks Carlo and Tom. Also thanks to William Lisowski, who in another post provided a super useful answer. Indeed I'm aware that this isn't a panel; I was just trying to figure out the best way to deal with the problem that I have. The point, as I explained, is to deal with the unobserved heterogeneity of people with unobserved characteristics rating movies differently as well as watching different movies.

The answer provided by William to another question (basically, what should I do when I get the "insufficient memory r(950)" error?) was to use the areg command with absorb(movie), which led to the same results as the ones I was obtaining when I estimated a fixed effects panel model.
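
A minimal sketch of why the results coincide, using the seven example rows from #1. The variable name country (created by encode) is an assumption made for this illustration, and the sample is far too small for any real inference. The point is only that all three commands fit the same within-movie model.

```stata
* Example rows copied from the data listing in post #1
clear
input movie str3 country_of_individual gender score
1 "US"  0 10
1 "UK"  1 8
1 "GER" 1 7
2 "UK"  0 2
2 "GER" 1 1
3 "UK"  1 8
3 "US"  0 6
end

* Numeric country variable for factor-variable notation (name is illustrative)
encode country_of_individual, gen(country)

* (a) OLS with an explicit dummy for each movie
regress score i.country i.gender i.movie

* (b) Same model, with the movie dummies absorbed rather than reported
areg score i.country i.gender, absorb(movie)

* (c) Within estimator, treating movie as the group identifier (no time variable)
xtset movie
xtreg score i.country i.gender, fe
```

The coefficients on country and gender should agree across (a), (b) and (c). If vce(cluster movie) is added, the standard errors from areg and xtreg, fe may differ somewhat, since the two commands apply different small-sample adjustments.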