

  • Panel Data

    I'm new to panel data analysis and cannot quite grasp the concept. I have collected data on numerous films, including IMDb rating, Rotten Tomatoes rating, budget, box office, genre, and whether or not the film won the main Oscar. The data span 2000-2022. What should I id by? I attach a snippet of the data.
    Thanks in advance for any help and directions!

  • #2
    It's not a balanced panel (by production_company).

    You can't xtset your data by (production_company, year), because a production_company can have multiple films in the same year, so that pair does not uniquely identify an observation. You could xtset by production_company alone and then include i.year as a regressor, as is typical for xtreg (at least in older versions; xtreg can handle multiple fixed effects in the latest version).
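
    A minimal sketch of that setup, in case it helps. The variable names bestpicture, lnbudget, rating, lnboxoffice and year are taken from the command later in this thread; company_id is a hypothetical name introduced here, and the sketch assumes those variables already exist in your data.

    ```stata
    * Numeric company id (egen group() accepts string or numeric input)
    egen company_id = group(production_company)

    * Report how often a (company, year) pair repeats; surplus copies are
    * the reason xtset company_id year would be rejected
    duplicates report company_id year

    * Declare only the panel variable, then add year dummies as regressors
    xtset company_id
    xtreg bestpicture lnbudget rating lnboxoffice i.year, fe
    ```

    With no time variable declared, time-series operators such as L. are unavailable, which should not matter here since each film is observed once.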

    The data are cross-sectional at the film level.

    Or you can use reghdfe to estimate a model with production_company, year, and genre fixed effects. reghdfe does not require you to xtset your data and handles multiple fixed effects with ease, so I would recommend it for your data.

    Whether you want to do any of this, and how you handle the fixed effects, depends on what you are trying to estimate and which coefficients are of interest.


    • #3
      Thank you! I was planning on modelling how variables such as movie rating or box office can predict the chance of winning a Best Picture award. Would it still work?


      • #4
        See what this gives you:

        reghdfe bestpicture lnbudget rating lnboxoffice, absorb(production_company genre year)

        I doubt you'll find anything. Best Picture wins are very idiosyncratic.
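
        In case the command above errors out, here is a hedged sketch of the setup it may need. reghdfe is community-contributed and is installed from SSC together with its dependency ftools. As far as I know, absorb() expects numeric variables, so the sketch builds numeric ids first; company_id and genre_id are hypothetical names, and the regressors are assumed to exist already.

        ```stata
        * Install reghdfe and its dependency ftools from SSC
        ssc install ftools, replace
        ssc install reghdfe, replace

        * Build numeric ids in case the identifiers are stored as strings
        egen company_id = group(production_company)
        egen genre_id = group(genre)

        * Same regression as above, absorbing the numeric ids
        reghdfe bestpicture lnbudget rating lnboxoffice, absorb(company_id genre_id year)
        ```

        Since bestpicture is a 0/1 outcome, this is a linear probability model, so each coefficient reads as a change in the probability of winning.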

