Dear all,
I am modeling how an index may influence innovation in a region. My dependent variable is the number of innovations at the region-year level (regional panel data). However, my independent variable, the index, is constructed at the region-sector-year level. This finer granularity could capture how the effect of the index on regional innovation differs across sectors.
The challenge is aligning these levels of analysis. Aggregating the index to the region-year level is an option, but it risks losing critical sectoral information. Alternatively, I could keep the data at the region-sector-year level, which includes region, year, and sector information and allows me to control for sector fixed effects. The structure is shown in the table below.
In this structure the region-year innovation count is repeated on every sector row: for example, region A has 10 innovations in 2020, so each of its three sectors is assigned 10. I am concerned this imposes a strong assumption, namely that every sector in a given region-year has the same number of innovations, which I cannot check because I only observe the number of innovations by region-year.
Region | Year | Sector | Innovations | Index
A      | 2020 | 1      | 10          | 0.5
A      | 2020 | 2      | 10          | 0.7
A      | 2020 | 3      | 10          | 0.9
A      | 2021 | 1      | 12          | 0.6
A      | 2021 | 2      | 12          | 0.8
A      | 2021 | 3      | 12          | 1.0
B      | 2020 | 1      | 0           | 0.4
B      | 2020 | 2      | 0           | 0.6
B      | 2020 | 3      | 0           | 0.1
B      | 2021 | 1      | 9           | 0.7
B      | 2021 | 2      | 9           | 0.4
B      | 2021 | 3      | 9           | 0.9
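A minimal Stata sketch of the data layouts I am weighing, using the toy values from the table. It assumes lowercase variable names matching the table columns (`region`, `year`, `sector`, `innovations`, `index`); the model command further down uses `num_innovations` instead. It first checks that the outcome is constant within each region-year, then shows aggregation of the index to region-year means, and finally a `reshape wide` that keeps one index variable per sector at the region-year level, which is one possible way to avoid repeating the outcome across sector rows. It is an illustration only, not a recommended specification.

```stata
* Sketch only: rebuild the toy example from the table (replaces the data in memory)
clear
input str1 region int year byte sector int innovations double index
"A" 2020 1 10 0.5
"A" 2020 2 10 0.7
"A" 2020 3 10 0.9
"A" 2021 1 12 0.6
"A" 2021 2 12 0.8
"A" 2021 3 12 1
"B" 2020 1  0 0.4
"B" 2020 2  0 0.6
"B" 2020 3  0 0.1
"B" 2021 1  9 0.7
"B" 2021 2  9 0.4
"B" 2021 3  9 0.9
end

* The outcome is observed only by region-year, so it must be constant across sector rows
bysort region year (innovations): assert innovations[1] == innovations[_N]

* Option 1: aggregate the index to region-year (sectoral variation is averaged away)
preserve
collapse (mean) index (first) innovations, by(region year)
list
restore

* Option 2: one observation per region-year with a separate index per sector
* (creates index1, index2, index3; innovations is constant within region-year, so reshape keeps it)
reshape wide index, i(region year) j(sector)
list

* On the full panel, the sector-specific indices could then enter a region-year model, e.g.
* mixed innovations index1 index2 index3 i.year || region:
```

Option 2 adds one regressor per sector, so it is only practical when the number of sectors is small.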
Would a multilevel (mixed-effects) model account for this structure, in which innovations are measured at the region-year level but the independent variable varies by region-sector-year? I know that the dependent variable is typically at the lowest level of the hierarchy, with independent variables measured at that level or at higher levels (e.g., a firm-level outcome explained by firm- and region-level variables). In my case, however, region and sector are not nested: the same sectors appear in every region, so the hierarchy between "region" and "sector" is not clearly defined.
Is this modeling structure (see below) appropriate for the described data? Any guidance or suggestions would be greatly appreciated!
Thanks in advance!
Code:
* Linear mixed model: index, year and sector as fixed effects; random intercept for region
mixed num_innovations index i.year i.sector || region: