Hello everyone,
I am currently working on an analysis of newspaper articles using the LIWC-22 software to extract various linguistic categories. I am using Stata BE 17.0 for my analysis. One of the key variables in my dataset is "author", which contains the names of 1200 different authors. I would like to investigate the effect of artificial intelligence discussions on several LIWC categories (e.g., moral) while controlling for the author effect.
The dataset does not have a panel structure because there can be more than one observation per author per day (a website, or a single author, may publish several articles on the same day). Here is an excerpt of my data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str16 website str10 publication_date str145 author float(moral AIBM)
"Analytic Insight" "01/05/2024" "sumedha" 0 3.02
"Analytic Insight" "01/05/2024" "Shiva Ganesh" 0 6.2
"Analytic Insight" "01/05/2024" "sumedha" 0 1.81
"Analytic Insight" "01/05/2024" "P.Sravanthi" 0 8.15
"Analytic Insight" "01/05/2024" "sumedha" 0 5.7
"Analytic Insight" "01/05/2024" "Rachana Saha" .12 5
"Analytic Insight" "01/05/2024" "Gayathri" .19 6.24
"Analytic Insight" "01/05/2024" "Harshini" .24 3.18
"Analytic Insight" "30/04/2024" "sumedha" .34 2.16
"Analytic Insight" "30/04/2024" "Harshini" 0 6.58
"Analytic Insight" "29/04/2024" "P.Sravanthi" 0 5.47
"Analytic Insight" "29/04/2024" "P.Sravanthi" .12 8.2
"Analytic Insight" "29/04/2024" "Parvin Mohmad" 1.38 3.31
"Analytic Insight" "29/04/2024" "Prathima" .31 4.39
"Analytic Insight" "28/04/2024" "Nitesh Kumar" 0 7.03
"Analytic Insight" "28/04/2024" "Sai Chaitanya" .57 9.77
"Analytic Insight" "28/04/2024" "Sai Chaitanya" .36 5.81
"Analytic Insight" "28/04/2024" "Nitesh Kumar" .96 4.05
"Analytic Insight" "27/04/2024" "Rachana Saha" .26 12.81
"Analytic Insight" "27/04/2024" "sumedha" .11 13.19
"Analytic Insight" "27/04/2024" "Prathima" 0 5.16
"Analytic Insight" "27/04/2024" "Nitesh Kumar" 1.07 2.68
"Analytic Insight" "27/04/2024" "Sai Chaitanya" 0 12.71
"Analytic Insight" "26/04/2024" "Shiva Ganesh" .72 8.16
"Analytic Insight" "26/04/2024" "sumedha" 0 6.76
"Analytic Insight" "26/04/2024" "Supraja" .19 7.03
"Analytic Insight" "26/04/2024" "P.Sravanthi" .53 6.44
"Analytic Insight" "26/04/2024" "Parvin Mohmad" .4 4.8
"Analytic Insight" "26/04/2024" "Prathima" 0 1.63
"Analytic Insight" "25/04/2024" "P.Sravanthi" .12 1.48
"Analytic Insight" "25/04/2024" "Prathima" 0 7.42
"Analytic Insight" "25/04/2024" "S Akash" 1.45 7.79
"Analytic Insight" "25/04/2024" "Market Trends" .29 4.82
"Analytic Insight" "25/04/2024" "Prathima" 0 5.92
"Analytic Insight" "25/04/2024" "Rachana Saha" .13 2.59
"Analytic Insight" "24/04/2024" "Supraja" .09 9.16
"Analytic Insight" "24/04/2024" "Shiva Ganesh" 0 7.44
"Analytic Insight" "24/04/2024" "Prathima" .1 1.37
"Analytic Insight" "24/04/2024" "P.Sravanthi" 0 9.09
"Analytic Insight" "24/04/2024" "Parvin Mohmad" 3.42 4.56
"Analytic Insight" "24/04/2024" "Harshini" .08 2.02
"Analytic Insight" "24/04/2024" "sumedha" .95 8.21
"Analytic Insight" "23/04/2024" "Supraja" 0 0
"Analytic Insight" "23/04/2024" "Prathima" .1 6.18
"Analytic Insight" "23/04/2024" "Supraja" .14 6.04
"Analytic Insight" "23/04/2024" "P.Sravanthi" 0 8.57
"Analytic Insight" "23/04/2024" "Prathima" .22 6.11
"Analytic Insight" "23/04/2024" "Shiva Ganesh" .12 9.29
"Analytic Insight" "23/04/2024" "P.Sravanthi" 0 4.41
"Analytic Insight" "22/04/2024" "P.Sravanthi" .51 6.11
"Analytic Insight" "22/04/2024" "Rachana Saha" 0 7.28
"Analytic Insight" "21/04/2024" "Pardeep Sharma" 0 6.39
"Analytic Insight" "21/04/2024" "Nitesh Kumar" 1.67 7.31
"Analytic Insight" "21/04/2024" "Pardeep Sharma" 0 6.21
"Analytic Insight" "21/04/2024" "Nitesh Kumar" 1.33 6.87
"Analytic Insight" "20/04/2024" "Sai Chaitanya" 0 6.29
"Analytic Insight" "20/04/2024" "Rachana Saha" 0 3.02
"Analytic Insight" "19/04/2024" "S Akash" 0 3.18
"Analytic Insight" "19/04/2024" "Supraja" 0 6.13
"Analytic Insight" "19/04/2024" "IndustryTrends" .5 5.62
"Analytic Insight" "19/04/2024" "Rachana Saha" 0 4.43
"Analytic Insight" "18/04/2024" "Prathima" 0 7.41
"Analytic Insight" "18/04/2024" "Prathima" 0 7.57
"Analytic Insight" "18/04/2024" "Shiva Ganesh" .17 3.28
"Analytic Insight" "18/04/2024" "P.Sravanthi" .16 .47
"Analytic Insight" "17/04/2024" "Pardeep Sharma" 1.48 5.7
"Analytic Insight" "16/04/2024" "Parvin Mohmad" .32 4.2
"Analytic Insight" "16/04/2024" "P.Sravanthi" .37 5.97
"Analytic Insight" "16/04/2024" "S Akash" 0 6.89
"Analytic Insight" "16/04/2024" "greeshmitha" .67 4.86
"Analytic Insight" "15/04/2024" "Parvin Mohmad" .17 4.35
"Analytic Insight" "15/04/2024" "Parvin Mohmad" 0 .56
"Analytic Insight" "14/04/2024" "Nitesh Kumar" 0 2.82
"Analytic Insight" "14/04/2024" "greeshmitha" 0 3.74
"Analytic Insight" "14/04/2024" "Nitesh Kumar" 0 1.18
"Analytic Insight" "14/04/2024" "Rachana Saha" 0 6.3
"Analytic Insight" "14/04/2024" "sumedha" .88 10.68
"Analytic Insight" "13/04/2024" "Parvin Mohmad" 1.1 6.62
"Analytic Insight" "13/04/2024" "Pardeep Sharma" 0 6.97
"Analytic Insight" "13/04/2024" "Pardeep Sharma" .31 6.61
"Analytic Insight" "13/04/2024" "Harshini" 0 4.71
"Analytic Insight" "13/04/2024" "Prathima" 0 6.55
"Analytic Insight" "12/04/2024" "P.Sravanthi" .88 6.93
"Analytic Insight" "12/04/2024" "greeshmitha" 0 2.68
"Analytic Insight" "12/04/2024" "Pardeep Sharma" 0 9.67
"Analytic Insight" "11/04/2024" "Shiva Ganesh" 1.49 6.39
"Analytic Insight" "11/04/2024" "P.Sravanthi" 0 2.96
"Analytic Insight" "11/04/2024" "P.Sravanthi" 0 1.07
"Analytic Insight" "11/04/2024" "P.Sravanthi" .14 2.03
"Analytic Insight" "11/04/2024" "greeshmitha" .18 4.2
"Analytic Insight" "11/04/2024" "Rachana Saha" .96 6.12
"Analytic Insight" "11/04/2024" "sumedha" .42 4.66
"Analytic Insight" "10/04/2024" "P.Sravanthi" .31 5.33
"Analytic Insight" "10/04/2024" "P.Sravanthi" 0 2.98
"Analytic Insight" "10/04/2024" "Shiva Ganesh" .21 5.76
"Analytic Insight" "10/04/2024" "Harshini" 0 4.38
"Analytic Insight" "10/04/2024" "P.Sravanthi" 0 2.99
"Analytic Insight" "09/04/2024" "greeshmitha" .57 4.38
"Analytic Insight" "09/04/2024" "P.Sravanthi" .31 5.16
"Analytic Insight" "09/04/2024" "Rachana Saha" 0 6.42
end
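To make the structure of the data concrete, here is a minimal sketch of how the excerpt could be prepared and inspected. It assumes the -dataex- example above has been loaded; the variable names pubdate, author_id, and n_same_day are hypothetical and only used for illustration.

```stata
* Convert the DD/MM/YYYY string into a Stata daily date
generate pubdate = daily(publication_date, "DMY")
format pubdate %td

* Numeric author identifier, value-labelled with the original names
encode author, generate(author_id)

* Report how many author-day combinations occur more than once
duplicates report author_id pubdate

* Number of articles per author and day
bysort author_id pubdate: generate n_same_day = _N
tabulate n_same_day
```

If -duplicates report- shows surplus observations, the author-day pair does not uniquely identify articles, which is why the data cannot be declared as a panel with both a panel variable and a daily time variable.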
Given the large number of unique authors, creating dummy variables for each author is not feasible (Stata BE only supports matrices with up to 800 rows or columns). I am considering using a mixed-effects model to account for the variability between authors.
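For concreteness, a minimal sketch of the random-intercept model under consideration, with an absorbed fixed-effects regression shown next to it purely for comparison. This is a sketch under assumptions, not a tested specification: author_id is a hypothetical numeric identifier created from author, and moral and AIBM are the outcome and predictor from the data excerpt.

```stata
* Numeric author identifier, created only if it does not exist yet
capture confirm variable author_id
if _rc {
    encode author, generate(author_id)
}

* Random intercept for each author (no author dummies are created)
mixed moral AIBM || author_id:

* Share of the residual variance attributable to authors
estat icc

* For comparison: author fixed effects absorbed rather than
* entered as explicit indicator variables
areg moral AIBM, absorb(author_id) vce(cluster author_id)
```

Neither command puts one indicator per author into the coefficient vector, so both should sidestep the limit mentioned above; which of the two is appropriate depends on whether author effects can be assumed uncorrelated with AIBM.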
I have a few questions and would appreciate any advice or suggestions:
- Is the mixed-effects model the best approach to control for the author effect given the large number of authors?
- Are there any other methods or best practices in Stata that could handle this situation more effectively?
- Any recommendations on model diagnostics or validation techniques to ensure the robustness of my results?