  • How to handle a Categorical Variable with 1200 Levels in Stata BE 17.0?

    Hello everyone,

    I am currently working on an analysis of newspaper articles using the LIWC-22 software to extract various linguistic categories. I am using Stata BE 17.0 for my analysis. One of the key variables in my dataset is "author", which contains the names of 1200 different authors. I would like to investigate the effect of artificial intelligence discussions on several LIWC categories (e.g., moral) while controlling for the author effect.

    The dataset does not have a panel structure because some days have more than one observation (a website, an author, or both may publish more than one article per day). Here is an overview of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 website str10 publication_date str145 author float(moral AIBM)
    "Analytic Insight" "01/05/2024" "sumedha"           0  3.02
    "Analytic Insight" "01/05/2024" "Shiva Ganesh"      0   6.2
    "Analytic Insight" "01/05/2024" "sumedha"           0  1.81
    "Analytic Insight" "01/05/2024" "P.Sravanthi"       0  8.15
    "Analytic Insight" "01/05/2024" "sumedha"           0   5.7
    "Analytic Insight" "01/05/2024" "Rachana Saha"    .12     5
    "Analytic Insight" "01/05/2024" "Gayathri"        .19  6.24
    "Analytic Insight" "01/05/2024" "Harshini"        .24  3.18
    "Analytic Insight" "30/04/2024" "sumedha"         .34  2.16
    "Analytic Insight" "30/04/2024" "Harshini"          0  6.58
    "Analytic Insight" "29/04/2024" "P.Sravanthi"       0  5.47
    "Analytic Insight" "29/04/2024" "P.Sravanthi"     .12   8.2
    "Analytic Insight" "29/04/2024" "Parvin Mohmad"  1.38  3.31
    "Analytic Insight" "29/04/2024" "Prathima"        .31  4.39
    "Analytic Insight" "28/04/2024" "Nitesh Kumar"      0  7.03
    "Analytic Insight" "28/04/2024" "Sai Chaitanya"   .57  9.77
    "Analytic Insight" "28/04/2024" "Sai Chaitanya"   .36  5.81
    "Analytic Insight" "28/04/2024" "Nitesh Kumar"    .96  4.05
    "Analytic Insight" "27/04/2024" "Rachana Saha"    .26 12.81
    "Analytic Insight" "27/04/2024" "sumedha"         .11 13.19
    "Analytic Insight" "27/04/2024" "Prathima"          0  5.16
    "Analytic Insight" "27/04/2024" "Nitesh Kumar"   1.07  2.68
    "Analytic Insight" "27/04/2024" "Sai Chaitanya"     0 12.71
    "Analytic Insight" "26/04/2024" "Shiva Ganesh"    .72  8.16
    "Analytic Insight" "26/04/2024" "sumedha"           0  6.76
    "Analytic Insight" "26/04/2024" "Supraja"         .19  7.03
    "Analytic Insight" "26/04/2024" "P.Sravanthi"     .53  6.44
    "Analytic Insight" "26/04/2024" "Parvin Mohmad"    .4   4.8
    "Analytic Insight" "26/04/2024" "Prathima"          0  1.63
    "Analytic Insight" "25/04/2024" "P.Sravanthi"     .12  1.48
    "Analytic Insight" "25/04/2024" "Prathima"          0  7.42
    "Analytic Insight" "25/04/2024" "S Akash"        1.45  7.79
    "Analytic Insight" "25/04/2024" "Market Trends"   .29  4.82
    "Analytic Insight" "25/04/2024" "Prathima"          0  5.92
    "Analytic Insight" "25/04/2024" "Rachana Saha"    .13  2.59
    "Analytic Insight" "24/04/2024" "Supraja"         .09  9.16
    "Analytic Insight" "24/04/2024" "Shiva Ganesh"      0  7.44
    "Analytic Insight" "24/04/2024" "Prathima"         .1  1.37
    "Analytic Insight" "24/04/2024" "P.Sravanthi"       0  9.09
    "Analytic Insight" "24/04/2024" "Parvin Mohmad"  3.42  4.56
    "Analytic Insight" "24/04/2024" "Harshini"        .08  2.02
    "Analytic Insight" "24/04/2024" "sumedha"         .95  8.21
    "Analytic Insight" "23/04/2024" "Supraja"           0     0
    "Analytic Insight" "23/04/2024" "Prathima"         .1  6.18
    "Analytic Insight" "23/04/2024" "Supraja"         .14  6.04
    "Analytic Insight" "23/04/2024" "P.Sravanthi"       0  8.57
    "Analytic Insight" "23/04/2024" "Prathima"        .22  6.11
    "Analytic Insight" "23/04/2024" "Shiva Ganesh"    .12  9.29
    "Analytic Insight" "23/04/2024" "P.Sravanthi"       0  4.41
    "Analytic Insight" "22/04/2024" "P.Sravanthi"     .51  6.11
    "Analytic Insight" "22/04/2024" "Rachana Saha"      0  7.28
    "Analytic Insight" "21/04/2024" "Pardeep Sharma"    0  6.39
    "Analytic Insight" "21/04/2024" "Nitesh Kumar"   1.67  7.31
    "Analytic Insight" "21/04/2024" "Pardeep Sharma"    0  6.21
    "Analytic Insight" "21/04/2024" "Nitesh Kumar"   1.33  6.87
    "Analytic Insight" "20/04/2024" "Sai Chaitanya"     0  6.29
    "Analytic Insight" "20/04/2024" "Rachana Saha"      0  3.02
    "Analytic Insight" "19/04/2024" "S Akash"           0  3.18
    "Analytic Insight" "19/04/2024" "Supraja"           0  6.13
    "Analytic Insight" "19/04/2024" "IndustryTrends"   .5  5.62
    "Analytic Insight" "19/04/2024" "Rachana Saha"      0  4.43
    "Analytic Insight" "18/04/2024" "Prathima"          0  7.41
    "Analytic Insight" "18/04/2024" "Prathima"          0  7.57
    "Analytic Insight" "18/04/2024" "Shiva Ganesh"    .17  3.28
    "Analytic Insight" "18/04/2024" "P.Sravanthi"     .16   .47
    "Analytic Insight" "17/04/2024" "Pardeep Sharma" 1.48   5.7
    "Analytic Insight" "16/04/2024" "Parvin Mohmad"   .32   4.2
    "Analytic Insight" "16/04/2024" "P.Sravanthi"     .37  5.97
    "Analytic Insight" "16/04/2024" "S Akash"           0  6.89
    "Analytic Insight" "16/04/2024" "greeshmitha"     .67  4.86
    "Analytic Insight" "15/04/2024" "Parvin Mohmad"   .17  4.35
    "Analytic Insight" "15/04/2024" "Parvin Mohmad"     0   .56
    "Analytic Insight" "14/04/2024" "Nitesh Kumar"      0  2.82
    "Analytic Insight" "14/04/2024" "greeshmitha"       0  3.74
    "Analytic Insight" "14/04/2024" "Nitesh Kumar"      0  1.18
    "Analytic Insight" "14/04/2024" "Rachana Saha"      0   6.3
    "Analytic Insight" "14/04/2024" "sumedha"         .88 10.68
    "Analytic Insight" "13/04/2024" "Parvin Mohmad"   1.1  6.62
    "Analytic Insight" "13/04/2024" "Pardeep Sharma"    0  6.97
    "Analytic Insight" "13/04/2024" "Pardeep Sharma"  .31  6.61
    "Analytic Insight" "13/04/2024" "Harshini"          0  4.71
    "Analytic Insight" "13/04/2024" "Prathima"          0  6.55
    "Analytic Insight" "12/04/2024" "P.Sravanthi"     .88  6.93
    "Analytic Insight" "12/04/2024" "greeshmitha"       0  2.68
    "Analytic Insight" "12/04/2024" "Pardeep Sharma"    0  9.67
    "Analytic Insight" "11/04/2024" "Shiva Ganesh"   1.49  6.39
    "Analytic Insight" "11/04/2024" "P.Sravanthi"       0  2.96
    "Analytic Insight" "11/04/2024" "P.Sravanthi"       0  1.07
    "Analytic Insight" "11/04/2024" "P.Sravanthi"     .14  2.03
    "Analytic Insight" "11/04/2024" "greeshmitha"     .18   4.2
    "Analytic Insight" "11/04/2024" "Rachana Saha"    .96  6.12
    "Analytic Insight" "11/04/2024" "sumedha"         .42  4.66
    "Analytic Insight" "10/04/2024" "P.Sravanthi"     .31  5.33
    "Analytic Insight" "10/04/2024" "P.Sravanthi"       0  2.98
    "Analytic Insight" "10/04/2024" "Shiva Ganesh"    .21  5.76
    "Analytic Insight" "10/04/2024" "Harshini"          0  4.38
    "Analytic Insight" "10/04/2024" "P.Sravanthi"       0  2.99
    "Analytic Insight" "09/04/2024" "greeshmitha"     .57  4.38
    "Analytic Insight" "09/04/2024" "P.Sravanthi"     .31  5.16
    "Analytic Insight" "09/04/2024" "Rachana Saha"      0  6.42
    end
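
    The claim that the data are not a clean author-by-day panel can be checked directly. The following is only a minimal sketch: it re-enters five rows from the example above so that it runs on its own, and the variable names author_id, pubdate, and n_per_day are illustrative assumptions, not part of the original dataset.

```stata
* Sketch only: five rows copied from the -dataex- example so the block is self-contained.
clear
input str16 website str10 publication_date str145 author float(moral AIBM)
"Analytic Insight" "01/05/2024" "sumedha"           0  3.02
"Analytic Insight" "01/05/2024" "Shiva Ganesh"      0   6.2
"Analytic Insight" "01/05/2024" "sumedha"           0  1.81
"Analytic Insight" "30/04/2024" "sumedha"         .34  2.16
"Analytic Insight" "30/04/2024" "Harshini"          0  6.58
end

* Numeric identifier for each author (illustrative name)
egen long author_id = group(author)

* Convert the DD/MM/YYYY string into a Stata daily date (illustrative name)
gen long pubdate = daily(publication_date, "DMY")
format pubdate %td

* Number of articles per author and day; any value above 1 means that
* author_id and pubdate together do not uniquely identify observations
bysort author_id pubdate: gen n_per_day = _N
tabulate n_per_day

* Summary of repeated author-day combinations
duplicates report author_id pubdate
```

    On the full dataset the same commands would apply after dropping the clear and input lines.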

    Given the large number of unique authors, including a dummy variable for each of the 1200 authors is not feasible (Stata BE only supports matrices with up to 800 rows or columns). I am considering using a mixed-effects model to account for the variability between authors.
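
    For concreteness, the options under consideration might look like the sketch below. It is not a recommendation of one model over another. It uses twelve rows copied from the example above so that it runs on its own, and author_id, b_areg, and b_fe are illustrative names. The fixed-effects variants absorb the author effect instead of entering one indicator per author, and the random-intercept variant is the mixed-effects model mentioned above.

```stata
* Sketch only: twelve rows copied from the -dataex- example so the block is self-contained.
clear
input str16 website str10 publication_date str145 author float(moral AIBM)
"Analytic Insight" "01/05/2024" "sumedha"           0  3.02
"Analytic Insight" "01/05/2024" "Shiva Ganesh"      0   6.2
"Analytic Insight" "01/05/2024" "sumedha"           0  1.81
"Analytic Insight" "01/05/2024" "P.Sravanthi"       0  8.15
"Analytic Insight" "01/05/2024" "sumedha"           0   5.7
"Analytic Insight" "01/05/2024" "Rachana Saha"    .12     5
"Analytic Insight" "01/05/2024" "Gayathri"        .19  6.24
"Analytic Insight" "01/05/2024" "Harshini"        .24  3.18
"Analytic Insight" "30/04/2024" "sumedha"         .34  2.16
"Analytic Insight" "30/04/2024" "Harshini"          0  6.58
"Analytic Insight" "29/04/2024" "P.Sravanthi"       0  5.47
"Analytic Insight" "29/04/2024" "P.Sravanthi"     .12   8.2
end

* Numeric author identifier (illustrative name, not in the original data)
egen long author_id = group(author)

* (a) Author fixed effects, absorbed rather than entered as indicator variables
areg moral AIBM, absorb(author_id)
scalar b_areg = _b[AIBM]

* (b) The within estimator; only the group variable is declared,
*     so repeated dates within an author are not a problem
xtset author_id
xtreg moral AIBM, fe
scalar b_fe = _b[AIBM]

* (c) Random intercept for each author (the mixed-effects model)
mixed moral AIBM || author_id:

* The two fixed-effects commands should report the same slope on AIBM
display "areg slope: " b_areg "   xtreg, fe slope: " b_fe
```

    With so few rows the estimates themselves mean nothing; the sketch only shows the syntax, which would be the same on the full data with 1200 authors.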

    I have a few questions and would appreciate any advice or suggestions:
    1. Is the mixed-effects model the best approach to control for the author effect given the large number of authors?
    2. Are there any other methods or best practices in Stata that could handle this situation more effectively?
    3. Any recommendations on model diagnostics or validation techniques to ensure the robustness of my results?
    Thank you in advance for your help!