  • multilevel/ cross classified model question

    Hi all, I am working on a study of grant applications and selections. I have about 8 years' worth of data and am looking at Principal Investigators (PIs) only. The goal is to see whether characteristics of the PI predict the likelihood of getting a grant.

    It is common for the same person to apply for multiple grants and to be selected for multiple grants. The same person can apply for several grants within the same year, but I don’t have a more refined time unit than year, so I don’t know the time order of different applications within the same year.

    Sample sizes:

    about 8,000 applications
    about 3,000 unique applicants

    I am interested in a model predicting the probability of being awarded the grant. Currently I am running a model of applications clustered within applicants using the following code:

    xtlogit awarded yearvariables independentvariables, i(applicant_id)

    My questions:

    1) Does this sound like the correct model for the data structure, with applications clustered within applicants? Specifically, I’m wondering if I need to account for the particular grant topic/area the person applied for, as some will have higher award rates than others. Additionally, within the same grant area, the same person can apply multiple times and be selected multiple times, so the same person can get multiple awards within one grant area. However, across the 8 years there are about 150 year-grant area combinations, which seems like too many to include as dummy variables in the model. Could this be a cross-classified model, with applications nested in applicants, but with a given applicant linked to multiple grants?

    2) Is it a problem that the same person can apply multiple times in a year, given that I don’t have a more refined time unit than year?

    Any advice would be much appreciated!

    Thank you!!

    MJ
    Last edited by MJ Smith; 29 May 2021, 09:20.

  • #2
    Regarding 1), you could put all 150 year-grant area combinations in the model as indicator ("dummy") variables: Stata won't blow up. A better approach in this case might be to just put in indicators for the years and indicators for the grant areas (of which it sounds like there are something like 20), rather than all the combinations, unless you have good reason to think that the effect of the grant area varies from year to year in an arbitrary way. If you do think that, another approach would be to treat the year-grant area combinations as a level in the model. This means going up to -melogit- and having crossed random effects for year-grant area and applicant.
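
    As a rough sketch of the two options described above, assuming the year and grant area are stored in numeric variables named year and grant_area (hypothetical names; a string grant area would need -encode- first), and keeping independentvariables as the placeholder from the original command:

```stata
* Hypothetical variable names: year and grant_area are assumptions.
* awarded, applicant_id, and the placeholder independentvariables
* come from the original -xtlogit- command.

* Option A: random intercept for applicant, with separate (additive)
* indicators for year and for grant area.
xtset applicant_id
xtlogit awarded i.year i.grant_area independentvariables, re

* Option B: crossed random intercepts for applicant and for each
* year-by-grant-area combination.
egen year_area = group(year grant_area)
melogit awarded independentvariables || _all: R.year_area || applicant_id:
```

    In Option B the factor with fewer levels (the roughly 150 year-grant area combinations) is the one specified with _all: R., which is the usual way of keeping a crossed model tractable in -melogit-. Expect it to run considerably slower than Option A.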

    Regarding 2), it depends. I serve on a review panel at the National Institutes of Health (NIH). When reviewing an application, I do not have access to information about other applications submitted by the same investigator. I suppose I could look up what grants the investigator has previously been awarded--that's public information anybody can get on the internet. But I don't, and I doubt anybody else does either: in the application the investigators will usually highlight any previously awarded grants that appear relevant. Crucially, the lag time between submitting an application and getting an award is pretty long, so those previously awarded grants would almost always be from a previous year anyway. On the other hand, if your data come from a granting agency that has fast turnaround, then the coarse-graining of time into years does somewhat limit the questions you can ask, but I would expect that you could still gain useful knowledge from it. I think the limitations would be relatively minor.

    Good luck.



    • #3
      This is fantastic advice. I will read over it carefully. Thank you SO much!
