  • Counting frequencies of one variable for each value of another variable

    Hello Statalist,

    I am working with a labor force survey in which each observation represents an individual in the labor force. I have created a numeric (float) industry-variable "skill_profile"

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float skill_profile
    1
    1
    3
    3
    3
    3
    3
    3
    3
    1
    3
    1
    4
    3
    4
    3
    1
    3
    4
    4
    4
    4
    3
    3
    3
    3
    4
    1
    1
    3
    3
    3
    4
    3
    1
    1
    1
    1
    1
    1
    3
    1
    1
    1
    3
    1
    3
    4
    1
    3
    1
    3
    2
    1
    1
    3
    3
    1
    3
    1
    1
    3
    2
    1
    4
    3
    3
    1
    3
    1
    2
    4
    1
    4
    3
    1
    1
    1
    1
    1
    1
    4
    1
    1
    1
    1
    1
    1
    3
    3
    1
    1
    1
    3
    1
    3
    3
    4
    3
    3
    end
    label values skill_profile skill_profile
    label def skill_profile 1 "low general skills", modify
    label def skill_profile 2 "high general skills", modify
    label def skill_profile 3 "low specific skills", modify
    label def skill_profile 4 "high specific skills", modify

    I would now like to count the number of industries in each "skill_profile" category, using the numeric integer variable "ind" for the industries:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ind
    6190
    4690
    8190
    5070
    8680
    2290
    6170
    2180
    6990
    4970
     270
    7590
    7860
    9470
    7860
     270
    5680
     270
    7860
    7860
    7870
    7870
    2290
    8680
    8190
    8190
    9480
    4670
    7990
     770
    3770
    8190
    7860
    5070
    2980
    3970
    1270
    1270
    8770
    3095
     770
    5380
    4870
    8470
     570
    8370
     770
    7290
    7590
     770
    8290
    8680
    8090
    8770
    7690
    8680
    7070
    8560
    7480
    4970
    8660
    9470
    7280
    8390
    7860
    8190
    6170
    3680
    6170
    6680
    8090
    6480
     280
    9480
    6370
    4690
    6670
     170
    8470
    6670
    6090
    7860
    8560
    8370
     280
    8180
    8170
    3875
     770
    8680
    4690
    8370
    5680
     770
    4970
    8680
    8680
    7860
    7070
     770
    end
    label values ind ind

    So far, I have only managed to produce tables that list all "ind" values for each skill-profile category. Your help is much appreciated.

  • #2
    As in your previous thread https://www.statalist.org/forums/for...unt-of-another it's a bit dopey (insert benign emoticons according to taste) to give us one variable at a time as an example.

    Code:
    dataex skill_profile ind
    is the one small further step to help us help you.

    • #3
      Indeed! Sorry -- I pledge to keep working on my relationship with dataex.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float skill_profile int ind
      1 6190
      1 4690
      3 8190
      3 5070
      3 8680
      3 2290
      3 6170
      3 2180
      3 6990
      1 4970
      3  270
      1 7590
      4 7860
      3 9470
      4 7860
      3  270
      1 5680
      3  270
      4 7860
      4 7860
      4 7870
      4 7870
      3 2290
      3 8680
      3 8190
      3 8190
      4 9480
      1 4670
      1 7990
      3  770
      3 3770
      3 8190
      4 7860
      3 5070
      1 2980
      1 3970
      1 1270
      1 1270
      1 8770
      1 3095
      3  770
      1 5380
      1 4870
      1 8470
      3  570
      1 8370
      3  770
      4 7290
      1 7590
      3  770
      1 8290
      3 8680
      2 8090
      1 8770
      1 7690
      3 8680
      3 7070
      1 8560
      3 7480
      1 4970
      1 8660
      3 9470
      2 7280
      1 8390
      4 7860
      3 8190
      3 6170
      1 3680
      3 6170
      1 6680
      2 8090
      4 6480
      1  280
      4 9480
      3 6370
      1 4690
      1 6670
      1  170
      1 8470
      1 6670
      1 6090
      4 7860
      1 8560
      1 8370
      1  280
      1 8180
      1 8170
      1 3875
      3  770
      3 8680
      1 4690
      1 8370
      1 5680
      3  770
      1 4970
      3 8680
      3 8680
      4 7860
      3 7070
      3  770
      end
      label values skill_profile skill_profile
      label def skill_profile 1 "low general skills", modify
      label def skill_profile 2 "high general skills", modify
      label def skill_profile 3 "low specific skills", modify
      label def skill_profile 4 "high specific skills", modify
      label values ind ind

      • #4
        Thanks for that. Is this what you want?

        Code:
        . egen tag = tag(skill_profile ind)
        
        . egen distinct = total(tag) , by(skill_profile)
        
        . tabdisp skill_profile, c(distinct)
        
        ---------------------------------
               skill_profile |   distinct
        ---------------------+-----------
          low general skills |         30
         high general skills |          2
         low specific skills |         15
        high specific skills |          5
        ---------------------------------
        If so, hop, skip and jump towards https://www.stata-journal.com/sjpdf....iclenum=dm0042

        If not, please define more precisely how you count the number of industries.
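A common alternative to egen's tag() function is to flag the first observation of each distinct skill_profile-ind pair with bysort and _n == 1, then total those flags within each profile. The sketch below runs on a small hypothetical toy dataset, not the data posted in #3, and the variable names first and n_ind are illustrative. It should give the same counts as the tag() code above, with one difference worth checking: egen tag() without its missing option does not tag observations with missing values, whereas this version would count a missing ind as one more industry.

```stata
* Hypothetical toy data (not the poster's data):
* profile 1 has 3 distinct industries, profile 3 has 1
clear
input float skill_profile int ind
1 6190
1 6190
1 4690
1 8370
3 8190
3 8190
end

* Flag the first observation of each distinct skill_profile-ind pair
* (like egen tag(), except that missing ind would also be flagged)
bysort skill_profile ind : generate byte first = _n == 1

* Sum the flags within each skill_profile to count distinct industries
by skill_profile : egen n_ind = total(first)

* n_ind is constant within each profile, so tabdisp can show it directly
tabdisp skill_profile, cell(n_ind)
```

With this toy data, n_ind should be 3 for profile 1 and 1 for profile 3.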

        • #5
          This worked perfectly, thank you! And thanks too for linking to the very useful article.