  • Creating weighted network

    In this dataset, a product is produced by the node in id1 and moved to the node in id2. Part of the data looks as follows:
    Code:
    ssc install nwcommands
    clear
    input id1 id2
    1 2
    1 1
    3 4
    5 6
    3 3
    6 4
    7 3
    6 6
    2 3
    2 2
    4 5
    3 9
    8 8
    7 8
    3 4
    8 4
    4 7
    3 2
    3 2
    9 1
    9 9
    3 6
    5 1
    end
    
    // setting it as network data and plotting the network
    nwset id1 id2, edgelist name(test)
    nwplot test
    However, in some rows id1 and id2 are identical, meaning that the product was not moved. To paint a realistic picture, is it possible to incorporate into the network the fact that some products are not moved, i.e. that the network only represents part of all the products? Perhaps this could be done via a weighted network, where the weight of a tie is the percentage of products moved from node i to node j relative to the total number of products node i produced. For that I would first need to know how many products are moved from node i to node j, and how many of node i's products are not moved at all (the rows where id1 == node i and id2 == node i).
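    For illustration, a rough sketch of the counts I have in mind (assuming each observation is one unit of product; the variable names are just placeholders):
    Code:
    bysort id1: gen nproduced = _N          // total units produced by node i
    bysort id1 id2: gen npair = _N          // units with this particular id1/id2 combination
    gen nstayed = npair if id1 == id2       // units of node i that were never moved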

    I have used this website's search function and looked at the help page for the nwcommands package, but I did not find what I am looking for. Nwcommands does include options to value ties between nodes, but I am not sure whether that is what I am looking for.
    Last edited by Dougie Jones; 08 Aug 2018, 05:43.

  • #2
    Some progress on my part. I first decided to drop the rows where id1 == id2, then to count the duplicate rows by id1 and id2, followed by categorizing them into several value categories.
    Code:
    ssc install nwcommands
    clear
    input id1 id2
    1 2
    1 1
    3 4
    5 6
    3 3
    6 4
    7 3
    6 6
    2 3
    2 2
    4 5
    3 9
    8 8
    7 8
    3 4
    8 4
    4 7
    3 2
    3 2
    9 1
    9 9
    3 6
    5 1
    end
    
    drop if id1 == id2
    
    
    * Counting how often each connection is used (count duplicates), and then assigning values for strength of ties
    sort id1 id2
    quietly by id1 id2:  gen dup = cond(_N==1,0,_n)
    tab dup
    gen value = 1
    replace value = 2 if dup > 1 // at least 2 products were moved from node i to node j
    replace value = 3 if dup > 2 // at least 3
    replace value = 4 if dup > 3 // at least 4
    drop dup
    tab value
    
    nwset id1 id2 value, edgelist name(test) directed
    
    nwtabulate test
    I believe that when I tabulate my network with -nwtabulate test-, I can see how often each value occurs. But the result does not look like what I get when I tabulate my data with -tab value- before setting the network.

    Does anyone have some pointers or suggestions?



    • #3
      My understanding is that each observation per id1 represents one unit of product, regardless of whether it was moved or not, and that each id1/id2 pair represents a transfer of one unit of product from id1 to id2. On those assumptions, here's a way to do what I think you want. What I have is likely not the shortest or most efficient approach, but I wanted it to be transparent and demonstrate some technique. This gives you an edge list, with weights on each edge, which you can feed to -nwcommands-.
      Code:
      bysort id1: gen nproduct = _N // amount of production, moved or not
      drop if (id1 == id2) // product not moved
      bysort id1 id2: gen nmoved12 = _N if (_n==1) // total moved from id1 to id2
      drop if missing(nmoved12) // variable only calculated once per id1/id2 pair
      gen pctmoved = 100 * nmoved12/nproduct
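      For example, reusing the -nwset- syntax from #2 with pctmoved as the tie value, something like this (an untested sketch; the network name is arbitrary) should then create the weighted, directed network:
      Code:
      nwset id1 id2 pctmoved, edgelist name(weighted) directed
      nwplot weighted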



      • #4
        Thank you, Mike! That was actually very helpful. I decided that I will subset the data prior to feeding it to nwcommands, i.e. selecting the connections above a threshold pctmoved and then creating a network.
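        Something along these lines (a sketch; the 10 percent cutoff and the network name are just arbitrary examples):
        Code:
        keep if pctmoved > 10                                   // hypothetical threshold
        nwset id1 id2 pctmoved, edgelist name(subnet) directed
        nwplot subnet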
