Tuesday, November 4, 2014

Exploratory Analysis of an Example Network

This is an exploratory analysis of the “American College football” dataset, a network of American football games between Division IA colleges during the regular season of Fall 2000. The data was downloaded from http://www-personal.umich.edu/~mejn/netdata/ :

Summary

library(igraph)
# Point R at the directory that contains football.gml
setwd("DIRECTORY_WHERE_GML_FILE_IS")
# Read the network from the GML file
g <- read.graph("football.gml", format = "gml")
# Get the Vertex count
vcount(g)
## [1] 115
# Get the Edge count
ecount(g)
## [1] 613
# Use the summary function
summary(g);
## IGRAPH U--- 115 613 -- 
## attr: id (v/n), label (v/c), value (v/n)
Explanation of Summary Output
In the summary output, the first character after “IGRAPH” is “U”, which shows that this is an undirected graph (a directed graph would show “D”). The second character is “-” here; it would be “N” for a named graph, i.e. a graph with the name vertex attribute set. The third character is also “-” here; it would be “W” for a weighted graph, i.e. a graph with the weight edge attribute set. Finally, the fourth character, again “-” here, would be “B” for a bipartite graph, i.e. a graph with the type vertex attribute set, e.g. players and clubs.
This graph has 115 nodes and 613 edges.
After the final two dashes, the name of the graph is printed. Again, in this case no name is set for the graph.
The following line shows that the graph has three attributes:
  • id (v/n): “v” means the attribute belongs to vertices, and “n” means it is numeric
  • label (v/c): “v” means the attribute belongs to vertices, and “c” means it is of character type
  • value (v/n): “v” means the attribute belongs to vertices, and “n” means it is numeric
One may also use the following function to query the vertex attributes:
list.vertex.attributes(g)
## [1] "id"    "label" "value"
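The flag characters described above can be reproduced on a small graph. The following is a minimal sketch, assuming igraph 1.0 or later, since it uses the underscore-style function names (make_ring, is_named, is_weighted, is_bipartite) rather than the dotted names used elsewhere in this post. The four-vertex ring toy and the helper graph_flags are hypothetical names introduced only for illustration; they are not part of the football data.

```r
library(igraph)

# Hypothetical toy graph (not the football network): an undirected 4-cycle
toy <- make_ring(4)

# Illustrative helper collecting the four properties behind the summary flags
graph_flags <- function(graph) {
  c(directed  = is_directed(graph),   # "D" if TRUE, "U" if FALSE
    named     = is_named(graph),      # "N" when the 'name' vertex attribute is set
    weighted  = is_weighted(graph),   # "W" when the 'weight' edge attribute is set
    bipartite = is_bipartite(graph))  # "B" when the 'type' vertex attribute is set
}

flags_before <- graph_flags(toy)

# Setting these three attributes switches the corresponding flags on
V(toy)$name   <- c("a", "b", "c", "d")
E(toy)$weight <- c(1, 2, 3, 4)
V(toy)$type   <- c(TRUE, FALSE, TRUE, FALSE)

flags_after <- graph_flags(toy)

print(flags_before)
print(flags_after)
summary(toy)
```

Comparing the two printed vectors with the first line of the summary output shows which attribute drives which flag.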
Vertices and Edges
# Print all attributes of vertices
vertex.attributes(g)
## $id
##   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
##  [18]  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33
##  [35]  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50
##  [52]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
##  [69]  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
##  [86]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101
## [103] 102 103 104 105 106 107 108 109 110 111 112 113 114
## 
## $label
##   [1] "BrighamYoung"         "FloridaState"         "Iowa"                
##   [4] "KansasState"          "NewMexico"            "TexasTech"           
##   [7] "PennState"            "SouthernCalifornia"   "ArizonaState"        
##  [10] "SanDiegoState"        "Baylor"               "NorthTexas"          
##  [13] "NorthernIllinois"     "Northwestern"         "WesternMichigan"     
##  [16] "Wisconsin"            "Wyoming"              "Auburn"              
##  [19] "Akron"                "VirginiaTech"         "Alabama"             
##  [22] "UCLA"                 "Arizona"              "Utah"                
##  [25] "ArkansasState"        "NorthCarolinaState"   "BallState"           
##  [28] "Florida"              "BoiseState"           "BostonCollege"       
##  [31] "WestVirginia"         "BowlingGreenState"    "Michigan"            
##  [34] "Virginia"             "Buffalo"              "Syracuse"            
##  [37] "CentralFlorida"       "GeorgiaTech"          "CentralMichigan"     
##  [40] "Purdue"               "Colorado"             "ColoradoState"       
##  [43] "Connecticut"          "EasternMichigan"      "EastCarolina"        
##  [46] "Duke"                 "FresnoState"          "OhioState"           
##  [49] "Houston"              "Rice"                 "Idaho"               
##  [52] "Washington"           "Kansas"               "SouthernMethodist"   
##  [55] "Kent"                 "Pittsburgh"           "Kentucky"            
##  [58] "Louisville"           "LouisianaTech"        "LouisianaMonroe"     
##  [61] "Minnesota"            "MiamiOhio"            "Vanderbilt"          
##  [64] "MiddleTennesseeState" "Illinois"             "MississippiState"    
##  [67] "Memphis"              "Nevada"               "Oregon"              
##  [70] "NewMexicoState"       "SouthCarolina"        "Ohio"                
##  [73] "IowaState"            "SanJoseState"         "Nebraska"            
##  [76] "SouthernMississippi"  "Tennessee"            "Stanford"            
##  [79] "WashingtonState"      "Temple"               "Navy"                
##  [82] "TexasA&M"             "NotreDame"            "TexasElPaso"         
##  [85] "Oklahoma"             "Toledo"               "Tulane"              
##  [88] "Mississippi"          "Tulsa"                "NorthCarolina"       
##  [91] "UtahState"            "Army"                 "Cincinnati"          
##  [94] "AirForce"             "Rutgers"              "Georgia"             
##  [97] "LouisianaState"       "LouisianaLafayette"   "Texas"               
## [100] "Marshall"             "MichiganState"        "MiamiFlorida"        
## [103] "Missouri"             "Clemson"              "NevadaLasVegas"      
## [106] "WakeForest"           "Indiana"              "OklahomaState"       
## [109] "OregonState"          "Maryland"             "TexasChristian"      
## [112] "California"           "AlabamaBirmingham"    "Arkansas"            
## [115] "Hawaii"              
## 
## $value
##   [1]  7  0  2  3  7  3  2  8  8  7  3 10  6  2  6  2  7  9  6  1  9  8  8
##  [24]  7 10  0  6  9 11  1  1  6  2  0  6  1  5  0  6  2  3  7  5  6  4  0
##  [47] 11  2  4 11 10  8  3 11  6  1  9  4 11 10  2  6  9 10  2  9  4 11  8
##  [70] 10  9  6  3 11  3  4  9  8  8  1  5  3  5 11  3  6  4  9 11  0  5  4
##  [93]  4  7  1  9  9 10  3  6  2  1  3  0  7  0  2  3  8  0  4  8  4  9 11
# Or use the short forms V(g)$id, V(g)$label, V(g)$value
head(V(g)$id)
## [1] 0 1 2 3 4 5
head(V(g)$label)
## [1] "BrighamYoung" "FloridaState" "Iowa"         "KansasState" 
## [5] "NewMexico"    "TexasTech"
head(V(g)$value)
## [1] 7 0 2 3 7 3
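Vertex attributes can also be used to select vertices, not just to list them. The sketch below assumes igraph 1.0 or later (graph_from_data_frame, and neighbors returning a vertex sequence). The games and teams data frames are made-up stand-ins that mimic the label and value attributes of the football network.

```r
library(igraph)

# Hypothetical games and teams; 'label' and 'value' mimic the football attributes
games <- data.frame(from = c("t1", "t1", "t2", "t3"),
                    to   = c("t2", "t3", "t3", "t4"),
                    stringsAsFactors = FALSE)
teams <- data.frame(name  = c("t1", "t2", "t3", "t4"),
                    label = c("Alpha", "Beta", "Gamma", "Delta"),
                    value = c(0, 0, 1, 1),
                    stringsAsFactors = FALSE)
toy <- graph_from_data_frame(games, directed = FALSE, vertices = teams)

# Labels of all vertices whose 'value' attribute equals 1
group1 <- V(toy)[value == 1]$label

# Labels of the opponents (neighbours) of the vertex labelled "Gamma"
gamma_id  <- which(V(toy)$label == "Gamma")
opponents <- neighbors(toy, gamma_id)$label

print(group1)
print(opponents)
```

On the football graph the same pattern would apply with g in place of toy, for example to list the teams of one conference.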
Layout
#default
plot(g)
#circle
plot(g, layout=layout.circle)
#fruchterman
plot(g, layout=layout.fruchterman.reingold)
# turn off labels and resize vertices
plot(g, vertex.label=NA, vertex.size=8)
#circle turn off labels and resize vertices
plot(g, layout=layout.circle, vertex.label=NA, vertex.size=8)
#fruchterman turn off labels and resize vertices
plot(g, layout=layout.fruchterman.reingold, vertex.label=NA, vertex.size=8)
# add color: use the conference number stored in 'value' as the vertex color
g <- set.vertex.attribute(g, "color", value=get.vertex.attribute(g,'value'))
plot(g, vertex.label=NA, vertex.size=8)
#circle, colored, labels off and vertices resized
plot(g, layout=layout.circle, vertex.label=NA, vertex.size=8)
#fruchterman, colored, labels off and vertices resized
plot(g, layout=layout.fruchterman.reingold, vertex.label=NA, vertex.size=8)
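The coloring above passes the raw conference numbers as colors, so the result depends on the active palette. One alternative, sketched here under the assumption of igraph 1.0 or later (make_ring, layout_in_circle), is to map each distinct group value to an explicit color string. The six-vertex ring and its value attribute are hypothetical stand-ins for the football graph.

```r
library(igraph)

# Hypothetical toy graph with zero-based group ids, like the conference values
toy <- make_ring(6)
V(toy)$value <- c(0, 0, 1, 1, 2, 2)

# One explicit color per distinct group value
groups <- sort(unique(V(toy)$value))
pal    <- rainbow(length(groups))

# match() gives each vertex the position of its group, which indexes the palette
V(toy)$color <- pal[match(V(toy)$value, groups)]

plot(toy, layout = layout_in_circle(toy), vertex.label = NA, vertex.size = 8)
```

Using match() instead of value + 1 means the group values do not need to be contiguous or start at zero.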
Degree Distribution
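# Center each bin on an integer degree so that no two degree values share a bin
hist(degree(g), breaks=seq(min(degree(g))-0.5, max(degree(g))+0.5, by=1), labels=TRUE, xlab="Number of Games", ylab="Number of Teams")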
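As a complement to the histogram, an exact tally of how many vertices have each degree avoids any question about bin edges. This is a minimal sketch on a hypothetical star graph, assuming igraph 1.0 or later for make_star; on the football data, g would take the place of toy.

```r
library(igraph)

# Hypothetical toy graph: one hub connected to four leaves
toy <- make_star(5, mode = "undirected")

# degree() returns one number per vertex; table() counts each distinct value
deg        <- degree(toy)
deg_counts <- table(deg)

print(deg_counts)
barplot(deg_counts, xlab = "Degree", ylab = "Number of vertices")
```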
Split the Graphs
# Get a sorted vector of the unique conference values
indexes <- sort(unique(V(g)$value))
for(i in indexes) {
    # select only the vertices belonging to this conference
    # (named g_conf to avoid masking the base R function gsub)
    g_conf <- induced.subgraph(graph=g, which(V(g)$value==i))
    title <- paste("Conference: ", i)
    plot(g_conf, layout=layout.circle, vertex.size=3, main=title)
}
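Each induced subgraph in the loop above keeps only the games played between two teams of the same conference; games across conferences are dropped. The sketch below shows this on a hypothetical six-vertex ring with two groups, assuming igraph 1.0 or later (induced_subgraph, ends), and counts how many edges fall within and between groups.

```r
library(igraph)

# Hypothetical toy graph: a 6-cycle split into two groups of three
toy <- make_ring(6)
V(toy)$value <- c(0, 0, 0, 1, 1, 1)

# Subgraph of group 0: only edges with both endpoints in the group survive
sub0 <- induced_subgraph(toy, which(V(toy)$value == 0))

# Endpoints of every edge as a two-column matrix of vertex ids
el <- ends(toy, E(toy), names = FALSE)

# TRUE for edges whose two endpoints share the same group value
within_group <- V(toy)$value[el[, 1]] == V(toy)$value[el[, 2]]

n_within  <- sum(within_group)
n_between <- sum(!within_group)

cat("edges in sub0:", ecount(sub0), "\n")
cat("within-group edges:", n_within, " between-group edges:", n_between, "\n")
```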

Detecting Communities
# Detect communities with the walktrap (short random walks) algorithm
gc <- walktrap.community(g)
plot(gc, g, layout=layout.fruchterman.reingold, vertex.size=8, vertex.label=NA)
dendPlot(gc)
# Get Memberships
membership(gc)
##   [1]  1 10  9  7  1  7  9  8  8  1  7  1  4  9  4  9  1  5  4  3  5  8  8
##  [24]  1  1 10  4  5  1  3  3  4  9 10  4  3  2 10  4  9  7  1  4  4  2 10
##  [47]  6  9  2  6  1  8  7  6  4  3  5  2  2  2  9  4  5  2  9  5  2  6  8
##  [70]  1  5  4  7  6  7  2  5  8  8  3  3  7  3  6  7  4  2  5  6 10  1  2
##  [93]  2  1  3  5  5  2  7  4  9  3  7 10  1 10  9  7  8 10  6  8  2  5  6
# Get sizes of community
sizes(gc)
## Community sizes
##  1  2  3  4  5  6  7  8  9 10 
## 14 14 10 14 12  9 12 10 11  9
barplot(sizes(gc))
# Detect communities with the edge betweenness (Girvan-Newman) algorithm
gc <- edge.betweenness.community(g)
plot(gc, g, layout=layout.fruchterman.reingold, vertex.size=8, vertex.label=NA)
dendPlot(gc)
# Get Memberships
membership(gc)
##   [1]  1  2  3  4  1  4  3  1  1  1  4  5  6  3  6  3  1  7  6  8  7  1  1
##  [24]  1  5  2  6  7  5  8  8  6  3  2  6  8  6  2  6  3  4  1  6  6  9  2
##  [47] 10  3  9 10  5  1  4 10  6  8  7  9  7  7  3  6  7  7  3  7  9 10  1
##  [70]  5  7  6  4 10  4  9  7  1  1  8  8  4  4 10  4  6  9  7 10  2  5  9
##  [93]  9  1  8  7  7  7  4  6  3  8  4  2  1  2  3  4  1  2 10  1  9  7 10
# Get sizes of community
sizes(gc)
## Community sizes
##  1  2  3  4  5  6  7  8  9 10 
## 18  9 11 13  6 15 16  9  9  9
barplot(sizes(gc))
# Detect communities with the fast greedy modularity optimization algorithm
gc <- fastgreedy.community(g)
plot(gc, g, layout=layout.fruchterman.reingold, vertex.size=8, vertex.label=NA)
dendPlot(gc)
# Get Memberships
membership(gc)
##   [1] 3 3 5 5 5 5 1 2 2 2 5 5 4 1 4 1 2 6 4 3 6 2 2 2 5 3 4 6 5 3 3 4 1 3 4
##  [36] 3 6 3 4 1 5 2 6 4 6 3 2 1 6 2 5 2 5 2 4 3 6 6 6 6 1 4 6 6 1 6 6 2 2 5
##  [71] 6 4 5 2 5 6 6 2 2 3 3 5 3 5 5 4 6 6 2 3 5 6 6 3 3 6 6 6 5 4 1 3 5 3 2
## [106] 3 1 5 2 3 2 2 6 6 2
# Get sizes of community
sizes(gc)
## Community sizes
##  1  2  3  4  5  6 
## 10 23 21 13 21 27
barplot(sizes(gc))
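The three algorithms above give different partitions, so it can help to compare them with each other and with a known grouping such as the conference stored in value. The following is a sketch only, assuming igraph 1.0 or later (disjoint_union, make_full_graph, cluster_walktrap, cluster_fast_greedy). The toy graph of two 4-cliques joined by one edge and its value attribute are hypothetical, chosen so the example does not need the GML file.

```r
library(igraph)

# Hypothetical toy graph: two 4-cliques joined by a single edge
toy <- disjoint_union(make_full_graph(4), make_full_graph(4))
toy <- add_edges(toy, c(4, 5))
V(toy)$value <- rep(c(0, 1), each = 4)   # known grouping, analogous to 'value'

wt <- cluster_walktrap(toy)
fg <- cluster_fast_greedy(toy)

# Cross-tabulate the known grouping against each detected membership
tab_wt <- table(known = V(toy)$value, detected = as.vector(membership(wt)))
tab_fg <- table(known = V(toy)$value, detected = as.vector(membership(fg)))

# Modularity of each partition, and the agreement between the two partitions
# measured as normalized mutual information (1 means identical partitions)
mod <- c(walktrap = modularity(wt), fastgreedy = modularity(fg))
nmi <- compare(wt, fg, method = "nmi")

print(tab_wt)
print(tab_fg)
print(mod)
print(nmi)
```

On the football graph, the same cross-tabulation against V(g)$value would show how closely each detected community lines up with a conference.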