NetworkX is a graph analysis library for Python. It has become the standard library for anything graphs in Python. In addition, it’s the basis for most libraries dealing with graph machine learning. Stellargraph in particular requires an understanding of NetworkX to construct graphs.

Below is an overview of the most important API methods. The official documentation is extensive but it remains often confusing to make things happen. Some simple questions (adding arrows, attaching data…) are usually answered in StackOverflow, so the guide below collects these simple but important questions.

## General remarks

The library is flexible but these are my golden rules:

• do not use objects to define nodes, rather use integers and set data on the node. Layout has issues with objects.
• the API changed a lot over the versions, make sure when you find an answer somewhere that it matches your version. Often methods and answers do not apply because they relate to an older version.
import networkx as nx
import matplotlib.pyplot as plt
from faker import Faker
faker = Faker()
%matplotlib inline


## Creating graphs

There are various constructors to create graphs, among others:

    # default
G = nx.Graph()
# an empty graph
EG = nx.empty_graph(100)
# a directed graph
DG = nx.DiGraph()
# a multi-directed graph
MDG = nx.MultiDiGraph()
# a complete graph
CG = nx.complete_graph(10)
# a path graph
PG = nx.path_graph(5)
# a complete bipartite graph
CBG = nx.complete_bipartite_graph(5,3)
# a grid graph
GG = nx.grid_graph([2, 3, 5, 2])


## Graph generators

Graph generators produce random graphs with particular properties which are of interest in the context of statistics of graphs. The best-known phenomenon is six degrees of separation which you can find on the internet, our brains, our social network and whatnot.

### Erdos-Renyi

The Erdos-Renyi model is related to percolations and phase transitions but is in general the most generic random graph model.

The first parameter is the amount of nodes and the second a probability of being connected to another one.

    er = nx.erdos_renyi_graph(50, 0.15)
nx.draw(er, edge_color='silver')


### Watts-Strogratz

The Watts-Strogratz model produces small-world properties.

The first parameter is the amound of node then follows the default degree and thereafter the probability of rewiring and edge. So, the rewiring probability is like the mutation of an otherwise fixed-degree graph.

    ws = nx.watts_strogatz_graph(30, 2, 0.32)
nx.draw(ws)


### Barabasi-Albert

The Barabasi-Albert model reproduces random scale-free graphs which are akin to citation networks, the internet and pretty much everywhere in nature.

    ba = nx.barabasi_albert_graph(50, 5)
nx.draw(ba)


You can easily extract the exponential distribution of degrees:

    g = nx.barabasi_albert_graph(2500, 5)
degrees = list(nx.degree(g))
l = [d[1] for d in degrees]
plt.hist(l)
plt.show()


## Drawing graphs

The draw method without additional will present the graph with spring-layout algorithm.

    nx.draw(PG)


There are of course tons of settings and features and a good result is really dependent on your graph and what you’re looking for. If we take the bipartite graph for example it would be nice to see the two sets of nodes in different colors:

    from networkx.algorithms import bipartite
X, Y = bipartite.sets(CBG)
cols = ["red" if i in X else "blue" for i in CBG.nodes() ]

nx.draw(CBG, with_labels=True, node_color= cols)


The grid graph on the other hand is better drawn with the Kamada-Kawai layout in order to see the grid sturcture:

    nx.draw_kamada_kawai(GG)


## Nodes

If you start from scratch the easiest way to define a graph is via the add_edges_from method as shown here:

    G = nx.Graph()
("time","space"),
("gravitation","curvature"),
("gravitation","space"),
("time","curvature"),
])
labels = {}
nx.draw(G, pos=pos, with_labels= True)
nx.draw_networkx_edge_labels(G,pos,
{
("time","space"): "interacts with",
("gravitation","curvature"): "is"
},
label_pos=0.4
)

{('time',
'space'): Text(0.39999997946599436, 0.6000000143526016, 'interacts with'),
('gravitation',
'curvature'): Text(-0.3999999793235416, -0.6000000147443909, 'is')}


The nodes can however be arbitrary objects:

    class Person:
def __init__(self, name):
self.name = name
@staticmethod
def random():
return Person(faker.name())

g = nx.Graph()
a = Person.random()
b = Person.random()
c = Person.random()
# to show the names you need to pass the labels
nx.draw(g, labels = {n:n.name for n in g.nodes()}, with_labels=True)


As mentioned earlier, it’s better to use numbers for the nodes and set the data via the set_node_attributes methods as shown below.

## Edges

Arrows can only be shown if the graph is directed. NetworkX is essentially a graph analysis library and much less a graph visualization toolbox.

    G=nx.DiGraph()
pos = nx.circular_layout(G)
nx.draw(G, pos, with_labels = True , arrowsize=25)
plt.show()


Data can be assigned to an edge on creation

    G=nx.DiGraph()
a = Person.random()
b = Person.random()
G.add_node(0, data = a)
G.add_node(1, data = b)
labelDic = {n:G.nodes[n]["data"].name for n in G.nodes()}
edgeDic = {e:G.get_edge_data(*e)["label"] for e in G.edges}
nx.draw(G,kpos,  labels = labelDic, with_labels=True, arrowsize=25)
nx.draw_networkx_edge_labels(G, kpos, edge_labels= edgeDic, label_pos=0.4)

{(0, 1): Text(-0.19999999999999996, -8.74227765734758e-09, 'knows')}


## Analysis

There many analysis oriented methods in NetworkX, below are just a few hints to get you started.

Let’s assemble a little network to demonstrate the methods.

    gr = nx.DiGraph()

gr.add_node(1, data = {'label': 'Space' })
gr.add_node(2, data = {'label': 'Time' })
gr.add_node(3, data = {'label': 'Gravitation' })
gr.add_node(4, data = {'label': 'Geometry' })
gr.add_node(5, data = {'label': 'SU(2)' })
gr.add_node(6, data = {'label': 'Spin' })
gr.add_node(7, data = {'label': 'GL(n)' })
edge_array = [(1,2), (2,3), (3,1), (3,4), (2,5), (5,6), (1,7)]
import random
for e in edge_array:
#     nx.set_edge_attributes(gr, {e: {'data':{'weight': round(random.random(),2)}}})

labelDic = {n:gr.nodes[n]["data"]["label"] for n in gr.nodes()}
edgeDic = {e:gr.edges[e]["weight"] for e in G.edges}
nx.draw(G,kpos,  labels = labelDic, with_labels=True, arrowsize=25)
o=nx.draw_networkx_edge_labels(G, kpos, edge_labels= edgeDic, label_pos=0.4)


Getting the adjacency matrix gives a sparse matrix. You need to use the todense method to see the dense matrix. There is also a to_numpy_matrix method which makes it easy to integrate with numpy mechanics.

    nx.adjacency_matrix(gr).todense()

matrix([[0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=int64)


The spectrum of this adjacency matrix can be directly obtained via the adjacency_spectrum method:

    nx.adjacency_spectrum(gr)

array([-0.5+0.8660254j, -0.5-0.8660254j,  1. +0.j       ,  0. +0.j       ,
0. +0.j       ,  0. +0.j       ,  0. +0.j       ])


The Laplacian matrix (see definition here) is only defined for undirected graphs but is just a method away:

    nx.laplacian_matrix(gr.to_undirected()).todense()

matrix([[ 3, -1, -1,  0,  0,  0, -1],
[-1,  3, -1,  0, -1,  0,  0],
[-1, -1,  3, -1,  0,  0,  0],
[ 0,  0, -1,  1,  0,  0,  0],
[ 0, -1,  0,  0,  2, -1,  0],
[ 0,  0,  0,  0, -1,  1,  0],
[-1,  0,  0,  0,  0,  0,  1]], dtype=int64)


If you need to use the edge data in the adjacency matrix this goes via the attr_matrix:

    nx.attr_matrix(gr, edge_attr="weight")

(matrix([[0.  , 0.75, 0.  , 0.  , 0.  , 0.  , 0.96],
[0.  , 0.  , 0.76, 0.  , 0.53, 0.  , 0.  ],
[0.35, 0.  , 0.  , 0.13, 0.  , 0.  , 0.  ],
[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
[0.  , 0.  , 0.  , 0.  , 0.  , 0.06, 0.  ],
[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]]), [1, 2, 3, 4, 5, 6, 7])


Simple things like degrees are simple to access:

    list(gr.degree)

[(1, 3), (2, 3), (3, 3), (4, 1), (5, 2), (6, 1), (7, 1)]


The shortest path between two vertices is just as simple but please note that there are dozens of variations in the library:

    nx.shortest_path(gr, 1, 6, weight="weight")

[1, 2, 5, 6]


Things like the radius of a graph is defined for undirected graphs:

    nx.radius(gr.to_undirected())

2

nx.find_cores(gr)

{1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 6: 1, 7: 1}


Centrality is also a whole world on its own. If you wish to visualize the betweenness centrality you can use something like:

    cent = nx.centrality.betweenness_centrality(gr)
nx.draw(gr, node_size=[v * 1500 for v in cent.values()], edge_color='silver')


Getting the connected components of a graph.

    nx.is_connected(gr.to_undirected())
comps = nx.components.connected_components(gr.to_undirected())
for c in comps:
print(c)

{1, 2, 3, 4, 5, 6, 7}


### Clique

A clique is a complete subgraph of a particular size. Large, dense subgraphs are useful for example in the analysis of protein-protein interaction graphs, specifically in the prediction of protein complexes.

    def show_clique(graph, k = 4):
'''
Draws the first clique of the specified size.
'''
cliques = list(nx.algorithms.find_cliques(graph))
kclique = [clq for clq in cliques if len(clq) == k]

if len(kclique)>0:
print(kclique[0])
cols = ["red" if i in kclique[0] else "white" for i in graph.nodes() ]
nx.draw(graph, with_labels=True, node_color= cols, edge_color="silver")
return nx.subgraph(graph, kclique[0])
else:
print("No clique of size %s."%k)
return nx.Graph()


Taking the Barabasi graph above and checking for isomorphism with the complete graph of the same size we can check that the found result is indeed a clique of the requested size.

    subg = show_clique(ba,5)
nx.is_isomorphic(subg, nx.complete_graph(5))

[5, 1, 7, 9, 6]

True


    red = nx.random_lobster(100, 0.9, 0.9)
nx.draw(ba)


    petersen = nx.petersen_graph()
nx.draw(petersen)


    G=nx.karate_club_graph()

cent = nx.centrality.betweenness_centrality(G)
nx.draw(G, node_size=[v * 1500 for v in cent.values()], edge_color='silver')


## Graph visualization

As described above, if you want pretty images you should use packages outside NetworkX. The dot and GraphML formats are pretty standard and saving a graph to a particular format is really easy.

For example, here we save the karate-club to GraphML and used yEd to layout things together with a centrality resizing of the nodes.

    G=nx.karate_club_graph()
nx.write_graphml(G, "./karate.graphml")


For Gephi you can use the GML format:

    nx.write_gml(G, "./karate.gml")


## Get and set data

In the context of machine learning and real-world data graphs it’s important that nodes and edges carry data. The way it works in NetworkX can be a bit tricky, so let’s make it clear here how it functions.

### Node get/set goes like this:

    G = nx.Graph()
G.add_node(12, payload = {'id': 44, 'name': 'Swa' })

{'id': 44, 'name': 'Swa'}
Swa


One can also set the data after the node is added:

    G = nx.Graph()
nx.set_node_attributes(G, {12:{'payload':{'id': 44, 'name': 'Swa' }}})

{'id': 44, 'name': 'Swa'}
Swa


### Edge get/set is like so:

    G = nx.Graph()
print(G.get_edge_data(12,15))

stuff


One can also set the data after the edge is added:

    G = nx.Graph()
nx.set_edge_attributes(G, {(12,15): {'payload':{'label': 'stuff'}}})
print(G.get_edge_data(12,15))

stuff


## Pandas

The library has support for import/export from/to Pandas dataframes. This exchange, however, applies to edges and not to nodes. The row of a frame are used to define an edge and if you want to use a frame for nodes or both, you are on your own. It’s not difficult though, let’s take a graph and turn it into a frame.

    g = nx.barabasi_albert_graph(50, 5)
# set a weight on the edges
for e in g.edges:
nx.set_edge_attributes(g, {e: {'weight':faker.random.random()}})

for n in g.nodes:
nx.set_node_attributes(g, {n: {"feature": {"firstName": faker.first_name(), "lastName": faker.last_name()}}})


You can now use the to_pandas_edgeList method but this will only output the weights besides the edge definitions:

    nx.to_pandas_edgelist(g)

source target weight
0 0 5 0.021629
1 0 6 0.294875
2 0 7 0.967585
3 0 8 0.553814
4 0 9 0.531532
220 40 43 0.313282
221 41 42 0.930995
222 42 48 0.380596
223 43 46 0.236164
224 45 47 0.213095

225 rows × 3 columns

    import pandas as pd
import copy

node_dic = {id:g.nodes[id]["feature"] for id in g.nodes} # easy acces to the nodes
rows = [] # the array we'll give to Pandas

for e in g.edges:
row = copy.copy(node_dic[e[0]])
row["sourceId"] = e[0]
row["targetId"] = e[1]
row["weight"] = g.edges[e]["weight"]
rows.append(row)

df = pd.DataFrame(rows)

df

firstName lastName sourceId targetId weight
0 Rebecca Griffin 0 5 0.021629
1 Rebecca Griffin 0 6 0.294875
2 Rebecca Griffin 0 7 0.967585
3 Rebecca Griffin 0 8 0.553814
4 Rebecca Griffin 0 9 0.531532
220 Tyler Morris 40 43 0.313282
221 Mary Bolton 41 42 0.930995
222 Colton Hernandez 42 48 0.380596
223 Michael Moreno 43 46 0.236164
224 Mary Morris 45 47 0.213095

225 rows × 5 columns

Note that you need this denormalization of the node data because you actually need two datasets to describe a graph in a normalized fashion.

## StellarGraph

The StellarGraph library can import directly from NetworkX and Pandas via the static StellarGraph.from_networkx method.

One important thing to note here is that the features on a node as defined above will not work because the framework expects numbers and not strings or dictionaries. If you do take care of this (one-hot encoding and all that) then this following will do:

    from stellargraph import StellarGraph

gs = StellarGraph.from_networkx(g, edge_type_default = "relation", node_type_default="person", node_features = "feature", edge_weight_attr = "weight")

print(gs.info())

StellarGraph: Undirected multigraph
Nodes: 50, Edges: 225

Node types:
default: [50]
Features: none
Edge types: default-default->default

Edge types:
default-default->default: [225]