Conductance Graph Community Detection in Python

Graph community detection is an important technique in network analysis that identifies closely connected groups or communities within a network graph. When working with graph data in Python, we can leverage conductance to evaluate the quality of extracted communities, preferring groupings with lower conductance scores.

In this comprehensive guide, we will cover key concepts around using conductance for community detection, including:

  • What is graph conductance and why it is useful for community detection
  • Implementing conductance scoring algorithms in Python
  • Full code examples of conductance-based community detection on graph datasets
  • Understanding output from conductance scoring to identify quality groupings
  • Alternative approaches and metrics to complement conductance findings

Properly applying conductance testing allows data scientists to reliably reveal network communities that have stronger intra-connections than inter-connections with the rest of the graph.

What is Graph Conductance?

Before diving into implementation, we should understand what conductance represents conceptually:

  • Measures group separation – Conductance scores how well a group of nodes is separated from the rest of the graph, based on the share of its edge connections that leave the group. Lower scores indicate denser internal connections relative to external ones.
  • Evaluates extracted communities – After an initial grouping of nodes via community detection, conductance helps evaluate which groupings appear to form legitimate communities.
  • Ranges from 0 to 1 – Conductance scores range from 0 to 1, with scores approaching 0 signifying stronger, better separated communities.
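To make the two ends of the 0 to 1 range concrete, here is a minimal sketch using NetworkX's built-in nx.conductance. The two toy graphs are assumptions chosen for illustration: a group with no outgoing edges scores 0, and a single leaf whose only edge leaves the group scores 1.

```python
import networkx as nx

# Two disjoint triangles: the first triangle has no edges leaving it.
disjoint = nx.disjoint_union(nx.complete_graph(3), nx.complete_graph(3))
isolated_score = nx.conductance(disjoint, {0, 1, 2})

# A star with four leaves: node 0 is the hub, nodes 1-4 are leaves.
# A group made of one leaf has its only edge crossing the boundary.
star = nx.star_graph(4)
leaf_score = nx.conductance(star, {1})

print(isolated_score)  # 0.0
print(leaf_score)      # 1.0
```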

Key Inputs

To calculate conductance, we need:

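  • The set of nodes in the candidate community (or the subgraph it induces)
  • Total edges from community nodes to the rest of the network (external or cut edges)
  • Total degree (volume) of the community's nodes and of the remaining nodes; internal edges enter the score through these volumes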
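The inputs listed above can be read straight off a graph with NetworkX helpers. The sketch below assumes a small toy graph (two triangles joined by one edge) and uses nx.cut_size and nx.volume; it also shows that the volume of a group equals twice its internal edges plus its external edges.

```python
import networkx as nx

# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
S = {0, 1, 2}

internal_edges = G.subgraph(S).number_of_edges()  # edges with both ends in S
external_edges = nx.cut_size(G, S)                # edges with exactly one end in S
volume_S = nx.volume(G, S)                        # sum of degrees of the nodes in S

print(internal_edges, external_edges, volume_S)   # 3 1 7
```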

Formula

The conductance formula, where G represents our full network graph, S represents our candidate community, E(S) counts S's internal edges, and C(S) counts the cut edges running from S to the rest of G:

Conductance(S) = C(S) / min(Vol(S), Vol(G\S))

  • Vol(S) – sum of the degrees (measured in G) of S's nodes, which equals 2·E(S) + C(S)
  • Vol(G\S) – sum of the degrees of all nodes outside S
  • We take the minimum of the two volumes so that the score is normalized by the smaller side of the cut

So in plain language, conductance is the share of edge endpoints on the smaller side of the cut that lead across it. Lower values indicate stronger internal coherence of the tested community.
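As a worked instance of the formula, the sketch below evaluates it term by term on a toy graph and compares the result with NetworkX's nx.conductance. The helper name conductance_by_formula and the example graph (two 4-cliques joined by one edge) are assumptions made for illustration.

```python
import networkx as nx

def conductance_by_formula(G, S):
    """Hypothetical helper: cut edges divided by the smaller of the two volumes."""
    S = set(S)
    cut = sum(1 for u, v in G.edges() if (u in S) != (v in S))
    vol_S = sum(G.degree(n) for n in S)
    vol_rest = sum(G.degree(n) for n in G if n not in S)
    return cut / min(vol_S, vol_rest)

# Two 4-cliques (nodes 0-3 and 4-7) joined by the single edge 3-4.
G = nx.disjoint_union(nx.complete_graph(4), nx.complete_graph(4))
G.add_edge(3, 4)

S = {0, 1, 2, 3}
phi = conductance_by_formula(G, S)

# One cut edge, Vol(S) = 3 + 3 + 3 + 4 = 13, and the other side also has volume 13.
print(f"{phi:.4f}")                      # 0.0769
print(phi == nx.conductance(G, S))       # True
```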

Why Use Conductance for Community Detection?

There are a few key reasons conductance stands out as a metric for community detection among the many possible scoring metrics:

✅ Reduces reliance on size – Unlike modularity, which tends to favor bigger communities, conductance reduces emphasis on size through normalization, allowing fairer comparison of both small and large components.

✅ Fits definition of community – The conductance score maps well to the conceptual definition of a network community: dense internal connections and sparse external ones.

✅ Easy parameterization – Conductance involves less parameter tuning than techniques like modularity clustering.

While conductance has limitations like any single metric, evaluating conductance alongside community detection adds confidence in the quality of the extracted communities.

Implementing Conductance Measurement in Python

We will first implement core functions for conductance calculation in Python before applying them through community detection:

Setup

We will import NetworkX for graph data structures and community operations:

import networkx as nx

Internal & External Edge Count

To calculate conductance, we need to be able to count internal and external edges easily for a given subgraph community. We create two functions to return these values:

def get_internal_edges(G, community):
  # community is a subgraph of G, so every edge it contains is internal
  return community.number_of_edges()

def get_external_edges(G, community):
  # Count edges of the full graph G with exactly one endpoint in the community
  edges_outside = 0
  nodes_in_community = set(community.nodes())
  for n1, n2 in G.edges(nodes_in_community):
    if (n1 in nodes_in_community) != (n2 in nodes_in_community):
      edges_outside += 1
  return edges_outside

Node Degrees

To normalize by volume, we also create functions that sum full-graph degrees inside and outside the community:

def get_internal_degrees(G, community):
    # Vol(S): degrees are measured in G so that cut edges are included
    return sum(G.degree(n) for n in community.nodes())

def get_external_degrees(G, community):
    # Vol(G\S): total degree of every node outside the community
    return sum(G.degree(n) for n in G.nodes() if n not in community)

Conductance Score

Putting it together, we define a conductance function:

def conductance(G, community):
    external = get_external_edges(G, community)
    min_degree_sum = min(get_internal_degrees(G, community),
                         get_external_degrees(G, community))
    if min_degree_sum == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return external / min_degree_sum

We now have a way to score any node grouping or subgraph by its conductance measure. Next we can apply this in community detection.
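One edge case worth handling when scoring arbitrary groupings: conductance is undefined when either side of the cut has zero volume, for example when a group covers the whole graph. NetworkX's built-in nx.conductance raises ZeroDivisionError in that situation. The wrapper below is a minimal sketch; the name safe_conductance and the choice to return None are assumptions, not part of NetworkX.

```python
import networkx as nx

def safe_conductance(G, nodes):
    """Hypothetical wrapper: conductance of `nodes`, or None when it is undefined."""
    nodes = set(nodes)
    rest = set(G) - nodes
    # Either side having zero volume would mean dividing by zero.
    if nx.volume(G, nodes) == 0 or nx.volume(G, rest) == 0:
        return None
    return nx.conductance(G, nodes)

G = nx.path_graph(4)                        # 0 - 1 - 2 - 3
half_score = safe_conductance(G, {0, 1})    # 1 cut edge / volume 3
whole_score = safe_conductance(G, set(G))   # whole graph: undefined

print(half_score)    # 0.3333333333333333
print(whole_score)   # None
```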

Using Conductance to Evaluate Communities

To demonstrate the value of conductance scoring for community detection, we will walk through an example graph analysis:

Setup Graph

We’ll create a test network graph with a built-in community structure:

G = nx.connected_caveman_graph(3, 5)

This generates 3 cliques of 5 nodes each, with one edge in each clique rewired to bridge to a neighboring clique.

We expect 3 strong communities in this test graph. Now we will extract communities and rank them by conductance.
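Before scoring anything, it can help to confirm the structure of the test graph. The sketch below relies on the fact that connected_caveman_graph numbers its nodes clique by clique, so node n sits in block n // 5; the variable names are illustrative.

```python
import networkx as nx

G = nx.connected_caveman_graph(3, 5)

# Edges whose endpoints fall in different blocks of five are the bridging edges.
bridging_edges = sorted(
    tuple(sorted(edge)) for edge in G.edges() if edge[0] // 5 != edge[1] // 5
)

print(G.number_of_nodes(), G.number_of_edges())  # 15 30
print(nx.is_connected(G))                        # True
print(bridging_edges)                            # [(0, 14), (4, 5), (9, 10)]
```

Because the graph is connected, any method that splits purely on connected components will see it as one group.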

Extract Communities

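We first generate candidate groupings. Many methods like greedy modularity exist, and splitting by connected components is a common naive starting point. This graph is connected, however, so nx.connected_components would return a single group containing all 15 nodes. Instead we use the fact that connected_caveman_graph numbers its nodes clique by clique and slice the node IDs into consecutive blocks of five:

communities = [G.subgraph(range(start, start + 5)) for start in range(0, 15, 5)]

This gives us three candidate communities, one per block of node IDs.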

Score & Rank

We defined conductance() earlier to score a community. We now apply it across our candidates:

scores = {c:conductance(G, c) for c in communities}
sorted_communities = sorted(scores, key=scores.get)

This scores each subgraph on conductance, tracking the values in our scores dictionary. We then sort in ascending order, which puts the best (lowest) scores first.
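To see that ranking by conductance actually discriminates between good and bad groupings, the sketch below scores the three clique blocks of the caveman graph next to a deliberately poor group that takes one node from each clique. It uses NetworkX's nx.conductance; the candidate names and the mixed group are assumptions added for illustration.

```python
import networkx as nx

G = nx.connected_caveman_graph(3, 5)

candidates = {
    "clique 0-4": {0, 1, 2, 3, 4},
    "clique 5-9": {5, 6, 7, 8, 9},
    "clique 10-14": {10, 11, 12, 13, 14},
    "mixed 0,5,10": {0, 5, 10},  # hypothetical bad grouping: one node per clique
}

scores = {name: nx.conductance(G, nodes) for name, nodes in candidates.items()}

# Lower conductance is better, so an ascending sort puts the best groups first.
ranking = sorted(scores, key=scores.get)
for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Each clique block has 2 cut edges against a volume of 20, giving 0.10, while every edge touching the mixed group leaves it, giving 1.00.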

Examine Strongest Communities

Looking at the 3 best-scoring (lowest conductance) groups by printing node IDs:

for c in sorted_communities[:3]:
  print(sorted(c.nodes))

[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, 12, 13, 14]

All three candidate groups line up with the built-in cliques, and each has only two edges leaving it, so every group receives the same low conductance score.

Low conductance components distinguish themselves by having dense internal connectivity relative to their external connections, which matches our expectation.

By combining an initial community grouping method with a conductance evaluation, we improve confidence in the detected communities.

Full Conductance-Based Community Detection

We will now walk through a full community analysis example on a larger graph, utilizing conductance scoring to produce high quality groupings.

Zachary’s Karate Club Graph

Zachary's karate club graph is a widely studied social network of friendships between 34 members of a university karate club, originally analyzed by Wayne Zachary.

We will load it from NetworkX and extract communities from it using conductance rankings.

G = nx.karate_club_graph()

Detect Base Communities

As a starting point, we will cluster the graph by greedily maximizing modularity. This uses connectivity patterns to generate an initial grouping:

communities = nx.community.greedy_modularity_communities(G)

We now have a list of non-overlapping communities, each a frozenset of node IDs.
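A quick sanity check on the detection output is to confirm what kind of grouping it is. The sketch below, with illustrative variable names, verifies that the groups returned by greedy modularity on the karate club graph share no nodes and together cover the whole graph.

```python
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.greedy_modularity_communities(G)

all_members = [node for c in communities for node in c]

# A partition has no repeated nodes and covers every node in the graph.
no_overlap = len(all_members) == len(set(all_members))
covers_graph = set(all_members) == set(G.nodes())

print(len(communities), no_overlap, covers_graph)
```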

Rank by Conductance

Next we score components on conductance and sort just as before:

scores = {c:conductance(G, G.subgraph(c)) for c in communities}  
sorted_communities = sorted(scores, key=scores.get)

The lowest conductance scores should indicate the highest quality groups.
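The same ranking step can be sketched end to end with NetworkX's built-in nx.conductance, which avoids depending on hand-written helpers. The variable names are illustrative, and the printed values are left unstated because they can differ between NetworkX versions.

```python
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.greedy_modularity_communities(G)

# Pair each community with its conductance, then sort so the lowest score comes first.
ranked = sorted(
    ((nx.conductance(G, c), sorted(c)) for c in communities),
    key=lambda pair: pair[0],
)

for score, members in ranked:
    print(f"{score:.3f}  size={len(members)}  {members}")
```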

Analyze Top Results

Printing each group's conductance and node IDs, best (lowest) score first:

for c in sorted_communities:
  print(round(scores[c], 3), sorted(c))

Because greedy modularity returns a partition, no node appears in more than one printed group. The exact membership can vary with the NetworkX version, so the output is not reproduced here. To judge the result, compare the groups against the two factions recorded in Zachary's original analysis, which NetworkX stores in each node's club attribute (Mr. Hi or Officer). The recorded factions are not contiguous ID ranges, so compare by attribute rather than by node number.

The next strongest communities shed further light on subunit structure. Conductance helps reveal that while the algorithmically detected communities have value, certain groups represent more meaningful real-world divisions than others based on their measured connectivity strength.
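One way to check detected groups against the real-world split is to tally the club attribute that NetworkX stores on each karate club node. The sketch below is an illustration under that assumption; it also scores the recorded Mr. Hi faction itself as a reference point.

```python
from collections import Counter

import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.greedy_modularity_communities(G)

# For each detected community, count how many members belong to each recorded faction.
for c in communities:
    factions = Counter(G.nodes[n]["club"] for n in c)
    print(f"size={len(c)}  conductance={nx.conductance(G, c):.3f}  {dict(factions)}")

# Conductance of the recorded split, for comparison with the detected groups.
mr_hi = {n for n, club in G.nodes(data="club") if club == "Mr. Hi"}
faction_conductance = nx.conductance(G, mr_hi)
print(f"recorded faction split: {faction_conductance:.3f}")
```

A detected community whose members fall almost entirely in one faction, and whose conductance is close to that of the recorded split, is a reasonable sign that it reflects a real division.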

By quantifying community quality through conductance alongside initial detection, we extract communities that robustly meet the definition of having stronger internal coherence.

Alternative Community Evaluation Metrics

While conductance is a particularly helpful metric, no single metric can fully evaluate community quality. Some alternatives to consider using alongside conductance include:

🔸 Modularity – Maximizing modularity is a common detection technique. It can also quantify the modular strength of extracted groups as a secondary metric.

🔸 Triadic closure – Measures prevalence of closed connected triples within a community.

🔸 Embeddedness – Ratio of internal versus external edges standardized against expectation.

🔸 Partition density – Fraction of edges within community versus total possible edges.

Layering validation metrics creates confidence. For example, low conductance alongside high triadic closure would lend strong evidence of a tight-knit community.
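A minimal sketch of layering several checks on the karate club graph follows. It assumes nx.transitivity as the triadic closure measure and nx.density as the share of possible internal edges that exist; these are stand-ins chosen for illustration rather than the only possible definitions.

```python
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.greedy_modularity_communities(G)

# Partition-level score: unweighted modularity of the whole grouping.
partition_modularity = nx.community.modularity(G, communities, weight=None)
print(f"modularity of partition: {partition_modularity:.3f}")

reports = []
for c in communities:
    sub = G.subgraph(c)
    reports.append({
        "size": len(c),
        "conductance": round(nx.conductance(G, c), 3),
        # Fraction of connected triples inside the group that close into triangles.
        "transitivity": round(nx.transitivity(sub), 3),
        # Fraction of possible internal edges that actually exist.
        "density": round(nx.density(sub), 3),
    })

for report in reports:
    print(report)
```

Groups that score well on several of these at once (low conductance, high transitivity, high density) are the ones most worth trusting.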

Conclusion

Implementing conductance calculation and ranking in Python provides a robust technique for community detection on graph data by screening for subgraphs with dense internal connectivity.

Key takeaways:

✅ Conductance measures the share of a group's edge volume that crosses its boundary, so lower scores mean stronger groupings

✅ Complementary metric that overcomes limitations of relying on modularity or size alone

✅ Simple to add as a secondary check on any algorithmically extracted communities

✅ Improves likelihood of revealing natural network divisions

By quantifying the relative outward attachment against inward density of connectivity, conductance allows us to home in on the most coherent communities within complex relationship data.
