Building a Centrality Metrics author filter for the r/Siacoin Subreddit Community

Slaps Lab · Apr 23, 2020

This post builds upon a previous post, where we scraped the r/Siacoin community and built a social network for each month.

In this post, we will compute centrality metrics for each social network. We will then take the top-n authors from each metric and build a list of the unique authors that rank in the top-n for at least one of the computed metrics. These authors could potentially be used to help reduce noise.

Motivation: In theory, the most influential authors may be the most invested in the success of the project and may therefore produce the “best” submission text.

Computing Metrics:

Using the ‘edgelist_breakouts’ built in the previous post, we can iterate over the keys, build a graph per month, and compute any metric for that network. In this case, we will use degree, closeness, betweenness, and PageRank centrality.

def sorter(key):
    # Keys look like 'YYYY-M'; pad single-digit months so a plain
    # string sort puts the keys in chronological order.
    parts = key.split('-')

    year = parts[0]
    month = parts[1]

    if len(month) == 1:
        month = f'0{month}'

    return f'{year}-{month}'

def metric_sorter(items):
    # Sort (author, score) tuples by score, highest first.
    return sorted(
        list(items),
        key = lambda tup: tup[1],
        reverse = True
    )
import datetime

import networkx as nx

metrics = {}

print('started:', '@', datetime.datetime.now())

keys = list(edgelist_breakouts.keys())
for i, key in enumerate(sorted(keys, key = sorter)):

    # Build a weighted, undirected graph from the month's edge list.
    G = nx.Graph()
    interactions = edgelist_breakouts[key]
    for interaction in interactions:
        n1 = interaction[0]
        n2 = interaction[1]
        if G.has_edge(n1, n2):
            G[n1][n2]['weight'] += 1
        else:
            G.add_edge(n1, n2, weight = 1)

    # Compute the four centrality metrics for this month's network.
    degree = nx.degree_centrality(G).items()
    closeness = nx.closeness_centrality(G).items()
    betweenness = nx.betweenness_centrality(G).items()
    pagerank = nx.pagerank(G).items()

    metrics[key] = {
        'Degree': metric_sorter(degree),
        'Closeness': metric_sorter(closeness),
        'Betweenness': metric_sorter(betweenness),
        'Pagerank': metric_sorter(pagerank)
    }

    print('completed:', key, '@', datetime.datetime.now())
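
To spot-check the result, we can peek at the top few entries for one metric in a single month. This is just an illustrative snippet; it assumes ‘2016-10’ is one of the keys produced by the loop above.

top_pagerank = metrics['2016-10']['Pagerank'][:5]
for author, score in top_pagerank:
    # Each entry is an (author, score) tuple, already sorted descending.
    print(author, round(score, 4))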

Building a Filter:

Once the metrics are computed, each one is already sorted in descending order, so the top-n authors for a metric can be taken with a simple slice from the front of the list.

** Warning: bots exist; they should probably be filtered out.
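
A minimal sketch of such a bot filter, assuming a hand-maintained block list plus a simple name heuristic (both are assumptions, not part of the scraped data):

# Hypothetical bot filter: a small, hand-maintained set of known bots
# plus a crude "name ends with 'bot'" heuristic. Extend for your data.
KNOWN_BOTS = {'AutoModerator'}

def looks_like_bot(author):
    return author in KNOWN_BOTS or author.lower().endswith('bot')

Authors flagged by looks_like_bot could be dropped from the edge lists before the metrics are computed, or skipped inside filter_metrics below.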

from collections import defaultdict

def filter_metrics(metrics_by_month, take = 10):
    # Defaults to False, so authors not in the filter are excluded automatically.
    potential_filter = defaultdict(lambda: False)
    for key in metrics_by_month.keys():
        authors = (
            author
            for author, metric
            in metrics_by_month[key][:take]
        )
        for author in authors:
            potential_filter[author] = True
    return potential_filter

potential_filter = filter_metrics(metrics['2016-10'])
potential_filter
potential_filter
### OUTPUT: 15 authors
defaultdict(<function __main__.filter_metrics.<locals>.<lambda>()>,
{'in-cred-u-lous': True,
'Fornax96': True,
'Coinosphere': True,
'Toboxx': True,
'cmbartley': True,
'humbrie': True,
'doodlemania': True,
'jacobvschmidt': True,
'Taek42': True,
'bSalm0n': True,
'thederpill': True,
'0nlyNow': True,
'coolfarmer': True,
'wolfchange': True,
'Lorenzo000': True})

In this example for ‘2016-10’, 15 unique authors appeared in the top-10 of at least one of the 4 metrics. We can now use this list to check whether an author’s text should be included.

author = ''
if potential_filter[author]:
    pass
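
In a full pipeline, the check would sit inside whatever loop gathers submission text. A minimal sketch, assuming each scraped submission is a dict with ‘author’ and ‘selftext’ keys (the field names are an assumption, not taken from the scraper above):

# Keep only text written by authors that passed the centrality filter.
# 'submissions' and its field names are hypothetical placeholders.
filtered_text = [
    submission['selftext']
    for submission in submissions
    if potential_filter[submission['author']]
]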

Jupyter Notebook:
