Building a Centrality-Metrics Author Filter for the r/Siacoin Subreddit Community
This post builds upon a previous post where we scraped the r/Siacoin community and built multiple social networks (one per month).
In this post, we will compute centrality metrics for each social network. Afterwards, we will take the top-n authors from each metric and build a list of the unique authors that meet that criterion for at least one of the computed metrics. This list of authors could then be used as a filter to help reduce noise.
Motivation: In theory, the most influential authors may be the most invested in the success of the project, and their submissions could therefore contain the “best” text.
Computing Metrics:
Using the ‘edgelist_breakouts’ that were built in our previous post, we can iterate over the keys and compute any metric for each month’s network. In this case, we will use degree, closeness, betweenness, and PageRank centrality.
import datetime

import networkx as nx

def sorter(key):
    # zero-pad the month so keys sort chronologically
    # (e.g. '2016-9' -> '2016-09')
    parts = key.split('-')
    year = parts[0]
    month = parts[1]
    if len(month) == 1:
        month = f'0{month}'
    return f'{year}-{month}'

def metric_sorter(items):
    # sort (author, score) pairs by score, descending
    return sorted(
        list(items),
        key = lambda tup: tup[1],
        reverse = True
    )

metrics = {}

print('started:', '@', datetime.datetime.now())

keys = list(edgelist_breakouts.keys())
for i, key in enumerate(sorted(keys, key = sorter)):
    # build a weighted graph for the month: edge weight counts the
    # number of interactions between a pair of authors
    G = nx.Graph()

    interactions = edgelist_breakouts[key]
    for interaction in interactions:
        n1 = interaction[0]
        n2 = interaction[1]
        if G.has_edge(n1, n2):
            G[n1][n2]['weight'] += 1
        else:
            G.add_edge(n1, n2, weight = 1)

    degree = nx.degree_centrality(G).items()
    closeness = nx.closeness_centrality(G).items()
    betweenness = nx.betweenness_centrality(G).items()
    pagerank = nx.pagerank(G).items()

    metrics[key] = {
        'Degree': metric_sorter(degree),
        'Closeness': metric_sorter(closeness),
        'Betweenness': metric_sorter(betweenness),
        'Pagerank': metric_sorter(pagerank)
    }

    print('completed:', key, '@', datetime.datetime.now())
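As a quick sanity check (a hypothetical snippet; it assumes the loop above has finished and that ‘2016-10’ is one of the keys), we can peek at the highest-scoring authors for a single month and metric:

# Print the top 5 authors by degree centrality for October 2016.
# Each entry is an (author, score) tuple, already sorted descending.
for author, score in metrics['2016-10']['Degree'][:5]:
    print(f'{author}: {score:.4f}')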
Building a Filter:
Once computed, each metric’s scores are already sorted in descending order (via metric_sorter), so the top-n authors are available with a simple slice from the front of each list.
** Warning: bots exist; we should probably filter them out.
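One minimal approach (a sketch; the names and the suffix heuristic below are illustrative assumptions, not a vetted bot list) would be to screen author names before they enter the filter:

# Illustrative only: a tiny deny-list plus a naive suffix heuristic.
# AutoModerator is Reddit's built-in bot; the 'bot' suffix check will
# produce both false positives and false negatives in practice.
KNOWN_BOTS = {'AutoModerator'}

def is_probably_bot(author):
    return author in KNOWN_BOTS or author.lower().endswith('bot')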
from collections import defaultdict

def filter_metrics(metrics_by_month, take = 10):
    # default to False so lookups for unseen authors never raise a KeyError
    potential_filter = defaultdict(lambda: False)
    for key in metrics_by_month.keys():
        # take the top-n (author, score) pairs for this metric,
        # keeping only the author names
        authors = (
            author
            for author, metric
            in metrics_by_month[key][:take]
        )
        for author in authors:
            potential_filter[author] = True
    return potential_filter

potential_filter = filter_metrics(metrics['2016-10'])
potential_filter

### /OUTPUT, 15 authors
defaultdict(<function __main__.filter_metrics.<locals>.<lambda>()>,
{'in-cred-u-lous': True,
'Fornax96': True,
'Coinosphere': True,
'Toboxx': True,
'cmbartley': True,
'humbrie': True,
'doodlemania': True,
'jacobvschmidt': True,
'Taek42': True,
'bSalm0n': True,
'thederpill': True,
'0nlyNow': True,
'coolfarmer': True,
'wolfchange': True,
'Lorenzo000': True})
In this example for ‘2016-10’, 15 unique authors appeared in the top-10 of at least one of the 4 metrics. Because potential_filter is a defaultdict that defaults to False, looking up an unseen author simply returns False instead of raising a KeyError, so we can use it directly to check whether an author’s text should be included.
author = ''
if potential_filter[author]:
    # include this author's submission text
    pass
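For example, given a hypothetical submissions collection of (author, text) pairs from the earlier scrape, applying the filter is a one-liner:

# Hypothetical: submissions is assumed to be a list of (author, text)
# tuples; keep only text from authors that passed the centrality filter.
filtered_text = [
    text
    for author, text in submissions
    if potential_filter[author]
]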