Building a Centrality-Metrics Author Filter for the r/Siacoin Subreddit Community
This post builds upon a previous post where we scraped the r/Siacoin community and built multiple social networks (one per month).
In this post, we will compute centrality metrics for each social network. Afterwards, we will take the top-n authors from each metric and build a list of the unique authors that meet that criterion for at least one of the computed metrics. This list of authors could then be used as a filter to help reduce noise.
Motivation: In theory, the most influential authors may be the most invested in the success of the project, and their submissions could therefore contain the “best” text.
Computing Metrics:
Using the ‘edgelist_breakouts’ that were built in our previous post, we can iterate over the keys and compute any metric for each month’s network. In this case, we will use degree, closeness, betweenness, and PageRank centrality.
import datetime

import networkx as nx

def sorter(key):
    # zero-pad the month so keys sort chronologically
    # (e.g. '2016-9' -> '2016-09')
    parts = key.split('-')
    year = parts[0]
    month = parts[1]
    if len(month) == 1:
        month = f'0{month}'
    return f'{year}-{month}'

def metric_sorter(items):
    # sort (author, score) pairs by score, descending
    return sorted(
        list(items),
        key = lambda tup: tup[1],
        reverse = True
    )

metrics = {}

print('started:', '@', datetime.datetime.now())

keys = list(edgelist_breakouts.keys())
for i, key in enumerate(sorted(keys, key = sorter)):
    # build a weighted graph for the month: edge weight counts the
    # number of interactions between a pair of authors
    G = nx.Graph()

    interactions = edgelist_breakouts[key]
    for interaction in interactions:
        n1 = interaction[0]
        n2 = interaction[1]
        if G.has_edge(n1, n2):
            G[n1][n2]['weight'] += 1
        else:
            G.add_edge(n1, n2, weight = 1)

    degree = nx.degree_centrality(G).items()
    closeness = nx.closeness_centrality(G).items()
    betweenness = nx.betweenness_centrality(G).items()
    pagerank = nx.pagerank(G).items()

    metrics[key] = {
        'Degree': metric_sorter(degree),
        'Closeness': metric_sorter(closeness),
        'Betweenness': metric_sorter(betweenness),
        'Pagerank': metric_sorter(pagerank)
    }

    print('completed:', key, '@', datetime.datetime.now())
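As a quick sanity check (a hypothetical snippet; it assumes the loop above has finished and that ‘2016-10’ is one of the keys), we can peek at the highest-scoring authors for a single month and metric:

# Print the top 5 authors by degree centrality for October 2016.
# Each entry is an (author, score) tuple, already sorted descending.
for author, score in metrics['2016-10']['Degree'][:5]:
    print(f'{author}: {score:.4f}')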
Building a Filter:
Once computed, each metric’s scores are already sorted in descending order (via metric_sorter), so the top-n authors are available with a simple slice from the front of each list.
** Warning: bots exist; we should probably filter them out.
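One minimal approach (a sketch; the names and the suffix heuristic below are illustrative assumptions, not a vetted bot list) would be to screen author names before they enter the filter:

# Illustrative only: a tiny deny-list plus a naive suffix heuristic.
# AutoModerator is Reddit's built-in bot; the 'bot' suffix check will
# produce both false positives and false negatives in practice.
KNOWN_BOTS = {'AutoModerator'}

def is_probably_bot(author):
    return author in KNOWN_BOTS or author.lower().endswith('bot')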
from collections import defaultdict

def filter_metrics(metrics_by_month, take = 10):
    # default to False so lookups for unseen authors never raise a KeyError
    potential_filter = defaultdict(lambda: False)
    for key in metrics_by_month.keys():
        # take the top-n (author, score) pairs for this metric,
        # keeping only the author names
        authors = (
            author
            for author, metric
            in metrics_by_month[key][:take]
        )
        for author in authors:
            potential_filter[author] = True
    return potential_filter

potential_filter = filter_metrics(metrics['2016-10'])
potential_filter

### /OUTPUT, 15 authors
defaultdict(<function __main__.filter_metrics.<locals>.<lambda>()>,
{'in-cred-u-lous': True,
'Fornax96': True,
'Coinosphere': True,
'Toboxx': True,
'cmbartley': True,
'humbrie': True,
'doodlemania': True,
'jacobvschmidt': True,
'Taek42': True,
'bSalm0n': True,
'thederpill': True,
'0nlyNow': True,
'coolfarmer': True,
'wolfchange': True,
'Lorenzo000': True})
In this example for ‘2016-10’, 15 unique authors appeared in the top-10 of at least one of the 4 metrics. Because potential_filter is a defaultdict that defaults to False, looking up an unseen author simply returns False instead of raising a KeyError, so we can use it directly to check whether an author’s text should be included.
author = ''
if potential_filter[author]:
    # include this author's submission text
    pass
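For example, given a hypothetical submissions collection of (author, text) pairs from the earlier scrape, applying the filter is a one-liner:

# Hypothetical: submissions is assumed to be a list of (author, text)
# tuples; keep only text from authors that passed the centrality filter.
filtered_text = [
    text
    for author, text in submissions
    if potential_filter[author]
]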