
Extracting Social Networks from the r/Siacoin Subreddit Community
A few years back I worked on a project to analyze the most influential authors (the top n) in a subreddit. I always wanted to circle back and extract other subreddit communities but never had the time or energy. This post documents the extraction journey, from start to finish, for the r/siacoin subreddit.
** All Jupyter Notebook links can be found at the end of the post.
Procedure:
- Retrieve post/comment data for r/Siacoin.
- Build edgelists per month.
- Build out a network.
Retrieving the Data:
I leveraged a previous post. The full notebook for this example can be found below.
## single record output
{
    'id': '4fz06l',
    'type': 'submission',
    'post_id': '4fz06l',
    'author': 'deleted',
    'text': '[deleted]',
    'created_at': 1461368505.0
}
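The grouping step in the next section reads 'year' and 'month' fields that do not appear in the record above. Here is a minimal sketch of one way to derive them, assuming 'created_at' is a Unix timestamp in seconds and that UTC month boundaries are acceptable. The helper name add_year_month is hypothetical and not taken from the original notebook.

```python
from datetime import datetime, timezone


def add_year_month(record):
    """Return a copy of the record with 'year' and 'month' derived from 'created_at'."""
    # Assumption: 'created_at' is seconds since the Unix epoch, interpreted in UTC.
    created = datetime.fromtimestamp(record["created_at"], tz=timezone.utc)
    return {**record, "year": created.year, "month": created.month}


record = {
    "id": "4fz06l",
    "type": "submission",
    "post_id": "4fz06l",
    "author": "deleted",
    "text": "[deleted]",
    "created_at": 1461368505.0,
}
enriched = add_year_month(record)
```

Keeping 'month' as a plain integer means a key built as f'{year}-{month}' comes out in the '2017-9' style used later in the post.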
Building Edgelists per month:
I decided to group the data by month and post. I wanted to build an edge between two authors who appeared together on the same ‘post_id’ within the same month. Each edge is unique per post, so an author who posted multiple times on the same post still creates only a single edge with each co-author. To accomplish this, I grouped all the records by ‘year’, ‘month’ and ‘post_id’.
from collections import defaultdict

# nested mapping: year -> month -> post_id -> list of records
breakouts = {}
for record in filtered_dataset:
    a_key = record['year']
    if a_key not in breakouts:
        breakouts[a_key] = {}
    b_key = record['month']
    if b_key not in breakouts[a_key]:
        breakouts[a_key][b_key] = defaultdict(list)
    c_key = record['post_id']
    breakouts[a_key][b_key][c_key].append(record)
This results in 49 groups (as of this writing, 4/2020). Posts with only one unique author were filtered out, as were all ‘deleted’ authors.
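The post does not show the filtering code, so the following is only a sketch of how it could look for the posts of a single month. It assumes that ‘deleted’ authors are dropped first and that a post is kept only if at least two unique authors remain. The function name filter_posts and the sample data are hypothetical.

```python
def filter_posts(posts_by_id, excluded_authors=("deleted",)):
    """Drop excluded authors, then drop posts left with fewer than two unique authors."""
    kept = {}
    for post_id, records in posts_by_id.items():
        # Remove records written by excluded (e.g. deleted) authors.
        records = [r for r in records if r["author"] not in excluded_authors]
        # A post needs at least two distinct authors to produce an edge.
        if len({r["author"] for r in records}) >= 2:
            kept[post_id] = records
    return kept


sample = {
    "p1": [{"author": "alice"}, {"author": "bob"}, {"author": "alice"}],
    "p2": [{"author": "carol"}, {"author": "deleted"}],
    "p3": [{"author": "dave"}],
}
filtered = filter_posts(sample)
```

In this toy input only "p1" survives: "p2" has a single author once the deleted one is removed, and "p3" never had a second author.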
A unique edge was created by grabbing the unique authors for a given post and computing the pairwise combinations of those authors.
from itertools import combinations

import numpy as np

# 'year-month' -> list of (author, author) edges
edgelist_breakouts = {}
for year_key in breakouts.keys():
    for month_key in breakouts[year_key].keys():
        edgelist = []
        for post_key in breakouts[year_key][month_key].keys():
            posts = breakouts[year_key][month_key][post_key]
            authors = list(
                map(
                    lambda interaction: interaction['author'],
                    posts
                )
            )
            # np.unique de-duplicates, so each author pair appears once per post
            for a, b in combinations(np.unique(authors), 2):
                # order each pair case-insensitively so (a, b) and (b, a) match
                sort = sorted([a, b], key = lambda name: name.lower())
                edgelist.append(
                    (sort[0], sort[1])
                )
        edgelist_breakouts[f'{year_key}-{month_key}'] = edgelist
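To make the pairing step concrete, here is a small standalone sketch for a single post. It swaps np.unique for the standard library's set and sorted, which is an assumption made for illustration and not the notebook's code. The helper name post_edges and the author names are made up.

```python
from itertools import combinations


def post_edges(authors):
    """Return one canonical (a, b) pair for every two distinct authors on a post."""
    # De-duplicate, then sort case-insensitively so each pair has a stable order.
    unique_authors = sorted(set(authors), key=str.lower)
    return list(combinations(unique_authors, 2))


# "bob" commented twice, but still contributes only one edge per co-author.
edges = post_edges(["bob", "Alice", "bob", "carol"])
```

A post with n unique authors yields n(n-1)/2 edges, so three authors produce three pairs here.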
Building out a Network:
import networkx as nx

G = nx.Graph()
for interaction in edgelist_breakouts['2017-9']:
    n1 = interaction[0]
    n2 = interaction[1]
    # a repeated pair means the two authors shared another post that month
    if G.has_edge(n1, n2):
        G[n1][n2]['weight'] += 1
    else:
        G.add_edge(n1, n2, weight = 1)
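Because the same pair of authors can share several posts in one month, the loop above stores that count as the edge weight. The sketch below computes the same tally without any dependencies. It also adds a weighted-degree ranking, which is one possible way to surface the most connected authors and is an assumption, not the author's stated method. The helper names and the toy edges are illustrative.

```python
from collections import Counter


def edge_weights(edgelist):
    """Count how many times each (author, author) pair occurs in an edgelist."""
    # Assumes every pair is already in canonical order, as in the edgelists above.
    return Counter(edgelist)


def weighted_degree(weights):
    """Sum the weights of all edges touching each author."""
    degree = Counter()
    for (a, b), weight in weights.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# alice and bob shared two posts; alice and carol shared one.
month_edges = [("alice", "bob"), ("alice", "carol"), ("alice", "bob")]
weights = edge_weights(month_edges)
top_authors = weighted_degree(weights).most_common(2)
```

The counts in weights should match the 'weight' attributes that the networkx loop assigns for the same edgelist.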
