Visual of the Extracted r/Siacoin Social Networks per Month

Extracting Social Networks from the r/Siacoin Subreddit Community

Slaps Lab
3 min read · Apr 21, 2020

A few years back, I worked on a project to analyze the most influential authors (top n) in a subreddit. I always wanted to circle back and extract other subreddit communities but never had the time or energy. This post documents the extraction journey, from start to finish, for the r/Siacoin subreddit.

** All Jupyter Notebook links can be found at the end of the post.

Procedure:

  1. Retrieve post/comment data for r/Siacoin.
  2. Build edgelists per month.

Retrieving the Data:

To retrieve the data, I leveraged the approach from a previous post. The full notebook for this example is linked at the end of this post.

## single record output
{
    'id': '4fz06l',
    'type': 'submission',
    'post_id': '4fz06l',
    'author': 'deleted',
    'text': '[deleted]',
    'created_at': 1461368505.0
}
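The record above carries only a Unix timestamp, while the grouping code later in this post reads record['year'] and record['month']. How those fields were derived is not shown here, so the following is only a minimal sketch. It assumes 'created_at' is in seconds and should be interpreted as UTC; the helper name add_year_month is hypothetical.

```python
from datetime import datetime, timezone


def add_year_month(record):
    """Return a copy of the record with 'year' and 'month' added."""
    # Assumption: 'created_at' is a Unix timestamp in seconds, read as UTC.
    created = datetime.fromtimestamp(record['created_at'], tz=timezone.utc)
    return {**record, 'year': created.year, 'month': created.month}


sample = {
    'id': '4fz06l',
    'type': 'submission',
    'post_id': '4fz06l',
    'author': 'deleted',
    'text': '[deleted]',
    'created_at': 1461368505.0,
}

enriched = add_year_month(sample)
print(enriched['year'], enriched['month'])  # 2016 4
```

Keeping the month as a plain integer matches the unpadded keys used later in the post, such as '2017-9'.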

Building Edgelists per month:

I decided to group the data by month and post. I wanted to build an edge between any two authors who appeared together on the same ‘post_id’ within the same month. Each edge is unique per post, so an author who posted multiple times on the same post still creates only a single edge with each co-author. To accomplish this, I grouped all the records by ‘year’, ‘month’, and ‘post_id’.

from collections import defaultdict

# breakouts[year][month][post_id] -> list of records
breakouts = {}
for record in filtered_dataset:
    a_key = record['year']
    if a_key not in breakouts:
        breakouts[a_key] = {}

    b_key = record['month']
    if b_key not in breakouts[a_key]:
        breakouts[a_key][b_key] = defaultdict(list)

    c_key = record['post_id']
    breakouts[a_key][b_key][c_key].append(record)

This results in 49 year-month groups (as of this writing, April 2020). Posts with only one unique author were filtered out. All ‘deleted’ authors were also filtered out.
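The two filters can be sketched per post as follows. This is an illustration rather than the notebook's code: the helper name filter_post_records and the toy records are hypothetical, and it assumes 'deleted' authors are dropped before the unique-author check, an ordering the post does not specify.

```python
def filter_post_records(records):
    """Drop 'deleted' authors; return [] if fewer than two unique authors remain."""
    # Assumption: deleted authors are marked with the literal string 'deleted',
    # as in the sample record shown earlier.
    kept = [r for r in records if r['author'] != 'deleted']
    unique_authors = {r['author'] for r in kept}
    # A post needs at least two distinct authors to produce an edge.
    return kept if len(unique_authors) >= 2 else []


# Hypothetical records for a single post
post = [
    {'post_id': 'p1', 'author': 'alice'},
    {'post_id': 'p1', 'author': 'deleted'},
    {'post_id': 'p1', 'author': 'alice'},
]

print(filter_post_records(post))  # [] because only alice remains
print(len(filter_post_records(post + [{'post_id': 'p1', 'author': 'bob'}])))  # 3
```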

A unique edge was created by taking the unique authors for a given post and computing every pairwise combination of those authors.

from itertools import combinations

import numpy as np

edgelist_breakouts = {}
for year_key in breakouts.keys():
    for month_key in breakouts[year_key].keys():
        edgelist = []
        for post_key in breakouts[year_key][month_key].keys():
            posts = breakouts[year_key][month_key][post_key]
            authors = list(
                map(
                    lambda interaction: interaction['author'],
                    posts
                )
            )
            # one edge per pair of distinct authors on this post
            for a, b in combinations(np.unique(authors), 2):
                # order each pair case-insensitively so (a, b) and (b, a) match
                sort = sorted([a, b], key = lambda a: a.lower())
                edgelist.append(
                    (sort[0], sort[1])
                )

        edgelist_breakouts[f'{year_key}-{month_key}'] = edgelist
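To make the pairing step concrete, here is a small standard-library variant applied to one hypothetical post. It swaps np.unique for a set and sorts the distinct authors case-insensitively up front, which should give the same pair ordering as the loop above, provided no two authors differ only by letter case. The helper name unique_edges is an assumption for illustration.

```python
from itertools import combinations


def unique_edges(authors):
    """Return one case-insensitively ordered edge per pair of distinct authors."""
    # Deduplicate first so repeat commenters do not create repeat edges.
    distinct = sorted(set(authors), key=str.lower)
    # combinations() preserves input order, so each pair stays ordered.
    return list(combinations(distinct, 2))


# 'Bob' commented twice on this hypothetical post
edges = unique_edges(['carol', 'Bob', 'alice', 'Bob'])
print(edges)
# [('alice', 'Bob'), ('alice', 'carol'), ('Bob', 'carol')]
```

A post with n distinct authors contributes n(n-1)/2 edges, so a few very busy posts can dominate a month's edgelist.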

Building out a Network:

import networkx as nx

G = nx.Graph()

for interaction in edgelist_breakouts['2017-9']:
    n1 = interaction[0]
    n2 = interaction[1]
    if G.has_edge(n1, n2):
        # the pair already shared a post this month
        G[n1][n2]['weight'] += 1
    else:
        G.add_edge(n1, n2, weight = 1)

By Month: 2017-9 (see notebook)
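Because each post contributes a given pair at most once, an edge's weight ends up being the number of posts in that month on which both authors appeared. The sketch below cross-checks that idea without networkx, using collections.Counter on a hypothetical edgelist, and also sums the weights per author as a rough activity ranking. The data and variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical month of edges; ('alice', 'bob') shared two posts.
edgelist = [('alice', 'bob'), ('alice', 'carol'), ('alice', 'bob')]

# Edge weight = number of posts in the month shared by the two authors.
weights = Counter(edgelist)

# Weighted degree = sum of the weights on an author's edges.
strength = Counter()
for (a, b), w in weights.items():
    strength[a] += w
    strength[b] += w

print(weights[('alice', 'bob')])  # 2
print(strength.most_common(1))    # [('alice', 3)]
```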

Jupyter Notebooks
