
Have you ever tried to profile your audience on Twitter? Several tools exist for this, but I was curious to do it with the ones I already use:

  • R to extract data and for social network analysis,
  • and Gephi to visualise my followers' communities.

In these articles, I invite you to explore your Twitter audience, from data extraction with R to identifying your followers' centers of interest. I'll use exploratory methods: social network analysis (SNA) and natural language processing (NLP). This first post is dedicated to data extraction from Twitter and to community detection among your followers.

Authentication Process on Twitter

This short section is for Twitter API newbies. You can already find plenty of resources about Twitter data extraction on the Internet, so this step will be brief:

To extract data on Twitter, you need to register your app with your Twitter account here: https://apps.twitter.com/

Once you have created your app, you'll have access to your identification codes in the tab “Keys and access tokens”. The codes we are interested in are:

  • Consumer Key (API Key)
  • Consumer Secret (API Secret)
  • Access Token
  • Access Token Secret

Copy and paste them into your text editor; we'll need them to authenticate our Twitter session in R.

Authenticate yourself in R
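As a minimal sketch, assuming the twitteR package and four environment variables whose names are purely illustrative (they are not part of any Twitter or twitteR convention), the authentication could look like this:

```r
# Minimal sketch, not the original code. Assumes the twitteR package.
# The environment variable names below are hypothetical: store the four
# codes wherever suits you, but keep them out of shared scripts.
keys <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                     "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))

if (all(nzchar(keys)) && requireNamespace("twitteR", quietly = TRUE)) {
  # Registers the OAuth credentials for the current R session
  twitteR::setup_twitter_oauth(
    consumer_key    = keys[["TWITTER_CONSUMER_KEY"]],
    consumer_secret = keys[["TWITTER_CONSUMER_SECRET"]],
    access_token    = keys[["TWITTER_ACCESS_TOKEN"]],
    access_secret   = keys[["TWITTER_ACCESS_SECRET"]]
  )
} else {
  message("Credentials or twitteR package missing: skipping authentication")
}
```

Reading the codes from the environment is only one option; pasting them directly into the call works too, as long as the script stays private.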

You should now be authenticated on Twitter.

What do we extract?

I suggest identifying your followers' communities. Who are they, and how do they interact? We first have to define the data we want to extract: our followers, plus the follow and friendship links between them, restricted to our own follower community.

Data extraction on Twitter with R

In this part, I adapted code from Jörg Steinkamp, who was curious to visualise where his information on Twitter comes from and to identify his own Twitter bubble (a notion sadly well known since the American election in 2016). You can read his original code and his article on his blog: https://joergsteinkamp.wordpress.com/2016/04/20/my-twitter-bubble-r-analysis/

I adapted this code to understand my audience rather than my bubble, that is to say, to extract my followers' networks (friendships + follows).

Warning: the extraction can be time-consuming. It took more than 2 hours for me with only 225 followers.
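To illustrate the idea without calling the API, here is a small offline sketch of the filtering step: keep only the links whose two ends are both among my followers. The input data and the build_edges() helper are hypothetical; with twitteR, the friend IDs of each follower could come from something like getUser(id)$getFriendIDs(), which is an assumption about the package used rather than the original code.

```r
# Hypothetical toy input: my followers, and for each of them the IDs of
# the accounts they follow (in practice fetched from the Twitter API).
my_followers <- c("a", "b", "c", "d")
friends_of <- list(
  a = c("b", "c", "x"),   # "x" is not one of my followers
  b = c("a"),
  c = c("d", "y"),        # "y" is not one of my followers
  d = character(0)
)

# Build a Source/Target edge list restricted to my own followers
build_edges <- function(followers, friends_of) {
  rows <- lapply(followers, function(src) {
    targets <- intersect(friends_of[[src]], followers)
    data.frame(Source = rep(src, length(targets)),
               Target = targets,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

edges <- build_edges(my_followers, friends_of)
print(edges)
```

Links towards accounts outside my follower list are dropped, which is what keeps the network focused on the audience rather than on the whole Twitter graph.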

Detect communities with igraph (R)

To detect communities among my followers, I used the spinglass clustering from the R igraph package (Reichardt and Bornholdt, 2008; Traag and Bruggeman, 2009). The originality of this algorithm is that it treats the network as an energy system, taking into account not only the links between entities but also the missing links, according to an attraction/repulsion principle. The communities are built from the partition that minimizes the energy of the system.
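Here is a minimal sketch of a spinglass run, using a small toy graph as a stand-in for the follower network (the graph is made up for illustration):

```r
library(igraph)
set.seed(42)  # spinglass is stochastic; fix the seed for repeatability

# Toy stand-in for the follower network: two dense groups joined by a
# single edge. cluster_spinglass() needs a connected graph.
g <- make_full_graph(5) %du% make_full_graph(5)
g <- add_edges(g, c(1, 6))

# spins is the upper limit for the number of communities (default 25)
comm <- cluster_spinglass(g, spins = 25)

membership(comm)   # community index of each vertex
modularity(comm)   # quality of the partition
```

On a real follower network, the graph would be built from the extracted edge list, for example with graph_from_data_frame().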

Optimize the clustering

Like a classical clustering algorithm such as k-means, the spinglass algorithm needs an a priori number of groups (called spins). The default number of spins in igraph is 25. This number is only an upper limit: it does not fix the final number of communities.

As a first step, it is this number of spins that I'll try to optimise, to get the best possible clustering between 10 and 50 spins. From those 40 clusterings, I'll choose the best one, that is to say, the one that maximizes my network's modularity.
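The search can be sketched as follows, on a synthetic graph and with a coarser grid of spin values than the one described above so that it runs quickly (both are assumptions made for the example):

```r
library(igraph)
set.seed(1)

# Synthetic stand-in network: 4 dense "islands" with a few links between them
g <- sample_islands(islands.n = 4, islands.size = 15,
                    islands.pin = 0.6, n.inter = 2)

# Keep the largest connected component: spinglass fails on unconnected graphs
comp <- components(g)
g <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))

# Coarse grid for the sketch; use 10:50 to scan every value
spins_grid <- seq(10, 50, by = 10)

# Run one clustering per spin value and keep every result
fits <- lapply(spins_grid, function(s) cluster_spinglass(g, spins = s))
mods <- vapply(fits, modularity, numeric(1))

# Keep the clustering with the highest modularity
best <- fits[[which.max(mods)]]
best_spins <- spins_grid[which.max(mods)]

data.frame(spins = spins_grid, modularity = round(mods, 3))
```

Storing the fitted clusterings avoids re-running the stochastic algorithm for the winning value, which could otherwise return a slightly different partition.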

 

The result: for my network of 215 followers (and 2074 links), 11 initial spins works best. (Note that the difference between the clusterings is not really huge here, but it may matter more on other networks.)

[Figure: optimisation of the spinglass clustering (network modularity)]

Verify that your communities are balanced:
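One simple way to check this, sketched on a toy graph rather than on my real follower network, is to look at the size of each community with igraph's sizes():

```r
library(igraph)
set.seed(42)

# Toy connected graph made of three dense groups of different sizes
g <- make_full_graph(6) %du% make_full_graph(5) %du% make_full_graph(4)
g <- add_edges(g, c(1, 7, 7, 12))

comm <- cluster_spinglass(g)

# Number of vertices per community, then the same as a share of the network
community_sizes <- sizes(comm)
community_sizes
round(prop.table(community_sizes), 2)
```

A partition where one community holds almost every node, or where most communities are singletons, is usually a sign that the clustering needs another look.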

Export your data for a Gephi visualisation

This is the last part of our code; it exports our data so we can visualize it in Gephi. We have to prepare 2 files:

  • an edge list that describes the friendship and follow links on Twitter,
  • a node file describing each of my followers with: an Id, a Label (screenName), a Weight (their number of followers), and the community index we computed with the spinglass clustering.

This piece of code isn’t the most elegant one on earth but it works 🙂
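As a rough illustration of the two files, here is a minimal sketch on made-up data. The accounts, follower counts and file names are hypothetical; the Source/Target and Id/Label column names are the ones Gephi's spreadsheet import recognises.

```r
library(igraph)
set.seed(42)

# Hypothetical edge list (who follows whom) and follower counts
edges <- data.frame(
  Source = c("a", "a", "b", "c", "d", "e", "f", "d"),
  Target = c("b", "c", "c", "d", "e", "f", "d", "a"),
  stringsAsFactors = FALSE
)
followers_count <- c(a = 120, b = 45, c = 300, d = 80, e = 15, f = 60)

g <- graph_from_data_frame(edges, directed = TRUE)
comm <- cluster_spinglass(g)  # edge direction is ignored by spinglass

# One row per follower: Id, Label, Weight and community index
nodes <- data.frame(
  Id        = V(g)$name,
  Label     = V(g)$name,
  Weight    = unname(followers_count[V(g)$name]),
  Community = comm$membership,
  stringsAsFactors = FALSE
)

# Written to a temporary folder here; pick your own output directory
out_dir <- tempdir()
write.csv(edges, file.path(out_dir, "edges.csv"), row.names = FALSE)
write.csv(nodes, file.path(out_dir, "nodes.csv"), row.names = FALSE)
```

Both files can then be loaded through Gephi's spreadsheet import, the edge file as an edges table and the node file as a nodes table.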

Visualization in Gephi

I won't go into detail on how to render a picture in Gephi. You'll find very good resources on the Gephi website: https://gephi.org/users/ and on Martin Grandjean's website: http://www.martingrandjean.ch/gephi-introduction/

 

Here is the result with my followers' network:

[Figure: Gephi visualisation of the Twitter followers network and its detected communities]