
August: studious holidays

After spending almost two years in a fairly specialized data science team, I had neglected some aspects of my practice. I certainly improved at manipulating and modeling data in R, but I left out two essential steps: acquiring and extracting data, and communicating the results.


Dataviz and communication

So I packed these two books in my luggage:

This book is a must-read for anyone who has studied a bit too long, has tried to apply very complex statistical methods, and wants to communicate their results. Communication, or the art of telling stories with data, is one of the soft skills that can make all the difference when starting a career as a data analyst or data scientist.

I wish I had had it a few years ago, when I started working outside academia. I recognize several beginner's mistakes I made: showing the dendrogram of a hierarchical clustering to decision makers, cramming as much information as I could onto a slide, and making a lot of (maybe too many) pie charts. Her demonstration of how useless pie charts are is stunning, and that alone makes this book worth reading.

Scott Murray explains very clearly how to use d3.js. Clear teaching is really needed for this library, which requires working with JavaScript, SVG, CSS and HTML all at once. The book is a step-by-step course written in jargon-free language, because it is aimed at data journalists. I have not finished it yet, but I am already proud to have understood and set up the whole web environment, and to have made my first histogram with d3.js. I will try to make more attractive and interactive things for this blog in the coming weeks.


Extract and manipulate data

For data acquisition and extraction, I took this Udemy course: Spatial SQL with PostgreSQL, by Professor Arthur Lembo. https://www.udemy.com/spatialsql/learn/v4/overview

I have not finished it yet. I took this course to:

  • refresh my GIS knowledge, which dates back to the year 2000,
  • refresh my SQL, because I never had real structured training and that is starting to be a burden in my daily work,
  • and learn how to use PostGIS, which is beautiful!

I am quite pleased to have installed my PostgreSQL server and imported my first data. You can see my first steps with PostgreSQL here: https://berengeregautier.com/importer-sirene-postgresql/. I am also pleased to have connected my server to a GIS and to my R sessions. I'm looking forward to harnessing the full power of these tools (a post on this is in preparation).

Manipulate science

From a more epistemological point of view, I have also been interested in the way science is mistreated. A few months ago, I was shocked by the documentary "Merchants of Doubt". I naively thought that this could never happen in our European democracies. Surely we are not that foolish! Without being an eco-activist, I have become passionately interested in the debate about endocrine disruptors in Europe, because I was shocked by how irresponsibly German journalists treated the subject. This struck me a few weeks ago when I read German newspaper articles on the decline in sperm counts. Those articles were full of nonsense and of potentially dangerous ideas. Since then, I have read with great interest the gripping investigation by Stéphane Horel, an investigative journalist covering the European Parliament, whose book you can find here: https://www.amazon.fr/Intoxication-St%C3%A9phane-HOREL/dp/2707186376

I will need a few more days to summarize all the strange things I have read in the German press. This analysis belongs on a blog about data science and science in general: how are numbers manipulated to spread doubt about studies whose conclusions reflect the consensus of the scientific community? The German non-debate about endocrine disruptors is a particularly relevant case.