The Data Analysis Process
Posted on September 19, 2012
Recently, I was chatting with one of my friends about the stuff I do as an applied statistician. What could have been a boring conversation turned out to be an interesting discussion about what people learn in college compared to what they face outside the classroom.
As a side effect of that talk, I decided to write this post about the differences between what we learn in school and textbooks and what we learn in real life.
The idealized data analysis process
The impression you may get from your courses is that a data analysis project is a straightforward process: you just run some script and get a beautiful result.
(I really wish all my projects were like this… but that only happens in my dreams)
The real data analysis process
Real data analysis processes look a whole lot different. If you’re planning on making your living as a data analyst, it’s only a matter of time before you hit the wall. Eventually, you will realize that most of your teachers gave you a very biased view of how the data analysis world works. My cartoon version goes more or less like this:
Like most of my colleagues, I learned my lessons the hard way. I wish my professors had talked to me about all the dirtiness involved in data collection, data processing, data cleaning, data formatting, and data reshaping, as well as the inflated expectations that come with applying fancy models and making predictions. Perhaps that would have spared me some early frustrations in my first projects. Anyway, I love my profession, and I’m constantly looking on the bright side of my analyses while having fun at the same time.