Reflection

My understanding of what a data scientist does has mostly stayed the same; certainly more has stayed the same than has changed over the course of this class. One thing that has definitely changed, however, is my sense of how large a role cleaning and manipulating the data plays between “obtaining it one way or another” and “analyzing” it. At first I had no idea how data was read into R. So reading data in seemed to be a huge part of the process, even though it is typically a single line of code (once the file you are trying to read is in the current working directory!). I would have said that reading the data in was one quarter of the whole process, whereas now I would say it is maybe 5 percent. Cleaning, readying, and manipulating the data once it is read in can sometimes be a quarter of the work of a project. At the very least, that step is far more likely to take up a quarter of a project than simply reading the data in.
I think R is fantastic for data science! It is a remarkably powerful language: with only a few lines of code you can create a graph or fit a model, and do much more than I would have thought could be done with so little code. It can read in the data, clean it, analyze it, model it, and finally create presentations and applications to share one's project with others. I will use R going forward at every opportunity. I hope that will be often, because I want to continue in this program and also work directly in the field once I have my degree. One thing I am very interested to see going forward is how R compares to Python for data science. I know they are the two dominant languages, and I have much less experience with Python. In practice, after this course I will put more emphasis on writing elegant code. By this I mean, for instance, storing a value in a variable and reusing it rather than copying and pasting the same code more than once.
This course has taught me that there is often a really good way to do exactly what you need to do; you just have to find the right package and the right function. In practice I will hold out for exactly the right tool, or function, for the job, because some package often has a function that does just what you need without much modification. Before this class I would often try to brute-force my way to the solution of a problem. For example, I would manually map a bunch of variables rather than reference a table, or try to make a function do something it isn’t meant to do rather than search for the right function. Going forward, I will try to get the most bang out of every line of code that I write.

Written on July 29, 2021