ARCHIVE: Datacamp December

Posted on November 28, 2017

Since my last post, I’ve shifted to the Engineering team at my company and begun work as a Data Analyst. I’m thoroughly enjoying it, am growing in my ability to write efficient Standard SQL queries, and expect to grow most by learning to visualize and present information effectively. However, I need to keep learning on my own time, and Datacamp’s Python Data Scientist track is my final project for 2017.

By the end of December, I will finish Datacamp’s Data Scientist with Python career track. I’ve seen their platform evolve since first starting up Data Education DC in 2014, have dabbled with Python in the past, and am convinced it’s the right language for building stuff with machine learning and solving problems I care about.

To hold myself to this goal, I will post every day on my progress. I’m motivated by Scott Young’s MIT Challenge and by my own progress at times when I’ve told others what I’m doing, and I want to create some structure that I can use for future projects.

My schedule requires 2 days per course, and I’ll aim for 1 day per course, to finish all 20 courses by December 31st. This is just beyond my comfort zone. I’ve got a full-time job, the holiday season and family commitments are coming up, and I’m learning Chinese at the same time. So, there’s enough time to hit this pace, though it’s certainly not going to be easy.

After thinking through the short-term and long-term usefulness of R and Python, I’ve decided to leapfrog to the PyData stack. Though I love the R Tidyverse and thoroughly enjoyed Hadley Wickham’s R for Data Science, it became increasingly clear that I need to get used to working in production environments. R has been great for data analysis and presentation, but in my own experience and that of my company, it lacks the speed and scalability needed for production software. I ultimately want to deploy machine learning to solve problems I care about, such as translation (Google Translate has improved amazingly; I just tried it again after 2 years), transportation, diagnosis and behavioral change, and making learning more engaging. R can help me prototype or analyze data, but it cannot help me solve such problems at any scale that matters.

January 2018 Update: My pace stalled right after this, as new priorities and opportunities suddenly came my way. Still, I’m on track to finish all 20 courses of the Python: Data Scientist track by Feb 16th. I’m aiming to finish earlier, so I can do a project too. Right now, I’m leaning towards this Kaggle challenge: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

I want to be just as comfortable in Python as I am in R. No… even more so!