Datacamp done

After finishing the Intro to Data Science and Intermediate Data Science courses over the past 2 months, I’ve come to realize that Data Science isn’t the path for me right now. I find Machine Learning fascinating and programming satisfying when I get it right, but I want to use them for personal projects. Neither these tracks nor any other online learning I’ve tried gets you to building ML applications or doing great DS analyses. They can help, but to do Machine Learning or Data Science, you need to actually do Machine Learning or Data Science: build applications, do analyses, and ship other public projects. I haven’t done this, but going forward I will. Fewer classes, more projects. Practice, then coverage.

Long story short, these 2 courses were great, but no longer align with my goals. And, for any further work in Machine Learning and Data Science, I need to do projects, not classes.

ARCHIVE: Datacamp December: Day 3

Posted on December 1, 2017
This is the last day of my “soft launch” since tomorrow will be December, and I’ve just finished Intro to Python for Data Science. However, I’ve also realized I have to make some tradeoffs. There are only so many hours per day and only so many priorities that I can handle in one go. When in doubt or overwhelmed, I cull my inputs and focus on my priorities. These are my priorities:

  1. Do a great job at work
  2. Finish the Python Data Scientist track by end of December
  3. Personal growth: learn to speak basic Chinese with family this month and get back into a weightlifting routine.

Most importantly, some things are NOT priority and should be cut:

-Coffee, drinks, or other social gatherings with people I already see regularly. In general, coffee or drinks are primarily about gossip. I’ll only go to gatherings with a specific purpose, like dinner, seeing a movie, a concert, a birthday, etc. If it’s a specific person I want to meet, I’ll meet them 1:1 for a walk, tea, or food. Catching up with friends you’ve not seen in a while is good for the soul 🙂 but for people you already see and talk to regularly, it’s superfluous.

-Generic events: Avoid them like the plague; they are nothing but distraction. Same for happy hours or company events. I need to buckle down and learn by doing, not merely talk about Python and Data Science or make small talk. Focus.

-Buying stuff: Don’t! You have plenty of things, people in your life will care more to see you or hear from you than get stuff from you, and you need to save money right now so you can start 2018 on a financially firm footing.

-Worrying: Would it help? Nope. Just do the thing you want to do, something that’s important if you can’t think of anything, or work out if lost. Don’t worry.


Progress update: I’m still not done with the first course (Intro to Python for Data Science), but will make sure to finish it before end of day tomorrow. My estimate on pacing isn’t holding up, and this is important enough that I need to make it routine. I should have finished Intro and Intermediate by today, but I’m going to barely finish Intro. The only way to bridge this gap going forward is to make it a daily activity that I do first thing, every morning. I don’t have the discipline in the evening after work and perhaps my brain needs some down time (or gym) then.

Starting tomorrow (Friday, December 1st), I will do Python and Chinese first thing in the morning. If it works, I’ll do this every morning Monday through Friday and dedicate a floating 2-hour block on Sunday as well.


January 2018 Update: I made tremendous progress on Chinese, enough that I was able to talk to my girlfriend’s family in a functional way. There are still miles to go, and a whole lifetime’s worth to learn, but I felt I finally “got” how to pronounce Chinese and how the language builds on itself, and I’m no longer totally at a loss in conversations. As mentioned in previous archive posts though, Datacamp December has moved to January and February. I’m now on track to finish all 20 courses and a project by Feb 16th. Full steam ahead!

ARCHIVE: Datacamp December: Day 2

Posted on November 30, 2017

The last 2 days can be considered a soft launch of sorts. As I’ve read Scott Young’s advice on ultra learning projects, I’m realizing it’s inevitable that I’ll need to adjust my expectations, pace, or technique. So, I’m treating this week as a soft launch, where I work as planned to make progress and make major changes to my regimen as early as possible. This will be useful if it helps me better specify my goals, identify habits that work, and above all cut out the bad habits, techniques, or assumptions before I’m too far along. After all, it’s not actually December yet :p

January 2018 Update: That inevitable adjustment has pushed me to restart in January, after freeing mind space and time with a Digital Declutter led by Cal Newport through his newsletter. I’m now on track to finish all 20 courses by February 16th or earlier. However, it’s certainly worth revisiting Scott’s advice as I’m bound to hit roadblocks anew. This time though, I’m determined to finish and learn this well. I’ll be as comfortable in Python as I am in R and SQL, or more so. And, I’ll reach this state by finishing all 20 courses and a Kaggle competition or project of my choice by Feb 16th, or earlier. Do what it takes.

ARCHIVE: Datacamp December: Day 1

Posted on November 29, 2017

Today, I finished 1/3 of the Intro to Python for Data Science course. Perhaps even more importantly, I wrote my first Airflow job and made my first pull request and commit as a member of the engineering team at my company. My progress is less than I hoped, but I’m still on track to finish this course by end of tomorrow. The key to doing so is discipline, and that comes from being present, having a structure that defaults to my goals, and managing my energy levels.

At several points in the day I was distracted by online articles, ad-hoc requests or Slack messages from colleagues in other departments (this kills an engineer’s productivity, yet may be a natural tendency of non-engineering colleagues), or my own thoughts about random topics, work and otherwise. I need to be present each day in the morning, at the office, and in the evening. Only then can I do the deep work that is learning a new programming language and skill. However, I have past templates for how to do this in Scott Young’s MIT Challenge, Cal Newport’s book Deep Work, and my own work schedule in grad school. Moreover, I’m working at a company, in a culture, and with colleagues who support my learning, growth, and the depth this requires.

Today I learned:  

  1. When I sit down to work, I need to be present. Tomorrow, I’ll stay focused only on that specific thing I’m doing, not even thinking about future or unrelated matters, let alone worrying, planning, analyzing, or calculating things beyond what’s in front of me. No coffee, no browsing, no Slack when I’m inside a work block.
  2. My daily structure must reach my goals by default. My goals are to do best in class work in my new job, learn Python for Data Science (this Datacamp December challenge), and learn enough Chinese to talk to family over the Holidays. This will not happen unless every day is organized around these goals, I push myself when I’m at my best (I’m a morning person, recharged by exercise, and need a good night’s sleep), and I know that each day, hour, and minute clearly maps to the goal. Long story short, I’ve set up a new daily routine and will pilot it from tomorrow onwards.
  3. Managing my energy level is the most important thing. Today, I was distracted in the morning and saw my energy and motivation dissipate, while negative thoughts have had the same effect in the past. In the afternoon, an unexpected coffee gave me plenty of energy but also caused me to stay up much later than I had planned, to the point that I’m writing this post after midnight, as my mind wanders and procrastinates. I need to save my greatest energy for my mornings and ensure that I have steady energy at other times of day, so that I’m as positive, present, and focused as can be when I work. All-nighters, late nights, and any time wasted on the computer are a result of not maintaining or channeling my energy properly.

January 2018 Update: Not much to say, except that this stalled in early December with new priorities at work, learning Airflow, PostgreSQL, and PostGIS, and other new opportunities that came my way. I’m back at it now, to finish all 20 Python Data Scientist track courses by Feb 16th. Or earlier.

ARCHIVE: Datacamp December

Posted on November 28, 2017

Since my last post, I’ve shifted to the Engineering team at my company, and begun work as a Data Analyst. I’m thoroughly enjoying it, am growing in my ability to write efficient Standard SQL queries, and expect to grow most by learning to effectively visualize and present information. However, I need to keep learning on my own time, and Datacamp’s Python Data Scientist track is my final project for 2017.

By the end of December, I will finish Datacamp’s Data Scientist with Python career track. I’ve seen their platform evolve since first starting up Data Education DC in 2014, have dabbled with Python in the past, and am convinced it’s the right language for building stuff with machine learning and solving problems I care about.

To hold myself to this goal, I will post every day on my progress. I’m motivated by Scott Young’s MIT Challenge and by my own progress at times when I’ve told others what I’m doing, and I want to create some structure that I can use for future projects.

My schedule requires 2 days per course and I’ll aim for 1 day per course, to finish all 20 courses by December 31st. This is just beyond my comfort zone. I’ve got a full time job, the Holiday Season and family commitments are coming up, and I’m learning Chinese at the same time. So, there’s enough time to hit this pace, though it’s certainly not going to be easy.
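The pacing arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function name `days_per_course` is made up for this post, and it assumes the clock starts on Day 1 (November 29, 2017) and that every calendar day through December 31st counts, weekends and holidays included.

```python
from datetime import date


def days_per_course(start: date, deadline: date, courses: int) -> float:
    """Average calendar days available per course, counting both endpoints."""
    if courses <= 0:
        raise ValueError("courses must be positive")
    days_available = (deadline - start).days + 1
    return days_available / courses


# Assumed start: Day 1 of the challenge (November 29, 2017).
budget = days_per_course(date(2017, 11, 29), date(2017, 12, 31), 20)
print(f"{budget:.2f} days per course")  # prints "1.65 days per course"
```

Under those assumptions the budget comes out a little under 2 days per course, so aiming for 1 day per course leaves some slack for work and family days.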

After thinking through the short term and long term usefulness of R and Python, I’ve decided to leapfrog to the PyData stack. Though I love the R Tidyverse and thoroughly enjoyed Hadley Wickham’s R for Data Science, it became increasingly clear that I need to get used to working in production environments. R was great for data analysis and presentation, but in my own experience and that of my company, it lacks speed and scalability for production software. I ultimately want to deploy machine learning to solve problems I care about, such as translation (Google Translate is amazingly improved, I just tried it after 2 years), transportation, diagnosis and behavioral change, and making learning more engaging. R can help me prototype or analyze data, but it cannot help me solve such problems at any scale that matters.

January 2018 Update: My pace stalled right after this, as new priorities and opportunities suddenly came my way. Still, I’m on track to finish all 20 courses of Python: Data Scientist track by Feb 16th. I’m going for earlier, so I can do a project too. Right now, I’m leaning towards this Kaggle challenge:

I want to be just as comfortable in Python as I am in R. No…even more so! 

ARCHIVE: A week of Data Science and Deep Work

Posted on September 18, 2017

Opportunity Knocks:

Though I had planned a vacation for this week to visit spots old and new in Barcelona, life has taken a turn for the better that I’m eagerly embracing. Instead of Spain, I find myself in Sacramento helping someone I care deeply about to take some important first steps in life. Rather than strolling and sightseeing, I’m diving deep into Data Analysis, Statistics, and SQL in ways that I’ve wanted to for over a year now. My weekly goals are below and I’m treating this as my own “MIT challenge” for the next 7 days, ending with a Data Analyst interview at my company the day afterwards.

Weekly Goals:

  1. Solve 20 Data Science and analysis problems provided by friends, colleagues, and an industry-leading guide. Deep work and true learning require the practice of solving real-world problems, not merely doing toy problems from textbooks, and certainly not listening to lectures.
  2. Bridge specific gaps to solving these real-world problems by covering the relevant sections of Introduction to Statistical Learning in R and R for Data Science. I’ll start with exercises, then concepts and examples, and cover full chapters as needed. This will be a more effective use of time and more targeted learning than simply following a set syllabus, reading, and then leaving the real work of exercises (or solving new problems!) for the end.
  3. Master SQL and Periscope, by doing all training cases in the Mode Analytics tutorial and publishing 3 new dashboards and 1 view in Periscope. I start each day working on SQL exercises, problems, or real-world requests.


In support of these weekly goals, I’m setting daily goals the night before each day. I’ll need to solve 3 problems most days, complete 3 sections of about equal length in the Mode SQL Tutorial, and still have time for diving into exercises, examples, and concepts in  Today’s goals, for Sunday 9/17/2017, are to benchmark my speed of work and inform goals for the week.

Daily Goals for today:

  1. Redo take home problem #1 (Conversion), applying best practices from prior work I did in grad school on this Driven Data challenge.
  2. All exercises for Chapters 1 and 2 of ISLR. Start them for Chapter 4 as well if needed. Coverage (reading) is only to bridge gaps encountered in practice (exercises and examples), and always comes after.
  3. Mode SQL Basics 100% done, and Intermediate exercises begun. I use SQL daily, do complex queries weekly, and must become even better.

Do what it takes.


January 2018 Update: Not much to add here, except that I’m glad I took this time to refresh my statistical thinking, R, SQL, and problem solving skills. It helped tremendously for getting the Data Analyst role at Premise and has been a foundation for me to continue learning ever since.

ARCHIVE: Do what it takes.

Posted on August 27, 2017

It’s been some time since I wrote, but I’m finally diving into new and interesting problems that matter to me, and realizing I need to bring my analytical skills up to par quickly. I have to not only do projects using R and learn it more deeply than I ever did before, but I also need to learn to think from first principles again. The only way to learn this is by doing, i.e. by working on problems I care about. The stakes are real, my time is limited, but my ability to focus, imagine, and iterate is boundless if I’m willing to harness it.

Every day at work, I’ve got a chance to learn Project Management in a rapidly-changing startup environment, unlike any other I’ve been in. Let’s seize that, making the most of my work hours to create the project structure and tracking we need to succeed. It’s chaotic and requires re-working of processes and tools, but these iterations are also a chance to improve in meaningful ways. I can get better at anticipating the challenges our projects will encounter and the needs of Growth, Product, Data Science, and Engineering; at quickly understanding our new Product vision and building the structure to support it; and at ramping up on tools like JIRA and ways to share progress with an entire organization.

Further, we have an increasingly interesting data set at work and I have time and motivation of my own to develop my analytical skills. I’m convinced that rare and valuable skills are the surest currency, that focus is the first such skill, that deep work is the best means to hone such skills, and that I’ve found and will find most meaning in doing such work. No more distractions, no more casting about for optionality in my work or personal life – let’s keep life simple, and do what it takes.

January 2018 Notes: Project management came to a quiet end due to internal politics at Premise, but after exploring multiple options, I found a Data Analyst role in our Engineering Insights team that met my needs. The role has gotten me much more exposure to Data Engineering work as well (e.g. writing Airflow ETL pipelines), with an emphasis on exactly the engineering workflow, coding best practices, and programming approach I lack, rather than Data Science’s usual emphasis on statistical methods that I’ve already seen. It’s been great training, to bridge the gaps and take my career to even greater heights.

ARCHIVE: R for Data Science

Posted on April 30, 2017

Diving into this book, and finished Part 1 today.

January 2018 Update: I did not finish the book, though it is excellent.  It’s my first reference for any R scripts I write or review. Moreover, I’ve realized that to be successful as a Data Scientist and build products that scale, I need to master Python, not R. Python is a much more flexible language for analysis, production software, websites, and just about anything I’d care to do. If I’m ever to start my own business or automate routine tasks in a systematic way, i.e. if I’m ever to be able to solve problems with programming, Python’s my best path to doing so. R remains super useful for quick data analysis though, is especially strong for visualization, and I’ve done enough in it to feel confident.

ARCHIVE: La France Profonde

Posted on April 30, 2017

More to come, but had a great trip back to France and a first real visit to Paris, complete with wandering and getting lost, good food and memories, and great friendships made.

January 2018 NOTES: Alex and I had an amazing time with Les Chaussois, including with my host brother Quentin in Paris. We also got to explore Paris and Mont St. Michel on our own, making memorable friendships, getting to know the city on our own terms, and more. I don’t think I’ll have a chance to return soon unfortunately, but I finally understood why so many love Paris and why France first stole my heart nearly 10 years ago when I was struggling with a new language, making new friends, falling in love, and discovering new ways to see the world and make the most of life. As I take a break from distracting digital technologies between Xmas (Dec 25th) and Chinese New Year (Feb 16th), I have used and will keep using the time to stay in touch. Friendships matter.