Visualizing my second semester in Berkeley’s data science master’s program

W203 (Statistics for Data Science)

Richard Mathews II
3 min read · Aug 13, 2022

Last week, I wrapped up my second semester in Berkeley’s MIDS program. I took W203 Statistics for Data Science, and suffice it to say, the class was more challenging than W200 and W201 combined, but I enjoyed it!

Now it’s time to recap the semester with the same visualizations I generated for my first semester.

Knowledge

One of the neat things about using Obsidian for note-taking is that I can watch my “second brain” grow throughout the program. Upon completion of W203, I have a new cluster of nodes in my graph, shown below. I also annotated the nodes from my first semester (W200 and W201). The golden nodes are W203 course content notes, and they are interlinked with many of the concepts we learned (probability theory, estimation theory, maximum likelihood, etc.). In fact, the entire cluster of nodes circled by the W203 ellipse did not exist prior to this class.

My Obsidian knowledge graph with color-coded course notes (W200=blue, W201=red, W203=gold)

I also visualized the local graph to see how the different concepts are related. You can see the same graph view for W200 and W201 in my first post.

My Obsidian local graph for W203 (blue nodes are course modules and textbook chapters)

Words

I ran a Python script over all my notes for the class and generated a word cloud. The most common terms were “data”, “random variable”, “probability”, “distribution”, and “model.”

Word cloud generated using text from my W203 notes
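For anyone curious what a script like this might look like, here is a minimal sketch. It is not the exact code I ran: the stop-word list, the helper names (`word_frequencies`, `read_notes`, `render_word_cloud`), and the sample sentence are all made up for illustration. It counts single words only, so a two-word term like “random variable” would need extra handling. The rendering step assumes the third-party `wordcloud` package is installed.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical stop-word list; a real script would likely use a longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "for", "we", "it"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word occurrences, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def read_notes(folder):
    """Concatenate every Markdown file under `folder` (Obsidian stores notes as .md)."""
    return "\n".join(p.read_text(encoding="utf-8") for p in Path(folder).rglob("*.md"))


def render_word_cloud(frequencies, out_path):
    """Draw the counts as an image; needs the third-party `wordcloud` package."""
    from wordcloud import WordCloud  # imported here so the counting works without it

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)


# Made-up sample text standing in for real notes.
sample = "The model assumes the data follow a distribution. The data are random."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

With real notes, the idea would be to call `word_frequencies(read_notes(...))` on the notes folder and pass the result to `render_word_cloud`.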

Time

I tracked the time I spent watching asynchronous material, working on assignments, taking exams, and attending live sessions throughout the semester. I plotted this time series at a daily and weekly level (below). The weekly time series is an interesting plot because it clearly shows that the first half of the course is more intense, which is what every student hears going in! The first half focuses on theory, where you solve tough math problems and write proofs, and the second half is more practical, with coding projects. After the second exam (the highest point in the bottom plot), the time I spent steadily declined.

Time-series plots for the time I spent on W203
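The daily and weekly roll-ups behind plots like these can be sketched with the standard library alone. This is an illustration under assumptions, not my actual tracking code: the log format, the function names, and every number in `LOG` are invented. Weeks are assumed to start on Monday.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical log entries: (day, activity, hours). All values are made up.
LOG = [
    (date(2022, 5, 2), "async", 1.5),
    (date(2022, 5, 2), "assignment", 2.0),
    (date(2022, 5, 4), "live session", 1.5),
    (date(2022, 5, 9), "assignment", 3.0),
    (date(2022, 5, 11), "exam", 2.5),
]


def daily_totals(log):
    """Sum hours per calendar day, across all activities."""
    totals = defaultdict(float)
    for day, _activity, hours in log:
        totals[day] += hours
    return dict(sorted(totals.items()))


def weekly_totals(log):
    """Sum hours per week, keyed by the Monday that starts the week."""
    totals = defaultdict(float)
    for day, _activity, hours in log:
        week_start = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
        totals[week_start] += hours
    return dict(sorted(totals.items()))


daily = daily_totals(LOG)
weekly = weekly_totals(LOG)
peak_week = max(weekly, key=weekly.get)  # the week with the most hours logged

for week_start, hours in weekly.items():
    print(week_start.isoformat(), hours)
```

Either dictionary can then be handed to a plotting library, with the keys on the x-axis and the hours on the y-axis.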

Conclusion

I hope these visualizations help upcoming/prospective MIDS students better understand the time commitment and course content for W203. I plan on continuing these visualizations throughout my MIDS journey and posting them here on my Medium channel. If you want to get in contact with me to learn about my experience with Berkeley’s MIDS program, reach out to me on LinkedIn or through my personal website.

The code used to create my word cloud and time series visuals can be found here.

Richard Mathews II

Applied AI scientist, graduate student @ Berkeley, and biohacker. Interested in meta-learning, systems, AI, and data-driven lifestyles.