A Visual Retrospective of Berkeley’s Machine Learning Class

Taking a closer look at W207 Applied Machine Learning

Richard Mathews II
5 min read · May 7, 2023
Structuring my knowledge graph… colored nodes indicate notes for grad school

In my fourth semester in Berkeley’s data science master’s program, I took applied machine learning (W207). The course was updated in 2022 to focus more on neural network topologies, including the transformer architecture and attention mechanism, which translated to a slight bias toward text processing.

In this blog post, I’m continuing my series on visualizing the learning experience in Berkeley’s graduate program in data science. The posts thus far are listed here…

Knowledge

The knowledge graph below shows how the scope of the class compared to my other courses:

My entire Obsidian knowledge graph, with my course content, color-coded

As you can see, the scope of the class was larger than that of the Python and data engineering classes and about the same size as my statistics class. It's tucked away in the technical hemisphere of my second brain, almost as a bridge between statistics and data engineering. This makes sense: the course covered structuring data, feature engineering, and data cleaning, all of which are touched on in data engineering, as well as how to feed prepared data into machine learning algorithms, many of which are grounded in statistical theory.

Let’s take a look at the course's local graph.

My Obsidian local graph for W207 (blue nodes are course modules)

There is a natural split between classical machine learning and neural networks. Neural network-heavy content consumes the right and bottom regions of the graph, covering topics like:

  • Neural Network Design
  • Convolutional Neural Networks
  • Sequence-to-Sequence
  • Embeddings
  • Recurrent Neural Networks
  • Transformer Architecture
  • Attention Mechanism
  • Large Language Models
  • BERT

The course was taught through a combination of lectures, coding assignments, and a group project. We were given datasets to work with and had to apply the techniques we learned in class to solve real-world problems.

Concepts

Now it’s time for the handy word cloud, which is super easy to run since I tag all my relevant course notes in Obsidian with #mids/w207. All it takes is a quick filter and a word cloud algorithm.
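If you're curious what that filter looks like, here is a minimal sketch of the idea rather than my actual script (that's linked at the end of the post). It assumes the notes live as Markdown files in a local Obsidian vault, with the vault path as a placeholder, and uses the wordcloud and matplotlib packages.

```python
from pathlib import Path

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Hypothetical vault location; substitute your own.
VAULT = Path("~/obsidian/vault").expanduser()

# Collect the text of every Markdown note carrying the course tag.
texts = []
for note_path in VAULT.rglob("*.md"):
    note = note_path.read_text(encoding="utf-8")
    if "#mids/w207" in note:
        texts.append(note)

# Build and display the word cloud from the filtered notes.
cloud = WordCloud(
    width=1200,
    height=600,
    background_color="white",
    stopwords=STOPWORDS,
).generate(" ".join(texts))

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```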

Some of the dominant concepts are not surprising: model, input, feature, data, example, vector, and function.

But check out that big word “token.” As I mentioned earlier, there was a lot more emphasis on NLP, prepping text data, and building models for tasks like sentiment analysis. I also see the concepts “sequence”, “embedding”, “representation”, “translation”, and “text”, all related to NLP.

Word cloud generated using text from my notes

Time

For me, the course was not as time-intensive as Statistics but took more effort than Python, Data Engineering, and Research Design. My time spent on this course should be contextualized: I have been a data science practitioner for a few years and had learned many machine learning concepts before the class, so much of it was refresher material.

Time-series plots for the time I spent on W207 throughout the semester

I also paced the course differently, sprinting ahead on all of the asynchronous content and assignments. That sprint shows up as a significant spike in effort in the middle of the course; the second half looks like a lighter load only because of how I front-loaded the work.
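For anyone tracking their own study hours, here is a minimal sketch of how a weekly view like this could be produced. The CSV filename and column names are placeholders, not my actual export format; my real plotting code is linked at the end of the post.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export from a time tracker: one row per study session,
# with the session date and the minutes logged.
log = pd.read_csv("w207_time_log.csv", parse_dates=["date"])

# Roll the sessions up into hours per week.
weekly_hours = (
    log.set_index("date")["minutes"]
    .resample("W")
    .sum()
    .div(60)
)

weekly_hours.plot(
    kind="bar",
    xlabel="Week",
    ylabel="Hours",
    title="Weekly hours spent on W207",
)
plt.tight_layout()
plt.show()
```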

Group Project

For my group project, I developed a system that gives Flipkart sellers feedback distilled from customer complaints and praise, along with recommendations for product improvement. To accomplish this, I used the Latent Dirichlet Allocation (LDA) algorithm to identify the dominant themes in positive and negative customer reviews and fed those themes into GPT to produce product improvement recommendations.

I chose this approach over sending every review to GPT because it is far more cost-effective. The token-count reduction achieved through LDA demonstrates how data scientists can use preprocessing and token-reduction methods to slash costs and latency in scaled solutions built around LLMs. Overall, this project highlights the importance of weighing cost and scalability when working on NLP problems in data science.
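To make the pipeline concrete, here is a rough sketch of the idea rather than the project's actual code (the repo is linked below). It assumes a plain list of review strings and uses scikit-learn's LDA implementation; the sample reviews, topic count, and prompt wording are all illustrative placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy examples; the real project used thousands of Flipkart reviews.
negative_reviews = [
    "battery drains within two hours of light use",
    "charger stopped working after a week",
    "screen scratches far too easily",
]

# Bag-of-words representation of the reviews.
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
doc_term = vectorizer.fit_transform(negative_reviews)

# Fit LDA and pull out the top words for each theme.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(doc_term)

vocab = vectorizer.get_feature_names_out()
themes = []
for topic in lda.components_:
    top_words = [vocab[i] for i in topic.argsort()[-8:][::-1]]
    themes.append(", ".join(top_words))

# Only the compact theme summaries (not every raw review) go into the
# prompt, which is what keeps token count, cost, and latency down.
prompt = (
    "Customers raised these recurring complaint themes: "
    + "; ".join(themes)
    + ". Suggest concrete product improvements for the seller."
)
print(prompt)  # this string would then be sent to GPT
```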

The project repo can be found here.

Conclusion

W207 was one of the more enjoyable classes I have taken so far. The updated focus on neural network topologies was particularly useful, since that is where much of the field's current momentum is.

I plan on continuing these visualizations throughout my MIDS journey and posting them here on my Medium channel. I am excited to take Berkeley’s “landmark” course, Natural Language Processing, this summer, which I hear was also updated in 2022 to factor in the recent innovations in transformer architectures and large language models. Stay tuned for those visualizations…

If you want to get in contact with me to learn about my experience with Berkeley’s MIDS program, reach out to me on LinkedIn or through my personal website.

The code used to create my word cloud and time series visuals can be found here.

