
Imperfections Unveiled: The Intriguing Reality Behind Our MLOps Course Creation

Bonus Lesson: Behind the Scenes of an 'Imperfect' ML Project – Lessons and Insights

Photo by Hassan Pasha on Unsplash

THE FULL STACK 7-STEPS MLOPS FRAMEWORK

This article is the final bonus lesson of a 7-lesson course that walked you step-by-step through how to design, implement, and deploy an ML system using MLOps good practices. During the course, you built a production-ready model to forecast energy consumption levels for the next 24 hours across multiple consumer types from Denmark.

During the course, you learned all the fundamentals of designing, coding and deploying an ML system using a batch-serving architecture.

This course targets mid/advanced ML or software engineers who want to level up their skills by building their own ML end-to-end projects.

Nowadays, certificates are everywhere. Building advanced end-to-end projects that you can later show off is the best way to get recognition as a professional engineer.


Table of Contents:

  • Course Introduction
  • Course Lessons
  • Data Source
  • Bonus Lesson: Behind the Scenes of an ‘Imperfect’ ML Project – Lessons and Insights
  • Conclusion
  • References

Course Introduction

During the 7-lesson course, you learned how to:

  • design a batch-serving architecture
  • use Hopsworks as a feature store
  • design a feature engineering pipeline that reads data from an API
  • build a training pipeline with hyperparameter tuning
  • use W&B as an ML Platform to track your experiments, models, and metadata
  • implement a batch prediction pipeline
  • use Poetry to build your own Python packages
  • deploy your own private PyPi server
  • orchestrate everything with Airflow
  • use the predictions to code a web app using FastAPI and Streamlit
  • use Docker to containerize your code
  • use Great Expectations to ensure data validation and integrity
  • monitor the performance of the predictions over time
  • deploy everything to GCP
  • build a CI/CD pipeline using GitHub Actions

If you haven’t followed the series but it sounds like something you are interested in, know that after completing the course, you will understand everything listed above. Most importantly, you will see WHY I used all these tools and how they work together as a system.

If you want to get the most out of this course, I suggest you access the GitHub repository containing all the lessons’ code. The course is designed so you can quickly read the articles and replicate the code alongside them.

During the course, you learned how to implement the diagram below. After explaining it step-by-step, it doesn’t sound so scary anymore, right?

Diagram of the architecture built during the course [Image by the Author].

In this final bonus lesson, we want to talk about potential improvements that can be made to the current architecture and design choices made during the course. We also want to highlight the trade-offs we had to make and give you some ideas for future projects.

Think of it as the behind-the-scenes section 👀


Course Lessons:

  1. Batch Serving. Feature Stores. Feature Engineering Pipelines.
  2. Training Pipelines. ML Platforms. Hyperparameter Tuning.
  3. Batch Prediction Pipeline. Package Python Modules with Poetry.
  4. Private PyPi Server. Orchestrate Everything with Airflow.
  5. Data Validation for Quality and Integrity using GE. Model Performance Continuous Monitoring.
  6. Consume and Visualize your Model’s Predictions using FastAPI and Streamlit. Dockerize Everything.
  7. Deploy All the ML Components to GCP. Build a CI/CD Pipeline Using Github Actions.
  8. [Bonus] Behind the Scenes of an ‘Imperfect’ ML Project – Lessons and Insights

The bonus lesson will openly share the course’s trade-offs, design choices, and potential improvements.

Thus, we strongly encourage you to read the rest of the course if building production-ready ML systems interests you 👇

A Framework for Building a Production-Ready Feature Engineering Pipeline


Data Source

We used a free & open API that provides hourly energy consumption values for all the energy consumer types within Denmark [1].

They provide an intuitive interface where you can easily query and visualize the data. You can access the data here [1].

The data has 4 main attributes:

  • Hour UTC: the UTC datetime when the data point was observed.
  • Price Area: Denmark is divided into two price areas: DK1 and DK2 – divided by the Great Belt. DK1 is west of the Great Belt, and DK2 is east of the Great Belt.
  • Consumer Type: The consumer type is the Industry Code DE35, owned and maintained by Danish Energy.
  • Total Consumption: Total electricity consumption in kWh.

Note: The observations have a lag of 15 days! But for our demo use case, that is not a problem, as we can simulate the same steps as we would in real time.

A screenshot from our web app showing how we forecasted the energy consumption for area = 1 and consumer_type = 212 [Image by the Author].

The data points have an hourly resolution. For example: "2023–04–15 21:00Z", "2023–04–15 20:00Z", "2023–04–15 19:00Z", etc.

We will model the data as multiple time series: each unique (price area, consumer type) tuple represents its own time series.

Thus, we will build a model that independently forecasts the energy consumption for the next 24 hours for every time series.

Check out the video below to better understand what the data looks like 👇


Bonus Lesson: Behind the Scenes of an ‘Imperfect’ ML Project – Lessons and Insights

No more chit-chat. Let’s jump directly behind the scenes 🔥

Diagram of the architecture built during the course [Image by the Author].

Overall Code

#1. Duplicated code

The big elephant in the room is that we had quite a lot of duplicated code between different Python modules, which doesn’t respect the almighty DRY principle.

Examples include the settings.py and utils.py files of the ML pipelines and the UI’s dropdown + line plot component.

This code could be refactored into a common module shared across all other modules.
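A hedged sketch of how that could look (the module path and helper functions below are hypothetical, not the course’s actual files):

```python
# common/utils.py -- hypothetical shared module imported by every pipeline and the UI.
import json
import logging
from pathlib import Path


def get_logger(name: str) -> logging.Logger:
    """Return a logger configured identically across all pipelines."""
    logging.basicConfig(level=logging.INFO)
    return logging.getLogger(name)


def save_json(data: dict, file_path: Path) -> None:
    """Persist run metadata as JSON so every pipeline serializes it the same way."""
    file_path.write_text(json.dumps(data, indent=2, default=str))
```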

#2. Classes, not functions!

Modeling your code using classes is good practice, but we used only functions during the course.

We could have created a central class for every pipeline, such as FeaturesExtractor, Trainer, and BatchPredictor.

Also, instead of returning plain dictionaries containing the metadata of a run, we could have created a RunResult class to have more control over how the data is passed.
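A minimal sketch of such a class (the exact fields are assumptions based on the kind of metadata the pipelines exchange):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RunResult:
    """Typed container for the metadata of a pipeline run,
    replacing the plain dictionaries passed between pipelines."""

    feature_view_version: int
    training_dataset_version: int
    model_version: str
    export_datetime_utc: datetime = field(default_factory=datetime.utcnow)

    def to_dict(self) -> dict:
        """Serialize to a plain dict wherever a JSON payload is still required."""
        return {
            "feature_view_version": self.feature_view_version,
            "training_dataset_version": self.training_dataset_version,
            "model_version": self.model_version,
            "export_datetime_utc": self.export_datetime_utc.isoformat(),
        }
```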

#3. No tests 😟

Any code base is better with a bunch of unit & integration tests to validate all the changes done to the code.
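For example, a pytest unit test for a small cleaning helper could look like the sketch below (the helper is a stand-in defined inline for illustration, not the actual course code):

```python
import pandas as pd
import pytest


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a feature-pipeline helper; the real one lives in the course repo."""
    mapping = {"HourUTC": "datetime_utc", "PriceArea": "area", "TotalCon": "energy_consumption"}
    missing = set(mapping) - set(df.columns)
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    return df.rename(columns=mapping)


def test_rename_columns_maps_raw_api_fields():
    raw = pd.DataFrame(
        {"HourUTC": ["2023-04-15T21:00"], "PriceArea": ["DK1"], "TotalCon": [100.0]}
    )
    assert list(rename_columns(raw).columns) == ["datetime_utc", "area", "energy_consumption"]


def test_rename_columns_raises_on_missing_columns():
    with pytest.raises(KeyError):
        rename_columns(pd.DataFrame({"unexpected": [1]}))
```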


Pipeline Design

#1. The DAG has state

Because the DAG has state, it isn’t easy to run it in parallel. The issue is that you need predictions from previous runs to compute the monitoring metrics.

Thus, by definition, you can’t run multiple parallel instances of the same DAG at different points in time.

When does this become a problem?

When backfilling. Let’s say you want to backfill hourly runs for the last 2 months. If you run them sequentially, it will take forever.

As a solution, we would suggest moving the monitoring component to a different DAG.
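A rough sketch of that split, assuming Airflow’s TaskFlow API (the task bodies are placeholders):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval="@hourly", start_date=datetime(2023, 4, 1), catchup=True)
def ml_pipeline():
    """Stateless DAG: feature engineering -> training -> batch prediction.
    Safe to backfill with multiple runs in parallel."""

    @task
    def run_feature_pipeline():
        ...  # placeholder

    @task
    def run_batch_prediction():
        ...  # placeholder

    run_feature_pipeline() >> run_batch_prediction()


@dag(schedule_interval="@hourly", start_date=datetime(2023, 4, 1), catchup=False)
def monitoring_pipeline():
    """Separate DAG that compares past predictions with fresh ground truth,
    so it no longer blocks parallel runs of ml_pipeline."""

    @task
    def compute_monitoring_metrics():
        ...  # placeholder

    compute_monitoring_metrics()


ml_pipeline()
monitoring_pipeline()
```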

#2. Avoid using ":latest" as your resources version

If you use the ":latest" tag to access resources such as:

  • the model’s artifact,
  • data (Feature Store feature view or training dataset),
  • the best configuration artifact, etc.

… you introduce dependencies between multiple runs of the ML pipeline.

It is subtle, but let me explain 👇

Let’s say you run 2 ML pipelines in parallel: A and B. Pipeline A generates a new dataset version first. Then, for whatever reason, pipeline B starts its training step before pipeline A does and accesses the "latest" dataset version, which was created by pipeline A, not by B itself.

This is also known as a "race condition" in parallel computing.

In this case, it can easily be solved by pinning the exact version of the resources passed between tasks of the same pipeline.

For example, instead of accessing "dataset:latest", access "dataset:v3".
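With W&B, for instance, that boils down to fetching the artifact with an explicit version instead of the ":latest" alias (a sketch; the project and artifact names are illustrative):

```python
import wandb

run = wandb.init(project="energy_consumption", job_type="training")

# Fragile: "latest" may point to a dataset created by a different, parallel pipeline run.
# dataset = run.use_artifact("energy_consumption_dataset:latest")

# Safer: pin the exact version produced earlier in THIS pipeline run,
# e.g. passed between tasks as metadata.
dataset = run.use_artifact("energy_consumption_dataset:v3")
dataset_dir = dataset.download()
```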

As you saw, speed is crucial when it comes to backfilling. Thus, running a DAG in parallel is essential in the long run.


Airflow

#1. Use Docker Tasks

This is not necessarily an issue, but I wanted to highlight that instead of using a Python environment, you could have also shipped your code inside a Docker container – Task Docker Decorator Docs [2].

The primary benefit is that this makes the system more scalable.

But the good news is that the process learned during the course is extremely similar. Instead of pushing the Python package to a PyPi registry, you push a Docker image to a Docker registry.
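A minimal sketch of the decorator in use (the image name and the feature_pipeline module are placeholders, not the course’s actual artifacts):

```python
from airflow.decorators import task


# Runs the callable inside the given Docker image instead of the worker's
# Python environment, so every task ships its own dependencies.
@task.docker(image="your-registry/feature-pipeline:1.0.0")
def run_feature_pipeline(export_end_datetime: str):
    # Imports are resolved inside the container, where the package is installed.
    from feature_pipeline import pipeline

    pipeline.run(export_end_datetime=export_end_datetime)
```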

#2. Smaller, Atomic Tasks

In our case, the Tasks inside the DAG contain a lot of logic. They basically run an entire application.

This is not necessarily bad, but splitting the work into smaller, atomic tasks is good practice. It makes debugging, monitoring, and restarting the DAG from a given point of failure much easier.

For example, instead of reading/writing data from GCS in Python, we could have used one of Airflow’s GCS operators – GCS Airflow operators [3].
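For example, uploading the predictions file could become its own atomic task using the Google provider’s operator (a sketch; the bucket and paths are placeholders):

```python
from airflow.providers.google.cloud.transfers.local_to_gcs import (
    LocalFilesystemToGCSOperator,
)

# A dedicated, atomic task: if only the upload fails, only the upload is retried.
upload_predictions = LocalFilesystemToGCSOperator(
    task_id="upload_predictions_to_gcs",
    src="/tmp/predictions.parquet",
    dst="predictions/predictions.parquet",
    bucket="your-ml-data-bucket",
)
```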

#3. Inject the hyperparameter tuning settings from Airflow

Currently, the hyperparameter tuning settings are hardcoded in the configs/gridsearch.py file.

This is not very flexible, as the only option to modify the configuration is to push a new version to git, which isn’t very practical.

A solution would be injecting the settings from a YAML file, which can be easily added to the Airflow workflow.
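A minimal sketch of that idea, assuming an illustrative configs/gridsearch.yaml (the keys below are made up, not the course’s exact grid):

```python
import yaml

# configs/gridsearch.yaml (illustrative content):
#
# model:
#   n_estimators: [100, 200, 300]
#   learning_rate: [0.05, 0.1]
#   max_depth: [3, 5, -1]


def load_gridsearch_config(config_path: str = "configs/gridsearch.yaml") -> dict:
    """Load the hyperparameter grid from YAML instead of a hardcoded Python dict,
    so Airflow can inject a different file per run without a new git commit."""
    with open(config_path) as f:
        return yaml.safe_load(f)
```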

A nice YAML configuration tool for ML models is Hydra by facebookresearch. Just try it, you will thank me later.


Monitoring

#1. Not monitoring the system’s health

We could have easily added a system health monitor mechanism by periodically pinging the /health endpoint of the API.

We could have reflected this with a green/red panel on the UI.
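A minimal sketch of such a check (the API base URL and port are placeholders):

```python
import requests


def api_is_healthy(base_url: str = "http://localhost:8001", timeout: int = 5) -> bool:
    """Return True if the API answers on /health, so the UI can render
    a green/red system-status panel."""
    try:
        response = requests.get(f"{base_url}/health", timeout=timeout)
        return response.status_code == 200
    except requests.RequestException:
        return False
```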

#2. No alerts

Based on the MAPE metric we are constantly monitoring, we could have added a system of alerts such as:

  • warnings [threshold_B > MAPE > threshold_A]: inform the engineer that something might be wrong;
  • alarms [MAPE > threshold_B > threshold_A]: inform the engineer that something is wrong + trigger the hyperparameter tuning logic.
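A toy sketch of that decision logic (the threshold values are made up):

```python
def check_mape_alert_level(mape: float, threshold_a: float = 0.10, threshold_b: float = 0.25) -> str:
    """Map the monitored MAPE value to an alert level.
    The thresholds are illustrative and would be tuned per time series."""
    if mape > threshold_b:
        return "alarm"    # something is wrong -> notify + trigger hyperparameter tuning
    if mape > threshold_a:
        return "warning"  # something might be wrong -> notify the engineer
    return "ok"
```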

#3. Enrich the UI

We could have added the MAPE metric for every time series individually.

#4. Don’t reinvent the wheel!

We implemented a mini-monitoring tool just as an example. But in a real-world scenario, you should leverage existing tools such as EvidentlyAI or Arize.

These tools & packages already give you professional solutions. Thus you can focus on adding value.

#5. Monitor drifts

As a nice-to-have, it would also be helpful to monitor data & concept drift. But since we have almost real-time ground truth (GT), it remains a nice-to-have rather than a must.


Web App – Predictions Dashboard

#1. Enrich the UI

The UI is quite basic. For example, we could have enriched the UI by adding texts and alerts when the data is invalid (it doesn’t pass the validation suite).

#2. We request data in a naive way

Our requests to the API are quite naive. Usually, such calls are guarded by exception handling that reacts differently to 3xx, 4xx, and 5xx response codes.
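A hedged sketch of a more defensive request helper (the endpoint path and port are illustrative, not necessarily the course’s exact API routes):

```python
import requests


def get_predictions(base_url: str, area: int, consumer_type: int) -> dict:
    """Call the predictions endpoint and fail loudly, with context, instead of
    silently rendering an empty chart in the UI."""
    url = f"{base_url}/predictions/{area}/{consumer_type}"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raises on 4xx/5xx responses
    except requests.HTTPError as error:
        raise RuntimeError(f"API returned {error.response.status_code} for {url}") from error
    except requests.RequestException as error:
        raise RuntimeError(f"Could not reach the API at {url}") from error
    return response.json()
```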

#3. The settings are hardcoded

We could have injected the settings using a .env file, making the program more configurable, similar to the Web App FastAPI code.
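A minimal sketch using python-dotenv (the variable names and defaults are assumptions):

```python
import os

from dotenv import load_dotenv

# Read the .env file sitting next to the app instead of hardcoding the values.
load_dotenv()

API_URL = os.getenv("API_URL", "http://localhost:8001/api/v1")
TITLE = os.getenv("TITLE", "Energy Consumption Forecasts")
```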


Deploy & CI/CD Pipeline

#1. Partial CI/CD implementation

To fully complete the CI/CD pipeline, we could have built the Web App Docker images, pushed them to a Docker registry, and pulled them from there when deploying them to the GCP VM.

Same story when building the Python packages with Poetry.

This is how it is done by the book.

Also, if we had any tests, we should have run them before deploying the code.

Another idea is to run tools such as flake8 to verify that the code follows PEP 8 conventions.

#2. Host PyPi on a different VM

Also, hosting the PyPi server on a different VM, or at least completely independently of the Airflow components, would be recommended.

In doing so, the system would have been more modular and flexible.

#3. Host the .env files on a GCS bucket

Instead of manually completing and copying the .env files, we could have stored them on a GCS bucket and automatically downloaded them inside the CI/CD pipeline.
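A sketch of how the download step could look with the google-cloud-storage client (the bucket and blob names are placeholders):

```python
from google.cloud import storage


def download_env_file(bucket_name: str, blob_name: str, destination: str = ".env") -> None:
    """Pull the .env file from a private GCS bucket inside the CI/CD pipeline,
    instead of copying it to the VM by hand."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).download_to_filename(destination)
```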

#4. Automate the infrastructure

You have seen how tedious it is to set up all the GCP resources you need manually… and this is a small infrastructure. Imagine what it is like when you have 100 or 1,000 components inside your infrastructure.

That is why it is suggested to automate the creation of your infrastructure with IaC (Infrastructure as Code) tools such as Terraform.


Conclusion

If you got this far, I want to thank you and tell you how deeply I appreciate that you followed my Full Stack 7-Steps MLOps Framework course 🙏

In this bonus lesson, you saw that no system is perfect and you always have to make certain trade-offs due to:

  • time constraints,
  • resource constraints,
  • bad planning.

Now that you see that everyone is still learning and doesn’t know everything, you have no excuse but to go and build your next awesome project 🔥

Let’s connect on LinkedIn, and let me know if you have any questions or just share the awesome projects you built after this course.

Access the GitHub repository here.


💡 My goal is to help Machine Learning engineers level up in designing and productionizing ML systems. Follow me on LinkedIn or subscribe to my weekly newsletter for more insights!

🔥 If you enjoy reading articles like this and wish to support my writing, consider becoming a Medium member. Using my referral link, you can support me without extra cost while enjoying limitless access to Medium’s rich collection of stories.

Join Medium with my referral link – Paul Iusztin

Thank you ✌🏼 !


References

[1] Energy Consumption per DE35 Industry Code from Denmark API, Denmark Energy Data Service

[2] Task Docker Decorator, Airflow Docs

[3] GCS Airflow Operators, Airflow Docs

