
How I Created a Kaggle-Like Platform for My Students Using Streamlit and How You Can Do It as Well

Gamify machine learning student projects with Streamlit and Google Sheets

Rankings and points can be a great source of motivation (Image created by DALLE-3)

I love Kaggle and believe its contribution to disseminating data science and machine learning has been invaluable. As a fan of video games and gamification, I appreciate how Kaggle’s ranking and points system encourages participants to improve their models through healthy, constructive competition. Kaggle has become so popular that many instructors have made it one of their preferred tools for teaching machine learning.

However, as a business school instructor who teaches some machine learning courses, I’ve found using Kaggle to evaluate my students’ final machine learning projects a little cumbersome. For starters, tracking student submissions is tedious and manual, and my students (most of them beginners in data science and programming in general) often find it discouraging to see the result of their efforts ranked at the bottom of the Kaggle leaderboard. That said, it is important to acknowledge that Kaggle wasn’t designed as a teaching tool, which explains its limitations in this context.

I’ve always dreamed of creating a mini version of Kaggle tailored for my students. This platform would allow me to replicate Kaggle’s gamification success and serve as a template for various subjects, including mathematical programming and combinatorial optimization. Initially, I felt discouraged by the effort required to build such a platform using typical web development frameworks available in Python, like Django or Flask.

Thankfully, I recently discovered Streamlit and its ability to interact with Google Sheets. In this article, I’ll show you how to use Python, Streamlit, and Google Sheets to build a Kaggle-like web application to gamify your lessons. This app will let students log in with individual accounts, submit their solutions via CSV file uploads, score their solutions based on various machine learning metrics, and dynamically rank the submissions. Best of all, I’ll explain how you can deploy this app for free.

Are you ready for some hands-on learning? Let’s have a glimpse at our final app results…

Figure 1. App deployed and in use (Figure created by the author)

Please note that this article might be lengthy. I aim to provide as much detail as possible, as I believe it could be beneficial for many teachers and professors who may not be as fluent in Python as a typical data science expert. If you are already a Python expert, you might prefer to skip the article and jump directly to the project’s GitHub repository below:

GitHub – ceche1212/project_medium_kaggle_app: Repository for code for a kaggle like app that can be…

In the original project that I implemented with my students, the app featured three different machine learning sections: one for a regression problem, one for a binary classification problem, and one for a time series forecasting problem. For simplicity, this tutorial will focus on just one: a binary classification problem using the famous Pima diabetes dataset from the UC Irvine Machine Learning Repository. This dataset can also be downloaded from Kaggle.


Streamlit and Google Sheets

Since version 1.28 in 2023, Streamlit allows users to connect to Google Sheets using its st.connection method. This method enables users to perform CRUD (create, read, update, and delete) operations using Google Sheets as a database. I learned about these features from a YouTube video made by Sven | Coding Is Fun. If you want to watch it, I’m leaving the link below.

I know what you’re thinking, but before you get scared, hear me out. I understand that Excel (Google Sheets) is not a database, and I agree with you. I also have nightmares about dealing with companies that use it as one. However, for the app we want to create, it’s more than enough. It allows us to do everything we need, it’s available online, it’s private (since only we and the app can access it – unless someone is skilled enough to hack my Google account), and best of all, it’s free. I do recognize this is an area for improvement, so I’m exploring the possibility of replacing Google Sheets with a connection to Supabase.

Figure 2. Excel is not a database, unless... (Image created by the author)

Before jumping straight into the implementation, it’s worth examining the different modules the app must have, as well as the strategy we will follow for the implementation.

App design

The app we are going to create requires several processes. Firstly, we want to ensure that only our students have access to it, so we need a login system. Once users are logged in, they will need a module to submit their results. This requires a process to upload .csv files, verify that the files comply with the expected number of rows and requested columns, and then calculate an evaluation metric comparing the model predictions against the real test data. After this, the student should be able to see the classroom ranking and how their result compares to their peers. Finally, students should be able to see a log of all their submissions and their teammates’ submissions, as this is a group project. Figure 3 showcases an overall view of the User Flow Diagram for our app.

Figure 3. User flow diagram of the app (Image created by the author)

App implementation

For this app, we will use Visual Studio Code. I strongly recommend creating a new project folder on your machine and opening Visual Studio Code from that folder. In my case, I decided to name the folder project_app_medium.

Setting up the project environment

I strongly recommend creating a virtual environment for each Streamlit app you make to avoid dependency conflicts with your other Python projects. After creating and activating your virtual environment, we need to install the following libraries.
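Creating and activating the environment is quick with Python’s built-in venv module. A minimal sketch for macOS/Linux shells (the folder name .venv is an arbitrary choice of mine):

```shell
# create an isolated environment inside the project folder
python3 -m venv .venv

# activate it (on Windows, run .venv\Scripts\activate instead)
. .venv/bin/activate
```

With the environment active, anything installed by pip stays local to the project.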

pandas == 1.5.3
numpy == 1.26.4
matplotlib
streamlit
streamlit_option_menu
streamlit-extras
st-gsheets-connection
scikit-learn

To install the libraries, create a new text file and name it requirements.txt. Inside this blank text file, copy the libraries listed above and save it; we will need this file for the deployment of our app. Then, in the terminal, type the following command.

pip install -r requirements.txt

This command will install all the libraries listed in the requirements.txt file. Regarding the libraries we are using, we’ll start with the classic "Mirepoix" of all data science projects: numpy, pandas, and matplotlib. We also need streamlit, as this is the base library that contains our framework. To extend the functionality of Streamlit, we will import some community-developed extensions such as streamlit_option_menu, which allows us to create simple sidebar menus, and streamlit-extras, which contains many customizable features. Additionally, we will use st-gsheets-connection to help us connect with Google Sheets. Please note that besides the libraries above, we are also going to use hashlib for data security and protection. We will talk more about this when defining the details of the database.

Throughout this tutorial, the following folder structure will be used, containing the following elements:

  • .streamlit: This folder holds the general Streamlit settings. Inside it we will store the secrets.toml file, which contains the Google Sheets API credentials needed to establish a connection.
  • app.py: The main Streamlit script.
  • .gitignore: As its name suggests, lists the files Git should ignore when committing the project.
  • logo.png (optional): An image with your organization’s logo, displayed at the top of the app’s sidebar menu. This is completely optional; in my case I show the logo of my company, SAVILA GAMES.
  • requirements.txt: Python dependencies to run the application.
  • README.md: Project description.
project_app_medium/
│
├── .streamlit/                # General streamlit configuration
│   └── secrets.toml           # credentials for google sheets connection
├── app.py                     # App code
├── .gitignore
├── logo.png               
├── requirements.txt           # Python dependencies
└── README.md 

Setting up the Google Sheets database

Now that the environment is set up, we need to create the structure of the Google Sheets database. Open your Google Sheets app and create a new file; I named mine project_database. Then, proceed to create four tabs in the file. First, the "users" tab will contain all the user login credentials as well as the user group configuration. We will use the information from this tab to create the login module of the app. The tab should have the following column structure:

| email                  | name    | last name | password  | group    |
|------------------------|---------|-----------|-----------|----------|
| [email protected]   | John    | Doe       | Pass1234  | G1       |
| [email protected] | Jane    | Smith     | SecurePwd | G2       |
| [email protected] | Alex    | Jones     | MyPass789 | G1       |
| [email protected] | Emma    | Brown     | Emma12345 | G3       |

The next tab is the "log" tab. This tab will store the historical information of the submissions made by the users. It will also be used for the logic behind the ranking and the history submissions module of the app. The tab should have the following column structure:

| user                  | group     | time             | score |
|-----------------------|-----------|------------------|-------|
| john                  | G1        | 2024-06-17 10:00 | 0.85  |
| jane                  | G2        | 2024-06-17 11:00 | 0.72  |
| alex                  | G1        | 2024-06-17 12:00 | 0.90  |
| emma                  | G3        | 2024-06-17 13:00 | 0.65  |

The next tab will be the "test_data" tab. This tab will contain the real test labels (y) used to evaluate the quality of the outputs submitted by the students. For this tutorial, we will split the Pima dataset and select the last 78 rows as the test dataset. The tab will have only one column containing the binary outcome data, with the following column structure:

| y          |
|------------|
| 0          |
| 1          |
| 1          |
| 0          |
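The tail-split itself is a one-liner in pandas. Here is a sketch; the read_csv line is a hypothetical path (wherever you saved the Pima CSV) and is replaced with a stand-in frame of the same size so the snippet runs on its own:

```python
import pandas as pd

# hypothetical: df = pd.read_csv("diabetes.csv")  # the downloaded Pima dataset
# stand-in frame with the Pima dataset's 768 rows and its binary outcome column
df = pd.DataFrame({"Outcome": [0, 1] * 384})

test_df = df.tail(78)        # last 78 rows become the hidden test set
train_df = df.iloc[:-78]     # the rest is given to the students for training

# single-column "y" series, ready to paste into the "test_data" tab
y_test = test_df["Outcome"].rename("y").reset_index(drop=True)
print(len(train_df), len(y_test))
```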

The final tab we will create is the ‘configuration’ tab. This tab will contain customizable parameters for the project, such as the deadline and the number of tries allowed per team per day (for reference, Kaggle allows five tries per team per day). This tab will enable us to dynamically change the project characteristics so that it can be easily adapted for use in different semesters. The "configuration" tab should have only one row and the following column structure:

| deadline              | max_per_day     |
|-----------------------|-----------------|
| 2024-07-01 23:59      | 5               |

You can download and view an example of the Google Sheet used in this tutorial project by clicking on the link below:

example google sheet project medium

Data Privacy and Security

This is a small project that should be under our control and available only to our students. However, we must take precautions regarding data security, especially concerning the private information of our students. Imagine if, for some reason, the database gets leaked and falls into the hands of scammers or malicious entities. Having access to the names, emails, and passwords of our students could put them at risk. Therefore, we must ensure that, even if this happens, the data remains meaningless to these malevolent agents.

You might wonder if we can implement such a system at minimal cost. This is where hashing and the hashlib library come to the rescue.

Hashing is the process of converting input data into a fixed-size string of characters, typically a hash code, using a mathematical algorithm. It ensures data integrity, facilitates quick data retrieval, and securely stores sensitive information. What hashing algorithms are available? Fortunately, Python comes with hashlib, which includes several hashing algorithms:

  • MD5 (md5)
  • SHA-1 (sha1)
  • SHA-224 (sha224)
  • SHA-256 (sha256)
  • SHA-384 (sha384)
  • SHA-512 (sha512)
  • SHA-3 family (sha3_224, sha3_256, sha3_384, sha3_512)
  • BLAKE2 family (blake2b, blake2s)
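All of these are exposed through hashlib constructors; a quick way to check what your Python build guarantees:

```python
import hashlib

# algorithms guaranteed to be present on every Python build
print(sorted(hashlib.algorithms_guaranteed))

# each is available as a named constructor (or via hashlib.new("sha256"))
digest = hashlib.sha256(b"hello").hexdigest()
print(digest)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```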

In our tutorial, we will use one of the hashing algorithms available in hashlib, specifically SHA-256, to transform the emails and passwords of our students into hashed codes. These hashed codes will be stored in the database. This way, even if the database is leaked, their data remains protected. The beauty of hashing lies in the fact that reversing a hash back to the original information is computationally impractical, even by brute force. For instance, when the emails from the previous section are hashed using SHA-256, they become secure and unreadable.

# original emails

['[email protected]',
 '[email protected]',
 '[email protected]',
 '[email protected]']
# Hashed emails

['836f82db99121b3481011f16b49dfa5fbc714a0d1b1b9f784a1ebbbf5b39577f',
 'f2d1f1c853fd1f4be1eb5060eaae93066c877d069473795e31db5e70c4880859',
 '134318bc6349ad35d7e6b95123898eecdd437ad9b0c49cc4bdd66a811afc6909',
 'd41d9b2f5671358bc6faf79b7435b4a9805a72d012f06d4804815328f39aed1e']

Pretty difficult to decipher, don’t you think? Below you can find a function that, given a dataframe and a column, returns the hashed items of the specified column. This way you can hash your database information and then store these values in the online Google Sheets database.

import hashlib

def hashit(df, column):
    # return the SHA-256 hex digest of every value in df[column]
    return [hashlib.sha256(data.encode()).hexdigest() for data in df[column].tolist()]

You might be thinking, "But Luis, if they have the username and password, they can log in and impersonate one of the students." However, that situation is already covered. The students will log into the app using the emails and passwords provided by us, but they will input their real email and password on the login screen. The app will then take their login credentials, hash them using SHA-256, and verify the hashed output against the database.
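That check can be sketched as follows (the function names and the sample credentials here are mine, purely for illustration, not part of the app’s code):

```python
import hashlib

def sha256_hex(text: str) -> str:
    # hash the raw input exactly the way the database entries were hashed
    return hashlib.sha256(text.encode()).hexdigest()

def credentials_match(email: str, password: str,
                      db_email_hash: str, db_password_hash: str) -> bool:
    # hash the raw login input and compare against what the database stores
    return (sha256_hex(email) == db_email_hash
            and sha256_hex(password) == db_password_hash)

# simulate one stored user and two login attempts
stored_email = sha256_hex("student@school.edu")
stored_pwd = sha256_hex("Pass1234")
print(credentials_match("student@school.edu", "Pass1234", stored_email, stored_pwd))  # True
print(credentials_match("student@school.edu", "wrong", stored_email, stored_pwd))     # False
```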

So, if the database gets leaked and someone tries to use the information to log in, it won’t work because their input will be hashed again, and it won’t match the stored hash in the database. Let’s look at the code and outputs below as an example.

password = "password"

hash_object = hashlib.sha256()
hash_object.update(password.encode())
hash_password = hash_object.hexdigest()

print(hash_password)

print output:

"5377a16433598554e4a73a61195dbddea9d9956a22df04c3127c698b0dcdee48"

Now, if we rehash the already hashed password with a fresh hash object (reusing the same object would accumulate both inputs), as in the code below.

hash_object = hashlib.sha256()
hash_object.update(hash_password.encode())
double_hash_password = hash_object.hexdigest()
print(double_hash_password)

We obtain the following:

"dfd4bb46c954f3802c7c2385b1a6b625b3cf0b4ce6adf59d3eec711c293994bb"

You can easily verify that these two hashes do not match: hashing the already hashed password produces something completely new.

Establishing the Google Sheets connection

All the instructions required to establish the connection can be found in the GitHub repository of the st-gsheets-connection package. Let’s follow the instructions together:

  • Go to the Google Developers Console and create a new project. Right next to the Google Cloud icon, you will find a drop-down menu. Click on it and then click "Create New Project." Name your project as you wish; in my case, I am calling it project_app_medium.
Figure 4. Google developer console creating project (Image created by the author)
  • Now, with the project selected, we need to activate two different APIs: Google Drive and Google Sheets. In the search bar at the top of the page, type "Google Drive," select the API, and then click on "Enable." Repeat the same steps for Google Sheets.
Figure 5. Google developer console enabling APIs (Image created by the author)
  • With the project APIs enabled, we now need to create a technical user with access to them. Click on "Credentials," then click on "Create Credentials," and select the "Service Account" option. Assign a name to the technical user; in my case, I named it "medium-project-google-sheets." Assign the "Editor" role to the technical user and finally click on "Done."
Figure 6. Google developer console creating technical user (Image created by the author)
  • With the technical user created, we now need to generate the credentials for this user. Click on the user you just created, then click on "Keys," then click on "Add Key," and select "Create New Key." Choose the JSON option and then click on "Done." This will automatically download a JSON file with all the credentials needed to use Google Sheets from our app.
Figure 7. Google developer console creating technical user JSON credentials (Image created by the author)
  • The last step is to store the credentials we just downloaded into a secrets.toml file. If you haven’t done so yet, create a new folder named .streamlit inside your project folder. Inside this folder, create a new file named secrets.toml. Open the file with the text editor of your choice (such as VS Code) and paste the information below into it.
# .streamlit/secrets.toml

[connections.gsheets]
spreadsheet = "<spreadsheet-name-or-url>"
worksheet = "<worksheet-gid-or-folder-id>"  # worksheet GID is used with a Public Spreadsheet URL; when using a service_account it is picked up as folder_id
type = ""  # leave empty when using Public Spreadsheet URL, when using service_account -> type = "service_account"
project_id = ""
private_key_id = ""
private_key = ""
client_email = ""
client_id = ""
auth_uri = ""
token_uri = ""
auth_provider_x509_cert_url = ""
client_x509_cert_url = ""
  • Substitute each element in the secrets.toml file with the data from the JSON credentials file downloaded from Google. For the "spreadsheet" field, copy the URL of the Google Sheets database we created for the project. Next copy the "client_email" data from the JSON file, and then go to the Google Sheets database. In the spreadsheet, click on "Share," then paste the "client_email" into the text input field, make sure the "Editor" permission is selected, and click "Send."

With all these preparations complete, we are now ready to code the app.

Libraries, session state variables and app config

We are now going to proceed to import the required libraries for the app and create the session state variables that the app will use. Most of the session state variables will be associated with the login module. In Streamlit, session state variables store information across different interactions within a session. They help maintain state, such as user inputs or selections, between reruns of the app. Initially, these variables will be set to empty strings and will be updated as the app runs. For our specific app, we will create state variables for the username (using the student email as the username), the password, and the group to which the user belongs. We will also set the page title and the page icon (favicon) using the st.set_page_config() method from Streamlit.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
from pathlib import Path
import streamlit as st
from streamlit_option_menu import option_menu
from streamlit_extras.add_vertical_space import add_vertical_space
from streamlit_extras.stylable_container import stylable_container 
from streamlit_gsheets import GSheetsConnection
from sklearn import metrics
import hashlib

if 'user_name' not in st.session_state:
    st.session_state['user_name'] = ''

if 'student_name' not in st.session_state:
    st.session_state['student_name'] = ''

if 'password' not in st.session_state:
    st.session_state['password'] = ''

if 'group' not in st.session_state:
    st.session_state['group'] = ''

st.set_page_config(
        page_title='Medium Project', 
        page_icon='📈 ' 
    )

To test that everything is working correctly, you can run the application from the root of the directory by executing the following command in the terminal. This will run the app on your localhost.

streamlit run app.py

You should see an empty page named "Medium Project" with the 📈 favicon we selected, as shown in Figure 8.

Figure 8. Empty Streamlit page (image created by the author)

Login module

Now that the session state variables are created, we can proceed to code the first module of our app: the login module. This module will contain the logic illustrated in Figure 9.

Figure 9. Login module process flow diagram (image created by the author)

First, we will establish a connection with our Google Sheets database. Next, we will create the sidebar navigation menu using the st.sidebar method in combination with the option_menu method, allowing us to easily create different pages for our app. Once this is set up, we will establish the logic for the login module.

If a user is not logged in, they will be prevented from accessing the next modules. In the login module, we will ask for the username (email) and password. These credentials will be verified against the database. If they match, the user credentials will be stored in the session state variables, and the user will receive a message confirming a successful login. If not, the user will receive a warning message stating that the login failed. We will also store the project configuration data from the database (project deadline and maximum number of submissions per day). The logic above will be implemented by following the code below.

login = None
log_df_n_cols = 4
login_df_ncols = 5
config_df_ncols = 2

conn = st.connection("gsheets", type=GSheetsConnection)
users_existing_data =  conn.read(worksheet="users", usecols=list(range(login_df_ncols)), ttl=1)
users_existing_data = users_existing_data.dropna(how="all")
users_existing_data.index = users_existing_data['email']

gs_user_db = users_existing_data.T.to_dict()

configuration_parameters = conn.read(worksheet="configuration", usecols=list(range(config_df_ncols)), ttl=1)

deadline = configuration_parameters['deadline']
deadline = pd.to_datetime(deadline)

max_per_day = int(configuration_parameters['max_per_day'].iloc[0])

today = pd.Timestamp.today()

difference = deadline - today

days = difference.dt.days.iloc[0]
seconds = difference.dt.seconds.iloc[0]
hours = seconds // 3600
minutes = (seconds % 3600) // 60

# emojis are obtained from bootstrap icons https://icons.getbootstrap.com/
with st.sidebar:
    st.image("./logo.png")
    selected = option_menu(
        menu_title='MEDIUM APP',
        options= ['Login','Rankings','My group Submissions','Submit Results'],
        icons=['bi bi-person-fill-lock', '123','bi bi-database','bi bi-graph-up-arrow'], menu_icon="cast"
    )

    add_vertical_space(1)

    if selected == 'Login':

        user_name = st.text_input('User email', placeholder = '[email protected]')
        user_password =  st.text_input('Password', placeholder = '12345678',type="password")
        login = st.button("Login", type="primary")
    add_vertical_space(1)

if selected == 'Login':

   st.header('Welcome to your Final group project Challenge')
   st.divider()
   st.subheader(f" Time remaining: {days} days, {hours} hours and {minutes} minutes")
   st.divider()
   st.write('Please proceed to login using the left side panel')
   st.write('⚠️ Please note that access to subsequent sections is restricted to logged-in users.')
   st.divider()
   st.subheader('Rules:')
   with st.expander('Rules',expanded=True):
      st.markdown("- This app lets you submit your project calculations.")
      st.markdown("- Everyone on your team can submit their solutions")
      st.markdown(f"- The site lets each team make up to {max_per_day} submissions per day.")
      st.markdown("- For each one of the parts, whichever team comes up with the best result for a given part gets the top grade for that part.")
      st.markdown("- In case of a tie, the team that submitted the solution first will get the higher grade.")

   st.subheader('How to use the app:')
   with st.expander('How to use the app',expanded=True):
      st.markdown("- Check out the **Ranking** 🥇 tab to see where your team stands for each one of the parts.")
      st.markdown("- Click on **My Group Submissions** 🗃️ to see the history of all the solutions that your team has submitted.")
      st.markdown("- Hit **Submit Results** 📊 to drop in a new solution.")
      st.markdown("- Every time you submit, the app checks that everything is in order and gives you feedback in case there was a bug.")
      st.markdown("- Keep trying new things and submitting - there's no limit.")
   add_vertical_space(1)

if login:

    hash_object_user_name = hashlib.sha256() #hashing
    hash_object_user_name.update(user_name.encode())
    hashed_username = hash_object_user_name.hexdigest()

    if hashed_username not in gs_user_db.keys():
        with st.sidebar:
            st.error('Username not registered')
    else:

        hash_object_password = hashlib.sha256() #hashing
        real_password = gs_user_db[hashed_username]['password']

        hash_object_password.update(user_password.encode())
        hashed_user_password = hash_object_password.hexdigest()

        if hashed_user_password != real_password:
            with st.sidebar:
                st.error('Sorry wrong password')
        else:
            user_first_name = gs_user_db[hashed_username]['name']
            group = gs_user_db[hashed_username]['group']
            st.session_state['user_name'] = user_name
            st.session_state['student_name'] = user_first_name
            st.session_state['password'] = real_password
            st.session_state['group'] =  group
            with st.sidebar:
                st.success(f'{user_first_name} from group ({group}) successfully logged in', icon="✅")

with st.sidebar:
    if st.session_state['user_name'] != '':
        st.write(f"User: {st.session_state['user_name']} ")
        st.write(f"Group: {st.session_state['group']} ")
        logout = st.button('Logout')
        if logout:
            st.session_state['user_name'] = ''
            st.session_state['password'] = ''
            st.session_state['group'] = ''
            st.session_state['student_name'] = ''
            st.rerun()
    else:
        st.write("User: Not logged in")

Submit results module

Now, with the login module in place, we can proceed to create the submission module. In this module, students will be able to upload a CSV file with the prediction outputs of their models. This module will contain the logic illustrated in Figure 10.

Figure 10. Submit results module process flow diagram (image created by the author)

We will only allow users to access this module if the project deadline has not passed and if their team has not exceeded the maximum number of submissions per day (to avoid spamming the app, and the risk of overfitting to the test dataset). If either of these conditions fails, the student will receive a warning stating the issue. If everything is correct, the student can submit their results by uploading a CSV file. If the shape of the file matches the requirements (number of rows equal to the test dataset size, and the file contains a column called "predictions"), the submission will be accepted. If the shape does not match, the student will receive a warning message with details about the issue.

If everything matches, the module will calculate the model evaluation metric. In this specific case, we are using the accuracy score, but you could use the F1 score if preferred – the code is easy to modify. After this is done, the module will store the student’s submission in our Google Sheets database, in the "log" tab. The logic above will be implemented by following the code below.

if selected == 'Submit Results':

  st.markdown("""
      <style>
      div[data-testid="stMetric"] {
          background-color: #EEEEEE;
          border: 2px solid #CCCCCC;
          padding: 5% 5% 5% 10%;
          border-radius: 5px;
          overflow-wrap: break-word;
      }
      </style>
      """
      , unsafe_allow_html=True)

  st.header('Submit Predictions')
  st.subheader("Machine Learning Classification")
  st.divider()
  st.subheader(f" Time remaining: {days} days, {hours} hours and {minutes} minutes")
  st.divider()

  if st.session_state['user_name'] == '':
      st.warning('Please log in to be able to submit your project solution')
  else:

      group_log_df = conn.read(worksheet="log", usecols=list(range(log_df_n_cols)), ttl=1).dropna(how="all")
      group_log_df = group_log_df[group_log_df['group'] == st.session_state['group']]
      group_log_df['time'] = pd.to_datetime(group_log_df['time'])

      test_data = conn.read(worksheet="test_data", usecols=list(range(1)), ttl=30).dropna(how="all")
      test_data_y = test_data['y']

      n_test = len(test_data)

      current_date = pd.Timestamp.today()

      submissions_count = group_log_df[(group_log_df['time'].dt.date == current_date.date())].shape[0]

      time_diff = deadline - current_date
      time_diff = time_diff.dt.total_seconds().iloc[0]

      if time_diff <= 0:
          st.warning('Sorry, you can no longer submit; the project deadline has passed')
      else:
          if submissions_count >= max_per_day:
              st.warning(f'Sorry, your team has already submitted {submissions_count} times today, which exceeds the maximum number of submissions per day')
          else:

              user_file = st.file_uploader("Upload your predictions file",type=['csv'])
              st.caption(f"Your file must have {n_test} rows and at least one column named 'predictions' with your model predictions")

              if user_file is not None:

                  submit_pred = st.button('submit',type="primary",key="submit_pred")

                  if submit_pred:

                      pred_df = pd.read_csv(user_file) 

                      if 'predictions' not in pred_df.columns.to_list():
                          st.error('Sorry, there is no "predictions" column in your file', icon="🚨")
                      elif len(pred_df) != n_test:
                          st.error(f'Sorry, the number of rows in your file ({len(pred_df)}) does not match the expected length of ({n_test})', icon="🚨")
                      else:
                          with st.spinner('Uploading solution to database'):
                              user_predictions = pred_df['predictions']

                              timestamp = datetime.datetime.now()
                              timestamp = timestamp.strftime("%d/%m/%Y, %H:%M:%S")
                              st.write(f'Submitted on: {timestamp}')                

                              ACC = metrics.accuracy_score(test_data_y,user_predictions)

                              F1 = metrics.f1_score(test_data_y,user_predictions)

                              # sklearn orders labels ascending, so the negative class comes first
                              cm = pd.DataFrame(metrics.confusion_matrix(test_data_y,user_predictions),
                                              columns = ["F Pred","T Pred"],index=["F Real","T Real"])

                              columns_part_2 = st.columns(3)

                              with columns_part_2[0]:
                                  st.metric("ACCURACY",f"{100*ACC:.1f} %")
                              with columns_part_2[1]:
                                  st.metric("F1-Score",f'{F1:.3f}')

                              with columns_part_2[2]:
                                  st.dataframe(cm,use_container_width=True)

                              solution_dict = dict()
                              solution_dict['user'] = st.session_state['student_name']
                              solution_dict['group'] = st.session_state['group']
                              solution_dict['time'] = timestamp
                              solution_dict['score'] = ACC

                              logs_df_2 = conn.read(worksheet="log", usecols=list(range(log_df_n_cols)), ttl=1).dropna(how="all")
                              solution_2 = pd.DataFrame([solution_dict])
                              updated_log_2 = pd.concat([logs_df_2,solution_2],ignore_index=True)
                              conn.update(worksheet="log",data = updated_log_2)
                              st.success(f'Your solution was uploaded on: {timestamp}',icon="✅")
                              st.balloons()
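From the student side, all the app needs is a CSV file with a 'predictions' column and one row per test observation. As a hedged sketch (the toy data, model, and file name are made up for illustration), a team might generate their submission like this:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data for illustration only; in the real project students
# train on the course dataset.
X_train = pd.DataFrame({"x1": [0, 1, 2, 3], "x2": [1, 0, 1, 0]})
y_train = pd.Series([0, 0, 1, 1])
X_test = pd.DataFrame({"x1": [1, 3], "x2": [1, 0]})

model = LogisticRegression().fit(X_train, y_train)

# The app expects exactly n_test rows and a column named 'predictions'
submission = pd.DataFrame({"predictions": model.predict(X_test)})
submission.to_csv("my_submission.csv", index=False)
```

Any model works, as long as the resulting file respects the row count and column name that the upload module validates.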

Dynamic ranking module

Our app already allows logged-in users to submit their solutions. Now we need to add the gamification 🎮 aspect of the app: a dynamic ranking that shows students how their solutions compare to those submitted by the other teams. The logic of this module is straightforward and does not require a diagram. We gather all the data from the "log" tab of our Google Sheets database, find the best score for each team, and present these scores in a table, with the best-scoring team in the top position. In case of a tie, where two or more teams share the same score, the app ranks the team that submitted its solution first higher. The code below implements this logic.

if selected == "Rankings":
    st.header('Rankings')

    if st.session_state['user_name'] == '':
        st.warning('Please log in to be able to see the rankings')
    else:
        st.write('The table below shows the rankings for the project')

        rank_df = conn.read(worksheet="log", usecols=list(range(log_df_n_cols)), ttl=1).dropna(how="all")
        GROUPS = list(rank_df['group'].unique())
        default_time = pd.to_datetime('01/01/1901, 00:00:00')

        st.header("Part 4: Machine Learning Classification")
        st.divider()

        ranking_list_2 = []
        for gr in GROUPS:

            mini_df_2 = rank_df[rank_df['group'] == gr]
            if len(mini_df_2) == 0:
                row = {'group':gr,'Accuracy':0,'time':default_time}
                ranking_list_2.append(row)
                continue
            else:
                best_idx_2 = np.argmax(mini_df_2['score'])
                best_value_2 = mini_df_2.iat[best_idx_2,-1]
                best_time_2 = pd.to_datetime(mini_df_2.iat[best_idx_2,2], format="%d/%m/%Y, %H:%M:%S")
                row = {'group':gr,'Accuracy':best_value_2,'time':best_time_2}
                ranking_list_2.append(row)
        ranking_df_2 = pd.DataFrame(ranking_list_2).sort_values(by = ['Accuracy','time'],ascending=[False, True])
        ranking_df_2 = ranking_df_2.reset_index(drop=True)
        # assign medals only to the teams that exist, so the app
        # does not crash when fewer than three teams have submitted
        medals = ["   🥇", "   🥈", "   🥉"]
        for i in range(min(len(medals), len(ranking_df_2))):
            ranking_df_2.iat[i,0] = ranking_df_2.iat[i,0] + medals[i]
        st.dataframe(ranking_df_2,use_container_width=True,hide_index=True)
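The per-group loop above can also be expressed with a pandas groupby, which some readers may find more idiomatic. This is only an equivalent sketch of the best-score-with-tie-break logic, using made-up log rows:

```python
import pandas as pd

# Made-up rows mirroring the 'log' worksheet columns
rank_df = pd.DataFrame({
    "group": ["A", "B", "A", "C"],
    "time": pd.to_datetime(
        ["01/05/2024, 10:00:00", "01/05/2024, 11:00:00",
         "02/05/2024, 09:30:00", "02/05/2024, 12:00:00"],
        format="%d/%m/%Y, %H:%M:%S"),
    "score": [0.80, 0.85, 0.85, 0.70],
})

# Sort so each group's best (and, on ties, earliest) row comes first,
# keep one row per group, then sort the winners into ranking order
best = (
    rank_df.sort_values(["score", "time"], ascending=[False, True])
           .groupby("group", as_index=False)
           .first()
           .sort_values(["score", "time"], ascending=[False, True])
           .reset_index(drop=True)
)
```

Both versions produce the same ranking; the loop is arguably easier to explain to beginners, which is why the app uses it.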

Submissions log module

The last module we will implement is the Submissions Log module. It lets each student access a historical log of all submissions made by themselves and their teammates throughout the project. The logic of this module is straightforward and does not require a diagram. We gather all the data from the "log" tab of our Google Sheets database, filter it to the current user's group, and present the information in a table. The code below implements this logic.

if selected == 'My group Submissions':
    st.header('My Group Submissions')

    if st.session_state['user_name'] == '':
        st.warning('Please log in to be able to see your submission history')
    else:
        st.write(f'The table below shows you the submission history of your group: **{st.session_state["group"]}**')
        group_log_df = conn.read(worksheet="log", usecols=list(range(log_df_n_cols)), ttl=1).dropna(how="all")
        group_log_df = group_log_df[group_log_df['group'] == st.session_state['group']]
        group_log_df = group_log_df[['user','time','score']]

        st.subheader('Submissions History:')
        st.dataframe(group_log_df,use_container_width=True,hide_index=True)    
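One optional refinement: because timestamps are stored in the sheet as "%d/%m/%Y, %H:%M:%S" strings, you may want to parse them and show the most recent submissions first. A sketch with made-up rows:

```python
import pandas as pd

# Made-up rows mirroring the filtered group log
group_log_df = pd.DataFrame({
    "user": ["ana", "luis"],
    "time": ["02/05/2024, 09:30:00", "01/05/2024, 11:00:00"],
    "score": [0.85, 0.80],
})

# Parse the string timestamps, then sort newest first
group_log_df["time"] = pd.to_datetime(group_log_df["time"],
                                      format="%d/%m/%Y, %H:%M:%S")
group_log_df = (group_log_df.sort_values("time", ascending=False)
                            .reset_index(drop=True))
```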

With this last module coded, the app is complete. The entire code can be found at the link below.

project_medium_kaggle_app/app.py at main · ceche1212/project_medium_kaggle_app

App deployment

So far, our app runs smoothly on local execution. However, the main point of building it is to share it with your students and use it for their final project. This is where deployment becomes crucial. For this project, I chose to deploy the app using the Streamlit Community Cloud. It is free and easy to use. The service uses GitHub to deploy the app, with the only drawback being that the GitHub repository must be public. Therefore, it is essential not to upload any sensitive information, such as the technical user credentials we downloaded from Google. This information will be managed directly within the Streamlit Community Cloud service, so don’t worry about it. If you want to explore other deployment options, I encourage you to check out the excellent article below written by Damian Boh.

3 Easy Ways to Deploy your Streamlit Web App Online

To deploy our app, follow the steps below:

  1. Create a new public GitHub repository. The README.md file can be created directly when you create the repo.
  2. Upload or commit the following files, but please do not, and I repeat, DO NOT upload the secrets.toml file:
  • app.py
  • requirements.txt
  • logo.png (optional, in case you want to customize your app with your university logo or your company logo)

Your repo should look something like Figure 11.

Figure 11. Example snapshot of the repository for deployment. Please note that there is no secrets.toml file in the repo. Please do not share your private credentials with the world (image created by the author)
  3. Go to the Streamlit Community Cloud site https://streamlit.io/cloud and sign in. If you don’t have an account, create one.
  4. Once signed in, click on the "Create App" button located in the top right corner of the site, and then select the option to get the app code from GitHub.
  5. Input the name of the repo that contains your app files, select the main branch, and input the name of the Python file (in our case, app.py). You can also customize the app URL name; I chose medium-kaggle-like-app. The form should look something like Figure 12.
Figure 12. Creating a new app in the Streamlit community cloud service (Image created by the author)
  6. Click the "Deploy" button. It might take a few minutes.
  7. Once the app is deployed, you will immediately see an error from Streamlit, as shown in Figure 13. This is completely normal: the deployed app does not yet have access to the technical user credentials, since we never uploaded the secrets.toml file to GitHub. Don’t worry, we will fix this error in the next step.
Figure 13. Streamlit deployment error (Image created by the author)
  8. In the lower right corner of the site, there is a menu called "Manage App." Click on it, then open the hamburger menu, select "Settings," then "Secrets," and paste the contents of your secrets.toml file there. The app will automatically reboot and should work just fine.
Figure 14. Fixing the Streamlit deployment error (Image created by the author)

You can check the final deployed app by clicking the link below. Log in using the username: [email protected] and the password: pass1234. In the GitHub repository for this project, I’ve included a .csv file that you can submit as your results.

Real life implementation results

Using the framework of this tutorial, I gamified the final project of my Python for Finance course that I teach to my master’s students. Honestly, my expectations for this project were modest. I hoped each team would interact with the platform at least twice during the project, so having 50–60 interactions across 7 teams and 3 project sections by the project deadline would have been considered a success.

But my students gave me one of the best gifts an instructor can receive. After one month, the app received over 690 submissions, nearly 12 times my original expectations. For me, these levels of engagement are unprecedented. Each group submitted an average of more than 30 submissions per project section, equating to roughly one submission per day per section. Comparing the first submission to the best submission for each section and team, there was an average improvement of 21%, with some teams improving their submissions by more than 60%. If I had implemented a typical version of this project instead of the gamified version, this level of improvement likely would not have materialized. This demonstrates the power of gamification. Now, you have an app that you can easily implement in your classes, and everything is free. Awesome, don’t you think?

If you want to read more about gamification, please make sure to read my previous article, where I discuss the logic behind it and how it can help foster engagement and learning.

Transforming Group Projects: Enhancing Learning with Gamification Techniques

Conclusions

This article has taken you through the entire process of creating a CRUD app using Streamlit, integrated with Google Sheets, that can be used to gamify machine learning projects for students. I have also shown you how to deploy your app using the Streamlit Community Cloud service. Please note that this code is extremely flexible and is not limited to only machine learning projects. I also adapted it for one of my project scheduling classes and achieved excellent results.

In the article below, Bruno Scalia C. F. Leite uses Streamlit to deploy a logistics app. If you want to learn how you can use Streamlit to create operations research applications, I recommend checking out his article by following the link below.

Designing Operations Research Solutions: A User-friendly Routing Application with Streamlit

I sincerely hope you found this article useful and entertaining. If so, I’d love to hear your thoughts! Please feel free to leave a comment or show your appreciation with a clap 👏 . And if you’re interested in staying updated on my latest articles, consider following me on Medium. Your support and feedback are what drive me to keep exploring and sharing. Thank you for taking the time to read, and stay tuned for more insights in my next article!
