Mastering Back-of-the-Envelope Math Will Make You a Better Data Scientist

A quick and dirty answer is often more helpful than a fancy model

On July 16, 1945, during the first nuclear bomb test conducted at Los Alamos, physicist Enrico Fermi dropped small pieces of paper and observed how far they moved when the blast wave reached him.

Based on this, he estimated the approximate magnitude of the yield of the bomb. No fancy equipment or rigorous measurements; just some directional data and logical reasoning.

About 40 seconds after the explosion the air blast reached me. I tried to estimate its strength by dropping from about six feet small pieces of paper before, during and after the passage of the blast wave. […] I estimated to correspond to the blast that would be produced by ten thousand tons of T.N.T. — Enrico Fermi

This estimate turned out to be remarkably accurate considering how it was produced.

We’re forced to do quick-and-dirty approximations all the time. Sometimes we don’t have the data we need for a rigorous analysis, other times we simply have very little time to provide an answer.

Unfortunately, estimates didn’t come naturally to me. As a recovering perfectionist, I wanted to make my analyses as robust as possible. If I’m wrong and I took a quick-and-dirty approach, wouldn’t that make me look careless or incapable?

But over time, I realized that making a model more and more complex rarely leads to better decisions.

Why?

  1. Most decisions don’t require a hyper-accurate analysis; being in the right ballpark is sufficient
  2. The more complex you make the model, the more assumptions you layer on top of each other. Errors compound, and it becomes harder to make sense of the whole thing

Napkin math, back-of-the-envelope calculations: Whatever you want to call it, it’s how management consultants and BizOps folks cut through complexity and get to robust recommendations quickly.

And all they need is structured thinking and a spreadsheet.

My goal with this article is to make this incredibly useful technique accessible to everyone.

In this article, I will cover:

  • How to figure out how accurate your analysis needs to be
  • How to create estimates that are "accurate enough"
  • How to get people comfortable with your estimates

Let’s get into it.


Part 1: How accurate do you need to be?

Most decisions businesses make don’t require a high-precision analysis.

We’re typically trying to figure out one of four things:

Scenario 1: Can we clear a minimum bar?

Often, we only need to know if something is going to be better / larger / more profitable than X.

For example, large corporations are only interested in working on things that can move the needle on their top or bottom line. Meta does over $100B in annual revenue, so any new initiative that doesn’t have the potential to grow to a multi-billion $ business eventually is not going to get much attention.

Once you start putting together a simple back-of-the-envelope calculation, you’ll quickly realize whether your projections land in the tens of millions, hundreds of millions, or billions.

If your initial estimate is way below the bar, there is no point in refining it; the exact answer doesn’t matter at that point.

Other examples:

  • VCs trying to understand if the market opportunity for a startup is big enough to grow into a unicorn
  • You’re considering joining an early-stage company and are trying to understand if it can ever grow into its high valuation (e.g. AI or autonomous driving companies)

Scenario 2: Can we stay below a certain level?

This scenario is the inverse of the one above.

For example, let’s say the CMO is considering attending a big industry conference last minute. He is asking whether the team will be able to pull together all the necessary pieces (e.g. a booth, supporting Marketing campaigns etc.) in time and within a budget of $X million.

To give the CMO an answer, it’s not that important by when exactly you’ll have all of this ready, or how much exactly this will cost. At the moment, he just needs to know whether it’s possible so that he can secure a slot for your company at the conference.

The key here is to use very conservative assumptions. If you can meet the timeline and budget even if things don’t go smoothly, you can confidently give the green light (and then work on a more detailed, realistic plan).

Other examples:

  • Your manager wants to know if you have bandwidth to take on another project
  • You are setting a Service Level Agreement (SLA) with a customer (e.g. for customer support response times)

Scenario 3: How do we stack-rank things?

Sometimes, you’re just trying to understand if thing A is better than thing B; you don’t necessarily need to know exactly how good thing A is.

For example, let’s say you’re trying to allocate Engineering resources across different initiatives. What matters more than the exact impact of each project is the relative ranking.

As a result, your focus should be on making sure that the assumptions you’re making are accurate on a relative level (e.g. is Eng effort for initiative A higher or lower than for initiative B) and the methodology is consistent to allow for a fair comparison.

Other examples:

  • You’re trying to decide which country you should expand into next
  • You want to understand which Marketing channel you should allocate additional funds to

Scenario 4: What’s our (best) estimate?

Of course, there are cases where the actual number of your estimate matters.

For example, if you are asked to forecast the expected support ticket volume so that the Customer Support team can staff accordingly, your estimate will be used as a direct input to the staffing calculation.

In these cases, you need to understand 1) how sensitive the decision is to your analysis, and 2) whether it’s better if your estimate is too high or too low.

  • Sensitivity: Sticking with the staffing example, you might find that a support agent can solve 50 tickets per day. So it doesn’t matter if your estimate is off by a handful of tickets; only once you’re off by 50 tickets or more does the team have to staff one agent more or fewer (see the quick sketch after this list).
  • Too high or too low: It matters in which direction your estimate is wrong. In the above example, being understaffed or overstaffed has different costs to the business. Check out my previous post on the cost of being wrong for a deep dive on this.
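Here’s the quick sketch referenced above; the daily ticket forecast is a hypothetical number:

```python
import math

tickets_per_agent_per_day = 50
ticket_forecast = 475          # hypothetical daily ticket estimate

agents_needed = math.ceil(ticket_forecast / tickets_per_agent_per_day)  # -> 10 agents
# Staffing only changes once the estimate crosses a 50-ticket bucket boundary,
# so being off by a couple dozen tickets doesn't affect the answer at all.
print(agents_needed)
```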

Part 2: How to create estimates that are "accurate enough"

You know how accurate you need to be – great. But how do you actually create your estimate?

You can follow these steps to make your estimate as robust as possible while minimizing the amount of time you spend on it:

Step 1: Building a structure

Let’s say you work at Netflix and want to figure out how much money you could make from adding games to the platform (if you monetized them through ads).

How do you structure your estimate?

The first step is to decompose the metric into a driver tree, and the second step is to segment.

Developing a driver tree

At the top of your driver tree you have "Games revenue per day". But how do you break out the driver tree further?

There are two key considerations:

1. Pick metrics you can find data for.

For example, the games industry uses standardized metrics to report on monetization, and if you deviate from them, you might have trouble finding benchmarks (more on benchmarks below).

2. Pick metrics that minimize confounding factors.

For example, you could break revenue into "# of users" and "Average revenue per user". The problem is that this doesn’t consider how much time users spend in the game.

To address this issue, we could split revenue out into "Hours played" and "$ per hour played" instead; this ensures that any difference in engagement between your games and "traditional" games does not affect the results.

You can then break out each metric further, e.g.:

  • "$ per hour played" could be calculated as "# ad impressions per hour" times "$ per ad impression"
  • "Hours played" could be broken out into "Daily Active Users (DAU)" and "Hours per DAU"

However, adding more detail is not always beneficial (more on that below).
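To make the arithmetic concrete, here is a minimal sketch of how such a driver tree rolls up into the top-line number; every input below is a made-up placeholder, not actual Netflix data:

```python
# Hypothetical driver-tree inputs (illustrative placeholders only)
dau = 30_000_000                 # Daily Active Users playing games
hours_per_dau = 0.3              # Hours played per DAU per day
ad_impressions_per_hour = 10     # "# ad impressions per hour"
revenue_per_impression = 0.01    # "$ per ad impression" (i.e. a $10 CPM)

# Roll the drivers up the tree
hours_played = dau * hours_per_dau
revenue_per_hour_played = ad_impressions_per_hour * revenue_per_impression
games_revenue_per_day = hours_played * revenue_per_hour_played

print(f"Games revenue per day: ${games_revenue_per_day:,.0f}")
```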

Segmentation

In order to get a useful estimate, you need to consider the key dimensions that affect how much revenue you’ll be able to generate.

For example, Netflix is active in dozens of countries with vastly different monetization potential and to account for this, you can split the analysis by region.

Which dimensions are helpful in getting a more accurate estimate depends on the exact use case, but here are a few common ones to consider:

  • Geography
  • User demographics (age, device, etc.)
  • Revenue stream (e.g. ads vs. subscriptions vs. transactions)

"Okay, great, but how do I know when segmentation makes sense?"

There are two conditions that need to be true for a segmentation to be useful:

  1. The segments are very different (e.g. revenue per user in APAC is only a fraction of what it is in the US)
  2. You have enough information to make informed assumptions for each segment

You also need to make sure the segmentation is worth the effort. In practice, you’ll often find that only one or two metrics are materially different between segments.

Here’s what you can do in that case to get a quick-and-dirty answer:

Instead of creating multiple separate estimates, you can calculate a blended average for the metric that has the biggest variance across segments.

So if you expect "$ per hour played" to vary substantially across regions, you 1) make an assumption for this metric for each region (e.g. by getting benchmarks, see below) and 2) estimate what the country mix will be:

Image by author

You then use that number for your estimate, eliminating the need to segment.
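Here is a minimal sketch of that blended-average shortcut; the regional values and the country mix are hypothetical placeholders:

```python
# Hypothetical "$ per hour played" by region and expected share of play time
revenue_per_hour = {"US/Canada": 0.15, "EMEA": 0.10, "LATAM": 0.05, "APAC": 0.04}
country_mix      = {"US/Canada": 0.35, "EMEA": 0.30, "LATAM": 0.20, "APAC": 0.15}

# One weighted (blended) average replaces four separate regional estimates
blended = sum(revenue_per_hour[r] * country_mix[r] for r in revenue_per_hour)
print(f"Blended $ per hour played: ${blended:.3f}")
```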

How detailed should you get?

If you have solid data to base your assumptions on, adding more detail to your analysis can improve the accuracy of your estimate; but only up to a point.

Besides increasing the effort required for the analysis, adding more detail can result in false precision.

Image by author

So what falls into the "too much detail" bucket? For the sake of a quick and dirty estimation, this would include things like:

  • Segmenting by device type (Smart TV vs. Android vs. iOS)
  • Considering different engagement levels by day of week
  • Splitting out CPMs by industry
  • Modeling the impact of individual games
  • etc.

Adding this level of detail would increase the number of assumptions exponentially without necessarily making the estimate more accurate.

Step 2: Putting numbers against each metric

Now that you have the inputs to your estimate laid out, it’s time to start putting numbers against them.

Internal data

If you ran an experiment (e.g. you rolled out a prototype for "Netflix games" to some users) and you have results you can use for your estimate, great. But a lot of the time, that’s not the case.

When that happens, you have to get creative. For example, let’s say that to estimate our DAU for games, we want to understand how many Netflix users might see and click on the games module in their feed.

To do this, you can compare it against other launches with similar entry points:

  • What other new additions to the home screen did you launch recently?
  • How did their performance differ depending on their location (e.g. the first "row" at the top of the screen vs. "below the fold" where you have to scroll to find it)?

Based on the last few launches, you can then triangulate the expected click-through-rate for games:

Image by author

These kinds of relationships are often close enough to linear (within a reasonable range) that this type of approximation yields useful results.
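As a rough sketch of that triangulation, you could interpolate between prior launches; the launch data and module positions below are made up for illustration:

```python
import numpy as np

# Hypothetical click-through rates from past home-screen launches,
# indexed by how far down the screen the module appeared (row 1 = top)
past_rows = np.array([1, 3, 6])
past_ctr  = np.array([0.12, 0.07, 0.03])

# If the games module will sit in row 4, interpolate between nearby launches
games_row = 4
estimated_ctr = np.interp(games_row, past_rows, past_ctr)
print(f"Estimated games CTR: {estimated_ctr:.1%}")  # ~5.7%
```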

Once you get some actual data from an experiment or the launch, you can refine your assumptions.

External benchmarks

External benchmarks (e.g. industry reports, data vendors) can be helpful to get the right ballpark for a number if internal data is unavailable.

There are a few key considerations:

  1. Pick the closest comparison. For example, casual games on Netflix are closer to mobile games than to PC or console games, so pick benchmarks accordingly
  2. Make sure your metric definitions are aligned. Just because a metric in an external report sounds similar doesn’t mean it’s identical to your metric. For example, many companies define "Daily Active Users" differently.
  3. Choose reputable, transparent sources. If you search for benchmarks, you will come across a lot of different sources. Always try to find an original source that uses (and discloses!) a solid methodology (e.g. actual data from a platform rather than surveys). Bonus points if the report is updated regularly so that you can refresh your estimate in the future if necessary.

Deciding on a number

After looking at internal and external data from different sources, you will likely have a range of numbers to choose from for each metric.

Take a look at how wide the range is; this will show you which inputs move the needle on the answer the most.

For example, you might find that the CPM benchmarks from different reports are very similar, but there is a very wide range for how much time users might spend playing your games on a daily basis.

In this case, your focus should be on fine-tuning the "hours played" assumption:

  1. If there is a minimum amount of revenue the business wants to see to invest in games, see if you can reach that level with the most conservative assumption
  2. If there is no minimum threshold, try to use sanity checks to determine a realistic level.

For example, you could compare the play time you’re projecting for games against the total time users currently spend on Netflix.

Even if some of the time is incremental, it’s unrealistic that more than, say, 5% – 10% of the total time is spent on games (most of the users came to Netflix for video content, and there are better gaming offerings out there, after all).


Part 3: How to get people comfortable with your estimates

If you’re doing a quick-and-dirty estimate, people don’t expect it to be perfectly accurate.

However, they still want to understand what it would take for the numbers to be so different that they would lead to a different decision or recommendation.

A good way to visualize this is a sensitivity table.

Let’s say the business wants to reach at least $500k in ad revenue per day to even think about launching games. How likely are you to reach this?

On the X and Y axis of the table, you put the two input metrics that you feel least sure about (e.g. "Daily Active Users (DAU)" and "Time Spent per DAU"); the values in the table represent the number you’re estimating (in this case, "Games revenue per day").

Image by author

You can then compare your best estimate against the minimum requirement of the business; for example, if you’re estimating 30M DAU and 0.3 hours of play time per DAU, you have a comfortable buffer to be wrong on either assumption.
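Here is a minimal sketch of how such a sensitivity table could be built with pandas; the DAU and time-spent grids and the blended monetization number are placeholder assumptions:

```python
import pandas as pd

revenue_per_hour = 0.10                     # assumed blended "$ per hour played"
dau_values = [10e6, 20e6, 30e6, 40e6]       # Daily Active Users
hours_values = [0.1, 0.2, 0.3, 0.4]         # Time spent per DAU (hours per day)

# Each cell: daily games revenue for that DAU / time-spent combination
table = pd.DataFrame(
    [[dau * hrs * revenue_per_hour for hrs in hours_values] for dau in dau_values],
    index=[f"{int(dau / 1e6)}M DAU" for dau in dau_values],
    columns=[f"{hrs} hrs/DAU" for hrs in hours_values],
)
print(table.round(0))  # compare each cell against the $500k/day threshold
```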

Closing thoughts

While it’s called napkin math, three lines scribbled on a cocktail napkin are rarely enough for a solid estimate.

However, you also don’t need a full-blown 20-tab model to get a directional answer; and often, that directional answer is all you need to move forward.

Once you get comfortable with rough estimates, they allow you to move faster than others who are still stuck in analysis paralysis. And with the time you save, you can tackle another project – or go home and do something else.


For more hands-on Analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

There’s a Right Way to Be Wrong

How to make better predictions by incorporating business context and the cost of being wrong

Image by author (via Midjourney)

One day, my mother had vision problems in her left eye and went to the doctor. The doctor did a quick examination and concluded there was nothing wrong; the eyes were just getting worse with old age.

Shortly after, the symptoms got worse and my mom got a second opinion. Turns out she had a retinal detachment, and the delay in treatment caused permanent damage.

People make mistakes at work all the time; you can’t be right in 100% of the cases. But some mistakes are costlier than others and we need to account for that.

If the doctor had said "there might be something there" and sent my mother for further tests, there would have been a chance they would all come back negative and it was nothing after all. But the cost of being wrong in that case would have only been a waste of time and some medical resources, not permanent damage to an organ.

The medical field is an extreme example, but the same logic applies in jobs like Data Science, BizOps, Marketing or Product as well:

We should take the consequences, or cost, of being wrong into account when making predictions.

For example, if you work at Uber and are trying to predict demand (which you’re never going to do 100% accurately), would you rather end up with too many or too few drivers in the marketplace?

Unfortunately, in my experience, these conversations between business stakeholders and data scientists rarely happen. Let’s try to change that.

In this post, I will cover:

  • The different ways we make wrong predictions
  • How to make sure you are wrong the "right" way
  • 4 Real-life examples to get you thinking about how you want to be wrong


The different ways we are wrong

When we make predictions, we are typically either trying to:

  • Predict a category or outcome (e.g. users that will churn vs. those that won’t); this is called "classification"
  • Forecast a number (e.g. sales for the next year)

Let’s look at what it means to be right or wrong in each case.

Predicting a category or outcome

In a so-called classification problem, being wrong means we assign the wrong label to something. For simplicity, we are going to focus on problems with only two possible outcomes (binary classification).

For example, let’s say we’re trying to predict whether a prospect is going to buy from us. There are four outcomes:

  1. We predict a prospect will buy, and they do (True Positive)
  2. We predict a prospect will buy, but they don’t (False Positive)
  3. We predict they won’t buy, but they do (False Negative)
  4. We predict they won’t buy, and they don’t (True Negative)

#2 (False Positive) and #3 (False Negative) are the two ways we can be wrong.

We can put our predictions into a so-called Error Matrix (or Confusion Matrix) to see how we did:

Image by author

There are three important metrics that help us understand how often, and in what way, we were wrong:

  • Our Accuracy tells us how many predictions overall were correct; it is calculated as the sum of our correct predictions (True Positives + True Negatives) divided by the total number of predictions
  • Our Precision tells us how many of our positive predictions were correct. I.e. out of all prospects we said would buy from us, how many actually did? It is the number of True Positives divided by all positive predictions (True Positives + False Positives)
  • Our Recall tells us how many of the relevant outcomes we correctly predicted, or in other words, how sensitive our model is. I.e. out of all prospects that ended up buying from us, how many did we identify in our prediction? It is calculated as True Positives divided by True Positives plus False Negatives
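As a quick sketch, here is how the three metrics fall out of the error matrix; the counts below are arbitrary example numbers:

```python
# Example counts from an error (confusion) matrix (arbitrary numbers)
tp, fp, fn, tn = 40, 10, 20, 130

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were correct
precision = tp / (tp + fp)                    # share of positive predictions that were correct
recall    = tp / (tp + fn)                    # share of actual positives we caught

print(f"Accuracy: {accuracy:.0%}, Precision: {precision:.0%}, Recall: {recall:.0%}")
```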

So now you have three different measures telling you how accurate your prediction is. Which one should you optimize for?

We will get to that in a second.

Forecasting a number

When we’re forecasting a number like sales, it’s a bit simpler. Our forecast is either above or below the actual number (or on target – just kidding, that one never happens 😭 ).

There are different ways you can measure the accuracy of your forecast here; the most popular ways are likely the Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).

Image by author
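For reference, the two measures are defined as follows, where $y_i$ is the actual value, $\hat{y}_i$ the forecast, and $n$ the number of periods:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad\qquad \text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$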

The problem: By default, these measures of accuracy treat over-forecasting and under-forecasting as equally bad. In reality, that’s rarely the case, though:

For example, if you’re forecasting inventory needs it’s very different to over-predict (and have a few too many items in the warehouse) than to under-predict (and run out of stock, costing you valuable sales).

The key question here is: For the business use case you are trying to forecast, would you rather be too high or too low?


How to ensure you are wrong the "right" way

Just because a prediction model has good accuracy doesn’t mean it’s doing a good job at what you want it to do.

Here’s a real example I’ve encountered to illustrate this point:

Let’s say you work in Marketing or Sales and want to predict which leads will result in successful deals. You train a simple model on historical data and can’t believe your eyes. 90% accuracy! On the first try!

But then you look at the results in more detail and see the following:

Image by author

Our accuracy is 90%, but the model failed to identify any of the deals that ultimately turned into won customers. Because the majority of leads never convert, the model can achieve high accuracy by simply predicting that not a single lead will convert.

Similarly, you might have a forecasting model for predicting inventory needs that looks pretty good on paper as it has a fairly high accuracy. But if you look more closely, you notice that it almost always slightly under-predicts and your stores run out of inventory at the end of the day as a result.

Obviously, models like that are not very useful.

There are two main challenges you face when making these types of predictions:

  1. Your forecasting model doesn’t have business context; i.e. if you don’t explicitly tell it, the model doesn’t "know" the cost of being wrong one way or the other
  2. In classification problems, the events we want to predict are often rare

The good news is that you can address both issues.

If you are on the Business or Product side, this is a great opportunity to get closer to the work of your Data Science counterparts; and if you’re a DS, this is one of those situations where business context is absolutely crucial to delivering a useful analysis.

Step 1: Understanding the cost of being wrong

The first step in making a prediction model cost-aware is to understand that cost.

Here are some of the key types of costs to consider:

  • Direct costs: In many cases, a wrong prediction directly causes a financial cost to the business. For example, if a company fails to identify a fraudulent transaction, they might have to eat the resulting costs.
  • Opportunity cost: If your predictions are wrong, you are often not using your resources efficiently. For example, if you predict a very high volume of support tickets and staff accordingly, your support agents will be idle when fewer tickets come in than you forecasted.
  • Lost revenue: A misclassification or other wrong prediction can cause you to miss sales. For example, you decide not to send a promotion to a customer because your model predicted they wouldn’t be interested, but in reality they would have made a purchase if they had gotten the promotional email.
  • Unsubscribes & churn: On the flip side, there are plenty of scenarios where a wrong prediction can annoy users or customers and cause them to churn. Sticking with the above example, if you send too many emails or push notifications because you think users might be interested in these promotions, they might unsubscribe from these comms channels or even churn.

You can have multiple types of cost (e.g. direct costs and lost revenue) at the same time.

Add up all of the applicable individual cost factors to determine the total cost of being wrong in a certain scenario (i.e. a False Positive or False Negative in classification or over- and under-predicting when forecasting a number).

Step 2: Making your prediction model "cost-aware"

Classification (predicting a category or outcome)

Once you have calculated the cost of each type of error (False Positive & False Negative), you can calculate the expected total cost of being wrong for your forecasting model.

You get this overall cost by multiplying the likelihood of each type of error with the cost of that type of error and summing it all up:

Image by author
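Written out as a formula, with $P(\cdot)$ the probability of each error type and $C$ its cost:

$$\text{Expected cost} = P(\text{False Positive}) \cdot C_{FP} + P(\text{False Negative}) \cdot C_{FN}$$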

Mathematically, your goal should be to minimize this overall cost. But to do this, you need to make sure your prediction model actually takes this cost into account.

There are many ways to make your classification model cost aware, but I’m going to cover the three main ones I find most useful in practice:

1. Adjusting the classification threshold

Many classification models like Logistic Regression don’t actually output a classification, but rather a probability of an event occurring (so-called probabilistic models).

So sticking with the previous example, if you’re predicting which transactions are fraudulent, the model doesn’t actually directly say "this is a fraud transaction", but rather "this transaction has a [X%] probability of being fraud".

In a second step, the transactions are then put into two buckets based on their probability:

  • Bucket 1: Fraudulent transactions
  • Bucket 2: Regular (non-fraudulent) transactions

By default, the threshold is 50%; so all the transactions with a probability > 50% are going into Bucket 1, the rest into Bucket 2.

However, you can change this threshold. For example, you could decide a 20% probability is enough that you want to flag a transaction as fraud.

Why would you do this?

It goes back to the cost of being wrong. Missing a fraudulent transaction is much more costly to the company than flagging a transaction as fraud that turns out to be fine; in the latter case, you just have minor costs of an additional manual review by a human and a delay in the transaction which might annoy the customer.

How do you decide which threshold to choose?

You have two options:

Option 1: Use the Precision-Recall Curve

The so-called Precision-Recall Curve shows you the Precision (how many of our positive predictions were correct) and Recall (how many relevant outcomes we successfully identified) at different thresholds.

It’s a trade-off: The more fraudulent transactions you want to detect (higher Recall), the more false alarms you’ll have (lower Precision). You can align with your business partners on a point on the curve that you are comfortable with. For example, they might have a minimum share of fraudulent transactions they want to catch.

To make sure your choice is reasonable, you can then calculate the total cost of being wrong for your chosen point per the formula at the beginning of this chapter.

Image by author

Option 2: Calculate the cost-minimizing threshold

Based on the cost of a False Positive and False Negative, you can calculate the threshold that minimizes the overall cost.

For a well-calibrated model, this is:

The Foundations of Cost-Sensitive Learning, C. Elkan (2001)
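Spelled out, with $C_{FP}$ and $C_{FN}$ the costs of a False Positive and a False Negative (and consistent with the worked example below), the cost-minimizing threshold is:

$$p^* = \frac{C_{FP}}{C_{FP} + C_{FN}}$$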

An example:

If your cost of a False Negative (e.g. failing to identify a fraudulent transaction) is $9 and that of a False Positive (e.g. falsely flagging a transaction as fraud) is $1, then the ideal threshold would be 1 / (9 + 1) = 1 / 10 = 0.1.

So even if the chance that something is fraud is only slightly above 10%, you would want to flag the transaction and have a human review it.
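Here is a minimal sketch of how applying a custom threshold could look with a probabilistic classifier in scikit-learn; the synthetic data and the $1 / $9 costs are placeholders mirroring the example above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical costs of each error type (see the worked example above)
cost_fp, cost_fn = 1, 9
threshold = cost_fp / (cost_fp + cost_fn)   # = 0.1

# Synthetic, imbalanced "fraud" data purely for illustration
X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
fraud_probability = model.predict_proba(X_test)[:, 1]   # P(fraud) for each transaction

# Flag anything above the cost-minimizing threshold instead of the default 0.5
flag_for_review = fraud_probability >= threshold
print(f"Share of transactions flagged for review: {flag_for_review.mean():.1%}")
```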

2. Rebalancing your training dataset

As mentioned above, the event we’re trying to predict is often underrepresented in the data that we’re training our model on (these events are called the "minority class").

At the same time, failing to identify these events usually has a very high cost to the business.

A few examples:

  • We try to predict which users will churn in the next X days, but few users actually churn in a given timeframe
  • You predict whether a certain transaction is fraudulent or not, but fraud is the absolute exception
  • We try to predict whether there are signs of cancer on a scan, but most patients are healthy

Failing to detect these rare events is typically very costly, but most models struggle to perform well with such imbalanced datasets. They simply don’t have enough examples to learn from.

To deal with this issue, you can oversample the examples of the underrepresented class so that the positive events are more equally represented. In plain English: You are creating more instances of the rare event so that the model has an easier time training to detect them.

You either do this by duplicating existing examples of the minority class in your training data set, or by creating synthetic ones.

Alternatively, you could also reduce instances of the majority class to balance things out (undersampling). Both oversampling and undersampling come with challenges, which are beyond the scope of this article. I linked some resources for further reading at the end.

Image by author

How do you rebalance your dataset to minimize the cost of being wrong?

If your classifier is using a standard threshold of 0.5, you can calculate the factor by which you need to multiply the number of majority-class examples using the following formula:

The Foundations of Cost-Sensitive Learning, C. Elkan (2001)
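Spelled out, using the same cost notation as before and assuming the minority class is the positive, costlier-to-miss one, the multiplier for the majority-class examples is:

$$\text{factor} = \frac{C_{FP}}{C_{FN}}$$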

An example:

So using the same values as in the example above ($1 cost of falsely flagging a transaction as fraud, $9 cost of missing a fraudulent transaction), the factor would be 1 / 9 = 0.1111.

In other words, we would scale down the number of majority examples in the training set by a factor of 9 (undersampling).

3. Modifying the weights for each class

Many Machine Learning models allow you to adjust the weights assigned to each class.

What does that do? Let’s say again we’re predicting fraudulent transactions in our app. Our model tries to minimize misclassifications. By default, failing to identify fraud and falsely flagging a normal transaction as fraud are treated as equally bad.

If we assign a higher weight to the minority class (our fraudulent transactions) though, we essentially make mistakes in this class (i.e. failing to identify fraud) more costly and thus incentivize the model to make fewer of them.
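For example, many scikit-learn models expose a class_weight parameter for exactly this purpose. A minimal sketch (the 1:9 weighting mirrors the fraud cost example from earlier):

```python
from sklearn.linear_model import LogisticRegression

# Penalize misclassifying the minority (fraud) class 9x as heavily as the majority class
model = LogisticRegression(class_weight={0: 1, 1: 9}, max_iter=1_000)

# Alternatively, let scikit-learn derive the weights from the class frequencies
balanced_model = LogisticRegression(class_weight="balanced", max_iter=1_000)
```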

Forecasting a number

We’ve talked a lot about how you can make your classification model cost-aware, but not all Predictions are classifications.

What if you’re forecasting how much inventory you need or how many organic leads you expect for your Sales team? Forecasting too high or too low in these scenarios has very different consequences for the business, but as discussed earlier, most widely-used forecasting methods ignore this business context.

Enter: Quantile Forecasts.

Traditional forecasts typically try to provide a "best estimate" that minimizes the overall forecast error. Quantile Forecasts, on the other hand, allow you to define how "conservative" you want to be.

The "quantile" you choose represents the probability that the actual value will land below the forecasted value. For example, if you want to be sure that you don’t under-predict, you can forecast at the 90th percentile, and the actual value is expected to be lower than your forecast 90% of the time.

In other words, you are assigning a higher cost to under-forecasting compared to over-forecasting.
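One accessible way to produce a quantile forecast is to train a model with a quantile (pinball) loss. Here is a minimal sketch using scikit-learn's gradient boosting on made-up weekly demand data; the feature, the noise level, and the 90th-percentile target are all illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy historical data: weekly demand as a noisy function of the week number
rng = np.random.default_rng(0)
X = np.arange(104).reshape(-1, 1)                     # two years of weekly data
y = 1_000 + 5 * X.ravel() + rng.normal(0, 100, 104)   # trend + noise

# alpha=0.9 targets the 90th percentile: actual demand should land below the
# forecast ~90% of the time, i.e. we rarely under-predict inventory needs
model = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)
p90_forecast = model.predict([[104]])                 # conservative forecast for next week
print(f"90th-percentile demand forecast: {p90_forecast[0]:,.0f}")
```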


4 real-life examples

That was a lot of theory; let’s put it into practice.

Here are some real-life scenarios in which you’ll have to make sure you are picking the "right" way to be wrong.

Example 1: Lead scoring in B2B Marketing & Sales

🤔 The problem:

Type: Predicting an outcome (classification problem)

You work at a B2B SaaS company and want to figure out which leads are going to "close", i.e. turn into customers. The high-potential leads your model identifies will receive special attention from Sales reps and be targeted with additional Marketing efforts.

💸 The cost of being wrong:

  • False Positive: If you falsely flag a lead as "high potential", you are 1) wasting the time of your Sales reps and 2) wasting the money spent on high-touch Marketing campaigns (e.g. gifts).
  • False Negative: If you fail to identify a high potential lead, you have a lower chance of closing the deal since you’re not deploying your best tactics, leading to lost revenue.

⚙ What to optimize for:

By default, your model will flag very few leads as "high potential" because most leads in the training data never closed.

However, since losing a deal in B2B SaaS is typically much more costly than wasting some resources on working an unsuccessful deal (esp. if you target Mid Market or Enterprise companies with larger deal sizes), you’ll want to tune your model to massively penalize False Negatives.

Example 2: Inventory for a sales promotion

🤔 The problem:

Type: Forecasting a number

You work at an E-Commerce company and want to predict demand for your most important products for a big upcoming event (e.g. Amazon Prime Day).

💸 The cost of being wrong:

  • Over-forecasting: If you overestimate the demand, you’ll have excess inventory. The cost consists of storage costs in the warehouse, plus you might have to write off the value of the items if you can’t sell them later.
  • Under-forecasting: If you under-predict the demand, you’ll run out of inventory and miss out on sales. Plus, your reputation will be tarnished, as customers value constant availability and reliability in E-Commerce, potentially leading to churn.

⚙ What to optimize for:

The cost of running out of inventory is much higher than having some excess (non-perishable) stock. As a result, you’ll want to minimize the odds of this happening.

You can use a Quantile Forecast to decide exactly how much inventory risk you’re willing to take. Do you want to have a 50%, 70% or 90% chance of having sufficient stock to meet demand?

Example 3: Email promotions

🤔 The problem:

Type: Predicting an outcome (classification problem)

You’re planning to run a new type of Email marketing campaign and are trying to figure out which users you should target.

💸 The cost of being wrong:

  • False positive: If you opt someone into the campaign and they don’t find it relevant, they might unsubscribe. The cost is that you can’t email them anymore with other campaigns and might lose out on future engagement or sales
  • False negative: If you don’t opt someone in that would have found the campaign relevant, you are leaving near-term user engagement or sales on the table

As you can see, the trade-off here is between short-term benefits and potentially negative long-term consequences.

⚙ What to optimize for:

This case is less clear-cut. You would have to quantify the "lifetime value" of being able to email a user and then tune your model so that the expected lost future revenue from unsubscribes equals the expected short-term gains from the email campaign itself.

Example 4: Sales hiring

🤔 The problem:

Type: Forecasting a number

You work in a B2B company. You are launching a new market and are forecasting the expected number of qualified opportunities to figure out how many Sales reps you should hire.

💸 The cost of being wrong:

  • Over-forecasting: If your forecast is too high, you are hiring more sales reps than you can "feed". The reps will be unable to hit their quota, morale will tank, and you will eventually have to let people go.
  • Under-forecasting: If your forecast is too low, you end up with more opportunities per Sales rep than expected. Up to a certain level, this excess volume can be "absorbed" (reps will spend less time on Outbound, managers and reps from other markets can pitch in etc.). Only if you massively under-predict will you start leaving money on the table.

⚙ What to optimize for:

Over-forecasting, and as a result over-hiring, is extremely costly. Since the business has more flexible ways to deal with under-forecasting, you’ll want to choose a Quantile Forecast where the actual deal volume lands above the forecast the majority of the time.


In conclusion

A prediction is only as useful as the decisions it enables.

If you don’t incorporate real-life context into your analysis, it doesn’t matter how sophisticated your model is. The output will be suboptimal at best, harmful at worst.

When Data Science and Business / Product stakeholders partner, however, to calculate the cost of being wrong and tune the model to reflect the priorities of the business, predictions become an incredibly useful tool.


For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

Done is Better Than Perfect

How to be more pragmatic as a Data Scientist, and why it matters for your career
You’re good at your job and you pride yourself in knowing the ideal way to do things. And since you want to raise the bar, you hold others to the same standard. This will surely get you noticed and promoted, right?

But then you get passed over for promotion and when you look around, you notice that the people that do get promoted are delivering work that’s much less rigorous than yours. Can people not tell the difference, or what’s going on?

If you’re a high performer, it’s easy to slide into perfectionism. It starts early: School and college train us in scientific methods, and anything that deviates from the ideal solution gets point deductions.

Image by author

This academic approach is often carried over into the workplace, especially in rigorous fields like Data Science & Analytics.

However, the reality is: In high-growth companies, getting stuff done is more important than perfection. If you can’t deliver results at the speed at which the business needs them, it will move forward without you.

This post will show you how to prevent that from happening.

I will cover:

  1. Why Perfectionism is holding you back in your career
  2. How to spot perfectionism and what to do about it
  3. When to be pragmatic and when not to
  4. How to become more pragmatic


Why perfectionism is holding you back

At the surface level, perfectionism sounds good: You strive for excellence based on your intrinsic desire for perfection. Nothing you produce will ever make your manager or the company look bad.

But perfectionism can become a major blocker to your career progress:

1. Perfectionism lowers your output.

  • Studies show that humans assign higher value to short-term outcomes compared to long-term outcomes. That’s why we have trouble saving for retirement when we can use that money to go on a vacation right now.
  • At work, this means perfectionists try to minimize the chance of making a mistake (that would result in immediate negative consequences) and end up spending too much time polishing deliverables. This results in lower output which, in turn, makes it harder to get promoted.

2. Perfectionism limits your growth opportunities.

  • Perfectionists do whatever they can to minimize the possibility of mistakes. The natural consequence of this is that they tend to stay within their comfort zone.
  • You started your career as a Marketing Data Scientist in B2B SaaS? Better double down on what you already know, even if you discover you’re actually more interested in doing Product Analytics in Consumer Fintech. If you switch, you’ll have to learn a new industry from scratch and will be much more likely to mess up; why take that risk?

In my experience, perfectionism is especially common with highly analytical people or those with advanced academic backgrounds. And it’s becoming more and more common. However:

The hard, but necessary realization to succeed in a high-growth environment is that what got you to this point is not what will get you to the next level.

You might have gotten good grades and been admitted to your target grad school program because you were able to deliver flawless work, spending months refining a single paper or project. But you will rarely get the opportunity to showcase this on the job.

It’s painful to deliver work thinking "I could have done a much more sophisticated version of that"; sometimes, you feel downright ashamed of the hacky solution you threw together. But it’s important to remember that the time you invest in a deliverable quickly hits diminishing returns:

Image by author

How to spot perfectionism and what to do about it

The first step in tackling perfectionism is to understand what type you’re dealing with. There are three types:

  1. Self-oriented (you hold yourself to impossibly high standards)
  2. Socially-prescribed (you feel that others require you to be perfect), and
  3. Other-oriented (you hold others to an unrealistically high bar)

For example, if you realize that your perfectionism comes at least partially from (what you feel are) unrealistically high expectations from your manager, you might need to work with them to address this instead of just trying to shift your own mindset.

Given that perfectionism can stem from many factors, including early childhood experiences, it’s not realistic to provide a one-size-fits-all recipe to overcome it in a blog post. Therefore, I’ll focus on the different ways that perfectionism shows up in the workplace, and what you can do in these specific situations.

Symptom #1: Perfectionists are unable to keep up with the pace of the business 🚀

  • What this looks like: Perfectionist Data Scientists propose elaborate approaches that take months to yield results even when the company needs something in weeks. They’re unwilling to compromise and you often hear "That’s not possible".
  • If this is you: Remember that it’s your job as a Data Scientist to help the business get things done. Instead of saying "that’s not possible", provide a set of options with their respective timelines and highlight the trade-offs. This will allow the business to move forward while knowing the risk, and you will be able to "cover your ass".

What helped me: Don’t focus on how much better you could have made the deliverable, but how much worse off the project would be if you didn’t provide any input at all (which will happen if you are not fast enough).

  • If you’re dealing with this: Rather than asking people how long they will need, communicate a hard deadline and ask for what’s possible by that date. Make it clear if a directional analysis will be sufficient; often, what’s needed to move forward is much less rigorous and detailed than what people think.

Symptom #2: Perfectionists are uncomfortable making decisions with incomplete data 📊

  • What this looks like: Perfectionist Data Scientists are often paralyzed when it comes to decision-making. They drag out decisions in the hope of getting more information or doing more analysis to de-risk their choice.
  • If this is you: Give a clear recommendation and then state your confidence level, and what will happen if you’re wrong. You should also add the key assumptions that your decision was based on; if 1) others disagree with the assumptions or 2) you get new information later that changes one of them, you will be able to adjust.

What helped me: Realize that we never have perfect information. Every decision is an educated guess to some degree, and research shows that we tend to regret our decisions more than we should.

  • If you’re dealing with this: Put people on the spot; ask for recommendations or decisions from your team rather than options. And foster a culture where decisions are judged by what was known at the time, since it’s easy to pick holes in something in hindsight.

Symptom #3: Perfectionists often become blockers for others 🚫

  • What this looks like: Perfectionists pick endless holes in other people’s proposals without offering alternatives.
  • If this is you: Don’t try to enforce perfection across the company. Playing devil’s advocate and challenging each other is important, but it should be constructive. Treat projects as an optimization problem where you need to find the least bad solution under the given constraints (time, budget etc.).

What helped me: Pretend that if you criticize someone else’s proposal, you are now on the hook for solving the problem instead. This forced me to go from "This doesn’t make sense" to "Here’s what I would do instead".

  • If you’re dealing with this: Set a deadline to propose alternatives and reward solution-oriented thinking rather than people who solely point out problems.

Symptom #4: Perfectionists polish every single deliverable 🎁

  • What this looks like: Every single document or slide (even just personal notes or internal documentation) is impeccably formatted and designed.
  • If this is you: Focus your efforts on customer-facing deliverables and those going to executives. Any time you spend making some internal working document pretty is time that you could spend shipping more stuff.

What helped me: Try to think about it the other way around. Everyone will notice that you spent a lot of time polishing this internal deck instead of working on something impactful. In a fast-moving company, that actually looks worse than delivering a document that’s rough around the edges.

  • If you’re dealing with this: Lead by example; set a culture where screenshotted graphs from a dashboard with brief commentary are an acceptable way to create a slide. Don’t nitpick minor things like color or font choices.

Side note: That doesn’t mean you should submit something completely unformatted. Spending five minutes to make the document easy to digest (not necessarily pretty) is time well spent.

Image by author

Symptom #5: Perfectionists give too many details 🔬

  • What this looks like: Perfectionists add too many details in written and verbal communication. They are uncomfortable with simplifications and use extensive technical jargon.
  • If this is you: Focus on the key insights and put the supporting information in the appendix. And use plain English; you want people from different teams and backgrounds to be able to understand your work. You only realize impact as a DS if others understand the takeaways of your analysis.

What helped me: Don’t try to anticipate all questions and answer them preemptively. Put the most likely questions in an FAQ section and prepare to answer any remaining ones live; this actually makes you look more competent than including everything in your document.

  • If you’re dealing with this: Ask presenters for a five minute executive summary to force them to focus on the essentials. Then ask targeted follow-up questions as needed.

When to be pragmatic, and when not to

There is a time and a place for getting things 100% accurate, and there are instances where speed trumps perfection. But when should you be pragmatic, and when is it a bad idea?

Here are the factors you should consider to guide that decision:

  • ♻ Is the decision reversible? There are decisions that are one-way doors, and others that aren’t. You should spend the majority of your time analyzing the ones that are costly to reverse, and move with educated guesses on the others.
  • 💰 What is the expected financial cost of being wrong? Even if a decision is reversible, it might be costly to do so (e.g. wasted Eng resources, money spent on the wrong tool etc.). Decisions with a high cost to reverse should receive more scrutiny.
  • ⚖ Is there a potential for reputational damage or legal consequences if you mess up? Having to walk back on a statement you made internally is awkward; admitting to regulators that you made a mistake can have serious consequences. As a rule of thumb, anything that goes to regulators, Wall Street, your board of directors or customers should receive the maximum amount of rigor.
  • 📈 How sensitive is the decision to the analysis? One very common mistake is to keep investing time in an analysis even if additional accuracy won’t change the decision. For example, if you want to estimate the potential revenue from a new business opportunity, it might be enough to know whether the opportunity is in the range of $100M or $1B to make a go or no-go decision.
  • 🗑 Is this throwaway work? Investing time in work that will be used over long periods of time is more beneficial than analyses that are used for one-off decisions. Make ad-hoc analyses "good enough" for the problem at hand, and focus most of your efforts on refining things that will be used broadly by internal or external customers.
Image by author

How to become more pragmatic

I’ve had to unlearn perfectionism myself. These mindset shifts have helped me do that:

  • Realize that even if you do things perfectly, you’ll still fail all the time. For example, just because you do a flawless analysis of your Total Addressable Market (TAM) doesn’t mean your market entry will be successful. The key success factor is to get more "shots on goal", so your time is better spent on trying more things rather than perfecting a single one.
  • Don’t focus on the things you got wrong, but the ratio of what you got right. If you’re right most of the time, it’s fine to be wrong some of the time. For example, Amazon’s leadership principle is "Leaders are right, a lot" (not "all the time").
  • Practice your judgment in low-stakes situations. Practice making judgement calls even when you’re not the decision maker. E.g. if you’re in a meeting where an executive is asked to decide, think about what you would do. Decision-making is like a muscle and is best trained in low-stakes scenarios.

Conclusion

Becoming more pragmatic is a journey; it takes time, so don’t expect to shift your mindset overnight. But it’s worth it; it will not only increase your impact, but also reduce your stress level as you will spend less time chasing elusive perfection.


For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

How to Challenge Your Own Analysis So Others Won’t

Master the art of sanity checks to level up the quality of your work
Image by author

Have you ever created an analysis only for it to be torn apart by your manager? Or have you ever gotten a question during a presentation that made you think "Why didn’t I check this beforehand?"

Sometimes it can feel like managers and executives have an uncanny ability to find the one weak spot in your work. How did they identify the issue so quickly, especially if they are seeing your work for the first time?

What seems like a superpower can be learned by anyone, and this post will show you how.

By routinely applying "sanity checks" to your work, you can proactively identify the weak spots and ensure the result makes sense before you share it with a broader audience.

I am going to go over:

  • What sanity checks are and why they matter
  • How sanity checks are different from how most people check their work
  • How To do a sanity check
  • How to use sanity checks to increase your credibility
  • How to use AI to sanity check your work for you

We’ve got a lot to cover, so let’s jump in.

What is a sanity check, and why is it important?

Imagine you’re building a detailed model from scratch, carefully choosing each assumption and combining them to get to your final output (e.g. a forecast, company valuation etc.).

Each assumption seemed reasonable and you checked the math twice, so the output should be solid. Right? Right??

My experience over the last decade has been that more often than not, we miss the forest for the trees when building models or doing analyses. We layer so many assumptions on top of each other that the final result can quickly change from reasonable to ridiculous.

This is where sanity checks come into play: Sanity checks help us determine whether the result of our analysis has a good chance of being correct.

We’re all wrong from time to time. That’s fine; reality doesn’t always play out the way we expect. But you should try to be right most of the time.

Let’s dive into how to do that.

How is sanity checking different?

When checking our work, most of the time we go through it step-by-step to check for errors. Are the cells linked correctly? Did I pull the formulas all the way down? Do all the joins in my SQL work correctly?

Image by author

This mechanical "Quality Control" approach of checking all inputs can help us find issues, but it doesn’t ensure that the output makes intuitive business sense.

Sanity checking, on the other hand, is about taking a step back and validating the output from a different angle. If you come to the same conclusion both ways, you can be much more confident in your work.

How to do a sanity check

There are three broad categories of sanity checks: Bottom-up vs. top-down, benchmarking and intuition. I will go through each of them in detail and show how you can apply them at work.

Bottom-Up vs. Top-Down

Our analyses are typically either top-down or bottoms-up. But what does that mean?

Image by author

Let’s look at a (simplified) example. Let’s say you work in a B2B SaaS company and want to figure out how many customers you can acquire for the new product you’re launching.

  • In the top-down approach, we are trying to understand what share of the market we can win. So we’d start by looking at the total number of businesses in the US, exclude industries that we are not targeting and company sizes we can’t support, assume what % of companies is looking to switch software providers and finally assume what share of those companies we can win (vs. our competitors)
  • In the bottom-up approach, we are trying to understand how many companies we can acquire based on the channels we have available. So we’d look at prior launches to figure out what lead volume we can get from LinkedIn, analyze keywords to determine expected SEM volume, project Email leads based on the number of companies we can target and expected conversion rates etc.
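As a minimal sketch (every input below is a hypothetical placeholder, not real data), the two approaches for this example might look like this; comparing the two resulting numbers is exactly the sanity check:

```python
# --- Top-down: what share of the market could we win? ---
us_businesses = 2_000_000
serviceable_share = 0.15        # industries / company sizes we can actually serve
looking_to_switch = 0.10        # share shopping for a new software provider
expected_win_rate = 0.10        # share of those deals we win vs. competitors
top_down = us_businesses * serviceable_share * looking_to_switch * expected_win_rate

# --- Bottom-up: what can our channels actually deliver? ---
leads_by_channel      = {"LinkedIn": 20_000, "SEM": 15_000, "Email": 50_000}
conversion_by_channel = {"LinkedIn": 0.03,   "SEM": 0.04,   "Email": 0.02}
bottom_up = sum(leads_by_channel[c] * conversion_by_channel[c] for c in leads_by_channel)

print(f"Top-down: ~{top_down:,.0f} customers, bottom-up: ~{bottom_up:,.0f} customers")
```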

Both approaches can give us a directional idea of what we can expect, but they each have a crucial weakness.

The top-down approach does not consider how we are going to get these customers, while the bottom-up approach ignores the size of our target market.

As a result, the best way to sanity check your work is to combine top-down with bottom-up analysis.

Image by author
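To make this concrete, here is a minimal sketch of what that comparison can look like. All numbers below are hypothetical; the point is simply that the two estimates should land in the same ballpark.

```python
# Hypothetical example: estimating how many customers we can acquire for a new
# B2B product, both top-down and bottom-up. All numbers are made up.

# --- Top-down: what share of the addressable market can we win? ---
us_businesses = 6_000_000        # total US businesses
addressable_share = 0.20         # after excluding industries / sizes we can't serve
switching_share = 0.10           # share looking to switch providers this year
win_rate = 0.05                  # share of switchers we expect to win

top_down = us_businesses * addressable_share * switching_share * win_rate

# --- Bottom-up: what can our channels actually deliver? ---
channel_leads = {"linkedin": 40_000, "sem": 25_000, "email": 60_000}
lead_to_customer = {"linkedin": 0.04, "sem": 0.06, "email": 0.02}

bottom_up = sum(channel_leads[c] * lead_to_customer[c] for c in channel_leads)

print(f"Top-down estimate:  {top_down:,.0f} customers")
print(f"Bottom-up estimate: {bottom_up:,.0f} customers")
print(f"Ratio (bottom-up / top-down): {bottom_up / top_down:.2f}")
# If the two are an order of magnitude apart, one set of assumptions is off.
```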

Benchmarking

The best way to make sure a plan or projection is reasonable is to compare it against benchmarks. For example, if you’re forecasting the performance of a new market, it helps to compare it against prior launches of similar countries.

If your analysis massively deviates from the benchmark, you need to be able to explain why.

A few common things you should check for any model, forecast or projection:

  • Magnitude: How do the final outputs compare to benchmarks? E.g. are you projecting that France will be a larger market for the company than the UK?
  • Growth assumptions: What trend are you forecasting over time? E.g. is the new product projected to grow more quickly compared to past launches?
  • Seasonality: Does your projection show the same repeating patterns as the benchmarks? E.g. if all other markets show a slowdown during the December holiday period, why does your projection for the new country not show this?

This doesn’t mean you always have to model everything in line with benchmarks; but you always need to be able to explain why something is deviating.
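If you want to run these checks programmatically, a quick pandas sketch like the one below (with made-up monthly numbers) covers the magnitude and growth comparisons; the same idea, e.g. averaging each calendar month against a trailing twelve-month mean, gives you a simple seasonality index.

```python
import pandas as pd

# Hypothetical new customers per month since launch: your UK forecast
# vs. the most recent comparable launch (France actuals).
uk_forecast = pd.Series([100, 160, 250, 380, 560, 800])
france_actual = pd.Series([110, 150, 200, 255, 320, 390])

# Growth check: compare month-over-month growth side by side.
checks = pd.DataFrame({
    "uk_mom_growth": uk_forecast.pct_change(),
    "france_mom_growth": france_actual.pct_change(),
})
print(checks.round(2))

# Magnitude check: after 6 months, is the UK really plausibly ~2x France?
ratio = uk_forecast.iloc[-1] / france_actual.iloc[-1]
print(f"Month-6 ratio (UK forecast / France actual): {ratio:.2f}")
```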

Benchmarking Example 1: New Market launch

Scenario: You plan to enter the UK market and forecast user growth

✅ Sanity checks: You compare the forecast against the two most recent launches, Germany and France. Your forecast is more aggressive than the last two launches, and doesn’t show any seasonality.

❓ Questions you need to be able to answer:

  • What gives you confidence that that’s possible? Is the UK structurally different (e.g. the market is larger)? Is our product a better fit for the UK market? Are we using a different go-to-market strategy?
  • Why does your UK forecast show no seasonality? Are holiday periods in the UK different? Do B2B buyers there have different seasonal buying patterns?

👉 If you are not able to make a strong case for why the new market is different from past launches, you are better off keeping the forecast similar.

Benchmarking Example 2: Marketing Plan

Scenario: You are forecasting Marketing spend and performance metrics by channel. The plan is to double the Marketing budget year-over-year.

✅ Sanity check: You compare the projected marketing efficiency against past trends. Your forecast projects that efficiency (Cost per Lead) will improve as we increase Marketing spend, but past data shows the opposite trend.

❓ Questions you need to be able to answer:

  • Why do you expect better efficiency?
  • What specific improvements are we deploying in each Marketing channel that will drive this?
  • Are we doing anything that would improve overall Marketing performance, e.g. investing in our Brand?

👉 If you don’t have a concrete plan to improve Marketing efficiency, you should assume that the historical relationship between spend and efficiency holds.
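One way to ground this check in data is to fit the historical relationship between spend and Cost per Lead and see what it implies at the proposed budget. The sketch below uses hypothetical numbers and a simple log-log fit; treat it as an illustration, not the one right functional form.

```python
import numpy as np

# Hypothetical history: monthly Marketing spend ($k) and observed Cost per Lead ($).
spend = np.array([100, 150, 200, 300, 400, 500])
cpl = np.array([38, 41, 45, 52, 58, 63])

# Fit CPL ~ a * spend^b on a log-log scale; b > 0 means efficiency worsens with scale.
b, log_a = np.polyfit(np.log(spend), np.log(cpl), 1)

proposed_spend = 1_000  # the doubled budget, in $k
implied_cpl = np.exp(log_a) * proposed_spend ** b
print(f"Fitted elasticity b = {b:.2f}")
print(f"Implied CPL at ${proposed_spend}k/month: ${implied_cpl:.0f}")
# If the plan assumes a *lower* CPL than today at double the spend, you need
# a concrete reason why this historical pattern will break.
```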

Comparing against intuition

Many times, you can use common sense to sanity check your analysis. Your intuition is not always going to be right, but it often highlights potential issues that need further validation.

A few examples:

  • You build a Discounted Cash Flow financial model and the terminal growth rate is much higher than that of the Gross Domestic Product (GDP); this means you are implicitly assuming that the company will outperform the broader economy forever. Is that realistic?
  • You are building an Account Scoring model and it scores companies as "good fit" that Sales believes are a waste of time. This doesn’t mean Sales is right (you build the model to surface new insights, after all), but you should take their experience into account since counterintuitive outputs often highlight model weaknesses

Using sanity checks to increase your credibility

Sanity checks are not just there to prevent others from poking holes in your work. They also give you a tool to increase your credibility.

Instead of doing them behind the scenes and then sharing the improved work, share the sanity checks as well. By showing how you validated the output of your analysis, you will build trust with your audience. If you don’t share the sanity checks you did, your audience has no choice but to scrutinize your work on the spot.

You can do this visually on a slide by showing how both top-down and bottom-up approaches get to the same result, or comparing your data against benchmarks.

But you can also do it verbally:

✅ "We are planning to grow to 50 Email leads in the UK by October; that is based on similar conversion assumptions as in Canada, and corresponds to 3% monthly penetration of our Total Addressable Market for Email

How to use AI to sanity check your work

Sanity checking can be pretty time consuming; after all, you have to approach the same problem from multiple angles. Luckily, AI tools can save you a lot of time.

This is not a replacement for your sanity checking skills; ChatGPT needs your guidance to do a good job, so you still need to know how to perform a robust sanity check. The AI’s job is simply to do the heavy lifting for you, and to bring up a few points you might have missed.

Here is a step-by-step guide on how to do this with ChatGPT; all screenshots are from actual conversations I had where I asked ChatGPT to sanity check my forecast for a new market launch.

Disclaimer: Always check your employer’s policies on using AI tools like ChatGPT before uploading any proprietary data.

Step 1: Upload your work to ChatGPT

The first step is to upload the work that you want ChatGPT to sanity check. ChatGPT can handle a variety of file types, including PDFs, Excel, CSV files and more.

You can also integrate directly with several tools; e.g. in this example, I linked my Google Sheet that contained my forecast:

Image by author

Even if your actual model lives outside a spreadsheet (e.g. in Python), I recommend dumping the outputs in Google Sheets for the sanity check; after all, you want ChatGPT to validate your outputs, not the mechanics of your model.

For this example, I gave ChatGPT this simple Go-To-Market Forecast for a new country launch (you can make a copy and try your own sanity check with it).

I went through four sanity checks; here are the logs:

  1. First attempt (Grade: Intern)
  2. Second attempt (Grade: Intern)
  3. Third attempt (Grade: First-year analyst)
  4. Fourth attempt (Grade: Over-confident first-year analyst)
  • Attempt #1 and attempt #2 were okay, but I felt like I had to provide quite a lot of guidance and didn’t always get exactly what I wanted.
  • The third attempt was pretty good, but both ChatGPT and I forgot to dig into the Marketing channel mix (and when I remembered a few days later, ChatGPT was unable to continue where we left off).
  • Attempt #4 was promising, but when re-forecasting to include seasonality, it adjusted the wrong months at first.

You might have to try a few times until you get a really good performance. And remember:

Don’t blindly use anything that AI produces; it can (and will) make mistakes, and you’ll be on the hook for the end result. AI can give helpful input and save you time, but it’s not a replacement for critical thinking.

Step 2: Write a prompt asking ChatGPT to sanity check

After ingesting your file, you need to write a prompt asking ChatGPT to sanity check your work.

Here’s what I used to sanity check the Go-To-Market Plan I linked above:

Image by author

I’ve found that giving a little bit of context on the dataset is helpful (although not absolutely necessary). Also, don’t forget to say "please" and "thank you" in case the AI ever gets sentient; this will keep you off the naughty list.

You could also give ChatGPT this article, or another summary of how to do sanity checks, so you don’t have to include too many instructions in your prompt.
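If you prefer to script this rather than paste files into the chat UI, the same idea works via the API. The sketch below is one possible setup, not a prescription: the model name, file name and prompt wording are my assumptions, and the same data-policy caveats apply.

```python
import pandas as pd
from openai import OpenAI  # requires the `openai` package and an API key

# Load the forecast outputs you want checked (a small CSV export is easiest).
forecast = pd.read_csv("gtm_forecast.csv")  # hypothetical file name

prompt = (
    "Please sanity check this go-to-market forecast for a new country launch. "
    "Compare magnitude, growth and seasonality against the benchmark countries "
    "included in the data, and list anything that looks implausible.\n\n"
    + forecast.to_csv(index=False)
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # assumption; use whatever model you have access to
    messages=[
        {"role": "system", "content": "You are a careful analyst reviewing forecasts."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```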

Step 3: Make sure ChatGPT ingested the data correctly

After ingesting the file, ChatGPT will typically give you a brief summary of what it sees and what it believes the data represents, as well as the steps it will take for the analysis:

Screenshot from my third attempt; Image by author

ChatGPT will also often restate some of the data in the chat, which is helpful for making sure it pulled the numbers correctly:

Screenshot from my third attempt; Image by author

Note: Initially, ChatGPT had some problems ingesting my spreadsheet. Here’s how you can troubleshoot:

  • If ChatGPT throws an error or obviously didn’t ingest the data correctly, you can ask it to reprocess by clicking on the ♻ icon
  • If it repeatedly has issues with your file, you might have to clean it up. For example, I found that I could significantly reduce errors if I removed non-essential rows and columns (e.g. section headers, comments etc.). If the file only contains the relevant tables, it’s easier for ChatGPT to convert them to data frames in Python. Giving descriptive column and row headers also helps ChatGPT make sense of the data

Step 4: Work through the sanity checks with ChatGPT

Next, ChatGPT will start providing some initial observations, like this:

Screenshot from my third attempt; Image by author

ChatGPT correctly identified that the forecast is conservative and catches up with the UK eventually after a slow start. It would have been better if it had pulled in some stats for validation such as the number of small businesses in each country, but in my experience you need to ask it explicitly to do that.

Sometimes it will also proactively visualize key trends; other times, you have to prompt it for that.

Here’s how the conversation continued:

Screenshot from my third attempt; Image by author

It’s nice that ChatGPT was able to pull the launch dates from the separate tab, and plot the performance for each country indexed by launch date.

We then got into seasonality; as you can see, I needed to provide the initial nudge and it took some back-and-forth, but ChatGPT did the work of identifying the correct patterns:

Screenshot from my third attempt; Image by author

In several of my attempts, ChatGPT also gave a brief, but on-point summary of the Marketing mix, like this:

Screenshot from my fourth attempt; Image by author

I’d say this is overall okay.

It correctly highlights Organic and Email as key areas that deserve a closer look. Unfortunately, it also draws some odd conclusions; it doesn’t make a lot of sense that Referrals in France would be as high as in the US given it’s a new market without an established customer base.

Step 5: [Optional] Ask ChatGPT to re-do your work

If you used ChatGPT to sanity check a forecast, it sometimes offers to re-forecast based on your discussion.

In my case, I asked it to incorporate seasonality:

Image by author

Final Thoughts

It sucks if you put a lot of effort into an analysis only for it to be torn apart by others. By sanity checking your work before you share it, you can massively reduce the chance of that happening.

It’s also a great way to build trust with more senior stakeholders and show that you are thinking like an executive.

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

The post How to challenge your own analysis so others won’t appeared first on Towards Data Science.

]]>
How to Design Better Metrics https://towardsdatascience.com/how-to-design-better-metrics-9bad7bc8c875/ Wed, 26 Jun 2024 07:07:46 +0000 https://towardsdatascience.com/how-to-design-better-metrics-9bad7bc8c875/ 9 best practices from leading companies like Uber & Meta

The post How to Design Better Metrics appeared first on Towards Data Science.

]]>
Image by author (created via Midjourney)

Metrics are a powerful tool; they help you measure what you care about. Having lofty goals is great, but to know if you’re making progress, incentivize your team and create accountability, you need to be able to express them in numbers.

But that’s easier said than done. There are dozens of metrics that seemingly measure the same thing, and new trendy metrics are invented every day. Which ones should you use and what should you avoid at all costs? This article will help you decide that.

Over the last decade I have been living and breathing metrics and have found that there are a few general principles that distinguish good metrics from bad metrics:


Principle 1: A metric should be a good proxy of what you’re trying to measure

You typically cannot directly measure the exact thing you care about.

Let’s say my goal was to measure the quality of my newsletter posts; how do I do that? "Quality" is subjective and there is no generally-accepted formula for assessing it. As a result, I have to choose the best (or least bad) proxy for my goal that I am actually able to measure. In this example, I could use open rate, likes etc. as proxies for quality.

Image by author

This is closely related to what people often call the "relevance" of the metric: Does it create value for the business if you improve the metric? If not, then why measure it?

For example, let’s say you work at Uber and want to understand if your supply side is healthy. You might think that the number of drivers on the platform, or the time they spend online on the app, is a good measure.

These metrics are not terrible, but they don’t really tell you if your supply side is actually healthy (i.e. sufficient to fulfill demand). It could be that demand is outpacing driver growth, or that most of the demand growth is during the mornings, but supply is growing mostly in the afternoons.

A better metric would be one that combines supply and demand; e.g. the number of times riders open the app and there is no driver available.

Principle 2: The metric should be easy to calculate and understand

People love fancy metrics; after all, complex Analytics is what you pay the data team for, right? But complicated metrics are dangerous for a few reasons:

  1. 🤔 They are difficult to understand. If you don’t understand exactly how a metric is calculated, you don’t know how to interpret its movements or how to influence it.
  2. 🧑‍🔬 They force a centralization of analytics. Often, Data Science is the only team that can calculate complex metrics. This takes away the ability of other teams to do decentralized analytics.
  3. ⚠ They are prone to errors. Complex metrics often require inputs from multiple teams; I lost count of the number of times I found errors because one of the many upstream inputs was broken. To make things worse, since only a handful of people in the company can calculate these metrics, there is very little peer review and errors often go unnoticed for long periods of time.
  4. 🔮 They often involve projections. Many complex Metrics rely on projections (e.g. projecting out cohort performance based on past data). These projections are often inaccurate and change over time as new data comes in, causing confusion.

Take LTV:CAC for example:

Apart from the fact that it’s not the best metric for the job it’s supposed to do, it’s also dangerous because it’s complicated to calculate. The denominator, CAC, requires you to aggregate various costs across Marketing and Sales on a cohort basis, while the numerator, LTV, is a projection of various factors including retention, upsell etc.

These kinds of metrics are the ones where you realize after two years that there was an issue in the methodology and you looked at "wrong" data the whole time.

Principle 3: A good (operational) metric should be responsive

If you want to manage the business to a metric on an ongoing basis, it needs to be responsive. If a metric is lagging, i.e. it takes weeks or months for changes to impact the metric, then you will not have a feedback loop that allows you to make continuous improvements.

You might be tempted to address this problem by forecasting the impact of changes rather than waiting for them to show up in the metrics, but that’s often ill-advised (see principle #2 above).

Of course, lagging metrics like revenue are important to keep track of (esp. for Finance or leadership), but most teams should be spending most of their time looking at leading indicators.

Principle 4: A metric should be hard to manipulate

Once you choose a metric and hold people accountable for improving it, they will find the most efficient ways to do so. Often, that leads to unintended outcomes. Here’s an example:

  1. Facebook wants to show relevant content to users to increase the time they spend on the site
  2. Since "relevance" is hard to measure, they use engagement metrics as a proxy (likes, comments etc.)
  3. Publishers and creators realize how the algorithm works and find psychologically manipulative ways to increase engagement ➡ Click Bait and Rage Bait are born

"When a measure becomes a target, it ceases to be a good measure."

— Goodhart’s Law

In the example above, Facebook might be fine with the deterioration in quality as long as users continue spending time on the platform. But in many cases, if metrics are gamed at scale, it can cause serious damage.

Let’s say you are offering a referral bonus where users get rewarded for referred signups. What will most likely happen? People will attempt to create dozens of fake accounts to claim the bonus. A better referral metric would require a minimum transaction amount on the platform (e.g. $25) to get the bonus.

So one way to prevent manipulation is by designing the metric to restrict the unwanted behavior that you anticipate. Another approach is to pair metrics. This approach was introduced by Andy Grove in his book "High Output Management":

"So because indicators direct one’s activities, you should guard against overreacting. This you can do by pairing indicators, so that together both effect and counter-effect are measured."

— Andy Grove, "High Output Management"

What does that look like in practice? If you only incentivize your customer support agents on "time to first response" because you want customers to get immediate help, they will simply respond with a generic message to every new ticket. But if you couple it with a target for ticket resolution time (or customer satisfaction), you are ensuring that agents actually focus on solving customers’ problems faster.
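Here is a small illustration of what paired indicators look like in practice; the support data and column names below are made up.

```python
import pandas as pd

# Hypothetical per-agent support metrics (column names are made up).
agents = pd.DataFrame({
    "agent": ["A", "B", "C"],
    "median_first_response_min": [4, 35, 6],
    "median_resolution_hours": [30, 9, 8],
})

# Pairing the speed metric with an outcome metric: an agent only counts as
# healthy if both are on target. Agent A responds instantly but resolves
# slowly -- exactly the behavior a standalone response-time target rewards.
agents["on_target"] = (
    (agents["median_first_response_min"] <= 15)
    & (agents["median_resolution_hours"] <= 12)
)
print(agents)
```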

Principle 5: A good metric doesn’t have arbitrary thresholds

Many popular metrics you’ll find in Tech companies are tied to a threshold.

For example:

  • # of users with at least 5 connections
  • # of videos with > 1,000 views

This makes sense; often, taking an action in itself is not a very valuable signal and you need to set a threshold to make the metric meaningful. Somebody watching the majority of a video is very different from somebody just clicking on it.

BUT: The threshold should not be arbitrary.

Don’t choose "1,000 views" because it’s a nice, round number; the threshold should be grounded in data. Do videos with 1,000 views get higher click-through rates afterwards? Or result in more follow-on content produced? Higher creator retention?

For example, Twitch measures how many users watch a stream for at least five minutes. While data apparently played into this choice, it’s not entirely clear why they ultimately chose five.

At Uber, we tried to let the data tell us where the threshold should be. For example, we found that restaurants that had a lot of other restaurants nearby were more reliable on UberEats, as it was easier to keep couriers around. We set the threshold for what we considered low-density restaurants based on the "elbow" we saw in the graph:

Image by author

This approach worked in many areas of the business; e.g. we also found that once riders or drivers reached a certain number of initial trips on the platform, they were much more likely to retain.

You are not always going to find a "magic" threshold like this, but you should try to identify one before settling for an arbitrary value.
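If you want to let the data propose the threshold, one simple approach is to look for the point where the marginal gain flattens out. Here is a rough sketch with made-up retention numbers; the cutoff itself is still a judgment call.

```python
import numpy as np

# Hypothetical: retention rate by number of initial trips completed.
trips = np.arange(1, 11)
retention = np.array([0.20, 0.32, 0.41, 0.47, 0.50, 0.52, 0.53, 0.535, 0.54, 0.542])

# Marginal improvement per additional trip; pick the point where the
# incremental gain drops below a (business-judgment) cutoff.
marginal_gain = np.diff(retention)
cutoff = 0.025
threshold = trips[1:][marginal_gain < cutoff][0]

print(f"Marginal gains: {np.round(marginal_gain, 3)}")
print(f"Suggested threshold: ~{threshold} initial trips")
```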

Principle 6: Good metrics create context

Absolute numbers without context are rarely helpful. You’ll often see press announcements like:

  • "1B rows of data processed for our customers", or
  • "$100M in earnings paid out to creators on our platform"

These numbers tell you nothing. For them to be meaningful, they’d have to be put into context. How much did each creator on the platform earn on average? In what timeframe? In other words, turning the absolute number into a ratio adds context.

Image by author

Of course, in the examples above, some of this is intentional; companies don’t want the public to know the details. But this problem is not just limited to press releases and blog posts.

Looking at your Sales pipeline in absolute terms might tell you whether it’s growing over time; but to make it truly meaningful, you’ll have to connect it to the size of the Sales team or the quota they carry. This gives you Pipeline Coverage, the ratio of Pipeline to Quota, a much more meaningful metric.

Creating these types of ratios also makes comparisons more insightful and fair; e.g. comparing revenue per department will make large departments look better, but comparing revenue per employee gives an actual view of productivity.

Principle 7: A metric needs a clear owner that controls the metric

If you want to see movement on a metric, you need to have a person that is responsible for improving it.

Even if multiple teams’ work contributes to moving the metric, you still need a single "owner" that is on the hook for hitting the target (otherwise you’ll end up with a lot of finger-pointing).

There are three potential problem scenarios here:

  1. No owner. With nobody obsessing about improving it, the metric will just continue on its current trajectory.
  2. Multiple owners. Unclear ownership causes friction and lack of accountability. For example, there were times at UberEats where it was unclear whether certain metrics were owned by local City teams or Central Operations teams. For a short period of time, we spent more time meeting on this topic than actually executing.
  3. Lack of control. Assigning an owner that is (or feels) powerless to move the metric is another recipe for failure. This could be because the owner doesn’t have direct levers to control the metric, has no budget to do so, or lacks support from other teams.

Principle 8: A good metric minimizes noise

A metric is only actionable if you can interpret its movements. To get a clean read, you need to eliminate as many sources of "noise" as possible.

For example: Let’s say you’re a small B2B SaaS startup and you look at web traffic as a leading indicator for the top of your funnel. If you simply look at the "raw" number of visits, you’ll have noise from your own employees, friends and family as well as existing customers visiting the website and you might see little correlation between web traffic and down-funnel metrics.

Excluding these traffic sources from your reporting, if possible, will give you a better idea of what’s actually going on with your prospect funnel.

Principle 9: Certain metrics should be industry standard

For certain metrics, it’s important that they can be compared across companies. For example, if you’re in B2B SaaS, your CFO will want to compare your Net Revenue Retention (NRR), CAC Paybacks or Magic Number to competitors (and your investors will want to do the same).

If you calculate these metrics in a way that’s not market standard, you won’t be able to get any insights through benchmarking, and you’ll cause a whole lot of confusion. That’s not to say that you shouldn’t make up metrics; in fact, I have made up a few myself over the course of my career (and might write a separate post on how to do that).

But the definitions for most financial and efficiency metrics are better left untouched.

In conclusion

All of the above being said, I want to make one thing clear: There is no perfect metric for any use case. Every metric will have downsides and you need to pick the "least bad" one.

Hopefully, the principles above will help you do that.

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.


Bonus: The Metrics Hall of Shame

A metric shouldn’t be made up to support a business narrative or hide inconvenient truths. This sounds obvious, but there are plenty of funky metrics out there that were created for this purpose:

1. WeWork’s Community-adjusted EBITDA:

The first place of made-up metrics goes to WeWork’s Community-Adjusted EBITDA.

Adjusted EBITDA has always been known as the land where anything goes; but WeWork’s metric, at first glance, seemed especially "creative". In addition to interest, taxes, depreciation and amortization, WeWork also excluded line items like Marketing and General & Administrative expenses.

The intention was to show a measure of unit economics, which is not unheard of. But WeWork did not do a good job explaining the metric’s purpose, resulting in (understandable) backlash and ridicule.

2. Elon Musk’s "Unregretted User Minutes" for X:

What do you do when your core engagement metrics like DAUs are tanking? You tell people that those metrics don’t matter and make up a new metric to focus on instead. Enter: Unregretted User Minutes.

How is that measured, you ask? Nobody outside of X knows; and if I had to guess, neither does anyone at X.

Social Media is definitely an area that could benefit from a shift away from pure engagement metrics towards something that takes into account the quality of the user experience, but this is much more likely to be a (thinly veiled) attempt to distract from X’s troubles.

3. Netflix’s 2019 "Views" definition change:

How do you make engagement on your platform go up without actually doing anything?

You change the threshold of what counts as an engagement!

Until the end of 2019, Netflix counted as a view any time someone watched > 70% of a movie or TV show episode. In late 2019, they set the threshold at 2 minutes instead; that’s not even enough for the cold open intro of most TV shows. So if someone drops off before the opening credit sequence plays, it still counts as a view. No surprise, the new numbers were roughly 35% higher.

Netflix has since changed their metric again, to be fair, and the new one seems more reasonable (total hours viewed divided by runtime; i.e. effectively "full views").

The post How to Design Better Metrics appeared first on Towards Data Science.

]]>
Should You Join FAANG or a Startup as a Data Scientist? https://towardsdatascience.com/should-you-join-faang-or-a-startup-as-a-data-scientist-030e3b8a7080/ Thu, 20 Jun 2024 15:00:21 +0000 https://towardsdatascience.com/should-you-join-faang-or-a-startup-as-a-data-scientist-030e3b8a7080/ Lessons from working at Uber + Meta, a growth stage company and a tiny startup

The post Should You Join FAANG or a Startup as a Data Scientist? appeared first on Towards Data Science.

]]>
What type of company you join is an incredibly important decision. Even if the company is prestigious and pays you well, if the work environment is not a fit, you’ll burn out eventually.

Many people join a startup or a big tech company without a good understanding of what it’s actually like to work there, and often end up disappointed. In this article, I will cover the key differences based on my experience working at companies ranging from a small 10-person startup to big tech giants like Uber and Meta. Hopefully this will help you decide where you want to go.

If you want to skim the article, I am adding a brief summary ("TL;DR" = "Too long, didn’t read") at the end of each section (something I learned at Uber).

Factor #1: How prestigious the company is

Think of a tech company you know. Chances are, you thought of Google, Meta, Amazon, Apple or a similar large company.

Based on these companies’ reputation, most people assume that anyone who works there meets a very high bar for excellence. While that’s not necessarily true (more on that below), this so-called "halo effect" can help you. Once you have the "stamp of approval" from a big tech company on your resume, it is much easier to find a job afterwards.

Many companies think: "If that person is good enough to be a Data Scientist at Google, they will be good enough for us. I’m sure Google did their due diligence".

Coming to the US from Germany, most hiring managers and recruiters didn’t know the companies I used to work for. Once I got a job at Uber, I was flooded with offers, including from companies that had rejected me before.

You might find that unfair, but it’s how the system currently works, and you should consider this when choosing a company to work for.

TL;DR: Working for a prestigious company early in your career can open a lot of doors.

Factor #2: How smart your colleagues are

As mentioned above, people often assume that FAANG companies only hire the best and brightest.

In reality, that’s not the case. One thing I learned over the years is that any place in the world has a normal distribution of skill and talent once it reaches a certain size. The distribution might be slightly offset on the X axis, but it’s a normal distribution nonetheless.

Image by author

Many of the most well-known companies started out being highly selective, but as they grew and ramped up hiring, the level of excellence started reverting to the mean.

Counterintuitively, that means that some small startups have more elite teams than big tech companies because they can afford to hand-pick every single new hire. To be sure, you’ll need to judge the caliber of the people first-hand during the interview process.

TL;DR: You’ll find smart people in both large and small companies; it’s a fallacy that big tech employs higher-caliber people than startups.

Factor #3: How much money you’ll make

How much you’ll earn depends on many factors, including the specific company, the level you’re being offered, how well you negotiate etc.

The main thing to keep in mind: It’s not just about how much you make, but also how volatile and liquid your compensation is. This is affected by the composition of your pay package (salary vs. equity (illiquid private company-stock vs. liquid public company stock)) and the stage of the company.

Here is how you can think about it at a high level:

  • Early-stage: Small startups will offer you lower base salaries and try to make up for that by promising high equity upside. But betting on the equity upside of an early-stage startup is like playing roulette. You might hit it big and never have to work again, but you need to be very lucky; the vast majority of startups fail, and very few turn into unicorns.
  • Big Tech: Compensation in big tech companies, on the other hand, is more predictable. The base salary is higher (e.g. see the O’Reilly 2016 Data Science Salary Survey) and the equity is typically liquid (i.e. you can sell it as soon as it vests) and less volatile. This is a big advantage since in pre-IPO companies you might have to wait years for your equity to actually be worth something.
  • Growth stage: Growth stage companies can be an interesting compromise; they have a much higher chance of exiting successfully, but your equity still has a lot of upside. If you join 2–3 top-tier growth stage companies over the years, there is a good chance you’ll end up with at least one solid financial outcome. Pay in some of these companies can be very competitive; my compensation actually increased when I moved from Meta to Rippling.

TL;DR: Instead of just focusing on salary, choose the pay package that fits your appetite for risk and liquidity needs.

Factor #4: How much risk you’ll take on

We all want job security.

We might not stay in a job for our entire career, but at least we want to be able to choose ourselves when we leave.

Startups are inherently riskier than big companies. Is the founder up to the job? Will you be able to raise another round of financing? Most of these risks are existential; in other words, the earlier the stage of the company you join, the more likely it is that it won’t exist anymore 6–12 months from now.

Image by author

At companies in later stages, some of these risks have already been eliminated or at least reduced.

In exchange, you’re adding another risk, though: Increased layoff risk. Startups only hire for positions that are business critical since they are strapped for cash. If you get hired, you can be sure they really needed another Data Scientist and there is plenty of work for you to do that is considered central to the startup’s success.

In large companies, though, hiring is often less tightly controlled, so there is a higher risk you’ll be hired into a role that is later deemed "non-essential" and you will be part of sweeping layoffs.

TL;DR: The earlier the company stage, the more risk you take on. But even large companies aren’t "safe" anymore (see: layoffs)

Factor #5: What you get to work on

A job at a startup and a large company are very different.

The general rule of thumb is that in earlier-stage companies you’ll have a broader scope. For example, if you join as the first data hire in a startup, you’ll likely act as part Data Engineer, part Data Analyst and part Data Scientist. You’ll need to figure out how to build out the data infrastructure, make data available to business users, define metrics, run experiments, build dashboards, etc.

Your work will also likely range across the entire business, so you might work with Marketing & Sales data one day, and with Customer Support data the next.

In a large company, you’ll have a narrowly defined scope. For example, you might spend most of your time forecasting a certain set of metrics.

The trade-off here is breadth vs. depth & scale: At a startup, your scope is broad, but because you are stretched so thin, you don’t get to go deep on any individual problem. In a large company, you have a narrow scope, but you get to develop deep subject matter expertise in one particular area; if this expertise is in high demand, specializing like this can be a very lucrative path. In addition, anything you do touches millions or even billions of users.

TL;DR: If you want variety, join a startup. If you want to build deep expertise and have impact at scale, join Big Tech. A growth stage company is a good compromise.

Factor #6: What learning opportunities you’ll have

When I joined UberEats in 2018, I didn’t get any onboarding. Instead, I was given a set of problems to solve and asked to get going.

If you are used to learning in a structured way, e.g. through lectures in college, this can be off-putting at first. How are you supposed to know how to do this? Where do you even start?

But in my experience, working on a variety of challenging problems is the best way to learn about how a business works and build out your hard and soft skills. For example, coming out of school my SQL was basic at best, but being thrown into the deep end at UberEats forced me to become good at it within weeks.

The major downside of this is that you don’t learn many best practices. What does a best-in-class data infrastructure look like? How do the best companies design their metrics? How do you execute thousands of experiments in a frictionless way while maintaining rigor? Even if you ultimately want to join a startup, seeing what "good" looks like can be helpful so you know what you’re building towards.

In addition, large companies often have formalized training. Where in a startup you have to figure everything out yourself, big tech companies will typically provide sponsored learning and development offerings.

TL;DR: At early-stage companies you learn by figuring things out yourself, at large companies you learn through formal training and absorbing best practices.

Factor #7: What career growth opportunities you’ll have

We already talked about how working at prestigious companies can help when you’re looking for a new job. But what about your growth within the company?

At an early-stage company, your growth opportunities come as a direct result of the growth of the company. If you join as an early data hire and you and the company are both doing well, it’s likely you’ll get to build out and lead a data team.

Most of the young VPs and C-Level executives you see got there because their Careers were accelerated by joining a "rocket ship" company.

There is a big benefit of larger companies, though: You typically have a broader range of career options. You want to work on a different product? No need to leave the company, just switch teams. You want to move to a different city or country? Probably also possible.

TL;DR: Early-stage, high-growth companies offer the biggest growth opportunities (if the company is successful), but large companies provide flexibility.

Factor #8: How stressed you’ll be

There are many types of stress. It’s important to figure out which ones you can handle, and which ones are deal-breakers for you.

At fast-growing early-stage companies, the main source of stress comes from:

  • Changing priorities: In order to survive, startups need to adapt. The original plan didn’t work out? Let’s try something else. As a result, you can rarely plan longer than a few weeks ahead.
  • Fast pace: Early-stage companies need to move fast; after all, they need to show enough progress to raise another financing round before they run out of money.
  • Broad scope: As mentioned above, everyone in an early-stage company does a lot of things; it’s easy to feel stretched thin. Most of us in the analytics realm like to do things perfectly, but in a startup you rarely get the chance. If it’s good enough for now, move on to the next thing!

In large companies, stress comes from other factors:

  • Complexity: Larger companies come with a lot of complexity. An often convoluted tech stack, lots of established processes, internal tools etc. that you need to understand and learn to leverage. This can feel overwhelming.
  • Politics: At large companies, it can sometimes feel like you’re spending more time debating swim lanes with other teams than doing actual work.

TL;DR: Not all stress is created equal. You need to figure out what type of stress you can deal with and choose your company accordingly.

When should you join a big company vs. a startup?

There is no one-size-fits-all answer to this question. However, my personal opinion is that it helps to do at least one stint at a reputable big tech company early in your career, if possible.

This way, you will:

  • Get pedigree on your resume that will help you get future jobs
  • See what a high-performing data infrastructure and analytics org at scale looks like
  • Get structured onboarding, coaching and development

This will provide you with a solid foundation, whether you want to stay in big tech or jump into the crazy world of startups.

Final Thoughts

Working at a small startup, growth stage company or FAANG tech company is not inherently better or worse. Each company stage has its pros and cons; you need to decide for yourself what you value and what environment is the best fit for you.

For more hands-on advice on how to scale your career in data & analytics, consider following me here on Medium, on LinkedIn or on Substack.

The post Should You Join FAANG or a Startup as a Data Scientist? appeared first on Towards Data Science.

]]>
How to Maximize Your Impact as a Data Scientist https://towardsdatascience.com/how-to-maximize-your-impact-as-a-data-scientist-3881995a9cb1/ Tue, 11 Jun 2024 14:29:02 +0000 https://towardsdatascience.com/how-to-maximize-your-impact-as-a-data-scientist-3881995a9cb1/ Actionable advice to accelerate your career

The post How to Maximize Your Impact as a Data Scientist appeared first on Towards Data Science.

]]>
Image by Author (partially created via Midjourney)

One of the hardest pills to swallow as an Individual Contributor (IC) at work is that nobody cares about the hard work you put in. They don’t even care about your output; they care about the impact you drive.

What’s the difference? Your output is the analysis you deliver, or the lines of code you write. Your impact is the decision your analysis helps the CEO make, or the revenue the new product feature is generating.

Image by author

If you want to establish yourself as a high performer and accelerate your career as a Data Scientist, it’s key to focus on impact.

In this post I’ll go over the following:

  1. Why prioritizing impact matters not just for managers, but also ICs
  2. Why focusing on impact is hard
  3. How to maximize your impact
  4. How to overcome common challenges in driving real impact

Let’s dive in.

Why should I focus on impact; isn’t that my manager’s job?

Of course you can leave it to your manager to worry about impact. But stepping up comes with some real benefits for your career:

  • Reduced frustration & burn-out: Putting a lot of work into a project and then feeling like it didn’t move the needle is one of the most frustrating feelings in any job.
  • Promotions: Promotions are heavily tied to impact. And if you want to become a manager, you’ll need to show that you understand what drives business outcomes and can allocate resources accordingly.
  • Internal opportunities: People around you notice if you are having an outsized impact, and you’ll increase your chances of receiving internal offers. My promotion to Director happened because the CMO noticed my work on the BizOps team and asked me to move into the Marketing org to build out a Strategy & Analytics team.
  • External opportunities: Prospective employers don’t focus on what responsibilities you had, but what your impact was. After all, they are trying to figure out how you can help their business.

Why isn’t everyone doing this?

Because it’s hard.

We are used to thinking about inputs and outputs rather than impact in our daily lives ("I went to the gym" or "I did three loads of laundry") and we carry that mindset over to our jobs.

More importantly, it gives us a sense of control. It’s fully under your control to work hard on the project, and maybe to create the final deliverable, but you can’t guarantee that it will actually move the business forward.

It can also feel like we’re doing someone else’s job. You built the dashboard; now it’s the other team’s problem how they’re going to use it and get value from it. You can definitely take this stance; but don’t you want to see your work move the needle?

Lastly, sometimes it’s unclear what impact even looks like for our role because we feel too disconnected from the business outcomes; I’ll get into this below.

How can I become more impact-focused?

Step 1: Understand what impact looks like for your role and measure your success accordingly

Stop thinking about productivity metrics like "I launched 5 experiments" or "I built this model" and hold yourself accountable to driving impact.

But what does that look like for a Data Scientist? For other roles it’s easy; Account Executives have sales quotas and Growth Marketing Managers have lead generation targets.

But Data Science, at its core, is a function that supports other teams. As a result, there are two levels of impact:

Image by author

Did your work change anything for the better for your business partners? E.g.:

  • Did your analysis change the roll-out strategy of the new product?
  • Did your model improve forecast accuracy?
  • Does your dashboard save the team hours every week that they used to spend on manual data pulls?

Did your work help move the needle on downstream business metrics? E.g.:

  • You’re a Marketing Data Scientist? Assume you’re on the hook for hitting lead and opportunity targets, and improving Marketing efficiency
  • You’re doing Analytics for the Customer Support org? Start obsessing about response times and satisfaction scores.

You don’t have to be solely responsible for something in order to take (partial) credit for it. If you provided the analysis that resulted in a pricing change that saved the company millions, then you deserve part of the credit for that impact.

You might not feel the consequences of missing these downstream targets as immediately as your stakeholders, but since your long-term career trajectory is still tied to driving impact, it helps to adopt this outcome-focused mindset.

Once you start doing this, you’ll notice more inefficiencies you can help address, or new opportunities for growth.

Step 2: Ensure your work solves a real business problem

You’ll likely know this situation: Instead of approaching you with a problem, people ask you for a specific deliverable. An analysis, a model, a dashboard.

If you blindly execute what they ask, you might realize too late that it won’t lead to tangible business impact. Maybe the problem they are trying to solve is not that important in the grand scheme of things, or there is a better way to approach it.

So what can you do?

Act like an owner. Understand the actual problem behind the request, and ask yourself what business priority this supports.

If you are early in your career then your manager should ideally help with this. But don’t rely on this: Managers don’t always do a perfect job, and you’ll be the one to feel the consequences of badly scoped work.

This requires you to understand company level priorities and the priorities of other orgs and teams. Take notes during All Hands meetings etc. to understand the big picture, and get your hands on other team’s planning materials to get an idea of what they’re trying to accomplish in the next 1–2 quarters.

Step 3: Ensure there is buy-in for your work

Even if your work directly supports company-level priorities, you’ll be in for a bad time if key stakeholders are not bought in.

You don’t want to be in a situation where you finish the work and then realize that another team is blocking the implementation because they have concerns you didn’t address. To avoid this, you’ll:

  1. Need to understand whose support you need, and
  2. Get them onboard from the get-go

This is a complex topic in itself; I’ll write a separate deep dive on how to drive alignment and get buy-in from other teams in the near future.

Step 4: Focus your time on the highest-impact thing

No matter what role you’re in, you’re likely juggling multiple priorities. To maximize your impact, you need to ensure you spend the majority of your time on the most important thing.

As with many things, this is easier said than done though, so let’s talk about what that looks like concretely.

Ad-hoc requests vs. strategic work

It’s easy to get caught up in the craziness of daily business only to realize you didn’t make any progress on the big, strategic project you actually care about.

This is all too common; none of us get to sit in our ivory tower and chip away at our projects undisturbed. Plus, ad-hoc work is impactful, too; while it’s less exciting than strategic projects, it’s what keeps the business running.

Still, if you find yourself spending the majority of your time fielding these ad-hoc issues, it’s time to talk to your manager. I’m sure your manager would rather help protect your bandwidth than have you 1) miss your deadlines on your key projects and 2) quit eventually from frustration.

Image by author

Don’t cry over spilled milk

Another common challenge comes from the sunk cost fallacy. You invested a lot of time into a project, but it doesn’t seem to be going anywhere. Maybe you realized the premise didn’t make as much sense as you thought, or the priorities of the business have changed since you started the work.

Instead of talking to your manager and stakeholders about changing the scope of the project or abandoning it altogether, you’re doubling down to get it over the finish line. After all, you don’t want all of your effort to go to waste. Sound familiar?

Economists (and Poker players) figured out a long time ago that this is a dangerous trap. When prioritizing your time, ignore how much effort you already put in and focus on where the next hour of work will yield the most impact.

Things to watch out for ("impact killers")

How do you minimize the odds of wasting time on a project that won’t lead to impact? There are a few warning signs:

  • "Academic" projects: Any time a project is pitched to you along the lines of "This would be interesting to understand" you should be careful; projects that purely improve the understanding of an issue without tying it back to the business are a waste of time and source of frustration in my experience
  • Overly ambitious project scope: At Uber, everyone always wanted to understand what the "best" driver incentive type is. Many people worked on this over the years, but it never led anywhere. There was no simple "one-size-fits-all" answer to this question, and the projects that led to actual impact were much more concrete, tactical optimizations
  • The customer or deliverable are not defined: If it’s not clear who the end user of your work is (are you doing this for your manager, leadership, or another team?), or you’re unsure what exactly you’re supposed to deliver, it should raise a red flag. This is typically a sign that the project needs more scoping work before someone should start running with it

Common Challenges and How to Address Them

We talked about general frameworks to maximize impact. But how do you make actual, specific projects more impactful?

Many times, projects fail close to the finish line. Impact doesn’t materialize automatically, so you need to put in the final bit of work to ensure your work gets adopted. Doing this has an extremely high return on the time you invest since you already did the hard work to produce the deliverable and "only" need to close the loop with stakeholders.

Image by author

To make things more tangible, I am going to go through a few types of common deliverables, touch on where they typically fail to create impact and propose what you can do about it:

1. You create a comprehensive analysis but nobody is acting on it

Problem: This is common with analyses that don’t have a clear recommendation. If you simply outline the data and potential paths forward, you are expecting your audience to do all of the heavy lifting.

Solution: Your work starts adding real value for them once you take that work off their plate. Always give a clear recommendation; you can caveat it and show alternatives in the appendix, but you need to take a stance.

2. You ran an experiment but nobody is using the results

Problem: Many experiments conclude with a metrics read-out by Data Science. More often than not, this is a "metrics dump" with a lot of information, but little interpretation or context.

Solution: Help your business partners interpret the results, and tell them how it affects what they care about.

  • How should they think about the statistical significance or lack thereof?
  • Is the observed lift good compared to other changes you tested and shipped?
  • What is your recommendation for next steps? What does the experiment result mean for this person or team specifically?

Remember, you are the subject matter expert and shouldn’t expect non-analytical audiences to interpret raw experiment data. Telling your stakeholders what the result means for them will increase chances they will act on it.
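To make "interpretation instead of a metrics dump" concrete, here is a minimal sketch (hypothetical numbers) that turns raw experiment counts into a lift estimate, a p-value and a plain-language readout.

```python
import math
from scipy.stats import norm

# Hypothetical A/B test: conversions out of users exposed.
control_conv, control_n = 480, 10_000
variant_conv, variant_n = 540, 10_000

p_c = control_conv / control_n
p_v = variant_conv / variant_n
lift = (p_v - p_c) / p_c

# Two-proportion z-test with a pooled standard error.
p_pool = (control_conv + variant_conv) / (control_n + variant_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / variant_n))
z = (p_v - p_c) / se
p_value = 2 * norm.sf(abs(z))

print(f"Control {p_c:.1%}, Variant {p_v:.1%} -> relative lift {lift:+.1%}")
print(f"p-value: {p_value:.3f}")
print("Readout: directionally positive but not conclusive at the usual 5% level; "
      "weigh the effect size against rollout cost instead of deciding on the p-value alone.")
```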

3. You built a predictive model, but the team you built it for is not using it

Problem: When predictive models don’t get used, it’s often because of a lack of trust in the model output.

ML models themselves tend to be black boxes, and if teams don’t understand how the outputs were generated and whether they are reliable, they are hesitant to rely on them. Even if your model is not using ML and lives in a spreadsheet: If people don’t know how it works, they’ll be suspicious.

Solution: It’s all about involving stakeholders in the process and building trust.

  • Involve stakeholders in the model development from the get-go to get them comfortable and address any concerns early on
  • Demystify the output; for example, you can extract the top model features and explain them
  • Sanity-check predictions and compare them to intuition. For example, if you forecast sales but your model predicts a different seasonality pattern from previous years, you’ll need to be able to explain why, or you’ll lose trust. In my experience, this is more impactful than just sharing performance metrics like the accuracy of the model

Having a structured playbook for how to do this will make your life easier, so I’ll cover this in a separate post in the near future.

4. You created a dashboard but nobody is looking at it

Problem: If a dashboard doesn’t get used, it’s likely one of these things is true:

  1. The dashboard doesn’t directly address an urgent business use case
  2. You didn’t involve your stakeholders along the way (e.g. by sharing mock-ups and drafts for feedback) and the final product is not what they were hoping for
  3. The dashboard is complex and your users don’t understand how to get what they need

Solution: To address #1 and #2, start with user research to understand pain points and potential use cases of the dashboard, and involve your stakeholders during development.

With regards to #3, a simpler dashboard that users are comfortable with beats a more advanced one that doesn’t get used. If you cannot (or don’t want to) simplify the dash further, you’ll need to train your users on the functionality and shadow them to understand any points of friction.

A dashboard is not done when you ship it for the first time, but needs to be improved over time based on users’ needs and feedback.

Closing Thoughts

Focusing on impact is scary since we leave the world of controllable inputs behind, but it’s what ultimately gets you promotions and new job opportunities.

And isn’t it nice when your work actually feels like it moves the needle?

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

The post How to Maximize Your Impact as a Data Scientist appeared first on Towards Data Science.

]]>
The Ultimate Guide to Making Sense of Data https://towardsdatascience.com/the-ultimate-guide-to-making-sense-of-data-aaa121db1119/ Tue, 04 Jun 2024 14:47:49 +0000 https://towardsdatascience.com/the-ultimate-guide-to-making-sense-of-data-aaa121db1119/ Lessons from 10 years at Uber, Meta and High-Growth Startups

The post The Ultimate Guide to Making Sense of Data appeared first on Towards Data Science.

]]>
Data can help you make better decisions.

Unfortunately, most companies are better at collecting data than making sense of it. They claim to have a data-driven culture, but in reality they heavily rely on experience to make judgement calls.

As a Data Scientist, it’s your job to help your business stakeholders understand and interpret the data so they can make more informed decisions.

Your impact comes not from the analyses you do or the models you build, but the ultimate business outcomes you help to drive. This is the main thing that sets apart senior DS from more junior ones.

To help with that, I’ve put together this step-by-step playbook based on my experience turning data into actionable insights at Rippling, Meta and Uber.

I’ll cover the following:

  1. What metrics to track: How to establish the revenue equation and driver tree for your business
  2. How to track: How to set up monitoring and avoid common pitfalls. We’ll cover how to choose the right time horizon, deal with seasonality, master cohorted data and more!
  3. Extracting insights: How to identify issues and opportunities in a structured and repeatable way. We’ll go over the most common types of trends you’ll come across, and how to make sense of them.

Sounds simple enough, but the devil is in the details, so let’s dive into them one-by-one.

Part 1: What metrics to track

First, you need to figure out what Metrics you should be tracking and analyzing. To maximize impact, you should focus on those that actually drive revenue.

Start with the high-level revenue equation (e.g. "Revenue = Impressions * CPM / 1000" for an ads-based business) and then break each part down further to get to the underlying drivers. The exact revenue equation depends on the type of business you’re working on; you can find some of the most common ones here.

The resulting driver tree, with the output at the top and inputs at the bottom, tells you what drives results in the business and what dashboards you need to build so that you can do end-to-end investigations.

Example: Here is a (partial) driver tree for an ads-based B2C product:

Image by author
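To make the mechanics concrete, here is a minimal sketch of how such a driver tree rolls up to revenue; all inputs and the specific breakdown are illustrative assumptions, not real figures.

```python
# Minimal driver-tree sketch for an ads-based business.
# All inputs below are illustrative assumptions, not real figures.

daily_active_users = 2_000_000
sessions_per_dau = 3
impressions_per_session = 12
cpm = 4.50  # $ per 1,000 impressions

# Roll the drivers up the tree to the output metric.
impressions = daily_active_users * sessions_per_dau * impressions_per_session
revenue = impressions * cpm / 1000

print(f"Impressions: {impressions:,.0f}")   # 72,000,000
print(f"Daily revenue: ${revenue:,.0f}")    # $324,000
```

Writing the tree down like this (or in a spreadsheet) makes it obvious which input you would need to move, and by how much, to hit a given revenue target.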

Understanding leading and lagging metrics

The revenue equation might make it seem like the inputs translate immediately into the outputs, but this is not the case in reality.

The most obvious example is a Marketing & Sales funnel: You generate leads, they turn into qualified opportunities, and finally the deal closes. Depending on your business and the type of customer, this can take many months.

In other words, if you are looking at an outcome metric such as revenue, you are often looking at the result of actions you took weeks or months earlier.

As a rule of thumb, the further down you go in your driver tree, the more of a leading indicator a metric is; the further up you go, the more of a lagging metric you’re dealing with.

Quantifying the lag

It’s worth looking at historical conversion windows to understand what degree of lag you are dealing with.

That way, you’ll be better able to work backwards (if you see revenue fluctuations, you’ll know how far back to go to look for the cause) as well as project forward (you’ll know how long it will take until you see the impact of new initiatives).

In my experience, developing rules of thumb (e.g. does it take a day or a month, on average, for a new user to become active?) will get you 80% – 90% of the value, so there is no need to over-engineer this.
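If you want to derive such a rule of thumb from your own data, a minimal sketch could look like the following (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical user-level table; the column names are assumptions.
events = pd.DataFrame({
    "signup_date": pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-10"]),
    "first_active_date": pd.to_datetime(["2024-01-04", "2024-02-01", "2024-01-11"]),
})

# Lag between the leading action (signup) and the lagging outcome (activation).
lag_days = (events["first_active_date"] - events["signup_date"]).dt.days
print(f"Median lag: {lag_days.median():.0f} days, p90: {lag_days.quantile(0.9):.0f} days")
```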

Part 2: Setting up monitoring and avoiding common pitfalls

So you have your driver tree; how do you use this to monitor the performance of the business and extract insights for your stakeholders?

The first step is setting up a dashboard to monitor the key metrics. I am not going to dive into a comparison of the various BI tools you could use (I might do that in a separate post in the future).

Everything I’m talking about in this post can easily be done in Google Sheets or any other tool, so your choice of BI software won’t be a limiting factor.

Instead, I want to focus on a few best practices that will help you make sense of the data and avoid common pitfalls.

1. Choosing the appropriate time frame for each metric

While you want to pick up on trends as early as possible, you need to be careful not to fall into the trap of looking at overly granular data and trying to draw insights from what is mostly noise.

Consider the time horizon of the activities you’re measuring and whether you’re able to act on the data:

  • Real-time data is useful for a B2C marketplace like Uber because 1) transactions have a short lifecycle (an Uber ride is typically requested, accepted and completed within less than an hour) and 2) because Uber has the tools to respond in real-time (e.g. surge pricing, incentives, driver comms).
  • In contrast, in a B2B SaaS business, daily Sales data is going to be noisy and less actionable due to long deal cycles.

You’ll also want to consider the time horizon of the goals you are setting against the metric. If your partner teams have monthly goals, then the default view for these metrics should be monthly.

BUT: The main problem with monthly metrics (or even longer time periods) is that you have few data points to work with and you have to wait a long time until you get an updated view of performance.

One compromise is to plot metrics on a rolling average basis: This way, you will pick up on the latest trends but are removing a lot of the noise by smoothing the data.

Image by author

Example: Looking at the monthly numbers on the left hand side we might conclude that we’re in a solid spot to hit the April target; looking at the 30-day rolling average, however, we notice that revenue generation fell off a cliff (and we should dig into this ASAP).
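A sketch of how you might compute both views with pandas (the data and column names are placeholders):

```python
import pandas as pd

# Hypothetical daily revenue series; replace the constant with your actual data.
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=120, freq="D"),
    "revenue": 1000.0,
}).set_index("date")

monthly = daily["revenue"].resample("MS").sum()            # calendar-month view
rolling_30d = daily["revenue"].rolling(window=30).mean()   # 30-day rolling average

# Plotting both views side by side surfaces recent slowdowns
# that the monthly bars hide until the month is over.
```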

2. Setting benchmarks

In order to derive insights from metrics, you need to be able to put a number into context.

  • The simplest way is to benchmark the metric over time: Is the metric improving or deteriorating? Of course, it’s even better if you have an idea of the exact level you want the metric to be at.
  • If you have an official goal set against the metric, great. But even if you don’t, you can still figure out whether you’re on track or not by deriving implied goals.

Example: Let’s say the Sales team has a monthly quota, but they don’t have an official goal for how much pipeline they need to generate to hit quota.

In this case, you can look at the historical ratio of open pipeline to quota ("Pipeline Coverage"), and use this as your benchmark. Be aware: By doing this, you are implicitly assuming that performance will remain steady (in this case, that the team is converting pipeline to revenue at a steady rate).
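A minimal sketch of deriving such an implied benchmark (all numbers and column names are hypothetical):

```python
import pandas as pd

# Hypothetical history of open pipeline at month start vs. monthly quota.
history = pd.DataFrame({
    "open_pipeline": [300_000, 360_000, 420_000],
    "quota":         [100_000, 120_000, 140_000],
})

# Historical Pipeline Coverage used as the benchmark going forward.
coverage_ratio = (history["open_pipeline"] / history["quota"]).median()  # 3.0x here

current_quota = 150_000
implied_pipeline_goal = coverage_ratio * current_quota
print(f"Implied pipeline goal: ${implied_pipeline_goal:,.0f} ({coverage_ratio:.1f}x coverage)")
```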

3. Accounting for seasonality

In almost any business, you need to account for seasonality to interpret data correctly. In other words, does the metric you’re looking at have repeating patterns by time of day / day of week / time of month / calendar month?

Example: Look at this monthly trend of new ARR in a B2B SaaS business:

Image by author

If you look at the drop in new ARR in July and August in this simple bar chart, you might freak out and start an extensive investigation.

However, if you plot each year on top of each other, you’re able to figure out the seasonality pattern and realize that there is an annual summer lull and you can expect business to pick up again in September:

Image by author

But seasonality doesn’t have to be monthly; it could be that certain weekdays have stronger or weaker performance, or you typically see business picking up towards the end of the month.

Example: Let’s assume you want to look at how the Sales team is doing in the current month (April). It’s the 15th business day of the month and you’ve brought in $26k so far against a goal of $50k. Ignoring seasonality, it looks like the team is going to miss since only 6 business days are left.

However, you know that the team tends to bring a lot of deals over the finish line at the end of the month.

Image by author

In this case, we can plot cumulative sales and compare against prior months to make sense of the pattern. This allows us to see that we’re actually in a solid spot for this time of the month since the trajectory is not linear.
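One way to build that comparison is to cumulate sales by business day of the month and overlay the current month on prior months. A sketch, assuming a hypothetical deals table with a close date and amount:

```python
import numpy as np
import pandas as pd

# Hypothetical table of closed-won deals; column names are assumptions.
deals = pd.DataFrame({
    "close_date": pd.to_datetime(["2024-03-04", "2024-03-28", "2024-04-02", "2024-04-29"]),
    "amount": [10_000, 35_000, 8_000, 40_000],
})

deals["month"] = deals["close_date"].dt.to_period("M")
month_start = deals["close_date"].values.astype("datetime64[M]").astype("datetime64[D]")
close_day = deals["close_date"].values.astype("datetime64[D]")
deals["bday_of_month"] = np.busday_count(month_start, close_day) + 1

# One column per month, cumulative sales by business day of the month.
cumulative = (
    deals.pivot_table(index="bday_of_month", columns="month",
                      values="amount", aggfunc="sum")
         .fillna(0)
         .cumsum()
)
print(cumulative)
```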

4. Dealing with "baking" metrics

One of the most common pitfalls in analyzing metrics is to look at numbers that have not had sufficient time to "bake", i.e. reach their final value.

Here are a few of the most common examples:

  1. User acquisition funnel: You are measuring the conversion from traffic to signups to activation; you don’t know how many of the more recent signups will still convert in the future
  2. Sales funnel: Your average deal cycle lasts multiple months and you do not know how many of your open deals from recent months will still close
  3. Retention: You want to understand how well a given cohort of users is retaining with your business

In all of these cases, the performance of recent cohorts looks worse than it actually is because the data is not complete yet.

If you don’t want to wait, you generally have three options for dealing with this problem:

Option 1: Cut the metric by time period

The most straightforward way is to cut aggregate metrics by time period (e.g. first week conversion, second week conversion etc.). This allows you to get an early read while making the comparison apples-to-apples and avoiding a bias towards older cohorts.

You can then display the result in a cohort heatmap. Here’s an example for an acquisition funnel tracking conversion from signup to first transaction:

Image by author

This way, you can see that on an apples-to-apples basis, our conversion rate is getting worse (our week-1 CVR dropped from > 20% to c. 15% in recent cohorts). By just looking at the aggregate conversion rate (the last column) we wouldn’t have been able to distinguish an actual drop from incomplete data.
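A sketch of how such a cohort view can be assembled (the user-level table and column names are assumptions):

```python
import pandas as pd

# Hypothetical user-level table: signup date and (possibly missing) first transaction date.
users = pd.DataFrame({
    "signup_date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-08", "2024-01-09"]),
    "first_txn_date": pd.to_datetime(["2024-01-05", None, "2024-01-20", "2024-01-10"]),
})

users["cohort_week"] = users["signup_date"].dt.to_period("W")
users["weeks_to_convert"] = (users["first_txn_date"] - users["signup_date"]).dt.days // 7

cohort_size = users.groupby("cohort_week").size()
conversion_heatmap = (
    users.dropna(subset=["weeks_to_convert"])
         .groupby(["cohort_week", "weeks_to_convert"]).size()
         .unstack(fill_value=0)
         .cumsum(axis=1)               # cumulative conversions by weeks since signup
         .div(cohort_size, axis=0)     # rows: signup cohorts, columns: week 0, 1, 2, ...
)
print(conversion_heatmap)
```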

Option 2: Change the metric definition

In some cases, you can change the definition of the metric to avoid looking at incomplete data.

For example, instead of looking at how many deals that entered the pipeline in March closed until now, you could look at how many of the deals that closed in March were won vs. lost. This number will not change over time, while you might have to wait months for the final performance of the March deal cohort.
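In code, the difference is simply which date you group and filter by. A sketch with a hypothetical deals table:

```python
import pandas as pd

# Hypothetical deals table; column names are assumptions.
deals = pd.DataFrame({
    "created_date": pd.to_datetime(["2024-03-05", "2024-03-20", "2024-02-10"]),
    "closed_date":  pd.to_datetime([None, "2024-04-02", "2024-03-15"]),
    "is_won":       [None, 1, 0],   # 1 = won, 0 = lost, None = still open
})

march = pd.Period("2024-03", freq="M")

# "Baking" view: deals *created* in March -- incomplete until every deal has closed.
created_in_march = deals[deals["created_date"].dt.to_period("M") == march]

# Stable view: deals *closed* in March -- final as soon as the month is over.
closed_in_march = deals[deals["closed_date"].dt.to_period("M") == march]
print(f"March win rate (by close date): {closed_in_march['is_won'].mean():.0%}")
```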

Option 3: Forecasting

Based on past data, you can project where the final performance of a cohort will likely end up. The more time passes and the more actual data you gather, the more the forecast will converge to the actual value.

But be careful: It’s easy to get cohort forecasting wrong. E.g. if you’re working in a B2B business with low win rates, a single deal might meaningfully change the performance of a cohort, and forecasting that accurately is very difficult.

Part 3: Extracting insights from the data

All this data is great, but how do we translate this into insights?

You won’t have time to dig into every metric on a regular basis, so prioritize your time by first looking at the biggest gaps and movers:

  • Where are the teams missing their goals? Where do you see unexpected outperformance?
  • Which metrics are tanking? What trends are inverting?

Once you pick a trend of interest, you’ll need to dig in and identify the root cause so your business partners can come up with targeted solutions.

In order to provide structure for your deep dives, I am going to go through the key archetypes of metric trends you will come across and provide tangible examples for each one based on real-life experiences.

1. Net neutral movements

When you see a drastic movement in a metric, first go up the driver tree before going down. This way, you can see if the number actually moves the needle on what you and the team ultimately care about; if it doesn’t, finding the root cause is less urgent.

Image by author

Example scenario: In the image above, you see that the visit-to-signup conversion on your website dropped massively. Instead of panicking, you look at total signups and see that the number is steady.

It turns out that the drop in average conversion rate is caused by a spike in low-quality traffic to the site; the performance of your "core" traffic is unchanged.

2. Denominator vs. numerator

When dealing with changes to ratio metrics (impressions per active user, trips per rideshare driver etc.), first check if it’s the numerator or denominator that moved.

People tend to assume it’s the numerator that moved because that is typically the engagement or productivity metric we are trying to grow in the short-term. However, there are many cases where that’s not true.

Examples include:

  • You see leads per Sales rep go down because the team just onboarded a new class of hires, not because you have a demand generation problem
  • Trips per Uber driver per hour drop not because you have fewer requests from riders, but because the team increased incentives and more drivers are online
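To make this check concrete, here is a quick sketch that decomposes the change in a ratio metric by holding one input fixed at a time (the numbers are illustrative):

```python
# Illustrative example: trips per online driver-hour, last week vs. this week.
trips_prev, hours_prev = 50_000, 25_000
trips_curr, hours_curr = 52_000, 30_000

ratio_prev = trips_prev / hours_prev  # 2.00
ratio_curr = trips_curr / hours_curr  # ~1.73

# Hold one input fixed at a time to see which side drove the move.
numerator_only = trips_curr / hours_prev    # 2.08 -> trips actually went *up*
denominator_only = trips_prev / hours_curr  # ~1.67 -> the extra driver-hours drove the drop
```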

3. Isolated / Concentrated Trends

Many metric trends are driven by things that are happening only in a specific part of the product or the business and aggregate numbers don’t tell the whole story.

The general diagnosis flow for isolating the root cause looks like this:

Step 1: Keep decomposing the metrics until you isolate the trend or can’t break the metrics down further.

Similar to how in mathematics every number can be broken down into a set of prime numbers, every metric can be broken down further and further until you reach the fundamental inputs.

By doing this, you are able to isolate the issue to a specific part of your driver tree which makes it much easier to pinpoint what’s going on and what the appropriate response is.

Step 2: Segment the data to isolate the relevant trend

Through segmentation you can figure out if a specific area of the business is the culprit. By segmenting across the following dimensions, you should be able to catch > 90% of issues:

  • Geography (region / country / city)
  • Time (time of month, day of week, etc.)
  • Product (different SKUs or product surfaces, e.g. Instagram Feed vs. Reels)
  • User or customer demographics (age, gender, etc.)
  • Individual entity / actor (e.g. sales rep, merchant, user)

Let’s look at a concrete example:

Let’s say you work at DoorDash and see that the number of completed deliveries in Boston went down week-over-week. Instead of brainstorming ideas to drive demand or increase completion rates, let’s try to isolate the issue so we can develop more targeted solutions.

The first step is to decompose the metric "Completed Deliveries":

Image by author

Based on this driver tree, we can rule out the demand side. Instead, we see that we have recently been struggling to find couriers to pick up the orders (rather than issues in the restaurant <> courier handoff or the food drop-off).

Lastly, we’ll check if this is a widespread issue or not. In this case, some of the most promising cuts would be to look at geography, time and merchant. The merchant data shows that the issue is widespread and affects many restaurants, so it doesn’t help us narrow things down.

However, when we create a heatmap of time and geography for the metric "delivery requests with no couriers found", we find that we’re mostly affected in the outskirts of Boston at night:

Image by author
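A sketch of how you might build that cut with pandas (the request-level table and column names are assumptions):

```python
import pandas as pd

# Hypothetical delivery-request table with an outcome flag; names are assumptions.
requests = pd.DataFrame({
    "zone":             ["Downtown", "Outskirts", "Outskirts", "Downtown"],
    "hour_of_day":      [12, 22, 23, 22],
    "no_courier_found": [0, 1, 1, 0],
})

heatmap = requests.pivot_table(
    index="zone",
    columns="hour_of_day",
    values="no_courier_found",
    aggfunc="mean",   # share of requests where no courier was found
)
print(heatmap)
```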

What do we do with this information? Being able to pinpoint the issue like this allows us to deploy targeted courier acquisition efforts and incentives in these times and places rather than peanut-buttering them across Boston.

In other words, isolating the root cause allows us to deploy our resources more efficiently.

Other examples of concentrated trends you might come across:

  • Most of the in-game purchases in an online game are made by a few "whales" (so the team will want to focus their retention and engagement efforts on these)
  • The majority of support ticket escalations to Engineering are caused by a handful of support reps (giving the company a targeted lever to free up Eng time by training these reps)

4. Mix Shifts

One of the most common sources of confusion in diagnosing performance comes from mix shifts and Simpson’s Paradox.

Mix shifts are simply changes in the composition of a total population. Simpson’s Paradox describes the counterintuitive effect where a trend that you see in the total population disappears or reverses when looking at the subcomponents (or vice versa).

What does that look like in practice?

Let’s say you work at YouTube (or any other company running ads for that matter). You see revenue is declining and when digging into the data, you notice that CPMs have been decreasing for a while.

CPM as a metric cannot be decomposed any further, so you start segmenting the data, but you have trouble identifying the root cause. For example, CPMs across all geographies look stable:

Image by author

Here is where the mix shift and Simpson’s Paradox come in: Each individual region’s CPM is unchanged, but if you look at the composition of impressions by region, you find that the mix is shifting from the US to APAC.

Since APAC has a lower CPM than the US, the aggregate CPM is decreasing.

Image by author
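Here is a small numeric sketch of the effect (with illustrative numbers): each region’s CPM is flat across quarters, yet the blended CPM declines because impressions shift toward the lower-CPM region.

```python
import pandas as pd

# Illustrative numbers: per-region CPMs are identical in both quarters,
# but the impression mix shifts toward the lower-CPM region.
data = pd.DataFrame({
    "quarter":     ["Q1", "Q1", "Q2", "Q2"],
    "region":      ["US", "APAC", "US", "APAC"],
    "cpm":         [10.0, 2.0, 10.0, 2.0],   # $ per 1,000 impressions
    "impressions": [800, 200, 500, 500],     # in millions
})

data["revenue"] = data["impressions"] * data["cpm"] / 1000
totals = data.groupby("quarter")[["revenue", "impressions"]].sum()
blended_cpm = 1000 * totals["revenue"] / totals["impressions"]
print(blended_cpm)  # Q1: 8.4, Q2: 6.0 -- blended CPM drops although no region changed
```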

Again, knowing the exact root cause allows a more tailored response. Based on this data, the team can either try to reignite growth in high-CPM regions, think about additional monetization options for APAC, or focus on making up the lower value of individual impressions through outsized growth in impressions volume in the large APAC market.

Final Thoughts

Remember, data in itself does not have value. It becomes valuable once you use it to generate insights or recommendations for users or internal stakeholders.

By following a structured framework, you’ll be able to reliably identify the relevant trends in the data, and by following the tips above, you can distinguish signal from noise and avoid drawing the wrong conclusions.

If you are interested in more content like this, consider following me here on Medium, on LinkedIn or on Substack.

The post The Ultimate Guide to Making Sense of Data appeared first on Towards Data Science.

]]>
What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics https://towardsdatascience.com/what-10-years-at-uber-meta-and-startups-taught-me-about-data-analytics-fd948b912556/ Thu, 30 May 2024 19:00:44 +0000 https://towardsdatascience.com/what-10-years-at-uber-meta-and-startups-taught-me-about-data-analytics-fd948b912556/ Advice for Data Scientists and Managers

The post What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics appeared first on Towards Data Science.

]]>
Image by Author (generated via Midjourney)

Over the last 10 years, I have worked in analytical roles in a number of companies, from a small Fintech startup in Germany to high-growth pre-IPO scale-ups (Rippling) and big tech companies (Uber, Meta).

Each company had a unique data culture and each role came with its own challenges and a set of hard-earned lessons. Below, you’ll find ten of my key learnings over the last decade, many of which I’ve found to hold true regardless of company stage, product or business model.

1. You need to tell a story with data.

Think about who your audience is.

If you work in a research-focused organization or you are mostly presenting to technical stakeholders (e.g. Engineering), an academic "white paper"-style analysis might be the way to go.

But if your audience is non-technical business teams or executives, you’ll want to make sure you are focusing on the key insights rather than getting into the technical details, and are connecting your work to the business decisions it is supposed to influence. If you go too deep into the technicalities of the analysis, you’ll lose your audience; communication in the workplace is not about what you find interesting to share, but what the audience needs to hear.

The most well-known approach for this type of insights-led, top-down communication is the Pyramid Principle developed by McKinsey consultant Barbara Minto. Check out this recent TDS article on how to leverage it to communicate better as a DS.

2. Strong business acumen is the biggest differentiator between good and great data scientists.

If you are a Senior DS at a company with a high bar, you can expect all of your peers to have strong technical skills.

You won’t stand out by incrementally improving your technical skillset, but rather by ensuring your work is driving maximum impact for your stakeholders (e.g. Product, Engineering, Biz teams).

This is where Business Acumen comes into play: In order to maximize your impact, you need to 1) deeply understand the priorities of the business and the problems your stakeholders are facing, 2) scope analytics solutions that directly help those priorities or address those problems, and 3) communicate your insights and recommendations in a way that your audience understands them (see #1 above).

With strong Business Acumen, you’ll also be able to sanity check your work since you’ll have the business context and judgment to understand whether the result of your analysis, or your proposal, makes sense or not.

Business Acumen is not something that is taught in school or DS bootcamps; so how do you develop it? Here are a few concrete things you can do:

  1. Pay attention in the Company All Hands and other cross-team meetings when strategic priorities are discussed
  2. Practice connecting these priorities to your team’s work; during planning cycles or when new projects come up, ask yourself: "How does this relate to the high-level business priorities?" If you can’t make the connection, discuss this with your manager
  3. When you are doing an analysis, always ask yourself "So what?". A data point or insight only becomes relevant and impactful once you can answer this question and articulate why anyone should care about it. What should they be doing differently based on this data?

The ultimate goal here is to transition from taking requests and working on inbound JIRA tickets to being a thought partner of your stakeholders that shapes the analytics roadmap in partnership with them.

3. Be an objective truth seeker

Many people cherry pick data to fit their narrative. This makes sense: Most organizations reward people for hitting their goals, not for being the most objective.

As a Data Scientist, you have the luxury to push back against this. Data Science teams typically don’t directly own business metrics and are therefore under less pressure to hit short-term goals compared to teams like Sales.

Stakeholders will sometimes pressure you to find data that supports a narrative they have already created in advance. While playing along with this might score you some points in the near term, what will help you in the long term is being a truth seeker and promoting the narrative that the data truly supports.

Image by Author (created via Midjourney)

Even if it is uncomfortable in the moment (as you might be pushing a narrative people don’t want to hear), it will help you stand out and position you as someone that executives will approach when they need an unfiltered and unbiased view on what’s really going on.

4. Data + Primary Research = ❤

Data people often frown at "anecdotal evidence", but it’s a necessary complement to rigorous quantitative analysis.

Running experiments and analyzing large datasets can give you statistically significant insights, but you often miss out on signals that either haven’t reached a large enough scale yet to show up in your data or that are not picked up well by structured data.

Diving into closed-lost deal notes, talking to customers, reading support tickets etc. is sometimes the only way to uncover certain issues (or truly understand root causes).

For example, let’s say you work in a B2B SaaS business. You might see in the data that win rates for your Enterprise deals are declining, and maybe you can even narrow it down to a certain type of customer.

But to truly understand what’s going on, you’ll have to talk to Sales representatives, dig into their deal notes, talk to prospects, etc. In the beginning, this will seem like random anecdotes and noise, but after a while a pattern will start to emerge; and odds are, that pattern did not show up in any of the standardized metrics you are tracking.

5. If the data looks too good to be true, it usually is

When people see a steep uptick in a metric, they tend to get excited and attribute this movement to something they did, e.g. a recent feature launch.

Unfortunately, when a metric change seems suspiciously positive, it is often because of data issues or one-off effects. For example:

  • Data is incomplete for recent periods, and the metric will level out once all data points are in
  • There is a one-time tailwind that won’t sustain (e.g. you see a boost in Sales in early January; instead of a sustained improvement to Sales performance, it’s just the backlog from the holiday period that is clearing up)

Don’t get carried away by the excitement about an uptick in metrics. You need a healthy dose of skepticism, curiosity and experience to avoid pitfalls and generate robust insights.

6. Be open to changing your mind

If you work with data, it’s natural to change your opinion on a regular basis. For example:

  • You recommended a course of action to an executive, but have lost faith that it’s the right path forward since you got more data
  • You interpreted a metric movement a certain way, but you ran an additional analysis and now you think something else is going on

However, most analytical people are hesitant to walk back on statements they made in the past out of fear of looking incompetent or angering stakeholders.

That’s understandable; changing your recommendation typically means additional work for stakeholders to adjust to the new reality, and there is a risk they’ll be annoyed as a result.

Still, you shouldn’t stick to a prior recommendation simply out of fear of losing face. You won’t be able to do a good job defending an opinion once you’ve lost faith in it. Leaders like Jeff Bezos recognize the importance of changing your mind when confronted with new information, or simply when you’ve looked at an issue from a different angle. As long as you can clearly articulate why your recommendation changed, it is a sign of strength and intellectual rigor, not weakness.

Changing your mind a lot is so important. You should never let anyone trap you with anything you’ve said in the past. – Jeff Bezos

7. You need to be pragmatic

When working in the Analytics realm, it’s easy to develop perfectionism. You’ve been trained on scientific methods, and pride yourself in knowing the ideal way to approach an analysis or experiment.

Unfortunately, the reality of running a business often puts severe constraints in our way. We need an answer faster than the experiment can provide statistically significant results, we don’t have enough users for a proper unbiased split, or our dataset doesn’t go back far enough to establish the time series pattern we’d like to look at.

It’s your job to help the teams running the business (those shipping the products, closing the deals etc.) get things done. If you insist on the perfect approach, it’s likely the business just moves on without you and your insights.

As with many things, done is better than perfect.

8. Don’t burn out your Data Scientists with ad-hoc requests

Hiring full-stack data scientists to mostly build dashboards or do ad-hoc data pulls & investigations all day is a surefire way to burn them out and cause churn on the team.

Many companies, especially high-growth startups, are hesitant to hire Data Analysts or BI folks specifically dedicated to metric investigations and dashboard building. Headcount is limited, and managers want flexibility in what their teams can tackle, so they hire well-rounded Data Scientists and plan to give them the occasional dashboarding task or metrics investigation request.

In practice, however, this often balloons out of proportion and DS spend a disproportionate amount of time on these tasks. They get drowned in Slack pings that pull them out of their focused work, and "quick asks" (that are never as quick as they initially seem) add up to fill entire days, making it difficult to make progress on larger strategic projects in parallel.

Luckily, there are solutions to this:

  1. Implement an AI chatbot that can field straightforward data questions
  2. Train relevant teams on basic SQL (at least 1–2 analysts per team) to make them more independent. With the Snowflake SQL AI Assistant or Gemini assistance in BigQuery, extensive SQL syntax knowledge is not strictly required anymore to pull data and generate insights
  3. Use self-serve BI tools that give users autonomy and flexibility in getting the insights they need. There has been a ton of progress in recent years, and tools like Omni are getting us closer to a world where self-serve analytics are a reality

9. Not everything needs a fancy Tableau dashboard

Companies tend to see it as a sign of a mature, strong data culture when data is pulled out of spreadsheets into BI solutions.

While dashboards that are heavily used by many stakeholders across the organization and serve as the basis for critical, hard-to-reverse decisions should live in a governed BI tool like Tableau, there are many cases where Google Sheets gets you what you need and gets you there much faster, without the need to scope and build a robust dashboard over the course of days or weeks.

The truth is, teams will always leverage analytics capabilities of the software they use day-to-day (e.g. Salesforce) as well as spreadsheets because they need to move fast. Encouraging this type of nimble, decentralized analytics rather than forcing everything through the bottleneck of a BI tool allows you to preserve the resources of Data Science teams (see #8 above) and equip the teams with what they need to succeed (basic SQL training, data modeling and visualization best practices etc.).

10. Having perfectly standardized metrics across the entire company is a pipe dream

As discussed under #9 above, teams across the company will always unblock themselves by doing hacky analytics outside of BI tools, making it hard to enforce a shared data model. Especially in fast-growing startups, it’s impossible to enforce perfect governance if you want to ensure teams can still move fast and get things done.

While it gives many Data Scientists nightmares when metric definitions don’t match, in practice it’s not the end of the world. More often than not, differences between numbers are small enough that they don’t change the overall narrative or the resulting recommendation.

As long as critical reports (anything that goes into production, to Wall Street etc.) are handled in a rigorous fashion and adhere to standardized definitions, it’s okay that data is slightly messy across the company (even if it feels uncomfortable).

Final Thoughts

Some of the points above will feel uncomfortable at first (e.g. pushing back on cherry-picked narratives, taking a pragmatic approach rather than pursuing perfection etc.). But in the long run, you’ll find that it will help you stand out and establish yourself as a true thought partner.

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

The post What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics appeared first on Towards Data Science.

]]>