Mastering Back-of-the-Envelope Math Will Make You a Better Data Scientist

A quick and dirty answer is often more helpful than a fancy model

On July 16, 1945, during the first nuclear bomb test conducted at Los Alamos, physicist Enrico Fermi dropped small pieces of paper and observed how far they moved when the blast wave reached him.

Based on this, he estimated the approximate magnitude of the yield of the bomb. No fancy equipment or rigorous measurements; just some directional data and logical reasoning.

About 40 seconds after the explosion the air blast reached me. I tried to estimate its strength by dropping from about six feet small pieces of paper before, during and after the passage of the blast wave. […] I estimated to correspond to the blast that would be produced by ten thousand tons of T.N.T. — Enrico Fermi

This estimate turned out to be remarkably accurate considering how it was produced.

We’re forced to do quick-and-dirty approximations all the time. Sometimes we don’t have the data we need for a rigorous analysis, other times we simply have very little time to provide an answer.

Unfortunately, estimates didn’t come naturally to me. As a recovering perfectionist, I wanted to make my analyses as robust as possible. If I’m wrong and I took a quick-and-dirty approach, wouldn’t that make me look careless or incapable?

But over time, I realized that making a model more and more complex rarely leads to better decisions.

Why?

  1. Most decisions don’t require a hyper-accurate analysis; being in the right ballpark is sufficient
  2. The more complex you make the model, the more assumptions you layer on top of each other. Errors compound, and it becomes harder to make sense of the whole thing

Napkin math, back-of-the-envelope calculations: Whatever you want to call it, it’s how management consultants and BizOps folks cut through complexity and get to robust recommendations quickly.

And all they need is structured thinking and a spreadsheet.

My goal with this article is to make this incredibly useful technique accessible to everyone.

In this article, I will cover:

  • How to figure out how accurate your analysis needs to be
  • How to create estimates that are "accurate enough"
  • How to get people comfortable with your estimates

Let’s get into it.


Part 1: How accurate do you need to be?

Most decisions businesses make don’t require a high-precision analysis.

We’re typically trying to figure out one of four things:

Scenario 1: Can we clear a minimum bar?

Often, we only need to know if something is going to be better / larger / more profitable than X.

For example, large corporations are only interested in working on things that can move the needle on their top or bottom line. Meta does over $100B in annual revenue, so any new initiative that doesn’t have the potential to grow to a multi-billion $ business eventually is not going to get much attention.

Once you start putting together a simple back-of-the-envelope calculation, you’ll quickly realize whether your projections land in the tens of millions, hundreds of millions, or billions.

If your initial estimate is way below the bar, there is no point in refining it; the exact answer doesn’t matter at that point.

Other examples:

  • VCs trying to understand if the market opportunity for a startup is big enough to grow into a unicorn
  • You’re considering joining an early-stage company and are trying to understand if it can ever grow into its high valuation (e.g. AI or autonomous driving companies)

Scenario 2: Can we stay below a certain level?

This scenario is the inverse of the one above.

For example, let’s say the CMO is considering attending a big industry conference last minute. He is asking whether the team will be able to pull together all the necessary pieces (e.g. a booth, supporting Marketing campaigns etc.) in time and within a budget of $X million.

To give the CMO an answer, it’s not that important by when exactly you’ll have all of this ready, or how much exactly this will cost. At the moment, he just needs to know whether it’s possible so that he can secure a slot for your company at the conference.

The key here is to use very conservative assumptions. If you can meet the timeline and budget even if things don’t go smoothly, you can confidently give the green light (and then work on a more detailed, realistic plan).

Other examples:

  • Your manager wants to know if you have bandwidth to take on another project
  • You are setting a Service Level Agreement (SLA) with a customer (e.g. for customer support response times)

Scenario 3: How do we stack-rank things?

Sometimes, you’re just trying to understand if thing A is better than thing B; you don’t necessarily need to know exactly how good thing A is.

For example, let’s say you’re trying to allocate Engineering resources across different initiatives. What matters more than the exact impact of each project is the relative ranking.

As a result, your focus should be on making sure that the assumptions you’re making are accurate on a relative level (e.g. is Eng effort for initiative A higher or lower than for initiative B) and the methodology is consistent to allow for a fair comparison.

Other examples:

  • You’re trying to decide which country you should expand into next
  • You want to understand which Marketing channel you should allocate additional funds to

Scenario 4: What’s our (best) estimate?

Of course, there are cases where the actual number of your estimate matters.

For example, if you are asked to forecast the expected support ticket volume so that the Customer Support team can staff accordingly, your estimate will be used as a direct input to the staffing calculation.

In these cases, you need to understand 1) how sensitive the decision is to your analysis, and 2) whether it’s better if your estimate is too high or too low.

  • Sensitivity: Sticking with the staffing example, you might find that a support agent can solve 50 tickets per day. So it doesn’t matter if your estimate is off by a handful of tickets; only once you’re off by 50 tickets or more does the team have to staff one agent more or fewer (see the quick sketch after this list).
  • Too high or too low: It matters in which direction your estimate is wrong. In the above example, being understaffed or overstaffed has different costs to the business. Check out my previous post on the cost of being wrong for a deep dive on this.
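Here’s the quick sketch referenced above; the daily ticket forecast is a hypothetical number:

```python
import math

tickets_per_agent_per_day = 50
ticket_forecast = 475          # hypothetical daily ticket estimate

agents_needed = math.ceil(ticket_forecast / tickets_per_agent_per_day)  # -> 10 agents
# Staffing only changes once the estimate crosses a 50-ticket bucket boundary,
# so being off by a couple dozen tickets doesn't affect the answer at all.
print(agents_needed)
```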

Part 2: How to create estimates that are "accurate enough"

You know how accurate you need to be – great. But how do you actually create your estimate?

You can follow these steps to make your estimate as robust as possible while minimizing the amount of time you spend on it:

Step 1: Building a structure

Let’s say you work at Netflix and want to figure out how much money you could make from adding games to the platform (if you monetized them through ads).

How do you structure your estimate?

The first step is to decompose the metric into a driver tree, and the second step is to segment.

Developing a driver tree

At the top of your driver tree you have "Games revenue per day". But how do you break out the driver tree further?

There are two key considerations:

1. Pick metrics you can find data for.

For example, the games industry uses standardized metrics to report on monetization, and if you deviate from them, you might have trouble finding benchmarks (more on benchmarks below).

2. Pick metrics that minimize confounding factors.

For example, you could break revenue into "# of users" and "Average revenue per user". The problem is that this doesn’t consider how much time users spend in the game.

To address this issue, we could split revenue out into "Hours played" and "$ per hour played" instead; this ensures that any difference in engagement between your games and "traditional" games does not affect the results.

You can then break out each metric further, e.g.:

  • "$ per hour played" could be calculated as "# ad impressions per hour" times "$ per ad impression"
  • "Hours played" could be broken out into "Daily Active Users (DAU)" and "Hours per DAU"

However, adding more detail is not always beneficial (more on that below).
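To make the arithmetic concrete, here is a minimal sketch of how such a driver tree rolls up into the top-line number; every input below is a made-up placeholder, not actual Netflix data:

```python
# Hypothetical driver-tree inputs (illustrative placeholders only)
dau = 30_000_000                 # Daily Active Users playing games
hours_per_dau = 0.3              # Hours played per DAU per day
ad_impressions_per_hour = 10     # "# ad impressions per hour"
revenue_per_impression = 0.01    # "$ per ad impression" (i.e. a $10 CPM)

# Roll the drivers up the tree
hours_played = dau * hours_per_dau
revenue_per_hour_played = ad_impressions_per_hour * revenue_per_impression
games_revenue_per_day = hours_played * revenue_per_hour_played

print(f"Games revenue per day: ${games_revenue_per_day:,.0f}")
```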

Segmentation

In order to get a useful estimate, you need to consider the key dimensions that affect how much revenue you’ll be able to generate.

For example, Netflix is active in dozens of countries with vastly different monetization potential and to account for this, you can split the analysis by region.

Which dimensions are helpful in getting a more accurate estimate depends on the exact use case, but here are a few common ones to consider:

  • Geography
  • User demographics (age, device, etc.)
  • Revenue stream (e.g. ads vs. subscriptions vs. transactions)

"Okay, great, but how do I know when segmentation makes sense?"

There are two conditions that need to be true for a segmentation to be useful:

  1. The segments are very different (e.g. revenue per user in APAC is only a fraction of what it is in the US)
  2. You have enough information to make informed assumptions for each segment

You also need to make sure the segmentation is worth the effort. In practice, you’ll often find that only one or two metrics are materially different between segments.

Here’s what you can do in that case to get a quick-and-dirty answer:

Instead of creating multiple separate estimates, you can calculate a blended average for the metric that has the biggest variance across segments.

So if you expect "$ per hour played" to vary substantially across regions, you 1) make an assumption for this metric for each region (e.g. by getting benchmarks, see below) and 2) estimate what the country mix will be:

Image by author

You then use that number for your estimate, eliminating the need to segment.
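Here is a minimal sketch of that blended-average shortcut; the regional values and the country mix are hypothetical placeholders:

```python
# Hypothetical "$ per hour played" by region and expected share of play time
revenue_per_hour = {"US/Canada": 0.15, "EMEA": 0.10, "LATAM": 0.05, "APAC": 0.04}
country_mix      = {"US/Canada": 0.35, "EMEA": 0.30, "LATAM": 0.20, "APAC": 0.15}

# One weighted (blended) average replaces four separate regional estimates
blended = sum(revenue_per_hour[r] * country_mix[r] for r in revenue_per_hour)
print(f"Blended $ per hour played: ${blended:.3f}")
```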

How detailed should you get?

If you have solid data to base your assumptions on, adding more detail to your analysis can improve the accuracy of your estimate; but only up to a point.

Besides increasing the effort required for the analysis, adding more detail can result in false precision.

Image by author

So what falls into the "too much detail" bucket? For the sake of a quick and dirty estimation, this would include things like:

  • Segmenting by device type (Smart TV vs. Android vs. iOS)
  • Considering different engagement levels by day of week
  • Splitting out CPMs by industry
  • Modeling the impact of individual games
  • etc.

Adding this level of detail would increase the number of assumptions exponentially without necessarily making the estimate more accurate.

Step 2: Putting numbers against each metric

Now that you have the inputs to your estimate laid out, it’s time to start putting numbers against them.

Internal data

If you ran an experiment (e.g. you rolled out a prototype for "Netflix games" to some users) and you have results you can use for your estimate, great. But a lot of the time, that’s not the case.

When that happens, you have to get creative. For example, let’s say that to estimate our DAU for games, we want to understand how many Netflix users might see and click on the games module in their feed.

To do this, you can compare it against other launches with similar entry points:

  • What other new additions to the home screen did you launch recently?
  • How did their performance differ depending on their location (e.g. the first "row" at the top of the screen vs. "below the fold" where you have to scroll to find it)?

Based on the last few launches, you can then triangulate the expected click-through-rate for games:

Image by author

These kinds of relationships are often close enough to linear (within a reasonable range) that this type of approximation yields useful results.
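As a rough sketch of that triangulation, you could interpolate between prior launches; the launch data and module positions below are made up for illustration:

```python
import numpy as np

# Hypothetical click-through rates from past home-screen launches,
# indexed by how far down the screen the module appeared (row 1 = top)
past_rows = np.array([1, 3, 6])
past_ctr  = np.array([0.12, 0.07, 0.03])

# If the games module will sit in row 4, interpolate between nearby launches
games_row = 4
estimated_ctr = np.interp(games_row, past_rows, past_ctr)
print(f"Estimated games CTR: {estimated_ctr:.1%}")  # ~5.7%
```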

Once you get some actual data from an experiment or the launch, you can refine your assumptions.

External benchmarks

External benchmarks (e.g. industry reports, data vendors) can be helpful to get the right ballpark for a number if internal data is unavailable.

There are a few key considerations:

  1. Pick the closest comparison. For example, casual games on Netflix are closer to mobile games than to PC or console games, so pick benchmarks accordingly
  2. Make sure your metric definitions are aligned. Just because a metric in an external report sounds similar doesn’t mean it’s identical to your metric. For example, many companies define "Daily Active Users" differently.
  3. Choose reputable, transparent sources. If you search for benchmarks, you will come across a lot of different sources. Always try to find an original source that uses (and discloses!) a solid methodology (e.g. actual data from a platform rather than surveys). Bonus points if the report is updated regularly so that you can refresh your estimate in the future if necessary.

Deciding on a number

After looking at internal and external data from different sources, you will likely have a range of numbers to choose from for each metric.

Take a look at how wide the range is; this will show you which inputs move the needle on the answer the most.

For example, you might find that the CPM benchmarks from different reports are very similar, but there is a very wide range for how much time users might spend playing your games on a daily basis.

In this case, your focus should be on fine-tuning the "hours played" assumption:

  1. If there is a minimum amount of revenue the business wants to see to invest in games, see if you can reach that level with the most conservative assumption
  2. If there is no minimum threshold, try to use sanity checks to determine a realistic level.

For example, you could compare the play time you’re projecting for games against the total time users currently spend on Netflix.

Even if some of the time is incremental, it’s unrealistic that more than, say, 5% – 10% of the total time is spent on games (most of the users came to Netflix for video content, and there are better gaming offerings out there, after all).


Part 3: How to get people comfortable with your estimates

If you’re doing a quick-and-dirty estimate, people don’t expect it to be perfectly accurate.

However, they still want to understand what it would take for the numbers to be so different that they would lead to a different decision or recommendation.

A good way to visualize this is a sensitivity table.

Let’s say the business wants to reach at least $500k in ad revenue per day to even think about launching games. How likely are you to reach this?

On the X and Y axis of the table, you put the two input metrics that you feel least sure about (e.g. "Daily Active Users (DAU)" and "Time Spent per DAU"); the values in the table represent the number you’re estimating (in this case, "Games revenue per day").

Image by author

You can then compare your best estimate against the minimum requirement of the business; for example, if you’re estimating 30M DAU and 0.3 hours of play time per DAU, you have a comfortable buffer to be wrong on either assumption.
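Here is a minimal sketch of how such a sensitivity table could be built with pandas; the DAU and time-spent grids and the blended monetization number are placeholder assumptions:

```python
import pandas as pd

revenue_per_hour = 0.10                     # assumed blended "$ per hour played"
dau_values = [10e6, 20e6, 30e6, 40e6]       # Daily Active Users
hours_values = [0.1, 0.2, 0.3, 0.4]         # Time spent per DAU (hours per day)

# Each cell: daily games revenue for that DAU / time-spent combination
table = pd.DataFrame(
    [[dau * hrs * revenue_per_hour for hrs in hours_values] for dau in dau_values],
    index=[f"{int(dau / 1e6)}M DAU" for dau in dau_values],
    columns=[f"{hrs} hrs/DAU" for hrs in hours_values],
)
print(table.round(0))  # compare each cell against the $500k/day threshold
```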

Closing thoughts

While it’s called napkin math, three lines scribbled on a cocktail napkin are rarely enough for a solid estimate.

However, you also don’t need a full-blown 20-tab model to get a directional answer; and often, that directional answer is all you need to move forward.

Once you get comfortable with rough estimates, they allow you to move faster than others who are still stuck in analysis paralysis. And with the time you save, you can tackle another project – or go home and do something else.


For more hands-on Analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

There’s a Right Way to Be Wrong

How to make better predictions by incorporating business context and the cost of being wrong

Image by author (via Midjourney)

One day, my mother had vision problems in her left eye and went to the doctor. The doctor did a quick examination and concluded there was nothing wrong; the eyes were just getting worse with old age.

Shortly after, the symptoms got worse and my mom got a second opinion. Turns out she had a retinal detachment, and the delay in treatment caused permanent damage.

People make mistakes at work all the time; you can’t be right in 100% of the cases. But some mistakes are costlier than others and we need to account for that.

If the doctor had said "there might be something there" and sent my mother for further tests, there would have been a chance they would all come back negative and it was nothing after all. But the cost of being wrong in that case would have only been a waste of time and some medical resources, not permanent damage to an organ.

The medical field is an extreme example, but the same logic applies in jobs like Data Science, BizOps, Marketing or Product as well:

We should take the consequences, or cost, of being wrong into account when making predictions.

For example, if you work at Uber and are trying to predict demand (which you’re never going to do 100% accurately), would you rather end up with too many or too few drivers in the marketplace?

Unfortunately, in my experience, these conversations between business stakeholders and data scientists rarely happen. Let’s try to change that.

In this post, I will cover:

  • The different ways we make wrong predictions
  • How to make sure you are wrong the "right" way
  • 4 Real-life examples to get you thinking about how you want to be wrong


The different ways we are wrong

When we make predictions, we are typically either trying to:

  • Predict a category or outcome (e.g. users that will churn vs. those that won’t); this is called "classification"
  • Forecast a number (e.g. sales for the next year)

Let’s look at what it means to be right or wrong in each case.

Predicting a category or outcome

In a so-called classification problem, being wrong means we assign the wrong label to something. For simplicity, we are going to focus on problems with only two possible outcomes (binary classification).

For example, let’s say we’re trying to predict whether a prospect is going to buy from us. There are four outcomes:

  1. We predict a prospect will buy, and they do (True Positive)
  2. We predict a prospect will buy, but they don’t (False Positive)
  3. We predict they won’t buy, but they do (False Negative)
  4. We predict they won’t buy, and they don’t (True Negative)

#2 (False Positive) and #3 (False Negative) are the two ways we can be wrong.

We can put our predictions into a so-called Error Matrix (or Confusion Matrix) to see how we did:

Image by author

There are three important metrics that help us understand how often, and in what way, we were wrong:

  • Our Accuracy tells us how many predictions overall were correct; it is calculated as the sum of our correct predictions (True Positives + True Negatives) divided by the total number of predictions
  • Our Precision tells us how many of our positive predictions were correct. I.e. out of all prospects we said would buy from us, how many actually did? It is the number of True Positives divided by all positive predictions (True Positives + False Positives)
  • Our Recall tells us how many of the relevant outcomes we correctly predicted, or in other words, how sensitive our model is. I.e. out of all prospects that ended up buying from us, how many did we identify in our prediction? It is calculated as True Positives divided by True Positives plus False Negatives
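As a quick sketch, here is how the three metrics fall out of the error matrix; the counts below are arbitrary example numbers:

```python
# Example counts from an error (confusion) matrix (arbitrary numbers)
tp, fp, fn, tn = 40, 10, 20, 130

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were correct
precision = tp / (tp + fp)                    # share of positive predictions that were correct
recall    = tp / (tp + fn)                    # share of actual positives we caught

print(f"Accuracy: {accuracy:.0%}, Precision: {precision:.0%}, Recall: {recall:.0%}")
```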

So now you have three different measures telling you how accurate your prediction is. Which one should you optimize for?

We will get to that in a second.

Forecasting a number

When we’re forecasting a number like sales, it’s a bit simpler. Our forecast is either above or below the actual number (or on target – just kidding, that one never happens 😭 ).

There are different ways you can measure the accuracy of your forecast here; the most popular ways are likely the Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).

Image by author
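For reference, the two measures are defined as follows, where $y_i$ is the actual value, $\hat{y}_i$ the forecast, and $n$ the number of periods:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad\qquad \text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$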

The problem: By default, these measures of accuracy treat over-forecasting and under-forecasting as equally bad. In reality, that’s rarely the case, though:

For example, if you’re forecasting inventory needs it’s very different to over-predict (and have a few too many items in the warehouse) than to under-predict (and run out of stock, costing you valuable sales).

The key question here is: For the business use case you are trying to forecast, would you rather be too high or too low?


How to ensure you are wrong the "right" way

Just because a prediction model has good accuracy doesn’t mean it’s doing a good job at what you want it to do.

Here’s a real example I’ve encountered to illustrate this point:

Let’s say you work in Marketing or Sales and want to predict which leads will result in successful deals. You train a simple model on historical data and can’t believe your eyes. 90% accuracy! On the first try!

But then you look at the results in more detail and see the following:

Image by author

Our accuracy is 90%, but the model failed to identify any of the deals that ultimately turned into won customers. Because the majority of leads never convert, the model can achieve high accuracy by simply predicting that not a single lead will convert.

Similarly, you might have a forecasting model for predicting inventory needs that looks pretty good on paper as it has a fairly high accuracy. But if you look more closely, you notice that it almost always slightly under-predicts and your stores run out of inventory at the end of the day as a result.

Obviously, models like that are not very useful.

There are two main challenges you face when making these types of predictions:

  1. Your forecasting model doesn’t have business context; i.e. if you don’t explicitly tell it, the model doesn’t "know" the cost of being wrong one way or the other
  2. In classification problems, the events we want to predict are often rare

The good news is that you can address both issues.

If you are on the Business or Product side, this is a great opportunity to get closer to the work of your Data Science counterparts; and if you’re a DS, this is one of those situations where business context is absolutely crucial to delivering a useful analysis.

Step 1: Understanding the cost of being wrong

The first step in making a prediction model cost-aware is to understand that cost.

Here are some of the key types of costs to consider:

  • Direct costs: In many cases, a wrong prediction directly causes a financial cost to the business. For example, if a company fails to identify a fraudulent transaction, they might have to eat the resulting costs.
  • Opportunity cost: If your predictions are wrong, you are often not using your resources efficiently. For example, if you predict a very high volume of support tickets and staff accordingly, your support agents will be idle when fewer tickets come in than you forecasted.
  • Lost revenue: A misclassification or other wrong prediction can cause you to miss sales. For example, you decide not to send a promotion to a customer because your model predicted they wouldn’t be interested, but in reality they would have made a purchase if they had gotten the promotional email.
  • Unsubscribes & churn: On the flip side, there are plenty of scenarios where a wrong prediction can annoy users or customers and cause them to churn. Sticking with the above example, if you send too many emails or push notifications because you think users might be interested in these promotions, they might unsubscribe from these comms channels or even churn.

You can have multiple types of cost (e.g. direct costs and lost revenue) at the same time.

Add up all of the applicable individual cost factors to determine the total cost of being wrong in a certain scenario (i.e. a False Positive or False Negative in classification or over- and under-predicting when forecasting a number).

Step 2: Making your prediction model "cost-aware"

Classification (predicting a category or outcome)

Once you have calculated the cost of each type of error (False Positive & False Negative), you can calculate the expected total cost of being wrong for your forecasting model.

You get this overall cost by multiplying the likelihood of each type of error with the cost of that type of error and summing it all up:

Image by author
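Written out as a formula, with $P(\cdot)$ the probability of each error type and $C$ its cost:

$$\text{Expected cost} = P(\text{False Positive}) \cdot C_{FP} + P(\text{False Negative}) \cdot C_{FN}$$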

Mathematically, your goal should be to minimize this overall cost. But to do this, you need to make sure your prediction model actually takes this cost into account.

There are many ways to make your classification model cost aware, but I’m going to cover the three main ones I find most useful in practice:

1. Adjusting the classification threshold

Many classification models like Logistic Regression don’t actually output a classification, but rather a probability of an event occurring (so-called probabilistic models).

So sticking with the previous example, if you’re predicting which transactions are fraudulent, the model doesn’t actually directly say "this is a fraud transaction", but rather "this transaction has a [X%] probability of being fraud".

In a second step, the transactions are then put into two buckets based on their probability:

  • Bucket 1: Fraudulent transactions
  • Bucket 2: Regular (non-fraudulent) transactions

By default, the threshold is 50%; so all the transactions with a probability > 50% are going into Bucket 1, the rest into Bucket 2.

However, you can change this threshold. For example, you could decide a 20% probability is enough that you want to flag a transaction as fraud.

Why would you do this?

It goes back to the cost of being wrong. Missing a fraudulent transaction is much more costly to the company than flagging a transaction as fraud that turns out to be fine; in the latter case, you just have minor costs of an additional manual review by a human and a delay in the transaction which might annoy the customer.

How do you decide which threshold to choose?

You have two options:

Option 1: Use the Precision-Recall Curve

The so-called Precision-Recall Curve shows you the Precision (how many of our positive predictions were correct) and Recall (how many relevant outcomes we successfully identified) at different thresholds.

It’s a trade-off: The more fraudulent transactions you want to detect (higher Recall), the more false alarms you’ll have (lower Precision). You can align with your business partners on a point on the curve that you are comfortable with. For example, they might have a minimum share of fraudulent transactions they want to catch.

To make sure your choice is reasonable, you can then calculate the total cost of being wrong for your chosen point per the formula at the beginning of this chapter.

Image by author

Option 2: Calculate the cost-minimizing threshold

Based on the cost of a False Positive and False Negative, you can calculate the threshold that minimizes the overall cost.

For a well-calibrated model, this is:

The Foundations of Cost-Sensitive Learning, C. Elkan (2001)
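Spelled out, with $C_{FP}$ and $C_{FN}$ the costs of a False Positive and a False Negative (and consistent with the worked example below), the cost-minimizing threshold is:

$$p^* = \frac{C_{FP}}{C_{FP} + C_{FN}}$$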

An example:

If your cost of a False Negative (e.g. failing to identify a fraudulent transaction) is $9 and that of a False Positive (e.g. falsely flagging a transaction as fraud) is $1, then the ideal threshold would be 1 / (9 + 1) = 1 / 10 = 0.1.

So even if the chance that something is fraud is only slightly above 10%, you would want to flag the transaction and have a human review it.
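Here is a minimal sketch of how applying a custom threshold could look with a probabilistic classifier in scikit-learn; the synthetic data and the $1 / $9 costs are placeholders mirroring the example above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical costs of each error type (see the worked example above)
cost_fp, cost_fn = 1, 9
threshold = cost_fp / (cost_fp + cost_fn)   # = 0.1

# Synthetic, imbalanced "fraud" data purely for illustration
X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
fraud_probability = model.predict_proba(X_test)[:, 1]   # P(fraud) for each transaction

# Flag anything above the cost-minimizing threshold instead of the default 0.5
flag_for_review = fraud_probability >= threshold
print(f"Share of transactions flagged for review: {flag_for_review.mean():.1%}")
```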

2. Rebalancing your training dataset

As mentioned above, the event we’re trying to predict is often underrepresented in the data that we’re training our model on (these events are called the "minority class").

At the same time, failing to identify these events usually has a very high cost to the business.

A few examples:

  • We try to predict which users will churn in the next X days, but few users actually churn in a given timeframe
  • You predict whether a certain transaction is fraudulent or not, but fraud is the absolute exception
  • We try to predict whether there are signs of cancer on a scan, but most patients are healthy

Failing to detect these rare events is typically very costly, but most models struggle to perform well with such imbalanced datasets. They simply don’t have enough examples to learn from.

To deal with this issue, you can oversample the examples of the underrepresented class so that the positive events are more equally represented. In plain English: You are creating more instances of the rare event so that the model has an easier time training to detect them.

You either do this by duplicating existing examples of the minority class in your training data set, or by creating synthetic ones.

Alternatively, you could also reduce instances of the majority class to balance things out (undersampling). Both oversampling and undersampling come with challenges, which are beyond the scope of this article. I linked some resources for further reading at the end.

Image by author

How do you rebalance your dataset to minimize the cost of being wrong?

If your classifier is using a standard threshold of 0.5, you can calculate the factor by which you need to multiply the number of majority-class examples using the following formula:

The Foundations of Cost-Sensitive Learning, C. Elkan (2001)
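Spelled out, using the same cost notation as before and assuming the minority class is the positive, costlier-to-miss one, the multiplier for the majority-class examples is:

$$\text{factor} = \frac{C_{FP}}{C_{FN}}$$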

An example:

So using the same values as in the example above ($1 cost of falsely flagging a transaction as fraud, $9 cost of missing a fraudulent transaction), the factor would be 1 / 9 = 0.1111.

In other words, we would scale down the number of majority examples in the training set by a factor of 9 (undersampling).

3. Modifying the weights for each class

Many Machine Learning models allow you to adjust the weights assigned to each class.

What does that do? Let’s say again we’re predicting fraudulent transactions in our app. Our model tries to minimize misclassifications. By default, failing to identify fraud and falsely flagging a normal transaction as fraud are treated as equally bad.

If we assign a higher weight to the minority class (our fraudulent transactions) though, we essentially make mistakes in this class (i.e. failing to identify fraud) more costly and thus incentivize the model to make fewer of them.
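For example, many scikit-learn models expose a class_weight parameter for exactly this purpose. A minimal sketch (the 1:9 weighting mirrors the fraud cost example from earlier):

```python
from sklearn.linear_model import LogisticRegression

# Penalize misclassifying the minority (fraud) class 9x as heavily as the majority class
model = LogisticRegression(class_weight={0: 1, 1: 9}, max_iter=1_000)

# Alternatively, let scikit-learn derive the weights from the class frequencies
balanced_model = LogisticRegression(class_weight="balanced", max_iter=1_000)
```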

Forecasting a number

We’ve talked a lot about how you can make your classification model cost-aware, but not all Predictions are classifications.

What if you’re forecasting how much inventory you need or how many organic leads you expect for your Sales team? Forecasting too high or too low in these scenarios has very different consequences for the business, but as discussed earlier, most widely-used forecasting methods ignore this business context.

Enter: Quantile Forecasts.

Traditional forecasts typically try to provide a "best estimate" that minimizes the overall forecast error. Quantile Forecasts, on the other hand, allow you to define how "conservative" you want to be.

The "quantile" you choose represents the probability that the actual value will land below the forecasted value. For example, if you want to be sure that you don’t under-predict, you can forecast at the 90th percentile, and the actual value is expected to be lower than your forecast 90% of the time.

In other words, you are assigning a higher cost to under-forecasting compared to over-forecasting.
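One accessible way to produce a quantile forecast is to train a model with a quantile (pinball) loss. Here is a minimal sketch using scikit-learn's gradient boosting on made-up weekly demand data; the feature, the noise level, and the 90th-percentile target are all illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy historical data: weekly demand as a noisy function of the week number
rng = np.random.default_rng(0)
X = np.arange(104).reshape(-1, 1)                     # two years of weekly data
y = 1_000 + 5 * X.ravel() + rng.normal(0, 100, 104)   # trend + noise

# alpha=0.9 targets the 90th percentile: actual demand should land below the
# forecast ~90% of the time, i.e. we rarely under-predict inventory needs
model = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)
p90_forecast = model.predict([[104]])                 # conservative forecast for next week
print(f"90th-percentile demand forecast: {p90_forecast[0]:,.0f}")
```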


4 real-life examples

That was a lot of theory; let’s put it into practice.

Here are some real-life scenarios in which you’ll have to make sure you are picking the "right" way to be wrong.

Example 1: Lead scoring in B2B Marketing & Sales

🤔 The problem:

Type: Predicting an outcome (classification problem)

You work at a B2B SaaS company and want to figure out which leads are going to "close", i.e. turn into customers. The high-potential leads your model identifies will receive special attention from Sales reps and be targeted with additional Marketing efforts.

💸 The cost of being wrong:

  • False Positive: If you falsely flag a lead as "high potential", you are 1) wasting the time of your Sales reps and 2) wasting the money spent on high-touch Marketing campaigns (e.g. gifts).
  • False Negative: If you fail to identify a high potential lead, you have a lower chance of closing the deal since you’re not deploying your best tactics, leading to lost revenue.

⚙ What to optimize for:

By default, your model will flag very few leads as "high potential" because most leads in the training data never closed.

However, since losing a deal in B2B SaaS is typically much more costly than wasting some resources on working an unsuccessful deal (esp. if you target Mid Market or Enterprise companies with larger deal sizes), you’ll want to tune your model to massively penalize False Negatives.

Example 2: Inventory for a sales promotion

🤔 The problem:

Type: Forecasting a number

You work at an E-Commerce company and want to predict demand for your most important products for a big upcoming event (e.g. Amazon Prime Day).

💸 The cost of being wrong:

  • Over-forecasting: If you overestimate the demand, you’ll have excess inventory. The cost consists of storage costs in the warehouse, plus you might have to write off the value of the items if you can’t sell them later.
  • Under-forecasting: If you under-predict the demand, you’ll run out of inventory and miss out on sales. Plus, your reputation will be tarnished, as customers value constant availability and reliability in E-Commerce, potentially leading to churn.

⚙ What to optimize for:

The cost of running out of inventory is much higher than having some excess (non-perishable) stock. As a result, you’ll want to minimize the odds of this happening.

You can use a Quantile Forecast to decide exactly how much inventory risk you’re willing to take. Do you want to have a 50%, 70% or 90% chance of having sufficient stock to meet demand?

Example 3: Email promotions

🤔 The problem:

Type: Predicting an outcome (classification problem)

You’re planning to run a new type of Email marketing campaign and are trying to figure out which users you should target.

💸 The cost of being wrong:

  • False positive: If you opt someone into the campaign and they don’t find it relevant, they might unsubscribe. The cost is that you can’t email them anymore with other campaigns and might lose out on future engagement or sales
  • False negative: If you don’t opt someone in that would have found the campaign relevant, you are leaving near-term user engagement or sales on the table

As you can see, the trade-off here is between short-term benefits and potentially negative long-term consequences.

⚙ What to optimize for:

This case is less clear-cut. You would have to quantify the "lifetime value" of being able to email a user and then tune your model so that the expected lost future revenue from unsubscribes equals the expected short-term gains from the email campaign itself.

Example 4: Sales hiring

🤔 The problem:

Type: Forecasting a number

You work in a B2B company. You are launching a new market and are forecasting the expected number of qualified opportunities to figure out how many Sales reps you should hire.

💸 The cost of being wrong:

  • Over-forecasting: If your forecast is too high, you are hiring more sales reps than you can "feed". The reps will be unable to hit their quota, morale will tank, and you will eventually have to let people go.
  • Under-forecasting: If your forecast is too low, you end up with more opportunities per Sales rep than expected. Up to a certain level, this excess volume can be "absorbed" (reps will spend less time on Outbound, managers and reps from other markets can pitch in etc.). Only if you massively under-predict will you start leaving money on the table.

⚙ What to optimize for:

Over-forecasting, and as a result over-hiring, is extremely costly. Since the business has more flexible ways to deal with under-forecasting, you’ll want to choose a Quantile Forecast where the actual deal volume lands above the forecast the majority of the time.


In conclusion

A prediction is only as useful as the decisions it enables.

If you don’t incorporate real-life context into your analysis, it doesn’t matter how sophisticated your model is. The output will be suboptimal at best, harmful at worst.

When Data Science and Business / Product stakeholders partner, however, to calculate the cost of being wrong and tune the model to reflect the priorities of the business, predictions become an incredibly useful tool.


For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

Done is Better Than Perfect

How to be more pragmatic as a Data Scientist, and why it matters for your career
You’re good at your job and you pride yourself in knowing the ideal way to do things. And since you want to raise the bar, you hold others to the same standard. This will surely get you noticed and promoted, right?

But then you get passed over for promotion and when you look around, you notice that the people that do get promoted are delivering work that’s much less rigorous than yours. Can people not tell the difference, or what’s going on?

If you’re a high performer, it’s easy to slide into perfectionism. It starts early: School and college train us in scientific methods, and anything that deviates from the ideal solution gets point deductions.

Image by author

This academic approach is often carried over into the workplace, especially in rigorous fields like Data Science & Analytics.

However, the reality is: In high-growth companies, getting stuff done is more important than perfection. If you can’t deliver results at the speed at which the business needs them, it will move forward without you.

This post will show you how to prevent that from happening.

I will cover:

  1. Why Perfectionism is holding you back in your career
  2. How to spot perfectionism and what to do about it
  3. When to be pragmatic and when not to
  4. How to become more pragmatic


Why perfectionism is holding you back

At the surface level, perfectionism sounds good: You strive for excellence based on your intrinsic desire for perfection. Nothing you produce will ever make your manager or the company look bad.

But perfectionism can become a major blocker to your career progress:

1. Perfectionism lowers your output.

  • Studies show that humans assign higher value to short-term outcomes compared to long-term outcomes. That’s why we have trouble saving for retirement when we can use that money to go on a vacation right now.
  • At work, this means perfectionists try to minimize the chance of making a mistake (that would result in immediate negative consequences) and end up spending too much time polishing deliverables. This results in lower output which, in turn, makes it harder to get promoted.

2. Perfectionism limits your growth opportunities.

  • Perfectionists do whatever they can to minimize the possibility of mistakes. The natural consequence of this is that they tend to stay within their comfort zone.
  • You started your career as a Marketing Data Scientist in B2B SaaS? Better double down on what you already know, even if you discover you’re actually more interested in doing Product Analytics in Consumer Fintech. If you switch, you’ll have to learn a new industry from scratch and will be much more likely to mess up; why take that risk?

In my experience, perfectionism is especially common with highly analytical people or those with advanced academic backgrounds. And it’s becoming more and more common. However:

The hard, but necessary realization to succeed in a high-growth environment is that what got you to this point is not what will get you to the next level.

You might have gotten good grades and been admitted to your target grad school program because you were able to deliver flawless work, spending months refining a single paper or project. But you will rarely get the opportunity to showcase this on the job.

It’s painful to deliver work thinking "I could have done a much more sophisticated version of that"; sometimes, you feel downright ashamed of the hacky solution you threw together. But it’s important to remember that the time you invest in a deliverable quickly hits diminishing returns:

Image by author

How to spot perfectionism and what to do about it

The first step in tackling perfectionism is to understand what type you’re dealing with. There are three types:

  1. Self-oriented (you hold yourself to impossibly high standards)
  2. Socially-prescribed (you feel that others require you to be perfect), and
  3. Other-oriented (you hold others to an unrealistically high bar)

For example, if you realize that your perfectionism comes at least partially from (what you feel are) unrealistically high expectations from your manager, you might need to work with them to address this instead of just trying to shift your own mindset.

Given that perfectionism can stem from many factors, including early childhood experiences, it’s not realistic to provide a one-size-fits-all recipe to overcome it in a blog post. Therefore, I’ll focus on the different ways that perfectionism shows up in the workplace, and what you can do in these specific situations.

Symptom #1: Perfectionists are unable to keep up with the pace of the business 🚀

  • What this looks like: Perfectionist Data Scientists propose elaborate approaches that take months to yield results even when the company needs something in weeks. They’re unwilling to compromise and you often hear "That’s not possible".
  • If this is you: Remember that it’s your job as a Data Scientist to help the business get things done. Instead of saying "that’s not possible", provide a set of options with their respective timelines and highlight the trade-offs. This will allow the business to move forward while knowing the risk, and you will be able to "cover your ass".

What helped me: Don’t focus on how much better you could have made the deliverable, but how much worse off the project would be if you didn’t provide any input at all (which will happen if you are not fast enough).

  • If you’re dealing with this: Rather than asking people how long they will need, communicate a hard deadline and ask for what’s possible by that date. Make it clear if a directional analysis will be sufficient; often, what’s needed to move forward is much less rigorous and detailed than what people think.

Symptom #2: Perfectionists are uncomfortable making decisions with incomplete data 📊

  • What this looks like: Perfectionist Data Scientists are often paralyzed when it comes to decision-making. They drag out decisions in the hope of getting more information or doing more analysis to de-risk their choice.
  • If this is you: Give a clear recommendation and then state your confidence level, and what will happen if you’re wrong. You should also add the key assumptions that your decision was based on; if 1) others disagree with the assumptions or 2) you get new information later that changes one of them, you will be able to adjust.

What helped me: Realize that we never have perfect information. Every decision is an educated guess to some degree, and research shows that we tend to regret our decisions more than we should.

  • If you’re dealing with this: Put people on the spot; ask for recommendations or decisions from your team rather than options. And foster a culture where decisions are judged by what was known at the time, since it’s easy to pick holes in something in hindsight.

Symptom #3: Perfectionists often become blockers for others 🚫

  • What this looks like: Perfectionists pick endless holes in other people’s proposals without offering alternatives.
  • If this is you: Don’t try to enforce perfection across the company. Playing devil’s advocate and challenging each other is important, but it should be constructive. Treat projects as an optimization problem where you need to find the least bad solution under the given constraints (time, budget etc.).

What helped me: Pretend that if you criticize someone else’s proposal, you are now on the hook for solving the problem instead. This forced me to go from "This doesn’t make sense" to "Here’s what I would do instead".

  • If you’re dealing with this: Set a deadline to propose alternatives and reward solution-oriented thinking rather than people who solely point out problems.

Symptom #4: Perfectionists polish every single deliverable 🎁

  • What this looks like: Every single document or slide (even just personal notes or internal documentation) is impeccably formatted and designed.
  • If this is you: Focus your efforts on customer-facing deliverables and those going to executives. Any time you spend making some internal working document pretty is time that you could spend shipping more stuff.

What helped me: Try to think about it the other way around. Everyone will notice that you spent a lot of time polishing this internal deck instead of working on something impactful. In a fast-moving company, that actually looks worse than delivering a document that’s rough around the edges.

  • If you’re dealing with this: Lead by example; set a culture where screenshotted graphs from a dashboard with brief commentary are an acceptable way to create a slide. Don’t nitpick minor things like color or font choices.

Side note: That doesn’t mean you should submit something completely unformatted. Spending five minutes to make the document easy to digest (not necessarily pretty) is time well spent.

Image by author

Symptom #5: Perfectionists give too many details 🔬

  • What this looks like: Perfectionists add too many details in written and verbal communication. They are uncomfortable with simplifications and use extensive technical jargon.
  • If this is you: Focus on the key insights and put the supporting information in the appendix. And use plain English; you want people from different teams and backgrounds to be able to understand your work. You only realize impact as a DS if others understand the takeaways of your analysis.

What helped me: Don’t try to anticipate all questions and answer them preemptively. Put the most likely questions in an FAQ section and prepare to answer any remaining ones live; this actually makes you look more competent than including everything in your document.

  • If you’re dealing with this: Ask presenters for a five minute executive summary to force them to focus on the essentials. Then ask targeted follow-up questions as needed.

When to be pragmatic, and when not to

There is a time and a place for getting things 100% accurate, and there are instances where speed trumps perfection. But when should you be pragmatic, and when is it a bad idea?

Here are the factors you should consider to guide that decision:

  • ♻ Is the decision reversible? There are decisions that are one-way doors, and others that aren’t. You should spend the majority of your time analyzing the ones that are costly to reverse, and move with educated guesses on the others.
  • 💰 What is the expected financial cost of being wrong? Even if a decision is reversible, it might be costly to do so (e.g. wasted Eng resources, money spent on the wrong tool etc.). Decisions with a high cost to reverse should receive more scrutiny.
  • ⚖ Is there a potential for reputational damage or legal consequences if you mess up? Having to walk back on a statement you made internally is awkward; admitting to regulators that you made a mistake can have serious consequences. As a rule of thumb, anything that goes to regulators, Wall Street, your board of directors or customers should receive the maximum amount of rigor.
  • 📈 How sensitive is the decision to the analysis? One very common mistake is to keep investing time in an analysis even if additional accuracy won’t change the decision. For example, if you want to estimate the potential revenue from a new business opportunity, it might be enough to know whether the opportunity is in the range of $100M or $1B to make a go or no-go decision.
  • 🗑 Is this throwaway work? Investing time in work that will be used over long periods of time is more beneficial than analyses that are used for one-off decisions. Make ad-hoc analyses "good enough" for the problem at hand, and focus most of your efforts on refining things that will be used broadly by internal or external customers.
Image by author

How to become more pragmatic

I’ve had to unlearn perfectionism myself. These mindset shifts have helped me do that:

  • Realize that even if you do things perfectly, you’ll still fail all the time. For example, just because you do a flawless analysis of your Total Addressable Market (TAM) doesn’t mean your market entry will be successful. The key success factor is to get more "shots on goal", so your time is better spent on trying more things rather than perfecting a single one.
  • Don’t focus on the things you got wrong, but the ratio of what you got right. If you’re right most of the time, it’s fine to be wrong some of the time. For example, Amazon’s leadership principle is "Leaders are right, a lot" (not "all the time").
  • Practice your judgment in low-stakes situations. Practice making judgement calls even when you’re not the decision maker. E.g. if you’re in a meeting where an executive is asked to decide, think about what you would do. Decision-making is like a muscle and is best trained in low-stakes scenarios.

Conclusion

Becoming more pragmatic is a journey; it takes time, so don’t expect to shift your mindset overnight. But it’s worth it; it will not only increase your impact, but also reduce your stress level as you will spend less time chasing elusive perfection.


For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

How to Challenge Your Own Analysis So Others Won’t

Master the art of sanity checks to level up the quality of your work
Image by author

Have you ever created an analysis only for it to be torn apart by your manager? Or have you ever gotten a question during a presentation that made you think "Why didn’t I check this beforehand?"

Sometimes it can feel like managers and executives have an uncanny ability to find the one weak spot in your work. How did they identify the issue so quickly, especially if they are seeing your work for the first time?

What seems like a superpower can be learned by anyone, and this post will show you how.

By routinely applying "sanity checks" to your work, you can proactively identify the weak spots and ensure the result makes sense before you share it with a broader audience.

I am going to go over:

  • What sanity checks are and why they matter
  • How sanity checks are different from how most people check their work
  • How To do a sanity check
  • How to use sanity checks to increase your credibility
  • How to use AI to sanity check your work for you

We’ve got a lot to cover, so let’s jump in.

What is a sanity check, and why is it important?

Imagine you’re building a detailed model from scratch, carefully choosing each assumption and combining them to get to your final output (e.g. a forecast, company valuation etc.).

Each assumption seemed reasonable and you checked the math twice, so the output should be solid. Right? Right??

My experience over the last decade has been that more often than not, we miss the forest for the trees when building models or doing analyses. We layer so many assumptions on top of each other that the final result can quickly change from reasonable to ridiculous.

This is where sanity checks come into play: Sanity checks help us determine whether the result of our analysis has a good chance of being correct.

We’re all wrong from time to time. That’s fine; reality doesn’t always play out the way we expect. But you should try to be right most of the time.

Let’s dive into how to do that.

How is sanity checking different?

When checking our work, most of the time we go through it step-by-step to check for errors. Are the cells linked correctly? Did I pull the formulas all the way down? Do all the joins in my SQL work correctly?

Image by author

This mechanical "Quality Control" approach of checking all inputs can help us find issues, but it doesn’t ensure that the output makes intuitive business sense.

Sanity checking, on the other hand, is about taking a step back and validating the output from a different angle. If you come to the same conclusion both ways, you can be much more confident in your work.

How to do a sanity check

There are three broad categories of sanity checks: Bottom-up vs. top-down, benchmarking and intuition. I will go through each of them in detail and show how you can apply them at work.

Bottom-Up vs. Top-Down

Our analyses are typically either top-down or bottoms-up. But what does that mean?

Image by author

Let’s look at a (simplified) example. Let’s say you work in a B2B SaaS company and want to figure out how many customers you can acquire for the new product you’re launching.

  • In the top-down approach, we are trying to understand what share of the market we can win. So we’d start by looking at the total number of businesses in the US, exclude industries that we are not targeting and company sizes we can’t support, assume what % of companies is looking to switch software providers and finally assume what share of those companies we can win (vs. our competitors)
  • In the bottom-up approach, we are trying to understand how many companies we can acquire based on the channels we have available. So we’d look at prior launches to figure out what lead volume we can get from LinkedIn, analyze keywords to determine expected SEM volume, project Email leads based on the number of companies we can target and expected conversion rates etc.
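As a minimal sketch (every input below is a hypothetical placeholder, not real data), the two approaches for this example might look like this; comparing the two resulting numbers is exactly the sanity check:

```python
# --- Top-down: what share of the market could we win? ---
us_businesses = 2_000_000
serviceable_share = 0.15        # industries / company sizes we can actually serve
looking_to_switch = 0.10        # share shopping for a new software provider
expected_win_rate = 0.10        # share of those deals we win vs. competitors
top_down = us_businesses * serviceable_share * looking_to_switch * expected_win_rate

# --- Bottom-up: what can our channels actually deliver? ---
leads_by_channel      = {"LinkedIn": 20_000, "SEM": 15_000, "Email": 50_000}
conversion_by_channel = {"LinkedIn": 0.03,   "SEM": 0.04,   "Email": 0.02}
bottom_up = sum(leads_by_channel[c] * conversion_by_channel[c] for c in leads_by_channel)

print(f"Top-down: ~{top_down:,.0f} customers, bottom-up: ~{bottom_up:,.0f} customers")
```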

Both approaches can give us a directional idea of what we can expect, but they each have a crucial weakness.

The top-down approach does not consider how we are going to get these customers, while the bottom-up approach ignores the size of our target market.

As a result, the best way to sanity check your work is to combine top-down with bottom-up analysis.

Image by author
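To make this concrete, here is a minimal sketch of what that comparison can look like. All numbers below are hypothetical; the point is simply that the two estimates should land in the same ballpark.

```python
# Hypothetical example: estimating how many customers we can acquire for a new
# B2B product, both top-down and bottom-up. All numbers are made up.

# --- Top-down: what share of the addressable market can we win? ---
us_businesses = 6_000_000        # total US businesses
addressable_share = 0.20         # after excluding industries / sizes we can't serve
switching_share = 0.10           # share looking to switch providers this year
win_rate = 0.05                  # share of switchers we expect to win

top_down = us_businesses * addressable_share * switching_share * win_rate

# --- Bottom-up: what can our channels actually deliver? ---
channel_leads = {"linkedin": 40_000, "sem": 25_000, "email": 60_000}
lead_to_customer = {"linkedin": 0.04, "sem": 0.06, "email": 0.02}

bottom_up = sum(channel_leads[c] * lead_to_customer[c] for c in channel_leads)

print(f"Top-down estimate:  {top_down:,.0f} customers")
print(f"Bottom-up estimate: {bottom_up:,.0f} customers")
print(f"Ratio (bottom-up / top-down): {bottom_up / top_down:.2f}")
# If the two are an order of magnitude apart, one set of assumptions is off.
```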

Benchmarking

The best way to make sure a plan or projection is reasonable is to compare it against benchmarks. For example, if you’re forecasting the performance of a new market, it helps to compare it against prior launches of similar countries.

If your analysis massively deviates from the benchmark, you need to be able to explain why.

A few common things you should check for any model, forecast or projection:

  • Magnitude: How do the final outputs compare to benchmarks? E.g. are you projecting that France will be a larger market for the company than the UK?
  • Growth assumptions: What trend are you forecasting over time? E.g. is the new product projected to grow more quickly compared to past launches?
  • Seasonality: Does your projection show the same repeating patterns as the benchmarks? E.g. if all other markets show a slowdown during the December holiday period, why does your projection for the new country not show this?

This doesn’t mean you always have to model everything in line with benchmarks; but you always need to be able to explain why something is deviating.
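If you want to run these checks programmatically, a quick pandas sketch like the one below (with made-up monthly numbers) covers the magnitude and growth comparisons; the same idea, e.g. averaging each calendar month against a trailing twelve-month mean, gives you a simple seasonality index.

```python
import pandas as pd

# Hypothetical new customers per month since launch: your UK forecast
# vs. the most recent comparable launch (France actuals).
uk_forecast = pd.Series([100, 160, 250, 380, 560, 800])
france_actual = pd.Series([110, 150, 200, 255, 320, 390])

# Growth check: compare month-over-month growth side by side.
checks = pd.DataFrame({
    "uk_mom_growth": uk_forecast.pct_change(),
    "france_mom_growth": france_actual.pct_change(),
})
print(checks.round(2))

# Magnitude check: after 6 months, is the UK really plausibly ~2x France?
ratio = uk_forecast.iloc[-1] / france_actual.iloc[-1]
print(f"Month-6 ratio (UK forecast / France actual): {ratio:.2f}")
```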

Benchmarking Example 1: New Market launch

Scenario: You plan to enter the UK market and forecast user growth

✅ Sanity checks: You compare the forecast against the two most recent launches, Germany and France. Your forecast is more aggressive than the last two launches, and doesn’t show any seasonality.

❓ Questions you need to be able to answer:

  • What gives you confidence that that’s possible? Is the UK structurally different (e.g. the market is larger)? Is our product a better fit for the UK market? Are we using a different go-to-market strategy?
  • Why does your UK forecast show no seasonality? Are holiday periods in the UK different? Do B2B buyers there have different seasonal buying patterns?

👉 If you are not able to make a strong case for why the new market is different from past launches, you are better off keeping the forecast similar.

Benchmarking Example 2: Marketing Plan

Scenario: You are forecasting Marketing spend and performance metrics by channel. The plan is to double the Marketing budget year-over-year.

✅ Sanity check: You compare the projected marketing efficiency against past trends. Your forecast projects that efficiency (Cost per Lead) will improve as we increase Marketing spend, but past data shows the opposite trend.

❓ Questions you need to be able to answer:

  • Why do you expect better efficiency?
  • What specific improvements are we deploying in each Marketing channel that will drive this?
  • Are we doing anything that would improve overall Marketing performance, e.g. investing in our Brand?

👉 If you don’t have a concrete plan to improve Marketing efficiency, you should assume that the historical relationship between spend and efficiency holds.
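One way to ground this check in data is to fit the historical relationship between spend and Cost per Lead and see what it implies at the proposed budget. The sketch below uses hypothetical numbers and a simple log-log fit; treat it as an illustration, not the one right functional form.

```python
import numpy as np

# Hypothetical history: monthly Marketing spend ($k) and observed Cost per Lead ($).
spend = np.array([100, 150, 200, 300, 400, 500])
cpl = np.array([38, 41, 45, 52, 58, 63])

# Fit CPL ~ a * spend^b on a log-log scale; b > 0 means efficiency worsens with scale.
b, log_a = np.polyfit(np.log(spend), np.log(cpl), 1)

proposed_spend = 1_000  # the doubled budget, in $k
implied_cpl = np.exp(log_a) * proposed_spend ** b
print(f"Fitted elasticity b = {b:.2f}")
print(f"Implied CPL at ${proposed_spend}k/month: ${implied_cpl:.0f}")
# If the plan assumes a *lower* CPL than today at double the spend, you need
# a concrete reason why this historical pattern will break.
```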

Comparing against intuition

Many times, you can use common sense to sanity check your analysis. Your intuition is not always going to be right, but it often highlights potential issues that need further validation.

A few examples:

  • You build a Discounted Cash Flow financial model and the terminal growth rate is much higher than that of the Gross Domestic Product (GDP); this means you are implicitly assuming that the company will outperform the broader economy forever. Is that realistic?
  • You are building an Account Scoring model and it scores companies as "good fit" that Sales believes are a waste of time. This doesn’t mean Sales is right (you build the model to surface new insights, after all), but you should take their experience into account since counterintuitive outputs often highlight model weaknesses

Using sanity checks to increase your credibility

Sanity checks are not just there to prevent others from poking holes in your work. They also give you a tool to increase your credibility.

Instead of doing them behind the scenes and then sharing the improved work, share the sanity checks as well. By showing how you validated the output of your analysis, you will build trust with your audience. If you don’t share the sanity checks you did, your audience has no choice but to scrutinize your work on the spot.

You can do this visually on a slide by showing how both top-down and bottom-up approaches get to the same result, or comparing your data against benchmarks.

But you can also do it verbally:

✅ "We are planning to grow to 50 Email leads in the UK by October; that is based on similar conversion assumptions as in Canada, and corresponds to 3% monthly penetration of our Total Addressable Market for Email

How to use AI to sanity check your work

Sanity checking can be pretty time consuming; after all, you have to approach the same problem from multiple angles. Luckily, AI tools can save you a lot of time.

This is not a replacement for your sanity checking skills; ChatGPT needs your guidance to do a good job, so you still need to know how to perform a robust sanity check. The AI’s job is simply to do the heavy lifting for you, and to bring up a few points you might have missed.

Here is a step-by-step guide on how to do this with ChatGPT; all screenshots are from actual conversations I had where I asked ChatGPT to sanity check my forecast for a new market launch.

Disclaimer: Always check your employer’s policies on using AI tools like ChatGPT before uploading any proprietary data.

Step 1: Upload your work to ChatGPT

The first step is to upload the work that you want ChatGPT to sanity check. ChatGPT can handle a variety of file types, including PDFs, Excel, CSV files and more.

You can also integrate directly with several tools; e.g. in this example, I linked my Google Sheet that contained my forecast:

Image by author

Even if your actual model lives outside a spreadsheet (e.g. in Python), I recommend dumping the outputs in Google Sheets for the sanity check; after all, you want ChatGPT to validate your outputs, not the mechanics of your model.

For this example, I gave ChatGPT this simple Go-To-Market Forecast for a new country launch (you can make a copy and try your own sanity check with it).

I went through four sanity checks; here are the logs:

  1. First attempt (Grade: Intern)
  2. Second attempt (Grade: Intern)
  3. Third attempt (Grade: First-year analyst)
  4. Fourth attempt (Grade: Over-confident first-year analyst)
  • Attempt #1 and attempt #2 were okay, but I felt like I had to provide quite a lot of guidance and didn’t always get exactly what I wanted.
  • The third attempt was pretty good, but both ChatGPT and I forgot to dig into the Marketing channel mix (and when I remembered a few days later, ChatGPT was unable to continue where we left off).
  • Attempt #4 was promising, but when re-forecasting to include seasonality, it adjusted the wrong months at first.

You might have to try a few times until you get a really good performance. And remember:

Don’t blindly use anything that AI produces; it can (and will) make mistakes, and you’ll be on the hook for the end result. AI can give helpful input and save you time, but it’s not a replacement for critical thinking.

Step 2: Write a prompt asking ChatGPT to sanity check

After ingesting your file, you need to write a prompt asking ChatGPT to sanity check your work.

Here’s what I used to sanity check the Go-To-Market Plan I linked above:

Image by author

I’ve found that giving a little bit of context on the dataset is helpful (although not absolutely necessary). Also, don’t forget to say "please" and "thank you" in case the AI ever gets sentient; this will keep you off the naughty list.

You could also give ChatGPT this article, or another summary of how to do sanity checks, so you don’t have to include too many instructions in your prompt.
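If you prefer to script this rather than paste files into the chat UI, the same idea works via the API. The sketch below is one possible setup, not a prescription: the model name, file name and prompt wording are my assumptions, and the same data-policy caveats apply.

```python
import pandas as pd
from openai import OpenAI  # requires the `openai` package and an API key

# Load the forecast outputs you want checked (a small CSV export is easiest).
forecast = pd.read_csv("gtm_forecast.csv")  # hypothetical file name

prompt = (
    "Please sanity check this go-to-market forecast for a new country launch. "
    "Compare magnitude, growth and seasonality against the benchmark countries "
    "included in the data, and list anything that looks implausible.\n\n"
    + forecast.to_csv(index=False)
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # assumption; use whatever model you have access to
    messages=[
        {"role": "system", "content": "You are a careful analyst reviewing forecasts."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```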

Step 3: Make sure ChatGPT ingested the data correctly

After ingesting the file, ChatGPT will typically give you a brief summary of what it sees and what it believes the data represents, as well as the steps it will take for the analysis:

Screenshot from my third attempt; Image by author

ChatGPT will also often restate some of the data in the chat, which is helpful for making sure it pulled the numbers correctly:

Screenshot from my third attempt; Image by author

Note: Initially, ChatGPT had some problems ingesting my spreadsheet. Here’s how you can troubleshoot:

  • If ChatGPT throws an error or obviously didn’t ingest the data correctly, you can ask it to reprocess by clicking on the ♻ icon
  • If it repeatedly has issues with your file, you might have to clean it up. For example, I found that I could significantly reduce errors if I removed non-essential rows and columns (e.g. section headers, comments etc.). If the file only contains the relevant tables, it’s easier for ChatGPT to convert them to data frames in Python. Giving descriptive column and row headers also helps ChatGPT make sense of the data

Step 4: Work through the sanity checks with ChatGPT

Next, ChatGPT will start providing some initial observations, like this:

Screenshot from my third attempt; Image by author

ChatGPT correctly identified that the forecast is conservative and catches up with the UK eventually after a slow start. It would have been better if it had pulled in some stats for validation such as the number of small businesses in each country, but in my experience you need to ask it explicitly to do that.

Sometimes it will also proactively visualize key trends; other times, you have to prompt it for that.

Here’s how the conversation continued:

Screenshot from my third attempt; Image by author

It’s nice that ChatGPT was able to pull the launch dates from the separate tab, and plot the performance for each country indexed by launch date.

We then got into seasonality; as you can see, I needed to provide the initial nudge and it took some back-and-forth, but ChatGPT did the work of identifying the correct patterns:

Screenshot from my third attempt; Image by author

In several of my attempts, ChatGPT also gave a brief, but on-point summary of the Marketing mix, like this:

Screenshot from my fourth attempt; Image by author

I’d say this is overall okay.

It correctly highlights Organic and Email as key areas that deserve a closer look. Unfortunately, it also draws some odd conclusions; it doesn’t make a lot of sense that Referrals in France would be as high as in the US given it’s a new market without an established customer base.

Step 5: [Optional] Ask ChatGPT to re-do your work

If you used ChatGPT to sanity check a forecast, it sometimes offers to re-forecast based on your discussion.

In my case, I asked it to incorporate seasonality:

Image by author

Final Thoughts

It sucks if you put a lot of effort into an analysis only for it to be torn apart by others. By sanity checking your work before you share it, you can massively reduce the chance of that happening.

It’s also a great way to build trust with more senior stakeholders and show that you are thinking like an executive.

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

The post How to challenge your own analysis so others won’t appeared first on Towards Data Science.

]]>
How to Design Better Metrics https://towardsdatascience.com/how-to-design-better-metrics-9bad7bc8c875/ Wed, 26 Jun 2024 07:07:46 +0000 https://towardsdatascience.com/how-to-design-better-metrics-9bad7bc8c875/ 9 best practices from leading companies like Uber & Meta

The post How to Design Better Metrics appeared first on Towards Data Science.

]]>
Image by author (created via Midjourney)

Metrics are a powerful tool; they help you measure what you care about. Having lofty goals is great, but to know if you’re making progress, incentivize your team and create accountability, you need to be able to express them in numbers.

But that’s easier said than done. There are dozens of metrics that seemingly measure the same thing, and new trendy metrics are invented every day. Which ones should you use and what should you avoid at all costs? This article will help you decide that.

Over the last decade I have been living and breathing metrics and have found that there are a few general principles that distinguish good metrics from bad metrics:


Principle 1: A metric should be a good proxy of what you’re trying to measure

You typically cannot directly measure the exact thing you care about.

Let’s say my goal was to measure the quality of my newsletter posts; how do I do that? "Quality" is subjective and there is no generally-accepted formula for assessing it. As a result, I have to choose the best (or least bad) proxy for my goal that I am actually able to measure. In this example, I could use open rate, likes etc. as proxies for quality.

Image by author

This is closely related to what people often call the "relevance" of the metric: Does it create value for the business if you improve the metric? If not, then why measure it?

For example, let’s say you work at Uber and want to understand if your supply side is healthy. You might think that the number of drivers on the platform, or the time they spend online on the app, is a good measure.

These metrics are not terrible, but they don’t really tell you if your supply side is actually healthy (i.e. sufficient to fulfill demand). It could be that demand is outpacing driver growth, or that most of the demand growth is during the mornings, but supply is growing mostly in the afternoons.

A better metric would be one that combines supply and demand; e.g. the number of times riders open the app and there is no driver available.

Principle 2: The metric should be easy to calculate and understand

People love fancy metrics; after all, complex Analytics is what you pay the data team for, right? But complicated metrics are dangerous for a few reasons:

  1. 🤔 They are difficult to understand. If you don’t understand exactly how a metric is calculated, you don’t know how to interpret its movements or how to influence it.
  2. 🧑‍🔬 They force a centralization of analytics. Often, Data Science is the only team that can calculate complex metrics. This takes away the ability of other teams to do decentralized analytics.
  3. ⚠ They are prone to errors. Complex metrics often require inputs from multiple teams; I lost count of the number of times I found errors because one of the many upstream inputs was broken. To make things worse, since only a handful of people in the company can calculate these metrics, there is very little peer review and errors often go unnoticed for long periods of time.
  4. 🔮 They often involve projections. Many complex Metrics rely on projections (e.g. projecting out cohort performance based on past data). These projections are often inaccurate and change over time as new data comes in, causing confusion.

Take LTV:CAC for example:

Apart from the fact that it’s not the best metric for the job it’s supposed to do, it’s also dangerous because it’s complicated to calculate. The denominator, CAC, requires you to aggregate various costs across Marketing and Sales on a cohort basis, while the numerator, LTV, is a projection of various factors including retention, upsell etc.

These kinds of metrics are the ones where you realize after two years that there was an issue in the methodology and you looked at "wrong" data the whole time.

Principle 3: A good (operational) metric should be responsive

If you want to manage the business to a metric on an ongoing basis, it needs to be responsive. If a metric is lagging, i.e. it takes weeks or months for changes to impact the metric, then you will not have a feedback loop that allows you to make continuous improvements.

You might be tempted to address this problem by forecasting the impact of changes rather than waiting for them to show up in the metrics, but that’s often ill-advised (see principle #2 above).

Of course, lagging metrics like revenue are important to keep track of (esp. for Finance or leadership), but most teams should be spending most of their time looking at leading indicators.

Principle 4: A metric should be hard to manipulate

Once you choose a metric and hold people accountable for improving it, they will find the most efficient ways to do so. Often, that leads to unintended outcomes. Here’s an example:

  1. Facebook wants to show relevant content to users to increase the time they spend on the site
  2. Since "relevance" is hard to measure, they use engagement metrics as a proxy (likes, comments etc.)
  3. Publishers and creators realize how the algorithm works and find psychologically manipulative ways to increase engagement ➡ Click Bait and Rage Bait are born

"When a measure becomes a target, it ceases to be a good measure."

— Goodhart’s Law

In the example above, Facebook might be fine with the deterioration in quality as long as users continue spending time on the platform. But in many cases, if metrics are gamed at scale, it can cause serious damage.

Let’s say you are offering a referral bonus where users get rewarded for referred signups. What will most likely happen? People will attempt to create dozens of fake accounts to claim the bonus. A better referral metric would require a minimum transaction amount on the platform (e.g. $25) to get the bonus.

So one way to prevent manipulation is by designing the metric to restrict the unwanted behavior that you anticipate. Another approach is to pair metrics. This approach was introduced by Andy Grove in his book "High Output Management":

"So because indicators direct one’s activities, you should guard against overreacting. This you can do by pairing indicators, so that together both effect and counter-effect are measured."

— Andy Grove, "High Output Management"

What does that look like in practice? If you only incentivize your customer support agents on "time to first response" because you want customers to get immediate help, they will simply respond with a generic message to every new ticket. But if you couple it with a target for ticket resolution time (or customer satisfaction), you are ensuring that agents actually focus on solving customers’ problems faster.
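Here is a small illustration of what paired indicators look like in practice; the support data and column names below are made up.

```python
import pandas as pd

# Hypothetical per-agent support metrics (column names are made up).
agents = pd.DataFrame({
    "agent": ["A", "B", "C"],
    "median_first_response_min": [4, 35, 6],
    "median_resolution_hours": [30, 9, 8],
})

# Pairing the speed metric with an outcome metric: an agent only counts as
# healthy if both are on target. Agent A responds instantly but resolves
# slowly -- exactly the behavior a standalone response-time target rewards.
agents["on_target"] = (
    (agents["median_first_response_min"] <= 15)
    & (agents["median_resolution_hours"] <= 12)
)
print(agents)
```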

Principle 5: A good metric doesn’t have arbitrary thresholds

Many popular metrics you’ll find in Tech companies are tied to a threshold.

For example:

  • # of users with at least 5 connections
  • # of videos with > 1,000 views

This makes sense; often, taking an action in itself is not a very valuable signal and you need to set a threshold to make the metric meaningful. Somebody watching the majority of a video is very different from somebody just clicking on it.

BUT: The threshold should not be arbitrary.

Don’t choose "1,000 views" because it’s a nice, round number; the threshold should be grounded in data. Do videos with 1,000 views get higher click-through rates afterwards? Or result in more follow-on content produced? Higher creator retention?

For example, Twitch measures how many users watch a stream for at least five minutes. While data apparently played into this choice, it’s not entirely clear why they ultimately chose five.

At Uber, we tried to let the data tell us where the threshold should be. For example, we found that restaurants that had a lot of other restaurants nearby were more reliable on UberEats, as it was easier to keep couriers around. We set the threshold for what we considered low-density restaurants based on the "elbow" we saw in the graph:

Image by author

This approach worked in many areas of the business; e.g. we also found that once riders or drivers reached a certain number of initial trips on the platform, they were much more likely to retain.

You are not always going to find a "magic" threshold like this, but you should try to identify one before settling for an arbitrary value.
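If you want to let the data propose the threshold, one simple approach is to look for the point where the marginal gain flattens out. Here is a rough sketch with made-up retention numbers; the cutoff itself is still a judgment call.

```python
import numpy as np

# Hypothetical: retention rate by number of initial trips completed.
trips = np.arange(1, 11)
retention = np.array([0.20, 0.32, 0.41, 0.47, 0.50, 0.52, 0.53, 0.535, 0.54, 0.542])

# Marginal improvement per additional trip; pick the point where the
# incremental gain drops below a (business-judgment) cutoff.
marginal_gain = np.diff(retention)
cutoff = 0.025
threshold = trips[1:][marginal_gain < cutoff][0]

print(f"Marginal gains: {np.round(marginal_gain, 3)}")
print(f"Suggested threshold: ~{threshold} initial trips")
```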

Principle 6: Good metrics create context

Absolute numbers without context are rarely helpful. You’ll often see press announcements like:

  • "1B rows of data processed for our customers", or
  • "$100M in earnings paid out to creators on our platform"

These numbers tell you nothing. For them to be meaningful, they’d have to be put into context. How much did each creator on the platform earn on average? In what timeframe? In other words, turning the absolute number into a ratio adds context.

Image by author

Of course, in the examples above, some of this is intentional; companies don’t want the public to know the details. But this problem is not just limited to press releases and blog posts.

Looking at your Sales pipeline in absolute terms might tell you whether it’s growing over time; but to make it truly meaningful, you’ll have to connect it to the size of the Sales team or the quota they carry. This gives you Pipeline Coverage, the ratio of Pipeline to Quota, a much more meaningful metric.

Creating these types of ratios also makes comparisons more insightful and fair; e.g. comparing revenue per department will make large departments look better, but comparing revenue per employee gives an actual view of productivity.

Principle 7: A metric needs a clear owner that controls the metric

If you want to see movement on a metric, you need to have a person that is responsible for improving it.

Even if multiple teams’ work contributes to moving the metric, you still need a single "owner" that is on the hook for hitting the target (otherwise you’ll end up with a lot of finger-pointing).

There are three potential problem scenarios here:

  1. No owner. With nobody obsessing about improving it, the metric will just continue on its current trajectory.
  2. Multiple owners. Unclear ownership causes friction and lack of accountability. For example, there were times at UberEats where it was unclear whether certain metrics were owned by local City teams or Central Operations teams. For a short period of time, we spent more time meeting on this topic than actually executing.
  3. Lack of control. Assigning an owner that is (or feels) powerless to move the metric is another recipe for failure. This could be because the owner doesn’t have direct levers to control the metric, has no budget to do so, or lacks support from other teams.

Principle 8: A good metric minimizes noise

A metric is only actionable if you can interpret its movements. To get a clean read, you need to eliminate as many sources of "noise" as possible.

For example: Let’s say you’re a small B2B SaaS startup and you look at web traffic as a leading indicator for the top of your funnel. If you simply look at the "raw" number of visits, you’ll have noise from your own employees, friends and family as well as existing customers visiting the website and you might see little correlation between web traffic and down-funnel metrics.

Excluding these traffic sources from your reporting, if possible, will give you a better idea of what’s actually going on with your prospect funnel.

Principle 9: Certain metrics should be industry standard

For certain metrics, it’s important that they can be compared across companies. For example, if you’re in B2B SaaS, your CFO will want to compare your Net Revenue Retention (NRR), CAC Paybacks or Magic Number to competitors (and your investors will want to do the same).

If you calculate these metrics in a way that’s not market standard, you won’t be able to get any insights through benchmarking, and you’ll cause a whole lot of confusion. That’s not to say that you shouldn’t make up metrics; in fact, I have made up a few myself over the course of my career (and might write a separate post on how to do that).

But the definitions for most financial and efficiency metrics are better left untouched.

In conclusion

All of the above being said, I want to make one thing clear: There is no perfect metric for any use case. Every metric will have downsides and you need to pick the "least bad" one.

Hopefully, the principles above will help you do that.

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.


Bonus: The Metrics Hall of Shame

A metric shouldn’t be made up to support a business narrative or hide inconvenient truths. This sounds obvious, but there are plenty of funky metrics out there that were created for this purpose:

1. WeWork’s Community-adjusted EBITDA:

The first place of made-up metrics goes to WeWork’s Community-Adjusted EBITDA.

Adjusted EBITDA has always been known as the land where anything goes; but WeWork’s metric, at first glance, seemed especially "creative". In addition to interest, taxes, depreciation and amortization, WeWork also excluded line items like Marketing and General & Administrative expenses.

The intention was to show a measure of unit economics, which is not unheard of. But WeWork did not do a good job explaining the metric’s purpose, resulting in (understandable) backlash and ridicule.

2. Elon Musk’s "Unregretted User Minutes" for X:

What do you do when your core engagement metrics like DAUs are tanking? You tell people that those metrics don’t matter and make up a new metric to focus on instead. Enter: Unregretted User Minutes.

How is that measured, you ask? Nobody outside of X knows; and if I had to guess, neither does anyone at X.

Social Media is definitely an area that could benefit from a shift away from pure engagement metrics towards something that takes into account the quality of the user experience, but this is much more likely to be a (thinly veiled) attempt to distract from X’s troubles.

3. Netflix’s 2019 "Views" definition change:

How do you make engagement on your platform go up without actually doing anything?

You change the threshold of what counts as an engagement!

Until the end of 2019, Netflix counted as a view any time someone watched > 70% of a movie or TV show episode. In late 2019, they set the threshold at 2 minutes instead; that’s not even enough for the cold open intro of most TV shows. So if someone drops off before the opening credit sequence plays, it still counts as a view. No surprise, the new numbers were roughly 35% higher.

Netflix has since changed their metric again, to be fair, and the new one seems more reasonable (total hours viewed divided by runtime; i.e. effectively "full views").

The post How to Design Better Metrics appeared first on Towards Data Science.

]]>
Should You Join FAANG or a Startup as a Data Scientist? https://towardsdatascience.com/should-you-join-faang-or-a-startup-as-a-data-scientist-030e3b8a7080/ Thu, 20 Jun 2024 15:00:21 +0000 https://towardsdatascience.com/should-you-join-faang-or-a-startup-as-a-data-scientist-030e3b8a7080/ Lessons from working at Uber + Meta, a growth stage company and a tiny startup

The post Should You Join FAANG or a Startup as a Data Scientist? appeared first on Towards Data Science.

]]>
What type of company you join is an incredibly important decision. Even if the company is prestigious and pays you well, if the work environment is not a fit, you’ll burn out eventually.

Many people join a startup or a big tech company without a good understanding of what it’s actually like to work there, and often end up disappointed. In this article, I will cover the key differences based on my experience working at companies ranging from a small 10-person startup to big tech giants like Uber and Meta. Hopefully this will help you decide where you want to go.

If you want to skim the article, I am adding a brief summary ("TL;DR" = "Too long, didn’t read") at the end of each section (something I learned at Uber).

Factor #1: How prestigious the company is

Think of a tech company you know. Chances are, you thought of Google, Meta, Amazon, Apple or a similar large company.

Based on these companies’ reputation, most people assume that anyone who works there meets a very high bar for excellence. While that’s not necessarily true (more on that below), this so-called "halo effect" can help you. Once you have the "stamp of approval" from a big tech company on your resume, it is much easier to find a job afterwards.

Many companies think: "If that person is good enough to be a Data Scientist at Google, they will be good enough for us. I’m sure Google did their due diligence".

Coming to the US from Germany, most hiring managers and recruiters didn’t know the companies I used to work for. Once I got a job at Uber, I was flooded with offers, including from companies that had rejected me before.

You might find that unfair, but it’s how the system currently works, and you should consider this when choosing a company to work for.

TL;DR: Working for a prestigious company early in your career can open a lot of doors.

Factor #2: How smart your colleagues are

As mentioned above, people often assume that FAANG companies only hire the best and brightest.

In reality, that’s not the case. One thing I learned over the years is that any place in the world has a normal distribution of skill and talent once it reaches a certain size. The distribution might be slightly offset on the X axis, but it’s a normal distribution nonetheless.

Image by author

Many of the most well-known companies started out being highly selective, but as they grew and ramped up hiring, the level of excellence started reverting to the mean.

Counterintuitively, that means that some small startups have more elite teams than big tech companies because they can afford to hand-pick every single new hire. To be sure, you’ll need to judge the caliber of the people first-hand during the interview process.

TL;DR: You’ll find smart people in both large and small companies; it’s a fallacy that big tech employs higher-caliber people than startups.

Factor #3: How much money you’ll make

How much you’ll earn depends on many factors, including the specific company, the level you’re being offered, how well you negotiate etc.

The main thing to keep in mind: It’s not just about how much you make, but also how volatile and liquid your compensation is. This is affected by the composition of your pay package (salary vs. equity (illiquid private company-stock vs. liquid public company stock)) and the stage of the company.

Here is how you can think about it at a high level:

  • Early-stage: Small startups will offer you lower base salaries and try to make up for that by promising high equity upside. But betting on the equity upside of an early-stage startup is like playing roulette. You might hit it big and never have to work again, but you need to be very lucky; the vast majority of startups fail, and very few turn into unicorns.
  • Big Tech: Compensation in big tech companies, on the other hand, is more predictable. The base salary is higher (e.g. see the O’Reilly 2016 Data Science Salary Survey) and the equity is typically liquid (i.e. you can sell it as soon as it vests) and less volatile. This is a big advantage since in pre-IPO companies you might have to wait years for your equity to actually be worth something.
  • Growth stage: Growth stage companies can be an interesting compromise; they have a much higher chance of exiting successfully, but your equity still has a lot of upside. If you join 2–3 top-tier growth stage companies over the years, there is a good chance you’ll end up with at least one solid financial outcome. Pay in some of these companies can be very competitive; my compensation actually increased when I moved from Meta to Rippling.

TL;DR: Instead of just focusing on salary, choose the pay package that fits your appetite for risk and liquidity needs.

Factor #4: How much risk you’ll take on

We all want job security.

We might not stay in a job for our entire career, but at least we want to be able to choose ourselves when we leave.

Startups are inherently riskier than big companies. Is the founder up to the job? Will you be able to raise another round of financing? Most of these risks are existential; in other words, the earlier the stage of the company you join, the more likely it is that it won’t exist anymore 6–12 months from now.

Image by author

At companies in later stages, some of these risks have already been eliminated or at least reduced.

In exchange, you’re adding another risk, though: Increased layoff risk. Startups only hire for positions that are business critical since they are strapped for cash. If you get hired, you can be sure they really needed another Data Scientist and there is plenty of work for you to do that is considered central to the startup’s success.

In large companies, though, hiring is often less tightly controlled, so there is a higher risk you’ll be hired into a role that is later deemed "non-essential" and you will be part of sweeping layoffs.

TL;DR: The earlier the company stage, the more risk you take on. But even large companies aren’t "safe" anymore (see: layoffs)

Factor #5: What you get to work on

A job at a startup and a large company are very different.

The general rule of thumb is that in earlier-stage companies you’ll have a broader scope. For example, if you join as the first data hire in a startup, you’ll likely act as part Data Engineer, part Data Analyst and part Data Scientist. You’ll need to figure out how to build out the data infrastructure, make data available to business users, define metrics, run experiments, build dashboards, etc.

Your work will also likely range across the entire business, so you might work with Marketing & Sales data one day, and with Customer Support data the next.

In a large company, you’ll have a narrowly defined scope. For example, you might spend most of your time forecasting a certain set of metrics.

The trade-off here is breadth vs. depth & scale: At a startup, your scope is broad, but because you are stretched so thin, you don’t get to go deep on any individual problem. In a large company, you have a narrow scope, but you get to develop deep subject matter expertise in one particular area; if this expertise is in high demand, specializing like this can be a very lucrative path. In addition, anything you do touches millions or even billions of users.

TL;DR: If you want variety, join a startup. If you want to build deep expertise and have impact at scale, join Big Tech. A growth stage company is a good compromise.

Factor #6: What learning opportunities you’ll have

When I joined UberEats in 2018, I didn’t get any onboarding. Instead, I was given a set of problems to solve and asked to get going.

If you are used to learning in a structured way, e.g. through lectures in college, this can be off-putting at first. How are you supposed to know how to do this? Where do you even start?

But in my experience, working on a variety of challenging problems is the best way to learn about how a business works and build out your hard and soft skills. For example, coming out of school my SQL was basic at best, but being thrown into the deep end at UberEats forced me to become good at it within weeks.

The major downside of this is that you don’t learn many best practices. What does a best-in-class data infrastructure look like? How do the best companies design their metrics? How do you execute thousands of experiments in a frictionless way while maintaining rigor? Even if you ultimately want to join a startup, seeing what "good" looks like can be helpful so you know what you’re building towards.

In addition, large companies often have formalized training. Where in a startup you have to figure everything out yourself, big tech companies will typically provide sponsored learning and development offerings.

TL;DR: At early-stage companies you learn by figuring things out yourself, at large companies you learn through formal training and absorbing best practices.

Factor #7: What career growth opportunities you’ll have

We already talked about how working at prestigious companies can help when you’re looking for a new job. But what about your growth within the company?

At an early-stage company, your growth opportunities come as a direct result of the growth of the company. If you join as an early data hire and you and the company are both doing well, it’s likely you’ll get to build out and lead a data team.

Most of the young VPs and C-Level executives you see got there because their Careers were accelerated by joining a "rocket ship" company.

There is a big benefit of larger companies, though: You typically have a broader range of career options. You want to work on a different product? No need to leave the company, just switch teams. You want to move to a different city or country? Probably also possible.

TL;DR: Early-stage, high-growth companies offer the biggest growth opportunities (if the company is successful), but large companies provide flexibility.

Factor #8: How stressed you’ll be

There are many types of stress. It’s important to figure out which ones you can handle, and which ones are deal-breakers for you.

At fast-growing early-stage companies, the main source of stress comes from:

  • Changing priorities: In order to survive, startups need to adapt. The original plan didn’t work out? Let’s try something else. As a result, you can rarely plan longer than a few weeks ahead.
  • Fast pace: Early-stage companies need to move fast; after all, they need to show enough progress to raise another financing round before they run out of money.
  • Broad scope: As mentioned above, everyone in an early-stage company does a lot of things; it’s easy to feel stretched thin. Most of us in the analytics realm like to do things perfectly, but in a startup you rarely get the chance. If it’s good enough for now, move on to the next thing!

In large companies, stress comes from other factors:

  • Complexity: Larger companies come with a lot of complexity. An often convoluted tech stack, lots of established processes, internal tools etc. that you need to understand and learn to leverage. This can feel overwhelming.
  • Politics: At large companies, it can sometimes feel like you’re spending more time debating swim lanes with other teams than doing actual work.

TL;DR: Not all stress is created equal. You need to figure out what type of stress you can deal with and choose your company accordingly.

When should you join a big company vs. a startup?

There is no one-size-fits-all answer to this question. However, my personal opinion is that it helps to do at least one stint at a reputable big tech company early in your career, if possible.

This way, you will:

  • Get pedigree on your resume that will help you get future jobs
  • See what a high-performing data infrastructure and analytics org at scale looks like
  • Get structured onboarding, coaching and development

This will provide you with a solid foundation, whether you want to stay in big tech or jump into the crazy world of startups.

Final Thoughts

Working at a small startup, growth stage company or FAANG tech company is not inherently better or worse. Each company stage has its pros and cons; you need to decide for yourself what you value and what environment is the best fit for you.

For more hands-on advice on how to scale your career in data & analytics, consider following me here on Medium, on LinkedIn or on Substack.

The post Should You Join FAANG or a Startup as a Data Scientist? appeared first on Towards Data Science.

]]>
How to Maximize Your Impact as a Data Scientist https://towardsdatascience.com/how-to-maximize-your-impact-as-a-data-scientist-3881995a9cb1/ Tue, 11 Jun 2024 14:29:02 +0000 https://towardsdatascience.com/how-to-maximize-your-impact-as-a-data-scientist-3881995a9cb1/ Actionable advice to accelerate your career

The post How to Maximize Your Impact as a Data Scientist appeared first on Towards Data Science.

]]>
Image by Author (partially created via Midjourney)

One of the hardest pills to swallow as an Individual Contributor (IC) at work is that nobody cares about the hard work you put in. They don’t even care about your output; they care about the impact you drive.

What’s the difference? Your output is the analysis you deliver, or the lines of code you write. Your impact is the decision your analysis helps the CEO make, or the revenue the new product feature is generating.

Image by author

If you want to establish yourself as a high performer and accelerate your career as a Data Scientist, it’s key to focus on impact.

In this post I’ll go over the following:

  1. Why prioritizing impact matters not just for managers, but also ICs
  2. Why focusing on impact is hard
  3. How to maximize your impact
  4. How to overcome common challenges in driving real impact

Let’s dive in.

Why should I focus on impact; isn’t that my manager’s job?

Of course you can leave it to your manager to worry about impact. But stepping up comes with some real benefits for your career:

  • Reduced frustration & burn-out: Putting a lot of work into a project and then feeling like it didn’t move the needle is one of the most frustrating feelings in any job.
  • Promotions: Promotions are heavily tied to impact. And if you want to become a manager, you’ll need to show that you understand what drives business outcomes and can allocate resources accordingly.
  • Internal opportunities: People around you notice if you are having an outsized impact, and you’ll increase your chances of receiving internal offers. My promotion to Director happened because the CMO noticed my work on the BizOps team and asked me to move into the Marketing org to build out a Strategy & Analytics team.
  • External opportunities: Prospective employers don’t focus on what responsibilities you had, but what your impact was. After all, they are trying to figure out how you can help their business.

Why isn’t everyone doing this?

Because it’s hard.

We are used to thinking about inputs and outputs rather than impact in our daily lives ("I went to the gym" or "I did three loads of laundry") and we carry that mindset over to our jobs.

More importantly, it gives us a sense of control. It’s fully under your control to work hard on the project, and maybe to create the final deliverable, but you can’t guarantee that it will actually move the business forward.

It can also feel like we’re doing someone else’s job. You built the dashboard; now it’s the other team’s problem how they’re going to use it and get value from it. You can definitely take this stance; but don’t you want to see your work move the needle?

Lastly, sometimes it’s unclear what impact even looks like for our role because we feel too disconnected from the business outcomes; I’ll get into this below.

How can I become more impact-focused?

Step 1: Understand what impact looks like for your role and measure your success accordingly

Stop thinking about productivity metrics like "I launched 5 experiments" or "I built this model" and hold yourself accountable to driving impact.

But what does that look like for a Data Scientist? For other roles it’s easy; Account Executives have sales quotas and Growth Marketing Managers have lead generation targets.

But Data Science, at its core, is a function that supports other teams. As a result, there are two levels of impact:

Image by author

Did your work change anything for the better for your business partners? E.g.:

  • Did your analysis change the roll-out strategy of the new product?
  • Did your model improve forecast accuracy?
  • Does your dashboard save the team hours every week that they used to spend on manual data pulls?

Did your work help move the needle on downstream business metrics? E.g.:

  • You’re a Marketing Data Scientist? Assume you’re on the hook for hitting lead and opportunity targets, and improving Marketing efficiency
  • You’re doing Analytics for the Customer Support org? Start obsessing about response times and satisfaction scores.

You don’t have to be solely responsible for something in order to take (partial) credit for it. If you provided the analysis that resulted in a pricing change that saved the company millions, then you deserve part of the credit for that impact.

You might not feel the consequences of missing these downstream targets as immediately as your stakeholders, but since your long-term career trajectory is still tied to driving impact, it helps to adopt this outcome-focused mindset.

Once you start doing this, you’ll notice more inefficiencies you can help address, or new opportunities for growth.

Step 2: Ensure your work solves a real business problem

You’ll likely know this situation: Instead of approaching you with a problem, people ask you for a specific deliverable. An analysis, a model, a dashboard.

If you blindly execute what they ask, you might realize too late that it won’t lead to tangible business impact. Maybe the problem they are trying to solve is not that important in the grand scheme of things, or there is a better way to approach it.

So what can you do?

Act like an owner. Understand the actual problem behind the request, and ask yourself what business priority this supports.

If you are early in your career then your manager should ideally help with this. But don’t rely on this: Managers don’t always do a perfect job, and you’ll be the one to feel the consequences of badly scoped work.

This requires you to understand company level priorities and the priorities of other orgs and teams. Take notes during All Hands meetings etc. to understand the big picture, and get your hands on other team’s planning materials to get an idea of what they’re trying to accomplish in the next 1–2 quarters.

Step 3: Ensure there is buy-in for your work

Even if your work directly supports company-level priorities, you’ll be in for a bad time if key stakeholders are not bought in.

You don’t want to be in a situation where you finish the work and then realize that another team is blocking the implementation because they have concerns you didn’t address. To avoid this, you’ll:

  1. Need to understand whose support you need, and
  2. Get them onboard from the get-go

This is a complex topic in itself; I’ll write a separate deep dive on how to drive alignment and get buy-in from other teams in the near future.

Step 4: Focus your time on the highest-impact thing

No matter what role you’re in, you’re likely juggling multiple priorities. To maximize your impact, you need to ensure you spend the majority of your time on the most important thing.

As with many things, this is easier said than done though, so let’s talk about what that looks like concretely.

Ad-hoc requests vs. strategic work

It’s easy to get caught up in the craziness of daily business only to realize you didn’t make any progress on the big, strategic project you actually care about.

This is all too common; none of us get to sit in our ivory tower and chip away at our projects undisturbed. Plus, ad-hoc work is impactful, too; while it’s less exciting than strategic projects, it’s what keeps the business running.

Still, if you find yourself spending the majority of your time fielding these ad-hoc issues, it’s time to talk to your manager. I’m sure your manager would rather help protect your bandwidth than have you 1) miss your deadlines on your key projects and 2) quit eventually from frustration.

Image by author

Don’t cry over spilled milk

Another common challenge comes from the sunk cost fallacy. You invested a lot of time into a project, but it doesn’t seem to be going anywhere. Maybe you realized the premise didn’t make as much sense as you thought, or the priorities of the business have changed since you started the work.

Instead of talking to your manager and stakeholders about changing the scope of the project or abandoning it altogether, you’re doubling down to get it over the finish line. After all, you don’t want all of your effort to go to waste. Sound familiar?

Economists (and Poker players) figured out a long time ago that this is a dangerous trap. When prioritizing your time, ignore how much effort you already put in and focus on where the next hour of work will yield the most impact.

Things to watch out for ("impact killers")

How do you minimize the odds of wasting time on a project that won’t lead to impact? There are a few warning signs:

  • "Academic" projects: Any time a project is pitched to you along the lines of "This would be interesting to understand" you should be careful; projects that purely improve the understanding of an issue without tying it back to the business are a waste of time and source of frustration in my experience
  • Overly ambitious project scope: At Uber, everyone always wanted to understand what the "best" driver incentive type is. Many people worked on this over the years, but it never led anywhere. There was no simple "one-size-fits-all" answer to this question, and the projects that led to actual impact were much more concrete, tactical optimizations
  • The customer or deliverable are not defined: If it’s not clear who the end user of your work is (are you doing this for your manager, leadership, or another team?), or you’re unsure what exactly you’re supposed to deliver, it should raise a red flag. This is typically a sign that the project needs more scoping work before someone should start running with it

Common Challenges and How to Address Them

We talked about general frameworks to maximize impact. But how do you make actual, specific projects more impactful?

Many times, projects fail close to the finish line. Impact doesn’t materialize automatically, so you need to put in the final bit of work to ensure your work gets adopted. Doing this has an extremely high return on the time you invest since you already did the hard work to produce the deliverable and "only" need to close the loop with stakeholders.

Image by author

To make things more tangible, I am going to go through a few types of common deliverables, touch on where they typically fail to create impact and propose what you can do about it:

1. You create a comprehensive analysis but nobody is acting on it

Problem: This is common with analyses that don’t have a clear recommendation. If you simply outline the data and potential paths forward, you are expecting your audience to do all of the heavy lifting.

Solution: Your work starts adding real value for them once you take that work off their plate. Always give a clear recommendation; you can caveat it and show alternatives in the appendix, but you need to take a stance.

2. You ran an experiment but nobody is using the results

Problem: Many experiments conclude with a metrics read-out by Data Science. More often than not, this is a "metrics dump" with a lot of information, but little interpretation or context.

Solution: Help your business partners interpret the results, and tell them how it affects what they care about.

  • How should they think about the statistical significance or lack thereof?
  • Is the observed lift good compared to other changes you tested and shipped?
  • What is your recommendation for next steps? What does the experiment result mean for this person or team specifically?

Remember, you are the subject matter expert and shouldn’t expect non-analytical audiences to interpret raw experiment data. Telling your stakeholders what the result means for them will increase chances they will act on it.
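To make "interpretation instead of a metrics dump" concrete, here is a minimal sketch (hypothetical numbers) that turns raw experiment counts into a lift estimate, a p-value and a plain-language readout.

```python
import math
from scipy.stats import norm

# Hypothetical A/B test: conversions out of users exposed.
control_conv, control_n = 480, 10_000
variant_conv, variant_n = 540, 10_000

p_c = control_conv / control_n
p_v = variant_conv / variant_n
lift = (p_v - p_c) / p_c

# Two-proportion z-test with a pooled standard error.
p_pool = (control_conv + variant_conv) / (control_n + variant_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / variant_n))
z = (p_v - p_c) / se
p_value = 2 * norm.sf(abs(z))

print(f"Control {p_c:.1%}, Variant {p_v:.1%} -> relative lift {lift:+.1%}")
print(f"p-value: {p_value:.3f}")
print("Readout: directionally positive but not conclusive at the usual 5% level; "
      "weigh the effect size against rollout cost instead of deciding on the p-value alone.")
```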

3. You built a predictive model, but the team you built it for is not using it

Problem: When predictive models don’t get used, it’s often because of a lack of trust in the model output.

ML models themselves tend to be black boxes, and if teams don’t understand how the outputs were generated and whether they are reliable, they are hesitant to rely on them. Even if your model is not using ML and lives in a spreadsheet: If people don’t know how it works, they’ll be suspicious.

Solution: It’s all about involving stakeholders in the process and building trust.

  • Involve stakeholders in the model development from the get-go to get them comfortable and address any concerns early on
  • Demystify the output; for example, you can extract the top model features and explain them
  • Sanity-check predictions and compare them to intuition. For example, if you forecast sales but your model predicts a different seasonality pattern from previous years, you’ll need to be able to explain why, or you’ll lose trust. In my experience, this is more impactful than just sharing performance metrics like the accuracy of the model

Having a structured playbook for how to do this will make your life easier, so I’ll cover this in a separate post in the near future.

4. You created a dashboard but nobody is looking at it

Problem: If a dashboard doesn’t get used, it’s likely one of these things is true:

  1. The dashboard doesn’t directly address an urgent business use case
  2. You didn’t involve your stakeholders along the way (e.g. by sharing mock-ups and drafts for feedback) and the final product is not what they were hoping for
  3. The dashboard is complex and your users don’t understand how to get what they need

Solution: To address #1 and #2, start with user research to understand pain points and potential use cases of the dashboard, and involve your stakeholders during development.

With regards to #3, a simpler dashboard that users are comfortable with beats a more advanced one that doesn’t get used. If you cannot (or don’t want to) simplify the dash further, you’ll need to train your users on the functionality and shadow them to understand any points of friction.

A dashboard is not done when you ship it for the first time, but needs to be improved over time based on users’ needs and feedback.

Closing Thoughts

Focusing on impact is scary since we leave the world of controllable inputs behind, but it’s what ultimately gets you promotions and new job opportunities.

And isn’t it nice when your work actually feels like it moves the needle?

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

The post How to Maximize Your Impact as a Data Scientist appeared first on Towards Data Science.

]]>
The Ultimate Guide to Making Sense of Data https://towardsdatascience.com/the-ultimate-guide-to-making-sense-of-data-aaa121db1119/ Tue, 04 Jun 2024 14:47:49 +0000 https://towardsdatascience.com/the-ultimate-guide-to-making-sense-of-data-aaa121db1119/ Lessons from 10 years at Uber, Meta and High-Growth Startups

The post The Ultimate Guide to Making Sense of Data appeared first on Towards Data Science.

]]>
Data can help you make better decisions.

Unfortunately, most companies are better at collecting data than making sense of it. They claim to have a data-driven culture, but in reality they heavily rely on experience to make judgement calls.

As a Data Scientist, it’s your job to help your business stakeholders understand and interpret the data so they can make more informed decisions.

Your impact comes not from the analyses you do or the models you build, but the ultimate business outcomes you help to drive. This is the main thing that sets apart senior DS from more junior ones.

To help with that, I’ve put together this step-by-step playbook based on my experience turning data into actionable insights at Rippling, Meta and Uber.

I’ll cover the following:

  1. What metrics to track: How to establish the revenue equation and driver tree for your business
  2. How to track: How to set up monitoring and avoid common pitfalls. We’ll cover how to choose the right time horizon, deal with seasonality, master cohorted data and more!
  3. Extracting insights: How to identify issues and opportunities in a structured and repeatable way. We’ll go over the most common types of trends you’ll come across, and how to make sense of them.

Sounds simple enough, but the devil is in the details, so let’s dive into them one-by-one.

Part 1: What metrics to track

First, you need to figure out what Metrics you should be tracking and analyzing. To maximize impact, you should focus on those that actually drive revenue.

Start with the high-level revenue equation (e.g. "Revenue = Impressions * CPM / 1000" for an ads-based business) and then break each part down further to get to the underlying drivers. The exact revenue equation depends on the type of business you’re working on; you can find some of the most common ones here.

The resulting driver tree, with the output at the top and inputs at the bottom, tells you what drives results in the business and what dashboards you need to build so that you can do end-to-end investigations.

Example: Here is a (partial) driver tree for an ads-based B2C product:

Image by author
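To make the mechanics concrete, here is a minimal sketch of how such a driver tree rolls up to revenue; all inputs and the specific breakdown are illustrative assumptions, not real figures.

```python
# Minimal driver-tree sketch for an ads-based business.
# All inputs below are illustrative assumptions, not real figures.

daily_active_users = 2_000_000
sessions_per_dau = 3
impressions_per_session = 12
cpm = 4.50  # $ per 1,000 impressions

# Roll the drivers up the tree to the output metric.
impressions = daily_active_users * sessions_per_dau * impressions_per_session
revenue = impressions * cpm / 1000

print(f"Impressions: {impressions:,.0f}")   # 72,000,000
print(f"Daily revenue: ${revenue:,.0f}")    # $324,000
```

Writing the tree down like this (or in a spreadsheet) makes it obvious which input you would need to move, and by how much, to hit a given revenue target.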

Understanding leading and lagging metrics

The revenue equation might make it seem like the inputs translate immediately into the outputs, but this is not the case in reality.

The most obvious example is a Marketing & Sales funnel: You generate leads, they turn into qualified opportunities, and finally the deal closes. Depending on your business and the type of customer, this can take many months.

In other words, if you are looking at an outcome metric such as revenue, you are often looking at the result of actions you took weeks or months earlier.

As a rule of thumb, the further down you go in your driver tree, the more of a leading indicator a metric is; the further up you go, the more of a lagging metric you’re dealing with.

Quantifying the lag

It’s worth looking at historical conversion windows to understand what degree of lag you are dealing with.

That way, you’ll be better able to work backwards (if you see revenue fluctuations, you’ll know how far back to go to look for the cause) as well as project forward (you’ll know how long it will take until you see the impact of new initiatives).

In my experience, developing rules of thumb (e.g. does it take a day or a month, on average, for a new user to become active?) will get you 80% – 90% of the value, so there is no need to over-engineer this.
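If you want to derive such a rule of thumb from your own data, a minimal sketch could look like the following (the table and column names are hypothetical):

```python
import pandas as pd

# Hypothetical user-level table; the column names are assumptions.
events = pd.DataFrame({
    "signup_date": pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-10"]),
    "first_active_date": pd.to_datetime(["2024-01-04", "2024-02-01", "2024-01-11"]),
})

# Lag between the leading action (signup) and the lagging outcome (activation).
lag_days = (events["first_active_date"] - events["signup_date"]).dt.days
print(f"Median lag: {lag_days.median():.0f} days, p90: {lag_days.quantile(0.9):.0f} days")
```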

Part 2: Setting up monitoring and avoiding common pitfalls

So you have your driver tree; how do you use this to monitor the performance of the business and extract insights for your stakeholders?

The first step is setting up a dashboard to monitor the key metrics. I am not going to dive into a comparison of the various BI tools you could use (I might do that in a separate post in the future).

Everything I’m talking about in this post can easily be done in Google Sheets or any other tool, so your choice of BI software won’t be a limiting factor.

Instead, I want to focus on a few best practices that will help you make sense of the data and avoid common pitfalls.

1. Choosing the appropriate time frame for each metric

While you want to pick up on trends as early as possible, you need to be careful not to fall into the trap of looking at overly granular data and trying to draw insights from what is mostly noise.

Consider the time horizon of the activities you’re measuring and whether you’re able to act on the data:

  • Real-time data is useful for a B2C marketplace like Uber because 1) transactions have a short lifecycle (an Uber ride is typically requested, accepted and completed within less than an hour) and 2) because Uber has the tools to respond in real-time (e.g. surge pricing, incentives, driver comms).
  • In contrast, in a B2B SaaS business, daily Sales data is going to be noisy and less actionable due to long deal cycles.

You’ll also want to consider the time horizon of the goals you are setting against the metric. If your partner teams have monthly goals, then the default view for these metrics should be monthly.

BUT: The main problem with monthly metrics (or even longer time periods) is that you have few data points to work with and you have to wait a long time until you get an updated view of performance.

One compromise is to plot metrics on a rolling average basis: This way, you will pick up on the latest trends but are removing a lot of the noise by smoothing the data.

Image by author

Example: Looking at the monthly numbers on the left hand side we might conclude that we’re in a solid spot to hit the April target; looking at the 30-day rolling average, however, we notice that revenue generation fell off a cliff (and we should dig into this ASAP).
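A sketch of how you might compute both views with pandas (the data and column names are placeholders):

```python
import pandas as pd

# Hypothetical daily revenue series; replace the constant with your actual data.
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=120, freq="D"),
    "revenue": 1000.0,
}).set_index("date")

monthly = daily["revenue"].resample("MS").sum()            # calendar-month view
rolling_30d = daily["revenue"].rolling(window=30).mean()   # 30-day rolling average

# Plotting both views side by side surfaces recent slowdowns
# that the monthly bars hide until the month is over.
```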

2. Setting benchmarks

In order to derive insights from metrics, you need to be able to put a number into context.

  • The simplest way is to benchmark the metric over time: Is the metric improving or deteriorating? Of course, it’s even better if you have an idea of the exact level you want the metric to be at.
  • If you have an official goal set against the metric, great. But even if you don’t, you can still figure out whether you’re on track or not by deriving implied goals.

Example: Let’s say the Sales team has a monthly quota, but they don’t have an official goal for how much pipeline they need to generate to hit quota.

In this case, you can look at the historical ratio of open pipeline to quota ("Pipeline Coverage"), and use this as your benchmark. Be aware: By doing this, you are implicitly assuming that performance will remain steady (in this case, that the team is converting pipeline to revenue at a steady rate).
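A minimal sketch of deriving such an implied benchmark (all numbers and column names are hypothetical):

```python
import pandas as pd

# Hypothetical history of open pipeline at month start vs. monthly quota.
history = pd.DataFrame({
    "open_pipeline": [300_000, 360_000, 420_000],
    "quota":         [100_000, 120_000, 140_000],
})

# Historical Pipeline Coverage used as the benchmark going forward.
coverage_ratio = (history["open_pipeline"] / history["quota"]).median()  # 3.0x here

current_quota = 150_000
implied_pipeline_goal = coverage_ratio * current_quota
print(f"Implied pipeline goal: ${implied_pipeline_goal:,.0f} ({coverage_ratio:.1f}x coverage)")
```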

3. Accounting for seasonality

In almost any business, you need to account for seasonality to interpret data correctly. In other words, does the metric you’re looking at have repeating patterns by time of day / day of week / time of month / calendar month?

Example: Look at this monthly trend of new ARR in a B2B SaaS business:

Image by author

If you look at the drop in new ARR in July and August in this simple bar chart, you might freak out and start an extensive investigation.

However, if you plot each year on top of each other, you’re able to figure out the seasonality pattern and realize that there is an annual summer lull and you can expect business to pick up again in September:

Image by author

But seasonality doesn’t have to be monthly; it could be that certain weekdays have stronger or weaker performance, or you typically see business picking up towards the end of the month.

Example: Let’s assume you want to look at how the Sales team is doing in the current month (April). It’s the 15th business day of the month and you’ve brought in $26k so far against a goal of $50k. Ignoring seasonality, it looks like the team is going to miss since only 6 business days are left.

However, you know that the team tends to bring a lot of deals over the finish line at the end of the month.

Image by author

In this case, we can plot cumulative sales and compare against prior months to make sense of the pattern. This allows us to see that we’re actually in a solid spot for this time of the month since the trajectory is not linear.
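One way to build that comparison is to cumulate sales by business day of the month and overlay the current month on prior months. A sketch, assuming a hypothetical deals table with a close date and amount:

```python
import numpy as np
import pandas as pd

# Hypothetical table of closed-won deals; column names are assumptions.
deals = pd.DataFrame({
    "close_date": pd.to_datetime(["2024-03-04", "2024-03-28", "2024-04-02", "2024-04-29"]),
    "amount": [10_000, 35_000, 8_000, 40_000],
})

deals["month"] = deals["close_date"].dt.to_period("M")
month_start = deals["close_date"].values.astype("datetime64[M]").astype("datetime64[D]")
close_day = deals["close_date"].values.astype("datetime64[D]")
deals["bday_of_month"] = np.busday_count(month_start, close_day) + 1

# One column per month, cumulative sales by business day of the month.
cumulative = (
    deals.pivot_table(index="bday_of_month", columns="month",
                      values="amount", aggfunc="sum")
         .fillna(0)
         .cumsum()
)
print(cumulative)
```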

4. Dealing with "baking" metrics

One of the most common pitfalls in analyzing metrics is to look at numbers that have not had sufficient time to "bake", i.e. reach their final value.

Here are a few of the most common examples:

  1. User acquisition funnel: You are measuring the conversion from traffic to signups to activation; you don’t know how many of the more recent signups will still convert in the future
  2. Sales funnel: Your average deal cycle lasts multiple months and you do not know how many of your open deals from recent months will still close
  3. Retention: You want to understand how well a given cohort of users is retaining with your business

In all of these cases, the performance of recent cohorts looks worse than it actually is because the data is not complete yet.

If you don’t want to wait, you generally have three options for dealing with this problem:

Option 1: Cut the metric by time period

The most straightforward way is to cut aggregate metrics by time period (e.g. first week conversion, second week conversion etc.). This allows you to get an early read while making the comparison apples-to-apples and avoiding a bias towards older cohorts.

You can then display the result in a cohort heatmap. Here’s an example for an acquisition funnel tracking conversion from signup to first transaction:

Image by author

This way, you can see that on an apples-to-apples basis, our conversion rate is getting worse (our week-1 CVR dropped from > 20% to c. 15% in recent cohorts). By just looking at the aggregate conversion rate (the last column) we wouldn’t have been able to distinguish an actual drop from incomplete data.
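A sketch of how such a cohort view can be assembled (the user-level table and column names are assumptions):

```python
import pandas as pd

# Hypothetical user-level table: signup date and (possibly missing) first transaction date.
users = pd.DataFrame({
    "signup_date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-08", "2024-01-09"]),
    "first_txn_date": pd.to_datetime(["2024-01-05", None, "2024-01-20", "2024-01-10"]),
})

users["cohort_week"] = users["signup_date"].dt.to_period("W")
users["weeks_to_convert"] = (users["first_txn_date"] - users["signup_date"]).dt.days // 7

cohort_size = users.groupby("cohort_week").size()
conversion_heatmap = (
    users.dropna(subset=["weeks_to_convert"])
         .groupby(["cohort_week", "weeks_to_convert"]).size()
         .unstack(fill_value=0)
         .cumsum(axis=1)               # cumulative conversions by weeks since signup
         .div(cohort_size, axis=0)     # rows: signup cohorts, columns: week 0, 1, 2, ...
)
print(conversion_heatmap)
```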

Option 2: Change the metric definition

In some cases, you can change the definition of the metric to avoid looking at incomplete data.

For example, instead of looking at how many deals that entered the pipeline in March closed until now, you could look at how many of the deals that closed in March were won vs. lost. This number will not change over time, while you might have to wait months for the final performance of the March deal cohort.
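In code, the difference is simply which date you group and filter by. A sketch with a hypothetical deals table:

```python
import pandas as pd

# Hypothetical deals table; column names are assumptions.
deals = pd.DataFrame({
    "created_date": pd.to_datetime(["2024-03-05", "2024-03-20", "2024-02-10"]),
    "closed_date":  pd.to_datetime([None, "2024-04-02", "2024-03-15"]),
    "is_won":       [None, 1, 0],   # 1 = won, 0 = lost, None = still open
})

march = pd.Period("2024-03", freq="M")

# "Baking" view: deals *created* in March -- incomplete until every deal has closed.
created_in_march = deals[deals["created_date"].dt.to_period("M") == march]

# Stable view: deals *closed* in March -- final as soon as the month is over.
closed_in_march = deals[deals["closed_date"].dt.to_period("M") == march]
print(f"March win rate (by close date): {closed_in_march['is_won'].mean():.0%}")
```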

Option 3: Forecasting

Based on past data, you can project where the final performance of a cohort will likely end up. The more time passes and the more actual data you gather, the more the forecast will converge to the actual value.

But be careful: It’s easy to get cohort forecasting wrong. E.g. if you’re working in a B2B business with low win rates, a single deal might meaningfully change the performance of a cohort, and forecasting that accurately is very difficult.

Part 3: Extracting insights from the data

All this data is great, but how do we translate this into insights?

You won’t have time to dig into every metric on a regular basis, so prioritize your time by first looking at the biggest gaps and movers:

  • Where are the teams missing their goals? Where do you see unexpected outperformance?
  • Which metrics are tanking? What trends are inverting?

Once you pick a trend of interest, you’ll need to dig in and identify the root cause so your business partners can come up with targeted solutions.

In order to provide structure for your deep dives, I am going to go through the key archetypes of metric trends you will come across and provide tangible examples for each one based on real-life experiences.

1. Net neutral movements

When you see a drastic movement in a metric, first go up the driver tree before going down. This way, you can see if the number actually moves the needle on what you and the team ultimately care about; if it doesn’t, finding the root cause is less urgent.

Image by author

Example scenario: In the image above, you see that the visit-to-signup conversion on your website dropped massively. Instead of panicking, you look at total signups and see that the number is steady.

It turns out that the drop in average conversion rate is caused by a spike in low-quality traffic to the site; the performance of your "core" traffic is unchanged.

2. Denominator vs. numerator

When dealing with changes to ratio metrics (impressions per active user, trips per rideshare driver etc.), first check if it’s the numerator or denominator that moved.

People tend to assume it’s the numerator that moved because that is typically the engagement or productivity metric we are trying to grow in the short-term. However, there are many cases where that’s not true.

Examples include:

  • You see leads per Sales rep go down because the team just onboarded a new class of hires, not because you have a demand generation problem
  • Trips per Uber driver per hour drop not because you have fewer requests from riders, but because the team increased incentives and more drivers are online
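To make this check concrete, here is a quick sketch that decomposes the change in a ratio metric by holding one input fixed at a time (the numbers are illustrative):

```python
# Illustrative example: trips per online driver-hour, last week vs. this week.
trips_prev, hours_prev = 50_000, 25_000
trips_curr, hours_curr = 52_000, 30_000

ratio_prev = trips_prev / hours_prev  # 2.00
ratio_curr = trips_curr / hours_curr  # ~1.73

# Hold one input fixed at a time to see which side drove the move.
numerator_only = trips_curr / hours_prev    # 2.08 -> trips actually went *up*
denominator_only = trips_prev / hours_curr  # ~1.67 -> the extra driver-hours drove the drop
```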

3. Isolated / Concentrated Trends

Many metric trends are driven by things that are happening only in a specific part of the product or the business and aggregate numbers don’t tell the whole story.

The general diagnosis flow for isolating the root cause looks like this:

Step 1: Keep decomposing the metrics until you isolate the trend or can’t break the metrics down further.

Similar to how in mathematics every number can be broken down into a set of prime numbers, every metric can be broken down further and further until you reach the fundamental inputs.

By doing this, you are able to isolate the issue to a specific part of your driver tree which makes it much easier to pinpoint what’s going on and what the appropriate response is.

Step 2: Segment the data to isolate the relevant trend

Through segmentation you can figure out if a specific area of the business is the culprit. By segmenting across the following dimensions, you should be able to catch > 90% of issues:

  • Geography (region / country / city)
  • Time (time of month, day of week, etc.)
  • Product (different SKUs or product surfaces, e.g. Instagram Feed vs. Reels)
  • User or customer demographics (age, gender, etc.)
  • Individual entity / actor (e.g. sales rep, merchant, user)

Let’s look at a concrete example:

Let’s say you work at DoorDash and see that the number of completed deliveries in Boston went down week-over-week. Instead of brainstorming ideas to drive demand or increase completion rates, let’s try to isolate the issue so we can develop more targeted solutions.

The first step is to decompose the metric "Completed Deliveries":

Image by author

Based on this driver tree, we can rule out the demand side. Instead, we see that we have recently been struggling to find couriers to pick up the orders (rather than issues in the restaurant <> courier handoff or the food drop-off).

Lastly, we’ll check if this is a widespread issue or not. In this case, some of the most promising cuts would be to look at geography, time and merchant. The merchant data shows that the issue is widespread and affects many restaurants, so it doesn’t help us narrow things down.

However, when we create a heatmap of time and geography for the metric "delivery requests with no couriers found", we find that we’re mostly affected in the outskirts of Boston at night:

Image by author
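A sketch of how you might build that cut with pandas (the request-level table and column names are assumptions):

```python
import pandas as pd

# Hypothetical delivery-request table with an outcome flag; names are assumptions.
requests = pd.DataFrame({
    "zone":             ["Downtown", "Outskirts", "Outskirts", "Downtown"],
    "hour_of_day":      [12, 22, 23, 22],
    "no_courier_found": [0, 1, 1, 0],
})

heatmap = requests.pivot_table(
    index="zone",
    columns="hour_of_day",
    values="no_courier_found",
    aggfunc="mean",   # share of requests where no courier was found
)
print(heatmap)
```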

What do we do with this information? Being able to pinpoint the issue like this allows us to deploy targeted courier acquisition efforts and incentives in these times and places rather than peanut-buttering them across Boston.

In other words, isolating the root cause allows us to deploy our resources more efficiently.

Other examples of concentrated trends you might come across:

  • Most of the in-game purchases in an online game are made by a few "whales" (so the team will want to focus their retention and engagement efforts on these)
  • The majority of support ticket escalations to Engineering are caused by a handful of support reps (giving the company a targeted lever to free up Eng time by training these reps)

4. Mix Shifts

One of the most common sources of confusion in diagnosing performance comes from mix shifts and Simpson’s Paradox.

Mix shifts are simply changes in the composition of a total population. Simpson’s Paradox describes the counterintuitive effect where a trend that you see in the total population disappears or reverses when looking at the subcomponents (or vice versa).

What does that look like in practice?

Let’s say you work at YouTube (or any other company running ads for that matter). You see revenue is declining and when digging into the data, you notice that CPMs have been decreasing for a while.

CPM as a metric cannot be decomposed any further, so you start segmenting the data, but you have trouble identifying the root cause. For example, CPMs across all geographies look stable:

Image by author

Here is where the mix shift and Simpson’s Paradox come in: Each individual region’s CPM is unchanged, but if you look at the composition of impressions by region, you find that the mix is shifting from the US to APAC.

Since APAC has a lower CPM than the US, the aggregate CPM is decreasing.

Image by author
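Here is a small numeric sketch of the effect (with illustrative numbers): each region’s CPM is flat across quarters, yet the blended CPM declines because impressions shift toward the lower-CPM region.

```python
import pandas as pd

# Illustrative numbers: per-region CPMs are identical in both quarters,
# but the impression mix shifts toward the lower-CPM region.
data = pd.DataFrame({
    "quarter":     ["Q1", "Q1", "Q2", "Q2"],
    "region":      ["US", "APAC", "US", "APAC"],
    "cpm":         [10.0, 2.0, 10.0, 2.0],   # $ per 1,000 impressions
    "impressions": [800, 200, 500, 500],     # in millions
})

data["revenue"] = data["impressions"] * data["cpm"] / 1000
totals = data.groupby("quarter")[["revenue", "impressions"]].sum()
blended_cpm = 1000 * totals["revenue"] / totals["impressions"]
print(blended_cpm)  # Q1: 8.4, Q2: 6.0 -- blended CPM drops although no region changed
```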

Again, knowing the exact root cause allows a more tailored response. Based on this data, the team can either try to reignite growth in high-CPM regions, think about additional monetization options for APAC, or focus on making up the lower value of individual impressions through outsized growth in impressions volume in the large APAC market.

Final Thoughts

Remember, data in itself does not have value. It becomes valuable once you use it to generate insights or recommendations for users or internal stakeholders.

By following a structured framework, you’ll be able to reliably identify the relevant trends in the data, and by following the tips above, you can distinguish signal from noise and avoid drawing the wrong conclusions.

If you are interested in more content like this, consider following me here on Medium, on LinkedIn or on Substack.

The post The Ultimate Guide to Making Sense of Data appeared first on Towards Data Science.

]]>
What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics https://towardsdatascience.com/what-10-years-at-uber-meta-and-startups-taught-me-about-data-analytics-fd948b912556/ Thu, 30 May 2024 19:00:44 +0000 https://towardsdatascience.com/what-10-years-at-uber-meta-and-startups-taught-me-about-data-analytics-fd948b912556/ Advice for Data Scientists and Managers

The post What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics appeared first on Towards Data Science.

]]>
Image by Author (generated via Midjourney)

Over the last 10 years, I have worked in analytical roles in a number of companies, from a small Fintech startup in Germany to high-growth pre-IPO scale-ups (Rippling) and big tech companies (Uber, Meta).

Each company had a unique data culture and each role came with its own challenges and a set of hard-earned lessons. Below, you’ll find ten of my key learnings over the last decade, many of which I’ve found to hold true regardless of company stage, product or business model.

1. You need to tell a story with data.

Think about who your audience is.

If you work in a research-focused organization or you are mostly presenting to technical stakeholders (e.g. Engineering), an academic "white paper"-style analysis might be the way to go.

But if your audience is non-technical business teams or executives, you’ll want to make sure you are focusing on the key insights rather than getting into the technical details, and are connecting your work to the business decisions it is supposed to influence. If you go too deep into the technicalities of the analysis, you’ll lose your audience; communication in the workplace is not about what you find interesting to share, but what the audience needs to hear.

The most well-known approach for this type of insights-led, top-down communication is the Pyramid Principle developed by McKinsey consultant Barbara Minto. Check out this recent TDS article on how to leverage it to communicate better as a DS.

2. Strong business acumen is the biggest differentiator between good and great data scientists.

If you are a Senior DS at a company with a high bar, you can expect all of your peers to have strong technical skills.

You won’t stand out by incrementally improving your technical skillset, but rather by ensuring your work is driving maximum impact for your stakeholders (e.g. Product, Engineering, Biz teams).

This is where Business Acumen comes into play: In order to maximize your impact, you need to 1) deeply understand the priorities of the business and the problems your stakeholders are facing, 2) scope analytics solutions that directly help those priorities or address those problems, and 3) communicate your insights and recommendations in a way that your audience understands them (see #1 above).

With strong Business Acumen, you’ll also be able to sanity check your work since you’ll have the business context and judgment to understand whether the result of your analysis, or your proposal, makes sense or not.

Business Acumen is not something that is taught in school or DS bootcamps; so how do you develop it? Here are a few concrete things you can do:

  1. Pay attention in the Company All Hands and other cross-team meetings when strategic priorities are discussed
  2. Practice connecting these priorities to your team’s work; during planning cycles or when new projects come up, ask yourself: "How does this relate to the high-level business priorities?" If you can’t make the connection, discuss this with your manager
  3. When you are doing an analysis, always ask yourself "So what?". A data point or insight only becomes relevant and impactful once you can answer this question and articulate why anyone should care about it. What should they be doing differently based on this data?

The ultimate goal here is to transition from taking requests and working on inbound JIRA tickets to being a thought partner of your stakeholders that shapes the analytics roadmap in partnership with them.

3. Be an objective truth seeker

Many people cherry pick data to fit their narrative. This makes sense: Most organizations reward people for hitting their goals, not for being the most objective.

As a Data Scientist, you have the luxury to push back against this. Data Science teams typically don’t directly own business metrics and are therefore under less pressure to hit short-term goals compared to teams like Sales.

Stakeholders will sometimes pressure you to find data that supports a narrative they have already created in advance. While playing along with this might score you some points in the near term, what will help you in the long term is being a truth seeker and promoting the narrative that the data truly supports.

Image by Author (created via Midjourney)

Even if it is uncomfortable in the moment (as you might be pushing a narrative people don’t want to hear), it will help you stand out and position you as someone that executives will approach when they need an unfiltered and unbiased view on what’s really going on.

4. Data + Primary Research = ❤

Data people often frown at "anecdotal evidence", but it’s a necessary complement to rigorous quantitative analysis.

Running experiments and analyzing large datasets can give you statistically significant insights, but you often miss out on signals that either haven’t reached a large enough scale yet to show up in your data or that are not picked up well by structured data.

Diving into closed-lost deal notes, talking to customers, reading support tickets etc. is sometimes the only way to uncover certain issues (or truly understand root causes).

For example, let’s say you work in a B2B SaaS business. You might see in the data that win rates for your Enterprise deals are declining, and maybe you can even narrow it down to a certain type of customer.

But to truly understand what’s going on, you’ll have to talk to Sales representatives, dig into their deal notes, talk to prospects, etc. In the beginning, this will seem like random anecdotes and noise, but after a while a pattern will start to emerge; and odds are, that pattern did not show up in any of the standardized metrics you are tracking.

5. If the data looks too good to be true, it usually is

When people see a steep uptick in a metric, they tend to get excited and attribute this movement to something they did, e.g. a recent feature launch.

Unfortunately, when a metric change seems suspiciously positive, it is often because of data issues or one-off effects. For example:

  • Data is incomplete for recent periods, and the metric will level out once all data points are in
  • There is a one-time tailwind that won’t sustain (e.g. you see a boost in Sales in early January; instead of a sustained improvement to Sales performance, it’s just the backlog from the holiday period that is clearing up)

Don’t get carried away by the excitement about an uptick in metrics. You need a healthy dose of skepticism, curiosity and experience to avoid pitfalls and generate robust insights.

6. Be open to changing your mind

If you work with data, it’s natural to change your opinion on a regular basis. For example:

  • You recommended a course of action to an executive, but have lost faith that it’s the right path forward since you got more data
  • You interpreted a metric movement a certain way, but you ran an additional analysis and now you think something else is going on

However, most analytical people are hesitant to walk back on statements they made in the past out of fear of looking incompetent or angering stakeholders.

That’s understandable; changing your recommendation typically means additional work for stakeholders to adjust to the new reality, and there is a risk they’ll be annoyed as a result.

Still, you shouldn’t stick to a prior recommendation simply out of fear of losing face. You won’t be able to do a good job defending an opinion once you’ve lost faith in it. Leaders like Jeff Bezos recognize the importance of changing your mind when confronted with new information, or simply when you’ve looked at an issue from a different angle. As long as you can clearly articulate why your recommendation changed, it is a sign of strength and intellectual rigor, not weakness.

Changing your mind a lot is so important. You should never let anyone trap you with anything you’ve said in the past. – Jeff Bezos

7. You need to be pragmatic

When working in the Analytics realm, it’s easy to develop perfectionism. You’ve been trained on scientific methods, and pride yourself in knowing the ideal way to approach an analysis or experiment.

Unfortunately, the reality of running a business often puts severe constraints in our way. We need an answer faster than the experiment can provide statistically significant results, we don’t have enough users for a proper unbiased split, or our dataset doesn’t go back far enough to establish the time series pattern we’d like to look at.

It’s your job to help the teams running the business (those shipping the products, closing the deals etc.) get things done. If you insist on the perfect approach, it’s likely the business just moves on without you and your insights.

As with many things, done is better than perfect.

8. Don’t burn out your Data Scientists with ad-hoc requests

Hiring full-stack data scientists to mostly build dashboards or do ad-hoc data pulls & investigations all day is a surefire way to burn them out and cause churn on the team.

Many companies, especially high-growth startups, are hesitant to hire Data Analysts or BI folks specifically dedicated to metric investigations and dashboard building. Headcount is limited, and managers want flexibility in what their teams can tackle, so they hire well-rounded Data Scientists and plan to give them the occasional dashboarding task or metrics investigation request.

In practice, however, this often balloons out of proportion and DS spend a disproportionate amount of time on these tasks. They get drowned in Slack pings that pull them out of their focused work, and "quick asks" (that are never as quick as they initially seem) add up to fill entire days, making it difficult to make progress on larger strategic projects in parallel.

Luckily, there are solutions to this:

  1. Implement an AI chatbot that can field straightforward data questions
  2. Train relevant teams on basic SQL (at least 1–2 analysts per team) to make them more independent. With the Snowflake SQL AI Assistant or Gemini assistance in BigQuery, extensive SQL syntax knowledge is not strictly required anymore to pull data and generate insights
  3. Use self-serve BI tools that give users autonomy and flexibility in getting the insights they need. There has been a ton of progress in recent years, and tools like Omni are getting us closer to a world where self-serve analytics are a reality

9. Not everything needs a fancy Tableau dashboard

Companies tend to see it as a sign of a mature, strong data culture when data is pulled out of spreadsheets into BI solutions.

While dashboards that are heavily used by many stakeholders across the organization and serve as the basis for critical, hard-to-reverse decisions should live in a governed BI tool like Tableau, there are many cases where Google Sheets gets you what you need and gets you there much faster, without the need to scope and build a robust dashboard over the course of days or weeks.

The truth is, teams will always leverage analytics capabilities of the software they use day-to-day (e.g. Salesforce) as well as spreadsheets because they need to move fast. Encouraging this type of nimble, decentralized analytics rather than forcing everything through the bottleneck of a BI tool allows you to preserve the resources of Data Science teams (see #8 above) and equip the teams with what they need to succeed (basic SQL training, data modeling and visualization best practices etc.).

10. Having perfectly standardized metrics across the entire company is a pipe dream

As discussed under #9 above, teams across the company will always unblock themselves by doing hacky analytics outside of BI tools, making it hard to enforce a shared data model. Especially in fast-growing startups, it’s impossible to enforce perfect governance if you want to ensure teams can still move fast and get things done.

While it gives many Data Scientists nightmares when metric definitions don’t match, in practice it’s not the end of the world. More often than not, differences between numbers are small enough that they don’t change the overall narrative or the resulting recommendation.

As long as critical reports (anything that goes into production, to Wall Street etc.) are handled in a rigorous fashion and adhere to standardized definitions, it’s okay that data is slightly messy across the company (even if it feels uncomfortable).

Final Thoughts

Some of the points above will feel uncomfortable at first (e.g. pushing back on cherry-picked narratives, taking a pragmatic approach rather than pursuing perfection etc.). But in the long run, you’ll find that it will help you stand out and establish yourself as a true thought partner.

For more hands-on analytics advice, consider following me here on Medium, on LinkedIn or on Substack.

The post What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics appeared first on Towards Data Science.

]]>