Ivory Tower Notes: The Problem
https://towardsdatascience.com/ivory-tower-notes-the-problem/
Thu, 10 Apr 2025 18:48:08 +0000

When a data science problem is "the" problem

The post Ivory Tower Notes: The Problem appeared first on Towards Data Science.

Did you ever spend months on a Machine Learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not, and you are only starting with the data science or AI field, welcome to my first Ivory Tower Note, where I will address this topic. 


The term “Ivory Tower” is a metaphor for a situation in which someone is isolated from the practical realities of everyday life. In academia, the term often refers to researchers who engage deeply in theoretical pursuits and remain distant from the realities that practitioners face outside academia.

As a former researcher, I wrote a short series of posts from my old Ivory Tower notes — the notes before the LLM era.

Scary, I know. I am writing this to manage expectations and the question, “Why ever did you do things this way?” — “Because no LLM told me how to do otherwise 10+ years ago.”

That’s why my notes contain “legacy” topics such as data mining, machine learning, multi-criteria decision-making, and (sometimes) human interactions, airplanes ✈ and art.

Nonetheless, whenever there is an opportunity, I will map my “old” knowledge to generative AI advances and explain how I applied it to datasets beyond the Ivory Tower.

Welcome to post #1…


How every Machine Learning and AI journey starts

 — It starts with a problem. 

For you, this is usually “the” problem because you need to live with it for months or, in the case of research, years.

With “the” problem, I am addressing the business problem you don’t fully understand or know how to solve at first. 

An even worse scenario is when you think you fully understand and know how to solve it quickly. This then creates only more problems that are again only yours to solve. But more about this in the upcoming sections. 

So, what’s “the” problem about?

Causa: It’s mostly about not managing or leveraging resources properly —  workforce, equipment, money, or time. 

Ratio: It’s usually about generating business value, which can span from improved accuracy, increased productivity, cost savings, revenue gains, faster reaction, decision, planning, delivery or turnaround times. 

Veritas: It’s always about finding a solution that relies on, and is hidden somewhere in, the existing dataset. 

Or, more than one dataset that someone labelled as “the one”, and that’s waiting for you to solve the problem. Because datasets follow and are created from technical or business process logs, “there has to be a solution lying somewhere within them.”

Ah, if only it were so easy.

Without wandering off into another train of thought, the point is that you will need to:

1 — Understand the problem fully,
2 — If not given, find the dataset “behind” it, and 
3 — Create a methodology to get to the solution that will generate business value from it. 

On this path, you will be tracked and measured, and time will not be on your side to deliver the solution that will solve “the universe equation.” 

That’s why you will need to approach the problem methodologically, drill down to smaller problems first, and focus entirely on them because they are the root cause of the overall problem. 

That’s why it’s good to learn how to…

Think like a Data Scientist.

Returning to the problem itself, let’s imagine that you are a tourist lost somewhere in a big museum, and you want to figure out where you are. What you do next is walk to the closest info map on the floor, which will show your current location. 

At this moment, in front of you, you see something like this: 

Data Science Process. Image by Author, inspired by Microsoft Learn

The next thing you might tell yourself is, “I want to get to Frida Kahlo’s painting.” (Note: These are the insights you want to get.)

Because your goal is to see this one painting that brought you miles away from your home and now sits two floors below, you head straight to the second floor. Beforehand, you memorized the shortest path to reach your goal. (Note: This is the initial data collection and discovery phase.)

However, along the way, you stumble upon some obstacles — the elevator is shut down for renovation, so you have to use the stairs. The museum paintings were reordered just two days ago, and the info plans didn’t reflect the changes, so the path you had in mind to get to the painting is not accurate. 

Then you find yourself wandering around the third floor already, asking quietly again, “How do I get out of this labyrinth and get to my painting faster?”

While you don’t know the answer, you ask the museum staff on the third floor to help you out, and you start collecting the new data to get the correct route to your painting. (Note: This is a new data collection and discovery phase.)

Nonetheless, once you get to the second floor, you get lost again, but what you do next is start noticing a pattern in how the paintings have been ordered chronologically and thematically to group the artists whose styles overlap, thus giving you an indication of where to go to find your painting. (Note: This is a modelling phase overlapped with the enrichment phase from the dataset you collected during school days — your art knowledge.)

Finally, after adapting the pattern analysis and recalling the collected inputs on the museum route, you arrive in front of the painting you had been planning to see since booking your flight a few months ago. 

What I described now is how you approach data science and, nowadays, generative AI problems. You always start with the end goal in mind and ask yourself:

“What is the expected outcome I want or need to get from this?”

Then you start planning from this question backwards. The example above started with requesting holidays, booking flights, arranging accommodation, traveling to a destination, buying museum tickets, wandering around in a museum, and then seeing the painting you’ve been reading about for ages. 

Of course, there is more to it, and this process should be approached differently if you need to solve someone else’s problem, which is a bit more complex than locating the painting in the museum. 

In this case, you have to…

Ask the “good” questions.

To do this, let’s define what a good question means [1]: 

A good data science question must be concrete, tractable, and answerable. Your question works well if it naturally points to a feasible approach for your project. If your question is too vague to suggest what data you need, it won’t effectively guide your work.

Formulating good questions keeps you on track so you don’t get lost in the data that should be used to get to the specific problem solution, or you don’t end up solving the wrong problem.

Going into more detail, good questions will help identify gaps in reasoning, avoid faulty premises, and create alternative scenarios in case things do go south (which almost always happens)👇🏼.

Image created by Author after analyzing “Chapter 2. Setting goals by asking good questions” from “Think Like a Data Scientist” book [2]

From the above-presented diagram, you understand how good questions, first and foremost, need to support concrete assumptions. This means they need to be formulated in a way that your premises are clear and ensure they can be tested without mixing up facts with opinions.

Good questions produce answers that move you closer to your goal, whether through confirming hypotheses, providing new insights, or eliminating wrong paths. They are measurable, and with this, they connect to project goals because they are formulated with consideration of what’s possible, valuable, and efficient [2].

Good questions are answerable with available data, considering current data relevance and limitations. 

Last but not least, good questions anticipate obstacles. If something is certain in data science, this is the uncertainty, so having backup plans when things don’t work as expected is important to produce results for your project.

Let’s exemplify this with one use case of an airline company that has a challenge with increasing its fleet availability due to unplanned technical groundings (UTG).

These unexpected maintenance events disrupt flights and cost the company significant money. Because of this, executives decided to react to the problem and call in a data scientist (you) to help them improve aircraft availability.

Now, if this were the first data science task you ever got, you might start an investigation by asking:

“How can we eliminate all unplanned maintenance events?”

You understand how this question is an example of the wrong or “poor” one because:

  • It is not realistic: It lumps every possible defect, both small and big, into one impossible goal of “zero operational interruptions”.
  • It doesn’t hold a measure of success: There’s no concrete metric to show progress, and if you’re not at zero, you’re at “failure.”
  • It is not data-driven: The question didn’t cover which data is recorded before delays occur, and how the aircraft unavailability is measured and reported from it.

So, instead of this vague question, you would probably ask a set of targeted questions:

  1. Which aircraft (sub)system is most critical to flight disruptions?
    (Concrete, specific, answerable) This question narrows down your scope, focusing on only the one or two specific (sub)systems that cause most delays.
  2. What constitutes “critical downtime” from an operational perspective?
    (Valuable, ties to business goals) If the airline (or regulatory body) doesn’t define how many minutes of unscheduled downtime matter for schedule disruptions, you might waste effort solving less urgent issues.
  3. Which data sources capture the root causes, and how can we fuse them?
    (Manageable, narrows the scope of the project further) This clarifies which data sources one would need to find the problem solution.

With these sharper questions, you will drill down to the real problem:

  • Not all delays weigh the same in cost or impact. The “correct” data science problem is to predict critical subsystem failures that lead to operationally costly interruptions so maintenance crews can prioritize them.
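As a minimal sketch of how the first two targeted questions could be turned into an analysis, the Python snippet below ranks subsystems by the downtime they cause, counting only events at or above a "critical downtime" threshold. The event records, the subsystem names, and the 60-minute threshold are hypothetical placeholders, not data or definitions from a real airline.

```python
from collections import defaultdict

# Hypothetical UTG event log: (aircraft subsystem, unscheduled downtime in minutes).
events = [
    ("hydraulics", 240),
    ("avionics", 35),
    ("landing gear", 180),
    ("hydraulics", 90),
    ("cabin", 20),
    ("avionics", 75),
]

# Assumed definition of "critical downtime"; in practice operations would set it.
CRITICAL_MINUTES = 60


def rank_critical_subsystems(events, threshold):
    """Sum downtime per subsystem over events at or above the threshold."""
    totals = defaultdict(int)
    for subsystem, minutes in events:
        if minutes >= threshold:
            totals[subsystem] += minutes
    # Largest total critical downtime first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_critical_subsystems(events, CRITICAL_MINUTES)
for subsystem, minutes in ranking:
    print(f"{subsystem}: {minutes} critical minutes")
```

Changing the threshold changes the ranking, which is why the definition of "critical downtime" has to be agreed on before any modelling starts.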

That’s why…

Defining the problem determines every step after. 

It’s the foundation upon which your data, modelling, and evaluation phases are built 👇🏼.

Image created by Author after analyzing and overlapping different images from “Chapter 2. Setting goals by asking good questions, Think Like a Data Scientist” book [2]

It means you are clarifying the project’s objectives, constraints, and scope; you need to articulate the ultimate goal first and, besides asking “What’s the expected outcome I want or need to get from this?”, also ask: 

What would success look like and how can we measure it?

From there, drill down to (possible) next-level questions that you (I) have learned from the Ivory Tower days:
 — History questions: “Has anyone tried to solve this before? What happened? What is still missing?”
 — Context questions: “Who is affected by this problem and how? How are they partially resolving it now? Which sources, methods, and tools are they using now, and can they still be reused in the new models?”
 — Impact questions: “What happens if we don’t solve this? What changes if we do? Is there a value we can create by default? How much will this approach cost?”
 — Assumption questions: “What are we taking for granted that might not be true (especially when it comes to data and stakeholders’ ideas)?”
 — ….

Then, do this in a loop and always “ask, ask again, and don’t stop asking” questions so you can drill down and understand which data and analysis are needed and what the underlying problem is. 

This is the evergreen knowledge you can apply nowadays, too, when deciding if your problem is of a predictive or generative nature.

(More about this in some other note where I will explain how problematic it is trying to solve the problem with the models that have never seen — or have never been trained on — similar problems before.)

Now, going back down memory lane…

I want to add one important note: I have learned from late nights in the Ivory Tower that no amount of data or data science knowledge can save you if you’re solving the wrong problem and trying to get the solution (answer) from a question that was simply wrong and vague. 

When you have a problem on hand, do not rush into assumptions or building the models without understanding what you need to do (Festina lente).

In addition, prepare yourself for unexpected situations and do a proper investigation with your stakeholders and domain experts because their patience will be limited, too. 

With this, I want to say that the “real art” of being successful in data projects is knowing precisely what the problem is, figuring out if it can be solved in the first place, and then coming up with the “how” part. 

You get there by learning to ask good questions.

To end this narrative, recall the saying often attributed to Einstein:  

If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.


Thank you for reading, and stay tuned for the next Ivory Tower note.

If you found this post valuable, feel free to share it with your network. 👏

Connect for more stories on Medium ✍ and LinkedIn 🖇.


References: 

[1] DS4Humans, Backwards Design, accessed: April 5th 2025, https://ds4humans.com/40_in_practice/05_backwards_design.html#defining-a-good-question

[2] Godsey, B. (2017), Think Like a Data Scientist: Tackle the data science process step-by-step, Manning Publications.

Cracking Business Case Interviews for Data Scientists – Step 1) Define a Problem
https://towardsdatascience.com/cracking-business-case-interviews-for-data-scientists-step-1-define-a-problem-6a63f86b9a38/
Mon, 06 Sep 2021 16:04:00 +0000

The post Cracking Business Case Interviews for Data Scientists – Step 1) Define a Problem appeared first on Towards Data Science.

Cracking Business Case Interviews for Data Scientists: Part 1

Problem Definition, Structuring, and Prioritization

Image by Minha Hwang

Business case study interview questions are often the most difficult part of the interview process, especially for early-career data scientists without job experience. Even after landing a job, it is beneficial to have a structured way of formulating and solving business problems from other teams and stakeholders.

Based on what I have learned over 10 years as an ex-McKinsey consultant and data scientist, I am sharing a "7 steps of problem-solving" approach. This structured business case/problem-solving process, with its related framework and templates, will help you become more comfortable approaching and solving new business problems.

In this article, I will discuss:

(1) Overview of a "7 steps of problem-solving" approach

(2) Step 1: Define a problem – How to avoid Type III error by solving a proper problem

(3) Step 2: Structure a problem – How to break down a problem to manageable sub-problems

(4) Step 3: Prioritize issues – How to prioritize sub-problems to solve

A link to "Part 2" is shown below for the rest of the steps.

Cracking Business Case Interviews for Data Scientists – Part 2 | by Minha Hwang | Towards Data Science

Overview of a "7 steps of problem-solving" approach

Solving a business problem is an inherently difficult task. However, applying a structured problem-solving approach can make it more manageable, and even enjoyable, for data scientists. This is not yet taught well in universities or graduate schools. Top management consulting companies such as McKinsey have developed this approach, and it is one of the secret sauces behind their success. In the spirit of sharing, I will describe the approach in detail in this series of 2 articles. This article will be useful (1) for data scientists preparing for business case interviews, (2) for data scientists and their business partners who want to solve business problems and deliver business impact, (3) for MBAs preparing for business case study interviews at management consulting companies, and (4) for Ph.D. students and researchers who want to adopt a hypothesis-driven approach and become more efficient in their research efforts. The 7 key steps of business problem-solving are as follows:

(1) Define a problem

(2) Structure a problem

(3) Prioritize issues

(4) Develop issue analysis and work plan

(5) Conduct analyses

(6) Synthesize findings

(7) Develop recommendations

This is not a linear process. You will need to iterate a few times to arrive at a better solution, until you are convinced that the benefit from further iteration is limited. It is very important to periodically revise and re-evaluate. These steps sound quite intuitive and simple. At this point, you may wonder whether this would even help. The devil is in the details. Let me now dive deep into each step to show how this can work. I have summarized the "7 steps of problem-solving" in Figure 1.

Figure 1: Image by Minha Hwang

Step 1: Define a Problem

The most important step of solving the business case, which is often overlooked, is "problem definition." What is the point of perfectly solving the problem with a sophisticated natural language processing model if the machine learning engineer solved the wrong problem? Believe it or not, this "Type III error" (solving a wrong problem) happens a lot in practice. This could have been entirely avoided by having a proper problem definition stage.

Before describing key elements for this stage, I would like to point out that a "well-defined problem" that can be solved is very different from what we usually mean by a "problem." In day-to-day interactions, what we usually mean by a "problem" is recognition that "something is not right." There is a gap between what happened and what is supposed to happen (i.e. goal.) Or there is a gap between what happened and what could happen (i.e. opportunity.) Recognition of a "problem" is difficult and anxiety-producing. It tends to be very "general" and it can be just a "statement of fact". It is merely the recognition of the situation, which calls for action.

A "well-defined problem" that can be solved has the following properties:

  • A thought-provoking question, not a fact
  • Specific, not general
  • Measurable
  • Action-oriented
  • Relevant (to the key problem)
  • Time-bound

Moreover, it is

  • Debatable (not a statement of fact or non-disputable assertion)
  • Focused on what the decision-maker needs to move forward

These properties of well-defined problems are often summarized by a "SMART" principle. This principle is shown in Figure 2.

Figure 2: Image by Minha Hwang

To make a "well-defined problem" more concrete, let me use a simple toy example to contrast a "well-defined problem" with a "problem" in the everyday sense. The context for this toy example, John Octopus, is summarized below.

  • What happened: John went to see "Ready Player One" (a movie about virtual reality made largely with computer-generated imagery, i.e. CGI) with his friends last Saturday, and he loved it.
  • Goal: John was so impressed by the movie that he wants to become a Hollywood CGI movie director.
  • Situation calling for action (a problem in the typical sense):
    • John has no clue how to create a CGI movie.
    • He does not even own a computer.
    • First, he needs to figure out how to buy a computer, to get started.

Do we have enough information to properly define the problem? No, not yet! I have summarized what information we have so far in Figure 3.

Figure 3: Image by Minha Hwang

Let me introduce the first tool for problem definition, to clarify what additional information we need to "properly define" the problem. The "problem definition worksheet" is a good checklist to ensure that we have all the required information to properly define the problem. The key elements are:

(1) Problem definition: Please force yourself to put this as a "question."

(2) Contexts: It is important to know historical and organizational contexts around the problem.

(3) Key stakeholders: Who are the key decision-makers? Who will be affected by the decision?

(4) Criteria for success: Without this, how can you even know whether you have successfully solved the business problem? Please make sure that you properly define "success metrics" before solving the problem. You can’t solve a problem that can’t be measured.

(5) Constraints: It is also helpful to write down what should not be even considered.

(6) Scope of solution space: Oftentimes, there are clear requirements for the geographic or business line focus of the problem. It is also good to check timing requirements (i.e. answers in 4 weeks, 3 months, 1 year?). Sometimes, 80/20 directional answers are desired. Other times, very precise answers (1% increase in website traffic or 1.1% increase in website traffic) are required. Having clarity on the required accuracy helps to properly judge whether the proposed approach or solution will be appropriate.
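The six worksheet elements above can be sketched as a small data structure that flags what is still missing. This is only an illustration under stated assumptions: the `ProblemDefinition` class, its field names, and the draft content are hypothetical and not part of the original worksheet.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ProblemDefinition:
    """Hypothetical container mirroring the six worksheet elements."""

    question: str
    context: str = ""
    stakeholders: list = field(default_factory=list)
    success_criteria: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    scope: str = ""

    def is_question(self):
        """Element (1) asks for the problem to be phrased as a question."""
        return self.question.strip().endswith("?")

    def missing_elements(self):
        """Return the names of worksheet elements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical first draft for the John Octopus example.
draft = ProblemDefinition(
    question="How can John buy a computer to get started with CGI?",
    stakeholders=["John"],
)
print(draft.is_question())
print(draft.missing_elements())
```

A draft like this one still lacks context, success criteria, constraints, and scope, which is the kind of gap the worksheet is meant to expose.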

To facilitate future use, a first tool, "problem definition worksheet" is shown below in Figure 4.

Figure 4: Image by Minha Hwang

Now, let’s gather additional information for the John Octopus toy example. Those are:

  • Timing: John would like to start practicing design in August, during the summer vacation, which is only six months away.
  • Type of computers and price ranges for computers:
    • John found that a MacBook Pro is good for CGI use.
    • A used MacBook Pro can be purchased for ~$900.
  • Other contexts:
    • John does not want to borrow or rent a computer.
    • John has $140 in savings. His parents give him an allowance of $100 per month, and he earns $15 per hour walking the neighbor’s dog once a week, which amounts to $60 per month.
    • John spends $80 per month, on average.
    • If John can wait 10 months for the computer, he may be able to buy it simply by saving his money.
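The arithmetic behind the 10-month figure can be checked with a short calculation. This is a minimal sketch that assumes John's income and spending stay constant from month to month; the variable names are illustrative.

```python
import math

# Figures taken from the John Octopus example.
PRICE = 900          # used MacBook Pro, in dollars
SAVINGS = 140        # current savings
ALLOWANCE = 100      # per month
DOG_WALKING = 60     # per month
SPENDING = 80        # average per month
DEADLINE_MONTHS = 6  # months until the summer vacation

# Assumption: income and spending do not change over time.
net_per_month = ALLOWANCE + DOG_WALKING - SPENDING
months_needed = math.ceil((PRICE - SAVINGS) / net_per_month)
shortfall_at_deadline = PRICE - (SAVINGS + net_per_month * DEADLINE_MONTHS)

print(f"Net saving per month: ${net_per_month}")
print(f"Months needed by saving alone: {months_needed}")
print(f"Shortfall after {DEADLINE_MONTHS} months: ${shortfall_at_deadline}")
```

Saving alone takes 10 months, so with a 6-month deadline John is $280 short. That gap is what the problem definition has to address.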

With additional required information, we can properly define the problem for John Octopus, which is summarized in Figure 5.

Figure 5: Image by Minha Hwang

It is often helpful to see bad examples of problem definition. Figure 6 shows three examples of flawed problem definitions for the same John Octopus example.

Figure 6: Image by Minha Hwang

As a final exercise, we can consider one more example, drawn from a business case rather than from personal experience.

Figure 7 shows more examples of flawed problem definitions for Onion Bank.

Figure 7: Image by Minha Hwang

In contrast, Figure 8 shows an example of a well-defined problem for Onion Bank.

Figure 8: Image by Minha Hwang

I hope this article helps you understand the key elements of well-defined business problems. A good definition of the problem is more than 60% of problem-solving efforts. Please do make sure that you properly define the problem in data scientist interviews and in your day-to-day work as a data scientist.

Step 2: Structure a Problem

Once you have properly defined a problem, the next step is "structuring a problem." Oftentimes, the defined problem is too big to solve efficiently even after problem definition. Breaking down a problem into smaller and manageable components of sub-problems (i.e. issues) is very helpful to make problem-solving feasible and efficient. This allows the work to be divided and distributed to different team members with proper responsibility allocations. Moreover, this step is the foundation for the subsequent step of issue prioritization where priorities are set in terms of where to focus the problem-solving efforts. A "logic tree" (Tool #2) is very useful in the process of problem structuring. Figure 9 introduces a logic tree.

Figure 9: Image by Minha Hwang

A "logic tree" (Tool #2) helps us maintain the integrity of the problem. By checking that discrete chunks (i.e. sub-problems) are mutually exclusive and collectively exhaustive (i.e. "MECE"), we can ensure that solving the parts of the problem will really solve the problem in the end. Since the parts do not overlap, we can avoid duplicate efforts. The fact that there are no gaps (collectively exhaustive) ensures that we are not leaving out anything important. This also helps as a communication device. Creating a logic tree as a team helps to build a common understanding, which can also be shared in a structured way outside of the team. Finally, this helps to focus the use of frameworks and theories. Validating proposed frameworks and theories often reveals gaps in logic or aspects that have not been considered yet.
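When the items under a branch can be enumerated, the "MECE" property can be checked mechanically. The sketch below treats each sub-problem as a set of items and tests for overlaps and gaps; the `is_mece` helper and the spending categories are hypothetical illustrations, not part of the original framework.

```python
from itertools import combinations


def is_mece(universe, branches):
    """Return (mutually_exclusive, collectively_exhaustive) for a split."""
    sets = [set(items) for items in branches.values()]
    # Mutually exclusive: no item appears in two branches.
    mutually_exclusive = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    # Collectively exhaustive: the branches together cover every item.
    collectively_exhaustive = set().union(*sets) == set(universe)
    return mutually_exclusive, collectively_exhaustive


# Hypothetical spending categories for illustration.
universe = {"coffee", "candy", "movies", "games"}

good_split = {
    "food and drink": ["coffee", "candy"],
    "entertainment": ["movies", "games"],
}
bad_split = {
    "snacks": ["candy"],
    "entertainment": ["movies", "games", "candy"],  # overlaps; coffee is missing
}

print(is_mece(universe, good_split))
print(is_mece(universe, bad_split))
```

Real logic trees are rarely this easy to enumerate, so a check like this supports the team's judgment about overlaps and gaps rather than replacing it.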

There are two different types of logic trees. In the initial phase of problem-solving, an "issue tree" (what/how tree) is more useful to think about the entire solution space. Once you know enough about the problem (e.g. after a few iterations of exploratory analysis, initial secondary data research, or interviews and meetings), a "hypothesis-driven tree" (why tree) becomes more relevant. This tree is more suitable to focus problem-solving efforts based on hypotheses and serve as the foundation for prioritization. Building good hypotheses early and prioritizing efforts properly are secrets behind the success of Data Science leaders. Figure 10 summarizes two types of logic trees.

Figure 10: Image by Minha Hwang

To make the use of a "logic tree" more concrete, let’s practice with the simple case of John Octopus. Figure 11 is a sample "issue tree" that you can create to solve the problem for John Octopus. Please note that we are asking "how" questions to develop this tree. Figure 11 shows the top 2 levels. Since the goal is to come up with $900 for a used MacBook Pro computer in 6 months, we can start by dividing the problem into increasing income vs. reducing spending. To develop the branches below, we have to think about how John Octopus earns his income and where he spends most of his money.

Figure 11: Image by Minha Hwang

Once we have developed the top levels, we can further develop the issue tree for both the upper and lower branches. If you want to practice, I recommend doing this without looking at the solutions below. After the practice, you may realize that developing a "MECE" tree takes a good deal of thinking even for a simple problem like this. Figure 12 and Figure 13 show potential issue trees for the upper and lower branches, respectively.

Figure 12: Image by Minha Hwang
Figure 13: Image by Minha Hwang

What would be considered a bad "issue tree"? Seeing bad examples helps to understand what is required for a good solution. Figure 14 shows flawed issue tree examples for the John Octopus problem. You can see that it is inconsistent because it mixes different levels: increasing income sits at the same level as many spending category breakdowns. Moreover, it is not "MECE", since important spending categories such as coffee and candy are not shown.

Figure 14: Image by Minha Hwang

So far, we have seen an example of an "issue tree" (how/what tree) for the John Octopus problem. If you create a "hypothesis-driven tree" (why tree), how would it be different from the issue tree that you created? Figure 15 shows a "hypothesis-driven tree" example for the John Octopus problem. Please note that this can be developed by asking a series of "why" questions.

Figure 15: Image by Minha Hwang

Now, let’s practice creating an "issue tree" with a more business-oriented example. Assume that you are hired as a management consultant for the Coca-Cola Company, which is trying to solve a declining global profitability problem. There can be many ways to develop "issue tree(s)". These will help you to consider different dimensions and aspects of the given problem. Figure 16, Figure 17, and Figure 18 show three different ways of developing issue tree(s) for a given problem: by profit drivers, geographies, and lines of business.

Figure 16: Image by Minha Hwang
Figure 17: Image by Minha Hwang
Figure 18: Image by Minha Hwang

Finally, let’s consider one more example, which is a data science problem for A/B testing. The problem that we are trying to solve is "how to increase the sensitivity of A/B testing." This is an important problem for many large Tech/Internet companies which use data-driven decision-making based on A/B testing to make their product feature release decisions. By increasing metric sensitivity, companies can make precise inferences with less data (i.e. shorter durations or smaller sample sizes for experiments). A typical A/B test relies on an independent two-sample t-test for its test statistic. Having an intuitive understanding of what drives the magnitude of the test statistic in A/B tests will help you to develop an issue tree and brainstorm ways to increase metric sensitivity. As a reminder, a test statistic for A/B testing is shown below. Looking at the formula intuitively, you can see that 3 things matter: effect size (i.e. the difference in means between your treatment group and control group), variance (i.e. the noise level), and sample size. Please note that I have simplified the more general formula at the top by assuming the same variance across treatment and control groups (i.e. pooled variance) and the same sample size in both groups, to facilitate intuition.
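For readers without access to the figure, a standard form of the two-sample statistic consistent with this description is sketched below. The notation is an assumption and may differ from the figure: $\bar{x}_T$ and $\bar{x}_C$ are the treatment and control means, $s^2$ is the sample variance, and $n$ is the sample size per group.

```latex
t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\dfrac{s_T^2}{n_T} + \dfrac{s_C^2}{n_C}}}
\qquad\text{and, with } s_T = s_C = s \text{ and } n_T = n_C = n,\qquad
t = \frac{\bar{x}_T - \bar{x}_C}{s\,\sqrt{2/n}}
```

In the simplified form, the statistic grows with the effect size $\bar{x}_T - \bar{x}_C$, shrinks with the noise level $s$, and grows with $\sqrt{n}$.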

Figure 19: Image by Minha Hwang

Figure 20 shows a potential "issue tree" for the problem of increasing A/B test sensitivity.

Figure 20: Image by Minha Hwang
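The three levers of metric sensitivity can be illustrated numerically. The sketch below uses the simplified two-sample t statistic under the equal-variance, equal-sample-size assumption; the function name and all input numbers are hypothetical.

```python
import math


def t_statistic(effect, std_dev, n_per_group):
    """Simplified two-sample t statistic: equal variance, equal group sizes."""
    return effect / (std_dev * math.sqrt(2.0 / n_per_group))


# Hypothetical baseline experiment and three ways to improve on it.
baseline = t_statistic(effect=0.5, std_dev=5.0, n_per_group=200)
bigger_effect = t_statistic(effect=1.0, std_dev=5.0, n_per_group=200)
less_noise = t_statistic(effect=0.5, std_dev=2.5, n_per_group=200)
more_samples = t_statistic(effect=0.5, std_dev=5.0, n_per_group=800)

for name, value in [
    ("baseline", baseline),
    ("double the effect size", bigger_effect),
    ("halve the standard deviation", less_noise),
    ("quadruple the sample size", more_samples),
]:
    print(f"{name}: t = {value:.2f}")
```

Doubling the effect size, halving the standard deviation, and quadrupling the sample size each double the statistic. Because the sample size enters through a square root, it is the most expensive of the three levers, which is one reason variance reduction gets so much attention.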

I hope you have developed a good feel for how to apply a logic tree to break down a problem into more manageable chunks (i.e. sub-problems). In summary, good logic trees are (1) consistent, (2) relevant, and (3) "MECE". Figure 21 makes this point clearer in a picture.

Figure 21: Image by Minha Hwang

Finally, I will close this section by providing a few tips on "how to make a logic tree." Figure 22 shows the tips, together with rationales (i.e. why’s).

Figure 22: Image by Minha Hwang

Step 3: Prioritize Sub-Problems

Once you have broken down a problem into sub-problems, the next step is "prioritization". This is essentially cutting off branches on the issue tree or hypothesis-driven tree to focus on what is most important. Figure 23 shows this process as a graphic.

Figure 23: Image by Minha Hwang

What are the potential criteria that we can use for this prioritization? Potential (business) impacts, technical or execution feasibility, and risks are often useful criteria. Personal or corporate values also help as a guide. On the practical side, it is not a bad idea to reference the top management agenda to ensure that your projects receive the required support from leadership and top management. Checking OKRs helps in this regard. Figure 24 summarizes potential criteria that you can consider.

Figure 24: Image by Minha Hwang

Figure 25 shows "Tool #3", which is useful for this prioritization step of problem-solving. A prioritization matrix (2 x 2 matrix), where one axis is potential impact and the other is feasibility, is a useful visualization tool for deciding which sub-problems to focus on. In Figure 25, I have used the John Octopus problem to make it more concrete.

Figure 25: Image by Minha Hwang
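A 2 x 2 prioritization matrix can also be expressed as a simple scoring rule. The sketch below is an illustration under stated assumptions: the 1 to 5 scoring scale, the cutoff, the quadrant labels, and the sub-problems with their scores are all hypothetical and are not taken from Figure 25.

```python
def quadrant(impact, feasibility, cutoff=3):
    """Place a sub-problem in a 2 x 2 matrix using 1-5 scores (assumed scale)."""
    high_impact = impact >= cutoff
    high_feasibility = feasibility >= cutoff
    if high_impact and high_feasibility:
        return "prioritize"
    if high_impact:
        return "high impact, hard to do"
    if high_feasibility:
        return "easy, low impact"
    return "deprioritize"


# Hypothetical sub-problems with (impact, feasibility) scores.
sub_problems = {
    "walk more dogs": (4, 4),
    "ask for a bigger allowance": (4, 2),
    "cut candy spending": (2, 5),
    "sell old toys": (1, 2),
}

placement = {name: quadrant(*scores) for name, scores in sub_problems.items()}
for name, label in placement.items():
    print(f"{name}: {label}")
```

The scores themselves remain a judgment call. The matrix mainly makes that judgment explicit and easy to discuss with stakeholders.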

To become more familiar with step 3 of problem-solving: prioritization, let’s use the John Octopus problem again. In Figure 26, I show the upper branches of the issue tree and how you can apply prioritization. Similarly, in Figure 27, the lower branches of the issue tree are shown with a similar prioritization exercise.

Figure 26: Image by Minha Hwang
Figure 27: Image by Minha Hwang

So far, the first 3 steps of a structured problem-solving approach, (1) define a problem, (2) structure a problem, and (3) prioritize issues, have been introduced with examples to make them more concrete.

In the subsequent article (Part 2), I will describe the remaining parts of the problem-solving process in detail. A link to the next article is shown below.

Cracking Business Case Interviews for Data Scientists-Part 2

Disclaimer: I am not representing McKinsey for the suggested problem-solving approach, which is described here. I am just sharing my opinion.


Originally published at http://bigdataco.blogspot.com on September 6, 2021.
