Linear Programming: Managing Multiple Targets with Goal Programming
https://towardsdatascience.com/linear-programming-managing-multiple-targets-with-goal-programming/
Thu, 03 Apr 2025 19:12:16 +0000
Part 6: Balancing multiple objectives using the weights and preemptive goal programming approaches

This is the sixth (and likely last) part of a linear programming series I’ve been writing. With the core concepts covered by the prior articles, this article focuses on goal programming, a less common linear programming (LP) use case. Goal programming is a specific linear programming setup that can handle the optimization of multiple objectives in a single LP problem.

By the end of this article, you’ll understand:
1. Definition of goal programming and when it should be used
2. The weighted goal programming approach — illustrated with an example
3. The preemptive goal programming approach — illustrated with an example

Definition and Use Case of Goal Programming

Goal programming is a special case of linear programming that allows for multiple — often conflicting — objectives to be balanced. With regular LP problems, you pick a single metric to optimize (minimize or maximize) and set constraints to ensure that the optimal solution is feasible. Goal programming is a technique that allows for multiple objective metrics to be targeted at the same time.

The ‘mindset’ of goal programming is fundamentally different from plain vanilla LP problems. Basic LP searches for ways to get as little or as much of a single metric as possible — e.g., maximize profit or minimize waste  — subject to constraints. Often conflicting priorities will be found in the objective function or the constraints. For example, maximize profit (objective) subject to a maximum amount of waste (constraint). With goal programming, we can move important constraint metrics into the objective function so we can optimize on them rather than just constraining them. We can maximize profit and minimize waste at the same time!

Now is a good time to establish the example we will explore for the rest of the article:

Let’s imagine that we manage a factory that makes clothing. Our factory can make pants, shirts, and dresses. Each article of clothing has costs, profit, and waste associated with its production. We want to create a production plan that makes a certain level of profit and also keeps waste below a certain quantity due to an environmental commitment. Let’s say that we want to make $150k a month in profit and we also want to waste less than 20k yards of fabric. In addition to our goals, we can’t spend more than $50k in materials and labor.

The example above sounds a lot like a regular linear programming problem, right? Well, the twist is that we can’t make $150k in profit and waste less than 20k yards at the same time. In other words, there would be no feasible solution if we were to plug this into a regular linear programming problem. Typically, the goals set in these problems cannot all be achieved with a single solution, otherwise there wouldn’t be a point in using goal programming. We would just use regular old linear programming with our goals as constraints. The real value of goal programming is that it can create a compromise between conflicting goals when regular linear programming would yield an infeasible solution.

The real value of goal programming is that it can create a compromise between conflicting goals when regular linear programming would yield an infeasible solution.

How does goal programming balance and compromise with conflicting goals? There are two popular approaches: (1) weighted and (2) preemptive. We’ll cover these in detail in the following sections.

The Weights Approach

Here, we’ll dive into the details of the weights approach. The weights method has a single objective function and runs a single optimization based on (you guessed it) weights! The fact that only one optimization is run under the weights method may seem like a given — but the preemptive method actually runs multiple linear programming optimizations. We’ll get to that in the next section…

The weights method has specific targets or goals for multiple metrics — e.g., make at least $150k in profit selling clothes or waste no more than 20k yards of fabric. For regular LP, we want to fully optimize a single metric. For the weights method of goal programming, we want to get as close to hitting our goals as possible — after we reach a goal, the optimization doesn’t see more benefit in further maximization or minimization, so it prioritizes hitting the next most important goal. If this seems confusing now, don’t worry, it will make more sense as we get into the example.

The objective function for the weights approach is specially formulated to minimize the weighted differences between a metric’s goal and the actual value of the metric. Let’s jump to our example from above — i.e., we want to make $150k in profit and waste less than 20k yards of raw material. Our objective is to minimize how far off we are from both of these goals. 

Here is the mathematical formulation for this objective:

Minimize w1·P + w2·W, where w1 and w2 are assigned weights and P and W are how far we miss our profit and waste goals, respectively

With our objective function set up, we need to define our constraints. We will have two types of constraints: (1) goal-related constraints and (2) regular linear programming constraints (the same kind of constraints you would find in plain vanilla LP). Let’s talk about the goal-related constraints first.

We will need to create two things to set up our goal-related constraints: (1) profit and waste functions and (2) multiple slack variables. Let’s go through these one at a time.

The profit and waste functions are very straightforward. They combine our decision variables to calculate total profit and total waste given a specific solution. Below are the formulas that tie profit and waste to the number of pants, shirts and dresses we produce:

Profit = 10·pants + 3·shirts + 15·dresses; Waste = 1.5·pants + 1·shirts + 3·dresses (the profit and waste functions)

With our profit and waste functions established, let’s start talking about our slack variables. In goal programming, slack variables are used to measure how far a solution is from hitting a goal. In our example, the variables P and W are both slack variables — they represent how much lower our profit is compared to our profit goal and how much higher our waste is compared to our waste goal. Slack variables are embedded in constraints. Below are the constraint functions for our profit and waste goals — again, the P’s and the W’s are our slack variables:

Profit + P⁻ − P⁺ = 150,000; Waste + W⁻ − W⁺ = 20,000 (P⁺, P⁻, W⁺ and W⁻ are slack variables; profit and waste are the functions established with the formulas above)

Note that we have plus and minus slack variables — this allows us to miss the goal on either end. We only want to penalize the slack variable that goes in the opposite direction of our goal (e.g., we don’t want to penalize more profit than our goal, we only want to penalize less profit) — that is why only one slack variable per goal is in the objective function. With this new notation, let’s rewrite our objective function:

Minimize w1·P⁻ + w2·W⁺ (the objective function with the updated slack variable notation)

We have now done all of the special work for goal programming. The last thing we need to do is quickly add our plain vanilla budget constraint. We are using a regular constraint for our budget because, in our example, it is a hard constraint. Unlike with profit and waste, we cannot violate the budget.

Cost = 5·pants + 1·shirts + 10·dresses ≤ 50,000 (the regular, not goal programming related, budget constraint)

Now, we have a fully specified goal programming problem. Let’s set it up in Python and solve!
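The constraint block that follows assumes the setup sketched here: the problem object, the decision variables, the profit/cost/waste expressions (the same coefficients used in the preemptive example later in this article), the four deviation variables, and the equal-weights objective. Treat this as a minimal reconstruction rather than the article’s original setup code.

import pulp

# Define the problem
problem = pulp.LpProblem("Clothing_Company_Goal_Programming", pulp.LpMinimize)

# Decision variables: number of pants, shirts, and dresses produced
pants = pulp.LpVariable('pants', lowBound=0, cat='Continuous')
shirts = pulp.LpVariable('shirts', lowBound=0, cat='Continuous')
dresses = pulp.LpVariable('dresses', lowBound=0, cat='Continuous')

# Profit, cost and waste expressions (coefficients match the preemptive example below)
profit = 10 * pants + 3 * shirts + 15 * dresses
cost = 5 * pants + 1 * shirts + 10 * dresses
waste = 1.5 * pants + 1 * shirts + 3 * dresses

# Slack (deviation) variables for each goal
profit_deviation_neg = pulp.LpVariable('profit_deviation_neg', lowBound=0)
profit_deviation_pos = pulp.LpVariable('profit_deviation_pos', lowBound=0)
waste_deviation_neg = pulp.LpVariable('waste_deviation_neg', lowBound=0)
waste_deviation_pos = pulp.LpVariable('waste_deviation_pos', lowBound=0)

# Objective: minimize the (equally weighted) goal misses
problem += profit_deviation_neg + waste_deviation_pos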

# Profit goal: $150,000 in profit
problem += profit + profit_deviation_neg - profit_deviation_pos == 150000, "Profit_Goal"

# Waste goal: no more than 20,000 yards of waste
problem += waste + waste_deviation_neg - waste_deviation_pos == 20000, "Waste_Goal"

# Budget constraint
problem += cost <= 50000

# Solve the problem
problem.solve()

# Display the results
print("Status:", pulp.LpStatus[problem.status])
print("Pants produced:", pulp.value(pants))
print("Shirts produced:", pulp.value(shirts))
print("Dresses produced:", pulp.value(dresses))
print("Cost :", pulp.value(cost))
print("Profit :", pulp.value(profit))
print("Profit deviation (positive):", pulp.value(profit_deviation_pos))
print("Profit deviation (negative):", pulp.value(profit_deviation_neg))
print("Waste :", pulp.value(waste))
print("Waste deviation (positive):", pulp.value(waste_deviation_pos))
print("Waste deviation (negative):", pulp.value(waste_deviation_neg))

This optimization recommends we make 0 pants, 50,000 shirts and 0 dresses. We make $150k in profit, which matches our goal exactly, and we waste 50k yards of fabric, which exceeds our max waste by 30k yards. The full results are printed by the code and shown below:

results of optimization run with equal weights

Now that we’ve got the basic structure of the weights approach down, let’s actually talk about the weights! In our first example, we gave equal weights to a dollar of profit and to a yard of waste. This probably doesn’t make a lot of sense because they are different units. Setting the weights is a subjective decision to be made by the practitioner. For our example, we’ll decide that wasting a yard of fabric is 1.5 times as bad as making $1 of profit is good. In other words, we’ll increase the weight of fabric waste to 1.5 in our objective function.

problem += profit_deviation_neg + 1.5*waste_deviation_pos  # overwrites the earlier equal-weights objective; re-run problem.solve() afterwards

The optimization with the updated weights recommends we make about 8,572 pants, 7,143 shirts and 0 dresses. With this solution, we generate $107k in profit (a goal miss of $43k) and we waste 20,000 yards of fabric, which matches our goal exactly. The full results are printed by the code and shown below:

Results of optimization run with 1.5 weight on fabric waste

Clearly, shifting the weights of the goals can have a large impact on the optimization results. We need to carefully set our weights to adequately balance the relative importance of our goals!

Now that we have a solid understanding of how the weighted approach works, let’s shift to goal programming with the preemptive approach.

Preemptive Approach

While the weights method balances goals using weights in the objective function, the preemptive method gives hierarchical priority to goals through iterative optimizations. That’s a lot of words; don’t worry, we’ll break it down!

Here are the steps to the preemptive approach:

1. Run a regular linear programming optimization on your first goal — e.g., maximize profit
2. Save the objective value from that run
3. Run another regular linear programming on the next most important goal — e.g., minimize waste — but, add the objective value from the last run as a constraint
4. Repeat the process until you have gone through all goal metrics

Two important features of the preemptive method are (1) it prioritizes goals by rank and (2) the objective value of a higher importance goal cannot be decreased (because of the hard constraint) when optimizing lower priority goals. Let’s go over an example to build intuition.

For our example, let’s say that profit is the most important goal and minimizing waste is the second. We’ll start out by running a plain vanilla optimization that maximizes profit:

# Define the problem
problem = pulp.LpProblem("Clothing_Company_Goal_Programming", pulp.LpMaximize)

# Decision variables: number of pants, shirts, and dresses produced
pants = pulp.LpVariable('pants', lowBound=0, cat='Continuous')
shirts = pulp.LpVariable('shirts', lowBound=0, cat='Continuous')
dresses = pulp.LpVariable('dresses', lowBound=0, cat='Continuous')

# Profit and cost coefficients
profit = 10 * pants + 3 * shirts + 15 * dresses
cost = 5 * pants + 1 * shirts + 10 * dresses
waste = 1.5 * pants + 1 * shirts + 3 * dresses

# Objective function: Maximize profit
problem += profit

# Constraints
# Budget constraint
problem += cost <= 50000

# Solve the problem
problem.solve()

# Display the results
print("Status:", pulp.LpStatus[problem.status])
print("Pants produced:", pulp.value(pants))
print("Shirts produced:", pulp.value(shirts))
print("Dresses produced:", pulp.value(dresses))
print("Cost :", pulp.value(cost))
print("Profit :", pulp.value(profit))

The results of the profit maximizing LP problem are below:

Profit maximization

So, the optimal solution says to make 50k shirts and collect a profit of $150k. This was only the first optimization we are going to run, though! Following the algorithm outlined above, we will now run another LP that minimizes waste, but we will add a constraint of profit ≥ $150k to ensure we don’t end up with a worse profit.

# Define the problem
problem = pulp.LpProblem("Clothing_Company_Goal_Programming", pulp.LpMinimize)

# Decision variables: number of pants, shirts, and dresses produced
pants = pulp.LpVariable('pants', lowBound=0, cat='Continuous')
shirts = pulp.LpVariable('shirts', lowBound=0, cat='Continuous')
dresses = pulp.LpVariable('dresses', lowBound=0, cat='Continuous')

# Profit and cost coefficients
profit = 10 * pants + 3 * shirts + 15 * dresses
cost = 5 * pants + 1 * shirts + 10 * dresses
waste = 1.5 * pants + 1 * shirts + 3 * dresses

# Objective function: Minimize the fabric waste
problem += waste

# Budget constraint
problem += cost <= 50000

problem += profit >= 150000

# Solve the problem
problem.solve()

# Display the results
print("Status:", pulp.LpStatus[problem.status])
print("Pants produced:", pulp.value(pants))
print("Shirts produced:", pulp.value(shirts))
print("Dresses produced:", pulp.value(dresses))
print("Cost :", pulp.value(cost))
print("Profit :", pulp.value(profit))

And here are the results of this final optimization:

results of waste minimizing optimization

The astute observer will notice that this optimization returns the exact same solution 😅. This can often be the case with the preemptive method. The constraint of previously optimized goals can be very restrictive. The only time the iterative optimizations differ is when there are multiple ways to get the optimal values for the previous goals. For example, if there were two ways to get $150k in profit, one with more waste and the other with less, our second iteration would return the solution with the lower waste. In the preemptive method, there is no trade-off between the goals. Even if there were a solution that made $149k in profit with 0 yards of waste, the preemptive method would always choose $150k in profit with 50k yards of waste. The extra $1,000 of profit is infinitely more important than any amount of wasted fabric.

The preemptive method should be used when goals are clearly prioritized and there is no substitution between them — meaning no amount of success in lower priority goals can compensate for reduced optimization in higher priority ones. When used correctly, the preemptive method can very effectively optimize a primary goal while still finding good solutions for lower priority goals.
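The two preemptive solves above can also be written as a single loop over any ranked list of goals. The sketch below is my own generalization (the loop structure and names are not from the article) and reuses the clothing example’s coefficients:

import pulp

# Decision variables and expressions from the clothing example
pants = pulp.LpVariable('pants', lowBound=0)
shirts = pulp.LpVariable('shirts', lowBound=0)
dresses = pulp.LpVariable('dresses', lowBound=0)

profit = 10 * pants + 3 * shirts + 15 * dresses
waste = 1.5 * pants + 1 * shirts + 3 * dresses
cost = 5 * pants + 1 * shirts + 10 * dresses

goals = [(profit, pulp.LpMaximize), (waste, pulp.LpMinimize)]  # priority order
achieved = []  # (expression, value, sense) locked in from earlier passes

for i, (goal, sense) in enumerate(goals):
    prob = pulp.LpProblem(f"priority_{i}", sense)
    prob += goal                   # the current goal is the objective
    prob += cost <= 50000          # hard constraints appear in every pass
    for expr, val, s in achieved:  # higher-priority goals may not degrade
        prob += (expr >= val) if s == pulp.LpMaximize else (expr <= val)
    prob.solve()
    achieved.append((goal, pulp.value(goal), sense))

print("Pants:", pulp.value(pants), "Shirts:", pulp.value(shirts), "Dresses:", pulp.value(dresses))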

Conclusion

Goal programming provides a framework that extends traditional linear programming to optimize multiple metrics at the same time. The weighted approach balances priorities via weights in the objective function and is appropriate when objective metrics have relative importance that can be quantified. The preemptive method is an iterative approach that gives priority to metrics based on hierarchy. It is appropriate when some objectives are wholly more important than others. Both approaches can be powerful optimization techniques when applied in the correct context!

Happy optimizing!

Earlier articles in this series:

Bite-Size Data Science: Falling for the Gambler’s Fallacy
https://towardsdatascience.com/bite-size-data-science-falling-for-the-gamblers-fallacy-9a3ebf7a86cd/
Thu, 30 Jan 2025 11:31:57 +0000
Where the gambler’s fallacy shows up in data science and what to do about it

Image generated by DALL-E using prompt by author

The "bite size" format of articles is meant to deliver concise, focused insights on a single, small-scope topic. My goal is to write an article that gives you a few key takeaways that you could read during a quick break at work. You’ll understand these key points after reading this article:

  1. The definition of the gambler’s fallacy
  2. Why we fall for it
  3. The problems it can cause in your work as a data scientist and how to avoid those problems

Bite Size Data Science

1 – What is the gambler’s fallacy?

The gambler’s fallacy is the incorrect assumption that prior random events will impact other random events. It is a cognitive bias that causes us to believe that what randomly happened before will influence future random outcomes. The opposite of this fallacy is understanding that randomness is random and that no number of peculiar independent events preceding a random event influences the event itself.

Let’s get some intuition on this fallacy with a few examples:

Coin Toss (the "classic" example)

The classic and thoroughly used example for the gambler’s fallacy is a series of coin tosses. Suppose that I flip a coin nine times – all of the nine flips have been ‘heads.’ I’m about to flip it for the tenth time. What is the probability of getting ‘heads’ again? The correct answer (assuming the flips are independent and the coin is ‘fair’) is 50%. This can often feel wrong because it seems so unlikely to flip 10 heads in a row. That ‘wrong’ feeling is the gambler’s fallacy working inside of you!
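If the 50% answer still feels wrong, a quick simulation can help. This snippet is my own illustration (not from the original article):

import random

random.seed(0)
sequences, heads_on_tenth = 0, 0
for _ in range(2_000_000):
    flips = [random.random() < 0.5 for _ in range(10)]
    if all(flips[:9]):                 # keep only the runs that start with nine heads
        sequences += 1
        heads_on_tenth += flips[9]     # did the tenth flip land heads?

print(heads_on_tenth / sequences)      # hovers around 0.5 -- the streak doesn't matter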

Roulette Table

Imagine you are gambling at a casino – you are at the roulette table and you’ve bet on black for the last 10 spins (black means you get a payout if the spin lands on any black number, which has about a 48% probability), and you have lost every spin. Should you stay on black? Or go to red? Or leave the casino with your head down? If you are like me and many people, something inside of you wants to stay on black because it is ‘due’ for a win. It just feels so unlikely that the next spin would not be black. That feeling is the gambler’s fallacy!

Traffic Lights

Imagine you are driving to work and you hit three red lights in a row – let’s assume that these traffic lights operate independently of each other (probably not a fair assumption in real life). Frustrated, you’re sure that the next light has to be green: how could you get stuck at so many red lights in the same commute!? This is also the gambler’s fallacy.

Sport Performance

Imagine now that you are watching your favorite sports team (mine is the Dallas Mavericks). Your star player has made 10 shots in a row – he is on fire! Your friend, a fan of the opposing team, looks over to you and says: he’s about to start missing a bunch of shots because his shooting average is 47%. Assuming that your star player’s shots are independent of each other, your friend’s conclusion is unwarranted. Your friend has been a victim of the gambler’s fallacy!

2 – Why do we fall for the gambler’s fallacy?

Now that we understand what the gambler’s fallacy is, let’s talk about why it appeals to us. There are many factors and theories about what makes us subject to the gambler’s fallacy. I wanted to focus on two of the reasons I find most interesting and compelling.

We misunderstand the law of large numbers

The law of large numbers asserts that as a sample size gets large, the sample mean gets very close to the population mean (this is an informal definition). The idea that larger samples increase the probability that the sample mean matches the population mean is intuitive and correctly understood by most people. The misunderstanding is how the sample mean converges to the population mean.

Correct understanding: As the sample size increases, the volume of observations ‘dilutes’ the influence of extreme random values on either side. This dilution balances out both sides of the sample distribution and eventually results in the convergence of the sample mean to the population mean.

Incorrect understanding: We often accidentally think (usually implicitly) that the law of large numbers is ‘enforced’ by some kind of equilibrium mechanism. E.g., if we had an extremely low observation in our sample, we will have an extremely high observation in our sample to compensate. Of course, there is no ‘equilibrium’ mechanism – this is a misunderstanding of the nature of random events. Sample observations don’t offset each other, they dilute each other. If we have a very large sample size, it is probable that we would have extremely low observations to balance out the extremely high observations – but there is no rule that enforces it. That is why the sample size matters so much in the law of large numbers. The more random samples we take, the more likely observations are to balance each other out. If there were an equilibrium mechanism, sample size wouldn’t be as important because balanced samples would be enforced. This implicit belief that there is a ‘balancing’ of random events feeds the gambler’s fallacy. Under this impression, we think "if something happened before, there must be a balance in what happens after" – of course this is not the case!

There is no equilibrium force in randomness. The law of large numbers isn’t a law of equilibrium, it is a law of dilution.

Most of us would probably catch this issue if presented with a well-framed question about how random samples balance out as their sample sizes increase. But, when we remove the nice framing, we have a tendency to fall for the fallacy.

Amos Tversky and Daniel Kahneman performed multiple studies on cognitive biases. They often presented carefully crafted questions to scientists and statisticians that exposed biases and fundamental misunderstandings – even among these professionals! Below is one question they posed that exposes vulnerability to the gambler’s fallacy. Check it out and see how you would answer!

"The mean IQ of the population of eighth graders in a city is known to be 100. You have selected a random sample of 50 children for a study of educational achievements. The first child tested has an IQ of 150. What do you expect the mean IQ to be of the whole sample?¹" – Tversky & Kahneman

The correct answer is 101, we’ll prove it below. Tversky and Kahneman report that a "surprisingly large number of people" get this wrong. People think that the answer is 100 because the other 49 samples will offset the abnormally large IQ in the first sample – this simply isn’t the case. The other samples are independent of the first. This is an example of the gambler’s fallacy outside of the casino!

Just in case you don’t believe me, below is the expected value calculation of a sample of the 50 students, given that one of the observations has an IQ of 150.

Calculation of the expected value of the sample given one observation of 150 IQ: (150 + 49·100) / 50 = 5,050 / 50 = 101 – image by author

They don’t offset each other, but they can dilute each other – let’s see what happens if our sample is increased to 100 students.

Illustration of how a larger sample size dilutes the 150 IQ observation: (150 + 99·100) / 100 = 100.5 – image by author

The more random samples, the closer the expected value of the sample gets to 100 – even with the extremely high IQ student in our sample. This convergence is driven by dilution, not equilibrium!
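The same dilution arithmetic is easy to script (an illustrative snippet, not from the original article):

# Expected sample mean when one observation is 150 and the rest have expected value 100
for n in (50, 100, 1000):
    expected_mean = (150 + (n - 1) * 100) / n
    print(n, round(expected_mean, 2))   # 50 -> 101.0, 100 -> 100.5, 1000 -> 100.05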

We conflate the probability of a series of random events with a single random event

Here are two similar questions with very different answers:

What is the probability of flipping 10 heads in a row with a fair coin?

A fair coin has been flipped 9 times, all 9 flips were heads. What is the probability that the 10th flip is heads as well?

Can you spot the difference? One is asking about the probability of a series of events; the other has tricky framing, but is only asking about the probability of a single event.

The first question depicts a pretty unlikely event. We can go beyond intuition and calculate the exact probability using the binomial distribution, which simplifies to 0.5¹⁰ ≈ 0.1% – I would call that ‘unlikely.’

The second question asks: given that a pretty unlikely event has occurred, what is the probability that another, independent event takes place? Since we are only talking about the 10th flip, the answer is simply 50%. But it feels like less because a strange, yet independent, event preceded it.

Each flip has 50% chance of heads independently; probability of a series of heads becomes less likely as more flips are added – coin image generated by DALL-E, other components by author

If we are not disciplined in separating these two questions (probability of a series of events vs. probability of a single event) in our minds, we can be susceptible to the gambler’s fallacy.

Let’s go back to the roulette table to finish this point. Imagine you lost on black 10 times in a row – you feel that there is no way you could lose 11 times in a row. You have to win the next round! It is true that the probability of losing 11 times in a row is very low. But that probability only applies when you sit down at the table, before any spins. Once you’ve lost 10 in a row, the probability that you lose the 11th time is simply the probability that you lose in general. Nothing that happened before matters!
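Writing the two probabilities down makes the distinction obvious. Using the article’s rough 48% chance of hitting black (so roughly a 52% chance of losing any single spin), a small back-of-the-envelope calculation:

p_lose = 0.52                        # approximate chance of losing a single bet on black
p_lose_11_straight = p_lose ** 11    # judged before any spins have happened
p_lose_next_given_10 = p_lose        # judged after 10 losses -- the history is irrelevant

print(f"{p_lose_11_straight:.5f}")   # about 0.00075 -- very unlikely up front
print(f"{p_lose_next_given_10:.2f}") # still 0.52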

3 – What problems can the gambler’s fallacy cause in your work as a data scientist? How can you avoid them?

I’ve curated a list of three areas of data science work that I think are most likely to be impacted by the gambler’s fallacy. As in the last section, my goal here is not to make a comprehensive list, but to highlight what I think are the most common areas that could be impacted.

  1. The problem: Our tendency towards the gambler’s fallacy can make us inclined to accept trends or patterns too quickly or with too small of a sample size. The fallacy causes us to feel like fewer random observations (i.e., smaller random samples) are needed to separate signal from noise. How to avoid it: Use statistical techniques to calculate the probability that the patterns we observe in samples are due to chance.
  2. The problem: The gambler’s fallacy can cause us to design tests with sample sizes that are too small. This causes our tests to be underpowered, meaning the probability that we are able to pick up on a trend/pattern from our experiment is too low. How to avoid it: We can avoid this problem by calculating the power of a test and deciding if we are okay with it (see the power calculation sketch after this list). If we simply use our intuition to estimate a good sample size, we are quite likely to select too small of a sample. The statistical calculation of power removes the need for us to use intuition in the decision.
  3. The problem: The gambler’s fallacy can make us too conservative in accepting replicated results. Since the gambler’s fallacy compels us to believe that small samples are representative of the population and of other small samples (Tversky & Kahneman), we have a bias to expect too much similarity between replications of tests and experiments. For example, let’s say we are testing a new fertilizer on 15 bean plants. We get a result that shows the fertilizer helps bean plants grow, with a p-value of less than 5%. We decide to replicate the experiment a little later and get a statistically insignificant p-value of 15%, but a positive average estimate of growth from the fertilizer. Given that the second experiment is ‘insignificant’, we have a bias to say that the results were not replicated – even though we have a small sample size in both of the experiments. How to avoid it: Once again, the best way to avoid this problem is through statistics and calculations. The replication probability can be calculated using key statistics from the original experiment/test, such as the effect size, the standard error and the statistical power. With these metrics and the sample size of the replication, we can get a quantitative (rather than intuitive) understanding of how ‘hard’ it will be to replicate the results. In the bean plant example, if we found that there was only a 50% chance that the results would be replicated with a sample of 15 bean plants, we wouldn’t be too quick to discard our findings from the first experiment just because the second one didn’t replicate the statistical significance.
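To make the power calculation in point 2 concrete, here is a hedged sketch using statsmodels’ TTestIndPower for a two-group comparison like the bean-plant experiment. The effect size and group size below are made-up inputs for illustration, not numbers from the article:

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Assumed inputs: a standardized effect size of 0.8 and 15 plants per group
power = analysis.solve_power(effect_size=0.8, nobs1=15, alpha=0.05)
print(f"Power with 15 plants per group: {power:.2f}")

# Flip the question: how many plants per group would we need for 80% power?
n_needed = analysis.solve_power(effect_size=0.8, power=0.8, alpha=0.05)
print(f"Plants per group needed for 80% power: {n_needed:.1f}")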

Conclusion

We have covered the definition of the gambler’s fallacy, why it is appealing to us and how it can cause problems in our work as data scientists. The gambler’s fallacy can show up in multiple places outside of the casino and causes problems wherever we find it. The best way to combat against the fallacy is by relying on statistics and calculations rather than our intuition to make decisions and conclusions.

  1. Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322

Linear Programming: Auxiliary Variables
https://towardsdatascience.com/linear-programming-auxiliary-variables-c66bb66c6aee/
Wed, 08 Jan 2025 16:02:38 +0000
Part 5: Increasing LP flexibility to handle tricky logic

Image from pexels.com by Pixabay

Auxiliary variables seem to be a topic that is often quickly rushed over in a lot of linear programming material. I think they are fascinating and quite powerful. Because of this, I decided to dedicate a short article to explaining and demonstrating how auxiliary variables are often used in linear programming.

Before we dive into auxiliary variables – I wanted to mention that this is the fifth part in a series I’m writing on linear programming (LP). To check out the other LP topics I’ve covered, use the link below:

Linear Programming

First of all, let’s address the question – what is an auxiliary variable? My definition of auxiliary variables (in the context of LP) is ‘additional variables that are added to a linear programming problem that allow the use of logic that otherwise would not be possible.’

Auxiliary Variable: Additional variables that are added to a linear programming problem that allow the use of logic that otherwise would not be possible.

Generally, auxiliary variables are used in clever ways to formulate non-linear relationships (typically a violation of the rules of linear programming) in the objective function and constraints in a way that still works for linear programming. In other words, they allow for linearization of functions that would be non-linear without them. Since there are multiple ways to use auxiliary variables, in this article, we’ll go over the most common ways I’ve seen. My goal is for you to have a good intuition on what auxiliary variables do and how you can use them in linear programming. The rest of the article will go through specific applications of auxiliary variables with detailed examples for each application. Note that this is not meant to be an exhaustive representation of how auxiliary variables are used in linear programming. It is meant to demonstrate some of the most common ways they are used.

Creating Piecewise Linear Functions

The jumps that happen in piecewise linear equations are a ‘no-no’ in linear programming. Thankfully, we can accomplish the equivalent of piecewise functions using auxiliary variables! Let’s get into an example.

Let’s imagine that you manage a manufacturing facility. You need to produce a minimum number of products while minimizing your company’s wage expenses. Each employee has an hourly wage and an hourly production rate. Your goal is to create a work schedule that minimizes wage expenses while meeting a specific number of units produced. Easy enough? There’s a catch! Any employee that works over 20 hours a week becomes eligible for benefits, which cost an extra $500 per employee. This represents a jump in the cost function while the slope (the hourly wage) stays the same.

We will first explore how to perform an intercept jump. Let’s set up the simple problem without the intercept jump, then we can add the complexity of the extra benefits cost at 20 hours of work. In our simple example, we have three employees; each has a different pay and a different level of productivity (shown in the table below). Our goal is to minimize labor costs while producing at least 600 units a week.

Employee pay and productivity (employee 1: $15/hour, 7 units/hour; employee 2: $18/hour, 8 units/hour; employee 3: $22/hour, 10 units/hour) – image by author

Before the additional complexity of the benefits expense, our objective function and constraint look like this:

LP setup before the intercept jump – objective: minimize 15·empl1 + 18·empl2 + 22·empl3; constraint: 7·empl1 + 8·empl2 + 10·empl3 ≥ 600

Alright, with the LP 101 problem set up, let’s get into adding that intercept jump at 20 hours. We are going to use something called the ‘Big-M Method’ to do this. In this method, we create one or more binary auxiliary variables and we create an additional constraint to bind the auxiliary variable to the original decision variables. The ‘M’ part of ‘Big-M’ comes into play in the constraint configuration.

Below is an example of the new constraint we will add – y is the binary auxiliary variable that we added to the problem and M is an arbitrarily large number.

The big-M constraint: empl1 ≤ 20 + M·y1

Let’s break the constraint down since this is the main concept of performing the jump. On the left we have the decision variable for the hours we will schedule employee 1. On the right, we have the number of hours after which benefits are required, plus ‘M’ times y1, where M is an arbitrarily large number (hence big-M) and y1 is our binary auxiliary variable. In our example, I’ll set M to be 1,000. Understanding this constraint is key! Let’s build our intuition by running different values through the inequality for employee 1 hours. See the table and explanations below:

Explanation of the big-M method (e.g., with 15 hours, 15 ≤ 20 + 1,000·y1 holds with y1 = 0; with 25 hours, y1 must be 1 so that 25 ≤ 1,020) – image by author

In addition to the new constraint, we also have to make modifications to our objective function. This is pretty simple to do, we add the product of our new auxiliary variables (remember, they are binary) and the intercept jump, in this case $500, to the objective function. When the auxiliary variable is 0, no additional cost is added. When the employee works more than 20 hours, the auxiliary variable is forced to be 1 and $500 is added to the objective value for that employee.

Updated objective function for intercept jumps: minimize 15·empl1 + 500·y1 + 18·empl2 + 500·y2 + 22·empl3 + 500·y3

With the problem fully formulated, let’s set it up and solve it in Python using the pulp package!

import pulp

# Define the problem
problem = pulp.LpProblem("Staff_Management", pulp.LpMinimize)

# Decision variables: weekly hours for each employee
empl1 = pulp.LpVariable("empl1", lowBound=0, upBound=40)
empl2 = pulp.LpVariable("empl2", lowBound=0, upBound=40)
empl3 = pulp.LpVariable("empl3", lowBound=0, upBound=40)

# establish employee pay and productivity
empl1_pay = 15
empl2_pay = 18
empl3_pay = 22

empl1_prod = 7
empl2_prod = 8
empl3_prod = 10

# binary auxiliary variables: 1 if an employee works more than 20 hours
y1 = pulp.LpVariable("y1", cat="Binary")
y2 = pulp.LpVariable("y2", cat="Binary")
y3 = pulp.LpVariable("y3", cat="Binary")

# Objective function
problem += empl1_pay*empl1 + 500*y1 +  empl2_pay*empl2 + 500*y2 + empl3_pay*empl3 + 500*y3  , "Salary Cost"

# Constraints
# big-M constant (arbitrarily large)
M = 1000

# big M method
problem += empl1 <= 20 + M*y1
problem += empl2 <= 20 + M*y2
problem += empl3 <= 20 + M*y3

problem += (empl1_prod * empl1 + 
            empl2_prod * empl2 + 
            empl3_prod * empl3 >= 600)

# Solve the problem
status = problem.solve()

# Output the results
print(f"Status: {pulp.LpStatus[status]}")
print(f"Optimal value of empl1: {empl1.varValue}")
print(f"Optimal value of empl2: {empl2.varValue}")
print(f"Optimal value of empl3: {empl3.varValue}")
print(f"Minimized salary cost: {pulp.value(problem.objective)}")

And here are the results of the run – looks like we can make 600 widgets while incurring $1,810 in labor expenses. Employee 1 will be the only employee to receive benefits (the other two employees stay at or below the 20-hour threshold).

optimization output – image by author

Alright, now that we’ve tackled the issue of the additional benefits expense, let’s make things even more complicated by adding in overtime pay (1.5x the regular rate)! This is actually going to be pretty simple 😁 . To do this, we need to split our employee hour variables into two separate variables (regular hours and overtime hours). So now, instead of three employee hour variables, we have six: one regular hour variable and one overtime variable for each of the three employees. We create a constraint to bind the regular hours to no more than 40 (in pulp you can do this with the built-in upBound parameter). We will then modify the objective function to reflect the change — I’ll just copy/paste an excerpt from the code below:

New objective function with regular time and overtime split into separate variables: minimize 15·empl1_reg + 1.5·15·empl1_ot + 500·y1 + 18·empl2_reg + 1.5·18·empl2_ot + 500·y2 + 22·empl3_reg + 1.5·22·empl3_ot + 500·y3

Note that since overtime costs more than regular time, the solver will always max out the regular time before it starts adding overtime, so we don’t need a constraint to force this behavior. I also increased the production requirement to 1,000 so that our example would require some overtime.

Here is the code to solve the full problem with the benefits and overtime pieces:

import pulp

# Define the problem
problem = pulp.LpProblem("Staff_Management", pulp.LpMinimize)

# establish employee pay and productivity
empl1_pay = 15
empl2_pay = 18
empl3_pay = 22

empl1_prod = 7
empl2_prod = 8
empl3_prod = 10

# decision variables: split each employee's hours into regular and overtime
empl1_reg = pulp.LpVariable("empl1_reg", lowBound=0, upBound=40)
empl2_reg = pulp.LpVariable("empl2_reg", lowBound=0, upBound=40)
empl3_reg = pulp.LpVariable("empl3_reg", lowBound=0, upBound=40)

empl1_ot = pulp.LpVariable("empl1_ot", lowBound=0, upBound=20)
empl2_ot = pulp.LpVariable("empl2_ot", lowBound=0, upBound=20)
empl3_ot = pulp.LpVariable("empl3_ot", lowBound=0, upBound=20)

# binary auxiliary variables: 1 if an employee works more than 20 total hours
y1 = pulp.LpVariable("y1", cat="Binary")
y2 = pulp.LpVariable("y2", cat="Binary")
y3 = pulp.LpVariable("y3", cat="Binary")

# Objective function
problem += (empl1_pay*empl1_reg + 500*y1 + empl1_ot*1.5*empl1_pay +
            empl2_pay*empl2_reg + 500*y2 + empl2_ot*1.5*empl2_pay + 
            empl3_pay*empl3_reg + 500*y3 + empl3_ot*1.5*empl3_pay 
            , "Salary Cost")

# Constraints
# big-M constant (arbitrarily large)
M = 1000

# big M method
problem += (empl1_reg + empl1_ot) <= 20 + M*y1
problem += (empl2_reg + empl2_ot) <= 20 + M*y2
problem += (empl3_reg + empl3_ot) <= 20 + M*y3

# constraint on minimum items produced
problem += empl1_prod * (empl1_reg + empl1_ot) + empl2_prod * (empl2_reg + empl2_ot) + empl3_prod * (empl3_reg + empl3_ot) >= 1000

# Solve the problem
status = problem.solve()

# Output the results
print(f"Status: {pulp.LpStatus[status]}")
print(f"Optimal value of empl1 reg: {empl1_reg.varValue}")
print(f"Optimal value of empl1 ot: {empl1_ot.varValue}")
print(f"Optimal value of empl2 reg: {empl2_reg.varValue}")
print(f"Optimal value of empl2 ot: {empl2_ot.varValue}")
print(f"Optimal value of empl3 reg: {empl3_reg.varValue}")
print(f"Optimal value of empl3 ot: {empl3_ot.varValue}")
print(f"Minimized salary cost: {pulp.value(problem.objective)}")

And here is the optimal output:

Optimization output – image by author

The optimal strategy is to max out employee 1’s work to 60 hours, use employee 2 up to the benefits limit and have employee 3 work a full week plus 2 hours. I hope this example has illustrated how we can linearize intercept jumps and changes in slopes so that they can be incorporated into LP problems.

Conditional relationships between variables

Sometimes we need to model conditional relationships between variables to solve optimization problems. Here, I’ll dive into an example that I created for an article I wrote on simulation a little while ago (link below). That article was on simulating data and covered LP only tangentially. I didn’t talk about the auxiliary variables it used at all – I’d like to take the opportunity to cover here what I didn’t there.

Simulated Data, Real Learnings: Scenario Analysis

Here’s how the problem is set up – imagine you are a developer planning a housing subdivision. You have multiple floorplans available to you, and you want to optimize your profit subject to specific constraints. Below is a table of the different floorplans with their key metrics:

Details for each floorplan – image by author

Our main decision variables are integer values for each row in the table shown above. The integer values represent the number of homes we will build of each floorplan. This is represented by the list ‘x’ in the code at the end of this section.

Here are the three constraints for the problem:

  • We must have at least 6 different floorplans in our neighborhood
  • We need at least 15 homes under 2,000 sqft, 15 homes between 2,000 sqft and 3,000 sqft, and 10 homes over 3,000 sqft
  • Lastly, we have 150,000 square feet of plots – each house needs 25% more square footage than its floor plan – we can’t build more houses than we have land!

The first constraint is going to need auxiliary variables; the other two can be handled directly with the input data and the primary decision variables (the "x’s").

Before we get into setting up this constraint, let’s build an understanding of how the main decision variable works. Each row in the table above will have its own ‘x’ decision variable. So we will have 15 ‘x’ variables – each will represent the number of houses that will be built for the corresponding floorplan. For example, X1 is the number of floorplan 1 houses that will be built. So, if X1 = 10, 10 houses of floorplan 1 will be built in the subdivision.

Okay, now onto the meat of the conversation – the auxiliary variables! We need a way to ensure that at least 6 of the ‘x’ variables are greater than 0. We do this by creating a binary auxiliary variable for each floorplan. So, we now have 15 binary ‘y’ variables. The next thing we need to do is tie the ‘y’s to the ‘x’s in a way that a y variable can only be 1 when the corresponding x variable is at least 1. We do this by creating a constraint that x ≥ y – let’s build intuition on why this constraint meets our needs with the table below:

Explaining the constraints that tie the auxiliary variables (y’s) to the decision variables (x’s): if xᵢ = 0, then yᵢ is forced to 0; if xᵢ ≥ 1, yᵢ can be 0 or 1, and the ‘at least 6’ constraint pushes it to 1 – image by author

With the auxiliary variables set up, and new constraints introduced to tie the x’s to the y’s, we need to add one final constraint – that the y variables must sum to at least 6. With that constraint in place, we are now ready to set up and run our optimization!

The code below shows how to set up and solve this problem in Python:

import pandas as pd
import numpy as np
from pulp import *

df = pd.read_csv(csv_path)  # csv_path: path to the floorplan table shown above
n = len(df)

# create dummy indicators to categorize home size
df['small_house'] = np.where(df['square_feet'] < 2000, 1, 0)
df['medium_house'] = np.where((df['square_feet'] >= 2000) &
                              (df['square_feet'] < 3000),
                               1, 0)
df['large_house'] = np.where(df['square_feet'] >= 3000, 1, 0)

# Create a MILP problem
problem = LpProblem("Simple MILP Problem", LpMaximize)

# Define decision variables
x = LpVariable.dicts("x", range(n), lowBound=0, cat='Integer')
y = LpVariable.dicts('y', range(n), cat='Binary')

# Objective 
problem += lpSum(x[i]*df['gain_loss'][i] for i in range(n))

# constraints

# limit to amount of land available
# assume each floorplan takes 25% more land than its square footage
problem += lpSum(x[i]*df['square_feet'][i]*1.25 for i in range(n)) <= 150000

# requirements for diversity in home sizes
problem += lpSum(x[i]*df['small_house'][i] for i in range(n)) >= 15
problem += lpSum(x[i]*df['medium_house'][i] for i in range(n)) >= 15
problem += lpSum(x[i]*df['large_house'][i] for i in range(n)) >= 10

# Create at least 6 unique floorplans
for i in range(n):
    # if x is 0, y has to be 0
    problem += x[i] >= y[i]

# if x >= 1, y could be 0 or 1
# but because we want sum(y) to be >= 6, the optimization
# will assign y to be 1
problem += lpSum(y[i] for i in range(n)) >= 6

# solve problem
problem.solve()

# print solution
for i in range(n):
    print(f'{i + 1} : {value(x[i])}')

# print optimal profit
print("Optimal profit :", value(problem.objective))

And here is the optimal solution:

Optimal floorplan strategy – image by author

We can see here that our constraint worked! There are 6 floorplans selected to build in the optimal output.

‘Or’ logic

Often we need ‘or’ logic in linear programming. Raw ‘or’ logic is not linear, so we have to use auxiliary variables to linearize it. To do this, we will need to add one auxiliary variable for each condition in the ‘or’ logic and then add another auxiliary variable to tie them together.

Imagine you manage a microchip manufacturing plant. For national security reasons, the government offers a $2,000 grant for specific levels of chip production. To be grant eligible, you need to produce at least 45 units of chip A or 15 units of chip B a week. You can only get one grant in a week, so producing more than 45 of A and more than 15 of B won’t get you double grants. This is an example of ‘or’ logic because you get the grant if you meet the requirement for A or B.

To formulate this problem, we need to set up one binary auxiliary variable for each of the chip types. We will create a ‘grant_a’ variable and a ‘grant_b’ variable. We will then add the constraint chip_a ≥ 45·grant_a, where chip_a is the total number of A chips produced. We will add the same constraint for chip B with the corresponding number of chips required for a grant.

Constraints with the ‘grant_a’ and ‘grant_b’ auxiliary variables: chip_a ≥ 45·grant_a and chip_b ≥ 15·grant_b – image by author

Lastly, we need a way to tie grant_a and grant_b together with ‘or’ logic. To do this, we will create one more binary auxiliary variable – ‘grant’ – and one more constraint.

Constraint tying the grant_a and grant_b auxiliary variables to the grant auxiliary variable: grant ≤ grant_a + grant_b (equivalently, grant_a + grant_b ≥ grant) – image by author

This will force grant to be 0 if both grant_a and grant_b are 0. But, if grant_a and/or grant_b are 1, grant can take the value of 1 as well. The optimization will always set grant to 1 when it has the option (again, that is when grant_a and/or grant_b are 1), because setting grant to 1 will increase the objective function by $2,000!

Below is the example in Python code. Note that I added marginal profit to the objective function ($20 for chip A and $30 for chip B) and a material usage constraint to bind the problem.

from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, LpStatus, lpSum

# Define the problem
problem = LpProblem("chip_manufacturing", LpMaximize)

# Decision variables
chip_a = LpVariable("chip_a", lowBound=0, cat="integer")
chip_b = LpVariable("chip_b", lowBound=0, cat="integer")

# set up three binary auxiliary variables
# one for whether the factory qualifies for a grant through chip A production
# one for whether the factory qualifies for a grant through chip B production
# and one more to indicate whether at least one of the chip production levels qualifies for the grant
grant = LpVariable("grant", cat="Binary")
grant_a = LpVariable("grant_a", cat="Binary")
grant_b = LpVariable("grant_b", cat="Binary")

# Objective function
profit = 20 * chip_a + 30 * chip_b + 2000 * grant
problem += profit, "total profit"

# Constraints
# Material usage and availability
problem += 5 * chip_a + 12 * chip_b <= 200, "raw_material_constraint"

# Grant eligibility conditions
# grant_a can only be 1 if at least 45 units of chip_a are made
problem += chip_a >= 45 * grant_a, "grant_chip_a_constraint"

# grant_b can only be 1 if at least 15 units of chip_b are made
problem += chip_b >= 15 * grant_b, "grant_chip_b_constraint"

# if grant_a and grant_b are 0, force grant to be 0
problem += grant_a + grant_b >= grant, "grant_or_condition"

# Solve the problem
problem.solve()

# Output the results
print(f"Status: {problem.status}")
print(f"Profit: ${problem.objective.value():,.2f}")
print(f"Chip A: {chip_a.value():.0f} units")
print(f"Chip B: {chip_b.value():.0f} units")
print(f"Grant Awarded: {'Yes' if grant.value() else 'No'}")

The optimized solution is below:

Results of the optimization – image by author

Here, we can see that even though we make more per unit of raw material with chip A, we want to create 15 units of chip B to get the grant. Once we get the grant, we use the rest of the material to produce chip A. Looks like the ‘or’ logic checks out! We now understand how we can use auxiliary variables to solve LP problems with ‘or’ logic!

Conclusion

I hope that you have a fuller understanding of how using auxiliary variables can greatly increase linear programming’s flexibility. This article was meant as an introduction to auxiliary variables, with some examples of their use. There are other ways that they can be used that you can explore further. They are a little tricky to understand at first, but once you get the hang of them, a new world of LP optimization possibilities opens up!

Linear programming: Integer Linear Programming with Branch and Bound
https://towardsdatascience.com/linear-programming-integer-linear-programming-with-branch-and-bound-fe25a0f8ae55/
Tue, 19 Nov 2024 15:44:23 +0000
Part 4: Extending linear programming optimization to discrete decision variables

Up until now in this series, we’ve talked about strict linear programming – where the objective function, constraints and decision variables have all been linear and continuous. This linear setup comes with some really nice properties, but it isn’t very flexible. In this article, I’ll discuss how we can allow for discrete decision variables using a tool called integer linear programming (ILP).

This is the fourth article in a series I’m writing on linear programming. The other articles (including an introduction – in case you aren’t familiar with linear programming) can be found here:

Linear Programming

In this article we’ll be covering the following topics:

  1. When discrete decision variables are needed
  2. How the branch and bound algorithm solves integer linear programming problems
  3. The pros and cons of integer linear programming compared to regular linear programming
  4. Solving an ILP problem in Python

Why discrete decision variables are needed

Discrete decision variables can be required in an optimization for two reasons:

  1. The nature of a variable is discrete
  2. To handle conditional logic and ‘or’ logic

We’ll get into the details of these two reasons below!

The nature of the variable is discrete

Often, decision variables that we are modelling are discrete in nature and won’t be well modelled with continuous variables. Here are some examples of discrete decision variables:

  • Turning a machine on or off in a production process – represented with discrete, binary variables
  • Scheduling staff to be in or out of office – represented with discrete binary variables
  • Planning fire station locations – specific locations can be represented with discrete variables
  • Purchasing large items – the number of items to purchase can be represented by a discrete variable
  • Product production plans – the number of each type of product created can be represented by discrete variables

When the nature of a decision variable is discrete, you have two options on how to handle it — (1) treat it as continuous or (2) treat it as discrete. Treating a discrete variable as continuous has the distinct advantage of allowing you to use traditional LP optimization (which has multiple benefits we will discuss in the next section). But, it comes at the cost of potentially modelling the variable poorly. Treating the variable as discrete will require you to use the less-powerful ILP optimization, but will model the ‘real world’ better.

As a rule of thumb, if a variable is well-approximated by a decimal, you can model it as continuous. For example, the number of nails a factory produces might be well approximated by a decimal — 1,000,000 nails is a pretty good approximation of 1,000,000.35 nails. If the variable is not well approximated by a decimal, then you will probably have to go the integer route. Binary variables fall into this category: 0.35 is not a good approximation of 0 and 0.75 is not a good approximation of 1. Additionally, variables that tend to take on small values won’t be approximated well. Imagine a business that makes a handful of mobile homes every month — 11 mobile homes is probably not a good approximation of 10.63 mobile homes.

Two things can go wrong if you incorrectly treat a discrete variable as a continuous variable:

  1. You can get suboptimal solutions – rounding to an integer forfeits the guarantee of an optimal solution
  2. You can get infeasible solutions – rounding can put the solution outside of the feasible solution space. e.g., mobile homes cost $10k to make, you have a budget of $105,000 – the LP solution says to create 10.5 units. You round that up to 11 units and now you are over budget!

Handle conditional logic and ‘or’ logic

Regular linear programming can’t handle complex relationships in the constraints and objective functions. It can only work with ‘and’ logic. For example: X1 < 10 AND X2 < 20. There are a lot of scenarios where ‘or’ logic is needed. For example, imagine a microchip manufacturing plant that receives government grants. To be eligible for the grants, they need to make 1,000 ‘A’ chips OR 800 ‘B’ chips. Traditional linear programming could not optimize this logic. ILP can handle the logic by introducing binary auxiliary variables. These binary variables can be used to turn constraints on and off based on the values of the other decision variables. The fact that they have to be binary requires the use of ILP instead of LP.

Binary auxiliary variables can also be used to capture non-linear jumps in constraints. Imagine you are scheduling staff for a call center – your goal is to have coverage for the estimated call volume while minimizing salary cost. After 40 hours, employees will receive overtime pay. Traditional LP can’t handle this type of jump; binary auxiliary variables can. And of course, if we introduce a binary variable, we are now in the space of ILP. Going over the specifics of how to set up auxiliary variables is covered in another article. For now, it is sufficient to say that binary auxiliary variables can capture these complexities.

How the branch and bound algorithm solves ILP problems

I hope you have a good idea of why we often need to use integers in linear programming problems. The presence of integers in a problem necessitates that we use integer linear programming. The most popular algorithm for solving ILPs is called ‘branch and bound.’ Let’s jump into how branch and bound works.

The branch and bound algorithm splits an ILP into multiple LP sub-problems (that’s the branching part). It uses information learned from some sub-problems to skip over other sub-problems (that’s the bound part) — this saves computation and avoids an exhaustive search. I think it’s hard to conceptualize a verbal description of the algorithm, how about an example?

We are going to solve the ILP below using the branch and bound algorithm:

example of integer linear programming problem – image by author

Step 1: Relax the integer constraint and solve the LP problem

This is easy enough, we just allow x and y to take continuous values and solve – as I covered in previous articles, LP problems can generally be solved quickly and easily via the simplex method.

With the relaxation of the integer requirement, we easily get a solution of x = 2.25, y = 3.75 with a maximized objective value of 52.5. At the end of every step in branch and bound, we check whether the solution is a feasible integer solution – if it is, we don't branch; if it isn't, we branch. Clearly we do not meet the integer solution criteria since neither x nor y is an integer. So now we move on to branching!
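As a quick sanity check, you can reproduce this relaxed solution with the same pulp formulation used at the end of this article, simply by leaving the variables continuous (pulp's default category):

```python
import pulp

# Step 1: solve the LP relaxation of the example problem (integer requirement dropped)
problem = pulp.LpProblem("LP_Relaxation", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=0)  # continuous by default
y = pulp.LpVariable("y", lowBound=0)

problem += 10 * x + 8 * y, "Objective"
problem += x + y <= 6, "Constraint 1"
problem += 20 * x + 12 * y <= 90, "Constraint 2"

problem.solve(pulp.PULP_CBC_CMD(msg=False))
print(x.varValue, y.varValue, pulp.value(problem.objective))  # 2.25 3.75 52.5
```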

Step 2: Pick an integer variable that doesn’t have an integer solution and branch it into two sub-LP problems

Given our solution from the prior step, we split our single LP problem into two sub-problems. We do this by picking a variable (we can pick either one) — here, we’ll pick x, creating two LPs with extra constraints on that variable. The constraints we set are determined by the result of the solution in the prior step. For our example, we’ll make one LP problem with the new constraint x ≤ 2 and another with the constraint x ≥ 3 added. Note that since we are interested in integer solutions, setting the constraints to 2 and 3 doesn’t cut out any part of the solution space from the original problem (the numbers between 2 and 3 are non-integers). We then solve the new LP problems.

Step 3: Continue to iterate conditional on the input from the prior step

After the first two steps, we are now set to continue making iterative branches based on the results of the prior steps. At each iteration, one of three things can happen. The table below shows what can happen and what the algorithm does for each event:

Actions to take in branch and bound conditional on LP problem results – image by author

We continue following this algorithm until all branches are finished. At this point, we take the best integer solution found and return this as the optimal solution.

It is a little difficult to conceptualize the algorithm without a visualization. I hope my description has put you in a good place to understand the visual walk-through below.

example of branch and bound algorithm – image by author

Note that because each level of the tree adds constraints, the objective value can only stay the same or get lower as we go down (for a maximization problem, adding constraints never improves the optimum). That is why we don't have to continue down the x ≥ 3 branch even though we could create two sub-problems splitting on y (y ≤ 2 and y ≥ 3). Since nothing below that leaf can be higher than 50 (and the x ≤ 2 branch already produced an integer solution worth 52), we can 'prune' the branch and stop there.
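If you prefer code to pictures, below is a minimal, illustrative branch and bound implementation applied to the same example problem (objective 10x + 8y with the two constraints shown in the pulp code later in this article). It uses scipy's linprog to solve each LP relaxation; the function structure and names are my own simplification, and real solvers add many refinements (smarter branching rules, tolerances, cutting planes) that this sketch leaves out.

```python
import math
from scipy.optimize import linprog

c = [-10, -8]                 # linprog minimizes, so negate the maximization objective
A = [[1, 1], [20, 12]]        # x + y <= 6   and   20x + 12y <= 90
b = [6, 90]

best = {"obj": -math.inf, "x": None}   # incumbent (best integer solution found so far)

def branch_and_bound(bounds):
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    if not res.success:
        return                         # infeasible sub-problem: prune
    obj = -res.fun
    if obj <= best["obj"]:
        return                         # bound: relaxation can't beat the incumbent, prune
    fractional = [i for i, v in enumerate(res.x) if abs(v - round(v)) > 1e-6]
    if not fractional:                 # all variables integer: new incumbent
        best["obj"], best["x"] = obj, [round(v) for v in res.x]
        return
    i = fractional[0]                  # branch on the first fractional variable
    lo, hi = bounds[i]
    left, right = bounds.copy(), bounds.copy()
    left[i] = (lo, math.floor(res.x[i]))     # e.g. x <= 2
    right[i] = (math.ceil(res.x[i]), hi)     # e.g. x >= 3
    branch_and_bound(left)
    branch_and_bound(right)

branch_and_bound([(0, None), (0, None)])
print(best)   # expected: x = 2, y = 4 with objective 52
```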

The problem I picked for our first example was very simple. I’m going to put another ILP problem setup with the algorithm’s visual execution below to give you some more exposure. I’ll spare you the description of each step this time!

Second example of ILP problem – image by author
Branch and bound execution on the second example – image by author

The moving image is good for understanding the sequence of the algorithm – here is the still image so you can take a closer look:

image by author

The pros and con of integer linear programming

The pros:

  1. Some problems are not well modeled with continuous linear equations, others flat out can’t be modeled as LP problems at all. Integer programming allows for these cases.
  2. Because every LP relaxation solved during branch and bound has a convex solution space that can be solved to optimality, ILP (via branch and bound) is guaranteed to find the global optimum, given enough time.

The Con:

The main problem with ILP is that the branch and bound algorithm isn’t very efficient — even though it is generally considered the best algorithm for ILP. For large problems, it can require a lot of computational resources and memory. Not all problem formulations will find optimal solutions in a reasonable amount of time — i.e., some ILP problems are not tractable.

Given that the primary challenge with ILP is execution, here are a few recommendations to help ILP run faster – note, not all of these potential solutions are possible for all ILP problems:

  1. Approximate integers with continuous variables when possible
  2. Try to use binary variables when possible — binary variables require fewer splits, which in turn makes for smaller trees
  3. Limit the number of variables (if possible) – the more variables, the more complex the solution space

Integer Linear Programming in Python

Okay, now that we know why we need integer linear programming and we understand how the branch and bound algorithm works, let’s show how we can solve ILPs in Python. Using the ‘pulp’ package in Python, ILP looks really similar to regular LP problems. Other articles in this series go into more details on setting up an LP problem with pulp. The only difference (for the end user of pulp at least) between ILP and LP is how the decision variables are set up. As you can see below, the ‘cat’ attribute is set to ‘Integer’ for both x and y. From this, pulp automatically solves the problem with a variant of the branch and bound algorithm because of the presence of integers in the decision variables.

import pulp

# Define the problem
problem = pulp.LpProblem("Maximize_Profit", pulp.LpMaximize)

# Define the decision variables as integers
x = pulp.LpVariable("x", lowBound=0, cat="Integer")
y = pulp.LpVariable("y", lowBound=0, cat="Integer")

# Objective function
problem += 10 * x + 8 * y, "Objective"

# Constraints
problem += x + y <= 6, "Constraint 1"
problem += 20 * x + 12 * y <= 90, "Constraint 2"

# Solve the problem
status = problem.solve()

# Output the results
print(f"Status: {pulp.LpStatus[status]}")
print(f"Optimal value of x: {x.varValue}")
print(f"Optimal value of y: {y.varValue}")
print(f"Maximized Profit: {pulp.value(problem.objective)}")

The output from the code is pasted below – as you can see, it matches the result from our manual solution of the problem above!

Solution to the first example problem – image by author

Conclusion

Integer linear programming is a very important tool in the optimization toolbox. It allows for the handling of discrete decision variables and complex logic among constraints. This additional flexibility comes with an extra computational cost compared to the classic linear programming framework. Multiple things can be done to speed up ILP execution — though these speed-ups are problem dependent. Despite some drawbacks, ILP is powerful in its flexibility and is a valuable and frequently used technique.

The post Linear programming: Integer Linear Programming with Branch and Bound appeared first on Towards Data Science.

I Wasn’t Always a Data Scientist – How I Broke into the Field https://towardsdatascience.com/i-wasnt-always-a-data-scientist-how-i-broke-into-the-field-5b8f05d470bf/ Tue, 05 Nov 2024 13:02:08 +0000 https://towardsdatascience.com/i-wasnt-always-a-data-scientist-how-i-broke-into-the-field-5b8f05d470bf/ 8 strategies I used (and you can too) on my journey to data science

The post I Wasn’t Always a Data Scientist – How I Broke into the Field appeared first on Towards Data Science.

From my experience, people who work in Data Science have a wide diversity of backgrounds. I (like many of my fellow data scientists) didn’t start my career in data science right out of college. I started out working as a securities broker in the investment finance industry. I quickly discovered that the career path I originally chose was not a good fit and started a multi-year journey towards becoming a data scientist. In this article, I’m going to share 8 strategies I used in my successful transition to becoming a data scientist. Let’s get into it!

Strategy 1— Know what you want

Data science is a competitive industry and it can be difficult to get into, especially if you weren’t originally planning on it. It is crucial that you know that you really want to work in the field – if your journey will be anything like mine, it will take some significant time and effort to break into your first data science job. You need to be sure this is what you want so you stay focused and don’t lose motivation in your journey!

Strategy 2 — Set up a job feed (even if you are nowhere near qualified to apply)

When people ask me for advice on becoming a data scientist, I always say this one first! Set up a daily job feed for data science positions and record key details from the job postings. This job feed isn’t for applying, it is for learning what employers want out of a data scientist. I could give you a list of what I think you should learn, but I’m one person with one opinion. If you have data from 50+ job postings, you don’t have a bunch of opinions, you have real information about what real employers want from real data scientists!

When I first decided that data science was the goal, I set up a daily job feed for any data science positions in the Dallas area – making it location-specific was important because it allowed me not only to learn what employers want, but also to create a list of target companies to focus on. Every day I got an email, usually with 2–3 new job postings. I wasn't anywhere near qualified for any of them, but again applying wasn't the goal, data was the goal! I saved the details for every job posting in an Excel spreadsheet – this gave me extremely valuable data to direct my journey.

The job feed gave me these key pieces of knowledge:

  • Skills that employers want from data scientists – programming languages, specific ML algorithms, statistical knowledge etc.
  • Specific tools that data scientists use – Python, SQL etc.
  • Educational background — what level of education in which subjects
  • Companies that hire data scientists in my area (and an idea of how large their data science groups are based on the number of postings I saw)
  • Occasionally I would see salary ranges – this wasn’t very actionable information, but it was good for keeping up my motivation since the posted salaries were a lot higher than mine 😅

Without the job feed, I would’ve had to rely on blogs, articles and pieces of advice from other people. The job feed gave me a list of what employers in my area wanted and the source was the employers themselves!

From this process, I made my ‘to-learn’ list based on the skillsets that I saw most frequently on the job postings. That list became a roadmap for the rest of my journey to becoming a data scientist.

Strategy 3— Try to gain some data science skills in your current job

Different jobs have different levels of flexibility in the work you do and how you do it. If you have some flexibility, try to do a few things that will help you gain skills from your skillset list (from strategy 2).

Example #1 from my journey:

I realized I wanted to work in data science while I was working at Fidelity doing mutual fund operation work. My work was pretty well-structured (meaning not a lot of flexibility) but, I was able to carve out some time for ‘pet projects.’ For one of my projects, I built a simple linear regression model that predicted the number of data discrepancies we would see based on the daily market volatility. It really wasn’t much, but it provided me a tiny amount of "data science" experience that gave me (1) more confidence that I wanted to work in the industry (because I loved making that little model) and (2) one line that I could put on a resume that demonstrated (in a very small way) a data science skill.

Example #2 from my journey:

I later transitioned from Fidelity to GM Financial (I'll talk about the strategy of that move in the next section). After working at GMF for a while, I decided it was time to start gaining Python experience (which I had as a pretty high priority on my skill list from the Strategy 2 section). I asked my manager if I could download a free version of Python and, to my surprise, he said 'no'. I protested and explained that it was free and that I could use it to help with my job responsibilities, but the answer was still a firm 'no'. I decided to start looking for other jobs because of that. I know it seems like a small thing, but remember, my goal was to become a data scientist, not to keep my current position. I needed some Python skills to accomplish my goal! I got a job offer from a company where I would be able to use Python. When I told my manager about it, he asked me what he could do to get me to stay — I just said I wanted to work on a project or two in Python since it was in alignment with my career goals — this time he said 'yes' 🤷‍♂️! I then worked on some small modelling projects in that position, which gave me valuable skills and resume talking points.

Strategy 4— Be willing to do intermediate jobs to gather the skills needed

Ideally, you can just jump from your current, non-data science job, directly into a data science role. For my journey however, the skill gap between what I had and what I needed was too large for just one jump. Because of the size of the gap, I had to take a couple of different roles to stair step my skills. It can be hard to switch jobs for skills, but if you really want to be a data scientist, you may have to pursue this strategy.

I had two ‘intermediary’ roles in my journey towards data science. Each role I used to gather a subset of skills I needed to become a data scientist or I at least needed for the next role.

  1. Pricing Analyst (Fidelity) – This job got me off of the phones as a broker and into a more analytical role. The main skill I had was working with data in Excel – that skill isn’t super data science friendly, but it gave me enough to get into my next role, where I picked up more skills!
  2. Data Analyst (GMF) – I was able to qualify for this job because of my financial background and my Excel skills. In this role, I picked up heavy data analyst experience, SQL experience, and experience using an old-school statistical program called SAS. And of course, as I mentioned in Strategy 3, I picked up a little bit of Python skill even though it wasn't a part of the job description.

Strategy 5— You might have to go back to school, do it part-time!

During my data science journey, I started a master’s in data science at the University of Oklahoma. I think that having this on my resume really helped me get the interview that ultimately gave me my break into data science. I found that most data science jobs required at least a master’s degree (something I learned from Strategy 2) – so I decided that I would work on getting that requirement, but I would do it part-time.

Pursuing education part-time was one of the best decisions I made during my journey. It balances work experience and education (and you get to have a salary while studying, which was really nice compared to my undergraduate experience 💰 ). It took me an extra year to get my degree, but I continued to gain valuable experience and I could start working as a data scientist before I finished. I think now, with the taboo of online learning almost completely gone (or at least much lower than it has been), there is no reason you need to quit your job for a master’s degree. Get your cake and eat it too by getting work experience and education at the same time (oh and avoid those student loans as well)!

Strategy 6— Teach yourself, don’t rely on schooling

I learned a lot from my master’s program, but in reality, it taught me a pretty small subset of what I needed to know to become a successful data scientist. I also couldn’t wait around for my program to teach me the skills I needed, I didn’t have the time!

I had my first data science interview when I was just a few classes into my master’s program. Thankfully I had taught myself a pretty wide array of data science knowledge in the years leading up to the interview. The interview was fairly intense with a long case study that required a general understanding of multiple machine learning and data science topics. I relied 100% on my self-taught knowledge in the interview – which went well enough for me to secure the offer.

With the huge amount of resources available online today, there is no reason to not teach yourself. I did learn from my master’s program, but I estimate that about 85% of my data science knowledge comes from self-teaching. Going back to strategy 1 from the beginning of the article – if teaching yourself data science is fun and interesting, you can know that it is a good career path for you.

Strategy 7— You were doing something before data science, use that domain knowledge to your advantage!

If you don’t have data science work experience (like I didn’t), you can use the domain knowledge from your other work experience to edge out some of the competition. This was a really important factor for me getting my first data science job.

While I was working at Fidelity, I took and passed the Chartered Financial Analyst (CFA) exams. This is a pretty difficult designation to earn in investment finance. My first data science position was at Toyota, working on the financing side. Although it wasn't the exact same type of 'finance' (consumer vs. investment), the certification showed that I had a level of professional financial knowledge.

The CFA helped, but what gave me the biggest edge was the fact that I had industry experience in a pretty niche area, i.e., "captive auto finance." This is a very specific industry made up of consumer finance companies that are wholly owned by a manufacturer and whose purpose is to originate loans for the purchase of the manufacturer's products. When I landed my first data science job, I was working at GM Financial (the captive finance company of General Motors); the data science job itself was at Toyota Motor Credit Company, the captive finance arm of Toyota. In my interview I was able to 'talk shop' with my future boss, using terminology that only industry insiders would use. I don't know for sure, but I bet that, given the industry work experience, I could have beaten someone who had a little bit of data science experience but no industry experience. I do know that it really helped!

The takeaway here is: focus on applying to data science jobs in an industry you've already worked in. This can help compensate for your lack of data science experience and give you a competitive edge over applicants who have some data science experience but are coming from other industries.

Strategy 8- Be willing to take risks; a successful journey may require it

To become a data scientist, you may have to take a few uncomfortable risks. If you really want it (again – and you are probably annoyed at me by now for this – going back to Strategy 1), taking reasonable risks is worth it!

The big risk I took

I took multiple risks in my data science journey, but the biggest one was when I finally got my first ‘data science’ offer.

The first opportunity I had to become a data scientist (at Toyota) was a 1-year contract position. This was a significant risk for me, because I was leaving a reasonably paying full-time job for a 1-year role that might get extended or converted into a full-time position – or might not! Contracting was probably the only data science offer I could get at the time. Being willing to take a contracting position gave me two advantages: (1) fewer people are willing to take contracting positions, so I was competing with a smaller applicant pool, and (2) there was less risk on the side of the hiring manager, making them more willing to extend an offer – if things didn't work out, it was easy to terminate the contract early or just let it expire at the end of the year. Since I was taking most of the risk, they were more willing to hire me without data science experience. I had to bet on myself, and it was a risky bet! Thankfully (and with a lot of thanks to my manager at the time) I was later converted to a full-time position – the bet paid off!

If you aren’t a data scientist, but want to become one, you may need to take a few risks to keep your journey’s momentum going. I suggest you take reasonable risks as needed, it may be required for you to reach your goal!

Conclusion

Breaking into data science was a difficult and intense endeavor that required some clever strategy, time, and a lot of work. But for me, the journey was well worth it! I've now been working at Toyota for nearly six years as a data scientist and it has been fantastic! I started out as a contractor with no data science experience and was able to progress to managing a team that tackles big and impactful problems! Of course, it would be arrogant to take all the credit for my successful journey – I had many mentors and great managers help me along the way. I also had some good luck – the contribution of luck to anyone's success can't be ignored 😊 .

I hope that this was helpful for you and, if you are looking to break into data science, I wish you the best of luck!

The post I Wasn’t Always a Data Scientist – How I Broke into the Field appeared first on Towards Data Science.

Will Your Vote Decide the Next President? https://towardsdatascience.com/will-your-vote-decide-the-next-president-0875a4927c75/ Tue, 15 Oct 2024 20:55:34 +0000 https://towardsdatascience.com/will-your-vote-decide-the-next-president-0875a4927c75/ Simulating the probability that your singular vote swings the election in November

The post Will Your Vote Decide the Next President? appeared first on Towards Data Science.

Image by Element5 Digital from Pexels.com

Well, election season is in full swing – I tell you, just in case you haven't noticed from the obnoxiously aggressive TV ads and all of the yard signs. Having the right to vote is a wonderful opportunity that many fellow world citizens don't have. I have often wondered, however, how influential my singular vote can be. In this article, I'm going to attempt to address this question using one of my favorite tools – simulation!

I enjoy using simulation to solve problems – I wrote a series on simulation earlier this year – check it out if that seems interesting.

Data Simulation

In this article, we’ll cover these topics:

  1. Setting up the simulation
  2. Simulating your vote swinging your state election
  3. Simulating your state swinging the overall election
  4. Estimating the overall probability that your vote decides the president
  5. Why (I think) you should still vote

Setting up the simulation approach

Before we get into the simulation, we need to get everything set up. There are multiple elements for us to consider, so we’ll go through them here.

What needs to happen for your vote to swing an election?

Two things need to happen for your vote to decide the presidential election – (1) your vote needs to be the deciding vote for your candidate in your state and (2) your state needs to provide your candidate with enough electoral votes to win the presidency. We will simulate these two separately and then combine them at the end.

What inputs do we need for the simulation?

We need multiple inputs and estimations for those inputs for our simulation:

inputs for simulation – image by author

Code setup

The code setup will have two classes that interact with each other. The first class will need an instance for each state. It will store methods and attributes related to individual state elections – this class will be called 'state.' The second class will take multiple state instances as attributes and will have the other attributes and methods needed to simulate state and national elections – this class will be called 'simulate_election.'

Simulation outputs

The simulation will run state and/or national elections multiple times. The primary output will be the number of times that your vote swung the election (state or national, whichever one we are simulating at the time) out of the total number of simulations. We can interpret this ratio as an estimate of the probability that your vote swings the election in question.

Okay, with the simulation plan mapped out, let’s get going on the code and then run some simulations!

Simulating your vote swinging a state

For the presidential election, most states have a "winner take all" popular vote for the president. This just means whoever gets the majority of the votes in the state wins the whole state. Maine and Nebraska are the exceptions, but for this article, we are going to just assume they act like the other states. They have a relatively small number of electoral votes – as long as we are not simulating the probability of a Maine or Nebraska resident swinging the election our estimates shouldn’t be too far off (sorry Mainer and Nebraskan readers!).

Condition for your vote to swing the state

So, with the "winner take all" assumption, what has to happen for your vote to swing a state? It’s actually pretty simple: your state needs to be in a perfect tie without your vote. Then, your vote comes in and breaks the tie for your candidate. Ok, now that we understand what needs to happen, let’s figure out how we can estimate the probability of a perfect tie.

Estimating the probability of a popular vote tie with simulation

We are going to use the binomial probability distribution a lot in our simulation. This is a really cool distribution and I encourage you to check it out if you are not familiar with it. It is out of the scope of this article, so I’m not going to go into any details here.

When simulating, you often have to make at least a few simplifying assumptions. In this section, I’m going to make the assumption that there are only two candidates (Trump and Harris) and votes that don’t go to one of them go to the other. In reality there are other candidates that will secure a couple of votes. For our purposes we are not going to factor them in though.

Right, let’s get started with the simulation! There are multiple components that we need to consider when simulating the probability of your vote deciding your state’s candidate.

We covered a comprehensive list of inputs in the last section. The inputs required for state simulation are a subset of that list and are below:

  • The number of registered voters
  • Voter turnout
  • The probability of a voter voting for a specific candidate (Harris or Trump)
  • Which candidate you are voting for

Simulating the number of voters

We first need to simulate the number of registered voters that will actually vote. To do this, we can use the binomial distribution. When simulating the number of voters, we put the number of registered voters as n and the historical voter turnout as the probability of success. This will give us a random number of voters based on the probabilities provided by the binomial distribution.

simulating voter turnout – image by author

Let’s do a quick example – I live in the great state of Texas, we have 17.2mm registered voters with a historical voter turnout of 58%. Here is what 10 simulations for number of voters would look like:

simulating total votes cast in Texas – image by author
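A sketch of what this looks like in code – the registered-voter count and turnout figure are the rounded Texas numbers quoted above, and the seed is my own addition for reproducibility:

```python
import numpy as np

np.random.seed(0)  # just so the example is reproducible

# 10 simulated turnouts for Texas: ~17.2mm registered voters, 58% historical turnout
turnout_sims = np.random.binomial(n=17_200_000, p=0.58, size=10)
print(turnout_sims)
```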

Using the binomial distribution sounds kind of complicated; why can't we just multiply registered voters by the historical voter turnout (for Texas, that would be 17.2mm*0.58)? The main problem with this approach is that each of our simulated elections would have the exact same number of voters. This is a problem because of how we define swinging a state: there must be a tie, and our vote must break the tie. If the number of voters is always the same, it will always be either odd or even. If it is odd, we will never swing the election (because there can be no ties)! This would mess up our simulation – randomly simulating the number of voters avoids this trap, and is also more realistic.

Let’s go through a quick example to explain why you can’t swing an election if an odd number of voters turnout. Let’s pick the odd number of 99 for our example. In a near perfect split, 49 people voted for your candidate and 50 did not. If you then vote for your candidate you have caused a tie, not a swing (and we aren’t going to get into how ties are broken here)! If it is the other way, 50 for your candidate and 49 not. Your candidate has already won and your vote still doesn’t swing the state. Either way, an odd number of voters before you show up to the poll will rob you of your power to swing the election!

Simulating the state election

Okay, we now have a way of simulating the number of votes that are cast. Our next task is to determine the probability of the popular vote in your state resulting in a tie. The first thing we need to do is rescale our poll numbers to accommodate our ‘two-party’ assumption.

Let’s go back to Texas and give an example of simulating the votes for Trump (republican). At the time of this writing, Trump is polling at 50% and Harris is polling at 46%. Because of our two-party assumption, we have to scale these two numbers so that they add up to 100%, otherwise our simulation will be off. The calculation to do this is simple – an illustration is below.

Scaling calculation example – image by author
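The same scaling in code, using the Texas poll numbers quoted above (the variable names are my own):

```python
# Rescale the two-party poll numbers so they sum to 100%
trump_poll, harris_poll = 0.50, 0.46

p_trump_scaled = trump_poll / (trump_poll + harris_poll)    # ~0.521
p_harris_scaled = harris_poll / (trump_poll + harris_poll)  # ~0.479
print(round(p_trump_scaled, 3), round(p_harris_scaled, 3))
```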

Alright, with our poll numbers rescaled, we have everything we need to estimate the probability of a tie. There are multiple ways of doing this; the easiest and most computationally efficient is the binomial probability mass function (PMF). The PMF works in the reverse direction from drawing random samples: instead of generating a random outcome, we supply an exact outcome and get back its probability. For the binomial PMF, we input a number of successes (in this case, the votes needed for a tie), the total number of trials (votes cast), and the probability of success (the scaled poll numbers), and it returns the probability of that specific outcome. The probability we get is the probability of an exact tie in our state!

Using binomial probability mass function to simulate probability of tie – image by author
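Here's a tiny, self-contained illustration of that calculation with made-up numbers (a hypothetical state of 100,000 voters and a scaled 52% poll); the full, per-state version appears in the class code below:

```python
from scipy.stats import binom

voters = 100_000        # hypothetical turnout (must be even for an exact tie to be possible)
p_candidate = 0.52      # hypothetical scaled poll number

prob_of_tie = binom.pmf(voters // 2, voters, p_candidate)  # P(exactly half the votes)
print(prob_of_tie)
```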

Alright, we’ve gone through all of the logic to simulate a state’s election. It’s been a lot of design talk, let’s write some Python code and finally simulate the probability of swinging an election in our state!

This article has excerpts from the code base in this repo : Simulation GitHub Link – feel free to modify and run your own simulations 😁 !

Here is the code for the ‘state’ class:

import numpy as np
from scipy.stats import binom

class state():

    def __init__(self,
                 state_abbrv,
                 elect_votes,
                 reg_voters,
                 voter_turn_out,
                 vote_prob_dem,
                 vote_prob_repub,
                 party,
                 exclude_odd_vote_results = False):

        '''
            state_abbrv (str)      : state abbreviation
            elect_votes (int)      : number of electoral votes that state has
            reg_voters (int)       : number of registered voters in state
            voter_turn_out (float) : percent of registered voters that vote
            vote_prob_dem (float)  : probability that a single voter
                                     votes for the democrate candidate
            vote_prob_repub (float) : probability that a single voter
                                     votes for the republican candidate
            party (str)            : the party of the potential swing voter
                                     'rep' or 'dem'
            exclude_odd_vote_results (bool) : if True, an odd simulated
                                     turnout is treated as having no chance of a tie
        '''

        self.state_abbrv = state_abbrv
        self.elect_votes = elect_votes
        self.reg_voters = reg_voters
        self.voter_turn_out = voter_turn_out
        self.vote_prob_dem = vote_prob_dem
        self.vote_prob_repub = vote_prob_repub
        self.party = party
        self.exclude_odd_vote_results = exclude_odd_vote_results

        # scale poll numbers for two-party assumption
        sum_dem_rep = self.vote_prob_dem + self.vote_prob_repub

        if self.party == 'dem':
            self.vote_prob = self.vote_prob_dem / sum_dem_rep
        else:
            self.vote_prob = self.vote_prob_repub / sum_dem_rep

        # simulate the number of voters that turn out
        self.actual_voters = np.random.binomial(n = self.reg_voters,
                                                p = self.voter_turn_out)

        return

    def simulate_election(self):

        '''
            Simulates an election by simulating the vote of each 
            voting registered voter in the state.  Assumes popular 
            vote winner takes all electoral points

            output
                elect_points_won (int) : if the candidate wins the state, this is 
                                         the self.elect_votes else, it is 0        
        '''

        votes_for_candidate = np.random.binomial(1, self.vote_prob, self.actual_voters)

        # see if votes for candidate is greater than 50% of actual voters
        if np.sum(votes_for_candidate) > 0.5*self.actual_voters:
            return self.elect_votes
        else:
            return 0

    def simulate_votes(self):

        '''
            Simulates the votes from a state election. If state would
            be lost without resident's vote and won with it, then
            the resident will swing the state's vote.

            input:
                no explicit inputs - uses multiple class attributes

            output:
                swing_vote (bool) : True if resident's vote would swing the
                                    election, False otherwise

        '''
        # if actual_voters is odd, there is no chance of swinging the election
        if (self.actual_voters % 2 != 0) and (self.exclude_odd_vote_results):
            swing_vote = False
            return swing_vote

        # calculate probability of a tie using binomial distribution
        prob_of_tie = binom.pmf(int(self.actual_voters*0.5), self.actual_voters, self.vote_prob)
        rand_num = np.random.rand()

        if rand_num <= prob_of_tie:
            swing_vote = True
        else:
            swing_vote = False

        return swing_vote

As I mentioned before, the primary attribute of the 'simulate_election' class is a dictionary of 'state' class instances. Here is an example of setting up that dictionary (the full dictionary, with all of the states, can be found in the GitHub repo).

state_dict = {
    'AL': state(state_abbrv='AL', elect_votes=9, reg_voters=2499000, voter_turn_out=0.454, vote_prob_dem=0.35, vote_prob_repub=0.63, party = 'dem'),
    'AK': state(state_abbrv='AK', elect_votes=3, reg_voters=597000, voter_turn_out=0.429, vote_prob_dem=0.40, vote_prob_repub=0.56, party = 'dem'),
    'AZ': state(state_abbrv='AZ', elect_votes=11, reg_voters=4543000, voter_turn_out=0.619, vote_prob_dem=0.49, vote_prob_repub=0.50, party = 'dem'),
    'AR': state(state_abbrv='AR', elect_votes=6, reg_voters=1631000, voter_turn_out=0.434, vote_prob_dem=0.32, vote_prob_repub=0.64, party = 'dem')
    }

Below is the code for the 'simulate_election' class. The workhorse of the class is the 'simulate_multiple_times' method, which simulates national or state (or both) elections multiple times. This is the method I will use to create all of the election simulations in this article.

class simulate_election():

    def __init__(self,
                 state_dict,
                 resid_state,
                 elect_votes_to_win = 270):

        '''
            state_dict (dictionary) : list of instances of state class
            resid_state (str)       : state abbreviation for the single 
                                       potential swing vote
            elect_votes_to_win (int) : number of electoral votes to win
                                       default to 270

        '''

        self.state_dict = state_dict
        self.resid_state = resid_state
        self.elect_votes_to_win = elect_votes_to_win

        return

    def simulate_national_election(self):

        '''
            Simulates a popular election for each state and adds up
            electoral votes for candidate. Determines if the candidate
            loses without resident's state and wins with it. This is 
            a pre-requisite to the resident swinging the overall
            election

            input:
                no explicit input, but uses state_dict, resid_state
                elect_votes_to_win attributes

            output:
                swing_election (bool) : True if resident's state would
                                        swing election

        '''

        won_elect_votes = 0

        # simulate election for each state
        for state in self.state_dict.values():

            # skip simulation for resident's state
            if state.state_abbrv != self.resid_state:
                state_elect_votes = state.simulate_election()
                won_elect_votes += state_elect_votes

        # determine if candidate loses without resident's state's electoral votes and wins with it
        loss_without_resid_state = won_elect_votes < self.elect_votes_to_win
        win_with_resid_state = won_elect_votes + self.state_dict[self.resid_state].elect_votes >= self.elect_votes_to_win

        if  loss_without_resid_state and win_with_resid_state:
            swing_election = True
        else:
            swing_election = False

        return swing_election

    def simulate_state_election(self):

        '''
            Simulates the resident's state election and 
            determines if the resident's vote could swing
            the election.

            inputs
                no direct inputs, uses multiple class attributes

            output:
                swing_state (bool) : True if resident's vote
                                     could swing election False
                                     if not

        '''
        resident_state = self.state_dict[self.resid_state]

        swing_state = resident_state.simulate_votes()

        return swing_state

    def simulate_overall_election(self):

        '''
            Runs the state election and then the national election
            identifies if a single vote swung the entire election

            inputs
                no direct inputs, uses multiple attributes

            output
                swing_overall_election (bool) : True means a single vote 
                                                swung the election, false
                                                means that it did not
        '''
        state_swing_election = self.simulate_national_election()

        vote_swing_state = self.simulate_state_election()

        if state_swing_election and vote_swing_state:
            swing_overall_election = True
        else:
            swing_overall_election = False

        return swing_overall_election

    def simulate_multiple_times(self,
                                iters,
                                sim_type):

        '''
            Runs individual simulations multiple times.

            inputs
                iters (int) : number of simulations to run
                sim_type (str) : indicates what kind of simulation
                                 should be run - acceptable values are
                                 'state', 'national' and 'both'

        '''

        if sim_type == 'state':
            sim_func = self.simulate_state_election

        elif sim_type == 'national':
            sim_func = self.simulate_national_election

        elif sim_type == 'both':
            sim_func = self.simulate_overall_election

        election_results = []
        for _ in range(iters):
            iter_result = sim_func()
            election_results.append(iter_result)

        swing_pct = (sum(election_results) / len(election_results))*100

        return election_results

And finally, below is the code used to actually run the state simulation. This runs the simulation for residents of each state. The results are a predicted probability of your vote swinging an election by state.

import pandas as pd

from state_info_dem_reps import states_dict
import election_simulations as sim

state_abbreviations = ['AL', 'AK', 
    'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
    'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
    'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ',
    'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC',
    'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'
]

swinging_states_df = pd.DataFrame()

for abbrv in state_abbreviations:

    print(f'running {abbrv}')
    test_sim = sim.simulate_election(states_dict,
                                        abbrv)

    swing = test_sim.simulate_multiple_times(50000000, 'state')

    temp_swing_prob = (sum(swing) / len(swing))

    print(f'prob of swing = {temp_swing_prob}')

Intuitively, it seems very unlikely that (1) the voter turnout is even and (2) the votes tally to a perfect tie. And…… our intuition was correct! I ran 50mm simulations per state for democrat voters and republican voters (it took quite some time on my small PC 😐 ) and not a single one of the over 5 billion simulations had a swing in the state election.

So, from this we know that it is extremely unlikely to swing a state (less than 1 in 50mm), but we still don't know what the probability actually is, and I don't have enough computational power to run the number of simulations needed to observe any swings. I decided to do a straight probability calculation instead of a simulation-based one. To do this, I estimated the total number of voters by multiplying the voter turnout by the total number of registered voters – in this step I relaxed the requirement that the number of voters be even. I then used the binomial probability mass function to calculate the probability of a perfect tie in the votes. Here's the code to do that:

from scipy.stats import binom

# code to calculate probability of swinging state without simulation
def calc_swing_prob(state, _50_prob = False):

    voters = int(state.reg_voters*state.voter_turn_out)
    votes_needed_to_win = int(0.5*voters)

    # scale poll numbers for two-party assumption
    sum_dem_rep = state.vote_prob_dem + state.vote_prob_repub

    if state.party == 'dem':
        vote_prob = state.vote_prob_dem / sum_dem_rep
    else:
        vote_prob = state.vote_prob_repub / sum_dem_rep     

    if _50_prob:
        vote_prob = 0.5

    prob_of_tie = binom.pmf(votes_needed_to_win, voters, vote_prob)

    return prob_of_tie

The smallest positive value a Python float can represent is about 5e-324 (that is super small), so anything smaller than that underflows to 0. As you can see from the table below, all states but a handful were rounded to zero. Given that there are still a lot of zeros, I decided to calculate the probability of swinging a state if there was exactly a 50% chance that a voter votes for your candidate. This is the best-case/highest-probability scenario for swinging a state. It gave much higher probabilities of swinging a state. But an exact 50/50 poll is really unlikely and, to make matters worse, even small changes to this split have huge impacts. For example, in Iowa (IA) the scaled probability of a voter casting a democrat vote is 50.5%, which takes the probability of swinging the election from 0.0007 (for exactly 50%) to 1.77E-322 😲 !
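You can see this underflow behavior directly in Python:

```python
import sys

print(sys.float_info.min)   # ~2.2e-308, the smallest 'normal' positive float
print(5e-324)               # the smallest subnormal float Python can represent
print(5e-324 / 2)           # anything smaller underflows to exactly 0.0
```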

So, in summary, the probability of swinging any state’s vote is so small the simulation requires an intractable (at least for my technology setup) amount of iterations to see even a single swing. We switched to using probability theory instead which shows that the probabilities are very close to zero for all states.

Simulating your state swinging the nation

We currently have a way to estimate the probability that your vote swings the state’s election. But that is only one thing that needs to happen for your vote to decide the presidency. The other thing is that your state needs to swing the national electoral college vote as well. For a president to win, they need 270 electoral votes. We need to estimate the probability that without your state, your candidate has less than 270 and with it, your candidate has at least 270. How will we do this? You guessed it, the binomial distribution again!

The process will go as follows, for all states except yours, we will simulate an election by (1) simulating voter turnout, (2) simulating who those voters vote for, (3) calculating the winning candidate for the state and (4) tallying the electoral votes for your candidate from all states but yours. Once we have the simulated electoral votes without your state, understanding if your state could decide the election is simple math. If your candidate already has 270 votes, they already win without needing your state – no swing. If they have less than 270 and adding your state’s votes still doesn’t get them to 270, they lose no matter what – no swing. But, if they have less than 270 and your state’s vote would get them at or above 270, your state swings the election!

Possible national election scenarios – image by author
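The three scenarios boil down to a simple check; here's a toy version with hypothetical electoral-vote counts (the function name and numbers are mine, 270 is the real threshold):

```python
def state_swings_election(votes_without_your_state, your_state_votes, needed=270):
    # Your state is decisive only if the candidate is short of 270 without it
    # and reaches at least 270 with it
    return (votes_without_your_state < needed
            and votes_without_your_state + your_state_votes >= needed)

print(state_swings_election(280, 40))  # already won without your state -> False
print(state_swings_election(200, 40))  # can't reach 270 either way     -> False
print(state_swings_election(250, 40))  # your state pushes them over    -> True
```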

Okay, with the logic in our heads – I’ll copy some excerpts from the full code I shared in the previous section to highlight where this logic is used.

    def simulate_national_election(self):

        '''
            Simulates a popular election for each state and adds up
            electoral votes for candidate. Determines if the candidate
            loses without resident's state and wins with it. This is
            a pre-requisite to the resident swinging the overall
            election

            input:
                no explicit input, but uses state_dict, resid_state
                elect_votes_to_win attributes

            output:
                swing_election (bool) : True if resident's state would
                                        swing election

        '''

        won_elect_votes = 0

        # simulate election for each state
        for state in self.state_dict.values():

            # skip simulation for resident's state
            if state.state_abbrv != self.resid_state:
                state_elect_votes = state.simulate_election()
                won_elect_votes += state_elect_votes

        # determine if candidate loses without resident's state's electoral votes and wins with it
        loss_without_resid_state = won_elect_votes < self.elect_votes_to_win
        win_with_resid_state = won_elect_votes + self.state_dict[self.resid_state].elect_votes >= self.elect_votes_to_win

        if loss_without_resid_state and win_with_resid_state:
            swing_election = True
        else:
            swing_election = False

        return swing_election

    def simulate_election(self):

        '''
            Simulates an election by simulating the vote of each
            voting registered voter in the state.  Assumes popular
            vote winner takes all electoral points

            output
                elect_points_won (int) : if the candidate wins the state, this is
                                         the self.elect_votes else, it is 0
        '''

        votes_for_candidate = np.random.binomial(1, self.vote_prob, self.actual_voters)

        # see if votes for candidate is greater than 50% of actual voters
        if np.sum(votes_for_candidate) > 0.5*self.actual_voters:
            return self.elect_votes
        else:
            return 0

The code to call the national election simulation is very similar to the code to call the state simulations:

import pandas as pd
from state_info_dem_reps import states_dict
import election_simulations as sim
from closed_form_state_swing import calc_swing_prob

state_abbreviations = ['AL', 'AK', 
    'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
    'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
    'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ',
    'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC',
    'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'
]

swinging_states_df = pd.DataFrame()

for abbrv in state_abbreviations:

    print(f'running {abbrv}')
    test_sim = sim.simulate_election(states_dict,
                                        abbrv)
    # pass 'national' to simulate national elections
    swing = test_sim.simulate_multiple_times(1000, 'national')

    temp_swing_prob = (sum(swing) / len(swing))

    print(f'prob of swing = {temp_swing_prob}')

With the theory and the code organized, let's run some simulations to see the probability of your state swinging the national election. I ran simulations for republican and democrat voters, and then I ran a simulation where each state's polling numbers were a perfect 50%-50% split (this gives us the most interesting results). For the most part, in the simulation with actual polling numbers, the individual states are not likely to swing the national election. There are some states that happen to have a lot of power (like CA and NY). I think this is because, given how we set up the simulation, there isn't a lot of variance in how the states vote across simulations. So, if the electoral vote math happens to work out such that the candidate has less than 270 without the state in question and at least 270 with it, then that state will swing the election in nearly every simulation (because the other states rarely flip between simulations). Perhaps a future iteration of the simulation could add some random variation to the poll numbers?

We can see that if we switch the poll numbers to a perfect 50/50 split, most states have a higher chance of swinging the election. This is because there is a lot more variance in the outcomes – e.g., Texas isn’t always voting republican for every single simulation.

Estimating the probability that your vote decides the president

We now have ways of estimating the probabilities of the two independent events that need to happen for your vote to decide the presidential election.

  1. Your vote swings your state’s election
  2. You state’s electoral votes swing the national election

Let's use these to finally estimate what we've all been waiting for! As I just mentioned, these probabilities are independent of each other. In other words, the probability of your vote swinging your state and your state swinging the election are not changed by each other. Because of the 'multiplication rule for independent events' we can multiply these two probabilities. This is a lot more time effective than simulating them together!

So, let's multiply the two probabilities together. For the simulations with the actual poll numbers, there wasn't a single state with a swing probability higher than 0! Since a table of all 0's isn't very fun or informative, I put the table of the perfect 50/50 split results below. The conclusion: even in the highest-probability scenario (a 50/50 split both for your state and for the national election), the probability of your vote deciding the election is only slightly higher than 0%. When we use actual polling data, all probabilities round to 0%.

Why you should still vote

This article is just an exploration of simulating a specific system that is also a culturally pertinent topic. I don’t want the takeaway to be that people shouldn’t vote because there is essentially a 0% probability that they swing an election. I vote and I think other people should vote as well.

There are political reasons to vote other than swinging elections. There are also moral/ethical reasons to vote. I’m not really qualified to get into those. My main reasons for voting are based on classic logic!

My arguments for why you should still vote come from logical issues with the statement: "You shouldn’t vote because your vote will never swing an election."

Logical issue #1 – reductio ad absurdum:

‘Reductio ad absurdum’ is a Latin phrase that translates to ‘reduction to absurdity.’ This is an argument technique that follows a statement to its logical conclusion and exposes inconsistencies or other problems. A compelling argument against the statement above can be created using reductio ad absurdum. If you shouldn’t vote, then no one should vote (since everyone is in your same situation with regards to voting). If no one voted, our system of government wouldn’t work – how would we get leaders if literally no one cast a vote? Therefore, you should vote to keep our government from collapsing.

Logical issue #2 – self refutation

I'll first illustrate a 'self-refuting' argument with a classic example: 'there is no truth.' If this statement is true, then at least one truth exists – the statement itself. But that contradicts what the statement claims (that there is no truth)! Thus, the statement is self-refuting.

"You shouldn’t vote because your vote will never swing an election" is a self-refuting argument because, if followed to its logical conclusion, no one would vote. But, if no one voted, a single vote would decide the election and the vote would matter! As people follow the advice of the statement, the impact of an individual vote becomes more important and thus the statement becomes less valid. It is self-refuting!

Conclusion

Well, as our intuition already indicated, the probability of a single vote deciding the next election is extremely small – like impossible to conceptualize small! But, you should not let this stop you from voting! I hope this was a fun and useful exercise in simulating a real-world system to learn more about it. Happy voting!

The post Will Your Vote Decide the Next President? appeared first on Towards Data Science.

I’ve hired 3 cohorts of data science interns – here’s my advice on getting an offer https://towardsdatascience.com/ive-hired-3-cohorts-of-data-science-interns-here-s-my-advice-on-getting-an-offer-036bdba7d4b2/ Tue, 24 Sep 2024 18:29:01 +0000 https://towardsdatascience.com/ive-hired-3-cohorts-of-data-science-interns-here-s-my-advice-on-getting-an-offer-036bdba7d4b2/ Resume and interview tips for landing a data science internship

The post I’ve hired 3 cohorts of data science interns – here’s my advice on getting an offer appeared first on Towards Data Science.

I’ve Hired 3 Cohorts of Data Science Interns – Here’s My Advice on Getting an Offer
Image by Djordje Samkov from Pexels.com

In my current role, I’ve had the responsibility of reviewing resumes, performing interviews, and making Data Science intern hiring decisions for the last three years. As my group is prepping for our fourth cohort of summer interns, I thought it might be helpful to publish a few pieces of advice based on my experience and observations.

This isn’t a comprehensive guide to getting a data science internship – a lot of other people have done great work on that subject. This is a hodge-podge of advice that comes from my experience looking at hundreds of resumes and performing dozens of data science internship interviews. My hope is that you can find a few unique tips that can help round out your resume and interviewing skills. I’ve put them in order of my opinion of their importance. Let’s jump in!

Tip 1: Differentiators are always important, but they are king for internships!

You always want to differentiate yourself and your resume when looking for any kind of job. This often naturally happens through work experience. But, for Internships, work experience is not very common. Because of this, there is not a ton of variety in the resumes I see. Even small differentiators can have big impacts because of how similar most resumes are.

I’ll drive home the need for resume differentiation by asking you to put yourself in my shoes at a recruiting event. Imagine you are at a booth for the statistics department at The Generic University. For two hours, students wait in line to introduce themselves and hand you their resumes. By the end of the day you are hoarse, your feet hurt, and you have a blurred memory of over one hundred faces and a stack of resumes that rivals the IRS tax code in size.

Now I have the unenviable task of reviewing said stack of resumes and deciding whom I would like to invite to an interview and whom I will pass on.

I start looking at the resumes….

Resume 1 : Jane Doe

  • Master’s in statistics from Generic University
  • No work experience
  • Some random hobbies

Resume 2 : Joe Doe

  • Master’s in statistics from Generic University
  • No work experience
  • Some random hobbies

Resume 3: John Doe

  • Master’s in statistics from Generic University
  • No work experience
  • Some random hobbies

And it continues for dozens more!

You can see my problem here, right? I can’t interview every single statistics student, but so many of them look almost exactly the same! So, I scour the resumes for some kind of differentiator – anything that sets them apart from their classmates in a meaningful way. Because of how similar internship candidates can appear, even small differentiators can have huge impacts.

Okay, I think I’ve sufficiently driven home the point that differentiation is especially important for internships. Let’s get into the kinds of ‘differentiators’ I look for when reviewing resumes:

  • Any kind of data-related work experience (data analyst, statistics tutor or TA, etc.)
  • Freelance job or two doing some kind of data work
  • Some kind of a certification (e.g., AWS ML certification)
  • An interesting data-centric project outside of your school work
  • Writing data science articles (like in Towards Data Science!) or blogs
  • Interesting data science topics for thesis or dissertations
  • Development of an R or Python package
  • Participation in a data centric hackathon
  • Participation in a Kaggle competition
  • Leadership position in a data focused club at school

Of course, this list is not comprehensive. The point is, do something impressive that other people are not doing and then advertise it!

Sometimes, candidates will think that something is a material differentiator when (in my opinion) it is not. Here are a few things that I don’t really see as differentiators:

  • Projects related to coursework – I assume everyone does school projects
  • GPA – Because of varying levels of grade inflation, I never really know what is an impressive GPA. If I’m in a pinch I might use it as a last-resort differentiator (e.g., two really similar candidates with a larger delta in GPAs or similar candidates from the same program), but I prefer to use the differentiators I listed above first.
  • Interesting hobbies or background not related to data – I’m not saying to not put these in your resume. It can give you some personality, I just won’t give you any extra points for them.
  • Being a member of a data club in school (This doesn’t tell me anything about your level of activity in a club – did you show up once for the kick off where they had free pizza or are you an enthusiastic, active member?)
  • Cover letters and follow-up thank you emails – I don’t see electing to do or not do these as indicative of your potential as a data scientist

I’ve focused here on the resume because without differentiators on your resume, you are much less likely to get the interview. Of course, highlight your differentiators in the interview as well. Don’t rely on your resume to speak for itself, speak for it!

Tip 2: Express interest in data science specifically (especially in your resume)

This seems like a no-brainer, but a lot of resumes (your competition) miss the mark here! I’ll get into some ways I recommend doing this below, but I first want to emphasize who needs to pay the most attention to this section. Anyone who has a major that is not obviously data science-related (basically everyone studying anything other than statistics, data science, and machine learning) should pay special attention here! In my experience, computer science majors fall into this problem most often, so I’ll pick on them to serve as examples. They often have great resumes for software development, but don’t mention anywhere that they are interested in data science. This leaves me wondering why they applied. I don’t have time to reach out to each candidate to understand their motivation, so I have to dock points since I don’t know why they are interested in the opportunity. If you fall into this category, this section is for you!

First of all, the easiest way to show interest is to simply explicitly say that you want to work in data science on your resume! Probably the best way to do this is to put a concise objective statement at the top of your resume. There have been many times when I’ve passed on resumes because there isn’t an ounce of indication that they actually want to work in data science. In most of these cases, I would’ve given the resumes a closer look if there was just a short objective statement that talked about wanting to get experience in data science.

Simply saying that you want to work in data science is a good thing; if you want to do better (which of course you do!), demonstrate how your background is consistent with your data science objective. For example, "I’m pursuing a master’s in computer science to acquire the skills to be a top-class data science coder." Or, "My background in physics has provided me with the quantitative skills to understand predictive modelling at a deep level." These aren’t perfect examples, but I hope you get the idea – explaining how your background and academic goals tie to your goal of becoming a data scientist can pre-emptively answer a lot of questions that I may have while reading your resume – and that is a very good thing!

Of course, words are cheap, and I know that many applicants have multiple resume versions depending on what they are applying to. I generally take the sincerity of an objective statement at face value, but I’ll take it much more seriously if it is backed up by some other things on your resume. Any of the differentiators listed above in tip 1 will work wonders to show that you are very interested in pursuing data science.

Note: Data scientists come from a wide variety of backgrounds. Do not feel that you are at a big disadvantage if you are coming from a ‘non-traditional’ path. We will typically consider any applicant studying a ‘quantitative’ field. But if your field isn’t directly related to data science, you have a little extra work to do to make sure I understand that you really are passionate about the field.

Tip 3: Realize that no one gets all of the answers right in the interview, don’t get flustered!

I’ve had multiple opportunities to talk to accepted candidates about their interview experiences after they started their internship. Almost all of them say they felt like they didn’t do well in the interview. At first that came as a surprise to me because (obviously) I thought they were among the best interviewees. I realized they felt this way because they were holding themselves up against a perfect standard – i.e., getting all of the questions right – while I was measuring them up against the other candidates I had interviewed.

I think it is important to realize that no one does a perfect interview, so if you get a question slightly wrong, or fumble with your words at some point, you aren’t automatically out. I can almost guarantee that your competition has also made some mistakes. I don’t tell you this to make you feel complacent or that the competition isn’t stiff. After all, even if no one does perfectly, you still have to beat most of them to get an offer! My purpose in sharing this insight is to give you confidence during the interview so you don’t ‘crash and burn.’ If you make a mistake or say something awkward, just move on, chances are your competition has also made some mistakes. Try to keep mistakes to a minimum of course, but don’t give up or get flustered because some things didn’t go perfectly! You are probably still in the race!

Tip 4: Do not guess, never guess! Have a strategy for when you don’t know the answer

I think the single worst thing you can do in an interview (barring something totally crazy) is to guess the answer to a question. You might be able to pull it off, and I honestly can’t tell you how many candidates have in my interviews (because I don’t know when someone got one by me 😁!). But I can say that if I realize someone is guessing, they lose major points. Not only because they don’t know the answer, but because they are showing really bad judgement. Imagine if someone I hired was later in a meeting with an executive. The executive asks them a question they don’t know the answer to, and they just guess. What a nightmare!

You need to have a few plans to handle questions you don’t know in a graceful way. Luckily, as an intern you have a solid go-to – "I don’t have a lot of confidence in X, but I will be taking a class next semester that should cover it." With interns, unlike applicants for full-time positions, we expect you to grow and develop between the application period and your start date. Knowing which classes you are going to take and being able to tie a knowledge gap to one of those classes is about a best-case scenario when you don’t know an answer.

So, saying you will take a class that will cover a specific topic is a great one. But, what if the topic isn’t in your future coursework? You also need a plan for this scenario. If you are familiar with a similar topic, you can say something like "I’m not sure exactly about that, but I do know a general principle about the larger topic is ……" Don’t dismiss your lack of knowledge, or pretend like you didn’t understand the question, but add a little color to let the interviewer know that you are familiar with somewhat similar topics.

You also need to have a plan on what to do if you literally know nothing about the topic. This is a really tough one! My recommendation is to go for honesty and show enthusiasm for learning. Something like this might be the best solution in a worst-case scenario – "I honestly am not familiar with this topic. As a motivated student inside and outside of the classroom, I hope to continue to learn about this and other related topics as I continue to grow my knowledge base."

In summary, you need a plan for how you will answer questions you don’t know the answer to. It should be a multi-faceted plan with a protocol for different scenarios (e.g., you don’t know it but will study it later; you’ve heard of it and know just a little bit; you’ve never heard of it in your life; etc.). And of course, whatever you do, don’t guess!

Tip 5: Pay attention to and pick up on interviewer hints and cues

Picking up on subtle or not-so-subtle clues and cues from the interviewer gives them valuable insight into your communication skills. Missing them can be detrimental to your candidacy.

In this section, there are two types of cues I want to address. The first are general social cues and the second are hints and cues specific to solving technical problems.

The first is more general internship/communication advice – just pay attention to the social cues of the interviewers. Listen for polite but somewhat dismissive ‘uh-huhs’ that indicate you are talking too long about a subject or discussing something they don’t want to discuss. I’ve sat through many 15-minute dissertation summaries (in a 45-minute interview), giving dozens of ‘uh-huhs’ – each of which could roughly be translated to ‘please stop talking.’ If the interviewer is constantly asking you to elaborate, take that as a cue that your next answer should go into more detail. Everyone has different communication and interviewing styles; make sure you are paying enough attention to adapt to the style and preferences of the person on the other side of the table. Your odds of getting the offer will be much higher!

One question I always ask myself after an interview is ‘Would I like to work on a project with this person?’ Being able to pick up and adapt based on social cues will make the interview a lot more enjoyable for me and will make me a lot more likely to answer ‘Yes!’

I don’t have a whole lot to say about hints and cues about technical questions, except, "use them!" – they are there to help you. If the interviewer tells you to go a different direction in your thinking, then do it! If they tell you to assume something isn’t an issue, assume it! I once asked a question about profit in an interview. The candidate rightly mentioned the importance of discounting cashflows. But discounting cashflows would make the problem too complicated. So I politely told him to ignore discounting for this question. He continued to talk about discounting in all of the follow up questions. He even tried to answer the final big question using discounted cashflows (and he did it incorrectly)! It was a really frustrating experience and he lost major points for his inflexibility.

Tip 6: Be prepared to introduce yourself, be prepared in general

This sounds so basic and, honestly, before hiring three cohorts of interns I would’ve scoffed at this overly obvious advice. But my experience suggests this is something people need to be reminded of. Make sure you have a buttoned-up, excellent introduction that is concise and sells all of your strong points. I cannot tell you how many times candidates don’t seem to be prepared for that very predictable question. They often meander, going into too much detail in some places and skipping over details in others. What a missed opportunity! The ‘tell me about yourself’ request is basically giving you the green light to say why I should hire you. Make sure you have a good pitch!

Less importantly, just make sure you are generally prepared. Know a little bit about the company, have a pen and paper handy, have a copy of your resume to reference, etc. This is basic stuff that won’t win you any points, but it will keep you from losing points. And shockingly, many people lose a few points because they seem flustered and unprepared.

Tip 7: Put your expected graduation date clearly on your resume

This is a really simple one. A lot of internship programs target specific graduation dates. If you don’t put your graduation date on your resume, one of two things could happen: (1) you get an interview and they ask for your graduation date there, or (2) your resume gets passed over because the hiring manager doesn’t have time to confirm you have an acceptable graduation date. Either way, if your graduation date doesn’t meet the requirements, you won’t get the offer. And if your graduation date is acceptable but you leave it off, there is a chance you won’t get an interview because of it. So basically, just put your graduation date on there. If it isn’t a fit, you won’t waste your time preparing for an interview that will be over as soon as they find out your date. If it does fit, you are more likely to get an interview. There is no reason not to include it!

Tip 8: Understand that the interviewer most likely wants you to do well – they won’t cut you any breaks, but they are probably secretly rooting for you

My last tip is to help you calm your nerves before an interview. I can’t speak for everyone, but when I’m doing an interview, I’m hoping the candidate does well. I don’t want to trip you up with arbitrary tricks or ‘gotcha questions.’ I’m not going to make it easier by any means, but I’m secretly cheering for you. The reason I feel this way is because I want to find great candidates quickly (I’m not a recruiter, I have plenty of data sciencing that I would like to get back to 🤣). I want you to be a great candidate and I’m hoping you are. I think that a lot of interviewers feel the same way. They won’t cut you any slack, but they are also not looking for tricky ways to eliminate you.

Conclusion

I hope that some of these tips will prove useful to you. Remember that these are just my opinions. Other interviewers and companies may have different opinions and priorities. You definitely want to use your discretion. I do honestly think that following these 8 tips will greatly improve your probability of success in landing that data science internship this year! Good luck!

The post I’ve hired 3 cohorts of data science interns – here’s my advice on getting an offer appeared first on Towards Data Science.

]]>
Linear Programming Optimization: The Simplex Method https://towardsdatascience.com/linear-programming-optimization-the-simplex-method-b2f912e4c6fd/ Tue, 10 Sep 2024 07:28:23 +0000 https://towardsdatascience.com/linear-programming-optimization-the-simplex-method-b2f912e4c6fd/ Part 3: The algorithm under the hood

The post Linear Programming Optimization: The Simplex Method appeared first on Towards Data Science.

]]>
Up until now, this series has covered the basics of linear programming. In this article, we are going to move from basic concepts into the details under the hood! This article will cover the simplex method, which is the algorithm that is often used to solve linear programming problems. While we will solve a simple linear programming example by hand with the simplex method, our focus will be on the intuition of the algorithm rather than memorizing the algorithmic steps (we have computers for that kind of stuff!).

Here is what we will cover:

  1. Why the simplex method is needed
  2. Moving from graphical solutions to algebraic
  3. Demonstrating how the simplex method works with a simple example

Here is the link to the list that contains all of the articles I’ve written so far for this series:

Linear Programming

Why the simplex method is needed

In the first article of this series, we went over how the attributes of linear programming allow it to only consider the corner points of constraints as potential optimal solutions. This is a very powerful feature that narrows an infinite solution space to a finite solution space. In the examples we reviewed, we only had a few constraints and a few variables – we even solved some of them by hand! After looking at some simple examples, one might assume that an exhaustive, brute force (looking at every single corner point) approach to searching for the optimal solution is sufficient.

While the brute force method is fine for smaller LP problems, the number of corner points can explode as the problems get more complicated. The total number of corner points (including feasible and infeasible) is a function of (1) the number of variables and (2) the number of constraints in the problem.

Formula for the total number of corners – image by author

Below are some examples of the growth of the number of corner points for multiple variables and constraint combinations:

Quick explosion in number of corner points – image by author

The rapid growth of the corner points necessitates a smarter way of finding the optimal solution – enter the simplex method! The simplex method is a clever way to look at only a small subset of all of the possible solutions while still guaranteeing a global optimal result – we’ll get into the details a little further down.
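If you want to get a feel for that growth yourself, here is a tiny sketch you can run. It assumes the corner count is the binomial coefficient ‘n choose m’, with n counting both decision and slack variables (d + m in total) and m the number of constraints – the variable/constraint combinations below are just illustrative, not the exact ones from the table.

from math import comb

# corner points (feasible and infeasible) for an LP with d decision variables
# and m constraints, assuming the count is C(d + m, m): adding one slack per
# constraint gives d + m variables, and each corner picks m of them to be basic
for d, m in [(2, 3), (10, 10), (20, 20), (50, 50)]:
    print(f"{d} variables, {m} constraints -> {comb(d + m, m):,} corner points")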


Moving from graphical solutions to algebraic

When learning the basics of linear programming, it is really nice to look at a graphical representation of an optimization problem with two decision variables. We covered this in part 1. While this works wonders for developing an understanding of the basics, it doesn’t extrapolate well to more complicated problems. Its usefulness is essentially limited to instructional purposes.

To further our understanding of LPs and get to a point where we are ready to start discussing the details of the simplex method, we need to understand how to transfer an LP problem into an algebraic format.

One of the main components of getting an algebraic setup is transforming the constraint inequalities into equalities. Outside of the context of LPs, there isn’t a good way to do this mathematically. But remember how, for LPs, we are really only interested in corner points? And all corner points are on the lines? Since we only care about the borders of the solution space, we can basically ignore the inequality! Again, this is because we don’t need to worry about the area away from the border lines of the space anyway.

Nice, let’s just switch all of the inequality signs to equality signs and we are good to go, right? Not so fast! When we consider corner points, we aren’t usually on all of the lines at the same time. Because of this, we need to add a way to ‘hop’ on and off of specific constraint lines. This is a little hard to explain in words – ironically, I’m going to explain this concept… graphically.

Switching from inequalities to equalities – image by author

Okay, so now we see why we can swap the inequality for an equality – but we have a big problem. Let’s explore this problem by looking into the math of a specific corner point:

image by author

Let’s plug these X₁ and X₂ values into all of the linear constraints:

image by author

We see here that the stricter equality makes these values of X₁ and X₂ non-viable for the third constraint. We need a way to be at the corner while having the variable values work in all of the constraint formulas.

The solution to the problem above is to create slack variables. Slack variables are added to each equality and give us the flexibility to only consider the constraints involved in one corner at a time.

We can ‘hop’ to a corner point by setting the slack variables to 0 for the constraints involved. For the additional constraints that are not involved, we set the slack variable to whatever the difference (or slack) is between the decision-variable side of the equation and the constant side.

Adding slack variables gives us the flexibility to ‘hop’ to corner points – image by author

If the slack variable is set to 0, then the constraint is one of the constraints in the corner point we are considering. If it is anything other than zero, we know that the specific constraint is not involved in the corner point under consideration.

Okay, with our inequalities set to equalities and our slack variables created, we are ready to solve linear programming problems algebraically. Let’s now get into the details of how the simplex method uses this setup to optimize efficiently!

Note: When we were discussing the number of possible corner points in the previous section, n includes slack variables, not just the original decision variables.
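To make the slack idea a bit more concrete, here is a tiny sketch that computes the slack for the first constraint of the example worked below (-2X₁ + X₂ ≤ 5, written out in full later in this article) at a few points; the points themselves are just illustrative.

def slack(coeffs, rhs, point):
    # slack = constant side minus the decision-variable side of the constraint
    lhs = sum(c * x for c, x in zip(coeffs, point))
    return rhs - lhs

# first constraint of the example below: -2*X1 + 1*X2 <= 5
print(slack([-2, 1], 5, (0, 0)))   # 5  -> the constraint is not binding at the origin
print(slack([-2, 1], 5, (7, 0)))   # 19 -> matches the S1 value after the first pivot below
print(slack([-2, 1], 5, (0, 5)))   # 0  -> this point sits right on the constraint line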


Demonstrating how the simplex method works with a simple example

First and foremost, the simplex method is an algorithm that efficiently moves from corner point to corner point, calculating the objective values of the corner points until it finds the globally optimal solution. The simplex method is a greedy algorithm that moves to the corner point that increases (for maximization) or decreases (for minimization) the objective function the most. Greedy algorithms are typically infamous for finding sub-optimal solutions – but, because of the characteristics of linear programming, the simplex method is guaranteed to find the optimal solution.

I’d like to take some time to build intuition on why a greedy algorithm, like the simplex method, can find a global optimum for linear programming. The defining attribute of a greedy algorithm is that it makes optimal decisions given just the immediate information. Often, a series of locally optimal decisions does not lead to a globally optimal solution – but, as I mentioned, this is not the case for linear programming. The reason it holds is the linearity requirement for the objective and constraints. Let’s look at an example of making optimal decisions with linear functions.

Imagine we are at the intersection of two functions – we want to select the function that will give us the highest y-value for the next x-value.

Demonstration that for linear functions, local optimal decisions are also global – image by author

The simplex method uses a very similar approach to move to different corner points in a greedy fashion. One of the keys to the simplex method is that it terminates when no better adjacent corner can be found. That corner is both the local and global optimum because of the convexity of the solution space.

Intuitive Example

With a general understanding of the setup of the simplex method and a basic understanding of what it does (greedily iterate through corner points to find the globally optimal solution), I think we are ready to start working through an example.

So far, we’ve discussed three constraints, but we have not established an objective function. To do that, we’ll come up with some coefficients for X₁ and X₂, and we will add all of the slack variables with coefficients of 0 (this makes the optimization tableau complete).

So here is the problem we will be solving:

image by author

Let’s get this set of linear equations simplex-ready. We’ll start with the constraints, which we’ve already discussed above. This is what they will look like:

image by author

We have not discussed how to modify the objective function to fit into the simplex approach. It is very simple: we set the objective value equal to Z and then transform the function algebraically so that 0 is on the right-hand side. It looks like this:

image by author

Note that we added 0s for the slack variables; they add no value to the objective, but this formulaic setup will be helpful as we start to build out the simplex tableau.

The simplex tableau is a data table that is used in the simplex method to traverse corners and calculate their objective function values. It also has a built-in mechanism to stop exploring once the optimal value has been found.

The tableau has one row for the objective function and one row for each constraint. At any point in the algorithm’s execution, there are basic and non-basic variables. Variables switch between these categories as the algorithm progresses. Basic variables are variables that are not set to 0; you can think of them as active. Non-basic variables are variables that have a value of 0; you can think of these as deactivated. The simplex method always starts at the origin. At the origin, all decision variables are set to 0, which means that only the slack variables are basic, i.e., non-zero or "active." The starting tableau will therefore show the slack variables as the basic variables at the beginning.

Below is our example’s starting tableau:

image by author

Okay, now that we have our initial set up, we are ready to start our first iteration of the simplex method. We need to start moving away from the origin – we do this by moving one of the slack variables to non-basic and pulling one of the decision variables into the basic solution set. But which slack variable do we take out, and which of the X’s do we bring in?

Let’s first figure out which variable we want to make basic. Remember how we said that the simplex algorithm is greedy? This is where this comes into play! We look at the coefficients for the z-row and select the variable that has the most negative coefficient. This is a maximization problem, but remember that we switched the signs of the coefficients to get the 0 on the right hand side, so negative = good, positive = bad. The most negative coefficient is tied to the variable that will increase the objective function the most right now. Our options are -5 for X₁ and -3 for X₂, so we select X₁. This is called the entering variable – thankfully that is a pretty intuitive name!

image by author

Now, let’s talk about finding the slack variable to make non-basic, or deactivate/remove. In simplex terms, the variable that is being removed is (also intuitively) called the leaving variable. We want to find the maximum value that we can set the entering variable to without violating any constraints. This means we need to find the slack variable that is associated with the tightest/most restrictive constraint on X₁. We can do this by taking the ratio of the ‘solution’ column and the constraints’ X₁ coefficients. X₁’s constraint coefficients are -2, 1 and 1 – the corresponding ‘solution’ column values are 5, 7 and 10.

S_2 has the lowest, non-negative ratio – therefore it will be selected as the leaving variable – image by author

Here’s the intuition behind this selection. We are trying to find the maximum value that X₁ can take while being consistent with the system of linear equations it is a part of. For the first constraint, really anything goes because the X₁ coefficient is negative – the other variables can compensate at any level to make the overall equation equal to 5. Obviously ‘anything goes’ is not a very restrictive constraint 😄 !

image by author

For the second constraint, X₁’s maximum value will be realized when S₂ and X₂ are set to 0; this maximum value is 7. We see that there are values for the other variables that can make the system of linear equations work.

values exist such that the system of linear equations doesn’t break – image by author

Now let’s take a look at our last option, which is setting X₁ to 10. This causes big problems; remember that all variables have to be ≥ 0. As you can see below, there is no way to make the second constraint work given a value of 10 for X₁. Therefore, the highest we can set X₁ to without violating any equation in the system is 7.

There is no way to make the system work with x1 = 10 – image by author

Wow, I feel like we’ve done a lot of work already! We’ve identified which variable we will be introducing into the basic solution and which variable will be leaving it. Now we have to update the tableau to reflect these changes.

Before we get into the details of updating the tableau for the entering and leaving variables, let’s get some key terms down quickly. The row that belongs to the leaving variable is called the pivot row. The column that belongs to the entering variable is called the pivot column. The intersection of the pivot row and the pivot column is called the pivot element.

Simplex entering/leaving variable terms – image by author

Okay, let’s update this tableau for the changes we need to make! The first step is to update the row for S₂ (it gets special treatment because it is the row we are swapping out for X₁), then we need to update all of the other rows.

We update the pivot row by dividing the entire row by the pivot element. This sets the row up so that the ‘solution’ column’s value is equal to the X₁ value. Unfortunately, for this example, since the pivot element is 1, it will appear that nothing has changed (I guess I should’ve picked a better example…). But we will update the ‘Basic’ column to show that X₁ is now in the basic solution and S₂ is not.

image by author

All of the other rows get modified with this function:

New Row = (Current Row) – (Its Pivot Column Coefficient) × (New Pivot Row)

This operation updates the other rows so that the values of the basic variables still form a coherent set of linear equations. Let’s go through that now. We’ll start with the row for the slack variable S₁.

modifying S1 because X1 was introduced to the basic solution – image by author

The math for S₁’s row now ‘adds up.’ Remember that X₁ is now equal to 7. The constraint that corresponds to the first row is -2X₁ + X₂ + S₁ = 5. After our first move, this equals -14 + 0 + S₁ = 5. Given this formula, S₁ must equal 19, which is what we now have for the ‘solution’ column. We will follow this same step for the other two rows. The resulting tableau is the output of the first iteration of the simplex algorithm.

updated simplex tableau after first iteration – image by author

Now that we’ve completed the first iteration, we start a new iteration and follow the exact same steps as we did for the first. Remember that the first step is to find the most negative number in the z-row. Interesting… All numbers are positive. When all numbers in the z-row are 0 or positive, you are currently at the optimal point – this is the stopping criterion for the simplex method. For this example, our optimal solution is X₁ = 7 and X₂ = 0 with an objective function value of 35. The slack variables only serve an algebraic purpose and are discarded after the optimal solution is found.

For illustrative purposes, let’s imagine that there was a negative value in the z-row. Something like this:

hypothetical scenario where we have a negative value in z-row – image by author

If this were the case, we would select X₂ to be the entering variable and then proceed to (1) select the leaving variable, (2) recalculate the leaving variable’s row and switch it for X₂, (3) recalculate all other rows and (4) lastly look for negative values in the z-row to see if we have found the optimal solution, or if we need to move to the next iteration.
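If you would like to see the whole loop in one place, below is a rough sketch of the steps we just walked through (entering variable, ratio test, pivot, stopping rule) in code. The example data at the bottom is a made-up two-variable problem, not the one from this article, since the full coefficient set of our example only appears in the images.

import numpy as np

def simplex_iteration(tableau):
    # last row is the z-row, last column is the 'solution' column
    z_row = tableau[-1, :-1]
    if np.all(z_row >= 0):
        return None                          # stopping rule: no negative z coefficients left

    pivot_col = int(np.argmin(z_row))        # entering variable: most negative z coefficient

    # ratio test: smallest non-negative 'solution' / pivot-column coefficient
    # (an unbounded problem would leave every ratio infinite - not handled in this sketch)
    ratios = np.full(tableau.shape[0] - 1, np.inf)
    for i in range(tableau.shape[0] - 1):
        if tableau[i, pivot_col] > 0:
            ratios[i] = tableau[i, -1] / tableau[i, pivot_col]
    pivot_row = int(np.argmin(ratios))       # leaving variable's row

    new_tab = tableau.copy()
    new_tab[pivot_row] /= new_tab[pivot_row, pivot_col]      # normalize the pivot row
    for i in range(new_tab.shape[0]):
        if i != pivot_row:
            # new row = current row - (its pivot column coefficient) * (new pivot row)
            new_tab[i] -= new_tab[i, pivot_col] * new_tab[pivot_row]
    return new_tab

# made-up example: maximize 3x1 + 2x2 subject to x1 + x2 <= 4 and x1 + 3x2 <= 6
# columns: x1, x2, s1, s2, solution
tableau = np.array([[ 1.0, 1.0, 1.0, 0.0, 4.0],
                    [ 1.0, 3.0, 0.0, 1.0, 6.0],
                    [-3.0, -2.0, 0.0, 0.0, 0.0]])   # z-row with flipped signs

while True:
    next_tab = simplex_iteration(tableau)
    if next_tab is None:
        break
    tableau = next_tab

print(tableau[-1, -1])   # optimal objective value (12.0 for this made-up example)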

Conclusion

We’ve gone deep into the details of the simplex method in this article. I think it is important to understand the details, but it is also important to not miss the forest for the trees. Especially since you will never have to manually solve a linear programming problem in practice.

Here are a few key takeaways from this article:

  • The simplex method is an algorithm to solve linear programming problems using algebra
  • The objective and constraint formulas need to be re-written as equalities with slack variables before the simplex method can be used
  • The simplex method optimizes by doing a greedy search for the variable that adds the most to the objective function – then it sets that variable as high as possible
  • The simplex method stops after it can’t find any variables that will improve the objective function – this allows the simplex method to exit the problem early with the optimal solution – without having to explore all of the corner points

The post Linear Programming Optimization: The Simplex Method appeared first on Towards Data Science.

]]>
Linear Programming: The Stock Cutting Problem https://towardsdatascience.com/linear-programming-the-stock-cutting-problem-dc6ba3bf3de1/ Thu, 22 Aug 2024 04:49:22 +0000 https://towardsdatascience.com/linear-programming-the-stock-cutting-problem-dc6ba3bf3de1/ Part 2 - Linear Programming Example Deep Dive

The post Linear Programming: The Stock Cutting Problem appeared first on Towards Data Science.

]]>
This article is a deep dive into how linear programming can solve a specific problem called the ‘stock cutting’ problem. I want to provide a concrete example before getting too deep into the details of linear programming in this series. This article will use optimization terms that are not defined here – I wrote an optimization basics article that covers those terms and other basic concepts.

In this article, we’ll go over:

  1. Stock cutting problem definition
  2. Problem difficulty
  3. Solving the problem in Python

Here are the links to the first article in the series and the optimization basics article I mentioned:

Linear Programming Optimization: Foundations

The Intuitive Basics of Optimization


Stock Cutting Problem Definition

The stock cutting problem is a very practical and applicable problem. The first article I ever wrote on TDS covered a greedy algorithm that I wrote to help me reduce waste when doing my own woodworking projects. The greedy algorithm was sufficient for little DIY projects, but linear programming can solve the stock cutting problem at an industrial scale!

How bad is being greedy?

Here is the setup for the problem. Imagine that you are making a piece of furniture. For the piece, you need multiple cuts of wood of various lengths. The lumber yard only sells wood of one length. The stock cutting problem can optimize the cutting strategy so that you buy as few pieces of wood as possible – which saves you money and minimizes waste.

Here’s a simple example of a stock cutting problem. Let’s say you need to make the following cuts (borrowing the example I made in my first article on the subject): [6′, 6′, 5′, 4′, 3′, 3′] and the store sells stock pieces that are 9′ long. The stock cutting problem asks, "What cutting strategy should we use to minimize the number of stock pieces we need to purchase?"

Before we get into how linear programming can help us solve this problem, let’s look at how a bad strategy can cost us money.

This specific instance of the problem is pretty simple and can easily be optimized by hand. But let’s just imagine we didn’t think the problem through or made a mistake when developing the strategy below, which requires purchasing 4 pieces of stock:

Poor stock cutting strategy – image by author

In the solution above, we get all of the required cuts, but we need to purchase 4 pieces of stock. Below is the optimal strategy which meets all of the required cuts, but only requires 3 pieces of stock to be purchased. The takeaway here is that optimizing the strategy can save real world money!

Optimal strategy that meets all cut needs with only 3 stock pieces – image by author

For this toy example, finding the optimal solution is very easy. But the challenge with the stock cutting problem is that as you add more cuts (think at an industrial scale), the problem gets extremely difficult to solve. And that is where linear programming can help us out. Let’s first get some intuition on why the stock cutting problem gets difficult with scale.


Stock Cutting Problem Difficulty

What makes the stock cutting problem difficult is the fact that adding more cuts vastly increases the size of the solution space. So, as the problem scales up, it quickly becomes intractable.

You can think of the stock cutting problem as an ordering problem. You are placing the cuts in a specific order. To execute a plan, you go through pieces of stock making cuts in the specified order – once the remaining piece is too short for the next cut, you start cutting a new piece of stock.

The number of unique ways to order n items (or in this specific case, cuts) is n!. Factorials grow extremely quickly. Below is a table I borrowed from my first article on the stock cutting problem to illustrate this point:

Factorial growth of the stock cutting problem’s solution space – image by author

The above table shows the size of the solution space if we try to solve the problem directly via a brute force or greedy algorithm. When we use linear programming, we need to adapt our approach to the problem so that the solution space is smaller, and there is no longer a closed-form expression (like the factorial function) to easily quantify its size. We’ll cover all of those specifics in the next section!
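Before we move on, here is a tiny snippet you can run to see the factorial growth from the table above for yourself:

import math

# the number of unique orderings of n cuts grows factorially
for n in [5, 10, 15, 20, 25]:
    print(f"{n} cuts -> {math.factorial(n):,} possible orderings")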


Solving an example problem in Python

As we’ve now established, the stock cutting problem is really hard to solve because of the very large number of potential solutions to explore. Luckily, we can use the power of linear programming to help overcome these challenges. Unluckily, for the stock cutting problem to fit into the linear programming paradigm, we have to make a few concessions. Let’s get into it!

If you recall from the first article in the series (or your basic knowledge of linear programming) – linear programming problems have a linear objective function and usually have linear constraints (if it is a constrained optimization problem). So, to encode the stock cutting problem into a linear programming problem, we need a way to create an objective function and constraints that make sense. We do this by providing the problem with pre-defined cut patterns – if that doesn’t make sense now, don’t worry about it, we’ll break it down!

What is a cut pattern you may ask? It is simply a plan for how to cut a single piece of stock. We showed multiple cut patterns in our quick example in the intro. The suboptimal solution (showing again below) gave us four unique cut patterns.

four unique cut patterns – image by author

The catch with using linear programming is that we have to provide the cut patterns. Typically, if we are solving a problem that is hard enough to justify the use of linear programming, there are too many potential cut patterns to include them all in the problem. This will have some implications on the interpretation of the final optimized results – but we’ll get to that later!

For now, let’s use the four cut patterns established above to finally set up our linear programming problem. The problem will be deciding how many times to use each cut pattern. So, for example, if the problem decides to use cut pattern 1 three times, we will get three 5′ cuts and three 3′ cuts at an expense of three stock pieces. Following this structure, we want to minimize the number of patterns we use (objective) while making sure we have all of the cut lengths we need (constraints).

In the optimization problem, we create a decision variable for each cut pattern. The variable values are the number of times each cut pattern is used. Here is our objective function formally written out:

minimize total number of stock by minimizing the sum of the four cut pattern uses – image by author

Now, let’s add the constraints. We need a constraint for each length of cut. As a reminder, we need these cuts for our project: [6′, 6′, 5′, 4′, 3′, 3′]. We’ll use 6′ as an example; our constraints need to ensure that the solution provides at least two 6′ cuts. We need to look at each cut pattern to see how many 6′ cuts it has. The table below shows the breakdown:

Number of 6′ cuts for each cut pattern

From here, it is really easy to convert this into a linear constraint.

6′ cut constraint – by author

Following the same process, we can make a constraint for each cut length. Okay, so the full set up for the problem looks like this:

full linear programming problem set up – image by author

Let’s turn to Python to actually solve this problem! We’ll set it up and solve it using the pulp library:

import pulp

# Define the problem
lp_problem = pulp.LpProblem("stock_cutting", pulp.LpMinimize)
# Define decision variables
x1 = pulp.LpVariable('x1', lowBound=0)
x2 = pulp.LpVariable('x2', lowBound=0)
x3 = pulp.LpVariable('x3', lowBound=0)
x4 = pulp.LpVariable('x4', lowBound=0)
# Objective function - minimize stock purchases
lp_problem += (x1 + x2 + x3 + x4)
# Constraints - make sure each required cut length is covered
# (x1 = 5'+3' pattern, x2 = 6'+3' pattern, x3 = 6' pattern, x4 = 4' pattern)
lp_problem += x2 + x3 >= 2, "6 foot cuts"
lp_problem += x1 >= 1, "5 foot cuts"
lp_problem += x4 >= 1, "4 foot cuts"
lp_problem += x1 + x2 >= 2, "3 foot cuts"
# Solve the problem
lp_problem.solve()
# Output the results
print("Status:", pulp.LpStatus[lp_problem.status])
print("Optimal number of stock purchases:", pulp.value(lp_problem.objective))
print("Optimal use of cut patterns:")
print("x1 =", pulp.value(x1))
print("x2 =", pulp.value(x2))
print("x3 =", pulp.value(x3))
print("x4 =", pulp.value(x4))
results of the code above – image by author

Congratulations! You’ve now solved the stock cutting problem with linear programming – you may have some reservations about our approach though. I certainly do! Let’s talk about a key concern:

Key Concern – results depend on cut patterns passed into optimization

The linear program will always find the best solution given the cut patterns provided. It will not find the globally best solution unless all of the cut patterns required for that solution are available. This can be a challenge because, depending on the specifics of the problem, there can be a lot of unique cut patterns. Thankfully, including a good spread of cut patterns is usually easier than solving the stock cutting problem itself.

There isn’t a closed-form formula for the total possible number of cut patterns because the number of cuts on a specific piece of stock is variable. We can, however, follow a simple algorithm (which works well for small stock lengths and a short cut list) to enumerate the unique cut patterns.

Here are the steps in the algorithm:

  1. Calculate the maximum possible number of cuts for a cut pattern
  2. Iterate through all cut patterns with the maximum number of cuts – eliminate patterns that are not feasible or duplicates.
  3. Go down one cut from the previous number of cuts and repeat steps 2 and 3 recursively.
  4. After all feasible, unique cut patterns have been created, remove sub-optimal patterns – defined as cut patterns that could have an additional cut.

I’m about to go into a lot of detail about how we can find all of the possible cut patterns for an instance of the stock cutting problem. This focuses on the specifics of the problem and not so much on the linear programming aspect. If this isn’t what you signed up for, feel free to skip down to the section titled ‘running the linear programming problem with all cut patterns.’

Let’s use our current example to illustrate the algorithm:

For the first step, we’ll calculate the maximum number of cuts that can go into a cut pattern. To do this, we take our cut list and sort it from shortest to longest, which gives [3, 3, 4, 5, 6, 6] – then we sum up the shortest cuts until the sum is greater than the stock length. In our case, we add 3+3+4 = 10, which exceeds the 9′ stock length – so the maximum number of cuts for our patterns is 2, because we can fit at most two cuts on a single pattern (3 + 3).

Now we are ready for the second step. For this step, we find all of the unique combinations of two cuts from our list (the total count of unique combinations is 6 ‘choose’ 2):

unique patterns with 2 cuts – image by author

Then, we eliminate duplicates and patterns that are not feasible (cuts that add up to > 9):

Eliminate cut patterns that are duplicated or not feasible – image by author

After eliminating duplicate and infeasible cut patterns, we are left with 5 unique patterns:

remaining 2-cut patterns – by author

With the first and second step complete, the third step is to repeat the process but with one less cut on the stock. Since we had two cuts last time, we will do 1 cut now. This will go as follows:

Creating cut patterns with 1 cut per stock – image by author

Now that we have iterated through all possible patterns, and we’ve removed duplicate and infeasible patterns, we have one final step. Step four is to remove sub-optimal cut patterns. Sub-optimal is defined as cut patterns that could have an additional cut. For example, if we have the cut pattern that just has a 3′ cut, this is sub-optimal because it could also have an additional 3′ cut, or a 4′ cut or even a 6′ cut. It is sub-optimal because it is leaving cuts on the table for no reason. Including these ‘sub-optimal’ cut patterns won’t hurt our optimization solution. But, including them will slow down our optimization run time and create unnecessary complexity.

We check for sub-optimal patterns by simply looking at the cuts that remain after a pattern and seeing if the shortest one would fit in the pattern’s leftover length. If any remaining cut does fit, the pattern is sub-optimal and we eliminate it. The logic here is that such a cut pattern has waste that could be used for another cut, so there is no reason not to use a cut pattern that has the original cuts plus the additional one.

Identifying sub-optimal cut patterns – image by author

After all of that work, here is our final list of cut patterns:

final list of cut patterns – by author
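To tie the walk-through together, here is a rough sketch of the four steps in code. It is written just for this small example (the function and variable names are my own, not from the linked repo), and it should print the same five patterns we found by hand.

from itertools import combinations

cuts = [3, 3, 4, 5, 6, 6]     # required cut lengths
stock_length = 9

# Step 1: maximum number of cuts that can fit on one piece of stock
max_cuts, running_total = 0, 0
for c in sorted(cuts):
    if running_total + c > stock_length:
        break
    running_total += c
    max_cuts += 1

# Steps 2 and 3: enumerate feasible, unique patterns for each pattern size
patterns = set()
for size in range(max_cuts, 0, -1):
    for combo in combinations(cuts, size):
        pattern = tuple(sorted(combo))
        if sum(pattern) <= stock_length:   # drop infeasible patterns
            patterns.add(pattern)          # the set drops duplicates for us

# Step 4: drop sub-optimal patterns (ones that could still fit another cut)
def is_suboptimal(pattern):
    remaining = list(cuts)
    for c in pattern:
        remaining.remove(c)
    waste = stock_length - sum(pattern)
    return any(c <= waste for c in remaining)

final_patterns = sorted(p for p in patterns if not is_suboptimal(p))
print(final_patterns)   # expected: the five patterns shown in the image above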

This algorithm works well for very simple problems like our example. In fact, we were able to calculate all of the cut patterns by hand! But as the number of cuts grows, this can quickly become intractable. When that happens, we will have to employ some kind of heuristic-based algorithm to come up with a subset of good cut options rather than creating an exhaustive list. The code in the linked repo (https://github.com/jaromhulet/stock_cutting_lp) has a class I wrote that uses a neighborhood-based hill climb algorithm to identify potential cut pattern candidates. Going into the details of that algorithm is beyond the scope of this article.

Running the linear programming problem with all cut patterns

Now that we have the 5 possible cut patterns, let’s re-run the optimization and see what the results are.

Here is the new formulation with all of the optimal cuts included:

linear programming set up for optimization with all cut patterns – image by author
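If you want to run this second formulation yourself, here is a sketch of it in pulp. The variables y1 through y5 are my own labels for the five patterns – (3′, 3′), (3′, 4′), (3′, 5′), (3′, 6′) and (4′, 5′) – and may not match the exact ordering used in the image above.

import pulp

lp_problem2 = pulp.LpProblem("stock_cutting_all_patterns", pulp.LpMinimize)

# one variable per pattern: y1=(3,3), y2=(3,4), y3=(3,5), y4=(3,6), y5=(4,5)
y1 = pulp.LpVariable('y1', lowBound=0)
y2 = pulp.LpVariable('y2', lowBound=0)
y3 = pulp.LpVariable('y3', lowBound=0)
y4 = pulp.LpVariable('y4', lowBound=0)
y5 = pulp.LpVariable('y5', lowBound=0)

# minimize the total number of stock pieces purchased
lp_problem2 += y1 + y2 + y3 + y4 + y5

# cover every required cut: [6, 6, 5, 4, 3, 3]
lp_problem2 += y4 >= 2, "6 foot cuts"
lp_problem2 += y3 + y5 >= 1, "5 foot cuts"
lp_problem2 += y2 + y5 >= 1, "4 foot cuts"
lp_problem2 += 2 * y1 + y2 + y3 + y4 >= 2, "3 foot cuts"

lp_problem2.solve()
print(pulp.value(lp_problem2.objective))   # should come out to 3 stock pieces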

And here are the results. As we can see, improving the cuts that are available in the LP problem gives us a better solution – we only need to buy 3 stock pieces instead of 4!

Adding all possible cuts improves optimal solution – image by author

The key takeaway here is that the LP will always find the optimal solution, given the cut patterns provided. For simple problems, we can provide all cut patterns that make sense (unique and not sub-optimal), but for larger problems we have to take a subset of good cut patterns. If we provide a subset, linear programming will solve the problem well given the subset, but we may not get the global optimal solution. Spending time improving the subset could improve the solution, like in our example!

Conclusion

The purpose of this article was to give a concrete, real-world and in-depth application of linear programming. This is a powerful tool that is used across many domains and industries. Often, as with the stock cutting problem, the real challenge lies in how you formulate the problem to fit into the rules of linear programming. Once you have a solid understanding of how linear programming works, you can understand and find clever ways to encode problems (like the stock cutting problem) into the LP paradigm and thus take advantage of the optimizing power it has!

The post Linear Programming: The Stock Cutting Problem appeared first on Towards Data Science.

]]>
Linear Programming Optimization: Foundations https://towardsdatascience.com/linear-programming-optimization-foundations-2f12770f66ca/ Mon, 22 Jul 2024 07:31:51 +0000 https://towardsdatascience.com/linear-programming-optimization-foundations-2f12770f66ca/ Part 1 - Basic Concepts and Examples

The post Linear Programming Optimization: Foundations appeared first on Towards Data Science.

]]>
Linear Programming Optimization: Foundations

Linear programming is a powerful optimization technique that is used to improve decision making in many domains. This is the first part in a multi-part series that will cover important topics having to do with linear programming. This first article is the simplest and is meant to cover the basics before getting into more advanced linear programming topics.

In this article, we will:

  1. Discuss what constitutes a linear programming problem
  2. Go over how linear programming works and why it is powerful
  3. Run a linear programming example in Python

I personally find examples to be a very effective vehicle for learning technical topics. For that reason, we will be following a simple example throughout this article. I’ll introduce the example in the next section.

Before we dive in, if you aren’t very familiar with the basic concepts and terminology of optimization, I wrote an intro to optimization article that I encourage you to check out before continuing (there are many generic optimization terms in this article that I do not define).

The Intuitive Basics of Optimization

Setting Up Our Example

We are going to follow a silly example to give context to the concepts we’ll cover in this article. We are going to imagine that we are very simple people who only eat two things – apples 🍏 and candy bars 🍫. We want to balance our health and our flavor enjoyment. That’s the setup for our example. In the following sections, I’ll go over linear programming concepts and then apply them to this example. By the end, we’ll have run a full optimization to help us decide how many apples and how many candy bars we should eat!

What is Linear Programming?

Linear programming is an optimization technique that minimizes or maximizes a linear objective function subject to constraints in the form of linear equations or inequalities (that’s a lot of jargon, don’t worry, we’ll put it into our intuitive example soon). The name "linear programming" predates what we typically think of as programming now, i.e., computer programming. Linear programming was developed and used in the WWII era. It was widely used to manage operational programs — such as moving supplies from factories to front lines. So, the "programming" refers to planning or scheduling rather than computers. Just a tid-bit of historical context so you aren’t always wondering when the ‘programming’ part comes in!

While it was originally developed to solve operational/logistical problems, there are many clever ways to make linear programming work for a wide array of problem types!

A linear programming problem has 3 parts described below:

by author

Let’s set up these three parts for our apples and candy bars example.

Type: We want to maximize the enjoyment we have from eating food – so it is a maximization problem.

Objective Function: Luckily for us, we know ourselves so well that we can quantify how much we enjoy apples and candy bars – we have even more luck in that our enjoyment forms a linear function, which is perfect for LP problems 😅!

image by author

This function can be used to calculate how much enjoyment we get given how many apples and candy bars we eat. We get 1 unit of enjoyment when we eat an apple and 3 units of enjoyment when we eat a candy bar.

Constraints: Although we have very little variety in our diet, we have a little bit of concern regarding our health. We’ve decided that we want to limit our (1) caloric and (2) sugar consumption. Since we want to control two variables, we need two constraints. We need to set up linear functions that will calculate calories and grams of sugar given a number of apples and candy bars. Once we have set up the formulas, we’ll make them inequalities based on how we want to limit our consumption. We’ve decided that we want to eat no more than 850 calories and consume no more than 100 grams of sugar.

Here are our constraints:

Apples have 95 calories and 15 grams of sugar, candy bars have 215 calories and 20 grams of sugar – by author

Alright, we now have a full linear programming problem set up! Here’s the summary:

image by author
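Before we get to the full Python example promised above, here is a minimal sketch of how this setup maps to code, using the pulp library that appears elsewhere in this series (the variable names are mine):

import pulp

# apples and candy bars example as a quick pulp sketch
problem = pulp.LpProblem("apples_and_candy", pulp.LpMaximize)

apples = pulp.LpVariable('apples', lowBound=0)
candy_bars = pulp.LpVariable('candy_bars', lowBound=0)

# objective: total enjoyment
problem += apples + 3 * candy_bars

# constraints: calories and sugar
problem += 95 * apples + 215 * candy_bars <= 850, "calories"
problem += 15 * apples + 20 * candy_bars <= 100, "sugar"

problem.solve()
print(pulp.value(apples), pulp.value(candy_bars), pulp.value(problem.objective))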

We are very lucky that all of our preferences and constraints for food consumption are linear. There are a lot of rules about what types of constraint and objective functions are allowed in basic linear programming. Violations of the rules are listed below:

image by author

LP can be expanded to allow for many of these violations, but doing so loses the guarantee that the solution will be a global optimum. For our example today, we will stay within the lines! Future articles will explore how we can relax some of these rules using a modified linear programming framework called Mixed Integer Linear Programming.

How it Works

I think it is easiest to understand how linear programming works by creating a graphical representation. Since we only have two decision variables, we can make nice 2-D graphs (how convenient that we only eat two foods). We start by graphing our constraints:

Constraints visually represented – image by author

The orange and blue lines are our linear constraints. We only accept solutions that fall on or below both lines; the shaded region is our feasible solution space. The task of the LP problem is to find the spot in the solution space that maximizes our objective function (our food enjoyment). This can seem like a daunting task: our solution space is infinite, since we can eat fractions of apples and candy bars! Not to worry though, because all of our functions are linear, we can use a trick to reduce our infinite solution space to a finite set!

Let’s think about our solution space for a little bit. It is the gray area bounded by the blue and orange lines. When looking for optimal solutions, we are looking for extremes, the lowest or highest values. So, it makes sense that the optimal solution will be on the border of the gray area, because that is where the most extreme points live. We can now conclude that our optimal solution is somewhere on the border of our feasible space, but that is still an infinite set of points. We need one more trick to get to a finite solution space. Let’s pick a random point on our blue line:

image by author

How can we make this point more extreme while staying on this edge of the feasible region? We could slide it all the way to the left edge of the feasible region, or we could slide it to the right, to the spot where the orange line intersects the blue line; past that point, the orange line defines the feasible region instead of the blue line. Those two points are the most extreme points for this portion of the solution space. If we follow this logic for each edge of the solution space, we find that the intersections are the most extreme points of every edge. Through this reasoning, we reduce our infinite solution space to a finite one by looking only at the corners of the space. If our objective function and constraints are linear, the optimal solution will always be at one of the corners! We simply need to evaluate the objective at every corner and select the point with the highest or lowest value.

image by author
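
To make the corner-checking idea concrete, here is a small sketch that enumerates the candidate corners of our apples-and-candy-bars region by hand and evaluates the objective at each one. It is only an illustration of the logic described above (a real solver does this far more efficiently); the numbers are the same ones used throughout the example.

import numpy as np

def enjoyment(apples, candy_bars):
    return 1 * apples + 3 * candy_bars

def feasible(apples, candy_bars, tol=1e-9):
    # non-negative and within the calorie and sugar limits
    return (apples >= -tol and candy_bars >= -tol
            and 95 * apples + 215 * candy_bars <= 850 + tol
            and 15 * apples + 20 * candy_bars <= 100 + tol)

# Candidate corners: the origin, each constraint line's axis intercepts,
# and the point where the two constraint lines cross
crossing = np.linalg.solve(np.array([[95.0, 215.0], [15.0, 20.0]]),
                           np.array([850.0, 100.0]))
candidates = [
    (0.0, 0.0),
    (850 / 95, 0.0), (0.0, 850 / 215),  # calorie line intercepts
    (100 / 15, 0.0), (0.0, 100 / 20),   # sugar line intercepts
    tuple(crossing),
]

best = max((p for p in candidates if feasible(*p)), key=lambda p: enjoyment(*p))
print(best, round(enjoyment(*best), 2))  # roughly (0.0, 3.95) with enjoyment 11.86

The feasible corner with the highest enjoyment is the same plan the solver will hand back later in the article.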

Below is a graphical representation of the linear programming algorithm searching for the optimal solution at the corner points:

image by author

As we discussed earlier, the objective function and constraints need to be linear for an LP problem. Now that we have a better understanding of how the LP problem works (i.e., checking the corners of the constraints), we are in a good position to understand how linearity violations can cause problems. When non-linear functions are introduced, extreme points can appear between corners, because the boundaries are no longer straight lines. This nullifies the ‘trick’ that LP has of checking the corners to make the potential solution list finite.

In the example below, our calorie constraint is quadratic (of course this makes no practical sense, but let’s just go with it for the example’s sake!). The optimal solution is now at the peak of the parabola, which is not at a corner. So, if we were to try the corner method on this set of constraints, we would get a suboptimal solution! In summary, the constraints and objective need to be linear because LP relies on the corners being the extreme points, which guarantees that the optimal solution is always at a corner. As we see below, when that doesn’t hold, the ‘trick’ no longer guarantees an optimal solution.

optimal solutions aren’t only in the corners for non-linear constraints – image by author
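
If you want to see this numerically, here is a tiny, hypothetical illustration; the parabola and the objective below are made up for demonstration and are not the calorie example from the figure. With a curved boundary, the best point on the boundary beats every corner.

import numpy as np

# Hypothetical curved constraint: candy_bars <= 4 - (apples - 2)**2, with both
# quantities non-negative. Hypothetical objective: maximize apples + candy_bars.
def objective(apples, candy_bars):
    return apples + candy_bars

# The region's "corners" (where the parabola meets the apples axis) are (0, 0) and (4, 0)
corner_best = max(objective(a, c) for a, c in [(0.0, 0.0), (4.0, 0.0)])  # -> 4.0

# Sampling along the curved boundary finds a better point between the corners
apples_grid = np.linspace(0, 4, 1001)
boundary_best = max(objective(a, 4 - (a - 2) ** 2) for a in apples_grid)  # -> 6.25 at apples = 2.5

print(corner_best, round(boundary_best, 2))  # the corner check misses the true optimum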

Now I’m going to take a few minutes to talk about two scenarios where the LP algorithm will run but not return an optimal solution. The first is when no point satisfies all of the constraints at once. For example, if I decided that I wanted to consume a negative amount of sugar, no non-negative combination of apples and candy bars could meet that requirement, so nothing would be left of the region below the calorie constraint. See the graph below as an example. This will result in an ‘Infeasible’ result.

no feasible region – image by author

The second scenario is when the solution space isn’t fully boxed in by the constraints. Imagine that instead of eating less than 850 calories and 100 grams of sugar, we wanted to eat at least 850 calories and 100 grams of sugar (meaning we switched our constraints to ≥). Then our solution space would be unbounded: we could always eat more candy bars and apples and get more enjoyment. Of course this makes no sense! We would get an ‘Unbounded’ result, meaning that a meaningful optimal solution cannot be reached. With those two notes out of the way, let’s solve the LP problem in Python.

Example of unbounded LP Problem – image by author
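
If you are curious what these failure modes look like in code, here is a quick sketch using the pulp package that the next section introduces properly. It is a deliberately broken variant of our example (the inequalities flipped to ≥), so the solver cannot return an optimum; note that the exact status string reported can depend on the solver.

import pulp

# Deliberately unbounded variant of the example: "eat at least" instead of "at most"
broken = pulp.LpProblem("Unbounded_Example", pulp.LpMaximize)
apples = pulp.LpVariable('apples', lowBound=0)
candy_bars = pulp.LpVariable('candy_bars', lowBound=0)
broken += 1 * apples + 3 * candy_bars            # objective: enjoyment
broken += 95 * apples + 215 * candy_bars >= 850  # flipped calorie constraint
broken += 15 * apples + 20 * candy_bars >= 100   # flipped sugar constraint
broken.solve()
print(pulp.LpStatus[broken.status])  # 'Unbounded' (or 'Undefined', depending on the solver)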

Running the Example in Python

Now we are going to get into how we can solve this linear programming problem in Python, using a package called pulp. Before we get into those details: I’ve already written two articles that use linear programming. One article, on simulation, demonstrates tangentially how linear programming can be used to help a fictitious housing developer optimize profit. The other article goes over a distribution comparison metric called the ‘earth mover’s distance,’ which uses linear programming in its calculation. Check out the links below to see two more examples of how linear programming can be used!

Simulated Data, Real Learnings: Scenario Analysis

Comparison of Distributions with Earth Mover’s Distance

Okay, let’s get into coding the example we’ve been going through! The coding isn’t too bad.

After importing the pulp package, we instantiate a pulp.LpProblem object; this is where we define whether we are solving a maximization or a minimization problem.

import pulp

# Create the LP problem
lp_problem = pulp.LpProblem("Apples and Candy Bars", pulp.LpMaximize)

With our problem instantiated, let’s use the pulp.LpVariable function to create our decision variables for apples and candy bars. Note: the lowBound parameter can be set to 0 to add a non-negativity constraint to the decision variables.

# Define the decision variables
apples = pulp.LpVariable('apples', lowBound=0)
candy_bars = pulp.LpVariable('candy_bars', lowBound=0)
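
As a side note (not needed for our example), pulp.LpVariable also accepts an upper bound and a variable category. The bounds below are purely illustrative, and 'Continuous' is already the default; switching cat to 'Integer' is what the Mixed Integer Linear Programming extension mentioned earlier relies on.

# Purely illustrative: bounded, explicitly continuous decision variables
apples = pulp.LpVariable('apples', lowBound=0, upBound=10, cat='Continuous')
candy_bars = pulp.LpVariable('candy_bars', lowBound=0, upBound=10, cat='Continuous')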

Now, we use the ‘+=’ operator to add our objective function and our constraints to the problem instance. Pulp doesn’t have an explicit parameter to indicate whether you are adding an objective function or a constraint. Instead, it interprets an expression without an equality/inequality as the objective function, and expressions with equalities/inequalities as constraints. This is a little confusing, but it will hopefully make sense when you look over the code snippet below.

# Add the objective function - level of enjoyment from specific diet
# notice, no equality or inequality at the end of the function
lp_problem += 1 * apples + 3 * candy_bars

# Add calorie constraint
lp_problem += 95 * apples + 215 * candy_bars <= 850
# Add sugar constraint
lp_problem += 15 * apples + 20 * candy_bars <= 100
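
One small, optional variation on the two constraint lines above: pulp lets you attach a name to a constraint, which makes solver output and debugging easier to read. This is just a sketch of the alternative syntax, not an extra step in our example.

# Same constraints as above, but with human-readable names
lp_problem += 95 * apples + 215 * candy_bars <= 850, "calorie_limit"
lp_problem += 15 * apples + 20 * candy_bars <= 100, "sugar_limit"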

Okay, everything is set up now! All we have to do is call the solve() method on our lp_problem object and print the results!

# Solve the problem
lp_problem.solve()

# Print the results
print(f"Status: {pulp.LpStatus[lp_problem.status]}")
print(f"apples = {pulp.value(apples)}")
print(f"candy bars = {pulp.value(candy_bars)}")
print(f"Objective value = {pulp.value(lp_problem.objective)}")

Here are the printed results:

Printed results: Status: Optimal, apples = 0.0, candy bars ≈ 3.95, Objective value ≈ 11.86 – image by author

The status of ‘Optimal’ means that an optimal solution has been found; if the problem were infeasible or unbounded, the status attribute would tell us that instead. We can see that the optimal solution is to eat 3.95 candy bars and 0 apples for a total enjoyment of 11.86. This is exactly what we found when we solved the problem manually!
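
In a script, you would typically check that status before using the numbers. Here is a minimal sketch of how that might look; the branching logic is my own addition, not part of the original snippet.

# Only trust the variable values if the solver actually found an optimum
status = pulp.LpStatus[lp_problem.status]
if status == "Optimal":
    print(f"Eat {pulp.value(candy_bars):.2f} candy bars and {pulp.value(apples):.0f} apples")
else:
    print(f"No usable plan: solver returned status '{status}'")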

Here is the code in a single snippet:

import pulp

# Create the LP problem
lp_problem = pulp.LpProblem("Apples and Candy Bars", pulp.LpMaximize)

# Define the decision variables
apples = pulp.LpVariable('apples', lowBound=0)
candy_bars = pulp.LpVariable('candy_bars', lowBound=0)

# Add the objective function - level of enjoyment from specific diet
lp_problem += 1 * apples + 3 * candy_bars

# Add calorie constraint
lp_problem += 95 * apples + 215 * candy_bars <= 850
# Add sugar constraint
lp_problem += 15 * apples + 20 * candy_bars <= 100

# Solve the problem
lp_problem.solve()

# Print the results
print(f"Status: {pulp.LpStatus[lp_problem.status]}")
print(f"apples = {pulp.value(apples)}")
print(f"candy bars = {pulp.value(candy_bars)}")
print(f"Maximum enjoyment = {pulp.value(lp_problem.objective)}")

Conclusion

By now you should have a good understanding of what an LP problem is, how LP problems find optimal values, and how to set them up and solve them in Python! In future articles we’ll go over the details of how the LP algorithm moves along the edges of the feasible region to check the corners (the simplex method), and expansions of the LP problem that allow integer variables (instead of just continuous ones).

The post Linear Programming Optimization: Foundations appeared first on Towards Data Science.
