Linear Programming (LP) is arguably the most successful algorithmic framework in history. Developed independently by Kantorovich and Dantzig in the 1940s, it remains the standard tool for allocating resources, from routing internet traffic to optimizing global supply chains. At the core of LP lies the assumption that the world behaves in a linear way. For instance, if it costs 10€ to run a server, it will costs 100€ to run 10 servers (proportionality). This also means that we discard scale effects or diminishing returns. Additionally, we assume that total costs are always the sum of individual costs. There are no second- or higher-order interactions between the variables. While these assumptions may seem restrictive, they approximate a vast number of real-world systems well enough to be useful.
A linear program in standard form is normally expressed as:
\[
\begin{aligned}
\min_x\text{ } & c^T x \\
\text{subject to } & Ax\le b \\
& x\ge 0
\end{aligned}
\tag{6.1}\]
In this model, \(x\in\mathbb{R}^n\) are the decision variables, which we wish to determine. The vector \(c\in\mathbb{R}^n\) is the cost vector. The matrix \(A\in\mathbb{R}^{m\times n}\) contains all the linear constraints (one per row), describing how the variables interact with resources. The vector \(b\in\mathbb{R}^m\) thus represents a resource limit vector, establishing an upper bound on resource usage. The non-negativity constraint \(x\ge 0\) implies that we can’t produce negative units of a product.
At first sight, this formulation seems somewhat restrictive. The reason is that it assumes a very specific interpretation in terms of minimimization of costs and limits on resources, and some solvers actually expect the problem to be described in this form. However, with some changes, we can express more general problems:
Maximization instead of minimization: replace \(\min_x c^Tx\) with \(\max_x -c^Tx\).
Use of equality constraints: a constraint \(a^Tx=b\) can be expressed by two inequality constraints \(a^Tx\le b\) and \(-a^Tx\le -b\).
If \(x\) can be negative, we can replace x with two non-negative variables \(u,v\) such that \(x=u-v\).
As an example, consider the following Server Allocation Problem: A cloud provider has two types of jobs: CPU-bound and memory-bound. Each job type requires a certain amount of CPU and memory resources, and generates a specific revenue. The provider has a limited amount of CPU and memory resources available. The goal is to determine how many jobs of each type to run in order to maximize the total revenue, without exceeding the available resources. Let’s define the decision variables: let \(x_1\) be the number of CPU-bound jobs to run, and \(x_2\) be the number of memory-bound jobs to run. The revenue generated by each CPU-bound job is 5€, and each memory-bound job generates 8€. The cloud provider has a total of 100 CPU units and 80 memory units available. Each CPU-bound job requires 2 CPU units and 1 memory unit, while each memory-bound job requires 1 CPU unit and 3 memory units. We can formulate this problem as a linear program as follows:
In this formulation, the objective function \(5x_1 + 8x_2\) represents the total revenue from running \(x_1\) CPU-bound jobs and \(x_2\) memory-bound jobs. The constraints ensure that the total CPU and memory usage does not exceed the available resources.
6.1.1 Using Geometry to Solve LPs
Linear programs can be solved using various algorithms, with the most common being the Simplex method and Interior Point methods. Both methods leverage the geometric properties of linear programs.
Think about a linear inequality in the form \(a^Tx\le b\). This inequality describes a half-space in \(n\)-dimensional space, where \(a\) is the normal vector to the hyperplane defined by \(a^Tx=b\). The feasible region of a linear program is the intersection of all such half-spaces defined by the constraints. This feasible region is a convex polyhedron (or polytope), meaning that any line segment connecting two points within the region lies entirely within the region.
In 2D, this feasible region appears as a polygon.
In 3D, it appears as a polyhedron.
In higher dimensions, it is referred to as a polytope (a “hyper-diamond” shape).
Because the intersection of convex sets is convex, the feasible region of an LP is always convex. This guarantees that if we find a local minimum, it is also the global minimum (see Chapter 5).
Now consider the objective function \(c^Tx\). This function defines a family of parallel hyperplanes in \(n\)-dimensional space, each corresponding to a different value of the objective function. The goal of the LP is to find the point in the feasible region that lies on the hyperplane with the lowest (or highest, for maximization problems) value of \(c^Tx\). Now if the polytope is bounded (a polytope rather than a polyhedron), the optimal solution will always be located at one of the vertices (or corners) of the polytope. This is because the objective function is linear, and thus its maximum or minimum value over a convex set occurs at an extreme point of that set. This result is known as the Fundamental Theorem of Linear Programming and is the foundation for algorithms like the Simplex method, which systematically explores the vertices of the feasible region to find the optimal solution.
6.2 The Simplex Method
The Simplex method is an iterative algorithm that starts at a vertex of the feasible region and moves along the edges of the polytope to adjacent vertices, seeking to improve the objective function value at each step. The algorithm continues this process until it reaches a vertex where no adjacent vertices offer a better objective function value, indicating that the optimal solution has been found. Invented by George Dantzig in 1947, the Simplex method is widely considered one of the top 10 algorithms of the 20th century. It transforms the continuous optimization problem into a graph traversal problem.
The Simplex method can be summarized in the following steps:
Initialization: The algorithm starts at an arbitrary vertex of the polytope. If the origin (\(x=0\)) is feasible, it can be used as the starting point; otherwise, an auxiliary sub-problem is solved to find an initial feasible vertex.
Optimality Check: From the current vertex, the algorithm looks along the edges connecting to neighboring vertices. Then it calculates the “reduced cost” (gradient) along each edge. If an edge leads “downhill” (improves the objective), the algorithm moves along that edge to the next vertex. If all edges leading out from the current vertex go “uphill” (worsen the objective), then—due to convexity—we know we are at the global minimum. We stop.
Pivoting: If an adjacent vertex offers a better objective function value, move to that vertex by performing a pivot operation.
Iteration: Repeat the optimality check and pivoting steps until the optimal solution is found.
Let’s dive a bit deeper into the pivot operation. In a system with \(n\) variables and \(m\) constraints, a vertex is obtained by setting \(n-m\) variables to zero (these are called the non-basic variables) and solving for the remaining \(m\) variables (the basic variables). The pivot operation involves selecting one non-basic variable to enter the basis (i.e., become a basic variable) and one basic variable to leave the basis (i.e., become a non-basic variable). This is done in such a way that the new vertex remains feasible and improves the objective function value.
6.2.1 A Step-by-Step Example
Consider the following example problem: Imagine a factory that produces two products, A and B. We denote by \(x_1\) the number of units of product A and by \(x_2\) the number of units of product B. The profit from each unit of product A is 3€, and from each unit of product B is 2€. There are in total 4 machine hours to produce, and it takes 1 hour to produce one unit of product A and 1 hour to produce one unit of product B. The factory has a limited amount of resources: product A requires 2 units of raw material per unit produced, and product B requires 1 unit of raw material per unit produced. The factory has a total of 6 units of raw material available.
We can formulate this problem as a linear program as follows:
As a first step, we move the objective function to the left side to express it in standard form: \[
\begin{aligned}
Z -3x_1 - 2x_2 + 0s_1 + 0s_2 & = 0 \quad \text{(Objective function)}
\end{aligned}
\]
The Simplex method requires all constraints to be in the form of equalities. We can convert the inequalities into equalities by introducing slack variables. Let \(s_1\) be the slack variable for the machine hours constraint and \(s_2\) be the slack variable for the raw material constraint. The constraints become: \[
\begin{aligned}
1x_1 + 1x_2 + s_1 & = 4 \quad \text{(Machine hours constraint)}\\
2x_1 + 1x_2 + s_2 & = 6 \quad \text{(Raw material constraint)}\\
s_1, s_2 & \ge 0
\end{aligned}
\]
So we have \(n=4\) variables (\(x_1, x_2, s_1, s_2\)) and \(m=2\) constraints. This means that we need to set \(n-m=2\) variables to zero to find a vertex.
Step 2: Initial Basic Feasible Solution
We can start by setting the decision variables \(x_1\) and \(x_2\) to zero, which gives us the initial basic feasible solution (solving the system of equations for \(s_1\) and \(s_2\)): \[
\begin{aligned}
x_1 & = 0 \\
x_2 & = 0 \\
s_1 & = 4 \\
s_2 & = 6
\end{aligned}
\]
We now represent the current state as a , which is a compact way to keep track of the coefficients of the variables in the constraints and the objective function:
Basis
\(x_1\)
\(x_2\)
\(s_1\)
\(s_2\)
Solution
\(s_1\)
1
1
1
0
4
\(s_2\)
2
1
0
1
6
\(Z\)
-3
-2
0
0
0
Step 3: Optimality Check and Pivoting
Looking at the objective function row (the last row), we see that both \(x_1\) and \(x_2\) have negative coefficients (-3 and -2, respectively). This indicates that increasing either variable will improve the objective function value. We choose to increase \(x_1\) since it has the most negative coefficient. This is the variable that will enter the basis.
Next, we need to determine which variable will leave the basis. This will be the variable that, when decreased, causes the solution to become infeasible. For this, we perform the minimum ratio test by dividing the solution values by the corresponding coefficients of \(x_1\) in each constraint row: - For \(s_1\): \(4 / 1 = 4\) - For \(s_2\): \(6 / 2 = 3\) The minimum ratio is 3, so \(s_2\) will leave the basis. We now perform a pivot operation to update the tableau. The main idea of the pivot operation is to make the coefficient of the entering variable (\(x_1\)) equal to 1 in the leaving variable’s row (\(s_2\)) and 0 in all other rows by performing row additions and subtractions.
We divide the entire row of \(s_2\) by 2 to make the coefficient of \(x_1\) equal to 1: \([1, 0.5, 0, 0.5, 3]\).
Next, we update the row of \(s_1\) by subtracting 1 times the new row of \(s_2\) from it: \([0, 0.5, 1, -0.5, 1]\).
Finally, we update the objective function row by adding 3 times the new row of \(s_2\) to it: \([0, -0.5, 0, 1.5, 9]\).
The updated tableau looks like this:
Basis
\(x_1\)
\(x_2\)
\(s_1\)
\(s_2\)
Solution
\(s_1\)
0
0.5
1
-0.5
1
\(x_1\)
1
0.5
0
0.5
3
\(Z\)
0
-0.5
0
1.5
9
Now we have moved along the \(x_1\) axis to a new vertex, which is (3,0) with an objective function value of 9. To find this new vertex, we need to solve the system of equations again using the new basis variables \(x_1\) and \(s_1\).
Continuing with the optimality check, we see that \(x_2\) still has a negative coefficient (-0.5) in the objective function row. This means we can still improve the objective function by increasing \(x_2\). We repeat the pivoting process. The minimum ratio test gives us: - For \(s_1\): \(1 / 0.5 = 2\) - For \(x_1\): \(3 / 0.5 = 6\) The minimum ratio is 2, so \(s_1\) will leave the basis. Performing the pivot operation, we get the new tableau:
Basis
\(x_1\)
\(x_2\)
\(s_1\)
\(s_2\)
Solution
\(x_2\)
0
1
2
-1
2
\(x_1\)
1
0
-1
1
2
\(Z\)
0
0
1
1
10
At this point, all coefficients in the objective function row are non-negative, indicating that we have reached the optimal solution. The optimal solution is \(x_1 = 2\) and \(x_2 = 2\), with a maximum profit of 10€.
6.2.2 Solving LPs using the Simplex Method in Python
The Simplex method is efficient in practice and can solve large-scale linear programs quickly, although its worst-case time complexity is indeed exponential. However, it remains one of the most widely used algorithms for solving LPs due to its practical performance and simplicity. Let’s see how to solve the example using the Simplex method in Python implemented in the scipy.optimize.linprog function.
import numpy as npfrom scipy.optimize import linprog# Coefficients for the objective function (to be maximized)c = [-3, -2] # Negated for maximization# Coefficients for the inequality constraintsA = [[1, 1], [2, 1]]# Right-hand side of the inequality constraintsb = [4, 6]# Solve the linear program using the Simplex methodres = linprog(c, A_ub=A, b_ub=b, method='highs')# Print the resultsprint("Optimal value (max revenue):", -res.fun) # Negate to get the original maximized valueprint("Optimal solution (Product A, Product B):", res.x)
Optimal value (max revenue): 10.0
Optimal solution (Product A, Product B): [2. 2.]
6.3 Integer Linear Programming
In the previous example, the optimal solution happened to be integer-valued. However, this is not always the case. Linear programming allows decision variables to take on any real values, which can lead to solutions that are not practical in certain contexts. For example, you cannot produce a fraction of a product or hire half an employee. This leads us to the field of Integer Linear Programming (ILP), where some or all decision variables are constrained to be integers.
Mathematicall, we “just” need to add integrality constraints to the LP formulation Equation 6.1:
\[
\begin{aligned}
\min_x\text{ } & c^T x \\
\text{subject to } & Ax\le b \\
& x\ge 0 \\
& x_i \in \mathbb{Z} \quad \text{for some or all } i
\end{aligned}
\]
This single constraint changes . It transforms the problem from a convex, polynomial-time task (P) into a non-convex, NP-Hard challenge. The nice, smooth edges of the polytope are replaced by a discrete grid of valid points, destroying the gradient information that the Simplex and other methods for LP rely on.
6.3.1 Definition and Variants
Depending on which variables are constrained to be integers, we have different variants of ILP:
Pure Integer Linear Programming: All decision variables are required to be integers. Example: the Knapsack Problem, where items can either be included or excluded from the knapsack (see Chapter 1).
Mixed-Integer Linear Programming (MILP): Some decision variables are constrained to be integers, while others can take on continuous values. Example: a power plant optimization problem, where the number of power units to activate is an integer, but the amount of power generated can be continuous.
Binary Integer Programming: A special case of ILP where decision variables are restricted to binary values (0 or 1). Example: facility location problems, where a facility is either built (1) or not built (0). Additionally, this is the language of logic and choice. We can model logical propositions using linear constraints, for instance:
If we select option A, we cannot select option B: \(x_A + x_B \leq 1\) where \(x_A, x_B \in \{0, 1\}\).
If we build a warehouse, we can ship up to \(C\) units from it: \(y \leq C x\) where \(y\) is the number of units shipped and \(x \in \{0, 1\}\) indicates if the warehouse is built.
In principle, one could round the solution of a linear program to the nearest integers to solve an ILP. However, this naive approach often leads to suboptimal or even infeasible solutions. For instance, consider the constraint \(3x\le 10\). The LP solution might yield \(x=3.33\), which rounds to \(x=3\) (which is feasible), however if the constraint would be \(3x\ge 10\) instead, the rounded solution would be infeasible.
We could try to enumerate all possible integer solutions and select the best one, but this approach quickly becomes computationally infeasible as the number of variables increases, due to the combinatorial explosion of possible solutions. Because we cannot rely on gradients or rounding, we need a systematic way to search the discrete space without checking every single possibility. This leads us to the Branch and Bound algorithm.
6.3.2 The Branch and Bound Algorithm
Since we cannot round LP solutions, and we cannot brute-force check every integer combination (which would take billions of years for even moderate problems), we need a middle ground. We need a smart way to enumerate solutions that allows us to discard vast sections of the search space that are guaranteed to be suboptimal. The Branch and Bound (B&B) algorithm is a systematic method for solving ILPs by exploring the solution space in a tree-like structure. The main idea is to divide the problem into smaller subproblems (branching) and use bounds on the objective function to eliminate subproblems that cannot yield better solutions than the best one found so far (bounding).
The main idea for doing so is to work with the LP relaxation of the ILP. The LP relaxation is obtained by removing the integrality constraints from the ILP, allowing the decision variables to take on continuous values. This relaxed problem can be solved efficiently using LP methods like the Simplex method and provides a lower bound (for minimization problems) or an upper bound (for maximization problems) on the optimal ILP solution.
Imagine we are solving the LP relaxation of an ILP as a maximization problem, and we find \(x_1=3.7\). We know that the optimal integer solution must be an integer \(\le 3\) or \(\ge 4\). Thus, we can create two new subproblems (branches):
Branch Left: Add the constraint \(x_1 \le 3\) to the original ILP.
Branch Right: Add the constraint \(x_1 \ge 4\) to the original ILP.
We have now effectively removed the fractional region \((3,4)\) from the search space. We then solve the LP relaxation for each of these subproblems. If the solution to a subproblem is integer-valued, we compare it to the best-known integer solution and update our best solution if necessary. If the solution is fractional, we repeat the branching process on that subproblem. However, doing this blindly would lead to an explosion in the number of subproblems. This is where bounding comes into play. We use the objective value of the LP relaxation as a bound, as follows:
Step 1: Bounding: Let \(Z^*\) be the best-known integer solution found so far (called the incumbent solution). If the objective value of the LP relaxation of a subproblem is less than or equal to \(Z^*\) (for maximization problems), we can discard that subproblem from further consideration, as it cannot yield a better integer solution.
Step 2: Pruning: We stop exploring a branch when:
The LP relaxation has no solution (infeasible).
The LP relaxation is integer-valued (we have found a candidate solution). We compare it to \(Z^*\) and update if it’s better. We stop because adding more constraints to this node will only make the objective worse.
The objective value of the LP relaxation is worse than \(Z^*\) (we can prune this branch).
Step 3: Branching: If the LP relaxation yields a fractional solution, we select a variable with a fractional value and create two new subproblems by adding constraints to force that variable to take on integer values (as described above).
This process continues until all branches have been either explored or pruned. The best integer solution found during this process is the optimal solution to the original ILP.
6.3.3 Coding Example
We will now illustrate the Branch and Bound algorithm using Python’s PuLP library, which provides a convenient interface for defining and solving ILPs. Let’s consider a simple ILP example (Knapsack Problem): We have a knapsack which can hold 10 kg. There are four items available: A (value 15€, weight 4 kg), B (value 10€, weight 3 kg), C (value 9€, weight 2 kg) and D (value 5€, weight 1 kg). We want to maximize the total value of items in the knapsack without exceeding its weight capacity.
import pulpfrom pulp import PULP_CBC_CMD# 1. Initialize the Model (Maximization)model = pulp.LpProblem("The_Knapsack_Problem", pulp.LpMaximize)# 2. Define Decision Variables# cat='Binary' tells the solver these are 0 or 1 integers.# This triggers the Branch and Bound algorithm under the hood.items = ['A', 'B', 'C', 'D']weights = {'A': 4, 'B': 3, 'C': 2, 'D': 1}values = {'A': 15, 'B': 10, 'C': 9, 'D': 5}x = pulp.LpVariable.dicts("Select", items, cat='Binary')# 3. Define Objective Functionmodel += pulp.lpSum([values[i] * x[i] for i in items]), "Total Value"# 4. Define Constraint (Capacity <= 10)model += pulp.lpSum([weights[i] * x[i] for i in items]) <=10, "Capacity"# 5. Solvemodel.solve(PULP_CBC_CMD(msg=False))# 6. Output Resultsprint(f"Status: {pulp.LpStatus[model.status]}")print(f"Max Value: ${pulp.value(model.objective)}")print("Selected Items:")for i in items:if pulp.value(x[i]) ==1:print(f" - Item {i} (Value: {values[i]}, Weight: {weights[i]})")
Status: Optimal
Max Value: $39.0
Selected Items:
- Item A (Value: 15, Weight: 4)
- Item B (Value: 10, Weight: 3)
- Item C (Value: 9, Weight: 2)
- Item D (Value: 5, Weight: 1)
6.4 The Limits of Exact Methods
We have now seen two powerful algorithms: Simplex for Linear Programming and Branch & Bound for Integer Programming. In the examples provided (allocating server resources or packing a knapsack) these methods returned the provably optimal solution in milliseconds. It is tempting thus to conclude that optimization is a solved problem: If we can model it, we can solve it. Alas, this is an illusion: as we scale up the problem size, exact methods quickly become impractical. The Simplex method, while efficient for many practical problems, can exhibit exponential time complexity in the worst case. More critically, Integer Linear Programming is NP-Hard, meaning that no known polynomial-time algorithm exists to solve all instances of ILP optimally. As the number of variables and constraints increases, the time required to solve ILPs can grow exponentially, making them infeasible for large-scale problems.
In practice, the performance of B&B depends on the quality of the bounds: the tighter the bounds, the more branches can be pruned, and the faster the algorithm runs. However, for many real-world problems, especially those with a large number of variables or complex constraints, even state-of-the-art ILP solvers can struggle to find optimal solutions within a reasonable timeframe. In the worst case, Branch and Bound degenerates into a brute force search.
In modern AI applications, such as optimizing the weights of a neural network (millions of variables) or routing a fleet of 10,000 delivery drones, exact methods simply cease to function. When exact methods fail, we must change our goal. We stop asking for the best solution and start asking for a good solution found fast.
This is the domain of heuristic and metaheuristic algorithms, which we will explore in the next chapter. Heuristics are meant to solve the following challenges of exact methods:
Latency: A ride-sharing app like Uber needs to match a driver to a rider in milliseconds. It cannot run a 20-minute Branch and Bound job every time a user opens the app. It accepts a slightly suboptimal match (maybe the driver is 1 minute further away than the perfect choice) to ensure the system is responsive.
Data Uncertainty: Why spend 100 hours computing the “perfect” supply chain schedule if the demand forecast is only 90% accurate? The noise in the data renders the precision of the exact solver meaningless.
Good enough Solutions: In many cases, a solution that is 95% as good as the optimal one is perfectly acceptable, especially if it can be found in a fraction of the time. For example, in logistics, a delivery route that is slightly longer than the optimal route may still meet customer expectations while significantly reducing computation time.
In the next chapter, we will abandon the guarantee of optimality. We will explore metaheuristics: algorithms like Genetic Algorithms, Simulated Annealing, and Tabu Search. These methods do not use gradients, and they do not prove optimality. Instead, they use metaphors from nature (evolution, thermodynamics) to navigate the rugged, non-convex landscapes where Simplex and Branch & Bound do not perform well.
6.5 Summary
In this chapter, we explored exact methods for solving optimization problems, focusing on Linear Programming (LP) and Integer Linear Programming (ILP). We discussed the Simplex method, which efficiently solves LPs by navigating the vertices of the feasible region, and the Branch and Bound algorithm, which systematically explores the solution space of ILPs while pruning suboptimal branches. While these methods are powerful and can provide optimal solutions for many problems, they face significant challenges as problem size increases, particularly for ILPs, which are NP-Hard. This limitation motivates the need for heuristic and metaheuristic approaches in scenarios where quick, good-enough solutions are more practical than guaranteed optimality.
6.6 Exercises
Formulating an LP: A factory produces two products, X and Y. Each unit of product X requires 2 hours of labor and 3 units of raw material, while each unit of product Y requires 4 hours of labor and 2 units of raw material. The factory has a total of 100 hours of labor and 120 units of raw material available. The profit from each unit of product X is 30€, and from each unit of product Y is 20€. Formulate this as a linear programming problem to maximize profit.
Solving an LP using Simplex: Using the formulation from Exercise 1, implement the Simplex method in Python using scipy.optimize.linprog to find the optimal production quantities of products X and Y.
Branch and Bound for ILP: Consider the following integer linear programming problem: Maximize \(Z = 10x_1 + 15x_2\) subject to the constraints \(2x_1 + 3x_2 \leq 12\), \(x_1 + x_2 \leq 5\), and \(x_1, x_2 \geq 0\), with the additional constraint that \(x_1\) and \(x_2\) must be integers. Use the Branch and Bound method to solve this ILP manually, showing each step of the process.
Implementing Branch and Bound: Using the ILP from Exercise 3, implement the Branch and Bound algorithm in Python using the PuLP library to find the optimal integer solution.