# Introduction to oblivious randomized rounding

Greedy & Lagrangian-relaxation algorithms can be understood formally as randomized-rounding algorithms.

Before continuing, note that this discussion assumes the following background:

You can review those topics at the links above.

# Summary

This blog explores a technical connection between (i) randomized rounding and (ii) greedy and Lagrangian-relaxation algorithms for approximately solving packing and covering problems. Broadly, the motivation for exploring the connection is that it might help better understand both domains. Technically, the basis of the connection is the following observation:

In some cases, the method of conditional probabilities yields a randomized-rounding algorithm that can skip the first step (solving the linear program).

I’ve been exploring this connection since 1995 [4]. My goal in this blog is to present a sequence of examples, using the examples to articulate the underlying ideas, to the extent I’ve been able to identify them, one by one.

It turns out that many existing greedy/Lagrangian-relaxation algorithms can indeed be understood this way. The rounding schemes, which I call sample-and-increment schemes, are atypical. Some of them give better approximations than standard schemes do, and some of the resulting approximation algorithms improve on known algorithms. I also identify some general probabilistic machinery for the analyses. The “probabilistic lens” helps to organize the technical ideas underlying the design and analysis of these algorithms relatively cleanly.

There are negative examples on both sides: natural greedy algorithms that apparently have no randomized-rounding analogues, as well as natural randomized-rounding schemes that apparently have no greedy analogues.

To distinguish this particular type of randomized rounding from conventional randomized rounding, I sometimes call it oblivious randomized rounding.

## Organization of this blog

The blog is organized as an ordered collection of entries. (See the “Pages”, “previous”, and “next” links at the top.) The initial entries each explore some basic concept. Later entries focus on applications. Final entries, in the appendices, contain probabilistic bounds. One of the appendices reviews bounds that are well known; the other contains new (or less well-known) bounds.

For elaboration of the general approach, continue reading “Further discussion,” below. To begin exploring the topic in detail, go directly to the lead example.

This is a work in progress. The topic is open-ended and many interesting questions remain. Comments, questions, and discussion are always welcome.

# Further discussion

## The probabilistic method and randomized rounding

Alon, Spencer, and Erdős (1992) describe the probabilistic method as follows:

In order to prove the existence of a combinatorial structure with certain properties, we construct an appropriate probability space and show that a randomly chosen element in the space has the desired properties with positive probability. [1]

Since about 1950, the method has come into wide use. Its helpfulness in understanding combinatorial structures, even in domains that, a priori, have nothing to do with probability, has earned it the nickname “the probabilistic lens.” As Alon et al. note, in principle any use of the probabilistic method can be replaced by a direct argument, but,

in practice, the probability is essential. It would be hopeless to replace the applications of many of the tools [by direct arguments]. [1, page 2]

In Computer Science, randomized rounding uses the probabilistic method to design approximation algorithms [3, 2]. A randomized-rounding algorithm has two steps:

1. Solve the linear-program relaxation of the integer linear program.

2. Randomly round the solution to obtain an approximate integer solution.
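To make the two steps concrete, here is a hedged sketch for set cover. The tiny instance, the fractional solution, and the ln(n) scaling of the rounding probabilities are my own illustrative assumptions, not taken from [2, 3]; in general, step 1 would call an actual LP solver.

```python
import math
import random

random.seed(1)

# A tiny set-cover instance (illustrative assumption).
universe = {1, 2, 3, 4, 5, 6}
sets = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {1, 4}, "D": {2, 5}, "E": {3, 6}}

# Step 1: solve the LP relaxation.  Here we simply assume a feasible
# fractional solution x (each element is fractionally covered to extent 1.0).
x = {s: 0.5 for s in sets}

# Step 2: randomized rounding -- include each set s independently with
# probability min(1, x_s * ln n), retrying until the result is a cover.
scale = math.log(len(universe))
while True:
    cover = {s for s in sets if random.random() < min(1.0, x[s] * scale)}
    covered = set().union(*(sets[s] for s in cover)) if cover else set()
    if covered >= universe:
        break

print(sorted(cover))
```

With this scaling, a standard analysis shows each attempt succeeds with constant probability, so the retry loop terminates quickly in expectation.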

To prove a performance guarantee for such an algorithm, one uses the probabilistic method to prove that the integer solution desired in the second step exists. This proof may be non-constructive, but, using Raghavan’s method of conditional probabilities, it

can be converted, in a very precise sense, into a deterministic approximation algorithm. [2] (1988)

Since the introduction of randomized rounding in the 1980s, it has become a fundamental tool in the analysis of integrality gaps and in the design and analysis of provably good approximation algorithms for combinatorial-optimization problems.
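To illustrate Raghavan’s method in miniature, here is a hedged sketch for set-cover rounding. The instance, the rounding probabilities, and the choice of pessimistic estimator (the conditional expectation of the number of uncovered elements) are illustrative assumptions of mine, not a transcription of [2].

```python
import math

# A tiny set-cover instance and rounding probabilities (illustrative assumptions).
universe = {1, 2, 3, 4, 5, 6}
sets = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {1, 4}, "D": {2, 5}, "E": {3, 6}}
p = {s: min(1.0, 0.5 * math.log(len(universe))) for s in sets}

def estimator(decided):
    """Pessimistic estimator: E[# uncovered elements] if the undecided
    sets are still rounded randomly, given the choices fixed so far."""
    total = 0.0
    for e in universe:
        covering = [s for s in sets if e in sets[s]]
        if any(decided.get(s) for s in covering):
            continue                      # e is already surely covered
        pr = 1.0
        for s in covering:
            if s not in decided:
                pr *= 1.0 - p[s]          # undecided s may still cover e
        total += pr                       # Pr[e ends up uncovered]
    return total

# Fix each variable in turn, never letting the estimator increase.
decided = {}
for s in sets:
    exclude = dict(decided, **{s: False})
    include = dict(decided, **{s: True})
    decided = exclude if estimator(exclude) <= estimator(include) else include

cover = [s for s, v in decided.items() if v]
print(cover)
```

The estimator starts below 1 and never increases; once every variable is fixed it equals the (integer) number of uncovered elements, so that number is 0 and the deterministic algorithm outputs a cover.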

# Greedy and Lagrangian-relaxation algorithms

Greedy approximation algorithms (such as the greedy set-cover algorithm) are as old and well-studied as the field of approximation algorithms for combinatorial optimization itself. Greedy algorithms are often intuitive, relatively fast, and simple to implement. They can give provably good approximate solutions, and their simplicity makes them useful in many contexts.
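The greedy set-cover algorithm mentioned above fits in a few lines; the toy instance here is my own, for illustration.

```python
def greedy_set_cover(universe, sets):
    """Repeatedly choose the set covering the most not-yet-covered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(sets, key=lambda s: len(sets[s] & uncovered))
        cover.append(best)
        uncovered -= sets[best]
    return cover

universe = {1, 2, 3, 4, 5}
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
print(greedy_set_cover(universe, sets))  # ['A', 'C']
```

Its classic approximation guarantee is the harmonic bound H(n) ≈ ln n.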

Lagrangian relaxation is a sophisticated generalization of the greedy approach: to approximately solve a given “hard” optimization problem, one solves a sequence of carefully chosen “easier” problems, then aggregates the solutions of the easier problems, typically by taking a weighted combination of them.

The easy problems are obtained from the original problem by relaxing some of the more difficult constraints, replacing them by a single surrogate constraint.¹ Typically, the surrogate constraint is a weighted combination of the constraints it replaces, and the weights come from the gradient of a smooth potential function that somehow proxies the replaced constraints. To prove the algorithm’s quality of approximation, one proves some invariant involving this potential function.
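In symbols, with notation of my own choosing, a common template looks like this: the coupled constraints $Ax \ge b$ are replaced by a single weighted combination, with weights taken from the gradient of a smooth potential.

```latex
% Surrogate constraint: replace  Ax \ge b  by the single constraint
y^\top A\, x \;\ge\; y^\top b,
% with weights from the gradient of a smooth potential such as
\Phi(x) \;=\; \sum_i \exp\bigl(-\eta\,(Ax - b)_i\bigr),
\qquad
y_i \;\propto\; \exp\bigl(-\eta\,(Ax - b)_i\bigr),
% so constraints that are far from satisfied receive large weight.
```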

Historically, Lagrangian-relaxation algorithms have been used (i) for problem formulations involving huge numbers of variables (as a practical alternative to ellipsoid methods — similarly to column generation; the Held-Karp lower bound for the traveling salesman problem is one famous example), and (ii) for problems that are composed of many subproblems that are loosely coupled by side constraints (multicommodity flow is a canonical example). The last few decades have seen improved bounds on convergence rates — that is, bounds on the number of subproblem solutions needed to approximate the original problem to the desired extent. Similarly to greedy algorithms, the form that these algorithms take makes them useful in many contexts (for example, distributed, parallel, and online settings).

The basic observation — that a “hard” problem can be approximately solved by a weighted combination of “easy” problems — can also be used to prove some nice combinatorial theorems. This is roughly because, if the easy-problem solutions have some nice combinatorial property, the property may carry over to their weighted combination, and thus to some approximate solution to the hard problem.

The basic observation is also directly related to the learning-theory concept of boosting (the problem of strongly learning a concept reduces to the problem of weakly learning it). Relatedly, some Lagrangian-relaxation algorithms can be viewed as learning-theory algorithms for following expert advice (such as multiplicative-weights-update algorithms).
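A minimal sketch of the multiplicative-weights-update rule mentioned above; the loss model, the learning rate, and the toy data are my own assumptions for illustration.

```python
def multiplicative_weights(loss_rounds, eta=0.5):
    """Keep one weight per expert; multiply down each expert's weight
    in proportion to the loss it incurs in each round."""
    n = len(loss_rounds[0])
    w = [1.0] * n
    for losses in loss_rounds:
        w = [wi * (1.0 - eta * li) for wi, li in zip(w, losses)]
    total = sum(w)
    return [wi / total for wi in w]      # distribution over experts

# Two experts over three rounds: expert 0 always loses, expert 1 never does.
p = multiplicative_weights([[1, 0], [1, 0], [1, 0]])
print(p)  # weight shifts toward expert 1
```

The standard regret bound says the algorithm's total loss is within a (1 + eta) factor plus an additive (ln n)/eta of the best expert's loss.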

Despite their many interesting connections and large literature, Lagrangian-relaxation algorithms are not currently taught in typical Computer-Science curricula, even at the graduate level. Perhaps the complexity of their analyses is a barrier — indeed, it has been remarked with hindsight that apparently “the only way to understand these algorithms is to invent them yourself!”

# Greedy & Lagrangian relaxation versus randomized rounding

The two classes of algorithms — randomized rounding vs. greedy/Lagrangian — are widely considered to be distinct.

Most obviously, the first step of a randomized-rounding algorithm is to solve a linear program by other means (such as simplex, interior-point, or ellipsoid methods). Greedy and Lagrangian-relaxation algorithms do not have this step: they compute approximate solutions to the linear program directly.

This difference makes randomized rounding slower. As Raghavan observes,

the time taken to solve the linear program relaxations of the integer programs… dominates the net running time theoretically (and, most likely in practice as well). [2]

Greedy and Lagrangian-relaxation algorithms are not only faster, they can also be applied in some contexts where randomized-rounding algorithms cannot. In short, greedy/Lagrangian-relaxation algorithms have some practical advantages. On the other hand, given access to an LP solver, randomized-rounding algorithms can be relatively easy to implement.

The conceptual foundations of the two classes of algorithms are also a bit different:

• The conceptual foundation of greedy and Lagrangian-relaxation algorithms, is, loosely speaking, discrete and continuous combinatorial optimization. Building blocks include LP duality, Lagrangian/surrogate duality, potential functions (gradients, step sizes, trust regions), invariants, etc.

• For randomized rounding, the conceptual foundation is the probabilistic method. Building blocks include the basic concepts of probability (random variables, conditional expectations), direct calculations of expectations and probabilities, Markov bounds, Chernoff bounds, Wald’s equation, linearity of expectation, naive union bounds, the method of conditional probabilities, and so on. These tools are relatively precise: the concepts have precise technical definitions and the various bounds have precise formulations as theorems.
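For concreteness, here is one representative tool from this list, a standard multiplicative Chernoff bound (this particular formulation is my choice of statement):

```latex
% For independent random variables X_1,\dots,X_n \in [0,1]
% with \mu = \mathrm{E}\bigl[\sum_i X_i\bigr] and any \varepsilon \in (0,1]:
\Pr\Bigl[\,\textstyle\sum_i X_i \ \ge\ (1+\varepsilon)\,\mu\,\Bigr]
\ \le\ \exp\bigl(-\varepsilon^2 \mu / 3\bigr).
```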

It seems to me that the conceptual foundation of randomized rounding makes it somewhat easier to learn, to teach, and to apply to new problems, but this is probably a matter of taste.

# Greedy & Lagrangian relaxation as randomized rounding

Historically, many problems in the literature have both types of approximation algorithms: that is, a good approximate solution can be obtained either by randomized rounding or by a greedy or Lagrangian-relaxation algorithm. As mentioned above, our goal here is to explore a strong formal connection that explains this phenomenon:

Some greedy and Lagrangian-relaxation algorithms can be understood formally as randomized-rounding algorithms

and the basis of this connection is the following technical observation:

In some cases, the method of conditional probabilities yields a randomized-rounding algorithm that can skip the first step (solving the linear program).
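For example (my paraphrase of how this plays out for set cover): the conditional-probabilities choice that most decreases the expected number of uncovered elements turns out not to consult the fractional solution at all; it just picks the set covering the most uncovered elements, i.e., the greedy choice. The toy instance below is an illustrative assumption.

```python
def oblivious_rounding_set_cover(universe, sets):
    """Derandomized rounding with no LP-solving step: at each step, the
    choice that keeps the pessimistic estimator (expected number of
    uncovered elements) smallest is the set covering the most uncovered
    elements -- which is exactly the greedy choice."""
    uncovered, cover = set(universe), []
    while uncovered:
        s = max(sets, key=lambda t: len(sets[t] & uncovered))
        cover.append(s)
        uncovered -= sets[s]
    return cover

universe = {1, 2, 3, 4}
sets = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
print(oblivious_rounding_set_cover(universe, sets))  # ['A', 'C']
```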

The broad motivation is that understanding this connection could help better understand each domain. For example:

• To design a greedy or Lagrangian-relaxation algorithm for a problem, one can start with a simple randomized-rounding scheme, give an existence proof using standard probabilistic tools, and then apply the method of conditional probabilities to systematically derive the algorithm, along with the proof of its performance guarantee (including the potential-function or duality invariant), from that standard analysis.

In short, one can replace direct, low-level arguments by applications of standard probabilistic tools. Broadly, one can use the probabilistic lens to better understand the space of provably good greedy and Lagrangian-relaxation algorithms.

• Conversely, one can examine existing greedy and Lagrangian-relaxation algorithms, and ask “what randomized-rounding scheme and proof would yield this algorithm?” Reverse-engineering existing algorithms allows their underlying ideas to be translated back into the probabilistic setting, yielding analogous ideas that may be new there, and helping to formalize, generalize, and communicate the underlying ideas.

# Blogging in LaTeX

I write these notes in LaTeX, then translate them to HTML using plasTeX (modified to use MathJax to display math). If this interests you, see this blog entry.

## Footnotes

1. The specific kind of Lagrangian relaxation that we focus on here is based on surrogate duality.

## Comments

1. I’d love to hear more on MathJax-enabled plasTeX. Thanks for your offer to share.

- Juergen Stockburger
1. Juergen — See this post for more information and a link to a zip archive of the files.

- Neal Post author
2. Hi Neal,

Your SODA’95 paper is a superb paper. I’ve always felt that paper deserves a journal version because it is jam-packed with ideas, but it is not very easy to read. This is not a criticism of your writing skill — I usually think your writing and talks are extremely clear.

Anyways, I’m excited to have discovered this blog and I hope that it can serve as a substitute for the journal version of your paper.

- Anonymous
1. Well, thanks! The ideas have developed a fair amount since the paper. This blog doesn’t contain much yet, but this winter quarter I should have time to add more.

- Neal Post author
3. Hi Neal,
Very interesting work! Happy I got to read your paper. I have a quick question, though, about your generalized algorithm when applied to approximating the optimal solution to a game with sparse strategies. It is said that at each iteration the best-response pure strategy is added to S. However, the algorithm shows x, which is supposed to be a mixed strategy. Does the $\epsilon$ approximation still hold when pure strategies are selected at each iteration instead of mixed strategies? What happens when the oracle returns a pure strategy that is within $\alpha$ of the best-response pure strategy?
Thanks.

- NL
1. If I understand your question, the best response in each iteration should be a pure strategy, so in each iteration a pure strategy should be selected. If an $\alpha$-approximate strategy is selected as the response in each iteration (instead of the true best response), then the algorithm should finally generate a $(1+\epsilon)\alpha$-approximate solution. Not sure if that answers your questions.

- Neal Post author
1. Thanks for the quick answer. Actually it was mostly about selecting the best pure strategy vs. the best mixed strategy at each iteration. I was wondering if that makes any difference… But then when I think about it, if x* is a best-response mixed strategy for the opponent’s strategy y, then each pure strategy i in the support of x* is a pure best response with v(x*,y)=v(i,y). So the $\epsilon$-approximation holds for both.

- NL