Genetic Algorithmics remains a niche area within Artificial Intelligence largely because the field has not identified a computation of some kind that genetic algorithms perform efficiently. The skepticism with which GAs are regarded is understandable. If, after decades of research and thousands of published articles, we cannot say what genetic algorithms do efficiently, then it is hardly surprising that the field would be held at arm’s length by the wider AI community.
His Highness is Naked. Time for some Real Clothes
Interestingly, the thing genetic algorithms do efficiently—concurrent multivariate effect evaluation (implicit concurrency for short)—is not difficult to describe or demonstrate. It’s been under our noses ever since schemata and schema partitions were defined as concepts, but for various reasons, which I won’t go into here, it’s escaped identification and dissemination.
Implicit concurrency (not to be confused with implicit parallelism) is a marvelous phenomenon that should be more widely appreciated than it currently is. This blog post provides a quick introduction. For a more in-depth treatment, check out my FOGA 2013 paper Explaining Optimization in Genetic Algorithms with Uniform Crossover (slides here). And if you really want to have at it, my dissertation is here.
Schema Partitions and Effects
Let’s begin with a quick primer on schemata and schema partitions. Let $S = \{0,1\}^\ell$ be a search space consisting of binary strings of length $\ell$. Let $\mathcal{I}$ be some set of indices between $1$ and $\ell$, i.e. $\mathcal{I} \subseteq \{1, \ldots, \ell\}$. Then $\mathcal{I}$ represents a partition of $S$ into $2^{|\mathcal{I}|}$ subsets called schemata (singular schema) as in the following example: suppose $\ell = 4$, and $\mathcal{I} = \{1, 3\}$, then $\mathcal{I}$ partitions $S$ into four schemata:
$$0{*}0{*}, \quad 0{*}1{*}, \quad 1{*}0{*}, \quad 1{*}1{*}$$
where the symbol $*$ stands for ‘wildcard’. Partitions of this type are called schema partitions. As we’ve already seen, schemata can be expressed using templates, for example, $0{*}1{*}$. The same goes for schema partitions. For example $\#{*}\#{*}$ denotes the schema partition represented by the index set $\{1, 3\}$. Here the symbol $\#$ stands for ‘defined bit’. The order of a schema partition is simply the cardinality of the index set that defines the partition (in our running example, it is $2$). Clearly, schema partitions of lower order are coarser than schema partitions of higher order.
Let us define the effect of a schema partition to be the variance of the average fitness values of the constituent schemata under sampling from the uniform distribution over each schema. So for example, the effect of the schema partition $\#{*}\#{*}$ is
$$\frac{1}{4} \sum_{i=0}^{1} \sum_{j=0}^{1} \left( F(i{*}j{*}) - F({*}{*}{*}{*}) \right)^2$$
where the operator $F$ gives the average fitness of a schema under sampling from the uniform distribution.
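For a search space small enough to enumerate, the definition of effect can be computed directly. The following is a minimal sketch, not code from the paper: the function name `schema_partition_effect` and the fitness function `toy_fitness` are hypothetical, and indices are 0-based here (so the index set of positions 1 and 3 in the text becomes `{0, 2}`).

```python
from itertools import product
from statistics import mean, pvariance


def schema_partition_effect(fitness, length, index_set):
    """Variance of the average schema fitness over the partition given by index_set.

    fitness   : function mapping a tuple of 0/1 bits to a number
    length    : bitstring length (kept small: all 2**length points are enumerated)
    index_set : 0-based positions of the defined bits
    """
    indices = sorted(index_set)
    groups = {}  # values of the defined bits -> fitness values inside that schema
    for bits in product((0, 1), repeat=length):
        key = tuple(bits[i] for i in indices)
        groups.setdefault(key, []).append(fitness(bits))
    # All schemata in a partition have the same size, so the population variance
    # of the schema averages is taken about the average fitness of the whole space.
    return pvariance([mean(values) for values in groups.values()])


def toy_fitness(bits):
    # Hypothetical fitness: bits 0 and 2 interact (XOR); bit 1 adds a small bonus.
    return (bits[0] ^ bits[2]) + 0.1 * bits[1]


effect_a = schema_partition_effect(toy_fitness, 4, {0, 2})  # defined bits at positions 1 and 3
effect_b = schema_partition_effect(toy_fitness, 4, {1, 3})  # defined bits at positions 2 and 4
print(effect_a, effect_b)
```

In this toy case the partition that captures the interacting pair of bits has an effect of about 0.25, while the other order-2 partition only picks up the small additive bonus and has an effect of about 0.0025.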
You’re now well poised to understand implicit concurrency. Before we get to a description, a brief detour to provide some motivation: We’re going to do a thought experiment in which we examine how effects change with the coarseness of schema partitions. Let $[\![\mathcal{I}]\!]$ denote the schema partition represented by some index set $\mathcal{I}$. Consider a search space $S = \{0,1\}^\ell$ with $\ell = 10^6$, and let $\mathcal{I} = \{1, \ldots, 10^6\}$. Then $[\![\mathcal{I}]\!]$ is the finest possible partition of $S$; one where each schema in the partition has just one point. Consider what happens to the effect of $[\![\mathcal{I}]\!]$ as we start removing elements from $\mathcal{I}$. It should be relatively easy to see that the effect of $[\![\mathcal{I}]\!]$ decreases monotonically (strictly speaking, it never increases). Why? Because we’re averaging over points that used to be in separate schemata. Don’t proceed further until you convince yourself that coarsening a partition tends to decrease its effect.
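The thought experiment can be tried numerically on a tiny search space. The sketch below is only an illustration under stated assumptions: the fitness function is a hypothetical table of independent random values over bitstrings of length 8, and the helper `effect` is my own name. It starts from the finest partition and drops one defined bit at a time.

```python
import random
from itertools import product
from statistics import mean, pvariance


def effect(fitness_table, index_set):
    """Variance of schema averages for the partition given by index_set (0-based)."""
    indices = sorted(index_set)
    groups = {}
    for bits, value in fitness_table.items():
        groups.setdefault(tuple(bits[i] for i in indices), []).append(value)
    return pvariance([mean(values) for values in groups.values()])


length = 8
rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical fitness function: an independent random value for every bitstring.
fitness_table = {bits: rng.random() for bits in product((0, 1), repeat=length)}

index_set = set(range(length))  # finest partition: one point per schema
effects = [effect(fitness_table, index_set)]
while index_set:
    index_set.remove(max(index_set))  # coarsen by dropping one defined bit
    effects.append(effect(fitness_table, index_set))

for order, value in zip(range(length, -1, -1), effects):
    print(f"order {order}: effect {value:.6f}")
```

The printed effects never increase as the order falls, and the order-0 partition (the whole space as a single schema) has an effect of zero.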
Finally, observe that the number of schema partitions of order $k$ is $\binom{\ell}{k}$. So for $\ell = 10^6$, the numbers of schema partitions of order 2, 3, 4 and 5 are on the order of $10^{11}$, $10^{17}$, $10^{22}$ and $10^{28}$ respectively. The take away from our thought experiment is this: while a search space may have vast numbers of coarse schema partitions, most of them will have negligible effects (due to averaging). In other words, while coarse schema partitions are numerous, ones with non-negligible effects are rare.
So what exactly does a genetic algorithm do efficiently? Using experiments and symmetry arguments I’ve demonstrated that a genetic algorithm with uniform crossover can concurrently sift through vast numbers of coarse schema partitions and identify partitions with non-negligible effects. In other words, a genetic algorithm with uniform crossover can implicitly perform multitudes of effect/no-effect multifactor analyses and can efficiently identify interacting loci with non-negligible effects.
Let’s Play a Game
It’s actually quite easy (not to mention, cool and fun) to visualize a genetic algorithm as it identifies such loci. Let’s play a game. Consider a stochastic function that takes bitstrings of length 200 as input and returns an output that depends on the values of the input’s bits at just four indices. These four indices are fixed; they can be any one of the $\binom{200}{4}$ combinations of four indices between 1 and 200. Given some bitstring, if the parity of the bits at these indices is 1 (i.e. if the sum of the four bits is odd) then the stochastic function returns a value drawn from the magenta distribution (see below). Otherwise, it returns a value drawn from the black distribution. The four indices are said to be pivotal. All other indices are said to be non-pivotal.
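Here is a rough sketch of such a stochastic function, for readers who want something concrete to poke at. Everything specific in it is an assumption: the post does not give the two distributions numerically, so I use two unit-variance Gaussians whose means differ by `mean_gap`, with odd parity getting the higher mean. The names `make_parity_oracle`, `hidden` and `query` are hypothetical, and indices are 0-based.

```python
import random


def make_parity_oracle(pivotal, mean_gap=0.5, seed=None):
    """Return a stochastic fitness function driven by the parity of the pivotal bits.

    pivotal  : the four hidden 0-based indices
    mean_gap : assumed distance between the means of the two distributions
    """
    pivotal = tuple(pivotal)
    rng = random.Random(seed)

    def oracle(bits):
        odd = sum(bits[i] for i in pivotal) % 2 == 1
        # Assumption: two unit-variance Gaussians; odd parity gets the higher mean.
        center = mean_gap / 2 if odd else -mean_gap / 2
        return rng.gauss(center, 1.0)

    return oracle


length = 200
hidden = random.Random(1).sample(range(length), 4)  # stand-in for the secret indices
query = make_parity_oracle(hidden, seed=2)

bit_rng = random.Random(3)
bitstring = [bit_rng.randint(0, 1) for _ in range(length)]
value = query(bitstring)  # one noisy query; repeated calls give different values
```

With a small `mean_gap` the two distributions overlap heavily, so a single query says very little about the parity of the pivotal bits.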
As per the discussion in the first part of this post, the set of pivotal indices is the dual of a schema partition of order 4. Of all the schema partitions of order 4 or less, only this partition has a non-zero effect. All other schema partitions of order 4 or less have no effect. (Verify for yourself that this is true.) In other words, in the world of effect evaluation, parity is a Needle in a Haystack (NIAH) problem: the kind of problem that seems closed to all approaches save brute force.
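The claim is easy to check by exhaustive enumeration on a scaled-down instance. The sketch below assumes bitstrings of length 6 instead of 200, a hypothetical choice of pivotal indices (0-based), and a noise-free stand-in for the stochastic function that returns 1 for odd parity and 0 for even parity.

```python
from itertools import combinations, product
from statistics import mean, pvariance

length = 6
pivotal = (0, 2, 3, 5)  # hypothetical choice of the four pivotal (0-based) indices


def expected_fitness(bits):
    # Noise-free stand-in for the stochastic function: 1 for odd parity, 0 for even.
    return sum(bits[i] for i in pivotal) % 2


def effect(index_set):
    """Variance of the schema averages for the partition given by index_set."""
    groups = {}
    for bits in product((0, 1), repeat=length):
        key = tuple(bits[i] for i in index_set)
        groups.setdefault(key, []).append(expected_fitness(bits))
    return pvariance([mean(values) for values in groups.values()])


# Collect every index set of order 1 to 4 whose schema partition has a non-zero effect.
nonzero = [
    index_set
    for order in range(1, 5)
    for index_set in combinations(range(length), order)
    if effect(index_set) > 0
]
print(nonzero)  # [(0, 2, 3, 5)]
```

Any index set that misses at least one pivotal index leaves that bit free in every schema, so every schema average comes out at exactly one half and the variance vanishes.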
Now for the rules of the game: Say I give you query access to the stochastic function just described, but I do not tell you what four indices are pivotal. You are free to query the function with any bitstring 200 bits long as many times as you want. Your job is to recover the pivotal indices I picked, i.e. to identify the only schema partition of order 4 or less with a non-negligible effect.
Take a moment to think about how you would do it. What is the time and query complexity of your method?
What Not Breaking a Sweat Looks Like
The animation below shows what happens when a genetic algorithm with uniform crossover is applied to the stochastic function just described. Each dot displays the proportion of 1s in the population at a locus. Note that it’s trivial to just “read off” the proportion of 0s at each locus. The four pivotal loci are marked by red dots. Of course, the genetic algorithm does not “know” that these loci are special. It only has “query access” to the stochastic function.
As the animation shows, after 500 generations you can simply “read off” the four loci I picked by examining the proportion of 1s to 0s in the population at each locus. Congratulations! You’ve just seen implicit concurrency in action. The chromosome size in this case is 200, so there are $\binom{200}{4}$ (roughly 65 million) possible combinations of four loci. From all of these possibilities, the genetic algorithm managed to identify the correct one within five hundred generations.
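If you’d like to see the moving parts, here is a deliberately tiny genetic algorithm with uniform crossover. It is a sketch under stated assumptions and not speedyGA: the selection scheme (binary tournament), the noise model (unit-variance Gaussians centred at +0.25 and -0.25), the mutation rate and all default parameter values are my own picks for brevity, and `run_ga` is a hypothetical name. It returns the proportion of 1s at each locus, which is the quantity each dot in the animation displays.

```python
import random


def run_ga(length, pivotal, pop_size=200, generations=100, mutation_rate=0.003, seed=0):
    """Tiny GA with uniform crossover; returns the frequency of 1s at every locus."""
    rng = random.Random(seed)

    def fitness(bits):
        odd = sum(bits[i] for i in pivotal) % 2 == 1
        # Assumed noise model: unit-variance Gaussians centred at +0.25 and -0.25.
        return rng.gauss(0.25 if odd else -0.25, 1.0)

    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # One query of the stochastic function per individual per generation.
        scores = [fitness(individual) for individual in population]

        def pick_parent():
            # Binary tournament selection (an assumption, chosen for brevity).
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = []
        for _ in range(pop_size):
            mother, father = pick_parent(), pick_parent()
            # Uniform crossover: each bit comes from either parent with probability 1/2.
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population

    return [sum(ind[i] for ind in population) / pop_size for i in range(length)]


# Small demonstration run; the pivotal indices here are arbitrary 0-based picks.
frequencies = run_ga(length=50, pivotal=(3, 17, 28, 41), pop_size=100, generations=20)
```

A frequency close to 0 or 1 means the population has nearly fixed that locus. Whether the pivotal loci stand out in a given run of this sketch depends on the parameters and the seed; for the experiments behind the animations, use speedyGA itself.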
Let’s put implicit concurrency through its paces. I’m going to tack on an additional 800 non-pivotal loci while leaving the indices of the four pivotal loci unchanged. Check out what happens:
[Note: more dots in the animation below does not mean a bigger population or more queries. More dots just means more loci under consideration. The population size and total number of queries remain the same]
So despite a quintupling in the number of bits, entailing an increase in the number of coarse schema partitions of order 4 to $\binom{1000}{4}$, the genetic algorithm solves the problem with no increase in the number of queries. Not bad. (Of course we’re talking about a single run of a stochastic process. And yes, it’s a representative run. See chapter 3 of my dissertation to get a sense for the shape of the underlying process.)
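The counts behind this comparison are plain binomial coefficients, and Python’s standard library can compute them exactly. A minimal sketch (the helper name `count_order_k_partitions` is hypothetical):

```python
from math import comb


def count_order_k_partitions(length, order):
    """Number of schema partitions of a given order over bitstrings of a given length."""
    return comb(length, order)


for length in (200, 1000):
    print(length, count_order_k_partitions(length, 4))
# 200 64684950
# 1000 41417124750
```

So quintupling the number of loci multiplies the number of candidate order-4 partitions by a factor of roughly 640.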
Let’s take it up another notch, and increase the length of the bitstrings to 10,000. So now we’re looking at $\binom{10000}{4}$ combinations of four loci. That’s roughly $4 \times 10^{14}$, i.e. hundreds of trillions of combinations. This time round, let’s also change the locations of the 4 pivotal loci. Will the genetic algorithm find them in 500 generations or fewer?
How’s that for not breaking a sweat? Don’t be deceived by the ease with which the genetic algorithm finds the answer. This is not an easy problem.
Intrigued? [If you've read this far, you should be] To run the experiments yourself download speedyGApy, and run it with
python speedyGA.py --fitnessFunction seap --bitstringLength <go-wild!>
noting that the increase in “wall clock” time between generations as you increase bitstringLength is due to an increase in the running time of everything (including query evaluation). The number of queries (i.e. number of fitness evaluations), however, stays the same. To learn how a genetic algorithm parlays implicit concurrency into a general-purpose global search heuristic called hyperclimbing, read my FOGA 2013 paper Explaining Optimization in Genetic Algorithms with Uniform Crossover (slides here).
[A request: Please help me get the word out about Implicit Concurrency. Tweet/retweet, blog/reblog, like, email, and mention on mailing lists. As always, if you have questions/comments, holler]