Trying Out Different Angles of Approach
I’m trying out different angles to take on my doctoral work when submitting it to AI and ML journals. Here’s one that shows promise:
| The Hyperclimbing Hypothesis: An explanation for the adaptive capacity of genetic algorithms with uniform crossover
It is generally accepted that the adaptive capacity of genetic algorithms with uniform crossover (UGAs) cannot be explained within the rubric of the building block hypothesis. Nevertheless, to date, an alternate explanation has not been proposed. We describe a promising non-local stochastic search heuristic, called hyperclimbing, for optimizing over attribute product spaces (e.g., the set of all binary strings of some fixed length) with rugged objective functions. Hyperclimbing works by decimating the search space, i.e. by progressively limiting sampling to a series of nested subsets. At each step, this heuristic sifts through potentially vast numbers of coarse partitions of the subset it “inhabits”, and identifies ones that partition this set into subsets whose expected fitness values are significantly variegated. The hyperclimbing hypothesis holds that UGAs work by implementing hyperclimbing extraordinarily efficiently. The evidence for this hypothesis comes from the use of a novel analytic technique involving the exploitation of “algorithmic symmetry”. In addition to this evidence, we present the results of a relatively demanding test of validity—one involving the application of a UGA with a simple tweak to large, random instances of the MAX-3SAT problem. |
The supporting evidence, as well as the test of validity mentioned above can be found in my dissertation.
The Generative Fixation Hypothesis and Wikipedia
References to the Generative Fixation Hypothesis were recently removed from the Wikipedia article on Genetic Algorithms.
You can join the discussion about their removal at the bottom of the talk page of the article (also see the discussion here). Do Wikipedia readers benefit from learning about the existence of the generative fixation hypothesis? Does mentioning the generative fixation hypothesis uphold or violate Wikipedia’s neutral point of view (NPOV) policy? If you feel strongly one way or the other about these issues, please add your voice to the discussion.
Working Abstract of my Next Paper
This one’s to catch the attention of folks in machine learning, and theoretical computer science.
We recently proposed a new explanation for the adaptive capacity of simple recombinative genetic algorithms. This explanation proceeds from evidence that the simple genetic algorithm with uniform crossover (UGA) can implement a stochastic non-local search heuristic called hyperclimbing extraordinarily efficiently. To showcase the core computational efficiency involved we take up the problem of learning a classifier for the attributes of an unknown parity function over n attributes, k of which are effective. We consider the case where the learning algorithm can make adaptive queries against a membership query oracle. Given a bitstring of length n, the oracle returns a boolean value indicating the parity of the bitstring under the unknown parity function. For certain small, but otherwise arbitrarily chosen, values of k, we “show” that a UGA that uses the oracle as its fitness function can learn a classifier that classifies any attribute of the parity function—as effective or non-effective—with arbitrary accuracy; the learning occurs in time that is linear in n, and with query complexity that is constant in n, even when the oracle is “moderately” noisy.
Related blog post: Red Dots, Blue Dots
Update (June 11, 2010): Had a back and forth with Vitaly Feldman about the “angle” I take in this paper. He suggested that it may not be the best. For small values of k, and particular regimes of the noise parameter, a GA based learning algorithm performs at par (in an asymptotic sense) with the best known algorithms for solving the learning parities problem. Vitaly cautioned, however, that the problem of learning parities with an adaptive memebership oracle is currently not of practical interest. And since the GA based learning algorithm does not improve upon an existing computational bound, he thinks that from a pure computational learning perspective, this result it is unlikely to be of interest.
So, back to the drawing board. Taking it from the top, the goal is to draw the attention of the machine learning community to the hyperclimbing heuristic, and the GA’s ability to implement this heuristic extraordinarily efficiently. One way to do this is to showcase the core computational efficiency at play in this implementation . And one way to do this is by showing how this core efficiency can be used to efficiently solve a problem that members of the computational learning community care about—presuming that such a problem currently exists.
More Debate, Please!
“… there are many issues in computing that inspire differing
opinions. We would be better off highlighting the differences
rather than pretending they do not exist”
–Moshe Y. Vardi
In an article entitled “More Debate, Please!”, in the January, 2010 issue of Communications of the ACM, Moshe Y. Vardi, editor-in-chief of Communications, writes:
`Vigorous debate, I believe, exposes all sides of an issue—their strengths and weaknesses. It helps us reach more knowledgeable conclusions. To quote Benjamin Franklin: “When Truth and Error have fair play, the former is always an overmatch for the latter.”’[1]
Vardi goes on to say that as he solicited ideas for the 2008 relaunch of Communications, he was frequently told to keep controversial topics front and center. “Let blood spill over the pages of Communications,” a member of a focus group jokingly urged [1].
In my attempts, to date, to publish my doctoral research—work that ultimately became chapters two and three of my dissertation—in evolutionary computation journals, I found the sentiments expressed by Vardi to be in short supply. The reviewers seemed much more invested in not rocking the boat than in fostering a climate in which prevailing assumptions can be challenged, and alternate ideas expressed transparently. They seemed, in short, to be inured to the poverty of the field’s foundations, and, for the most part, had little tolerance for someone with a bone to pick with the status quo. “Fall in line, or get rejected,” was the overarching message.
One way this unfortunate state of affairs may be addressed is through the institution of a forum like the Point/Counterpoint section introduced to Communications by Vardi in 2008—a forum where the various controversies that mark our field are periodically featured, and the different sides of each controversy given, as Benjamin Franklin put it, fair play. There are several contentious topics in EC. Tapped correctly, many of these topics can be powerful vehicles for learning—not just about the workings of evolutionary algorithms, but, also, about the workings of a vibrant intellectual community. Right now, instead of vigorous, open, ongoing debates in the EC literature, uneasy truces prevail. The community, by and large, steps around the the really big points of contention. Researchers talk past each other to niche audiences. And, if my experience is anything to go by, new lines of criticism, and new modes of analysis are hastily dismissed.
In the absence of a written record of ongoing controversies, new entrants to the field will not have access to the various positions involved. Pressed for time, and confronting the reality of “publish or perish”, most will fall back on the opinions and practices of their advisors. It doesn’t take much to see that in environments like this, opportunities for learning and advancement will frequently be missed.
A forum for open, ongoing, collegial debate would bring awareness, and transparency to the controversies in our field. It would also (one hopes) inculcate a more welcoming attitude toward alternate approaches, conclusions, and critiques.
Two topics for debate: (No points for guessing where my sympathies lie on these issues)
EC Theory and First Hitting Time: Is it problematic that so much contemporary theoretical work in EC focuses on “first hitting time”, i.e., the number of fitness evaluations required to find a global optimum? Do we look at first hitting time only because there currently isn’t a well developed, and generally accepted theoretical framework for examining adaptation (the generation of fitter points over time)? If so, isn’t the study of first hitting time a lot like the proverbial search for one’s house keys under the light of a street lamp just because it happens to be dark in one’s house?
The Building Block Hypothesis: Can the building block hypothesis be reconciled with the widely reported utility of uniform crossover? If yes, how? If no, can we—more to the point, should we—be comfortable with this knowledge given the considerable influence of the building block hypothesis on contemporary evolutionary computation research?
What other topics have been under-addressed in the evolutionary computation literature? Leave a comment with your opinion, or a link to your own blog post.
[1] Moshe Y. Vardi. More debate please!, In Communications of the ACM 53(1):5, 2010
Hyperclimbing and Decimation
In recent years, probabilistic inference algorithms such as survey propagation and belief propagation have been shown to be remarkably effective at tackling large, random instances of SAT, and other combinatorial optimization problems that lie beyond the reach of previous approaches. These inference algorithms belong to a class of techniques called decimation strategies. Decimation strategies monotonically reduce the size of a problem instance by iteratively fixing partial solutions (partial variable assignments in the case of SAT).
The generative fixation hypothesis essentially states that genetic algorithms work by efficiently implementing a decimation strategy called hyperclimbing.
Hyperclimbing, Genetic Algorithms, and Machine Learning
I’ve identified a promising stochastic search heuristic, called hyperclimbing, for large-scale optimization over huge attribute product spaces (e.g., the set of all binary strings of some length N, where N is very large) with rugged fitness functions. Hyperclimbing works by progressively limiting sampling to a series of nested subsets with increasing expected fitness. At any given step, this heuristic sifts through vast numbers of coarse partitions of the subset it “inhabits”, and identifies ones that partition this set into subsets whose expected fitness values are significantly variegated. Because hyperclimbing is sensitive, not to the local features of a search space, but to certain more global statistics, it is not susceptible to the kinds of issues that waylay local search heuristics.
The chief barrier to the wide and enthusiastic use of hyperclimbing is that it seems to scale very poorly with the number of attributes. If one heeds the seemingly high cost of applying hyperclimbing to large search spaces, this heuristic quickly looses its shine. A key conclusion of my doctoral work is that this seemingly high cost is illusory. I have uncovered evidence that strongly suggests that genetic algorithms can implement hyperclimbing extraordinarily efficiently.
As readers of this blog probably know, genetic algorithms are search algorithms that mimic natural evolution. These algorithms have been used in a wide range of engineering and scientific fields to quickly procure useful solutions to poorly understood (i.e. black-box) optimization problems. Unfortunately, despite the routine use of genetic algorithms for over three decades, their adaptive capacity has not been adequately accounted for. Given the evidence that genetic algorithms can implement efficient hyperclimbing, I’ve proposed a new explanation for the adaptive capacity of these algorithms. This new account—the generative fixation hypothesis—promises to spark significant advances in the fields of genetic algorithmics and discrete optimization.
The discovery that hyperclimbing is efficiently implementable also promises to have a non-negligible impact on the ecology of machine learning research. Optimization and machine learning are, after all, intimately related. Overlooking a few exceptions, the practice of machine learning research, can be characterized as the effective reduction of difficult learning problems to optimization problems for which efficient algorithms exist. In other words, the machine learning problems that can effectively be tackled are in large part those that can in practice be reduced to optimization problems that can be tackled efficiently. Currently, this largely limits the class of tractable machine learning problems to the class of learning problems that can in practice be reduced to convex optimization problems [1] . The identification of general-purpose non-convex optimization heuristics with efficient implementations (e.g. hyperclimbing), thus, has the potential to significantly extend the reach of machine learning.
For a description of hyperclimbing, and evidence that genetic algorithms can implement this heuristic efficiently, please see my dissertation
[1] Kristin P. Bennett and Emilio Parrado-Hernandez. The interplay of optimization and machine learning research. Journal of Machine Learning Research, 7:1265–1281, 2006.
Google Group for Generative Fixation
The generative fixation hypothesis now has a Google group—a place to ask questions and share your insights. If you’re intrigued by the idea of generative fixation, please sign up.
Dissertation Deposition
I deposited my dissertation today.
Click here to see the final version (single spaced for easy reading).
Back to the Future: A Science of Genetic Algorithms
From the preface to my dissertation:
The foundations of most computer engineering disciplines are almost entirely mathematical. There is, for instance, almost no question about the soundness of the foundations of such engineering disciplines as graphics, machine learning, programming languages, and databases. An exception to this general rule is the field of genetic algorithmics, whose foundation includes a significant scientific component.
The existence of a science at the heart of this computer engineering discipline is regarded with nervousness. Science traffics in provisional truth; it requires one to adopt a form of skepticism that is more nuanced, and hence more difficult to master than the radical sort of skepticism that suffices in mathematics and theoretical computer science. Many, therefore, would be happy to see science excised from the foundations of genetic algorithmics. Indeed, over the past decade and a half, much effort seems to have been devoted to turning genetic algorithmics into just another field of computer engineering, one with an entirely mathematical foundation.
Broadening one’s perspective beyond computer engineering, however, one cannot help wondering if much of this effort is not a little misplaced. Read the rest of this entry »