Ames
Laboratory, U.S. Department of Energy, Ames, Iowa
It
all began at a group meeting, when Kai-Ming Ho, a senior physicist at the Ames
Laboratory, brought up the genetic algorithm, a method of solving some types of
computationally greedy problems. Jamie Morris, a post-doctoral fellow in the
Condensed Matter Physics Program, then described a project elsewhere on campus
where the genetic algorithm had been used to optimize the behavior of a frog.
At first, David Deaven, another postdoctoral fellow, listened with amused
detachment.
"These were mathematical models of frogs," he says, "which sat at one edge of a mathematical field. Flies landed at random on the field. The frogs had a number of internal rules that specified where they looked for bugs, and there was a cost associated with sticking out their tongues." Frogs good at catching flies mated by swapping rules, and, in a digital version of the survival of the fittest, the most adept of their offspring replaced the least adept frogs in the population. Lo and behold, over time the population of frogs became very good at catching flies.
"After a while," Deaven continues, "I stopped focusing on the frogs and started thinking about the root causes of the algorithm's effectiveness. And after the group meeting, I discussed the algorithm with Kai-Ming and its possible application to the formation of carbon clusters as well as to the behavior of frogs."
All of this might sound rather frivolous, but Deaven was actually wrestling with the central problem in theoretical materials science: Why can't a physicist sit down at a computer and design a new material when an engineer can sit down at a computer and design a new car? The answer is that if the material is modeled at the level of its constituent atoms, the number of possible atomic arrangements a program must evaluate increases exponentially with the number of atoms.
The practical consequence is that there are problems scientists have the theoretical understanding but not the computational resources to solve. A classic example is protein folding, which is central to the design of new drugs and many other problems scientists would like to crack.
The standard way to study materials at the atomic level is with a computer algorithm called simulated annealing. "It's just sort of accepted that when you have a physical model of a material, you're going to have to do simulated annealing," says Deaven.
The simulated annealing algorithm was originally suggested by the annealing of a solid, a procedure used to ensure that a solid reaches a physical state with the lowest possible energy. The solid is first heated to a relatively high temperature, and then its temperature is lowered in steps. Provided the solid spends enough time at each temperature in the annealing schedule to reach thermal equilibrium, it will probably end up in the lowest energy state.
The simulated annealing algorithm is a shortcut, but it's not a very short shortcut. "Solving a 60-carbon-atom problem with simulated annealing could take as many as 109 hours, which is more than a hundred thousand years," says Deaven. "As a post-doc, I just can't wait that long."
The genetic algorithm, another computational shortcut, has been spectacularly successful since its invention by John Holland of the University of Michigan at Ann Arbor in the 1960s. But when the genetic algorithm is used, the presumption is that the process being simulated is some-how analogous to biological evolution.
In a typical simulation, every member of a population of potential solutions is repre-sented by a chromosome, and solutions mate by swapping chromosomal segments in a process called crossover. It is not obvious that mating or crossover have any relevance to carbon clusters. "No one is proposing," Deaven says wryly, "that carbon clusters are an intelligent form of life.
"We decided to try the genetic algorithm on the clusters anyway, because we need something that works better than simulated annealing," Deaven continues. "So I wrote a computer program that implemented our notion of the genetic algorithm. In this program, clusters did not mate by swapping chromosomal segments. Instead, the mother cluster divided along one plane, the father cluster divided along another plane, and two half-clusters joined to form a daughter cluster.
"We
tried simulating the evolution of a population of 60-carbon-atom clusters with
this mating strategy, and it worked." The simulation produced buckyballs,
the hollow balls with the geometry of soccer balls that are the lowest energy
arrangement for 60 carbon atoms. Although there have been many previous
attempts to simulate the formation of buckyballs, none have been successful.
Ho comments, "The irony is we were successful because we invented everything from scratch, beginning only with the spirit of the approach, instead of starting with Holland's book and other standard sources. Had we used the standard mating procedure, the simulation wouldn't have worked." Deaven and Ho knew in advance that 60 carbon atoms preferred to make bucky-balls rather than other structures. Because the answer to the 60-carbon question was known, it was a good test of the genetic algorithm. And if the algorithm consistently came up with the right answer to problems with known answers, it could be tried with confidence on the many problems whose answers are not known.
But how seriously can one take this simulation? Carbon clusters can be made in the lab by blasting a chunk of carbon with a laser and spraying the resulting gas of carbon atoms into the laboratory. No one supposes that in the laboratory, as in the simulation, rudimentary clusters crash into one another, break in half and rejoin to form new clusters. The simulation, in other words, is not literally faithful to nature.
On the other hand, the buckyball mat-ing strategy does have physical meaning. The energy of a cluster is a local rather than a global property. In other words, the energy is really the sum of the energies of different regions in the cluster rather than a quantity that can be calculated only by considering the cluster as a whole. For this reason a mating operation that preserves regions, as the cluster-splitting mating strategy does, tends to encourage incremental improvement rather than repeatedly unraveling any progress that is made.
Moreover, in later experiments with different numbers of carbon atoms, Deaven and his colleagues have been able to show that the genetic algorithm has a spooky ability to mimic different regimes in cluster formation. "For example, we tried the algorithm on clusters with 20 carbon atoms," Deaven says. "But when we ran the simulation, we always ended up with an open ring of 20 atoms. And the trouble with this is we know that for C20 the lowest-energy stucture looks like a bottle cap, not like a necklace. At first we were a little perplexed, because now the algorithm wasn't finding the lowest energy structure.
"It took about a month to solve this problem, which is an embarrassingly long time considering the solution we came up with. If you cut a ring in half, you get a C-shaped group of atoms, and if you cut another ring in half, you get another C. If you put a C and a C together, you get a ring. So no matter what we started with, if one of the clusters found a ring, it bred rings so efficiently that rings took over the population."
The solution was to add mutation to the algorithm. Mutation is usually simulated in genetic algorithms by a "cosmic ray" that zaps a chromosome, flipping a bit. But since the carbon clusters don't have chromosomes, a different mutagenic procedure had to be devised. Deaven calls the one they ended up using for C20 the drunken climber algorithm. "Suppose a solution is at the bottom of a little valley in the solution space, where the energy is low but not as low as it gets elsewhere in the space. What this algorithm does is try to walk uphill -unless it finds that it is climbing steeply, in which case it walks along a contour. After walking awhile, it goes straight downhill, and the simulation resumes from the new location in the solution space.
"This algorithm isn't very sophisticated, and it rarely finds a lower energy solution," says Deaven. "It balls up the ring and knots it around and comes up with something that's worse than what you started with. But that's OK because that diversity in the population allows the population as a whole to get out of being trapped in rings.
"Now, what's really interesting about this," adds Deaven, "is that when I went back and talked to the experimentalists who were blasting carbon in the laboratory and asked them what structures 20 carbon atoms form, they said, `Oh yeah, for 20 it turns out you get a lot of rings. You get some caps, all right, but you also get a lot of rings.' "
From a physicist's point of view, there are two possible explanations for the pop-ularity of a particular structure. One is just that it happens to be a structure that has a very low energy. The other is that there are more paths leading from a random collection of carbon atoms to this structure than to other structures. Sixty carbon atoms like to form a buckyball because the buckyball is the lowest energy configuration, but 20 carbon atoms like to form a ring because more paths lead to the ring.
"So," says Deaven, "staring us in the face was this analogy. When there were 60 carbon atoms, the genetic algorithm had no trouble making the buckyball, the low-energy structure. Remarkably, neither does nature. But when there were 20 carbon atoms, the genetic algorithm had trouble escaping from the large basin of attraction for rings in the solution space. Remarkably, so does nature." Additional experiments showed that the analogy continues to hold for clusters containing 30 carbon atoms, which like to form cages.
So if carbon clusters aren't evolving like Galapagos tortoises, what exactly is going on? Why is a bad metaphor for cluster evolution (the genetic algorithm) so much better than a good metaphor (simulated annealing) at predicting the right cluster structures? One reason is that the genetic algorithm is better than the simulated annealing algorithm at negotiating a complex solution space. The energy barriers between cluster structures can be quite high, and the simulated annealing algorithm tends to get stuck in the wells between barriers. The mating procedure allows the genetic algorithm to leap the barriers and to negotiate the terrain much more competently.
John Holland finds Deaven's use of the genetic algorithm "both insightful and innovative." He adds that the 20-atom case is actually more interesting than the 60-atom case. "Innovative uses of the genetic algorithm make use of its ability to locate the `building blocks' of a solution, which are the analogs of co-adapted sets of genes," he explains. "The algorithm then proceeds to build solutions from the building blocks. These solutions are often not optimal but rather simply good. It is in this context that the 20-atom case is interesting. Most chemical and physical reactions proceed by a multitude of paths, producing a distribution of results rather than a single outcome. Deaven's approach is a nice way of learning about such multipath processes."
The modified genetic algorithm has performed so well on carbon clusters that Deaven and his colleagues are drawing up a list of other problems to tackle. These include the simulation of grain boundaries in a metal and the simulation of protein folding. "The structure of the grain boundaries is extremely important for understanding the properties of the metal, and they're very difficult to measure experimentally," Deaven explains. As for protein folding, the team that cracks that one will have opened the way to computer-designed drugs, long a goal of the pharmaceutical industry.
The genetic algorithm's performance on these problems will either prove or disprove the conjecture that it is truly equivalent to the simulated annealing algorithm, which always works -- always works, that is, if it ever terminates.
research
funded by: Basic Energy Sciences Office, DOE; Scientific Computing Office, DOE
Home | Comments | Search | Disclaimer