- Word count: 1182
- Average reading time: 5 minutes and 54 seconds (based on 200 WPM)
Some toying around with a genetic algorithm, coded in approx. 6 hours total.
Genetic algorithms (GA) have always fascinated me; not only in how they work, but also in how amazingly simple an implementation of a GA can be. Ranging from very simple to rather complicated ones they come in all different kinds of flavours. One thing I like about heuristics in general is that they all do the same thing in general - minimize the cost function. This cost function can be anything as long as the current value can be evaluated across a target value. Do note that genetic algorithms are mostly a thing of the past as Simulated Annealing and Gradient Descent seem to perform better in general and are more widely used to my knowledge.
Nevertheless, Genetic Algorithms are fun to toy around with. Additionally, I already had some experience with GA's when I came across this blog post a while back and extended it for an assignment in a study project. Having gained experience with the animated 3d plots of matplotlib I wanted to visualize the progression of a GA.
With the code from the blog as basis I set out to take a closer look at what exactly is happening. To clarify some definitions:
- Generation: An iteration in the algorithm
- Population: A set of individuals
- Parents: A set of individuals from the previous generation. Size(Parents) < Size(Population)
- Children: A set of individuals generated from the parents based on a set of parameters, filling the remainder of the population for the current generation.
- Individual: A candidate value in the population, in this case a tuple of coordinates \((x, y, z)\)
- Best individual: An individual with the lowest cost
- Mutate: the probability of a parameter of an individual changing
- Retain: the amount of Parents that "survive" the current generations.
- Random select: the probability of a random individual being added to the list of parents.
A problem to optimize
Initially I wanted to generate a set of random dots named \(K = (X, Y)\) in a 2d space. In this 2d space I would place another dot named \(k\) for the GA to optimize. The cost function would be optimized based on the sum of euclidean distance from dot \(k = (x, y)\) with respect to all the other dots in set \(K\). The best individual of each generation is chosen on the following basis:
with size being the amount of dots present in \(K\)
With this setup I turned it on and plotted the results -- and they weren't all that spectacular: The GA did it's work perfectly with a fitness history that looked as follows:
With so little generations required to find an optimum I wanted to see more, dynamic, action of a GA. Adding another dimension would drastically increase the search-space and thus the amount of attempts required to find a good-enough optimum, changing \(K\) into \(K = (X, Y, Z)\) and \(k\) into \(k = (x, y, z)\), the new function with which the optimal individual is chosen looks now as follows:
For this GA there are five main parameters to toy around with:
- Random select
- Population Size
- Amount of generations
For all videos in this post I have kept the population size and amount of generations at a set rate: 100 and 200 respectively. In the below video you can see how the GA finds an optimum rather fast (<20 generations). The changing color of the line illustrates the progression in generations. Starting at yellow on generation 0 and ending in magenta on generation 200.
As you can see the the model finds an optimum rather fast - as normally is desired - but for this post I'd rather show some other examples to demonstrate how the parameters influence the behaviour of the optimization. For example if the \(retain\) is set very low it practically pops out a complete new population of individuals resulting in very random behaviour which does not seem to converge - ever - but then it suddenly happens, and quickly pops out again. Yet still, it can get worse:
Though I wonder what it'll look like if the initial population is almost completely retained:
With almost the entire population retained it relies mostly on the mutations and random selections, thus resulting in a slow convergence with very little variation for the optimum individual. So far the model in the first animation performed optimally - converging quickly and staying relatively close to the global optimum.
Now this is all very interesting, though it might be nice to see how it reacts to new inputs. By, for example, giving the algorithm a short time to respond to new input. In the following animations the intial dots are in an upper corner. For every 30 generations 5 new dots are added in the oppositing bottom corner. Additionally, these dots weigh 5 times as heavy as the initial dots. The goal being that the GA starts moving drastically to the other corner when the 2nd batch arrives: The 1st batch: \(5 * 5 * 1 < 30\) and at the 2nd batch \(5 * 5 * 2 > 30\) With this set-up the GA should optimally shoot to the other side after the 2nd batch.
Trying out the best performing settings, the initial ones, so far:
Indeed it performed quite well but surely it can do better, maybe if the \(retain\) is higher:
That didn't work out as planned. I'd reckon a lower \(retain\) and a higher \(mutate\):
That responded really quick! Although, close to the end it seemed to have a lot of difficulty finding an optimum - the optimal individual is still varying quite heavily. Maybe a somewhat lower \(mutate\) and a larger \(retain\):
It reacted quickly, but not as fast as the previous attempt. What if the \(retain\) is slightly higher still, the \(mutate\) lower and the \(random selection\) higher. My hypothesis is that this'll be a close to an optimal set-up - one that converges quickly but remains stable once an optimum is found whilest bumping, sporadically, out of the optimal to look for an evem better optimum.
It doesn't seem that much better, estimating the right parameters is still a tricky thing to do. However, one thing that got me wondering is if I increased the population by tenfold: a population size of a 1000. Surely more computations should improve the results drastically. Both animations below are made with a population size of 1000 instead of 100:
And indeed, the model performed better. The best fitness line is a smooth function now, this makes sense as the best performing individual can be chosen from a large population size - although if it's worth the computation time is something I am doubting. Toying around with these models and different set-ups reaffirms to me that there isn't one size-fits-all set of parameters for every problem. However, it became all to clear that the moderate models performed better overall than the extreme ones. On top of this there's still plenty of room to toy around with regarding optimization and thinking of more complicated scenarios, but I'll keep that for another time.