Great discoveries that have shaped our lives have been made by simply studying nature and the field of computer science is no exception. In this article, I explore a searching technique inspired by nature: the genetic algorithm.
This has something to do with biology and genetics (it never ceases to amaze me how great ideas and theories come from good old Mother nature). So the story goes…
In a time where curiosity was heavily rewarded, people (and by people I mean the weirdly intelligent ones who always seem to have a question about everything) were passionately occupied in understanding how organisms happen to possess certain characteristics that helped them survive on earth, and what was responsible for producing these features and transferring them to their offspring. Questions like how did a giraffe’s neck get so tall? Or how did fish learn to swim? among many others, baffled people for so long.
Inevitably, it was discovered that there were some small almost invisible things called genes located somewhere in the bloodstream of organisms that were responsible for the characteristics exhibited. So we learn that there’s a gene responsible for eye colour, your ability to roll the tongue or not (how weird is this?), height, and even for certain diseases.
They also found out that the ‘universe’ somehow selects certain individuals to transfer their gene information by some interesting design some call natural selection. In this process, every individual takes an exam conducted by Mother nature to determine which of them are fit enough to pass on their genetic information. You could call it Mother Nature’s fitness exam. The individuals that are declared fit get to pass their genes to the next generation. This process goes on severally that after a certain period of time the newer offspring produced will - according to the fitness exam - be very fit. However, there’s a question of how mother nature determines who is fit and what exactly fit means? there are so many theories on this but thankfully we leave the biologists and genetic engineers to argue it out.
They also found out that the reproduction process can happen in two ways:
- A child is produced when two genes from two parents come together (literally) through a process known as crossover.
- A single parent can produce a new child by itself through a process called mutation (this is an epic DIY - If you find anyone who’s achieved this let me know)
This concept of evolution, reproduction and natural selection teaches a pattern of finding out the best solution to any problem when there are just so many right ones this can be properly termed as the process of finding the optimal solution.
Structure of the algorithm
Bringing it all together, this is how a GA looks like:
- There’s an initial population of organisms each with varying characteristics known as genes;
- There’s a system that defines how each of the organisms is fit, this is known as the fitness exam or the fitness test.
- Organisms are selected for reproduction based on how well they score on the fitness exam;
- A new population of organisms are created by either crossing two parents and or mutating two parents;
The new population undergoes the same procedure again by going through the fitness exam and reproducing to create another population until the new offspring are determined as fit by the system.
Designing a GA for your problem:
The design of GA’s are a bit of an art and its success is dependent on the designer but these guidelines can be of help.
The goal or the problem.
First, state clearly what you want to do. What do you want to achieve? how will a solution to your problem look like?
Representation
The gene for the problem is effectively a specific solution to your problem. The aim here for you is to try to represent a single solution in code: it can be as simple as a character string or a list or can be as complex as a class. If this is designed first, you can easily build a population of these solutions.
The fitness test.
Here you design a method that can be used to evaluate each solution to see how correctly it matches up to your goal.
Reproduction
Here you define how new offspring are produced: There are already existing algorithms you can implement. Don’t forget to define both the crossover and mutation methods.
Lastly, define a stopping criterion.
When does your algorithm stop running? It can be when it has produced a particular number of generations, or when the average fitness of a population reaches a particular threshold.
The concept of the GA is usually implemented in solving problems where an optimal solution is needed but a traditional searching algorithm would be computationally infeasible or too complex to design.
Credits to Daniel Shiffman whose book The Nature of Code I believe can be very helpful. You can check his Youtube channel where he makes programming an exciting journey
If you’re looking for some examples on the GA implementation. You can check out this git repo where I implemented (rather crudely in Java) the GA to solve the toy problem of finding the 3 numbers that sum up to 100
Cover Image by (Gerd Altmann from Pixabay)