In this post, I use Generative Adversarial Networks (GANs) to generate images based on three training sets. All three training sets are image-based and happen to feature tobacco products. As an avid cigar smoker who works out of cigar shops quite a bit, it’s not difficult to see where the inspiration came from.
Since there are many excellent resources that provide a good general introduction to the theory of GANs, I will refrain from going into depth on the topic. To keep things simple, the general idea is to create two neural networks, one called a Generator and the other referred to as a Discriminator. The Generator creates fakes. The Discriminator passes judgment on whether the output of the Generator is a legitimate example of the class, by learning the training data’s distribution (think of it as a model) and judging images against that distribution.
The two neural networks run in lock-step for a predetermined number of iterations (epochs). Over time, as the Generator gets better at producing fakes, the Discriminator gets better at detecting them. The goal, at the end of the process, is to have a Generator that can fake out the Discriminator every time. This whole process begins with some random noise being fed into the Generator, which makes its initial adjustments based on feedback from the Discriminator.
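To make the lock-step training loop concrete, here is a minimal sketch of the alternating updates. This is not the code behind this post’s images: it is a toy 1-D example (real “data” is just samples from a Gaussian) with a hypothetical linear generator and logistic discriminator, and all names and hyperparameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# "Real data": samples from N(4, 0.5) -- a stand-in for the training images.
def sample_real(n):
    return rng.normal(4.0, 0.5, n)

# Generator g(z) = a*z + b, initialized far from the real distribution.
g_params = np.array([1.0, 0.0])        # [a, b]
# Discriminator D(x) = sigmoid(w1*x + w2*x^2 + c).
d_params = np.array([0.0, 0.0, 0.0])   # [w1, w2, c]

def generate(z):
    return g_params[0] * z + g_params[1]

def features(x):
    return np.stack([x, x**2, np.ones_like(x)], axis=-1)

def discriminate(x):
    return sigmoid(features(x) @ d_params)

lr_d, lr_g = 0.05, 0.05
for step in range(2000):
    # --- Discriminator step: push D(real) toward 1, D(fake) toward 0 ---
    x = sample_real(64)
    g = generate(rng.normal(size=64))
    d_real, d_fake = discriminate(x), discriminate(g)
    grad_d = ((1 - d_real)[:, None] * features(x)).mean(0) \
           - (d_fake[:, None] * features(g)).mean(0)
    d_params += lr_d * grad_d          # gradient ascent on the D objective

    # --- Generator step: push D(fake) toward 1 (non-saturating loss) ---
    z = rng.normal(size=64)
    g = generate(z)
    d_fake = discriminate(g)
    ds_dg = d_params[0] + 2 * d_params[1] * g      # chain rule through D
    grad_g = ((1 - d_fake) * ds_dg)[:, None] * np.stack([z, np.ones_like(z)], -1)
    g_params += lr_g * grad_g.mean(0)

print("mean of generated samples:", generate(rng.normal(size=1000)).mean())
```

The structure, not the scale, is the point: each iteration updates the Discriminator on a batch of real and fake samples, then updates the Generator using the Discriminator’s feedback, exactly the lock-step described above.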
Now that we’ve done a “quick and dirty” technical explanation of the theory behind GANs, let’s approach it from another angle. When explaining this project to non-technical types, I’ve been using the following example. I tell them to imagine the AI as a 12-year-old with artistic talent who has never seen a cigar before. I am basically giving the young artist a set of colored pencils, showing them some examples of what a lit cigar looks like, and asking them to draw some new images of cigars, ideally without directly copying the originals.
Generated Lit Cigars
To generate our lit-cigar image matrix, we started with 85 images of cigars that were either already lit or in the process of being lit. We then trained the neural network for 100K epochs.
Some of the images are quite abstract. Others, especially those which clearly contain images of faces, are close renditions of original photos from the training set.
Generated Cigar Boxes
In generating our cigar box image matrix, we started with 105 images of cigar boxes, and again trained our GAN for 100K epochs. This set of output images is interesting in that the results are more consistent, which is very much a direct reflection of the similarity of the training images.
With our lit cigars example, we had training images with and without faces in the image. That much variance adds a fair amount of complexity to the problem of figuring out exactly what passes for a legitimate image of a lit cigar. Among the generated cigar boxes, my favorite would have to be one of the two “Eye of Sauron” boxes in the lower-middle section of the display.
Generated Pipes
To generate our pipe matrix, we started with 112 images, and this time we upped the amount of training to 150K epochs. This set of images was especially interesting in terms of its mixture of photo-realism and abstraction. My favorite generated pipes can all be found in the last three rows of this matrix. From left to right, they are the “Octo-pipe” (with three stems), the one with two bowls, and the pipe that appears to have a “crystal ball” affixed to the bottom of the bowl.
There are a couple of images that appear to be virtually exact copies of images from the training set. In thinking about these outputs, while personifying the AI, I wondered if the system could be thought of as conceptualizing these examples as “archetypical” pipes. In other words, as the most “pipe-like” of all images in the training set. It’s quite fitting that the pipe example made me think of archetypes, considering that the person most responsible for popularizing the term, Carl Jung, was an avid pipe smoker. How’s that for synchronicity!
Even though GANs are only a couple of years old, it’s clear that Generative Adversarial Network art is now a thing, and it’s here to stay! The idea for this post was inspired by Fast.ai Lesson 12 (2018 edition). The associated WGAN.ipynb file from the Fast.ai Git Repo is an excellent way to get started with Wasserstein GANs. If you enjoyed the post, come continue the conversation on Hacker News or Reddit.
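For readers curious what the Wasserstein variant changes: instead of a Discriminator outputting a probability, a WGAN trains a “critic” that scores realness directly, with losses that are plain means of those scores, and (in the original formulation) enforces a Lipschitz constraint by clipping the critic’s weights. A minimal sketch of those pieces, with hypothetical function names, not the Fast.ai notebook’s code:

```python
import numpy as np

def critic_loss(scores_real, scores_fake):
    # The critic maximizes E[f(real)] - E[f(fake)];
    # written here as a loss to minimize.
    return -(scores_real.mean() - scores_fake.mean())

def generator_loss(scores_fake):
    # The generator tries to raise the critic's score on fakes.
    return -scores_fake.mean()

def clip_weights(w, c=0.01):
    # Original WGAN recipe: clip each critic weight into [-c, c]
    # after every critic update, as a crude Lipschitz constraint.
    return np.clip(w, -c, c)
```

Note there is no sigmoid and no log anywhere: that change is much of what makes WGAN training more stable than the vanilla setup described earlier.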