
GAN

by Elvira Siegel
(Published: Sat Oct 24, 2020)

Generative Adversarial Networks

(GANs) are a type of unsupervised neural network. They were introduced in 2014 by Ian Goodfellow and colleagues.


Give a GAN some training samples and it will gradually learn to generate new data samples with the same characteristics as the training samples. For example, such a network can generate photographs of people who have never existed.



GAN generated faces


Or it can generate new anime characters:


GAN generated anime faces


Or even new designs for your bedroom:


GAN generated bedroom design


The "magic" of GANs lies in their two main network components: a Generator and a Discriminator. Let's say we want to produce new pictures of cats that never existed. We train the GAN on real pictures of real cats, and the Generator produces some output. In the very first steps this output will probably look nothing like a cat, but with time our Generator will get better and produce more and more realistic cat pictures. How does it do this? Well, the Generator improves by constantly receiving feedback from its most severe critic: the Discriminator. The Discriminator's task is to detect which of the data it receives is real and which is fake. As everything that comes from the Generator, no matter how good or bad, is not real, the Discriminator will try to identify everything coming from the Generator as fake.

So, the Generator tries to fool the Discriminator and the Discriminator tries to not get fooled.

This is a very general idea of the GAN structure. But now let's dive deeper ...



Generative in GANs

There are two general classes in statistical modeling: 1. generative models and 2. discriminative models. A major difference between them is that a generative model is able to generate new photos of people, cats, cars, etc. that look like real people, cats, cars, etc. A discriminative model, on the other hand, is only able to distinguish one object from another, for example to separate a cat from a dog. Along with Naive Bayes and HMMs (Hidden Markov Models), GANs belong to the family of generative models, as GANs generate completely new outputs.



generative probability -> joint probability



discriminative probability -> conditional probability
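To make the two captions concrete, here is a minimal sketch in Python. The tiny dataset, its feature names and its labels are invented purely for illustration. It estimates the joint probability P(x, y) that a generative model works with, and the conditional probability P(y | x) that is all a discriminative model needs.

```python
from collections import Counter

# Hypothetical toy data: (feature, label) pairs, invented for illustration.
samples = [
    ("whiskers", "cat"), ("whiskers", "cat"), ("whiskers", "dog"),
    ("floppy_ears", "dog"), ("floppy_ears", "dog"), ("floppy_ears", "cat"),
    ("whiskers", "cat"), ("floppy_ears", "dog"),
]

counts = Counter(samples)
total = len(samples)

def joint(x, y):
    """P(x, y): how likely is this feature/label pair in the data?"""
    return counts[(x, y)] / total

def conditional(y, x):
    """P(y | x): given the feature, how likely is the label?"""
    p_x = sum(c for (xi, _), c in counts.items() if xi == x) / total
    return joint(x, y) / p_x

print(joint("whiskers", "cat"))        # joint probability of the pair
print(conditional("cat", "whiskers"))  # probability of the label given the feature
```

The joint probabilities over all pairs sum to 1, which is what lets a generative model sample whole (feature, label) pairs. The conditional probability only answers the question "which label?" for a feature that is already given.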


A generative model is normally harder to learn, as it not only has to distinguish between objects but also has to find correlations like: if there is a car, then there is probably a road. A discriminative model, in contrast, only has to tell whether it is a car or a dog. Hence a discriminative model draws a boundary line between two classes, while a generative model models where those elements are located in the data space.



generative model


discriminative model


Generative models are harder to work with because they try to model how the data itself is distributed in the data space.



Structure of GANs

A GAN consists of two crucial parts: a Generator and a Discriminator. Both of them are neural networks. The Generator network learns to generate realistic outputs. The Discriminator network learns to detect fake data which comes from the Generator. The Generator relies on the feedback from the Discriminator, as the Generator needs to know whether what it produced was a good fake (= the Discriminator could not detect it as a fake) or a bad fake (= the Discriminator could detect that it was a fake).


[Figure: GAN structure]

Discriminator

The Discriminator is a classifier neural network (e.g. CNN for image data). It receives two types of inputs: one is real data and the other one is the fake data produced by the Generator.

As we can see in the picture above, the Discriminator is connected to two loss functions. While the Discriminator is training, it ignores the Generator Loss and only uses the Discriminator Loss. When the Discriminator trains, it:


  1. classifies data into real and fake
  2. passes the results to its Discriminator Loss
  3. receives a penalty from the Discriminator Loss for wrong classifications
  4. adjusts its weights with backpropagation based on the Discriminator Loss results
  5. repeats from 1
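The steps above can be sketched in a few lines of plain Python. This is only an illustration under strong assumptions: the "Discriminator" is a single logistic unit D(x) = sigmoid(w*x + b) on one-dimensional data, the real and fake samples are random numbers standing in for real data and Generator output, and the names `discriminator_step`, `w`, `b` and `lr` are made up for this example.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def discriminator_step(w, b, real, fake, lr=0.1):
    """One training step for a toy Discriminator D(x) = sigmoid(w*x + b)."""
    labeled = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    n = len(labeled)
    grad_w = grad_b = loss = 0.0
    for x, label in labeled:
        p = sigmoid(w * x + b)                    # 1. classify: P(x is real)
        p = min(max(p, 1e-12), 1.0 - 1e-12)       # keep log() finite
        # 2.-3. the Discriminator Loss (binary cross-entropy) penalizes wrong answers
        loss -= (label * math.log(p) + (1.0 - label) * math.log(1.0 - p)) / n
        # gradient of that loss with respect to w and b
        grad_w += (p - label) * x / n
        grad_b += (p - label) / n
    # 4. adjust the weights against the gradient
    return w - lr * grad_w, b - lr * grad_b, loss

random.seed(0)
real = [random.gauss(4.0, 0.5) for _ in range(64)]   # stand-in for real data
fake = [random.gauss(0.0, 1.0) for _ in range(64)]   # stand-in for Generator output

w, b = 0.0, 0.0
losses = []
for _ in range(50):                                  # 5. repeat from 1
    w, b, loss = discriminator_step(w, b, real, fake)
    losses.append(loss)
print(losses[0], losses[-1])
```

The fake samples are passed in as plain data, so nothing about the Generator is changed during this phase. In a real GAN the Discriminator would be a deep network and the gradients would come from a framework's automatic differentiation rather than from hand-written formulas.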

Generator

A Generator is a neural network that produces new data instances. What should its input be, so that our Generator can output completely new data instances? We give random noise as input to the Generator. Why random noise? Well, the Generator has more freedom to combine learned features into brand new samples than when we give it predefined data with existing features in it. With noise as input, the Generator samples from different places in the target distribution and by doing so is able to output a broad variety of new data.

Like the Discriminator, the Generator has its own training phase. In its training the Generator:

  1. performs random noise sampling
  2. creates some output from the sampled noise
  3. gives this output to the Discriminator
  4. receives the Discriminator feedback: "real" or "fake"
  5. computes the Generator Loss from the Discriminator classification
  6. backpropagates through the Discriminator and the Generator to acquire gradients
  7. updates only the Generator weights using these gradients
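As a rough sketch of these steps, the following plain Python example trains a toy Generator G(z) = a*z + c against a fixed toy Discriminator D(x) = sigmoid(w*x + b). Both models, the parameter names and the starting values are assumptions made for this illustration. The loss used is the log(1 - D(G(z))) term, which the Generator tries to minimize.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator_step(a, c, w, b, batch_size=64, lr=0.1):
    """One training step for a toy Generator G(z) = a*z + c.

    The Discriminator D(x) = sigmoid(w*x + b) is used but not updated.
    """
    grad_a = grad_c = loss = 0.0
    for _ in range(batch_size):
        z = random.gauss(0.0, 1.0)        # 1. sample random noise
        g = a * z + c                     # 2. create output from the noise
        p = sigmoid(w * g + b)            # 3.-4. Discriminator: P(g is real)
        # 5. compute the loss from the Discriminator's classification
        loss += math.log(1.0 - p) / batch_size
        # 6. backpropagate through the Discriminator, then the Generator
        dloss_dg = -p * w                 # d/dg of log(1 - sigmoid(w*g + b))
        grad_a += dloss_dg * z / batch_size
        grad_c += dloss_dg / batch_size
    # 7. update only the Generator weights; w and b stay untouched
    return a - lr * grad_a, c - lr * grad_c, loss

random.seed(0)
w, b = 1.0, -2.0      # hypothetical fixed Discriminator that favors larger x
a, c = 1.0, 0.0       # hypothetical initial Generator weights
a_new, c_new, loss = generator_step(a, c, w, b)
print(c_new > c)      # the Generator shifts its output toward what D calls "real"
```

Note that the gradient flows through the Discriminator (the factor `w` in `dloss_dg`), but only `a` and `c` are changed.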

The Generator-Discriminator-Tandem

How are the two networks in a GAN trained? They are trained in alternation. First the Discriminator trains for one epoch. After that the Generator comes in and trains for one (or even more) epochs. Then we keep alternating between training the Discriminator and the Generator.

It's important to keep the Generator unchanged while the Discriminator trains. The Discriminator has to have a chance to recognize current flaws in the Generator and improve itself by weight update.

The Discriminator has to be kept constant as well while the Generator trains. Otherwise the Generator would constantly lag behind the Discriminator and never have a chance to fool it.

Thanks to the training in alternating steps, a GAN can solve otherwise intractable generative problems. We start with a simple classification task for the Discriminator. As training progresses, the Discriminator should find it increasingly hard to detect generated data as fake.
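The alternation can be sketched as a loop with two phases. The example below is a toy under stated assumptions, not a realistic GAN: the data is one-dimensional, the Discriminator is D(x) = sigmoid(w*x + b), the Generator is G(z) = a*z + c, a fixed number of steps per phase stands in for an epoch, and all names and hyperparameters are invented for illustration.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def d_grads(d, g, real, noise):
    """Gradients of the Discriminator loss w.r.t. the Discriminator weights (w, b)."""
    w, b = d
    a, c = g
    gw = gb = 0.0
    n = len(real) + len(noise)
    for x in real:                       # real samples should get D(x) -> 1
        p = sigmoid(w * x + b)
        gw += (p - 1.0) * x / n
        gb += (p - 1.0) / n
    for z in noise:                      # generated samples should get D(G(z)) -> 0
        x = a * z + c
        p = sigmoid(w * x + b)
        gw += p * x / n
        gb += p / n
    return gw, gb

def g_grads(d, g, noise):
    """Gradients of log(1 - D(G(z))) w.r.t. the Generator weights (a, c)."""
    w, b = d
    a, c = g
    ga = gc = 0.0
    for z in noise:
        p = sigmoid(w * (a * z + c) + b)
        ga += -p * w * z / len(noise)
        gc += -p * w / len(noise)
    return ga, gc

def train(rounds=20, d_steps=5, g_steps=5, lr=0.05, batch=64, seed=0):
    rng = random.Random(seed)
    d = (0.0, 0.0)    # Discriminator weights (w, b)
    g = (1.0, 0.0)    # Generator weights (a, c)
    history = []
    for _ in range(rounds):
        for _ in range(d_steps):         # Discriminator phase: g is only read
            real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
            gw, gb = d_grads(d, g, real, noise)
            d = (d[0] - lr * gw, d[1] - lr * gb)
        for _ in range(g_steps):         # Generator phase: d is only read
            noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
            ga, gc = g_grads(d, g, noise)
            g = (g[0] - lr * ga, g[1] - lr * gc)
        history.append((d, g))
    return history

history = train()
print(history[-1])
```

Each phase only reads the other network's weights, which is the "keep it unchanged" rule from the text expressed in code. Whether such a loop actually converges depends heavily on the models and hyperparameters.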


GAN convergence is hard to identify

When the Generator improves its output with training, the Discriminator's accuracy gets worse, as it is no longer able to say whether the input image is real or fake. If we have trained a "perfect generator" which produces "perfect fakes", the accuracy of the Discriminator will be around 50%. With this accuracy, the Discriminator basically gives random answers, so its feedback no longer carries any information. With random classification results based on nothing meaningful, the Generator cannot learn relevant features and incorporates flaws into its system, as it trains on useless feedback from the Discriminator.
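The 50% figure can be made precise with a standard result from the original GAN paper, given here only as a short sketch. The symbols p_data (the distribution of the real data) and p_g (the distribution of the Generator's output) are notation introduced for this note. For a fixed Generator, the best possible Discriminator is:

```latex
D^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}
\qquad\Longrightarrow\qquad
D^{*}(x) = \frac{1}{2} \quad \text{if } p_g = p_{data}
```

So when the Generator reproduces the real distribution exactly, even the best Discriminator can only output 1/2 for every input, which corresponds to guessing.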

Convergence in GANs is not stable. We have to find the "sweet spot" where the Discriminator's accuracy is just about to reach 50% but the Discriminator can still give valuable feedback to the Generator.


The GAN Loss

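$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$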

The Generator and the Discriminator have opposite goals for the loss function presented above. The Generator tries to minimize it and the Discriminator tries to maximize it, hence the "minimax game". The Discriminator's main goal is to push D(x) toward 1 for real data: ideally it wants to classify every sample it receives correctly as real or fake. The Discriminator therefore also wants D(G(z)) to be 0 for fake data; in other words, it wants to output a probability of 0 that the generated input G(z) is real.

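The log terms work as follows: log(D(x)) = 0 if D(x) = 1, as log(1) is always 0 (this is because any number raised to the power of 0 equals 1). If D(x) < 1, log(D(x)) is negative. So the Discriminator prefers values of D(x) close to 1 to maximize the loss.

By analogy, log(1 - D(G(z))) is 0 if D(G(z)) = 0 and negative if D(G(z)) > 0. So the Discriminator prefers values of D(G(z)) close to 0, while the Generator, which wants to minimize the loss, prefers values of D(G(z)) close to 1.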
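A few numbers make the behavior of the two log terms easier to see. The sketch below evaluates log(D(x)) + log(1 - D(G(z))) for one real and one generated sample. The function name `value` and the example probabilities are chosen for illustration only.

```python
import math

def value(d_real, d_fake):
    """Per-sample minimax value: log(D(x)) + log(1 - D(G(z)))."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# Discriminator outputs are probabilities strictly between 0 and 1.
confident_and_right = value(d_real=0.99, d_fake=0.01)  # D separates real and fake
undecided = value(d_real=0.5, d_fake=0.5)              # D is guessing
fooled = value(d_real=0.5, d_fake=0.99)                # D believes the fake is real

print(confident_and_right, undecided, fooled)
```

The value is always negative and gets closest to 0 when the Discriminator is confident and right, which is what the Discriminator aims for. It becomes strongly negative when the Discriminator is fooled, which is what the Generator aims for.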

This loss comes from the cross-entropy loss, which measures the discrepancy between the real and "fake" distributions. The cross-entropy loss is closely related to the Kullback–Leibler divergence between the target and predicted distributions. Log Loss is the binary case of cross-entropy, which extends it to multi-class classification problems, and Log Loss itself comes from logistic regression.
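The relation between cross-entropy and the Kullback–Leibler divergence can be checked numerically: cross-entropy equals the entropy of the target distribution plus the KL divergence from target to prediction. The two distributions below are invented for illustration.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

target = [0.7, 0.2, 0.1]      # hypothetical target distribution
predicted = [0.5, 0.3, 0.2]   # hypothetical predicted distribution

lhs = cross_entropy(target, predicted)
rhs = entropy(target) + kl_divergence(target, predicted)
print(lhs, rhs)               # both sides agree up to rounding
```

Since the entropy of the target does not depend on the prediction, minimizing the cross-entropy with respect to the prediction is the same as minimizing the KL divergence.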


Nothing is too easy

As GANs are relatively new, there are still things to improve. One of the main challenges in training GANs is the Vanishing Gradient Problem.

At the very beginning the Discriminator's job is fairly easy: the Generator produces a lot of nonsense, so the Discriminator can effortlessly tell fake from real. If this happens too often, the Generator receives no valuable feedback. The gradients (the feedback) are based on near-perfect predictions of the Discriminator and become very small, so the Generator gets no meaningful weight update and does not know which features it should incorporate for further successful learning.

As a solution we may change the Generator's main goal from minimizing the complete MiniMax loss function, i.e. minimizing log(1 - D(G(z))), to maximizing log(D(G(z))).
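Why the changed goal helps can be seen from the gradients. The sketch below assumes that the Discriminator ends in a sigmoid, so that D(G(z)) = sigmoid(s) for some raw score s, and takes the alternative goal to be maximizing log(D(G(z))). The function names and the value of s are made up for this example. It compares how much feedback each goal passes back when the Discriminator is almost sure a sample is fake.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def grad_minimax(s):
    """d/ds of log(1 - sigmoid(s)), the term the Generator minimizes."""
    return -sigmoid(s)

def grad_alternative(s):
    """d/ds of -log(sigmoid(s)), i.e. maximizing log(D(G(z)))."""
    return -(1.0 - sigmoid(s))

s = -6.0   # hypothetical score: the Discriminator is almost sure the sample is fake
print(sigmoid(s))            # D(G(z)) is close to 0
print(grad_minimax(s))       # tiny gradient: almost no feedback
print(grad_alternative(s))   # gradient close to -1: useful feedback
```

With the original goal the gradient shrinks toward 0 exactly when the Generator is doing badly. With the alternative goal it stays large in that situation, so the Generator keeps receiving a usable signal early in training.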


Conclusion

The Discriminator and the Generator "compete" with each other in a "game" called minimax. The Discriminator tries to classify as accurately as possible whether the input it receives is real or fake. The Generator tries to output samples that look exactly like real samples. With more training steps the Discriminator gets better at detecting fake samples coming from the Generator, and the Generator gets better at producing samples which are harder to detect as fake. If the GAN system is balanced, the Discriminator outputs a probability of 50% for both real and fake, which means the Generator is doing a great job. After this the Generator can be used on its own to create meaningful samples which are based on the original training data but are still brand new.

GANs are a revolutionary idea for a new neural network structure. They are here to stay and will be developed further.


Further recommended readings:

The original GAN Paper

Progressive Growing of GANs for Improved Quality, Stability, and Variation

Towards the Automatic Anime Characters Creation with Generative Adversarial Networks

Copyright © 2020 by Richard Siegel at siegel.work