Wednesday, April 12, 2017

Thurs. April 13: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015.

18 comments:

  1. This paper presents a novel architecture for computer vision called the deep convolutional generative adversarial network (DCGAN). DCGANs appear to be a good tool for unsupervised learning because they extract features in both the generator and the discriminator. These models are not perfect, though: GANs are notoriously difficult to train, often yielding generators that produce nonsensical outputs. This paper pushes GANs forward in three ways: it introduces a set of architectural constraints that help with training, it reuses the trained discriminator and generator for classification, and it visualizes filters to show what features are extracted.
    To help GAN training converge, the paper applies batch normalization, uses ReLU activations, and eliminates fully connected layers; a rough sketch of the resulting generator follows.
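
    A minimal PyTorch sketch of such a generator (the framework and layer sizes are illustrative choices, not the paper's exact configuration):

        import torch
        import torch.nn as nn

        class Generator(nn.Module):
            def __init__(self, z_dim=100):
                super().__init__()
                self.net = nn.Sequential(
                    # project z to a 4x4 feature map, then upsample with
                    # fractionally-strided (transposed) convolutions
                    nn.ConvTranspose2d(z_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),
                    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
                    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
                    nn.ConvTranspose2d(128, 3, 4, 2, 1),
                    nn.Tanh(),  # output in [-1, 1]; no batch norm on the output layer
                )

            def forward(self, z):
                return self.net(z.view(z.size(0), -1, 1, 1))

        g = Generator()
        fake = g(torch.randn(8, 100))  # -> (8, 3, 32, 32)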

    Do DCGANs work on non-image machine learning applications like NLP?

    -Sam Woolf

  2. This paper presents a network architecture for learning low-dimensional feature representations of images via deep convolutional generative adversarial networks. Their model is all-convolutional, with no fully connected layers, and uses batch normalization and leaky ReLU. They use no data augmentation, and they remove duplicate training images with an autoencoder-based hash, in an effort to avoid overfitting to (memorizing) the training set. The generative part of the adversarial network starts from a low-dimensional Z space; this becomes the low-dimensional representation of the images.

    To empirically validate their method, they learn features on ImageNet-1K and then use those features with a linear SVM on CIFAR-10 (sketched below). This performs close to the state of the art among unsupervised techniques. They also test on the StreetView House Numbers (SVHN) dataset, where it performs similarly well. Then they do a qualitative investigation, showing that walks in the latent space generate reasonable images, and they visualize the activations of different filters.
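
    A rough sketch of that evaluation pipeline (illustrative only: the random arrays below stand in for the discriminator's convolutional activations, which the paper max-pools to a 4x4 grid per layer, flattens, and concatenates):

        import numpy as np
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(0)
        # stand-ins for pooled discriminator features of CIFAR-10 images
        features_train = rng.normal(size=(1000, 512))
        labels_train = rng.integers(0, 10, size=1000)
        features_test = rng.normal(size=(200, 512))
        labels_test = rng.integers(0, 10, size=200)

        # fit a linear SVM on the fixed (not fine-tuned) features
        clf = LinearSVC(C=1.0).fit(features_train, labels_train)
        print("accuracy:", clf.score(features_test, labels_test))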

    Question:

    How does the size of the low-dimensional space compare to the previous paper we read on using autoencoders for learning representations?

  3. This paper presented the first mostly-stable convolutional GAN. The authors use recent ideas from the supervised CNN side of deep learning to stabilize network training. They then empirically validate their generative model in a variety of ways, for example by using features from their discriminator in a classification task and by validating the semantic meaning of their latent space through vector arithmetic on faces.

    Question:
    The authors present some guidelines for stable DCGAN training. Why and how do these factors (strided convolutions, batch norm, global average pooling) contribute to stable DCGAN training?

    Alex Tong

  4. Sam Burck:

    The authors describe a new type of CNN known as the deep convolutional generative adversarial network, or DCGAN. The authors demonstrate that both the generator and discriminator learn a hierarchy of representations, from object parts up to scenes. DCGANs are distinctive in that they are fully convolutional, use batch norm in both the generator and discriminator, use ReLUs for most layers of the generator, and use leaky ReLUs for all discriminator layers (a sketch of such a discriminator follows). The authors apply DCGANs to many different tasks and show in the end that DCGANs learn representations good enough for both supervised learning and generative modeling. Instability remains an issue in some instances.
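
    A rough PyTorch sketch of such a discriminator (the sizes are my own illustrative choices): strided convolutions instead of pooling, batch norm on the middle layers, and leaky ReLU (slope 0.2) throughout:

        import torch
        import torch.nn as nn

        disc = nn.Sequential(
            nn.Conv2d(3, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),                            # 32 -> 16
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),     # 16 -> 8
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True),     # 8 -> 4
            nn.Conv2d(512, 1, 4, 1, 0),  # 4 -> 1: a single real/fake logit
            nn.Flatten(),
        )
        logits = disc(torch.randn(8, 3, 32, 32))  # -> (8, 1)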

    Question: Why do leaky ReLUs work well in the discriminator specifically?

  5. The authors introduce a new class of CNNs called deep convolutional generative adversarial networks (DCGANs). They train them to learn reusable feature representations from large unlabeled datasets. The method is to train a generative adversarial network (GAN) and then reuse parts of the generator and discriminator networks as feature extractors for supervised learning. The authors present a model that improves on previous work in that the GANs are stable to train over a wider range of input types. The improvements include replacing layers like max-pooling with strided convolutions, removing the fully connected layers on top of the convolutional features, applying batch norm (but selectively), and using ReLU and leaky ReLU.

    They tested their work by learning features on ImageNet and using them on CIFAR-10 and on street-address digits (SVHN); the performance is competitive. They also visualized some of the random and trained filters. Finally, they demonstrated arithmetic on visual concepts, which worked surprisingly well (e.g., man with glasses - man without glasses + woman without glasses produced images of a woman with glasses; a rough sketch of this follows).
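
    A hedged sketch of that Z-space arithmetic (the trained generator g and the hand-picked z vectors are assumed; random vectors stand in for them here):

        import torch

        def avg_z(z_samples):
            # averaging ~3 exemplar z vectors per concept smooths out the
            # idiosyncrasies of any single sample before the arithmetic
            return torch.stack(z_samples).mean(dim=0)

        # stand-ins for z vectors whose generated faces showed each concept
        z_man_glasses = avg_z([torch.randn(100) for _ in range(3)])
        z_man_plain   = avg_z([torch.randn(100) for _ in range(3)])
        z_woman_plain = avg_z([torch.randn(100) for _ in range(3)])

        z_result = z_man_glasses - z_man_plain + z_woman_plain
        # image = g(z_result.view(1, -1, 1, 1))  # decode: woman with glasses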

    Discussion - For the arithmetic on the Z vectors, they say it only worked when averaging three samples per concept - why is that? The samples are all generated, so how does creating a few and averaging them help?
    Also - those faces are creepy.

  6. This paper proposes a way to build good image representations by training generative adversarial networks and then reusing the generator and discriminator networks as feature extractors, which is very useful for unsupervised image representations. It makes GANs more stable via the deep convolutional GAN architecture, uses the trained discriminator for image classification, and shows that the generator has vector arithmetic properties. To scale GANs, it adopts three recently demonstrated changes to CNN architectures: first, letting the generator and discriminator learn their own spatial up- and downsampling via strided convolutions; second, eliminating fully connected layers on top of convolutional features; and third, using batch normalization to stabilize learning.

    Discussion: How can the generator be controlled so that it does not output certain objects?

  7. The paper presents an architecture for unsupervised learning of image representations in a low-dimensional space using deep convolutional generative adversarial networks. They use strided convolutions, batch normalization, and ReLU, leaky ReLU, and tanh activations (ReLU + tanh in the generator, leaky ReLU in the discriminator). They use no pooling layers or fully connected layers. They evaluate the generative network qualitatively, and evaluate the discriminative network by feeding its convolutional features into an SVM classifier and comparing classification accuracy against supervised approaches.

    Why do the authors avoid using data augmentation?

  8. This is a cool paper; they build on the earlier generative adversarial network work and demonstrate that the latent space they learn is meaningful. The images they generated - bedrooms, interpolations between bedrooms, faces added and subtracted - are awesome. They also clearly state the upgrades they made (batchnorm, strided convolutions, etc.), which was nice.

    But I find the whole generator/discriminator paradigm confusing. The generator takes an input in the latent space Z and upsamples it via fractionally-strided convolutions to generate the image, and the discriminator is only relevant during training? It'd be great if we could get a quick overview of how the generative adversarial nets are trained.
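
    A minimal, self-contained sketch of that training loop (a toy simplification with made-up network sizes, not the paper's code; the Adam settings do match the paper's lr=0.0002, beta1=0.5):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        g = nn.Sequential(nn.ConvTranspose2d(100, 64, 4, 1, 0), nn.BatchNorm2d(64), nn.ReLU(True),
                          nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())   # z -> toy 8x8 images
        d = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
                          nn.Conv2d(64, 1, 4, 1, 0), nn.Flatten())         # image -> real/fake logit

        opt_g = torch.optim.Adam(g.parameters(), lr=2e-4, betas=(0.5, 0.999))
        opt_d = torch.optim.Adam(d.parameters(), lr=2e-4, betas=(0.5, 0.999))

        for _ in range(100):                         # stand-in for real data batches
            real = torch.rand(16, 3, 8, 8) * 2 - 1   # scaled to [-1, 1] like tanh output
            fake = g(torch.randn(16, 100, 1, 1))

            # discriminator step: push real -> 1, fake -> 0
            real_logits, fake_logits = d(real), d(fake.detach())
            loss_d = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
                      F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # generator step: make d classify fakes as real; only here do
            # gradients flow back into g. After training, d can be discarded.
            gen_logits = d(fake)
            loss_g = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()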

  9. This paper describes several experiments involving deep convolutional generative adversarial networks, including the model modifications they use to train their generator and discriminator in an unsupervised, stable way. These modifications depart from typical convolutional networks in that they use strided convolutions rather than pooling layers, apply batch normalization to most layers (but not the generator output or the discriminator input), and get rid of fully connected hidden layers altogether. They also use ReLU in the generator and LeakyReLU in the discriminator, and they note that additional features contributing to stability are a tanh activation on the output of the generator and a global pooling layer at the discriminator output. The result is a network capable of generating realistic-looking images, good enough to investigate the representation space meaningfully. Some of the more interesting examples of their work in this space include selectively discouraging the generation of windows by dropping the feature activations that fire on window bounding boxes, and doing representational arithmetic to probe the underlying semantic representations of these images.
    For discussion, I’m mostly just confused about how the de-duplication process works. Would it be correct to say that it essentially discourages memorization of input images by removing near-identical examples, detected in the code space of the training data?

    Ben Papp

  10. Xinmeng’s Summary:

    This paper presents an unsupervised learning method for convolutional networks, called the deep convolutional generative adversarial network. The method relies on representations of object parts, at different levels of the hierarchy, in both the generator and the discriminator to accomplish unsupervised learning. The paper adapts and modifies the convolutional net by replacing spatial pooling with strided convolutions, eliminating fully connected layers on top of convolutional features, and adding batch normalization. As a result, the discriminator works well for image classification tasks, and the generator learns specific object representations for major scene components. The paper claims a more stable set of architectures for training generative adversarial networks and shows, via supervised learning, that the resulting image representations are good. However, the model has a weakness: with extended training it sometimes collapses a subset of filters to a single oscillating mode.

    Discussion: Is the model's weakness - collapsing a subset of filters - caused by the dataset?

  11. Summary by Jason Krone

    This paper proposes a family of architectures for generative adversarial networks (GANs) and explores the use of the GAN discriminator network as a feature extractor for linear models fit on supervised datasets. Regarding the architecture, the authors suggest replacing any pooling layers with strided convolutional layers in both the discriminator and generator, using batchnorm in both networks, using ReLU as the activation function in the generator, and using leaky ReLU as the activation function in the discriminator. The authors show that the discriminator network from the GAN can be successfully used as a feature extractor for another dataset, in this case CIFAR-10, and demonstrate that the resulting model obtains relatively high accuracy in comparison with other models. Lastly, they walk along the manifold of possible generator outputs and demonstrate that the model produces pictures with smooth transitions (sketched below), which implies that the model has not simply memorized specific representations.
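
    An illustrative sketch of that manifold walk (g stands for the trained generator):

        import torch

        z0, z1 = torch.randn(100), torch.randn(100)
        # decode evenly spaced points on the segment between z0 and z1;
        # smooth image transitions suggest no memorization of training data
        steps = [z0 + (z1 - z0) * t for t in torch.linspace(0, 1, 10)]
        # images = [g(z.view(1, 100, 1, 1)) for z in steps]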

    Questions:
    - Why does using the leaky ReLU improve the performance of the discriminator?
    - Why do they use a tanh activation on the last layer of the generator?
    - What is the intuition on why removing pooling layers is advantageous?

  12. The researchers introduce a new architecture called the deep convolutional generative adversarial network that is suitable for unsupervised learning. They draw on several recent changes to CNN architectures, such as the all-convolutional net, the trend toward removing fully connected layers, and batch normalization. To measure the capability of the DCGAN, the authors achieved a strong baseline performance in classification on the CIFAR-10 dataset. The researchers conclude by noting that forms of model instability appear when models are trained for longer.

    Question:
    In one of the experiments, the researchers attempted to completely remove windows from the generator's output, resulting in the generator replacing windows with other objects. Would this experiment still work if the researchers removed additional objects (i.e., windows and beds)?

  13. Nathan Watts' summary:
    This paper proposes a method for training convolutional neural networks in an unsupervised manner, known as a “deep convolutional generative adversarial network.” This method improves on and extends the established architecture of generative adversarial networks, making them much more stable and understandable. Like previous GANs, this model uses a generator-discriminator pair to learn a mapping from a compressed vector space back to the image space. The structure of the networks has been changed to fully convolutional, using strided convolutions in place of all pooling and removing all fully connected layers. A ReLU activation function is used in the generator, and a leaky ReLU in the discriminator. Additionally, batch normalization makes the network more stable and guards against poor initialization, preventing many network explosions and implosions. They also use an autoencoder-based hash to find and remove duplicate or very similar images from the training data (sketched below) to reduce overfitting.
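
    A hedged sketch of that de-duplication idea (semantic hashing; the architecture and threshold here are illustrative, not the paper's exact 3072-128-3072 denoising autoencoder):

        import torch
        import torch.nn as nn

        # encoder of a small autoencoder trained on 32x32 center crops
        # (decoder and denoising training loop omitted)
        enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())

        def dedup_hashes(image_batch):
            codes = enc(image_batch)            # (N, 128) nonnegative codes
            bits = (codes > 0).to(torch.uint8)  # binarize into a 128-bit hash
            return [tuple(b.tolist()) for b in bits]

        images = torch.rand(4, 3, 32, 32)
        hashes = dedup_hashes(images)
        # hash collisions flag likely (near-)duplicate images for removal
        has_duplicates = len(hashes) != len(set(hashes))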

    They found that this method not only generates very convincing images, but also can be used for other very interesting applications, such as feature extraction for classification, interpolation between images, and semantic arithmetic on images.
    Question: Why would they choose not to augment the data at all? Does deduplication act as a sort of de-augmentation?

  14. This paper proposes a convolutional GAN architecture that the authors call DCGAN. The proposed architecture is a fully convolutional network that uses batch normalization, ReLU (in the generator) and leaky ReLU (in the discriminator), and replaces pooling layers with strided convolutions. They further show that the discriminator of the DCGAN can be used independently as an image classifier, and that the generator has learned specific features of objects. They demonstrate the latter by performing vector arithmetic on averaged input vectors to the generator to generate faces with specific features (e.g., man with glasses - man without glasses + woman without glasses = woman with glasses).

    Discussion:
    In the conclusion, they say that training the model for too long causes some of the filters to collapse. Why does this happen, and has this issue been addressed since the publication of this paper?

  15. This paper suggests using deep convolutional generative adversarial nets. The authors propose that the combination of a generative and a discriminative model in deep architectures can learn features usable in supervised tasks, for example classification. The problem with GANs is that they suffer from instability. The paper's contribution is to improve the stability of such networks by changing some layers in the architecture, such as eliminating pooling layers. They also visualize some of the filters to investigate their contribution to the network.

    One of the improvements is using tanh instead of ReLU in the last layer of the generator. How do you think this helps stability? Could it just be a suppression of gradients that makes the net more stable?
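
    For context on the tanh question: tanh bounds the generator's output to [-1, 1], so the real training images are rescaled to that same range. An illustrative preprocessing snippet (assuming torchvision; my sketch, not the paper's code):

        import torchvision.transforms as T

        to_model_range = T.Compose([
            T.ToTensor(),                       # PIL image -> floats in [0, 1]
            T.Normalize([0.5] * 3, [0.5] * 3),  # [0, 1] -> [-1, 1], matching tanh
        ])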

  16. This paper presents a series of architectural decisions that help stabilize the output of generative adversarial networks. Using a combination of generative and discriminative methods, the authors present a network that produces qualitatively stable results, built from deconvolutions/fractionally-strided convolutions.

    In Zhou Bolei's guest lecture, he spoke about BN making networks less interpretable and said BN has a whitening effect on the convolutional units throughout a network. How does one reconcile this with the authors' decision to introduce BN? Does BN force the network to learn more abstract units, and does this relate to the stability of the generated images?

  17. This paper presents a technique to facilitate the use of convolutional neural networks for unsupervised tasks, dubbed deep convolutional generative adversarial networks (DCGANs). The novel contributions include additional constraints imposed on the GAN architecture, utilization of features from both the generator and discriminator networks, visualization of GAN filters, and insights into the generator that allow manipulation of generated samples. The DCGANs were trained on three datasets: LSUN, Imagenet-1k, and a faces dataset. The results were validated by applying the learned features to the CIFAR-10 dataset and comparing classification accuracy against the true labels. The results indicated that the DCGAN learned useful features, with accuracies similar to the baselines.

    Discussion:
    Can you go into detail on how the results were validated on CIFAR-10?

    -Jonathan Hohrath

  18. This paper proposes an improved architecture that is more stable than vanilla GANs. By using strided convolutions, batch normalization, and global pooling (fully convolutional), the authors show that the generative results are less noisy and more comprehensible than those of unconstrained generative networks.

    An interesting consequence of this improved GAN is that the discriminator learns general features while the generator learns to create good representations. This unsupervised feature-learning process is quite different from ordinary clustering or autoencoding.
