Artistic Painter Using VGG 19 Neural Network

VGG Artistic Painter Generated Output

VGG 19 Model and Artistic Style Painter

Goal: In this project, I am going to use the VGG 19-layer CNN deep neural network to learn and transfer artistic style from an artwork to a target photo. Fine details of the technique can be found in the paper A Neural Algorithm of Artistic Style.

VGG-19 model weights can be downloaded from the VGG group website. The group provides the parameters as a Caffe model and also as a MATLAB model for use with MatConvNet, which makes it possible to reconstruct the VGG-19 model in Python and, of course, TensorFlow. I will construct VGG-19 in TensorFlow and use it for both content reconstruction and style painting. Training was performed on a Lenovo W530 ThinkPad laptop with an Nvidia K1000M mobile graphics card.

Starting with an image of random noise, this deep-learning painter will merge content from a photo with style learned from an artwork by Claude Monet to create something wonderful. The final image is a Monet-style painting based on a real-life photo.

In [1]:
# TensorFlow CNN Tutorial
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import scipy.io
import scipy.misc
import PIL
from PIL import ImageOps, Image
import time
import sys
from six.moves import urllib

# Use CPU only, since GPU is occupied by CIFAR10
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE-DEVICES"]=""    # Change to '0' to use tf.device("/gpu:0")


# Setup Matplotlib and autoreload
%matplotlib inline
%load_ext autoreload
%autoreload 2
In [2]:
style_source = 'water_lily_monet.jpg'
simg = PIL.Image.open(style_source)
plt.imshow(simg)
plt.title('Water Lilies - Claude Monet')
plt.axis('off')
plt.show()
simg = np.asarray(simg, np.float32)
print "Original image size: ", simg.shape
Original image size:  (1076, 781, 3)
In [3]:
# Shrink images to speed up processing
HEIGHT = 400
WIDTH = 300

# Define a function to resize image
def resize_img(org_img_file, height, width):
    oimg = Image.open(org_img_file)
    rz_img = ImageOps.fit(oimg, (width, height), Image.ANTIALIAS)
    rz_img_np = np.asarray(rz_img, np.float32)
    return rz_img_np


s_img = resize_img(style_source, HEIGHT, WIDTH)
plt.title('Water Lilies - Claude Monet, Resized')
plt.axis('off')
a_simg = s_img.astype(np.uint8)
plt.imshow(a_simg)
plt.show()
print "Resized art image size : ", s_img.shape
Resized art image size :  (400, 300, 3)
In [4]:
target_source = 'water_lily_photo.jpg'
t_img = resize_img(target_source, HEIGHT, WIDTH)
plt.title('Water Lily Photo, Resized')
plt.axis('off')
a_t_img = t_img.astype(np.uint8)
plt.imshow(a_t_img)
plt.show()
print "Resized photo size : ", t_img.shape
Resized photo size :  (400, 300, 3)
In [5]:
# Download VGG-19 model
vgg_model_url = 'http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat'
#vgg_model_file = 'imagenet-vgg-verydeep-19.mat'
expected_bytes = 534904783

# To Do: add auto retry with delay if network connection error found
def may_need_download(url_link, dest_directory, expected_bytes):
    """
    Download files if not available locally
    Args:
         url_link: the target file for downloading
         dest_directory: local directory to save downloaded file
         expected_bytes: expected file size, used to make sure downloading is correct
    Return:
         filenpath
         """
    if not os.path.exists(dest_directory):
        os.makedirs(dest_directory)
    filename = url_link.split('/')[-1]
    filepath = os.path.join(dest_directory, filename)
    
    if not os.path.exists(filepath):
        def _progress(count, block_size, total_size):
            sys.stdout.write('\r>>> Downloading %s %.1f%%' %(filename, float(count * block_size) / float(total_size) * 100.0))
            sys.stdout.flush()
        filepath, _ = urllib.request.urlretrieve(url_link, filepath, _progress)
        print '\n'
        
    statinfo = os.stat(filepath)
    if statinfo.st_size == expected_bytes:
        print 'Successfully downloaded', filename, statinfo.st_size, 'bytes.'
    else:
        raise Exception('File ' + filepath + ' is damaged. Please try to download it again')
    return filepath

vgg_model_file = may_need_download(vgg_model_url, 'vgg_model_param', expected_bytes)
Successfully downloaded imagenet-vgg-verydeep-19.mat 534904783 bytes.
In [6]:
# Load the mat into python
# scipy.io load into a dictionary
mat_dict = scipy.io.loadmat(vgg_model_file)
print "VGG mat first level structure:"
for key in mat_dict.keys():
    print "\t", key
# Load layer parameters
layers = mat_dict['layers']
print "VGG-19 model version:", mat_dict['__version__']
print "VGG-19 parameter matrix shape:", layers.shape
VGG mat first level structure:
	layers
	__version__
	meta
	__header__
	__globals__
VGG-19 model version: 1.0
VGG-19 parameter matrix shape: (1, 43)
In [7]:
# Play with matrix to explore its structure
# print mat_dict['__globals__'] # Empty
# print mat_dict['meta']  # List of classes, averageImage, etc.
# print layers  # too much information; need to peel the onion
# print layers[0][2]  # this level selects layers
# print layers[0][42] # last layer, softmax layer
# print layers[0][2][0][0][0]  # here is the layer name, such as 'conv1_2'
# print layers[0][3][0][0][1] # this is for layer type, such as 'conv'
# print layers[0][2][0][0][2]  # Found layer parameters, finally!
# Double-check that this matches what we are looking for, using the first layer
# The first layer takes a 3-channel image and processes it with (3x3) convolution kernels
layer_name = layers[0][0][0][0][0]
print "Layer name:", layer_name
w = layers[0][0][0][0][2][0][0]
print "Weight shape:", w.shape # using (3x3) kernel, to process 3 channels of input image and there are 64 such kernels
b = layers[0][0][0][0][2][0][1]
print "Bias shape:", b.shape # 64 bias parameters for 64 kernels or 64 hidden units
Layer name: [u'conv1_1']
Weight shape: (3, 3, 3, 64)
Bias shape: (64, 1)
In [8]:
# Prepare images for VGG
# Subtract the VGG-19 model's mean image
# Also prepare the VGG-19 model parameter matrix
vgg_mean = mat_dict['meta'][0][0][-1][0][0][2][0][0]
vgg_mean = np.array(vgg_mean)
vgg_mean = np.reshape(vgg_mean, (1,1,3)) # shape for mean subtraction via broadcasting

t_img = t_img - vgg_mean
s_img = s_img - vgg_mean

t_img = np.expand_dims(t_img, 0)
s_img = np.expand_dims(s_img, 0)
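Since the images fed to the network are mean-subtracted and batched, the intermediate results saved during training (plotted at the end of the notebook) need the reverse transformation before display. A helper for that would look roughly like this; `deprocess_img` is my own naming, not a function from this notebook:

```python
import numpy as np

def deprocess_img(batched_img, mean_pixel):
    """Reverse the VGG preprocessing: drop the batch dimension,
    add the mean pixel back, and clip to the valid uint8 range."""
    img = np.squeeze(batched_img, axis=0)          # (1, H, W, 3) -> (H, W, 3)
    img = img + np.reshape(mean_pixel, (1, 1, 3))  # undo mean subtraction
    return np.clip(img, 0, 255).astype(np.uint8)
```

This is the mirror image of the two preprocessing steps above: broadcasting adds the mean back per channel, and the clip guards against out-of-range values produced by the optimizer.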
In [9]:
# Extract VGG Deep Neural Network Layers
vgg_param_map = {}
vgg_layer_list = []
for l in xrange(layers[0].shape[0]):
    layer_name = layers[0][l][0][0][0][0].decode('utf-8')
    layer_type = layers[0][l][0][0][1][0].decode('utf-8')
    param = layers[0][l][0][0][2]
    vgg_param_map[layer_name] = param
    vgg_layer_list.append(layer_name)
    print "Layer: %s, type: %s, shape: %s" % (layer_name, layer_type, param.shape)
Layer: conv1_1, type: conv, shape: (1, 2)
Layer: relu1_1, type: relu, shape: (1, 1)
Layer: conv1_2, type: conv, shape: (1, 2)
Layer: relu1_2, type: relu, shape: (1, 1)
Layer: pool1, type: pool, shape: (1,)
Layer: conv2_1, type: conv, shape: (1, 2)
Layer: relu2_1, type: relu, shape: (1, 1)
Layer: conv2_2, type: conv, shape: (1, 2)
Layer: relu2_2, type: relu, shape: (1, 1)
Layer: pool2, type: pool, shape: (1,)
Layer: conv3_1, type: conv, shape: (1, 2)
Layer: relu3_1, type: relu, shape: (1, 1)
Layer: conv3_2, type: conv, shape: (1, 2)
Layer: relu3_2, type: relu, shape: (1, 1)
Layer: conv3_3, type: conv, shape: (1, 2)
Layer: relu3_3, type: relu, shape: (1, 1)
Layer: conv3_4, type: conv, shape: (1, 2)
Layer: relu3_4, type: relu, shape: (1, 1)
Layer: pool3, type: pool, shape: (1,)
Layer: conv4_1, type: conv, shape: (1, 2)
Layer: relu4_1, type: relu, shape: (1, 1)
Layer: conv4_2, type: conv, shape: (1, 2)
Layer: relu4_2, type: relu, shape: (1, 1)
Layer: conv4_3, type: conv, shape: (1, 2)
Layer: relu4_3, type: relu, shape: (1, 1)
Layer: conv4_4, type: conv, shape: (1, 2)
Layer: relu4_4, type: relu, shape: (1, 1)
Layer: pool4, type: pool, shape: (1,)
Layer: conv5_1, type: conv, shape: (1, 2)
Layer: relu5_1, type: relu, shape: (1, 1)
Layer: conv5_2, type: conv, shape: (1, 2)
Layer: relu5_2, type: relu, shape: (1, 1)
Layer: conv5_3, type: conv, shape: (1, 2)
Layer: relu5_3, type: relu, shape: (1, 1)
Layer: conv5_4, type: conv, shape: (1, 2)
Layer: relu5_4, type: relu, shape: (1, 1)
Layer: pool5, type: pool, shape: (1,)
Layer: fc6, type: conv, shape: (1, 2)
Layer: relu6, type: relu, shape: (1, 1)
Layer: fc7, type: conv, shape: (1, 2)
Layer: relu7, type: relu, shape: (1, 1)
Layer: fc8, type: conv, shape: (1, 2)
Layer: prob, type: softmax, shape: (0, 0)
In [10]:
# Define utility for generating white noise image
def generate_white_noise_image(height, width, std=30):
    white_noise_image = np.random.uniform(-std, std, (1, height, width, 3)).astype(np.float32)
    return white_noise_image
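As a variant (not used in this notebook), many style-transfer implementations blend the white noise with the content image instead of starting from pure noise, which often converges faster. A sketch, where `noise_ratio` is my own parameter name:

```python
import numpy as np

def generate_initial_image(content_img, height, width, noise_ratio=0.6, std=30):
    """Blend uniform noise with the batched content image.
    noise_ratio=1.0 reproduces pure white noise; smaller values
    start the optimization closer to the content photo."""
    noise = np.random.uniform(-std, std, (1, height, width, 3)).astype(np.float32)
    return noise * noise_ratio + content_img * (1.0 - noise_ratio)
```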

# Define utilities for constructing the VGG CNN graph
def get_cnn_param(vgg_param_map, layer_name):
    """
    Inputs:
        - vgg_param_map: dict for VGG layer parameters
        - layer_name: (string) layer name; must match an entry in the VGG layer list
    Outputs:
        - Tuple of (W, b) of CNN layer
    """

    W = vgg_param_map[layer_name][0][0]
    b = vgg_param_map[layer_name][0][1]
    b = b.reshape(b.size)
    return W, b

def conv2d_layer(vgg_param_map, input, layer_name):
    """
    Inputs:
        - vgg_param_map: dict for VGG layer parameters
        - input: input data from previous layer
        - layer_name: (string) VGG layer name
    Output:
        - output data after CNN + Relu
    """
    
    # Since these parameters will not be trained in this project,
    # we store them as TF constants
    with tf.variable_scope(layer_name):
        W, b = get_cnn_param(vgg_param_map, layer_name)
        W = tf.constant(W, name=layer_name + '_weights')
        b = tf.constant(b, name=layer_name + '_bias')
        cnn = tf.nn.conv2d(input,
                           filter=W,
                           strides=[1,1,1,1],
                           padding='SAME')
        output = cnn + b
        return output

def relu_layer(input, layer_name):
    """
    Apply relu nonlinearity to input
    Inputs:
        - input: input data
        - layer_name: name of layer
    Return:
        - output: after applying Relu
    """
    with tf.variable_scope(layer_name):
        output = tf.nn.relu(input)
        return output

def affine_layer(vgg_param_map, input, layer_name):
    """
    Extract VGG-19 parameter based on layer name, construct fully connected layer
    VGG-19 implements its affine (fully connected) layers as convolutions
    """
    with tf.variable_scope(layer_name):
        W, b = get_cnn_param(vgg_param_map, layer_name)
        W = tf.constant(W, name=layer_name + '_weights')
        b = tf.constant(b, name=layer_name + '_bias')
        cnn = tf.nn.conv2d(input,
                           filter=W,
                           strides=[1,1,1,1],
                           padding='SAME')
        output = cnn + b
        #output = tf.matmul(input, W) + b
        return output
    
def pool_layer(input, layer_name, maxpool=False):
    """
    Pooling layer
       Instead of max_pool, the paper recommends using average pooling
       
    Inputs:
      - input : input data from the previous CNN layer
      - layer_name: VGG-19 layer name
      - maxpool: if True, use max pooling instead of average pooling
    Returns:
      - pool_op: pooling result over 2x2 windows, per example and per channel
    """
    pool_op = None
    if maxpool:
        pool_op = tf.nn.max_pool(input,
                                 ksize=[1,2,2,1],
                                 strides=[1,2,2,1],
                                 padding='SAME',
                                 name='max_pool_layer' + layer_name)
    else:
        pool_op = tf.nn.avg_pool(input,
                                 ksize=[1,2,2,1],
                                 strides=[1,2,2,1],
                                 padding='SAME',
                                 name='avg_pool_layer' + layer_name)
    return pool_op
In [11]:
# Define a class for VGG model
class VggModel(object):
    """
    Build neural network layers for VGG-19
    """
    
    def __init__(self, vgg_param_dict, vgg_layer_list,
                 img_height,
                 img_width,
                 learning_rate=1e-2, 
                 content_layer='conv4_2', 
                 style_layers_config={'conv1_1':0.2,
                                      'conv2_1':0.2,
                                      'conv3_1':0.2,
                                      'conv4_1':0.2,
                                      'conv5_1':0.2},
                 vgg_mean_pixel=[123.68, 116.779, 103.939],
                 content_weight=0.01,
                 verbose=False
                ):
        
        ...

    def _create_placeholders(self):
        ...
            
            
    def _content_loss(self, p, f):
       ...
    
    def _gram_matrix(self, F, N, M):
        ...

    def _one_style_loss(self, a, g):
       ...
        
        
    def _style_loss(self, A):
        ...
    
    def _total_loss(self, content_image, style_image):
        ...

    def build_vgg_graph(self):
        ...

    def _create_summary(self):
       ...
    
            
    def train(self, content_image, style_image):
       ...
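The method bodies above are elided. As a rough sketch of what `_content_loss`, `_gram_matrix`, and `_one_style_loss` compute, following A Neural Algorithm of Artistic Style (names and constants here are mine, written in NumPy; the actual class uses TensorFlow ops):

```python
import numpy as np

def gram_matrix(F, N, M):
    """Gram matrix of feature maps F: reshape to (M, N), where
    N = number of filters and M = feature map height * width,
    then take G = F^T F, capturing filter correlations."""
    F = np.reshape(F, (M, N))
    return F.T.dot(F)

def content_loss(p, f):
    """Squared-error loss between content features p and generated
    features f (the paper uses a 1/2 factor; scaling varies by implementation)."""
    return 0.5 * np.sum((f - p) ** 2)

def one_style_loss(a, g):
    """Style loss for one layer: squared difference of Gram matrices
    of style features a and generated features g, normalized by
    4 * N^2 * M^2 as in the paper."""
    N = a.shape[-1]               # number of feature maps
    M = a.shape[1] * a.shape[2]   # feature map size (H * W)
    A = gram_matrix(a, N, M)
    G = gram_matrix(g, N, M)
    return np.sum((G - A) ** 2) / (4.0 * N**2 * M**2)
```

The total loss then weights the content loss (at `conv4_2`) against the sum of per-layer style losses weighted by `style_layers_config`.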
        
In [12]:
# test vgg graph creation
model = VggModel(vgg_param_map, vgg_layer_list, 
                 img_height=HEIGHT, img_width=WIDTH,
                 content_weight=0.01,
                 learning_rate=1e2)
model.train(t_img, s_img)
Step 301
 Sum: 38453098.8
    Loss: 397448128.0
    Time: 36.7496809959
Step 306
 Sum: 38292527.3
    Loss: 395066944.0
    Time: 20.4376380444
Step 311
 Sum: 38134227.2
    Loss: 392785984.0
    Time: 20.5191249847
Step 316
 Sum: 37978276.3
    Loss: 390609600.0
    Time: 20.7634079456
Step 321
 Sum: 37824888.5
    Loss: 388543008.0
    Time: 20.7791938782
Step 326
 Sum: 37673826.5
    Loss: 386593632.0
    Time: 22.163271904
Step 331
 Sum: 37524825.1
    Loss: 386233984.0
    Time: 20.9584538937
Step 336
 Sum: 37377706.7
    Loss: 384119008.0
    Time: 20.9911539555
Step 341
 Sum: 37231463.5
    Loss: 382698176.0
    Time: 26.2235519886
Step 346
 Sum: 37086969.6
    Loss: 382249760.0
    Time: 21.2615199089
In [14]:
# Plot trained images, from random noise to painting style 
img_step0 = np.array(Image.open('outputs/0.png'))
img_step30 = np.array(Image.open('outputs/30.png'))
img_step100 = np.array(Image.open('outputs/100.png'))
img_step200 = np.array(Image.open('outputs/200.png'))
img_step300 = np.array(Image.open('outputs/300.png'))
img_step345 = np.array(Image.open('outputs/345.png'))
plt.subplot(2,4,1)
plt.title('Style')
plt.axis('off')
plt.imshow(a_simg)
plt.subplot(2,4,2)
plt.title('Content')
plt.axis('off')
plt.imshow(a_t_img)
plt.subplot(2,4,3)
plt.title('Step 0')
plt.axis('off')
plt.imshow(img_step0)
plt.subplot(2,4,4)
plt.title('Step 30')
plt.axis('off')
plt.imshow(img_step30)
plt.subplot(2,4,5)
plt.title('Step 100')
plt.axis('off')
plt.imshow(img_step100)
plt.subplot(2,4,6)
plt.title('Step 200')
plt.axis('off')
plt.imshow(img_step200)
plt.subplot(2,4,7)
plt.title('Step 300')
plt.axis('off')
plt.imshow(img_step300)
plt.subplot(2,4,8)
plt.title('Step 345')
plt.axis('off')
plt.imshow(img_step345)
plt.show()