
Neural style transfer with eager execution and Keras


How would your summer vacation's photos look had Edvard Munch painted them? (Maybe it's better not to know.)
Let's take a more comforting example: How would a nice, summerly river landscape look if painted by Katsushika Hokusai?

Style transfer on images is not new, but got a boost when Gatys, Ecker, and Bethge (Gatys, Ecker, and Bethge 2015) showed how to successfully do it with deep learning.
The main idea is straightforward: Create a hybrid that is a tradeoff between the content image we want to manipulate and a style image we want to imitate, by optimizing for maximal resemblance to both at the same time.

If you've read the chapter on neural style transfer from Deep Learning with R, you may recognize some of the code snippets that follow.
However, there is an important difference: This post uses TensorFlow eager execution, allowing for an imperative way of coding that makes it easy to map concepts to code.
Just like previous posts on eager execution on this blog, this is a port of a Google Colaboratory notebook that performs the same task in Python.

As usual, please make sure you have the required package versions installed. And no need to copy the snippets – you'll find the complete code among the Keras examples.

Prerequisites

The code in this post depends on recent versions of several of the TensorFlow R packages. You can install these packages as follows (shown as a sketch; current CRAN releases should work, though development versions from GitHub may be needed):
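install.packages(c("tensorflow", "keras"))
# or, if you need the development versions:
# devtools::install_github("rstudio/tensorflow")
# devtools::install_github("rstudio/keras")

We also need to enable eager execution and load the helper packages used throughout. This is a minimal setup sketch following this blog's earlier posts on eager execution; the exact calls in the original may differ:

library(keras)
use_implementation("tensorflow")  # use the TensorFlow implementation of Keras
library(tensorflow)
tfe_enable_eager_execution(device_policy = "silent")

library(purrr)  # map(), walk(), transpose(), used below
library(glue)   # string interpolation in the training loop
# the %<-% multi-assignment operator used below is re-exported by keras (from zeallot)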

To keep computation time manageable, we work with images scaled down to 128 × 128 pixels:

img_shape <- c(128, 128, 3)

Here is our content image, a summerly river landscape:

content_path <- "isar.jpg"

content_image <- image_load(content_path, target_size = img_shape[1:2])
content_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

And here's our style image, Hokusai's The Great Wave off Kanagawa, which you can download from Wikimedia Commons:

style_path <- "The_Great_Wave_off_Kanagawa.jpg"

style_image <- image_load(style_path, target_size = img_shape[1:2])
style_image %>% 
  image_to_array() %>%
  `/`(., 255) %>%
  as.raster() %>%
  plot()

We create a wrapper that loads and preprocesses the input images for us.
As we will be working with VGG19, a network that has been trained on ImageNet, we need to transform our input images in the same way that was used when training it. Later, we'll apply the inverse transformation to our combination image before displaying it.

load_and_process_image <- function(path) {
  img <- image_load(path, target_size = img_shape[1:2]) %>%
    image_to_array() %>%
    k_expand_dims(axis = 1) %>%
    imagenet_preprocess_input()
}

deprocess_image <- function(x) {
  x <- x[1, , , ]
  # Remove zero-center by mean pixel
  x[, , 1] <- x[, , 1] + 103.939
  x[, , 2] <- x[, , 2] + 116.779
  x[, , 3] <- x[, , 3] + 123.68
  # 'BGR'->'RGB'
  x <- x[, , c(3, 2, 1)]
  x[x > 255] <- 255
  x[x < 0] <- 0
  x[] <- as.integer(x) / 255
  x
}
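As a quick sanity check (a sketch, assuming eager execution is enabled so we can call $numpy() on the preprocessed tensor), deprocessing should undo the preprocessing and give us back a plottable image:

img <- load_and_process_image(content_path)
img$numpy() %>%
  deprocess_image() %>%
  as.raster() %>%
  plot()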

Setting the scene

We're going to use a neural network, but we won't be training it. Neural style transfer is a bit unusual in that we don't optimize the network's weights, but back-propagate the loss to the input layer (the image), in order to move it in the desired direction.

We will be interested in two kinds of outputs from the network, corresponding to our two goals.
Firstly, we want to keep the combination image similar to the content image, on a high level. In a convnet, upper layers map to more holistic concepts, so we're picking a layer high up in the graph to compare outputs from the source and the combination.

Secondly, the generated image should "look like" the style image. Style corresponds to lower-level features like texture, shapes, strokes... So to compare the combination against the style example, we choose a set of lower-level conv blocks for comparison and aggregate the results.

content_layers <- c("block5_conv2")
style_layers <- c("block1_conv1",
                  "block2_conv1",
                  "block3_conv1",
                  "block4_conv1",
                  "block5_conv1")

num_content_layers <- length(content_layers)
num_style_layers <- length(style_layers)

get_model <- function() {
  vgg <- application_vgg19(include_top = FALSE, weights = "imagenet")
  vgg$trainable <- FALSE
  style_outputs <- map(style_layers, function(layer) vgg$get_layer(layer)$output)
  content_outputs <- map(content_layers, function(layer) vgg$get_layer(layer)$output)
  model_outputs <- c(style_outputs, content_outputs)
  keras_model(vgg$input, model_outputs)
}
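As a quick check (a sketch), the model we get back has six outputs, one per layer selected above:

model <- get_model()
length(model$outputs)  # 6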

Losses

When optimizing the input image, we will consider three kinds of losses. Firstly, the content loss: How different is the combination image from the source? Here, we're using the sum of the squared errors for comparison.

content_loss <- function(content_image, target) {
  k_sum(k_square(target - content_image))
}

Our second concern is having the styles match as closely as possible. Style is commonly operationalized as the Gram matrix of flattened feature maps in a layer. We thus assume that style is related to how maps in a layer correlate with each other.

We therefore compute the Gram matrices of the layers we're interested in (defined above), for the source image as well as the optimization candidate, and compare them, again using the sum of squared errors.

gram_matrix <- function(x) {
  features <- k_batch_flatten(k_permute_dimensions(x, c(3, 1, 2)))
  gram <- k_dot(features, k_transpose(features))
  gram
}

style_loss <- function(gram_target, combination) {
  gram_comb <- gram_matrix(combination)
  k_sum(k_square(gram_target - gram_comb)) /
    (4 * (img_shape[3] ^ 2) * (img_shape[1] * img_shape[2]) ^ 2)
}
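To verify the shapes involved (a sketch using a hypothetical random activation): for a feature map with 64 channels, the Gram matrix is 64 × 64, independent of the spatial dimensions:

x <- k_random_uniform(c(128L, 128L, 64L))  # shaped like a block1_conv1 output
dim(gram_matrix(x))                        # 64 64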

Thirdly, we don't want the combination image to look overly pixelated, so we're adding in a regularization component, the total variation in the image:

total_variation_loss <- function(image) {
  y_ij  <- image[1:(img_shape[1] - 1L), 1:(img_shape[2] - 1L), ]
  y_i1j <- image[2:(img_shape[1]), 1:(img_shape[2] - 1L), ]
  y_ij1 <- image[1:(img_shape[1] - 1L), 2:(img_shape[2]), ]
  a <- k_square(y_ij - y_i1j)
  b <- k_square(y_ij - y_ij1)
  k_sum(k_pow(a + b, 1.25))
}
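For intuition about this regularizer (a sketch, assuming eager tensors): the total variation is zero for a constant image and large for random noise, which is why penalizing it discourages pixel-level artifacts:

flat  <- k_zeros(c(img_shape[1], img_shape[2], 3))
noise <- k_random_uniform(c(img_shape[1], img_shape[2], 3))
total_variation_loss(flat)   # 0
total_variation_loss(noise)  # a large positive value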

The tricky part is how to combine these losses. We've reached acceptable results with the following weightings, but feel free to play around as you see fit:

content_weight <- 100
style_weight <- 0.8
total_variation_weight <- 0.01

Get model outputs for the content and style images

We need the model's output for the content and style images, but here it suffices to do this just once.
We concatenate both images along the batch dimension, pass that input to the model, and get back a list of outputs, where every element of the list is a 4-d tensor. For the style image, we're interested in the style outputs at batch position 1, whereas for the content image, we need the content output at batch position 2.

In the comments below, please note that the sizes of dimensions 2 and 3 will differ if you're loading images at a different size.

get_feature_representations <-
  function(model, content_path, style_path) {
    
    # dim == (1, 128, 128, 3)
    style_image <-
      load_and_process_image(style_path) %>% k_cast("float32")
    # dim == (1, 128, 128, 3)
    content_image <-
      load_and_process_image(content_path) %>% k_cast("float32")
    # dim == (2, 128, 128, 3)
    stack_images <- k_concatenate(list(style_image, content_image), axis = 1)
    
    # length(model_outputs) == 6
    # dim(model_outputs[[1]]) == (2, 128, 128, 64)
    # dim(model_outputs[[6]]) == (2, 8, 8, 512)
    model_outputs <- model(stack_images)
    
    style_features <-
      model_outputs[1:num_style_layers] %>%
      map(function(batch) batch[1, , , ])
    content_features <-
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)] %>%
      map(function(batch) batch[2, , , ])
    
    list(style_features, content_features)
  }

Computing the losses

On every iteration, we need to pass the combination image through the model, obtain the style and content outputs, and compute the losses. Again, the code is extensively commented with tensor sizes for easy verification, but please keep in mind that the exact numbers presuppose you're working with 128×128 images.

compute_loss <-
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    
    c(style_weight, content_weight) %<-% loss_weights
    model_outputs <- model(init_image)
    style_output_features <- model_outputs[1:num_style_layers]
    content_output_features <-
      model_outputs[(num_style_layers + 1):(num_style_layers + num_content_layers)]
    
    # style loss
    weight_per_style_layer <- 1 / num_style_layers
    style_score <- 0
    # dim(style_zip[[5]][[1]]) == (512, 512)
    style_zip <- transpose(list(gram_style_features, style_output_features))
    for (l in 1:length(style_zip)) {
      # for l == 1:
      # dim(target_style) == (64, 64)
      # dim(comb_style) == (1, 128, 128, 64)
      c(target_style, comb_style) %<-% style_zip[[l]]
      style_score <- style_score + weight_per_style_layer *
        style_loss(target_style, comb_style[1, , , ])
    }
    
    # content loss
    weight_per_content_layer <- 1 / num_content_layers
    content_score <- 0
    content_zip <- transpose(list(content_features, content_output_features))
    for (l in 1:length(content_zip)) {
      # dim(comb_content) == (1, 8, 8, 512)
      # dim(target_content) == (8, 8, 512)
      c(target_content, comb_content) %<-% content_zip[[l]]
      content_score <- content_score + weight_per_content_layer *
        content_loss(comb_content[1, , , ], target_content)
    }
    
    # total variation loss
    variation_loss <- total_variation_loss(init_image[1, , , ])
    
    style_score <- style_score * style_weight
    content_score <- content_score * content_weight
    variation_score <- variation_loss * total_variation_weight
    
    loss <- style_score + content_score + variation_score
    list(loss, style_score, content_score, variation_score)
  }

Computing the gradients

As soon as we have the losses, obtaining the gradients of the overall loss with respect to the input image is just a matter of calling tape$gradient on the GradientTape. Note that the nested call to compute_loss, and thus the call of the model on our combination image, happens inside the GradientTape context.

compute_grads <-
  function(model, loss_weights, init_image, gram_style_features, content_features) {
    with(tf$GradientTape() %as% tape, {
      scores <-
        compute_loss(model,
                     loss_weights,
                     init_image,
                     gram_style_features,
                     content_features)
    })
    total_loss <- scores[[1]]
    list(tape$gradient(total_loss, init_image), scores)
  }

Training phase

Now it's time to train! While the natural continuation of this sentence would have been "... the model", the model we're training here is not VGG19 (that one we're just using as a tool), but a minimal setup of just:

  • a Variable that holds our to-be-optimized image
  • the loss functions we defined above
  • an optimizer that will apply the calculated gradients to the image variable (tf$train$AdamOptimizer)

Below, we get the style features (of the style image) and the content features (of the content image) just once, then iterate over the optimization process, saving the output every 100 iterations.

In contrast to the original article and the Deep Learning with R book, but following the Google notebook instead, we're not using L-BFGS for optimization, but Adam, as our goal here is to provide a concise introduction to eager execution.
However, you could plug in another optimization method if you wanted, replacing
optimizer$apply_gradients(list(tuple(grads, init_image)))
by an algorithm of your choice (and of course, assigning the result of the optimization to the Variable holding the image).
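For example, a plain gradient descent update could look like the following sketch (with a purely hypothetical step size; the Adam setup below is what we actually use):

lr <- 0.01  # hypothetical learning rate
init_image$assign(init_image - lr * grads)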

run_style_transfer <- function(content_path, style_path, num_iterations = 1000) {
  model <- get_model()
  walk(model$layers, function(layer) layer$trainable <- FALSE)
  
  c(style_features, content_features) %<-%
    get_feature_representations(model, content_path, style_path)
  # dim(gram_style_features[[1]]) == (64, 64)
  gram_style_features <- map(style_features, function(feature) gram_matrix(feature))
  
  init_image <- load_and_process_image(content_path)
  init_image <- tf$contrib$eager$Variable(init_image, dtype = "float32")
  
  optimizer <- tf$train$AdamOptimizer(learning_rate = 1,
                                      beta1 = 0.99,
                                      epsilon = 1e-1)
  
  c(best_loss, best_image) %<-% list(Inf, NULL)
  loss_weights <- list(style_weight, content_weight)
  
  start_time <- Sys.time()
  global_start <- Sys.time()
  
  norm_means <- c(103.939, 116.779, 123.68)
  min_vals <- -norm_means
  max_vals <- 255 - norm_means
  
  for (i in seq_len(num_iterations)) {
    # dim(grads) == (1, 128, 128, 3)
    c(grads, all_losses) %<-% compute_grads(model,
                                            loss_weights,
                                            init_image,
                                            gram_style_features,
                                            content_features)
    c(loss, style_score, content_score, variation_score) %<-% all_losses
    optimizer$apply_gradients(list(tuple(grads, init_image)))
    clipped <- tf$clip_by_value(init_image, min_vals, max_vals)
    init_image$assign(clipped)
    
    end_time <- Sys.time()
    
    if (k_cast_to_floatx(loss) < best_loss) {
      best_loss <- k_cast_to_floatx(loss)
      best_image <- init_image
    }
    
    if (i %% 50 == 0) {
      glue("Iteration: {i}") %>% print()
      glue(
        "Total loss: {k_cast_to_floatx(loss)},
        style loss: {k_cast_to_floatx(style_score)},
        content loss: {k_cast_to_floatx(content_score)},
        total variation loss: {k_cast_to_floatx(variation_score)},
        time for 1 iteration: {(Sys.time() - start_time) %>% round(2)}"
      ) %>% print()
      
      if (i %% 100 == 0) {
        png(paste0("style_epoch_", i, ".png"))
        plot_image <- best_image$numpy()
        plot_image <- deprocess_image(plot_image)
        plot(as.raster(plot_image), main = glue("Iteration {i}"))
        dev.off()
      }
    }
  }
  
  glue("Total time: {Sys.time() - global_start} seconds") %>% print()
  list(best_image, best_loss)
}

Ready to run

Now, we're ready to start the process:

c(best_image, best_loss) %<-% run_style_transfer(content_path, style_path)
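Once the run has finished, we can display the final combination image, mirroring the deprocessing and plotting code from the training loop:

best_image$numpy() %>%
  deprocess_image() %>%
  as.raster() %>%
  plot()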

In our case, results didn't change much after ~ iteration 1000, and this is how our river landscape was looking:

… definitely more inviting than had it been painted by Edvard Munch!

Conclusion

With neural style transfer, some fiddling around may be needed until you get the result you want. But as our example shows, this doesn't mean the code has to be complicated. In addition to being easy to understand, eager execution also lets you add debugging output, and step through the code line by line to check on tensor shapes.
Until next time in our eager execution series!

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. 2015. "A Neural Algorithm of Artistic Style." CoRR abs/1508.06576. http://arxiv.org/abs/1508.06576.
