What is Neural Text-Style Transfer?
If 2Pac had only been allowed to release music under the pretence that his style match the Queen’s English, the world would be a significantly worse place.
The advent of style transfer (the ability to project the style of one text onto another) means that it’s now possible for a neural network to change the feel of a text.
As you can probably guess, this technology would be useful in a number of different settings. A simple first application would be to make an article sound more formal:
Informal : I’d say it is punk though.
Formal : However, I do believe it to be punk.
Beyond that, this technology could also be used to help people with conditions like dyslexia.
More recently, the news that Microsoft was laying off journalists wasn’t groundbreaking: advertising revenues are down across the board and newspapers are generally struggling to be as profitable as they once were (which was already a bit of a struggle). However, the news that the team was to be replaced with AI is what startled people.
I’ve always loved writing but I’ve always sucked at it. My English teacher refused to let me answer questions because, undoubtedly, my answer would be wrong.
Fast forward 15 years and I’m building machine learning tools to solve just about any problem I can think of. More importantly, neural networks have recently found a new domain to improve. Microsoft Word now incorporates an AI that can offer to rewrite a sentence in full, rather than making simple spelling and grammatical fixes.
Have you ever been unable to express something in a given way?
Being unable to phrase something in a certain tone, or to give off a certain impression, is something that many writers struggle with. To save time, focus and energy, this tool helps writers captivate their audience more effectively by tweaking the wording. That’s what Microsoft aimed to fix here, and in what follows I’ll explain how. Microsoft have said:
“In internal evaluations, it was nearly 15 percent more effective than previous approaches in catching mistakes commonly made by people who have dyslexia.”
Neural Style Transfer
The updates that Microsoft have recently incorporated are broadly similar to the product that Grammarly is well known for. Both sets of researchers are taking advantage of recent developments in the field of style transfer.
Neural style transfer was initially used between images, whereby the composition of one image could be projected onto another.
However, this technique has recently been adapted for text style transfer. To do this, researchers took advantage of neural machine translation models. Think about it: a certain ‘tone’ or ‘style’ could be seen as another language, and therefore:
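To make the analogy concrete, a style-transfer dataset can be structured exactly like a translation corpus: pairs of (source-style, target-style) sentences. A tiny sketch (the pairs here are invented for illustration):

```python
# Toy parallel "style corpus": informal -> formal sentence pairs,
# laid out exactly like a translation corpus (source -> target).
style_corpus = [
    ("i'd say it is punk though.", "However, I do believe it to be punk."),
    ("gonna be late, sorry!", "I apologise; I am going to be late."),
]

# A translation model only ever sees (source, target) pairs, so the
# same training loop applies whether the "languages" are
# English/French or informal/formal English.
sources = [src for src, _ in style_corpus]
targets = [tgt for _, tgt in style_corpus]
```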
The baseline model in neural machine translation is based on Yoshua Bengio’s paper here, building upon Sutskever’s work on sequence-to-sequence learning. A neural network is formed as an RNN Encoder-Decoder, which works as follows.
Here, a phrase is passed into the encoder, which converts the string into a vector. This vector contains a latent representation of the phrase, which is then translated using a decoder. This is called an ‘encoder-decoder architecture’, and in this manner, Neural Machine Translation (NMT) can address local translation problems.
For neural machine translation, a bidirectional RNN processes the source sentence into vectors (encoding), while a second RNN predicts words in the target language (decoding). This process, while differing in method from phrase-based models, proves to be comparable in speed and accuracy.
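The encoding half of the idea can be sketched in a few lines of NumPy. This is illustrative only: the weights are random and untrained, and the sizes are invented; the point is just that a variable-length sentence is folded into one fixed-size hidden state that can initialise a decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
num_units, vocab_size = 4, 10

# Toy parameters (randomly initialised; untrained).
W_in = rng.normal(size=(num_units, vocab_size))  # input-to-hidden
W_hh = rng.normal(size=(num_units, num_units))   # hidden-to-hidden

def one_hot(token_id):
    v = np.zeros(vocab_size)
    v[token_id] = 1.0
    return v

def encode(token_ids):
    """Fold a source sentence into a single hidden state vector."""
    h = np.zeros(num_units)
    for t in token_ids:
        h = np.tanh(W_in @ one_hot(t) + W_hh @ h)
    return h

# The final encoder state is the latent representation
# that would initialise the decoder.
latent = encode([3, 1, 4])
```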
Creating a model
To create a neural style transfer model, there are three key steps we have to take:
Words are categorical in nature, so the model must first be able to embed them, finding an alternative representation that can be used in the network. A vocabulary (of size V) is selected, with only frequent words treated as unique; all other words are converted to an “unknown” token and given the same embedding. The embedding weights, one set per language, are usually learned during training.
embedding_encoder = variable_scope.get_variable("embedding_encoder", [src_vocab_size, embedding_size], ...)
encoder_emb_inp = embedding_ops.embedding_lookup(embedding_encoder, encoder_inputs)
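Before that lookup can work, the vocabulary itself has to be built. A minimal sketch of the frequency cutoff and the “unknown” token (the token name and sizes are illustrative choices, not from the original code):

```python
from collections import Counter

def build_vocab(sentences, max_size):
    """Keep the max_size most frequent words; map the rest to <unk>."""
    counts = Counter(w for s in sentences for w in s.split())
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab

sentences = ["the cat sat", "the dog sat", "a rare word"]
vocab = build_vocab(sentences, max_size=4)

# Out-of-vocabulary words all share the <unk> id (and hence one embedding).
ids = [vocab.get(w, vocab["<unk>"]) for w in "the zebra sat".split()]
```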
Once the word embeddings are retrieved, they are fed as input into the main model, which consists of two multi-layer RNNs: one an encoder for the source language, the other a decoder for the target language. In practice, these two RNNs are trained with different parameters (such models do a better job of fitting large training datasets).
# Build RNN cell
encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# Run Dynamic RNN
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(encoder_cell, encoder_emb_inp, sequence_length=source_sequence_length, time_major=True)
An attentive reader will notice that sentences can have different lengths. To avoid wasting computation, we tell dynamic_rnn the exact source sentence lengths through source_sequence_length, and since our input is time-major, we set time_major=True.
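To see what those two arguments describe, here is how a padded batch is typically prepared (the pad id and token ids below are invented for illustration):

```python
import numpy as np

PAD = 0
batch = [[5, 2, 7], [4, 9], [3]]  # token ids, varying sentence lengths

seq_lens = np.array([len(s) for s in batch])  # what source_sequence_length holds
max_len = seq_lens.max()

# Pad every sentence to max_len; the recorded lengths let the RNN
# stop early rather than processing the padding.
padded = np.array([s + [PAD] * (max_len - len(s)) for s in batch])

# time_major=True means the layout is [max_time, batch_size]
# rather than [batch_size, max_time]:
time_major = padded.T
```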
The decoder needs to have access to source information. A simple way to achieve this is to initialise it with the last hidden state of the encoder, encoder_state.
# Build RNN cell
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
helper = tf.contrib.seq2seq.TrainingHelper(decoder_emb_inp, decoder_lengths, time_major=True)
decoder = tf.contrib.seq2seq.BasicDecoder(decoder_cell, helper, encoder_state, output_layer=projection_layer)
# Dynamic decoding
outputs, _ = tf.contrib.seq2seq.dynamic_decode(decoder, ...)
logits = outputs.rnn_output
Lastly, we haven’t mentioned projection_layer, which is a dense matrix that turns the top hidden states into logit vectors of dimension V. We illustrate this process at the top of Figure 2.
projection_layer = layers_core.Dense(tgt_vocab_size, use_bias=False)
And finally, given the logits above, we are now ready to compute our training loss:
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=decoder_outputs, logits=logits)
train_loss = (tf.reduce_sum(crossent * target_weights)/batch_size)
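The role of target_weights here is to mask out padding positions so they don’t contribute to the loss. The same masked cross-entropy can be sketched in NumPy (the batch, vocabulary size and targets are toy values):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy batch: 2 sentences, 3 time steps, vocabulary of 5.
logits = np.random.default_rng(1).normal(size=(2, 3, 5))
targets = np.array([[1, 4, 0], [2, 0, 0]])               # 0 = padding id here
target_weights = np.array([[1., 1., 1.], [1., 0., 0.]])  # zero out the padding

probs = softmax(logits)
# Cross-entropy of the true class at each (sentence, time) position.
crossent = -np.log(probs[np.arange(2)[:, None], np.arange(3), targets])

batch_size = 2
train_loss = (crossent * target_weights).sum() / batch_size
```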
We have now defined the forward pass of our NMT model. Computing the backpropagation pass is just a matter of a few lines of code:
# Calculate and clip gradients
params = tf.trainable_variables()
gradients = tf.gradients(train_loss, params)
clipped_gradients, _ = tf.clip_by_global_norm(gradients, max_gradient_norm)
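Clipping by global norm scales every gradient jointly so that their combined L2 norm never exceeds a threshold, which keeps RNN training stable. A NumPy sketch of the same operation (the threshold value is arbitrary):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients jointly so their global L2 norm <= max_norm."""
    global_norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([0.0])]  # global norm is 5
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
```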
From here, you’re ready to begin the optimisation procedure behind creating your own neural style transfer model!
Note: the code above was largely taken from the TensorFlow GitHub documentation, and more information about this procedure can be found online.
Neural machine style transfer has quite an extensive history, and it’s taken a while for academia to reach the perch it currently sits upon. Translation is a notoriously difficult task, not only because of grammatical problems, but also because of the need for text to sound somewhat human: somewhat colloquial.
The progress that’s been made is fantastic, and it will be great to see it keep developing.
Thanks for reading! If you have any messages, please let me know!
Keep up to date with my latest articles here!