Conditional Text Generation by Fine Tuning GPT-2
- Ivan Lai
- Jan 23, 2021
- 7 min read
- Updated: Jan 27, 2021
Given a title and a list of keywords, would GPT-2 be able to generate convincing fake news?

Although transformer-based models have achieved good results on a range of NLP tasks in recent years, text generation remains a curious case. Back in September 2020, the Guardian published an impressive article purportedly written from scratch by OpenAI's GPT-3, to much acclaim in the mainstream media, but many AI experts remained skeptical.
One issue with text generation is the lack of control over the direction it takes. With GPT-3, you can give the model an introduction and instructions, but even then it takes a human editor to pick and arrange text from multiple outputs into something cohesive. There are other approaches too, for example CTRL from Salesforce and PPLM from Uber AI.
In this article, we will fine-tune the Hugging Face pre-trained GPT-2 and come up with our own solution: through the choice of data set, we gain better control over both the style and the content of the generated text.
Data
We will be using samples from the news aggregator data set. It contains titles and hyperlinks to over 400k news articles from well-known news publishers. To reduce training time, I have randomly sampled around 10k articles from each of the four news categories: business, science, entertainment and health. The articles are downloaded ahead of time to keep run time down.
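As a sketch of the sampling step - the file name, random seed and exact counts below are illustrative, while the column names and category codes follow the data set's documentation:

```python
import pandas as pd

# Columns as documented for the UCI News Aggregator data set
cols = ["ID", "TITLE", "URL", "PUBLISHER", "CATEGORY",
        "STORY", "HOSTNAME", "TIMESTAMP"]
df = pd.read_csv("newsCorpora.csv", sep="\t", names=cols)

# Roughly 10k articles per category:
# b = business, t = science/technology, e = entertainment, m = health
sampled = (df.groupby("CATEGORY", group_keys=False)
             .apply(lambda g: g.sample(n=10_000, random_state=42)))
```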
Keyword extraction
We need a list of keywords for each article in the training process. There is a range of methods available, from RAKE to BERT-based extraction among others, but we will stick to a simple TF-IDF here, as keyword extraction is not our main focus. In our keyword selection, we also allow 2-gram phrases that form a proper noun phrase, for example "content creators". This process is also performed offline, as it only needs to be done once.
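As an illustration, a minimal TF-IDF extractor might look like the sketch below - the vectorizer settings are illustrative, and the proper-noun-phrase filter for 2-grams is omitted for brevity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

articles = ["Content creators flock to the new platform ...",
            "Stock markets rally as tech shares rebound ..."]  # toy examples

# Allow unigrams and bigrams; the proper-noun-phrase filter is omitted here.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
tfidf = vectorizer.fit_transform(articles)
vocab = vectorizer.get_feature_names_out()

def top_keywords(doc_index, k=5):
    """Return the k highest-scoring TF-IDF terms for one article."""
    row = tfidf[doc_index].toarray().ravel()
    return [vocab[i] for i in row.argsort()[::-1][:k] if row[i] > 0]

print(top_keywords(0))
```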
Code for data download and keyword extraction can be found in my GitHub repository.
Pipeline
The pipeline setup involves defining the tokenizer, model and data sets, followed by fine-tuning with the Trainer class and, finally, text generation. I assume you are familiar with this general setup - this article provides a detailed walkthrough of the pipeline should you want a reminder. I will focus on what I do differently in order to achieve conditional text generation.
You can access the full code in my Colab notebook for more details, and you are welcome to make a copy and experiment with it.
Model
In this experiment, we will use the small version of GPT-2, which has 12 decoder layers. The model was trained on 8 million web pages and is already quite powerful at language tasks. To retain its general language-modelling ability while adapting it to our data set, we will freeze the bottom six layers by setting requires_grad to False on their parameters, and train only the top six. This also speeds up training, since fewer gradients need to be computed in the backward pass.
The names of the model layers can be found simply with print(model).
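A sketch of the freezing step, based on the layer layout of the Hugging Face GPT-2 implementation, where model.transformer.h holds the stack of decoder blocks:

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")  # the 12-layer small model

# Freeze the bottom 6 decoder blocks; only the top 6 are fine-tuned.
for i, block in enumerate(model.transformer.h):
    if i < 6:
        for param in block.parameters():
            param.requires_grad = False
```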
Training setup
In standard text-generation fine-tuning, since we are predicting the next token given the text seen so far, the labels are just the tokenized input shifted by one position (note that if we set labels=input_ids, the labels are shifted automatically inside the model - see Reference 1 below).
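A minimal sketch of that point:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

enc = tokenizer("A short example sentence.", return_tensors="pt")

# With labels=input_ids, the model shifts the labels internally, so the
# token at position t is trained to predict the token at position t+1.
out = model(**enc, labels=enc["input_ids"])
print(out.loss)  # standard language-modelling loss
```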
But here we want more control: in addition to the article text, we will also give the model a title and a list of keywords during fine-tuning. We do this in our customized dataset class.
In the standard setup, we prepend the text with a bos_token and append an eos_token, along these lines (a sketch - the exact code is in the notebook):
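```python
# Standard formatting of one training example; `article_text` stands for
# the raw article string and `tokenizer` for the GPT-2 tokenizer.
input_text = tokenizer.bos_token + article_text + tokenizer.eos_token
```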
Here, we additionally insert the title and a list of keywords before the text, separated by special separator tokens, roughly like this (the separator string below is an assumption - any token registered with the tokenizer as an additional special token would do):
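```python
SEP = "<|sep|>"  # assumed separator; must be added to the tokenizer's special tokens

# `title` is the article title and `keywords_str` a comma-separated keyword
# string (see join_keywords() below).
input_text = (tokenizer.bos_token + title + SEP + keywords_str
              + SEP + article_text + tokenizer.eos_token)
```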
To aid model generalization, we introduce data augmentation by sampling and shuffling the list of keywords during training, using the function join_keywords(). No augmentation is applied during validation, to retain consistency across epochs. The Trainer class is then used to train the model for a small number of epochs (four in our case), following standard practice.
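A sketch of join_keywords() consistent with that description - the sampling scheme here is illustrative:

```python
import random

def join_keywords(keywords, randomize=True):
    """During training, keep a random subset of the keywords in random order;
    during validation, use the full list unchanged."""
    if randomize:
        n = random.randint(1, len(keywords))   # how many keywords to keep
        keywords = random.sample(keywords, n)  # sample without replacement
        random.shuffle(keywords)               # randomize the order
    return ",".join(keywords)
```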
Results
To test the prowess of our new fake-news generator, I picked a lighthearted piece from BBC News that has been trending recently: 'We got a lot of grief when our photo became a meme'. Using the same title and (rather subjectively) picking a few keywords from the article, let's see what sort of spin our model can put on it. The keywords to be used are: 'train', 'lads', 'drinking', 'picture', 'funny' and 'instagram'.
The set of keywords extracted from the corpus is quite large, but if a keyword we want to use is not in this set, we can always use a synonym from the set, or the closest match under some word-similarity measure.
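Putting the pieces together, the generation step might look like the sketch below, reusing tokenizer, model and SEP from the sketches above; the generation parameters are illustrative:

```python
title = "We got a lot of grief when our photo became a meme"
keywords = ["train", "lads", "drinking", "picture", "funny", "instagram"]

prompt = tokenizer.bos_token + title + SEP + ",".join(keywords) + SEP
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Beam search combined with sampling; parameter values are illustrative.
output = model.generate(
    input_ids,
    max_length=300,
    num_beams=5,
    do_sample=True,
    top_p=0.95,
    repetition_penalty=2.0,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```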
There are a few ways to generate text from the model. If we use beam search combined with sampling, as in the sketch above, we get a sample like this:
Unfortunately, all we have right now is an image that has gone viral.
The picture was taken by one of the boys on Instagram and quickly went viral after it appeared to show a group of friends getting drunk at a New York City bar.
As soon as the picture came out online, many people took to Twitter to express their disgust with the picture.
"What's wrong with this f**king train??," one user wrote.
Another said: "I can't believe my eyes didn't even open."
A third added: "Just saw some pictures of men being mugged in NYC... I'm so sick of these dumbasses."
This content is created and maintained by a third party, and imported onto this page to help users provide their email addresses. You may be able to find more information about this and similar content at piano.io
Each sentence is semantically coherent and the first three sentences are indeed related to the title, but then the article takes an unexpected turn. "What's wrong with this f**king train?" ... I'm wondering about that too. It appears that we have some control, but perhaps not enough, as the train of "thoughts" drifts slowly off-track.
Some keywords ('train', 'drinking', 'picture' and 'instagram', or their synonyms or connotations) were used in the generated text as directed. However, 'funny' was not used; instead we have the word 'disgust', which effectively sums up the mood of the piece, and this is quite possibly because of the title we chose.
And how does this compare with the raw GPT-2? Using the title as the first sentence, here is an output sample from the raw model for comparison:
We got a lot of grief when our photo became a meme.
"I'm not going to lie, I don't know how many times we've been in that kind of situation," he said. "It's hard for me to put it into words."
He added: "There was no way this could have happened without the help of my family and friends who were there with us on Saturday night. We're all so proud of what they did as well."
The paragraph starts off well enough, but the unintentional twist in the last sentence gives the game away. Unsurprisingly, the keywords do not feature at all.
In a way, what we are trying to do is the reverse of text summarization - we try to expand a title into an article featuring the keywords, hoping that the title's effect permeates the entire article. In contrast, the raw GPT-2 merely continues from the first sentence, and the memory effect of the title can be more transient.
Going back to our model, we can also generate text using methods such as top-p (nucleus) sampling, which tends to produce more variety. Here are some interesting samples, sounding authentic in parts, and by turns funny and disturbing. They may not be convincing, but at least they are entertaining:
It was like we were being mugged and all over again.
“I think it's really sad that this happened to us because people are so used by the pictures on their Instagram feeds... they just want attention," said Dr Emily Loughner-Bullock from Kings College London in England who works as an epidemiologist at King Mungol University School Of Public Health (KMU). ‘When you look back through history there have been many examples where celebrities can make headlines for quite controversial things - such is how social media has affected public health research into mental illness or addiction."
The story spread online after one famous photograph emerged which showed two young men with guns talking about drug use while drinking together before another took off his shirt revealing something he had hidden under her skirt: "Nice impression but I wonder if girls don't understand what 'hooking up' looks exactly?" wrote Shilpa Khetrapal Singh Khalsa Mukherjee following Twitter users sharing similar images showing them laughing happily out loud without any justification whatsoever behind some sort action picture taken during dinner parties held between friends...
There will be no further comment here due today afternoon..
When I was 12 years old my friends and family started seeing pictures from the train accident. It just made them cry so much that they took to Instagram after it happened:
As far as their reactions are concerned this is all very funny but if you take out your cell phone or tablet then these people will be talking about what went down in Boston today - there's no way we could have imagined how bad things would look with photos like those...
“It's hard to remember the day you started your life in this world and it was just too much for one kid. It is really sad that we can no longer celebrate those days with these photos because they were meant only as fun pictures."
Join us for the world’s leading event about accelerating enterprise transformation with AI and Data by signing up today
The internet was flooded in on Saturday morning after one man posted an extremely disturbing photograph to his Twitter account. In it he is seen sitting at home surrounded only wearing headphones – just as we have done before: The picture has gone viral! Here are some highlights from that hilarious moment…
1) It's been nearly six months since I saw this image
It was all very funny. It’s not like I'm going to stop posting photos because people will be more than happy with it and the memes are still growing every day on Twitter
A new poster for an American Idol-themed picture that appeared in The New York Post is showing up at this week's event where fans can get drunk together (and drink too) before their favorite shows!
Having taken a good look at the current state of AI-generated writing, I would say that journalists will not be replaced by technology any time soon.
After all, that is the reason I published this article in the first place - how else would I be morally justified in making an effective fake news generator so readily available on the internet?