AI-generated movie reviews

Creating a language model that will generate its own movie reviews using PyTorch and Fastai
nlp
Published

November 14, 2021

This post continues my previous post on classifying movie reviews. Read that one first if you want a fuller picture of the methodology behind the process used here.

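The dataset we’ll be using is the IMDb Large Movie Review Dataset, which contains 25,000 highly polarized movie reviews for training and 25,000 for testing. It also ships with a folder of unlabeled reviews (unsup), which we can use as well, because a language model learns from raw text and needs no labels.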

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
Mounted at /content/gdrive
from fastbook import *
from fastai.text.all import *
path = untar_data(URLs.IMDB)
Path.BASE_PATH = path
path.ls()
(#7) [Path('imdb.vocab'),Path('train'),Path('README'),Path('tmp_clas'),Path('test'),Path('tmp_lm'),Path('unsup')]

We’ll grab the text files using get_text_files, which gets all the text files in a path. We can optionally pass folders to restrict the search to a particular list of subfolders.

files = get_text_files(path, folders=['train', 'test', 'unsup'])
txt = files[0].open().read()
txt
"Dressed to Kill (1980) is a mystery horror film from Brian De Palma and it really works.The atmosphere is right there.The atmosphere that makes you scared.And isn't that what a horror film is supposed to do.All the actors are in the right places.Michael Caine is perfect as Dr. Robert Elliott, the shrink with a little secret.Angie Dickinson as Kate Miller, the sexually frustrated mature woman is terrific.Keith Gordon as her son Peter is brilliant.Nancy Allen as Liz Blake the call girl is fantastic.Dennis Franz does his typical detective role.His Detective Marino is one of the most colorful in this movie.There are plenty of creepy scenes in this movie.The elevator scene is one of them.There have been made comparisons between this and Alfred Hitchcock's Psycho (1960).There are some similarities between these two movies.Both of these movies may cause some sleepless nights."
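
To make the folders argument more concrete, here is a rough standard-library sketch of the same idea: walk a root directory recursively and keep only the .txt files that live under the chosen top-level folders. The helper name list_text_files and the demo file names are made up for illustration; this is not fastai's implementation, only an approximation of the behaviour described above.

```python
from pathlib import Path
import tempfile

def list_text_files(root, folders=None):
    """Recursively collect .txt files under root.

    If folders is given, only those top-level subfolders are searched.
    Hypothetical helper for illustration, not part of fastai.
    """
    root = Path(root)
    starts = [root / f for f in folders] if folders else [root]
    files = []
    for start in starts:
        if start.is_dir():
            files.extend(sorted(start.rglob("*.txt")))
    return files

# Tiny demo on a throwaway directory that loosely mimics the IMDb layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["train/pos/0_9.txt", "test/neg/1_2.txt", "tmp_lm/skip.txt"]:
        p = Path(tmp) / name
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text("a review")
    found = list_text_files(tmp, folders=["train", "test"])
    found_names = [f.relative_to(tmp).as_posix() for f in found]
    print(found_names)  # the file under tmp_lm is skipped
```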

Training a Language Model

Language Model using DataBlock

Fastai handles tokenization and numericalization automatically when TextBlock is passed to DataBlock. Let’s create the DataLoaders for a language model using TextBlock.

get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=72)
dls_lm.show_batch(max_n=2)
text (row 0): xxbos xxmaj being that i am not a fan of xxmaj snoop xxmaj dogg , as an actor , that made me even more anxious to check out this flick . i remember he was interviewed on ” jay xxmaj leno , ” and said that he turned down a role in the big - budget xxmaj adam xxmaj sandler comedy ” the xxmaj longest xxmaj yard ” to be in this
text_ (row 0): xxmaj being that i am not a fan of xxmaj snoop xxmaj dogg , as an actor , that made me even more anxious to check out this flick . i remember he was interviewed on ” jay xxmaj leno , ” and said that he turned down a role in the big - budget xxmaj adam xxmaj sandler comedy ” the xxmaj longest xxmaj yard ” to be in this film
text (row 1): viewer , the first number in the series does provide an unexpected element of suspense in addition to capable costuming from xxmaj ha xxmaj nguyen , fine stunt performing , and a polished turn from xxmaj carr . xxmaj an unrated version is available that seemingly promises to provide additional footage of the ardent romantic actions shared by the mismatched lovers . xxbos xxmaj the xxmaj minion is about … well ,
text_ (row 1): , the first number in the series does provide an unexpected element of suspense in addition to capable costuming from xxmaj ha xxmaj nguyen , fine stunt performing , and a polished turn from xxmaj carr . xxmaj an unrated version is available that seemingly promises to provide additional footage of the ardent romantic actions shared by the mismatched lovers . xxbos xxmaj the xxmaj minion is about … well , a
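
In the batch above, text_ is simply text shifted by one token: the target at each position is the next token of the input. The toy sketch below walks through the three steps (tokenize, numericalize, shift) in plain Python. The function toy_tokenize and its rules are a heavily simplified stand-in that I made up for illustration; fastai's real tokenizer applies many more rules, and its vocabulary is built differently.

```python
import re

def toy_tokenize(text):
    """Simplified stand-in for a tokenizer with special tokens.

    Adds 'xxbos' at the start of the text and 'xxmaj' before any
    word that began with a capital letter, then lowercases the word.
    """
    tokens = ["xxbos"]  # beginning-of-stream marker
    for word in re.findall(r"\w+|[^\w\s]", text):
        if word[0].isupper():
            tokens.append("xxmaj")  # the next token was capitalized
        tokens.append(word.lower())
    return tokens

tokens = toy_tokenize("The Minion is about a movie.")
print(tokens)

# Numericalization: map every distinct token to an integer id.
vocab = sorted(set(tokens))
stoi = {tok: i for i, tok in enumerate(vocab)}
ids = [stoi[t] for t in tokens]

# Language-model inputs and targets: the target is the input shifted by one.
x, y = ids[:-1], ids[1:]
for xi, yi in zip(x, y):
    print(f"{vocab[xi]:>8} -> {vocab[yi]}")
```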

Now that our data is ready, we can fine-tune the pretrained language model.

Fine-tuning the Language Model

To convert the integer word indices into activations that we can use for our neural network, we will use embeddings. We’ll feed those embeddings into a recurrent neural network (RNN), using an architecture called AWD-LSTM. The embeddings in the pretrained model are merged with random embeddings added for words that weren’t in the pretraining vocabulary. This is handled automatically inside language_model_learner.
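
As a rough picture of that merging step, the sketch below builds a new embedding table for the IMDb vocabulary: rows for words the pretrained model already knows are copied over, and words it has never seen get a freshly initialized row. Everything here (the function name merge_embeddings, the tiny vocabularies, the Gaussian initialization) is an assumption for illustration and follows the description above rather than fastai's actual code, which may initialize the new rows differently.

```python
import random

def merge_embeddings(pretrained_vocab, pretrained_vectors, new_vocab, dim, seed=0):
    """Build an embedding table for new_vocab (illustrative only).

    Known words reuse their pretrained vector; unknown words get a
    small random vector. The initialization scheme is an assumption.
    """
    rng = random.Random(seed)
    index = {word: i for i, word in enumerate(pretrained_vocab)}
    merged = []
    for word in new_vocab:
        if word in index:
            merged.append(list(pretrained_vectors[index[word]]))
        else:
            merged.append([rng.gauss(0.0, 0.1) for _ in range(dim)])
    return merged

# Made-up 2-dimensional "pretrained" embeddings.
pretrained_vocab = ["the", "movie", "is"]
pretrained_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# The new corpus knows two of those words and adds two new ones.
new_vocab = ["the", "movie", "xxmaj", "sandler"]
merged = merge_embeddings(pretrained_vocab, pretrained_vectors, new_vocab, dim=2)

for word, vec in zip(new_vocab, merged):
    source = "pretrained" if word in pretrained_vocab else "new"
    print(word, source, vec)
```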

learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
).to_fp16()
learn.fit_one_cycle(3, 2e-2)
epoch train_loss valid_loss accuracy perplexity time
0 4.128321 4.070849 0.284800 58.606724 29:54
1 3.995339 3.938066 0.296213 51.319229 29:57
2 3.860701 3.867283 0.303124 47.812309 30:00
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
epoch train_loss valid_loss accuracy perplexity time
0 3.675387 3.746690 0.317715 42.380569 32:10
1 3.645742 3.704438 0.322705 40.627209 32:08
2 3.605402 3.664308 0.327991 39.029121 31:54
3 3.535574 3.633687 0.331826 37.852131 31:51
4 3.451682 3.618303 0.334019 37.274242 31:41
5 3.417034 3.603825 0.336183 36.738476 31:49
6 3.359589 3.594853 0.337721 36.410355 31:44
7 3.266180 3.592850 0.338945 36.337505 31:36
8 3.213485 3.597207 0.339176 36.496162 31:34
9 3.178523 3.602469 0.339008 36.688713 31:36
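
A note on reading these tables: the perplexity column is the exponential of the validation loss (the cross-entropy), so the two columns carry the same information on different scales. The short check below recomputes it for the first and last rows shown above; the dictionary keys are just labels I chose.

```python
import math

# valid_loss values copied from the training tables above.
valid_losses = {
    "frozen, epoch 0": 4.070849,
    "unfrozen, epoch 9": 3.602469,
}

# Perplexity is exp(cross-entropy loss).
perplexities = {name: math.exp(loss) for name, loss in valid_losses.items()}

for name, ppl in perplexities.items():
    print(f"{name}: perplexity ~ {ppl:.2f}")
```

Roughly speaking, a perplexity near 37 means the model is about as uncertain as if it were choosing uniformly among 37 candidate tokens at each step.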

Text Generation

Let’s use our model to generate random reviews. Since it is trained to guess what the next word of the sentence is, we can use the model to write new reviews.

TEXT = 'This movie is terrible'
N_WORDS = 70
N_SENTENCES = 5
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]
print('\n\n'.join(preds))
This movie is terrible ! It 's terrible . The actors are ALL bad . The story is bad . The special effects are TERRIBLE . And that 's really the only thing that will save this movie . The plot is pathetic . The movie is just boring . a group of people are about to end up in a mental hospital where they

This movie is terrible . My friend went to see it and we were so disappointed . I 'm not usually a fan of the book but i had earlier read that Chris Columbus wrote some of the best writing , directing and directing since , well , there are no words to describe how bad this piece of garbage was . It was a complete waste of time

This movie is terrible . And it has been very long . i did n't think it was even worth the rental , but it was very recommended . If you are into action movies , be sure to rent Titanic . You will be disappointed . It is a well made movie . The acting is good enough to keep your interest . Everything about this

This movie is terrible . Not only is it offensive in spots , it only gets worse . It has no story line . No acting and dead and cheap special effects . What a waste of talent . My 3 year old son was laughing , not laughing . Well , i really loved the first film . This one is clearly one of the

This movie is terrible , i do n't know why i could n't find it , it was so awful that i had to leave the room after this horrible film was finished . 

 The plot was so stupid that it went on way too long . It was painful to watch . The fact that the audience was so bored was incredible . The only reason this film
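
The five reviews differ because each next word is sampled from the model's predicted distribution rather than always taking the most likely word, and temperature controls how adventurous that sampling is. The sketch below shows the general idea with a made-up four-word vocabulary and made-up scores; it is a generic illustration of temperature sampling, not the code that learn.predict runs internally.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened when temperature < 1."""
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_word(vocab, logits, temperature=1.0, rng=random):
    """Sample one word according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores for the word following "This movie is".
vocab = ["terrible", "boring", "great", "."]
logits = [2.0, 1.0, 0.5, 0.1]

probs_default = softmax_with_temperature(logits, 1.0)
probs_cool = softmax_with_temperature(logits, 0.75)
print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_cool])

rng = random.Random(0)  # seeded so the demo is repeatable
word = sample_next_word(vocab, logits, temperature=0.75, rng=rng)
print(word)
```

With a temperature below 1 the most likely word takes a larger share of the probability, so the output stays closer to what the model considers typical; higher values give more varied and less coherent text.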