NLP for learners – Rewriting a Sequential model with the Functional API

In previous articles, we built our models with the Sequential API.

model = Sequential()
model.add(LSTM(128, input_shape=(seq_length, 1)))
model.add(Dense(len(char_indices)+1, activation='softmax'))

A Sequential model is easy to understand because of its simple, linear structure, but it can have only one input and one output.

If you want to build a model with multiple inputs and outputs, you need to use the Functional API.

Functional API

To make the API easier to understand, let's first rebuild the same structure as the Sequential model above using the Functional API.

input = Input(shape=(seq_length,1))

The first step is to define the input layer. The LSTM layer expects a tensor of shape (batch size, time steps, input dimension), but you do not specify the batch size in the Input layer, only (time steps, input dimension). The variable input now holds the input tensor.
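As a quick check (assuming TensorFlow 2.x with tf.keras and the same imports as the full script below), you can print the shape of the tensor that Input returns; the batch size appears as None because it is left unspecified:

from keras.layers import Input

seq_length = 5
input = Input(shape=(seq_length, 1))
print(input.shape)  # (None, 5, 1): batch size, time steps, input dimension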

lstm = LSTM(128,input_shape=(seq_length, 1))(input)

Next, add the LSTM layer. The trailing (input) indicates that the LSTM layer is connected to the input layer. (With the Functional API the input_shape argument is not strictly needed, because the shape is taken from the connected input tensor, but keeping it does no harm.)

output = Dense(len(char_indices)+1, activation='softmax')(lstm)

Then add the output layer. The trailing (lstm) indicates that the Dense layer is connected to the LSTM layer.

model = Model(inputs=input, outputs=output)

Finally, we create the model, specifying input as its input layer and output as its output layer.

For example, to give the model two output layers, pass them as a list: outputs=[output1, output2].
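As a minimal sketch of what that looks like (the second head here is made up for illustration and is not part of the model used in this series), two Dense heads can branch off the same LSTM layer:

input = Input(shape=(seq_length, 1))
lstm = LSTM(128)(input)
output1 = Dense(len(char_indices)+1, activation='softmax')(lstm)  # next-word head
output2 = Dense(1, activation='sigmoid')(lstm)                    # hypothetical second head
model = Model(inputs=input, outputs=[output1, output2])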

Sequential Models and Functional APIs

Putting this all together, the two models below are equivalent.

# Sequential model
model = Sequential()
model.add(LSTM(128, input_shape=(seq_length, 1)))
model.add(Dense(len(char_indices)+1, activation='softmax'))

# Functional API
input = Input(shape=(seq_length,1))
lstm = LSTM(128,input_shape=(seq_length, 1))(input)
output = Dense(len(char_indices)+1, activation='softmax')(lstm)
model = Model(inputs=input, outputs=output)
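If you want to confirm that the two definitions really build the same network, you can print a summary of each; both should show one LSTM layer followed by one Dense layer with identical parameter counts (the exact formatting depends on your Keras version):

model.summary()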

Here is the complete code.

import numpy as np
import sys
import io
import os
import stanza
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
from tensorflow import keras
from keras.models import Model
from keras.layers import Dense, LSTM, Input
from keras.optimizers import Adam
from keras.utils import np_utils
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import TimeseriesGenerator
#read the text
with io.open('articles_u.txt', encoding='utf-8') as f:
    text = f.read()
texts = text.replace('eos', 'eos\n').splitlines()
#make the dictionary
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
char_indices = tokenizer.word_index
#make the inverted dictionary
indices_char = dict([(value, key) for (key, value) in char_indices.items()])
np.save('voa_char_indices', char_indices)
np.save('voa_indices_char', indices_char)
#vectorization
texts = tokenizer.texts_to_sequences(texts)
texts = sequence.pad_sequences(texts, maxlen=30, padding="pre", truncating="post")
#make dataset
batch_size = 100
seq_length = 5
def train_generator(start, end):
    while True:
        for step in range((end - start) // batch_size):
            x = []
            y = []
            for line in range(batch_size):
                dataset = TimeseriesGenerator(
                    texts[start+step*batch_size+line],
                    texts[start+step*batch_size+line],
                    length=seq_length,
                    batch_size=1)
                for batch in dataset:
                    X, Y = batch
                    x.extend(X[0])
                    y.extend(Y)
            x = np.reshape(x,(25*batch_size,seq_length,1))
            x = x / float(len(char_indices)+1)
            y = np_utils.to_categorical(y, len(char_indices)+1)
            yield x, y
#build the model
print('build the model....')
input = Input(shape=(seq_length,1))
lstm = LSTM(128,input_shape=(seq_length, 1))(input)
output = Dense(len(char_indices)+1, activation='softmax')(lstm)
model = Model(inputs=input, outputs=output)
optimizer = Adam(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
#training
train_val_rate = 0.8
train_start = 0
train_end = round(len(texts) * train_val_rate)
val_start = train_end + 1
val_end = len(texts)
model.fit(
    train_generator(train_start, train_end),
    steps_per_epoch=(train_end - train_start) // batch_size,
    validation_data=train_generator(val_start, val_end),
    validation_steps=(val_end - val_start) // batch_size,
    epochs=100,
    verbose=2)
#save the model
model.save('u_model.h5')
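Once training finishes, the saved model and dictionaries can be loaded again for text generation in a later step. A minimal sketch of the loading code (the file names match the ones saved above; allow_pickle=True is needed because the dictionaries were saved as NumPy object arrays):

from tensorflow import keras
import numpy as np

model = keras.models.load_model('u_model.h5')
char_indices = np.load('voa_char_indices.npy', allow_pickle=True).item()
indices_char = np.load('voa_indices_char.npy', allow_pickle=True).item()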