Python - Masking for Keras BLSTM
I'm running a BLSTM based on the IMDB example, but my version is not classification; rather, it is sequence prediction for labels. For simplicity, you can treat it as a POS tagging model: the inputs are sentences of words and the outputs are tags. The syntax used in the example differs from that of other Keras examples in that it doesn't use model.add but instead initiates a sequence. I can't figure out how to add a masking layer in this different syntax.

I've run the model and tested it, and it works fine, but it's predicting and evaluating the accuracy of the 0s, which are my padding. Here's the code:
    from __future__ import print_function
    import numpy as np
    from keras.preprocessing import sequence
    from keras.models import Model
    from keras.layers.core import Masking
    from keras.layers import TimeDistributed, Dense
    from keras.layers import Dropout, Embedding, LSTM, Input, merge
    from prep_nn import prep_scan
    from keras.utils import np_utils, generic_utils

    np.random.seed(1337)  # for reproducibility
    nb_words = 20000  # max. size of vocab
    nb_classes = 10  # number of labels
    hidden = 500  # 500 gives best results so far
    batch_size = 10  # create and update net after 10 lines
    val_split = .1
    epochs = 15

    # input x is a multi-dimensional numpy array of ids,
    # one line per array. input y is a multi-dimensional numpy array
    # of binary arrays for each value of each label.
    # maxlen is the length of the longest line
    print('loading data...')
    (x_train, y_train), (x_test, y_test) = prep_scan(
        nb_words=nb_words, test_len=75)

    print(len(x_train), 'train sequences')
    print(int(len(x_train)*val_split), 'validation sequences')
    print(len(x_test), 'heldout sequences')

    # this is the placeholder tensor for the input sequences
    sequence = Input(shape=(maxlen,), dtype='int32')

    # this embedding layer will transform the sequences of integers
    # into vectors
    embedded = Embedding(nb_words, output_dim=hidden,
                         input_length=maxlen)(sequence)

    # apply forwards LSTM
    forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
    # apply backwards LSTM
    backwards = LSTM(output_dim=hidden, return_sequences=True,
                     go_backwards=True)(embedded)

    # concatenate the outputs of the 2 LSTMs
    merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
    after_dp = Dropout(0.15)(merged)

    # TimeDistributed for sequence
    # change activation to sigmoid?
    output = TimeDistributed(
        Dense(output_dim=nb_classes,
              activation='softmax'))(after_dp)

    model = Model(input=sequence, output=output)

    # try using different optimizers and different optimizer configs
    # loss=binary_crossentropy, optimizer=rmsprop
    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'], optimizer='adam')

    print('train...')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              nb_epoch=epochs,
              shuffle=True,
              validation_split=val_split)
Update:

I merged a PR and got it working with mask_zero=True in the embedding layer. But I'm now realizing, after seeing the terrible performance of the model, that I'd also need masking in the output. Others have suggested using sample_weight instead in the model.fit line. How could I do this to ignore the 0s?
Update 2:

So I read this and figured out that sample_weight should be a matrix of 1s and 0s. I thought it might have been working, but my accuracy stalls around 50%, and I found that it's still trying to predict the padded parts, although it won't predict them as 0 now, which was the problem before using sample_weight.
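For illustration, here is a minimal sketch of how such a matrix of 1s and 0s could be built from the padded inputs. It assumes that id 0 is used only for padding; the helper name make_temporal_weights and the toy array are hypothetical and not part of the original code.

```python
import numpy as np

def make_temporal_weights(x_padded, pad_id=0):
    """Return a (n_samples, maxlen) float matrix:
    1.0 where there is a real token, 0.0 where there is padding."""
    x_padded = np.asarray(x_padded)
    return (x_padded != pad_id).astype('float32')

# Toy batch: two lines padded with 0 up to length 5 (illustrative data only).
x_toy = np.array([[4, 7, 2, 0, 0],
                  [9, 3, 0, 0, 0]])
weights_toy = make_temporal_weights(x_toy)
print(weights_toy)
```

A 2D array of shape (samples, sequence_length) like this is the form that sample_weight takes when the model is compiled with sample_weight_mode='temporal'.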
Current code:
    from __future__ import print_function
    import numpy as np
    from keras.preprocessing import sequence
    from keras.models import Model
    from keras.layers.core import Masking
    from keras.layers import TimeDistributed, Dense
    from keras.layers import Dropout, Embedding, LSTM, Input, merge
    from prep_nn import prep_scan
    from keras.utils import np_utils, generic_utils
    import itertools
    from itertools import chain
    from sklearn.preprocessing import LabelBinarizer
    import sklearn
    import pandas as pd

    np.random.seed(1337)  # for reproducibility
    nb_words = 20000  # max. size of vocab
    nb_classes = 10  # number of labels
    hidden = 500  # 500 gives best results so far
    batch_size = 10  # create and update net after 10 lines
    val_split = .1
    epochs = 10

    # input x is a multi-dimensional numpy array of syll ids,
    # one line per array. input y is a multi-dimensional numpy array
    # of binary arrays for each value of each label.
    # maxlen is the length of the longest line
    print('loading data...')
    (x_train, y_train), (x_test, y_test), maxlen, sylls_ids, tags_ids, weights = prep_scan(
        nb_words=nb_words, test_len=75)

    print(len(x_train), 'train sequences')
    print(int(len(x_train) * val_split), 'validation sequences')
    print(len(x_test), 'heldout sequences')

    # this is the placeholder tensor for the input sequences
    sequence = Input(shape=(maxlen,), dtype='int32')

    # this embedding layer will transform the sequences of integers
    # into vectors of size hidden
    embedded = Embedding(nb_words, output_dim=hidden,
                         input_length=maxlen, mask_zero=True)(sequence)

    # apply forwards LSTM
    forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
    # apply backwards LSTM
    backwards = LSTM(output_dim=hidden, return_sequences=True,
                     go_backwards=True)(embedded)

    # concatenate the outputs of the 2 LSTMs
    merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
    # after_dp = Dropout(0.)(merged)

    # TimeDistributed for sequence
    # change activation to sigmoid?
    output = TimeDistributed(
        Dense(output_dim=nb_classes,
              activation='softmax'))(merged)

    model = Model(input=sequence, output=output)

    # try using different optimizers and different optimizer configs
    # loss=binary_crossentropy, optimizer=rmsprop
    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'], optimizer='adam',
                  sample_weight_mode='temporal')

    print('train...')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              nb_epoch=epochs,
              shuffle=True,
              validation_split=val_split,
              sample_weight=weights)
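Since the accuracy reported during training may still count padded timesteps, one way to check what the model is really doing is to score its predictions separately and skip the padding. The following is a minimal NumPy sketch of that idea, not part of the original code. It assumes one-hot targets, per-timestep softmax outputs (for example from model.predict), and a 1/0 weight matrix like the one passed as sample_weight. The function name masked_accuracy and the toy arrays are hypothetical.

```python
import numpy as np

def masked_accuracy(y_true, y_pred, weights):
    """Accuracy over timesteps whose weight is non-zero.

    y_true, y_pred: arrays of shape (n_samples, maxlen, nb_classes)
    weights:        array of shape (n_samples, maxlen), 1 = real token, 0 = padding
    """
    true_ids = np.argmax(y_true, axis=-1)
    pred_ids = np.argmax(y_pred, axis=-1)
    mask = np.asarray(weights) > 0
    correct = (true_ids == pred_ids) & mask
    return correct.sum() / float(mask.sum())

# Toy example: one line of length 4 whose last two positions are padding.
y_true_toy = np.eye(3)[np.array([[1, 2, 0, 0]])]   # one-hot targets
y_pred_toy = np.eye(3)[np.array([[1, 0, 0, 0]])]   # stand-in for softmax outputs
weights_toy = np.array([[1, 1, 0, 0]])

acc = masked_accuracy(y_true_toy, y_pred_toy, weights_toy)
print(acc)
```

In this toy case one of the two real tokens is tagged correctly, so the masked accuracy is lower than the figure obtained when the two padded positions are also counted as hits.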
Did you solve this issue? It is not clear to me how your code deals with padded values and word indexes. What about letting word indexes start from 1 and defining

    embedded = Embedding(nb_words + 1, output_dim=hidden, input_length=maxlen, mask_zero=True)(sequence)

instead of

    embedded = Embedding(nb_words, output_dim=hidden, input_length=maxlen, mask_zero=True)(sequence)

according to https://keras.io/layers/embeddings/?
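A minimal sketch of that indexing convention, assuming each sentence arrives as a list of 0-based word ids. The helper shift_and_pad and the toy data are hypothetical. Every id is shifted up by one so that 0 is free to mean padding, which is why the embedding then needs nb_words + 1 rows.

```python
import numpy as np

def shift_and_pad(sentences, maxlen, pad_id=0):
    """Shift 0-based word ids to start at 1 and right-pad each line with pad_id."""
    out = np.full((len(sentences), maxlen), pad_id, dtype='int32')
    for row, sent in enumerate(sentences):
        ids = [w + 1 for w in sent[:maxlen]]  # real words now occupy 1..nb_words
        out[row, :len(ids)] = ids
    return out

nb_words = 5                   # toy vocabulary size (illustrative)
sentences = [[0, 4, 2], [3]]   # 0-based ids in the range 0..nb_words-1
x = shift_and_pad(sentences, maxlen=4)
print(x)

# The largest id is now nb_words, so an embedding indexed by these ids
# needs input_dim = nb_words + 1.
print(x.max(), nb_words + 1)
```

This sketch pads on the right for readability; the same reasoning about reserving index 0 applies if the lines are padded on the left.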