Word not found in dictionary

I am implementing an LSTM model that I have already trained on a dataset. When I use a new dataset to predict the output, I get errors because some words in the new dataset are not present in the trained model's vocabulary. Is there a way to skip a word when it is not found?

The words from the training data are saved in a dictionary, as shown in my code below:

import json
import re
import string

import pandas as pd

df = pd.read_csv('C:/Users/User/Desktop/Coding/lstm emotion recognition/emotion.data/emotion.data')


#Preparing data for model training
#Tokenization-Since the data is already tokenized and lowercased, we just need to split the words
input_sentences = [text.split(" ") for text in df["text"].values.tolist()]
labels = df["emotions"].values.tolist()

#creating vocabulary(word index)
#Initialize word2id and label2id dictionaries that will be used to encode words and labels
word2id = dict() #creating the dictionary named word2id
label2id = dict() #creating a dictionary named label2id

max_words = 0 #maximum number of words in a sentence

#construction of word2id
for sentence in input_sentences:
    for word in sentence:
        #Add words to word2id if not exist
        if word not in word2id:
            word2id[word] = len(word2id)
    #If length of the sentence is greater than max_words, update max_words
    if len(sentence) > max_words:
        max_words = len(sentence)

#Construction of label2id and id2label dictionaries
label2id = {l: i for i, l in enumerate(set(labels))}
id2label = {v: k for k, v in label2id.items()}

from keras.models import load_model

model = load_model('modelsave2.py')
print(model)

import keras
model_with_attentions = keras.Model(inputs=model.input,
                                    outputs=[model.output,
                                             model.get_layer('attention_vec').output])
#File I/O Open function for read data from JSON File
with open('C:/Users/User/Desktop/Coding/parsehubjsonfileeg/all.json', encoding='utf8') as file_object:
        # store file data in object
        data = json.load(file_object)

        # dictionary for element which you want to keep
        new_data = {'selection1': []}
        print(new_data)
        # copy item from old data to new data if it has 'reviews'
        for item in data['selection1']:
            if 'reviews' in item:
                new_data['selection1'].append(item)
                print(item['reviews'])
                print('--')

        # save in file
        with open('output.json', 'w') as f:
            json.dump(new_data, f)
selection1 = data['selection1']

for item in selection1:
    name = item['name']
    print('>>>>>>>>>>>>>>>>>> ', name)
    CommentID = item['reviews']
    for com in CommentID:
        comment = com['review'].lower()  # convert everything to lowercase
        result = re.sub(r'\d+', '', comment)  # remove numbers
        results = (result.translate(
            str.maketrans('', '', string.punctuation))).strip()  # remove punctuation and surrounding whitespace
        comments = remove_stopwords(results)
        print('>>>>>>', comments)
        # This lookup raises KeyError for any word that is not in word2id
        encoded_samples = [[word2id[word] for word in comments]]

        # Padding
        encoded_samples = keras.preprocessing.sequence.pad_sequences(encoded_samples, maxlen=max_words)

        # Make predictions
        label_probs, attentions = model_with_attentions.predict(encoded_samples)
        label_probs = {id2label[_id]: prob for (label, _id), prob in zip(label2id.items(), label_probs[0])}

        # Print the label probabilities and the most probable label
        print(label_probs)
        print(max(label_probs, key=label_probs.get))

My output is:

>>>>>> ['amazing', 'stay', 'nights', 'cleanliness', 'room', 'faultless']
{'fear': 0.26750156, 'love': 0.0044763167, 'joy': 0.06064613, 'surprise': 0.32365623, 'sadness': 0.03203068, 'anger': 0.31168908}
surprise
>>>>>> ['good', 'time', 'food', 'good']
Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/Dissertation/loadandresult.py", line 96, in <module>
    encoded_samples = [[word2id[word] for word in comments]]
  File "C:/Users/User/PycharmProjects/Dissertation/loadandresult.py", line 96, in <listcomp>
    encoded_samples = [[word2id[word] for word in comments]]
KeyError: 'everydaythe'

The error occurs because the word ‘everydaythe’ is not found in my training dataset. What should I do to correct this? Please help.


Answer 1: words not found in the dictionary must not be considered

You can add the following condition inside the list comprehension:

encoded_samples = [[word2id[word] for word in comments if word in word2id]]

This only adds the words in comments that are already present in the keys of the dictionary.

Edit:

When you’re dealing with dictionaries and you’re not sure that a key exists in every dictionary, you can use get(). This method queries a dictionary for a key and, if the key doesn’t exist, returns a default value that you can choose, as in the code below:

my_dict = {'id': 0, 'reviews': 4.5}
your_dict = {'id': 1}

# If I just specify the key, the default return value is None
your_dict.get('reviews')

# However, I can specify the return value (positionally; get() takes no keyword arguments)
your_dict.get('reviews', 4.0)
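
Applied to the encoding step from the question, the same idea could look like the minimal sketch below. The helper name encode_known_words and the toy vocabulary are hypothetical and only stand in for the asker's real word2id; the sketch skips unknown words and also reports which ones were dropped, which can help when checking how much of a review the model never saw.

```python
def encode_known_words(words, word2id):
    """Return the ids of words found in word2id, plus the words that were skipped.

    Hypothetical helper for illustration; it is not part of Keras.
    """
    ids, skipped = [], []
    for word in words:
        word_id = word2id.get(word)  # None when the word was not seen in training
        if word_id is None:
            skipped.append(word)
        else:
            ids.append(word_id)
    return ids, skipped


# Toy vocabulary standing in for the word2id built from the training data.
word2id = {"good": 0, "time": 1, "food": 2}
comments = ["good", "time", "everydaythe", "food", "good"]

ids, skipped = encode_known_words(comments, word2id)
encoded_samples = [ids]

print(encoded_samples)  # [[0, 1, 2, 0]]
print(skipped)          # ['everydaythe']
```

If every word in a comment is unknown, the id list is empty, so it may be worth checking for that case before padding and predicting.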

2023-04-11T22:47:46+00:00

mRahman

Here’s a word I see often on StackOverflow, «programatically.»

Used to indicate that a programmer intends to do something within the code of a program, rather than through user interaction.

For example, «a user can check a checkbox on a form, but a programmer may also do it programatically.»

Since this word isn’t in the dictionary, I assume either that it is incorrect to use it at all, or that it is a new word that’s essentially slang.

Is there a better alternative?


asked Feb 11, 2011 at 17:29

JYelton

«Programatic» is a misspelling of «programmatic», which is in the dictionary. Your understanding of the technical usage is correct, and is slightly different than the common, dictionary definition.

I think the only reasonable alternative would be «automatically», since the programmer is automating the process, but this use is clearly inferior (at least to this programmer’s ears) to «programmatically».

answered Feb 11, 2011 at 17:36

Chris B. Behrens

If we restrict ourselves to circumlocutions to avoid constructing useful and sensible words, then communication may well be impaired. In the case of «programmatically», I wouldn’t even say that one has coined a new word. To anyone who understands the concept of using program code to achieve a particular result, the words «programmatic» and «programmatically» seem to me to be rather obvious constructions.

As a programmer, I have great respect for official documentation. In this case, however, I would say that the official documentation is incomplete, out of date, or has been misinterpreted.

answered Feb 11, 2011 at 21:32

Ron Porter

The NOAD lists «programmatic» and defines it as «of the nature of or according to a program, schedule, or method»; one of the derivatives the dictionary reports is «programmatically».

As alternatives to «programmatically», I can think of «by (using a) script», «by code», or «by scripting».

answered Feb 11, 2011 at 17:54

apaderno

It’s «programmatically», not «programatically». However, because many built-in word processor and web form dictionaries don’t recognize the word, your misspelling is relatively common in the IT world.

As a Software Developer, I frequently use the word «programmatically» at work, both verbally and in writing. I consider it to be just as valid as «grammatically», but instead of meaning «using proper grammar», I mean to convey «using the proper programming syntax».

It does annoy me that the auto-correct in many dictionaries does not consider it to be a word. I ignore the warning, and if I am properly motivated, I take the time to add the word to the internal dictionary file that the program checks against.

answered Feb 11, 2011 at 17:39

Zoot

To the original question in the post, I think there is a better alternative, which is to restructure the sentence to something like «…a programmer may also do it in code.» This is the usage I hear and read commonly; it’s often used to distinguish between coding a system for some objective and configuring the same system to accomplish that objective.

answered Feb 11, 2011 at 17:42

Tom Hughes

«Programmatic» was coined to mean «according to a programme», in the sense of a bureaucratic, political, or administrative programme, not «a piece of code».

While discussing software programming, «programmatically» feels wrong, and to me it sounds at least redundant, like a member of a team of surgeons asking «how to perform a coronary bypass medically?» or someone in a group of musicians asking «how to play Beethoven, musically?»

If we’re talking about software, we are already speaking about doing things with programs. One could ask «is there an API to do XYZ?» or «how to do XYZ from my code?»

I grew up reading USA programming magazines throughout the 1980s and 1990s and I dare anyone find me one occurrence of the word «programmatically» in Byte Magazine, for instance…

It wouldn’t surprise me at all if the rise of usage of the word «programmatically» in software matches the years where software development started being outsourced to cheap India or China software factories.

FC

PS: Ironically, nowadays MSFT TechNet is full of this word, which still rubs me the wrong way when applied to software.

answered Jul 27, 2013 at 22:02

aissacf


  • #1

Hello,

When I search for the word «musée» (museum), Pleco can’t find it in the French dictionary (which I purchased).
The word is nevertheless present: I can find it only if I search with the English word «museum».

For me it’s a bug. I think the same problem occurs with other words.
Fortunately, the search works correctly for most words, for example «maison».

Thank you for fixing this bug.

mikelove


  • #2

Which French dictionary?

Perhaps it’s a text encoding issue with the é (accented character code points have an alternate meaning in older Chinese text encoding systems — two of them together turn into a Chinese character — so they can often get mangled); does it work correctly if you just type in ‘musee’?
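
The mangling described here can be reproduced in a few lines. This is only an illustrative sketch, not Pleco's code: it assumes Latin-1 as the single-byte source encoding and GBK as the older Chinese encoding, and fold_accents is a hypothetical helper showing one common way a search index could make «musee» match «musée».

```python
import unicodedata

# Two accented characters, each stored as a single byte in Latin-1 ...
raw = "éé".encode("latin-1")  # b'\xe9\xe9'
# ... are read as ONE two-byte character when decoded as GBK.
misread = raw.decode("gbk")


def fold_accents(text):
    """Strip combining accent marks so accented and plain spellings compare equal.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(len(misread))           # 1: the two bytes collapsed into a single character
print(fold_accents("musée"))  # musee
```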

  • #3

Hi,

My dictionary is «KEY Chinese-French».

I’ve tried other words: the bug occurs for all words with accents.
If I type «musee», the word is not found.

However, I have another Android Chinese <> French dictionary in which words with accents are found correctly.

Thank you for your help.

mikelove


  • #4

Ah, thanks — looks like a bug in the dictionary encoder; it mistakenly interpreted words with accent marks in them as inline Pinyin and skipped indexing them.

We’ve just pushed an update to fix this, so if you check the «Add-ons» screen under «Updated» (might have to refresh the page) you should be able to download a new version of the dictionary with this issue removed. (it did not affect Grand Ricci or CFDICT, which had already been configured to allow inline ‘pinyin’ — just an error with the encoder configuration for this particular title)

Thank you very much for bringing this to our attention!
