If a user enters a form of the word «look» such as «looked» or «looking», how can I identify it as a modified version of the verb look? I imagine others have run into and have solved this problem before …
JohnZaj
asked Jun 28, 2012 at 1:27
This is part of a fairly complicated problem called stemming.
However, it’s easier if you only want to handle verbs. To begin with, you can try the naive lookup-table approach, since the English vocabulary is not that big.
If you want something fancier, check the wiki page above.
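A minimal sketch of the lookup-table idea, assuming a hand-built table. The name `VERB_FORMS` and its entries are hypothetical and only illustrative; a real table would be loaded from a dictionary resource.

```python
# Hypothetical lookup table: inflected form -> base verb.
# These few entries are illustrative; a real table would come from a dictionary file.
VERB_FORMS = {
    "look": "look", "looks": "look", "looked": "look", "looking": "look",
    "run": "run", "runs": "run", "ran": "run", "running": "run",
}


def base_verb(word):
    """Return the base verb for a known form, or None if the word is not in the table."""
    return VERB_FORMS.get(word.lower())


print(base_verb("Looked"))  # look
print(base_verb("ran"))     # run
print(base_verb("table"))   # None
```

Unlike rule-based approaches, a table handles irregular forms such as "ran", but it only knows the words someone has entered.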
answered Jun 28, 2012 at 1:53
xvatar
If a regex is what you're looking for, something like this works: look.*?\b
to match look, looked and looking
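As a rough illustration in Python, the sketch below uses `\blook\w*\b`, a slightly stricter pattern than the one in the answer (that choice is an assumption). The helper name `find_look_forms` is made up for the example.

```python
import re

# Assumed pattern: "look" at a word boundary, followed by any run of word characters.
LOOK_FORMS = re.compile(r"\blook\w*\b", re.IGNORECASE)


def find_look_forms(text):
    """Return every word in `text` that starts with "look"."""
    return LOOK_FORMS.findall(text)


print(find_look_forms("She looked up while he was looking at the lookout."))
# ['looked', 'looking', 'lookout']
```

The match on "lookout" shows the limit of this approach: a prefix regex cannot tell a verb form from an unrelated word that happens to share the prefix, and it misses irregular forms entirely.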
answered Jun 28, 2012 at 1:46
tsukimi
Depending on your task, WordNet can be your friend for stuff like this. It’s not a stemmer, but most stem words will return hits for what you’re looking for. It also provides synonyms and a lot of other information if you care about the concept ‘look’ rather than the word itself.
answered Jun 30, 2012 at 21:17
dfb
How to find all forms of a word?
Given that you have a noun, a verb, an adjective or an adverb?
The forms I am referring to are:
Noun
Verb
Adjective
Adverb
Thanks!
asked Oct 27, 2014 at 7:09
Dictionaries usually list all important POS forms.
Also, words are routinely verbified, nounified, and all sorts of interuse is widespread today.
That nearly covers everything, I guess.
answered Oct 27, 2014 at 7:30
Kris
A reasonable grammar should have tables of noun and adjective forms and a table of verb conjugations. But I admit that form tables are not favoured by English grammars. And the conjugation tables are mostly horrible, mixed with continuous forms, so that it is really difficult to understand the conjugation system. Adverbs are invariable; they have only one form.
answered Oct 27, 2014 at 7:41
rogermue
At a recent Camp Logos an attendee asked this question:
Can we execute a Match all word forms search with an Inline search?
First, I’ll give a little explanation.
On the Search panel menu resides the option to Match all word forms. With this feature selected, a search for faith also finds faithful, faithfulness, etc.
An Inline Search, however, does not visibly have this option. No worries, though. An Inline Search uses what was last checked or unchecked on the Search panel menu.
Follow these steps and you’ll see what I mean:
- Click the Search icon in the upper left of the program to open the Search panel
- Choose the Search panel menu (A)
- Select Match all word forms (B)
- Open a Bible such as the ESV
- Click the Inline Search icon on the Bible’s toolbar (C) to open the search criteria at the top of the Bible’s panel (D)
- Select Bible as the search type (E)
- Set the verse range to Ephesians (F)
- Type faith in the Find box (G)
- Press the Enter key to generate the search results
- Notice 10 search hits (H) including faith (I) and faithful (J)
- Choose the Search panel menu again (K)
- Uncheck Match all word forms (L)
- Click in the Find box of the Inline Search in the Bible (M)
- Press the Enter key again to regenerate the results
- Notice 8 search hits (N), which include only faith (O)
Again, for emphasis: even though an Inline Search does not visibly have the same options as the Search panel menu, it uses the last settings indicated on the Search panel menu. So if you enjoy using the Inline Search feature, please remember this power-user trick.
Morris Proctor is a certified trainer for Logos Bible Software. Morris, who has trained thousands of Logos users at his two-day Camp Logos seminars, provides many training materials.
Accurately generate all possible forms of an English word
Word forms can accurately generate all possible forms of an English word. It can conjugate verbs. It can connect different
parts of speech, e.g. noun to adjective, adjective to adverb, noun to verb, etc. It can pluralize singular nouns. It does this all in one function. Enjoy!
Examples
Some very timely examples
>>> from word_forms.word_forms import get_word_forms
>>> get_word_forms("president")
{'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
 'a': {'presidential'},
 'v': {'preside', 'presided', 'presiding', 'presides'},
 'r': {'presidentially'}}
>>> get_word_forms("elect")
{'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
 'a': {'eligible', 'electoral', 'elective', 'elect'},
 'v': {'electing', 'elects', 'elected', 'elect'},
 'r': set()}
>>> get_word_forms("politician")
{'n': {'politician', 'politics', 'politicians'},
 'a': {'political'},
 'v': set(),
 'r': {'politically'}}
>>> get_word_forms("am")
{'n': {'being', 'beings'},
 'a': set(),
 'v': {'was', 'be', "weren't", 'am', "wasn't", "aren't", 'being', 'were', 'is', "isn't", 'been', 'are', 'am not'},
 'r': set()}
>>> get_word_forms("ran")
{'n': {'run', 'runniness', 'runner', 'runninesses', 'running', 'runners', 'runnings', 'runs'},
 'a': {'running', 'runny'},
 'v': {'running', 'run', 'ran', 'runs'},
 'r': set()}
>>> get_word_forms('continent', 0.8)  # with configurable similarity threshold
{'n': {'continents', 'continency', 'continences', 'continent', 'continencies', 'continence'},
 'a': {'continental', 'continent'},
 'v': set(),
 'r': set()}
As you can see, the output is a dictionary with four keys. «r» stands for adverb, «a» for adjective, «n» for noun
and «v» for verb. Don’t ask me why «r» stands for adverb. This is what WordNet uses, so this is why I use it too.
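A small sketch of how a dictionary of this shape might be consumed. To stay self-contained it does not call the library; the sample data is copied from the `get_word_forms("politician")` output shown above, and the helper names `all_forms`, `parts_of_speech` and `POS_NAMES` are hypothetical.

```python
# Sample data copied from the get_word_forms("politician") output in the examples.
forms_by_pos = {
    "n": {"politician", "politics", "politicians"},
    "a": {"political"},
    "v": set(),
    "r": {"politically"},
}

# Readable names for the four WordNet-style keys.
POS_NAMES = {"n": "noun", "a": "adjective", "v": "verb", "r": "adverb"}


def all_forms(forms_by_pos):
    """Flatten the per-part-of-speech sets into one set of word forms."""
    return set().union(*forms_by_pos.values())


def parts_of_speech(word, forms_by_pos):
    """List the readable part-of-speech names under which `word` appears."""
    return sorted(POS_NAMES[pos] for pos, forms in forms_by_pos.items() if word in forms)


print("politics" in all_forms(forms_by_pos))       # True
print(parts_of_speech("political", forms_by_pos))  # ['adjective']
```

Flattening the sets gives a quick membership test for treating two words as related, which is the use case described in the "Why?" section.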
Help can be obtained at any time by typing the following:
>>> help(get_word_forms)
Why?
In Natural Language Processing and Search, one often needs to treat words like «run» and «ran», «love» and «lovable»
or «politician» and «politics» as the same word. This is usually done by algorithmically reducing each word into a
base word and then comparing the base words. The process is called Stemming.
For example, the Porter Stemmer reduces both «love» and «lovely»
into the base word «love».
Stemmers have several shortcomings. Firstly, the base word produced by the Stemmer is not always a valid English word.
For example, the Porter Stemmer reduces the word «operation» to «oper». Secondly, the Stemmers have a high false negative rate.
For example, «run» is reduced to «run» and «ran» is reduced to «ran». This happens because the Stemmers use a set of
rational rules for finding the base words, and as we all know, the English language does not always behave very rationally.
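The false-negative problem can be seen with a deliberately tiny suffix stripper. This is a toy written for illustration, not the Porter algorithm; the function name `toy_stem` and its three rules are assumptions.

```python
# Toy rule set: suffixes are tried in this order and only the first match is stripped.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ("looked", "looking", "looks", "run", "ran"):
    print(word, "->", toy_stem(word))
# looked -> look
# looking -> look
# looks -> look
# run -> run
# ran -> ran
```

Regular forms collapse to one stem, but no suffix rule can connect "ran" to "run", so the two are treated as different words.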
Lemmatizers are more accurate than Stemmers because they produce a base form that is present in the dictionary (also called the Lemma). So the reduced word is always a valid English word. However, Lemmatizers also have false negatives because they are not very good at connecting words across different parts of speech. The WordNet Lemmatizer included with NLTK fails at almost all such examples. «operations» is reduced to «operation» and «operate» is reduced to «operate».
Word Forms tries to solve this problem by finding all possible forms of a given English word. It can perform verb conjugations, connect noun forms to verb forms, adjective forms, adverb forms, pluralize singular forms, etc.
Bonus: A simple lemmatizer
We also offer a very simple lemmatizer based on word_forms. Here is how to use it.
>>> from word_forms.lemmatizer import lemmatize
>>> lemmatize("operations")
'operant'
>>> lemmatize("operate")
'operant'
Enjoy!
Compatibility
Tested on Python 3
Installation
Using pip:
pip install -U word_forms
From source
Or you can install it from source:
- Clone the repository:
git clone https://github.com/gutfeeling/word_forms.git
- Install it using pip or setup.py
pip install -e word_forms
# or
cd word_forms
python setup.py install
Acknowledgement
- The XTAG project for information on verb conjugations.
- WordNet
Maintainer
Hi, I am Dibya and I maintain this repository. I would love to hear from you. Feel free to get in touch with me
at dibyachakravorty@gmail.com.
Contributors
- Tom Aarsen @CubieDev is a major contributor and is singlehandedly responsible for v2.0.0.
- Sajal Sharma @sajal2692 is a major contributor.
Contributions
Word Forms is not perfect. In particular, the following aspect can be improved.
- It sometimes generates non-dictionary words like «runninesses» because the pluralization/singularization algorithm is
not perfect. At the moment, I am using inflect for it.
If you like this package, feel free to contribute. Your pull requests are most welcome.