I want to check in a Python program if a word is in the English dictionary.
I believe nltk wordnet interface might be the way to go but I have no clue how to use it for such a simple task.
def is_english_word(word):
    pass  # how do I implement is_english_word?

is_english_word(token.lower())
In the future, I might want to check if the singular form of a word is in the dictionary (e.g., properties -> property -> english word). How would I achieve that?
Salvador Dali
asked Sep 24, 2010 at 16:01
For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There’s a tutorial, or you could just dive straight in:
>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
>>>
PyEnchant comes with a few dictionaries (en_GB, en_US, de_DE, fr_FR), but can use any of the OpenOffice ones if you want more languages.
There appears to be a pluralisation library called inflect, but I’ve no idea whether it’s any good.
answered Sep 24, 2010 at 16:26
Katriel
It won’t work well with WordNet, because WordNet does not contain all English words.
Another possibility, based on NLTK without enchant, is NLTK’s words corpus:
>>> from nltk.corpus import words
>>> "would" in words.words()
True
>>> "could" in words.words()
True
>>> "should" in words.words()
True
>>> "I" in words.words()
True
>>> "you" in words.words()
True
answered Jan 28, 2014 at 8:38
Sadık
Using NLTK:
from nltk.corpus import wordnet

word_to_test = "hello"  # the word to look up

if not wordnet.synsets(word_to_test):
    print("Not an English word")
else:
    print("English word")
You should refer to this article if you have trouble installing wordnet or want to try other approaches.
nickb
answered Mar 18, 2011 at 11:29
Susheel Javadi
Using a set to store the word list because looking them up will be faster:
with open("english_words.txt") as word_file:
    english_words = set(word.strip().lower() for word in word_file)

def is_english_word(word):
    return word.lower() in english_words

print(is_english_word("ham"))  # should be True if you have a good english_words.txt
To answer the second part of the question, the plurals would already be in a good word list, but if you wanted to specifically exclude those from the list for some reason, you could indeed write a function to handle it. But English pluralization rules are tricky enough that I’d just include the plurals in the word list to begin with.
As to where to find English word lists, I found several just by Googling «English word list». Here is one: http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt You could Google for British or American English if you want specifically one of those dialects.
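If you do want to accept plurals that are missing from your word list, here is a minimal sketch of a fallback. The suffix rules and the helper name singular_candidates are illustrative assumptions, not a complete treatment of English pluralization, and sample_words stands in for a real word list.

```python
def singular_candidates(word):
    """Return possible singular forms using a few naive suffix rules."""
    word = word.lower()
    candidates = []
    if word.endswith("ies") and len(word) > 3:
        candidates.append(word[:-3] + "y")  # properties -> property
    if word.endswith("es") and len(word) > 2:
        candidates.append(word[:-2])        # boxes -> box
    if word.endswith("s") and len(word) > 1:
        candidates.append(word[:-1])        # cats -> cat
    return candidates


def is_english_word(word, english_words):
    """True if the word, or one of its naive singular forms, is in the set."""
    word = word.lower()
    if word in english_words:
        return True
    return any(c in english_words for c in singular_candidates(word))


# Tiny stand-in for a real word list (assumption: yours is far larger).
sample_words = {"property", "box", "cat", "ham"}

print(is_english_word("Properties", sample_words))  # True
print(is_english_word("boxes", sample_words))       # True
print(is_english_word("jwiefj", sample_words))      # False
```

Irregular plurals (mice, children) are not handled by rules like these, which is one more reason to prefer a word list that already contains the plurals.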
answered Sep 24, 2010 at 16:12
kindall
For All Linux/Unix Users
If your OS is Linux or another Unix-like system, there is a simple way to get a list of English words. In the directory /usr/share/dict you usually have a words file. There are often also more specific american-english and british-english files. These contain the words of that specific language variant. You can read them from any programming language, which is why I thought you might want to know about this.
Now, for Python users, the code below builds words, a set containing every word in the file:
import re

with open("/usr/share/dict/words", "r") as file:
    # Replace every non-word character with a space, then split into words
    words = set(re.sub(r"[^\w]", " ", file.read()).split())

def is_word(word):
    return word.lower() in words

is_word("tarts")  # Returns True
is_word("jwiefjiojrfiorj")  # Returns False
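Since the location of the word file differs between systems, here is a hedged sketch that tries several candidate paths and loads the first one that exists. The function name load_word_set is an assumption made for illustration, and the demo writes a small temporary file instead of relying on /usr/share/dict being present.

```python
import os
import tempfile


def load_word_set(candidate_paths):
    """Load the first existing word-list file (one word per line) into a set."""
    for path in candidate_paths:
        if os.path.exists(path):
            with open(path, encoding="utf-8", errors="ignore") as handle:
                # Skip blank lines and normalize to lower case
                return {line.strip().lower() for line in handle if line.strip()}
    return set()  # no candidate file was found


# Demo with a temporary file standing in for /usr/share/dict/words.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "words")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("Tarts\nham\n\nproperty\n")
    words = load_word_set(["/nonexistent/dict/words", demo_path])

print(sorted(words))  # ['ham', 'property', 'tarts']
```

On a real system you might pass a list such as /usr/share/dict/words followed by /usr/share/dict/american-english, and treat an empty result as "no dictionary installed".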
Hope this helps!
answered Apr 28, 2020 at 12:09
For a faster NLTK-based solution you could hash the set of words to avoid a linear search.
from nltk.corpus import words as nltk_words

# Build the hashed lookup once, outside the function,
# because you only need to do it once.
dictionary = set(nltk_words.words())

def is_english_word(word):
    return word in dictionary
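To see why hashing matters, the following sketch times membership tests against a list and against a set using only the standard library. The generated word_list is a stand-in (an assumption) for the NLTK corpus, and the exact timings will vary by machine.

```python
import timeit

# Stand-in word list (assumption: a real corpus has hundreds of thousands of entries).
word_list = [f"word{i}" for i in range(50_000)]
word_set = set(word_list)

probe = "word49999"  # worst case for the list: the last element

# Each list lookup scans the elements one by one; each set lookup is a hash probe.
list_time = timeit.timeit(lambda: probe in word_list, number=100)
set_time = timeit.timeit(lambda: probe in word_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```

Both containers give the same answer; only the cost per lookup differs, and the gap grows with the size of the word list.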
answered Jun 27, 2016 at 19:58
Eb Abadi
I find that there are three package-based solutions to this problem: pyenchant, wordnet, and a corpus (self-defined or from nltk). Pyenchant couldn’t be installed easily on win64 with py3. WordNet doesn’t work very well because its corpus isn’t complete. So I chose the solution answered by @Sadik, and used ‘set(words.words())’ to speed it up.
First:
pip3 install nltk
python3
import nltk
nltk.download('words')
Then:
from nltk.corpus import words
setofwords = set(words.words())
print("hello" in setofwords)
>>True
answered Feb 3, 2019 at 3:53
Young Yang
With pyEnchant.checker SpellChecker:
from enchant.checker import SpellChecker

def is_in_english(quote):
    d = SpellChecker("en_US")
    d.set_text(quote)
    errors = [err.word for err in d]
    # English if the text has at least 3 words and at most 4 spelling errors
    return not (len(errors) > 4 or len(quote.split()) < 3)

print(is_in_english('“办理美国加州州立大学圣贝纳迪诺分校高仿成绩单Q/V2166384296加州州立大学圣贝纳迪诺分校学历学位认证'))
print(is_in_english("“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”"))
> False
> True
answered May 4, 2017 at 14:16
For a semantic web approach, you could run a SPARQL query against WordNet in RDF format. Basically, use the urllib module to issue a GET request that returns results in JSON format, then parse them with Python’s ‘json’ module. If it’s not an English word, you’ll get no results.
As another idea, you could query Wiktionary’s API.
answered Sep 24, 2010 at 17:28
burkestar
Use nltk.corpus instead of enchant. Enchant gives ambiguous results. For example, enchant returns True for both benchmark and bench-mark. It is supposed to return False for benchmark.
answered Apr 10, 2021 at 11:51
Download this txt file https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt
then create a set out of it using the following Python code snippet, which loads about 370k English words made up of alphabetic characters only
>>> with open("/PATH/TO/words_alpha.txt") as f:
...     words = set(f.read().split('\n'))
...
>>> len(words)
370106
From here onwards, you can check for existence in constant time using
>>> word_to_check = 'baboon'
>>> word_to_check in words
True
Note that this set might not be comprehensive but still gets the job done, user should do quality checks to make sure it works for their use-case as well.
answered May 23, 2022 at 18:19
Ayush
Here I introduce several ways to identify if the word consists of the English alphabet or not.
1. Using isalpha method
In Python, string objects have a method called isalpha:
word = "Hello"
if word.isalpha():
    print("It is an alphabet")

word = "123"
if word.isalpha():
    print("It is an alphabet")
else:
    print("It is not an alphabet")
However, this approach has a minor problem: isalpha() accepts letters from any script, so a Korean word, for example, is still considered alphabetic. (Of course, for non-Korean speakers, it wouldn’t be a problem 😅)
To avoid this behavior, call encode() before isalpha(). On the resulting bytes object, isalpha() accepts ASCII letters only.
word = "한글"
if word.encode().isalpha():
    print("It is an alphabet")
else:
    print("It is not an alphabet")
2. Using Regular Expression.
I think this is a universal approach, regardless of programming language.
import re

# fullmatch() requires every character to be a letter;
# match() with a single character class would only test the first character.
reg = re.compile(r'[a-zA-Z]+')

word = "hello"
if reg.fullmatch(word):
    print("It is an alphabet")
else:
    print("It is not an alphabet")

word = "123"
if reg.fullmatch(word):
    print("It is an alphabet")
else:
    print("It is not an alphabet")
3. Using operator
It depends on the precondition; here we will assume the goal is to check whether all characters are English letters.
Therefore, we can apply the comparison operators to each character.
word = "hello"
if all('a' <= ch <= 'z' or 'A' <= ch <= 'Z' for ch in word):
    print("It is an alphabet")
else:
    print("It is not an alphabet")
Note that we have to consider both upper and lower cases. Also, we compare character by character rather than the entire word, because comparing whole strings is lexicographic and would not check every character.
We can also simplify this code using the lower or upper method of the string.
word = "hello"
if all('a' <= ch <= 'z' for ch in word.lower()):
    print("It is an alphabet")
else:
    print("It is not an alphabet")
4. Using lower and upper method
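This is my favorite approach. Since the English alphabet has lower and upper cases, unlike some other characters (numbers or Korean), we can leverage this characteristic to identify the word. Note that this is only a rough check: other cased scripts such as Greek or Cyrillic pass it too, and so does any string that contains at least one cased letter (for example "hello1").
word = "hello"
if word.upper() != word.lower():
    print("It is an alphabet")
else:
    print("It is not an alphabet")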
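For comparison, here is a small sketch of a stricter check that accepts only strings made entirely of ASCII letters. The helper name is_ascii_alpha is an assumption for illustration; it uses re.fullmatch so that every character is tested, not just the first.

```python
import re

_ASCII_WORD = re.compile(r"[A-Za-z]+")


def is_ascii_alpha(word):
    """True only if every character is an ASCII letter (A-Z or a-z)."""
    return _ASCII_WORD.fullmatch(word) is not None


for sample in ["hello", "Hello", "hello1", "한글", "привет", ""]:
    print(repr(sample), is_ascii_alpha(sample))
```

Only "hello" and "Hello" pass. The Cyrillic sample is rejected here even though its upper-case and lower-case forms differ, which is the case the upper/lower comparison cannot distinguish.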
Happy coding!
In this article, you will learn how to write python code for spell checking. I have discussed various methods you can use to implement your spell checker program.
But, before that, let’s learn some core topics, i.e., what spell checking is and its benefits. And after that, we will learn different approaches to writing Python programs for spell checking.
Table of contents
- What is spell checking?
- How to check spelling in python
- Using a Dictionary for Spell Checking
- Using enchant Spell Checker Library
- Using pyspellchecker
- Using TextBlob
- Using autocorrect
- Conclusion
What is spell checking?
Spell checking is the process of checking a document or sentence for spelling errors. It can be done in two ways: manually, by proofreading the document, or with a spell checker program such as Grammarly, Pro Writing Aid, or Ginger Software.
Spell checking is an integral part of editing and proofreading as it ensures the document is error-free.
Here are some of the benefits of proofreading and spell-checking the documents:
- Help communicate more effectively.
- Help avoid embarrassing mistakes.
- Help Impress your boss or teacher.
- Help get a better grade on an assignment.
- Help avoid misunderstandings.
- Help find errors in your writing.
- Help proofread your work.
- Help improve your writing skills.
- Help avoid plagiarism.
- Help save time.
How to check spelling in python
There are several ways to approach spell checking in Python. One common approach is to use a dictionary to store a list of words. The program then checks each word in the document against the dictionary to see if it is spelled correctly.
Another approach is to use a spell checker library. These libraries typically use a more sophisticated approach to spell checking than a simple dictionary lookup. Some of them may also consider the context of a word to identify errors better.
In this article, we will start by looking at how to use a dictionary for spell checking. After that, we will explore how to use spell checker libraries for writing spell-checking programs that will check and suggest English words.
01.
Using a Dictionary for Spell Checking
One of the simplest ways to approach spell checking is to use a dictionary. A dictionary is a data structure that stores a collection of values. Each value in a dictionary is associated with a key.
In the context of spell checking, the keys are words, and the values are Boolean values that indicate whether the spelling of a word is correct.
Let’s start by creating an empty dictionary. We will call our dictionary spell_dict.
spell_dict = {}
Next, we must populate our dictionary with words and their associated Boolean values. There are a few ways to do this. One option is to manually add words to the dictionary. This is fine for a few words, but it quickly becomes tedious for large dictionaries.
A more efficient approach is to read the words from a file. We can then loop over the words in the file and add them to the dictionary.
The following code shows how to read the words from a file and add them to a dictionary.
Note: In the words.txt file, place each word in a separate line.
with open('words.txt', 'r') as f:
    for line in f:
        word = line.strip()
        spell_dict[word] = True
We can now use our dictionary for spell checking using python. The following code shows how to check a word against the dictionary.
def check_spelling(word):
    if word in spell_dict:
        return True
    else:
        return False
The check_spelling() function takes a word as an argument and returns a Boolean value. If the word is in the dictionary, the function returns True. Otherwise, it returns False.
We can use the check_spelling() function to spell check a document. The following code shows how to do this.
with open('document.txt', 'r') as f:
    for line in f:
        for word in line.split():
            if not check_spelling(word):
                print('Incorrect spelling: ' + word)
In the code above, we have opened the document.txt file in read-only mode. After that, we used a for loop to iterate over the lines in the file. For each line, we have used the split() method to split the line into words. We then used another for loop to iterate over the words. For each word, we have used the check_spelling() function to check if the word is correct. If the word is not in the dictionary, the check_spelling() function will return False.
And here is the complete code in action:
spell_dict = {}

with open('words.txt', 'r') as f:
    for line in f:
        word = line.strip()
        spell_dict[word] = True

def check_spelling(word):
    if word in spell_dict:
        return True
    else:
        return False

with open('document.txt', 'r') as f:
    for line in f:
        for word in line.split():
            if not check_spelling(word):
                print('Incorrect spelling: ' + word)

'''
#--------words.txt-----
hello
how
are
you
doing?
#--------document.txt-----
Hell how r you doing?
#Output
Incorrect spelling: Hell
Incorrect spelling: r
'''
The spell-checking approach used in the code above is very basic. It will only identify words that are not in the dictionary.
If you need a more sophisticated spell checker program, you should consider using a different spell checker library.
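Before reaching for a library, the dictionary approach can be made a little more robust. The sketch below is one possible variation, not the article's original code: it stores the words in a set, lower-cases each token, and strips surrounding punctuation, so the word list no longer needs entries such as "doing?". The names normalize and find_misspelled are assumptions chosen for illustration.

```python
import string


def normalize(token):
    """Lower-case a token and strip leading/trailing punctuation."""
    return token.strip(string.punctuation).lower()


def find_misspelled(text, known_words):
    """Return the tokens of text whose normalized form is not in known_words."""
    misspelled = []
    for token in text.split():
        word = normalize(token)
        if word and word not in known_words:
            misspelled.append(token)
    return misspelled


# A set is enough here: only membership matters, not an associated value.
known_words = {"hello", "how", "are", "you", "doing"}

print(find_misspelled("Hell how r you doing?", known_words))  # ['Hell', 'r']
```

This still only reports words missing from the list; it does not suggest corrections.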
02.
Using enchant Spell Checker Library
There are several spell checker libraries available for python. In this section, we will look at how to use the pyenchant library.
The pyenchant library is a Python wrapper for the Enchant spell checker library. Enchant is a cross-platform library that supports a variety of languages.
The pyenchant library is available from PyPI, so install it using pip.
pip install pyenchant
And if you are facing any problems, you can visit the pyenchant installation guide.
After successful installation, we can use enchant to check the spelling of words. The following code shows how to do this.
import enchant

d = enchant.Dict("en_US")
word = "hellow"

if d.check(word):
    print("Spelling is correct")
else:
    print("Spelling is incorrect")
In the code above, we have imported the enchant module. We then used the Dict class to create a dictionary object for the en_US language.
We then checked the spelling of the word “hellow”. The check() method returns a Boolean value. If the spelling of a word is correct, it returns True. Otherwise, it returns False.
If there is a misspelling, the pyenchant library can suggest possible corrections. The following code shows how to do this.
import enchant

d = enchant.Dict("en_US")
word = "hellow"

if d.check(word):
    print("Spelling is correct")
else:
    print("Spelling is incorrect")
    print("Suggested corrections: ")
    for correction in d.suggest(word):
        print(correction)
Apart from that, the pyenchant library also helps check the spelling in the document. The following code shows how to do this.
import enchant

d = enchant.Dict("en_US")

with open('document.txt', 'r') as f:
    for line in f:
        for word in line.split():
            if not d.check(word):
                print('Incorrect spelling: ' + word)
                print('Suggested corrections:')
                for correction in d.suggest(word):
                    print(correction)
In the code above, we have imported the enchant module. We then used the Dict class to create a dictionary object for the en_US language.
We then opened the document.txt file in read-only mode. We have then used a for loop to iterate over the lines in the file. For each line, we have used the split() method to split the line into words. We then used another for loop to iterate over the words.
For each word, we have used the check() method to check if the word is correct. If the word is not in the dictionary, the check() method will return False. In this case, we have printed a message to the console to indicate that the word is incorrect. We have then used the suggest() method to suggest possible corrections.
The spell checker approach used in the code above is more capable than the one we used in the previous section: it relies on a maintained dictionary and can suggest corrections. Note that check() still examines each word in isolation, without considering its context.
03.
Using pyspellchecker
Now, let’s discuss how to create a spell checker program in Python using the pyspellchecker library. Pyspellchecker is a pure Python library for checking spelling mistakes in strings.
We will first install the pyspellchecker library using pip.
pip install pyspellchecker
After installation, we can import the library into our python program.
import spellchecker
Now, we will create a function that takes a string as an input and returns a list of misspelled words.
def spell_check(string):
    spell = spellchecker.SpellChecker()
    misspelled_words = spell.unknown(string.split())
    return misspelled_words
We can now test our function on a string with some misspelled words.
string = 'This is a sentense with some mispelled words'
misspelled_words = spell_check(string)
print(misspelled_words)
The code will print out the following result:
{'sentense', 'mispelled'}
Here is the complete code in action:
import spellchecker

def spell_check(string):
    spell = spellchecker.SpellChecker()
    misspelled_words = spell.unknown(string.split())
    return misspelled_words

string = 'This is a sentense with some mispelled words'
misspelled_words = spell_check(string)
print(misspelled_words)

'''
#Output
{'mispelled', 'sentense'}
'''
Apart from that, you can also write a code that can give a suggestion to the misspelled words. Here is how you can do it.
from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

'''
#Output
happenning
{'hapening', 'happenning'}
'''
04.
Using TextBlob
We have already seen several ways of implementing spell-checking programs in Python. Another popular option is the TextBlob library, which provides a simple interface for spelling correction.
To use the TextBlob library, you first need to install it. You can do this using the pip command:
pip install textblob
After that, import TextBlob and create a TextBlob object by passing a string of text to the TextBlob constructor:
from textblob import TextBlob
text = "Website name is problem solving code"
blob = TextBlob(text)
Once you have a TextBlob object, you can call its correct() method. It takes no arguments and returns a new TextBlob in which the spelling of each word has been corrected:
print(blob.correct())
Words for which TextBlob finds no better candidate are returned unchanged.
Here is the complete code in action
from textblob import TextBlob
text = "Website name is problem solving code"
blob = TextBlob(text)
print(blob.correct())
'''
#Output
Website name is problem solving code
'''
The TextBlob library also provides many other methods and functions for spell checking. For more information, see the TextBlob documentation.
05.
Using autocorrect
The autocorrect module also enables us to check the spelling of a single word and return its corrected spelling. To do so, we create a Speller object from the autocorrect library and call it with the word.
To use the autocorrect library, you first need to install it. You can do this using the pip command:
pip install autocorrect
Once you install it, you have to import it, and here is the complete working code.
from autocorrect import Speller
spell = Speller(lang='en')
print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))
'''
#Output
message
service
the
'''
Conclusion
In this article, we have looked at how to use python for spell checking. We have looked at different approaches: one using a dictionary and the others using a spell checker library.
Which approach you use will depend on the specific requirements of your project. If you need a more sophisticated spell checker, you should consider using a spell checker library.
Given a string str, the task is to count how many of its space-separated words are valid English words.
A word is considered valid if it meets all of the criteria below:
- The first character must be an uppercase letter, and it is the only uppercase letter allowed.
- All remaining letters must be lowercase.
- The word may contain at most one hyphen (‘-‘), and it must be surrounded by letters on both sides.
- The word cannot contain any digits.
- If there is a punctuation mark, there must be only one and it must be at the end.
Print the number of valid words in the string str.
Input: str = “i Love- Geeks-forgeeks!”
Output: 1 word
Explanation:
word 1 = “i” does not contain first uppercase character, it is not valid word
word 2 = “Love-” hyphen is not surrounded by characters on both ends, it is not valid word
word 3 = “Geeks-forgeeks!” is a valid word
Input: str = “!this 1-s b8d!”
Output: 0 words
Explanation:
word 1 = “!this” punctuation mark is in the beginning, it is not valid word
word 2 = “1-s” digit as first character, it is not valid word
word 3 = “b8d!” first character is not uppercase, it is not valid word
Approach:
- Initialize the variable ans to keep count of the number of valid words.
- Loop through each word present in the sentence.
- Check each letter of the word to see if it meets the criteria mentioned in the problem statement.
- If any of the criteria is not met then return false.
- If all the criteria are satisfied by the word, then increment the value of the variable ans.
- Print the value of the variable ans.
Below is the C++ program of the above approach-
C++
#include <bits/stdc++.h>
using namespace std;

// Returns true if the word satisfies all the criteria listed above
bool ValidWords(const string& word)
{
    int hyphen = 0;
    int size = word.size();

    // The first character must be an uppercase letter
    if (size == 0 || !isupper((unsigned char)word[0]))
        return false;

    for (int i = 1; i < size; i++) {
        unsigned char c = word[i];

        // Digits and further uppercase letters are not allowed
        if (isdigit(c) || isupper(c))
            return false;

        if (islower(c))
            continue;

        if (c == '-') {
            // At most one hyphen, with a letter on both sides
            if (++hyphen > 1)
                return false;
            if (!isalpha((unsigned char)word[i - 1])
                || i + 1 >= size
                || !isalpha((unsigned char)word[i + 1]))
                return false;
        }
        // Any other character counts as punctuation: allowed only at the end
        else if (i != size - 1)
            return false;
    }
    return true;
}

int main()
{
    string sentence = "i Love- Geeks-forgeeks!";
    istringstream s(sentence);
    string word;
    int ans = 0;

    while (s >> word)
        if (ValidWords(word))
            ans++;

    cout << ans << " words";
}
Java
class GFG {

    // Returns true if the word satisfies all the criteria listed above
    static boolean ValidWords(String word)
    {
        int hyphen = 0;
        int size = word.length();

        // The first character must be an uppercase letter
        if (size == 0 || !Character.isUpperCase(word.charAt(0)))
            return false;

        for (int i = 1; i < size; i++) {
            char c = word.charAt(i);

            // Digits and further uppercase letters are not allowed
            if (Character.isDigit(c) || Character.isUpperCase(c))
                return false;

            if (Character.isLowerCase(c))
                continue;

            if (c == '-') {
                // At most one hyphen, with a letter on both sides
                if (++hyphen > 1)
                    return false;
                if (!Character.isAlphabetic(word.charAt(i - 1))
                    || i + 1 >= size
                    || !Character.isAlphabetic(word.charAt(i + 1)))
                    return false;
            }
            // Any other character counts as punctuation: allowed only at the end
            else if (i != size - 1)
                return false;
        }
        return true;
    }

    public static void main(String[] args)
    {
        String sentence = "i Love- Geeks-forgeeks!";
        int ans = 0;

        for (String word : sentence.split(" ")) {
            if (ValidWords(word))
                ans++;
        }
        System.out.print(ans + " words");
    }
}
Python3
# Returns True if the word satisfies all the criteria listed above
def ValidWords(word):
    hyphen = 0
    size = len(word)

    # The first character must be an uppercase letter
    if size == 0 or not ('A' <= word[0] <= 'Z'):
        return False

    for i in range(1, size):
        c = word[i]

        # Digits and further uppercase letters are not allowed
        if '0' <= c <= '9' or 'A' <= c <= 'Z':
            return False

        if 'a' <= c <= 'z':
            continue

        if c == '-':
            # At most one hyphen, with a letter on both sides
            hyphen += 1
            if hyphen > 1:
                return False
            if (not word[i - 1].isalpha() or i + 1 >= size
                    or not word[i + 1].isalpha()):
                return False
        # Any other character counts as punctuation: allowed only at the end
        elif i != size - 1:
            return False

    return True


sentence = "i Love- Geeks-forgeeks!"
ans = 0
for word in sentence.split(' '):
    if ValidWords(word):
        ans += 1

print(f"{ans} words")
C#
using System;

class GFG
{
    // Returns true if the word satisfies all the criteria listed above
    static bool ValidWords(string word)
    {
        int hyphen = 0;
        int size = word.Length;

        // The first character must be an uppercase letter
        if (size == 0 || !char.IsUpper(word[0]))
            return false;

        for (int i = 1; i < size; i++)
        {
            char c = word[i];

            // Digits and further uppercase letters are not allowed
            if (char.IsDigit(c) || char.IsUpper(c))
                return false;

            if (char.IsLower(c))
                continue;

            if (c == '-')
            {
                // At most one hyphen, with a letter on both sides
                if (++hyphen > 1)
                    return false;
                if (!char.IsLetter(word[i - 1])
                    || i + 1 >= size
                    || !char.IsLetter(word[i + 1]))
                    return false;
            }
            // Any other character counts as punctuation: allowed only at the end
            else if (i != size - 1)
                return false;
        }
        return true;
    }

    public static void Main()
    {
        string sentence = "i Love- Geeks-forgeeks!";
        int ans = 0;

        foreach (string word in sentence.Split(' '))
        {
            if (ValidWords(word))
                ans++;
        }
        Console.Write(ans + " words");
    }
}
Javascript
<script>
const isLetter = (c) => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

// Returns true if the word satisfies all the criteria listed above
const ValidWords = (word) => {
    let hyphen = 0;
    let size = word.length;

    // The first character must be an uppercase letter
    if (size === 0 || !(word[0] >= 'A' && word[0] <= 'Z'))
        return false;

    for (let i = 1; i < size; i++) {
        const c = word[i];

        // Digits and further uppercase letters are not allowed
        if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z'))
            return false;

        if (c >= 'a' && c <= 'z')
            continue;

        if (c === '-') {
            // At most one hyphen, with a letter on both sides
            if (++hyphen > 1)
                return false;
            if (!isLetter(word[i - 1])
                || i + 1 >= size
                || !isLetter(word[i + 1]))
                return false;
        }
        // Any other character counts as punctuation: allowed only at the end
        else if (i !== size - 1)
            return false;
    }
    return true;
};

let sentence = "i Love- Geeks-forgeeks!";
let ans = 0;
for (const word of sentence.split(' '))
    if (ValidWords(word))
        ans++;

document.write(`${ans} words`);
</script>
Time Complexity: O(N) as only one traversal of the string of length N is enough for the algorithm to perform all the tasks hence the overall complexity is linear.
Auxiliary Space: O(N) as the variable s stores all the words of the strings hence the overall space occupied by the algorithm is equal to the length of the string
Pure Python Spell Checking based on Peter
Norvig’s blog post on setting
up a simple spell checking algorithm.
It uses a Levenshtein Distance
algorithm to find permutations within an edit distance of 2 from the
original word. It then compares all permutations (insertions, deletions,
replacements, and transpositions) to known words in a word frequency
list. Those words that are found more often in the frequency list are
more likely the correct results.
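The idea described above can be sketched in a few lines of standard-library Python. This is a simplified illustration in the spirit of Norvig's post, not pyspellchecker's actual implementation; the toy frequencies table and its counts are made up for the example.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correction(word, frequencies):
    """Most frequent known word within edit distance 2, else the word itself."""
    if word in frequencies:
        return word
    candidates = ({w for w in edits1(word) if w in frequencies}
                  or {w for w in edits2(word) if w in frequencies}
                  or {word})
    return max(candidates, key=lambda w: frequencies.get(w, 0))


# Toy frequency list (made-up counts, for illustration only).
frequencies = Counter({"happening": 50, "penning": 2, "here": 80, "something": 40})

print(correction("hapenning", frequencies))  # happening
```

Both "happening" and "penning" are two edits away from "hapenning"; the higher count decides between them, which is why the quality of the frequency list matters so much.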
pyspellchecker supports multiple languages including English, Spanish,
German, French, and Portuguese. For information on how the dictionaries were
created and how they can be updated and improved, please see the
Dictionary Creation and Updating section of the readme!
pyspellchecker supports Python 3
pyspellchecker allows for the setting of the Levenshtein Distance (up to two) to check.
For longer words, it is highly recommended to use a distance of 1 and not the
default 2. See the quickstart to find how one can change the distance parameter.
Installation
The easiest method to install is using pip:
pip install pyspellchecker
To build from source:
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python -m build
For python 2.7 support, install release 0.5.6
but note that no future updates will support python 2.
pip install pyspellchecker==0.5.6
Quickstart
After installation, using pyspellchecker should be fairly straightforward:
from spellchecker import SpellChecker
spell = SpellChecker()
# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])
for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))
If the Word Frequency list is not to your liking, you can add additional
text to generate a more appropriate list for your use case.
from spellchecker import SpellChecker
spell = SpellChecker() # loads default word frequency list
spell.word_frequency.load_text_file('./my_free_text_doc.txt')
# if I just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
spell.known(['microsoft', 'google']) # will return both now!
If the words that you wish to check are long, it is recommended to reduce the
distance to 1. This can be accomplished either when initializing the spell
check class or after the fact.
from spellchecker import SpellChecker
spell = SpellChecker(distance=1) # set at initialization
# do some work on longer words
spell.distance = 2 # set the distance parameter back to the default
Non-English Dictionaries
pyspellchecker supports several default dictionaries as part of the default
package. Each is simple to use when initializing the dictionary:
from spellchecker import SpellChecker
english = SpellChecker() # the default is English (language='en')
spanish = SpellChecker(language='es') # use the Spanish Dictionary
russian = SpellChecker(language='ru') # use the Russian Dictionary
arabic = SpellChecker(language='ar') # use the Arabic Dictionary
The currently supported dictionaries are:
- English - ‘en’
- Spanish - ‘es’
- French - ‘fr’
- Portuguese - ‘pt’
- German - ‘de’
- Russian - ‘ru’
- Arabic - ‘ar’
Dictionary Creation and Updating
The creation of the dictionaries is, unfortunately, not an exact science. I have provided a script that, given a text file of sentences (in this case from OpenSubtitles), generates a word frequency list based on the words found within the text. The script then attempts to *clean up* the word frequency list by, for example, removing words with invalid characters (usually from other languages), removing low-count terms (likely misspellings), and enforcing language-specific rules where available (such as no more than one accent per word in Spanish). It then removes any words that appear on a list of known words to exclude. Finally, it adds words that are known to be missing or that were removed for having too low a frequency.
The script can be found here: `scripts/build_dictionary.py`. The original word frequency list parsed from OpenSubtitles can be found in the `scripts/data/` folder along with each language’s include and exclude text files.
Any help in updating and maintaining the dictionaries would be greatly appreciated. To do this, a discussion could be started on GitHub, or pull requests to update the include and exclude files could be submitted.
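The clean-up steps described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the project's `scripts/build_dictionary.py`: the function name, the `min_count` threshold, the `allowed_chars` parameter, and the tokenizing regular expression are all assumptions made for the example.

```python
import re
from collections import Counter


def build_word_frequency(sentences, allowed_chars, min_count=2,
                         exclude=(), include=()):
    """Build a cleaned word-frequency table from an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase, then pull out runs of letters (optionally with one inner apostrophe).
        counts.update(re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", sentence.lower()))

    allowed = set(allowed_chars)
    cleaned = Counter({
        word: count
        for word, count in counts.items()
        # Drop words with characters outside the language's alphabet
        # and words that are too rare (likely misspellings).
        if set(word) <= allowed and count >= min_count
    })

    # Remove words that are on the exclude list ...
    for word in exclude:
        cleaned.pop(word, None)
    # ... and add back words known to be missing or cut for low frequency.
    for word in include:
        cleaned.setdefault(word, 1)
    return cleaned


freq = build_word_frequency(
    ["The cat sat.", "The cat ran.", "A naïve teh cat."],
    allowed_chars="abcdefghijklmnopqrstuvwxyz'",
    min_count=2,
    exclude=["the"],
    include=["sat"],
)
print(sorted(freq.items()))
```

In this toy run, "naïve" and "teh" are dropped for being rare (and, for the first, for containing a character outside the assumed alphabet), "the" is removed through the exclude list, and "sat" is restored through the include list.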
Additional Methods
On-line documentation is available; below is a condensed summary of some of the available functions:
- correction(word): Returns the most probable result for the misspelled word
- candidates(word): Returns a set of possible candidates for the misspelled word
- known([words]): Returns those words that are in the word frequency list
- unknown([words]): Returns those words that are not in the frequency list
- word_probability(word): The frequency of the given word out of all words in the frequency list
The following are less likely to be needed by the user but are available:
- edit_distance_1(word): Returns a set of all strings at a Levenshtein Distance of one based on the alphabet of the selected language
- edit_distance_2(word): Returns a set of all strings at a Levenshtein Distance of two based on the alphabet of the selected language
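To make these two helpers concrete, here is a minimal sketch of how a distance-one candidate set can be generated, following the approach in Peter Norvig's blog post listed in the credits. It is not pyspellchecker's own code: the names `edits1` and `edits2` and the hard-coded English alphabet are assumptions. Note that Norvig's edit set also counts swapping two adjacent letters as a single edit.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string one edit (delete, transpose, replace, insert) away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + ch + right[1:] for left, right in splits if right for ch in alphabet]
    inserts = [left + ch + right for left, right in splits for ch in alphabet]
    return set(deletes + transposes + replaces + inserts)


def edits2(word, alphabet=string.ascii_lowercase):
    """Return every string reachable by applying `edits1` twice."""
    return {e2 for e1 in edits1(word, alphabet) for e2 in edits1(e1, alphabet)}


candidates = edits1("helo")
print("hello" in candidates)   # reachable by one insertion
print("help" in candidates)    # reachable by one replacement
print(len(edits1("helo")), len(edits1("misspelling")))
```

The candidate set grows with the length of the word, and the distance-two set grows much faster still, which is the reason a distance of 1 is recommended for long words.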
Credits
- Peter Norvig blog post on setting up a simple spell checking algorithm
- P Lison and J Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)
Sometimes, we want to check if a word is an English word with Python.
In this article, we’ll look at how to check if a word is an English word with Python.
How to check if a word is an English word with Python?
To check if a word is an English word with Python, we can use the enchant module.
To install it, we run:
pip install pyenchant
Then we can use it by writing:
import enchant
d = enchant.Dict("en_US")
print(d.check("Hello"))
print(d.suggest("Helo"))
We create the dictionary object by calling enchant.Dict with the locale string as its argument.
Then we call d.check with a string to check if the string is an English word.
We also call d.suggest with a string to get a list of English words that are close to it.
Therefore, we see:
True
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
in the printed output.
Conclusion
To check if a word is an English word with Python, we can use the enchant module.
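Wrapping the check in a small helper gives us a reusable `is_english_word` function. The sketch below is an illustration under stated assumptions: the helper name, the decision to pass the dictionary in as a parameter, and the treatment of blank input are our choices, and `FakeDict` is a hypothetical stand-in so the example runs without PyEnchant installed. With the real library we would pass `enchant.Dict("en_US")` instead.

```python
def is_english_word(word, dictionary):
    """Return True if `dictionary.check` accepts `word`.

    `dictionary` is any object with a `check(str) -> bool` method,
    such as an enchant.Dict("en_US") object.
    """
    word = word.strip()
    if not word:
        # Treat blank input as "not a word" instead of passing it on.
        return False
    return bool(dictionary.check(word))


class FakeDict:
    """Hypothetical stand-in for enchant.Dict, for demonstration only."""

    def __init__(self, words):
        self._words = set(words)

    def check(self, word):
        return word in self._words


demo = FakeDict({"Hello", "hello", "property"})
print(is_english_word("Hello", demo))
print(is_english_word("Helo", demo))
print(is_english_word("   ", demo))
```

Passing the dictionary in as an argument keeps the helper independent of any one locale, so the same function works with "en_GB" or another installed dictionary.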
language_tool_python: a grammar checker for Python 📝
Current LanguageTool version: 5.5
This is a Python wrapper for LanguageTool. LanguageTool is an open-source grammar tool, also known as the spellchecker for OpenOffice. This library allows you to detect grammar errors and spelling mistakes through a Python script or through a command-line interface.
Local and Remote Servers
By default, language_tool_python will download a LanguageTool server .jar and run that in the background to detect grammar errors locally. However, LanguageTool also offers a Public HTTP Proofreading API that is supported as well. Follow the link for rate limiting details. (Running locally won’t have the same restrictions.)
Using language_tool_python locally
Local server is the default setting. To use this, just initialize a LanguageTool object:
import language_tool_python
tool = language_tool_python.LanguageTool('en-US')  # use a local server (automatically set up), language English
Using language_tool_python with the public LanguageTool remote server
There is also a built-in class for querying LanguageTool’s public servers. Initialize it like this:
import language_tool_python
tool = language_tool_python.LanguageToolPublicAPI('es')  # use the public API, language Spanish
Using language_tool_python with another remote server
Finally, you’re able to pass in your own remote server as an argument to the LanguageTool class:
import language_tool_python
tool = language_tool_python.LanguageTool('ca-ES', remote_server='https://language-tool-api.mywebsite.net')  # use a remote server API, language Catalan
Apply a custom list of matches with utils.correct
If you want to decide which Match objects to apply to your text, use tool.check (to generate the list of matches) in conjunction with language_tool_python.utils.correct (to apply the list of matches to text). Here is an example of generating, filtering, and applying a list of matches. In this case, spell-checking suggestions for uppercase words are ignored:
>>> s = "Department of medicine Colombia University closed on August 1 Milinda Samuelli"
>>> is_bad_rule = lambda rule: rule.message == 'Possible spelling mistake found.' and len(rule.replacements) and rule.replacements[0][0].isupper()
>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US')
>>> matches = tool.check(s)
>>> matches = [rule for rule in matches if not is_bad_rule(rule)]
>>> language_tool_python.utils.correct(s, matches)
'Department of medicine Colombia University closed on August 1 Melinda Sam'
Example usage
From the interpreter:
>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US')
>>> text = 'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
>>> matches = tool.check(text)
>>> len(matches)
2
...
>>> tool.close()  # Call `close()` to shut off the server when you're done.
Check out some Match object attributes:
>>> matches[0].ruleId, matches[0].replacements
('EN_A_VS_AN', ['an'])
>>> matches[1].ruleId, matches[1].replacements
('TOT_HE', ['to the'])
Print a Match object:
>>> print(matches[1])
Line 1, column 51, Rule ID: TOT_HE[1]
Message: Did you mean 'to the'?
Suggestion: to the
...
Automatically apply suggestions to the text:
>>> tool.correct(text)
'A sentence with an error in the Hitchhiker’s Guide to the Galaxy'
From the command line:
$ echo 'This are bad.' > example.txt
$ language_tool_python example.txt
example.txt:1:1: THIS_NNS[3]: Did you mean 'these'?
Closing LanguageTool
language_tool_python runs a LanguageTool Java server in the background. It will shut the server off when garbage collected, for example when a created language_tool_python.LanguageTool object goes out of scope. However, if garbage collection takes a while, the process might not get deleted right away. If you’re seeing lots of processes get spawned and not get deleted, you can explicitly close them:
import language_tool_python
tool = language_tool_python.LanguageToolPublicAPI('de-DE')  # starts a process
# do stuff with `tool`
tool.close()  # explicitly shut off the LanguageTool
You can also use a context manager (with .. as) to explicitly control when the server is started and stopped:
import language_tool_python
with language_tool_python.LanguageToolPublicAPI('de-DE') as tool:
    pass  # do stuff with `tool`
# no need to call `close()` as it will happen at the end of the with statement
Client-Server Model
You can run LanguageTool on one host and connect to it from another. This is useful in some distributed scenarios. Here’s a simple example:
server
>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US', host='0.0.0.0')
>>> tool._url
'http://0.0.0.0:8081/v2/'
client
>>> import language_tool_python
>>> lang_tool = language_tool_python.LanguageTool('en-US', remote_server='http://0.0.0.0:8081')
>>> lang_tool.check('helo darknes my old frend')
[Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Helo'], 'offsetInContext': 0, 'context': 'helo darknes my old frend', 'offset': 0, 'errorLength': 4, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'helo darknes my old frend'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['darkness', 'darkens', 'darkies'], 'offsetInContext': 5, 'context': 'helo darknes my old frend', 'offset': 5, 'errorLength': 7, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'helo darknes my old frend'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['friend', 'trend', 'Fred', 'freed', 'Freud', 'Friend', 'fend', 'fiend', 'frond', 'rend', 'fr end'], 'offsetInContext': 20, 'context': 'helo darknes my old frend', 'offset': 20, 'errorLength': 5, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'helo darknes my old frend'})]
>>>
Configuration
LanguageTool offers lots of built-in configuration options.
Example: Enabling caching
Here’s an example of using the configuration options to enable caching. Some users have reported that this helps performance a lot.
import language_tool_python
tool = language_tool_python.LanguageTool('en-US', config={'cacheSize': 1000, 'pipelineCaching': True})
Example: Setting maximum text length
Here’s an example showing how to configure LanguageTool to set a maximum length on grammar-checked text. It will throw an error (which propagates to Python as a language_tool_python.LanguageToolError) if the text is too long.
import language_tool_python
tool = language_tool_python.LanguageTool('en-US', config={'maxTextLength': 100})
Full list of configuration options
Here’s a full list of configuration options. See the LanguageTool HTTPServerConfig documentation for details.
'maxTextLength' - maximum text length, longer texts will cause an error (optional)
'maxTextHardLength' - maximum text length, applies even to users with a special secret 'token' parameter (optional)
'secretTokenKey' - secret JWT token key, if set by user and valid, maxTextLength can be increased by the user (optional)
'maxCheckTimeMillis' - maximum time in milliseconds allowed per check (optional)
'maxErrorsPerWordRate' - checking will stop with error if there are more rules matches per word (optional)
'maxSpellingSuggestions' - only this many spelling errors will have suggestions for performance reasons (optional, affects Hunspell-based languages only)
'maxCheckThreads' - maximum number of threads working in parallel (optional)
'cacheSize' - size of internal cache in number of sentences (optional, default: 0)
'cacheTTLSeconds' - how many seconds sentences are kept in cache (optional, default: 300 if 'cacheSize' is set)
'requestLimit' - maximum number of requests per requestLimitPeriodInSeconds (optional)
'requestLimitInBytes' - maximum aggregated size of requests per requestLimitPeriodInSeconds (optional)
'timeoutRequestLimit' - maximum number of timeout request (optional)
'requestLimitPeriodInSeconds' - time period to which requestLimit and timeoutRequestLimit applies (optional)
'languageModel' - a directory with '1grams', '2grams', '3grams' sub directories which contain a Lucene index each with ngram occurrence counts; activates the confusion rule if supported (optional)
'word2vecModel' - a directory with word2vec data (optional), see https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/CHANGES.md#word2vec
'fasttextModel' - a model file for better language detection (optional), see https://fasttext.cc/docs/en/language-identification.html
'fasttextBinary' - compiled fasttext executable for language detection (optional), see https://fasttext.cc/docs/en/support.html
'maxWorkQueueSize' - reject request if request queue gets larger than this (optional)
'rulesFile' - a file containing rules configuration, such as .languagetool.cfg (optional)
'warmUp' - set to 'true' to warm up server at start, i.e. run a short check with all languages (optional)
'blockedReferrers' - a comma-separated list of HTTP referrers (and 'Origin' headers) that are blocked and will not be served (optional)
'premiumOnly' - activate only the premium rules (optional)
'disabledRuleIds' - a comma-separated list of rule ids that are turned off for this server (optional)
'pipelineCaching' - set to 'true' to enable caching of internal pipelines to improve performance
'maxPipelinePoolSize' - cache size if 'pipelineCaching' is set
'pipelineExpireTimeInSeconds' - time after which pipeline cache items expire
'pipelinePrewarming' - set to 'true' to fill pipeline cache on start (can slow down start a lot)
Installation
To install via pip:
$ pip install --upgrade language_tool_python
Customizing Download URL or Path
To overwrite the host part of the URL that is used to download LanguageTool-{version}.zip:
$ export LTP_DOWNLOAD_HOST=[alternate URL]
This can be used to downgrade to an older version, for example, or to download from a mirror.
And to choose the specific folder to download the server to:
$ export LTP_PATH=/path/to/save/language/tool
The default download path is ~/.cache/language_tool_python/. The LanguageTool server is about 200 MB, so take that into account when choosing your download folder. (Or, if you can’t spare the disk space, use a remote URL!)
Prerequisites
- Python 3.6+
- LanguageTool (Java 8.0 or higher)
The installation process should take care of downloading LanguageTool (it may take a few minutes). Otherwise, you can manually download LanguageTool-stable.zip and unzip it into where the language_tool_python package resides.
LanguageTool Version
As of April 2020, language_tool_python was forked from language-check and no longer supports LanguageTool versions lower than 4.0.
Acknowledgements
This is a fork of https://github.com/myint/language-check/ that produces more easily parsable results from the command-line.
In the following tutorial, we will discuss a Python package called LanguageTool and understand how to create a simple grammar and spell checker using the Python programming language.
So, let’s get started.
Understanding the LanguageTool library in Python
LanguageTool is an open-source tool used for grammar and spell-checking purposes, and it is also known as the spellchecker for OpenOffice. This package allows programmers to detect grammatical and spelling mistakes through a Python code snippet or a Command-line Interface (CLI).
How to Install the LanguageTool library?
To install the library, we need pip, Python's package installer, which downloads packages from the Python Package Index. Once we have pip, we can install the library (published as language_tool_python) by running the following command in a Windows command prompt (CMD) or a terminal:
Syntax:
$ pip install language_tool_python
The language_tool_python library will download a LanguageTool server as a JAR file by default and execute that in the background to detect grammatical errors locally. But LanguageTool also provides a Public HTTP Proofreading API that is supported; however, there is a limitation in the number of calls.
Verifying the Installation
Once the library is installed, we can verify it by creating an empty Python program file and writing an import statement as follows:
File: verify.py
import language_tool_python
Now, save the above file and execute it using the following command in a terminal:
Syntax:
$ python verify.py
If the above Python program file does not return any error, the library is installed properly. However, in the case where an exception is raised, try reinstalling the library, and it is also recommended to refer to the official documentation of the module.
Working with the Python LanguageTool library
In the following section, we will understand the working of the LanguageTool library in Python using a practical example. The following Python script demonstrates the detection of grammatical mistakes and correcting them as well. We will work with the following text:
Text:
LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the ‘Check Text’ button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021
The above text contains several deliberate grammatical and spelling errors. Let us consider the following Python script to understand the working of the LanguageTool utility:
Example:
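The listing for this example is not reproduced here. Going by the explanation further down, it created a tool, stored the passage in a string, and called check() on it. A minimal sketch along those lines follows. The names `my_tool` and `my_text` and the helper `summarize_matches` are assumptions (only `my_matches` is named in the tutorial text), and the calls to the real library are left as comments because they need Java and a LanguageTool download.

```python
from types import SimpleNamespace


def summarize_matches(matches):
    """Return one short line per match: offset, rule ID, and message."""
    return [f"{m.offset:>4}  {m.ruleId}: {m.message}" for m in matches]


# With the real library (assumed variable names):
#   import language_tool_python
#   my_tool = language_tool_python.LanguageTool('en-US')
#   my_text = "...the passage shown above..."
#   my_matches = my_tool.check(my_text)
#   print(my_matches)

# Stand-in objects carrying the same attributes as two of the reported matches.
my_matches = [
    SimpleNamespace(ruleId='ENGLISH_WORD_REPEAT_RULE', offset=165,
                    message='Possible typo: you repeated a word'),
    SimpleNamespace(ruleId='TOO_TO', offset=230, message='Did you mean "to see"?'),
]
for line in summarize_matches(my_matches):
    print(line)
```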
Output:
[Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['for'], 'offsetInContext': 43, 'context': "...Text' button. Click the colored phrases for for information on potential errors. or we ...", 'offset': 165, 'errorLength': 7, 'category': 'MISC', 'ruleIssueType': 'duplication', 'sentence': 'Click the colored phrases for for information on potential errors.'}),
 Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Or'], 'offsetInContext': 43, 'context': '...or for information on potential errors. or we can use this text too see an some of...', 'offset': 206, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'offsetInContext': 43, 'context': '...tential errors. or we can use this text too see an some of the issues that LanguageTool...', 'offset': 230, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of 'an' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'.', 'replacements': ['a'], 'offsetInContext': 43, 'context': '...errors. or we can use this text too see an some of the issues that LanguageTool ca...', 'offset': 238, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect', 'defect', 'deduct', 'deject'], 'offsetInContext': 43, 'context': '...ome of the issues that LanguageTool can dedect. Whot do someone thinks of grammar chec...', 'offset': 282, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Who', 'What', 'Shot', 'Whom', 'Hot', 'WHO', 'Whet', 'Whit', 'Whoa', 'Whop', 'WHT', 'Wot', 'W hot'], 'offsetInContext': 43, 'context': '...he issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? ...', 'offset': 290, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Whot do someone thinks of grammar checkers?'}),
 Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'offsetInContext': 43, 'context': '...eone thinks of grammar checkers? Please not that they are not perfect. Style proble...', 'offset': 341, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Please not that they are not perfect.'}),
 Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'offsetInContext': 43, 'context': '...yle problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 Nov...', 'offset': 414, 'errorLength': 19, 'category': 'REDUNDANCY', 'ruleIssueType': 'style', 'sentence': 'Style problems get a blue marker: It is 7 P.M. in the evening.'})]
Explanation:
In the above snippet of code, we have imported the required library and defined a tool that uses the LanguageTool utility to check the grammar and spelling errors in the text. We have then defined a string variable that stores the text passage we want to check. Finally, we have retrieved the matches using the check() function and printed them for the users.
As a result, we can observe that we have a detailed dictionary that displays the ruleId, message, replacements, offsetInContext, context, offset, and a lot more. We can find a detailed explanation of every rule ID in the LanguageTool Community.
Since we have detected the mistakes, it is time for us to correct them. Let us consider the following Python script demonstrating the same:
Example:
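This listing is also missing from the page. Based on the explanation further down, one way to sketch the idea is to take each mistake's start and end position together with its first suggested replacement, rebuild the text piece by piece, and join the pieces. The function name and the stand-in match objects are assumptions; real Match objects returned by check() carry the same `offset`, `errorLength`, and `replacements` attributes, as the printed output shows.

```python
from types import SimpleNamespace


def apply_first_suggestions(text, matches):
    """Replace each matched span with its first suggested replacement."""
    pieces = []
    cursor = 0
    for match in sorted(matches, key=lambda m: m.offset):
        start = match.offset
        end = match.offset + match.errorLength
        if not match.replacements or start < cursor:
            # Skip matches without a suggestion or overlapping an earlier fix.
            continue
        pieces.append(text[cursor:start])      # unchanged text before the mistake
        pieces.append(match.replacements[0])   # the correction
        cursor = end
    pieces.append(text[cursor:])               # unchanged tail
    return "".join(pieces)


my_text = "Click the phrases for for information. Please not that."
my_matches = [
    SimpleNamespace(offset=18, errorLength=7, replacements=['for']),
    SimpleNamespace(offset=46, errorLength=3, replacements=['note']),
]
print(apply_first_suggestions(my_text, my_matches))
```

Working from left to right with a cursor avoids having to adjust later offsets after an earlier replacement changes the length of the text.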
Output:
LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for information on potential errors. Or we can use this text to see a some of the issues that LanguageTool can detect. Who do someone thinks of grammar checkers? Please note that they are not perfect. Style problems get a blue marker: It is 7 P.M.. The weather was nice on Monday, 22 November 2021
Explanation:
We have included some new variables to address mistakes, corrections, starting positions, and ending positions in the above snippet of code. We have then used the for-loop to iterate through the rules in my_matches and replace the mistakes with their corrections. We have then stored these corrected texts in a list. At last, we have again used the for-loop to iterate through the string elements in the list, join them together, and print the resulting text for the users.
Hence, we have successfully corrected the mistakes that we found in the previous snippet of code.
Now, let us observe the mistakes that we captured earlier along with their respective corrections using the following Python script:
Example:
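The listing is missing here as well. A sketch that would produce a list of (mistake, correction) pairs slices each mistake out of the original text using the match's offset and length, then pairs it with the first suggestion. The function name and the sample data are assumptions made for illustration.

```python
from types import SimpleNamespace


def mistake_pairs(text, matches):
    """Return (mistake, first suggestion) tuples for matches that have a suggestion."""
    return [
        (text[m.offset:m.offset + m.errorLength], m.replacements[0])
        for m in matches
        if m.replacements
    ]


my_text = "Whot do someone thinks? It can dedect errors."
my_matches = [
    SimpleNamespace(offset=0, errorLength=4, replacements=['Who', 'What']),
    SimpleNamespace(offset=31, errorLength=6, replacements=['detect', 'defect']),
    SimpleNamespace(offset=38, errorLength=6, replacements=[]),  # no suggestion: skipped
]
print(mistake_pairs(my_text, my_matches))
```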
Output:
[('for for', 'for'), ('or', 'Or'), ('too see', 'to see'), ('an', 'a'), ('dedect', 'detect'), ('Whot', 'Who'), ('not', 'note'), ('P.M. in the evening', 'P.M.')]
Explanation:
In the above snippet of code, we have printed the list of the mistakes in the Text with their respective corrections.
Applying Suggestions to the Text automatically
Let us consider a simple example demonstrating how we can apply suggestions automatically to the Text using the LanguageTool library in Python.
Example:
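The listing is missing. From the explanation and the output, it created a tool for US English, stored a sentence in a string, and passed it to the tool's correct() method. A sketch under those assumptions follows. The helper name `show_correction` and the class `EchoTool` (a hypothetical stand-in that returns a canned answer so the example runs without Java) are not part of the library.

```python
def show_correction(tool, text):
    """Return the before/after lines for `text` using `tool.correct`."""
    return f"Original Text: {text}\nText after correction: {tool.correct(text)}"


class EchoTool:
    """Hypothetical stand-in for language_tool_python.LanguageTool."""

    def correct(self, text):
        # Canned result copied from the tutorial's output, not computed.
        return "A quick brown fox jumps over a little lazy dog."


# With the real library:
#   import language_tool_python
#   my_tool = language_tool_python.LanguageTool('en-US')
my_tool = EchoTool()
print(show_correction(my_tool, "A quick broun fox jumpps over a a little lazy dog."))
```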
Output:
Original Text: A quick broun fox jumpps over a a little lazy dog.
Text after correction: A quick brown fox jumps over a little lazy dog.
Explanation:
In the above snippet of code, we have imported the required library and defined the tool for LanguageTool specifying the language as US English. We have then defined a string variable and stored some text to it. We have then used the correct() function of the tool to automatically correct the mistake in the text and print the resultant text for the users.
Hello
In this tutorial, I will show you how to detect the language used in a string.
To do that, we will work with the Googletrans library.
Googletrans is an unofficial Python library that exposes Google Translate features such as translating and detecting languages. In our case, we’ll use the detect() method.
Let’s get started
Installing Googletrans
Install via pip:
pip install googletrans==3.1.0a0
How to use the detect() method
The detect() method returns the language of the text and the confidence.
Let me show you how to use it.
from googletrans import Translator
detector = Translator()
dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')
print(dec_lan)
Output
Detected(lang=ko, confidence=1.0)
As you can see, the method detected the language ko (Korean) with a confidence of 1.0.
The confidence is a number between 0 and 1.
Print the language:
print(dec_lan.lang)
Output:
ko
Print the confidence:
print(dec_lan.confidence)
Output:
1.0
Detect multiple strings:
sentences = ["I see cats", "bounjour mon chat"]
dec_lan = detector.detect(sentences)
for dec in dec_lan:
    print(dec)
Output:
Detected(lang=en, confidence=1.0)
Detected(lang=fr, confidence=1.0)
If you want to show the full language name, follow these steps.
First, define a dictionary that maps language codes to language names:
LANGUAGES = {
'af': 'afrikaans',
'sq': 'albanian',
'am': 'amharic',
'ar': 'arabic',
'hy': 'armenian',
'az': 'azerbaijani',
'eu': 'basque',
'be': 'belarusian',
'bn': 'bengali',
'bs': 'bosnian',
'bg': 'bulgarian',
'ca': 'catalan',
'ceb': 'cebuano',
'ny': 'chichewa',
'zh-cn': 'chinese (simplified)',
'zh-tw': 'chinese (traditional)',
'co': 'corsican',
'hr': 'croatian',
'cs': 'czech',
'da': 'danish',
'nl': 'dutch',
'en': 'english',
'eo': 'esperanto',
'et': 'estonian',
'tl': 'filipino',
'fi': 'finnish',
'fr': 'french',
'fy': 'frisian',
'gl': 'galician',
'ka': 'georgian',
'de': 'german',
'el': 'greek',
'gu': 'gujarati',
'ht': 'haitian creole',
'ha': 'hausa',
'haw': 'hawaiian',
'iw': 'hebrew',
'he': 'hebrew',
'hi': 'hindi',
'hmn': 'hmong',
'hu': 'hungarian',
'is': 'icelandic',
'ig': 'igbo',
'id': 'indonesian',
'ga': 'irish',
'it': 'italian',
'ja': 'japanese',
'jw': 'javanese',
'kn': 'kannada',
'kk': 'kazakh',
'km': 'khmer',
'ko': 'korean',
'ku': 'kurdish (kurmanji)',
'ky': 'kyrgyz',
'lo': 'lao',
'la': 'latin',
'lv': 'latvian',
'lt': 'lithuanian',
'lb': 'luxembourgish',
'mk': 'macedonian',
'mg': 'malagasy',
'ms': 'malay',
'ml': 'malayalam',
'mt': 'maltese',
'mi': 'maori',
'mr': 'marathi',
'mn': 'mongolian',
'my': 'myanmar (burmese)',
'ne': 'nepali',
'no': 'norwegian',
'or': 'odia',
'ps': 'pashto',
'fa': 'persian',
'pl': 'polish',
'pt': 'portuguese',
'pa': 'punjabi',
'ro': 'romanian',
'ru': 'russian',
'sm': 'samoan',
'gd': 'scots gaelic',
'sr': 'serbian',
'st': 'sesotho',
'sn': 'shona',
'sd': 'sindhi',
'si': 'sinhala',
'sk': 'slovak',
'sl': 'slovenian',
'so': 'somali',
'es': 'spanish',
'su': 'sundanese',
'sw': 'swahili',
'sv': 'swedish',
'tg': 'tajik',
'ta': 'tamil',
'te': 'telugu',
'th': 'thai',
'tr': 'turkish',
'uk': 'ukrainian',
'ur': 'urdu',
'ug': 'uyghur',
'uz': 'uzbek',
'vi': 'vietnamese',
'cy': 'welsh',
'xh': 'xhosa',
'yi': 'yiddish',
'yo': 'yoruba',
'zu': 'zulu',
}
Now let’s detect the language of the string and print the full name of the language.
detector = Translator()
dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')
print(LANGUAGES[dec_lan.lang])
Output:
korean
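Indexing the dictionary directly raises a KeyError for any code that is not in the table. A slightly more defensive lookup, sketched below with a hypothetical helper name and a shortened copy of the table, falls back to the raw code. As far as I know, googletrans also exports a ready-made LANGUAGES mapping, so defining your own may be unnecessary; treat that as something to verify against the version you install.

```python
# Shortened copy of the table above, just for this example.
LANGUAGES = {'en': 'english', 'fr': 'french', 'ko': 'korean'}


def language_name(code, table=LANGUAGES):
    """Return the full language name for `code`, or the code itself if unknown."""
    return table.get(code.lower(), code)


print(language_name('ko'))
print(language_name('EN'))
print(language_name('xx'))
```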
Check if a string is English
dec_lan = detector.detect('Googletrans is a free and unlimited python library that implemented Google Translate API')
if dec_lan.lang == "en" and dec_lan.confidence == 1:
    print('Yes! it is')
else:
    print('No! it is not')
Output
Yes! it is
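Requiring a confidence of exactly 1 is strict: an ordinary English sentence would be rejected if the detector reported anything lower. A sketch of a more tolerant check is shown below. The helper name and the 0.9 threshold are assumptions, and the SimpleNamespace objects are stand-ins for the Detected object so the example runs offline.

```python
from types import SimpleNamespace


def is_english(detected, min_confidence=0.9):
    """Return True if `detected` reports English with enough confidence."""
    confidence = detected.confidence
    # Treat a missing confidence as zero rather than failing.
    if confidence is None:
        confidence = 0.0
    return detected.lang == "en" and confidence >= min_confidence


print(is_english(SimpleNamespace(lang="en", confidence=1.0)))
print(is_english(SimpleNamespace(lang="en", confidence=0.6)))
print(is_english(SimpleNamespace(lang="fr", confidence=1.0)))
```

With the real library, you would pass the result of detector.detect(...) to the helper instead of a stand-in object.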