Check English word in Python

I want to check in a Python program if a word is in the English dictionary.

I believe nltk wordnet interface might be the way to go but I have no clue how to use it for such a simple task.

def is_english_word(word):
    pass # how do I implement is_english_word?

is_english_word(token.lower())

In the future, I might want to check if the singular form of a word is in the dictionary (e.g., properties -> property -> english word). How would I achieve that?

Salvador Dali

asked Sep 24, 2010 at 16:01

For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There’s a tutorial, or you could just dive straight in:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
>>>

PyEnchant comes with a few dictionaries (en_GB, en_US, de_DE, fr_FR), but can use any of the OpenOffice ones if you want more languages.

There appears to be a pluralisation library called inflect, but I’ve no idea whether it’s any good.

answered Sep 24, 2010 at 16:26

Katriel

It won't work well with WordNet, because WordNet does not contain all English words.
Another possibility, based on NLTK and without enchant, is NLTK's words corpus:

>>> from nltk.corpus import words
>>> "would" in words.words()
True
>>> "could" in words.words()
True
>>> "should" in words.words()
True
>>> "I" in words.words()
True
>>> "you" in words.words()
True

answered Jan 28, 2014 at 8:38

Sadık

Using NLTK:

from nltk.corpus import wordnet

word_to_test = "hello"  # example input

if not wordnet.synsets(word_to_test):
    print("Not an English word")
else:
    print("English word")

You should refer to this article if you have trouble installing wordnet or want to try other approaches.

nickb

answered Mar 18, 2011 at 11:29

Susheel Javadi

Use a set to store the word list, because lookups will be faster:

with open("english_words.txt") as word_file:
    english_words = set(word.strip().lower() for word in word_file)

def is_english_word(word):
    return word.lower() in english_words

print(is_english_word("ham"))  # should be True if you have a good english_words.txt

To answer the second part of the question, the plurals would already be in a good word list, but if you wanted to specifically exclude those from the list for some reason, you could indeed write a function to handle it. But English pluralization rules are tricky enough that I’d just include the plurals in the word list to begin with.
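If you did want to handle plurals in code, here is a minimal sketch of the idea, assuming a handful of naive suffix rules. The names singular_candidates and is_english_word_or_plural are hypothetical, the tiny word set stands in for the one loaded from english_words.txt, and irregular plurals (mice, children) are not covered at all.

```python
# Tiny stand-in word list (an assumption for this sketch); in practice this
# would be the set loaded from english_words.txt.
english_words = {"property", "box", "cat", "bus", "glass"}

def singular_candidates(word):
    """Yield possible singular forms using three naive suffix rules."""
    word = word.lower()
    yield word                       # the word may already be singular
    if word.endswith("ies") and len(word) > 3:
        yield word[:-3] + "y"        # properties -> property
    if word.endswith("es") and len(word) > 2:
        yield word[:-2]              # boxes -> box
    if word.endswith("s") and len(word) > 1:
        yield word[:-1]              # cats -> cat

def is_english_word_or_plural(word):
    """True if the word, or one of its guessed singular forms, is in the list."""
    return any(c in english_words for c in singular_candidates(word))

print(is_english_word_or_plural("properties"))  # True
print(is_english_word_or_plural("boxes"))       # True
print(is_english_word_or_plural("qwertys"))     # False
```

Rules this simple will accept some non-words and miss irregular forms, which is the main argument for keeping plurals in the word list itself.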

As to where to find English word lists, I found several just by Googling «English word list». Here is one: http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt You could Google for British or American English if you want specifically one of those dialects.

answered Sep 24, 2010 at 16:12

kindall

For All Linux/Unix Users

If your OS uses the Linux kernel, there is a simple way to get all the words from the English/American dictionary. In the directory /usr/share/dict you have a words file. There are also more specific american-english and british-english files. These contain the word list for that specific variant of the language. You can access these files from any programming language, which is why I thought you might want to know about this.

Now, for Python-specific users, the Python code below should assign to words a set containing every single word:

import re

with open("/usr/share/dict/words", "r") as file:
    # Replace every non-word character with a space, lowercase the text,
    # then split it into individual words
    words = set(re.sub(r"[^\w]", " ", file.read()).lower().split())

def is_word(word):
    return word.lower() in words

is_word("tarts") ## Returns True
is_word("jwiefjiojrfiorj") ## Returns False

Hope this helps!

answered Apr 28, 2020 at 12:09

For a faster NLTK-based solution, you could hash the set of words to avoid a linear search.

from nltk.corpus import words as nltk_words

# Build the lookup table once, outside the function; a set gives
# constant-time membership tests on average.
dictionary = set(nltk_words.words())

def is_english_word(word):
    return word in dictionary

answered Jun 27, 2016 at 19:58

Eb Abadi

I find that there are 3 package-based solutions to the problem: pyenchant, wordnet, and a corpus (self-defined or from nltk). Pyenchant couldn't be installed easily on win64 with py3. Wordnet doesn't work very well because its corpus isn't complete. So I chose the solution answered by @Sadik, and use 'set(words.words())' to speed it up.

First:

pip3 install nltk
python3

import nltk
nltk.download('words')

Then:

from nltk.corpus import words
setofwords = set(words.words())

print("hello" in setofwords)
>>True

answered Feb 3, 2019 at 3:53

Young Yang

With pyEnchant.checker SpellChecker:

from enchant.checker import SpellChecker

def is_in_english(quote):
    d = SpellChecker("en_US")
    d.set_text(quote)
    errors = [err.word for err in d]
    # Treat the text as English only if it has at least 3 words
    # and at most 4 spelling errors
    return len(errors) <= 4 and len(quote.split()) >= 3

print(is_in_english('“办理美国加州州立大学圣贝纳迪诺分校高仿成绩单Q/V2166384296加州州立大学圣贝纳迪诺分校学历学位认证'))
print(is_in_english("“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”"))

> False
> True

answered May 4, 2017 at 14:16

For a semantic web approach, you could run a SPARQL query against WordNet in RDF format. Basically, use the urllib module to issue a GET request that returns results in JSON format, and parse them with Python's json module. If the word is not an English word, you'll get no results.

As another idea, you could query Wiktionary’s API.
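A rough sketch of the Wiktionary idea, assuming the standard MediaWiki query endpoint at en.wiktionary.org and its default JSON layout, where a nonexistent title is reported with a "missing" key. The function names and the User-Agent string are hypothetical. Note that English Wiktionary also has pages for words of other languages, so an existing page is only a hint, not proof, that the word is English.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"

def build_query_url(word):
    """Build a MediaWiki 'query' URL asking about the page titled `word`."""
    params = {"action": "query", "titles": word, "format": "json"}
    return API_URL + "?" + urllib.parse.urlencode(params)

def page_exists(response):
    """Return True if the decoded JSON response describes an existing page."""
    pages = response.get("query", {}).get("pages", {})
    # Nonexistent or invalid titles carry a "missing" or "invalid" key.
    return any("missing" not in page and "invalid" not in page
               for page in pages.values())

def has_wiktionary_page(word):
    """Send the request (needs network access) and inspect the reply."""
    request = urllib.request.Request(
        build_query_url(word),
        # Hypothetical identifier; replace it with one describing your script.
        headers={"User-Agent": "english-word-check-example/0.1"},
    )
    with urllib.request.urlopen(request) as reply:
        return page_exists(json.load(reply))
```

Calling has_wiktionary_page("hello") performs one HTTP request per word, so this is far slower than a local word list and better suited to occasional lookups.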

answered Sep 24, 2010 at 17:28

burkestar

Use nltk.corpus instead of enchant. Enchant gives ambiguous results. For example,
enchant returns True for both benchmark and bench-mark. It is supposed to return False for benchmark.

answered Apr 10, 2021 at 11:51

Download this txt file https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt

then create a set out of it using the following Python code snippet, which loads about 370k alphabetic-only words in English

>>> with open("/PATH/TO/words_alpha.txt") as f:
...     words = set(f.read().split('\n'))
...
>>> len(words)
370106

From here onwards, you can check for existence in constant time using

>>> word_to_check = 'baboon'
>>> word_to_check in words
True

Note that this set might not be comprehensive, but it still gets the job done. Users should do quality checks to make sure it works for their use case as well.

answered May 23, 2022 at 18:19

Ayush


Here I introduce several ways to identify if the word consists of the English alphabet or not.

1. Using isalpha method

In Python, string object has a method called isalpha

word = "Hello"
if word.isalpha():
    print("It is an alphabet")
    
word = "123"
if word.isalpha():
    print("It is an alphabet")
else:
    print("It is not an alphabet")

However, this approach has a minor problem; for example, if you use the Korean alphabet, it still considers the Korean word as an alphabet. (Of course, for the non-Korean speaker, it wouldn’t be a problem 😅 )

To avoid this behavior, call the encode method before calling isalpha. The isalpha method of the resulting bytes object returns True only for ASCII letters.

word = "한글"
if word.encode().isalpha():
    print("It is an alphabet")
else:
    print("It is not an alphabet")
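As an alternative to encoding the string, here is a small sketch that combines str.isascii() (available from Python 3.7) with str.isalpha(). The helper name is_ascii_alpha is only for illustration.

```python
def is_ascii_alpha(word):
    # isascii() rejects any non-ASCII character; isalpha() rejects digits,
    # spaces, punctuation and the empty string.
    return word.isascii() and word.isalpha()

print(is_ascii_alpha("Hello"))   # True
print(is_ascii_alpha("한글"))     # False
print(is_ascii_alpha("abc123"))  # False
```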

2. Using Regular Expression.

I think this is a universal approach, regardless of programming language.

import re

# fullmatch() requires every character to be an English letter;
# match() with a single character class would only test the first character.
reg = re.compile(r'[a-zA-Z]+')

word = "hello"
if reg.fullmatch(word):
    print("It is an alphabet")
else:
    print("It is not an alphabet")

word = "123"
if reg.fullmatch(word):
    print("It is an alphabet")
else:
    print("It is not an alphabet")

3. Using operator

It depends on the precondition; however, we will just assume the goal is if all characters should be the English alphabet or not.

Therefore, we can apply the comparison operator.

word = "hello"

if 'a' <= word[0] <= "z" or 'A' <= word[0] <='Z':
    print("It is an alphabet")
else:
    print("It is not an alphabet")

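Note that we have to consider both upper and lower cases. Also, this checks only the first character. We shouldn't compare the entire word, because strings are compared lexicographically: a word such as "zebra" sorts after "z", so the test would fail even though every character is an English letter.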

We can also simplify this code using the lower or upper method in the string.

word = "hello"

if 'a' <= word[0].lower() <= "z":
    print("It is an alphabet")
else:
    print("It is not an alphabet")
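To cover the whole word rather than just its first character, the same comparison can be applied to every character. This is a minimal sketch, and the helper name is_english_letters is hypothetical.

```python
def is_english_letters(word):
    # all() is True for an empty string, so reject that case explicitly.
    return bool(word) and all(
        'a' <= ch <= 'z' or 'A' <= ch <= 'Z' for ch in word
    )

print(is_english_letters("Hello"))  # True
print(is_english_letters("hell0"))  # False
print(is_english_letters("한글"))    # False
```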

4. Using lower and upper method

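This is my favorite approach. English letters have lowercase and uppercase forms, unlike digits or Korean characters, so we can leverage this characteristic to identify the word. Note that other cased scripts (for example Greek, Cyrillic, or accented Latin letters) also pass this test, and a single letter anywhere in the word is enough to make it pass, so it only rules out text that contains no cased letters at all.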

word = "hello"
if word.upper() != word.lower():
    print("It is an alphabet")
else:
    print("It is not an alphabet")

Happy coding!


In this article, you will learn how to write python code for spell checking. I have discussed various methods you can use to implement your spell checker program.

But, before that, let’s learn some core topics, i.e., what spell checking is and its benefits. And after that, we will learn different approaches to writing Python programs for spell checking.

Table of contents

What is spell checking?
How to check spelling in python
Using a Dictionary for Spell Checking
Using enchant Spell Checker Library
Using pyspellchecker
Using TextBlob
Using autocorrect
Conclusion

What is spell checking?

Spell checking is the process of checking a document or sentence for spelling errors. It is done in one of two ways: either manually, by proofreading the document, or with a spell checker program such as Grammarly, ProWritingAid, or Ginger Software.

Spell checking is an integral part of editing and proofreading as it ensures the document is error-free.

Here are some of the benefits of proofreading and spell-checking the documents:

Help communicate more effectively.
Help avoid embarrassing mistakes.
Help Impress your boss or teacher.
Help get a better grade on an assignment.
Help avoid misunderstandings.
Help find errors in your writing.
Help proofread your work.
Help improve your writing skills.
Help avoid plagiarism.
Help save time.

How to check spelling in python

There are several ways to approach spell checking in Python. One common approach is to use a dictionary to store a list of words. The program then checks each word in the document against the dictionary to see whether it is spelled correctly.

Another approach is to use a spell checker library. These libraries typically use a more sophisticated approach to spell checking than a simple dictionary lookup. Some may also consider the context of a word to identify errors better.

In this article, we will start by looking at how to use a dictionary for spell checking. After that, we will explore how to use spell checker libraries to write spell-checking programs that check and suggest English words.

Using a Dictionary for Spell Checking

One of the simplest ways to approach spell checking is to use a dictionary. A dictionary is a data structure that stores a collection of values. Each value in a dictionary is associated with a key.

In the context of spell checking, the keys are words, and the values are Boolean values that indicate whether the spelling of a word is correct.

Let’s start by creating an empty dictionary. We will call our dictionary spell_dict.

spell_dict = {}

Next, we must populate our dictionary with words and their associated Boolean values. There are a few ways to do this. One option is to manually add words to the dictionary. This is fine for a few words, but it quickly becomes tedious for large dictionaries.

A more efficient approach is to read the words from a file. We can then loop over the words in the file and add them to the dictionary.

The following code shows how to read the words from a file and add them to a dictionary.

Note: In the words.txt file, place each word in a separate line.

with open ('words.txt', 'r') as f:
   for line in f :
       word = line.strip()
       spell_dict[word] = True

We can now use our dictionary for spell checking using python. The following code shows how to check a word against the dictionary.

def check_spelling(word):
   if word in spell_dict:
       return True
   else:
       return False

The check_spelling() function takes a word as an argument and returns a Boolean value. If the word is in the dictionary, the function returns True. Otherwise, it returns False.

We can use the check_spelling() function to spell check a document. The following code shows how to do this.

with open('document.txt', 'r') as f:
   for line in f:
       for word in line.split():
           if not check_spelling(word):
               print('Incorrect spelling: ' + word)

In the code above, we have opened the document.txt file in read-only mode. After that, we used a for loop to iterate over the lines in the file. For each line, we have used the split() method to split the line into words. We then used another for loop to iterate over the words. For each word, we have used the check_spelling() function to check if the word is correct. If the word is not in the dictionary, the check_spelling() function will return False.

And here is the complete code in action:

spell_dict = {}

with open ('words.txt', 'r') as f:
   for line in f :
       word = line.strip()
       spell_dict[word] = True

def check_spelling(word):
   if word in spell_dict:
       return True
   else:
       return False

with open('document.txt', 'r') as f:
   for line in f:
       for word in line.split():
           if not check_spelling(word):
               print('Incorrect spelling: ' + word)
              
'''
#--------words.txt-----
hello
how
are
you
doing?

#--------document.txt-----
Hell how r you doing?

#Output
Incorrect spelling: Hell
Incorrect spelling: r
'''

The spell-checking approach used in the code above is very basic. It will only identify words that are not in the dictionary.
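One limitation is that punctuation and capitalization are treated as part of the word. A slightly more forgiving variant could look like the sketch below, which uses a set instead of a dictionary of Boolean values. The helper names load_words and misspelled are hypothetical, and the in-memory word list stands in for words.txt.

```python
import string

def load_words(lines):
    """Build a set of lowercase words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}

def misspelled(text, known_words):
    """Return the tokens of `text` whose normalized form is not known."""
    unknown = []
    for token in text.split():
        # Drop surrounding punctuation and ignore case before the lookup.
        word = token.strip(string.punctuation).lower()
        if word and word not in known_words:
            unknown.append(token)
    return unknown

known = load_words(["hello", "how", "are", "you", "doing"])
print(misspelled("Hello, how r you doing?", known))  # ['r']
```

With this normalization, words.txt no longer needs entries such as "doing?" that include punctuation.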

If you need a more sophisticated spell checker program, you should consider using a spell checker library.

Using enchant Spell Checker Library

There are several spell checker libraries available for python. In this section, we will look at how to use the pyenchant library.

The pyenchant library is a Python wrapper for the Enchant spell checker library. Enchant is a cross-platform library that supports a variety of languages.

The pyenchant library is available from PyPI, so install it using pip.

pip install pyenchant

And if you are facing any problems, you can visit the pyenchant installation guide.

After successful installation, we can use enchant to check the spelling of words. The following code shows how to do this.

import enchant
d = enchant.Dict("en_US")
word = "hellow"
if d.check(word):
   print ("Spelling is correct")
else:
   print ("Spelling is incorrect")

In the code above, we have imported the enchant module. We then used the Dict class to create a dictionary object for the en_US language.

We then checked the spelling of the word “hellow”. The check() method returns a Boolean value. If the spelling of a word is correct, it returns True. Otherwise, it returns False.

If there is a misspelling, the pyenchant library can suggest possible corrections. The following code shows how to do this.

import enchant

d = enchant.Dict("en_US")
word = "hellow"
if d.check(word):
    print("Spelling is correct")
else:
    print("Spelling is incorrect")
    print("Suggested corrections: ")
    for correction in d.suggest(word):
        print(correction)

Apart from that, the pyenchant library also helps check the spelling in the document. The following code shows how to do this.

import enchant
d = enchant.Dict( "en_US" )
with open('document.txt', 'r') as f:
   for line in f:
       for word in line.split():
           if not d.check(word):
               print('Incorrect spelling: ' + word)
               print('Suggested corrections:')
               for correction in d.suggest(word):
                   print(correction)

In the code above, we have imported the enchant module. We then used the Dict class to create a dictionary object for the en_US language.

We then opened the document.txt file in read-only mode. We have then used a for loop to iterate over the lines in the file. For each line, we have used the split() method to split the line into words. We then used another for loop to iterate over the words.

For each word, we have used the check() method to check if the word is correct. If the word is not in the dictionary, the check() method will return False. In this case, we have printed a message to the console to indicate that the word is incorrect. We have then used the suggest() method to suggest possible corrections.

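The spell checker approach used in the code above is more sophisticated than the one we used in the previous section: it relies on a full dictionary supplied by the underlying Enchant provider and can suggest corrections. It still checks each word in isolation, without taking the surrounding context into account.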

Using pyspellchecker

Now, let's discuss how to create a spell checker program in Python using the pyspellchecker library. Pyspellchecker is a pure Python library for checking spelling mistakes in strings.

We will first install the pyspellchecker library using pip.

pip install pyspellchecker

After installation, we can import the library into our python program.

import spellchecker

Now, we will create a function that takes a string as an input and returns a list of misspelled words.

def spell_check(string):
   spell = spellchecker.SpellChecker()
   misspelled_words = spell.unknown(string.split())
   return misspelled_words

We can now test our function on a string with some misspelled words.

string = 'This is a sentense with some mispelled words'
misspelled_words = spell_check(string)
print(misspelled_words)

The code will print out the following result:

{'sentense', 'mispelled'}

Here is the complete code in action:

import spellchecker

def spell_check(string):
   spell = spellchecker.SpellChecker()
   misspelled_words = spell.unknown(string.split())
   return misspelled_words

string = 'This is a sentense with some mispelled words'
misspelled_words = spell_check(string)
print(misspelled_words)

'''
#Output
{'mispelled', 'sentense'}
'''

Apart from that, you can also write a code that can give a suggestion to the misspelled words. Here is how you can do it.

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
  # Get the one `most likely` answer
  print(spell.correction(word))

  # Get a list of `likely` options
  print(spell.candidates(word))

'''
#Output
happenning
{'hapening', 'happenning'}
'''

Using TextBlob

We have already seen two libraries for implementing spell-checking programs in Python. Another popular option is the TextBlob library, which provides a simple interface for spell checking.

To use the TextBlob library, you first need to install it. You can do this using the pip command:

pip install textblob

After that, import TextBlob and create a TextBlob object by passing a string of text to the TextBlob constructor:

from textblob import TextBlob

text = "Website name is problem solving code"
blob = TextBlob(text)

Once you have a TextBlob object, you can use the correct() method to correct the spelling of the words in the string. The correct() method takes no arguments and returns a new TextBlob containing the corrected text:

print(blob.correct())

If the word is not in the TextBlob dictionary, then the correct() method will return the word unchanged.

Here is the complete code in action

from textblob import TextBlob
text = "Website name is problem solving code"
blob = TextBlob(text)

print(blob.correct())

'''
#Output
Website name is problem solving code
'''

The TextBlob library also provides many other methods and functions for spell checking. For more information, see the TextBlob documentation.

Using autocorrect

The autocorrect module also lets us check the spelling of a single word and return its corrected spelling. To do this, we create a Speller object from the autocorrect library and call it on the word.

To use the autocorrect library, you first need to install it. You can do this using the pip command:

pip install autocorrect

Once you install it, you have to import it, and here is the complete working code.

from autocorrect import Speller

spell = Speller(lang='en')

print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))

'''
#Output
message
service
the
'''

Conclusion

In this article, we have looked at how to use python for spell checking. We have looked at different approaches: one using a dictionary and the others using a spell checker library.

Which approach you use will depend on the specific requirements of your project. If you need a more sophisticated spell checker, you should consider using a spell checker library.


Given a string str, the task is to check whether the words in str are valid English words.

A word is considered valid if it meets all of the criteria below:

The first character must be an uppercase letter, and it is the only character that may be uppercase.
All remaining letters must be lowercase.
The word may contain at most one hyphen ('-'), and it must be surrounded by letters on both sides.
The word cannot contain any digits.
If there is a punctuation mark, there must be only one and it must be at the end.

Print the number of valid words in the string str.

Input: str = “i Love- Geeks-forgeeks!”
Output: 1 word
Explanation:
word 1 = “i” does not contain first uppercase character, it is not valid word
word 2 = “Love-” hyphen is not surrounded by characters on both ends, it is not valid word
word 3 = “Geeks-forgeeks!” is a valid word

Input: str = “!this 1-s b8d!”
Output: 0 words
Explanation:
word 1 = “!this” punctuation mark is in the beginning, it is not valid word
word 2 = “1-s” digit as first character, it is not valid word
word 3 = “b8d!” first character is not uppercase, it is not valid word

Approach:

Initialize the variable ans to keep count of the number of valid words.
Loop through each word present in the sentence.
Check each letter of the word to see if it meets the criteria mentioned in the problem statement.
If any of the criteria is not met then return false.
If all the criteria are satisfied by the word, then increment the value of the variable ans.
Print the value of the variable ans.

Below is the C++ program of the above approach-

C++

#include <bits/stdc++.h>
using namespace std;

// Returns true if the word satisfies all the criteria listed above
bool ValidWords(string sentence)
{
    int hyphen = 0;
    int size = sentence.size();

    // A valid word must start with an uppercase letter
    if (size == 0 || !isupper(sentence[0]))
        return false;

    for (int i = 0; i < size; i++) {
        // Digits are never allowed
        if (isdigit(sentence[i]))
            return false;
        // Uppercase is allowed only in the first position
        if (i > 0 && isupper(sentence[i]))
            return false;
        if (isalpha(sentence[i]))
            continue;
        if (sentence[i] == '-') {
            // At most one hyphen, with a letter on each side
            if (++hyphen > 1)
                return false;
            if (i - 1 < 0
                || !isalpha(sentence[i - 1])
                || i + 1 >= size
                || !isalpha(sentence[i + 1]))
                return false;
        }
        // Punctuation is allowed only as the last character
        else if (i != size - 1
                 && ispunct(sentence[i]))
            return false;
    }
    return true;
}

int main()
{
    string sentence = "i Love- Geeks-forgeeks!";
    istringstream s(sentence);
    string word;
    int ans = 0;
    while (s >> word)
        if (ValidWords(word))
            ans++;
    cout << ans << " words";
}

Java

import java.io.*;

class GFG {

    // Returns true if the word satisfies all the criteria listed above
    static boolean ValidWords(String sentence)
    {
        int hyphen = 0;
        int size = sentence.length();

        // A valid word must start with an uppercase letter
        if (size == 0 || !Character.isUpperCase(sentence.charAt(0)))
            return false;

        for (int i = 0; i < size; i++) {
            char c = sentence.charAt(i);
            // Digits are never allowed
            if (Character.isDigit(c))
                return false;
            // Uppercase is allowed only in the first position
            if (i > 0 && Character.isUpperCase(c))
                return false;
            if (Character.isAlphabetic(c))
                continue;
            if (c == '-') {
                // At most one hyphen, with a letter on each side
                hyphen = hyphen + 1;
                if (hyphen > 1)
                    return false;
                if (i - 1 < 0
                    || !Character.isAlphabetic(sentence.charAt(i - 1))
                    || i + 1 >= size
                    || !Character.isAlphabetic(sentence.charAt(i + 1)))
                    return false;
            }
            // Punctuation is allowed only as the last character
            else if (i != size - 1
                     && (c == '!' || c == ',' || c == ';'
                         || c == '.' || c == '?' || c == '\''
                         || c == '"' || c == ':'))
                return false;
        }
        return true;
    }

    public static void main(String[] args)
    {
        String sentence = "i Love- Geeks-forgeeks!";
        int ans = 0;
        String words[] = sentence.split(" ");
        for (String word : words) {
            if (ValidWords(word))
                ans++;
        }
        System.out.print(ans + " words");
    }
}

Python3

def is_letter(ch):
    # True for ASCII letters only
    return 'a' <= ch <= 'z' or 'A' <= ch <= 'Z'

# Returns True if the word satisfies all the criteria listed above
def ValidWords(sentence):
    hyphen = 0
    size = len(sentence)

    # A valid word must start with an uppercase letter
    if size == 0 or not ('A' <= sentence[0] <= 'Z'):
        return False

    for i in range(size):
        c = sentence[i]
        # Digits are never allowed
        if '0' <= c <= '9':
            return False
        # Uppercase is allowed only in the first position
        if i > 0 and 'A' <= c <= 'Z':
            return False
        if is_letter(c):
            continue
        if c == '-':
            # At most one hyphen, with a letter on each side
            hyphen += 1
            if hyphen > 1:
                return False
            if (i - 1 < 0 or not is_letter(sentence[i - 1])
                    or i + 1 >= size or not is_letter(sentence[i + 1])):
                return False
        # Punctuation is allowed only as the last character
        elif i != size - 1 and c in "!,;.?'\":":
            return False
    return True

sentence = "i Love- Geeks-forgeeks!"
words = sentence.split(' ')
ans = 0
for word in words:
    if ValidWords(word):
        ans += 1
print(f"{ans} words")

C#

using System;

class GFG
{
    // Returns true if the word satisfies all the criteria listed above
    static bool ValidWords(String sentence)
    {
        int hyphen = 0;
        int size = sentence.Length;

        // A valid word must start with an uppercase letter
        if (size == 0 || !char.IsUpper(sentence[0]))
            return false;

        for (int i = 0; i < size; i++)
        {
            char c = sentence[i];
            // Digits are never allowed
            if (char.IsDigit(c))
                return false;
            // Uppercase is allowed only in the first position
            if (i > 0 && char.IsUpper(c))
                return false;
            if (char.IsLetter(c))
                continue;
            if (c == '-')
            {
                // At most one hyphen, with a letter on each side
                hyphen = hyphen + 1;
                if (hyphen > 1)
                    return false;
                if (i - 1 < 0
                    || !char.IsLetter(sentence[i - 1])
                    || i + 1 >= size
                    || !char.IsLetter(sentence[i + 1]))
                    return false;
            }
            // Punctuation is allowed only as the last character
            else if (i != size - 1
                     && (c == '!' || c == ',' || c == ';'
                         || c == '.' || c == '?' || c == '\''
                         || c == '"' || c == ':'))
                return false;
        }
        return true;
    }

    public static void Main()
    {
        String sentence = "i Love- Geeks-forgeeks!";
        int ans = 0;
        String[] words = sentence.Split(' ');
        foreach (String word in words)
        {
            if (ValidWords(word))
                ans++;
        }
        Console.Write(ans + " words");
    }
}

Javascript

<script>
    // True for ASCII letters only
    const isLetter = (ch) =>
        (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');

    // Returns true if the word satisfies all the criteria listed above
    const ValidWords = (sentence) => {
        let hyphen = 0;
        let size = sentence.length;

        // A valid word must start with an uppercase letter
        if (size === 0 || !(sentence[0] >= 'A' && sentence[0] <= 'Z'))
            return false;

        for (let i = 0; i < size; i++) {
            const c = sentence[i];
            // Digits are never allowed
            if (c >= '0' && c <= '9')
                return false;
            // Uppercase is allowed only in the first position
            if (i > 0 && c >= 'A' && c <= 'Z')
                return false;
            if (isLetter(c))
                continue;
            if (c == '-') {
                // At most one hyphen, with a letter on each side
                if (++hyphen > 1)
                    return false;
                if (i - 1 < 0
                    || !isLetter(sentence[i - 1])
                    || i + 1 >= size
                    || !isLetter(sentence[i + 1]))
                    return false;
            }
            // Punctuation is allowed only as the last character
            else if (i != size - 1
                     && "!,;.?'\":".includes(c))
                return false;
        }
        return true;
    }

    let sentence = "i Love- Geeks-forgeeks!";
    let words = sentence.split(' ');
    let ans = 0;
    for (const word of words)
        if (ValidWords(word))
            ans++;
    document.write(`${ans} words`);
</script>

Time Complexity: O(N) as only one traversal of the string of length N is enough for the algorithm to perform all the tasks hence the overall complexity is linear.
Auxiliary Space: O(N) as the variable s stores all the words of the strings hence the overall space occupied by the algorithm is equal to the length of the string
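The criteria from the problem statement can also be written as a single regular expression. The sketch below is an alternative formulation, not the approach used in the programs above. It assumes the punctuation marks are limited to ! , ; . ? ' " and :, and it rejects any other symbol. The names WORD_RE and count_valid_words are hypothetical.

```python
import re

PUNCTUATION = "!,;.?'\":"

# One uppercase letter, then lowercase letters, an optional hyphen followed
# by lowercase letters, and at most one trailing punctuation mark.
WORD_RE = re.compile(
    r"[A-Z][a-z]*(?:-[a-z]+)?[" + re.escape(PUNCTUATION) + r"]?"
)

def count_valid_words(sentence):
    """Count the whitespace-separated words that match the whole pattern."""
    return sum(1 for word in sentence.split() if WORD_RE.fullmatch(word))

print(count_valid_words("i Love- Geeks-forgeeks!"))  # 1
print(count_valid_words("!this 1-s b8d!"))           # 0
```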


Pure Python Spell Checking based on Peter
Norvig’s blog post on setting
up a simple spell checking algorithm.

It uses a Levenshtein Distance
algorithm to find permutations within an edit distance of 2 from the
original word. It then compares all permutations (insertions, deletions,
replacements, and transpositions) to known words in a word frequency
list. Those words that are found more often in the frequency list are
more likely the correct results.
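To make the idea concrete, here is a simplified sketch in the spirit of Norvig's approach. It is not pyspellchecker's own code. It only looks at edit distance 1, and WORD_FREQUENCY is a made-up toy frequency list used purely for illustration.

```python
import string

# Hypothetical toy frequency list; real lists contain many thousands of words.
WORD_FREQUENCY = {"happening": 50, "penning": 2, "hello": 120, "help": 80}

def edits1(word):
    """Return every string that is one edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the words that appear in the frequency list."""
    return {w for w in words if w in WORD_FREQUENCY}

def correction(word):
    """Most frequent known word within edit distance 1, else the word itself."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORD_FREQUENCY.get(w, 0))

print(correction("hapening"))  # happening
print(correction("helo"))      # hello
```

Here "helo" is one edit away from both "hello" and "help", and the higher frequency of "hello" decides the result. Distance 2 can be obtained by applying edits1 to each result of edits1, at a much higher cost for long words.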

pyspellchecker supports multiple languages including English, Spanish,
German, French, and Portuguese. For information on how the dictionaries were
created and how they can be updated and improved, please see the
Dictionary Creation and Updating section of the readme!

pyspellchecker supports Python 3

pyspellchecker allows for the setting of the Levenshtein Distance (up to two) to check.
For longer words, it is highly recommended to use a distance of 1 and not the
default 2. See the quickstart to find how one can change the distance parameter.

Installation

The easiest method to install is using pip:

pip install pyspellchecker

To build from source:

git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python -m build

For python 2.7 support, install release 0.5.6
but note that no future updates will support python 2.

pip install pyspellchecker==0.5.6

Quickstart

After installation, using pyspellchecker should be fairly straight
forward:

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

If the Word Frequency list is not to your liking, you can add additional
text to generate a more appropriate list for your use case.

from spellchecker import SpellChecker

spell = SpellChecker()  # loads default word frequency list
spell.word_frequency.load_text_file('./my_free_text_doc.txt')

# if I just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
spell.known(['microsoft', 'google'])  # will return both now!

If the words that you wish to check are long, it is recommended to reduce the
distance to 1. This can be accomplished either when initializing the spell
check class or after the fact.

from spellchecker import SpellChecker

spell = SpellChecker(distance=1)  # set at initialization

# do some work on longer words

spell.distance = 2  # set the distance parameter back to the default

Non-English Dictionaries

pyspellchecker supports several default dictionaries as part of the default
package. Each is simple to use when initializing the dictionary:

from spellchecker import SpellChecker

english = SpellChecker()  # the default is English (language='en')
spanish = SpellChecker(language='es')  # use the Spanish Dictionary
russian = SpellChecker(language='ru')  # use the Russian Dictionary
arabic = SpellChecker(language='ar')   # use the Arabic Dictionary

The currently supported dictionaries are:

English — ‘en’
Spanish — ‘es’
French — ‘fr’
Portuguese — ‘pt’
German — ‘de’
Russian — ‘ru’
Arabic — ‘ar’

Dictionary Creation and Updating

Creating the dictionaries is, unfortunately, not an exact science. I have provided a script that, given a text file of sentences (in this case from OpenSubtitles), generates a word frequency list from the words found in the text. The script then attempts to *clean up* the word frequency list by, for example, removing words with invalid characters (usually from other languages), removing low-count terms (likely misspellings), and enforcing language rules where available (for example, no more than one accent per word in Spanish). It then removes any words that appear on a list of known words to exclude, and finally adds words that are known to be missing or that were removed for having too low a frequency.

The script can be found at `scripts/build_dictionary.py`. The original word frequency list parsed from OpenSubtitles can be found in the `scripts/data/` folder along with each language’s include and exclude text files.
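The clean-up steps described above can be illustrated with a small standard-library sketch. This is not the actual `scripts/build_dictionary.py`; the function name `build_word_frequency`, the `min_count` threshold, and the "valid characters" pattern are assumptions chosen for illustration only.

```python
import re
from collections import Counter


def build_word_frequency(lines, min_count=2, include=(), exclude=()):
    """Build a cleaned word frequency dict from an iterable of sentences."""
    # Hypothetical rule for "valid" English words: ASCII letters and apostrophes.
    valid_word = re.compile(r"[a-z']+")
    counts = Counter()
    for line in lines:
        for token in line.lower().split():
            token = token.strip(".,!?;:\"()")
            if token and valid_word.fullmatch(token):
                counts[token] += 1

    # Drop low-count terms, which are often misspellings.
    frequency = {word: n for word, n in counts.items() if n >= min_count}

    # Remove words from the exclude list.
    for word in exclude:
        frequency.pop(word, None)

    # Add back words known to be missing or dropped for low frequency.
    for word in include:
        frequency.setdefault(word, min_count)
    return frequency


sample = ["The cat sat.", "The cat ran!", "Teh dog señor"]
result = build_word_frequency(sample, min_count=2, include=["dog"], exclude=["the"])
print(result)
```

Here `teh`, `sat` and `ran` fall below the count threshold, `señor` fails the character rule, `the` is excluded explicitly, and `dog` is restored through the include list.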

Any help in updating and maintaining the dictionaries would be greatly appreciated. To contribute, start a
discussion on GitHub or open a pull request that updates the include and exclude files.

Additional Methods

Online documentation is available; below is the CliffsNotes version of some of the available functions:

correction(word): Returns the most probable result for the
misspelled word

candidates(word): Returns a set of possible candidates for the
misspelled word

known([words]): Returns those words that are in the word frequency
list

unknown([words]): Returns those words that are not in the frequency
list

word_probability(word): The frequency of the given word out of all
words in the frequency list
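To make the relationship between these methods concrete, here is a toy frequency list written with the standard library only. It is a sketch of the idea, not pyspellchecker's implementation; the class name `TinyFrequencyList` is hypothetical, and lower-casing every word is an assumption of the sketch.

```python
from collections import Counter


class TinyFrequencyList:
    """Toy stand-in that mimics known(), unknown() and word_probability()."""

    def __init__(self, words):
        self._counts = Counter(word.lower() for word in words)
        self._total = sum(self._counts.values())

    def known(self, words):
        # Words that appear in the frequency list.
        return {w.lower() for w in words if w.lower() in self._counts}

    def unknown(self, words):
        # Words that do not appear in the frequency list.
        return {w.lower() for w in words if w.lower() not in self._counts}

    def word_probability(self, word):
        # Count of the word divided by the count of all words.
        if not self._total:
            return 0.0
        return self._counts[word.lower()] / self._total


freq = TinyFrequencyList(["the", "the", "cat", "sat"])
print(freq.known(["The", "dog"]))
print(freq.unknown(["The", "dog"]))
print(freq.word_probability("the"))
```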

The following are less likely to be needed by the user but are available:

edit_distance_1(word): Returns a set of all strings at a Levenshtein
Distance of one based on the alphabet of the selected language

edit_distance_2(word): Returns a set of all strings at a Levenshtein
Distance of two based on the alphabet of the selected language
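For intuition, candidate generation at distance one and two can be sketched in the style of the Peter Norvig post credited below. This is an illustration rather than pyspellchecker's code; the names `edits1` and `edits2` and the lowercase ASCII alphabet are assumptions. Note that, like Norvig's version, it also counts a swap of two adjacent letters as a single edit, which strictly speaking goes beyond plain Levenshtein distance.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + char + right[1:]
        for left, right in splits
        if right
        for char in alphabet
    ]
    inserts = [left + char + right for left, right in splits for char in alphabet]
    return set(deletes + transposes + replaces + inserts)


def edits2(word, alphabet=string.ascii_lowercase):
    """Return every string reachable by applying edits1 twice."""
    return {
        second
        for first in edits1(word, alphabet)
        for second in edits1(first, alphabet)
    }


print("cat" in edits1("cta"))
print("happening" in edits2("hapenning"))
print(len(edits1("cat")), len(edits2("cat")))
```

The second set is far larger than the first and grows quickly with word length, which is one reason the Quickstart suggests reducing the distance to 1 for long words.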

Credits

Peter Norvig blog post on setting up a simple spell checking algorithm
P Lison and J Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)

Source

Sometimes, we want to check if a word is an English word with Python.

In this article, we’ll look at how to check if a word is an English word with Python.

How to check if a word is an English word with Python?

To check if a word is an English word with Python, we can use the enchant module.

To install it, we run:

pip install pyenchant

Then we can use it by writing:

import enchant

d = enchant.Dict("en_US")
print(d.check("Hello"))
print(d.suggest("Helo"))

We create the enchant dictionary object by calling the enchant.Dict class with the locale string as its argument.

Then we call d.check with a string to check whether the string is an English word.

We also call d.suggest with a string to get a list of English words that are close to the string argument.

Therefore, we see:

True
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]

from the print output.
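We can wrap the check in a small helper function. The following is a minimal sketch, assuming we pass in a dictionary object that exposes a check method, as enchant.Dict does. The StubDict class is a hypothetical stand-in so that the example runs without pyenchant installed. The guard for empty input is there because PyEnchant is understood to raise an error when asked to check an empty string.

```python
def is_english_word(word, dictionary):
    """Return True if `dictionary` accepts `word`.

    `dictionary` is assumed to be an object with a check(word) method,
    for example enchant.Dict("en_US").
    """
    word = word.strip()
    if not word:
        # Avoid passing an empty string to the dictionary.
        return False
    return bool(dictionary.check(word))


class StubDict:
    """Hypothetical stand-in with the same check() method, for illustration."""

    def __init__(self, words):
        self._words = set(words)

    def check(self, word):
        return word in self._words


d = StubDict({"Hello", "world"})
print(is_english_word("Hello", d))
print(is_english_word("Helo", d))
print(is_english_word("   ", d))
```

With pyenchant installed, we would pass enchant.Dict("en_US") instead of the stub.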

Conclusion

To check if a word is an English word with Python, we can use the enchant module.


Source

`language_tool_python`: a grammar checker for Python 📝

Current LanguageTool version: 5.5

This is a Python wrapper for LanguageTool. LanguageTool is an open-source grammar tool, also known as the spellchecker for OpenOffice. This library allows you to detect grammar errors and spelling mistakes through a Python script or through a command-line interface.

Local and Remote Servers

By default, language_tool_python will download a LanguageTool server .jar and run that in the background to detect grammar errors locally. However, LanguageTool also offers a Public HTTP Proofreading API that is supported as well. Follow the link for rate limiting details. (Running locally won’t have the same restrictions.)

Using `language_tool_python` locally

Local server is the default setting. To use this, just initialize a LanguageTool object:

import language_tool_python
tool = language_tool_python.LanguageTool('en-US')  # use a local server (automatically set up), language English

Using `language_tool_python` with the public LanguageTool remote server

There is also a built-in class for querying LanguageTool’s public servers. Initialize it like this:

import language_tool_python
tool = language_tool_python.LanguageToolPublicAPI('es') # use the public API, language Spanish

Using `language_tool_python` with another remote server

Finally, you’re able to pass in your own remote server as an argument to the LanguageTool class:

import language_tool_python
tool = language_tool_python.LanguageTool('ca-ES', remote_server='https://language-tool-api.mywebsite.net')  # use a remote server API, language Catalan

Apply a custom list of matches with `utils.correct`

If you want to decide which Match objects to apply to your text, use tool.check (to generate the list of matches) in conjunction with language_tool_python.utils.correct (to apply the list of matches to text). Here is an example of generating, filtering, and applying a list of matches. In this case, spell-checking suggestions for uppercase words are ignored:

>>> s = "Department of medicine Colombia University closed on August 1 Milinda Samuelli"
>>> is_bad_rule = lambda rule: rule.message == 'Possible spelling mistake found.' and len(rule.replacements) and rule.replacements[0][0].isupper()
>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US')
>>> matches = tool.check(s)
>>> matches = [rule for rule in matches if not is_bad_rule(rule)]
>>> language_tool_python.utils.correct(s, matches)
'Department of medicine Colombia University closed on August 1 Melinda Sam'

Example usage

From the interpreter:

>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US')
>>> text = 'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
>>> matches = tool.check(text)
>>> len(matches)
2
...
>>> tool.close() # Call `close()` to shut off the server when you're done.

Check out some Match object attributes:

>>> matches[0].ruleId, matches[0].replacements # ('EN_A_VS_AN', ['an'])
('EN_A_VS_AN', ['an'])
>>> matches[1].ruleId, matches[1].replacements
('TOT_HE', ['to the'])

Print a Match object:

>>> print(matches[1])
Line 1, column 51, Rule ID: TOT_HE[1]
Message: Did you mean 'to the'?
Suggestion: to the
...

Automatically apply suggestions to the text:

>>> tool.correct(text)
'A sentence with an error in the Hitchhiker’s Guide to the Galaxy'

From the command line:

$ echo 'This are bad.' > example.txt
$ language_tool_python example.txt
example.txt:1:1: THIS_NNS[3]: Did you mean 'these'?

Closing LanguageTool

language_tool_python runs a LanguageTool Java server in the background. It will shut the server off when garbage collected, for example when a created language_tool_python.LanguageTool object goes out of scope. However, if garbage collection takes a while, the process might not get deleted right away. If you’re seeing lots of processes being spawned and not deleted, you can explicitly close them:

import language_tool_python
tool = language_tool_python.LanguageToolPublicAPI('de-DE') # starts a process
# do stuff with `tool`
tool.close() # explicitly shut off the LanguageTool

You can also use a context manager (with .. as) to explicitly control when the server is started and stopped:

import language_tool_python

with language_tool_python.LanguageToolPublicAPI('de-DE') as tool:
    pass  # do stuff with `tool`
# no need to call `close()` as it will happen at the end of the with statement

Client-Server Model

You can run LanguageTool on one host and connect to it from another. This is useful in some distributed scenarios. Here’s a simple example:

server

>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US', host='0.0.0.0')
>>> tool._url
'http://0.0.0.0:8081/v2/'

client

>>> import language_tool_python
>>> lang_tool = language_tool_python.LanguageTool('en-US', remote_server='http://0.0.0.0:8081')
>>>
>>>
>>> lang_tool.check('helo darknes my old frend')
[Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Helo'], 'offsetInContext': 0, 'context': 'helo darknes my old frend', 'offset': 0, 'errorLength': 4, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'helo darknes my old frend'}), Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['darkness', 'darkens', 'darkies'], 'offsetInContext': 5, 'context': 'helo darknes my old frend', 'offset': 5, 'errorLength': 7, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'helo darknes my old frend'}), Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['friend', 'trend', 'Fred', 'freed', 'Freud', 'Friend', 'fend', 'fiend', 'frond', 'rend', 'fr end'], 'offsetInContext': 20, 'context': 'helo darknes my old frend', 'offset': 20, 'errorLength': 5, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'helo darknes my old frend'})]
>>>

Configuration

LanguageTool offers lots of built-in configuration options.

Example: Enabling caching

Here’s an example of using the configuration options to enable caching. Some users have reported that this helps performance a lot.

import language_tool_python
tool = language_tool_python.LanguageTool('en-US', config={ 'cacheSize': 1000, 'pipelineCaching': True })

Example: Setting maximum text length

Here’s an example showing how to configure LanguageTool to set a maximum length on grammar-checked text. LanguageTool will throw an error (which propagates to Python as a language_tool_python.LanguageToolError) if the text is too long.

import language_tool_python
tool = language_tool_python.LanguageTool('en-US', config={ 'maxTextLength': 100 })

Full list of configuration options

Here’s a full list of configuration options. See the LanguageTool HTTPServerConfig documentation for details.

'maxTextLength' - maximum text length, longer texts will cause an error (optional)
'maxTextHardLength' - maximum text length, applies even to users with a special secret 'token' parameter (optional)
'secretTokenKey' - secret JWT token key, if set by user and valid, maxTextLength can be increased by the user (optional)
'maxCheckTimeMillis' - maximum time in milliseconds allowed per check (optional)
'maxErrorsPerWordRate' - checking will stop with error if there are more rules matches per word (optional)
'maxSpellingSuggestions' - only this many spelling errors will have suggestions for performance reasons (optional,
                          affects Hunspell-based languages only)
'maxCheckThreads' - maximum number of threads working in parallel (optional)
'cacheSize' - size of internal cache in number of sentences (optional, default: 0)
'cacheTTLSeconds' - how many seconds sentences are kept in cache (optional, default: 300 if 'cacheSize' is set)
'requestLimit' - maximum number of requests per requestLimitPeriodInSeconds (optional)
'requestLimitInBytes' - maximum aggregated size of requests per requestLimitPeriodInSeconds (optional)
'timeoutRequestLimit' - maximum number of timeout request (optional)
'requestLimitPeriodInSeconds' - time period to which requestLimit and timeoutRequestLimit applies (optional)
'languageModel' - a directory with '1grams', '2grams', '3grams' sub directories which contain a Lucene index
                  each with ngram occurrence counts; activates the confusion rule if supported (optional)
'word2vecModel' - a directory with word2vec data (optional), see
https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/CHANGES.md#word2vec
'fasttextModel' - a model file for better language detection (optional), see
                  https://fasttext.cc/docs/en/language-identification.html
'fasttextBinary' - compiled fasttext executable for language detection (optional), see
                  https://fasttext.cc/docs/en/support.html
'maxWorkQueueSize' - reject request if request queue gets larger than this (optional)
'rulesFile' - a file containing rules configuration, such as .languagetool.cfg (optional)
'warmUp' - set to 'true' to warm up server at start, i.e. run a short check with all languages (optional)
'blockedReferrers' - a comma-separated list of HTTP referrers (and 'Origin' headers) that are blocked and will not be served (optional)
'premiumOnly' - activate only the premium rules (optional)
'disabledRuleIds' - a comma-separated list of rule ids that are turned off for this server (optional)
'pipelineCaching' - set to 'true' to enable caching of internal pipelines to improve performance
'maxPipelinePoolSize' - cache size if 'pipelineCaching' is set
'pipelineExpireTimeInSeconds' - time after which pipeline cache items expire
'pipelinePrewarming' - set to 'true' to fill pipeline cache on start (can slow down start a lot)

Installation

To install via pip:

$ pip install --upgrade language_tool_python

Customizing Download URL or Path

To overwrite the host part of URL that is used to download LanguageTool-{version}.zip:

$ export LTP_DOWNLOAD_HOST=[alternate URL]

This can be used to downgrade to an older version, for example, or to download from a mirror.

And to choose the specific folder to download the server to:

$ export LTP_PATH=/path/to/save/language/tool

The default download path is ~/.cache/language_tool_python/. The LanguageTool server is about 200 MB, so take that into account when choosing your download folder. (Or, if you can’t spare the disk space, use a remote URL!)

Prerequisites

Python 3.6+
LanguageTool (Java 8.0 or higher)

The installation process should take care of downloading LanguageTool (it may
take a few minutes). Otherwise, you can manually download
LanguageTool-stable.zip and unzip it
into where the language_tool_python package resides.

LanguageTool Version

As of April 2020, language_tool_python was forked from language-check and no longer supports LanguageTool versions lower than 4.0.

Acknowledgements

This is a fork of https://github.com/myint/language-check/ that produces more easily parsable
results from the command-line.

Source

In the following tutorial, we will discuss a Python package called LanguageTool and understand how to create a simple grammar and spell checker using the Python programming language.

So, let’s get started.

Understanding the LanguageTool library in Python

LanguageTool is an open-source tool used for grammar and spell-checking purposes, and it is also known as the spellchecker for OpenOffice. This package allows programmers to detect grammatical and spelling mistakes through a Python code snippet or a Command-line Interface (CLI).

How to Install the LanguageTool library?

To install the Python library, we need ‘pip’, the package manager used to install modules from trusted public repositories. Once we have ‘pip’, we can install the LanguageTool library by running the following command from a Windows command prompt (CMD) or a terminal:

Syntax:

$ pip install language_tool_python

By default, the language_tool_python library downloads a LanguageTool server as a JAR file and runs it in the background to detect grammatical errors locally. LanguageTool also provides a Public HTTP Proofreading API that is supported, although the number of calls it accepts is limited.

Verifying the Installation

Once the library is installed, we can verify it by creating an empty Python program file and writing an import statement as follows:

File: verify.py

import language_tool_python

Now, save the above file and execute it using the following command in a terminal:

Syntax:

$ python verify.py

If the above Python program file does not return any error, the library is installed properly. If an exception is raised, try reinstalling the library and refer to the official documentation of the module.

Working with the Python LanguageTool library

In the following section, we will understand the working of the LanguageTool library in Python using a practical example. The following Python script demonstrates the detection of grammatical mistakes and correcting them as well. We will work with the following text:

Text:

LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the ‘Check Text’ button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021

The above text contains some grammatical and spelling errors. Let us consider the following Python script to understand the working of the LanguageTool utility:

Example:

Output:

[Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['for'], 'offsetInContext': 43, 'context': "...Text' button. Click the colored phrases for for information on potential errors. or we ...", 'offset': 165, 'errorLength': 7, 'category': 'MISC', 'ruleIssueType': 'duplication', 'sentence': 'Click the colored phrases for for information on potential errors.'}),
 Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Or'], 'offsetInContext': 43, 'context': '...or for information on potential errors. or we can use this text too see an some of...', 'offset': 206, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'offsetInContext': 43, 'context': '...tential errors. or we can use this text too see an some of the issues that LanguageTool...', 'offset': 230, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of 'an' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'.', 'replacements': ['a'], 'offsetInContext': 43, 'context': '...errors. or we can use this text too see an some of the issues that LanguageTool ca...', 'offset': 238, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect', 'defect', 'deduct', 'deject'], 'offsetInContext': 43, 'context': '...ome of the issues that LanguageTool can dedect. Whot do someone thinks of grammar chec...', 'offset': 282, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Who', 'What', 'Shot', 'Whom', 'Hot', 'WHO', 'Whet', 'Whit', 'Whoa', 'Whop', 'WHT', 'Wot', 'W hot'], 'offsetInContext': 43, 'context': '...he issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? ...', 'offset': 290, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Whot do someone thinks of grammar checkers?'}),
 Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'offsetInContext': 43, 'context': '...eone thinks of grammar checkers? Please not that they are not perfect. Style proble...', 'offset': 341, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Please not that they are not perfect.'}),
 Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'offsetInContext': 43, 'context': '...yle problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 Nov...', 'offset': 414, 'errorLength': 19, 'category': 'REDUNDANCY', 'ruleIssueType': 'style', 'sentence': 'Style problems get a blue marker: It is 7 P.M. in the evening.'})]

Explanation:

In the above snippet of code, we have imported the required library and defined a tool that uses the LanguageTool utility to check the grammar and spelling errors in the text. We have then defined a string variable that stores the text passage we wanted to check. Finally, we have retrieved the matches using the check() method and printed them for the users.

As a result, we can observe that each Match carries detailed fields such as ruleId, message, replacements, offsetInContext, context, offset, and a lot more. We can find a detailed explanation of every rule ID in the LanguageTool Community.

Since we have detected the mistakes, it is time for us to correct them. Let us consider the following Python script demonstrating the same:

Example:

Output:

LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for information on potential errors. Or we can use this text to see a some of the issues that LanguageTool can detect. Who do someone thinks of grammar checkers? Please note that they are not perfect. Style problems get a blue marker: It is 7 P.M.. The weather was nice on Monday, 22 November 2021

Explanation:

We have included some new variables to address mistakes, corrections, starting positions, and ending positions in the above snippet of code. We have then used the for-loop to iterate through the rules in my_matches and replace the mistakes with their corrections. We have then stored these corrected texts in a list. At last, we have again used the for-loop to iterate through the string elements in the list, join them together, and print the resulting text for the users.

Hence, we have successfully corrected the mistakes that we found in the previous snippet of code.
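The correction loop described above can be sketched as follows. This is a minimal illustration rather than the tutorial's own listing: the function name apply_first_suggestions is hypothetical, and the SimpleNamespace objects are stand-ins that carry only the offset, errorLength, and replacements fields seen in the Match output earlier, so the example runs without a LanguageTool server.

```python
from types import SimpleNamespace


def apply_first_suggestions(text, matches):
    """Replace each flagged span with its first suggested replacement."""
    pieces = []
    cursor = 0
    for match in sorted(matches, key=lambda m: m.offset):
        if not match.replacements or match.offset < cursor:
            # Nothing to apply, or the span overlaps one already replaced.
            continue
        pieces.append(text[cursor:match.offset])  # untouched text
        pieces.append(match.replacements[0])      # first suggestion
        cursor = match.offset + match.errorLength
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hand-made stand-ins for Match objects (illustrative values only).
my_text = "A quick broun fox jumpps"
my_matches = [
    SimpleNamespace(offset=8, errorLength=5, replacements=["brown"]),
    SimpleNamespace(offset=18, errorLength=6, replacements=["jumps"]),
]
print(apply_first_suggestions(my_text, my_matches))
```

Taking the first suggestion blindly is a design choice of this sketch; as the output above shows, it can produce results such as 'a some' or 'P.M..', so the corrected text is still worth reviewing.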

Now, let us observe the mistakes that we captured earlier along with their respective corrections using the following Python script:

Example:

Output:

[('for for', 'for'), ('or', 'Or'), ('too see', 'to see'), ('an', 'a'), ('dedect', 'detect'), ('Whot', 'Who'), ('not', 'note'), ('P.M. in the evening', 'P.M.')]

Explanation:

In the above snippet of code, we have printed the list of the mistakes in the Text with their respective corrections.
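A list of this shape can be built by slicing the text with each match's offset and errorLength. The sketch below assumes those field names from the Match output shown earlier and uses a hand-made stand-in object instead of a real Match, so it runs without LanguageTool; the function name mistakes_with_corrections is hypothetical.

```python
from types import SimpleNamespace


def mistakes_with_corrections(text, matches):
    """Pair each flagged span with its first suggested replacement."""
    return [
        (text[m.offset:m.offset + m.errorLength], m.replacements[0])
        for m in matches
        if m.replacements  # skip matches that offer no suggestion
    ]


sample_text = "Please not that they are not perfect."
sample_matches = [
    SimpleNamespace(offset=7, errorLength=3, replacements=["note"]),
]
print(mistakes_with_corrections(sample_text, sample_matches))
```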

Applying Suggestions to the Text automatically

Let us consider a simple example demonstrating how we can apply suggestions automatically to the Text using the LanguageTool library in Python.

Example:

Output:

Original Text: A quick broun fox jumpps over a a little lazy dog.
Text after correction: A quick brown fox jumps over a little lazy dog.

Explanation:

In the above snippet of code, we have imported the required library and defined the tool for LanguageTool, specifying the language as US English. We have then defined a string variable and stored some text in it. Finally, we have used the correct() function of the tool to automatically correct the mistakes in the text and printed the resulting text for the users.

Source

Hello

In this tutorial, I will show you how to check the language used in a string.
To do that, we need to work with the Googletrans library.

Googletrans is an unofficial Python library that exposes Google Translate features such as translating and detecting languages. In our case, we’ll use the detect() method.

Let’s get started

Installing Googletrans

Install via pip:


pip install googletrans==3.1.0a0

How to use the detect() method

The detect() method returns the language of the text and the confidence.

Let me show you how to use it.


from googletrans import Translator

detector = Translator()

dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')

print(dec_lan)

Output


Detected(lang=ko, confidence=1.0)

As you can see, the method detected the language code ko (Korean) with a confidence of 1.0.

The confidence is a value between 0.0 and 1.0.

Print the language:


print(dec_lan.lang)

Output:

ko

Print the confidence:


print(dec_lan.confidence)

Output:

1.0

Detect multiple strings:


sentences = ["I see cats", "bounjour mon chat"]
dec_lan = detector.detect(sentences)

for dec in dec_lan:
    print(dec)

Output:


Detected(lang=en, confidence=1.0)
Detected(lang=fr, confidence=1.0)

If you want to show the full language name, follow these steps:

First, define a dictionary that maps language codes to language names:


LANGUAGES = {
    'af': 'afrikaans',
    'sq': 'albanian',
    'am': 'amharic',
    'ar': 'arabic',
    'hy': 'armenian',
    'az': 'azerbaijani',
    'eu': 'basque',
    'be': 'belarusian',
    'bn': 'bengali',
    'bs': 'bosnian',
    'bg': 'bulgarian',
    'ca': 'catalan',
    'ceb': 'cebuano',
    'ny': 'chichewa',
    'zh-cn': 'chinese (simplified)',
    'zh-tw': 'chinese (traditional)',
    'co': 'corsican',
    'hr': 'croatian',
    'cs': 'czech',
    'da': 'danish',
    'nl': 'dutch',
    'en': 'english',
    'eo': 'esperanto',
    'et': 'estonian',
    'tl': 'filipino',
    'fi': 'finnish',
    'fr': 'french',
    'fy': 'frisian',
    'gl': 'galician',
    'ka': 'georgian',
    'de': 'german',
    'el': 'greek',
    'gu': 'gujarati',
    'ht': 'haitian creole',
    'ha': 'hausa',
    'haw': 'hawaiian',
    'iw': 'hebrew',
    'he': 'hebrew',
    'hi': 'hindi',
    'hmn': 'hmong',
    'hu': 'hungarian',
    'is': 'icelandic',
    'ig': 'igbo',
    'id': 'indonesian',
    'ga': 'irish',
    'it': 'italian',
    'ja': 'japanese',
    'jw': 'javanese',
    'kn': 'kannada',
    'kk': 'kazakh',
    'km': 'khmer',
    'ko': 'korean',
    'ku': 'kurdish (kurmanji)',
    'ky': 'kyrgyz',
    'lo': 'lao',
    'la': 'latin',
    'lv': 'latvian',
    'lt': 'lithuanian',
    'lb': 'luxembourgish',
    'mk': 'macedonian',
    'mg': 'malagasy',
    'ms': 'malay',
    'ml': 'malayalam',
    'mt': 'maltese',
    'mi': 'maori',
    'mr': 'marathi',
    'mn': 'mongolian',
    'my': 'myanmar (burmese)',
    'ne': 'nepali',
    'no': 'norwegian',
    'or': 'odia',
    'ps': 'pashto',
    'fa': 'persian',
    'pl': 'polish',
    'pt': 'portuguese',
    'pa': 'punjabi',
    'ro': 'romanian',
    'ru': 'russian',
    'sm': 'samoan',
    'gd': 'scots gaelic',
    'sr': 'serbian',
    'st': 'sesotho',
    'sn': 'shona',
    'sd': 'sindhi',
    'si': 'sinhala',
    'sk': 'slovak',
    'sl': 'slovenian',
    'so': 'somali',
    'es': 'spanish',
    'su': 'sundanese',
    'sw': 'swahili',
    'sv': 'swedish',
    'tg': 'tajik',
    'ta': 'tamil',
    'te': 'telugu',
    'th': 'thai',
    'tr': 'turkish',
    'uk': 'ukrainian',
    'ur': 'urdu',
    'ug': 'uyghur',
    'uz': 'uzbek',
    'vi': 'vietnamese',
    'cy': 'welsh',
    'xh': 'xhosa',
    'yi': 'yiddish',
    'yo': 'yoruba',
    'zu': 'zulu',
}

Now let’s detect the language of the string and print the full name of the language.


detector = Translator()

dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')

print(LANGUAGES[dec_lan.lang])

Output:


korean

Check if a string is English


dec_lan = detector.detect('Googletrans is a free and unlimited python library that implemented Google Translate API')

if dec_lan.lang == "en" and dec_lan.confidence == 1:
    print('Yes! it is')
else:
    print('No! it is not')

Output


Yes! it is
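Requiring the confidence to be exactly 1 is strict: a string detected as English with a slightly lower confidence would be rejected. Here is a minimal sketch of a threshold-based check instead. The function name looks_english and the min_confidence default are hypothetical, and the SimpleNamespace objects stand in for the Detected result so the example runs without network access.

```python
from types import SimpleNamespace


def looks_english(detected, min_confidence=0.9):
    """Return True if the detection result is English with enough confidence.

    `detected` is assumed to expose `lang` and `confidence` attributes,
    like the object returned by detect() above.
    """
    confidence = detected.confidence
    if confidence is None:
        # No confidence reported: treat the result as not confident enough.
        return False
    return detected.lang == "en" and confidence >= min_confidence


# Stand-ins for detection results (illustrative values only).
print(looks_english(SimpleNamespace(lang="en", confidence=0.97)))
print(looks_english(SimpleNamespace(lang="fr", confidence=1.0)))
print(looks_english(SimpleNamespace(lang="en", confidence=0.4)))
```

With Googletrans, we would pass the result of detector.detect(...) instead of the stand-in objects.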

Source
