
Welcome to SoundsLike

SoundsLike is a Python package that helps find words that sound like other words.

Developed by Tal Zaken



What it does:

SoundsLike provides various functions that generate lists of similar-sounding words for a given search term. This general-purpose tool can be useful for matching similar strings of English-language content.

Who it’s for:

SoundsLike is for me. I’m interested in using it to deal with messy names, misspelled words, and bad transcriptions. I think it can be especially useful for resolving mismatches at the interface of typed text and spoken language. Some example applications include:

  • Telephone Customer Service
  • Immigration Research
  • Database Entity Resolution

That said, it’s mostly just a project to help guide my own learning journey. If it’s useful for you too, that’s even better!

Some potential uses:

  • Finding alternate spellings of words.
  • Handling mispronunciations and/or transcription errors in search functions.
  • A songwriting or poem-writing aid.

How to install it:

pip install SoundsLike

Contents:

  • SoundsLike.py
  • DictionaryTools.py
  • FuzzyTerm.py
  • Example.py

Simple usage:

Perfect Homophones:

Example 1

from SoundsLike.SoundsLike import Search

Search.perfectHomophones('Jonathan')

['Johnathan', 'Johnathon', 'Jonathan', 'Jonathon', 'Jonothan']

Close Homophones:

Example 1

Search.perfectHomophones('Lucy')

['Lucey', 'Lucie', 'Lucy', 'Luisi']

Search.closeHomophones('Lucy')

['Lucey', 'Lucie', 'Lucy', 'Luisi']

Example 2

 Search.perfectHomophones('Lou C')

[]

 Search.closeHomophones('Lou C')

['Lucey', 'Lucie', 'Lucy', 'Luisi']

Other homophone and rhyming patterns are available in SoundsLike.py. Explore them using the help() function in your interactive interpreter.

Examples include:

  • Vowel-class Homophones: Vowel phones are reduced to their ARPAbet classification.
  • Phone-class Homophones: All phones are reduced to their ARPAbet classification.
  • End-rhymes: Traditional rhyming. Takes optional arguments to find end-rhymes with same syllabic length and/or same first initial.

Full documentation:

Coming eventually!

For detailed instructions, try running help(SoundsLike) in your interactive Python interpreter.
You can also run help() on any of the individual modules contained in SoundsLike, though you may need to import them individually first. Keep in mind that the package and its primary module are both called SoundsLike, so make sure you specify the correct one.

SoundsLike uses the CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
It also offers some tools for working with dictionaries, if you prefer to use your own.
Phoneme generation, when enabled, is provided by g2p-en: https://github.com/Kyubyong/g2p
Similar string matching is provided by difflib: https://docs.python.org/3/library/difflib.html
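The difflib matching mentioned above is easy to try directly from the standard library. This is a generic illustration, not SoundsLike’s internal call; the candidate list is invented, whereas SoundsLike draws candidates from the CMU Pronouncing Dictionary:

```python
from difflib import get_close_matches

# Invented candidate pool for illustration only.
candidates = ["Lucey", "Lucie", "Lucas", "Lindsey", "Lucy"]

# Return up to 3 candidates ranked by SequenceMatcher similarity,
# keeping only those at or above the cutoff ratio.
matches = get_close_matches("Lucy", candidates, n=3, cutoff=0.6)
print(matches[0])  # 'Lucy' is an exact match, so it ranks first
```

Raising or lowering `cutoff` trades recall against precision in the same way SoundsLike’s looser homophone patterns do.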

Credits:

  • The CMU Pronouncing Dictionary
  • cmudict python wrapper by David L. Day
  • g2p-en python module by Kyubyong Park

Dependencies:

  • cmudict
  • g2p-en
  • json
  • re

Notes:

  • While this module supports multi-token search terms, it always reduces them to one group of phones. This can lead to some unexpected, but still useful, results. As a result, multi-token results are not supported at this time.
  • Support is not presently offered for multiple pronunciations of a given token.
  • English Language CMU Dict can be swapped out for any other pronunciation dict by uncommenting and setting the DictionaryFilepath to point at a JSON file. This could be useful if one wished to build and use a custom dictionary.

Ideas:

  • Provide option to import CMUdict (or any other dict) from a JSON, so that functions can reference it directly (rather than it being imported anew each time a function is called).
  • Create match pattern for same first and last syllable, and same number of syllables.
  • Add multi-token results. Check each token in multi-token search terms, and concatenate all possible results if all tokens are found. e.g.: "Lee Ann" could return "Leigh Anne," "Lea An," "Lianne," etc.
  • Develop module to figure out "smart selection" results for display.
  • Dramatically speed up subsequent searches by front-loading rhyme-pattern generation and hashing the results.
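The multi-token idea in that list could be sketched roughly as follows. Everything here is hypothetical: `homophones_for` stands in for a per-token lookup such as `Search.closeHomophones`, and the toy dictionary exists only to make the sketch self-contained.

```python
from itertools import product

def homophones_for(token):
    # Hypothetical per-token lookup; a real version would call
    # something like Search.closeHomophones(token).
    toy_dict = {
        "lee": ["Lee", "Leigh", "Lea"],
        "ann": ["Ann", "Anne", "An"],
    }
    return toy_dict.get(token.lower(), [])

def multi_token_homophones(term):
    """Concatenate per-token results only if every token is found."""
    per_token = [homophones_for(t) for t in term.split()]
    if not all(per_token):  # any token with no matches kills the search
        return []
    return [" ".join(combo) for combo in product(*per_token)]

print(len(multi_token_homophones("Lee Ann")))  # 3 x 3 = 9 combinations
```

A real implementation would also need to decide how to rank the combined results, since the cross-product grows quickly with more tokens.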

License:

Licensed under the Apache License, Version 2.0

Enjoy!


In MySQL, you can use the SOUNDS LIKE operator to return results that sound like a given word.

This operator works best on strings in the English language (using it with other languages may return unreliable results).

Syntax

The syntax goes like this:

expr1 SOUNDS LIKE expr2

Where expr1 and expr2 are the input strings being compared.

This operator is the equivalent of doing the following:

SOUNDEX(expr1) = SOUNDEX(expr2)

Example 1 – Basic Usage

Here’s an example of how to use this operator in a SELECT statement:

SELECT 'Damn' SOUNDS LIKE 'Dam';

Result:

+--------------------------+
| 'Damn' SOUNDS LIKE 'Dam' |
+--------------------------+
|                        1 |
+--------------------------+

In this case, the return value is 1 which means that the two input strings sound alike.

Here’s what happens if the input strings don’t sound alike:

SELECT 'Damn' SOUNDS LIKE 'Cat';

Result:

+--------------------------+
| 'Damn' SOUNDS LIKE 'Cat' |
+--------------------------+
|                        0 |
+--------------------------+

Example 2 – Compared to SOUNDEX()

Here it is compared to SOUNDEX():

SELECT 
  'Damn' SOUNDS LIKE 'Dam' AS 'SOUNDS LIKE',
  SOUNDEX('Dam') = SOUNDEX('Damn') AS 'SOUNDEX()';

Result:

+-------------+-----------+
| SOUNDS LIKE | SOUNDEX() |
+-------------+-----------+
|           1 |         1 |
+-------------+-----------+

Example 3 – A Database Example

Here’s an example of how we can use this operator within a database query:

SELECT ArtistName FROM Artists
WHERE ArtistName SOUNDS LIKE 'Ay See Dee Ci';

Result:

+------------+
| ArtistName |
+------------+
| AC/DC      |
+------------+

And here it is using SOUNDEX():

SELECT ArtistName FROM Artists
WHERE SOUNDEX(ArtistName) = SOUNDEX('Ay See Dee Ci');

Result:

+------------+
| ArtistName |
+------------+
| AC/DC      |
+------------+

I have an algorithm that generates strings based on a list of input words. How do I separate only the strings that sound like English words? i.e. discard RDLO while keeping LORD.

EDIT: To clarify, they do not need to be actual words in the dictionary. They just need to sound like English. For example KEAL would be accepted.


asked Sep 18, 2008 at 12:20

Ozgur Ozcitak


You can build a Markov chain from a huge English text.

Afterwards you can feed words into the Markov chain and check how high the probability is that the word is English.

See here: http://en.wikipedia.org/wiki/Markov_chain

At the bottom of the page you can see the Markov text generator. What you want is exactly the reverse of it.

In a nutshell: the Markov chain stores, for each character, the probabilities of which character will follow. You can extend this idea to two or three characters if you have enough memory.
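A rough sketch of that reverse direction: train character-transition counts on an English wordlist, then score candidates by the average log-probability of their transitions. The tiny training list here is illustrative only; a real model needs a large corpus.

```python
import math
from collections import defaultdict

def train(words):
    """Count character-to-character transitions; ^ and $ mark word boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = ["^"] + list(w.lower()) + ["$"]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts

def score(word, counts, floor=1e-6):
    """Average log-probability of transitions; higher means more English-like."""
    chars = ["^"] + list(word.lower()) + ["$"]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        row = counts[a]
        n = sum(row.values())
        # Unseen transitions get a small floor probability instead of zero.
        p = row[b] / n if n else floor
        total += math.log(max(p, floor))
    return total / (len(chars) - 1)

# Illustrative training set; a real model would use a large English corpus.
model = train(["lord", "load", "road", "read", "lead", "dole", "role"])
print(score("lord", model) > score("rdlo", model))  # True
```

With enough training data, a single threshold on this score separates plausible strings from letter salad; bigram or trigram states sharpen it further at the cost of memory.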

answered Sep 18, 2008 at 12:23

Nils Pipenbrinck

An easy way is to use Bayesian filters (Python example from http://sebsauvage.net/python/snyppets/#bayesian):

from reverend.thomas import Bayes
guesser = Bayes()
guesser.train('french','La souris est rentrée dans son trou.')
guesser.train('english','my tailor is rich.')
guesser.train('french','Je ne sais pas si je viendrai demain.')
guesser.train('english','I do not plan to update my website soon.')

>>> print(guesser.guess('Jumping out of cliffs it not a good idea.'))
[('english', 0.99990000000000001), ('french', 9.9999999999988987e-005)]

>>> print(guesser.guess('Demain il fera très probablement chaud.'))
[('french', 0.99990000000000001), ('english', 9.9999999999988987e-005)]

answered Sep 18, 2008 at 12:42

e-satis

You could approach this by tokenizing a candidate string into bigrams (pairs of adjacent letters) and checking each bigram against a table of English bigram frequencies.

  • Simple: if any bigram is sufficiently low on the frequency table (or outright absent), reject the string as implausible. (String contains a "QZ" bigram? Reject!)
  • Less simple: calculate the overall plausibility of the whole string in terms of, say, a product of the frequencies of each bigram divided by the mean frequency of a valid English string of that length. This would allow you to both (a) accept a string with an odd low-frequency bigram among otherwise high-frequency bigrams, and (b) reject a string with several individual low-but-not-quite-below-the-threshold bigrams.

Either of those would require some tuning of the threshold(s), the second technique more so than the first.

Doing the same thing with trigrams would likely be more robust, though it will also likely lead to a somewhat stricter set of "valid" strings. Whether that’s a win or not depends on your application.

Bigram and trigram tables based on existing research corpora may be available for free or for purchase (I didn’t find any freely available, but have only done a cursory Google search so far), but you can calculate a bigram or trigram table yourself from any good-sized corpus of English text: just crank through each word as a token and tally up each bigram. You might handle this as a hash with a given bigram as the key and an incremented integer counter as the value.

English morphology and English phonetics are (famously!) less than isometric, so this technique might well generate strings that "look" English but present troublesome pronunciations. This is another argument for trigrams rather than bigrams: the weirdness produced by analysis of sounds that use several letters in sequence to produce a given phoneme will be reduced if the n-gram spans the whole sound. (Think "plough" or "tsunami", for example.)
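A minimal sketch of the "simple" variant above, under the stated assumption that a tiny word list can stand in for a real corpus:

```python
from collections import Counter

def bigram_table(corpus_words):
    """Tally letter bigrams across a corpus, as described above."""
    table = Counter()
    for word in corpus_words:
        w = word.lower()
        table.update(w[i:i + 2] for i in range(len(w) - 1))
    return table

def plausible(candidate, table, min_count=1):
    """Reject the candidate if any of its bigrams is absent or too rare."""
    w = candidate.lower()
    return all(table[w[i:i + 2]] >= min_count for i in range(len(w) - 1))

# Tiny illustrative corpus; real use needs a large English text.
table = bigram_table(["lord", "load", "road", "read", "lead", "order"])
print(plausible("lord", table))  # True
print(plausible("rdlo", table))  # False: "dl" never occurs in the corpus
```

Raising `min_count` (or switching to the frequency-product scoring described above) tightens the filter at the cost of rejecting rarer but legitimate letter sequences.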

answered Sep 18, 2008 at 18:31

Josh Millard

It’s quite easy to generate English-sounding words using a Markov chain. Going backwards is more of a challenge, however. What’s the acceptable margin of error for the results? You could always have a list of common letter pairs, triples, etc., and grade candidates based on that.

answered Sep 18, 2008 at 12:22

William Keller

You should research "pronounceable" password generators, since they’re trying to accomplish the same task.

A Perl solution would be Crypt::PassGen, which you can train with a dictionary (so you could train it on various languages if you need to). It walks through the dictionary and collects statistics on 1-, 2-, and 3-letter sequences, then builds new "words" based on relative frequencies.

answered Sep 18, 2008 at 12:44

Andrew Barnett

I’d be tempted to run the soundex algorithm over a dictionary of English words and cache the results, then soundex your candidate string and match against the cache.

Depending on performance requirements, you could work out a distance algorithm for soundex codes and accept strings within a certain tolerance.

Soundex is very easy to implement — see Wikipedia for a description of the algorithm.

An example implementation of what you want to do would be:

def soundex(name, length=4):
    # Soundex digit codes for letters A-Z; '0' marks vowels and ignored letters.
    digits = '01230120022455012623010202'
    sndx = ''
    fc = ''

    for c in name.upper():
        if c.isalpha():
            if not fc:
                fc = c
            d = digits[ord(c) - ord('A')]
            # Collapse consecutive letters that share the same code.
            if not sndx or d != sndx[-1]:
                sndx += d

    # Keep the first letter, drop vowel codes, and pad/truncate to the code length.
    sndx = fc + sndx[1:]
    sndx = sndx.replace('0', '')
    return (sndx + length * '0')[:length]

real_words = load_english_dictionary()
# A set gives constant-time membership checks.
soundex_cache = {soundex(word) for word in real_words}

if soundex(candidate) in soundex_cache:
    print("keep")
else:
    print("discard")

Obviously you’ll need to provide an implementation of load_english_dictionary.

EDIT: Your example of «KEAL» will be fine, since it has the same soundex code (K400) as «KEEL». You may need to log rejected words and manually verify them if you want to get an idea of failure rate.

answered Sep 18, 2008 at 12:30

Russ

Metaphone and Double Metaphone are similar to SOUNDEX, except they may be tuned more toward your goal than SOUNDEX. They’re designed to "hash" words based on their phonetic "sound", and are good at doing this for the English language (but not so much for other languages and proper names).

One thing to keep in mind with all three algorithms is that they’re extremely sensitive to the first letter of your word. For example, if you’re trying to figure out if KEAL is English-sounding, you won’t find a match to REAL because the initial letters are different.
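That first-letter sensitivity is easy to demonstrate with a minimal, self-contained Soundex sketch (a compressed version of the standard algorithm; edge cases such as H and W are not handled):

```python
def soundex(name, length=4):
    """Minimal American Soundex: keep the first letter, encode the rest as digits."""
    digits = "01230120022455012623010202"  # codes for A..Z; '0' = vowel/ignored
    name = name.upper()
    code = name[0]
    prev = digits[ord(name[0]) - ord("A")]
    for c in name[1:]:
        if not c.isalpha():
            continue
        d = digits[ord(c) - ord("A")]
        if d != "0" and d != prev:
            code += d
        prev = d
    return (code + "0" * length)[:length]

# Same digit pattern, different first letter, so the codes can never match.
print(soundex("KEAL"), soundex("REAL"))  # K400 R400
```

KEAL and KEEL both hash to K400, which is why KEAL passes an English-soundingness check, but REAL’s R400 can never match a K-initial word.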

answered Sep 18, 2008 at 12:53

Andrew Barnett

Do they have to be real English words, or just strings that look like they could be English words?

If they just need to look like possible English words you could do some statistical analysis on some real English texts and work out which combinations of letters occur frequently. Once you’ve done that you can throw out strings that are too improbable, although some of them may be real words.

Or you could just use a dictionary and reject words that aren’t in it (with some allowances for plurals and other variations).

answered Sep 18, 2008 at 12:25

Kevin ORourke

You could compare them to a dictionary (freely available on the internet), but that may be costly in terms of CPU usage. Other than that, I don’t know of any other programmatic way to do it.

answered Sep 18, 2008 at 12:22

Ryan Bigg

That sounds like quite an involved task! Off the top of my head, a consonant phoneme needs a vowel either before or after it. Determining what counts as a phoneme will be quite hard, though! You’ll probably need to write out a list of them manually. For example, "TR" is OK but not "TD", etc.

answered Sep 18, 2008 at 12:26

Iain

I would probably evaluate each word using a SOUNDEX algorithm against a database of English words. If you’re doing this on a SQL server, it should be pretty easy to set up a database containing a list of most English words (using a freely available dictionary), and MS SQL Server has SOUNDEX implemented as an available search algorithm.

Obviously you can implement this yourself if you want, in any language, but it might be quite a task.

This way you’d get an evaluation of how much each word sounds like an existing English word, if any, and you could set some limits on how low a score you’d be willing to accept. You’d probably want to consider how to combine results for multiple words, and you would probably tweak the acceptance limits based on testing.

answered Sep 18, 2008 at 12:32

ulrikj

I’d suggest a few simple rules plus standard pairs and triplets would work well.

For example, English-sounding words tend to follow the pattern of vowel-consonant-vowel, apart from some diphthongs and standard consonant pairs (e.g. th; ie and ei; oo; tr). With a system like that you should strip out almost all words that don’t sound like they could be English. On closer inspection you’d find that you will probably strip out a lot of words that do sound like English as well, but you can then start adding rules that allow for a wider range of words and "train" your algorithm manually.

You won’t remove all false negatives (e.g. I don’t think you could come up with a rule to include "rhythm" without explicitly coding in that rhythm is a word), but it will provide a method of filtering.

I’m also assuming that you want strings that could be English words (they sound reasonable when pronounced) rather than strings that are definitely words with an English meaning.
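A starting point for such hand-written rules might look like this; the allowed-pairs set and the vowel treatment (counting "y" as a vowel) are illustrative guesses, not a complete phonotactics of English:

```python
import re

# Illustrative allowed consonant pairs; a real rule set would be much larger.
ALLOWED_PAIRS = {"th", "tr", "ch", "sh", "st", "pl", "br", "gr", "rd", "ld", "nd"}

def looks_pronounceable(word):
    w = word.lower()
    # Require at least one vowel (counting y).
    if not re.search(r"[aeiouy]", w):
        return False
    # Reject runs of three or more consonants.
    if re.search(r"[^aeiouy]{3,}", w):
        return False
    # Reject consonant pairs that are not on the allowed list.
    for m in re.finditer(r"[^aeiouy]{2}", w):
        if m.group() not in ALLOWED_PAIRS:
            return False
    return True

print(looks_pronounceable("keal"), looks_pronounceable("rdlo"))  # True False
```

As the answer notes, each rejected-but-legitimate word you find (a "false negative") becomes a new pair or rule to add, which is the manual training loop in miniature.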

answered Sep 18, 2008 at 12:35

workmad3

Homonyms are a bare bear, am I right?

You know the difference between “faze” and “phase”, but your fingers type “The process didn’t phase him,” and your brain hears “The process didn’t faze him.” While proofreading, copy-blindness fills in the proper word instead of seeing the error, so you don’t even realize you made the mistake until a helpful reader sends you an email about finding the typo in the published book.

Fortunately, MS Word has a tool that helps find homonyms. It’s in Find/Replace: “Sounds Like (English)”.

[Screenshot: Word’s Find and Replace dialog with the “Sounds Like (English)” checkbox]

Go to Home > Editing > Replace > More. Check the box for “Sounds Like (English)”. Enter a homonym you’re unsure of. For example “faze”. Click “Find Next” and Word will find any word that sounds like faze, including “phase”. You can double-check each instance and determine if you’ve used the correct word.

To make sure you find all the word forms, such as fazed, fazing or unfazed, check the box for “Find all word forms (English)”, too.

This also works for words with apostrophes, such as “it’s”. Mixing up “its” and “it’s” (or God help us, its) is probably the number one homonym mix-up across the board. Searching for them with this method is a bit tedious, but it’s a lot more reliable than trying to root them out during a proofread and much better than letting errors appear in the published book.

To find them type its into the Find field. Check the boxes for “Sounds like (English)” and “Ignore punctuation characters”. Use “Find Next” to search for each instance and determine if you used it correctly. This also works for possessives such as “the Smiths’ house” or “Smith’s house”.

EDITED TO UPDATE: It’s been pointed out that this option is not available in older versions of Word. If your Find/Replace task box doesn’t look like the one pictured above, then this isn’t going to work for you. Sorry.

Uncertain about homonyms? Here’s a terrific resource:

 Alan Cooper’s Homonym List

Happy hunting!

****************************

My goal for 2018 is to teach as many writers as possible how to efficiently and expertly use MS Word as a writing and self-publishing tool. Watch this blog-space for more tips, tricks and techniques. Or, if you’d prefer all the information in one package, including step-by-step instructions for formatting ebooks and print-on-demand editions, WORD for the Wise: Using Microsoft Office Word for Creative Writing and Self-publishing is available at Amazon as an ebook and in print.
