Make vocabulary word list

Making semi-automatic word lists to study a text in a foreign language

Introduction

When studying a foreign language, it is useful to learn vocabulary in
context: I used to make vocabulary lists to record unknown words for
each text I studied, so when I revised the vocabulary list I was able to
remember both the word and the context.

I designed a workflow and a series of scripts to build vocabulary lists
from raw texts in a semi-automatic way. Here is the workflow:

  • copy the text into Microsoft Word or LibreOffice Writer, read the text
    and underline unknown words,
  • extract the underlined words, with a script which will also get their lemmas
    (that is, the form under which they are recorded in the dictionary, for
    example cats -> cat); clean this list (remove unwanted lemmas, add
    some more, etc.),
  • extract dictionary definitions, again with a script, and clean them
    (choose between various meanings, remove or add examples, etc.),
  • transform the list to a .tex file, again with a script, and make a
    pdf out of it.

You get, for example, for Ancient Greek (2 pages are presented side by
side):

Complete example of the workflow

I will use Ancient Greek as an example, but the workflow can be adapted
to other languages (I have used it for Latin, English, German). You
will need some knowledge of the command line, but it should work on
Mac OS, Linux and Windows (with Cygwin).

The first step is to copy the text you want to study into Microsoft Word or
LibreOffice Writer, read it and underline the unknown words. Here is an
example from Xenophon’s Anabasis:

Save the text in the .odt format (even with Microsoft Word), because
this is an open source format that can be read by non-proprietary
programs. Let’s say you have saved it under the name
xenophon_anabasis.odt.
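
To illustrate why the .odt format is convenient here: an .odt file is a ZIP archive whose text lives in a content.xml member, so underlined words can be found with standard tools. The following is only a minimal sketch of that idea, not the code of extract_words.py. It assumes the underlining is stored as an automatic text style with a style:text-underline-style attribute other than "none", and the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument XML namespaces used in content.xml
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def underlined_words(content_xml):
    """Return the words found in spans whose text style is underlined."""
    root = ET.fromstring(content_xml)

    # Collect the names of the styles that switch underlining on.
    underlined = set()
    for style in root.iter(f"{{{STYLE}}}style"):
        props = style.find(f"{{{STYLE}}}text-properties")
        if props is None:
            continue
        kind = props.get(f"{{{STYLE}}}text-underline-style", "none")
        if kind != "none":
            underlined.add(style.get(f"{{{STYLE}}}name"))

    # Collect the text of every span that uses one of those styles.
    words = []
    for span in root.iter(f"{{{TEXT}}}span"):
        if span.get(f"{{{TEXT}}}style-name") in underlined:
            words.extend("".join(span.itertext()).split())
    return words


def underlined_words_in_odt(path):
    """Open the .odt archive and scan its content.xml member."""
    with zipfile.ZipFile(path) as odt:
        return underlined_words(odt.read("content.xml"))
```

Underlining applied through a named paragraph style, rather than directly on the selected words, would not be caught by this sketch.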

Run the following command to extract the words and get the lemmas (or
lemmata). Lemmas are looked up by requesting pages from the Perseus
website:

python3 extract_words.py xenophon_anabasis.odt | 
   perl getlemmata.pl  -i /dev/stdin -o xenophon_anabasis.list

You will get a xenophon_anabasis.list file, containing the lemmas.
Open it with a text editor (Gedit, Text Editor, Notepad, etc.):

# The lemmata found on Perseus are:
# for word βοιωτιος:
Βοιώτιος
# for word συνεγενετο:
συγγίγνομαι
# for word πρωτοις:
πρότερος
πρῶτος
πρωτός
# for word ηττασθαι:
ἡσσάομαι
# for word ευεργετων:
εὐεργέτης
εὐεργετέω
...

(Only the beginning of the file is shown.)

Based on your knowledge of the language and of the text, try to select
the correct lemma when there are several choices, because
one word form may be related to several lemmas (as in English: «axes»
may be the plural of «ax(e)» or of «axis»). Lines starting with a
hash sign (#) are ignored. So you may end up with something like:

# The lemmata found on Perseus are:
# for word βοιωτιος:
Βοιώτιος
# for word συνεγενετο:
συγγίγνομαι
# for word πρωτοις:
πρῶτος
# for word ηττασθαι:
#ἡσσάομαι
ἡττάω
# for word ευεργετων:
εὐεργέτης
...
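
The rule for this file (one lemma per line, lines starting with # ignored) can be sketched in a few lines of Python. This is an illustration of the file format only, not the code of simdic.pl; the function name read_lemmas is hypothetical.

```python
def read_lemmas(text):
    """Return the lemmas of a cleaned list, skipping blanks and # lines."""
    lemmas = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            lemmas.append(line)
    return lemmas


cleaned = """\
# for word πρωτοις:
πρῶτος
# for word ηττασθαι:
#ἡσσάομαι
ἡττάω
"""
print(read_lemmas(cleaned))  # ['πρῶτος', 'ἡττάω']
```

Commenting a lemma out with # (as with ἡσσάομαι above) therefore has the same effect as deleting it, while keeping a trace of what Perseus proposed.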

You may also add lemmas that have not been found, if you have looked up a
word in another dictionary. Save the file, for instance as
xenophon_anabasis.list-cleaned. Now run the simdic.pl script to get
the definitions from a dictionary you specify with the -d option:

perl simdic.pl -d /tmp/bailly.dic < xenophon_anabasis.list-cleaned \
   > xenophon_anabasis.voc

For each word in the cleaned list, dictionary definitions have now been
added in the resulting xenophon_anabasis.voc file:

Βοιώτιος α, ον   ::de Béotie, Béotien ; (en mauv. part) syn. de lourdaud
                 ::[Βοιωτός]
συγγίγνομαι      ::f. συγγενήσομαι, ao.2 συνεγενόμην, etc
                 ::1 naître avec
                 ::2 être ensemble, être avec, avoir des rapports, fréquenter [...]
                 ::3 venir à l’aide de, aider, ÷ dat
                 ::[σύν, γίγνομαι]
πρῶτος η, ον     ::premier
                 ::I. (adj.) 1 (dans l’espace) le plus en avant  [...]
                 ::2 (avec idée de rang et de nombre) le premier  [...]
                 ::II. (subst.) 1 τὸ πρῶτον, τὰ πρῶτα, le commencement
                 ::2 τὰ πρῶτα (ἆθλα) le premier prix ; τὰ πρῶτα λαμβάνειν  [...]
                 ::3 τὰ πρῶτα, le plus haut degré, le point le plus élevé  [...]
                 ::III. (adv.) 1 fém. πρώτη : τὴν πρώτην, la première fois,  [...]
                 ::2 neutre πρῶτον, πρῶτα et τὸ πρῶτον, τὰ πρῶτα,  [...]
                 ::[p. *πρόατος, Sp. de πρό ; cf. Cp. πρότερος]
ἡττάω            ::att. c. ἡσσάω
εὐεργέτης ου (ὁ) ::bienfaiteur, évergète
                 ::[εὖ, ἔργον]

(I’ve cut the end of the lines so as not to overload this guide.)

Open the file in your text editor and perform some cleanup: choose the
right definition (the one that is used in the original text) if there
are several, choose the example you want to include, etc. Add a title
and, optionally, the type greekfont to get a nice Greek font (defined
in the mklist_text.pl code). You may end up with something like this:

#title:Xenophon, Anabasis, II, 6, 16--20
#type:greekfont
Βοιώτιος α, ον                   ::de Béotie, Béotien
συγγίγνομαι                      ::fréquenter +dat
οἱ πρῶτοι                        ::les premiers (des citoyens), les plus nobles
ἡττάω                            ::être inférieur à (+gen) en (+acc ou dat)
εὐεργέτης ου (ὁ)                 ::bienfaiteur, évergète
σφόδρα                           ::très, tout à fait
ἔχω                              ::tenir pour, regarder comme
ἔνδηλος ος, ον                   ::clair, évident
αὖ                               ::d'un autre côté, au contraire, cependant
αἰδώς όος-οῦς (ἡ)                ::respect
ἐμποιέω                          ::faire naître, engendrer, produire
αἰσχύνομαι                       ::avoir honte devant, avoir du respect pour +acc
ἀπεχθάνομαι                      ::être haï, être odieux
...

Save the file under the name xenophon_anabasis.voc-final.
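
To make the layout of this file explicit, here is a rough Python sketch of how it could be read. It is based only on the examples shown in this guide, not on the code of mklist_text.pl: I assume that #key:value lines carry the metadata, that :: separates the headword from the definition, and that a line with nothing before :: continues the previous entry. The function name parse_voc is hypothetical.

```python
def parse_voc(text):
    """Split a .voc file into (metadata dict, list of (headword, definition))."""
    meta, entries = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Metadata line such as "#title:..." or "#type:greekfont"
            key, sep, value = line[1:].partition(":")
            if sep:
                meta[key.strip()] = value.strip()
            continue
        head, sep, definition = line.partition("::")
        if not sep:
            continue  # not an entry line
        head = head.strip()
        if head:
            entries.append((head, definition.strip()))
        elif entries:
            # Continuation line: append to the previous entry's definition
            prev_head, prev_def = entries[-1]
            entries[-1] = (prev_head, prev_def + "\n" + definition.strip())
    return meta, entries


sample = (
    "#title:Xenophon, Anabasis, II, 6, 16--20\n"
    "#type:greekfont\n"
    "σφόδρα                           ::très, tout à fait\n"
)
meta, entries = parse_voc(sample)
print(meta["title"])  # Xenophon, Anabasis, II, 6, 16--20
print(entries)        # [('σφόδρα', 'très, tout à fait')]
```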

Then use the mklist_text.pl script to create a .tex file:

perl mklist_text.pl xenophon_anabasis.voc-final

If latexmk is installed, the script will compile your .tex file
automatically and you will end up with a nice pdf file, like this:

The pdf file will be named according to the title of the vocabulary
list: xenophon_anabasis_ii_6_16-20.pdf in our example.
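
The exact naming rule lives in mklist_text.pl; the sketch below is only a guess that reproduces the single example given here (lower-case the title, turn the double hyphen into a single one, replace every other run of non-alphanumeric characters with an underscore). The function name pdf_name is hypothetical.

```python
import re


def pdf_name(title):
    """Guess the pdf file name produced for a given #title line."""
    slug = title.lower().replace("--", "-")
    # Any run of characters other than a-z, 0-9 and "-" becomes one "_"
    slug = re.sub(r"[^a-z0-9-]+", "_", slug).strip("_")
    return slug + ".pdf"


print(pdf_name("Xenophon, Anabasis, II, 6, 16--20"))
# xenophon_anabasis_ii_6_16-20.pdf
```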

The page size is A5.

You will find longer samples in Greek and Latin in the samples
directory.

In summary:

# underline words with Office or Writer, and save to .odt
python3 extract_words.py xenophon_anabasis.odt | 
   perl getlemmata.pl  -i /dev/stdin -o xenophon_anabasis.list
# clean the file xenophon_anabasis.list
perl simdic.pl -d /tmp/bailly.dic < xenophon_anabasis.list-cleaned \
   > xenophon_anabasis.voc
# clean the file xenophon_anabasis.voc
perl mklist_text.pl xenophon_anabasis.voc-final

The dictionary files

In order to get the definition with the simdic.pl script, you will
need to download and create a dictionary file. It is a simple text file
with the following format:

  • each entry is on one line, containing:
    • the entry word
    • a tab character
    • the complete entry, e.g. with the gender, genitive case, etc.
    • a tab character
    • the various definitions and examples, separated with the characters
      \n (not an actual new line, just the two characters).

This is also the file format used by the StarDict and GoldenDict programs.
Here is an example:

Α, α	Α, α	\nἄλφα (τὸ)\nindécl.\n\nalpha :\n	1e lettre de l’alphabet grec [...]
Αἰάκειον	Αἰάκειον ου (τὸ)	\n\nsanctuaire d’Éaque, à Égine\n\n[Αἴακος]
Αἰάντειος	Αἰάντειος ος, ον	\n\nd’Ajax\n\n[Αἴας]
Αἰήτας	Αἰήτας αο (ὁ)	\n\ndor. c. Αἰήτης
Αἰήτης	Αἰήτης ου (ὁ)	\n\nÆètès :\n	1 roi de Colchide\n	2 autres
Αἰακίδης	Αἰακίδης ου (ὁ)	\n\nfils, descendant d’Éaque ; οἱ Αἰακῖδαι les Éacides [...]
Αἰαντίδης 1	Αἰαντίδης 1 ου	\nadj. m.\n\nde la tribu Æantide\n\n[Αἰαντίς]
...
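
As an illustration of this format, here is a minimal Python sketch that splits one dictionary line into its three fields. It is not the code of simdic.pl. It assumes the fields are separated by tabs and that the definitions are separated by the two-character sequence backslash + n; the function name parse_entry is hypothetical.

```python
def parse_entry(line):
    """Split one dictionary line into (entry word, complete entry, definitions)."""
    # Split on the first two tabs only: the definitions may contain tabs too.
    key, headword, body = line.rstrip("\n").split("\t", 2)
    # "\\n" is the literal two-character sequence backslash + n.
    definitions = [part.strip() for part in body.split("\\n") if part.strip()]
    return key, headword, definitions


line = "Αἰάντειος\tΑἰάντειος ος, ον\t\\n\\nd’Ajax\\n\\n[Αἴας]"
print(parse_entry(line))
# ('Αἰάντειος', 'Αἰάντειος ος, ον', ['d’Ajax', '[Αἴας]'])
```

Limiting the split to two tabs matters because some definitions (see the first entry above) contain tabs of their own.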

You can download many dictionaries online. The Bailly Abrégé (the one
used in this example) can be downloaded from my
website.

Contact

Please send me any questions through my website: boberle.com.

Support materials fast

The ideal instructional resource for spelling, language arts, ESL, and vocabulary enrichment in any subject.
Quickly create individual or class sets of vocabulary worksheets to support your lessons
saving valuable preparation time and resources.

Designed for educators but ideal for anyone interested in making cloze tests, spelling exercises, word searches, crosswords, word jumbles, and other
vocabulary puzzles and activities.

Vocabulary Activities

Auto-Generated Activities

Activities are automatically generated from your word list, sentence collection, or text file. With a single click of the mouse, a completely
new and original activity is generated instantly.

See how a crossword activity is generated

Word List Activities

Word Activities

Create more than 25 word activities instantly from just one word-list. All it takes is a list of words with accompanying clues. Use one
of the many included lists or easily create your own.
Then generate new original word activities with a single click. Includes crosswords, word searches,
jumbles, mazes, decoding, spelling, and a whole lot more.

Generating Word List Activities

Text Activities

Create multiple text activities from a single text passage. Use one of the included text passages or create your own.
Each text activity is automatically generated from the provided text. Includes cloze tests, cryptograms,
spelling, punctuation practice, and more.

Generating Text Activities

Sentence Activities

Generate sentence activities from any collection of sentences. Create your own collections with the sentence collection editor or use
one of the built-in collections. Sentence activities include spelling, scrambles, matching, word shapes, and more.

Generating Sentence Activities

Cloze Tests

Cloze Tests for Reading Comprehension

One of the most popular activity generators in Vocabulary Worksheet Factory is the cloze generator. Take any passage of text
and instantly turn it into a cloze worksheet. Select the increment and minimum word length, and optionally add a word bank, hints, and distractor words.
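
As a rough illustration of what an increment and a minimum word length can mean for a cloze test, here is a short Python sketch. It is not how Vocabulary Worksheet Factory works internally. It assumes one plausible reading of the two settings: words shorter than the minimum length are never blanked, and every Nth remaining word is replaced by a gap. The function name make_cloze is hypothetical.

```python
import re


def make_cloze(text, increment=5, min_length=4):
    """Blank every `increment`-th word of at least `min_length` letters."""
    answers = []
    eligible = 0

    def blank(match):
        nonlocal eligible
        word = match.group(0)
        if len(word) < min_length:
            return word          # too short: never blanked
        eligible += 1
        if eligible % increment:
            return word          # not the Nth eligible word
        answers.append(word)     # remember the answer for a word bank
        return "_" * len(word)

    # [^\W\d_]+ matches runs of letters, including accented ones
    return re.sub(r"[^\W\d_]+", blank, text), answers


cloze, bank = make_cloze("The quick brown fox jumps over the lazy dog", 2, 4)
print(cloze)  # The quick _____ fox jumps ____ the lazy dog
print(bank)   # ['brown', 'over']
```

The returned list of answers is what a word bank would be built from.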

See how a cloze activity is created

I use the vocabulary worksheet software. I have subscribed to another software
package this year. I have just wasted US $30.00. No other software comes anywhere
near your products. Your stuff is easy to use and adaptable to Australian needs.
All the worksheets I make using Schoolhouse Technologies software look professional
and are easy for children to use. The one off payment gives me a solid product that
I can use almost daily.

Spelling Practice

Provide spelling practice and assessments in a variety of contexts. Turn any word list, text passage, or sentence collection into an instant spelling activity. Misspellings are automatically generated based on
common typo, phonetic, and other spelling errors.

Word Searches

Word Searches Plus

Why settle for just the common style of word search? Mix things up with these word search
variants: Word Angles, Wacky Trails, and Missing Vowels. With any of the word searches,
employ a word bank or clues or both. Hide words in up to eight directions. Even use start
bubbles to aid discovery.

Generating Word Search Alternatives

Word Bank

Word Banks with a Twist

Word banks can be added to almost every word list activity to aid in solving the activity and to provide self-correction. But
an extra fun challenge can be provided by turning the word list into its own jumble. Words can be reversed, split and rejoined in mixed order, or just completely
scrambled. A puzzle within a puzzle.

I have been using the Vocabulary
Worksheet Factory puzzles to teach math vocabulary and spelling. They are a big
hit with the kids. Spelling and word recognition has improved.
I assign them as homework and they return them completed dying to check the answer
keys. I have left them as review work when a substitute teacher has been needed.
I use them as a warm up exercise on Monday mornings…Keep up the good work.

Clues

Clues in Unexpected Places

We expect to find clues in crossword puzzles but not normally in word searches. So imagine a word search
that, instead of a word bank simply providing a list of the hidden words, has a set of clues that must first be solved.
Now imagine that option being available in 16 additional word puzzles from jumbles to decoding. A whole new level of challenge.

Vocabulary Activities

The Activity Generators

Engage and challenge your students with targeted vocabulary worksheet activities.
With a total of more than 45 activity generators and the many
activity configurations, limitless vocabulary worksheets are just a mouse click away.

Examine the activity generators

New in version 6

Experience new ways to engage and challenge your students with version 6 of Vocabulary Worksheet Factory.
This new version brings new activities, new options for existing activities, enhanced document layout,
improved dialogs for working with word lists, sentences, and text, and much more.

See what’s new in version 6

Select your edition

With five editions of Vocabulary Worksheet Factory from Free to Enterprise, there is an affordable vocabulary worksheet generator for everyone.

Designed for Windows 11, 10, 8, 7

  • single-user, free word search maker
  • single-user, 23 activity generators
  • single-user, 45 activity generators
  • all users site wide, 45 activity generators
  • site-wide + publishing, 45 activity generators

60-Day Money-Back Guarantee

If for any reason you are not satisfied with your Schoolhouse Technologies Software product in the
first 60 days after purchase, simply contact our customer service team and we will make it right.

Not a Subscription

You buy it, you own it. No monthly or yearly subscription costs. Of course, from time to time, we release
a new improved version that you just might want to pay a reduced upgrade cost to acquire. But it’s your choice.

No-Penalty Edition Upgrades

Changed your mind about which edition would best meet your needs after buying? Not a problem. You
simply pay the difference between editions when you upgrade.

Outstanding Support

One thing our customers agree on is that our support is exceptional. We are always there to help with
any issues you may encounter.

Free Word Search Generator

Download the Word Search Edition of Vocabulary Worksheet Factory and get a free word search generator. Generate
word searches from any word list in seconds. Hide the words in up to eight directions.
Jumble the words in the word bank to add a degree of difficulty.
Or provide a greater challenge by using clues in place of the word bank. Includes
evaluation of the Pro Edition.

Last updated: December 6, 2022

One of the largest English dictionaries has more than 21,000 pages.

Here’s something even more impressive: someone actually attempted to read it from start to finish in one year.

Don’t worry, though: you don’t need to do all that to master English.

Master the most common 3,000 words, and you’ll pick up 90% of what you’re hearing and reading.

Bump that up to around 10,000, and you’re considered fluent.

In this post, we’ve put together all of our best English vocabulary lists.

Travel English? Business English? Slang words? We’ve got them all here!  

Contents

  • Core English Vocabulary
    • Common English words
    • Important specific words
    • Easily confused words
    • Time, day and months vocab
    • Friends, Family and home
    • Romance and love vocabulary
    • Travel and survival English
    • Food, drink and eating out
    • Hobbies
    • Nature-related words
  • Advanced English Vocabulary
    • Difficult English words
    • Business and professional English
    • Word Parts and Components
  • English Slang
    • Regional English Slang
      • American English
      • Australian English
      • British English
      • New Zealand English
  • More Fun English Vocabulary
    • Holidays in English
    • Miscellaneous fun English vocabulary


Core English Vocabulary

Use English pretty often, and you’ll notice that the same words keep popping up over and over.

In this section, we’ll tackle the core English vocabulary that you need to know, from articles such as a and the to friendly greetings and ordering from restaurants like a local.

This is the practical type of English that’s meant for your day-to-day life—whether you’re chatting with friends, traveling or about to go on a date!

Common English words

Important specific words

Easily confused words

Time, day and months vocab

Friends, Family and home

Romance and love vocabulary

Travel and survival English

Food, drink and eating out

Hobbies

Nature-related words

Advanced English Vocabulary

Already feel confident with basic English but you want to expand your vocabulary? Then you might be ready to move on to more advanced English!

You can delve into widely known but more complicated words like illusion and runners-up. Or maybe you’d want to find out all about common word roots (they’ll boost your comprehension right away!). There’s also the weird but wonderful world of homophones, where two words sound alike but have different meanings.

Deepen your understanding of English with these blog posts:

Difficult English words

https://www.fluentu.com/blog/english/difficult-english-words/

https://www.fluentu.com/blog/english/english-hard-words/

Business and professional English

Word Parts and Components

English Slang

Once you’ve got the foundations of English down, one way to sound even more natural is to learn slang. If you read through any English-language social media website (Twitter, YouTube, Facebook or Reddit, for example), you’ll see lots of slang:

Don’t be such a couch potato.

TBH, I haven’t seen that meme yet.

This summer, I’m going to YOLO.

Slang tends to pop up in informal or casual conversations as well as online. Different English-speaking countries can also have different slang!

Regional English Slang

American English

Australian English

British English

New Zealand English

More Fun English Vocabulary

When it comes to English vocabulary, you’ll keep finding fascinating words.

For one, there are words for special occasions. Some of the most prominent English-speaking holidays are Valentine’s Day, Easter, Thanksgiving and Christmas, and they each have their own unique vocabulary.

The English language also has tons of interesting niches you can look into. Broaden your tech speak in English with words like “download” and “screenshot,” or get trendy with some of the newest words in the language!

Make your English more colorful with these guides:

Holidays in English

Miscellaneous fun English vocabulary

Constantly learning vocabulary is a key part of becoming fluent.

It’s fascinating to see how communicating in English becomes easier as you pick up more and more words!

With this master sheet of resources, you can grow your vocabulary—from building a foundation with the most basic words to expressing yourself like a native.

How to use VocabGrabber:

  1. Copy text from any document
  2. Paste the copied text into the box
  3. Grab your vocabulary words!

How does VocabGrabber work?

VocabGrabber analyzes any text you’re interested in, generating lists of the most useful vocabulary words and showing you how those words are used in context. Just copy text from a document and paste it into the box, and then click on the «Grab Vocabulary!» button. VocabGrabber will automatically create a list of vocabulary from your text, which you can then sort, filter, and save.

Select any word on the list and you’ll see a snapshot of the Visual Thesaurus map and definitions for that word, along with examples of the word in your text. Click on the word map or the highlighted word in the example to see the Visual Thesaurus in action.

Want to try it out? Click on one of our sample texts to fill the box and start grabbing!

How can I view my vocabulary list?

After you grab the vocabulary from a text, you will see a list of words and phrases in «tag cloud» view. In the default view, words in the vocab list are arranged by relevance (more on that below!). In the tag cloud, words that appear most frequently in the text are displayed in a larger font size. The color of the words is based on whether they match one of our seven subject areas (Arts & Literature, Geography, Math, People, Science, Social Studies, Vocabulary).

You can also choose «list» view, which will give you the vocab list in a table, with columns displaying each word’s subject areas, relevance score, and number of occurrences in the text. Or you can select «gallery» view, displaying a thumbnail image of each word’s map in the Visual Thesaurus.

How can I sort my vocabulary list?

Above the word list you’ll see four different sorting options: Relevance, A-Z, Occurrences, and Familiarity. By default the words are arranged from most relevant to least relevant. The other options allow you to sort your list alphabetically, by number of occurrences in the text, or by how familiar the words are in written English overall. To reverse any of these orderings, just click on the name of the option again to toggle the list order.

How can I filter my list by subject?

Your list will initially have the «Show All Words» box checked. But if you want to focus on vocabulary in one or more particular subjects, just click the appropriate box or boxes. The number in parentheses next to the subject name indicates how many words in the text match the subject.

Subjects include academic areas of interest (Arts & Literature, Science, and Social Studies), names of historical figures and places (People and Geography), and words that are of particular importance for language learners at all levels (Vocabulary).

How can I filter my list by relevance? (And what is relevance, anyway?)

All the words in your vocab list are ranked with a relevance score of 1 to 5, with 5 being the most relevant to the text. We calculate relevance by comparing how frequently words are used in the text versus how they are used in written English overall. That allows us to zero in on which words are most significant for the average reader.

By default, the vocab list displays words with relevance 2 through 5, leaving off the words that score only 1 and are therefore least significant. But you can choose any combination of scores by clicking on the bars under «Show Relevance.»
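
The idea of comparing a word's frequency in the text with its frequency in written English overall can be sketched as follows. VocabGrabber's actual formula is not given here, so this is only an illustration under stated assumptions: relevance is taken to be the logarithm of the ratio between the two frequencies, rescaled to the range 1 to 5 within the text. The function name, the background frequency table and the floor value for unseen words are all hypothetical.

```python
import math
from collections import Counter


def relevance_scores(text_words, background_freq, floor=1e-6):
    """Score each word from 1 to 5 by how over-represented it is in the text."""
    counts = Counter(word.lower() for word in text_words)
    if not counts:
        return {}
    total = sum(counts.values())

    # Log-ratio of the frequency in the text to the background frequency;
    # words missing from the background table get the `floor` frequency.
    ratios = {
        word: math.log((count / total) / background_freq.get(word, floor))
        for word, count in counts.items()
    }

    # Rescale linearly so the least relevant word gets 1 and the most gets 5.
    low, high = min(ratios.values()), max(ratios.values())
    span = (high - low) or 1.0
    return {word: 1 + round(4 * (r - low) / span) for word, r in ratios.items()}


background = {"the": 0.05, "photon": 1e-6}  # made-up frequencies
print(relevance_scores(["the", "the", "photon", "the"], background))
# {'the': 1, 'photon': 5}
```

A common word such as «the» scores low even though it occurs often in the text, because it is just as frequent everywhere else.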

How can I add VocabGrabber to my browser toolbar?

In the top right-hand corner, click on the button next to «Add VocabGrabber to your Toolbar.» Then follow the directions for your browser to install the VocabGrabber directly on your toolbar. Once installed, you’ll be able to use the VocabGrabber on any online text without having to copy and paste. Just click on the VocabGrabber «bookmarklet» and the VocabGrabber will immediately start grabbing the vocabulary from whatever page you’re reading in your browser.

How can I create a Visual Thesaurus word list from my vocabulary?

Individual subscribers to the Visual Thesaurus can generate word lists from VocabGrabber results. Subscribers can click a button that says «Create Word List,» which automatically selects whichever vocabulary words you have displayed based on your sorting and filtering options. You can then add a title to your word list and choose to include an example sentence of each word drawn from the text you’re analyzing. (If the word appears more than once in the text, you can pick which example sentence you want to use.) You can also customize the list by deselecting any words that you don’t want to appear. Then just click on «Save Word List» to add it to your collection of Visual Thesaurus word lists.
