Is the word “blackboard” longer than the word “window”?
This man is taller than that one.
Which building is the highest in London?
Today the weather is colder than it was yesterday.
January is the coldest month of the year.
My sister speaks English worse than I do.
Which is the hottest month of the year?
Your car is now better than it was last year.
Card 10
That is the most beautiful car in the shop.
Tim is the best pupil in his class.
That is the funniest story in the book.
Jane is the shortest of the five girls.
Mr. Baker is the worst driver in this town.
Our town is the youngest in our region.
My friend is the most curious girl in our school.
This museum is the oldest building in our town.
Card 11
Write sentences following the model: (funny) The parrot is funny, but the monkey is funnier.
(big) This bear is big, but that bear is bigger.
(fat) This cat is fat, but that cat is fatter.
(busy) This monkey is busy, but that monkey is busier.
(dirty) This pig is dirty, but that pig is dirtier.
(lazy) This pupil is lazy, but that pupil is lazier.
(old) This church is old, but that church is older.
(good) This computer is good, but that computer is better.
(small) This apple is small, but that apple is smaller.
From Wikipedia, the free encyclopedia
The identity of the longest word in English depends on the definition of a word and of length.
Words may be derived naturally from the language’s roots or formed by coinage and construction. Additionally, comparisons are complicated because place names may be considered words, technical terms may be arbitrarily long, and the addition of suffixes and prefixes may extend the length of words to create grammatically correct but unused or novel words.
The length of a word may also be understood in multiple ways. Most commonly, length is based on orthography (conventional spelling rules) and counting the number of written letters. Alternate, but less common, approaches include phonology (the spoken language) and the number of phonemes (sounds).
Word | Letters | Meaning | Claim | Dispute |
---|---|---|---|---|
methionylthreonylthreonylglutaminylalanyl…isoleucine | 189,819 | The chemical composition of titin, the largest known protein | Longest known word overall by magnitudes. Attempts to say the entire word have taken two[1] to three and a half hours.[2] | Technical; not in dictionary; whether this should actually be considered a word is disputed |
methionylglutaminylarginyltyrosylglutamyl…serine | 1,909 | The chemical name of E. coli TrpA (P0A877) | Longest published word[3] | Technical |
lopadotemachoselachogaleokranioleipsano…pterygon | 183 | A fictional dish of food | Longest word coined by a major author,[4] the longest word ever to appear in literature[5] | Contrived nonce word; not in dictionary; Ancient Greek transliteration |
pneumonoultramicroscopicsilicovolcanoconiosis | 45 | The disease silicosis | Longest word in a major dictionary[6] | Contrived coinage to make it the longest word; technical, but only mentioned and never actually used in communication |
supercalifragilisticexpialidocious | 34 | Unclear in source work, has been cited as a nonsense word | Made popular in the Mary Poppins film and musical[7] | Contrived coinage |
pseudopseudohypoparathyroidism | 30 | A hereditary medical disorder | Longest non-contrived word in a major dictionary[8] | Technical |
antidisestablishmentarianism | 28 | The political position of opposing disestablishment | Longest non-contrived and nontechnical word[9] | Not all dictionaries accept it due to lack of usage.[10] |
honorificabilitudinitatibus | 27 | The state of being able to achieve honors | Longest word in Shakespeare’s works; longest word in the English language featuring alternating consonants and vowels[11] | Latin |
Major dictionaries
The longest word in any of the major English language dictionaries is pneumonoultramicroscopicsilicovolcanoconiosis (45 letters), a word that refers to a lung disease contracted from the inhalation of very fine silica particles,[12] specifically from a volcano; medically, it is the same as silicosis. The word was deliberately coined to be the longest word in English, and has since been used[citation needed] in a close approximation of its originally intended meaning, lending at least some degree of validity to its claim.[6]
The Oxford English Dictionary contains pseudopseudohypoparathyroidism (30 letters).
Merriam-Webster’s Collegiate Dictionary does not contain antidisestablishmentarianism (28 letters), as the editors found no widespread, sustained usage of the word in its original meaning. The longest word in that dictionary is electroencephalographically (27 letters).[13]
The longest non-technical word in major dictionaries is floccinaucinihilipilification at 29 letters. It consists of a series of Latin words meaning «nothing» and is defined as «the act of estimating something as worthless»; its usage has been recorded as far back as 1741.[14][15][16]
Ross Eckler has noted that most of the longest English words are not likely to occur in general text, meaning non-technical present-day text seen by casual readers, in which the author did not specifically intend to use an unusually long word. According to Eckler, the longest words likely to be encountered in general text are deinstitutionalization and counterrevolutionaries, with 22 letters each.[17]
A computer study of over a million samples of normal English prose found that the longest word one is likely to encounter on an everyday basis is uncharacteristically, at 20 letters.[18]
The word internationalization is abbreviated «i18n», the embedded number representing the number of letters between the first and the last.[19][20][21]
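The abbreviation rule described above (first letter, count of the letters in between, last letter) can be sketched in a few lines of Python. The function name `numeronym` is a hypothetical choice for illustration, and returning very short words unchanged is an assumption rather than part of any standard.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    # Assumption: words of one or two letters have no inner letters,
    # so they are returned unchanged.
    if len(word) < 3:
        return word
    inner_count = len(word) - 2  # letters between the first and the last
    return f"{word[0]}{inner_count}{word[-1]}"


print(numeronym("internationalization"))  # i18n
```

The same rule applied to "localization" gives "l10n".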
Creations of long words
Coinages
In his play Assemblywomen (Ecclesiazousae), the ancient Greek comedic playwright Aristophanes created a word of 171 letters (183 in the transliteration below), which describes a dish by stringing together its ingredients:
Henry Carey’s farce Chrononhotonthologos (1743) holds the opening line: «Aldiborontiphoscophornio! Where left you Chrononhotonthologos?»
Thomas Love Peacock put these creations into the mouth of the phrenologist Mr. Cranium in his 1816 book Headlong Hall: osteosarchaematosplanchnochondroneuromuelous (44 characters) and osseocarnisanguineoviscericartilaginonervomedullary (51 characters).
James Joyce made up nine 100-letter words plus one 101-letter word in his novel Finnegans Wake, the most famous of which is Bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk. Appearing on the first page, it allegedly represents the symbolic thunderclap associated with the fall of Adam and Eve. As it appears nowhere else except in reference to this passage, it is generally not accepted as a real word. Sylvia Plath made mention of it in her semi-autobiographical novel The Bell Jar, when the protagonist was reading Finnegans Wake.
«Supercalifragilisticexpialidocious», the 34-letter title of a song from the movie Mary Poppins, does appear in several dictionaries, but only as a proper noun defined in reference to the song title. The attributed meaning is «a word that you say when you don’t know what to say.» The idea and invention of the word is credited to songwriters Robert and Richard Sherman.
Agglutinative constructions
The English language permits the legitimate extension of existing words to serve new purposes by the addition of prefixes and suffixes. This is sometimes referred to as agglutinative construction. This process can create arbitrarily long words: for example, the prefixes pseudo (false, spurious) and anti (against, opposed to) can be added as many times as desired. More familiarly, the addition of numerous «great»s to a relative, such as «great-great-great-great-grandparent», can produce words of arbitrary length. In musical notation, an 8192nd note may be called a semihemidemisemihemidemisemihemidemisemiquaver.
Antidisestablishmentarianism is the longest common example of a word formed by agglutinative construction.
Technical terms
A number of scientific naming schemes can be used to generate arbitrarily long words.
The IUPAC nomenclature for organic chemical compounds is open-ended, giving rise to the 189,819-letter chemical name Methionylthreonylthreonyl…isoleucine for the protein also known as titin, which is involved in striated muscle formation. In nature, DNA molecules can be much bigger than protein molecules and therefore potentially be referred to with much longer chemical names. For example, the wheat chromosome 3B contains almost 1 billion base pairs,[22] so the sequence of one of its strands, if written out in full like Adenilyladenilylguanilylcystidylthymidyl…, would be about 8 billion letters long. The longest published word, Acetylseryltyrosylseryliso…serine, referring to the coat protein of a certain strain of tobacco mosaic virus (P03575), is 1,185 letters long, and appeared in the American Chemical Society’s Chemical Abstracts Service in 1964 and 1966.[23] In 1965, the Chemical Abstracts Service overhauled its naming system and started discouraging excessively long names. In 2011, a dictionary broke this record with a 1909-letter word describing the trpA protein (P0A877).[3]
John Horton Conway and Landon Curt Noll developed an open-ended system for naming powers of 10, in which one sexmilliaquingentsexagintillion, coming from the Latin name for 6560, is the name for $10^{3(6560+1)} = 10^{19683}$. Under the long number scale, it would be $10^{6(6560)} = 10^{39360}$.
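The two exponents in the paragraph above can be checked with a short Python sketch. It assumes the rule implied by the figures in the text: the n-th "-illion" is 10 to the power 3(n + 1) on the short scale and 10 to the power 6n on the long scale. The function names are hypothetical.

```python
def short_scale_exponent(n: int) -> int:
    """Power of 10 named by the n-th '-illion' on the short scale."""
    return 3 * (n + 1)


def long_scale_exponent(n: int) -> int:
    """Power of 10 named by the n-th '-illion' on the long scale."""
    return 6 * n


n = 6560  # sexmilliaquingentsexagintillion
print(short_scale_exponent(n))  # 19683
print(long_scale_exponent(n))   # 39360
```

With n = 1 the same functions give 6 and 6, matching the fact that a million is 10 to the power 6 on both scales.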
Gammaracanthuskytodermogammarus loricatobaicalensis is sometimes cited as the longest binomial name—it is a kind of amphipod. However, this name, proposed by B. Dybowski, was invalidated by the International Code of Zoological Nomenclature in 1929 after being petitioned by Mary J. Rathbun to take up the case.[24]
Myxococcus llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogochensis is the longest accepted binomial name for an organism. It is a bacterium found in soil collected at Llanfairpwllgwyngyll (discussed below). Parastratiosphecomyia stratiosphecomyioides is the longest accepted binomial name for any animal, or any organism visible with the naked eye. It is a species of soldier fly.[25] The genus name Parapropalaehoplophorus (a fossil glyptodont, an extinct family of mammals related to armadillos) is two letters longer, but does not contain a similarly long species name.
Aequeosalinocalcalinoceraceoaluminosocupreovitriolic, at 52 letters, describing the spa waters at Bath, England, is attributed to Dr. Edward Strother (1675–1737).[26] The word is composed of the following elements:
- Aequeo: equal (Latin, aequo[27])
- Salino: containing salt (Latin, salinus)
- Calcalino: calcium (Latin, calx)
- Ceraceo: waxy (Latin, cera)
- Aluminoso: alumina (Latin)
- Cupreo: from «copper»
- Vitriolic: resembling vitriol
Notable long words
Place names
The longest officially recognized place name in an English-speaking country is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu (85 letters), which is a hill in New Zealand. The name is in the Māori language. A widely recognized version of the name is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu (85 letters), which appears on the signpost at the location (see the photo on this page). In Māori, the digraphs ng and wh are each treated as single letters.
In Canada, the longest place name is Dysart, Dudley, Harcourt, Guilford, Harburn, Bruton, Havelock, Eyre and Clyde, a township in Ontario, at 61 letters or 68 non-space characters.[28]
The 58-letter name Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is the name of a town on Anglesey, an island of Wales. In terms of the traditional Welsh alphabet, the name is only 51 letters long, as certain digraphs in Welsh are considered as single letters, for instance ll, ng and ch. It is generally agreed, however, that this invented name, adopted in the mid-19th century, was contrived solely to be the longest name of any town in Britain. The official name of the place is Llanfairpwllgwyngyll, commonly abbreviated to Llanfairpwll or Llanfair PG.
The longest non-contrived place name in the United Kingdom which is a single non-hyphenated word is Cottonshopeburnfoot (19 letters) and the longest which is hyphenated is Sutton-under-Whitestonecliffe (29 characters).
The longest place name in the United States (45 letters) is Chargoggagoggmanchauggagoggchaubunagungamaugg, a lake in Webster, Massachusetts. It means «Fishing Place at the Boundaries – Neutral Meeting Grounds» and is sometimes facetiously translated as «you fish your side of the water, I fish my side of the water, nobody fishes the middle». The lake is also known as Webster Lake.[29] The longest hyphenated names in the U.S. are Winchester-on-the-Severn, a town in Maryland, and Washington-on-the-Brazos, a notable place in Texas history. The longest single-word town names in the U.S. are Kleinfeltersville, Pennsylvania and Mooselookmeguntic, Maine.
The longest official geographical name in Australia is Mamungkukumpurangkuntjunya.[30] It has 26 letters and is a Pitjantjatjara word meaning «where the Devil urinates».[31]
Liechtenstein is the longest single-word country name in English. The second longest is Turkmenistan. There are longer country names if one includes those containing spaces.
Personal names
Guinness World Records formerly contained a category for longest personal name used.
- From about 1975 to 1985, the recordholder was Adolph Blaine Charles David Earl Frederick Gerald Hubert Irvin John Kenneth Lloyd Martin Nero Oliver Paul Quincy Randolph Sherman Thomas Uncas Victor William Xerxes Yancy Zeus Wolfeschlegelsteinhausenbergerdorffvoralternwarengewissenhaftschaferswessenschafewarenwohlgepflegeundsorgfaltigkeitbeschutzenvonangreifendurchihrraubgierigfeindewelchevoralternzwolftausendjahresvorandieerscheinenwanderersteerdemenschderraumschiffgebrauchlichtalsseinursprungvonkraftgestartseinlangefahrthinzwischensternartigraumaufdersuchenachdiesternwelchegehabtbewohnbarplanetenkreisedrehensichundwohinderneurassevonverstandigmenschlichkeitkonntefortplanzenundsicherfreuenanlebenslanglichfreudeundruhemitnichteinfurchtvorangreifenvonandererintelligentgeschopfsvonhinzwischensternartigraum, Senior (746 letters), also known as Wolfe+585, Senior.
- After 1985 Guinness briefly awarded the record to a newborn girl with a longer name. The category was removed shortly afterward.
Long birth names are often coined in protest of naming laws or for other personal reasons.
- The naming law in Sweden was challenged by parents Lasse Diding and Elisabeth Hallin, who proposed the given name «Brfxxccxxmnpcccclllmmnprxvclmnckssqlbb11116» for their child (pronounced [ˈǎlːbɪn], 43 characters), which was rejected by a district court in Halmstad, southern Sweden.
Words with certain characteristics of notable length
- Schmaltzed and strengthed (10 letters) appear to be the longest monosyllabic words recorded in The Oxford English Dictionary, while scraunched and scroonched appear to be the longest monosyllabic words recorded in Webster’s Third New International Dictionary; but squirrelled (11 letters) is the longest if pronounced as one syllable only (as permitted in The Shorter Oxford English Dictionary and Merriam-Webster Online Dictionary at squirrel, and in Longman Pronunciation Dictionary). Schtroumpfed (12 letters) was coined by Umberto Eco, while broughammed (11 letters) was coined by William Harmon after broughamed (10 letters) was coined by George Bernard Shaw.
- Strengths is the longest word in the English language containing only one vowel letter.[32]
- Euouae, a medieval musical term, is the longest English word consisting only of vowels, and the word with the most consecutive vowels. However, the «word» itself is simply a mnemonic consisting of the vowels to be sung in the phrase «seculorum Amen» at the end of the lesser doxology. (Although u was often used interchangeably with v, and the variant «Evovae» is occasionally used, the v in these cases would still be a vowel.)
- The longest words with no repeated letters are dermatoglyphics and uncopyrightable.[33]
- The longest word whose letters are in alphabetical order is the eight-letter Aegilops, a grass genus. However, this is arguably a proper noun. There are several six-letter English words with their letters in alphabetical order, including abhors, almost, begins, biopsy, chimps and chintz.[34] There are few 7-letter words, such as «billowy» and «beefily». The longest words whose letters are in reverse alphabetical order are sponged, wronged and trollied.
- The longest words recorded in OED with each vowel only once, and in order, are abstemiously, affectiously, and tragediously (OED). Fracedinously and gravedinously (constructed from adjectives in OED) have thirteen letters; Gadspreciously, constructed from Gadsprecious (in OED), has fourteen letters. Facetiously is among the few other words directly attested in OED with single occurrences of all six vowels (counting y as a vowel).
- The longest single palindromic word in English is rotavator, another name for a rotary tiller for breaking and aerating soil.
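Several of the properties in this list (no repeated letters, letters in alphabetical or reverse alphabetical order, palindromes) are easy to test mechanically. The following Python sketch checks the examples given above; all function names are hypothetical, and it only verifies a property for a given word rather than searching a dictionary for the longest one.

```python
def has_no_repeated_letters(word: str) -> bool:
    """True if every letter occurs at most once."""
    letters = word.lower()
    return len(set(letters)) == len(letters)


def is_alphabetical(word: str) -> bool:
    """True if the letters never go backwards in the alphabet (repeats allowed)."""
    letters = word.lower()
    return all(a <= b for a, b in zip(letters, letters[1:]))


def is_reverse_alphabetical(word: str) -> bool:
    """True if the letters never go forwards in the alphabet (repeats allowed)."""
    letters = word.lower()
    return all(a >= b for a, b in zip(letters, letters[1:]))


def is_palindrome(word: str) -> bool:
    """True if the word reads the same in both directions."""
    letters = word.lower()
    return letters == letters[::-1]


print(has_no_repeated_letters("uncopyrightable"))  # True
print(is_alphabetical("Aegilops"))                 # True
print(is_reverse_alphabetical("trollied"))         # True
print(is_palindrome("rotavator"))                  # True
```

Allowing repeated letters in the ordering checks is a design choice made so that words such as "billowy" and "trollied" qualify, as they do in the list above.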
Typed words
- The longest words typable with only the left hand using conventional hand placement on a QWERTY keyboard are tesseradecades, aftercataracts, dereverberated, dereverberates[35] and the more common but sometimes hyphenated sweaterdresses.[34] Using the right hand alone, the longest word that can be typed is johnny-jump-up, or, excluding hyphens, monimolimnion[36] and phyllophyllin.
- The longest English word typable using only the top row of letters has 11 letters: rupturewort. The word teetertotter (used in North American English) is longer at 12 letters, although it is usually spelled with a hyphen.
- The longest using only the middle row is shakalshas (10 letters). Nine-letter words include flagfalls; eight-letter words include galahads and alfalfas.
- Since the bottom row contains no vowels, no standard words can be formed.[37]
- The longest words typable by alternating left and right hands are antiskepticism and leucocytozoans respectively.[34]
- On a Dvorak keyboard, the longest «left-handed» words are epopoeia, jipijapa, peekapoo, and quiaquia.[38] Other such long words are papaya, Kikuyu, opaque, and upkeep.[39] Kikuyu is typed entirely with the index finger, and so the longest one-fingered word on the Dvorak keyboard. There are no vowels on the right-hand side, and so the longest «right-handed» word is crwths.
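The QWERTY claims in this list can be checked with a small Python sketch. The key groupings below assume the conventional touch-typing split of the letter keys, and the names `typable_with` and `alternates_hands` are hypothetical. The sketch covers letters only, so hyphenated entries such as johnny-jump-up are out of scope.

```python
# Assumed conventional hand and row assignments for QWERTY letter keys.
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")
TOP_ROW = set("qwertyuiop")
MIDDLE_ROW = set("asdfghjkl")


def typable_with(word: str, keys: set) -> bool:
    """True if every character of the word is in the given key set."""
    return all(ch in keys for ch in word.lower())


def alternates_hands(word: str) -> bool:
    """True if consecutive letters are always typed by different hands."""
    letters = word.lower()
    if not all(ch in LEFT_HAND or ch in RIGHT_HAND for ch in letters):
        return False  # contains something other than a basic Latin letter
    sides = [ch in LEFT_HAND for ch in letters]
    return all(a != b for a, b in zip(sides, sides[1:]))


print(typable_with("sweaterdresses", LEFT_HAND))  # True
print(typable_with("monimolimnion", RIGHT_HAND))  # True
print(typable_with("rupturewort", TOP_ROW))       # True
print(typable_with("shakalshas", MIDDLE_ROW))     # True
print(alternates_hands("antiskepticism"))         # True
```

Swapping in the Dvorak key groupings would let the same two functions check the Dvorak examples.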
See also
- Lipogram
- List of long species names
- List of the longest English words with one syllable
- Longest English sentence
- Longest word in French
- Longest word in Romanian
- Longest word in Spanish
- Longest word in Turkish
- Number of words in English
- Scriptio continua
- Sesquipedalianism
- Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft, longest published word in German
References
- ^ «Reading The Longest English Word (190,000 Characters)». YouTube. Archived from the original on 2021-11-10. Retrieved 2 August 2020.
- ^ «World’s longest word takes 3.5 hours to pronounce». CW39 Houston. 2012-12-08. Retrieved 2020-05-18.
- ^ a b Colista Moore (2011). Student’s Dictionary. p. 524. ISBN 978-1-934669-21-1.
- ^ see separate article Lopado…pterygon
- ^ Donald McFarlan; Norris Dewar McWhirter; David A. Boeh (1989). Guinness book of world records: 1990. Sterling. p. 129. ISBN 978-0-8069-5790-6.
- ^ a b Coined around 1935 to be the longest word; press reports on puzzle league members legitimized it somewhat. First appeared in the MWNID supplement, 1939. Today OED and several others list it, but citations are almost always as «longest word». More detail at pneumonoultramicroscopicsilicovolcanoconiosis.
- ^ «Merriam Webster: Supercalifragilisticexpialidocious».
- ^ «What is the longest English word?». AskOxford. Archived from the original on 2008-10-22. Retrieved 2010-08-22.
- ^ «What is the longest English word?». oxforddictionaries.com.[dead link]
- ^ «Merriam Webster: "Antidisestablishmentarianism is not in the dictionary."».
- ^ «Cool, Strange, and Interesting Facts,» fact 99. InnocentEnglish.com. Retrieved 2019-03-13.
- ^ «pneumonoultramicroscopicsilicovolcanoconiosis – definition of pneumonoultramicroscopicsilicovolcanoconiosis in English from the Oxford dictionary». oxforddictionaries.com. Archived from the original on 2012-07-19.
- ^ «The Longest Word in the Dictionary» (Video). Ask the Editor. Merriam-Webster. Archived from the original on 21 November 2013. Retrieved 14 November 2013.
- ^ «Floccinaucinihilipilification» by Michael Quinion World Wide Words Archived 2006-08-21 at the Wayback Machine;
- ^ The Guinness Book of Records, in its 1992 and previous editions, declared the longest real word in the English language to be floccinaucinihilipilification. More recent editions of the book have acknowledged pneumonoultramicroscopicsilicovolcanoconiosis. What is the longest English word? — Oxford Dictionaries Online Archived 2006-08-26 at the Wayback Machine
- ^ In recent times its usage has been recorded in the proceedings of the United States Senate by Senator Robert Byrd Discussion between Sen. Moynihan and Sen. Byrd «Mr. President, may I say to the distinguished Senator from New York, I used that word on the Senate floor myself 2 or 3 years ago. I cannot remember just when or what the occasion was, but I used it on that occasion to indicate that whatever it was I was discussing it was something like a mere trifle or nothing really being of moment.» Congressional Record June 17, 1991, p. S7887, and at the White House by Bill Clinton’s press secretary Mike McCurry, albeit sarcastically. December 6, 1995, White House Press Briefing in discussing Congressional Budget Office estimates and assumptions: «But if you – as a practical matter of estimating the economy, the difference is not great. There’s a little bit of floccinaucinihilipilification going on here.»
- ^ Eckler, R. Making the Alphabet Dance, p 252, 1996.
- ^ «Longest Common Words – Modern». Maltron.com. Archived from the original on 27 April 2009. Retrieved 2010-08-22.
- ^ «Glossary of W3C Jargon». World Wide Web Consortium. Archived from the original on 2008-10-25. Retrieved 2008-10-13.
- ^ «Origin of the Abbreviation I18n». Archived from the original on 2014-06-27.
- ^ «Localization vs. Internationalization». World Wide Web Consortium. Archived from the original on 2016-04-03.
- ^ Paux, Etienne; Sourdille, Pierre; Salse, Jérôme; Saintenac, Cyrille; Choulet, Frédéric; Leroy, Philippe; Korol, Abraham; Michalak, Monika; Kianian, Shahryar; Spielmeyer, Wolfgang; Lagudah, Evans; Somers, Daryl; Kilian, Andrzej; Alaux, Michael; Vautrin, Sonia; Bergès, Hélène; Eversole, Kellye; Appels, Rudi; Safar, Jan; Simkova, Hana; Dolezel, Jaroslav; Bernard, Michel; Feuillet, Catherine (2008). «A Physical Map of the 1-Gigabase Bread Wheat Chromosome 3B». Science. 322 (5898): 101–104. Bibcode:2008Sci…322..101P. doi:10.1126/science.1161847. PMID 18832645. S2CID 27686615. Archived from the original on 2015-09-03. Retrieved 2012-12-01.
- ^ Chemical Abstracts Formula Index, Jan.-June 1964, Page 967F; Chemical Abstracts 7th Coll. Formulas, C23H32-Z, 56-65, 1962–1966, Page 6717F
- ^ «Opinion 105. Dybowski’s (1926) Names of Crustacea Suppressed». Opinions Rendered by the International Commission on Zoological Nomenclature: Opinions 105 to 114. Smithsonian Miscellaneous Collections. Vol. 73. 1929. pp. 1–3. hdl:10088/23619. BHL page 8911139.
- ^ rjk. «World’s longest name of an animal. Parastratiosphecomyia stratiosphecomyioides Stratiomyid Fly Soldier Fly». thelongestlistofthelongeststuffatthelongestdomainnameatlonglast.com. Archived from the original on 2011-11-17. Retrieved 2011-12-17.
- ^ cited in some editions of the Guinness Book of Records as the longest word in English, see Askoxford.com on the longest English word
- ^ [1][dead link]
- ^ «GeoNames Government of Canada site». Archived from the original on 2009-02-06.
- ^ Belluck, Pam (2004-11-20). «What’s the Name of That Lake? It’s Hard to Say». The New York Times.
- ^ «Geoscience Australia Gazetteer». Archived from the original on 2007-10-01.
- ^ «South Australian State Gazetteer». Archived from the original on 2007-10-01.
- ^ «Guinness Records».
- ^ «Longest Word Without Repeating Letters». December 2014.
- ^ a b c «Typewriter Words». Questrel.com. Archived from the original on 2010-09-27. Retrieved 2010-08-22.
- ^ «Science Links Japan | Two Unique Aftercataracts Requiring Surgical Removal». Sciencelinks.jp. 2009-03-18. Archived from the original on 2011-02-17. Retrieved 2010-08-22.
- ^ «Dictionary entry for monimolimnion, a word that, at 13 letters, is longer than any of the words linked in the source above». Archived from the original on 2009-09-09. Retrieved 2009-08-15.
- ^ «Word Records». Fun-with-words.com. Archived from the original on 2012-08-26. Retrieved 2012-08-13.
- ^ «Typewriter Words». Wordnik.com. Archived from the original on 2011-07-17. Retrieved 2011-01-15.
- ^ «The Dvorak Keyboard and You». Theworldofstuff.com. Archived from the original on 2010-08-20. Retrieved 2010-08-22.
External links
- A Collection of Word Oddities and Trivia – Long words
- Long words (chemical names)
- Long words (place names)
- What is the longest English word?, AskOxford.com «Ask the Experts»
- What is the Longest Word?, Fun-With-Words.com
- Full chemical name of titin.
- Taxonomy of Wordplay
Put the adjectives into the comparative and superlative forms. Example: long - longer than - the longest. Here are the adjectives: fast, pretty, beautiful, ugly, thin, fashionable, tall, interesting, good, happy, sad, unusual.
In this post, we’re going to walk through another common coding interview problem: finding the longest word in a paragraph. This is a really good question, because it’s very easy to forget some important details, and the perhaps obvious solution isn’t necessarily the best.
Walking Through the Problem
For this particular problem, we’re going to imagine we’ve been given the following string of lorem ipsum text. For the sake of space, we’ll assume that this string is being used as the value for a variable called base_string.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas luctus ipsum et facilisis dignissim. Duis mattis enim urna. Quisque orci nunc, vulputate id accumsan nec, imperdiet sit amet sem. Integer consequat nibh vel mattis elementum. Nam est elit, sodales vitae augue nec, consequat porttitor nibh. Aliquam ut risus vehicula, egestas ligula sed, egestas neque. Fusce hendrerit vel risus in molestie. Morbi molestie eleifend odio vel ullamcorper. Donec at odio libero. Quisque vulputate nisl nisi, ut convallis lorem vulputate et. Aenean pretium eu tellus a dapibus. Ut id sem vulputate, finibus erat quis, vestibulum enim. Donec commodo dui eros, non hendrerit orci faucibus eu. Integer at blandit ex. Duis posuere, leo non porta tincidunt, augue turpis posuere odio, sed faucibus erat elit vel turpis. Quisque vitae tristique leo.
Lorem ipsum is just a common dummy text used in typesetting and printing to simulate actual text content. It includes words of various lengths, as well as punctuation, so it’s perfect for our purposes.
So, how do we find the longest word?
Since our paragraph is currently a single string, we’re probably going to have to find a way to break it up into individual words. This will let us perform a direct comparison between different words, allowing us to find out whether a given word is longer than another.
There are a few ways we can perform the actual comparison.
We can create a variable called something like longest_word set to an empty string, and then iterate over all the words, comparing the length of the current longest_word with the word we’re checking as part of this iteration. If the new word is longer, we can replace the value of longest_word with the new word. Repeating this process for every word in the paragraph will leave us with the longest word in the paragraph bound to the longest_word variable.
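As a minimal sketch of that loop, assuming the paragraph has already been broken into a list of words (the function name `find_longest_word` and the sample list are made up for illustration):

```python
def find_longest_word(words):
    """Return the first word of maximal length, or "" for an empty list."""
    longest_word = ""
    for word in words:
        # Only a strictly longer word replaces the current longest one,
        # so on a tie the earlier word is kept.
        if len(word) > len(longest_word):
            longest_word = word
    return longest_word


sample_words = ["Lorem", "ipsum", "dolor", "sit", "consectetur"]
print(find_longest_word(sample_words))  # consectetur
```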
Another option is to use the max function, which takes in an iterable and returns a single item from the collection. It will return the single largest member of the collection, but we’ll have to provide some configuration to max so that it knows what that means given the context.
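One way to provide that configuration is the `key` argument of the built-in `max`, which accepts a function applied to each item before comparison. Passing `len` makes the comparison about word length. The sample list below is made up for illustration.

```python
sample_words = ["Lorem", "ipsum", "dolor", "sit", "consectetur"]

# key=len tells max to compare the words by length instead of alphabetically.
longest_word = max(sample_words, key=len)
print(longest_word)  # consectetur
```

When several words share the maximum length, `max` returns the first one it encounters. On an empty list it raises a `ValueError` unless a `default` value is supplied.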
Both of these solutions potentially have a small issue in that they only return a single word. What if two words are of identical length? Do we provide one of these longest words as a solution, or do we provide a collection of all the longest words? If we have to provide all the longest words, then we’re going to need to use a different method.
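If the requirement turns out to be a collection of all the longest words, one possible approach is to find the maximum length first and then keep every word of that length. This is a sketch under that assumption; `find_longest_words` is a hypothetical name, and whether duplicates should be collapsed is left open.

```python
def find_longest_words(words):
    """Return every word whose length equals the maximum, in original order."""
    if not words:
        return []
    max_length = max(len(word) for word in words)
    return [word for word in words if len(word) == max_length]


print(find_longest_words(["enim", "urna", "sit", "odio"]))  # ['enim', 'urna', 'odio']
```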
With that, let’s move on to our first bit of code for this problem. There is a problem with these solutions, which I’ll talk about in a moment.
Our First Solution
Python includes a handy method for strings called split. You can read about split in the official documentation.
split will essentially allow us to divide a string into a list of items using a delimiter string. Whenever this delimiter is encountered in the string, split will create a break point. Everything between these break points becomes an item in the resulting list.
We can therefore do something like this:
word_list = base_string.split(" ")
# ["Lorem", "ipsum", "dolor", "sit", "amet,", ... etc.]
Eagle-eyed readers may have already noticed a problem. The comma after amet is included in the corresponding list item. This is the case for all words that have punctuation, and it poses a serious problem.
Our text actually has two longest words, but one of them is at the end of a sentence, and therefore has an appended full stop. With the punctuation included, this word incorrectly becomes the sole longest word in the paragraph.
The problem could be even worse if we have punctuation like quotation marks, since the inclusion of two punctuation characters might make a word longer than the real longest word, despite being potentially a character shorter in reality. This only gets worse as punctuation gets compounded together.
It’s clear then that we have to do something about all this extra punctuation.
We can make use of another string method called strip to take care of the problem. Once again, you can find details on how strip works in the official docs.
strip allows us to remove characters from either end of a string, and it will keep removing those characters until it encounters a character which does not match those it was instructed to remove. By default, it removes whitespace characters, but we can provide a collection of punctuation instead.
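A quick sketch of that behaviour, using made-up sample strings and a hypothetical `PUNCTUATION` constant: characters are removed from both ends only, so punctuation inside a word survives.

```python
# Hypothetical set of characters to strip; deliberately incomplete.
PUNCTUATION = ".,?!\"':;(){}[]"

print('"Hello,"'.strip(PUNCTUATION))  # Hello
print("(amet),".strip(PUNCTUATION))   # amet
print("hasn't!".strip(PUNCTUATION))   # hasn't  (the inner apostrophe is kept)
print("  padded  ".strip())           # padded  (default: whitespace)
```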
Using a list comprehension, we can therefore iterate over our word_list and correct the punctuation issue:
word_list = base_string.split(" ")
processed_words = [word.strip(".,?!\"\':;(){}[]") for word in word_list]
This is absolutely going to work for our example string, but it’s also not a very good solution. For a start, having to use these escape characters is ugly, but more importantly, this group of punctuation is not complete, and it would be very tedious to make it complete.
Right now we don’t do anything to filter out tildes (~), for example, or percentage signs, or hash symbols. There’s also another issue we didn’t consider at all thus far: numbers. Numbers could easily end up being longer than our longest words, giving us erroneous results.
Instead of removing things we don’t want, perhaps we can go the other way, adding only the things we do want. Enter RegEx.
Using RegEx
RegEx or regular expressions are a means through which we can define patterns to look for in strings. RegEx is a pretty expansive topic, certainly too big to cover here, but there’s a tutorial on using RegEx in Python available in the documentation.
In order to work with regular expressions in Python, we need to import the re module, so that’s our first step:
import re
Next, we’ll take our base_string variable, containing our lorem ipsum text, and provide it as an argument to the findall function that we find in the re module. Along with base_string, we’re also going to provide a pattern using the RegEx syntax like so:
import re
word_list = re.findall("[A-Za-z]+", base_string)
RegEx is notoriously cryptic, but our pattern here is relatively simple. It says that we’re looking for any letter in the basic Latin alphabet, in either upper or lower case, and that we want runs of one or more of these characters, as indicated by the + symbol.
Our resulting word_list variable will now contain a list full of letter-only strings. While traversing our string, any time a character was encountered that wasn’t part of our defined pattern, findall considered that the end of a word.
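A small sketch of this behaviour on a made-up string (the sample text is an assumption) shows punctuation and digits both acting as word boundaries and being dropped from the result:

```python
import re

# Hypothetical sample with punctuation and digits mixed in.
sample = "Wait... 42 apples, 7 pears (and 1 fig)!"

# Only unbroken runs of basic Latin letters are returned.
found = re.findall("[A-Za-z]+", sample)
print(found)  # ['Wait', 'apples', 'pears', 'and', 'fig']
```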
Pretty neat stuff if you ask me.
Another Minor Problem
Unfortunately, it’s too early to celebrate just yet. There’s once again a problem with our current implementation: punctuation inside words.
Consider a situation like this:
John’s dog hasn’t eaten its food.
We might make a very good argument that “John’s” and “hasn’t” are single words, in which case, we have a problem. Right now if we run our code on this sentence we get the following:
import re
base_string = "John's dog hasn't eaten its food."
word_list = re.findall("[A-Za-z]+", base_string)
# ['John', 's', 'dog', 'hasn', 't', 'eaten', 'its', 'food']
This might be perfectly fine, but what happens if your interviewer asks you to treat “John’s” as one word? In that case, “John’s” is a 5 letter word with a piece of punctuation.
The best answer might be to combine our methods. We can split the string based on spaces, and then check each word for punctuation. This time, however, instead of searching for matches with findall, we’ll perform a replacement using another re function called sub.
sub takes a pattern, a replacement string for that pattern, and a string to search.
import re
base_string = "John's dog hasn't eaten its food."
word_list = base_string.split(" ")
processed_words = [re.sub("[^A-Za-z]+", "", word) for word in word_list]
# ['Johns', 'dog', 'hasnt', 'eaten', 'its', 'food']
Here our pattern is exactly the same as before, except we’ve added a special character, ^, at the start of the character set. This negates the set, so the pattern matches any character which is not listed in it. As our set includes only basic Latin letters, all punctuation is going to be matched by this pattern. Our replacement string is an empty string, which means any time we find a match, the matching characters will simply be removed.
Putting this in a list comprehension, we can perform the substitution for every word in the word_list, giving us a list of properly processed words.
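One side effect worth knowing about: a token made up entirely of non-letters is reduced to an empty string. The sketch below uses a made-up sentence (an assumption) and adds an optional filtering step that is not part of the original walkthrough. Empty strings can never be the longest word, so they are harmless for this problem, but dropping them keeps the list tidy:

```python
import re

# Hypothetical sample where one token contains no letters at all.
sample = "It's 2024, isn't it?"

tokens = sample.split(" ")
cleaned = [re.sub("[^A-Za-z]+", "", token) for token in tokens]
print(cleaned)  # ['Its', '', 'isnt', 'it']

# Optional extra step: discard the empty strings left behind.
non_empty = [word for word in cleaned if word]
print(non_empty)  # ['Its', 'isnt', 'it']
```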
With that, we can finally start counting the length of the words and find the longest word in the paragraph.
Finding the Longest Word
As I mentioned before, we have a couple of options for finding the longest word in our list of words.
Let’s start with the string replacement method, using our original paragraph:
import re
word_list = base_string.split(" ")
processed_words = [re.sub("[^A-Za-z]+", "", word) for word in word_list]
longest_word = ""
for word in processed_words:
    if len(longest_word) < len(word):
        longest_word = word
print(longest_word) # consectetur
In this implementation, we start with an empty string, and begin iterating over all the words in processed_words. If the current word is longer than the current longest_word, we replace the longest word. Otherwise, nothing happens.
Since we start with an empty string, the first word will become the longest string initially, but it will be dethroned as soon as a longer word comes along.
Instead of this fairly manual method, we could make use of the max function.
import re
word_list = base_string.split(" ")
processed_words = [re.sub("[^A-Za-z]+", "", word) for word in word_list]
longest_word = max(processed_words, key=len)
print(longest_word) # consectetur
In this method we have to pass in the processed_words as the first argument, but we also have to provide a key. By providing the len function as a key, max will use the length of the various words for determining which is of the greatest value.
Both of these methods only provide a single word, so how do we get all of the longest words if we have multiple words of equal length?
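To see the tie behaviour in isolation, here is a short sketch with a made-up word list (an assumption). When several items share the maximum length, max returns the first one it encounters, which matches what the manual loop does with its strict < comparison:

```python
# Hypothetical list with three four-letter words tied for longest.
words = ["pear", "plum", "kiwi", "fig"]

# max returns the first maximal item.
first_longest = max(words, key=len)
print(first_longest)  # pear

# The manual loop behaves the same way, because a word only replaces
# the current longest when it is strictly longer.
longest_word = ""
for word in words:
    if len(longest_word) < len(word):
        longest_word = word
print(longest_word)  # pear
```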
Finding a List of the Longest Words
In order to find all of the longest words, we can do a second pass over our processed_words now that we know what the longest word actually is. We can do this using a conditional list comprehension.
import re
word_list = base_string.split(" ")
processed_words = [re.sub("[^A-Za-z]+", "", word) for word in word_list]
max_word_length = len(max(processed_words, key=len))
longest_words = [word for word in processed_words if len(word) == max_word_length]
print(longest_words) # ['consectetur', 'ullamcorper']
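If you want something reusable, the steps can be gathered into a single function. This is only a sketch under a few assumptions: the function name find_longest_words is hypothetical, split() is called without an argument so that newlines and repeated spaces also separate words, and empty results are filtered out so that text with no letters returns an empty list.

```python
import re


def find_longest_words(text):
    """Return every word in text that ties for the greatest letter count."""
    # Split on any whitespace, then strip non-letters from each token.
    tokens = text.split()
    cleaned = [re.sub("[^A-Za-z]+", "", token) for token in tokens]

    # Drop tokens that contained no letters at all.
    cleaned = [word for word in cleaned if word]
    if not cleaned:
        return []

    max_word_length = max(len(word) for word in cleaned)
    return [word for word in cleaned if len(word) == max_word_length]


result = find_longest_words("John's dog hasn't eaten its food.")
print(result)  # ['Johns', 'hasnt', 'eaten']
```

Note that a word appearing twice in the text would appear twice in the result; whether to remove duplicates depends on how the question is posed.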
Wrapping Up
With that, we’ve successfully found the longest words in a paragraph. Of course this isn’t the only way you could solve this problem, and the specific problem might call for minor variations as well. You may be asked to find the length of the longest word in the paragraph, but if you can do the version shown here, that question should be a breeze.
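For the length-only variation, a minimal sketch might look like the following. It reuses the sample sentence from earlier; passing default=0 to max is an extra safeguard assumed here so that an empty list of lengths does not raise an error.

```python
import re

base_string = "John's dog hasn't eaten its food."

# Measure each word after removing anything that is not a letter.
lengths = [len(re.sub("[^A-Za-z]+", "", word)) for word in base_string.split(" ")]

# default=0 covers the case where there are no words at all.
longest_length = max(lengths, default=0)
print(longest_length)  # 5
```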
We’d love to see your own solutions to this problem, so get on repl.it and share your solutions with us on Twitter. If you liked the post, we’d also really appreciate it if you could share it with your techy friends.
We’ll see you again Monday when we’ll be releasing part two of our brief look at the itertools module! If you can’t wait, you might want to check out our Complete Python Course where we go into more depth on many of these topics.
1. What is an array?
a. An array is a collection of a finite number of data items of different types.
b. An array is a collection of a finite number of data items of the same type.
c. An array is a collection of an infinite number of data items of the same type.
2. Choose the correct array declaration
a. Var a:array[1..1000] of integer;
b. Var A, B, C: ARRAY [1..50] OF REAL or INTEGER;
c. Var A: ARRAY [1..50 OF REAL];
3. The numeric array A is filled in order with the numbers 7, 15, 87, 34. Give the value of the element A[2].
a. 34
b. 87
c. 15
d. 7
4. Which command fills an array from the keyboard?
a. Read(A[i])
b. Random(s)
c. ROUND
5. What does the following program fragment do?
for i:=1 to n do write(a[i],' ');
a. Reads the array elements from the keyboard.
b. Prints the array elements to the screen.
c. Reads the array elements from a file.
6. What does the following program fragment do?
randomize;
write('number of elements');
readln(n);
for i:=1 to n do begin
a[i]:=random(45)-22;
end;
write(n);
a. Fills the array with random numbers and does not print them to the screen.
b. Fills the array with random numbers and prints them to the screen.
c. Fills the array with identical numbers and prints them to the screen.