The German word for model is
Modell
Gender
The gender of Modell is neuter. E.g. das Modell.
Plural
The plural of Modell is Modelle.
German Definition
model
Noun:
[1] Abbild, Form, Modell, Vorbild
[1] „A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way. All models are in simulacra, that is, simplified reflections of reality, but, despite their inherent falsity, they are nevertheless extremely useful.“
Ein wissenschaftliches Modell versucht, empirische Objekte, Phänomene und physikalische Prozesse auf eine logische und objektive Art und Weise darzustellen. Alle Modelle sind in Simulacra, das heißt vereinfachte Abbilder der Realität, aber trotz ihrer innewohnenden Falschheit sind sie dennoch extrem nützlich.
Translations for model and their definitions
Model
n-n. (person who serves as a subject for fashion) model
Vorbild
n-n. model (praiseworthy example)
Modell
n-n. A model of an object
n-n. A theoretical model
Mannequin
n-n. (person who serves as a subject for fashion) model
From Wikipedia, the free encyclopedia
Modell is the German word for «model» and also a surname. It may refer to:
People
- Arnold Modell (1924–2022), American professor of social psychiatry
- Art Modell (1925–2012), American business executive and sports team owner
- Bernadette Modell (born 1935), British geneticist
- David Modell (1961–2017), American business executive and sports team owner
- Frank Modell (1917–2016), American cartoonist
- Merriam Modell (1908–1994), American author of pulp fiction
- Pat Modell (1931–2011), American TV actress
- Rod Modell, electronic music producer from Detroit, Michigan, who records as Deepchord
- William Modell (1921–2008), American businessman and chairman of Modell’s Sporting Goods
Companies
- Modell’s, a sporting goods retailer based in New York City
- Modell (pawn shop), a pawnbroker based in New York City, originally formed as a spinoff of the sporting goods company
- Schabak Modell, a die-cast toy producer in Germany
- Schuco Modell, a die-cast toy producer in Germany
Media and entertainment
- Das Modell, a song recorded by the electro-pop group Kraftwerk
- Modell Bianka, a 1951 East German film
- Robert Patrick Modell, a character in episodes of the TV series The X-Files
Other uses
- Berliner Modell, a learning theory
- Modell M and Modell S, types of Mauser bolt-action rifles
- V-Modell, a software development model
See also
- Model (disambiguation)
- Modella, Victoria, a rural locality in Australia
- Micky Modelle, a music DJ and producer
- Modello, the Italian word for «model» or preparatory study for a work of art
Model Meaning in German
Definition & Synonyms
• Model
Definition & Meaning
- (n.) Something intended to serve, or that may serve, as a pattern of something to be made; a material representation or embodiment of an ideal; sometimes, a drawing; a plan; as, the clay model of a sculpture; the inventor's model of a machine.
- (n.) A person who poses as a pattern to an artist.
- (n.) Any copy, or resemblance, more or less exact.
- (n.) That by which a thing is to be measured; standard.
- (n.) A miniature representation of a thing, with the several parts in due proportion; sometimes, a facsimile of the same size.
- (v. i.) To make a copy or a pattern; to design or imitate forms; as, to model in wax.
- (v. t.) To plan or form after a pattern; to form in model; to form a model or pattern for; to shape; to mold; to fashion; as, to model a house or a government; to model an edifice according to the plan delineated.
- (a.) Suitable to be taken as a model or pattern; as, a model house; a model husband.
- (n.) Anything which serves, or may serve, as an example for imitation; as, a government formed on the model of the American constitution; a model of eloquence, virtue, or behavior.
Verbs
English | German
---|---
to model so./sth. (modeled/modelled, modeled/modelled) | jmdn./etw. modellieren (modellierte, modelliert)
to model sth. (modeled/modelled, modeled/modelled), e.g. behaviour | etw. (acc.) vorleben (lebte vor, vorgelebt)
to model sth. (modeled/modelled, modeled/modelled), e.g. clay | etw. (acc.) formen (formte, geformt)
to model sth. on sth. | etw. (nom.) nach etw. (acc.) modellieren (modellierte, modelliert)
to model sth. (modeled/modelled, modeled/modelled), a system or process | ein Modell von etw. (dat.) erstellen
to model sth. (modeled/modelled, modeled/modelled) | etw. (acc.) entwickeln (entwickelte, entwickelt)
to model oneself on so. | jmdn. nachahmen (ahmte nach, nachgeahmt), in the sense of jmdn. zum Vorbild nehmen
to model oneself on so. | sich (dat.) jmdn. zum Vorbild nehmen
to act as a model to a painter | einem Maler Modell sitzen
to act as a model to a painter | einem Maler Modell stehen
1. Introduction
The vocabulary of a language, and the word-formation mechanisms that serve it, provide rich material for observation, reflection, and generalization. As a language functions, some words disappear, others appear, some shift in meaning, and others change their stylistic status. Historical lexicology shows that new words are created from material already present in the language and according to models that are productive in the modern language. The main ways in which the German vocabulary develops are word formation, change of word meaning (which can lead to homonyms), and borrowing. Each path has its own characteristics. Word formation and change of meaning enrich the language with new words based on words that already exist in it; borrowing enriches the vocabulary of one language with the vocabulary of another.
2. Word Formation in German
Word formation in German has been studied by many linguists, both foreign and domestic. Major contributions were made by E. S. Kubryakova, K. A. Levkovskaya, R. Z. Muriassov, M. D. Stepanova, W. Fleischer, W. Henzen, T. Schippan, and G. Schmidt (Schmidt, 2005). Word formation, along with borrowing, is the most important way of enriching the vocabulary of a language. New word-formation constructions are created by analogy with existing lexemes (models and patterns), using morphemic and lexical material. There are various models of word-formation structures, and researchers often classify them differently, since linguistics has no single interpretation or definition of the word-formation model as a unit of word formation (Stepanova, 2007). The division is based on the types of word-forming elements, their combination, and the resulting word-formation meaning. Development in word formation consists not in the emergence of new ways of forming words but in the predominant use of one model or another (Stepanova, 2007).
Within the system of word formation as a whole, compounding plays the major role in German and is currently the leading method of word formation. Being a multifaceted, multidimensional, and highly complex phenomenon, compounding is on the one hand often intertwined with affixation and other means of word formation, and on the other hand borders on syntax. Compounding is especially productive in forming German nouns, which are distinguished by a great variety of morphological composition. Prefixation, like compounding, is a very old but still productive method of word formation; almost all available prefixes are productive in modern German. Suffixation can also be classed as a productive method of word formation in modern German. However, despite a number of features shared by suffixes and semi-suffixes, semi-suffixes have apparent advantages over suffixes in word formation. First, many variants of the extended suffixes are unproductive, for example -aner, -aster, -iener, -eiser, -ianer, -iter, -ner, -ser, etc., and foreign affixes practically do not take part in word formation with German stems. Second, semi-suffixes, which retain part of the deep semantic structure (biological gender, etc.), are more informative, and the main goal of communication is the transfer of information. Unlike suffixation, word formation by changing the root is unproductive as an independent method in the modern language, although words formed in this way are very numerous.
The interpenetration of the exogenous and native German word-formation systems is manifested in the functioning of a unit such as the confix. The term “confix” was proposed by G. Schmidt in 1987 in an examination of bound morphemes based on the work of E. Fischer and the French linguist A. Martinet. Schmidt defines confixes as frequently combining elements that are not used as separate lexemes. In studies of the last decade by both foreign and Russian scholars, however, the confix is recognized as a separate word-forming unit within exogenous word formation. Most confixes are of Greek or Latin origin (aero-, biblio-, diskut-, fanat-, neo-, polit-, -drom, -graph, -krat, -naut, -phil, -phob, etc.). Eins, however, suggests also singling out native German confixes, for example schwieger-, stief-, -wart (Eins, 2009).
Taking the whole system of word formation as a starting point, the main role in German word formation is played by compounding, which remains productive to the present day. In the 19th and 20th centuries the two-member model prevailed, while the productivity of the three-member model rose from isolated cases to regular use.
3. The Most Widespread Composites in the News and Specialized Texts
The most topical and innovative part of the vocabulary consists of modern German neologisms. They indirectly reflect the mentality prevailing in society and express the spirit of the time. The appearance of a large number of new words in modern German was caused primarily by geopolitical changes in Europe: first, German reunification and related events (die Osterweiterung, der Solidaritätszuschlag, die Ostalgie, Dunkeldeutschland); then European integration (das Euroland, das Eurogeld, die Eurozone, Teuro), the war in the Balkans (der Kosovo-Krieg, der Blauhelmeinsatz, der Kollateralschaden), and the increased activity of terrorist organizations (der Anti-Terror-Krieg, die Milzbrandattacke, der Schläfer). Events in German domestic politics also gave rise to a number of new words: designations of new bills, political programs, and reform projects (Job-Floater, Riester-Rente) and of economic realities (die Öko-Steuer, Ein-Euro-Job, das Sparpaket). Technical progress, scientific achievements, and the emergence of new products were reflected in corresponding lexical innovations, for example die Datenautobahn, der Stammzellenimport, die Organspende. The stock of modern German neologisms is replenished through various linguistic processes, the most productive of which is compounding.
The German compound noun is distinguished by the diversity of its morphological composition. The determining component can be a stem of various parts of speech: noun, adjective, verb, or numeral. German also has compounds whose first component is a verb stem, for example die Bewegungskraft (a driving force), das Tragbett (a portable bed), die Nähmaschine (a sewing machine). Compounds persist even where an equivalent phrase is available, for example goldene Uhr = Golduhr. The possibilities for combining different stems in a German compound are as boundless as the variety of word combinations in syntactic phrases (Münch, 1990).
Scholars believe that, in formal terms, the possibility of forming compound words in German is theoretically unlimited. There are nevertheless some semantic limitations, for example das Brigittenlächeln (a smile like Brigitte's) and das Gefangenauto (a vehicle in which prisoners were transported): without context, the meaning of these words cannot be understood. The semantic limitations thus show in the fact that in many cases external context is needed to interpret a compound noun. Compounding is found in German not only in the noun system but also in the verb and adjective systems. The adjective system shows a large variety of structural models of compound adjectives and of the semantic relations they express (Müller, 2005).
Thus, compound adjectives expressing comparison or intensification are widely represented in German, for example kreideweiß, bleischwer, todunglücklich, and others.
According to the dictionary of word-forming elements by A. N. Zuev, 71 noun prefixes can be identified. Almost all of them are productive in modern German. The semantic feature of prefixation is that the prefix generalizes an attribute characteristic of a number of objects or phenomena. For example, the prefixes un- and miß- express negation, and the prefix ur- expresses the concept of the ancient or the original. The noun prefixes ge- and erz- are also among the most frequent and productive.
Suffixation, like prefixation, is an ancient and at the same time, a productive way of word formation in modern German.
The suffix, like the prefix, is a word-forming morpheme, that is, a meaningful unit of language that does not occur in the modern language as an independent lexical unit. The suffix not only creates a new word but also assigns it to a part of speech, in a number of cases precisely determining its grammatical behavior. Noun suffixes determine the gender, the declension type, and the plural formation of the noun at the same time.
Suffixation is thus more closely connected with grammar and morphology than other methods of word formation. The suffix, like the prefix, is a characteristic marker of lexical generalization, since it indicates that a concept belongs to one class of concepts or another (Donalies, 2006).
For example, the noun suffixes -er and -in indicate that a word belongs to the class of words denoting male or female persons: Lehrer/Lehrerin, Manager/Managerin, Chef/Chefin. Suffixation is a stable means of enriching the German vocabulary. The German suffixes form a system that is stable at its core but changes and grows as the language develops. At present there are about 50 noun suffixes.
Frequent and productive suffixes include -ling, -heit, -er, -chen, and -tum (Donalies, 2007).
Compounding and affixation are closely related. Many affixes of modern German arose from components of compounds, a process that is natural and historically grounded. Through desemanticization, the second components of compounds gave rise to suffixes such as -schaft, -heit, -tum, -sam, and -bar. Suffixes do not always retain their sound shape: -schaft and -tum, for example, derive from the Old German nouns scaft and tuom, with the meanings “Beschaffenheit”, “Zustand”, “Eigenschaft”. Since native speakers no longer recognize the meaning of the second component, it is only a means of word formation and has no lexical meaning of its own. A suffix can also acquire other meanings over time. In the Middle German period the suffix -heit denoted “Weise”, “Art”, “Lage”; this meaning still exists today in some dialects (Duden, 2000).
Among the verbs analyzed, compounding is the most productive method of word formation, as it is in German as a whole. Compound verbs make up the largest share of new words (47% of the total sample). Depending on the number of components taking part in the formation, two-component and multi-component models (three or more components) are distinguished; these in turn are characterized by determinative or non-determinative relations. W. Fleischer assigns the latter to copulative word formation, for example grinskeuchen, rollrasseln (Fleischer, 1995). In our material only two-component determinative compounds are attested, for example fernheizen (“heat”) and sich totarbeiten (“work to the point of exhaustion”). The first component defines the second, which in turn gives the compound its general morphological and semantic-categorial characteristics, as in fettfüttern, a compound verb meaning “fatten”.
Different parts of speech can serve as the first component of a compound verb. Compound verbs whose first component is an adverb are widespread in modern German (75% of the compound verbs in the sample). The most productive first components are weg- (wegdösen, “fall asleep”), weiter- (weiterverhandeln, “to continue negotiations”), zusammen- (zusammenmixen, “mix”), rein- (reinhämmern, “work hard”), heraus- (herauszüchten, “to breed (a new breed of animals)”), herum- (herumkommandieren, “to order someone around all the time”), and herunter- (herunterstufen, “to place in a lower category”) (Fleischer, 1995).
The adjective takes second place in productivity (22% of the compound verbs in the sample), for example losträllern (“to sing a song without words and without clear articulation”) and hochhasten (“to hurry”). Compound verbs with a noun as the first component account for 3%, for example mondlanden (“to land on the moon”). In our material, frequently recurring first components are more common (68% of all compound verbs) than frequently recurring second components (22%). It should also be noted that the stock of compound verbs is substantially replenished by the tendency of set phrases to become compound verbs, for example heißreden (“to talk oneself into a rage during a conversation”) and sich querlegen (“to resist someone or something”).
Semi-prefixation accounts for 26% of the verbs studied and takes second place in productivity after compounding. This group is dominated by verbs formed with semi-prefixes that slightly change the semantics of the base verb, for example ab- (abfilmen, “(cinema, colloquial) to film”; from filmen, “to make a film, to shoot something for the cinema”), ein- (einkurven, “(aviation) landing”; from kurven, “(aviation) to make a turn”), an- (anbaden, “to open the bathing season”; from baden, “to bathe, wash”), durch- (durchleiden, “to live through a certain time or situation in suffering”; from leiden, “to suffer from something, to tolerate”), aus- (auspennen, “to have a good night’s sleep”; from pennen, “to sleep”), and mit- (mitliefern, “to deliver together with something”; from liefern, “to deliver (goods)”). Semi-prefixes of nouns can also be used to form verbs, for example zwischennutzen (“to use in between”), from nutzen (“to be useful to someone or something, to do good, to help”).
The remaining methods of verb formation, namely prefixation, suffixation, and verbalization, are less well represented in modern German.
The most productive prefixes used to form verbs by prefixation (5% of the units studied) are ver-, be-, ent-, er-, and zer-, for example verchartern (“to charter out a ship or plane”; from chartern, “to charter a vessel”), entkalken (“to remove calcareous deposits”; from kalken, “… 2) (agriculture) to fertilize with lime”), and beknien (“to ask strongly and persistently”; from knien, “to kneel”).
Prefixes of foreign origin such as de-, kor-, re-, and inter-, however, occur only in verbs created on the borrowed model (from foreign elements) with the suffix -ier- (1.8% of all verbs), for example renaturieren (“to restore to nature, recultivate”) and depopulieren (“depopulate”).
The study of verbalization (9%) identified verbs formed from noun stems (the most productive model), for example birnen (“to feel the effect of a drug or alcohol”; from die Birne, “1) pear (fruit), 2) (electric) light bulb”), as well as verbs formed from adjective stems, for example a verb glossed “to give off an unpleasant smell”, from mucht (“1. hungry; 2. weak, languid, tired, exhausted”).
Verbs with the most diverse meanings are formed from noun stems. In analyzing verbs of this type, we followed the semantic classification presented by Duden (Duden, 2000).
As a result of the study, the following groups of verbs were identified:
1) verbs of comparison formed from names of persons: flapsen (“to fool around”; der Flaps), eumeln (“to celebrate, be mischievous and joyful, have fun”; der Eumel, “youthful, handsome man”); and from names of animals: dackeln (“to walk slowly”; der Dackel, “dachshund (dog breed)”);
2) abstract verbs based on abstract nouns, for example boomen (“to experience a boom, a sharp rise”; der Boom, “boom, sharp rise”), fighten (“(sport) to fight persistently and actively”; der Fight, “competition”);
3) ornative verbs from abstract nouns: punchen (“to beat”; der Punch), stressen (“to create physical or psychological stress”; der Stress, “stress, state of stress”);
4) verbs with instrumental meaning, formed from names of objects: faxen (“to send by fax”; das Fax, “fax”), düsen (“to go by motorcycle”; die Düse, “(technical) nozzle; jet; mouthpiece”);
5) locative verbs based on nouns denoting a place or direction: saunen (“to take a sauna”; die Sauna, “sauna, Finnish sauna”); shoppen (“to buy”; der Shop, “shop”).
Some verbs have several meanings at once: liften, 1) “to go up on a ski lift” (instrumental); 2) “to inspire; to strengthen, enhance” (ornative) (Duden, 2000).
From the stem of adjectives, modern verbs are formed much less often.
4. Conclusion
Word formation is the most important way of enriching the vocabulary of a language. The basic methods of word formation are changing the root of the word, transferring a word from one lexical-grammatical class to another, compounding, prefixation, and suffixation. These methods differ, however, in their degree of productivity.
German has various means of word formation. For nouns, the productive means are compounding, prefixation, suffixation, and implicit derivation; for verbs, prefixation, compounding, and derivation; for adjectives, suffixation and prefixation. The grammar, morphology, syntax, and semantics of the language should also be considered when describing the German word-formation system: derivation belongs to morphology but is related to vocabulary and semantics, so developments in the meanings carried by word-forming morphemes must be taken into account. German grammar has many distinctive features, yet the German word-formation system is very similar to that of other languages. This paper analyzed these studies comparatively and drew attention to some of their controversial results. Taking the whole system of word formation into account, compounding plays the main role in German word formation and remains the leading method at present.
Prefixation, like compounding, is a very old but productive method of word formation, and almost all available prefixes are productive in modern German. Suffixation can also be classed as a productive method of word formation in modern German. However, despite a number of features shared by suffixes and semi-suffixes, semi-suffixes have visible advantages over suffixes in word formation.
First, many variants of extended suffixes are unproductive, for example -aner, -aster, -iener, -eiser, -ianer, -iter, -ner, -ser, etc. Second, semi-suffixes, which retain part of the deep semantic structure (biological gender, etc.), are more informative, and the main purpose of communication is the transfer of information. Unlike suffixation, word formation by changing the root is unproductive as an independent method in the modern language, although words formed in this way are very numerous.
Welcome
In my bachelor thesis I trained German word embeddings with gensim’s word2vec library and evaluated them with generated test sets. This page gives an overview of the project and provides download links for scripts, source files, and evaluation files. The whole project is licensed under the MIT license.
Training and Evaluation
I found the following parameter configuration to be optimal for training German language models with word2vec:
- a corpus as big as possible (and as diverse as possible without being informal)
- filtering of punctuation and stopwords
- forming bigram tokens
- using skip-gram as training algorithm with hierarchical softmax
- window size between 5 and 10
- dimensionality of feature vectors of 300 or more
- using negative sampling with 10 samples
- ignoring all words with total frequency lower than 50
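As a rough illustration, the settings listed above could be mapped onto gensim's `Word2Vec` constructor as sketched below. This is an assumption-laden sketch, not the project's training.py: the helper names `word2vec_kwargs` and `train` are hypothetical, and the keyword `vector_size` assumes gensim 4 or later (older releases call it `size`).

```python
def word2vec_kwargs(use_legacy_names=False):
    """Return the settings from the list above as gensim Word2Vec keyword arguments.

    Hypothetical helper for illustration; not part of the original toolkit.
    """
    # gensim >= 4 uses "vector_size"; older releases used "size".
    size_key = "size" if use_legacy_names else "vector_size"
    return {
        size_key: 300,    # dimensionality of the feature vectors
        "window": 5,      # context window (the list suggests 5 to 10)
        "min_count": 50,  # ignore words with total frequency lower than 50
        "sg": 1,          # 1 selects skip-gram
        "hs": 1,          # 1 enables hierarchical softmax
        "negative": 10,   # negative sampling with 10 samples
    }


def train(sentences, **overrides):
    """Train a model on an iterable of token lists (assumes gensim is installed)."""
    # Imported lazily so the settings can be inspected without gensim.
    from gensim.models import Word2Vec

    params = {**word2vec_kwargs(), **overrides}
    return Word2Vec(sentences, **params)
```

With `min_count` set to 50, a toy corpus would leave the vocabulary empty, so a quick experiment would need an override such as `train(sentences, min_count=1)`.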
The following table shows some training stats for training a model with the above specification:
training time | 6.16 h |
training speed | 26,626 words/s |
vocab size | 608,130 words |
corpus size | 651,219,519 words |
model size | 720 MB |
To train this model, download this toolkit, navigate to its directory, and run the following snippets, which use the preprocessing.py and training.py scripts.
Make working directories:
mkdir corpus
mkdir model
Build news corpus:
wget http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.de.shuffled.gz
gzip -d news.2013.de.shuffled.gz
python preprocessing.py news.2013.de.shuffled corpus/news.2013.de.shuffled.corpus -psub
rm news.2013.de.shuffled
Build wikipedia corpus:
wget http://download.wikimedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2
wget http://medialab.di.unipi.it/Project/SemaWiki/Tools/WikiExtractor.py
python WikiExtractor.py -c -b 25M -o extracted dewiki-latest-pages-articles.xml.bz2
find extracted -name '*bz2' -exec bzip2 -k -c -d {} \; > dewiki.xml
printf "Number of articles: "
grep -o "<doc" dewiki.xml | wc -w
sed -i 's/<[^>]*>//g' dewiki.xml
rm -rf extracted
python preprocessing.py dewiki.xml corpus/dewiki.corpus -psub
rm dewiki.xml
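The configuration list mentions filtering punctuation and stopwords, and the download section refers to transformed German umlauts. A minimal sketch of that kind of cleaning is shown below; it is not the project's preprocessing.py. The stopword list is a tiny hypothetical sample, the `preprocess` function name is invented for illustration, and bigram formation is left out.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in"}

# Transliterate umlauts and sharp s (e.g. "Gebäude" becomes "Gebaeude").
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})

# German quotation marks are not part of string.punctuation, so add them.
PUNCTUATION = str.maketrans("", "", string.punctuation + "„“”«»")


def preprocess(line):
    """Strip punctuation, drop stopwords, and transliterate umlauts."""
    without_punctuation = line.translate(PUNCTUATION)
    tokens = [t for t in without_punctuation.split() if t.lower() not in STOPWORDS]
    return [t.translate(UMLAUTS) for t in tokens]


print(preprocess("Das Gebäude ist eine Bibliothek."))
```

Stopwords are filtered before transliteration so that the stopword list can be kept in ordinary German spelling.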
Training:
python training.py corpus/ model/my.model -s 300 -w 5 -n 10 -m 50
Subsequently the evaluation.py script can be used to evaluate the trained model:
python evaluation.py model/my.model -u -t 10
Further examples and code explanations can be found in the project's IPython notebooks.
Semantic arithmetic
Basic vector arithmetic makes it possible to show which word meanings the model can represent. Vectors are added or subtracted, and cosine similarity is then used to find the vector(s) nearest to the result. Some interesting examples follow:
Frau + Kind = Mutter (0.831)
Frau + Hochzeit = Ehefrau (0.795)
A common family relationship: a woman with a child added is a mother. In word2vec terms: adding the vector of child to the vector of woman results in a vector which is closest to mother, with a comparatively high cosine similarity of 0.831. In the same way, a woman with a wedding results in a wife.
Obama - USA + Russland = Putin (0.780)
The model is able to find the leader of a given country. Here Obama without USA is the feature for a country leader. Adding this feature to Russia results in Putin. It’s also successful for other countries.
Verwaltungsgebaeude + Buecher = Bibliothek (0.718)
Verwaltungsgebaeude + Buergermeister = Rathaus (0.746)
Haus + Filme = Kino (0.713)
Haus + Filme + Popcorn = Kino (0.721)
The relationship between a building and its function is found correctly. Here an administration building with books is logically the library, and an administration building with a mayor is the city hall. Moreover, a house with movies results in a cinema. Note that when adding popcorn to the equation, the resulting vector gets a little closer to the vector of the word cinema.
Becken + Wasser = Schwimmbecken (0.790)
Sand + Wasser = Schlamm (0.792)
Meer + Sand = Strand (0.725)
Some nice examples with water: sand and water result in mud, sea and sand result in beach, and a basin with water is a pool.
Planet + Wasser = Erde (0.717)
Planet - Wasser = Pluto (0.385)
The main feature of our planet is correctly represented by the model: a planet with water is the earth, while a planet without water is Pluto. That’s not quite accurate, because about one third of Pluto is water ice…
Kerze + Feuerzeug = brennende_Kerze (0.768)
Here is quite a good example of a semantically correct guess of a bigram token: candle and lighter result in a burning_candle.
The examples shown above are the results of a quick manual search for useful vector equations in the model. There are more amazing semantic relations for sure.
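The arithmetic behind the equations in this section can be sketched in a few lines of plain Python. The three-dimensional vectors below are invented toy values, not taken from the trained model, and the `nearest` helper is hypothetical; gensim's own `most_similar` additionally length-normalizes the input vectors, which this sketch skips.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(vectors, positive, negative=()):
    """Add the positive and subtract the negative word vectors, then return
    the remaining word whose vector is most similar to the result."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    used = set(positive) | set(negative)
    scores = {w: cosine(target, v) for w, v in vectors.items() if w not in used}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy vectors (assumed values for illustration only).
TOY = {
    "Frau": [1.0, 0.0, 0.2],
    "Kind": [0.0, 1.0, 0.2],
    "Mutter": [0.9, 0.9, 0.3],
    "Planet": [0.0, 0.1, 1.0],
}

word, score = nearest(TOY, positive=["Frau", "Kind"])
print(word, round(score, 3))
```

Words that appear in the query are excluded from the candidates, since otherwise an input word would often be its own nearest neighbour.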
Visualizing features with PCA
Principal Component Analysis (PCA) is a method for reducing the number of dimensions of high-dimensional vectors while keeping their main features (the principal components). The 300 dimensions of the vectors of my German language model were reduced to a two-dimensional representation and plotted with Python's matplotlib for some word classes.
Here british_pounds is more accurate than just pounds because the word has multiple meanings. The same applies to US-Dollar and Dollar.
The plots above are created with the visualize.py script of this project. Some further examples and code explanation can be found in the PCA ipython notebook.
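To make the dimensionality reduction concrete, here is a small dependency-free sketch of PCA to two dimensions. It is not the project's visualize.py: the `pca_2d` function is hypothetical, it finds the components by power iteration with deflation (which assumes the two leading eigenvalues of the covariance matrix are distinct), and the input rows are toy values rather than 300-dimensional word vectors.

```python
import math


def pca_2d(rows, iters=200):
    """Project rows (lists of floats) onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(2):
        # Deterministic start vector that is not aligned with any axis.
        start = [float(i + 1) for i in range(d)]
        length = math.sqrt(sum(x * x for x in start))
        v = [x / length for x in start]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove this direction so the next pass finds the following one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Toy data: most variance along the first axis, less along the second.
points = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
for x, y in pca_2d(points):
    print(round(x, 3), round(y, 3))
```

The resulting two-dimensional coordinates are what would then be handed to a plotting library such as matplotlib.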
Download
Model
The German language model, trained with word2vec on the German Wikipedia (15th May 2015) and German news articles (15th May 2015):
german.model [704 MB]
Syntactic Questions
10k questions with German umlauts:
syntactic.questions
The same 10k questions with transformed German umlauts:
syntactic.questions.nouml
Evaluation source files:
adjectives.txt
nouns.txt
verbs.txt
Semantic Questions
300 opposite questions with German umlauts:
semantic_op.questions
The same 300 opposite questions with transformed German umlauts:
semantic_op.questions.nouml
540 best match questions with German umlauts:
semantic_bm.questions
The same 540 best match questions with transformed German umlauts:
semantic_bm.questions.nouml
110 doesn’t fit questions with German umlauts:
semantic_df.questions
The same 110 doesn’t fit questions with transformed German umlauts:
semantic_df.questions.nouml
Evaluation source files:
opposite.txt
bestmatch.txt
doesntfit.txt
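The format of the "doesn't fit" question files is not shown on this page, so the following is only a sketch of how such a question could be answered with word vectors: the word least similar to the mean of the group is picked as the outlier. The `doesnt_fit` helper and the two-dimensional toy vectors are assumptions for illustration; gensim offers a comparable `doesnt_match` method that works on length-normalized vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def doesnt_fit(words, vectors):
    """Return the word whose vector is least similar to the group mean."""
    dims = len(vectors[words[0]])
    mean = [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]
    return min(words, key=lambda w: cosine(vectors[w], mean))


# Toy vectors (assumed values for illustration only).
TOY_VECTORS = {
    "Hund": [1.0, 0.1],
    "Katze": [0.9, 0.2],
    "Maus": [0.8, 0.3],
    "Auto": [0.1, 1.0],
}

print(doesnt_fit(["Hund", "Katze", "Maus", "Auto"], TOY_VECTORS))
```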