German word for model

The German word for model is
Modell

model

Gender

The gender of Modell is neuter. E.g. das Modell.

Plural

The plural of Modell is Modelle.

German Definition

model
     Substantiv:
     [1] Abbild, Form, Modell, Vorbild
          [1] „A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way. All models are in simulacra, that is, simplified reflections of reality, but, despite their inherent falsity, they are nevertheless extremely useful.“
            Ein wissenschaftliches Modell versucht, empirische Objekte, Phänomene und physikalische Prozesse auf eine logische und objektive Art und Weise darzustellen. Alle Modelle sind in Simulacra, das heißt vereinfachte Abbilder der Realität, aber trotz ihrer innewohnenden Falschheit sind sie dennoch extrem nützlich.

Translations for model and their definitions

Model
     n-n. (person who serves as a subject for fashion) model
Vorbild
     n-n. model (praiseworthy example)
Modell
     n-n. A model of an object
     n-n. A theoretical model
Mannequin
     n-n. (person who serves as a subject for fashion) model

From Wikipedia, the free encyclopedia

Modell is the German word for «model» and also a surname. It may refer to:

People

  • Arnold Modell (1924–2022), American professor of social psychiatry
  • Art Modell (1925–2012), American business executive and sports team owner
  • Bernadette Modell (born 1935), British geneticist
  • David Modell (1961–2017), American business executive and sports team owner
  • Frank Modell (1917–2016), American cartoonist
  • Merriam Modell (1908–1994), American author of pulp fiction
  • Pat Modell (1931–2011), American TV actress
  • Rod Modell, given name for Deepchord, electronic music producer from Detroit, Michigan
  • William Modell (1921–2008), American businessman and chairman of Modell’s Sporting Goods

Companies

  • Modell’s, a sporting goods retailer based in New York City
  • Modell (pawn shop), a pawnbroker based in New York City, originally formed as a spinoff of the sporting goods company
  • Schabak Modell, a die-cast toy producer in Germany
  • Schuco Modell, a die-cast toy producer in Germany

Media and entertainment

  • Das Modell, a song recorded by the electro-pop group Kraftwerk
  • Modell Bianka, a 1951 East German film
  • Robert Patrick Modell, a character in episodes of the TV series The X-Files

Other uses

  • Berliner Modell, a learning theory
  • Modell M and Modell S, types of Mauser bolt-action rifles
  • V-Modell, a software development model

See also

  • Model (disambiguation)
  • Modella, Victoria, a rural locality in Australia
  • Micky Modelle, a music DJ and producer
  • Modello, the Italian word for «model» or preparatory study for a work of art

Model Meaning in German

You searched for the meaning of the English word model in German: Modell. The meaning of model has been searched 5195 (five thousand one hundred and ninety-five) times as of 4/13/2023. You can also find the meaning and translation of model in Urdu, Hindi, Arabic, Spanish, French and other languages.

Definition & Synonyms

• Model
Definition & Meaning

  1. (n.) Something intended to serve, or that may serve, as a pattern of something to be made; a material representation or embodiment of an ideal; sometimes, a drawing; a plan; as, the clay model of a sculpture; the inventor's model of a machine.
  2. (n.) A person who poses as a pattern to an artist.
  3. (n.) Any copy, or resemblance, more or less exact.
  4. (n.) That by which a thing is to be measured; standard.
  5. (n.) A miniature representation of a thing, with the several parts in due proportion; sometimes, a facsimile of the same size.
  6. (v. i.) To make a copy or a pattern; to design or imitate forms; as, to model in wax.
  7. (v. t.) To plan or form after a pattern; to form in model; to form a model or pattern for; to shape; to mold; to fashion; as, to model a house or a government; to model an edifice according to the plan delineated.
  8. (a.) Suitable to be taken as a model or pattern; as, a model house; a model husband.
  9. (n.) Anything which serves, or may serve, as an example for imitation; as, a government formed on the model of the American constitution; a model of eloquence, virtue, or behavior.

Multi Language Dictionary

Verbs

    to model so./sth. (modeled/modelled, modeled/modelled) | jmdn./etw. modellieren (modellierte, modelliert)
    to model sth., e.g. behaviour (modeled/modelled, modeled/modelled) | etw. (acc.) vorleben (lebte vor, vorgelebt)
    to model sth., e.g. clay (modeled/modelled, modeled/modelled) | etw. (acc.) formen (formte, geformt)
    to model sth. on sth. | etw. (nom.) nach etw. (acc.) modellieren (modellierte, modelliert)
    to model sth., i.e. a system or process (modeled/modelled, modeled/modelled) | ein Modell von etw. (dat.) erstellen
    to model sth. (modeled/modelled, modeled/modelled) | etw. (acc.) entwickeln (entwickelte, entwickelt)
    to model oneself on so. | jmdn. nachahmen (ahmte nach, nachgeahmt), i.e. jmdn. zum Vorbild nehmen
    to model oneself on so. | sich (dat.) jmdn. zum Vorbild nehmen
    to act as a model to a painter | einem Maler Modell sitzen
    to act as a model to a painter | einem Maler Modell stehen



1. Introduction

The vocabulary of a language, and the word formation mechanism that serves it, provides a wide range of material for observation, reflection and generalization. As a language functions, some words disappear, others appear, still others shift in meaning, and others change their stylistic status. Historical lexicology shows that words are created from material that actually exists in the language and according to models that are productive in the modern language. The main ways of developing the vocabulary of German are word formation, change of meaning (which leads to the appearance of homonyms) and borrowing. Each of these paths has its own characteristics. Through word formation and change of meaning, the language is enriched with new words based on words that already exist in it. Through borrowing, the vocabulary of one language is enriched by the vocabulary of another.

2. Word Formation in German

Word formation in German has been studied by many foreign and domestic linguists. Major contributions were made by E. S. Kubryakova, K. A. Levkovskaya, R. Z. Muriassov, M. D. Stepanova, V. Fleischer, V. Hentzen, T. Shippan and G. Schmidt (Schmidt, 2005). Word formation, along with borrowing, is the most important way of enriching the vocabulary of a language. Word-building constructions are created from morphemic and lexical material by analogy with existing lexemes, that is, following models and patterns. Researchers distinguish different sets of word-formation models, since linguistics has no single interpretation and definition of the word-formation model as a unit of word formation (Stepanova, 2007). The division is based on the types of word-building elements, their combination and the resulting word-formative meaning. Development in word formation consists not in the emergence of new ways of forming words but in the predominant use of one model or another (Stepanova, 2007).

Considering the word formation system as a whole, compounding plays the major role in German and is currently the leading way of forming words. Compounding is a multifaceted and highly complex phenomenon: on the one hand it is often intertwined with affixation and other means of word formation, and on the other hand it borders on syntax. Compounding is especially productive for German nouns, which show a great variety of morphological structure. Prefixation, like compounding, is a very old but still productive way of word formation, and almost all available prefixes are productive in modern German. Suffixation is also productive in modern German. However, although suffixes and semi-suffixes share a number of features, semi-suffixes have clear advantages over suffixes in word formation. Firstly, many variants of the extended suffixes are unproductive, for example -aner, -aster, -iener, -eiser, -ianer, -iter, -ner, -ser, and foreign-language affixes hardly ever combine with German bases. Secondly, semi-suffixes retain part of their deep semantic structure (biological gender, etc.) and are therefore more informative, and the main goal of communication is the transfer of information. Unlike suffixation, word formation by changing the root is unproductive as an independent method in modern German, although words formed in this way are very numerous.

The interpenetration of the exogenous and native German word formation systems shows in the functioning of a unit called the confix. The term “confix” was proposed by G. Schmidt in 1987 in a study of bound morphemes that built on the work of E. Fischer and the French linguist A. Martinet. Schmidt treats confixes as a kind of frequently combining element that is not used as a separate lexeme. In studies of the last decade by both foreign and Russian scholars, however, the confix is recognized as a separate word-forming unit within exogenous word formation. Most confixes are of Greek or Latin origin (aero-, biblio-, diskut-, fanat-, neo-, polit-, -drom, -graph, -krat, -naut, -phil, -phob, etc.). V. Eins, however, suggests that native German confixes (for example schwieger-, stief-, -wart) should also be singled out (Eins, 2009).

Considering the word formation system as a whole, compounding plays the main role in German word formation and has remained productive to the present day. In the 19th and 20th centuries the two-member model prevails, while the three-member model grows in productivity from isolated cases to regular use.

3. The Most Widespread Composites in the News and Specialized Texts

The most topical and innovative part of the vocabulary consists of modern German neologisms. As an expression of the spirit of the time, they indirectly reflect the mentality prevailing in society. The appearance of a large number of new words in modern German was caused primarily by geopolitical changes in Europe: first German reunification and related events (die Osterweiterung, der Solidaritätszuschlag, die Ostalgie, Dunkeldeutschland), then European integration (das Euroland, das Eurogeld, die Eurozone, Teuro), the war in the Balkans (der Kosovo-Krieg, der Blauhelmeinsatz, der Kollateralschaden) and the increased activity of terrorist organizations (der Anti-Terror-Krieg, die Milzbrandattacke, der Schläfer). Events in German domestic politics also prompted a number of new words: names of new bills, political programs and reform projects (Job-Floater, Riester-Rente) and of economic realities (die Öko-Steuer, Ein-Euro-Job, das Sparpaket). Technical progress, scientific achievements and new products of human activity are reflected in corresponding lexical innovations, for example die Datenautobahn, der Stammzellenimport and die Organspende. Various linguistic factors contribute to the growth of modern German vocabulary through neologisms, but the most productive of them is compounding.

The German compound noun is notable for the diversity of its morphological structure. The determining component can be a stem belonging to various parts of speech: noun, adjective, verb or numeral. German has compounds whose first component is a verb stem, for example die Bewegungskraft (a driving force), das Tragbett (a portable bed), die Nähmaschine (a sewing machine). Compounds exist even where an equivalent phrase is available, for example goldene Uhr = Golduhr. The possibilities for combining different stems in a German compound are as unlimited as the variety of word combinations in syntactic phrases (Münch, 1990).

Scholars hold that, formally, the possibility of forming compounds in German is theoretically unlimited. There are nevertheless some semantic limitations, for example das Brigittenlächeln (a smile like Brigitte's) or das Gefangenauto (a vehicle in which prisoners were transported). Without context, the meaning of such words cannot be understood. The semantic limitation is therefore that in many cases external context is needed to interpret a compound noun. Compounding is found in German not only in the noun system but also in the verb and adjective systems. Among adjectives, German shows a large variety of structural models of compound adjectives and of the semantic relations they express (Müller, 2005).

Thus, compound adjectives expressing relations of comparison or intensification are widespread in German, for example kreideweiß, bleischwer, todunglücklich and others.

According to A. N. Zuev's dictionary of word-building elements, there are 71 noun prefixes. Almost all available prefixes are productive in modern German. The semantic feature of prefixation is that the prefix generalizes a certain attribute characteristic of a number of objects or phenomena. For example, the prefixes un- and miß- express negation, and the prefix ur- expresses the idea of something ancient or original. Other frequent and productive noun prefixes include ge- and erz-.

Suffixation, like prefixation, is an ancient and at the same time productive way of word formation in modern German.

The suffix, like the prefix, is a word-building morpheme, that is, a meaningful unit of language that does not occur in the modern language as an independent lexical unit. The suffix not only creates a new word but also marks its part of speech and in many cases precisely determines its grammatical behaviour. Noun suffixes simultaneously determine gender, declension type and plural formation.

Suffixation is thus more closely connected with grammar and morphology than other methods of word formation. Like the prefix, the suffix is a characteristic marker of lexical generalization, since it indicates that a concept belongs to a particular class of concepts (Donalies, 2006).

Thus, the noun suffixes -er and -in indicate that a word belongs to the class of words denoting male or female persons, for example:

Lehrer, Lehrerin; Manager, Managerin; Chef, Chefin. Word formation by suffixation is a stable way of enriching the vocabulary of German. The suffixes of German form a system that is stable at its core but changes and grows as the language develops. At present there are about 50 noun suffixes.

Frequent and productive suffixes include -ling, -heit, -er, -chen and -tum (Donalies, 2007).

Compounding and affixation are closely related. Many affixes of modern German arose from components of compounds. This process is natural and historically motivated. Suffixes such as -schaft, -heit, -tum, -sam and -bar arose when the second component of a compound lost its lexical meaning (desemantization). Suffixes do not always retain their sound shape. For example, -schaft and -tum derive from the Old German nouns scaft and tuom, meaning “Beschaffenheit”, “Zustand”, “Eigenschaft”. Since native speakers no longer recognize the meaning of the second component, it serves only as a means of word formation and has no lexical meaning of its own. A suffix can acquire various other meanings over time. In the Middle German period the suffix -heit meant “Weise”, “Art”, “Lage”. This meaning still exists today in some dialects (Duden, 2000).

Among the verbs analyzed, compounding is the most productive way of word formation, as in German as a whole. Compound verbs make up the largest share of new words (47% of the total sample). Depending on the number of components involved in the formation, there are two-component and multicomponent (three or more components) models, which in turn are characterized by determinative or non-determinative relations. Fleischer assigns the latter to copulative word formation, for example grinskeuchen, rollrasseln (Fleischer, 1995). In our material only two-component determinative compounds occur, for example fernheizen (“heat”) and sich totarbeiten (“work to the point of exhaustion”). The first component modifies the second, which in turn supplies the general morphological and semantic-categorial characteristics of the compound: fettfüttern is a compound verb meaning “fatten”.

Various parts of speech can serve as the first component of a compound verb. Compound verbs whose first component is an adverb are widespread in modern German (75% of all compound verbs in the sample). The most productive first components are weg- (wegdösen, “fall asleep”), weiter- (weiterverhandeln, “to continue negotiation”), zusammen- (zusammenmixen, “mix”), rein- (reinhämmern, “work hard”), heraus- (herauszüchten, “to breed, raise (a new breed of animals)”), herum- (herumkommandieren, “to order someone around all the time”), herunter- (herunterstufen, “to place in a lower category”) (Fleischer, 1995).

Second in productivity is the adjective (22% of all compound verbs in the sample), for example losträllern (“to sing (a song) without text and clear articulation of the words”), hochhasten (“to hurry”). Compound verbs with a noun as first component account for 3%, for example mondlanden (“to land on the moon”). In our material, frequently recurring first components are more common (68% of all compound verbs) than frequently recurring second components (22%). The stock of compound verbs is also growing considerably because fixed word combinations tend to turn into compound verbs, for example sich heißreden (“to talk oneself into a rage (during a conversation)”), sich querlegen (“to resist (someone or something)”).

Semi-prefixation accounts for 26% of all verbs studied and ranks second in productivity after compounding. This group is dominated by verbs formed with semi-prefixes that slightly modify the meaning of the base verb, for example ab- (abfilmen, “(cinema, colloquial) to film, shoot”; compare filmen, “to make (a film), to shoot something for the cinema”), ein- (einkurven, “(aviation) to turn in for landing”; compare kurven, “(aviation) to make a turn”), an- (anbaden, “to open the bathing season”; compare baden, “to bathe, wash”), durch- (durchleiden, “to live through (a certain time or situation) in suffering”; compare leiden, “to suffer from something, to tolerate”), aus- (auspennen, “to have a good night’s sleep”; compare pennen, “to sleep; to spend the night”), mit- (mitliefern, “to deliver together with something”; compare liefern, “to deliver (goods)”). Semi-prefixes that also occur with nouns can be used to form verbs as well, for example zwischennutzen, “to use in between something” (compare nutzen, “to be useful to someone or something, to benefit, to help”).

The remaining methods of verb formation, namely prefixation, suffixation and verbalization, are less well represented in modern German.

The most productive prefixes used to form verbs (prefixation accounts for 5% of all units studied) are ver-, be-, ent-, er- and zer-, for example verchartern, “to lease out a ship or plane” (compare chartern, “to charter a vessel”), entkalken, “to remove calcareous deposits” (compare kalken, “[...] 2) (agriculture) to fertilize with lime”), beknien, “to ask strongly and persistently” (compare knien, “to kneel”).

Prefixes of foreign origin such as de-, kor-, re- and inter-, however, occur only in verbs built on a borrowed model (from foreign elements) with the suffix -ier- (1.8% of all verbs), for example renaturieren, “to restore to a natural state, recultivate”, depopulieren, “depopulate”.

Among verbalizations (9%), the most productive model forms verbs from noun stems, for example birnen, “to feel the effect of a drug or alcohol” (die Birne: “1) pear (fruit); 2) (electric) light bulb”). Verbs are also formed from adjective stems: “mucht”, glossed as “an unpleasant smell comes from him” (“mucht”: “1. hungry; 2. weak, languid, tired, exhausted”).

Verbs with a wide range of meanings are formed from noun stems. In analyzing verbs of this type we followed the semantic classification given by Duden (Duden, 2000).

As a result of the study, the following groups were identified:

1) verbs of comparison formed from names of persons: flapsen, “to fool around” (der Flaps); eumeln, “to celebrate, be mischievous and joyful, have fun” (der Eumel, “a young, handsome man”); and from names of animals: dackeln, “to go (slowly)” (der Dackel, “dachshund (dog breed)”);

2) abstract verbs based on abstract nouns, for example boomen, “to experience a boom (a sharp rise)” (der Boom, “boom, (sharp) rise”); fighten, “(sport) to fight persistently (actively)” (der Fight, “competition”);

3) ornative verbs from abstract nouns: punchen, “to beat” (der Punch); stressen, “to create (physical, psychological) stress” (der Stress, “stress, state of stress”);

4) verbs with instrumental meaning formed from names of objects: faxen, “to send by fax” (das Fax, “fax”); düsen, “to ride a motorcycle” (die Düse, “(technical) nozzle; jet; mouthpiece”);

5) locative verbs based on nouns denoting a place or direction: saunen, “to steam in a (Finnish) sauna” (die Sauna, “sauna, Finnish sauna”); shoppen, “to buy” (der Shop, “shop”).

Some verbs have several meanings at once: liften, 1) “to go up on a ski lift” (instrumental); 2) “to inspire; strengthen, enhance” (ornative) (Duden, 2000).

New verbs are formed from adjective stems much less often.

4. Conclusion

Word formation is the most important way of enriching the vocabulary of a language. The basic methods of word formation are changing the root of the word, converting a word from one lexical-grammatical class to another, compounding, prefixation and suffixation. These methods differ, however, in their degree of productivity.

German has several ways of forming words. For nouns the productive ways are compounding, prefixation, suffixation and implicit derivation; for verbs, prefixation, compounding and derivation; for adjectives, suffixation and prefixation. Attention should also be paid to the grammar, morphology, syntax and semantics of the language, on which the word formation system depends: derivation belongs to morphology but is related to vocabulary and semantics, so the meanings carried by word-forming morphemes and the developments affecting them have to be considered. The grammar of German has many special features, yet its word formation system is very similar to that of other languages. This paper analyzed these studies comparatively and drew attention to some of their controversial results. Considering the word formation system as a whole, compounding plays the main role in German word formation and remains the leading way of forming words at present.

Prefixation, like compounding, is a very old but productive way of word formation, and almost all available prefixes are productive in modern German. Suffixation is also a productive way of word formation in modern German. However, although suffixes and semi-suffixes share a number of features, semi-suffixes have visible advantages over suffixes in word formation.

First, many variants of extended suffixes are unproductive, for example -aner, -aster, -iener, -eiser, -ianer, -iter, -ner, -ser, etc. Secondly, semi-suffixes retain part of their deep semantic structure (biological gender, etc.) and are therefore more informative, and the main purpose of communication is the transfer of information. Unlike suffixation, word formation by changing the root is unproductive as an independent method in modern German, although words formed in this way are very numerous.

Welcome

In my bachelor thesis I trained German word embeddings with gensim’s word2vec library and evaluated them with generated test sets. This page offers an overview of the project and download links for scripts, source and evaluation files. The whole project is licensed under the MIT license.

Training and Evaluation

I found the following parameter configuration to be optimal for training German language models with word2vec:

  • a corpus as big as possible (and as diverse as possible without being informal)
  • filtering of punctuation and stopwords
  • forming bigram tokens
  • using skip-gram as training algorithm with hierarchical softmax
  • window size between 5 and 10
  • dimensionality of feature vectors of 300 or more
  • using negative sampling with 10 samples
  • ignoring all words with total frequency lower than 50
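
The configuration above can be written down as keyword arguments for gensim's `Word2Vec` class. This is only a sketch and not the project's training.py: the argument names are those of gensim 4 (older releases call `vector_size` `size`), the window of 5 is one choice from the suggested range of 5 to 10, and enabling hierarchical softmax and negative sampling together is an assumption based on both being listed. The helper names `word2vec_kwargs` and `train` are made up for illustration.

```python
def word2vec_kwargs():
    """Settings mirroring the list above, using gensim 4 parameter names."""
    return {
        "sg": 1,             # skip-gram instead of CBOW
        "hs": 1,             # hierarchical softmax
        "negative": 10,      # negative sampling with 10 noise words
        "window": 5,         # context window; the list suggests 5 to 10
        "vector_size": 300,  # dimensionality of the feature vectors
        "min_count": 50,     # ignore words with total frequency below 50
    }


def train(sentences):
    """Train on an iterable of token lists. Requires gensim to be installed."""
    # Imported lazily so the settings can be inspected without gensim.
    from gensim.models import Word2Vec
    return Word2Vec(sentences=sentences, **word2vec_kwargs())
```

Calling `train` with an iterable of tokenized sentences would return a model whose vectors are available under `model.wv`.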

The following table shows some training stats for training a model with the above specification:

training time 6,16 h
training speed 26626 words/s
vocab size 608.130 words
corpus size 651.219.519 words
model size 720 MB

To train this model, download the toolkit, change to its directory and run the following snippets, which use the preprocessing.py and training.py scripts.

Make working directories:

mkdir corpus
mkdir model

Build news corpus:

wget http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.de.shuffled.gz
gzip -d news.2013.de.shuffled.gz
python preprocessing.py news.2013.de.shuffled corpus/news.2013.de.shuffled.corpus -psub
rm news.2013.de.shuffled.gz

Build wikipedia corpus:

wget http://download.wikimedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2
wget http://medialab.di.unipi.it/Project/SemaWiki/Tools/WikiExtractor.py
python WikiExtractor.py -c -b 25M -o extracted dewiki-latest-pages-articles.xml.bz2
find extracted -name '*bz2' ! -exec bzip2 -k -c -d {} \; > dewiki.xml
printf "Number of articles: "
grep -o "<doc" dewiki.xml | wc -w
sed -i 's/<[^>]*>//g' dewiki.xml
rm -rf extracted
python preprocessing.py dewiki.xml corpus/dewiki.corpus -psub
rm dewiki.xml

Training:

python training.py corpus/ model/my.model -s 300 -w 5 -n 10 -m 50

Subsequently the evaluation.py script can be used to evaluate the trained model:

python evaluation.py model/my.model -u -t 10

Further examples and code explanation can be found in the following ipython notebooks:

  • Preprocessing
  • Training
  • Evaluation
  • Semantic arithmetic

    With basic vector arithmetic it is possible to show the meaning of words that the model can represent. Vectors are added or subtracted, and cosine similarity is then used to find the vector(s) nearest to the result. Some interesting examples follow:

    Frau + Kind = Mutter (0,831)
    Frau + Hochzeit = Ehefrau (0,795)

    A common family relationship: a woman with a child added is a mother. In word2vec terms: adding the vector of child to the vector of woman results in a vector which is closest to mother with a comparatively high cosine similarity of 0,831. In the same way a woman with a wedding results in a wife.
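
The lookup described here can be sketched in plain Python. The three-dimensional vectors below are invented purely for illustration and are not taken from the trained model, and the helper names `cosine` and `nearest` are hypothetical. Libraries such as gensim may normalize vectors before combining them, so real scores can differ from this simplified version.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(vectors, positive, negative=()):
    """Return (word, similarity) for the word closest to
    sum(positive) - sum(negative), skipping the query words themselves."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    excluded = set(positive) | set(negative)
    candidates = ((w, cosine(target, v))
                  for w, v in vectors.items() if w not in excluded)
    return max(candidates, key=lambda pair: pair[1])


# Made-up toy vectors, for illustration only.
toy_vectors = {
    "Frau":   [1.0, 0.0, 0.2],
    "Kind":   [0.0, 1.0, 0.2],
    "Mutter": [0.9, 0.9, 0.3],
    "Haus":   [0.1, 0.0, 1.0],
}

word, score = nearest(toy_vectors, positive=["Frau", "Kind"])
print(word, round(score, 3))
```

With these toy vectors the nearest word to Frau + Kind is Mutter; a subtraction such as Obama - USA + Russland would pass `negative=["USA"]`.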

    Obama - USA + Russland = Putin (0,780)

    The model is able to find the leader of a given country. Here Obama without USA is the feature for a country leader. Adding this feature to Russia results in Putin. This also works for other countries.

    Verwaltungsgebaeude + Buecher = Bibliothek (0,718)
    Verwaltungsgebaeude + Buergermeister = Rathaus (0,746)
    Haus + Filme = Kino (0,713)
    Haus + Filme + Popcorn = Kino (0,721)

    The relationship between a building and its function is found correctly. Here an administration building with books is logically the library and an administration building with a mayor is the city hall. Moreover a house with movies results in a cinema. Note that when adding popcorn to the equation, the resulting vector gets a little closer to the vector of the word cinema.

    Becken + Wasser = Schwimmbecken (0,790)
    Sand + Wasser = Schlamm (0,792)
    Meer + Sand = Strand (0,725)

    Some nice examples with water: sand and water result in mud, sea and sand result in beach and a basin with water is a pool.

    Planet + Wasser = Erde (0,717)
    Planet - Wasser = Pluto (0,385)

    The main feature of our planet is correctly represented by the model: a planet with water is the earth, while a planet without water is Pluto. That’s not quite accurate, because about one third of Pluto is water ice…

    Kerze + Feuerzeug = brennende_Kerze (0,768)

    Here is quite a good example for a semantically correct guess of a bigram token: candle and lighter result in a burning_candle.

    The examples shown above are the results of a quick manual search for useful vector equations in the model. There are more amazing semantic relations for sure.

    Visualizing features with PCA

    Principal Component Analysis is a method for reducing the number of dimensions of high-dimensional vectors while keeping their main features (= the principal components). Here the 300 dimensions of the vectors of my German language model were reduced to a two-dimensional representation and plotted with Python’s matplotlib for some word classes.
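
To make the reduction step concrete, here is a dependency-free sketch of PCA that projects vectors onto their first two principal components. It is not the project's visualize.py, which presumably relies on a library implementation. The sketch finds the components by power iteration with deflation, and all function names and the sample points are made up for illustration. Power iteration can fail if the start vector happens to be orthogonal to the wanted component, so a real application should prefer a tested library routine.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every vector."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: approximate the largest eigenvalue and eigenvector."""
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(d) for j in range(d))
    return value, vec


def pca_2d(rows):
    """Project each vector onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    value, first = dominant_eigenpair(cov)
    # Deflation: remove the first component so that the second one dominates.
    deflated = [[cov[i][j] - value * first[i] * first[j] for j in range(d)]
                for i in range(d)]
    _, second = dominant_eigenpair(deflated)
    return [[sum(a * b for a, b in zip(row, first)),
             sum(a * b for a, b in zip(row, second))] for row in centered]


# Four made-up 3-dimensional points that vary mostly along the first axis.
points = [[7.0, 5.0, 5.0], [3.0, 5.0, 5.0], [5.0, 6.0, 5.0], [5.0, 4.0, 5.0]]
projected = pca_2d(points)
```

Each entry of `projected` is an (x, y) pair that could be passed to a plotting library together with the word it belongs to.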

    PCA: Capital of a country

    Countries and capitals are grouped correctly. The connecting lines are approximately parallel (except the one for Sweden maybe…) and of the same length. So the model understands the concept of capitals and countries.
    PCA: Currency of a country
    Countries and their currencies are also grouped correctly. As with the capitals, the concept of currencies of countries is well understood by the model. Note: british_pounds is more accurate here than just pounds because the word has multiple meanings. The same applies to US-Dollar and Dollar.
    PCA: Language of a country
    Finally another great example of grouped features with languages of countries.

    The plots above are created with the visualize.py script of this project. Some further examples and code explanation can be found in the PCA ipython notebook.

    Download

    Model

    The German language model, trained with word2vec on the German Wikipedia (15th May 2015) and German news articles (15th May 2015):
    german.model [704 MB]

    Syntactic Questions

    10k questions with German umlauts:
    syntactic.questions
    The same 10k questions with transformed German umlauts:
    syntactic.questions.nouml
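
The "transformed German umlauts" variant can be pictured with a small helper. The mapping of ä, ö and ü to ae, oe and ue matches the spellings used in the examples on this page (Verwaltungsgebaeude, Buecher, Buergermeister). Treating ß as ss is an assumption, and the function name `transliterate_umlauts` is hypothetical rather than part of the project's scripts.

```python
# Replacement table; the entry for "ß" is an assumption.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate_umlauts(text):
    """Replace German umlauts (and ß) with ASCII digraphs."""
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in text)


print(transliterate_umlauts("Verwaltungsgebäude"))
```

Applying the same function to every token of a question file would produce a `.nouml` style variant under these assumptions.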

    Evaluation source files:
    adjectives.txt
    nouns.txt
    verbs.txt

    Semantic Questions

    300 opposite questions with German umlauts:
    semantic_op.questions
    The same 300 opposite questions with transformed German umlauts:
    semantic_op.questions.nouml

    540 best match questions with German umlauts:
    semantic_bm.questions
    The same 540 best match questions with transformed German umlauts:
    semantic_bm.questions.nouml

    110 doesn’t fit questions with German umlauts:
    semantic_df.questions
    The same 110 doesn’t fit questions with transformed German umlauts:
    semantic_df.questions.nouml

    Evaluation source files:
    opposite.txt
    bestmatch.txt
    doesntfit.txt
