I have a txt file with several thousand words. However, when I try and pull them into a dictionary
variable in python 3.7 some words do not appear.
The dictionary file is here
For example:
dictionary = {}
with open("en-dict.txt", "r", encoding="utf8") as file:
for line in file:
line = file.readline().strip()
dictionary[line] = 0
if "eat" in dictionary:
print("Yes")
Why is this happening?
Thanks
asked Jun 18, 2019 at 20:06
EMLEML
3756 silver badges15 bronze badges
3
try with this code:
dictionary = {}
with open("en-dict.txt", "r", encoding="utf8") as file:
for line in file:
dictionary[line.split("n")[0]] = 0
print(dictionary)
if "eat" in dictionary:
print("Yes")
answered Jun 18, 2019 at 20:19
GhassenGhassen
7716 silver badges14 bronze badges
2
At first: This is a really nice question for an interview. You can show off your knowledge of data structures and algorithms here, but you can also demonstrate how familiar you are with your programming language and concepts like test driven development for example.
So, what went wrong here? At the first sight, your solution looks «ok». If you look a little closer, there are some issues:
Data Structures
You were pretty spot on when choosing a tree here. You could have done a little better by calling it with its common name, Trie. That comes from retrieval and is described in detail over here.
This data structure in its simplest form (we don’t need any more for solving your problem) consists of trie nodes that have 1.) children where each child is assigned a character and 2.) a marker if this node is the last character of a word (so that we can distinguish a prefix of a word and a complete word).
You tried printing a graph as an example, which unfortunately got lost in formatting. Here is one from Wikipedia:
In your solution, you have used two HashSets (of TreeNodes and chars). I don’t understand what the Index set is for, as you didn’t include your code for adding nodes. The bigger problem I see here is the type HashSet itself: They describe unordered lists of stuff («sets»), and you are trying to access them using an indexer (e.g. root.Nodes[str[currentIndex]]
). This will not only give unpredicted results, it won’t even compile. There is no indexer on HashSet.
What you actually want to use here is a Dictionary. Looking at the graph above, we seem to be navigating along using characters. That means, in every node, having one character we can determine the next node to go to. A dictionary can do that for us.
The whole thing
Rather than trying to come up with some working code during the interview, I would have talked about the architectural and algorithmic features of the to-be-built solution. I guess, if you had written out some unit tests that specify the behavior of the system it would have come out better than writing code that obviously won’t even compile.
Let’s see how we can easily build the whole thing using TDD (Test Driven Development), writing tests first and then bringing them from red to green by developing and refactoring our Trie.
Please create an empty Unit Test Project and create a new file in there. Add the NuGet package «FluentAssertions». You can paste each method stepwise to see how the solution matures. If there is a [TestMethod]
attribute above, add it to the TrieTests
class. If there is no attribute, add or update the method in TrieNode
.
We start with:
[TestClass]
public class TrieTests
{
private readonly TrieNode trie = new TrieNode();
[TestMethod]
public void AddingDoesntThrow()
{
this.trie.Add("word");
}
}
public class TrieNode
{
public void Add(string blah)
{
throw new NotImplementedException();
}
}
Ok. We need to add stuff to the trie, so we have an Add. Test will be red. Update Add method:
public void Add(string blah)
{
}
Bam, test is green now. Next one! (Don’t do more that you need to make your test green. Deleting the NotImplementedException is enough for now.)
[TestMethod]
public void FindEmptyString()
{
this.trie.Add("");
this.trie.Find("").Should().BeTrue();
}
public bool Find(string query)
{
throw new NotImplementedException();
}
Test is red.
public bool Find(string query)
{
return true;
}
Test is green! But wait. What if there is nothing in the trie?
[TestMethod]
public void FindInEmptyTrieReturnsFalse()
{
this.trie.Find("").Should().BeFalse();
}
Test is red, because we have no logic at all. Let’s add the isWord
boolean field and use it:
public class TrieNode
{
private bool isWord;
public void Add(string s)
{
if (s.Length == 0)
{
this.isWord = true;
}
}
public bool Find(string query)
{
if (query.Length == 0)
{
return this.isWord;
}
return false;
}
}
Okay, our trie works with empty strings! What happens if somebody passes null
?
[TestMethod]
public void FindWithNullQueryThrows()
{
Action find = () => this.trie.Find(null);
find.ShouldThrow<ArgumentNullException>();
}
Test is red. We need to check the argument and throw an exception:
public bool Find(string query)
{
if (query == null)
{
throw new ArgumentNullException("query");
}
if (query.Length == 0)
{
return this.isWord;
}
return false;
}
Let’s add support for real words now:
[TestMethod]
public void FindWordReturnsTrue()
{
this.trie.Add("word");
this.trie.Find("word").Should().BeTrue();
}
This is the tricky part. To make this work, we have to add the aforementioned dictionary of child nodes. Adding «word» creates child nodes for each character; finding «word» checks if the next child node exists and recurses down the tree: A trie node finds «word» if it has a child node «w» and this child node finds «ord».
public class TrieNode
{
private readonly Dictionary<char, TrieNode> children
= new Dictionary<char, TrieNode>();
private bool isWord;
public void Add(string s)
{
if (s.Length == 0)
{
this.isWord = true;
return;
}
var c = s[0];
var child = GetOrCreateChild(c);
child.Add(s.Substring(1));
}
private TrieNode GetOrCreateChild(char c)
{
TrieNode child;
if (!children.TryGetValue(c, out child))
{
child = new TrieNode();
children.Add(c, child);
}
return child;
}
public bool Find(string query)
{
if (query == null)
{
throw new ArgumentNullException("query");
}
if (query.Length == 0)
{
return this.isWord;
}
var c = query[0];
var child = this.GetChild(c);
return child != null && child.Find(s.Substring(1));
}
private TrieNode GetChild(char c)
{
TrieNode child;
return this.children.TryGetValue(c, out child) ? child : null;
}
}
We can even add further tests to check if everything works as expected:
[TestMethod]
public void FindLongerWordReturnsTrue()
{
this.trie.Add("word");
this.trie.Add("wording");
this.trie.Find("wording").Should().BeTrue();
}
[TestMethod]
public void FindPrefixReturnsFalse()
{
this.trie.Add("word");
this.trie.Find("wor").Should().BeFalse();
}
Both green already! Now we still have something to do: We need the wildcard support.
[TestMethod]
public void FindWithWildcardReturnsTrue()
{
this.trie.Add("word");
this.trie.Find("w??d").Should().BeTrue();
}
Instead of looking for one child node, a «?» means looking in all child nodes. We can generalize that: GetChild(char) gets renamed to GetChildren(char) and returns an IEnumerable:
public class TrieNode
{
private readonly Dictionary<char, TrieNode> children
= new Dictionary<char, TrieNode>();
private bool isWord;
public void Add(string s)
{
if (s.Length == 0)
{
this.isWord = true;
return;
}
var c = s[0];
var child = GetOrCreateChild(c);
child.Add(s.Substring(1));
}
private TrieNode GetOrCreateChild(char c)
{
TrieNode child;
if (!children.TryGetValue(c, out child))
{
child = new TrieNode();
children.Add(c, child);
}
return child;
}
public bool Find(string query)
{
if (query == null)
{
throw new ArgumentNullException("query");
}
if (query.Length == 0)
{
return this.isWord;
}
var c = query[0];
var child = this.GetChildren(c);
return child.Any(x => x.Find(query.Substring(1)));
}
private IEnumerable<TrieNode> GetChildren(char c)
{
if (c == '?')
return this.children.Values;
TrieNode child;
return this.children.TryGetValue(c, out child)
? new[] { child }
: Enumerable.Empty<TrieNode>();
}
}
You could add additional features like a collection of values for each trie node or prefix search, but I guess this simple implementation would have solved your question.
A. Listen, CD (22), and read the texts (1 − 4). Match them with the titles (a − d).
a) We Don’t Use Them All
b) Great−grandfather Languages
c) Very Helpful Books
d) How It All Started
1. We do not really know how languages began. Some believe that languages began when prehistoric people tried to imitate birds and animals. Some think that the first words were the natural sounds that people made when they felt happy or were in shock.
2. What we know about the history of languages is that many of them come from one ancient language. Linguists say that this and all the languages that come from it are a “family” of languages. English and Russian belong to the Indo−European family of languages. French, Italian, German, Norwegian also belong to it.
3. How many words must a language have? For example, there are more than 450,000 words in Webster’s New International Dictionary. Nobody knows all of them, but most people are able to understand about 35,000 and use from 10,000 to 12,000.
4. If you hear or read a new word and want to know what it means, you try to find this word in a dictionary. Modern dictionaries are very different from old ones. Most of them give words alphabetically. Together with the words they give information about how they sound and what meanings they have. A dictionary may also give you information about how to use the word grammatically and gives examples. There are monolingual [ˌmɒnəʊˈlɪŋɡwəl], bilingual [baɪˈlɪŋɡwəl] and multilingual dictionaries. They give information about words in one (mono−), two (bi−) or more than two (multi−) languages.
There are also different kinds of online dictionaries which you may download from the Internet. They can be really very useful.
B. What do the words in italics mean?
reshalka.com
ГДЗ Английский язык 7 класс (часть 1) Афанасьева. UNIT 2. Step 2. Номер №6
Решение
Перевод задания
A. Послушай Аудио (22) и прочитай тексты (1 − 4). Сопоставь их с названиями (a − d).
а) Мы не используем их все
b) Языки прадедов
c) Очень полезные книги
d) Как все начиналось
1. Мы действительно не знаем, как начались языки. Некоторые считают, что языки зародились, когда доисторические люди пытались подражать птицам и животным. Некоторые считают, что первыми словами были естественные звуки, издаваемые людьми, когда они чувствовали себя счастливыми или были в шоке.
2. Что мы знаем об истории языков, так это то, что многие из них происходят от одного древнего языка. Лингвисты говорят, что этот и все языки, которые из него происходят, являются «семейством» языков. Английский и русский принадлежат к индоевропейской семье языков. К ней также относятся французский, итальянский, немецкий, норвежский языки.
3. Сколько слов должно быть в языке? Например, в Новом международном словаре Вебстера более 450 000 слов. Никто не знает их всех, но большинство людей способны понять около 35 000 и использовать от 10 000 до 12 000.
4. Если вы слышите или читаете новое слово и хотите знать, что оно означает, вы пытаетесь найти его в словаре. Современные словари сильно отличаются от старых. Большинство из них дают слова в алфавитном порядке. Вместе со словами они дают информацию о том, как они звучат и какое значение они имеют. Словарь может также дать вам информацию о том, как использовать слово грамматически и приводит примеры. Существуют одноязычные, двуязычные и многоязычные словари. Они дают информацию о словах на одном (моно−), двух (би) или более, чем двух (мульти), языках.
Существуют также различные виды онлайн−словарей, которые вы можете скачать из Интернета. Они могут быть действительно очень полезными.
B. Что означают слова, выделенные курсивом?
ОТВЕТ
A.
1 – d, 2 – b, 3 – a, 4 – c.
B.
Indo−European − индоевропейская
imitate – подражать, имитировать
prehistoric − доисторический
alphabetically – в алфавитном порядке
grammatically – грамматически
bilingual – двуязычные
multilingual − многоязычные
monolingual – одноязычные
- Forum
- Beginners
- Finding words in a dictionary file
Finding words in a dictionary file
Ask the user for the name of a file (dictionary) — «I don’t need to do this as I am using a Mac.
You may assume that the given file contains only words, and that there are at most 1000 words in the file (there may be fewer)
Read all the words from the file, and store them in an array of type string.
Now ask the user to enter a word that you will search for in the array. If you find it, print the location at which it was found. Otherwise, print a message indicating that the word was not found.
You should consider the first word to be in position 1.
|
|
So this is the code I have at this point. No matter what it prints out that the word was not found. The dictionary file I’m using has the words:
apple
bear
cat
dog
egg
file
google
hello
iphone
jeep
Now, if I take one of the ‘=’ from the ‘found == true’ out, then it just acts as a counter. No matter what I enter it says it’s found and just adds one to the count. I’m not sure what’s wrong with this code, and my instructor believes it might be something with the fact that several others and I are using Mac’s. Just looking for some help and other opinions on what we could all do to make our code work? Thanks a million!
I think the mistake is at line 57.
|
|
You assign b to a after the break statement, meaning the assignment will never even be executed, since the break statement jumps out of the loop already. Try moving the assignment up one place, like this:
|
|
That way the assignment will be executed and it will give you the requested index.
You may want to print out your dictionary after you read the file, to insure you’re reading the file correctly. You may also want to consider using a std::vector instead of the array.
Edit: I didn’t test your file reading code but this seems to work properly:
|
|
Last edited on
@jlb
Didn’t you see this?
store them in an array of type string.
In a normal program a vector would be the best option, but in assignments it’s better to follow the instructions.
Yes, but what’s your point? I skipped over half of the assignment so using the vector is a minor issue. My code was meant only to show that the OP’s logic in the remaining code is sound. Even without moving that assignment you pointed out the find logic works, the «position» reporting was incorrect but the rest of the logic works.
I edited the break statement but still no luck. My instructor just sent me this in an email as to what the problem could be. Any ideas guys?
1. Try #include <string> and test your original code first,
if it does not work,
2. Try to use the compare function
Your code will be something like:
if(s.compare(dictionary[i]))
…….
assume s is the target string and dictionary is your dictionary array
I of course tried the #include <string> but to no avail. And I’m not exactly sure what the compare fuction is or how to implement it in my code..
Yes you should be #including the string header.
I doubt that the compare() member function will help, but you replace the operator== with the compare function.
You really need to check that you actually read the file correctly. As I stated in my previous post your «find» logic seems to be correct.
Does the dictionary file you use come from a Windows system? If so it might use rn as newline. A Mac system usually uses only a n, meaning the r isn’t marked as part of the newline. So every string you read from your dictionary file will contain a trailing r, which would cause the == operator to fail.
Windows systems automatically convert rn to n when using standard library functions, while Mac systems will not. This might be causing the issue.
Last edited on
Yes it is a windows file, and I’ve made sure my code is reading the file correctly. If I add in a statement to print out the file contents, it does with no problem.
@Shadowwolf Since the file we were to be using came from a windows system, how exactly would I alter my code to make it work? Or would it be easier for me to just make a file on my Mac with the words and use that as my input file?
It would probably be easier to make the file on your Mac, but if you want to do it in the program you could check if the input ends with a carriage return, and if so, delete it. You could use something like this:
|
|
Optionally, if you have access to a C++11 compiler, you could use dictionary[i].pop_back();
instead of the dictionary[i] = dictionary[i].substr(0,length-1);
line.
One of the reasons I said to verify your file is because of the differences with the line endings. The easiest way to tell if it is actually a line ending problem would be to print the string along with the length of the string. If the string appears to have one more character then it probably is a non-printing character in the stream. If your instructor is using a Windows machine he will not have this issue. If this is the case it would probably be easier to fix the line ending issue before you parse it with your program. Many text editors have the ability to fix this issue so you can first try that. If not look for programs with the names of dos2unix or possibly follow one of the methods in this link:
Yes it is a windows file, and I’ve made sure my code is reading the file correctly. If I add in a statement to print out the file contents, it does with no problem.
http://www.microhowto.info/howto/convert_the_line_endings_in_a_text_file_from_dos_to_unix_format.html
Topic archived. No new replies allowed.
1. funny — synonyms: amusing, humorous, witty, comic, droll, facetious — смешные — синонимы: забавные, с чувством юмора, остроумные, комические, озорные, шутливые
2. timid — synonyms: apprehensive, fearful, easily frightened, afraid, faint-hearted, timorous, nervous — робкие — синонимы: опасающийся, пугливые, легко пугается, боится, малодушный, робкий, нервный
3. intelligent — synonyms: clever, bright, brilliant, quick-witted, quick on the uptake, smart, canny, astute — умные — синонимы: умные, яркие, блестящие, сообразительные, быстро соображает, умный, хитрый, проницательный
4. kind — synonyms: good, nice, gentle, gracious — добрый — синонимы: хорошие, красивые, нежные, милостивые
5. nice — synonyms: enjoyable, pleasant, agreeable, good, satisfying, gratifying, delightful, marvelous — красивый — синонимы: приятные, приятные, приятные, хорошие, удовлетворяющие, приятно, восхитительно, изумительные
6. rude — synonyms: ill-mannered, bad-mannered, impolite, discourteous, uncivil, ill-behaved, unmannerly — грубые — синонимы: невоспитанный, невоспитанный, невежливый, невоспитанный, невежливый, плохо вел себя, невоспитанный