Are you completely sure you need to consider that's as two words (viz. that is)? Ordinarily, I believe that's is counted as one word in English.
But if your perspective on the requirements is correct, you have a (moderately) difficult problem: I don’t think there is any (reasonable) regex that can distinguish between something like that's (contraction of that and is) and something like steve's (possessive).
AFAIK you will have to write something yourself.
Suggestion: take a look at this list of English language contractions. You could use it to make an enumeration of the things you need to handle in a special way.
Basic Example
enum Contraction {
AINT("ain't", "is not"),
ARENT("aren't", "are not"),
// Many, many in between...
YOUVE("you've", "you have");
private final String oneWord;
private final String twoWords;
private Contraction(String oneWord, String twoWords) {
this.oneWord = oneWord;
this.twoWords = twoWords;
}
public String getOneWord() {
return oneWord;
}
public String getTwoWords() {
return twoWords;
}
}
String s = "That's a good question".toLowerCase();
for (Contraction c : Contraction.values()) {
// replace() treats the contraction as literal text, not as a regex
s = s.replace(c.getOneWord(), c.getTwoWords());
}
String[] words = s.split("\\s+");
// And so forth...
NOTE: This example handles case sensitivity by converting the entire input to lower case, so the elements in the enum will match. If that doesn’t work for you, you may need to handle it in another way.
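One other way to handle case, sketched here under assumptions rather than taken from the original answer, is to leave the input untouched and match each contraction case-insensitively on word boundaries. The class name `ContractionExpander` and the three-entry table are hypothetical; a real table would come from the full list of contractions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContractionExpander {
    // Hypothetical, deliberately tiny table; a real one would hold the full list.
    private static final Map<String, String> CONTRACTIONS = new LinkedHashMap<>();
    static {
        CONTRACTIONS.put("ain't", "is not");
        CONTRACTIONS.put("that's", "that is");
        CONTRACTIONS.put("you've", "you have");
    }

    // Replaces every known contraction, ignoring case, only where it stands as a whole word.
    static String expand(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : CONTRACTIONS.entrySet()) {
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(entry.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            result = pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = expand("That's a good question").split("\\s+");
        System.out.println(words.length);            // 5
        System.out.println(String.join("|", words)); // that is|a|good|question
    }
}
```

The rest of the input keeps its original capitalisation; only the replaced contraction takes the casing stored in the table. The possessive problem remains: a name that happens to be in the table would still be expanded.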
I’m not clear on what you need to do with the words once you have them, so I left that part out.
I need to write a program that reads the text of a text file and returns the number of times a specific word shows up. I can't figure out how to write a program that reads the text file word by word; I've only managed to write one that reads line by line. How would I make it read word by word?
import java.io.*;
import java.util.Scanner;
public class main
{
public static void main (String [] args)
{
Scanner scan = new Scanner(System.in);
System.out.println("Name of file: ");
String filename= scan.nextLine();
int count =0;
try
{
FileReader file = new FileReader(filename);
BufferedReader reader = new BufferedReader(file);
String a = "";
int linecount = 0;
String line;
System.out.println("What word are you looking for: ");
String a1 = scan.nextLine();
while((line = reader.readLine()) != null)
{
linecount++;
if(line.equalsIgnoreCase("that"));
count++;
}
reader.close();
}
catch(IOException e)
{
System.out.println("File Not found");
}
System.out.println("the word that appears " + count + " many times");
}
}
posted 14 years ago
Hi, a beginner question.
I have a text file that does not have the .txt extension, but it does contain text and numbers.
I would like to know how to read a file line by line, store each word or number in an ArrayList, and then output them to a new file.
e.g. my text file is called colorsANDnumbers.data:
Red 2 Blue 3 Yellow 4 Green 5
2 Red 3 Blue 4 Yellow 5 Green
Can that be done with just one ArrayList?
regards
Gaz
posted 14 years ago
[edit]Add code tags. CR[/edit]
Marshal
Posts: 77646
posted 14 years ago
Please use the CODE button; I have edited that post so you can see how much better it looks.
Please don’t simply give out code like that. Since it is pretty standard code, which could have been copied from the Java Tutorials, I think I shall let it stand. But look at the Beginners’ Forum contents page, where we explain that people learn a lot better if they work out things for themselves.
It doesn’t actually work in its present condition, and I can see a potentially serious error, which I shall let you find for yourself. I shall also leave you to work out what people would do in Java5 or Java6.
*************************************************************************************************
Yes, you can put those entries into a single List<String>, but is that really appropriate? I suggest you go through the different interfaces in the Collections Framework and you might find something more appropriate for keeping colours and numbers.
Gary kwlai
Greenhorn
Posts: 12
posted 14 years ago
Impressive
There are a few things in the code that I don’t quite understand: what do lines 21 and 34 actually do? I have not covered WInputStreamReader and Iterator yet.
Also, almost all code these days has try and catch in it… are those required? Do they prevent the program from crashing or halting when there is an error?
regards
Gaz
Bijj shar
Greenhorn
Posts: 13
posted 14 years ago
Ritchie-
Thanks for letting me know about the Code button. Please explain what error you are seeing in its present condition. Also, the user asked about reading and writing data in a file, and he is reading data from an existing file, so why are you giving him out-of-the-box suggestions?
posted 14 years ago
Gary Lai wrote: Also, almost all code these days has try and catch in it… are those required? Do they prevent the program from crashing or halting when there is an error?
regards
If the API throws a checked exception (one that inherits from java.lang.Exception but not from RuntimeException), the compiler will force you either to surround the code with a try/catch block or to declare the exception with throws. This allows you to catch any exceptions that are thrown and deal with them. Some APIs throw RuntimeExceptions, which don’t require try/catch blocks, but if one is thrown and nothing catches it, the application will just die.
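The difference can be sketched in a few lines. This is an illustration only, and the class and method names (`ExceptionKinds`, `firstLine`, `parseOrDefault`) are made up for it: `BufferedReader.readLine()` declares the checked `IOException`, so the compiler insists on a try/catch or a `throws` clause, whereas `Integer.parseInt` throws the unchecked `NumberFormatException`, which you may catch but are not forced to.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ExceptionKinds {
    // Checked: without the catch (or a "throws IOException" clause) this does not compile.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            return null; // deal with the failure here instead of letting the program die
        }
    }

    // Unchecked: the try/catch is optional; without it a bad input would end the program.
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("Red 2 Blue 3\n2 Red 3 Blue")); // Red 2 Blue 3
        System.out.println(parseOrDefault("42", -1));                 // 42
        System.out.println(parseOrDefault("Red", -1));                // -1
    }
}
```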
Campbell Ritchie
Marshal
Posts: 77646
posted 14 years ago
You are using the wrong classes for reading; you ought to use FileReader and BufferedReader because it is a text file. DataInputStreams are not designed for text files.
You are opening several Readers; I may be mistaken, but are you actually closing them? If you leave the Reader open, you may suffer a memory leak. That was what worried me. Anyway, when I tried your code, I couldn’t get it to work; I got what appears to be a FileNotFoundException.
I would simply use the Scanner and Formatter classes for text files; they are much easier to use. Since they "consume" their Exceptions, you can get away without the try-catch.
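A minimal sketch of the Scanner and Formatter approach, assuming the goal is the one asked about earlier in the thread (read every word or number into one list, then write the list to a new file). The class name `TokenCopy`, its two helper methods, and the output file name `tokens.out` are hypothetical. Note that reading and writing need no try/catch, but the constructors that take a `File` still declare `FileNotFoundException`, so `main` has to declare or catch it.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Formatter;
import java.util.List;
import java.util.Scanner;

class TokenCopy {
    // Reads every whitespace-separated token (word or number) into a list.
    static List<String> readTokens(Scanner scanner) {
        List<String> tokens = new ArrayList<>();
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        return tokens;
    }

    // Writes one token per line.
    static void writeTokens(List<String> tokens, Formatter formatter) {
        for (String token : tokens) {
            formatter.format("%s%n", token);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes both files even if something goes wrong
        try (Scanner in = new Scanner(new File("colorsANDnumbers.data"));
             Formatter out = new Formatter(new File("tokens.out"))) {
            writeTokens(readTokens(in), out);
        }
    }
}
```

Whether a single list of strings is the right structure for colours and numbers is a separate question, as pointed out above.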
posted 11 years ago
Hi this is not reading word by word. This is how it’s done:
Scanner input = new Scanner(new File("liron.txt"));
while(input.hasNext()) {
String word = input.next();
}
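Building on the loop above, a possible way to count how often one word occurs is sketched below. It is only an illustration: the class name `WordCounter` and the sample sentence are made up, and the Scanner reads from a String so that the example runs without a file; `new Scanner(new File(...))` would work the same way.

```java
import java.util.Scanner;

class WordCounter {
    // Counts tokens equal to target, ignoring case and surrounding punctuation.
    static int countWord(Scanner input, String target) {
        int count = 0;
        while (input.hasNext()) {
            // Strip leading and trailing non-letters such as commas and full stops
            String word = input.next().replaceAll("^\\P{L}+|\\P{L}+$", "");
            if (word.equalsIgnoreCase(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("That is the word that I want, not THAT.");
        System.out.println(countWord(input, "that")); // 3
    }
}
```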
lowercase baba
Posts: 13086
posted 11 years ago
Liron Meir wrote: Hi, this is not reading word by word. This is how it’s done:
Given that the question, and the last reply, was almost three years ago, I doubt the original poster is still waiting for an answer, or is terribly worried about it anymore.
There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Liron Meir
Greenhorn
Posts: 2
posted 11 years ago
Yes, but if someone is looking for a solution to read word by word, this is not it.
Campbell Ritchie
Marshal
Posts: 77646
posted 11 years ago
Welcome to the Ranch
That is what I was hinting at when I mentioned Scanner. We prefer not to give the full solution and it says the following on this forum’s title page:
We’re all here to learn, so when responding to others, please focus on helping them discover their own solutions, instead of simply providing answers.
This Java example shows how to split a string into words. It also shows how to break sentences into words using the split method.
The simplest way to split a string into words is to split on the space character, as shown in the example below.
package com.javacodeexamples.stringexamples;

import java.util.Arrays;

public class StringSplitByWords {

    public static void main(String[] args) {

        String sentence = "Java String split by words from sentence";

        //get words from sentence
        String[] words = splitSentenceByWords(sentence);

        //print words
        System.out.println(Arrays.toString(words));
    }

    private static String[] splitSentenceByWords(String str){

        //if string is empty or null, return empty array
        if(str == null || str.equals(""))
            return new String[0];

        String[] words = str.split(" ");
        return words;
    }
}
Output
[Java, String, split, by, words, from, sentence]
As you can see from the output, it worked for the test sentence string. The sentence is broken down into words by splitting it using space.
Let’s try some other not-so-simple sentences.
package com.javacodeexamples.stringexamples;

import java.util.Arrays;

public class StringSplitByWords {

    public static void main(String[] args) {

        String[] sentences = {
                "string  with    lot of   spaces",
                "Hello, can I help you?",
                "Java is a 'programming' language.",
                "this is user-generated content"
        };

        for (String sentence : sentences){

            //get words from sentence
            String[] words = splitSentenceByWords(sentence);

            //print words
            System.out.println(Arrays.toString(words));
        }
    }

    private static String[] splitSentenceByWords(String str){

        //if string is empty or null, return empty array
        if(str == null || str.equals(""))
            return new String[0];

        String[] words = str.split(" ");
        return words;
    }
}
Output
[string, , with, , , , lot, of, , , spaces]
[Hello,, can, I, help, you?]
[Java, is, a, 'programming', language.]
[this, is, user-generated, content]
As you can see from the output, our code did not work as expected. The reason is that a simple split on a single space is not enough to separate the words of a string: runs of spaces produce empty strings, and words may be adjacent to punctuation marks such as full stops, commas, and question marks.
In order to make the code handle all this punctuation and these symbols, we will change our regular expression pattern from a single space to a character class of punctuation marks and symbols, as given below.
package com.javacodeexamples.stringexamples;

import java.util.Arrays;

public class StringSplitByWords {

    public static void main(String[] args) {

        String[] sentences = {
                "string  with    lot of   spaces",
                "Hello, can I help you?",
                "Java is a 'programming' language.",
                "this is [user-generated] content"
        };

        for (String sentence : sentences){

            //get words from sentence
            String[] words = splitSentenceByWords(sentence);

            //print words
            System.out.println(Arrays.toString(words));
        }
    }

    private static String[] splitSentenceByWords(String str){

        //if string is empty or null, return empty array
        if(str == null || str.equals(""))
            return new String[0];

        //split on one or more spaces, punctuation marks, or symbols
        String[] words = str.split("[ !\"\\#$%&'()*+,-./:;<=>?@\\[\\]^_`{|}~]+");
        return words;
    }
}
Output
[string, with, lot, of, spaces]
[Hello, can, I, help, you]
[Java, is, a, programming, language]
[this, is, user, generated, content]
This time we got the output as we wanted. The regex pattern [ !"\#$%&'()*+,-./:;<=>?@\[\]^_`{|}~]+ includes almost all the punctuation and symbols that can be used in a sentence, including space. We applied + at the end to match one or more instances of these, so that a run of delimiters between two words does not produce empty words.
Instead of this pattern, you can also use the \P{L} pattern to extract words from the sentence. Here \p{L} is the Unicode property class that matches any letter, and the uppercase \P negates it, so \P{L} matches any character that is not a letter. You need to change the line with the split method as given below.
String[] words = str.split("\\P{L}+");
Please note that the \P{L} expression works for both ASCII and non-ASCII characters (e.g. accented characters like “café” or “kākā”).
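A short sketch of the \P{L} variant on accented input, offered as an illustration rather than as part of the original example. The class name `UnicodeWordSplit` and the sample strings are made up, and the accented letters are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.Arrays;

class UnicodeWordSplit {
    // Splits on runs of characters that are not Unicode letters.
    static String[] splitWords(String str) {
        if (str == null || str.isEmpty()) {
            return new String[0];
        }
        return str.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is café and "k\u0101k\u0101" is kākā
        System.out.println(Arrays.toString(splitWords("caf\u00e9 and k\u0101k\u0101, twice!")));
        // Digits are not letters, so they act as delimiters too
        System.out.println(Arrays.toString(splitWords("user-generated content 42")));
    }
}
```

The first call yields the four words café, and, kākā, twice; the second yields user, generated, content. One caveat applies to both patterns: if the string starts with a delimiter, as in `'quoted' text`, split returns an empty string as the first element.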
About the author
Rahim
I have a master’s degree in computer science and over 18 years of experience designing and developing Java applications. I have worked with many fortune 500 companies as an eCommerce Architect. Follow me on LinkedIn and Facebook.
The following Java example program searches a file for a given word.
Step 1: Create a File object for the input file and declare the word array.
Step 2: Create the FileReader and BufferedReader objects.
Step 3: Set the word to search for in the file. For example,
String input="Java";
Step 4: Read the content of the file line by line, using the following while loop
while((s=br.readLine())!=null)
Step 5: Split each line into words, compare each word with the given word using the equals() method, and increment the count on every match.
Step 6: The count shows whether, and how many times, the word occurs in the file.
FileWordSearch.java
package File;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class FileWordSearch
{
public static void main(String[] args) throws IOException
{
File f1=new File("input.txt"); //Create a File object for the input file
String[] words=null; //Initialize the word array
FileReader fr = new FileReader(f1); //Create the FileReader object
BufferedReader br = new BufferedReader(fr); //Create the BufferedReader object
String s;
String input="Java"; // Input word to be searched
int count=0; //Initialize the count to zero
while((s=br.readLine())!=null) //Reading Content from the file
{
words=s.split(" "); //Split the word using space
for (String word : words)
{
if (word.equals(input)) //Search for the given word
{
count++; //If Present increase the count by one
}
}
}
if(count!=0) //Check for count not equal to zero
{
System.out.println("The given word is present for "+count+ " Times in the file");
}
else
{
System.out.println("The given word is not present in the file");
}
br.close(); //Closing the BufferedReader also closes the underlying FileReader
}
}
Output:
The given word is present for 2 Times in the file
There are various ways in Java to search a string for a specific word. Here we are going to discuss three of them.
The contains() method
The contains() method of the String class accepts a CharSequence and checks whether it occurs in the current String. If it is found, the method returns true; otherwise it returns false.
Example
public class ParsingForSpecificWord {
   public static void main(String args[]) {
      String str1 = "Hello how are you, welcome to Tutorialspoint";
      String str2 = "Tutorialspoint";
      if (str1.contains(str2)){
         System.out.println("Search successful");
      } else {
         System.out.println("Search not successful");
      }
   }
}
Output
Search successful
The indexOf() method
The indexOf() method of the String class accepts a string value and finds the (starting) index of it in the current String and returns it. This method returns -1 if it doesn’t find the given string in the current one.
Example
public class ParsingForSpecificWord {
   public static void main(String args[]) {
      String str1 = "Hello how are you, welcome to Tutorialspoint";
      String str2 = "Tutorialspoint";
      int index = str1.indexOf(str2);
      //indexOf() returns -1 when nothing is found; 0 is a valid match at the start
      if (index >= 0){
         System.out.println("Search successful");
         System.out.println("Index of the word is: "+index);
      } else {
         System.out.println("Search not successful");
      }
   }
}
Output
Search successful
Index of the word is: 30
The StringTokenizer class
Using the StringTokenizer class, you can divide a String into smaller tokens based on a delimiter and traverse through them. The following example tokenizes the source string into words and compares each word with the given word using the equals() method.
Example
import java.util.StringTokenizer;
public class ParsingForSpecificWord {
   public static void main(String args[]) {
      String str1 = "Hello how are you welcome to Tutorialspoint";
      String str2 = "Tutorialspoint";
      //Instantiating the StringTokenizer class
      StringTokenizer tokenizer = new StringTokenizer(str1," ");
      boolean found = false;
      while (tokenizer.hasMoreTokens()) {
         String token = tokenizer.nextToken();
         if (token.equals(str2)){
            //Stop at the first match so that later tokens cannot reset the result
            found = true;
            break;
         }
      }
      if(found)
         System.out.println("Search successful");
      else
         System.out.println("Search not successful");
   }
}
Output
Search successful
Given a string, extract words from it. “Words” are defined as contiguous strings of alphabetic characters i.e. any upper or lower case characters a-z or A-Z.
Examples:
Input : Funny?? are not you?
Output : Funny are not you
Input : Geeks for geeks??
Output : Geeks for geeks
We have discussed a solution for C++ in this post : Program to extract words from a given String
We have also discussed basic approach for java in these posts : Counting number of lines, words, characters and paragraphs in a text file using Java and Print first letter in word using Regex.
In this post, we will discuss the regular expression approach for doing the same. This approach scans the input once, so it is also suitable for large input files. Below is the regular expression for any word.
[a-zA-Z]+
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
    public static void main(String[] args)
    {
        String s1 = "Geeks for Geeks";
        String s2 = "A Computer Science Portal for Geeks";

        // A word is one or more ASCII letters
        Pattern p = Pattern.compile("[a-zA-Z]+");

        Matcher m1 = p.matcher(s1);
        Matcher m2 = p.matcher(s2);

        System.out.println("Words from string \"" + s1 + "\" : ");
        while (m1.find()) {
            System.out.println(m1.group());
        }

        System.out.println("Words from string \"" + s2 + "\" : ");
        while (m2.find()) {
            System.out.println(m2.group());
        }
    }
}
Output:
Words from string "Geeks for Geeks" : 
Geeks
for
Geeks
Words from string "A Computer Science Portal for Geeks" : 
A
Computer
Science
Portal
for
Geeks
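If the words are needed for further processing rather than for printing, the same pattern can collect them into a list. The following is a small sketch with a made-up class name (`WordExtractor`), run on the two example inputs given at the top of this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Compiled once and reused; a word is one or more ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("Funny?? are not you?")); // [Funny, are, not, you]
        System.out.println(extractWords("Geeks for geeks??"));    // [Geeks, for, geeks]
    }
}
```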
This article is contributed by Gaurav Miglani.
Using a Stack and StringTokenizer seems to be a bit of an overkill here. A simpler version could be written like this:
public static String reverse(final String input) {
Objects.requireNonNull(input);
final StringBuilder stringBuilder = new StringBuilder();
for (final String part : input.split("\\s+")) {
if (!part.isEmpty()) {
if (stringBuilder.length() > 0) {
stringBuilder.insert(0, " ");
}
stringBuilder.insert(0, part);
}
}
return stringBuilder.toString();
}
This uses the ability of split() to take a String apart based on a given Regex. So the string is split based on a sequence of one or more white-spaces, which matches your initial requirement. Note that for an empty input string split() returns a one-element array containing the empty string, which the isEmpty() check in the loop skips.
The for-each loop then works through the array and inserts the part at the beginning of the stringBuilder (except for the first time), which will effectively reverse the array. Most people would probably use a reverse for(i...) loop here, but because this is a code review I try to be extra correct, and this is the safer version: You cannot cause ArrayIndexOutOfBoundsExceptions with for-each.
Leading white-space causes an empty first element to appear in the split array (trailing empty strings are already dropped by split()), so there is a check for it in the loop. I use isEmpty() here, as this is in general safer (as in: less chance of typos) than length() > 0 and benefits from potential internal optimizations of String in terms of execution speed. You could call trim() beforehand, but this would cause some unnecessary String operations and creations, so this version is more efficient.
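The behaviour of split() at the edges of the input can be checked with a few lines. This is a sketch for illustration; the class name `SplitEdgeCases` and the helper `parts` are made up.

```java
import java.util.Arrays;

class SplitEdgeCases {
    static String[] parts(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("the sky")));   // [the, sky]
        System.out.println(Arrays.toString(parts("  the sky"))); // [, the, sky]  (empty first element)
        System.out.println(Arrays.toString(parts("the sky  "))); // [the, sky]    (trailing empties dropped)
        System.out.println(parts("").length);                    // 1: a single empty string
    }
}
```

Only the leading white-space case and the empty input produce an empty element, and both are covered by the isEmpty() check.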
Please also note the check for null at the beginning. The code would otherwise throw when trying to split on null, but it is considered good practice to check before-hand, as theoretically (not in this code but in other) you otherwise might leave the system in an undefined state.
I also make use of final, which may allow some compiler optimizations and prevents some basic coding mistakes such as accidental reassignment, and is considered good practice.
Testing
It is always a good idea to do some quick testing of the functionality including the basic edge-cases, so here it is:
To check whether or not I do correct white-space removal, I change stringBuilder.insert(0, " ") to stringBuilder.insert(0, "+") for the test, to make the white-space visible.
When testing with System.out.println(reverse("the sky is blue")); the result is:
blue+is+sky+the
The initial requirement is met.
When testing with System.out.println(reverse(" \t the sky is\t blue ")); the result is:
blue+is+sky+the
Leading, trailing and in-between white-spaces are correctly removed.
When testing with reverse("") the result is an empty String as expected.
When testing with reverse(null) a NullPointerException is thrown at the beginning of the function.