Java read from file word by word

Regarding your original problem: I assume your file looks like this:

5 a section of five words 3 three words
section 2 short section 7 this section contains a lot 
of words

And you want to get the output like this:

[a, section, of, five, words]
[three, words, section]
[short, section]
[this, section, contains, a, lot, of, words]

In general, the Stream API is poorly suited to problems like this; a plain old loop is probably the better solution here. If you still want to see a Stream API based solution, I can suggest my StreamEx library, which contains a headTail() method that lets you write custom stream-transformation logic easily. Here's how your problem could be solved using headTail():

/* Transform Stream of words like 2, a, b, 3, c, d, e to
   Stream of lists like [a, b], [c, d, e] */
public static StreamEx<List<String>> records(StreamEx<String> input) {
    return input.headTail((count, tail) -> 
        makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));
}

private static StreamEx<List<String>> makeRecord(StreamEx<String> input, int count, 
                                                 List<String> buf) {
    return input.headTail((head, tail) -> {
        buf.add(head);
        return buf.size() == count 
                ? records(tail).prepend(buf)
                : makeRecord(tail, count, buf);
    });
}

Usage example:

String s = "5 a section of five words 3 three words\n"
        + "section 2 short section 7 this section contains a lot\n"
        + "of words";
Reader reader = new StringReader(s);
Stream<List<String>> stream = records(StreamEx.ofLines(reader)
               .flatMap(Pattern.compile("\\s+")::splitAsStream));
stream.forEach(System.out::println);

The result looks exactly like the desired output above. Replace reader with your BufferedReader or FileReader to read from the input file. The stream of records is lazy: at most one record is held by the stream at a time, and if you short-circuit, the rest of the input will not be read (although the current file line will, of course, be read to the end). The solution looks recursive, but it does not eat stack or heap, so it works for huge files as well.


Explanation:

The headTail() method takes a two-argument lambda which is executed at most once during the outer stream terminal operation execution, when stream element is requested. The lambda receives the first stream element (head) and the stream which contains all other original elements (tail). The lambda should return a new stream which will be used instead of the original one. In records we have:

return input.headTail((count, tail) -> 
    makeRecord(tail, Integer.parseInt(count), new ArrayList<>()));

First element of the input is count: convert it to number, create empty ArrayList and call makeRecord for the tail. Here’s makeRecord helper method implementation:

return input.headTail((head, tail) -> {

First stream element is head, add it to the current buffer:

    buf.add(head);

Target buffer size is reached?

    return buf.size() == count 

If yes, call the records for the tail again (process the next record, if any) and prepend the resulting stream with single element: current buffer.

            ? records(tail).prepend(buf)

Otherwise, call myself for the tail (to add more elements to the buffer).

            : makeRecord(tail, count, buf);
});
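
For comparison, the "plain old loop" mentioned at the top could look roughly like the sketch below. It is not part of StreamEx: the class name RecordLoop and the method readRecords are made up for illustration, and unlike the stream version it collects all records eagerly rather than lazily.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper (not part of StreamEx): groups count-prefixed words with a plain loop.
class RecordLoop {

    // Reads "<count> <word> <word> ..." groups until no integer count follows.
    static List<List<String>> readRecords(Readable source) {
        List<List<String>> records = new ArrayList<>();
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextInt()) {
                int count = in.nextInt();
                List<String> record = new ArrayList<>();
                // Stop early if the input ends before the record is complete.
                for (int i = 0; i < count && in.hasNext(); i++) {
                    record.add(in.next());
                }
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String s = "5 a section of five words 3 three words\n"
                + "section 2 short section 7 this section contains a lot\n"
                + "of words";
        readRecords(new StringReader(s)).forEach(System.out::println);
    }
}
```

Scanner splits on any whitespace by default, so line breaks inside a record need no special handling.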


posted 14 years ago


Hi A beginner question.

I have a text file not in text format (.txt) but it does contain text and numbers.

I would like to know how to read a file line by line, store each word or number in an ArrayList, and then write them out to a new file.

e.g. my text file is called colorsANDnumbers.data:

Red 2 Blue 3 Yellow 4 Green 5

2 Red 3 Blue 4 Yellow 5 Green

Is that possible to be done with just one arraylist?

regards

Gaz


posted 14 years ago



[edit]Add code tags. CR[/edit]

Marshal

Posts: 77646


posted 14 years ago


Please use the CODE button; I have edited that post so you can see how much better it looks.

Please don’t simply give out code like that. Since it is pretty standard code, which could have been copied from the Java Tutorials, I think I shall let it stand. But look at the Beginners’ Forum contents page, where we explain that people learn a lot better if they work things out for themselves.

It doesn’t actually work in its present condition, and I can see a potentially serious error, which I shall let you find for yourself. I shall also leave you to work out what people would do in Java5 or Java6.

*************************************************************************************************

Yes, you can put those entries into a single List<String>, but is that really appropriate? I suggest you go through the different interfaces in the Collections Framework and you might find something more appropriate for keeping colours and numbers.

Gary kwlai

Greenhorn

Posts: 12


posted 14 years ago


Impressive

There are a few things in the code I don't quite understand. What are lines 21 and 34 actually doing? I have not covered InputStreamReader and Iterator yet.

Also, almost all code these days has try and catch in it. Are those required? Do they prevent the program from crashing or halting when there is an error?

regards

Gaz

Bijj shar

Greenhorn

Posts: 13


posted 14 years ago


Ritchie-

Thanks for letting me know about the Code button. Please explain what error you are seeing in the code as it stands. Also, the user asked about reading and writing data in a file, and he is reading data from an existing file, so why are you giving him a suggestion that goes beyond that?


posted 14 years ago


Gary Lai wrote:Also almost every codes thesedays has Try and Catch in them… are those required? does it prevent the program from crashing or halt when there is an error?

regards

If the API throws a checked exception (one that inherits from java.lang.Exception but not from RuntimeException), the compiler will force you either to surround the call with a try/catch block or to declare the exception in your method's throws clause. A try/catch lets you catch any exceptions that are thrown and deal with them. Some APIs throw RuntimeExceptions, which don’t require try/catch blocks, but if one is thrown and nothing catches it, the application will just die.
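
A small sketch of the difference, using made-up names (ExceptionDemo, countLines, parseOrZero) and a hypothetical file name: FileReader and readLine() throw the checked IOException, so the compiler insists on handling or declaring it, while Integer.parseInt() throws the unchecked NumberFormatException, where catching is optional.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative only: class and method names are invented for this example.
class ExceptionDemo {

    // IOException is checked: without the catch (or a throws clause) this does not compile.
    static int countLines(String path) {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            System.err.println("Could not read " + path + ": " + e.getMessage());
            return -1; // signal failure instead of letting the program die
        }
    }

    // NumberFormatException is unchecked: the try/catch is optional, but without it
    // a bad token would terminate the program.
    static int parseOrZero(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines("colorsANDnumbers.data")); // hypothetical file
        System.out.println(parseOrZero("Red"));
    }
}
```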

Campbell Ritchie

Marshal

Posts: 77646


posted 14 years ago


You are using the wrong classes for reading; you ought to use FileReader and BufferedReader because it is a text file. DataInputStreams are not designed for text files.

You are opening several Readers; I may be mistaken, but are you actually closing them? If you leave the Reader open, you may suffer a memory leak. That was what worried me. Anyway, when I tried your code, I couldn’t get it to work; I got what appears to be a FileNotFoundException.

I would simply use the Scanner and Formatter classes for text files; they are much easier to use. Since they «consume» their Exceptions, you can get away without the try-catch.
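
A minimal sketch of the Scanner/Formatter approach, assuming the goal is the one from the original question (read every token into one list, then write the tokens to a new file). The names TokenCopy, readTokens and writeTokens and the output file name are invented for illustration. Note that the constructors taking a File still declare FileNotFoundException, so main has to declare or catch it; it is only the I/O errors during reading and writing that these classes swallow.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Formatter;
import java.util.List;
import java.util.Scanner;

// Illustrative only: names are made up, output.txt is a hypothetical target file.
class TokenCopy {

    // Collects every whitespace-separated token (word or number) in order.
    static List<String> readTokens(Scanner in) {
        List<String> tokens = new ArrayList<>();
        while (in.hasNext()) {
            tokens.add(in.next());
        }
        return tokens;
    }

    // Writes one token per line using the platform line separator (%n).
    static void writeTokens(List<String> tokens, Formatter out) {
        for (String token : tokens) {
            out.format("%s%n", token);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes both files even if something goes wrong
        try (Scanner in = new Scanner(new File("colorsANDnumbers.data"));
             Formatter out = new Formatter(new File("output.txt"))) {
            writeTokens(readTokens(in), out);
        }
    }
}
```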


posted 11 years ago



Hi this is not reading word by word. This is how it’s done:

Scanner input = new Scanner(new File("liron.txt"));
while (input.hasNext()) {
    String word = input.next();
}

lowercase baba

Posts: 13086



posted 11 years ago


Liron Meir wrote:Hi this is not reading word by word. This is how it’s done:

Given that the question, and the last reply, were from almost three years ago, I doubt the original poster is still waiting for an answer, or is terribly worried about it anymore.

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors

Liron Meir

Greenhorn

Posts: 2


posted 11 years ago



Yes, but if someone is looking for a solution to read word by word, this is not it.

Campbell Ritchie

Marshal

Posts: 77646


posted 11 years ago


Welcome to the Ranch

That is what I was hinting at when I mentioned Scanner. We prefer not to give the full solution and it says the following on this forum’s title page:

We’re all here to learn, so when responding to others, please focus on helping them discover their own solutions, instead of simply providing answers.

  1. October 21st, 2011, 10:28 AM


    #1

    dylanka


    Junior Member


    Reading a text file word by word

    Hi all, new to Java programming. Got into it because a friend has also been doing it, but he’s been doing it for a while, so he is really good. Anyway.

    What I want to do is read a specific .txt file, then go through it and split it into words, if that makes sense.

    What I want the program to do is go through a file and identify and count how many palindromes are inside the file. I have been able to do the palindrome part of it, but I am confused as to how you go about reading the text, word for word, and then testing each word.

    Any help will be appreciated. I am not in a rush; this is just a hobby that I am really enjoying.


  3. October 21st, 2011, 11:00 AM


    #2

    Re: Reading a text file word by word

    Here is the first hit after googling "java file io": Lesson: Basic I/O (The Java Tutorials > Essential Classes)

    But this sounds like a job for the Scanner class. The API is your new best friend: Java Platform SE 6


  4. October 21st, 2011, 12:23 PM


    #3

    dylanka


    Junior Member


    Re: Reading a text file word by word

    OK, hi again. I ended up figuring out how to do it, but I have a new problem. When I run the program, I want it to find only single-word palindromes. At the moment it is also picking up multiple-word palindromes. So the text file I have contains, for example, "avid diva", and the program reports each of these words as a palindrome. I want it to report only one-word palindromes, for example "otto". Any help is again appreciated. Here is what I currently have:

    import java.io.*;
    public class PalindromDetector {
    	public static void main(String[] args) {
     
    		try { 
     
    			FileInputStream fstream = new FileInputStream("C:/Test1.txt");
     
    			DataInputStream in = new DataInputStream(fstream);
    			BufferedReader br = new BufferedReader(new InputStreamReader(in));
    			String strLine = null;
    			while ((strLine = br.readLine()) != null)   {
     
     
     
    				String reverse = new
    				StringBuffer(strLine).reverse().
    				toString();
    				int i,j,counter=0;
     
    				String m[]=strLine.split(" ");
    				String[] word=reverse.split(" ");
     
    				System.out.println("The palindrome words are:");
    				for(i=0;i<m.length;i++) {
    					for(j=word.length-1;j>=0;j--) {
    						if(m[i].equalsIgnoreCase(word[j])) {
    							System.out.println(m[i]);
    							counter++;
    							break;
    						}
     
    					}
    				}
    				System.out.println("Number of palindromes:"+counter);
    			}
    		}
    		catch(IOException e){}
     
    	}
    }

    Dunno what I have done. Obviously something silly. I’m sure what I have done is harder than what I am supposed to have done.


  5. October 21st, 2011, 02:06 PM


    #4

    Re: Reading a text file word by word

    Instead of taking the file one line at a time, why don’t you take it one word at a time, since that’s what you really care about?
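
    One way this suggestion might look, as a rough sketch rather than the poster's own code: a Scanner hands over one whitespace-separated word at a time, and each word is tested on its own, so "avid" and "diva" are no longer matched against each other. The class and method names are invented, and ignoring punctuation, ignoring case, and skipping single letters are assumptions.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Illustrative sketch: names and the "at least two letters" rule are assumptions.
class PalindromeCounter {

    // True if the word reads the same backwards, ignoring case and non-letters.
    static boolean isPalindrome(String word) {
        String letters = word.replaceAll("[^\\p{L}]", "").toLowerCase();
        if (letters.length() < 2) {
            return false; // assumption: single letters do not count
        }
        return new StringBuilder(letters).reverse().toString().equals(letters);
    }

    // Tests each whitespace-separated token on its own.
    static int countPalindromes(Scanner in) {
        int count = 0;
        while (in.hasNext()) {
            if (isPalindrome(in.next())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File("C:/Test1.txt"))) {
            System.out.println("Number of palindromes: " + countPalindromes(in));
        }
    }
}
```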



All 6 Replies

11 Years Ago

Try This…

StreamReader sr = new StreamReader(Environment.CurrentDirectory + "\\abc.txt");
string line = null;
do
{
    line = sr.ReadLine();

    if (line != null)
    {
        MessageBox.Show(line);                        // Read line
        foreach (char cnt in line)
        {
            textBox1.AppendText(cnt.ToString());      // Read characters
        }
    }
} while (line != null);
sr.Close();

Hope this will help you.

Edited

11 Years Ago
by bhagawatshinde because:

n/a

11 Years Ago

To read a file and get it word by word (each word separated from the others), you can do it this way:

List<string> list = new List<string>();
using (StreamReader sr = new StreamReader("filePath"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        string[] words = line.Split(' ');
        foreach (string word in words)
            list.Add(word);
    }
}

Now you have all the words in the generic List<string>.
If you want to "combine" them, you can do it this way:

textBox1.Text = String.Join(" ", list.ToArray());

Edited

11 Years Ago
by Mitja Bonca because:

n/a

11 Years Ago

Thank you sir, it really helped :)

ddanbe

2,724



Professional Procrastinator



Featured Poster


11 Years Ago

Your foreach loop runs so fast that you will be at the word "cat" before you even notice it. Try slowing your loop down if you want to see each word separately.

sknake

1,622



Senior Poster



Featured Poster


11 Years Ago

You may also consider using Regex.Split() to split on word boundaries and filter out the punctuation. If you split on ' ' and you have "hi\r\nhello", it will show up as one word even though it is really on two different lines. I suppose if you use .ReadLine() for the input file it will already handle the newline scenario, but perhaps someone ended a sentence with a period and didn’t put a space after the punctuation…

void button15_Click(object sender, EventArgs e)
    {
      const string s1 = "hi. how are you?\r\nGood thanks";
      string[] arr1 = s1.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
      string[] arr2 = System.Text.RegularExpressions.Regex.Split(s1, @"\b");
      Debug.WriteLine("Method1: " + string.Join(" | ", arr1).Replace("\r", @"\r").Replace("\n", @"\n"));
      Debug.WriteLine("Method2: " + string.Join(" | ", arr2).Replace("\r", @"\r").Replace("\n", @"\n"));
    }

Results in:

Method1: hi. | how | are | you?\r\nGood | thanks
Method2:  | hi | .  | how |   | are |   | you | ?\r\n | Good |   | thanks |

11 Years Ago

Thank you, sir. It helped :)



There are multiple ways of writing and reading a text file, and many applications need to do so. To read a plain text file in Java you can use, for example, FileReader, BufferedReader, or Scanner. Each utility offers something different: BufferedReader buffers data for fast reading, and Scanner provides parsing ability.

Methods:

  1. Using BufferedReader class
  2. Using FileReader class
  3. Using Scanner class
  4. Reading the whole file in a List
  5. Read a text file as String

We can also use either BufferedReader or Scanner to read a text file line by line in Java. Java SE 8 additionally introduced the stream class java.util.stream.Stream, which provides a lazy way to process the lines of a file.
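
As a rough illustration of the Java 8 stream approach (not one of the five methods covered below), the sketch counts the words in a file using Files.lines(), which reads lines lazily. The class name StreamWordCount, the method countWords and the file name test.txt are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

// Illustrative only: names and the input file are hypothetical.
class StreamWordCount {

    // Splits every line on whitespace and counts the non-empty tokens.
    static long countWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Files.lines keeps the file open, so close the stream with try-with-resources.
        try (Stream<String> lines = Files.lines(Paths.get("test.txt"))) {
            System.out.println(countWords(lines));
        }
    }
}
```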

Note: Good practices such as flushing/closing streams and exception handling have been left out of the examples to keep them easy for beginners to follow.

Let us discuss each of the above methods in more depth, implementing each one in a short Java program.

Method 1: Using BufferedReader class 

This class reads text from a character-input stream, buffering characters to provide efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used. The default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. The constructor that takes a buffer size has the following signature:

BufferedReader(Reader in, int size)

Example:

import java.io.*;

public class GFG {
    public static void main(String[] args) throws Exception
    {
        // Backslashes in a Java string literal must be escaped
        File file = new File(
            "C:\\Users\\pankaj\\Desktop\\test.txt");

        BufferedReader br
            = new BufferedReader(new FileReader(file));

        // Read and print the file one line at a time
        String st;
        while ((st = br.readLine()) != null)
            System.out.println(st);
    }
}

Output:

If you want to code refer to GeeksforGeeks

Method 2: Using FileReader class

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. 

Constructors defined in this class are as follows:

  1. FileReader(File file): Creates a new FileReader, given the File to read from
  2. FileReader(FileDescriptor fd): Creates a new FileReader, given the FileDescriptor to read from
  3. FileReader(String fileName): Creates a new FileReader, given the name of the file to read from

Example:

import java.io.*;

public class GFG {
    public static void main(String[] args) throws Exception
    {
        FileReader fr = new FileReader(
            "C:\\Users\\pankaj\\Desktop\\test.txt");

        // read() returns one character at a time, or -1 at end of file
        int i;
        while ((i = fr.read()) != -1)
            System.out.print((char)i);
    }
}

Output:

If you want to code refer to GeeksforGeeks

Method 3: Using Scanner class

A simple text scanner that can parse primitive types and strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.
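
To illustrate the typed next methods, here is a small sketch that reads alternating name/number tokens such as "Red 2 Blue 3" and stores them as pairs. The class name TypedTokens and the method parsePairs are invented for the example, and it assumes the input strictly alternates a word and an integer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

// Illustrative only: assumes input of the form "<name> <int> <name> <int> ...".
class TypedTokens {

    static Map<String, Integer> parsePairs(Scanner in) {
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps input order
        while (in.hasNext()) {
            String name = in.next();      // token as a String
            if (!in.hasNextInt()) {
                break;                    // stop on malformed or truncated input
            }
            result.put(name, in.nextInt()); // token converted to an int
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner("Red 2 Blue 3 Yellow 4 Green 5");
        System.out.println(parsePairs(sc));
    }
}
```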

Example 1: Using loops

import java.io.File;
import java.util.Scanner;

public class ReadFromFileUsingScanner
{
  public static void main(String[] args) throws Exception
  {
    // Backslashes in a Java string literal must be escaped
    File file = new File("C:\\Users\\pankaj\\Desktop\\test.txt");
    Scanner sc = new Scanner(file);

    while (sc.hasNextLine())
      System.out.println(sc.nextLine());
  }
}

Output:

If you want to code refer to GeeksforGeeks

Example 2: Without using loops

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadingEntireFileWithoutLoop
{
  public static void main(String[] args)
                        throws FileNotFoundException
  {
    File file = new File("C:\\Users\\pankaj\\Desktop\\test.txt");
    Scanner sc = new Scanner(file);

    // \Z matches the end of input, so the whole file becomes a single token
    sc.useDelimiter("\\Z");

    System.out.println(sc.next());
  }
}

Output:

If you want to code refer to GeeksforGeeks

Method 4: Reading the whole file in a List

Read all lines from a file. This method ensures that the file is closed when all bytes have been read or an I/O error, or other runtime exception, is thrown. Bytes from the file are decoded into characters using the specified charset. 

Syntax:

public static List<String> readAllLines(Path path, Charset cs) throws IOException

This method recognizes the following as line terminators: 

\u000D followed by \u000A, CARRIAGE RETURN followed by LINE FEED
\u000A, LINE FEED
\u000D, CARRIAGE RETURN

Example

import java.util.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.io.*;

public class ReadFileIntoList
{
  public static List<String> readFileInList(String fileName)
  {
    List<String> lines = Collections.emptyList();
    try
    {
      lines =
       Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
    }
    catch (IOException e)
    {
      // Report the failure and fall through to return the empty list
      e.printStackTrace();
    }
    return lines;
  }

  public static void main(String[] args)
  {
    List<String> l = readFileInList("C:\\Users\\pankaj\\Desktop\\test.java");

    Iterator<String> itr = l.iterator();
    while (itr.hasNext())
      System.out.println(itr.next());
  }
}

Output:

If you want to code refer to GeeksforGeeks

Method 5: Read a text file as String

Example

package io;

import java.nio.file.*;

public class ReadTextAsString {
  public static String readFileAsString(String fileName) throws Exception
  {
    // Decodes the bytes using the platform default charset
    String data = "";
    data = new String(Files.readAllBytes(Paths.get(fileName)));
    return data;
  }

  public static void main(String[] args) throws Exception
  {
    String data = readFileAsString("C:\\Users\\pankaj\\Desktop\\test.java");
    System.out.println(data);
  }
}

Output:

If you want to code refer to GeeksforGeeks

This article is contributed by Pankaj Kumar.

In this article we will discuss ways to read Word documents in Java using the Apache POI library. A Word document may contain images, tables, or plain text, and a standard Word file has headers and footers too. In the following examples we will parse a Word document by reading its paragraphs, runs, images, and tables, along with its headers and footers. We will also look at identifying the styles associated with paragraphs, such as font size, font family, and font color.

Maven Dependencies

The following POI Maven dependency is required to read Word documents.

pom.xml

	<dependencies>
		<dependency>
			<groupId>org.apache.poi</groupId>
			<artifactId>poi-ooxml</artifactId>
			<version>3.16</version>
		</dependency>
	</dependencies>

Reading Complete Text from Word Document

The class XWPFDocument has many methods for reading and extracting the contents of a .docx file. To read all the text in a .docx Word document, wrap the document in an XWPFWordExtractor and call its getText() method. Following is an example.

TextReader.java

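import java.io.FileInputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class TextReader {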
	
	public static void main(String[] args) {
	 try {
		   FileInputStream fis = new FileInputStream("test.docx");
		   XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
		   XWPFWordExtractor extractor = new XWPFWordExtractor(xdoc);
		   System.out.println(extractor.getText());
		} catch(Exception ex) {
		    ex.printStackTrace();
		}
 }

}

Reading Headers and Footers of Word Document

Apache POI provides built-in methods to read the headers and footers of a Word document. Following is an example that reads and prints the header and footer of a Word document. The example .docx file is available in the source, which can be downloaded at the end of this article.

HeaderFooterReader.java

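import java.io.FileInputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;

public class HeaderFooterReader {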

	public static void main(String[] args) {
		
		try {
			FileInputStream fis = new FileInputStream("test.docx");
			XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
			XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(xdoc);

			XWPFHeader header = policy.getDefaultHeader();
			if (header != null) {
				System.out.println(header.getText());
			}

			XWPFFooter footer = policy.getDefaultFooter();
			if (footer != null) {
				System.out.println(footer.getText());
			}
		} catch (Exception ex) {
			ex.printStackTrace();
		}

	}

}

Output

This is Header

This is footer

Read Each Paragraph of a Word Document

Among the many methods defined in the XWPFDocument class, we can use getParagraphs() to read a .docx Word document paragraph by paragraph. This method returns a list of all the paragraphs (XWPFParagraph) of a Word document. XWPFParagraph in turn has many utility methods for extracting information about a paragraph, such as its text alignment and the style associated with it.

To give more control over reading the text of a Word document, each paragraph is further divided into multiple runs. A run defines a region of text with a common set of properties. Following is an example that reads paragraphs from a .docx Word document.

ParagraphReader.java

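import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ParagraphReader {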

	public static void main(String[] args) {
		try {
			FileInputStream fis = new FileInputStream("test.docx");
			XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));

			List<XWPFParagraph> paragraphList = xdoc.getParagraphs();

			for (XWPFParagraph paragraph : paragraphList) {

				System.out.println(paragraph.getText());
				System.out.println(paragraph.getAlignment());
				System.out.println(paragraph.getRuns().size());
				System.out.println(paragraph.getStyle());

				// Returns numbering format for this paragraph, eg bullet or lowerLetter.
				System.out.println(paragraph.getNumFmt());
				System.out.println(paragraph.getAlignment());

				System.out.println(paragraph.isWordWrapped());

				System.out.println("********************************************************************");
			}
		} catch (Exception ex) {
			ex.printStackTrace();
		}
	}
}

Reading Tables from Word Document

Following is an example that reads the tables present in a Word document. It prints the text of every cell, row by row.

TableReader.java

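import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableReader {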

	public static void main(String[] args) {
		try {
			FileInputStream fis = new FileInputStream("test.docx");
			XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
			Iterator<IBodyElement> bodyElementIterator = xdoc.getBodyElementsIterator();
			while (bodyElementIterator.hasNext()) {
				IBodyElement element = bodyElementIterator.next();

				if ("TABLE".equalsIgnoreCase(element.getElementType().name())) {
					List<XWPFTable> tableList = element.getBody().getTables();
					for (XWPFTable table : tableList) {
						System.out.println("Total Number of Rows of Table:" + table.getNumberOfRows());
						for (int i = 0; i < table.getRows().size(); i++) {

							for (int j = 0; j < table.getRow(i).getTableCells().size(); j++) {
								System.out.println(table.getRow(i).getCell(j).getText());
							}
						}
					}
				}
			}
		} catch (Exception ex) {
			ex.printStackTrace();
		}
	}
}

Reading Styles from Word Document

Styles are associated with the runs of a paragraph. The XWPFRun class has many methods for identifying the styles applied to the text, for example whether it is bold, highlighted, or capitalized.

StyleReader.java

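import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class StyleReader {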

	public static void main(String[] args) {
		try {
			FileInputStream fis = new FileInputStream("test.docx");
			XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));

			List<XWPFParagraph> paragraphList = xdoc.getParagraphs();

			for (XWPFParagraph paragraph : paragraphList) {

				for (XWPFRun rn : paragraph.getRuns()) {

					System.out.println(rn.isBold());
					System.out.println(rn.isHighlighted());
					System.out.println(rn.isCapitalized());
					System.out.println(rn.getFontSize());
				}

				System.out.println("********************************************************************");
			}
		} catch (Exception ex) {
			ex.printStackTrace();
		}

	}

}

Reading Image from Word Document

Following is an example to read image files from a word document.

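import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public class ImageReader {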

	public static void main(String[] args) {

		try {
			FileInputStream fis = new FileInputStream("test.docx");
			XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
			List<XWPFPictureData> pic = xdoc.getAllPictures();
			if (!pic.isEmpty()) {
				System.out.print(pic.get(0).getPictureType());
				System.out.print(pic.get(0).getData());
			}

		} catch (Exception ex) {
			ex.printStackTrace();
		}
	}

}

Conclusion

I hope this article gave you what you were looking for. If you have anything to add or share, please do so in the comment section below.
