Which word is in bold and in italics


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

We are using Apache POI 3.0 for reading values from an excel sheet.
We have a requirement which requires us to identify if some word in the excel sheet is in bold or italics or underline.

Can anyone please tell me how to do this?

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Sorry for my delayed reply.

This is the code I am using

void func()
{

try{

InputStream fis = new FileInputStream(«C:/bind.xls»);;
POIFSFileSystem fs = new POIFSFileSystem( fis );
HSSFWorkbook wb = new HSSFWorkbook(fs);
HSSFSheet sheet = wb.getSheetAt(0);
HSSFRow row=sheet.getRow(0);
HSSFCell cell=row.getCell((short)0);

HSSFRichTextString rts=cell.getRichStringCellValue();
System.out.println(«—-«+rts);

}catch(Exception e)
{
e.printStackTrace();
}

}
The problem is once the value of cell is fetched in the code all formatting is lost. Please tell me how to find out the formatting.

The value in cell could be something like

«abc pqr xyz»

where pqr is in bold,
abc is in italics

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Any one please


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

The following snippet will tell you about the style at each character index:

This is another approach that enumerates the formatting runs, but the indices it returns are a bit goofy:

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Thanks for replying.

Originally posted by Ulf Dittmer:
The following snippet will tell you about the style at each character index:

It always returns the same result whether the character is bold or italic or underline. The result I get is always

index = 0, char ‘O’
font # = 0
bold = 400
italic = false
underline = 0
strikeout = false

index = 1, char ‘b’
font # = 0
bold = 400
italic = false
underline = 0
strikeout = false

index = 2, char ‘j’
font # = 0
bold = 400
italic = false
underline = 0
strikeout = false
[ August 05, 2007: Message edited by: Thomas Greene ]

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:

This is another approach that enumerates the formatting runs, but the indices it returns are a bit goofy:

System.out.println(rts.numFormattingRuns());
returns 0, so it never enters the loop.

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

That’s odd. I tested it with a string that contains runs of all 4 formatting features, and it reported all of them.

And then I noticed that I had misinterpreted the index numbers, and that they actually do make sense. So in my test case, both approaches work. I’ve put the file here.
[ August 05, 2007: Message edited by: Ulf Dittmer ]

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

That is really surprising. Your excel sheet works fine, while mine does not. Is there a way I can upload my excel sheet for you to have a look?

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

I’ve sent you a private message. You can read it by following the «My Profile» link at the top of the page.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:
I’ve sent you a private message. You can read it by following the «My Profile» link at the top of the page.

Please check your email

Thank You.
[ August 06, 2007: Message edited by: Thomas Greene ]

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

The difference seems to be that there are no «runs» in your example — all of the cell contents use the same styling. In that case, you can get the style information from the cell font:

That is apparently the style information used for the first run if there is more than one run in a cell.
[ August 06, 2007: Message edited by: Ulf Dittmer ]

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Thanks a lot Ulf
Can you please tell me what does «run» mean here.

Also, I tried the code you have given. Still not getting absolutely correct results.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

I have mailed the excel sheet.
If you observe, last word (uuu) is both underline, bold and italics.
But only italics is giving true (assuming bold=400 means normal formatting and not bold)

Thanks again
[ August 06, 2007: Message edited by: Thomas Greene ]

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

The code prints repeatedly the cell style, not the character style, that’s why you’re getting the same style for all characters.

It looks correct to me. The cell style is italics, non-bold, non-underline. That’s the first run, i.e. what the string starts with. Then there’s a bold, italic, non-underline run that starts at index 10, and another run that’s bold, italics, underline starting at index 17.

«Run» here is a sequence of characters that use the same style -and by implication, the same font index-. The example has three runs — the first one is represented by the cell style, and the other two are accessed through numFormattingRuns and getFontOfFormattingRun.
[ August 06, 2007: Message edited by: Ulf Dittmer ]

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Thanks for replying Ulf

Originally posted by Ulf Dittmer:
and another run that’s bold, italics, underline starting at index 17.

[ August 06, 2007: Message edited by: Ulf Dittmer ]

But in the output I am getting (starting index 17)

Doesn’t that mean «not bold», «not underline» but only italics

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Did you fix the bug in your code that I described in the first sentence of my last post? With that fixed, I’m getting the following, which looks OK.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Thanks Ulf

I see what you are trying to say.

Instead of
HSSFCellStyle style = cell.getCellStyle();
I should be getting character style. But there does not seem to be any method which will get me the character style. Which method should be used instead?

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

But there does not seem to be any method which will get me the character style.

The code in my first post did exactly that.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Are you referring to this code

String text = rts.toString();

for (int i=0; i<rts.length(); i++) {
System.out.println(«index = «+i+», char ‘»+text.charAt(i)+»‘»);
System.out.println(«font # = «+rts.getFontAtIndex(i));
HSSFFont font = wb.getFontAt(rts.getFontAtIndex(i));
System.out.println(«bold = «+font.getBoldweight());
System.out.println(«italic = «+font.getItalic());
System.out.println(«underline = «+font.getUnderline());
System.out.println(«strikeout = «+font.getStrikeout());
System.out.println();
}

But as mentioned above it is giving inconsistent results.

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Maybe that’s an artifact if there is only a single run. Judging by my tests, the algorithm would be something like:

— the style of the first run of text (which may be the only one) is the one in wb.getFontAt(cell.getCellStyle().getFontIndex());

— if there are more runs (determined by rts.numFormattingRuns()), then starting with the second run, the style can be determined by wb.getFontAt(rts.getFontOfFormattingRun(i)) starting at index rts.getIndexOfFormattingRun(i)

That operates at a higher level than characters (with runs), and may thus simplify the processing.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Thanks for replying Ulf.

rts.numFormattingRuns() returns 0, so I never get into the for loop.

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

rts.numFormattingRuns() returns 0, so I never get into the for loop.

Just to clarify: it does that for a cell that has more than one run, i.e. at least two different styles in it? If so, you might want to file a bug report with the POI folks.

As stated before, I’ve seen it do that for cells that only have a single style, but not for cells that have more than one.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Thanks Ulf.

Is there a way that this problem can be solved?

Actually my requirement is something like this.
In the excel sheet, some of the cells may have values like

abc xyz

This value needs to be fetched using Java code and send to some system in the form of

<b>abc</b> <i>xyz</i>

These HTML tags are required since the content will then be pulled out and displayed on HTML page.

Is there something that can be done to solve this problem?

Thank You.

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Short of fixing the bug -if there is one- I don’t think this can be solved.

But let me ask again, because I’m still not clear on this from what you write: numFormattingRuns() returns 0 for a cell that contains more than one style, e.g. «bbb iii«? That would be a bug, but I’m not seeing it. Feel free to send me an XLS file that exhibits this behavior.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:
Short of fixing the bug -if there is one- I don’t think this can be solved.

But let me ask again, because I’m still not clear on this from what you write: numFormattingRuns() returns 0 for a cell that contains more than one style, e.g. «bbb iii«? That would be a bug, but I’m not seeing it. Feel free to send me an XLS file that exhibits this behavior.

You are right Ulf. Seems to be my mistake. I might be pointing to the wrong cell.
Although I am getting a value for numFormattingRuns but it is not the correct value. I’ll mail you that sheet in a while.


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Thomas Greene:

You are right Ulf. Seems to be my mistake. I might be pointing to the wrong cell.
Although I am getting a value for numFormattingRuns but it is not the correct value. I’ll mail you that sheet in a while.

Seems it doesn’t give the index of first word but starts from the space after the first word.
Why is it so?

If you are surprised about my new id, then please have a look at

https://coderanch.com/t/3240/Ranch-Office/Changed-mail-id-incorrect
[ August 08, 2007: Message edited by: Thomas Greane ]

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

It looks like the «algorithm» I gave before works fine for this case. The first run of text is obtained from the cell style, and the other runs after that by numFormattingRuns() and the associated methods. So the number of runs returned by is() is one less than the actual number of runs (because the first one is considered the cell style). A bit confusing, but not at all hard to code. As to why that is, I have no idea — that’s a question for the POI project members.

Thomas Greane

Greenhorn

Posts: 5


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:
So the number of runs returned by is() is one less than the actual number of runs (because the first one is considered the cell style).

Shouldn’t that make it one more.
Suppose I have one word in bold, second word in italics and third word on underline. Now shouldn’t the algorithm be giving me 4 values.
One for cell style, second for bold, third for italics and fourth for underline.

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Maybe the API is modeled after the internal structure of the XLS file format, and that this is how the file format stores it.

The important thing is that it is possible to get at all data even if it’s non-obvious how to do so. I’d recommend that you submit improved documentation for the HSSFRichTextString class to the POI folks, so that the next person wondering about this knows what’s going on.

Thomas Greane

Greenhorn

Posts: 5


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:
Maybe the API is modeled after the internal structure of the XLS file format, and that this is how the file format stores it.

The important thing is that it is possible to get at all data even if it’s non-obvious how to do so. I’d recommend that you submit improved documentation for the HSSFRichTextString class to the POI folks, so that the next person wondering about this knows what’s going on.

I would surely do that. But I am still wondering how to get the formatting of the first word in the cell.

Ulf Dittmer

Rancher

Posts: 43079


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

But I am still wondering how to get the formatting of the first word in the cell.

By using the cell style. See my post starting with «The difference seems to be that there…».

I had described the algorithm to use -no matter how many runs of styled text there are- in the post starting «Maybe that’s an artifact if there …».

BTW, did you see this?
[ August 08, 2007: Message edited by: Ulf Dittmer ]

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:

BTW, did you see this?

[ August 08, 2007: Message edited by: Ulf Dittmer ]

Thanks Ulf. I am back to my old log in.

Thomas Greene

Ranch Hand

Posts: 132


posted 15 years ago

  • Mark post as helpful


  • send pies

    Number of slices to send:

    Optional ‘thank-you’ note:



  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:

By using the cell style. See my post starting with «The difference seems to be that there…».

I had described the algorithm to use -no matter how many runs of styled text there are- in the post starting «Maybe that’s an artifact if there …».

BTW, did you see this?

[ August 08, 2007: Message edited by: Ulf Dittmer ]

Thanks Ulf, it’s working fine

I remember seeing such a word before, but I can’t for the life of me remember what it was. I’m googling and having no luck.

The classic web comic Pokey the Penguin used this technique quite a bit.

asked Jan 7, 2013 at 22:29

DeeDee's user avatar

7

Bold and italic and underline are all examples of typographical emphasis. Others include (but are not limited to) strikethrough, overlining, changed font and size, capitalisation and letter spacing.

Although bold, italic and underline are the most common, they are not often used in combination, so I don’t think there’s a special word or phrase for it.

You could refer to it as triple emphasis, or if you need precision, stick with bold, italic and underline.

Community's user avatar

answered Jan 8, 2013 at 5:32

Hugo's user avatar

HugoHugo

66.7k20 gold badges201 silver badges314 bronze badges

I’m not sure about the italicized part, but this source calls the kind of balloon and bold and underlined words used to indicate screaming dialogue a burst balloon (in contrast to a radio balloon for more normal dialogue):
enter image description here

www.admiraalit.nl's user avatar

answered Jan 7, 2013 at 23:14

JLG's user avatar

JLGJLG

23.1k1 gold badge42 silver badges99 bronze badges

3

Artemy Lebedev

December 28, 2001

It happened somehow that in its brief history book printing has failed to produce any diversity on the typesetting side. Uniformity may be scaled down by using different fonts, formats and sizes.

If a text has been composed in some font, the further maneuvers you may want to take are pretty much limited. You may change the leading, letter spacing, alignment. Apart from half a dozen punctuation marks, there are hardly any means to stress intonation in writing.

There are several other typographical means of semantic marking. These means are broken down into the acceptable and unacceptable categories.

Acceptable means: boldface and italics. Unacceptable means: splitting, underlining, color highlighting. I’d like to note that these three ways are unacceptable in the main text body. They will do fine in a jobbing work (i.e. as special artistic decoration, for captions, on billboards etc.). But in a book (the author is unflinchingly committed to this belief) spacing out or underlining are off-limits: these tricks are the aftermath of typewriters’ helpless attempts to make words stand out one way or another.

What it boils down to is that the only two acceptable ways of highlighting words in the text that we’ve inherited from our forerunners are boldface and italics.

06

Housekeeping tip

Italics were first used in printing by the printer Aldus Manutius in the late 15th century. Cursive fonts were styled to imitate the typeface of the Papal Chancellery documents. Since the whole affair was staged in Italy, the font that later on spread across Europe was called in a similar way—Italian. These days every secretary knows it as italic.

You don’t need to worry about your soul when using boldface and italics in composition. Boldface better fits captions, italics—the emphasizing of words without changing the visual uniformity of the page. Boldface can also be used for emphasizing words in the text, but it has a special feature: a word in bold is too conspicuous and can be seen on the page long before the reader gets around to the fragment being emphasized. This can be the cause for the reader to be prematurely let in on the secret that the author had in store, so italics would be a better tool here. They emphasize words distinctively enough without leaping to the eye before it reaches the relevant line. For example:

I have tonight shaved la barba—what you call the weeskers of the Presidente himself, of this countree! I have tonight shaved la barba—what you call the weeskers of the Presidente himself, of this countree!

If we speak about correct use, italic typeface is best for a typeset text, boldface—for captions.

Just as with any other special means, you should not overuse the tools of emphasis. Typesetting the entire page in italics would be disrespect to the reader’s eyesight.

If you need to emphasize a word in a boldface text, use bold italics. Bold italics amid ordinary text are almost always illicit. Example:

I have tonight shaved la barba—what you call the weeskers of the Presidente himself, of this countree!

If you need to stress a word in an italics text, use ordinary, non-italics typeface. Example:

I have tonight shaved la barba—what you call the weeskers of the Presidente himself, of this countree!

You must have really compelling reasons for using bold italics in a text which is fully typeset in italics. Boldface normally serves illustrative and advertising purposes and hardly ever finds its way into fiction texts. Italics, on the other hand, are a very good option for fiction and quotes. Examples:

If you don’t buy our plastic tubes, you’ll be haunted by nightmares in your afterlife and lose the entire meaning of your existence. Plastic tubes are your only hope in this complicated world. I think it was the first time I heard that word, which went on to have an incredibly sleazy career in the seventies and coke-soaked eighties. Mostly in politics. I think credibility died of shame around 1986…

Like this post? Please share to your friends:
  • Which word is different topics
  • Which word is different top bottom
  • Which word is different milk
  • Which word is correct center or centre
  • Which word is correct canceled or cancelled