Index from word list

You are looking to insert an Index.

http://office.microsoft.com/en-ie/mac-word-help/create-or-edit-an-index-HA102929532.aspx

You can create an index entry for a specific word, phrase, or symbol, or for a topic that spans a range of pages.

Mark index entries for words or phrases

  1. Select the text that you want to use as an index entry.

  2. On the Insert menu, click Index and Tables.

  3. On the Index tab, click Mark Entry.

    TIP To go directly to the Mark Index Entry dialog box, press COMMAND + OPTION + SHIFT + X.

  4. Type or edit the text in the Main entry box.

    TIP To create a subentry, specify the main index entry, and then type the subentry in the Subentry box. To create a third-level entry, type the subentry text followed by a colon (:) and the text of the third-level entry.

  5. Do one of the following:

    To mark the index entry, click Mark.

    To mark the first occurrence of this text in each paragraph in the document that exactly matches the uppercase and lowercase letters in the entry, click Mark All.

    TIP To mark index entries for symbols such as @, in the Main entry box, immediately following the symbol, type ;# (semicolon followed by the number sign), and then click Mark. When you build the index, Word puts the symbols at the beginning of the index.

  6. To mark additional index entries, select the text or click immediately after it, click in the Mark Index Entry dialog box, and then repeat steps 4 and 5.

    NOTE Word inserts each marked index entry as an XE (Index Entry) field in hidden text format. If you do not see the XE field, click Show/Hide on the Standard toolbar.
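As a rough illustration of what ends up inside those XE fields, the sketch below builds the field-code text for entries taken from a word list. The function name xe_field_code and the sample entries are made up for this example; it only formats strings and does not talk to Word, and it assumes that entry levels are joined with a colon, as described for subentries above.

```python
def xe_field_code(main, sub=None, third=None):
    """Return the text of an XE field code for one index entry.

    Hypothetical helper: levels are joined with a colon, mirroring the
    main entry / subentry / third-level convention described above.
    """
    levels = [level for level in (main, sub, third) if level]
    return 'XE "{}"'.format(":".join(levels))


# Made-up sample entries: (main,), (main, sub), (main, sub, third)
entries = [
    ("Strayhorn, Billy",),
    ("Jazz", "Big bands"),
    ("Jazz", "Big bands", "Ellington"),
]

for entry in entries:
    print(xe_field_code(*entry))
```

In Word the code sits between field braces, which are inserted as field characters rather than typed, so the strings above are only the text that would appear between them.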

There are two things that should be mentioned :)

  1. A set() object is meant to mimic a mathematical set, so its members carry no index number. The only relation between an element and the set is membership.
  2. When I ran the program, I saw that the function does not actually use its second argument, window_size: it creates a context of the same fixed length no matter which window_size is passed. The official PyTorch tutorial has a similar function for the same job, which I highly recommend you take a look at.

Finally, if you are going to use word indices, I suggest converting the word_set object to a list, which already has an index() method. The following is my implementation:

word_list = ['this','start','wonderland','amaze','this','read', 'instal', 'instruct', 'nis', '2004', 'nav', '2004', 'prior', 'latsni', 'still', 'end', 'result', 'junk', 'rawtfos', 'whi', 'instal', 'ina', 'type', 'softwar', 'instal', 'krow', 'proper', 'norton', 'tcudorp', '3', 'latsni', 'either', 'eno', 'norton', 'product', 'neither', 'krow', 'latsni', 'mcafe', 'anti', 'virus', '8', '2', 'comput', 'owner', 'sinc', 'purchas', 'mcafe', 'anti', 'virus', '8', 'instal']
word_set = set(word_list)
# A set has no positions; a list gives every unique word an index.
# The order of a set is arbitrary, so the index values can differ between runs.
list_of_word_set = list(word_set)
def CBOW(raw_text, window_size=2):
   data = []
   for i in range(window_size, len(raw_text) - window_size):
       # The context is always four words: the two positions at each edge of the window
       context = [raw_text[i - window_size], raw_text[i - (window_size - 1)], raw_text[i + (window_size - 1)], raw_text[i + window_size]]
       # Replace each context word by its index in the list of unique words
       context = [list_of_word_set.index(item) for item in context]
       target = raw_text[i]
       data.append((context, target))
   return data
CBOW(word_list, 10)

output:

[([11, 27, 21, 25], 'nav'),
 ([27, 5, 25, 0], '2004'),
 ...
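For comparison, here is a minimal sketch of a variant in which window_size really controls the context length and the index lookup uses a dictionary instead of list.index(). The name make_cbow_pairs and the sorted vocabulary are choices made for this sketch, not part of the answer above; sorting is only there to make the indices reproducible between runs.

```python
def make_cbow_pairs(tokens, window_size=2):
    """Return (context_indices, target_word) pairs plus the word-to-index map."""
    # Sorting the unique words gives every run the same index for the same word.
    vocab = sorted(set(tokens))
    word_to_ix = {word: ix for ix, word in enumerate(vocab)}

    data = []
    for i in range(window_size, len(tokens) - window_size):
        # window_size words to the left and window_size words to the right
        context = tokens[i - window_size:i] + tokens[i + 1:i + window_size + 1]
        data.append(([word_to_ix[w] for w in context], tokens[i]))
    return data, word_to_ix


pairs, word_to_ix = make_cbow_pairs("a b c d e".split(), window_size=2)
print(pairs)  # [([0, 1, 3, 4], 'c')]
```

With this layout each context has 2 * window_size entries, and a dictionary lookup avoids rescanning the vocabulary list for every word.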

I am in a difficult situation at work: a book of 600 InDesign (.indd) pages with no index tags. I have a list of names that need to be indexed in a simple way: name, then page numbers.

I need a script that searches the book for each entry in this word list and builds the index. The search has to work in reverse: Strayhorn, Billy is probably written as "Billy Strayhorn" in the book, and sometimes only the first name appears. The script also needs to tolerate letters that are not spelled out in the list entry, because the book is in Croatian. For example, for the list entry Strayhorn, Billy the script needs to find every occurrence of Billy Strayhorn on a page in any combination, such as Billy(s) Strayhorn(s), and write the page numbers next to the entry.

I found this life-saver script, but it needs to be modified for the needs above. Any help is welcome…

//DESCRIPTION: Index direct
// Peter Kahrel, www.kahrel.plus.com

#target indesign;
#targetengine index_direct;

if (app.documents.length < 2 || app.selection.length == 0 || app.selection[0].parentStory.constructor.name != "Story")
  errorM ("Select a text frame or an insertion point\r(and open two or more documents).");

try {index_independent (app.documents[0])}
  catch (e) {alert (e.message + "\r(line " + e.line + ")")};

//=======================================================================

function index_independent (doc)
  {
  var obj = get_data (doc);
  create_index (obj);
  }

function create_index (obj)
  {
  app.scriptPreferences.enableRedraw = false;
//~ if (app.selection.length == 0 && app.activeDocument.textFrames.length > 1)
//~ errorM ("Select a text frame or an insertion point.");
  var top_text, pages;
  check_list (app.activeDocument);
  grep_settings (obj); // grep_settings MUST follow check_list
  // get the topics from the concordance list as paragraph objects
  var tops = app.selection[0].parentStory.paragraphs;
  // get the names of all open documents (creates array of doc. names)
  var docs = app.documents.everyItem().name;
  // and delete current document (the concordance list) from the array (but it will stay open)
  docs.shift();
  // initialise message window
  mess = createmessagewindow (40);
  for (var i = 0; i < tops.length; i++)
    {
    // create text string from topic
    top_text = make_topic (tops[i], obj);
    // get page numbers of the topic from all open docs
    pages = get_pages (docs, top_text, obj);
    // If any, append to topic in concordance list.
    // The last one is added at ins. point -1, the others (which end in a return) at -2.
    if (pages.length > 0)
      {
      if (i == tops.length-1)
        tops[i].insertionPoints[-1].contents = obj.topic_separator + pages;
      else
        tops[i].insertionPoints[-2].contents = obj.topic_separator + pages;
      }
    else
      if (obj.mark) tops[i].strikeThru = true;
    }
  if (obj.section_markers)
    add_sections (tops);
  mess.parent.close();
  }

function add_sections (par)
  {
  mess.text = 'Adding sections...';
  app.findGrepPreferences = null;
  // first word character of a paragraph
  app.findGrepPreferences.findWhat = '\\w+?';
  var ch1, ch2;
  for (var i = par.length-2; i >= 0; i--)
    {
    try
      {
      ch1 = par[i].findGrep()[0].contents.toUpperCase();
      ch2 = par[i+1].findGrep()[0].contents.toUpperCase();
      // insert a section heading where the initial letter changes
      if (ch1 != ch2)
        par[i].insertionPoints[-1].contents = ch2+'\r';
      }
    catch (_) {}
    }
  try {par[0].insertionPoints[0].contents = par[0].findGrep()[0].contents.toUpperCase()+'\r';} catch (_){}
  }

function make_topic (t, obj)
  {
  // remove trailing return
  var s = t.contents.replace (/\r$/, "");
  // show the topic; can't do that anywhere else
  mess.text = s;
  // delete everything from comma or parenthesis, including any preceding space
//~ s = s.replace (/\s?[,(].+$/, "");
  s = s.replace (/(,|_\().+$/, "");
  // extract any subtopic
  s = s.split("__").pop();
  // whole-word-only search
  s = "\\b"+s+"\\b";
  // case sensitive if necessary
  if (obj.case_sensitive == false)
    s = "(?i)" + s;
  return s
  }

function get_pages (docs, t, obj)
  {
  var pages = [];
  for (var i = 0; i < docs.length; i++)
    {
    var temp = get_one_doc (app.documents.item (docs[i]), t, obj);
    if (temp.length > 0)
      pages = pages.concat (temp);
    }
  // sort and remove duplicates
  if (pages.length > 1)
    pages = rerange (pages, obj);
  return pages
  }

function get_one_doc (doc, t, obj)
  {
  var page, array = [];
  app.findGrepPreferences.findWhat = t;
  var pp = doc.findGrep();
  for (var i = 0; i < pp.length; i++)
    {
    if (check_style (pp[i], obj))
      {
      page = find_page (pp[i]);
      if (page != null)
        array.push (page.name);
      }
    }
  return array
  }

function find_page (o)
  {
  try
    {
    if (o.hasOwnProperty ("parentPage"))
      return o.parentPage;
    if (o.constructor.name == "Page")
      return o;
    switch (o.parent.constructor.name)
      {
      case "Character": return find_page (o.parent);
      case "Cell": return find_page (o.parent.texts[0].parentTextFrames[0]);
      case "Table" : return find_page (o.parent);
      case "TextFrame" : return find_page (o.parent);
      case "Group" : return find_page (o.parent);
      case "Story": return find_page (o.parentTextFrames[0]);
      case "Footnote": return find_page (o.parent.storyOffset);
      case "Page" : return o.parent;
      }
    }
  catch (_) {return null}
  }

function check_style (w, obj)
  {
  if (obj.selected_styles == "") return true;
  // exclude the selected paragraphs
  if (obj.include == 0 && obj.selected_styles.indexOf ("£$"+w.appliedParagraphStyle.name+"£$") < 0) return true;
  // include just the selected paragraphs
  if (obj.include == 1 && obj.selected_styles.indexOf ("£$"+w.appliedParagraphStyle.name+"£$") > -1) return true;
  }

// remove all trailing spaces and returns from the word list
function check_list (doc)
  {
  set_grep ("\\s+$", "", {}); // remove trailing space
  doc.changeGrep();
  set_grep ("\\x20\\x20+", "", {}); // remove spurious spaces
  doc.changeGrep();
  }

function createmessagewindow( le )
  {
  dlg = new Window ('palette');
  dlg.alignChildren = ['left', 'top'];
  txt = dlg.add ('statictext', undefined, "");
  txt.characters = le;
  dlg.show();
  return txt
  }

//=================================================================
// Sort, remove duplicates, and range page numbers
function rerange (pagenum_array, obj)
  {
  var page_nums = remove_duplicates (pagenum_array);
  // split array into two: one roman, the other arabic
  var page_nums = split_roman_arabic (page_nums);
  page_nums.arabic = sort_range (page_nums.arabic, obj).join (", ");
  if (page_nums.roman.length > 1)
    {
    // convert roman numbers to arabic
    page_nums.roman = roman_to_arabic (page_nums.roman);
    page_nums.roman = sort_range (page_nums.roman, obj)
    // convert the arabic numbers in the roman array back to roman, return a string
    page_nums.roman = arabic_to_roman (page_nums.roman.join (", "));
    }
  // concatenate the arrays
  if (page_nums.roman.length > 0 && page_nums.arabic.length > 0) page_nums.roman += ", ";
  page_nums = page_nums.roman + page_nums.arabic;
  // Counter-intuitive construction here, but it's necessary
  return page_nums;
  }

function sort_range (array, obj)
  {
//~ array = unrange (array); // not needed in this script
  array = array.sort (sort_num);
  if (obj.range_pages == true)
    array = apply_page_ranges (array, obj)
  return array
  }

// return two element object, each element an array,
// one of roman numbers, the other of arabic numbers
function split_roman_arabic (array)
  {
  var roman = [];
  var arab = [];
  for (var i = 0; i < array.length; i++)
    {
    // digits, hyphens and en-dashes only: an arabic number or range
    if (array[i].match (/^[-\u2013\d]+$/) != null)
      arab.push (array[i]);
    else
      roman.push (array[i]);
    }
  return {roman: roman, arabic: arab}
  }

//~ function arabic_to_roman (array)
//~ {
//~ for (var i = array.length-1; i > -1; i--)
//~ array[i] = arabic2roman (array[i]);
//~ return array
//~ }

function arabic_to_roman (s)
  {
  return s.replace (/\w+/g, arabic2roman)
  }

function roman_to_arabic (array)
  {
  for (var i = array.length-1; i > -1; i--)
    array[i] = roman2arabic (array[i]);
  return array;
  }

function sort_num (a, b) {return a - b}

//~ function sort_roman (a, b) {return roman2arabic (a) - roman2arabic (b)}

function arabic2roman (arab)
  {
  var roman = "";
  if (arab < 10000)
    {
    // one row per decimal position: units, tens, hundreds, thousands
    var rom = [["","i","ii","iii","iv","v","vi","vii","viii","ix"],
      ["","x","xx","xxx","xl","l","lx","lxx","lxxx","xc"],
      ["","c","cc","ccc","cd","d","dc","dcc","dccc","cm"],
      ["","m","mm","mmm","4m","5m","6m","7m","8m","9m"]];
    arab = String (arab).split("").reverse().join("");
    for (var i = 0; i < arab.length; i++)
      roman = rom[i][arab.charAt(i)] + roman;
    }
  return roman
  }

function roman2arabic (roman)
  {
  var i;
  var rom2arab = {i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000};
  var arabic = rom2arab [roman.substr (-1)];
  for (i = roman.length-2; i > -1; i--)
    {
    // a smaller numeral before a larger one is subtracted (iv, ix, ...)
    if (rom2arab [roman[i]] < rom2arab [roman[i+1]])
      arabic -= rom2arab [roman[i]];
    else
      arabic += rom2arab [roman[i]];
    }
  return arabic
  }

function remove_duplicates (array)
  {
  var temp = [];
  var dup = [];
  for (var i = array.length-1; i > -1; i--)
    {
    if (array[i] != undefined && !dup[array[i]])
      {
      dup[array[i]] = true;
      temp.push (array[i]);
      }
    }
  return temp
  } // remove_duplicates

function apply_page_ranges (array, obj)
  {
  var tolerance = obj.tolerance+1;
  var temp = [];
  var range = false;
  for (var i = 0; i < array.length; i++)
    {
    temp.push (array[i]);
    // extend the range while the next page is within the tolerance
    while (array[i+1] - array[i] <= tolerance)
      {i++; range = true}
    if (range)
      temp[temp.length-1] += obj.dash + array[i];
    range = false;
    }
  return temp;
  } // apply_page_ranges

// undo ranging (and digit dropping)
function unrange (array)
  {
  function expand_num ()
    {
    // 123-6 > 123-126
    function undrop (from, to) {return from.slice (0, from.length-to.length) + to};
    var expanded = "", start = arguments[1], stop = arguments[2];
    if (start.length > stop.length)
      stop = undrop (start, stop);
    start = +start; stop = +stop;
    for (var i = start; i < stop; i++)
      expanded += i + ",";
    expanded += stop;
    return expanded
    } // expand_num

  var s = array.join (",");
  s = s.replace (/(\d+)[-\u2013](\d+)/g, expand_num);
  return s.split (",")
  } // unrange

// End rerange ============================================================

function errorM (m)
  {
  alert (m);
  exit ();
  }

function set_grep (find, replace, options)
  {
  app.findGrepPreferences = app.changeGrepPreferences = null;
  app.findGrepPreferences.findWhat = find;
  app.changeGrepPreferences.changeTo = replace;
  if (options == undefined) options = {};
  if (options.PS !== undefined) app.findGrepPreferences.appliedParagraphStyle = options.PS;
  if (options.CS !== undefined) app.findGrepPreferences.appliedCharacterStyle = options.CS;
  app.findChangeGrepOptions.properties =
    {
    includeFootnotes: options.FN !== undefined,
    includeMasterPages: options.M !== undefined,
    includeHiddenLayers: options.HL !== undefined,
    includeLockedLayersForFind: options.LL !== undefined,
    includeLockedStoriesForFind: options.LS !== undefined
    }
  }

function grep_settings (o)
  {
  app.findGrepPreferences = app.changeGrepPreferences = null;
  app.findChangeGrepOptions.properties = {
    includeFootnotes: o.incFN,
    includeHiddenLayers: o.incHL,
    includeLockedLayersForFind: o.incLL,
    includeLockedStoriesForFind: o.incLS
    }
  }

// End create_index ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

function get_data (doc)
  {
  var history_file = script_dir() + "/index_direct.txt";
  var history = read_history (history_file);
  // Get the document's paragraph styles
  var list2 = doc.paragraphStyles.everyItem().name;
  // remove the first item ([No Paragraph])
  list2.shift ();
  // get the styles in the other doc just in case we want to load them (can't do that after the dialog is displayed)
  var parstyles = app.documents[1].paragraphStyles.everyItem().name;
  parstyles.shift();

  var w = new Window ("dialog", "Create independent index", undefined, {closeButton: false});
  w.alignChildren = "left";
  var panel = w.add ("panel", undefined, "Select paragraph styles");
  panel.orientation = "row";
  var list1 = panel.add ("listbox", undefined, list1, {multiselect: true});
  var addbuttons = panel.add ("group");
  addbuttons.orientation = "column";
  addbuttons.alignChildren = "fill";
  var add_ = addbuttons.add ("button", undefined, "<--Add selected");
  var add_all = addbuttons.add ("button", undefined, "<--Add all");
  var remove_ = addbuttons.add ("button", undefined, "Remove selected -->");
  var remove_all = addbuttons.add ("button", undefined, "Remove all -->");
  var load_styles = addbuttons.add ("button", undefined, "Load styles");
  var sort_styles = addbuttons.add ("button", undefined, "Sort styles");
  var list2 = panel.add ("listbox", undefined, parstyles, {multiselect: true});
  list1.preferredSize = list2.preferredSize = [200, 200];

  var clude = w.add ("panel");
  clude.orientation = "row";
  clude.alignment = "fill";
  clude.alignChildren = "left";
  clude.add ("radiobutton", undefined, "\u00A0Exclude the selected paragraph styles");
  clude.add ("radiobutton", undefined, "\u00A0Include ONLY the selected paragraph styles");
  clude.children[0].value = true;

  var group23 = w.add ("group");
  group23.alignChildren = "top";
  var group2 = group23.add ("group");
  group2.orientation = "column";
  group2.alignChildren = "left";
  var gr0 = group2.add ("group");
  gr0.add ("statictext", undefined, "Topic-page separator: ");
  var topic_sep = gr0.add ("dropdownlist", undefined, ["Space", "En-space", "Comma+space"]);
  topic_sep.minimumSize.width = 120;
  topic_sep.selection = 1;
  var csense = group2.add ("checkbox", undefined, "\u00A0Match topics case-sensitively");
  csense.value = true;
  var ranging = group2.add ("group");
  var range = ranging.add ("checkbox", undefined, "\u00A0Range pages");
  range.value = true;
  var ranging_sub = ranging.add ("group");
  ranging_sub.add ("statictext", undefined, "Use: ");
  var range_dash = ranging_sub.add ("dropdownlist", undefined, ["Hyphen", "En-dash"]);
  range_dash.minimumSize.width = 80;
  range_dash.selection = 1;
  ranging_sub.add ("statictext", undefined, "Tolerance:")
  var tolerance = ranging_sub.add ("dropdownlist", undefined, ["0","1","2","3","4","5","6","7","8","9","10"]);
  tolerance.minimumSize.width = 50;
  tolerance.selection = 0;
  var section_markers = group2.add ('checkbox', undefined, '\u00A0Add section headings')
  var mark = group2.add ("checkbox", undefined, "\u00A0Mark topics without page references");

  var group3 = group23.add ("group");
  group3.orientation = "column";
  group3.margins = [40,0,0,0];
  group3.alignChildren = "left";
  var includeLL = group3.add ("checkbox", undefined, "\u00A0Include locked layers");
  var includeHL = group3.add ("checkbox", undefined, "\u00A0Include hidden layers");
  var includeLS = group3.add ("checkbox", undefined, "\u00A0Include locked stories");
  var includeFN = group3.add ("checkbox", undefined, "\u00A0Include footnotes");

  var buttons = w.add ("group");
  buttons.alignment = "right";
  var ok_button = buttons.add ("button", undefined, "OK");
  var cancel_button = buttons.add ("button", undefined, "Cancel");

  range.onClick = function () {ranging_sub.enabled = this.value}

  // Restore the selections from the previous run in the dialog ----------------
  if (history.selected_styles.length > 0)
    previously_selected_styles (history.selected_styles.split ("£$"));
  clude.children[history.include].value = true;
  topic_sep.selection = topic_sep.find (history.topic_separator);
  csense.value = history.case_sensitive;
  range.value = history.range_pages;
  mark.value = history.mark;
  section_markers.value = history.section_markers;
  range_dash.selection = range_dash.find (history.dash);
  tolerance.selection = tolerance.find (history.tolerance);
  includeLL.value = history.incLL;
  includeHL.value = history.incHL;
  includeLS.value = history.incLS;
  includeFN.value = history.incFN;
  list2.selection = 0;

  // Set dependencies
  clude.enabled = list1.items.length > 0;
  ranging_sub.enabled = range.value;

  function previously_selected_styles (array)
    {
    for (var i = 0; i < array.length; i++)
      {
      if (list2.find (array[i]) != null)
        {
        list1.add ("item", array[i]);
        list2.remove (array[i])
        }
      }
    } // previously_selected_styles

  // End restore settings ----------------

  // enable/disable buttons depending on whether a list has any items
  add_.enabled = add_all.enabled = list2.items.length;
  remove_.enabled = remove_all.enabled = list1.items.length;

  add_.onClick = function () {if (list2.selection != null) move_item (list2.selection, list2, list1); clude.enabled = list1.items.length > 0};
  add_all.onClick = function () {move_all (list2, list1)};
  remove_.onClick = function () {if (list1.selection != null) move_item (list1.selection, list1, list2); clude.enabled = list1.items.length > 0};
  remove_all.onClick = function () {move_all (list1, list2); clude.enabled = false};
  load_styles.onClick = function () {load_pstyles ()};
  sort_styles.onClick = function () {sort_listbox (list2)};
  list1.onChange = function () {var sel = list1.selection; list2.selection = null; list1.selection = sel};
  list2.onChange = function () {var sel = list2.selection; list1.selection = null; list2.selection = sel};

  function load_pstyles ()
    {
//~ app.documents[0].importStyles(ImportFormat.paragraphStylesFormat, app.documents[1].fullName, GlobalClashResolutionStrategy.doNotLoadTheStyle);
//~ var temp = doc.paragraphStyles.everyItem().name;
//~ temp.shift ();
    list2.removeAll();
    for (var i = 0; i < parstyles.length; i++)
      list2.add ("item", parstyles[i]);
    list2.selection = 0;
    }

  function move_item (to_add, source, target)
    {
    // Record the index of the (first) selected item so that we can replace the cursor
    var sel = source.selection[0].index;
    for (var i = 0; i < to_add.length; i++)
      target.add ("item", to_add[i].text);
    for (var i = 0; i < to_add.length; i++)
      source.remove (to_add[i].text);
    sort_listbox (target);
    add_.enabled = add_all.enabled = list2.items.length;
    remove_.enabled = remove_all.enabled = list1.items.length;
    // Replace the cursor
    if (source.items.length > 0)
      {
      if (sel >= source.items.length)
        sel = source.items.length-1;
      source.selection = sel;
      }
    } // move_item

  function move_all (source, target)
    {
    var to_sort = target.items.length > 0;
    for (var i = 0; i < source.items.length; i++)
      target.add ("item", source.items[i].text);
    source.removeAll ();
    if (to_sort == true)
      sort_listbox (target);
    add_.enabled = add_all.enabled = list2.items.length;
    remove_.enabled = remove_all.enabled = list1.items.length;
    } // move_all

  function sort_listbox (list_box)
    {
    var array = list_to_stringarray (list_box);
    array = array.sort (nocase);
    list_box.removeAll ();
    for (i = 0; i < array.length; i++)
      list_box.add ("item", array[i]);
    }

  function nocase (a, b) {return a.toLowerCase() > b.toLowerCase()}

  function list_to_stringarray (list)
    {
    var array = [];
    for (var i = 0; i < list.items.length; i++)
      array.push (list.items[i].text);
    return array;
    }

  function tsep (s)
    {
    switch (s)
      {
      case "Space": return " ";
      case "En-space": return "\u2002";
      case "Comma+space": return ", ";
      default: return " ";
      }
    }

  function psep (s)
    {
    switch (s)
      {
      case "Hyphen": return "-";
      case "En-dash": return "\u2013";
      default: return "\u2013";
      }
    }

//~ cancel_button.onClick = function () {w.close(); exit ()};

  if (w.show() == 1)
    {
    if (list1.items.length == 0)
      var sel_styles = "";
    else
      var sel_styles = "£$"+list_to_stringarray (list1).join ("£$")+"£$";
    var obj = {selected_styles: sel_styles,
      include: clude.children[0].value ? 0 : 1,
      case_sensitive: csense.value,
      topic_separator: topic_sep.selection.text,
      range_pages: range.value,
      section_markers: section_markers.value,
      mark: mark.value,
      dash: range_dash.selection.text,
      tolerance: tolerance.selection.text,
      incLL: includeLL.value,
      incHL: includeHL.value,
      incLS: includeLS.value,
      incFN: includeFN.value
      }
    write_history (history_file, obj);
    obj.topic_separator = tsep (obj.topic_separator);
    obj.dash = psep (obj.dash);
    w.close();
    return obj;
    }
  else
    {
    w.close();
    exit ();
    }
  } // index_independent

function read_history (f)
  {
  // default values in case there's no history file
  var obj = {selected_styles: [],
    include: 0,
    case_sensitive: true, // key must match what the dialog reads (history.case_sensitive)
    topic_separator: "En-space",
    range_pages: true,
    mark: true,
    dash: "En-dash",
    tolerance: 0,
    section_markers: true,
    incLL: true,
    incHL: false,
    incLS: true,
    incFN: true
    };
  f = File (f);
  if (f.exists)
    {
    f.open ("r");
    obj = eval (f.read ());
    f.close ();
    }
  return obj;
  }

function write_history (f, obj)
  {
  f = File (f);
  f.open ("w");
  f.write (obj.toSource ());
  f.close ();
  }

function script_dir()
  {
  try {return File (app.activeScript).path}
  catch(e) {return File (e.fileName).path}
  }

//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
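The matching rule asked for in the post (list entry Strayhorn, Billy, text "Billy Strayhorn" with possible endings) can be sketched outside InDesign as a regular expression. The following is only an illustration in Python, not a modification of the script above: the names name_pattern and pages_for, the limit of three extra letters per name, and the sample page texts are all assumptions made for this sketch. It does not handle the case where only the first name appears in the book.

```python
import re


def name_pattern(entry, max_ending=3):
    """Build a regex for a 'Surname, First' list entry.

    Matches 'First Surname' in running text and allows up to `max_ending`
    extra word characters after each name (a crude stand-in for case endings).
    """
    surname, _, first = [part.strip() for part in entry.partition(",")]
    ending = r"\w{0,%d}" % max_ending
    # Search order is reversed compared with the list entry: first name, then surname.
    words = ([first] if first else []) + [surname]
    body = r"\s+".join(re.escape(word) + ending for word in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def pages_for(entry, page_texts):
    """Return the sorted page numbers whose text contains the entry."""
    pattern = name_pattern(entry)
    return sorted(num for num, text in page_texts.items() if pattern.search(text))


# Invented sample pages: page number -> text on that page
page_texts = {
    12: "Koncert je vodio Billy Strayhorn.",
    13: "Bez spomena.",
    40: "Aranžman Billyja Strayhorna.",
}

print("Strayhorn, Billy", pages_for("Strayhorn, Billy", page_texts))
```

The same pattern text could in principle be fed to a GREP search, but how well a fixed-length ending covers real Croatian word forms would need checking against the actual book.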

Given a string, the task is to write a Python program that extracts the start and end index of each word from a list, wherever that word occurs in the string.

Input : test_str = "gfg is best for all CS geeks and engineering job seekers", check_list = ["geeks", "engineering", "best", "gfg"]

Output : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

Explanation : "geeks" starts at index 23 and ends at index 27, hence the result.

Input : test_str = "gfg is best for all CS geeks and engineering job seekers", check_list = ["geeks", "gfg"]

Output : {'geeks': [23, 27], 'gfg': [0, 2]}

Explanation : "geeks" starts at index 23 and ends at index 27, hence the result.

Method #1 : Using loop + index() + len()

In this method, a loop takes each element from the list. index() gives the start index of the element's first occurrence in the string, and the end index is computed as start + len(element) - 1.

Python3

test_str = "gfg is best for all CS geeks and engineering job seekers"
print("The original string is : " + str(test_str))
check_list = ["geeks", "engineering", "best", "gfg"]

# map each word that occurs in the string to [start, end] of its first occurrence
res = dict()
for ele in check_list :
    if ele in test_str:
        strt = test_str.index(ele)
        res[ele] = [strt, strt + len(ele) - 1]

print("Required extracted indices  : " + str(res))

Output:

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices  : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

Time Complexity: at most O(k*n*m), where k is the number of words in check_list, n is the length of test_str and m is the length of the longest word
Auxiliary Space: O(k) for the result dictionary

Method #2 : Using dictionary comprehension + len() + index()

This method performs the same task as the one above, but builds the result dictionary in shorthand with a dictionary comprehension.

Python3

test_str = "gfg is best for all CS geeks and engineering job seekers"
print("The original string is : " + str(test_str))
check_list = ["geeks", "engineering", "best", "gfg"]

# same result as the explicit loop, built in one expression
res = {key: [test_str.index(key), test_str.index(key) + len(key) - 1]
       for key in check_list if key in test_str}

print("Required extracted indices  : " + str(res))

Output:

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices  : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

The time and space complexity are the same as for the loop version:

Time Complexity: at most O(k*n*m), for k words, a string of length n and words of length up to m

Auxiliary Space: O(k)

Method #3 : Using loop + find() + len() methods

find() behaves like index(), but returns -1 instead of raising ValueError when the substring is not found.

Python3

test_str = "gfg is best for all CS geeks and engineering job seekers"
print("The original string is : " + str(test_str))
check_list = ["geeks", "engineering", "best", "gfg"]

res = dict()
for ele in check_list :
    if ele in test_str:
        strt = test_str.find(ele)
        res[ele] = [strt, strt + len(ele) - 1]

print("Required extracted indices : " + str(res))

Output

The original string is : gfg is best for all CS geeks and engineering job seekers
Required extracted indices : {'geeks': [23, 27], 'engineering': [33, 43], 'best': [7, 10], 'gfg': [0, 2]}

Time complexity: at most O(k*n*m), where k is the number of words in check_list, n is the length of test_str and m is the length of the longest word

Auxiliary space: O(k)
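All three methods report only the first occurrence of each word. If every occurrence is needed, one possible variation is to use re.finditer(); the function name all_spans below is made up for this sketch, and the matches it collects are non-overlapping.

```python
import re


def all_spans(text, words):
    """Map each word to a list of [start, end] pairs, one per occurrence."""
    result = {}
    for word in words:
        # re.escape() keeps characters such as '.' or '+' in a word literal
        spans = [[m.start(), m.end() - 1] for m in re.finditer(re.escape(word), text)]
        if spans:
            result[word] = spans
    return result


test_str = "gfg is best for all CS geeks and engineering job seekers"
print(all_spans(test_str, ["geeks", "gfg"]))
print(all_spans("ab ab", ["ab", "zz"]))
```

Words that do not occur are left out of the result, which mirrors the `if ele in test_str` check in the methods above.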

I want to republish the academic translation of the Mahabharata in Russian. Among other things, it would be useful to update the indexes for this. What is the current state of the art? We are talking about tens of thousands of "pages" and 10,000 headings arranged on the nested principle. I saw that vbatushev has been involved in this topic for a long time, but I do not know how it ended. The latest thread I found is from 2013. Has nothing new appeared in 7 years? The closest thing I found is from 15 years ago, the thread "Скрипт создания предметного указателя" (Script for creating a subject index).

1) a dictionary of word forms needs to be made for the script

Is that still where things stand?

2) nested (multi-level) indexes will need to be thought about

Is that still where things stand?

A piece copy-pasted after OCR from the printed book:
Арджуна Картавирья 22, 40, 68, 168, 202, 212, 249, 250, 290, 705, 710, 717, 719
Арджуна, Пандава 25, 28, 36, 37, 39, 42. 61, 62, 64, 68, 74, 86, 92-96, 98-100, 101-108, 111-116, 167-169, 202, 203, 210, 211, 251, 256, 287, 290, 297, 317, 322, 324, 330, 333-336, 338, 340, 341, 343-345, 347-349, 352, 353, 355, 356, 361, 368, 465, 467, 468, 476-480, 482, 483, 488, 489, 494, 497, 506, 511, 513-517, 575, 586, 589, 590-592, 597, 613, 614, 616, 623-625, 653-655, 660, 662, 666, 674, 676, 677, 679, 694, 696, 704, 705, 708, 710, 711, 713, 715-720, 722-726, 735, 736, 740
Белоконный 289, 290, 334, 591, 666, 711
Ардра 403
Арка см. Солнце Аруджа 541
Аруна 527, 710

Thank you,
Yours,
M.

  • Question

  • Hi everyone.

    I want to get all the words in a Word document, list them in ascending order, check them, and finally replace each wrong word with the correct one.

    Can anyone help me with solutions or suggestions?

    • Moved by

      Sunday, January 23, 2011 6:21 AM
      Word, not VSTO-specific (From:Visual Studio Tools for Office)

Answers

  • Use:

    Sub WordFrequency()
             Dim SingleWord As String           ‘Raw word pulled from doc
            Const maxwords = 9000              ‘Maximum unique words allowed
            Dim Words(maxwords) As String      ‘Array to hold unique words
            Dim Freq(maxwords) As Integer      ‘Frequency counter for Unique Words
            Dim WordNum As Integer             ‘Number of unique words
            Dim ByFreq As Boolean              ‘Flag for sorting order
            Dim ttlwds As Long                 ‘Total words in the document
            Dim Excludes As String             ‘Words to be excluded
            Dim Found As Boolean               ‘Temporary flag
            Dim j, k, l, Temp As Integer       ‘Temporary variables
            Dim tword As String                ‘
             ‘ Set up excluded words
    ‘         Excludes = «[the][a][of][is][to][for][this][that][by][be][and][are]»
            Excludes = «»
            Excludes = InputBox$(«Enter words that you wish to exclude, surrounding each word with [ ].», «Excluded Words», «»)
    ‘        Excludes = Excludes & InputBox$(«The following words are excluded: » & Excludes & «. Enter words that you wish to exclude, surrounding each word with [ ].», «Excluded Words», «»)
    ‘ Find out how to sort
    ByFreq = True
    Ans = InputBox$(«Sort by WORD or by FREQ?», «Sort order», «FREQ»)
    If Ans = «» Then End
    If UCase(Ans) = «WORD» Then
       ByFreq = False
    End If
    Selection.HomeKey Unit:=wdStory
    System.Cursor = wdCursorWait
    WordNum = 0
    ttlwds = ActiveDocument.Words.Count
    Totalwords = ActiveDocument.BuiltInDocumentProperties(wdPropertyWords)
            ‘ Control the repeat
            For Each aword In ActiveDocument.Words
                SingleWord = Trim(aword)
                If SingleWord < «A» Or SingleWord > «z» Then SingleWord = «» ‘Out of range?
                If InStr(Excludes, «[» & SingleWord & «]») Then SingleWord = «» ‘On exclude list?
                If Len(SingleWord) > 0 Then
                    Found = False
                    For j = 1 To WordNum
                        If Words(j) = SingleWord Then
                            Freq(j) = Freq(j) + 1
                            Found = True
                            Exit For
                        End If
                    Next j
                    If Not Found Then
                        WordNum = WordNum + 1
                        Words(WordNum) = SingleWord
                        Freq(WordNum) = 1
                    End If
                    If WordNum > maxwords - 1 Then
                        j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly)
                        Exit For
                    End If
                End If
                ttlwds = ttlwds - 1
                StatusBar = "Remaining: " & ttlwds & "     Unique: " & WordNum
            Next aword
             ' Now sort it into word order
            For j = 1 To WordNum - 1
                k = j
                For l = j + 1 To WordNum
                    If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
                Next l
                If k <> j Then
                    tword = Words(j)
                    Words(j) = Words(k)
                    Words(k) = tword
                    Temp = Freq(j)
                    Freq(j) = Freq(k)
                    Freq(k) = Temp
                End If
                StatusBar = "Sorting: " & WordNum - j
            Next j
             ' Now write out the results
            tmpName = ActiveDocument.AttachedTemplate.FullName
            Documents.Add Template:=tmpName, NewTemplate:=False
            Selection.ParagraphFormat.TabStops.ClearAll
            With Selection
                For j = 1 To WordNum
                    .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
                Next j
            End With
            ActiveDocument.Range.Select
            Selection.ConvertToTable
            Selection.Collapse wdCollapseStart
            ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
            ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
            ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences"
            ActiveDocument.Tables(1).Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
            ActiveDocument.Tables(1).Rows.Add
            ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Total words in Document"
            ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Totalwords
            ActiveDocument.Tables(1).Rows.Add
            ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document"
            ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
            System.Cursor = wdCursorNormal
            j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished")
        Selection.HomeKey wdStory

    End Sub
     — Hope this helps.

    Doug Robbins — Word MVP,
    dkr[atsymbol]mvps[dot]org
    Posted via the Community Bridge

    "Adnan Ebrahimi" wrote in message news:8ddb02da-d954-40f4-b4a5-403fa8fa5a69@communitybridge.codeplex.com…

    Hi everyone.

    I want to get all the words in a Word document, list them in ascending order, check them, and finally replace each wrong word with the correct one.

    Can anyone help me with solutions or suggestions?



    • Marked as answer by
      Bessie Zhao
      Monday, February 7, 2011 10:02 AM

  • Hi Adnan

    If you’re working with the Word object model from .NET through the COM (OLE) interface, then given the speed of processing in that interface, I don’t think you can get anything faster than what you’ve got. It’s the nature of the thing you’re doing.

    Possibly, if the file is in the Office 2007/2010 Open XML file format (docx, for example) you can work with that file as you would any XML file. But we can’t help you here with that. You’ll find more information on Open XML file format at openXMLDeveloper.org


    Cindy Meister, VSTO/Word MVP

    • Marked as answer by
      Bessie Zhao
      Monday, February 7, 2011 10:02 AM
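
The Open XML route mentioned in the reply above can be sketched outside the Word object model. The following is only a rough illustration in Python, not something posted in the thread: it assumes the file is a .docx package (a ZIP archive whose main text lives in word/document.xml), it reads body text only (headers, footers, and footnotes sit in other parts), and it folds words to lower case, which the macro above does not do. The function name docx_word_counts and its excludes parameter are made up for this example.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by word/document.xml in a .docx package
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_word_counts(docx_file, excludes=()):
    """Count words in the main body of a .docx (path or file-like object).

    Hypothetical helper for illustration: words are lower-cased, and any
    word listed in `excludes` is skipped.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    excluded = {w.lower() for w in excludes}
    counts = Counter()
    # Each w:p element is a paragraph; its w:t descendants hold the text runs.
    # Joining the runs first avoids splitting a word that spans two runs.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text):
            word = word.lower()
            if word not in excluded:
                counts[word] += 1
    return counts
```

With the counts in hand, sorted(counts.items()) gives the alphabetical listing and counts.most_common() the listing by frequency, which mirrors the WORD and FREQ choices in the macro. Writing corrected words back into the document is a separate step that this sketch does not attempt.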
