Search indexing of documents

Documents can be included in the search index so that content in external files can be found by the search. Indexing covers both the data and the metadata of the documents.

The following document types are supported.

| Type | XML-based | Supported | Not supported | File extension |
|---|---|---|---|---|
| Excel | Yes | from Excel '97-2003 file format, Excel 2007+ .xlsx OOXML | No limitation known | xls, xla, xlw, xlt |
| Word | Yes | from Word '97(-2007) file format, Word 2007+ .docx OOXML | No limitation known | doc, dot |
| PowerPoint | Yes | from PowerPoint 2007+ .pptx OOXML | No limitation known | ppt, pps, ppa, pot |
| Adobe PDF | No | No limitation known | No limitation known | pdf |
| Open Document Format or Open Office 2.0 | Yes | No limitation known (but possibly amplified 'noise') | No limitation known (but possibly amplified 'noise') | odg, dtg, odp, otp, odt, ott, odf, ods, ots |
| Open Office 1.0 | Yes | No limitation known (but possibly amplified 'noise') | No limitation known (but possibly amplified 'noise') | sxd, std, sxi, sti, sxw, stw, sxc, stc, sxm |
| Star Office (StarDraw 3.0, StarImpress 5.0, 4.0; StarDraw 5.0; StarWriter 5.0 / 4.0 / 3.0; StarMath 5.0; StarCalc 5.0 / 4.0 / 3.0) | Yes | XML-based (very likely amplified 'noise') | No limitation known | vor, sdd, sda, sdw, smf, sdc |
| XML | Yes | Amplified noise | No limitation known | xml (consider configuration settings, so that also mindmaps etc. are searchable) |
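To check quickly whether a file falls under one of the types listed above, the file extensions can be put into a lookup table. The following Python sketch is purely illustrative and not part of Aeneis: the names `EXTENSIONS_BY_TYPE`, `TYPE_BY_EXTENSION`, and `document_type` are hypothetical, and the mapping contains only the extensions listed above.

```python
from pathlib import Path

# File extensions per document type, copied from the list above.
EXTENSIONS_BY_TYPE = {
    "Excel": ["xls", "xla", "xlw", "xlt"],
    "Word": ["doc", "dot"],
    "PowerPoint": ["ppt", "pps", "ppa", "pot"],
    "Adobe PDF": ["pdf"],
    "Open Document Format or Open Office 2.0": [
        "odg", "dtg", "odp", "otp", "odt", "ott", "odf", "ods", "ots",
    ],
    "Open Office 1.0": [
        "sxd", "std", "sxi", "sti", "sxw", "stw", "sxc", "stc", "sxm",
    ],
    "Star Office": ["vor", "sdd", "sda", "sdw", "smf", "sdc"],
    "XML": ["xml"],
}

# Reverse lookup: extension -> document type.
TYPE_BY_EXTENSION = {
    ext: doc_type
    for doc_type, extensions in EXTENSIONS_BY_TYPE.items()
    for ext in extensions
}


def document_type(filename):
    """Return the listed document type for a file name, or None if unlisted."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return TYPE_BY_EXTENSION.get(ext)
```

A call such as `document_type("report.XLS")` returns `"Excel"`, while a file with an unlisted extension returns `None`. A match only means the extension appears in the list; it says nothing about how completely the file contents are indexed.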

Requirements:

  • In the schema, the Include objects in index option is enabled for the Document category.

  • The old index (folder with the same name in the database directory) must be deleted before Aeneis is started.

    See also: Delete index

  • In the Portal report, the Document category must be referenced in the Searched Categories entry.
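Deleting the old index can also be scripted. The following is a minimal sketch, assuming that the index is an ordinary folder inside the database directory and that Aeneis is not running while the script is executed. The function name `delete_old_index` and both arguments are hypothetical; the actual folder names depend on the installation.

```python
import shutil
from pathlib import Path


def delete_old_index(database_dir, index_name):
    """Delete the index folder `index_name` inside `database_dir`.

    Returns True if a folder was removed and False if none existed.
    Both arguments are assumptions for this sketch; use the paths of
    the actual installation.
    """
    index_dir = Path(database_dir) / index_name
    if not index_dir.is_dir():
        return False
    # Removes the folder and everything in it; this cannot be undone.
    shutil.rmtree(index_dir)
    return True
```

The return value makes it easy to log whether an old index was actually present before the restart.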

Limitations:

  1. Complete indexing of document contents is not guaranteed; this also depends on the functionality of the underlying libraries. This applies especially to unsupported document types, but also to supported ones.

  2. The memory requirements for the index can increase drastically.

  3. Queries can become considerably slower because of the indexed document contents. This depends on the performance of the Lucene search engine.

  4. Especially with files in XML-based formats, file format information that is not useful in the search (values such as 'true', 'false', coordinates, etc.) may be indexed unintentionally. This is referred to as 'noise' above.

  5. Issues arising in these contexts that do not result in exceptions are generally handled in support cases as enhancements, not as bugs.
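To illustrate the 'noise' described in item 4, the following sketch collects the text and attribute values from a small XML document and separates searchable content from format information. It does not show how Aeneis or Lucene processes documents: the sample XML, the function names, and the rule for what counts as noise (booleans and plain numbers) are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Plain integers or decimals, e.g. coordinates such as "120.5" or "48".
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def xml_values(xml_text):
    """Return all attribute values and non-empty element texts.

    Text that follows a closing tag (tail text) is ignored for brevity.
    """
    values = []
    for element in ET.fromstring(xml_text).iter():
        values.extend(element.attrib.values())
        if element.text and element.text.strip():
            values.append(element.text.strip())
    return values


def is_noise(value):
    """Assumed rule: booleans and plain numbers are format information."""
    return value.lower() in {"true", "false"} or bool(NUMBER.match(value))


# Hypothetical fragment of an XML-based drawing format.
SAMPLE = '<shape visible="true" x="120.5" y="48"><label>Approve invoice</label></shape>'

values = xml_values(SAMPLE)
content = [v for v in values if not is_noise(v)]
noise = [v for v in values if is_noise(v)]
```

For the sample, `content` holds only `"Approve invoice"`, while `noise` holds `"true"`, `"120.5"`, and `"48"`. If all four values were indexed, three of the four index entries for this fragment would be of no use in a search.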