
Content and Automated Analysis

It's time to wrap up this series about AIIM, the content and document management show that I attended last week. In my last entry I talked about XML and its value in describing, or metatagging, content. Having your documents converted to XML, or having a parallel XML summary of each document, would be wonderful. But if you have lots of documents, or complex documents, converting them is a significant investment.

Can the available technology help us avoid this conversion investment?

At the show I saw a class of software called semantic analysis tools. These are sophisticated applications that analyze the content of your document set, gather the shared concepts, nouns, and phrases, and automatically create lists of concepts that a user can use to sort or choose documents from the set. An example is a search for “saucer” on a home furnishings site that brings back 31 hits, but also shows categories such as price range, made of, theme, and occasion.
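To make the saucer example concrete, here is a minimal Python sketch of how category lists might be attached to a result set. It is not how Convera, Inxight, or any other vendor does it: the hits, the field names (`made_of`, `occasion`), and the `facet_counts` helper are all made up for illustration, and a real tool would derive the categories from the document text rather than from fields that are already filled in.

```python
from collections import Counter

# Hypothetical search hits for "saucer"; the fields are invented for this sketch.
hits = [
    {"title": "Blue china saucer", "made_of": "china", "occasion": "everyday"},
    {"title": "Porcelain tea saucer", "made_of": "porcelain", "occasion": "formal"},
    {"title": "White china saucer", "made_of": "china", "occasion": "formal"},
]

def facet_counts(hits, fields):
    """Count how many hits fall under each value of each category field."""
    return {f: Counter(h[f] for h in hits if f in h) for f in fields}

facets = facet_counts(hits, ["made_of", "occasion"])

# Show each category with its values and the number of matching hits.
for field, counts in facets.items():
    print(field, dict(counts))
```

The user sees the hits plus a short list of categories with counts, and can narrow the results by picking one.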

Convera and Inxight, to name two, offer tools of this kind. The demos I saw were very impressive. Lists of concepts (phrases) and objects (nouns) appear as if by magic from a set of documents, without human intervention. This is important for two reasons. First, the tedious, expensive, and error-prone process of creating metatags or summaries for each document is reduced or eliminated. Second, the process can be dynamic: it can run at query time, after the document set has been selected or filtered. For instance, by knowing something about the user or his task, you can pre-filter the document set, then run the semantic analysis on it, creating (it is hoped) a more targeted set of categories, concepts, and summaries.
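The pre-filter-then-analyze idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of any vendor's product: plain document-frequency counting stands in for real semantic analysis, and the names `STOPWORDS`, `extract_concepts`, and `concepts_for_user` are hypothetical.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; a real tool would use far richer linguistics.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "to", "in", "on", "is", "with"}

def extract_concepts(docs, top_n=3):
    """Return the terms that appear in the most documents (a crude stand-in for concept extraction)."""
    doc_freq = Counter()
    for text in docs:
        # Count each term once per document, ignoring stopwords.
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(words)
    return [term for term, _ in doc_freq.most_common(top_n)]

def concepts_for_user(docs, user_topic, top_n=3):
    """Pre-filter to documents mentioning the user's topic, then analyze only those."""
    selected = [d for d in docs if user_topic.lower() in d.lower()]
    return extract_concepts(selected, top_n)
```

Because the analysis runs after the filter, the concept list reflects only the documents relevant to this user, which is what makes the resulting categories more targeted.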

This is heady stuff: code that helps the user by extracting concepts from the document set and using the concept lists to organize how the documents are presented.

Great, if it works on our content set, and we can afford it – stay tuned!

Posted on Thursday, December 29, 2005 at 03:17PM by Larry Cone
