Content and Automated Analysis
It's time to wrap up this series about AIIM, the Content and Document Management show that I attended last week. In my last entry I talked about XML and its value in describing, or metatagging, content. Having your documents converted to XML, or having a parallel XML summary of each document, would be wonderful. But if you have lots of documents, or complex documents, converting them is a significant investment.
Can the available technology help us avoid this conversion investment?
I saw a class of software tools at the show called Semantic Analysis tools. These are sophisticated applications that analyze the content of your document set, gather the shared concepts, nouns and phrases, and automatically create lists of concepts that a user can use to sort or choose documents from the set. An example is a search for “saucer” on a home furnishings site that brings back 31 hits, but also shows categories such as price range, made of, theme, and occasion.
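To make the idea concrete, here is a toy sketch in Python of gathering the terms a document set shares and using them to group documents. It is only an illustration under my own assumptions: the function names, the stopword list, and the sample documents are all made up, and the commercial tools certainly do far more than count words.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "with", "to", "is"}

def terms(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def shared_concepts(docs, min_docs=2):
    """Map each term found in at least min_docs documents to those document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(terms(text)):
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items() if len(ids) >= min_docs}

# Made-up sample documents.
docs = {
    "d1": "Porcelain saucer with a floral theme",
    "d2": "Stoneware saucer for a holiday table",
    "d3": "Porcelain teacup with a floral theme",
}

concepts = shared_concepts(docs)
for term, doc_ids in sorted(concepts.items()):
    print(term, doc_ids)
```

Each shared term becomes a candidate category that a user could click to narrow the set, which is roughly the role the concept lists played in the demos.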
Examples of these tools are offered by Convera and Inxight, to name two. The demos I saw were very impressive. Lists of concepts (phrases) and objects (nouns) appear as if by magic from a set of documents, with no human intervention. This is important for two reasons. One, the tedious, expensive and error-prone process of creating metatags or summaries of each document is reduced or eliminated. Two, the process can be dynamic: it can run at query time, after the document set has been selected or filtered. For instance, by knowing something about the user or their task, you can pre-filter the document set, then run the semantic analysis on it, creating (it is hoped) a more targeted set of categories, concepts, and summaries.
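The pre-filter-then-analyze flow might look something like the following Python sketch. Everything here is an assumption for illustration: the `audience` field, the role names, the sample documents, and the word counting that stands in for real semantic analysis are mine, not anything the vendors showed.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "with", "to", "is"}

def top_concepts(texts, n=3):
    """Count how many texts each non-stopword term appears in; return the top n."""
    counts = Counter()
    for text in texts:
        counts.update(set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS)
    return counts.most_common(n)

def concepts_for_user(docs, user_role, n=3):
    """Pre-filter the documents by the user's role, then analyze only that subset."""
    subset = [d["text"] for d in docs if user_role in d["audience"]]
    return top_concepts(subset, n)

# Made-up documents, each tagged with a hypothetical audience.
DOCS = [
    {"audience": {"engineer"}, "text": "Pump maintenance schedule"},
    {"audience": {"engineer"}, "text": "Pump installation guide"},
    {"audience": {"sales"}, "text": "Pricing guide for the sales team"},
    {"audience": {"sales", "engineer"}, "text": "Pump pricing overview"},
]

print(concepts_for_user(DOCS, "engineer", n=1))
print(concepts_for_user(DOCS, "sales", n=1))
```

Because the analysis runs on the filtered subset at query time, the same collection yields different leading concepts for different users.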
This is heady stuff: code that helps the user by extracting concepts from the document set and then using those concept lists to organize how the documents are presented.
Great, if it works on our content set, and we can afford it – stay tuned!