Technology

A Different Approach

More than 90% of all data in an enterprise is unstructured information. This encompasses telephone conversations, voicemails, emails, electronic documents, paper documents, images, web pages, video and hundreds of other formats. Unfortunately, attempts to leverage this immense and strategic resource often fail because many businesses lack the requisite technology to understand and effectively utilize content that resides outside the scope of structured databases.

Similarly, unstructured processes are equally unwieldy yet comprise the bulk of business operations. Current trends point to the rapid proliferation of rich media, widespread adoption of VoIP, growing use of IPTV and increased scrutiny of white-collar crimes. This overwhelming growth demands an automated solution that can effectively manage an unstructured digital morass.

These concerns necessitate an information infrastructure platform that addresses all classes of information in a manner analogous to well-established methods for structured databases. Akin to the Relational Database Management System (RDBMS) that revolutionized the computing industry in the 1970s, this innovative platform enables computers to process not only structured data, but also vast amounts of semi-structured and unstructured information using a global relational index.

Autonomy's ability to process all forms of digital information on a single platform offers a unique solution to a growing number of applications and devices that are increasingly dependent on utilizing unstructured information. Autonomy employs a unique combination of technologies to enable computers to form a contextual understanding of all digital content, as well as understand people's interaction with the data. Autonomy's technology eliminates the traditionally manual and costly operation of processing and analyzing information by performing these functions automatically and in real-time. This represents substantial savings for every type of organization and industry and is driving the accelerated adoption of Autonomy's technology across a diverse range of vertical markets.

"Autonomy does an exceptional job at analyzing unstructured data"

Technology Review, 2011

SEARCH : CLUSTER : CATEGORIZE : LINK : ALERT : PROCESS

Further Reference:

Autonomy Technology White Paper

A Unique Combination of Technologies

Open Philosophy

Autonomy maintains an open philosophy with regard to the techniques it uses and is dedicated to selecting methods that optimize its technology, whether they are old or new. Autonomy embraces traditional or legacy methods such as keyword, Boolean, parametric and others. However, Autonomy is best known for its pioneering work in conceptual search based on computational pattern recognition (non-linear adaptive digital signal processing) and contextual linguistic analysis.

Built upon the seminal mathematical works of Thomas Bayes and Claude Shannon, and on a range of innovations that are covered by 170 patents, Autonomy technology identifies the patterns that naturally occur in text, voice and video files based on the usage and frequency of terms that correspond to specific concepts. By studying the preponderance of one pattern over another, Autonomy's technology understands that there is X% probability that the content in question deals with a specific subject. In this way, Autonomy extracts the content's digital essence, encodes the unique "signature" of the concepts, and enables a host of operations to be automatically performed on emails, phone conversations, video, documents and even people's interests.
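The idea of an "X% probability" that content deals with a subject can be illustrated with a small naive Bayes classifier. This is only a sketch under stated assumptions: the training texts, the subject labels and the function names are invented for illustration, and naive Bayes with add-one smoothing stands in for Autonomy's patented algorithms, which are not described here.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for illustration only.
TRAINING = {
    "finance": ["bank interest rate loan", "stock market bank shares"],
    "sports": ["football match goal score", "tennis match score win"],
}

def train(training):
    """Count term frequencies per subject and collect the shared vocabulary."""
    counts = {label: Counter(" ".join(docs).split()) for label, docs in training.items()}
    vocab = set().union(*(c.keys() for c in counts.values()))
    return counts, vocab

def subject_probabilities(text, counts, vocab):
    """Return P(subject | text) under naive Bayes with uniform subject priors."""
    log_scores = {}
    for label, term_counts in counts.items():
        total = sum(term_counts.values())
        # Add-one (Laplace) smoothing keeps unseen terms from zeroing the score.
        log_scores[label] = sum(
            math.log((term_counts[word] + 1) / (total + len(vocab)))
            for word in text.split()
        )
    # Normalise in log space for numerical stability.
    peak = max(log_scores.values())
    weights = {label: math.exp(score - peak) for label, score in log_scores.items()}
    norm = sum(weights.values())
    return {label: weight / norm for label, weight in weights.items()}

counts, vocab = train(TRAINING)
probs = subject_probabilities("bank loan rate", counts, vocab)
for subject, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{subject}: {p:.1%}")
```

The preponderance of finance-related terms in the query pushes most of the probability mass toward the "finance" subject, which is the kind of judgment the paragraph above describes.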

Bayesian Inference

Thomas Bayes was an 18th century English cleric whose work has become a central tenet of modern statistical probability modeling. Bayes' efforts centered on calculating the probabilistic relationships between multiple variables and determining the extent to which these relationships are affected when new information is obtained.

A traditional statistical argument posits that if a fair coin is tossed 100 times and comes up heads every time, it still has an even chance of coming up tails on the next throw. An alternative, Bayesian approach is to say that 100 consecutive heads are evidence that the coin is biased. Bayes' theorem formalizes two ideas: a) the more information is given, the more accurate the view of the world will be, and b) prior experience should be used to interpret new data.
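The coin example can be worked through numerically. The sketch below assumes a Beta prior over the coin's probability of heads, which is a standard textbook choice rather than anything stated by Autonomy; the function name and parameters are hypothetical.

```python
def posterior_heads(prior_heads, prior_tails, heads, tails):
    """Posterior mean of P(heads) under a Beta(prior_heads, prior_tails) prior."""
    alpha = prior_heads + heads
    beta = prior_tails + tails
    return alpha / (alpha + beta)

# A uniform Beta(1, 1) prior expresses no initial opinion about the coin.
before = posterior_heads(1, 1, heads=0, tails=0)
after = posterior_heads(1, 1, heads=100, tails=0)

print(f"estimate before any tosses: {before:.3f}")
print(f"estimate after 100 heads:   {after:.3f}")
```

Before any tosses the estimate is 0.5. After 100 consecutive heads it rises to 101/102, roughly 0.990, so the observed evidence now dominates the initial assumption.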

In a typical problem such as judging the relevance of content to a given query, Bayesian theory dictates that this calculation be related to details that are already known.

A good example of this theory at work is Autonomy's agent profile technology. Users can create agents to automatically track the latest information related to their interests, and IDOL determines the relevance of a document based on the model of the agent.

Adaptive Probabilistic Concept Modeling (APCM) algorithms are also used to analyze, sort and cross-reference unstructured information. In a similar manner, knowledge about the documents deemed relevant by a user to an agent's profile can be used in judging the relevance of future documents.

While most other models start with a prior knowledge of the state of the system and apply training to it, Autonomy begins with a blank slate and allows incoming data to dictate the model. In true Bayesian fashion, the model mixes new information with a growing body of older content to refine and retrain the engine.

Shannon's Information Theory

Shannon's Information Theory forms the mathematical foundation for all digital communications systems. Claude Shannon stated that information could be treated as a quantifiable value in communications. Natural languages contain a high degree of redundancy or nonessential content. For example, a conversation in a noisy room can be understood even when some of the words cannot be heard, and the essence of a news article can be grasped simply by skimming over the text. Information Theory provides a framework for extracting the concepts from this redundancy.

Autonomy's approach to concept modeling relies on Shannon's theory that the less frequently a unit of communication occurs, the more information it conveys. Therefore, ideas that are rarer within the context of a communication tend to be more indicative of its meaning. It is this theory that enables Autonomy's software to determine the most important, or informative, concepts within a document.
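Shannon's measure of the information carried by an event with probability p is -log2(p) bits, so rarer terms score higher. The following sketch applies that formula to term frequencies in a single sentence. It is an illustration only: the sentence and function name are invented, and a real system would presumably estimate probabilities from a large corpus rather than from one document.

```python
import math
from collections import Counter

def term_information(tokens):
    """Map each term to its self-information, -log2(p), measured in bits."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: -math.log2(n / total) for term, n in counts.items()}

# Hypothetical example text, invented for illustration.
text = "the merger of the bank and the insurer surprised the market"
info = term_information(text.split())

# List terms from most to least informative.
for term, bits in sorted(info.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {bits:.2f} bits")
```

The frequent word "the" carries fewer bits than words that occur once, such as "merger" or "insurer", which matches the intuition that rarer terms are more indicative of meaning.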

Performance of IDOL's Conceptual Retrieval

Built on a unique pattern-matching technology, IDOL's conceptual query mechanism allows a seemingly simple query expression to be evaluated in complex ways. In addition to matching the basic terms within documents using patented weighting algorithms, it is able to expand on the terms to 'read between the lines' and determine conceptual matches that legacy search engines would be unable to locate.

However, IDOL is able to perform these evaluations with surprisingly little overhead above the equivalent keyword query. The reasons for this are twofold. First, the majority of the work in calculating and initializing the conceptual matching is done at index time rather than query time: documents are analyzed while the data is being processed to form a statistical 'pool' from which queries can draw key conceptual information, as well as an overlying Bayesian network in which apparently unrelated pieces of information are automatically linked via dynamic probabilities. Second, the document-matching algorithm within IDOL makes widespread use of "short-circuiting" and iterative calculation to ensure that it performs only as much calculation as is required. In essence, the key conceptual information is already available before the query starts, and once the query begins, it draws directly on the statistical core to load that information. The only truly complex step is then specific to the query: a one-off calculation in which combination algorithms arrive at the set of documents most relevant to the query. These can then be returned without looping through every potential match.
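The split between index-time and query-time work can be sketched with a small inverted index. This is not IDOL's algorithm: it assumes a generic TF-IDF style weight, and the documents, function names and the parameter k are hypothetical. It shows only the general principle that weights are computed once while indexing, so that a query touches just the postings for its own terms instead of every document.

```python
import heapq
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Index time: precompute a weight for every (term, document) pair."""
    doc_freq = Counter(term for text in docs.values() for term in set(text.split()))
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            # Assumed IDF-style weighting: rarer terms receive larger weights.
            index[term][doc_id] = tf * math.log(1 + len(docs) / doc_freq[term])
    return index

def query(index, text, k=2):
    """Query time: sum precomputed weights, visiting only matching postings."""
    scores = defaultdict(float)
    for term in set(text.split()):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Select the top k without fully sorting every scored document.
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

# Hypothetical documents, invented for illustration.
DOCS = {
    "d1": "hybrid car battery range",
    "d2": "bank interest rate",
    "d3": "car insurance rate",
}
index = build_index(DOCS)
results = query(index, "hybrid car")
print(results)
```

Document "d2" shares no terms with the query, so it is never visited at query time; only the postings for "hybrid" and "car" are read.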

Manual or Automatic - It's Not an Either/Or Choice

Avoiding Black Box Solution Pitfalls

Some vendors only offer "black box" solutions, mistakenly believing that their technology can always provide the best answers with no tuning required. However, this idea demonstrates a naïve understanding of enterprise demands, for not even the best of automated systems can anticipate the special needs of each enterprise. These "black boxes" offer only a few, if any, tuning options for relevancy and do not reveal how the results were generated. In stark contrast, Autonomy's technology provides the best of both worlds, automatically retrieving the most accurate results using its conceptual understanding of content and also offering the flexibility to modify the relevancy algorithm if needed. The computational process is fully transparent to the administrator and Autonomy reveals the basis for its determinations through easily understood representations such as dominant terms and idea distances.

Both system administrators and business users are provided with a full workbench to control and tune the relevancy of search results. Some unique advantages offered by Autonomy include:

WYSIWYG (What You See Is What You Get) user interface

The weight of virtually every field (e.g. title, author) can be manipulated and many operators are available to alter relevancy

Full support for voting and document rating. End users can rate the usefulness of specific documents, and this information is subsequently used to calculate relevancy or primacy. The voting can be limited to those people whose profiles show a strong match to the subject of the document

Access to an extensive range of pertinent information, including common queries, misspellings, query types – all presented in friendly visuals

Full support for business-modulated result "sponsoring" or "placement." For example, a business user can elect to promote a certain result, set of results or object (such as an advertisement) to a defined position within a result set in response to a given query or input. If a user queries for "yellow Prius," the administrator can define a rule to return the same set of results as a query for "gold Prius," with a link to the advantages of hybrid cars being the first on the returned list

Autonomy Collaborative Classifier module, which creates a workflow in which the subject matter experts and knowledge engineers, as identified by the organization, collaborate in real-time to create, modify, distribute and manage taxonomies. As the classifications are created and managed by the people who actually use them, information is organized in ways that are specific and germane to the organization

Protects user privacy by respecting entitlement rights and separating the administrator role from the "super-user", thereby ensuring the administrator will be restricted from information they are not privileged to view
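Two of the tuning options listed above, per-field weights and result placement, can be sketched as follows. Everything here is hypothetical: the field weights, the promotion rule, the document ids and the function names are invented for illustration and do not represent Autonomy's configuration syntax or scoring method.

```python
# Hypothetical tuning settings an administrator might adjust.
FIELD_WEIGHTS = {"title": 3.0, "author": 1.0, "body": 1.0}
PROMOTIONS = {"yellow prius": "hybrid-advantages"}  # query -> pinned document id

def field_score(query, doc):
    """Sum query-term occurrences in each field, scaled by that field's weight."""
    terms = query.lower().split()
    return sum(
        weight * sum(doc.get(field, "").lower().split().count(term) for term in terms)
        for field, weight in FIELD_WEIGHTS.items()
    )

def search(query, docs):
    """Rank documents by weighted score, then apply any placement rule."""
    ranked = sorted(docs, key=lambda doc_id: field_score(query, docs[doc_id]), reverse=True)
    pinned = PROMOTIONS.get(query.lower())
    if pinned in docs:
        # Move the promoted document to the first position.
        ranked.remove(pinned)
        ranked.insert(0, pinned)
    return ranked

# Hypothetical documents, invented for illustration.
DOCS = {
    "hybrid-advantages": {"title": "Advantages of hybrid cars", "body": "lower fuel use"},
    "prius-review": {"title": "Yellow Prius review", "body": "the prius in yellow"},
    "truck-guide": {"title": "Truck guide", "body": "diesel trucks"},
}
print(search("yellow Prius", DOCS))
print(search("prius", DOCS))
```

For the query "yellow Prius" the promoted link appears first even though it contains neither query term, while the unpromoted query "prius" is ranked purely by weighted score.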

In addition to providing administrators with comprehensive tools to alter the relevancy modeling, Autonomy is transparent in the methods it uses to arrive at such results. Autonomy uses the full text of the document to determine relevancy, and even with no manual configuration, administrators and users can easily understand how the results were selected. Autonomy justifies relevancy in many ways, including:

% relevancy: Provides percentage similarity of the document to the query

Automatic highlighting: Highlights key terms/concepts within the document

Ideas cloud: Presents list of concepts present in the results list, with variable font size and boldness to represent the number of results that include that concept

Cluster tree: Displays hierarchy of relevant entities (concepts or metadata) extracted from the results list; it displays the count of documents that contain that entity

Automatic summarization of content: Demonstrates the key concepts extracted, which may come from different parts of the document

Query journey: Delineates the logical path that IDOL took to arrive at a given set of results by revealing the key concept terms that were found, along with pertinent metadata, repositories searched and the relevancy threshold reached

Query bread-crumbing: Traces all aspects of a user’s query interaction
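The ideas cloud described in the list above scales each concept's font size by the number of results that contain it. A minimal sketch of that mapping follows, assuming a linear scale between a minimum and maximum point size. The concept lists, point sizes and function name are hypothetical, and the function expects a non-empty result list.

```python
from collections import Counter

def ideas_cloud(results, min_pt=10, max_pt=24):
    """Map each concept to a font size scaled by how many results contain it."""
    # Count each concept once per result, regardless of repeats within a result.
    counts = Counter(concept for concepts in results for concept in set(concepts))
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {
        concept: round(min_pt + (n - low) * (max_pt - min_pt) / span)
        for concept, n in counts.items()
    }

# Hypothetical concepts extracted from three results.
RESULTS = [
    ["hybrid", "battery"],
    ["hybrid", "emissions"],
    ["hybrid", "battery"],
]
cloud = ideas_cloud(RESULTS)
for concept, size in sorted(cloud.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {size}pt")
```

The concept present in every result receives the largest size, and the concept present in only one result receives the smallest.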

Autonomy enables an entire range of information processing options, both manual and automatic. The system can be configured to support as much or as little manual involvement as necessary, ensuring that Autonomy is not a "black box" where the running of the technology cannot be seen or adapted by administrators.
