Technology |
Understanding Meaning: An Evolution | | | A Different Approach | | | Innovation |
A Different Approach

More than 90% of all data in an enterprise is unstructured information. This encompasses telephone conversations, voicemails, emails, electronic documents, paper documents, images, web pages, video and hundreds of other formats. Unfortunately, attempts to leverage this immense and strategic resource often fail because many businesses lack the requisite technology to understand and effectively utilize content that resides outside the scope of structured databases.
Unstructured processes are equally unwieldy, yet they comprise the bulk of business operations. Current trends anticipate the rapid proliferation of rich media, widespread adoption of VoIP, growing use of IPTV and increased scrutiny of white-collar crimes. This overwhelming growth demands an automated solution that can effectively manage an unstructured digital morass.
These concerns necessitate an information infrastructure platform that addresses all classes of information in a manner analogous to well-established methods for structured databases. Akin to the Relational Database Management System (RDBMS) that revolutionized the computing industry in the 1980s, this innovative platform enables computers to process not only structured data, but also vast amounts of semi-structured and unstructured information using a global relational index.
Autonomy's ability to process all forms of digital information on a single platform offers a unique solution to a growing number of applications and devices that are increasingly dependent on utilizing unstructured information. Autonomy employs a unique combination of technologies to enable computers to form a contextual understanding of all digital content, as well as understand people's interaction with the data. Autonomy's technology eliminates the traditionally manual and costly operation of processing and analyzing information by performing these functions automatically and in real-time. This represents substantial savings for every type of organization and industry and is driving the accelerated adoption of Autonomy's technology across a diverse range of vertical markets.
Further Reference: Autonomy Technology White Paper
A Unique Combination of Technologies
Open Philosophy
Autonomy maintains an open philosophy with regard to the techniques it uses and is dedicated to selecting methods that optimize its technology, whether they are old or new. Autonomy embraces traditional or legacy methods such as keyword, Boolean, parametric and others. However, Autonomy is best known for its pioneering work in conceptual search based on computational pattern recognition (non-linear adaptive digital signal processing) and contextual linguistic analysis.
Built upon the seminal mathematical works of Thomas Bayes and Claude Shannon, and on a range of innovations that are covered by 170 patents, Autonomy technology identifies the patterns that naturally occur in text, voice and video files based on the usage and frequency of terms that correspond to specific concepts. By studying the preponderance of one pattern over another, Autonomy's technology understands that there is an X% probability that the content in question deals with a specific subject. In this way, Autonomy extracts the content's digital essence, encodes the unique "signature" of the concepts, and enables a host of operations to be automatically performed on emails, phone conversations, video, documents and even people's interests.
Bayesian Inference
Thomas Bayes was an 18th-century English cleric whose work has become a central tenet of modern statistical probability modeling. Bayes' efforts centered on calculating the probabilistic relationships between multiple variables and determining the extent to which these relationships are affected when new information is obtained.
A traditional statistical argument posits that if a coin is tossed 100 times and comes up heads every time, it still has an even chance of coming up tails on the next throw. An alternative, Bayesian approach is to say that 100 consecutive heads are evidence that the coin is biased. What Bayes' theorem clearly demonstrated is that: a) the more information given, the more accurate the view of the world will be, and b) prior experience should inform how new data is interpreted.
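The coin example can be made concrete with a small calculation. The following is a minimal sketch, assuming a uniform Beta(1, 1) prior over the coin's probability of heads; the prior and the function name are illustrative choices, not something taken from Autonomy's technology.

```python
from fractions import Fraction


def posterior_heads_probability(heads, tails, prior_heads=1, prior_tails=1):
    """Probability assigned to heads on the next toss under a Beta prior.

    With a Beta(prior_heads, prior_tails) prior, the observed tosses give a
    Beta(prior_heads + heads, prior_tails + tails) posterior, and its mean is
    the predictive probability of heads. The uniform prior is an assumption.
    """
    a = prior_heads + heads
    b = prior_tails + tails
    return Fraction(a, a + b)


# Before any tosses, the uniform prior gives an even chance.
print(posterior_heads_probability(0, 0))    # 1/2
# After 100 consecutive heads, the belief shifts strongly toward a biased coin.
print(posterior_heads_probability(100, 0))  # 101/102
```

Each additional toss changes the estimate a little less than the one before it, which reflects both points above: more evidence yields a sharper view, and earlier experience is carried forward rather than discarded.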
In a typical problem such as judging the relevance of content to a given query, Bayesian theory dictates that this calculation be related to details that are already known.
A good example of this theory at work is Autonomy's agent profile technology. Users can create agents to automatically track the latest information related to their interests, and IDOL determines the relevance of a document based on the model of the agent.
Adaptive Probabilistic Concept Modeling (APCM) algorithms are also used to analyze, sort and cross-reference unstructured information. In a similar manner, knowledge about the documents deemed relevant by a user to an agent's profile can be used in judging the relevance of future documents.
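The idea of reusing relevance judgements can be illustrated with a simple two-class naive Bayes model. This is only a sketch under stated assumptions: the `AgentProfile` class, its methods and the add-one smoothing are hypothetical and are not IDOL's agent API or the APCM algorithms.

```python
import math
from collections import Counter


class AgentProfile:
    """Hypothetical agent profile: a two-class naive Bayes relevance model."""

    def __init__(self):
        # No training data at the start; counts accumulate from feedback.
        self.term_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def feedback(self, text, relevant):
        """Record that the user judged `text` relevant (True) or not (False)."""
        self.term_counts[relevant].update(text.lower().split())
        self.doc_counts[relevant] += 1

    def relevance(self, text):
        """Return P(relevant | text) using add-one (Laplace) smoothing."""
        terms = text.lower().split()
        vocab = set(self.term_counts[True]) | set(self.term_counts[False])
        vocab.update(terms)
        log_score = {}
        for label in (True, False):
            total_terms = sum(self.term_counts[label].values())
            # Smoothed class prior, so an untrained profile returns 0.5.
            log_p = math.log((self.doc_counts[label] + 1)
                             / (sum(self.doc_counts.values()) + 2))
            for term in terms:
                count = self.term_counts[label][term]
                log_p += math.log((count + 1) / (total_terms + len(vocab)))
            log_score[label] = log_p
        # Normalise the two log scores into a probability.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights[True] / (weights[True] + weights[False])


agent = AgentProfile()
agent.feedback("bayesian inference for document retrieval", relevant=True)
agent.feedback("quarterly football results and scores", relevant=False)
print(agent.relevance("bayesian retrieval") > 0.5)  # True
```

Every call to `feedback` shifts later scores, so documents the user has already judged shape how future documents are ranked.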
While most other models start with a prior knowledge of the state of the system and apply training to it, Autonomy begins with a blank slate and allows incoming data to dictate the model. In true Bayesian fashion, the model mixes new information with a growing body of older content to refine and retrain the engine.
Shannon's Information Theory
Shannon's Information Theory forms the mathematical foundation for all digital communications systems. Claude Shannon stated that information could be treated as a quantifiable value in communications. Natural languages contain a high degree of redundancy or nonessential content. For example, a conversation in a noisy room can be understood even when some of the words cannot be heard, and the essence of a news article can be grasped simply by skimming over the text. Information Theory provides a framework for extracting the concepts from this redundancy.
Autonomy's approach to concept modeling relies on Shannon's theory that the less frequently a unit of communication occurs, the more information it conveys. Ideas that are rarer within the context of a communication therefore tend to be more indicative of its meaning. It is this theory that enables Autonomy's software to determine the most important, or informative, concepts within a document.
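Shannon's measure of the information carried by an event with probability p is its self-information, -log2(p) bits. The sketch below applies that formula to word frequencies; the function name, the whitespace tokenization and the add-one smoothing are assumptions made for illustration, not a description of Autonomy's software.

```python
import math
from collections import Counter


def term_information(document, corpus):
    """Return {term: self-information in bits} for each term in `document`.

    Self-information is -log2(p), where p is the term's smoothed relative
    frequency in `corpus`, so rarer terms receive higher scores.
    """
    corpus_terms = corpus.lower().split()
    counts = Counter(corpus_terms)
    total = len(corpus_terms)
    info = {}
    for term in set(document.lower().split()):
        # Add-one smoothing keeps unseen terms finite (an assumption here).
        p = (counts[term] + 1) / (total + len(counts) + 1)
        info[term] = -math.log2(p)
    return info


corpus = "the cat sat on the mat the dog sat on the log"
info = term_information("the dog", corpus)
print(info["dog"] > info["the"])  # True: the rarer word carries more bits
```

In this toy corpus the common word "the" scores 2 bits while "dog" scores about 3.3 bits, so the rarer term is treated as more indicative of what the text is about.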
Performance of IDOL's Conceptual Retrieval
Built on a unique pattern-matching technology, IDOL's conceptual query mechanism allows a seemingly simple query expression to be evaluated in complex ways. In addition to matching the basic terms within documents using patented weighting algorithms, it can expand those terms to 'read between the lines' and find conceptual matches that legacy search engines would be unable to locate.
However, IDOL is able to perform these evaluations with surprisingly little overhead above the equivalent keyword query. The reasons for this are twofold. Firstly, the majority of the work in the calculation and initialization of the conceptual matching is done at index time rather than query time: the documents are analyzed while the data is being processed to form a statistical 'pool' from which queries can draw key conceptual information, as well as an overlying Bayesian network in which apparently unrelated pieces of information are automatically linked via dynamic probabilities. Secondly, the document-matching algorithm within IDOL uses widespread "short-circuiting" and iterative calculation to ensure that it performs only as much calculation as is required. In essence, the key conceptual information is already available before the query has even started, and once the query begins, it draws directly on the statistical core to load that information. The uniqueness of the query then forces the only truly complex step, a one-off calculation in which combination algorithms arrive at the set of documents most relevant to the query. These can then be returned without the need to loop through every potential match.
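The general principle of moving work from query time to index time can be shown with an ordinary inverted index. This sketch is not IDOL's algorithm: the `TinyIndex` class and its tf-idf weighting are hypothetical stand-ins, and the only short-circuiting it performs is skipping documents that share no term with the query.

```python
import heapq
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Hypothetical inverted index with weights computed at index time."""

    def __init__(self, documents):
        # documents: {doc_id: text}. All statistics are gathered here, once.
        term_freqs = {d: Counter(t.lower().split())
                      for d, t in documents.items()}
        doc_freq = Counter()
        for freqs in term_freqs.values():
            doc_freq.update(freqs.keys())
        n_docs = len(documents)
        # postings[term] -> list of (doc_id, precomputed tf-idf weight).
        self.postings = defaultdict(list)
        for doc_id, freqs in term_freqs.items():
            for term, tf in freqs.items():
                idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
                self.postings[term].append((doc_id, tf * idf))

    def query(self, text, k=3):
        """Return up to k (doc_id, score) pairs, best first."""
        scores = defaultdict(float)
        for term in set(text.lower().split()):
            # Only documents containing a query term are ever visited.
            for doc_id, weight in self.postings.get(term, ()):
                scores[doc_id] += weight
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


index = TinyIndex({
    "a": "bayesian inference for search",
    "b": "keyword search engines",
    "c": "football results",
})
print([doc_id for doc_id, _ in index.query("bayesian search")])  # ['a', 'b']
```

Because the weights are stored in the postings, a query only sums precomputed numbers for the documents that contain its terms; document "c" is never examined.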
Manual or Automatic - It's Not an Either/Or Choice
Avoiding Black Box Solution Pitfalls
Some vendors only offer "black box" solutions, mistakenly believing that their technology can always provide the best answers with no tuning required. However, this idea demonstrates a naïve understanding of enterprise demands, for not even the best of automated systems can anticipate the special needs of each enterprise. These "black boxes" offer only a few, if any, tuning options for relevancy and do not reveal how the results were generated. In stark contrast, Autonomy's technology provides the best of both worlds, automatically retrieving the most accurate results using its conceptual understanding of content and also offering the flexibility to modify the relevancy algorithm if needed. The computational process is fully transparent to the administrator and Autonomy reveals the basis for its determinations through easily understood representations such as dominant terms and idea distances.
Both system administrators and business users are provided with a full workbench to control and tune the relevancy of search results. Some unique advantages offered by Autonomy include:
In addition to providing administrators with comprehensive tools to alter the relevancy modeling, Autonomy is transparent in the methods it uses to arrive at such results. Autonomy uses the full text of the document to determine relevancy, and even with no manual configuration, administrators and users can easily understand how the results were selected. Autonomy justifies relevancy in several ways, which include:
Autonomy enables an entire range of information processing options, both manual and automatic. The system can be configured to support as much or as little manual involvement as necessary, ensuring that Autonomy is not a "black box" where the running of the technology cannot be seen or adapted by administrators.