Security, Scalability and Performance
The world's largest and most secure intelligence organizations have deployed Autonomy's Intellectual Asset Protection System (IAS) to safeguard their most sensitive information assets. Autonomy provides all aspects of security management, including front-end user authentication, back-end entitlement checking and secure encrypted communication between the IDOL Server and its client applications using the 128-bit Block Tiny Encryption Algorithm (BTEA). IDOL's mapped security model is the only empirically proven index security model that scales in the enterprise.
There are three general security models currently available:
1. Unmapped Security
Unmapped security is the traditional method used by source repositories and search engines. For every potential match to a given query, a call is made via the native repository's API (e.g. Documentum) to ascertain the access privileges for that particular document. A single query consequently bombards the native repository with document privilege requests as the retrieval system attempts to assemble a relevant results list from thousands of candidate hits. This method presents significant performance and scalability problems.
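A minimal sketch of the unmapped pattern (the repository object and its can_read call are hypothetical, standing in for a native repository API):

```python
# Illustrative sketch of unmapped security (hypothetical names, not a real
# repository API): every candidate hit triggers its own call to the source
# repository, so one query costs one network round trip per candidate.
def filter_results_unmapped(candidates, user, repository):
    visible = []
    for doc_id in candidates:
        # repository.can_read() stands in for a native API call
        # (e.g. into Documentum) that crosses the network every time
        if repository.can_read(user, doc_id):
            visible.append(doc_id)
    return visible
```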
[Figure: Unmapped Security]
[Figure: Mapped Security]
2. Cached Security
Cached security is the method of choice for legacy systems. It marginally relieves the scalability problem of unmapped security by storing the results of queries it has already seen. Consequently, when a user repeats a query, the result set can be retrieved from the cache rather than triggering a network-mediated request. However, this approach still relies on calling out across the network to the repository for each new query. It also misses potential results, because the cached result sets are not updated as new information arrives.
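A minimal sketch of the cached variant (all names hypothetical): repeat queries are served from memory, but new queries still cross the network and cached entries silently go stale:

```python
# Illustrative sketch of cached security: result sets are cached per
# (user, query), so repeat queries skip the network, but the cache is
# never invalidated here, which is exactly its weakness.
cache = {}

def cached_query(user, query, run_entitled_search):
    key = (user, query)
    if key not in cache:
        # run_entitled_search is a placeholder for the unmapped path:
        # a network search plus per-document permission checks
        cache[key] = run_entitled_search(user, query)
    return cache[key]     # may omit documents added after first caching
```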
3. Autonomy's Unique IAS Mapped Security
Only Autonomy offers mapped security: a highly configurable, secure, accurate and fast method of respecting third-party security entitlements. IDOL maps the underlying security model (ACLs, groups, roles, protective markings and so on) from all of the underlying repositories directly into the kernel of the IDOL engine itself, storing the information in an encrypted field. As a result, IDOL does not need to send any requests across the network to the data stores when building a results list. What the user is allowed to see is assessed "inline" within the IDOL kernel, at speeds that exceed the response times of the native repository. Unlike other techniques, the security model is never out of date: the transactional signaling mechanism within the connector layer informs IDOL in real time of any updates or changes to permissions within the underlying content.
Autonomy recommends mapped security but also offers the choice between mapped, unmapped and a hybrid of both. Autonomy also supplies sample plug-in code, so that customers, OEMs and partners can develop and implement their own security plug-ins.
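A minimal sketch of the mapped idea (illustrative only, not IDOL's internal format): entitlements are stored with the index, so filtering is an in-memory set intersection:

```python
# Illustrative sketch of mapped security: ACLs are mapped into the index
# at indexing time, so the entitlement check needs no network call.
index_acls = {}   # doc_id -> set of groups/roles entitled to read it

def index_document(doc_id, acl_groups):
    index_acls[doc_id] = set(acl_groups)      # stored alongside the index
                                              # (held encrypted in IDOL)

def on_permission_change(doc_id, acl_groups):
    # connector signal: keeps the mapped ACLs current in real time
    index_acls[doc_id] = set(acl_groups)

def filter_results_mapped(candidates, user_groups):
    groups = set(user_groups)
    # entitlement check happens "inline", as a set intersection in memory
    return [d for d in candidates if index_acls.get(d, set()) & groups]
```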
Because IDOL's architecture is modular by design, its subsystems must communicate with each other, often across insecure networks. All communication between these processes can be encrypted with Secure Sockets Layer (SSL), so that an attacker who breaks past a firewall cannot use a packet sniffer to read the traffic between IDOL modules. All of the system's modules can operate in a secure communications mode, providing 128-bit encryption at minimal processing overhead. Additionally, IDOL can leverage SSL for both aggregation and querying of content, including access to SSL-encrypted sites.
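As a generic illustration of encrypting inter-module traffic (using Python's standard ssl module; the host name, port, CA file and query string are placeholders, not IDOL defaults):

```python
import socket
import ssl

# Wrap a plain TCP connection in TLS so traffic between two processes
# is ciphertext on the wire; the certificate authority file verifies
# the identity of the remote module.
context = ssl.create_default_context(cafile="idol-ca.pem")

with socket.create_connection(("idol-server.example.com", 9000)) as raw:
    with context.wrap_socket(raw, server_hostname="idol-server.example.com") as conn:
        conn.sendall(b"action=query&text=example\r\n")
        reply = conn.recv(4096)   # encrypted on the wire, plaintext here
```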
Scalability and Performance
The management of structured and unstructured content requires a platform that can meet the most rigorous performance requirements and be easily resized commensurate to business needs. IDOL scales to support the largest enterprise-wide and portal deployments in the world, with presence in virtually every vertical market. Since IDOL's scalability is based on its modular, distributed architecture, it can handle massive amounts of data on commodity dual-CPU servers. For instance, only a few hundred entry-level enterprise machines are required to support ChoicePoint's 10 billion record footprint. By comparison, a competitor uses 150,000 machines to handle the same amount of data.
This enhanced scalability results in hardware cost savings as well as the ability to address larger volumes of content. Though IDOL scales extremely well on commodity servers, its flexible architecture can take full advantage of massive parallelism, SMP processing, 64-bit environments (such as the Intel Itanium architecture), 64-bit software platforms (such as Solaris 10, 64-bit Linux and Win64), distributed server farms, and all common forms of external disk array (e.g. NAS and SAN) to further improve performance. This flexibility extends to leveraging one or a combination of these environments.
How It Works
Content from various repositories is aggregated by connectors and indexed either into a single IDOL Server or, for dissemination across multiple IDOL Servers, through the Distributed Index Handler (DIH). The DIH efficiently splits and indexes copious quantities of data into multiple IDOL Server instances, optimizing performance by batching data, replicating all index commands and invoking dynamic load distribution. The DIH can also perform data-dependent operations, such as distributing content by date, which allows for more efficient querying.
Performance is augmented by the Distributed Action Handler (DAH), a distribution server that distributes action commands, such as queries, across IDOL Servers. The multiple IDOL Server copies to which the DAH propagates actions also ensure uninterrupted service in the event of server failure. For flexibility, both the DAH and the DIH can be configured to run in mirroring mode (the IDOL Servers are exact copies of each other) or non-mirroring mode (each IDOL Server is configured differently and contains different data). In addition, the Distributed Service Handler (DiSH) component provides effective auditing, monitoring and alerting for all other Autonomy components.
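As a rough illustration of that flow (hypothetical helper names, not the actual DIH/DAH API), the sketch below routes documents to an instance by a data-dependent key and fans a query out to every instance:

```python
# Date-based distribution at index time, fan-out plus merge at query time.
servers = {2004: "idol-a:9000", 2005: "idol-b:9000", 2006: "idol-c:9000"}

def route_for_indexing(doc):
    # data-dependent routing: date-restricted queries touch one server
    return servers[doc["date"].year]

def distributed_query(query, send):
    # send(server, query) is a placeholder for the network call
    hits = []
    for server in servers.values():           # fan out, as the DAH does
        hits.extend(send(server, query))
    return sorted(hits, key=lambda h: h["relevance"], reverse=True)
```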
Linear Scalability
Performance and capacity can be doubled by simply replicating the existing machine. This allows scaling predictions to be made without worrying about bottlenecks.
Load Balancing
Data is automatically replicated across multiple servers and user requests are load-balanced across these replicas, guaranteeing performance, reducing latency and improving the user experience.
Mirroring / Failover
Automatically generated replicas provide a pool of servers: the primary resource is selected automatically, and if it fails the system switches to a secondary so that service continues uninterrupted.
Distribution
For organizations that are geographically distributed, local replicas are automatically created and used where possible. Remote copies are used only when a local system fails, building fault tolerance while preserving the benefits of local performance and reduced resource overhead in a single, seamless service, as the sketch below illustrates.
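A minimal sketch of that selection logic, under assumed names (pick_replica and is_alive are illustrative, not product functions):

```python
# Locality-aware failover: prefer local replicas, and fall back to
# remote copies only when no local one is alive.
def pick_replica(replicas, is_alive):
    # replicas: list of (address, is_local) pairs in priority order;
    # is_alive(address) is a placeholder for a real health check
    local = [addr for addr, is_local in replicas if is_local]
    remote = [addr for addr, is_local in replicas if not is_local]
    for address in local + remote:            # local first, then remote
        if is_alive(address):
            return address
    raise RuntimeError("no replica available")
```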
Adaptive Probabilistic Concept Caching
Frequently used concepts are maintained in memory, so query results are returned as quickly and efficiently as possible.
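As a loose analogy, memoizing the expensive lookup keeps hot concepts in memory; here a plain LRU policy stands in for IDOL's adaptive probabilistic policy, and the function names are illustrative:

```python
from functools import lru_cache

def expensive_concept_lookup(term):
    """Placeholder for the costly index walk that resolves a term's concept."""
    ...

@lru_cache(maxsize=4096)        # frequently used concepts stay in memory
def concept_for(term):
    return expensive_concept_lookup(term)
```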
Multi-dimensional Index & Query Throttling
By using a multi-dimensional index to inform the distribution components, IDOL avoids bottlenecks and unbalanced peak loads during indexing and querying. Autonomy also provides prioritized throttling of indexing and query traffic.
Instruction-Level Parallelism
IDOL programmatically expresses itself as an expanding collection of operations. These operations can be, and are, executed in serial pipeline form, yet the inherent logic of simultaneously processing disparate forms of unstructured, semi-structured and structured data requires a high degree of parallelism. Not only must IDOL ingest multiple streams and types of data, it must also provide a real-time answer or decision against that data as it is indexed, rather than force the user to wait an arbitrary period until serially accessed resources become available.
As a consequence, IDOL has been designed with instruction-level parallelism (ILP) at the core of its process and operation model. ILP is by definition limited by the serial instruction model of scalar processors, and Autonomy has therefore been a deliberate early adopter of all forms of parallel architecture, from multi-CPU and hyper-threading to single-die multi-core processing.
The engine's default process model is multi-threaded, using a configurable number of threads. IDOL operations can either be grouped by class, with indexing and querying performed by separate threads, or, on n-core models, a single operation can be "atomized" into multiple threads. Concurrent querying and indexing is the default, with no requirement whatsoever for "locking" any part of the indexes while querying takes place, as the sketch below illustrates. All major multi-core manufacturers are supported, including Intel, AMD and the latest Niagara offerings from Sun Microsystems.
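One common way to let queries proceed while indexing continues, with no read locks, is to serve reads from an immutable snapshot that the indexing thread replaces atomically. The sketch below illustrates the idea only; it is not IDOL's internal mechanism:

```python
import threading

# Readers query an immutable snapshot while the indexing thread builds
# a new one and swaps it in with a single atomic reference assignment.
class SnapshotIndex:
    def __init__(self):
        self._snapshot = {}                  # term -> tuple of doc ids
        self._write_lock = threading.Lock()  # serializes writers only

    def query(self, term):
        return self._snapshot.get(term, ())  # no lock: snapshot never mutates

    def index(self, doc_id, terms):
        with self._write_lock:               # queries proceed concurrently
            new = dict(self._snapshot)
            for t in terms:
                new[t] = new.get(t, ()) + (doc_id,)
            self._snapshot = new             # atomic reference swap
```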
Classic scalar models that rely on Moore's predicted doubling of transistor density every 18 months have already run up against wire and memory-access latencies, as well as heat ceilings. As a result, hardware manufacturers such as Intel have declared multi-core strategies key to crossing the consumer "teraflop" threshold, and aim to produce n-core, 32-billion-transistor dies within the next 10 years. Autonomy is actively pursuing a Tera computing R&D simulation program in anticipation of this increasing transistor and core density and the declared aims of such manufacturers. Autonomy is currently performing "coalition" simulations of split-thread IDOL operations against n-core "battalion" processor units that blend general-purpose cores with more specialist cores, such as those dedicated to signal processing; these blended core units are predicted to be the first consumer teraflop chips. Autonomy is also developing process thread models that dynamically co-opt different core types to act in "coalition" to perform the simultaneous deconstruction and analysis of unstructured sources, such as video, that combine visual and auditory attributes.