| FRAMINGHAM (09/06/2007) - As a researcher at the University of California, Berkeley, in the early 1970s, Michael Stonebraker co-created the Ingres and Postgres technology that underlies many leading relational databases today: Microsoft Corp.'s SQL Server, Sybase Inc.'s Adaptive Server Enterprise, Ingres Corp.'s eponymous product, IBM's Informix, and others. But Stonebraker now argues that relational database management systems (RDBMSes) are "long in the tooth" and "should be considered legacy technology." In an entry Tuesday on a new blog, The Database Column, Stonebraker also argued that today's relational databases lag badly in performance behind a new wave of databases that flip database tables 90 degrees. Column-oriented databases, such as the one built by Stonebraker's latest start-up, Andover, Mass.-based Vertica Systems Inc., store a table column by column rather than row by row. Because each column's values are stored together, a query reads only the columns it needs, which cuts the time spent reading the disk; the savings add up in large-scale calculations such as those typically done in a data warehouse. Column databases "will take over the warehouse market over time, completely displacing row stores," Stonebraker wrote. "Since many warehouse users are in considerable pain (can't load in the available load window, can't support ad-hoc queries, can't get better performance without a 'fork-lift' upgrade), I expect this transition to column stores will occur fairly quickly." Column-oriented database systems are not new. Sybase has successfully sold its column-based IQ database for years as a high-performance business intelligence solution. More recently, BigTable, the database that Google Inc. built to handle a number of its applications, also stores data in columns. But column stores remain a niche offering. In comparison, the leading players in the mainstream database market, estimated at US$15 billion annually worldwide, all rely on row-based tables.
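To make the row-versus-column distinction concrete, here is a minimal in-memory sketch in Java. It is only an analogy, not how Vertica or any real engine lays data out on disk; the `SaleRow` record, its fields, and the sample values are hypothetical. The row layout keeps every field of a record together, so summing one attribute visits every whole record; the column layout keeps each attribute in its own contiguous array, so the same aggregate scans only the one array it needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sales data stored two ways, for illustration only. */
class RowVsColumn {

    /** Row layout: each record keeps all of its fields together. */
    static final class SaleRow {
        final int id;
        final String region;
        final long amountCents;

        SaleRow(int id, String region, long amountCents) {
            this.id = id;
            this.region = region;
            this.amountCents = amountCents;
        }
    }

    /** Column layout: one array per attribute, aligned by position. */
    static final class SaleColumns {
        final int[] ids;
        final String[] regions;
        final long[] amountCents;

        SaleColumns(List<SaleRow> rows) {
            int n = rows.size();
            ids = new int[n];
            regions = new String[n];
            amountCents = new long[n];
            for (int i = 0; i < n; i++) {
                SaleRow r = rows.get(i);
                ids[i] = r.id;
                regions[i] = r.region;
                amountCents[i] = r.amountCents;
            }
        }
    }

    /** Aggregating in the row layout touches every whole record. */
    static long totalFromRows(List<SaleRow> rows) {
        long total = 0;
        for (SaleRow r : rows) {
            total += r.amountCents;
        }
        return total;
    }

    /** Aggregating in the column layout scans a single contiguous array. */
    static long totalFromColumns(SaleColumns cols) {
        long total = 0;
        for (long a : cols.amountCents) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        List<SaleRow> rows = new ArrayList<>();
        rows.add(new SaleRow(1, "east", 1_250));
        rows.add(new SaleRow(2, "west", 990));
        rows.add(new SaleRow(3, "east", 4_000));
        SaleColumns cols = new SaleColumns(rows);
        System.out.println(totalFromRows(rows));    // 6240
        System.out.println(totalFromColumns(cols)); // 6240
    }
}
```

The sketch also hints at the trade-offs in the surrounding text: a single column holds values of one type, often with many repeats (like `regions` here), which is what makes aggressive compression practical, while appending one new record means writing into every column array.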
Organizing data by rows does have its advantages. Writing a record to disk is faster in row format, because the whole record goes to one place instead of being split across every column. That is key for transaction-heavy applications, where data is constantly being read from and written to the database, but much less important for data warehouses, where data is typically written just once and read many times after that. Stonebraker, who is a co-founder and chief technology officer of Vertica, claims that his latest start-up has other performance-boosting features, such as very aggressive data compression and a query executor that "runs against compressed data." As a result, "Vertica beats all row stores on the planet -- typically by a factor of 50," he wrote. "The only engines that come closer are other column stores, which Vertica typically beats by around a factor of 10." Stonebraker says specialized engines built for other markets can do just as well. "In every major application area I can think of, it is possible to build a SQL DBMS engine with vertical market-specific internals that outperforms the 'one size fits all' engines by a factor of 50 or so," he wrote. Other contributors to the Database Column blog include Don Haderle, a retired IBM fellow who is considered the "father" of IBM's DB2 database, and Jerry Held, who helped create Tandem Computers' NonStop database. | The Database Schema Browser is simple and uses standard Java APIs for its processing. You can easily extend it, for example, to generate a Java class for each table, with simple getters and setters for its columns. Instead of printing the output to the response writer, you can redirect it to Java I/O stream classes. Then, with some simple parsing code, you can embed the table and column information in a Java class template. | SQL is designed to work with regular, two-dimensional tables whose data is arranged neatly into rows and columns.
Web-based information, in the form of HTML, just isn't like that, as you can see from the hierarchy of elements that make up the JavaWorld search page (Figure 4). | This approach is simple but naive. Making the caller produce the correct SQL text string for each Java object adds a lot of code and runtime overhead. The solution is also slow: every raw SQL statement the access method sends must be parsed, compiled, and optimized again. An even bigger problem is that the raw-SQL method is not portable across database servers, because different servers use slightly different SQL syntax. For example, some databases expect YYYY-MM-DD syntax for a SQL DATE field, while others might expect DD,MM,YYYY syntax. Databases can also have different escape requirements for SQL text. For example, MySQL uses a backslash (\) to escape special characters, while Microsoft SQL Server uses a single quote ('). That means any raw-SQL implementation must target a specific database server. | In many cases, persistence could be achieved more simply with an embedded object database engine, which makes this a good time to look at db4o. Created by Carl Rosenberger, db4o was at one time available only commercially, but it is now open source and has recently been licensed under the GPL. | One typical software design goal is responsiveness, understood here as how easily and quickly the user can interrupt the current operation. Certain operations, such as complex database queries, network I/O, extensive calculations, and sorting or searching large data sets, can take seconds or even minutes to complete. Well-designed software lets the user cancel such a long operation while it is in progress. In this article I will demonstrate how to cancel a time-consuming database query by simply interrupting the thread in which the query runs.
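The thread-interruption idea can be sketched in Java without a database. In this minimal sketch the "query" is a hypothetical stand-in (a loop of short sleeps), and `runAndCancel` plays the role of a user who waits briefly and then presses Cancel. It is an assumption-laden illustration, not the article's own code: whether a real JDBC driver aborts a query when its thread is interrupted depends on the driver, and the portable JDBC call for this purpose is `Statement.cancel()`, which itself works only if both the driver and the DBMS support it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Cancels a long-running task by interrupting the thread it runs in.
 * The "query" is a simulated stand-in (a loop of short sleeps); no database is used.
 */
class InterruptibleQueryDemo {

    /** Records how the worker finished; AtomicBoolean makes the flags visible across threads. */
    static final class Outcome {
        final AtomicBoolean completed = new AtomicBoolean(false);
        final AtomicBoolean cancelled = new AtomicBoolean(false);
    }

    /** Hypothetical slow "query": many short steps, each of which can be interrupted. */
    static void simulatedQuery(int steps, Outcome outcome) {
        try {
            for (int i = 0; i < steps; i++) {
                Thread.sleep(10); // throws InterruptedException once the thread is interrupted
            }
            outcome.completed.set(true);
        } catch (InterruptedException e) {
            outcome.cancelled.set(true);
            Thread.currentThread().interrupt(); // restore the interrupt status for any caller
        }
    }

    /** Starts the simulated query, lets it run for waitMillis, then cancels it. */
    static Outcome runAndCancel(int steps, long waitMillis) {
        Outcome outcome = new Outcome();
        Thread worker = new Thread(() -> simulatedQuery(steps, outcome), "query-worker");
        worker.start();
        try {
            Thread.sleep(waitMillis); // the user waits a while...
            worker.interrupt();       // ...then presses Cancel
            worker.join();            // wait until the worker notices and exits
        } catch (InterruptedException e) {
            worker.interrupt();
            Thread.currentThread().interrupt();
        }
        return outcome;
    }

    public static void main(String[] args) {
        Outcome o = runAndCancel(1_000, 50);
        System.out.println("completed=" + o.completed.get() + ", cancelled=" + o.cancelled.get());
    }
}
```

What makes the task cancellable is that it reaches an interruptible call (here `Thread.sleep`) between steps. Code blocked in a call that ignores interrupts, which may include a driver's network read, would not stop until that call returns.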
Such an interruptible database query will let you develop truly interactive programs that respond promptly even to the most impatient users. | With the DBTags tag library, a JDBC connection is obtained with the sql:connection tag. In the example JSP, SQLDBTags.jsp, a connection to an Oracle database is established with the Oracle OCI (Type 2) driver as follows: | The new db4o Replication System (dRS) now lets you join the different worlds of object and relational databases. Based on Hibernate, it can replicate data between db4o and relational databases such as Oracle and MySQL. In practice, this means you can synchronize data between a local db4o database on a partially connected device and an enterprise RDBMS. It also means that your db4o data becomes available on an SQL-friendly platform for ad hoc access. | This second column in my Learning SQLJ series explores how to connect to a database and embed SQL statements in your Java programs using SQLJ. These columns reference numerous SQL scripts, source code, and other files that are available for download at O'Reilly's Web site. For more specific information about the files used, or for an introduction to SQLJ, read my first column, Setting Up Your Environment to Develop SQLJ Programs. | Because database metadata methods that generate ResultSet objects are slow compared to other JDBC methods, using them frequently can impair system performance. The guidelines in this section will help you optimize system performance when selecting and using database metadata. | This article proposes a reversed approach in which the modelling is done in the relational tier and as much business logic as possible is handled inside the database, with a set of stored procedures acting as the middle tier. It introduces Amber, a lightweight Java API that uses Java annotations instead of XML descriptors to help marshal result sets into Java objects and back into the database.
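The portability problem with raw SQL strings described earlier (MySQL escaping special characters with a backslash, Microsoft SQL Server with a doubled single quote) can be illustrated with a short Java sketch. The `Dialect` enum and its two escape rules are simplified, hypothetical illustrations, not a complete or safe escaper for either server; the `customer` table in `insertCustomer` is likewise invented. The point is that the same value needs different SQL text per server, whereas a JDBC `PreparedStatement` sends the value separately and leaves quoting to the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Contrasts dialect-specific string escaping with JDBC parameter binding.
 * The escape rules are simplified illustrations only.
 */
class SqlLiteralDemo {

    enum Dialect { MYSQL_BACKSLASH, SQLSERVER_DOUBLED_QUOTE }

    /** Builds a quoted SQL string literal using one dialect's escape rule. */
    static String quoteLiteral(String value, Dialect dialect) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : value.toCharArray()) {
            if (dialect == Dialect.MYSQL_BACKSLASH && (c == '\'' || c == '\\')) {
                sb.append('\\').append(c);      // O'Brien -> O\'Brien
            } else if (dialect == Dialect.SQLSERVER_DOUBLED_QUOTE && c == '\'') {
                sb.append("''");                // O'Brien -> O''Brien
            } else {
                sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    /**
     * Portable alternative: the value is bound as a parameter, so the SQL text
     * is identical for every server. Requires a live connection and a
     * (hypothetical) customer table.
     */
    static int insertCustomer(Connection conn, String name) throws SQLException {
        String sql = "INSERT INTO customer (name) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        String name = "O'Brien";
        System.out.println(quoteLiteral(name, Dialect.MYSQL_BACKSLASH));         // 'O\'Brien'
        System.out.println(quoteLiteral(name, Dialect.SQLSERVER_DOUBLED_QUOTE)); // 'O''Brien'
    }
}
```

Beyond portability, a `PreparedStatement` can be prepared once and re-executed with new parameter values, which addresses the repeated parse-and-optimize overhead of raw SQL; whether a particular driver actually reuses the execution plan is driver-dependent.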
| Java DB is Sun's supported distribution of Apache Derby, the open source, 100% Java technology database. Rick Hillegas, a Sun Senior Staff Engineer and Apache Derby developer, provides insights into uses of Java DB, developing in a distributed environment, and upcoming features in the next release of Java DB. | In this article, I'm going to cover the basics of using the Java Edition of Berkeley DB. We'll look at embedded databases in general, discuss Berkeley DB, and go over some of the things you need to know in order to use it. | In the first article of this series, we went through the basics of using Berkeley DB. In this article, we're going to walk through a more extended example: session management. While this series of examples doesn't illustrate the full power of Berkeley DB, it will give you a good feel for how to use it. And you might be surprised at how complicated some aspects of using Berkeley DB are. | Some intermediary data services provide a Web service interface to the data, but the intermediary tier can also be a JDBC-accessible virtual database. Such a virtual database can then provide its own schema, possibly pulling data from several underlying databases. | Along with the need for high availability, providing fast access to data is often one of the key reasons enterprise architects consider a cluster-based deployment, or even a cluster-centric architecture. The general concern is that databases are considered "slow," or at least the limiting factor in an application's throughput. | Programming interfaces such as Apache Jakarta POI allow a Java application to modify Excel spreadsheets programmatically (see "Learn to Read and Write Microsoft Excel Documents with Jakarta's POI"). Unfortunately, such APIs allow only cell-by-cell interaction with spreadsheets. They do not let you interact with a spreadsheet as if it were a database and use the power of SQL.
By tapping into the JDBC-ODBC bridge driver, you can work with spreadsheets as databases. | SchemaCrawler is just such an API. Free, open source, and available under the LGPL, SchemaCrawler is written in Java, which makes it operating-system agnostic. Since it relies on JDBC, it is also database independent; it deliberately contains no RDBMS-specific code. That is why you won't find any triggers in its output: there is no way to get trigger metadata using JDBC. | When using a database management system from Java, however, the landscape isn't as clearly marked. How do you know whether performance is being limited by your application, the database, or your JDBC driver? | Java DB is a lightweight, 100 percent Java RDBMS that can run on any standard Java Virtual Machine. Though lightweight, it provides the ACID (Atomicity, Consistency, Isolation, Durability) properties of a relational database system. As such, after a crash it recovers its data to the last committed state, without losing committed data or risking corruption. It supports relational database standards such as SQL and JDBC. As a result, it is platform independent and interoperable with other open standards-based databases. | LiquiBase, available since 2006, is an open source, freely available tool for migrating from one database version to another (see Resources). A handful of other open source database-migration tools are on the scene as well, including openDBcopy and dbdeploy. LiquiBase supports 10 database types, including DB2, Apache Derby, MySQL, PostgreSQL, Oracle, Microsoft® SQL Server, Sybase, and HSQL. | This time, I'll continue that theme (the many options found in db4o) with a look at how db4o handles refactoring. As of version 6.1, db4o automatically recognizes and handles three different kinds of refactoring: adding a field, removing a field, and adding a new interface to a class.
I won't cover all three (I'll focus on adding a field and changing a class name), but I will introduce you to what's most exciting about refactoring with db4o: it brings backward and forward compatibility to database change management. | HSQLDB is an embeddable database engine written in the Java language. It supports both in-memory and disk-based tables, and it was designed to be embedded entirely within an application, eliminating the administrative overhead associated with most full-scale databases. To load the data into HSQLDB, you would only need to write a Visitor that traverses the in-memory data structure and generates the appropriate INSERT statement for each entity to be stored. You could then execute SQL queries against the in-memory database tables to produce the reports, and throw away the "database" when finished. | When you are designing distributed applications, you must consider availability and performance. A common solution is to include a data store on the client system. Because client resources are typically limited, the client usually requires a lightweight data store. This approach poses a challenge: synchronizing data between heterogeneous data stores. One solution to this problem is a Java-based approach that uses the JDBC and SyncML standards for heterogeneous database replication, but first, a little background. | This means that more and more developers find themselves choosing among database management systems (DBMSes). This can be a daunting choice, given the many available DBMSes, both open and closed source, and the broad spectrum of differences between them. This article provides some guidance through the maze of available DBMS features and methodologies, to help developers quickly narrow the choices to the best candidate.
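The HSQLDB loading idea, a Visitor that walks an in-memory structure and emits one INSERT per entity, might look like the following minimal Java sketch. The `Department` and `Employee` classes, the table and column names, and the quote-doubling rule used for string literals are all assumptions made for illustration; the generated strings would then be executed against the in-memory tables.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * A Visitor that walks a hypothetical in-memory object graph and emits one
 * INSERT statement per entity. Entity classes and table names are invented.
 */
class InsertVisitorDemo {

    interface Visitor {
        void visitDepartment(Department d);
        void visitEmployee(Employee e);
    }

    interface Entity {
        void accept(Visitor v);
    }

    static final class Employee implements Entity {
        final int id;
        final String name;
        final int deptId;

        Employee(int id, String name, int deptId) {
            this.id = id;
            this.name = name;
            this.deptId = deptId;
        }

        public void accept(Visitor v) { v.visitEmployee(this); }
    }

    static final class Department implements Entity {
        final int id;
        final String name;
        final List<Employee> staff = new ArrayList<>();

        Department(int id, String name) {
            this.id = id;
            this.name = name;
        }

        /** Visit the department first, then its employees (parent rows before child rows). */
        public void accept(Visitor v) {
            v.visitDepartment(this);
            for (Employee e : staff) {
                e.accept(v);
            }
        }
    }

    /** Collects INSERT statements; single quotes are escaped by doubling them. */
    static final class InsertGenerator implements Visitor {
        final List<String> statements = new ArrayList<>();

        private static String lit(String s) {
            return "'" + s.replace("'", "''") + "'";
        }

        public void visitDepartment(Department d) {
            statements.add("INSERT INTO department (id, name) VALUES ("
                    + d.id + ", " + lit(d.name) + ")");
        }

        public void visitEmployee(Employee e) {
            statements.add("INSERT INTO employee (id, name, dept_id) VALUES ("
                    + e.id + ", " + lit(e.name) + ", " + e.deptId + ")");
        }
    }

    public static void main(String[] args) {
        Department sales = new Department(10, "Sales");
        sales.staff.add(new Employee(1, "Ann", 10));
        sales.staff.add(new Employee(2, "O'Neil", 10));

        InsertGenerator gen = new InsertGenerator();
        sales.accept(gen);
        gen.statements.forEach(System.out::println);
    }
}
```

Visiting parents before children keeps the statements in an order that still works if the tables declare foreign keys. For large volumes, the same visitor could instead bind values into a batched `PreparedStatement`.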
| If you cannot access the Unicode database through the DB2 Command Center, you can at least use the Windows Character Map to find the hex codes of Unicode characters and enter them through the DB2 CLP. This works only with UCS-2 encoding (that is, GRAPHIC/VARGRAPHIC columns), because the Windows Character Map shows hex codes for UCS-2 only. You can use the following syntax to enter characters by their hex codes in an SQL statement: | The design and implementation of disaster recovery procedures are very specific to the nature of the application being maintained. There's no "one size fits all" scheme for every possible set of requirements. Highly transaction-intensive applications may want to take more frequent log-only backups, while others may require only a nightly full backup. For an excellent tutorial on developing robust backup and recovery procedures, see chapter 9 of Breck Carter's book SQL Anywhere Studio 9 Developer's Guide, ISBN: 1-55622-506-7. | In this article, you covered the basic concepts involved in connecting to a database from a Java application using JDBC. Incorporating a database into an application greatly expands your development options. Although this is a series of articles on object technologies, you may have noticed that you used a relational database, Microsoft Access. You will explore this issue in more detail next month. Also, as already noted, JDBC can be used with any number of databases as long as you have the appropriate driver installed on your system. | This is Chapter 6, "The Database Control," from the book BEA WebLogic Workshop Kick Start (ISBN: 0-672-32417-2) by Joseph Weber and Mark Wutka, published by Sams Publishing. | Locking occurs when one transaction obtains a lock on a resource so that another transaction cannot modify that resource. This mechanism exists to preserve data consistency.
Applications that interact with the database must be designed to handle locks and situations in which a resource is unavailable. Locking is a complex subject that requires a separate discussion, but for the purpose of this article, I will say that locking is supposed to be a temporary event: if a resource is locked now, it will be released after some time. Deadlocks are situations in which multiple processes accessing the same database each hold locks needed by the others, so that none of the processes can proceed. | The reasoning behind this step should be obvious: how are you going to know what should be optimized if you're not actively monitoring how MySQL is operating? Thankfully, MySQL's developers have been particularly proactive in providing the tools needed to keep abreast of database performance. | A second criticism is that putting configuration data in the database makes it more difficult to modify without using the application's interface. For starters, codelist and codelist_value are ordinary tables, so a user with the appropriate permissions can modify settings by issuing SQL statements. Also, although it may be convenient for a developer to edit a framework's XML files by hand, an end user is less likely to need to do so. Some software applications like to offer users this option; however, letting end users modify configurations by hand can be error-prone, and it is debatable whether you want to give them this ability at all. | This article describes the implementation of a simple library for doing something that many programs have to do: save data. It won't solve all of our data-saving needs because, regardless of what some database vendors might tell you, no single piece of software can properly handle every data-saving situation. | After sketching out the path to your data, you've finally arrived at the page that contains the data itself.
You now need to map out the page so that your data can be distinguished from the rest of the insignificant details, styling, and advertisements! I've always believed in syntax highlighting and have become accustomed to vim's flavor of it, so I've got the View Source With extension configured to use gvim. I right-click and, with any luck, the page source is displayed in a gvim buffer with syntax highlighting enabled. If the page has an unusual extension, or no extension, and the server isn't sending the proper page headers, I might have to "set syntax=html". Search through the source file, correlating what you see in the browser with the source code that generates it. You'll need to find landmarks in the HTML to guide your parser through an obscure landscape of markup. If you're having problems, another indispensable tool provided by Firefox is "View Selection Source." To use it, simply highlight some content, right-click, and choose "View Selection Source." A Mozilla source viewer opens, showing the HTML that generated the selected content highlighted, with some surrounding HTML to provide context. | Christoph Rupp's hamsterdb is a lightweight, embedded database engine designed for ease of use, high performance, stability, and portability. In the database world, there are typically two extremes. On the high end, you have the full-featured and sometimes unwieldy relational DBMS with SQL and a daemon/server process (such as Oracle). On the low end, there are B+tree-based systems, which are essentially just a database engine linked into the application, usually without SQL support. As a lightweight database engine, hamsterdb fits into the latter category. It is very fast, but supports only the minimum necessary operations. Specifically, it is embeddable, and therefore does not have the external dependencies or installation hassles of an SQL server.
It is simply a database engine, not a database management system (DBMS), and it has no relational functions or other features provided by SQL. For many apps that need to manage a lot of data but don't need an externally accessible database for report writers or third-party tools, hamsterdb may be your "pet" solution. | Suppose you have a set of records in an Access database that you need to view through a front-end tool. You can design a user interface in various programming languages, such as Visual Basic or Visual C++. Java, however, provides a more consistent approach to developing these interfaces through the javax.swing package. Moreover, Java provides the Java Database Connectivity (JDBC) API, with which you can connect your app to a database built with either Microsoft Access or SQL Server. In this article, we will examine the basic steps required to use JDBC with a javax.swing user interface. | To summarize, Hypersonic SQL is well suited to standalone desktop applications written in Java. It is also useful for testing and prototyping. The Web site further recommends using it with databases stored on CD-ROMs, although I have not tested this feature. I think it may also be useful for embedded applications. | Previously, in our work with PHP, we used a flat file to store and retrieve data. When we looked at this file in Chapter 2, “Storing and Retrieving Data,” we mentioned that relational database systems make many of these storage and retrieval tasks easier, safer, and more efficient in a web application. Now, having worked with MySQL to create a database, we can begin connecting this database to a web-based front end. | Java Database Connectivity (JDBC) is the means by which Java code calls SQL and PL/SQL. The DML operations SELECT, INSERT, UPDATE, and DELETE, as well as calling PL/SQL procedures and returning result sets, can all be done using JDBC.
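Calling a PL/SQL (or other stored) procedure from JDBC typically goes through a `CallableStatement` and JDBC's vendor-neutral escape syntax, `{call procedure(?, ...)}`. The minimal sketch below assumes a hypothetical procedure `raise_salary` with two IN parameters and one OUT parameter; only the string-building helper runs without a database, and the `raiseSalary` method needs a live connection to a database that actually defines such a procedure.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

/**
 * Calling a stored procedure through JDBC's escape syntax.
 * The procedure name "raise_salary" and its parameters are hypothetical.
 */
class StoredProcCallDemo {

    /** Builds "{call name(?, ?, ...)}" with one placeholder per parameter. */
    static String callEscape(String procName, int paramCount) {
        StringBuilder sb = new StringBuilder("{call ").append(procName).append('(');
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('?');
        }
        return sb.append(")}").toString();
    }

    /**
     * Invokes raise_salary(emp_id IN, pct IN, new_salary OUT) and returns the
     * OUT value. Requires a live connection where this procedure exists.
     */
    static double raiseSalary(Connection conn, int empId, double pct) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(callEscape("raise_salary", 3))) {
            cs.setInt(1, empId);                        // IN: employee id
            cs.setDouble(2, pct);                       // IN: percentage raise
            cs.registerOutParameter(3, Types.NUMERIC);  // OUT: new salary
            cs.execute();
            return cs.getDouble(3);
        }
    }

    public static void main(String[] args) {
        System.out.println(callEscape("raise_salary", 3)); // {call raise_salary(?, ?, ?)}
    }
}
```

The DML operations mentioned above follow the same pattern with `PreparedStatement` and `executeUpdate()` (or `executeQuery()` for SELECT, which returns a `ResultSet`).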
| If Derby were the only product to come out of the open source arena, I think I’d be happy with that! That’s an exaggeration, of course, but it points to the importance of Derby. Database technology has traditionally occupied an exalted and expensive position in the world of software. The advent of Derby ends this dominance and opens up the database field to us all. Sun Microsystems has already released its own distribution of Derby, called Java DB, so you’re likely to hear more about Derby as time goes on. | The initial promise of Derby has been its applicability to embedded solutions, that is, situations in which a complete, light-footprint database engine is incorporated into a small software component or environment. In this context, Derby really delivers on the promise of component-oriented development, because with it you can embed a standalone database into a browser-based application. This aspect of Derby has received much attention, and you can follow up on it by studying the detailed examples that come with Sun's Derby distribution. However, Derby has another side that hasn't received as much attention: it can also function as a network database server engine. This is an interesting development because it might start to compete with the established vendors in this space. |
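From the application's side, the difference between Derby's embedded mode and its network-server mode is largely the JDBC URL. The sketch below uses Derby's URL forms `jdbc:derby:<db>` and `jdbc:derby://<host>:<port>/<db>` and its default Network Server port, 1527; the database name `salesdb` is hypothetical, and which Derby jars must be on the classpath for each mode varies between Derby releases, so check the documentation for the version you use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * Builds JDBC URLs for Derby's two deployment modes.
 * Database name, host, and port values are illustrative.
 */
class DerbyUrls {

    /** Embedded: the engine runs inside this JVM (typically derby.jar on the classpath). */
    static String embeddedUrl(String dbName, boolean create) {
        return "jdbc:derby:" + dbName + (create ? ";create=true" : "");
    }

    /** Client/server: connects to a separately started Network Server (client driver jar needed). */
    static String networkUrl(String host, int port, String dbName, boolean create) {
        return "jdbc:derby://" + host + ":" + port + "/" + dbName + (create ? ";create=true" : "");
    }

    /** Opens a connection; the matching Derby driver must be on the classpath. */
    static Connection open(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }

    public static void main(String[] args) {
        System.out.println(embeddedUrl("salesdb", true));                    // jdbc:derby:salesdb;create=true
        System.out.println(networkUrl("localhost", 1527, "salesdb", false)); // jdbc:derby://localhost:1527/salesdb
    }
}
```

Because the rest of the JDBC code is the same in both modes, an application prototyped against embedded Derby can, in principle, be pointed at a Derby network server by changing only the URL and the driver on the classpath.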
|