1. Abstract
Corporate databases can be linked to the Web in a manner that allows clients or employees to access corporate data through a Web browser. This report first describes the bridge between the Web and corporate databases and discusses a series of related concepts. Second, it presents a number of linking methods together with an analysis of each. Finally, it reports an application architecture analysis.
2. Introduction
The World Wide Web (known as "WWW" or "Web") is growing at a phenomenal rate. The current Web is largely based on file system technology, which deals well with resources that are primarily static. However, with the unprecedented growth of resources, it is no longer adequate to rely on this conventional file technology for organizing, storing, and accessing large amounts of information on the Web. Thus, many large Web sites today are turning to database technology to keep track of the increasing amount of data. Database technology has played a critical role in the information management field in recent years. It is believed that the integration of the Web and database technology will bring many opportunities for creating advanced information management applications.
With the increasing popularity and advancement of Web technology, many organizations want to Web-enable their existing applications and databases without having to modify existing host-based applications. Doing so not only gives all of the existing applications a common, modern look and feel but also allows them to be deployed on corporate intranets, the public Internet, and newer extranets.
Taking simple data from a database and placing it on the Web is a relatively simple task. In most cases, however, corporate data is maintained in a variety of sources, including legacy, relational, and object databases, and the task becomes much more complicated when these diverse data sources must be queried or updated. Methods, techniques, and tools are in great demand to bridge the gap between the Web and database applications so that smooth, interactive, and integrated Web-to-database applications are made possible.
Many players in the industry are taking up this challenge. These include major database vendors, mainframe vendors, third-party software firms, Web browser vendors, and Web server vendors. A wide range of tools and philosophies has been proposed for connecting and integrating the Web and databases.
3. Overview of Web-Database Technologies
Database client technologies provide abstractions. That is their purpose. A database is a very complex piece of software. Writing programs to communicate with a database through its native interface can be very complicated. Database client technologies simplify this process.
Database client technologies provide an interface that is less complex than the underlying database. They enable you to write relatively simple programs that leverage an enormous amount of code (code that resides in the database) to perform very complex tasks.
A good database interface is like a magnifying glass for your code.
[Figure: A database interface as a code magnifier]
Writing programs to communicate with a database through its native interface not only can be complex but also can result in limited and inflexible applications. An application written to use a particular database's native interface is limited, of course, to using only that particular database. The process of enabling such an application to use another database can be very difficult and time-consuming, if not impossible.
Database client technologies provide a uniform interface for communicating with different and disparate database systems. With modern database client interfaces, you can write a single program that performs complex operations using multiple types of databases.
[Figure: A uniform interface to disparate database systems]
A good database interface magnifies your code and provides a uniform interface to different database systems. In the recent past, several database interfaces have been developed. These interfaces differ from each other in what they accomplish and in how they go about it. They are described in Section 9.
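The idea of a uniform interface can be sketched with Python's DB-API 2.0 convention, which plays a role loosely comparable to the interfaces surveyed in this report. The sketch below is an illustration only and is not taken from any of those technologies: the function names, the `customers` table, and the use of the built-in `sqlite3` module are assumptions chosen so that the example is self-contained.

```python
import sqlite3


def fetch_customer_names(connection):
    """Return all customer names, using only DB-API 2.0 calls.

    `connection` can be any DB-API 2.0 connection object; nothing in this
    function is specific to one database product.
    """
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT name FROM customers ORDER BY name")
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


def build_demo_database():
    """Create a throwaway in-memory SQLite database with hypothetical data."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    connection.executemany(
        "INSERT INTO customers (name) VALUES (?)",
        [("Grace",), ("Ada",), ("Edsger",)],
    )
    connection.commit()
    return connection


if __name__ == "__main__":
    demo_connection = build_demo_database()
    print(fetch_customer_names(demo_connection))
    demo_connection.close()
```

Only `build_demo_database` knows which database product is in use. In principle, `fetch_customer_names` would work unchanged with a connection from another DB-API 2.0 driver, although details such as the parameter placeholder style can differ between drivers.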
4. Need For Connecting Databases To Web
There are several reasons why organizations want to use databases in their Web applications:
> To make the data they maintain available to the general public and to internal users.
> To consolidate data from databases in different parts of the organization through a Web application and serve it to users as though it came from a single source.
> To extend the functionality of the Web server so that new, expanded services can be made available to visitors to the Web site.
> To better support and use legacy database systems, the information they contain, and existing applications.
> To operate jointly over multiple databases.
> To unlock the potential of unused information held within organizational databases.
Databases play an important role in many applications that are designed for customer use and for improving the services provided to customers. Some of the uses of databases in Web applications are:
> Business agreements, accounting information, transaction rules, and purchasing and billing schedules that allow automatic ordering, purchasing, invoicing, and payment transactions to occur between businesses.
> Site access statistics, customer counting, sales trends, and product tracking.
> Product information that enables customers to search for and obtain product specifications, prices, suggested uses, and troubleshooting information.
> Inventory information that can be used to generate online shopping catalogs and virtual stores offering goods that meet criteria specified by each individual shopper.
5. Bridge Between The Web And Databases
Delivering data over the Web is cost-effective and fast, and it gives Internet users easy access to databases from any location. Users hope to access databases via Web browsers with the same functions as those provided by normal database application software. Businesses want to provide their users or customers with functions such as purchasing goods, tracking orders, searching through catalogues, receiving customized content, and viewing interesting graphics. Web-to-database integration has become central to the construction of corporate information systems.
Making database information available to Web users requires converting it from the database format to a markup language such as HTML or XML. Database packages store information in files optimized for quick access by front-end programs. When the Web server sends information to a client, the internal database format must be converted to HTML so that it is displayed correctly. A bridge between the Web and databases needs to be built. This bridge lets the Web browser replace the front-end program normally used to access the corporate databases.
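The conversion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular product: the function names and the `products` table are hypothetical, and the built-in `sqlite3` module stands in for a corporate database.

```python
import html
import sqlite3


def rows_to_html_table(column_names, rows):
    """Render query results as an HTML table, escaping every value."""
    header = "".join(f"<th>{html.escape(str(name))}</th>" for name in column_names)
    body_lines = []
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
        body_lines.append(f"<tr>{cells}</tr>")
    return "<table>\n<tr>" + header + "</tr>\n" + "\n".join(body_lines) + "\n</table>"


def query_to_html(connection, sql, parameters=()):
    """Run a query and convert its result set into an HTML table."""
    cursor = connection.execute(sql, parameters)
    # cursor.description holds one entry per result column; item 0 is the name.
    column_names = [description[0] for description in cursor.description]
    return rows_to_html_table(column_names, cursor.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE products (name TEXT, price REAL)")
    connection.execute("INSERT INTO products VALUES (?, ?)", ("Widget & Co", 9.5))
    print(query_to_html(connection, "SELECT name, price FROM products"))
    connection.close()
```

Escaping each value with `html.escape` keeps characters such as `&` and `<` in the data from being interpreted as markup by the browser.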
6. User-Interaction via Internet
Web browsers are the user's interface to the World Wide Web. They communicate with servers for the transfer of hypermedia documents.
The Web browser is responsible for receiving input from the user. It sends document transfer requests to Web servers, receives the HTML document from the server, and finally converts that document into formatted output for display on the screen, as shown in the diagram.
7. Web-to-Database Connecting Technology:
A number of alternative technologies and architectures are available for building a bridge between the Web and enterprise databases. These include:
> CGI (Common Gateway Interface), a Web standard for invoking external programs, is used to integrate databases with Web servers. A CGI program dynamically generates HTML documents from back-end databases;
> Web server APIs, such as Microsoft's Internet Server API (ISAPI) and the Netscape Server API (NSAPI), are invoked by third-party software to access remote databases;
> Web-ODBC (Open Database Connectivity) gateways rely on an open API (Application Programming Interface) to access database systems;
> Vendor-specific Web browser/data warehousing interfaces have been developed in response to the inherent advantages of the two technologies;
> JDBC (Java Database Connectivity) is a database access API for the Java programming language, used to program Java applets that access back-end databases;
> ActiveX Data Objects (ADO), OLE DB (Object Linking and Embedding Database), DAO (Data Access Objects), and RDO (Remote Data Objects) are Microsoft data access interfaces.
Each of the above technologies has strengths and weaknesses. Several factors should be considered when choosing among them, including the complexity of the data, the speed of deployment, the expected number of simultaneous users, and the frequency of database updates. In addition, new technology is emerging, and several tools are already available that optimize Web-to-database access for better performance.
8. Database Middleware
Generally, middleware can be described as the glue (or logic) that lies between clients and servers. It deals with all the "grim stuff" of incompatible operating systems and file structures. Programmers on both the client and server ends use APIs to request or receive services and data. Middleware is used to connect diverse products that do not share a common language. There are five kinds of middleware: object request brokers (ORBs), message-oriented middleware (MOM), database middleware, transaction-processing (TP) monitors, and remote procedure call (RPC) middleware.
Middleware is becoming a popular way to connect databases to the Web, and the technology is in the midst of an evolutionary growth spurt. As it relates to the Web, the middle tier will evolve to play an important role in enabling advanced multitier application deployment, using the Web for distributed transactional systems, managing multiple execution environments with Java, C++, and ActiveX, and providing links to existing mission-critical information resources.
9. Analysis of Interfacing Methods
9.1 Common Gateway Interface (CGI):
9.1.1 Introduction:
CGI is a standard for interfacing external programs with Web servers. The server submits client requests encoded in URLs to the appropriate registered CGI program, which executes and returns results encoded as MIME messages back to the server. CGI's openness avoids the need to extend HTTP. Most vendors of Web server extension tools continue to support CGI even as more advanced APIs have been added. This is due to the fact that many prewritten scripts are freely available for a variety of platforms and most of the popular Web servers.
CGI programs are executable programs that run on the Web server. They can be written in any scripting language (interpreted) or programming language (compiled first) that can be executed on the Web server, including C, C++, Fortran, Perl, Tcl, Unix shells, Visual Basic, AppleScript, and others. Arguments to CGI programs are encoded by the client in the URL (or in the request body) and passed by the server to the program through environment variables or standard input. The CGI program typically generates HTML pages on the fly. CGI lets Webmasters add common features such as counters, date/time displays, online order forms, chat pages, and search engines.
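A minimal CGI program along these lines might look like the sketch below. It is written in Python purely for illustration; the parameter name `name`, the default value `visitor`, and the function name `build_response` are assumptions, not part of the CGI standard. The only CGI conventions it relies on are the `QUERY_STRING` environment variable and a response made of header lines, a blank line, and a body written to standard output.

```python
#!/usr/bin/env python3
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(query_string):
    """Build a complete CGI response: header, blank line, then an HTML body."""
    parameters = parse_qs(query_string)
    # parse_qs maps each parameter name to a list of values; use the first one.
    visitor = parameters.get("name", ["visitor"])[0]
    body = f"<html><body><p>Hello, {html.escape(visitor)}!</p></body></html>"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # For a GET request, the server places the query part of the URL
    # in the QUERY_STRING environment variable before starting the program.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

With a Web server configured to run this script, a request whose URL ends in `?name=Ann` would produce a page greeting "Ann". Run from a shell with `QUERY_STRING` unset, the script falls back to the default greeting.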
9.1.2 Advantages:
CGI has several advantages:
> Simplicity: CGI provides a simple way of running programs on the server when a request is received, and the underlying process is conceptually easy to understand.
> Process isolation: Because CGI applications run in separate processes, buggy applications will not crash the Web server or access the server's private internal state.
> Portability: CGI is an open standard and is not tied to any particular server architecture (such as single- or multi-threaded). CGI programs are far more portable than alternatives such as server extension APIs.
> Language independence: CGI applications can be written in nearly any language.
> Support: CGI is a proven technology, and some form of CGI has been implemented on almost every Web server on a variety of platforms. Many CGI scripts are available for free for a variety of applications: user-friendly front ends to databases, search engines, scientific analysis tools, traditional inventory systems, and gateways to network services such as Gopher or whois.
9.1.3 Drawbacks:
CGI also has several drawbacks:
> Each request to a CGI script spawns an additional process on the server machine, which slows the server's response time.
> Also, if the CGI script is not set up correctly, security holes can occur on the server, rendering the Web site vulnerable to attacks by hackers.
> Another problem is that it is difficult to maintain state: that is, to preserve information about the client from one HTTP request to the next.
CGI is an early Web-to-database integration mechanism that is being replaced by more complex software programs that lie between the Web and database servers.
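The process-per-request and state-maintenance drawbacks listed above can be made concrete with a small simulation. The sketch below does not involve a real Web server: the embedded `SCRIPT` is a hypothetical stand-in for a CGI program, and each call to `handle_request_in_new_process` starts a fresh interpreter, as a server would for each CGI request.

```python
import subprocess
import sys

# Stand-in for a CGI script: it keeps a counter in memory and reports it.
SCRIPT = """
visit_count = 0
visit_count += 1
print(visit_count)
"""


def handle_request_in_new_process():
    """Run the script in a fresh interpreter and return the counter it prints."""
    completed = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    first = handle_request_in_new_process()
    second = handle_request_in_new_process()
    # Both values are 1: memory from the first process is gone by the second.
    print(first, second)
```

Because every request starts from an empty process, anything the script wants to remember about a client has to be stored outside the process, for example in a file or database on the server or in data sent back by the client with the next request.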
9.2 Server API:
An alternative to CGI for modifying or extending the abilities of the server is to use the server's API. APIs allow the developer to modify the server's default behaviour and give it new capabilities. In addition to addressing some of the drawbacks of CGI, the use of an API offers other features and benefits, such as the ability to share data and communications resources with the server, the ability to share function libraries, and additional capabilities in authentication and error handling. Because an API application remains in memory between client requests, information about a client can be stored and used again when the client makes another request.
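The benefit of staying in memory between requests can be sketched with Python's WSGI convention, in which a handler is loaded once into the server process and called for every request. WSGI is used here only as an analogy; it is not ISAPI or NSAPI, and the names `application`, `visits_by_client`, and `simulate_request` are assumptions made for this example.

```python
from wsgiref.util import setup_testing_defaults

# This dictionary lives in the server process for as long as the server runs,
# so it survives from one request to the next.
visits_by_client = {}


def application(environ, start_response):
    """In-process request handler that remembers how often each client called."""
    client = environ.get("REMOTE_ADDR", "unknown")
    visits_by_client[client] = visits_by_client.get(client, 0) + 1
    body = f"Visit number {visits_by_client[client]} from {client}\n".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def simulate_request(client_address):
    """Call the handler directly, the way a hosting server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["REMOTE_ADDR"] = client_address
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")


if __name__ == "__main__":
    print(simulate_request("192.0.2.1")[1], end="")
    print(simulate_request("192.0.2.1")[1], end="")
```

The second call reports visit number 2 because the dictionary was never discarded. The sketch is not thread-safe: a server that calls the handler from several threads at once would need a lock around the counter update.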
There are, however, some drawbacks to this approach. Unlike CGI programs, API functions are server-specific, because each server has a different API. Buggy API code can crash the server. Developing the code is also more complex, because it must manage multiple threads and release memory after it runs.
9.3 ODBC and JDBC:
ODBC and JDBC are types of database access middleware. ODBC is, by far, the most popular database access middleware in use today. Vendor support for ODBC is pervasive. JDBC support isn't quite at the level of ODBC support, but JDBC is growing and flourishing. Database vendors and several third-party software houses offer ODBC and JDBC drivers for a variety of databases and operating environments.
ODBC and JDBC look different from different points of view. To a network administrator, they consist of client and server driver software (that is, program files). To a programmer, they are APIs called from application code to store and retrieve database content. A systems analyst perceives ODBC or JDBC as a conceptual connection between the application and the database, whereas database vendors regard them as ways to attract customers who want industry-standard interfaces rather than proprietary ones. Managers of data processing departments view ODBC and JDBC as insurance: the interfaces offer some measure of flexibility should it become necessary to replace one database product with another.
ODBC technology now allows Web servers to connect directly to databases, rather than through third-party solutions. JDBC can also access server ODBC drivers directly through a JDBC/ODBC bridge driver, available from SunSoft. ODBC driver vendors are also building bridges from ODBC to JDBC. JDBC is intended for developing client/server applications that access a wide range of back-end database resources.
9.4 DAO (Data Access Objects):
Data Access Objects (DAO) is a set of COM Automation interfaces for the Microsoft Access/Jet database engine. DAO talks directly to Access/Jet databases and can also communicate with other databases through the Jet engine.
[Figure: DAO architecture]
The COM-based Automation interface of DAO provides more than a function-based API. DAO provides an object model for database programming.
The DAO object model is better suited to object-oriented development than a straight API. Integrating a set of disparate API functions into an object-oriented application typically means that the developer must write her own set of classes to encapsulate the API functions.
Rather than provide merely a bunch of functions, DAO provides a set of objects for connecting to a database and performing operations on the data. These DAO objects are easy to integrate into the source code of an object-oriented application.
In addition to including classes for connecting to a database and manipulating data, the DAO object model also encapsulates the structural pieces of an Access database, such as tables, queries, indexes, and so on. This means that DAO also enables you to directly modify the structure, or schema, of Access databases without having to use SQL DDL statements.
DAO is easier to use than the ODBC API but doesn't provide the degree of low-level control afforded by the ODBC API. Therefore, DAO could be classified as a high-level database interface.
9.5 RDO (Remote Data Objects):
RDO was originally developed as an abstraction of the ODBC API for Visual Basic programmers. Therefore, RDO is closely tied to ODBC and Visual Basic.
RDO is easier to use than the ODBC API but doesn't offer the low-level control provided by the ODBC API. Therefore, RDO could be classified as a high-level database interface.
Because RDO calls the ODBC API directly (rather than through Jet, like DAO), it can provide good performance for applications that use relational database servers.
RDO can be used with Visual C++ applications by inserting the RemoteData control in the application. The RemoteData control is an OLE Control that can be bound to controls in the app's UI. You can call RDO functions through the RemoteData control's methods.
9.6 OLE DB (Object Linking and Embedding Database):
OLE DB expands on ODBC in two important ways. First, OLE DB provides an OLE (more precisely, a COM) interface for database programming. Second, OLE DB provides an interface to both relational and nonrelational data sources.
OLE DB provides an OLE (COM) interface. OLE was the original name for COM. When OLE DB was being created, OLE was still used as the name for COM. Since that time, COM has become the name for the foundation of Microsoft's component technology and OLE has come to be associated with UI components such as OLE Controls (OCX Controls).
OLE DB's provision of a COM interface for database programming is important because a COM interface can be much more robust and flexible than a traditional call-level interface, such as the ODBC interface. This flexibility can result in better performance and more robust error handling and can enable interfacing with nonrelational data sources.
Like ODBC, OLE DB could be classified as a low-level database API. OLE DB incorporates the functionality of ODBC for relational databases and expands on it by providing access to nonrelational data sources.
9.7 ADO (ActiveX Data Objects):
ADO stands for ActiveX Data Objects. ADO is built on top of OLE DB. ADO is an OLE DB consumer. Applications that use ADO use the OLE DB interfaces indirectly.
ADO provides an object model for database programming that's similar to, but more flexible than, DAO's object model. For instance, you can create Recordset objects in ADO without first creating a Connection object (which is something you can't do in DAO).
ADO simplifies OLE DB. OLE DB is large and complex; a program that uses OLE DB must use some complex COM interfaces. ADO is much simpler to use than OLE DB and can be classified as a high-level database interface.
Also, ADO can be used with more programming languages than OLE DB. ADO provides an Automation interface, which enables it to be used from scripting languages such as VBScript and JavaScript. (OLE DB can't be used from scripting languages because they don't have pointers and therefore can't call OLE DB's COM interfaces directly.)
10. Application Architecture Analysis
Perl as a development tool has clear advantages in terms of portability: Perl scripts and other traditional CGI applications are portable across different Web servers. Perl code is also easy to read because it involves few components and concepts, and this simplicity enhances maintainability. However, using Perl with character-delimited copies of database tables limits extensibility. Without the built-in SQL support provided by specialized database connectivity extensions, database operations such as JOINs and complex SELECT statements spanning multiple tables require complex programming.
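To show why joins over character-delimited table copies take extra programming, the sketch below hand-codes a two-table join. It is written in Python rather than Perl so that it can be run without extra modules; the pipe-delimited data, the column names, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical pipe-delimited exports of two database tables.
CUSTOMERS = "1|Ada\n2|Grace\n"
ORDERS = "100|1|Keyboard\n101|2|Monitor\n102|1|Mouse\n"


def read_delimited(text, field_names):
    """Parse pipe-delimited text into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=field_names, delimiter="|")
    return list(reader)


def join_orders_to_customers(customers_text, orders_text):
    """Hand-written equivalent of the SQL statement:

    SELECT c.name, o.item
    FROM orders o JOIN customers c ON o.customer_id = c.id
    """
    customers = read_delimited(customers_text, ["id", "name"])
    orders = read_delimited(orders_text, ["order_id", "customer_id", "item"])
    # Build a lookup table by hand; a database engine would do this internally.
    name_by_id = {row["id"]: row["name"] for row in customers}
    return [
        (name_by_id[order["customer_id"]], order["item"])
        for order in orders
        if order["customer_id"] in name_by_id
    ]


if __name__ == "__main__":
    for name, item in join_orders_to_customers(CUSTOMERS, ORDERS):
        print(name, item)
```

Every additional table, filter, or sort order needs more code of this kind, whereas with SQL support the same change is usually a one-line edit to the query.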
11. Conclusion
This paper has first described the bridge between the Web and corporate databases and the main concepts behind Web-database interfacing technologies. It has then presented a number of linking methods and an analysis of each. Finally, it has reported an application architecture analysis.