Factors Impacting Scalability

User base
Hardware, software and network limits
Database and Software Design

Several factors must be considered when plans to upscale involve increasing the number of concurrent connections to the server.

User base

First one needs to identify the actual concurrency requirement: are we talking about actual connections (the number of users who are expected to make a connection to a database at peak load times) or about defining the limits for the size of the user base?

Hardware, software and network limits

Firebird can be deployed in a range of “models”. Of interest here are the two “full server” models, Superserver and Classic. Both models are available on Windows, Linux and several other platforms. Although the server engine is identical, regardless of model, the choice of model ascends in importance as the potential number of logged-in users increases. Hardware and OS platform limitations also come into play here.

Superserver

Each Firebird Superserver is limited to a practical maximum of, at most, around 150 to 400 concurrent connections. The reason for this is that a Superserver and all of its connections to all databases are wrapped in a single, 32-bit process. A 32-bit process cannot address more than 2 Gb of memory. Each connection to a database causes the database engine to start one or more threads in which to process requests from that connection; the Superserver also runs process threads of its own for background garbage collection and other on-line tasks.

The Superserver also needs RAM for other purposes, of course.

A configurable page cache—one for each database that has active connections—is maintained to keep frequently-used database and index pages in memory. On Superserver, this cache is shared by all connections to that database. The size of the page cache defaults to 2048 pages so, for a database having the default page size of 4Kb, 8 Mb of RAM needs to be available for efficient use of the cache. (If the cache needed to be paged out to virtual memory, its purpose would be defeated!).

Typically, databases are created with an 8Kb page size and it is a common source of resource starvation when developers and DBAs over-configure the cache drastically, in the belief that “more is better” (what the author refers to as “the Lambourghini syndrome”: if you drive to work in the CBD at peak-hour in a Lambourghini, you're going to get there faster than I do in my Mazda 121!).

Sometimes this measure is taken to try to speed up response time that becomes degraded by poor transaction management in client code and by careless connection management. It is pointless. Such problems have nothing to do with the size of the cache and everything to do with inattention to resource cleanup. Ramping up the cache simply exacerbates the problem.

Consider the effect on resources when, for example, the cache for a typical 8Kb page size is ramped up to 20,000 pages. That represents 160 Mb that must be preserved in memory to make caching useful. The server maintains its lockfiles in RAM and these will grow dynamically (up to a configurable maximum) as more connections come online. Firebird 1.5 will use RAM for sort operations if it is available. This is great for response time, but many poorly-written applications hold open ordered output sets comprising thousands of rows for long periods.

Database servers love RAM. The more that is available to the server, the faster it goes. However, that 2Gb barrier kills Superserver in an overpopulated environment because each Superserver connection uses about 2Mb of RAM to instantiate its process thread and maintain its resources. So 500 Superserver connections will use a gigabyte, leaving somewhat less than a gigabyte of total addressable RAM available to the database server for processing, sorting, transaction accounting, lock tables, in-memory storage and new connections.

Classic

The Classic “server” is in fact not a database server at all, but a background service (or, on Linux, an xinetd daemon) that listens for connection requests. For each successful connection the service instantiates a single Firebird server process that is owned exclusively by that connection. The result is that each connection uses more resources, but the number of connections becomes a question of the amount of RAM available on the host machine. In addition to the ~2Mb to instantiate its connection, each also maintains its own, non-shared page cache. All connections get the same starting cache and, since the cache does not need to cater for shared use, it can (and should) be smaller. The recommended size is around 2Mb (512 pages for a 4Kb page size) on a 4Gb dedicated machine, but it can be much less. It is important to keep enough RAM available for the lock manager's table to “grow into” for peak connection loads.

64-bit Addressing

A way out of the 32-bit RAM limitations is to build a 64-bit Superserver. A 64-bit Superserver is possible on POSIX platforms where the OS has stable support for 64-bit hardware. An experimental 64-bit Superserver for Firebird 1.5.2 on AMD64/Linux i64 was released more than a year ago but it proved unstable. A 64-bit AMD64/Linux i64 build is included in the Firebird 2.0 beta 2 field test armoury. On Windows, it is not yet possible to build 64-bit capable Superserver because of issues with the current releases of the Microsoft Visual Studio 7 C++ compiler, the release-standard build environment for Firebird 2 on Windows. Private builds of both v.1.5.x and v.2.0 on non-Microsoft compilers are known to be working stably in production environments and the core development team has signalled its intention to keep a 64-bit Windows release in sight for Firebird 2.0.

Connection Pooling

Using middleware to hold and manage pre-allocated resources to keep a finite number of connections ready for incoming connection requests is known as connection pooling. There are several ways to implement connection pooling and compelling reasons to do so when a growing user base must be catered for. It would be unrealistic to set out to build an extensible multi-tier system without it. Connection pooling may be combined with a “queuing” mechanism if resources are inadequate to cope with peak loads.

The middleware developer needs to take special care to monitor attachments and detachments, especially from “stateless” clients like web browsers, to avoid resource leakages. Middleware should guard against recursive threaded workflows that could monopolise the pool and, for architectural reasons, should prevent threading across connections totally.

Competition for RAM

Trying to run a database server on a system that is running competing services, such as a web server, Exchange server, domain server, etc., will risk having those other services steal resources (RAM and CPU time) from the database server, denying Superserver even the 2 Gb that it is capable of using or limiting the number of connections to Classic, and slowing down database response time.

SMP Support

The incompatibility of the Superserver's threading implementation with the Windows implementation of symmetric multiprocessing (SMP) could be regarded as a “scaling limitation”, because of the arbitrary way Windows switches the entire affinity of a process between CPUs, resulting in a “see-saw effect”, whereby performance will continually come to a standstill at busy times, while the system waits for the OS to shift all of the Superserver's active memory resources from one CPU to another when it detects unevenness in CPU utilisation. Superserver is configured by default to be pinned to a single CPU to avoid this.

On Linux, multiprocessor handling does not cause this see-saw effect on 2.6 kernels and higher, although it has been reported on some 2.4 kernels on older SMP hardware. However, SMP does not gain any significant performance benefit for Superserver on Linux, either.

Support for configurable levels of SMP-aware, fine-grained multi-threading has been architected into the Vulcan engine and will become a feature of Firebird 3. It has already demonstrated sufficiently useful effects on memory-bound operations to make it something to look forward to.

At the same time, given the inherent importance of disk I/O to responsive online transaction processing, much of the noise about SMP capability is attributable to the Lambourghini syndrome. In the appropriate environment, Superserver is an efficient uniprocessor system! Classic—being a single process per connection—does not suffer from SMP inhibition, even on Windows. Until Firebird 3, when the distinctions between the two resource models will disappear, to be replaced by configurable resource management, Classic is the better choice for any upscaling plan that is able to accommodate increasing demand by augmenting system resources.

Note

There have been reports of kernel-level problems with SMP on some Linux versions with 8 CPUs and hyperthreading enabled, that do not appear on a 4X system. Since full support for setting CPU affinity on Linux did not arrive until kernel v.2.6, setting CPU affinity via firebird.conf is not supported for Linux in the current Superserver release.

"Windows Networking"

An adequately resourced TCP/IP network using the appropriate Firebird server model is amenable to upscaling. A Linux host with no competing applications (including Xserver!) running on it will deliver the optimum performance at the enterprise end of the scale. As a database server, a Windows host suffers from some inherent overheads that will have a noticeable impact on performance under conditions of high demand.

However, there is another “gotcha” attached to Windows networking. The native Windows Named Pipes transport (“NetBEUI”), which Firebird's network layer supports for largely historical reasons, has an absolute limit that cannot exceed 930 connections to any server. Named Pipes should be avoided for environments other than small workgroup LANs anyway, since it is a “noisy” transport that gets noisier as contention rises.

Incidentally, databases on Windows should always be stored on NTFS partitions which, if properly protected, offer better performance and less risk than FAT32. Older hard disks that have been retained on systems that previously ran NT 4.0 should be also be regarded as risky, since the older NTFS does not reliably support filesystem protection measures added in later implementations.

Database and Software Design

The availability and scalability of any software system can be negatively affected by poor database design, careless query specification, inappropriate workflows and, especially, poor transaction management.

The multi-generational architecture ensures robustness, excellent, optimistic task isolation and highly efficient throughput. Well-performing databases result from careful design, clean normalization, good indexing and, on the client side, well-planned queries and careful attention to timely completion of transactions. On the contrary, poor database design results in complicated queries and careless indexing that can cripple the capability of the optimizer to make effective query plans. Slow queries are generally the result of a combination of these flaws.

Garbage Collection

MGA accumulates “garbage”, in the form of old record versions. The engine performs in-line garbage collection. However, it will not permit garbage collection of old record versions that are still “interesting” to some transaction. Transactions that remain interesting for long periods trap record versions pertinent to any transactions that started later, causing garbage to build up to an unmanageable level. It is essential that Firebird developers understand how this could happen and write software that avoids it, keeping the level of garbage low enough for the garbage collection subsystem to cope with.

Commit Retaining and “Autocommit”

Commit Retaining is a “feature” of Firebird that was inherited from the ancestor, InterBase. It was implemented as a means to retain server-side resources and keep them available for developers using Borland's rapid application development tools and the BDE, the generic data access layer that was used to connect Windows graphical applications to databases. Its purpose was to present a level playing field for RAD development, regardless of the database running at the back-end.

Autocommit—a client-side mechanism that rolls the statement-level Post and the transaction-level Commit phases into a single step—was provided to enable developers to avoid transactions altogether. In the RAD components, Commit Retaining and Autocommit were bound together. This fitted well with desktop databases like dBase and Paradox (whose engine is, in fact, the BDE!), which do not have transactions.

With Firebird (and InterBase), Commit Retaining causes transactions to remain interesting indefinitely. Garbage collection effectively ceases on the “standard” Borland RAD tools database application and any other applications that make use of Commit Retaining. Such systems are fraught with problems of progressively degrading performance that cannot be resolved except by shutting down the database and allowing these old transactions to die.

Autocommit and Commit Retaining are not restricted to the Borland tools, of course. They are supported by most data access interfaces and Commit Retaining is available in SQL, so it behoves the application developer to understand the effects and to use these features with extreme care and control.