|Firebird Documentation Index → Gfix - Database Housekeeping → DB Housekeeping and Garbage Collection|
Garbage, for want of a better name, is the detritus that Firebird leaves around in the database after a rollback has been carried out. This is basically a copy of the row(s) from the table(s) that were being updated (or deleted) by the transaction prior to the rollback.
Almost all garbage is created by committed transactions. Since around V2.5 transactions that rollback are cleaned up immediately - assuming that Firebird is still running.
The major cause of garbage build-up is long running transactions that require Firebird to keep old versions of records that are frequently updated. Another source of garbage is an application strategy that deletes records and never revisits them.
What actually happens on delete is that Firebird stores a "deleted stub" with the full record as a back version. Until the delete is mature - meaning that all active transactions started after the delete was committed - the old version must be preserved.
Imagine a table that's indexed and accessed by date. On some schedule, records age out and are deleted. In the application, records are accessed by date and the deleted records are so old no query every asks for them. So there they sit, taking up space and doing no good to anyone. Even with a garbage collect thread, some active transaction has to stumble over deleted records or records with unneeded back versions before the record will be garbage collected.
In cooperative garbage collection, that particular record will be cleaned up immediately (or at least when the transaction gets some cycles). The dedicated garbage collection thread should clean up all the records on a page, but not until an active transaction tells it that there's a page that needs cleaning.
Because Firebird uses multi-generational architecture, every time a row is updated or deleted, Firebird keeps a copy in the database. These copies use space in the pages and can remain in the database for some time, especially if there are no active transactions stumbling across them!
There are a number of causes of garbage:
Remnants from a committed transaction. This is the main cause of garbage since around Firebird version 2.5.
Remnants from an aborted (rolled back) transaction may exist in Firebird versions prior to 2.5, newer versions perform immediate clean up after a rollback however, if the Firebird Server, the Operating System or the physical server crashed, then these remnants may still exists, even in later versions of Firebird.
Applications, described above, which delete database records, but then, subsequently, never revisit those deleted versions to garbage collect them automatically.
With regard to the remnants from aborted or rolled back transactions, Firebird (now) carries out record keeping to facilitate save points. This housekeeping allows Firebird to identify and, if necessary, undo all changes made by a transaction in the event that it is rolled back, or which failed due to a lost connection.
If a failed transaction is rolled back in either manner, its state is set to committed as there are no differences between a failed transaction and one in which it committed after making no changes.
These remnants are simply older copies of the rows that were being updated by the respective transactions. The differences are that:
Whenever a subsequent transaction reaches garbage from a committed transaction, that garbage is automatically cleared out, but see above for reasons where this may not take place often enough.
Rolled back garbage looks just like record versions created by active transactions. Those records can be accessed either sequentially (during a full table scan) or by index - assuming that the index entry was made before the crash that left the garbage around. The index entries will exist in the case of all but the last change made. When one transaction reads a record version created by a transaction that's listed in the transaction bit vector as active, the reader attempts to get a lock on the apparently active transaction id. If the lock request succeeds, then the other transaction is dead and the reader will either clean up the mess or notify the garbage collect thread to do so.
Firebird will automatically sweep through the database and remove the remnants of rolled back transactions and this has two effects:
The space recovered is made available for reuse by the same table, however, if this results in the page becoming completely empty, then it can be used for any purpose within the database.
The performance of the database may be affected while the sweep is in progress.
One other method of clearing out old rolled back transactions' garbage is simply to carry out a database backup. Gbak reads every table sequentially and thus visits every row in every table. Applications which also visit every row in one or more tables, will also cause the garbage in those tables to be collected.
Neither will affect the database's OIT (Oldest Intersting Transaction) or OST (Oldest Snapshot) settings however.
In the Super Server version of Firebird 2.0, garbage collection
has been vastly improved. There are now three different ways of
operation and these are configurable by setting the
GCPOLICY parameter in the
firebird.conf configuration file. By default, Super
Server uses combined while Classic Server uses
cooperative. The other option is
Classic Server ignores the setting and always uses cooperative garbage collection.
Normally, when a "back" or old version of a row in a table is created, it will be stored on the same page as the newset version. This is usually fine as the back version is not normally a complete copy of the old version, merely a list of differences from the newest version. Enough information is retained in the old version, to be able to recreate it, if necessary.
If the database is suffereing from a lack of garbage colecting, either deliberately, or down to the application design, then it is possible that there will be a build up of enough back versions to fill the target page. When the chain of old versions gets too big, Firebird has to move the old versions to a diffreent page which, if it occurs as part of an UPDATE statement, as it normally will, the UPDATE will run a lot slower than usual and will greatly increase the cost of subsequent garbage collection againts that table.
This is the default setting, indeed the only setting, that Classic Server uses. In this mode, the normal operation - as described above - takes place. When a full scan is performed (perhaps during a backup) old versions of the rows are deleted at that point in time. Record versions which are old enough that no active transactions have any interest in them will be removed, as will any versions created by failed transactions, if there are any present. (Which there shouldn't be!)
Super Server has, even since before version 1.0, performed background garbage collection where the server informs the garbage collector about old versions of updated and deleted rows when they are ready to be cleaned up. This helps avoid the need to force a full scan of each record in the database tables to get the garbage collector to remove these old versions. An active transaction has to recognize the need for garbage collection and notify the server which puts that record id on a list for the garbage collect thread. So an unvisited record will not attract the garbage collector unless another record on that page is read and does need cleanup.
When all rows in a table are read by the server, any old record versions are flagged to the garbage collector as being ready to be cleared out. They are not deleted by the scanning process as in the cooperative method. The garbage collector runs as a separate background thread and it will, at some point, remove these older record versions from the database.
The default sweep interval for a new database is 20,000. The sweep interval is the difference between the Oldest Snapshot Transaction, or OST and the Oldest Interesting Transaction or OIT.
This doesn't mean that every 20,000 transaction a sweep will take place. It will take place when the difference between the OST and the OIT is greater than the sweep interval.
An interesting transaction is one which has not yet committed. It may be still active, in limbo or may have been rolled back. (Limbo transactions are never garbage collected.)
The sweep facility runs through the database and gets rid of old rows in tables that are out of date. This prevents the database from growing too big and helps reduce the time it takes to start a new transaction on the database.
If you find that starting a new transaction takes a long time, it may be a good idea to run a manual sweep of the database in case the need for a sweep is causing the hold-up.
You can check if a manual sweep may be required by running the gstat utility to check the database header page and extract the Oldest Transaction (OIT) and Oldest Snapshot (OST) numbers from the output. If OST - OIT is small (less than the sweep interval) then a manual sweep may be in order. The SHOW DATABASE command in isql will also show the details you need.
Alternatively, another idea is to run gstat with the switches set to show old record versions. If that shows a problem, then it may be a good idea to start looking for long running transactions.
The options for this are:
gstat <database> -r[ecord]
gstat <database> -d[ata] -r[ecord]
gstat <database> -r[ecord] -t[able] <table_names>
tux> gstat test.fdb -r -t NORMAN ... Analyzing database pages ... NORMAN (142) Primary pointer page: 268, Index root page: 269 Average record length: 0.00, total records: 15 Average version length: 9.00, total versions: 15, max versions: 1 Data pages: 1, data page slots: 1, average fill: 16% ...
The information is shown in the 'record versions' statistic. In this example, there are 15 versions and as the 'total records' is also 15, then all the records have been deleted and need gargabge collecting.
A manual sweep can be run by using the
-s[weep] command. (See below).
To alter the database's automatic sweep interval, use the following command:
gfix -h[ousekeeping] INTERVAL database_name
The INTERVAL parameter is the new value for the sweep interval. The database name parameter is the database upon which you wish to alter the setting for automatic sweeping. The following example shows the setting being changed from the default to a new value of 1,000.
linux> gfix -h 1000 my_employee linux> gstat -header my_employee | grep Sweep Sweep interval: 1000
If automatic sweeping has been turned off, or only runs rarely because of the sweep interval setting, the DBA can manually force a sweep to be performed. The command to carry out this task is:
gfix -s[weep] [-i[gnore]] database_name
This command will force the garbage left over from old rolled back transactions to be removed, reducing the database size and improving the performance of new transactions. .Rolled back transactions are less of a problem than old versions from committed transactions, however, when the newest versions is being used by all current and future active transactions.
-i[gnore] option may be supplied. This
forces Firebird to ignore checksum errors on database pages. This is not
a good idea and should rarely need to be used, however, if your database
has suffered some problems it might be necessary to use it.
Checksums have not been used for a number of years as it was found that a significant percentage of CPU was consumed by check summing to find partial page writes - none of which were ever found!
The following example shows a manual database sweep being implemented:
linux> gfix -sweep my_employee
If you set the sweep interval to zero then automatic sweeping will be disabled. This implies that there will be no automatic housekeeping done so your database performance will not suffer as a result of the processing requirements of the automatic sweep.
If you disable sweeping you are advised to run a manual sweep at regular intervals when the database is quiet. Alternatively, simply make sure that you take regular backups of the database and as this is something you should be doing anyway, it shouldn't be a problem.
|Firebird Documentation Index → Gfix - Database Housekeeping → DB Housekeeping and Garbage Collection|