Chapter 6. Data Manipulation (DML) Statements
DML — data manipulation language — is the subset of SQL that is used by applications and procedural modules to extract and change data.
Extraction, for the purpose of reading data, both raw and manipulated, is achieved with the SELECT
statement.
INSERT
is for adding new data and DELETE
is for erasing data that are no longer required.
UPDATE
, MERGE
and UPDATE OR INSERT
all modify data in various ways.
6.1. SELECT
Used forRetrieving data
Available inDSQL, ESQL, PSQL
Global syntax
|
[WITH [RECURSIVE] <cte> [, <cte> ...]]
|SELECT
| [FIRST m] [SKIP n]
| [{DISTINCT | ALL}] <columns>
|FROM
| <source> [[AS] alias]
| [<joins>]
|[WHERE <condition>]
|[GROUP BY <grouping-list>
|[HAVING <aggregate-condition>]]
|[WINDOW <window_definition> [, <window_definition> ...]
|[PLAN <plan-expr>]
|[UNION [{DISTINCT | ALL}] <other-select>]
|[ORDER BY <ordering-list>]
|[{ ROWS <m> [TO <n>]
| | [OFFSET n {ROW | ROWS}]
| [FETCH {FIRST | NEXT} [m] {ROW | ROWS} ONLY]
|}]
|[FOR UPDATE [OF <columns>]]
|[WITH LOCK]
|[INTO <variables>]
|
|<variables> ::= [:]varname [, [:]varname ...]
The SELECT
statement retrieves data from the database and hands them to the application or the enclosing SQL statement.
Data are returned in zero or more rows, each containing one or more columns or fields.
The total of rows returned is the result set of the statement.
The only mandatory parts of the SELECT
statement are:
The
SELECT
keyword, followed by a columns list. This part specifies what you want to retrieve.The
FROM
keyword, followed by a selectable object. This tells the engine where you want to get it from.
In its most basic form, SELECT
retrieves a number of columns from a single table or view, like this:
|select id, name, address
|from contacts
Or, to retrieve all the columns:
|select * from sales
In practice, a SELECT
statement is usually executed with a WHERE
clause, which limits the rows returned.
The result set may be sorted by an ORDER BY
clause, and FIRST … SKIP
, OFFSET … FETCH
or ROWS
may further limit the number of returned rows, and can - for example - be used for pagination.
The column list may contain all kinds of expressions instead of just column names, and the source need not be a table or view: it may also be a derived table, a common table expression (CTE) or a selectable stored procedure (SP).
Multiple sources may be combined in a JOIN
, and multiple result sets may be combined in a UNION
.
The following sections discuss the available SELECT
subclauses and their usage in detail.
6.1.1. FIRST
, SKIP
Used forRetrieving a slice of rows from an ordered set
Available inDSQL, PSQL
Syntax
|
SELECT
| [FIRST <m>] [SKIP <n>]
| FROM ...
| ...
|
|<m>, <n> ::=
| <integer-literal>
| | <query-parameter>
| | (<integer-expression>)
FIRST
and SKIP
ClausesArgument | Description |
---|---|
integer-literal | Integer literal |
query-parameter | Query parameter place-holder.
|
integer-expression | Expression returning an integer value |
FIRST
and SKIP
are non-standard syntaxFIRST
and SKIP
are Firebird-specific clauses.
Use the SQL-standard Section 6.1.12, “OFFSET
, FETCH
” syntax wherever possible.
FIRST
limits the output of a query to the first m rows.
SKIP
will suppress the given n rows before starting to return output.
FIRST
and SKIP
are both optional.
When used together as in
, the n topmost rows of the output set are discarded, and the first m rows of the rest of the set are returned.FIRST m SKIP n
6.1.1.1. Characteristics of FIRST
and SKIP
Any argument to
FIRST
andSKIP
that is not an integer literal or an SQL parameter must be enclosed in parentheses. This implies that a subquery expression must be enclosed in two pairs of parentheses.SKIP 0
is allowed but totally pointless.FIRST 0
is also allowed and returns an empty set.Negative
SKIP
and/orFIRST
values result in an error.If a
SKIP
lands past the end of the dataset, an empty set is returned.If the number of rows in the dataset (or the remainder left after a
SKIP
) is less than the value of the m argument supplied forFIRST
, that smaller number of rows is returned. These are valid results, not error conditions.
Examples of FIRST/SKIP
The following query will return the first 10 names from the
People
table:|
select first 10 id, name from People
|order by name asc
The following query will return everything but the first 10 names:
|
select skip 10 id, name from People
|order by name asc
And this one returns the last 10 rows. Notice the double parentheses:
|
select skip ((select count(*) - 10 from People))
|id, name from People
|order by name asc
This query returns rows 81 to 100 of the People table:
|
select first 20 skip 80 id, name from People
|order by name asc
See alsoSection 6.1.12, “OFFSET
, FETCH
”, ROWS
6.1.2. The SELECT
Columns List
The columns list contains one or more comma-separated value expressions.
Each expression provides a value for one output column.
Alternatively, *
(select star
or select all
) can be used to stand for all the columns in a relation (i.e. a table, view or selectable stored procedure).
Syntax
|
SELECT
| [...]
| [{DISTINCT | ALL}] <select_list>
| [...]
| FROM ...
|
|<select_list> ::= * | <output_column> [, <output_column> ...]
|
|<output_column> ::=
| <qualifier>.*
| | <value_expression> [COLLATE collation] [[AS] alias]
|
|<value_expression> ::=
| [<qualifier>.]col_name
| | [<qualifier>.]selectable_SP_outparm
| | <literal>
| | <context-variable>
| | <function-call>
| | <single-value-subselect>
| | <CASE-construct>
| | any other expression returning a single
| value of a Firebird data type or NULL
|
|<qualifier> ::= a relation name or alias
|
|<function-call> ::=
| <normal_function>
| | <aggregate_function>
| | <window_function>
|
|<normal_function> ::=
| !! See Built-in Scalar Functions !!
|
|<aggregate_function> ::=
| !! See Aggregate Functions !!
|
|<window_function> ::=
| !! See Window Functions !!
SELECT
Columns ListArgument | Description |
---|---|
qualifier | Name of relation (view, stored procedure, derived table); or an alias for it |
collation | Only for character-type columns: a collation name that exists and is valid for the character set of the data |
alias | Column or field alias |
col_name | Name of a table or view column |
selectable_SP_outparm | Declared name of an output parameter of a selectable stored procedure |
literal | A literal |
context-variable | Context variable |
function-call | Scalar, aggregate, or window function expression |
single-value-subselect | A subquery returning one scalar value (singleton) |
CASE-construct | CASE construct setting conditions for a return value |
It is always valid to qualify a column name (or
) with the name or alias of the table, view or selectable SP to which it belongs, followed by a dot (*
).
For example, .
relationname.columnname
, relationname.*
, alias.columnname
, alias.*
.
Qualifying is required if the column name occurs in more than one relation taking part in a join.
Qualifying
is always mandatory if it is not the only item in the column list.*
Aliases hide the original relation name: once a table, view or procedure has been aliased, only the alias can be used as its qualifier throughout the query. The relation name itself becomes unavailable.
The column list may optionally be preceded by one of the keywords DISTINCT
or ALL
:
DISTINCT
filters out any duplicate rows. That is, if two or more rows have the same values in every corresponding column, only one of them is included in the result setALL
is the default: it returns all of the rows, including duplicates.ALL
is rarely used; it is supported for compliance with the SQL standard.
A COLLATE
clause will not change the appearance of the column as such.
However, if the specified collation changes the case or accent sensitivity of the column, it may influence:
The ordering, if an
ORDER BY
clause is also present and it involves that columnGrouping, if the column is part of a
GROUP BY
clauseThe rows retrieved (and hence the total number of rows in the result set), if
DISTINCT
is used
Examples of SELECT
queries with different types of column lists
A simple SELECT
using only column names:
|select cust_id, cust_name, phone
|from customers
|where city = 'London'
A query featuring a concatenation expression and a function call in the columns list:
|select 'Mr./Mrs. ' || lastname, street, zip, upper(city)
|from contacts
|where date_last_purchase(id) = current_date
A query with two subselects:
|select p.fullname,
|(select name from classes c where c.id = p.class) as class,
|(select name from mentors m where m.id = p.mentor) as mentor
|from pupils p
The following query accomplishes the same as the previous one using joins instead of subselects:
|select p.fullname,
|c.name as class,
|m.name as mentor
|join classes c on c.id = p.class
|from pupils p
|join mentors m on m.id = p.mentor
This query uses a CASE
construct to determine the correct title, e.g.
when sending mail to a person:
|select case upper(sex)
|when 'F' then 'Mrs.'
|when 'M' then 'Mr.'
|else ''
|end as title,
|lastname,
|address
|from employees
Query using a window function. Ranks employees by salary.
|SELECT
|id,
|salary,
|name ,
|DENSE_RANK() OVER (ORDER BY salary) AS EMP_RANK
|FROM employees
|ORDER BY salary;
Querying a selectable stored procedure:
|select * from interesting_transactions(2010, 3, 'S')
|order by amount
Selecting from columns of a derived table.
A derived table is a parenthesized SELECT
statement whose result set is used in an enclosing query as if it were a regular table or view.
The derived table is shown in bold here:
|select fieldcount,
|count(relation) as num_tables
|from (select r.rdb$relation_name as relation,
|count(*) as fieldcount
|from rdb$relations r
|join rdb$relation_fields rf
|on rf.rdb$relation_name = r.rdb$relation_name
|group by relation)
|group by fieldcount
Asking the time through a context variable (CURRENT_TIME
):
|select current_time from rdb$database
For those not familiar with RDB$DATABASE
: this is a system table that is present in all Firebird databases and is guaranteed to contain exactly one row.
Although it wasn’t created for this purpose, it has become standard practice among Firebird programmers to select from this table if you want to select from nothing
, i.e. if you need data that are not bound to a table or view, but can be derived from the expressions in the output columns alone.
Another example is:
|select power(12, 2) as twelve_squared, power(12, 3) as twelve_cubed
|from rdb$database
Finally, an example where you select some meaningful information from RDB$DATABASE
itself:
|select rdb$character_set_name from rdb$database
As you may have guessed, this will give you the default character set of the database.
See alsoFunctions, Aggregate Functions, Window Functions, Context Variables, CASE
, Subqueries
6.1.3. The FROM
clause
The FROM
clause specifies the source(s) from which the data are to be retrieved.
In its simplest form, this is just a single table or view.
However, the source can also be a selectable stored procedure, a derived table, or a common table expression.
Multiple sources can be combined using various types of joins.
This section focuses on single-source selects. Joins are discussed in a following section.
Syntax
|
SELECT
| ...
| FROM <source>
| [<joins>]
| [...]
|
|<source> ::=
| table [[AS] alias]
| | selectable-stored-procedure [(<args>)] [[AS] alias]
| | <derived-table>
| | LATERAL <derived-table>
|
|<derived-table> ::=
| (<select-statement>) [[AS] alias] [(<column-aliases>)]
|
|<column-aliases> ::= column-alias [, column-alias ...]
FROM
ClauseArgument | Description |
---|---|
table | Name of a table, view or CTE |
selectable-stored-procedure | Name of a selectable stored procedure |
args | Selectable stored procedure arguments |
derived-table | Derived table query expression |
select-statement | Any SELECT statement |
column-aliases | Alias for a column in a relation, CTE or derived table |
alias | The alias of a data source (table, view, procedure, CTE, derived table) |
6.1.3.1. Selecting FROM
a table or view
When selecting from a single table or view, the FROM
clause requires nothing more than the name.
An alias may be useful or even necessary if there are subqueries that refer to the main select statement (as they often do — subqueries like this are called correlated subqueries).
Examples
|select id, name, sex, age from actors
|where state = 'Ohio'
|select * from birds
|where type = 'flightless'
|order by family, genus, species
|select firstname,
|middlename,
|lastname,
|date_of_birth,
|(select name from schools s where p.school = s.id) schoolname
|from pupils p
|where year_started = '2012'
|order by schoolname, date_of_birth
If you specify an alias for a table or a view, you must always use this alias in place of the table name whenever you query the columns of the relation (and wherever else you make a reference to columns, such as ORDER BY
, GROUP BY
and WHERE
clauses).
Correct use:
|SELECT PEARS
|FROM FRUIT;
||
SELECT FRUIT.PEARS
|FROM FRUIT;
||
SELECT PEARS
|FROM FRUIT F;
||
SELECT F.PEARS
|FROM FRUIT F;
Incorrect use:
|SELECT FRUIT.PEARS
|FROM FRUIT F;
6.1.3.2. Selecting FROM
a stored procedure
A selectable stored procedure is a procedure that:
contains at least one output parameter, and
utilizes the
SUSPEND
keyword so the caller can fetch the output rows one by one, just as when selecting from a table or view.
The output parameters of a selectable stored procedure correspond to the columns of a regular table.
Selecting from a stored procedure without input parameters is just like selecting from a table or view:
|select * from suspicious_transactions
|where assignee = 'John'
Any required input parameters must be specified after the procedure name, enclosed in parentheses:
|select name, az, alt from visible_stars('Brugge', current_date, '22:30')
|where alt >= 20
|order by az, alt
Values for optional parameters (that is, parameters for which default values have been defined) may be omitted or provided. However, if you provide them only partly, the parameters you omit must all be at the tail end.
Supposing that the procedure visible_stars
from the previous example has two optional parameters: min_magn
(numeric(3,1)
) and spectral_class
(varchar(12)
), the following queries are all valid:
|select name, az, alt
|from visible_stars('Brugge', current_date, '22:30');
||
select name, az, alt
|from visible_stars('Brugge', current_date, '22:30', 4.0);
||
select name, az, alt
|from visible_stars('Brugge', current_date, '22:30', 4.0, 'G');
But this one isn’t, because there’s a hole
in the parameter list:
|select name, az, alt
|from visible_stars('Brugge', current_date, '22:30', 'G');
An alias for a selectable stored procedure is specified after the parameter list:
|select
|number,
|(select name from contestants c where c.number = gw.number)
|from get_winners('#34517', 'AMS') gw
If you refer to an output parameter (column
) by qualifying it with the full procedure name, the procedure alias should be omitted:
|select
|number,
|(select name from contestants c where c.number = get_winners.number)
|from get_winners('#34517', 'AMS')
See alsoStored Procedures, CREATE PROCEDURE
6.1.3.3. Selecting FROM
a derived table
A derived table is a valid SELECT
statement enclosed in parentheses, optionally followed by a table alias and/or column aliases.
The result set of the statement acts as a virtual table which the enclosing statement can query.
Syntax
|
(<select-query>)
| [[AS] derived-table-alias]
| [(<derived-column-aliases>)]
|
|<derived-column-aliases> := column-alias [, column-alias ...]
The set returned data set by this
style of statement is a virtual table that can be queried within the enclosing statement, as if it were a regular table or view.SELECT FROM (SELECT FROM..)
LATERAL
Derived Tables
The keyword LATERAL
marks a table as a lateral derived table.
Lateral derived tables can reference tables (including derived tables) that occur earlier in the FROM
clause.
See Section 6.1.4.7, “Joins with LATERAL
Derived Tables” for more information.
Example using a derived table
The derived table in the query below returns the list of table names in the database, and the number of columns in each table.
A drill-down
query on the derived table returns the counts of fields and the counts of tables having each field count:
|SELECT
|FIELDCOUNT,
|COUNT(RELATION) AS NUM_TABLES
|FROM (SELECT
|R.RDB$RELATION_NAME RELATION,
|COUNT(*) AS FIELDCOUNT
|FROM RDB$RELATIONS R
|JOIN RDB$RELATION_FIELDS RF
|ON RF.RDB$RELATION_NAME = R.RDB$RELATION_NAME
|GROUP BY RELATION)
|GROUP BY FIELDCOUNT
A trivial example demonstrating how the alias of a derived table and the list of column aliases (both optional) can be used:
|SELECT
|DBINFO.DESCR, DBINFO.DEF_CHARSET
|FROM (SELECT *
|FROM RDB$DATABASE) DBINFO
|(DESCR, REL_ID, SEC_CLASS, DEF_CHARSET)
Derived tables can
be nested
be unions, and can be used in unions
contain aggregate functions, subqueries and joins
be used in aggregate functions, subqueries and joins
be calls to selectable stored procedures or queries to them
have
WHERE
,ORDER BY
andGROUP BY
clauses,FIRST
/SKIP
orROWS
directives, et al.
Furthermore,
Each column in a derived table must have a name. If it does not have a name, such as when it is a constant or a run-time expression, it should be given an alias, either in the regular way or by including it in the list of column aliases in the derived table’s specification.
The list of column aliases is optional but, if it exists, it must contain an alias for every column in the derived table
The optimizer can process derived tables very effectively. However, if a derived table is included in an inner join and contains a subquery, the optimizer will be unable to use any join order.
A more useful example
Suppose we have a table COEFFS
which contains the coefficients of a number of quadratic equations we have to solve.
It has been defined like this:
|create table coeffs (
|a double precision not null,
|b double precision not null,
|c double precision not null,
|constraint chk_a_not_zero check (a <> 0)
|)
Depending on the values of a
, b
and c
, each equation may have zero, one or two solutions.
It is possible to find these solutions with a single-level query on table COEFFS
, but the code will look rather messy and several values (like the discriminant) will have to be calculated multiple times per row.
A derived table can help keep things clean here:
|select
|iif (D >= 0, (-b - sqrt(D)) / denom, null) sol_1,
|iif (D > 0, (-b + sqrt(D)) / denom, null) sol_2
|from
|(select b, b*b - 4*a*c, 2*a from coeffs) (b, D, denom)
If we want to show the coefficients next to the solutions (which may not be a bad idea), we can alter the query like this:
|select
|a, b, c,
|iif (D >= 0, (-b - sqrt(D)) / denom, null) sol_1,
|iif (D > 0, (-b + sqrt(D)) / denom, null) sol_2
|from
|(select a, b, c, b*b - 4*a*c as D, 2*a as denom
|from coeffs)
Notice that whereas the first query used a column aliases list for the derived table, the second adds aliases internally where needed. Both methods work, as long as every column is guaranteed to have a name.
All columns in the derived table will be evaluated as many times as they are specified in the main query. This is important, as it can lead to unexpected results when using non-deterministic functions. The following shows an example of this.
|SELECT
|UUID_TO_CHAR(X) AS C1,
|UUID_TO_CHAR(X) AS C2,
|UUID_TO_CHAR(X) AS C3
|FROM (SELECT GEN_UUID() AS X
|FROM RDB$DATABASE) T;
The result if this query produces three different values:
|C1 80AAECED-65CD-4C2F-90AB-5D548C3C7279
|C2 C1214CD3-423C-406D-B5BD-95BF432ED3E3
|C3 EB176C10-F754-4689-8B84-64B666381154
To ensure a single result of the GEN_UUID
function, you can use the following method:
|SELECT
|UUID_TO_CHAR(X) AS C1,
|UUID_TO_CHAR(X) AS C2,
|UUID_TO_CHAR(X) AS C3
|FROM (SELECT GEN_UUID() AS X
|FROM RDB$DATABASE
|UNION ALL
|SELECT NULL FROM RDB$DATABASE WHERE 1 = 0) T;
This query produces a single result for all three columns:
|C1 80AAECED-65CD-4C2F-90AB-5D548C3C7279
|C2 80AAECED-65CD-4C2F-90AB-5D548C3C7279
|C3 80AAECED-65CD-4C2F-90AB-5D548C3C7279
An alternative solution is to wrap the GEN_UUID
query in a subquery:
|SELECT
|UUID_TO_CHAR(X) AS C1,
|UUID_TO_CHAR(X) AS C2,
|UUID_TO_CHAR(X) AS C3
|FROM (SELECT
|(SELECT GEN_UUID() FROM RDB$DATABASE) AS X
|FROM RDB$DATABASE) T;
This is an artifact of the current implementation. This behaviour may change in a future Firebird version.
6.1.3.4. Selecting FROM
a Common Table Expression (CTE)
A common table expression — or CTE — is a more complex variant of the derived table, but it is also more powerful.
A preamble, starting with the keyword WITH
, defines one or more named CTE's, each with an optional column aliases list.
The main query, which follows the preamble, can then access these CTE's as if they were regular tables or views.
The CTE's go out of scope once the main query has run to completion.
For a full discussion of CTE's, please refer to the section Section 6.1.16, “Common Table Expressions (
)”.WITH … AS … SELECT
The following is a rewrite of our derived table example as a CTE:
|with vars (b, D, denom) as (
|select b, b*b - 4*a*c, 2*a from coeffs
|)
|select
|iif (D >= 0, (-b - sqrt(D)) / denom, null) sol_1,
|iif (D > 0, (-b + sqrt(D)) / denom, null) sol_2
|from vars
Except for the fact that the calculations that have to be made first are now at the beginning, this isn’t a great improvement over the derived table version.
However, we can now also eliminate the double calculation of sqrt(D)
for every row:
|with vars (b, D, denom) as (
|select b, b*b - 4*a*c, 2*a from coeffs
|),
|vars2 (b, D, denom, sqrtD) as (
|select b, D, denom, iif (D >= 0, sqrt(D), null) from vars
|)
|select
|iif (D >= 0, (-b - sqrtD) / denom, null) sol_1,
|iif (D > 0, (-b + sqrtD) / denom, null) sol_2
|from vars2
The code is a little more complicated now, but it might execute more efficiently (depending on what takes more time: executing the SQRT
function or passing the values of b
, D
and denom
through an extra CTE).
Incidentally, we could have done the same with derived tables, but that would involve nesting.
All columns in the CTE will be evaluated as many times as they are specified in the main query. This is important, as it can lead to unexpected results when using non-deterministic functions. The following shows an example of this.
|WITH T (X) AS (
|SELECT GEN_UUID()
|FROM RDB$DATABASE)
|SELECT
|UUID_TO_CHAR(X) as c1,
|UUID_TO_CHAR(X) as c2,
|UUID_TO_CHAR(X) as c3
|FROM T
The result if this query produces three different values:
|C1 80AAECED-65CD-4C2F-90AB-5D548C3C7279
|C2 C1214CD3-423C-406D-B5BD-95BF432ED3E3
|C3 EB176C10-F754-4689-8B84-64B666381154
To ensure a single result of the GEN_UUID
function, you can use the following method:
|WITH T (X) AS (
|SELECT GEN_UUID()
|FROM RDB$DATABASE
|UNION ALL
|SELECT NULL FROM RDB$DATABASE WHERE 1 = 0)
|SELECT
|UUID_TO_CHAR(X) as c1,
|UUID_TO_CHAR(X) as c2,
|UUID_TO_CHAR(X) as c3
|FROM T;
This query produces a single result for all three columns:
|C1 80AAECED-65CD-4C2F-90AB-5D548C3C7279
|C2 80AAECED-65CD-4C2F-90AB-5D548C3C7279
|C3 80AAECED-65CD-4C2F-90AB-5D548C3C7279
An alternative solution is to wrap the GEN_UUID
query in a subquery:
|WITH T (X) AS (
|SELECT (SELECT GEN_UUID() FROM RDB$DATABASE)
|FROM RDB$DATABASE)
|SELECT
|UUID_TO_CHAR(X) as c1,
|UUID_TO_CHAR(X) as c2,
|UUID_TO_CHAR(X) as c3
|FROM T;
This is an artifact of the current implementation. This behaviour may change in a future Firebird version.
See alsoSection 6.1.16, “Common Table Expressions (
)”.WITH … AS … SELECT
6.1.4. Joins
Joins combine data from two sources into a single set.
This is done on a row-by-row basis and usually involves checking a join condition in order to determine which rows should be merged and appear in the resulting dataset.
There are several types (INNER
, OUTER
) and classes (qualified, natural, etc.) of joins, each with its own syntax and rules.
Since joins can be chained, the datasets involved in a join may themselves be joined sets.
Syntax
|
SELECT
| ...
| FROM <source>
| [<joins>]
| [...]
|
|<source> ::=
| table [[AS] alias]
| | selectable-stored-procedure [(<args>)] [[AS] alias]
| | <derived-table>
| | LATERAL <derived-table>
|
|<joins> ::= <join> [<join> ...]
|
|<join> ::=
| [<join-type>] JOIN <source> <join-condition>
| | NATURAL [<join-type>] JOIN <source>
| | {CROSS JOIN | ,} <source>
|
|<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
|
|<join-condition> ::= ON <condition> | USING (<column-list>)
JOIN
ClausesArgument | Description |
---|---|
table | Name of a table, view or CTE |
selectable-stored-procedure | Name of a selectable stored procedure |
args | Selectable stored procedure input parameter(s) |
derived-table | Derived table query expression |
alias | An alias for a data source (table, view, procedure, CTE, derived table) |
condition | Join condition (criterion) |
column-list | The list of columns used for an equi-join |
6.1.4.1. Inner vs. Outer Joins
A join always combines data rows from two sets (usually referred to as the left set and the right set). By default, only rows that meet the join condition (i.e. that match at least one row in the other set when the join condition is applied) make it into the result set. This default type of join is called an inner join. Suppose we have the following two tables:
ID | S |
---|---|
87 | Just some text |
235 | Silence |
CODE | X |
---|---|
-23 | 56.7735 |
87 | 416.0 |
If we join these tables like this:
|select *
|from A
|join B on A.id = B.code;
then the result set will be:
ID | S | CODE | X |
---|---|---|---|
87 | Just some text | 87 | 416.0 |
The first row of A
has been joined with the second row of B
because together they met the condition
.
The other rows from the source tables have no match in the opposite set and are therefore not included in the join.
Remember, this is an A.id = B.code
INNER
join.
We can make that fact explicit by writing:
|select *
|from A
|inner join B on A.id = B.code;
However, since INNER
is the default, it is usually ommitted.
It is perfectly possible that a row in the left set matches several rows from the right set or vice versa. In that case, all those combinations are included, and we can get results like:
ID | S | CODE | X |
---|---|---|---|
87 | Just some text | 87 | 416.0 |
87 | Just some text | 87 | -1.0 |
-23 | Don’t know | -23 | 56.7735 |
-23 | Still don’t know | -23 | 56.7735 |
-23 | I give up | -23 | 56.7735 |
Sometimes we want (or need) all the rows of one or both of the sources to appear in the joined set, regardless of whether they match a record in the other source.
This is where outer joins come in.
A LEFT
outer join includes all the records from the left set, but only matching records from the right set.
In a RIGHT
outer join it’s the other way around.
FULL
outer joins include all the records from both sets.
In all outer joins, the holes
(the places where an included source record doesn’t have a match in the other set) are filled up with NULL
s.
In order to make an outer join, you must specify LEFT
, RIGHT
or FULL
, optionally followed by the keyword OUTER
.
Below are the results of the various outer joins when applied to our original tables A
and B
:
|select *
|from A
|left [outer] join B on A.id = B.code;
ID | S | CODE | X |
---|---|---|---|
87 | Just some text | 87 | 416.0 |
235 | Silence | <null> | <null> |
|select *
|from A
|right [outer] join B on A.id = B.code
ID | S | CODE | X |
---|---|---|---|
<null> | <null> | -23 | 56.7735 |
87 | Just some text | 87 | 416.0 |
|select *
|from A
|full [outer] join B on A.id = B.code
ID | S | CODE | X |
---|---|---|---|
<null> | <null> | -23 | 56.7735 |
87 | Just some text | 87 | 416.0 |
235 | Silence | <null> | <null> |
6.1.4.2. Qualified joins
Qualified joins specify conditions for the combining of rows.
This happens either explicitly in an ON
clause or implicitly in a USING
clause.
Syntax
|
<qualified-join> ::= [<join-type>] JOIN <source> <join-condition>
|
|<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
|
|<join-condition> ::= ON <condition> | USING (<column-list>)
6.1.4.2.1. Explicit-condition joins
Most qualified joins have an ON
clause, with an explicit condition that can be any valid Boolean expression, but usually involves some comparison between the two sources involved.
Quite often, the condition is an equality test (or a number of AND
ed equality tests) using the
operator.
Joins like these are called equi-joins.
(The examples in the section on inner and outer joins were al equi-joins.)=
Examples of joins with an explicit condition:
|/* Select all Detroit customers who made a purchase
|in 2013, along with the purchase details: */
|select * from customers c
|join sales s on s.cust_id = c.id
|where c.city = 'Detroit' and s.year = 2013;
|/* Same as above, but include non-buying customers: */
|select * from customers c
|left join sales s on s.cust_id = c.id
|where c.city = 'Detroit' and s.year = 2013;
|/* For each man, select the women who are taller than he.
|Men for whom no such woman exists are not included. */
|select m.fullname as man, f.fullname as woman
|from males m
|join females f on f.height > m.height;
|/* Select all pupils with their class and mentor.
|Pupils without a mentor are also included.
|Pupils without a class are not included. */
|select p.firstname, p.middlename, p.lastname,
|c.name, m.name
|from pupils p
|join classes c on c.id = p.class
|left join mentors m on m.id = p.mentor;
6.1.4.2.2. Named columns joins
Equi-joins often compare columns that have the same name in both tables. If this is the case, we can also use the second type of qualified join: the named columns join.
Named columns joins are not supported in Dialect 1 databases.
Named columns joins have a USING
clause which states just the column names.
So instead of this:
|select * from flotsam f
|join jetsam j
|on f.sea = j.sea
|and f.ship = j.ship;
we can also write:
|select * from flotsam
|join jetsam using (sea, ship)
which is considerably shorter.
The result set is a little different though — at least when using
:SELECT *
The explicit-condition join — with the
ON
clause — will contain each of the columnsSEA
andSHIP
twice: once from tableFLOTSAM
, and once from tableJETSAM
. Obviously, they will have the same values.The named columns join — with the
USING
clause — will contain these columns only once.
If you want all the columns in the result set of the named columns join, set up your query like this:
|select f.*, j.*
|from flotsam f
|join jetsam j using (sea, ship);
This will give you the exact same result set as the explicit-condition join.
For an OUTER
named columns join, there’s an additional twist when using
or an unqualified column name from the SELECT *
USING
list:
If a row from one source set doesn’t have a match in the other but must still be included because of the LEFT
, RIGHT
or FULL
directive, the merged column in the joined set gets the non-NULL
value.
That is fair enough, but now you can’t tell whether this value came from the left set, the right set, or both.
This can be especially deceiving when the value came from the right hand set, because
always shows combined columns in the left hand part — even in the case of a *
RIGHT
join.
Whether this is a problem or not depends on the situation.
If it is, use the
approach shown above, with a.*, b.*
a
and b
the names or aliases of the two sources.
Or better yet, avoid
altogether in your serious queries and qualify all column names in joined sets.
This has the additional benefit that it forces you to think about which data you want to retrieve and where from.*
It is your responsibility to make sure the column names in the USING
list are of compatible types between the two sources.
If the types are compatible but not equal, the engine converts them to the type with the broadest range of values before comparing the values.
This will also be the data type of the merged column that shows up in the result set if
or the unqualified column name is used.
Qualified columns on the other hand will always retain their original data type.SELECT *
If, when joining by named columns, you are using a join column in the WHERE
clause, always use the qualified column name, otherwise an index on this column will not be used.
|SELECT 1 FROM t1 a JOIN t2 b USING (x) WHERE x = 0;
||
-- PLAN JOIN (A NATURAL , B INDEX (RDB$2))
However:
|SELECT 1 FROM t1 a JOIN t2 b USING (x) WHERE a.x = 0;
|-- PLAN JOIN (A INDEX (RDB$1), B INDEX (RDB$2))
||
SELECT 1 FROM t1 a JOIN t2 b USING (x) WHERE b.x = 0;
|-- PLAN JOIN (A INDEX (RDB$1), B INDEX (RDB$2))
The fact is, the unspecified column in this case is implicitly replaced by `COALESCE(a.x, b.x). This clever trick is used to disambiguate column names, but it also interferes with the use of the index.
6.1.4.3. Natural joins
Taking the idea of the named columns join a step further, a natural join performs an automatic equi-join on all the columns that have the same name in the left and right table. The data types of these columns must be compatible.
Natural joins are not supported in Dialect 1 databases.
Syntax
|
<natural-join> ::= NATURAL [<join-type>] JOIN <source>
|
|<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
Given these two tables:
|create table TA (
|a bigint,
|s varchar(12),
|ins_date date
|);
|create table TB (
|a bigint,
|descr varchar(12),
|x float,
|ins_date date
|);
A natural join on TA
and TB
would involve the columns a
and ins_date
, and the following two statements would have the same effect:
|select * from TA
|natural join TB;
|select * from TA
|join TB using (a, ins_date);
Like all joins, natural joins are inner joins by default, but you can turn them into outer joins by specifying LEFT
, RIGHT
or FULL
before the JOIN
keyword.
If there are no columns with the same name in the two source relations, a CROSS JOIN
is performed.
We’ll get to this type of join in a minute.
6.1.4.4. Cross joins
A cross join produces the full set product of the two data sources. This means that it successfully matches every row in the left source to every row in the right source.
Syntax
|
<cross-join> ::= {CROSS JOIN | ,} <source>
Use of the comma syntax is discouraged, and we recommend using the explicit join syntax.
Cross-joining two sets is equivalent to joining them on a tautology (a condition that is always true). The following two statements have the same effect:
|select * from TA
|cross join TB;
|select * from TA
|join TB on 1 = 1;
Cross joins are inner joins, because they only include matching records – it just so happens that every record matches! An outer cross join, if it existed, wouldn’t add anything to the result, because what outer joins add are non-matching records, and these don’t exist in cross joins.
Cross joins are seldom useful, except if you want to list all the possible combinations of two or more variables. Suppose you are selling a product that comes in different sizes, different colors and different materials. If these variables are each listed in a table of their own, this query would return all the combinations:
|select m.name, s.size, c.name
|from materials m
|cross join sizes s
|cross join colors c;
6.1.4.4.1. Implicit Joins
In the SQL:89 standard, the tables involved in a join were specified as a comma-delimited list in the FROM
clause (in other words, a cross join).
The join conditions were then specified in the WHERE
clause among other search terms.
This type of join is called an implicit join.
An example of an implicit join:
|/*
|* A sample of all Detroit customers who
|* made a purchase.
|*/
|SELECT *
|FROM customers c, sales s
|WHERE s.cust_id = c.id AND c.city = 'Detroit'
6.1.4.4.2. Mixing Explicit and Implicit Joins
Mixing explicit and implicit joins is not recommend, but is allowed. However, some types of mixing are not supported by Firebird.
For example, the following query will raise the error Column does not belong to referenced table
|SELECT *
|FROM TA, TB
|JOIN TC ON TA.COL1 = TC.COL1
|WHERE TA.COL2 = TB.COL2
That is because the explicit join cannot see the TA
table.
However, the next query will complete without error, since the restriction is not violated.
|SELECT *
|FROM TA, TB
|JOIN TC ON TB.COL1 = TC.COL1
|WHERE TA.COL2 = TB.COL2
6.1.4.5. A Note on Equality
This note about equality and inequality operators applies everywhere in Firebird’s SQL language, not just in JOIN
conditions.
The
operator, which is explicitly used in many conditional joins and implicitly in named column joins and natural joins, only matches values to values.
According to the SQL standard, =
NULL
is not a value and hence two NULL
s are neither equal nor unequal to one another.
If you need NULL
s to match each other in a join, use the IS NOT DISTINCT FROM
operator.
This operator returns true if the operands have the same value or if they are both NULL
.
|select *
|from A join B
|on A.id is not distinct from B.code;
Likewise, in the — extremely rare — cases where you want to join on inequality, use IS DISTINCT FROM
, not
, if you want <>
NULL
to be considered different from any value and two NULL
s considered equal:
|select *
|from A join B
|on A.id is distinct from B.code;
6.1.4.6. Ambiguous field names in joins
Firebird rejects unqualified field names in a query if these field names exist in more than one dataset involved in a join.
This is even true for inner equi-joins where the field name figures in the ON
clause like this:
|select a, b, c
|from TA
|join TB on TA.a = TB.a;
There is one exception to this rule: with named columns joins and natural joins, the unqualified field name of a column taking part in the matching process may be used legally and refers to the merged column of the same name.
For named columns joins, these are the columns listed in the USING
clause.
For natural joins, they are the columns that have the same name in both relations.
But please notice again that, especially in outer joins, plain colname
isn’t always the same as left.colname
or right.colname
.
Types may differ, and one of the qualified columns may be NULL
while the other isn’t.
In that case, the value in the merged, unqualified column may mask the fact that one of the source values is absent.
6.1.4.7. Joins with LATERAL
Derived Tables
A derived table defined with the LATERAL
keyword is called a lateral derived table.
If a derived table is defined as lateral, then it is allowed to refer to other tables in the same FROM
clause, but only those declared before it in the FROM
clause.
Lateral Derived Table Examples
|
/* select customers with their last order date and number */
|select c.name, ox.order_date as last_order, ox.number
|from customer c
| left join LATERAL (
| select first 1 o.order_date, o.number
| from orders o
| where o.id_customer = c.id
| order by o.ORDER_DATE desc
| ) as ox on true
|--
|select dt.population, dt.city_name, c.country_name
|from (select distinct country_name from cities) AS c
| cross join LATERAL (
| select first 1 city_name, population
| from cities
| where cities.country_name = c.country_name
| order by population desc
| ) AS dt;
|--
|select salespeople.name,
| max_sale.amount,
| customer_of_max_sale.customer_name
|from salespeople,
| LATERAL ( select max(amount) as amount
| from all_sales
| where all_sales.salesperson_id = salespeople.id
| ) as max_sale,
| LATERAL ( select customer_name
| from all_sales
| where all_sales.salesperson_id = salespeople.id
| and all_sales.amount = max_sale.amount
| ) as customer_of_max_sale;
6.1.5. The WHERE
clause
The WHERE
clause serves to limit the rows returned to the ones that the caller is interested in.
The condition following the keyword WHERE
can be as simple as a check like
or it can be a multilayered, convoluted expression containing subselects, predicates, function calls, mathematical and logical operators, context variables and more.AMOUNT = 3
The condition in the WHERE
clause is often called the search condition, the search expression or simply the search.
In DSQL and ESQL, the search expression may contain parameters.
This is useful if a query has to be repeated a number of times with different input values.
In the SQL string as it is passed to the server, question marks are used as placeholders for the parameters.
They are called positional parameters because they can only be told apart by their position in the string.
Connectivity libraries often support named parameters of the form :id
, :amount
, :a
etc.
These are more user-friendly;
the library takes care of translating the named parameters to positional parameters before passing the statement to the server.
The search condition may also contain local (PSQL) or host (ESQL) variable names, preceded by a colon.
Syntax
|
SELECT ...
| FROM ...
| [...]
| WHERE <search-condition>
| [...]
WHERE
Parameter | Description |
---|---|
search-condition | A Boolean expression returning TRUE, FALSE or possibly UNKNOWN (NULL) |
Only those rows for which the search condition evaluates to TRUE
are included in the result set.
Be careful with possible NULL
outcomes: if you negate a NULL
expression with NOT
, the result will still be NULL
and the row will not pass.
This is demonstrated in one of the examples below.
Examples
|select genus, species from mammals
|where family = 'Felidae'
|order by genus;
|select * from persons
|where birthyear in (1880, 1881)
|or birthyear between 1891 and 1898;
|select name, street, borough, phone
|from schools s
|where exists (select * from pupils p where p.school = s.id)
|order by borough, street;
|select * from employees
|where salary >= 10000 and position <> 'Manager';
|select name from wrestlers
|where region = 'Europe'
|and weight > all (select weight from shot_putters
|where region = 'Africa');
|select id, name from players
|where team_id = (select id from teams where name = 'Buffaloes');
|select sum (population) from towns
|where name like '%dam'
|and province containing 'land';
|select password from usertable
|where username = current_user;
The following example shows what can happen if the search condition evaluates to NULL
.
Suppose you have a table listing some children’s names and the number of marbles they possess. At a certain moment, the table contains these data:
CHILD | MARBLES |
---|---|
Anita | 23 |
Bob E. | 12 |
Chris | <null> |
Deirdre | 1 |
Eve | 17 |
Fritz | 0 |
Gerry | 21 |
Hadassah | <null> |
Isaac | 6 |
First, please notice the difference between NULL
and 0: Fritz is known to have no marbles at all, Chris’s and Hadassah’s marble counts are unknown.
Now, if you issue this SQL statement:
|select list(child) from marbletable where marbles > 10;
you will get the names Anita, Bob E., Eve and Gerry. These children all have more than 10 marbles.
If you negate the expression:
|select list(child) from marbletable where not marbles > 10
it’s the turn of Deirdre, Fritz and Isaac to fill the list. Chris and Hadassah are not included, because they aren’t known to have ten marbles or less. Should you change that last query to:
|select list(child) from marbletable where marbles <= 10;
the result will still be the same, because the expression NULL <= 10
yields UNKNOWN
.
This is not the same as TRUE
, so Chris and Hadassah are not listed.
If you want them listed with the poor
children, change the query to:
|select list(child) from marbletable
|where marbles <= 10 or marbles is null;
Now the search condition becomes true for Chris and Hadassah, because
obviously returns marbles is null
TRUE
in their case.
In fact, the search condition cannot be NULL
for anybody now.
Lastly, two examples of SELECT
queries with parameters in the search.
It depends on the application how you should define query parameters and even if it is possible at all.
Notice that queries like these cannot be executed immediately: they have to be prepared first.
Once a parameterized query has been prepared, the user (or calling code) can supply values for the parameters and have it executed many times, entering new values before every call.
How the values are entered and the execution started is up to the application.
In a GUI environment, the user typically types the parameter values in one or more text boxes and then clicks an Execute
, Run
or Refresh
button.
|select name, address, phone frome stores
|where city = ? and class = ?;
|select * from pants
|where model = :model and size = :size and color = :col;
The last query cannot be passed directly to the engine; the application must convert it to the other format first, mapping named parameters to positional parameters.
6.1.6. The GROUP BY
clause
GROUP BY
merges output rows that have the same combination of values in its item list into a single row.
Aggregate functions in the select list are applied to each group individually instead of to the dataset as a whole.
If the select list only contains aggregate columns or, more generally, columns whose values don’t depend on individual rows in the underlying set, GROUP BY
is optional.
When omitted, the final result set of will consist of a single row (provided that at least one aggregated column is present).
If the select list contains both aggregate columns and columns whose values may vary per row, the GROUP BY
clause becomes mandatory.
Syntax
|
SELECT ... FROM ...
| GROUP BY <grouping-item> [, <grouping-item> ...]
| [HAVING <grouped-row-condition>]
| ...
|
|<grouping-item> ::=
| <non-aggr-select-item>
| | <non-aggr-expression>
|
|<non-aggr-select-item> ::=
| column-copy
| | column-alias
| | column-position
GROUP BY
ClauseArgument | Description |
---|---|
non-aggr-expression | Any non-aggregating expression that is not included in the |
column-copy | A literal copy, from the |
column-alias | The alias, from the |
column-position | The position number, in the |
A general rule of thumb is that every non-aggregate item in the SELECT
list must also be in the GROUP BY
list.
You can do this in three ways:
By copying the item verbatim from the select list, e.g.
orclass
.'D:' || upper(doccode)
By specifying the column alias, if it exists.
By specifying the column position as an integer literal between 1 and the number of columns. Integer values resulting from expressions or parameter substitutions are simply invariables and will be used as such in the grouping. They will have no effect though, as their value is the same for each row.
If you group by a column position, the expression at that position is copied internally from the select list. If it concerns a subquery, that subquery will be executed again in the grouping phase. That is to say, grouping by the column position, rather than duplicating the subquery expression in the grouping clause, saves keystrokes and bytes, but it is not a way of saving processing cycles!
In addition to the required items, the grouping list may also contain:
Columns from the source table that are not in the select list, or non-aggregate expressions based on such columns. Adding such columns may further subdivide the groups. However, since these columns are not in the select list, you can’t tell which aggregated row corresponds to which value in the column. So, in general, if you are interested in this information, you also include the column or expression in the select list — which brings you back to the rule:
every non-aggregate column in the select list must also be in the grouping list
.Expressions that aren’t dependent on the data in the underlying set, e.g. constants, context variables, single-value non-correlated subselects etc. This is only mentioned for completeness, as adding such items is utterly pointless: they don’t affect the grouping at all.
Harmless but useless
items like these may also figure in the select list without being copied to the grouping list.
Examples
When the select list contains only aggregate columns, GROUP BY
is not mandatory:
|select count(*), avg(age) from students
|where sex = 'M';
This will return a single row listing the number of male students and their average age.
Adding expressions that don’t depend on values in individual rows of table STUDENTS
doesn’t change that:
|select count(*), avg(age), current_date from students
|where sex = 'M';
The row will now have an extra column showing the current date, but other than that, nothing fundamental has changed.
A GROUP BY
clause is still not required.
However, in both the above examples it is allowed. This is perfectly valid:
|select count(*), avg(age) from students
|where sex = 'M'
|group by class;
This will return a row for each class that has boys in it, listing the number of boys and their average age in that particular class.
(If you also leave the current_date
field in, this value will be repeated on every row, which is not very exciting.)
The above query has a major drawback though: it gives you information about the different classes, but it doesn’t tell you which row applies to which class.
In order to get that extra bit of information, the non-aggregate column CLASS
must be added to the select list:
|select class, count(*), avg(age) from students
|where sex = 'M'
|group by class;
Now we have a useful query.
Notice that the addition of column CLASS
also makes the GROUP BY
clause mandatory.
We can’t drop that clause anymore, unless we also remove CLASS
from the column list.
The output of our last query may look something like this:
CLASS | COUNT | AVG |
---|---|---|
2A | 12 | 13.5 |
2B | 9 | 13.9 |
3A | 11 | 14.6 |
3B | 12 | 14.4 |
… | … | … |
The headings COUNT
and AVG
are not very informative.
In a simple case like this, you might get away with that, but in general you should give aggregate columns a meaningful name by aliasing them:
|select class,
|count(*) as num_boys,
|avg(age) as boys_avg_age
|from students
|where sex = 'M'
|group by class;
As you may recall from the formal syntax of the columns list, the AS
keyword is optional.
Adding more non-aggregate (or rather: row-dependent) columns requires adding them to the GROUP BY
clause too.
For instance, you might want to see the above information for girls as well;
and you may also want to differentiate between boarding and day students:
|select class,
|sex,
|boarding_type,
|count(*) as number,
|avg(age) as avg_age
|from students
|group by class, sex, boarding_type;
This may give you the following result:
CLASS | SEX | BOARDING_TYPE | NUMBER | AVG_AGE |
---|---|---|---|---|
2A | F | BOARDING | 9 | 13.3 |
2A | F | DAY | 6 | 13.5 |
2A | M | BOARDING | 7 | 13.6 |
2A | M | DAY | 5 | 13.4 |
2B | F | BOARDING | 11 | 13.7 |
2B | F | DAY | 5 | 13.7 |
2B | M | BOARDING | 6 | 13.8 |
… | … | … | … | … |
Each row in the result set corresponds to one particular combination of the columns CLASS
, SEX
and BOARDING_TYPE
.
The aggregate results — number and average age — are given for each of these rather specific groups individually.
In a query like this, you don’t see a total for boys as a whole, or day students as a whole.
That’s the tradeoff: the more non-aggregate columns you add, the more you can pinpoint very specific groups, but the more you also lose sight of the general picture.
Of course, you can still obtain the coarser
aggregates through separate queries.
6.1.6.1. HAVING
Just as a WHERE
clause limits the rows in a dataset to those that meet the search condition, so the HAVING
sub-clause imposes restrictions on the aggregated rows in a grouped set.
HAVING
is optional, and can only be used in conjunction with GROUP BY
.
The condition(s) in the HAVING
clause can refer to:
Any aggregated column in the select list. This is the most widely used case.
Any aggregated expression that is not in the select list, but allowed in the context of the query. This is sometimes useful too.
Any column in the
GROUP BY
list. While legal, it is more efficient to filter on these non-aggregated data at an earlier stage: in theWHERE
clause.Any expression whose value doesn’t depend on the contents of the dataset (like a constant or a context variable). This is valid but utterly pointless, because it will either suppress the entire set or leave it untouched, based on conditions that have nothing to do with the set itself.
A HAVING
clause can not contain:
Non-aggregated column expressions that are not in the
GROUP BY
list.Column positions. An integer in the
HAVING
clause is just an integer.Column aliases –- not even if they appear in the
GROUP BY
clause!
Examples
Building on our earlier examples, this could be used to skip small groups of students:
|select class,
|count(*) as num_boys,
|avg(age) as boys_avg_age
|from students
|where sex = 'M'
|group by class
|having count(*) >= 5;
To select only groups that have a minimum age spread:
|select class,
|count(*) as num_boys,
|avg(age) as boys_avg_age
|from students
|where sex = 'M'
|group by class
|having max(age) - min(age) > 1.2;
Notice that if you’re really interested in this information, you’d normally include min(age)
and max(age)
-– or the expression
–- in the select list as well!max(age) - min(age)
To include only 3rd classes:
|select class,
|count(*) as num_boys,
|avg(age) as boys_avg_age
|from students
|where sex = 'M'
|group by class
|having class starting with '3';
Better would be to move this condition to the WHERE
clause:
|select class,
|count(*) as num_boys,
|avg(age) as boys_avg_age
|from students
|where sex = 'M' and class starting with '3'
|group by class;
6.1.7. The WINDOW
Clause
The WINDOW
clause defines one or more named Windows that can be referenced by window functions in the current query specification.
Syntax
|
<query_spec> ::=
| SELECT
| [<limit_clause>]
| [<distinct_clause>]
| <select_list>
| <from_clause>
| [<where_clause>]
| [<group_clause>]
| [<having_clause>]
| [<named_windows_clause>]
| [<plan_clause>]
|
|<named_windows_clause> ::=
| WINDOW <window_definition> [, <window_definition> ...]
|
|<window definition> ::=
| new_window_name AS <window_specification>
|
|<window_specification> ::=
| !! See Window (Analytical) Functions !!
In a query with multiple SELECT
and WINDOW
clauses (for example, with subqueries), the scope of the `new_window_name_ is confined to its query context.
That means a window name from an inner context cannot be used in an outer context, nor vice versa.
However, the same window name can be used independently in different contexts, though to avoid confusion it might be better to avoid this.
For more information, see Chapter 10, Window (Analytical) Functions.
Example Using Named Windows
|select
|id,
|department,
|salary,
|count(*) over w1,
|first_value(salary) over w2,
|last_value(salary) over w2
|from employee
|window w1 as (partition by department),
|w2 as (w1 order by salary)
|order by department, salary;
6.1.8. The PLAN
Clause
The PLAN
clause enables the user to submit a data retrieval plan, thus overriding the plan that the optimizer would have generated automatically.
Syntax
|
PLAN <plan-expr>
|
|<plan-expr> ::=
| (<plan-item> [, <plan-item> ...])
| | <sorted-item>
| | <joined-item>
| | <merged-item>
| | <hash-item>
|
|<sorted-item> ::= SORT (<plan-item>)
|
|<joined-item> ::=
| JOIN (<plan-item>, <plan-item> [, <plan-item> ...])
|
|<merged-item> ::=
| [SORT] MERGE (<sorted-item>, <sorted-item> [, <sorted-item> ...])
|
|<hash-item> ::=
| HASH (<plan-item>, <plan-item> [, <plan-item> ...])
|
|<plan-item> ::= <basic-item> | <plan-expr>
|
|<basic-item> ::=
| <relation> { NATURAL
| | INDEX (<indexlist>)
| | ORDER index [INDEX (<indexlist>)] }
|
|<relation> ::= table | view [table]
|
|<indexlist> ::= index [, index ...]
PLAN
ClauseArgument | Description |
---|---|
table | Table name or its alias |
view | View name |
index | Index name |
Every time a user submits a query to the Firebird engine, the optimizer computes a data retrieval strategy.
Most Firebird clients can make this retrieval plan visible to the user.
In Firebird’s own isql
utility, this is done with the command SET PLAN ON
.
If you are studying query plans rather than running queries, SET PLANONLY ON
will show the plan without executing the query.
Use SET PLANONLY OFF
to execute the query and show the plan.
A more detailed plan can be obtained when you enable an advanced plan.
In isql this can be done with SET EXPLAIN ON
.
The advanced plan displayes more detailed information about the access methods used by the optimizer, however it cannot be included in the PLAN
clause of a statement.
The description of the advanced plan is beyond the scope of this Language Reference.
In most situations, you can trust that Firebird will select the optimal query plan for you. However, if you have complicated queries that seem to be underperforming, it may very well be worth your while to examine the plan and see if you can improve on it.
6.1.8.1. Simple plans
The simplest plans consist of just a relation name followed by a retrieval method.
For example, for an unsorted single-table select without a WHERE
clause:
|select * from students
|plan (students natural);
Advanced plan:
|Select Expression
|-> Table "STUDENTS" Full Scan
If there’s a WHERE
or a HAVING
clause, you can specify the index to be used for finding matches:
|select * from students
|where class = '3C'
|plan (students index (ix_stud_class));
Advanced plan:
|Select Expression
|-> Filter
|-> Table "STUDENTS" Access By ID
|-> Bitmap
|-> Index "IX_STUD_CLASS" Range Scan (full match)
The INDEX
directive is also used for join conditions (to be discussed a little later).
It can contain a list of indexes, separated by commas.
ORDER
specifies the index for sorting the set if an ORDER BY
or GROUP BY
clause is present:
|select * from students
|plan (students order pk_students)
|order by id;
Advanced plan:
|Select Expression
|-> Table "STUDENTS" Access By ID
|-> Index "PK_STUDENTS" Full Scan
ORDER
and INDEX
can be combined:
|select * from students
|where class >= '3'
|plan (students order pk_students index (ix_stud_class))
|order by id;
Advanced plan:
|Select Expression
|-> Filter
|-> Table "STUDENTS" Access By ID
|-> Index "PK_STUDENTS" Full Scan
|-> Bitmap
|-> Index "IX_STUD_CLASS" Range Scan (lower bound: 1/1)
It is perfectly OK if ORDER
and INDEX
specify the same index:
|select * from students
|where class >= '3'
|plan (students order ix_stud_class index (ix_stud_class))
|order by class;
Advanced plan:
|Select Expression
|-> Filter
|-> Table "STUDENTS" Access By ID
|-> Index "IX_STUD_CLASS" Range Scan (lower bound: 1/1)
|-> Bitmap
|-> Index "IX_STUD_CLASS" Range Scan (lower bound: 1/1)
For sorting sets when there’s no usable index available (or if you want to suppress its use), leave out ORDER
and prepend the plan expression with SORT
:
|select * from students
|plan sort (students natural)
|order by name;
Advanced plan:
|Select Expression
|-> Sort (record length: 128, key length: 56)
|-> Table "STUDENTS" Full Scan
Or when an index is used for the search:
|select * from students
|where class >= '3'
|plan sort (students index (ix_stud_class))
|order by name;
Advanced plan:
|elect Expression
|-> Sort (record length: 136, key length: 56)
|-> Filter
|-> Table "STUDENTS" Access By ID
|-> Bitmap
|-> Index "IX_STUD_CLASS" Range Scan (lower bound: 1/1)
Notice that SORT
, unlike ORDER
, is outside the parentheses.
This reflects the fact that the data rows are retrieved unordered and sorted afterwards by the engine.
When selecting from a view, specify the view and the table involved.
For instance, if you have a view FRESHMEN
that selects just the first-year students:
|select * from freshmen
|plan (freshmen students natural);
Advanced plan:
|Select Expression
|-> Table "STUDENTS" as "FRESHMEN" Full Scan
Or, for instance:
|select * from freshmen
|where id > 10
|plan sort (freshmen students index (pk_students))
|order by name desc;
Advanced plan:
|Select Expression
|-> Sort (record length: 144, key length: 24)
|-> Filter
|-> Table "STUDENTS" as "FRESHMEN" Access By ID
|-> Bitmap
|-> Index "PK_STUDENTS" Range Scan (lower bound: 1/1)
If a table or view has been aliased, it is the alias, not the original name, that must be used in the PLAN
clause.
6.1.8.2. Composite plans
When a join is made, you can specify the index which is to be used for matching.
You must also use the JOIN
directive on the two streams in the plan:
|select s.id, s.name, s.class, c.mentor
|from students s
|join classes c on c.name = s.class
|plan join (s natural, c index (pk_classes));
Advanced plan:
|Select Expression
|-> Nested Loop Join (inner)
|-> Table "STUDENTS" as "S" Full Scan
|-> Filter
|-> Table "CLASSES" as "C" Access By ID
|-> Bitmap
|-> Index "PK_CLASSES" Unique Scan
The same join, sorted on an indexed column:
|select s.id, s.name, s.class, c.mentor
|from students s
|join classes c on c.name = s.class
|plan join (s order pk_students, c index (pk_classes))
|order by s.id;
Advanced plan:
|Select Expression
|-> Nested Loop Join (inner)
|-> Table "STUDENTS" as "S" Access By ID
|-> Index "PK_STUDENTS" Full Scan
|-> Filter
|-> Table "CLASSES" as "C" Access By ID
|-> Bitmap
|-> Index "PK_CLASSES" Unique Scan
And on a non-indexed column:
|select s.id, s.name, s.class, c.mentor
|from students s
|join classes c on c.name = s.class
|plan sort (join (s natural, c index (pk_classes)))
|order by s.name;
Advanced plan:
|Select Expression
|-> Sort (record length: 152, key length: 12)
|-> Nested Loop Join (inner)
|-> Table "STUDENTS" as "S" Full Scan
|-> Filter
|-> Table "CLASSES" as "C" Access By ID
|-> Bitmap
|-> Index "PK_CLASSES" Unique Scan
With a search condition added:
|select s.id, s.name, s.class, c.mentor
|from students s
|join classes c on c.name = s.class
|where s.class <= '2'
|plan sort (join (s index (fk_student_class), c index (pk_classes)))
|order by s.name;
Advanced plan:
|Select Expression
|-> Sort (record length: 152, key length: 12)
|-> Nested Loop Join (inner)
|-> Filter
|-> Table "STUDENTS" as "S" Access By ID
|-> Bitmap
|-> Index "FK_STUDENT_CLASS" Range Scan (lower bound: 1/1)
|-> Filter
|-> Table "CLASSES" as "C" Access By ID
|-> Bitmap
|-> Index "PK_CLASSES" Unique Scan
As a left outer join:
|select s.id, s.name, s.class, c.mentor
|from classes c
|left join students s on c.name = s.class
|where s.class <= '2'
|plan sort (join (c natural, s index (fk_student_class)))
|order by s.name;
Advanced plan:
|Select Expression
|-> Sort (record length: 192, key length: 56)
|-> Filter
|-> Nested Loop Join (outer)
|-> Table "CLASSES" as "C" Full Scan
|-> Filter
|-> Table "STUDENTS" as "S" Access By ID
|-> Bitmap
|-> Index "FK_STUDENT_CLASS" Range Scan (full match)
If there are no indices available to match the join condition (or if you don’t want to use it), then it is possible connect the streams using HASH
or MERGE
method.
To connect using the HASH
method in the plan, the HASH
directive is used instead of the JOIN
directive.
In this case, the smaller (secondary) stream is materialized completely into an internal buffer.
While reading this secondary stream, a hash function is applied and a pair {hash, pointer to buffer} is written to a hash table.
Then the primary stream is read and its hash key is tested against the hash table.
|select *
|from students s
|join classes c on c.cookie = s.cookie
|plan hash (c natural, s natural)
Advanced plan:
|Select Expression
|-> Filter
|-> Hash Join (inner)
|-> Table "STUDENTS" as "S" Full Scan
|-> Record Buffer (record length: 145)
|-> Table "CLASSES" as "C" Full Scan
For a MERGE
join, the plan must first sort both streams on their join column(s) and then merge.
This is achieved with the SORT
directive (which we’ve already seen) and MERGE
instead of JOIN
:
|select * from students s
|join classes c on c.cookie = s.cookie
|plan merge (sort (c natural), sort (s natural));
Adding an ORDER BY
clause means the result of the merge must also be sorted:
|select * from students s
|join classes c on c.cookie = s.cookie
|plan sort (merge (sort (c natural), sort (s natural)))
|order by c.name, s.id;
Finally, we add a search condition on two indexable colums of table STUDENTS
:
|select * from students s
|join classes c on c.cookie = s.cookie
|where s.id < 10 and s.class <= '2'
|plan sort (merge (sort (c natural),
|sort (s index (pk_students, fk_student_class))))
|order by c.name, s.id;
As follows from the formal syntax definition, JOIN
s and MERGE
s in the plan may combine more than two streams.
Also, every plan expression may be used as a plan item in an encompassing plan.
This means that plans of certain complicated queries may have various nesting levels.
Finally, instead of MERGE
you may also write SORT MERGE
.
As this makes absolutely no difference and may create confusion with real
SORT
directives (the ones that do make a difference), it’s probably best to stick to plain MERGE
.
In addition to the plan for the main query, you can specify a plan for each subquery. For example, the following query with multiple plans will work:
|select *
|from color
|where exists (
|select *
|from hors
|where horse.code_color = color.code_color
|plan (horse index (fk_horse_color)))
|plan (color natural)
Occasionally, the optimizer will accept a plan and then not follow it, even though it does not reject it as invalid. One such example was
|MERGE (unsorted stream, unsorted stream)
It is advisable to treat such as plan as deprecated
.
6.1.9. UNION
The UNION
clause concatenates two or more datasets, thus increasing the number of rows but not the number of columns.
Datasets taking part in a UNION
must have the same number of columns, and columns at corresponding positions must be of the same type.
Other than that, they may be totally unrelated.
By default, a union suppresses duplicate rows.
UNION ALL
shows all rows, including any duplicates.
The optional DISTINCT
keyword makes the default behaviour explicit.
Syntax
|
<union> ::=
| <individual-select>
| UNION [{DISTINCT | ALL}]
| <individual-select>
| [
| [UNION [{DISTINCT | ALL}]
| <individual-select>
| ...
| ]
| [<union-wide-clauses>]
|
|<individual-select> ::=
| SELECT
| [TRANSACTION name]
| [FIRST m] [SKIP n]
| [{DISTINCT | ALL}] <columns>
| [INTO <host-varlist>]
| FROM <source> [[AS] alias]
| [<joins>]
| [WHERE <condition>]
| [GROUP BY <grouping-list>
| [HAVING <aggregate-condition>]]
| [PLAN <plan-expr>]
|
|<union-wide-clauses> ::=
| [ORDER BY <ordering-list>]
| [{ ROWS <m> [TO <n>]
| | [OFFSET n {ROW | ROWS}]
| [FETCH {FIRST | NEXT} [m] {ROW | ROWS} ONLY]
| }]
| [FOR UPDATE [OF <columns>]]
| [WITH LOCK]
| [INTO <PSQL-varlist>]
Unions take their column names from the first select query.
If you want to alias union columns, do so in the column list of the topmost SELECT
.
Aliases in other participating selects are allowed and may even be useful, but will not propagate to the union level.
If a union has an ORDER BY
clause, the only allowed sort items are integer literals indicating 1-based column positions, optionally followed by an ASC
/DESC
and/or a NULLS {FIRST | LAST}
directive.
This also implies that you cannot order a union by anything that isn’t a column in the union.
(You can, however, wrap it in a derived table, which gives you back all the usual sort options.)
Unions are allowed in subqueries of any kind and can themselves contain subqueries. They can also contain joins, and can take part in a join when wrapped in a derived table.
Examples
This query presents information from different music collections in one dataset using unions:
|select id, title, artist, length, 'CD' as medium
|from cds
|union
|select id, title, artist, length, 'LP'
|from records
|union
|select id, title, artist, length, 'MC'
|from cassettes
|order by 3, 2 -- artist, title;
If id
, title
, artist
and length
are the only fields in the tables involved, the query can also be written as:
|select c.*, 'CD' as medium
|from cds c
|union
|select r.*, 'LP'
|from records r
|union
|select c.*, 'MC'
|from cassettes c
|order by 3, 2 -- artist, title;
Qualifying the stars
is necessary here because they are not the only item in the column list.
Notice how the c
aliases in the first and third select do not conflict with each other: their scopes are not union-wide but apply only to their respective select queries.
The next query retrieves names and phone numbers from translators and proofreaders.
Translators who also work as proofreaders will show up only once in the result set, provided their phone number is the same in both tables.
The same result can be obtained without DISTINCT
.
With ALL
, these people would appear twice.
|select name, phone from translators
|union distinct
|select name, telephone from proofreaders;
A UNION
within a subquery:
|select name, phone, hourly_rate from clowns
|where hourly_rate < all
|(select hourly_rate from jugglers
|union
|select hourly_rate from acrobats)
|order by hourly_rate;
6.1.10. ORDER BY
When a SELECT
statement is executed, the result set is not sorted in any way.
It often happens that rows appear to be sorted chronologically, simply because they are returned in the same order they were added to the table by INSERT
statements.
This is not something you should rely on: the order may change depending on the plan or updates to rows, etc.
To specify an explicit sorting order for the set specification, an ORDER BY
clause is used.
Syntax
|
SELECT ... FROM ...
|...
|ORDER BY <ordering-item> [, <ordering-item> …]
|
|<ordering-item> ::=
| {col-name | col-alias | col-position | <expression>}
| [COLLATE collation-name]
| [ASC[ENDING] | DESC[ENDING]]
| [NULLS {FIRST|LAST}]
ORDER BY
ClauseArgument | Description |
---|---|
col-name | Full column name |
col-alias | Column alias |
col-position | Column position in the |
expression | Any expression |
collation-name | Collation name (sorting order for string types) |
The ORDER BY
consists of a comma-separated list of the columns on which the result data set should be sorted.
The sort order can be specified by the name of the column — but only if the column was not previously aliased in the SELECT
columns list.
The alias must be used if it was used in the select list.
The ordinal position number of the column in the SELECT
column list, the alias given to the column in the SELECT
list with the help of the AS
keyword, or the number of the column in the SELECT
list can be used without restriction.
The three forms of expressing the columns for the sort order can be mixed in the same ORDER BY
clause.
For instance, one column in the list can be specified by its name and another column can be specified by its number.
If you sort by column position or alias, then the expression corresponding to this position (alias) will be copied from the SELECT
list.
This also applies to subqueries, thus, the subquery will be executed at least twice.
If you use the column position to specify the sort order for a query of the SELECT *
style, the server expands the asterisk to the full column list in order to determine the columns for the sort.
It is, however, considered sloppy practice
to design ordered sets this way.
6.1.10.1. Sorting Direction
The keyword ASCENDING
— usually abbreviated to ASC
— specifies a sort direction from lowest to highest.
ASCENDING
is the default sort direction.
The keyword DESCENDING
— usually abbreviated to DESC
— specifies a sort direction from highest to lowest.
Specifying ascending order for one column and descending order for another is allowed.
6.1.10.2. Collation Order
The keyword COLLATE
specifies the collation order for a string column if you need a collation that is different from the normal one for this column.
The normal collation order will be either the default one for the database character set, or the one set explicitly in the column’s definition.
6.1.10.3. NULLs Position
The keyword NULLS
defines where NULL in the associated column will fall in the sort order: NULLS FIRST
places the rows with the NULL
column above rows ordered by that column’s value;
NULLS LAST
places those rows after the ordered rows.
NULLS FIRST
is the default.
6.1.10.4. Ordering UNION
-ed Sets
The discrete queries contributing to a UNION
cannot take an ORDER BY
clause.
The only option is to order the entire output, using one ORDER BY
clause at the end of the overall query.
The simplest — and, in some cases, the only — method for specifying the sort order is by the ordinal column position. However, it is also valid to use the column names or aliases, from the first contributing query only.
The ASC
/DESC
and/or NULLS
directives are available for this global set.
If discrete ordering within the contributing set is required, use of derived tables or common table expressions for those sets may be a solution.
6.1.10.5. Examples of ORDER BY
Sorting the result set in ascending order, ordering by the RDB$CHARACTER_SET_ID
and RDB$COLLATION_ID
columns of the RDB$COLLATIONS
table:
|SELECT
|RDB$CHARACTER_SET_ID AS CHARSET_ID,
|RDB$COLLATION_ID AS COLL_ID,
|RDB$COLLATION_NAME AS NAME
|FROM RDB$COLLATIONS
|ORDER BY RDB$CHARACTER_SET_ID, RDB$COLLATION_ID;
The same, but sorting by the column aliases:
|SELECT
|RDB$CHARACTER_SET_ID AS CHARSET_ID,
|RDB$COLLATION_ID AS COLL_ID,
|RDB$COLLATION_NAME AS NAME
|FROM RDB$COLLATIONS
|ORDER BY CHARSET_ID, COLL_ID;
Sorting the output data by the column position numbers:
|SELECT
|RDB$CHARACTER_SET_ID AS CHARSET_ID,
|RDB$COLLATION_ID AS COLL_ID,
|RDB$COLLATION_NAME AS NAME
|FROM RDB$COLLATIONS
|ORDER BY 1, 2;
Sorting a SELECT *
query by position numbers — possible, but nasty and not recommended:
|SELECT *
|FROM RDB$COLLATIONS
|ORDER BY 3, 2;
Sorting by the second column in the BOOKS
table, or — if BOOKS
has only one column — the FILMS.DIRECTOR
column:
|SELECT
|BOOKS.*,
|FILMS.DIRECTOR
|FROM BOOKS, FILMS
|ORDER BY 2;
Sorting in descending order by the values of column PROCESS_TIME
, with NULL
s placed at the beginning of the set:
|SELECT *
|FROM MSG
|ORDER BY PROCESS_TIME DESC NULLS FIRST;
Sorting the set obtained by a UNION
of two queries.
Results are sorted in descending order for the values in the second column, with NULL
s at the end of the set;
and in ascending order for the values of the first column with NULL
s at the beginning.
|SELECT
|DOC_NUMBER, DOC_DATE
|FROM PAYORDER
|UNION ALL
|SELECT
|DOC_NUMBER, DOC_DATE
|FROM BUDGORDER
|ORDER BY 2 DESC NULLS LAST, 1 ASC NULLS FIRST;
6.1.11. ROWS
Used forRetrieving a slice of rows from an ordered set
Available inDSQL, PSQL
Syntax
|
SELECT <columns> FROM ...
| [WHERE ...]
| [ORDER BY ...]
| ROWS m [TO n]
ROWS
ClauseArgument | Description |
---|---|
m, n | Any integer expressions |
ROWS
is non-standard syntaxROWS
is a Firebird-specific clause.
Use the SQL-standard Section 6.1.12, “OFFSET
, FETCH
” syntax wherever possible.
Limits the amount of rows returned by the SELECT
statement to a specified number or range.
The ROWS
clause also does the same job as the FIRST
and SKIP
clauses, but neither are SQL-compliant.
Unlike FIRST
and SKIP
, and OFFSET
and FETCH
, the ROWS
and TO
clauses accept any type of integer expression as their arguments, without parentheses.
Of course, parentheses may still be needed for nested evaluations inside the expression, and a subquery must always be enclosed in parentheses.
Numbering of rows in the intermediate set — the overall set cached on disk before the
slice
is extracted — starts at 1.OFFSET
/FETCH
,FIRST
/SKIP
, andROWS
can all be used without theORDER BY
clause, although it rarely makes sense to do so — except perhaps when you want to take a quick look at the table data and don’t care that rows will be in a non-deterministic order. For this purpose, a query like
would return the first 20 rows instead of a whole table that might be rather big.SELECT * FROM TABLE1 ROWS 20
Calling ROWS m
retrieves the first m records from the set specified.
Characteristics of using ROWS m
without a TO
clause:
If m is greater than the total number of records in the intermediate data set, the entire set is returned
If m = 0, an empty set is returned
If m < 0, the
SELECT
statement call fails with an error
Calling ROWS m TO n
retrieves the rows from the set, starting at row m and ending after row n — the set is inclusive.
Characteristics of using ROWS m
with a TO
clause:
If m is greater than the total number of rows in the intermediate set and n >= m, an empty set is returned
If m is not greater than n and n is greater than the total number of rows in the intermediate set, the result set will be limited to rows starting from m, up to the end of the set
If m < 1 and n < 1, the
SELECT
statement call fails with an errorIf n = m - 1, an empty set is returned
If n < m - 1, the
SELECT
statement call fails with an error
Using a TO
clause without a ROWS
clause:
While ROWS
replaces the FIRST
and SKIP
syntax, there is one situation where the ROWS
syntax does not provide the same behaviour: specifying SKIP n
on its own returns the entire intermediate set, without the first n rows.
The ROWS … TO
syntax needs a little help to achieve this.
With the ROWS
syntax, you need a ROWS
clause in association with the TO
clause and deliberately make the second (n) argument greater than the size of the intermediate data set.
This is achieved by creating an expression for n that uses a subquery to retrieve the count of rows in the intermediate set and adds 1 to it.
6.1.11.1. Replacing of FIRST
/SKIP
and OFFSET
/FETCH
The ROWS
clause can be used instead of the SQL-standard OFFSET
/FETCH
or non-standard FIRST
/SKIP
clauses, except the case where only OFFSET
or SKIP
is used, that is when the whole result set is returned except for skipping the specified number of rows from the beginning.
In order to implement this behaviour using ROWS
, you must specify the TO
clause with a value larger than the size of the returned result set.
6.1.11.2. Mixing ROWS
and FIRST
/SKIP
or OFFSET
/FETCH
ROWS
syntax cannot be mixed with FIRST
/SKIP
or OFFSET
/FETCH
in the same SELECT
expression.
Using the different syntaxes in different subqueries in the same statement is allowed.
6.1.11.3. ROWS
Syntax in UNION
Queries
When ROWS
is used in a UNION
query, the ROWS
directive is applied to the unioned set and must be placed after the last SELECT
statement.
If a need arises to limit the subsets returned by one or more SELECT
statements inside UNION
, there are a couple of options:
Use
FIRST
/SKIP
syntax in theseSELECT
statements — bearing in mind that an ordering clause (ORDER BY
) cannot be applied locally to the discrete queries, but only to the combined output.Convert the queries to derived tables with their own
ROWS
clauses.
6.1.11.4. Examples of ROWS
The following examples rewrite the examples used in the section about FIRST
and SKIP
, earlier in this chapter.
Retrieve the first ten names from the output of a sorted query on the PEOPLE
table:
|SELECT id, name
|FROM People
|ORDER BY name ASC
|ROWS 1 TO 10;
or its equivalent
|SELECT id, name
|FROM People
|ORDER BY name ASC
|ROWS 10;
Return all records from the PEOPLE
table except for the first 10 names:
|SELECT id, name
|FROM People
|ORDER BY name ASC
|ROWS 11 TO (SELECT COUNT(*) FROM People);
And this query will return the last 10 records (pay attention to the parentheses):
|SELECT id, name
|FROM People
|ORDER BY name ASC
|ROWS (SELECT COUNT(*) - 9 FROM People)
|TO (SELECT COUNT(*) FROM People);
This one will return rows 81-100 from the PEOPLE
table:
|SELECT id, name
|FROM People
|ORDER BY name ASC
|ROWS 81 TO 100;
See alsoSection 6.1.1, “FIRST
, SKIP
”, Section 6.1.12, “OFFSET
, FETCH
”
6.1.12. OFFSET
, FETCH
Used forRetrieving a slice of rows from an ordered set
Available inDSQL, PSQL
Syntax
|
SELECT <columns> FROM ...
| [WHERE ...]
| [ORDER BY ...]
| [OFFSET <m> {ROW | ROWS}]
| [FETCH {FIRST | NEXT} [ <n> ] { ROW | ROWS } ONLY]
|
|<m>, <n> ::=
| <integer-literal>
| | <query-parameter>
OFFSET
and FETCH
ClauseArgument | Description |
---|---|
integer-literal | Integer literal |
query-parameter | Query parameter place-holder.
|
The OFFSET
and FETCH
clauses are an SQL:2008 compliant equivalent for FIRST
/SKIP
, and an alternative for ROWS
.
The OFFSET
clause specifies the number of rows to skip.
The FETCH
clause specifies the number of rows to fetch.
When <n> is left out of the FETCH
clause (eg FETCH FIRST ROW ONLY
), one row will be fetched.
The choice between ROW
or ROWS
, or FIRST
or NEXT
in the clauses is just for aesthetic purposes (eg making the query more readable or grammatically correct).
Technically there is no difference between OFFSET 10 ROW
or OFFSET 10 ROWS
, or FETCH NEXT 10 ROWS ONLY
or FETCH FIRST 10 ROWS ONLY
.
As with SKIP
and FIRST
, OFFSET
and FETCH
clauses can be applied independently, in both top-level and nested query expressions.
Firebird doesn’t support the percentage
FETCH
defined in the SQL standard.Firebird doesn’t support the
FETCH … WITH TIES
defined in the SQL standard.The
FIRST
/SKIP
andROWS
clause are non-standard alternatives.The
OFFSET
and/orFETCH
clauses cannot be combined withROWS
orFIRST
/SKIP
on the same query expression.Expressions, column references, etc are not allowed within either clause.
Contrary to the
ROWS
clause,OFFSET
andFETCH
are only available onSELECT
statements.
6.1.12.1. Examples of OFFSET
and FETCH
Return all rows except the first 10, ordered by column COL1
|
SELECT *
|FROM T1
|ORDER BY COL1
|OFFSET 10 ROWS
Return the first 10 rows, ordered by column COL1
|
SELECT *
|FROM T1
|ORDER BY COL1
|FETCH FIRST 10 ROWS ONLY
Using OFFSET
and FETCH
clauses in a derived table and in the outer query
|
SELECT *
|FROM (
| SELECT *
| FROM T1
| ORDER BY COL1 DESC
| OFFSET 1 ROW
| FETCH NEXT 10 ROWS ONLY
|) a
|ORDER BY a.COL1
|FETCH FIRST ROW ONLY
The following examples rewrite the FIRST
/SKIP
examples and ROWS
examples earlier in this chapter.
Retrieve the first ten names from the output of a sorted query on the PEOPLE
table:
|SELECT id, name
|FROM People
|ORDER BY name ASC
|FETCH NEXT 10 ROWS ONLY;
Return all records from the PEOPLE
table except for the first 10 names:
|SELECT id, name
|FROM People
|ORDER BY name ASC
|OFFSET 10 ROWS;
And this query will return the last 10 records.
Contrary to FIRST
/SKIP
and ROWS
we cannot use expressions (including sub-queries).
To retrieve the last 10 rows, reverse the sort to the first (last) 10 rows, and then sort in the right order.
|SELECT id, name
|FROM (
|SELECT id, name
|FROM People
|ORDER BY name DESC
|FETCH FIRST 10 ROWS ONLY
|) a
|ORDER BY name ASC;
This one will return rows 81-100 from the PEOPLE
table:
|SELECT id, name
|FROM People
|ORDER BY name ASC
|OFFSET 80 ROWS
|FETCH NEXT 20 ROWS;
See alsoSection 6.1.1, “FIRST
, SKIP
”, Section 6.1.11, “ROWS
”
6.1.13. FOR UPDATE [OF]
Syntax
|
SELECT ... FROM single_table
| [WHERE ...]
| [FOR UPDATE [OF <column_list>]]
FOR UPDATE
does not do what its name suggests.
It’s only effect currently is to disable the pre-fetch buffer.
It is likely to change in future: the plan is to validate cursors marked with FOR UPDATE
if they are truly updateable and reject positioned updates and deletes for cursors evaluated as non-updateable.
The OF
sub-clause does not do anything at all.
6.1.14. WITH LOCK
Used forLimited pessimistic locking
Available inDSQL, PSQL
Syntax
|
SELECT ... FROM single_table
| [WHERE ...]
| [FOR UPDATE [OF <column_list>]]
| WITH LOCK
WITH LOCK
provides a limited explicit pessimistic locking capability for cautious use in conditions where the affected row set is:
extremely small (ideally singleton), and
precisely controlled by the application code.
The need for a pessimistic lock in Firebird is very rare indeed and should be well understood before use of this extension is considered.
It is essential to understand the effects of transaction isolation and other transaction attributes before attempting to implement explicit locking in your application.
If the WITH LOCK
clause succeeds, it will secure a lock on the selected rows and prevent any other transaction from obtaining write access to any of those rows, or their dependants, until your transaction ends.
WITH LOCK
can only be used with a top-level, single-table SELECT
statement.
It is not available:
in a subquery specification
for joined sets
with the
DISTINCT
operator, aGROUP BY
clause or any other aggregating operationwith a view
with the output of a selectable stored procedure
with an external table
with a
UNION
query
As the engine considers, in turn, each record falling under an explicit lock statement, it returns either the record version that is the most currently committed, regardless of database state when the statement was submitted, or an exception.
Wait behaviour and conflict reporting depend on the transaction parameters specified in the TPB block:
TPB mode | Behaviour |
---|---|
isc_tpb_consistency | Explicit locks are overridden by implicit or explicit table-level locks and are ignored. |
isc_tpb_concurrency + isc_tpb_nowait | If a record is modified by any transaction that was committed since the transaction attempting to get explicit lock started, or an active transaction has performed a modification of this record, an update conflict exception is raised immediately. |
isc_tpb_concurrency + isc_tpb_wait | If the record is modified by any transaction that has committed since the transaction attempting to get explicit lock started, an update conflict exception is raised immediately. If an active transaction is holding ownership on this record (via explicit locking or by a normal optimistic write-lock) the transaction attempting the explicit lock waits for the outcome of the blocking transaction and, when it finishes, attempts to get the lock on the record again. This means that, if the blocking transaction committed a modified version of this record, an update conflict exception will be raised. |
isc_tpb_read_committed + isc_tpb_nowait | If there is an active transaction holding ownership on this record (via explicit locking or normal update), an update conflict exception is raised immediately. |
isc_tpb_read_committed + isc_tpb_wait | If there is an active transaction holding ownership on this record (via explicit locking or by a normal optimistic write-lock), the transaction attempting the explicit lock waits for the outcome of blocking transaction and when it finishes, attempts to get the lock on the record again. Update conflict exceptions can never be raised by an explicit lock statement in this TPB mode. |
6.1.14.1. Usage with a FOR UPDATE
Clause
If the FOR UPDATE
sub-clause precedes the WITH LOCK
sub-clause, buffered fetches are suppressed.
Thus, the lock will be applied to each row, one by one, at the moment it is fetched.
It becomes possible, then, that a lock which appeared to succeed when requested will nevertheless fail subsequently, when an attempt is made to fetch a row which has become locked by another transaction in the meantime.
As an alternative, it may be possible in your access components to set the size of the fetch buffer to 1. This would enable you to process the currently-locked row before the next is fetched and locked, or to handle errors without rolling back your transaction.
OF <column_list>
This optional sub-clause does nothing at all.
See alsoFOR UPDATE [OF]
6.1.14.2. How the engine deals with WITH LOCK
When an UPDATE
statement tries to access a record that is locked by another transaction, it either raises an update conflict exception or waits for the locking transaction to finish, depending on TPB mode.
Engine behaviour here is the same as if this record had already been modified by the locking transaction.
No special gdscodes are returned from conflicts involving pessimistic locks.
The engine guarantees that all records returned by an explicit lock statement are actually locked and do meet the search conditions specified in WHERE
clause, as long as the search conditions do not depend on any other tables, via joins, subqueries, etc.
It also guarantees that rows not meeting the search conditions will not be locked by the statement.
It can not guarantee that there are no rows which, though meeting the search conditions, are not locked.
This situation can arise if other, parallel transactions commit their changes during the course of the locking statement’s execution.
The engine locks rows at fetch time.
This has important consequences if you lock several rows at once.
Many access methods for Firebird databases default to fetching output in packets of a few hundred rows (buffered fetches
).
Most data access components cannot bring you the rows contained in the last-fetched packet, when an error occurred.
6.1.14.3. Caveats using WITH LOCK
Rolling back of an implicit or explicit savepoint releases record locks that were taken under that savepoint, but it doesn’t notify waiting transactions. Applications should not depend on this behaviour as it may get changed in the future.
While explicit locks can be used to prevent and/or handle unusual update conflict errors, the volume of deadlock errors will grow unless you design your locking strategy carefully and control it rigorously.
Most applications do not need explicit locks at all. The main purposes of explicit locks are:
to prevent expensive handling of update conflict errors in heavily loaded applications, and
to maintain integrity of objects mapped to a relational database in a clustered environment.
If your use of explicit locking doesn’t fall in one of these two categories, then it’s the wrong way to do the task in Firebird.
Explicit locking is an advanced feature; do not misuse it! While solutions for these kinds of problems may be very important for web sites handling thousands of concurrent writers, or for ERP/CRM systems operating in large corporations, most application programs do not need to work in such conditions.
6.1.14.4. Examples using explicit locking
Simple:
|
SELECT * FROM DOCUMENT WHERE ID=? WITH LOCK;
Multiple rows, one-by-one processing with DSQL cursor:
|
SELECT * FROM DOCUMENT WHERE PARENT_ID=?
|FOR UPDATE WITH LOCK;
6.1.15. INTO
Used forPassing SELECT
output into variables
Available inPSQL
SyntaxIn PSQL the INTO
clause is placed at the very end of the SELECT
statement.
|SELECT [...] <column-list>
|FROM ...
|[...]
|[INTO <variable-list>]
||
<variable-list> ::= [:]psqlvar [, [:]psqlvar ...]
The colon prefix before local variable names in PSQL is optional in the INTO
clause.
In PSQL code (triggers, stored procedures and executable blocks), the results of a SELECT
statement can be loaded row-by-row into local variables.
It is often the only way to do anything with the returned values at all, unless an explicit or implicit cursor name is specified.
The number, order and types of the variables must match the columns in the output row.
A plain
SELECT
statement can only be used in PSQL if it returns at most one row, i.e., if it is a singleton select.
For multi-row selects, PSQL provides the FOR SELECT
loop construct, discussed later in the PSQL chapter.
PSQL also supports the DECLARE CURSOR
statement, which binds a named cursor to a SELECT
statement.
The cursor can then be used to walk the result set.
Examples
Selecting some aggregated values and passing them into previously declared variables
min_amt
,avg_amt
andmax_amt
:|
select min(amount), avg(cast(amount as float)), max(amount)
|from orders
|where artno = 372218
|into min_amt, avg_amt, max_amt;
ⓘNoteThe
CAST
serves to make the average a real number; otherwise, sinceamount
is presumably an integer field, SQL rules would truncate it to the nearest lower integer.A PSQL trigger that retrieves two values as a
BLOB
field (using theLIST()
function) and assigns itINTO
a third field:|
select list(name, ', ')
|from persons p
|where p.id in (new.father, new.mother)
|into new.parentnames;
6.1.16. Common Table Expressions (WITH … AS … SELECT
)
WITH … AS … SELECT
Available inDSQL, PSQL
Syntax
|
<cte-construct> ::=
| <cte-defs>
| <main-query>
|
|<cte-defs> ::= WITH [RECURSIVE] <cte> [, <cte> ...]
|
|<cte> ::= name [(<column-list>)] AS (<cte-stmt>)
|
|<column-list> ::= column-alias [, column-alias ...]
Argument | Description |
---|---|
cte-stmt | Any |
main-query | The main |
name | Alias for a table expression |
column-alias | Alias for a column in a table expression |
A common table expression or CTE can be described as a virtual table or view, defined in a preamble to a main query, and going out of scope after the main query’s execution. The main query can reference any CTEs defined in the preamble as if they were regular tables or views. CTEs can be recursive, i.e. self-referencing, but they cannot be nested.
CTE Notes
A CTE definition can contain any legal
SELECT
statement, as long as it doesn’t have a
preamble of its own (no nesting).WITH…
CTEs defined for the same main query can reference each other, but care should be taken to avoid loops.
CTEs can be referenced from anywhere in the main query.
Each CTE can be referenced multiple times in the main query, using different aliases if necessary.
When enclosed in parentheses, CTE constructs can be used as subqueries in
SELECT
statements, but also inUPDATE
s,MERGE
s etc.In PSQL, CTEs are also supported in
FOR
loop headers:|
for
|with my_rivers as (select * from rivers where owner = 'me')
|select name, length from my_rivers into :rname, :rlen
|do
|begin
|..
|end
Example
|with dept_year_budget as (
|select fiscal_year,
|dept_no,
|sum(projected_budget) as budget
|from proj_dept_budget
|group by fiscal_year, dept_no
|)
|select d.dept_no,
|d.department,
|dyb_2008.budget as budget_08,
|dyb_2009.budget as budget_09
|from department d
|left join dept_year_budget dyb_2008
|on d.dept_no = dyb_2008.dept_no
|and dyb_2008.fiscal_year = 2008
|left join dept_year_budget dyb_2009
|on d.dept_no = dyb_2009.dept_no
|and dyb_2009.fiscal_year = 2009
|where exists (
|select * from proj_dept_budget b
|where d.dept_no = b.dept_no
|);
6.1.16.1. Recursive CTEs
A recursive (self-referencing) CTE is a UNION
which must have at least one non-recursive member, called the anchor.
The non-recursive member(s) must be placed before the recursive member(s).
Recursive members are linked to each other and to their non-recursive neighbour by UNION ALL
operators.
The unions between non-recursive members may be of any type.
Recursive CTEs require the RECURSIVE
keyword to be present right after WITH
.
Each recursive union member may reference itself only once, and it must do so in a FROM
clause.
A great benefit of recursive CTEs is that they use far less memory and CPU cycles than an equivalent recursive stored procedure.
Execution Pattern
The execution pattern of a recursive CTE is as follows:
The engine begins execution from a non-recursive member.
For each row evaluated, it starts executing each recursive member one by one, using the current values from the outer row as parameters.
If the currently executing instance of a recursive member produces no rows, execution loops back one level and gets the next row from the outer result set.
Example of recursive CTEs
|WITH RECURSIVE DEPT_YEAR_BUDGET AS (
|SELECT
|FISCAL_YEAR,
|DEPT_NO,
|SUM(PROJECTED_BUDGET) BUDGET
|FROM PROJ_DEPT_BUDGET
|GROUP BY FISCAL_YEAR, DEPT_NO
|),
|DEPT_TREE AS (
|SELECT
|DEPT_NO,
|HEAD_DEPT,
|DEPARTMENT,
|CAST('' AS VARCHAR(255)) AS INDENT
|FROM DEPARTMENT
|WHERE HEAD_DEPT IS NULL
|UNION ALL
|SELECT
|D.DEPT_NO,
|D.HEAD_DEPT,
|D.DEPARTMENT,
|H.INDENT || ' '
|FROM DEPARTMENT D
|JOIN DEPT_TREE H ON H.HEAD_DEPT = D.DEPT_NO
|)
|SELECT
|D.DEPT_NO,
|D.INDENT || D.DEPARTMENT DEPARTMENT,
|DYB_2008.BUDGET AS BUDGET_08,
|DYB_2009.BUDGET AS BUDGET_09
|FROM DEPT_TREE D
|LEFT JOIN DEPT_YEAR_BUDGET DYB_2008 ON
|(D.DEPT_NO = DYB_2008.DEPT_NO) AND
|(DYB_2008.FISCAL_YEAR = 2008)
|LEFT JOIN DEPT_YEAR_BUDGET DYB_2009 ON
|(D.DEPT_NO = DYB_2009.DEPT_NO) AND
|(DYB_2009.FISCAL_YEAR = 2009);
The next example returns the pedigree of a horse. The main difference is that recursion occurs simultaneously in two branches of the pedigree.
|WITH RECURSIVE PEDIGREE (
|CODE_HORSE,
|CODE_FATHER,
|CODE_MOTHER,
|NAME,
|MARK,
|DEPTH)
|AS (SELECT
|HORSE.CODE_HORSE,
|HORSE.CODE_FATHER,
|HORSE.CODE_MOTHER,
|HORSE.NAME,
|CAST('' AS VARCHAR(80)),
|0
|FROM
|HORSE
|WHERE
|HORSE.CODE_HORSE = :CODE_HORSE
|UNION ALL
|SELECT
|HORSE.CODE_HORSE,
|HORSE.CODE_FATHER,
|HORSE.CODE_MOTHER,
|HORSE.NAME,
|'F' || PEDIGREE.MARK,
|PEDIGREE.DEPTH + 1
|FROM
|HORSE
|JOIN PEDIGREE
|ON HORSE.CODE_HORSE = PEDIGREE.CODE_FATHER
|WHERE
|PEDIGREE.DEPTH < :MAX_DEPTH
|UNION ALL
|SELECT
|HORSE.CODE_HORSE,
|HORSE.CODE_FATHER,
|HORSE.CODE_MOTHER,
|HORSE.NAME,
|'M' || PEDIGREE.MARK,
|PEDIGREE.DEPTH + 1
|FROM
|HORSE
|JOIN PEDIGREE
|ON HORSE.CODE_HORSE = PEDIGREE.CODE_MOTHER
|WHERE
|PEDIGREE.DEPTH < :MAX_DEPTH
|)
|SELECT
|CODE_HORSE,
|NAME,
|MARK,
|DEPTH
|FROM
|PEDIGREE
Aggregates (
DISTINCT
,GROUP BY
,HAVING
) and aggregate functions (SUM
,COUNT
,MAX
etc) are not allowed in recursive union members.A recursive reference cannot participate in an outer join.
The maximum recursion depth is 1024.