9.4. Linear Regression Aggregate Functions
Linear regression functions are useful for trend line continuation. The trend or regression line is usually a pattern followed by a set of values. Linear regression is useful to predict future values. To continue the regression line, you need to know the slope and the point of intersection with the y-axis. As set of linear functions can be used for calculating these values.
In the function syntax, y is interpreted as an x-dependent variable.
The linear regression aggregate functions take a pair of arguments, the dependent variable expression (y) and the independent variable expression (x), which are both numeric value expressions.
Any row in which either argument evaluates to NULL
is removed from the rows that qualify.
If there are no rows that qualify, then the result of REGR_COUNT
is 0
(zero), and the other linear regression aggregate functions result in NULL
.
9.4.1. REGR_AVGX()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_AVGX ( <y>, <x> )
REGR_AVGX
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_AVGX
calculates the average of the independent variable (x) of the regression line.
The function REGR_AVGX(<y>, <x>)
is equivalent to
|SUM(<exprX>) / REGR_COUNT(<y>, <x>)
||
<exprX> :==
|CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
See alsoSection 9.4.2, “REGR_AVGY()
”, Section 9.4.3, “REGR_COUNT()
”, Section 9.2.6, “SUM()
”
9.4.2. REGR_AVGY()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_AVGY ( <y>, <x> )
REGR_AVGY
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_AVGY
calculates the average of the dependent variable (y) of the regression line.
The function REGR_AVGY(<y>, <x>)
is equivalent to
|SUM(<exprY>) / REGR_COUNT(<y>, <x>)
||
<exprY> :==
|CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END
See alsoSection 9.4.1, “REGR_AVGX()
”, Section 9.4.3, “REGR_COUNT()
”, Section 9.2.6, “SUM()
”
9.4.3. REGR_COUNT()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_COUNT ( <y>, <x> )
REGR_COUNT
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_COUNT
counts the number of non-empty pairs of the regression line.
The function REGR_COUNT(<y>, <x>)
is equivalent to
|COUNT(*) FILTER (WHERE <x> IS NOT NULL AND <y> IS NOT NULL)
See alsoSection 9.2.2, “COUNT()
”
9.4.4. REGR_INTERCEPT()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_INTERCEPT ( <y>, <x> )
REGR_INTERCEPT
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_INTERCEPT
calculates the point of intersection of the regression line with the y-axis.
The function REGR_INTERCEPT(<y>, <x>)
is equivalent to
|REGR_AVGY(<y>, <x>) - REGR_SLOPE(<y>, <x>) * REGR_AVGX(<y>, <x>)
9.4.4.1. REGR_INTERCEPT
Examples
Forecasting sales volume
|with recursive years (byyear) as (
|select 1991
|from rdb$database
|union all
|select byyear + 1
|from years
|where byyear < 2020
|),
|s as (
|select
|extract(year from order_date) as byyear,
|sum(total_value) as total_value
|from sales
|group by 1
|),
|regr as (
|select
|regr_intercept(total_value, byyear) as intercept,
|regr_slope(total_value, byyear) as slope
|from s
|)
|select
|years.byyear as byyear,
|intercept + (slope * years.byyear) as total_value
|from years
|cross join regr
|BYYEAR TOTAL_VALUE
|------ ------------
|1991 118377.35
|1992 414557.62
|1993 710737.89
|1994 1006918.16
|1995 1303098.43
|1996 1599278.69
|1997 1895458.96
|1998 2191639.23
|1999 2487819.50
|2000 2783999.77
|...
See alsoSection 9.4.1, “REGR_AVGX()
”, Section 9.4.2, “REGR_AVGY()
”, Section 9.4.6, “REGR_SLOPE()
”
9.4.5. REGR_R2()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_R2 ( <y>, <x> )
REGR_R2
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The REGR_R2 function calculates the coefficient of determination, or R-squared, of the regression line.
The function REGR_R2(<y>, <x>)
is equivalent to
|POWER(CORR(<y>, <x>), 2)
See alsoSection 9.3.1, “CORR()
”, POWER
9.4.6. REGR_SLOPE()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_SLOPE ( <y>, <x> )
REGR_SLOPE
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SLOPE
calculates the slope of the regression line.
The function REGR_SLOPE(<y>, <x>)
is equivalent to
|COVAR_POP(<y>, <x>) / VAR_POP(<exprX>)
||
<exprX> :==
|CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
See alsoSection 9.3.2, “COVAR_POP()
”, Section 9.3.6, “VAR_POP()
”
9.4.7. REGR_SXX()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_SXX ( <y>, <x> )
REGR_SXX
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SXX
calculates the sum of squares of the independent expression variable (x).
The function REGR_SXX(<y>, <x>)
is equivalent to
|REGR_COUNT(<y>, <x>) * VAR_POP(<exprX>)
||
<exprX> :==
|CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
See alsoSection 9.4.3, “REGR_COUNT()
”, Section 9.3.6, “VAR_POP()
”
9.4.8. REGR_SXY()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_SXY ( <y>, <x> )
REGR_SXY
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SXY
calculates the sum of products of independent variable expression (x) times dependent variable expression (y).
The function REGR_SXY(<y>, <x>)
is equivalent to
|REGR_COUNT(<y>, <x>) * COVAR_POP(<y>, <x>)
See alsoSection 9.3.2, “COVAR_POP()
”, Section 9.4.3, “REGR_COUNT()
”
9.4.9. REGR_SYY()
Available inDSQL, PSQL
Result typeDOUBLE PRECISION
Syntax
|
REGR_SYY ( <y>, <x> )
REGR_SYY
Function ParametersParameter | Description |
---|---|
y | Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
x | Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions. |
The function REGR_SYY
calculates the sum of squares of the dependent variable (y).
The function REGR_SYY(<y>, <x>)
is equivalent to
|REGR_COUNT(<y>, <x>) * VAR_POP(<exprY>)
||
<exprY> :==
|CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END
See alsoSection 9.4.3, “REGR_COUNT()
”, Section 9.3.6, “VAR_POP()
”