T-SQL Fundamentals Chapter 2

In Itzik Ben-Gan's book, Microsoft® SQL Server® 2008 T-SQL Fundamentals, Chapter 2 introduces the SELECT statement focusing on queries against a single table. This chapter describes logical query processing including the series of logical phrases involved in producing the correct result set of a given SELECT query.

Elements of the SELECT Statement

1. The purpose of a SELECT statement is to query tables, apply some logical manipulation, and return a result. (p. 25)

2. By "logical query processing"; referring to the conceptual way ANSI SQL defines that a query should be processed and the final result achieved. (p. 25)

3. The Server engine doesn't have to follow logical query processing to the letter; rather, it is free to physically process a query differently by rearranging processing phases, as long as the final result would be the same as dictated by logical query processing. (p. 26)

4. In most programming languages the lines of code are processed in the order that they are written. In SQL things are different. Even though the SELECT clause appears first in the query, it is logically processed almost last. The clauses are logically processed in the following order:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY

Unfortunately, we cannot write the query in correct logical order. We have to start with the SELECT clause. (p. 26-27)

The FROM Clause

5. The FROM clause is the very first clause that is logically processed. In this clause you specify the names of the tables that you want to query and table operators that operate on those tables. (p. 27)

6. To return all rows from a table with no special manipulation, all you need is a query with a FROM clause where you specify the table you want to query, and a SELECT clause where you specify the attributes you want to return. For example, the following statement queries all rows from the Orders table in the Sales schema, selecting the attributes orderid, custid, empid, orderdate, and freight.

SELECT orderid, custid, empid, orderdate, freight
FROM Sales.Orders;

Output Page 28

7. As long as identifiers in your query comply with rules for the format of regular identifiers, you don't need to delimit the identifier names used for schemas, tables, and columns. If an identifier is irregular-for example, has embedded spaces or special characters, starts with a digit, or is a reserved keyword-you have to delimit it. The ANSI SQL form is to used double quotes-for example, "Order Details". The SQL Server specific form is to use square brackets-for example, [Order Details], but it also supports the standard form. With identifiers that do comply with the rules for the format of regular identifiers, delimiting is optional. Preference is not to use delimiters when they are not required because they tend to clutter the code. When assigning identifiers, it is recommended to use regular ones. (p. 28-29)

The WHERE Clause

8. In the WHERE clause, you specify a predicate or logical expression to filter the rows returned by the FROM phase. Only rows for which the logical expression evaluates to TRUE are returned by the WHERE phase to the subsequent logical query processing phase.

The query:

SELECT orderid, custid, empid, orderdate, freight
FROM Sales.Orders;

originally returned 830 rows. But the addition of a WHERE phase filters only the 31 rows where the customer ID is equal to 71.

SELECT orderid, custid, empid, orderdate, freight
FROM Sales.Orders
WHERE custid = 71;

Where Output Page 29

(p. 29)

9. SQL Server evaluates the use of indexes to access the required data. By using indexes, SQL Server can sometimes get the required data with much less work compared to applying full table scans. (p. 30)

10. The WHERE phase returns rows for which the logical expression evaluates to TRUE and doesn't return rows for which the logical expression evaluates to FALSE or UNKNOWN. (p. 30)

The GROUP BY Clause

11. The GROUP BY phase allows you to arrange the rows returned by the previous logical query processing phase in groups. The groups are determined by the elements you specify in the GROUP BY clause.

Here the GROUP BY phase produces a group for each unique combination of employee ID and order year values that appears in the data returned by the WHERE phase. The expression YEAR(orderdate) invokes the YEAR function to return only the year part from the orderdate column.

The WHERE phase returned 31 rows, within which there are 16 unique combinations of employee ID and order year values:

SELECT orderid, custid, empid, orderdate, freight
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate);

Group By Output Page 30

(p. 30-31)

12. If the query involves grouping, all phases subsequent to the GROUP BY phase-including HAVING, SELECT, and ORDER BY-must operate on groups as opposed to operating on individual rows. Each group is ultimately represented by a single row in the final result of the query. Expressions based on elements that participate in the GROUP BY list meet the requirement because by definition each group has only one unique occurrence of each GROUP BY element. For example, in the group for employee ID 8 and order year 2007, there's only one unique employee ID value and only one unique order year value. Therefore, you're allowed to refer to the expressions empid and YEAR(orderdate) in clauses that are processed in phases subsequent to the GROUP BY phase, such as the SELECT clause.

SELECT empid, YEAR(orderdate) AS orderyear
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate);

Group By Output Page 32
(p. 31-32)

13. Because an aggregate function returns a single value per group, elements that do not participate in the GROUP BY list are only allowed as inputs to an aggregate function such as COUNT, SUM, AVG, MIN, or MAX. For example, the following query returns the total freight and number of orders per each employee and order year:

SELECT empid, YEAR(orderdate) AS orderyear, SUM(freight) AS totalfreight, COUNT(*) AS numorders
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate);

Group By Output Page 32 Second Example

The expression SUM(freight) returns the sum of all freight values in each group, and the function COUNT(*) returns the count of rows in each group-which means number of orders. If you try to refer to an attribute that does not participate in the GROUP BY list (sch as freight) and not as an input to an aggregate function in any clause that is processed after the GROUP BY clause, you get an error.

SELECT empid, YEAR(orderdate) as orderyear, freight
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate);

Error Message Page 33

(p. 32-33)

14. All aggregate functions ignore NULLs with one exception-COUNT(*). For example, consider a group of five rows with the values 30, 10, NULL, 10, 10 in a column called qty. The expression COUNT(*) would return 5 because there are five rows in the group, while COUNT(qty) would return 4 because there are four known values. If you want to handle only distinct occurrences of known values, specify the DISTINCT keyword in the parentheses of the aggregate function. For example, the expression COUNT(DISTINCT qty) would return 2 since there are two distinct known values. The DISTINCT keyword can be used with other functions as well.

SELECT empid, YEAR(orderdate) as orderyear, COUNT(DISTINCT custid) AS numcusts
FROM Sales.Orders
GROUP BY empid, YEAR(orderdate);

Distinct Output Page 33

(p. 33-34)

The HAVING Clause

15. With the HAVING clause you can specify a predicate/logical expression to filter groups as opposed to filtering individual rows, which happens in the WHERE phase. Only groups for which the logical expression in the HAVING clause evaluates to TRUE are returned by the HAVING phase to the next logical query processing phase. Groups for which the logical expression evaluates to FALSE or UNKNOWN are filtered out.

16. Because the HAVING clause is processed after the rows have been grouped, you can refer to aggregate functions in the logical expression. For example, in the query:

SELECT empid, YEAR(orderdate) AS orderyear
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1;

the HAVING phase filters only groups (employee and order year) with more than one row. The GROUP BY phase created 16 groups of employee ID and order year. Seven of those groups have only one row, so after the HAVING clause is processed, nine groups remain.

Having Clause Output Page 35

The SELECT Clause

17. The SELECT clause is where you specify the attributes (columns) that you want to return in result table of the query. You can base the expressions in the SELECT list on attributes from the queried tables, with or without further manipulation. If an expression refers to an attribute with no manipulation, such as empid, the name of the target attribute is the same as the name of the source attribute. You can optionally assign your own name to the target attribute by using the AS clause. Expressions that do apply manipulation, or are not based on a source attribute, don't have a name in the result of the query if you don't alias them. (p. 35)

18. If you omit a comma between two column names in the SELECT list, SQL Server will assume the second name is an alias for the first column name.

19. Remember that the SELECT clause is processed after the FROM, WHERE, GROUP BY, and HAVING clauses. This means that aliases assigned to expressions in the SELECT clause do not exist as far as clauses that are processed before the SELECT clause are concerned. A very typical mistake made by programmers who are not familiar with the correct logical processing order of query clauses is to refer to expression aliases in clauses that are processed prior to the SELECT clause. Here's an example of such an invalid attempt in the WHERE clause:

SELECT orderid, YEAR(orderdate) AS orderyear
FROM Sales.Orders
WHERE orderyear > 2006;

Here is the correct way to express this query:

SELECT orderid, YEAR(orderdate) AS orderyear
FROM Sales.Orders
WHERE YEAR(orderdate) > 2006;

(p. 37)

20. SQL provides the means to guarantee uniqueness in the result of a SELECT statement in the form of a DISTINCT clause that removes duplicate rows. For example:

SELECT DISTINCT empid, YEAR(orderdate) AS orderyear
FROM Sales.Orders
WHERE custid = 71;

Produces only 16 rows in the set returned by the query as compared to 31 rows without it.

Distinct Select Output Page 39

(p. 39)

21. SQL supports the use of an asterisk (*) in the SELECT list to request all attributes from the queried tables instead of listing them explicitly, as in the following example:

SELECT *
FROM Sales.Shippers;

Such use of an asterisk is a bad programming practice in most cases, with very few exceptions. It is recommended that you explicitly specify the list of attributes that you need even if you need all of the attributes from the queried table. (p. 39)

22. Within the SELECT clause you are still not allowed to refer to a column alias that was created in the same SELECT clause, regardless of whether the expression that assigns the alias appears to the left or right of the expression that attempts to refer to it. For example, the following attempt is invalid:

SELECT order id, YEAR(orderdate) AS orderyear, orderyear + 1 AS nextyear
FROM Sales.Orders;

One of the ways around this problem is to repeat the expression:

SELECT orderid, YEAR(orderdate) AS orderyear, YEAR(orderdate) + 1 AS nextyear
FROM Sales.Orders;

(p. 40)

The ORDER BY Clause

23. The ORDER BY clause allows you to sort the rows in the output for presentation purposes. In terms of logical query processing, ORDER BY is the very last clause to be processed.

One of the most important points to understand about SQL is that a table has no guaranteed order, because a table is supposed to represent a set, and a set has no order. This means that when you query a table without specifying an ORDER BY clause, the query returns a table result, and SQL Server is free to return the rows in the output in any order. The only way for you to guarantee that the rows in the result are sorted is to explicitly specify an ORDER BY clause. However, if you do specify an ORDER BY clause, the result cannot qualify as a table because the order of the rows in the result is guaranteed. A query with an ORDER BY clause results in what ANSI calls a cursor - a nonrelational result with order guaranteed among rows. Some language elements and operations in SQL expect to work with table results of queries and not with cursors; examples include table expressions and set operations.

Order By Output Page 41
(p. 40-41)

24. The ORDER BY phase is the only phase in which you can refer to column aliases created in the SELECT phase, because it is the only phase that is processed after the SELECT phase. (p. 41)

25. When you want to sort by an expression in ascending order, you either specify ASC right after the expression, such as orderyear ASC, or don't specify anything after the expression because ASC is the default. If you want to sort in descending order, you need to specify DESC after the expression, such as orderyear DESC. (p. 41)

26. T-SQL allows you to specify elements in the ORDER BY clause that do not appear in the SELECT clause, meaning that you can sort by something that you don't necessarily want to return in the output. For example, the following query sorts the employee rows by hire date without returning the hiredate attribute:

Order By Output Page 42

However, when DISTINCT is specified, you are restricted in the ORDER BY list only to elements that appear in the SELECT list. The reasoning behind this restriction is that when DISTINCT is specified, a single result rows might represent multiple source rows; therefore, it might not be clear which of the multiple possible values in the ORDER BY expression should be used. (p. 42)

The TOP Option

27. The TOP option is a proprietary T-SQL feature that allows you to limit the number or percentage of rows that your query returns. When an ORDER BY clause is specified in the query, the TOP option relies on it to define the logical precedence among rows. For example, to return from the Orders table the five most recent orders, you would specify TOP(5) in the SELECT clause and orderdate DESC in the ORDER BY clause.

TOP Option Output Page 43 (p. 43)

28. You can use the TOP option with the PERCENT keyword, in which case SQL Server calculates the number of rows to return based on a percentage of the number of qualifying rows, rounded up. For example, the following query requests the top one percent of the most recent orders:

TOP Option Percentage Page 44

The query returns 9 rows because the Orders table has 830 rows, and 1 percent of 830, rounded up, is 9.

(p. 43-44)

29. Multiple rows can have the same order date. In a case where no tiebreaker is specified, precedence among rows in case of ties (rows with the same order date) is undefined. This fact makes the query nondeterministic - more than one result can be considered correct. In case of ties, SQL Server chooses rows based on whichever row it physically happens to access first.

If you want the query to be deterministic, you need to make the ORDER BY list unique; in other words, add a tiebreaker. For example, you can add orderid DESC to the ORDER BY list so that in case of ties, precedence is determined by order ID descending.

TOP Option Output Page 44

The result shown in this query is the only possible valid result. (p. 44)

30. Instead of adding a tiebreaker to the ORDER BY list, you can request to return all ties. You can ask to return all other rows from the table that have the same sort value (orderdate)as the last one found (May 5, 2008). This is achieved by adding the WITH TIES option:

TOP Option With TIES Page 45

Notice that the output has eight rows even though you specified TOP(5). (p. 45)

The OVER Clause

31. The OVER clause exposes a window of rows to certain kinds of calculations. Think of a window of rows simply as a certain set of rows that the calculation operates on. Aggregate and ranking functions are the types of calculations that support the OVER clause. The whole point of an aggregate function is to aggregate a set of values. Aggregate functions traditionally operate in the context of GROUP BY queries returning only one row for each group; therefore all expressions are restricted to returning a single value per group. In this case, an aggregate function operates against a set of values in a window of rows that you expose to it using the OVER clause, and not in the context of a GROUP BY query. Therefore you don't have to group the data, and you can return base row attributes and aggregate in the same row.

Think of a view as if it were a table.

An OVER clause with empty parentheses exposes all rows to the calculation. The rows exposed are those available after the FROM, WHERE, GROUP BY and HAVING BY phases are completed.

Note that the OVER clause is allowed only in the SELECT and ORDER BY phases.

If you want to restrict or partition the rows, you can use the PARTITION BY clause. Instead of returning the total value of all OrderValues rows, you want to return the total value of the current customer (out of all rows with the same custid as in the current row), specify SUM(val) OVER(PARTITION BY custid).

Partition By Page 46

(p. 46)

32. One benefit of the OVER clause is that by enabling you to return base row attributes and aggregate them in the same row, it also enables you to write expressions that mix the two. For example, the following query calculates for each OrderValues row the percentage of the current value out of the grand total, and also the percentage of the current value out of the customer total:

Partition By Page 47

(p. 47)

33. The OVER clause is also supported with four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. The following query demonstrates the use of these functions:

Ranking Functions Page 48

The ROW_NUMBER function assigns incrementing sequential number integers to the rows in the result set of a query, based on logical order that is specified in the ORDER BY subclause of the OVER clause. Even when the ordering value doesn't increase, the row number still must increase. Therefore, if the ROW_NUMBER function's ORDER BY list is non-unique, as in the preceding example, the query is nondeterministic. If you want to make a row number calculation deterministic, you need to add elements to the ORDER BY list to make it unique; meaning that the list of elements in the ORDER BY clause would uniquely identify rows. For example, you can add the orderid column as a tiebreaker to the ORDER BY list to make the row number calculation deterministic.

The difference between RANK and DENSE_RANK is that RANK indicates how many rows have a lower ordering value, while DENSE_RANK indicates how many distinct ordering values are lower.

The NTILE function allows you to associate the rows in the result with tiles (equally sized groups of rows) by assigning a tile number to each row. You specify as input to the function how many tiles you are after, and in the OVER clause you specify the logical ordering. If the number of rows doesn't divide evenly by the number of tiles, from the remainder an extra row is added to each of the first tiles. For example, if there are 102 rows and 5 tiles requested, the first two tiles would have 21 rows instead of 20. (p. 48-49)

34. Like aggregate window functions, ranking functions also support a PARTITION BY clause in the OVER clause. It's probably easy to understand the meaning of the PARTITION BY clause in the context of ranking calculations; think of it as making the calculation independent for each partition, or window. For example, the expression ROW_NUMBER() OVER(PARTITION BY custid ORDER BY val) assigns row numbers for each subset of rows with the same custid independently, as opposed to assigning those across the whole set.

Partition By Page 49

(p. 49)

35. If specified in the SELECT phase, window calculations are processed before the DISTINCT clause (if one exists).

The following list presents the logical order in which all clauses discussed so far are processed:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
    1. OVER
    2. DISTINCT
    3. TOP

Are you wondering why it matters that the DISTINCT clause is processed after the window calculations that appear in the SELECT clause are processed, and not before? Consider the following query and its output:

Distinct-Over-Output-Page-50

The ROW_NUMBER function is processed before the DISTINCT clause. First, unique row numbers are assigned to the 830 rows from the OrderValues view. Then the DISTINCT clause is processed - therefore, no duplicate rows to remove.

You can consider it a best practice not specify both DISTINCT and ROW_NUMBER in the same SELECT clause as the DISTINCT clause has no effect in such a case. If you want to assign row numbers to the 795 unique values, you need to come up with a different solution. For example, because the GROUP BY phase is processed before the SELECT phase, you could use the following query:

Distinct-Over-Output-Page-51

Here, the GROUP BY phase produces 795 groups for the 795 distinct values, and then the SELECT phase produces a row for each group with the value and a row number based on val order. (p. 49-51)

Predicates and Operators

T-SQL has different language elements where logical expressions can be specified - for example, query filters such as WHERE and HAVING, CHECK constraints, and others. Logical expressions can make use of various predicates (expressions that evaluate to TRUE, FALSE, or UNKNOWN) and operators.

Examples for predicates supported by T-SQL include IN, BETWEEN, and LIKE.

36. The IN predicate allows you to check whether a value, or scalar expression, is equal to at least one of the elements in a set.

In Predicate Page 52

(p. 52)

37. The BETWEEN predicate allows you to check whether a value is in a specified range, inclusive of the two specified boundary values. For example, the following query returns all orders in the range 10300 through 10310:

Between Predicate Page 52

(p. 52)

38. The LIKE predicate allows you to check whether a character string value meets a specified pattern.

LIKE Predicate Page 52

If you're curious about the use of the letter N to prefix the string 'D%', it stands for National and is used to denote that a character string is of a Unicode data type (NCHAR or NVARCHAR), as opposed to a regular character data type (CHAR or VARCHAR). Because the data type of the lastname attribute is NVARCHAR(40), the letter N is used to prefix the string. (p. 52)

39. T-SQL supports the following comparison operators: =, >, <, >=, <=, <>, !=, !>, !<, out of which the last three are not standard.

Comparison Operators Page 52

If you need to combine logical expressions, you can use the logical operators OR and AND. If you want to negate an expression, you can use the NOT operator. For example, the following query returns orders placed on or after January 1, 2008, and were handled by one of the employees 1, 3, 5:

Logical Operators Page 52

(p. 52)

40. T-SQL supports the four obvious arithmetic operators: +, -, *, /, and also the % operator (modulo) that returns the remainder of integer division. For example, the following query calculates the net value as a result of arithmetic manipulation of the quantity, unitprice, and discount attributes:

Arithmetic Operators Page 53

(p. 53)

41. Note that the data type of a scalar expression involving two operands is determined in T-SQL by the higher of the two in terms of data type precedence. If both operands are of the same data type, the result of the expression is of the same data type as well.

For example, a division between two integers (INT) yields an integer. The expression 5/2 returns the integer 2 and not the numeric 2.5. This is not a problem when dealing with constants because you can always specify the values as numeric ones with a decimal point. But when dealing with, say, two integer columns, such as col1/col2, you need to cast the operands to the appropriate type if you want the calculation to be a numeric one: CAST(col1 AS NUMERIC(12, 2)) / CAST(col2 AS NUMERIC(12, 2)). The data type NUMERIC(12, 2) has precision 12 and scale 2, meaning twelve digits in total, two of which are after the decimal point.

If the two operands are of different types, the one with the lower precedence is promoted to the one that is higher. For example, in the expression 5/2.0 the first operand is INT and the second is NUMERIC. Because NUMERIC is considered higher than INT, the INT operand 5 is implicitly converted to the NUMERIC 5.0 before the arithmetic operation, and you get the result 2.5. (p. 53)

42. When multiple operators appear in the same expression, SQL Server evaluates them based on operator precedence rules. The following list has the precedence among operators, from highest to lowest:

  1. ( ) (Parentheses)
  2. * (Multiply), / (Division), % (Modulo)
  3. + (Positive), - (Negative), + (Add), (+ Concatenate), - (Subtract)
  4. =, >, <, >=, <=, <>, !=, !>, !< (Comparison operators)
  5. NOT
  6. AND
  7. BETWEEN, IN, LIKE, OR
  8. = (Assignment)

For example, in the following query, AND has precedence over OR:

AND Precedes OR Page 54

The query returns orders that were either placed by customer 1 and handled by employees 1, 3, or 5 or placed by customer 85 and handled by employees 2, 4, or 6. (p. 53- 54)

43. Parentheses have the highest precedence, so they give you full control. For the sake of other people who need to review or maintain your code and for readability purposes, it is a good practice to use parentheses even when not required, as well as indentation. For example, the following query is the logical equivalent of the previous query, only its logic is much clearer:

Equivalent To Page 54

(p. 54)

Case Expressions

44. A CASE expression is a scalar expression that returns a value based on conditional logic. Because CASE is a scalar expression, it is allowed wherever scalar expressions are allowed, such as the SELECT, WHERE, HAVING, and ORDER BY clauses; CHECK constraints; and so on. (p. 54- 55)

45. The two forms of CASE expression are simple and searched. The simple form allows you to compare one value, or scalar expression, with a list of possible values, and return a value back for the first match. If no value in the list is equal to the tested value, the CASE expression returns the value that appears in the ELSE clause (if one exists). If a CASE expression doesn't have an ELSE clause, it defaults to ELSE NULL. The simple CASE form has a single test value, or expression, right after the CASE keyword that is compared with a list of possible values in the WHEN clauses. For example:

Case Expression Simple Output Page 55

(p. 55)

46. Here is another simple CASE expression that queries the Sales.OrderValues view into three tiles based on val logical order and translates them to tile descriptions Low, Medium, and High:

Case Expression Simple Output Page 56

(p. 56)

47. The searched CASE form is more flexible because it allows you to specify predicates, or logical expressions, in the WHEN clauses rather than restricting you to equality comparisons. The searched CASE expression returns the value in the THEN clause that is associated with the first WHEN logical expression that evaluates to TRUE. If none of the WHEN expressions evaluates to TRUE, the CASE expression returns the values that appears in the ELSE clause (or NULL if an ELSE clause is not specified). For example, the following query produces a value category description based on whether the value is less than 1,000.00 between 1,000.00 and 3,000.00, or greater than 3,000.00:

Case Expression Search Output Page 57

(p. 57)

NULLS

48. SQL supports NULL to represent missing values, and uses three-valued logic, meaning that predicates can evaluate to TRUE, FALSE, or UNKNOWN. A logical expression involving only existing or present values evaluates to either TRUE or FALSE. When the logical expression involves a missing value, it evaluates to UNKNOWN. In a query filter (the WHERE and HAVING clauses), rows or groups for which the expression evaluates to TRUE are returned, while those for which the expression evaluates to FALSE are filtered out. The correct definition of the treatment SQL has for query filters is "accept TRUE," meaning that both FALSE and UNKNOWN are filtered out. With three-valued predicated logic, "accept TRUE" rejects UNKNOWN. (p. 58)

49. One of the tricky aspects of UNKNOWN is that when you negate it, you still get UNKNOWN. For example, given the predicate NOT (salary > 0), when salary is NULL, salary > 0 evaluates to UNKNOWN, and NOT UNKNOWN remains UNKNOWN. (p. 58)

50. An expression comparing two NULLs (NULL = NULL) evaluates to UNKNOWN. The reasoning for this is that a NULL represents a missing or unknown value, and you can't really tell whether one unknown value is equal to another. Therefore, SQL provides the predicates IS NULL and IS NOT NULL instead of = NULL and <> NULL.

For example, the Sales.Customers table has three attributes called country, region, and city, where the customer's location information is stored. All locations have existing countries or cities. Some have existing regions, yet for some the region element is missing or inapplicable. The following query attempts to return all customers where the region is equal to WA:

NULLS Where WA Region Page 59

Out of 91 rows in the Customers table, the query returns three rows where the region attribute is equal to WA. The query neither returns rows where the value in the region attribute is present and different than WAS (predicate evaluates to FALSE) nor those where the region attribute is NULL (predicate evaluates to UNKNOWN).

The following query incorrectly attempts to return all customers where the region is different than WA:

Nulls Where Not WA Page 59

If you expected to get 88 rows back (91 rows in the table minus 3 returned by the previous query), you might find the fact that this query returned only 28 rows surprising. But remember, a query filter "accepts TRUE" meaning that it rejects both rows for which the logical expression evaluates to FALSE and those for which it evaluates to UNKNOWN.

This query returned rows in which a values was present in the region attribute and that value was different than WA. It returned neither rows where the region attribute was equal to WA nor rows where region was NULL.

If you want all rows where region is NULL, do not use the predicate region = NULL, because the expression evaluates to UNKNOWN in all rows - both those where the value is present and those where the value is missing (is NULL).

Nulls Where Region Equals Null Page 60

Instead you should us the IS NULL predicate:

Nulls Where Region Is Null Page 61

If you want to return all rows where the region attribute is not WA, including those in which the value is present and different than WA, along with those in which the value is missing, you need to include an explicit test for NULLs:

Null Where Not Region WA Correctly Page 61

(p. 58-62)

 All-At-Once Operations

Skipped.

Working with Character Data

Data Types

51. SQL Server supports two kinds of character data types - regular and Unicode. Regular data types include CHAR and VARCHAR, and Unicode data types include NCHAR and NVARCHAR. The difference is that regular characters use one byte of storage for each character, while Unicode characters require two bytes per character. (p. 64)

52. Any data type without the VAR element (CHAR, NCHAR) in its name is fixed length, which means that SQL Server preserves space in the row based on the column's defined size and not on the actual number of characters in the character string. A data type with the VAR element (VARCHAR, NVARCHAR) in its name is variable length, which means that SQL Server uses as much storage space in the row as required to store the characters that appear in the character string, plus two extra bytes for offset data. (p. 64)

Collation

Skipped.

Operators and Functions

String Concatenation (Plus Sign [+] Operator)

53. T-SQL provides the plus sign (+) operator to concatenate strings. For example, the following query produces the fullname result column by concatenating firstname, a space, and lastname:

String Concatenation Page 67

(p. 67)

54. ANSI SQL dictates that a concatenation with a NULL should yield a NULL. This is the default behavior of SQL Server. (p. 67)

55. You can change the way SQL Server treats concatenation by setting a session option called CONCAT_NULL_YIELDS_NULL to OFF. SQL Server treats a NULL set to OFF as an empty string for concatenation purposes. To treat a NULL as an empty string - or more accurately, to substitute a NULL with an empty string - you can use the COALESCE function. This function accepts a list of input values and returns the first that is not NULL. Example:

Coalesce Empty Set Page 69 (p. 68 - 69)

56. The SUBSTRING function extracts a substring from a string. The function operates on the input string, and extracts a substring starting at position start, length characters long. If the value of the third argument exceeds the end of the input string, the function returns everything until the end without raising an error.

SUBSTRING( string, start, length )

Substring Page 69

(p. 69)

57. The LEFT and RIGHT functions are abbreviations of the SUBSTRING function, returning a requested number of characters from the left or right of the input string. The first argument, string, is the string the function operates on. The second argument, n, is the number of characters to extract from the left or right of the string.

LEFT( string, n), RIGHT( string, n)

Right Page 69

(p. 69)

58. The LEN function returns the number of characters in the input string. To get the number of bytes, use the DATALENGTH function instead of LEN.

Length Page 70

(p. 69 - 70)

59. The CHARINDEX function returns the position of the first occurrence of a substring within a string. The function returns the position of the first argument, substring, within the second argument, string. You can optionally specify a third argument, start_pos, to tell the function the position from which to start looking. If you don't specify the third argument, the function starts looking from the first character. If the substring is not found, the function returns 0.

CHARINDEX( substring, string [, start_pos] )

CHARINDEX Page 70

(p. 70)

60. The PATINDEX function returns the position of the first occurrence of a pattern within a string.

PATINDEX( pattern, string )

PATINDEX Page 70

(p. 70)

61. The REPLACE function replaces all occurrences of a substring with another. The function replaces all occurrences of substring1 in string with substring2.

REPLACE( string, substring1, substring2 )

Replace Page 71

(p. 70 - 71)

62. You can use the REPLACE function to count the number of occurrences of a character within a string. To achieve this, replace all occurrences of the character with an empty string, and calculate the original length of the string minus the new length. For example, the following query returns for each employee the number of times the character 'e' appears in the lastname attribute:

Replace to Count Page 71

(p. 71)

 

Comments are closed.