Sunday, June 21, 2009

SQL Server 2008 - MERGE Operator

MERGE Operator
Use GROUPING SETS to obtain results similar to those generated by using CUBE and ROLLUP, however GROUPING SETS is more flexible, offers better performance, and is ANSI SQL 2006 compliant. GROUPING SETS enables the GROUP BY clause to generate multiple grouped aggregations in a single result set. It is equivalent to using UNION ALL to return a result set from multiple SELECT statements, each of which has a GROUP BY clause.

SQL Server 2008 – GROUPING SETS

GROUPING SETS

Lets try with some realtime example...

Create sales table and fill some data into it:
CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)

Then run the bellow query:
SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

Output:


Also try this:
SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE
Output:


We can rewrite these two queries using the new syntax as:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY ROLLUP(EmpId, Yr)

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY CUBE(EmpId, Yr)

The new GROUPING SETS syntax is considerably more powerful. It allows us to specify precisely which aggregations we want to compute. As the following table illustrates, our simple two dimensional schema has a total of only four possible aggregations:



ROLLUP and CUBE are just shorthand for two common usages of GROUPING SETS. We can express the above ROLLUP query as:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr), (EmpId), ())



This query explicitly asks SQL Server to aggregate sales by employee and year, to aggregate by employee only, and to compute the total for all employees for all years. The () syntax with no GROUP BY columns denotes the total. Similarly, we can express the above CUBE query by asking SQL Server to compute all possible aggregate combinations:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr), (EmpId), (Yr), ())


We can also use GROUPING SETS to compute other results. For example, we can perform a partial rollup aggregating sales by employee and year and by employee only but without computing the total for all employees for all years:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr), (EmpId))



We can skip certain rollup levels. For example, we can compute the total sales by employee and year and the total sales for all employees and all years without computing any of the intermediate results:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr), ())


We can even compute multiple unrelated aggregations along disparate dimensions. For example, we can compute the total sales by employee and the total sales by year:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId), (Yr))


Note that we could also write GROUPING SETS (EmpId, Yr) without the extra set of parenthesis, but the extra parenthesis make the intent of the query more explicit and clearly differentiate the previous query from the following query which just performs a normal aggregation by employee and year:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr))


Here are some additional points worth noting about the GROUPING SETS syntax:

As with any other aggregation query, if a column appears in the SELECT list and is not part of an aggregate function, it must appear somewhere in the GROUP BY clause. Thus, the following is not valid:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId), ())

Msg 8120, Level 16, State 1, Line 1
Column 'Sales.Yr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

The order of the columns within each GROUPING SET and the order of the GROUPING SETS does not matter. So both of the following queries compute the same CUBE although the order that the rows are output differs:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr), (EmpId), (Yr), ())

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((), (Yr), (EmpId), (Yr, EmpId))

If the order that the rows are output matters, use an explicit ORDER BY clause to enforce that order.

We can nest CUBE and ROLLUP within a GROUPING SETS clause as shorthand for expressing more complex GROUPING SETS. This shorthand is most useful when we have more than three dimensions in our schema. For example, suppose we add a month column to our sales table:

CREATE TABLE Sales (EmpId INT, Month INT, Yr INT, Sales MONEY)

Now, suppose we want to compute sales for each employee by month and year, by year, and total. We could write out all of the GROUPING SETS explicitly:

SELECT EmpId, Month, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, Yr, Month), (EmpId, Yr), (EmpId))

Or we can use ROLLUP to simplify the query:

SELECT EmpId, Month, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY GROUPING SETS((EmpId, ROLLUP(Yr, Month)))

Note that once again the correct use of parenthesis is critical. If we omit one set of parenthesis from the above query, the meaning changes significantly and we end up separately aggregating by employee and then computing the year and month ROLLUP for all employees.

SQL Server 2008 – Some Cool Features for Database Developer

This is an overview of four features in SQL Server 2008 that I think are neat and will be important to both developers and DBAs. For a comprehensive feature list, go to Microsoft's SQL Server 2008 Product Overview page.

1. Table variable parameters

For a long time, I have wished that developers could pass a table variable as a parameter to stored procedures. With the advent of SQL Server 2005, XML variables can be passed into procedures and parsed out with XQUERY inside the procedure to form a table. However, with the ability to pass a table variable in SQL Server 2008, developers will not have to be fully XQuery literate to pass a table of data into or out from a stored procedure. In my opinion, this will be one of the more integral tools for the developer in SQL Server 2008.

Table-valued parameters provide an easy way to marshal multiple rows of data from a client application to SQL Server without requiring multiple round trips or special server-side logic for processing the data. You can use table-valued parameters to encapsulate rows of data in a client application and send the data to the server in a single parameterized command. The incoming data rows are stored in a table variable that can then be operated on by using Transact-SQL.

Column values in table-valued parameters can be accessed using standard Transact-SQL SELECT statements. Table-valued parameters are strongly typed and their structure is automatically validated. The size of table-valued parameters is limited only by server memory.

Note: You cannot return data in a table-valued parameter. Table-valued parameters are input-only; the OUTPUT keyword is not supported.

2. MERGE Statement

The new MERGE operator streamlines the process of populating a data warehouse from a source database. For example, rows that get updated in the source database will probably already exist in the data warehouse but rows that are inserted into the source database will not already exist in the data warehouse. The MERGE statement distinguishes between the new and updated rows from the source database so that the appropriate action (insert or update) can be performed against the data warehouse in one single call.

3. Grouping Sets
Use GROUPING SETS to obtain results similar to those generated by using CUBE and ROLLUP, however GROUPING SETS is more flexible, offers better performance, and is ANSI SQL 2006 compliant. GROUPING SETS enables the GROUP BY clause to generate multiple grouped aggregations in a single result set. It is equivalent to using UNION ALL to return a result set from multiple SELECT statements, each of which has a GROUP BY clause.

SQL Server 2008 - Table Valued Parameters

Table Valued Parameters

In SQL Server 2005 and earlier, it was bit difficult to pass numerous parameters (Table variable) to the T-SQL Statements or Functions or Stored Procedures. The approach we used to take is creating a Temporary Table on Demand and inserts the values to the Temporary Table and then calls it in requisite procedure. In SQL Server 2008 the Table-Valued Parameters have been introduced that helps us eliminating the cumbersome process. This is the ability to easily pass a table to a stored procedure from T-SQL code or from an application as a parameter.
User-Defined Table Type

When first take a look at the new table-valued parameters, I thought that using this feature is a bit complicated. There are several steps involved. The first thing to do is to define a table type. If you look at the Types section under Programmability in the 2008 Management Studio, you will see the new “User-Defined Table Types”


Table-valued Parameters are the new parameter types in SQL Server 2008 and it could be declared by declaring the user defined table types. The Table-valued parameters could be used to send multiple rows of data to a T-SQL statement or routine. Table-valued parameters are like the parameter arrays in OLE DB and ODBC but with better flexibility. It could be used to participate in the set-based operations. Permissions for the Table-valued parameters follow SQL Server Object Model so CREATE, GRANT, DENY, ALTER etc. could be used.

Table Valued Parameters help in the following:
1. They are Strongly Typed
2. We can specify the sort order and the unique key
3. Reduce Round trips to the server
4. We can have the table structure of different cardinality
5. It doesn't need or cause the statement to recompile
6. It doesn't need lock to initially populate the data from the client
7. Enables us to include complex business logic in a single routine
8. It provides a very simple programming model.

Lets start with an example:
Syantax:



After running the above code, the object definition is created and viewable in the “User-Defined Table Type”. You can view the properties there, but not modify them. To modify the type, you will have to drop it and then create it again with the modified definition.



Using the User-Defined Table Type
So far, it seems like we have done quite a bit of work, but we only have the type definition created. Like other programmable objects, it will stick around unless it is dropped. To use it in T-SQL code, you must create a variable of the new type and then populate it as you would any other table. Once it is populated, you can use it in other T-SQL statements. Because it is a variable, it goes out of scope automatically when the batch is completed. Notice in the code below that the name of the data type is the same as the type we just created.



As long as the variable does not go out of scope, you can do just about anything with it that you can do with a regular table variable, such as join another table or be used to populate another table. Like a table variable, you can not modify the table definition.

As I mentioned, the variable is gone once it goes out of scope. For example if you have a T-SQL script that is composed of more than one batch, the variable is only valid within the batch where it was created.

Using the Variable as a Parameter
So far, we haven’t seen anything that is not possible with a regular table variable. The benefit is being able to pass the variable of the new type to a stored procedure. A stored procedure must be created that uses the new type. Here is an example of that along with the code to create a regular table that we will be populating.



Notice the READONLY qualifier after the table parameter. This is required because a copy of the user-defined table variable is not passed to the procedure. To be more efficient, a pointer is passed to the proc. Therefore, to eliminate changes to the variable inside the proc that would affect the original, no changes are allowed.

Finally, let’s put it all together and call the stored procedure. In the next code snippet, code from the previous section is used to create and populate the variable.



In order for a user to use the User-Defined Table Type, EXECUTE or CONTROL permission must be granted. This is the command to grant permission to a user:

GRANT EXECUTE ON TYPE::dbo.MyType TO TestUser;

Calling from a .Net Application
The coolest way to use the table-valued parameter is from a .Net application. To do so, you need to have .NET 3.5 installed and make sure you are using the System.Data.SQLClient namespace. This gives you a new SQL data type called Structured that you will use when creating the parameter.

First create a local DataTable and populate it. Be sure that the DataTable that you create matches the user-defined table type’s column count and data types.



Since we will be working with a stored proc, create a command object and add the two parameters. This code assumes that you have an open connection to your test database.



Notice the data type of the @MyTableParam parameter. This is the new type added with .Net 3.5 to work with this new functionality. Finally, assign the local table to the parameter and execute the command.

Conclusion
The table-valued parameter feature introduced with SQL Server 2008 has been sorely needed. It will allow developers to write much better performing applications by decreasing round-trips to the server and by letting SQL Server work how it does best – on sets of data. It is not difficult to use once all the pieces are put into place.

Let me know if I missed anything...

Expence