SQL Merge Statement - EVIL or DEVIL

SQL Merge Statement: EVIL or DEVIL

The SQL MERGE statement in SQL helps in performing repetitive tasks by a single query when we want to INSERT, DELETE, and UPDATE data from a table. If you want to perform INSERT, UPDATE, and DELETE operations on a target table by comparing the records from the source table, in such a scenario, the MERGE command is the best solution. The standard MERGE statement takes a target table, and merge (INSERT, UPDATE) data from a source table into it. You can specify conditions on which you expect the MERGE statement to insert, update, or delete, etc.

Today's Agenda

Q01. What is the MERGE Statement in SQL? Q02. Why separate MERGE Statement if we have already INSERT, UPDATE, and DELETE commands? Q03. What are the different databases that support the SQL MERGE Statement? Q04. What are the different MERGE Statement Scenarios for DML? Q05. What are the different supported MERGE Statement Clauses in MSSQL? Q06. What will happen if more than one row matches the MERGE Statement? Q07. What are the fundamental rules of SQL MERGE Statement? Q08. What is the use of the TOP and OUTPUT clauses in the SQL Merge Statement? Q09. How can you use the Merge Statement in MSSQL? Q10. How does the SQL MERGE command work in MSSQL? Q11. How can you write the query for the SQL Merge? Q12. Is SQL MERGE Statement Safe to use?

What is SQL MERGE Statement?

The Merge statement is also known as the UPSERT statement in SQL. This synonym is used in PostgreSQL (v9.5+) and SQLite (v3.24+). A MERGE statement in SQL is the combination of three DML statements (INSERT, DELETE, and UPDATE) in a single one. It allows you to write all your DML statements (INSERT, UPDATE, and DELETE) combined in a single statement. The MERGE command runs INSERT, UPDATE, or DELETE operations on a target table from the results of a join with a source table. So if there is a Source Table and a Target Table that are to be merged, then with the help of the MERGE statement, all three operations (INSERT, UPDATE, DELETE) can be performed at once. The MERGE statement updates a target table or view using data from a source table or intermediate resultset. Rows in the target that match the source can be deleted or updated as specified, and rows that do not exist in the target can be inserted. Similarly updating, deleting, or inserting a row in a view updates, deletes, or inserts the row in the tables on which the view is based.

What are the different databases that support the SQL MERGE Statement?

The Merge statement allows us to update records in a specific table based on values that match from another database table. The MERGE statement is one of the most advanced standardized SQL constructs, which is supported by DB2, HSQLDB, Oracle, SQL Server, and Sybase. However, DB2, Oracle, SQL Server, and Sybase also allow for DELETING some data and for adding many additional clauses. In the common databases, PostgreSQL, Oracle Database, IBM Db2, Teradata, EXASOL, Firebird, CUBRID, H2, HSQLDB, MS SQL, Vectorwise, and Apache Derby support the standard syntax.

MSSQL Database:

In MSSQL, the MERGE statement concept was officially introduced in the SQL 2003 standard and expanded in the SQL 2008 standard. The SQL MERGE statement was introduced in the SQL Server 2008 very first time. Microsoft SQL Server extends with supporting guards and also with supporting LEFT JOIN via WHEN NOT MATCHED BY SOURCE clauses.

ORACLE Database:

The MERGE statement was introduced in Oracle 9i to conditionally insert or update data depending on its presence, a process also known as an "upsert". The MERGE statement reduces table scans and can perform the operation in parallel if required. The MERGE statement in Oracle is used to select data from one or more source tables and update or insert it into a target table.

MySQL Database:

Actually, MySQL does not support the standard MERGE statement. However, you can use the INSERT INTO ... ON DUPLICATE KEY UPDATE statement to achieve similar functionality. The only limitation in MySQL is that the join between target and source has to be made only on PRIMARY KEY or UNIQUE KEY constraints, which is not required in the ANSI/ISO standard. If you specify the ON DUPLICATE KEY UPDATE option in the INSERT statement and the new row causes a duplicate value in the PRIMARY Key Index or UNIQUE Key Index, MySQL performs an update to the old row based on the new values. It also supports REPLACE INTO syntax which first attempts an INSERT, and if that fails, DELETE the row if exists, and then inserts the new one. There is also an IGNORE clause for the INSERT statement which tells the server to ignore "duplicate key" errors and go on. It means existing rows will not be inserted or updated, but all new rows will be inserted.

Other Databases:

Some other databases also support this MERGE statement, or very similar behavior, through their own, non-standard SQL extensions. SQLite also uses INSERT OR REPLACE INTO which works similarly. It also supports REPLACE INTO as an alias for compatibility with MySQL. PostgreSQL started supporting the MERGE statement since version 15 but previously supported merging by INSERT INTO ... ON CONFLICT statements. The IBM Db2 also does not support the Merge statement directly. However, it extends the syntax with multiple WHEN MATCHED and WHEN NOT MATCHED clauses. The Firebird supports the MERGE INTO statement but it fails to throw an error when there are multiple Source data rows. Additionally, there is a single-row version, UPDATE OR INSERT INTO table-name (columns) VALUES (values) [MATCHING (columns)], but the latter does not give you the option to take different actions on insert vs. update.

What are the different MERGE Statement Scenarios for DML?

If there are two tables called source table, and target tables, and you need to update the target table based on the values matched from the source table. In such case there could be three scenarios:

image source: tutorialdotnet

SCENARIO 1: Non-Matching Records in the Source Table The source table may have some rows that do not exist in the target table. So in this case, you need to INSERT those rows that are in the source table into the target table. SCENARIO 2: Non-Matching Records in the Target Table The target table also may have some rows that do not exist in the source table. So in this case, you need to DELETE those rows from the target table. SCENARIO 3: Matching Records in Both Tables The source table may have some matching rows with the same keys as the rows in the target table. However, these rows may have different values in the non-key columns. So in this case, you need to UPDATE the rows in the target table with the values coming from the source table. If you are thinking of using the INSERT, UPDATE, and DELETE statements individually for the above three scenarios, you have to construct three separate statements to update the data to the target table from the source table. However, the SQL MERGE statement can do all the above three actions at the same time.

What are the different supported MERGE Statement Clauses in MSSQL?

In SQL MERGE statement when rows are matched between source and target table, SQL Server assigns a matching condition for each row, based on the search_condition. When this condition equates to true, the source row is known as a MATCHED with a target row. When the merge search condition is false the source table row is considered NOT MATCHED. Further, the "NOT MATCHED" state is broken into two different conditions NOT MATCHED BY TARGET and NOT MATCHED BY SOURCE. The "NOT MATCHED BY TARGET" means a row exists in the source table that did not match any row in the target table based on the join criteria. On the other hand, the "NOT MATCHED BY SOURCE" condition means that a row exists in the target table that has no matching rows in the source table. Additionally, when both source and target table comparison columns contain NULL values, the matching value is considered “NOT MATCHED” for both target and source tables.

WHEN MATCH clause:

This clause indicates the rows that match the merge search criteria. These are the rows that match the merge condition. This clause is used to UPDATE or DELETE records based on the matching criteria. Please note that only up to max two "WHEN MATCHED" clauses can be used in any MERGE Statement. In this case both "WHEN MATCHED" clauses cannot be used with only either UPDATE or DELETE statement. If you want to use two "WHEN MATCHED" clauses in the MERGE Statement then:

One of them must be used for the UPDATE operation.
Another one must be used for DELETE operation.
Third, one of them must have additional search conditions along with the "WHEN MATCHED" conditions.

But note that, if multiple WHEN conditions are true, only the first of them takes effect; very much like how the CASE statement works. Therefore the order of the rules is significant. However, the source rows that do not fulfill any WHEN condition do not cause any action.

WHEN NOT MATCHED BY TARGET clause:

Please note that the "BY TARGET" term is optional. So the "NOT MATCHED" is also known as "NOT MATCHED BY TARGET". It indicates the rows that are not matched in both tables. These are the rows from the source table that do not have any matching rows in the target table. This "WHEN NOT MATCHED BY TARGET" clause is used to INSERT the row in the target table. Please note that the "WHEN NOT MATCHED" clause can be used only once in the MERGE Statement. However, in the MERGE Statement, you can use additional search conditions as well along with the WHEN NOT MATCHED BY TARGET clause.

WHEN NOT MATCHED BY SOURCE clause:

This clause indicates the records in the target table for which there are no matching records in the source table. This means these are the rows in the target table that do not match any rows in the source table. The "WHEN NOT MATCHED BY SOURCE" can be used to DELETE the records from the Target table based on the join condition. This can be used only once in the MERGE Statement. However, if you want to have two "WHEN NOT MATCHED BY SOURCE" Clause, then:

The first one needs to have an additional search criterion by AND condition, and that condition can only reference the target columns.
Also, one of the NOT MATCHED BY conditions needs to perform a DELETE action while the other one UPDATE action.
The second NOT MATCHED BY SOURCE condition is only invoked if the first one is not evaluated to be true.

What will happen if more than one row matches the MERGE Statement?

Actually, the standard SQL merge is driven by the source table only. Each row in the source is tested against the WHEN…THEN rules. Rows in the target that have no corresponding source row are ignored. Also, a MERGE statement modifies each target row at most once. If a single target row has multiple corresponding source rows, the merge statement raises an error and aborts. That prevents the unfortunate situation that the order in which the source rows are processed affects the effect of the merge. If no rows are returned from the source table when joining the target and source table then the UPDATE or DELETE operation cannot reference a source column because there are no matching rows between target and source. However, if more than one source row is matched then the corresponding UPDATE or DELETE operation will fail and SQL Server will throw the following error: Msg 8672, Level 16, State 1, Line 110 The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows. Therefore, care needs to be taken when joining target and source rows, so that multiple source rows are not matched with a single target row. This usually means that the join between the target and source data sets needs to be one of equality.

What are the fundamental rules of SQL MERGE Statement?

There are some fundamental rules and restrictions, that you must follow when using the SQL MEGERE statement:

It is mandatory to have SELECT permission on the source table and INSERT, UPDATE, and DELETE permission on the target table for performing the MERGE statement.
As for restrictions, every Merge statement must be terminated by a semi-colon(;). If a semi-colon is not present at the end of the MERGE statement, then an error will be thrown. You can watch the GO vs Semicolon Terminator video here
The total number of table and column references in the Merge statement cannot exceed 256.
MERGE is a deterministic statement, so you cannot update the same row of the target table multiple times in the same MERGE statement. In fact, a variable can't be updated more than once in the same MATCHED clause.
Also, you cannot update any of the columns you are matching in the ON clause. You can watch the ON vs WHERE video here
You can use @@ROWCOUNT after the MERGE statement which will return the number of records that have been modified (inserted, updated, and deleted) by the transaction. You can read different ways to get record count in SQL here.
MERGE does not tell you the number of rows that have been inserted or updated, it only shows the total number of rows merged.
In the Merge statement, at least one of the three MATCHED clauses must be specified, however, they can be specified in any order. One of the MATCHED clauses must be provided in order for the MERGE statement to operate.
Any INSERT, UPDATE, and DELETE action specified on the target table by the MERGE statement is limited by the constraints defined on it, including any cascading referential integrity constraints. If IGNORE_DUP_KEY is ON for any unique indexes on the target table, MERGE ignores this setting. You can read SQL Constraints here.
MERGE is a fully reserved keyword when the database compatibility level is set to 100 or higher. The MERGE statement is available under both 90 and 100 database compatibility levels; however, the keyword isn't fully reserved when the database compatibility level is set to 90.
The MERGE statement will record the row as “merged” if the source and target record is identical and the merge condition is true.

What is the use of the TOP and OUTPUT clauses in the SQL Merge Statement?

Yes, You can use the TOP clause with the MERGE Statement to limit the number of rows that will be affected by the merge statement. SQL Server also allows you to log all the changes done by the MERGE statement using the OUTPUT clause. The OUTPUT clause helps us to understand which row is inserted, updated, or deleted in the target table. The $action is used to know whether the row is Inserted, Updated, or Deleted.

  
MERGE TOP (10) <target_table> USING <source_table>
ON <merge_condition>
WHEN MATCHED
    THEN <update_statement>
WHEN NOT MATCHED BY TARGET 
    THEN <insert_statement>
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
OUTPUT $action AS [MergeAction], DELETED.*, INSERTED.*;

How can you use the Merge Statement in MSSQL?

First, specify the target table and the source table in the MERGE clause.
Second, define the merge_condition that determines how the rows from the source table are matched to the rows from the target table.
Now, the merge_condition results in three states: MATCHED, NOT MATCHED, and NOT MATCHED BY SOURCE.

  
MERGE <target_table> USING <source_table>
ON <merge_condition>
WHEN MATCHED
    THEN <update_statement>
WHEN NOT MATCHED BY TARGET 
    THEN <insert_statement>
WHEN NOT MATCHED BY SOURCE
    THEN <delete_statement>

Here, please note that:

The table order matters, so make sure which table is the source and which one is the target table first.
Secondly, in the above syntax "BY TARGET" clause is optional because it is the default behavior in the WHEN NOT MATCHED case. But the "BY SOURCE" clause is mandatory if you are using it against the source table.
You can include up to two WHEN MATCHED clauses in your MERGE statement but in such a scenario the first clause must include the AND keyword followed by a search condition.

  
MERGE <target_table> USING <source_table>
ON <merge_condition>
WHEN MATCHED AND <search_condition>
    THEN <update_or_delete_statement>
WHEN MATCHED 
    THEN <update_statement>

Like the WHEN MATCHED clause, you can include up to two WHEN NOT MATCHED BY SOURCE clauses in your MERGE statement. If you include two, then the first clause must include the AND keyword followed by a search condition.

How does the SQL MERGE command work in MSSQL?

The standard SQL merge is driven by the source table. So each row in the source is tested against the WHEN…THEN rules. Rows in the target that have no corresponding source row are ignored. Also, a MERGE statement modifies each target row at most once. Actually, there are two phases in which the SQL MERGE statement works. The standard SQL's MERGE first identifies all changes that must be done. And in the second phase, it applies those changes. It means all WHEN clauses of MERGE are evaluated before any actual changes are made. So Merge first decides what to do, and then it does it. In the meantime, the situation may be changed. Due to this fact, sometimes the MERGE causes confusion by giving surprising error messages. For example, if the MERGE statement evaluates the WHEN NOT MATCHED to true and identifies a row to be INSERTED. Meanwhile, before MERGE actually INSERT the row, another transaction inserts a row with the same Primary Key values and commits the transaction. Now when the merge eventually tries to perform the actual operations, the INSERT fails with a constraint violation error. This might come as a surprise because when not matched the insert rule should not apply if the row exists.

How can you write the query for the SQL Merge?

Suppose we have two tables "Book" and "BookHistory" where "Book" is the source table and "BookHistory" is the target table. The following are the available records in the tables:

  
MERGE BookHistory AS Target
USING Book AS Source
-- MERGE Condition
ON Source.BookID = Target.BookID

-- UPDATE when MATCHED
WHEN MATCHED THEN UPDATE SET
        Target.BookName	= Source.BookName,
        Target.Price	= Source.Price

-- INSERT when Target NOT MATCHED
WHEN NOT MATCHED BY Target THEN
		INSERT (BookID, BookName, Price) 
		VALUES (Source.BookID, Source.BookName, Source.Price)

-- DELETE when Source NOT MATCHED	
WHEN NOT MATCHED BY Source THEN DELETE
		
-- Checking the MERGE Actions
OUTPUT $action, 
	DELETED.BookID AS TargetProductID, 
	DELETED.BookName AS TargetProductName, 
	DELETED.Price AS TargetPrice, 

	INSERTED.BookID AS SourceProductID, 
	INSERTED.BookName AS SourceProductName, 
	INSERTED.Price AS SourcePrice;

-- Records After Merge Operation
SELECT * FROM Book;
SELECT * FROM BookHistory;

You can download the complete SQL Script and data from here for your hands-on. After the Merge Operation by the above SQL statement, you can see the merge changes and the MERGE operations as follows:

Is SQL MERGE Statement Safe to use?

In the above example if you the BookId 4 and 5 from the source table is not inserted into the target table while it should be because it is not present in the target table at all. It happens because of a NULL value. Similarly, there are a lot of bugs and issues in the SQL MERGE statement. It is hazardous to use without a proper understanding of the impacts. The MERGE feature was originally shipped with several inappropriate behaviors and other bugs, some of which have been addressed, and some of which continue to exist in current versions as well. I will cover them in a separate article Evils of SQL Merge, however you can check some open errors here. However, till then you can make yourself aware of SQL Three Valued Logic, ISNULL vs COALESCE, ON vs WHERE Clause and INNER JOIN vs LEFT JOIN because we are going to use these concepts in the next article very soon. Some of the SQL Merge issues are listed here:

Concurrency Issues
NULL Value Handling Issue
Constraint Violations Issues
Duplicate Key Handling Issue
Query Optimization Issue
Deadlock Handling Issue

Wrap Up

Well, if you only want to update data that already exists, or insert data that does not exist, then you might find it easier to write an SQL statement using MERGE than writing an INSERT or an UPDATE statement that looks for matches using a subquery. However, the Merge clause is hazardous as well. So I will recommend using three separate DML commands for INSERT, UPDATE, and DELETE operations for your requirement instead of SQL Merge. You can use the different ways of performing INSERT, DELETE, and UPDATE data from a table here. You can download the above complete SQL Script from the below button:

SQL Merge Statement - EVIL or DEVIL