Wednesday, October 30, 2013

Many Starting Points to the Same Path - What Your Code Says About You

Whether I'm teaching people how to use a computer for the first time or I'm trying to get data properly filtered out of a database.  I often note to myself (and now to you) how there is usually 2-3 ways to accomplish any one task with a computer.  My classic example for the nearly-initiated computer-user is copying text to your clipboard:  

  1. Select text... Menu Bar... "Copy" [new user]
  2. Select text... Right-click... "Copy" [mouse user]
  3. Select text... Ctrl+C  [keyboard user]

Those are the big three, and I don't doubt there are others, but that's the beauty.  Each ultimately does the same thing, but with continued use, one develops a habit of using one method over the others or develops a best practice of using different ones suited to their circumstance.  Some people may go years without knowing about the keyboard shortcut, but that doesn't stop them from doing their work.

This multi-method approach is a boon to the challenge of the human-to-computer interaction.  Each individual brings their own set of attributes, affinities, and afflictions with them when setting about to achieve anything - part of what makes us human.  The computer requires clear, explicit instructions.

In SQL (Structured Query Language), there is a frequent need to exclude data from one's main dataset based on the data in another dataset.  Example inquiries of a database query writer might include: 

  • "I want to see a list of discontinued products that have sold."
  • "Show me how many part-time employees have clocked in this week."
  • "Have we had any hits to our website from outside the United States?"
There are a number of ways to do this.  Let's take the first case, retrieving a list of discontinued products [Products] where there have been orders placed [OrderItems].  The following three queries do the exact same thing. In fact, the execution plan (SQL Server shown below) for each query is the same, which means the database goes through the same steps each time despite the different code written.

SELECT DISTINCT p.ProductID
FROM dbo.Products p
INNER JOIN dbo.OrderItems oi ON p.ProductID = oi.ProductID
WHERE p.Discontinued = 1
[practiced user]

SELECT p.ProductID
FROM dbo.Products p
WHERE p.Discontinued = 1
INTERSECT
SELECT oi.ProductID
FROM dbo.OrderItems oi
[old school or show-off user]

SELECT p.ProductID
FROM dbo.Products p
WHERE p.Discontinued = 1
AND EXISTS (SELECT NULL FROM OrderItems oi WHERE oi.ProductID = p.ProductID)
[seasoned or readability user]

Query Plan:
SQL Server Sample Query Execution Plan











If you've hand-written SQL to any degree, you've probably seen or used the something like the first query: DISTINCT ... INNER JOIN.  This method is so common that initiated SQL users might consider it more readable than the other queries.  The DISTINCT keyword simply limits the output to one row per unique ProductID in the likely event there is more than one sale of any of these discontinued products.

However, if you like your code short, consider INTERSECT.  You should apply extra consideration whether to comment your code since you or another developer may see your code and forget what INTERSECT means.

The WHERE EXISTS is quite readable from a human language standpoint.  It effectively reads, "SELECT discontinued products WHERE OrderItems for those same products EXIST."

I'm sure there are several other ways to do this.  In fact, a bonus method you can check out for yourself is using IN.  Some pseudo-code to get you started:
SELECT discontinued ProductIDs WHERE ProductID IN (OrderItems' ProductIDs)

Check these links for further reading.

No comments:

Post a Comment