Many queries will get you to the same results.
The only difference then is the performance of each query.
Understanding how to make performant queries is a great skill for a data engineer. A data engineer who knows how to optimize queries can prevent bottlenecks, reduce costs, and ensure faster, more reliable access to the data.
A specific order
SELECT
statements follow a defined order of execution, where each step progressively reduces the volume of data processed, ultimately leading to a more efficient query. As each stage refines the dataset—filtering out unnecessary rows and columns—the overall query becomes faster and less resource-intensive.
By understand the order of the operations, you can make sure to craft queries that filter out the maximum amount of data first, so that the rest of the operations happens on a slower subset of the data, making them faster and thus improving the performance of the overall query.
A mnemonic to remember
However, it is difficult for data engineers and data analyst to remember the order of the operations, me included, so I created this silly mnemonic:
“Friendly Jungle Walks Generate Happy Squirrel Dances On Logs”
The next time you’re struggling with performance on an SQL query, close your eyes, and picture yourself having a nice, friendly walk in the jungle. You brought a bag of nuts to feed the local squirrels. As they have recognized you, they dance happily on a log.
Each Capital letter correspond to an evaluation step of your SELECT query:
🐿️ Friendly - FROM
🐿️ Jungle - JOINs
🐿️ Walks - WHERE
🐿️ Generate - GROUP BY
🐿️ Happy - HAVING
🐿️ Squirrel - SELECT
🐿️ Dances - DISTINCT
🐿️ On - ORDER BY
🐿️ Logs - LIMIT
This should help you get started understanding the performance issue of your SQL SELECT query.
Training
I recommend doing the SQL 50 challenge from leetcode to train on how to craft SQL queries. The mnemonic will help you when dealing with performance-oriented queries.
Hope it helps :)