From https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16 it says "However, each table or view column in any nonaggregate expression in the list must be included in the GROUP BY list". Exactly as you'd expect (there seem to be a couple of exceptions, probably MSSQL specific, regarding string concatenation but they don't seem to be relevant here).
At work I intended to write this:
SELECT
max(STATS_DATE(object_id, stats_id)),
object_name(object_id)
FROM sys.stats
GROUP BY object_name(object_id)
Which works as expected. What I actually wrote by accident was this
SELECT
max(STATS_DATE(object_id, stats_id)),
object_name(object_id)
FROM sys.stats
GROUP BY object_id -- should have been object_name(object_id)
The SELECT contains object_name(object_id)
but the GROUP BY contains object_id
, which is definitely not in the SELECT. ISTM this should have been rejected, but it runs fine.
I don't get it. Is there some support for functional dependencies that I'm not aware of? Is there even a functional dependency between object_id
and object_name(object_id)
(I don't think so because object_name(object_id)
might not necessarily be unique, although my understanding of FDs is pretty poor).
What's going on?
2条答案
按热度按时间k3bvogb11#
With
you get one result row per object_id, say IDs 1, 2, 3 and 4.
With
you get their object names, e.g. 'Object 1', 'Object 2', 'Object 3', 'Object 4'.
As each result row refers to one object ID, you get the name for that object ID in your result row. Nothing wrong with this.
It would be different with
Here you'd group by the object name of the object IDs. Two or more IDs could have the same name. It is not possible hence to select the object ID, because it is not known and not necessarily unique per object name.
But this is not the case in your query.
tnkciper2#
Is there some support for functional dependencies that I'm not aware of?
Sort of, but no.
What you are allowed to do is use the
GROUP BY
value in any way you like in theSELECT
, with any functions or calculations, as long as you are not using any columns or values that have not been grouped.So the below query works, because
object_id
has been grouped, and you are allowed to pass a grouping value to a function.You can add two
GROUP BY
values together, or pass them to a function, or cast them etc.What you cannot do is do the calculation, then group that, and then use the original value from before the calculation. The original value is not present in the
GROUP BY
so cannot be used, only the actual value that was used.Two further notes:
Don't do what you intended to do and group by
object_name
, becausegrouping by a
nvarchar
is less efficient.the compiler does not understand any kind of functional dependency, and therefore may hash or sort where it doesn't need to.
Ideally don't use this type of aggregation at all, as it often forces a full grouping. Instead use
APPLY
to calculate a value per object.Furthermore, you shouldn't use those convenience functions if you can avoid it, as they have a number of issues, primarily around isolation levels and locking. Instead use the catalog views, such as
sys.tables
and dynamic management views such assys.dm_db_stats_properties
.Instead, use the following query: