Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect SQL code generated #26

Open
SimonSntPeter opened this issue Jul 13, 2021 · 6 comments
Open

Incorrect SQL code generated #26

SimonSntPeter opened this issue Jul 13, 2021 · 6 comments

Comments

@SimonSntPeter
Copy link

Came to this from the HN discusson 'against SQL'.
In https://github.com/erezsh/Preql

The translation of this...

print Continent {
... // Include existing fields
density: population / area // Create new a field

} order{^density}

...to this...

WITH subq_1(id, name, area, population, density) AS (
SELECT id, name, area, population, (CAST(population AS float) / area) AS density
FROM Continent
ORDER BY density DESC)
SELECT * FROM subq_1

...is almost certainly wrong. Certainly it is in mssql and likely every other DB. The order by is not honoured except in the last (outermost) select statement.

If tried in MSSQL you correctly get this error:

Msg 1033, Level 15, State 1, Line 6
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

Any 'order by' in a subquery, including a CTE, does not guarantee any ordering at all.

It needs to be

WITH subq_1(id, name, area, population, density) AS (
SELECT id, name, area, population, (CAST(population AS float) / area) AS density
FROM Continent)
SELECT * FROM subq_1
ORDER BY density DESC

(which mssql accepts without complaint)

HTH

@erezsh
Copy link
Owner

erezsh commented Jul 13, 2021

Thanks for bringing this to my attention.

Preql currently doesn't officially support MSSQL, but only Postgresql, MySql, and Sqlite.

I've just tested this code on Sqlite and Postgres and both work and return the correct results.

I must say MSSQL's behavior seems a bit perplexing to me. I see no reason that order by won't work in CTEs. But I might consider adding support for MSSQL next.

@SimonSntPeter
Copy link
Author

No problem. Just checked docs:

Bigquery https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#order_by_clause doesn't seem to specify in those circumstances, redshift https://docs.aws.amazon.com/redshift/latest/dg/r_ORDER_BY_clause.html likewise doesn't seem to.
That's not impressive of either of them. Perhaps you might contact them for clarification.

Anyway, I'd advise caution relying on it.

@erezsh
Copy link
Owner

erezsh commented Jul 13, 2021

Thanks! I'll take it under advisement.

I'm still confused about the intention behind it. After all, order is important in SQL. It affects UNION ALL, OFFSET-LIMIT, window functions, and tons of other stuff. And there is no way to predict how they will be composed. So ignoring a meaningful sorting.. is a questionable policy.

Anyway, I believe it won't be too hard to fix if necessary.

@SimonSntPeter
Copy link
Author

Hi,
please, speaking as a DBA guy with ~25 years experience, you CANNOT assume ordering will do what you want here, and I emphasise experimentation won't help, in fact it'll make things worse because you get the results you expect... until you don't. Which will inevitably be in production (and it's horrible when that happens).
See https://dba.stackexchange.com/questions/184149/is-it-really-possible-that-the-order-will-not-be-guaranteed-for-this-particular

The reason MSSQL correctly forbids it, and the AQL spec says don't rely on any ordering unless orer by is given is to allow for optimisations. These are the same optimisations which may very well alter your execution plan as the load on the DB changes (or may alter it via predicate pushdown, that might be possible).

See https://www.mssqltips.com/sqlservertip/4472/sql-server-enterprise-advanced-scan-aka-merrygoround-scan/ for an example of how one query can affect the order of the results of another "When Query 1 and Query 2 reach page 200,000, Query 1 will complete, but Query 2 will wrap back to the first data page and continue to scan until it reaches page 100,000 and then completes"
So query 2 gets stuff out of order.

It's your choice but TL;DR I politely urge you not to rely on any assumption of ordering, nor to rely on ordering in subqueries.

@SimonSntPeter
Copy link
Author

The less you're allowed to assume, the more leeway the optimiser has.

Cheers, and all the best!

@erezsh
Copy link
Owner

erezsh commented Jul 13, 2021

Thank you, I promise I will look into it.

Having a reliable implementation is a high priority for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants