10 June 2010

Analytic Function bug?

Last Tuesday we had an ODTUG Preview mini-conference at our office in Nieuwegein, The Netherlands. Nine presentations with nine fine speakers. Too bad the session were in parallel, and thus you had to pick and choose which session to attend. Oh well, that's always the case with conferences, even tiny ones like this one.
Because I'm doing a presentation during the ODTUG Kaleidoscope 2010, I also did my presentation last Tuesday.
The feedback I got was good, and I also got some pointers to make the presentation better.

While changing one of the demo's, I rediscovered of what I believe to be a documentation bug. The demo is about showing the Windowing Clause with Analytic Functions. If you are not familiar with Analytic Functions, attend my session and you will know all about it... ;)
Anyway, to create a running total using the EMP table use the SUM with the default Windowing Clause and you're good to go. You would have to know what the default Windowing Clause is in that case.
The documentation states

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Let's go back to the example, a running total on Scott's EMP table

SQL> select ename
2 , sal
3 , sum (sal) over (order by ename) running_total
4 from emp
5 where deptno = 20
6 order by ename
7 ;

ENAME SAL RUNNING_TOTAL
---------- --------- -------------
ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075
SCOTT 3000 10075
SMITH 800 10875

Because we omitted the Windowing Clause, we get the Default - range between unbounded preceding and current row.
Being explicit about the Windowing Clause, we get the same results - just as we expect.

SQL> select ename
2 , sal
3 , sum (sal) over (order by ename
4 range between unbounded preceding
5 and current row
6 ) running_total
7 from emp
8 where deptno = 20
9 order by ename
10 ;

ENAME SAL RUNNING_TOTAL
---------- --------- -------------
ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075
SCOTT 3000 10075
SMITH 800 10875

And the short notation of the Default Windowing Clause also yields the same results

SQL> select ename
2 , sal
3 , sum (sal) over (order by ename
4 range unbounded preceding
5 ) running_total
6 from emp
7 where deptno = 20
8 order by ename
9 ;

ENAME SAL RUNNING_TOTAL
---------- --------- -------------
ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075
SCOTT 3000 10075
SMITH 800 10875

Why do I believe this to be a bug because of the definition of a Range Window.

RANGE specifies the window as a logical offset.

and this part:


  • If value_expr evaluates to a numeric value, then the ORDER BY expr must be a numeric or DATE data type.

  • If value_expr evaluates to an interval value, then the ORDER BY expr must be a DATE data type




Look back at the examples and notice that I didn't use a Numeric or DATE data type in the ORDER BY expression - I used a VARCHAR2 (ename is after all a VARCHAR2). And this is in contradiction with the documentation quotes from above.
Using the Range Window - not the Default one - with the ename column raises an exception

SQL> select ename
2 , sal
3 , sum (sal) over (order by ename
4 range between 400 preceding
5 and 400 following
6 ) running_total
7 from emp
8 where deptno = 20
9 order by sal
10 /

select ename
, sal
, sum (sal) over (order by ename
range between 400 preceding
and 400 following
) running_total
from emp
where deptno = 20
order by sal

ORA-00902: invalid datatype

I believe the Default Windowing Clause should be

rows between unbounded preceding and current row

And to use the Default Windowing explicitly would show this:

SQL> select ename
2 , sal
3 , sum (sal) over (order by ename
4 rows between unbounded preceding
5 and current row -- Should be the default Window Clause
6 ) running_total
7 from emp
8 where deptno = 20
9 order by ename
10 ;

ENAME SAL RUNNING_TOTAL
---------- --------- -------------
ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075
SCOTT 3000 10075
SMITH 800 10875

Maybe, probably, I misinterpret the Oracle Documentation. If you believe the Oracle Documentation is correct, can you try and explain it to me?
I did make a note of it in the Oracle Documentation, you can add user comments, but it never made it past the moderators... I guess they don't agree with me. ;)
Links
Oracle Documentation on Analytic Functions

4 comments:

  1. Did the moderators gave you any feedback why they rejected your note ?

    ReplyDelete
  2. Alex,
    i believe, documentation is correct on this point. In your example the value_expr is not present at all (i mean the default window), that's why the exception is not raised. In addition, the window limited by unbounded preceding and current row is exactly the same for physical and logical offsets - if the sortkey is unique. However, you may easily see the difference in case the sortkey is not unique - here, physical offsets return undeterministic results, logical offsets - always deterministic. For example:

    select e.*,
    sum(sal) over(order by job) sum_default_window,
    sum(sal) over(order by job rows between unbounded preceding and current row) physical_window,
    sum(sal) over(order by job range between unbounded preceding and current row) logical_window
    from emp e

    Here, you'll see, that default windowing clause yields the same results as with specified range between unbounded preceding and current row. By rows between - the rows with the same values for sort key ( for example scott and ford) become different, arbitrarily assigned values - the result is nondeterministic ( it may be scott with 3000 and may be ford with 3000 - both are in different window, but the assignment is random, because sort key lacks uniqueness). In case of logical offsets - both scott and ford always become 6000 - they are in the same window.
    Sorry for this rather wordy comment ...

    Best regards

    Maxim

    ReplyDelete
  3. "I didn't use a Numeric or DATE data type in the ORDER BY expression - I used a VARCHAR2.... And this is in contradiction with the documentation quotes from above."

    Agree with Maxim. The value_expr it refers to isn't the expression in the ORDER BY. The Documentation (syntax diagram) labels that as the "expr". The label "value_expr" refers to the expression relating to the PRECEDING/FOLLOWING structure (which you don't use at all).

    ReplyDelete
  4. Thank you all for your comments.
    Now it makes more sense. Thank you, Maxim. And I thought I had Analytic Function nailed.... ;)
    Probably because I hardly use the Windowing Clause in my queries, this has been bugging me (no pun intended) for quite some time.
    Never noticed the difference between "expr" and "value_expr" in the documentation before, thanks for pointing that out, Gary.

    And no, the moderators never gave any feedback.

    ReplyDelete