OK Oracle, Tell Me *Approximately* How Many Unique Visitors We Had Last Week

By Kim Berg Hansen

Elevator Pitch

Statistical overviews often use distinct counts, medians and percentiles. On larger datasets they might take more time than the users like. But often approximate answers are “good enough” if they come fast, which you now can do in SQL. Learn how to do FAST=TRUE with approximations.

Description

Imagine doing a SELECT and the answer coming back is “approximately 42, give or take a little.” Normally you wouldn’t want that to happen, you expect a SELECT to give you an accurate answer every time. But building a sales dashboard for your boss, for instance, could be a choice between showing him “we had approximately 4200 unique visitors from France last week” in a quarter of a second, versus showing him “we had exactly 4189 unique visitors from France last week” in 5 seconds. Most likely your boss would be happier with a fast but “good enough” answer.

Oracle supports several Approximate Query Processing Functions, starting with APPROX_COUNT_DISTINCT in version 12.1, adding APPROX_MEDIAN and APPROX_PERCENTILE as well as the capability of doing approximate aggregations on multiple levels in version 12.2, followed by APPROX_RANK, APPROX_COUNT and APPROX_SUM in version 18.1. These functions allow you to write SQL that gets approximate, but fast, answers.

But what if you already have a dashboard application built using good old trusted SQL giving accurate answers, but it is really slightly unsatisfactory as each click takes several seconds to complete? Then you can do a FAST=TRUE - if you are prepared to accept approximate answers. You can do an ALTER SESSION that enables the optimizer to rewrite for example your COUNT(DISTINCT column) to an APPROX_COUNT_DISTINCT(column) automatically. Then your application will give approximate, but fast, answers without needing change.

This presentation discusses the different approximate functions, the approximate aggregations and the ALTER SESSION method of speeding the queries - showing examples of syntax and use cases.

Notes

Topics covered include:

  • Approximate query processing functions APPROX_*.
  • Precision and performance trade-offs.
  • Approximate multilevel aggregation.
  • Materialized views with approximate functions.
  • Initialization parameters for executing exact queries with approximations instead.
  • Query rewrite to use approximate materialized views.

Note: This talk has been accepted for ODTUG Kscope20.