External Tables - Not *Just* Loading a CSV File

By Kim Berg Hansen

Elevator Pitch

With external tables you can SELECT from files - great for loading plain CSV files, but it can do so much more than that. Learn about loading from text files, DataPump dump files, HDFS and HIVE, as well as how to add constraints and partitioning for optimizer optimizations of the external tables.

Description

External tables are a way to map data from files in the OS to columns in a rowsource - in effect allowing you to SELECT directly from a file. This can be useful in many ways allowing you to for example join the external data to lookup tables in the database, as well as use the complete range of SQL syntax and functions to transform the external data if needed.

From version to version, Oracle database has added functionality to the external table, so it now is capable of so much more than simply loading a CSV file. You can load data from flat text files, from binary DataPump dump files, from HDFS (Hadoop Distributed File System) or from Apache HIVE. You can even do a limited unloading to a DataPump dump file. You can have the loading process perform an OS preprocess call, like for example unzip to allow you to load from a compressed file.

Oracle 12.2 also adds the possibility of adding certain metadata to the external table useful for the optimizer. You can define constraints (primary key, referential, unique, not null) for the optimizer to trust and rely on, and you can define partitions to allow partition pruning to read only the necessary files of a multi-file external table.

This presentation discusses the various possibilities and shows examples of syntax and use cases for the various access drivers - ORACLE_LOADER, ORACLE_DATAPUMP, ORACLE_HDFS and ORACLE_HIVE.

Notes

Topics covered include:

  • Access drivers and parameters: ORACLE_LOADER, ORACLE_DATAPUMP, ORACLE_HDFS and ORACLE_HIVE.
  • Location: directories and files, HDFS URIs.
  • Datatype conversion, characterset handling, delimited and fixed-length data.
  • Error handling, logging, bad file.
  • SQL*Loader comparison and use as code generator.
  • XMLTAG not parsing but text tag search.
  • PREPROCESSOR to manipulate or generate datafile.
  • Multiple files, parallelism and partition pruning.
  • Trusted, relied constraints on external tables.
  • CSV and ALL FIELDS OVERRIDE.
  • ORACLE_DATAPUMP reading dump files.
  • ORACLE_DATAPUMP writing dump files with CTAS.
  • HDFS/HIVE parameters for column mappings, overflow handling and more.