How-To Guides
This page documents advanced DataSQRL features and builds on the other documentation pages.
Testing
DataSQRL supports running automated tests for your SQRL pipeline by annotating test cases with the /*+test */ hint or by placing test queries in the tests folder (or any other folder passed via the --tests command option).
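For example, a test case can be defined as a table annotated with the hint. Here is a minimal sketch; OrderCountTest, Orders, and num_orders are placeholder names:

/*+test */
OrderCountTest := SELECT COUNT(*) AS num_orders FROM Orders;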
The best practice for writing test cases is to modularize your sources so that you can link different sources for local development, testing, and production. In many cases, the same sources in a single folder can serve both testing and local development.
The test data should contain explicit event timestamps for all records, which makes test cases completely deterministic. Explicit timestamps also let you reproduce production failures as local test cases: replay the data that caused the failure with its original timestamps instead of externally simulating the sequence of events that triggered it. In addition, this allows you to build up a repository of failures and edge cases that is executed automatically to spot regressions.
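As a hypothetical illustration of such test data, each record in a JSON-lines source carries its own event timestamp (the field names and values below are made up):

{"id": 1, "customerid": 10, "timestamp": "2024-03-01T10:15:00Z"}
{"id": 2, "customerid": 11, "timestamp": "2024-03-01T10:16:30Z"}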
Script Imports
SQRL supports importing table and function definitions from other SQRL scripts.
Inline Script Imports
Inline imports place table and function definitions from another script into the current scope and require that table and function names do not clash with those in the importing script.
IMPORT myscript.*;
This statement imports all tables and functions from a SQRL script called myscript.sqrl in the local folder.
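You can also import a single definition by naming it explicitly. MyTable is a placeholder here, and the AS alias is shown on the assumption that your DataSQRL version supports aliased imports:

IMPORT myscript.MyTable;
IMPORT myscript.MyTable AS RenamedTable;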
Data Discovery
DataSQRL automatically generates table definitions with connector configuration and schemas for JSON-lines files (with extension .jsonl) and CSV files (with extension .csv) within the project directory. This makes it easy to import data from such files into a SQRL project.
For example, to import data from a file orders.jsonl in the folder mydata, you write:
IMPORT mydata.orders;
When you run the compiler, it creates the table configuration file orders.table.sql, which you can then import like any other source. The compiler reads the data file and auto-discovers the schema.
To disable automatic discovery of data for a directory, place a file called .nodiscovery into that directory.
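To give a rough idea of what the generated file contains, here is a sketch of a possible orders.table.sql, assuming a Flink filesystem connector and a discovered schema; the column names and connector options are illustrative, and the actual output depends on your data and compiler version:

CREATE TABLE orders (
  -- columns are discovered from the data; these are placeholders
  `id` BIGINT,
  `customerid` BIGINT,
  `timestamp` TIMESTAMP_LTZ(3)
) WITH (
  -- connector configuration is generated by the compiler; these values are assumptions
  'connector' = 'filesystem',
  'path' = 'mydata/orders.jsonl',
  'format' = 'json'
);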
Manual Subgraph Elimination with Noop Function
Sometimes the Flink optimizer is too smart for its own good and pushes down predicates in a way that makes common-subgraph identification impossible. That can result in much larger job graphs, poor performance, or high state-maintenance costs.
To inhibit predicate pushdown, SQRL provides the noop function, which takes an arbitrary list of arguments and always returns true. The function serves no purpose other than making it impossible for the optimizer to push down predicates.
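As a minimal sketch of the calling convention (Orders and the column names are placeholders, and where you place the call depends on the plan you want to preserve):

/* noop(...) always returns true, so it does not change the query result.
   Because the optimizer cannot reason about the call, the conjunct
   containing it stays where it is written instead of being pushed down. */
SharedOrders := SELECT * FROM Orders WHERE noop(customerid, productid);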