DataSQRL Documentation

DataSQRL is a data streaming framework that simplifies the development of data pipelines for data engineers by automating the data plumbing that holds the stages of a pipeline together. DataSQRL compiles SQL scripts into integrated data pipelines.

There are three components to building data pipelines with DataSQRL:

  • SQRL Language: SQRL extends Flink SQL (an ANSI-SQL-compatible dialect) with IMPORT/EXPORT statements for connecting data systems and modularizing code, table functions and relationships for defining interfaces, hints to control pipeline structure, execution, and access, and doc-strings for semantic annotations. See the full SQRL language specification. A short example follows this list.
  • SQRL Configuration: A JSON configuration file that defines and configures the data technologies to execute the pipeline (called engines), data dependencies, compiler options, connector templates, and execution values. See the full list of configuration options.
  • DataSQRL Compiler: The compiler transpiles the SQRL scripts and connector definitions according to the configuration into deployment assets for the engines. It also executes the pipeline for quick iterations and runs tests manually or as part of a CI/CD pipeline. See the full list of compiler command options.
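
To make the first component concrete, here is a minimal sketch of what an SQRL script can look like. The package path mypackage.Orders and the table OrderTotals are hypothetical placeholders rather than an actual DataSQRL package; the aggregation itself is plain Flink SQL.

    -- Import a source table from a (hypothetical) data package
    IMPORT mypackage.Orders;

    -- Define a derived table with standard Flink SQL; := introduces a table definition
    OrderTotals := SELECT customerid, COUNT(*) AS num_orders, SUM(total) AS spend
                   FROM Orders
                   GROUP BY customerid;

The compiler turns a script like this, together with the JSON configuration, into deployment assets for the configured engines.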

Functions

DataSQRL uses SQL, augmented by function libraries, to define the structure and processing of data pipelines.

SQRL extends the standard SQL function catalog with additional functionality. In addition, you can import function libraries or implement your own functions.

Learn more about the functions SQRL supports out of the box and how to implement your own.
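
As a sketch of how an imported function might be used, the following combines a built-in Flink SQL function with a user-defined one. The module path myudfs and the function ParseCoupon are hypothetical; consult the function documentation for the actual catalog and packaging conventions.

    -- Import a (hypothetical) user-defined function module
    IMPORT myudfs.ParseCoupon;

    -- Mix the imported function with built-in Flink SQL functions
    CouponOrders := SELECT id, UPPER(ParseCoupon(note)) AS coupon
                    FROM Orders
                    WHERE note IS NOT NULL;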

Deployment

You can use the DataSQRL Docker image with the run command for local, demo, and other non-production deployments. For production deployments, use Kubernetes or hosted cloud services.
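
As a rough sketch of a local run, assuming the image is datasqrl/cmd and the working directory is mounted at /build as in the DataSQRL examples (check the compiler command reference for the exact invocation):

    # Run a pipeline locally with the DataSQRL Docker image
    # (image name and mount point are assumptions; the flags are standard Docker)
    docker run -it --rm -v $PWD:/build datasqrl/cmd run myscript.sqrl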

The DataSQRL Kubernetes repository contains a Helm chart template for deploying DataSQRL-compiled pipelines to Kubernetes using the engines' Kubernetes operators, along with a basic Terraform setup for the Kubernetes cluster.

Additional Resources

Community & Support

We aim to enable data engineers to build data pipelines quickly and eliminate the data plumbing busywork. Your feedback is invaluable in achieving this goal. Let us know what works and what doesn't by filing a GitHub issue or posting in the DataSQRL Slack community.