DataSQRL Documentation
DataSQRL is a data streaming framework that simplifies pipeline development for data engineers by automating the data plumbing that ties the stages of a pipeline together. It compiles SQL scripts into integrated data pipelines.
Building data pipelines with DataSQRL involves three components:
- SQRL Language: SQRL extends Flink SQL (an ANSI-SQL-compatible dialect) with IMPORT/EXPORT statements for connecting data systems and modularizing code, table functions and relationships for defining interfaces, hints for controlling pipeline structure, execution, and access, and doc-strings for semantic annotations; see the sketch after this list. See the full SQRL language specification.
- SQRL Configuration: A JSON configuration file that defines and configures the data technologies that execute the pipeline (called engines), data dependencies, compiler options, connector templates, and execution values; see the sketch after this list. See the full list of configuration options.
- DataSQRL Compiler: The compiler transpiles the SQRL scripts and connector definitions according to the configuration into deployment assets for the engines. It also executes the pipeline for quick iterations and runs tests manually or as part of a CI/CD pipeline. See the full list of compiler command options.
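To make the language bullet concrete, here is a minimal sketch of a SQRL script; the package, table, and sink names (mypackage.Orders, mysink.UserSpending) are hypothetical placeholders, and the authoritative syntax is in the SQRL language specification:

```sql
-- Import a source table from a data package (package and table names are hypothetical)
IMPORT mypackage.Orders;

-- Define a derived table with standard SQL; := binds the query result to a table name
UserSpending := SELECT customerid, SUM(total) AS spending
                FROM Orders
                GROUP BY customerid;

-- Export the derived table to an external sink (sink name is hypothetical)
EXPORT UserSpending TO mysink.UserSpending;
```

Likewise, a hedged sketch of the overall shape of the JSON configuration; the engine names and nesting shown here are illustrative assumptions, so consult the configuration reference for the actual keys and options:

```json
{
  "engines": {
    "flink": {},
    "postgres": {}
  },
  "dependencies": {}
}
```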
Functions
DataSQRL uses SQL, augmented by function libraries, to define the structure and processing of data pipelines.
SQRL extends the standard SQL function catalog with additional functions. In addition, you can import function libraries or implement your own functions.
Learn more about the functions SQRL supports out of the box and how to implement your own.
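As a hedged sketch of what importing and using a library function might look like (the library path and function name are hypothetical; see the function documentation for the exact mechanism):

```sql
-- Import a function from a function library (path and name are hypothetical)
IMPORT myfunctions.parseJson;

-- Use the imported function like any built-in SQL function
Parsed := SELECT id, parseJson(payload) AS data FROM Orders;
```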
Deployment
You can use the DataSQRL docker image with the `run` command for local, demo, and non-production deployments.
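For example, a local run might look like the following; the image name, mount path, and port reflect common DataSQRL usage but should be verified against the compiler command reference, and the script name is illustrative:

```bash
# Mount the current project directory into the container and run a SQRL script locally
docker run -it --rm -p 8888:8888 -v $PWD:/build datasqrl/cmd run myscript.sqrl
```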
For production deployments, use Kubernetes or hosted cloud services.
The DataSQRL Kubernetes repository contains a Helm chart template for deploying DataSQRL-compiled pipelines to Kubernetes using the engines' Kubernetes operators, as well as a basic Terraform setup for the Kubernetes cluster.
Additional Resources
- Use Connectors to ingest data from and sink data to external systems.
- Read the Tutorials for practical examples.
- Check out the How-To Guides for useful tips & tricks.
- Learn about the Concepts underlying stream processing.
- Read the Developer Documentation to learn more about the internals.
Community & Support
We aim to enable data engineers to build data pipelines quickly and eliminate data plumbing busywork. Your feedback is invaluable in achieving this goal. Let us know what works and what doesn't by filing GitHub issues or posting in the DataSQRL Slack community.