Connecting to Data Sources
Shaper can query any data source DuckDB supports - including any extensions.
Use Shaper to attach to your production Postgres database, query files on S3 or even query a Google Sheet directly.
To query remote data sources, you will need to configure secrets, attach databases and load extensions. See below for how to do that with Shaper’s init-sql functionality.
Since DuckDB is also a database, you can also store data within Shaper.
- Shaper has an Ingest API to ingest data via an HTTP API.
- You can also use scheduled jobs to routinely load and transform data.
Querying Data
Section titled “Querying Data”The easiest way to query external data is through files.
DuckDB can read local files and it can also read files via HTTP or from S3 and other object storages directly.
Common file formats are supported out of the box:
- CSV
- JSON
- Parquet
- Text files
- Excel
You can find the complete list in the DuckDB documentation.
In addition to files, you can also attach to databases such as Postgres, MySQL and Iceberg data lakes. For more see below.
init-sql
Section titled “init-sql”Specify SQL that is executed when the system is started using the --init-sql flag or
use --init-sql-file to specify a file that contains SQL.
Use init-sql to:
- Create tables and views
- Manage credentials
- Connect to remote databases
- Load extensions
Credential Management
Section titled “Credential Management”To access data that requires credentials, you can use DuckDB Secrets together with Shaper’s init-sql functionality.
Environment variables specified as $VAR or ${VAR} are substituted in the SQL.
For example, you can set up a DuckDB secret for S3 credentials like this:
CREATE OR REPLACE SECRET mys3 ( TYPE s3, KEY_ID '${S3_KEY_ID}', SECRET '${S3_SECRET}', REGION 'my_region', SCOPE 's3://my_bucket');Connecting to Remote Databases
Section titled “Connecting to Remote Databases”DuckDB allows you to attach to remote databases.
There are extensions to attach to many common databases:
- Postgres
- SQLite
- MySQL
- Another DuckDB database file
Attach to a database by using DuckDB’s ATTACH functionality together with Shaper’s init-sql functionality.
Specify SQL that is executed when the system is started using the --init-sql flag.
You can also use --init-sql-file to specify a file that contains SQL.
For example, you can connect to a local Postgres database:
ATTACH IF NOT EXISTS 'postgresql://postgres@127.0.0.1/mydb' AS mydb (TYPE POSTGRES, READ_ONLY);Use environment variables to pass the connection strings and secrets in production:
ATTACH IF NOT EXISTS '${DATABASE_URL}' AS mydb (TYPE POSTGRES, READ_ONLY);DuckDB Extensions
Section titled “DuckDB Extensions”Make use of any DuckDB extension within Shaper.
There are extensions to do HTTP calls and query data sources from Snowflake, Big Query and Clickhouse to Arrow Flight and Google Sheets.
Learn about all existing core and community extensions in the DuckDB documentation.
If you run into any issues with extensions or have any questions, don’t hesitate to reach out.
The many extensions are installed and loaded automatically when they are first used.
Otherwise you can use INSTALL and LOAD with Shaper’s init-sql functionality (see above).
Change the directory extensions are installed in using the --duckdb-ext-dir flag.