Blog

Analytics Dashboards as Code

Shaper has always been about building dashboards using SQL.

And it’s only natural to manage SQL as files, just like any other code.

With the latest release of Shaper, you can now deploy dashboards from files
and live-preview dashboard changes.

You get to:

  • use your favorite editor - including its AI features
  • track dashboards in Git
  • collaborate with your team using pull requests
  • deploy dashboards via CI/CD
Screenshot of a Preview Dashboard

Getting started and sharing Shaper projects is now simpler than ever since everything is just files. Let’s give it a try:

  1. Clone this demo Git repository:
    Terminal window
    git clone git@github.com:taleshape-com/demo-project.git
  2. Install Shaper in the project directory:
    Terminal window
    make install
  3. Start a local Shaper server to serve the dashboards:
    Terminal window
    make serve
  4. Deploy the two dashboards in the dashboards/ folder by running the deploy command in a second terminal:
    Terminal window
    make deploy
  5. Live-preview changes by running the dev file watcher in the second terminal:
    Terminal window
    make dev

    Now edit or create any SQL file in the dashboards/ folder and see the changes live in your browser.

Usually you will run Shaper on a server. Then you only need shaper dev locally when developing dashboards. Once you have configured authentication for your Shaper instance, the dev command will automatically prompt you to log in and authenticate. And instead of running deploy manually, you can use the Shaper GitHub Action to deploy dashboards automatically in your CI/CD pipeline.

All details are in the documentation.

And as always - we are happy to hear any feedback via socials or GitHub issues and discussions!

Getting Started Building a Data Platform

Ever wonder what a data platform is and if your company needs one?

If the idea of hiring a data team to build and manage an enterprise data platform feels overwhelming, you’re not alone.

Let me break down how you can get started from zero and build up data capabilities at your company, one step at a time.

Nowadays every company is a data company. From marketing and sales to product usage and customer support, all aspects of your business generate data.

And that data is waiting to be activated. Turn it into reports to drive decisions. Build dashboards to support operations. Offer new product features and services to your customers that are directly driven by data and automation.

Getting started is much more a cultural challenge than a technical one.

Start small. Make sure you see first successes by doing the work manually without worrying about big investments in making it scalable.

But once you see concrete value and it becomes painfully clear that technology is holding you back, you know it’s time to build out your data capabilities.

Where do you go from here? Can you buy an off-the-shelf solution? Do you hire a data engineer? Do you need a dedicated data team or can your existing engineers handle it?

You can break down data infrastructure into four main layers:

  1. Ingestion: Connect data sources, extract data and load it into a central repository
  2. Storage: Store data in a structured format that is optimized for analytical workloads
  3. Transformation: Clean, enrich, and transform data to make it practical to work with and ensure consistent definitions of key metrics across the organization
  4. Business Intelligence: Create dashboards, reports, and alerts to share insights internally and with your customers and partners

There are many different tools to address all of these layers.

Focus on solving concrete problems you are experiencing and add new tools only when they directly solve a problem.

Start with querying your data directly where it is. Introduce tools to load data into a central repository only when the complexity and volume make this impractical.

You don’t need to address all data sources at once. Focus on the ones creating problems. Accept manual workarounds when practical.

Your data tooling should be able to query data across different data sources. You don’t need to worry about ingestion if directly querying a Postgres database and a Google Sheet gets the job done.

Chances are you already store your data in a database such as PostgreSQL or MySQL. If you’re not having performance issues, there’s no need to introduce a separate database for analytical workloads.

Only if performance or cost becomes an issue should you start addressing it.

Storage is a critical component since it’s where the actual data lives. Data outlives applications built on top of it. Pick an established and open standard to store data.

Keep in mind that there is no one-size-fits-all solution. You might need multiple data stores optimized for different use cases. You’ll know what you’re looking for when you act on concrete problems instead of trying to find a solution for hypothetical future problems.

Start delivering value before adding a separate data transformation step. Introduce a dedicated data transformation layer when queries start taking too long, or metrics become unreliable and hard to maintain because the same logic is repeated in many places.

A few materialized views in your database can take you a long way.
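
As a rough sketch of what that can look like in PostgreSQL (the table and column names here are made up for illustration):

-- Hypothetical example: pre-aggregate daily revenue so reports don't scan the raw orders table
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
  date_trunc('day', ordered_at) AS day,
  sum(amount) AS revenue
FROM orders
GROUP BY 1;

-- Refresh periodically, for example from a scheduled job
REFRESH MATERIALIZED VIEW daily_revenue;

Dashboards then query daily_revenue instead of recomputing the aggregation on every load.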

You’ll know it’s time to look into real-time stream processing, data lineage and orchestration tools once you experience the issues that these tools are designed to solve.

Many software companies start out by building custom analytics features. As you use data to drive operations and user-facing functionality, building custom solutions for every new workflow and view on the data becomes slow and expensive.

Introduce a data visualization tool to quickly build analytics dashboards and reports. This is a great first step and enables a single data analyst to deliver a lot of value without introducing any other data infrastructure.

I built Shaper to help companies in exactly this situation.

Shaper is a simple interface on top of DuckDB that allows you to build analytics dashboards and automate data workflows with only SQL.

Thanks to DuckDB, it’s easy to query data across various sources ranging from databases to CSV files and Google Sheets.
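
For example, a single DuckDB query can combine a Postgres table with a local CSV file. This is only a sketch; the connection string, file name, and column names are placeholders:

-- Attach a Postgres database via DuckDB's postgres extension
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=app host=localhost user=readonly' AS app (TYPE postgres);

-- Join live application data with a CSV export in one query
SELECT u.plan, count(*) AS active_users
FROM app.public.users u
JOIN read_csv('active_sessions.csv') s ON s.user_id = u.id
GROUP BY u.plan;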

You can go a long way before having to add more layers to your data stack.

Give it a try and let me know what you think.

Build Your Own Bluesky Analytics Dashboard

Are you using Bluesky and want to stay on top of what’s happening? Are you curious how you can use Shaper to pull data from APIs and build interactive dashboards, all in a single tool and with just SQL?

Let’s automate pulling post data from the Bluesky API to track topics we are interested in, and then create a data dashboard that visualizes activity around these topics.

You will get a dashboard that looks like this:

Screenshot of the Bluesky Analytics dashboard
  1. Let’s open a terminal, create a new directory, and change into it:
    Terminal window
    mkdir bluesky-dashboard && cd bluesky-dashboard
  2. You will need credentials to authenticate with the Bluesky API.
    Create a Bluesky App Password and save it together with your handle as bluesky_credentials.json:

    Terminal window
    echo '{ "identifier": "", "password": "" }' > bluesky_credentials.json
  3. Now let’s run Shaper. The easiest way is to run it via Docker or npm:
    Terminal window
    docker run --rm -it -p5454:5454 -v ./bluesky_credentials.json:/bluesky_credentials.json -v ./data:/data taleshape/shaper
  4. Open http://localhost:5454 in your browser and click on New.
    Now let’s create a Task to fetch posts from Bluesky and store them in a database table. Select Task in the dropdown at the top of the page and paste in the following SQL code:

    -- Schedule: run again at the start of the next hour
    SELECT (date_trunc('hour', now()) + INTERVAL '1h')::SCHEDULE;

    -- The http_client community extension lets us call the Bluesky API from SQL
    INSTALL http_client FROM community;
    LOAD http_client;

    -- Table to store the fetched posts
    CREATE SCHEMA IF NOT EXISTS bsky;
    CREATE TABLE IF NOT EXISTS bsky.posts (
      topic VARCHAR,
      created_at TIMESTAMP,
      cid VARCHAR,
      author_handle VARCHAR,
      url VARCHAR,
      text VARCHAR,
      like_count INT,
      reply_count INT,
      quote_count INT,
      repost_count INT,
      loaded_at TIMESTAMP DEFAULT now(),
    );

    -- Log in to Bluesky and keep the access token in a variable
    SET VARIABLE access_jwt = http_post(
      'https://bsky.social/xrpc/com.atproto.server.createSession',
      headers => MAP {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
      },
      body => (SELECT c FROM './bluesky_credentials.json' c)
    ) ->> 'body' ->> 'accessJwt';

    -- Topics to track and their search query strings
    WITH topics AS (
      SELECT col0 AS topic, col1 AS query_string FROM (
        VALUES
          ('DuckDB', 'duckdb'),
          ('Data Engineering', '"data-engineering" "data engineering" "dataengineering"'),
          ('#databs', '#databs'),
      )
    ),
    -- For each topic, find when posts were last loaded (default: 30 days back)
    topics_with_ts AS (
      SELECT
        topic,
        query_string,
        coalesce(max(loaded_at), (now() - INTERVAL '30 days')::TIMESTAMP) as last_loaded_at,
      FROM topics LEFT JOIN bsky.posts USING(topic)
      GROUP BY ALL
    ),
    -- Query the Bluesky search API and unnest the returned posts
    json_posts AS (
      SELECT
        topic,
        (http_get(
          'https://bsky.social/xrpc/app.bsky.feed.searchPosts',
          headers => MAP {
            'Accept': 'application/json',
            'Authorization': concat('Bearer ', getvariable('access_jwt')),
          },
          params => MAP {
            'q': query_string,
            'limit': '100',
            'since': strftime(last_loaded_at, '%Y-%m-%dT%H:%M:%SZ'),
          }
        ) ->> 'body' -> '$.posts[*]').unnest() AS p
      FROM topics_with_ts
    )
    -- Parse the JSON posts and insert them into the table
    INSERT INTO bsky.posts BY NAME (
      SELECT
        topic,
        (p ->> '$.record.createdAt')::TIMESTAMP AS created_at,
        p ->> 'cid' AS cid,
        p ->> '$.author.handle' AS author_handle,
        concat('https://bsky.app/profile/', author_handle, '/post/', split_part(p ->> 'uri', '/', -1)) AS url,
        p ->> '$.record.text' AS text,
        (p -> 'likeCount')::INT AS like_count,
        (p -> 'replyCount')::INT AS reply_count,
        (p -> 'quoteCount')::INT AS quote_count,
        (p -> 'repostCount')::INT AS repost_count,
      FROM json_posts
    );
    The task is configured to run every hour and fetch new posts for the topics “DuckDB”, “Data Engineering”, and “#databs”.
    Replace the topics with your own topics.
    Then click Run to try out the task. If the task runs successfully, click Create and save it as Fetch Bluesky Posts.
  5. With the first data loaded, we can now create a dashboard to visualize the data.
    Click on New again and paste in the following SQL code:

    -- Dashboard title
    SELECT 'Bluesky Analytics'::SECTION;

    -- Date range picker bounded by the available data
    SELECT
      min(created_at)::DATE::DATEPICKER_FROM AS start_date,
      max(created_at)::DATE::DATEPICKER_TO AS end_date,
    FROM bsky.posts;

    -- Multi-select dropdown to filter by topic
    SELECT 'Topics'::LABEL;
    SELECT distinct topic::DROPDOWN_MULTI AS topics FROM bsky.posts;

    -- All widgets below read from this filtered view
    CREATE TEMP VIEW posts AS (
      FROM bsky.posts
      WHERE topic in getvariable('topics')
        AND created_at BETWEEN getvariable('start_date')
        AND getvariable('end_date')
    );

    -- CSV download button plus a table of the filtered posts
    SELECT concat('bluesky_posts_', today())::DOWNLOAD_CSV AS CSV;
    SELECT * FROM posts;

    -- Headline numbers
    SELECT count(distinct cid) AS 'Total Posts Overall' FROM posts;
    SELECT
      count() AS 'Total Posts',
      topic AS Topic,
    FROM posts GROUP BY topic ORDER BY ALL DESC;

    -- Stacked bar chart of posts per day and topic
    SELECT 'Posts per Day'::LABEL;
    SELECT
      topic::CATEGORY,
      date_trunc('day', created_at)::XAXIS,
      count()::BARCHART_STACKED,
    FROM posts GROUP BY ALL ORDER BY ALL;

    SELECT ''::SECTION;

    -- Bar chart of the ten most active authors
    SELECT 'Top Posters'::LABEL;
    FROM (
      SELECT
        count(distinct cid)::BARCHART AS "Total Posts",
        author_handle::YAXIS,
      FROM posts GROUP BY ALL ORDER BY ALL DESC LIMIT 10
    ) ORDER BY ALL;

    -- Stacked bar chart of likes by hour of day
    SELECT 'Likes By Time of Day'::LABEL;
    SELECT
      topic::CATEGORY,
      date_trunc('hour', created_at)::TIME::XAXIS,
      sum(like_count)::BARCHART_STACKED,
    FROM posts GROUP BY ALL ORDER BY ALL;
    Now click Create and save the dashboard as Bluesky Analytics.
  6. Click on View Dashboard in the top right corner to have a better look at the whole dashboard.

And you are done! Please reach out and ask questions - I would love to see what you built.

Shaper is open source and free to use. It’s simple to run on your own server, so you can easily share dashboards with others. Find out more on GitHub.

Turn Your DuckDB Projects Into Interactive Dashboards

DuckDB is awesome and it’s a great tool to explore and transform data. But DuckDB doesn’t help you visualize and share data with others.

That’s where Shaper comes in.

With Shaper you can build interactive dashboards completely in SQL.

Shaper is built on top of DuckDB and works with all your existing data and queries.
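
As a taste of what a dashboard definition looks like, here is a minimal, hypothetical example using the same kind of type casts as the Bluesky dashboard above (the data is inlined just for illustration):

-- Minimal dashboard sketch: inline data, one label, one bar chart
SELECT 'Signups per Day'::LABEL;
SELECT
  day::XAXIS,
  signups::BARCHART,
FROM (VALUES
  (DATE '2024-05-01', 12),
  (DATE '2024-05-02', 18),
  (DATE '2024-05-03', 9)
) t(day, signups)
ORDER BY day;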

Running Shaper on your laptop is as easy as running a single command:

Terminal window
npx shaper

Then open http://localhost:5454/new in your browser:

New Dashboard view

And running Shaper on a server is just as simple.

You can then connect Shaper directly to your production data and share dashboards either with simple links or by embedding dashboards directly into your application.

It’s all open source and free to use. So why not give it a try?

Why I am excited about Shaper's new Tasks feature

I just shipped the biggest update since I started building Shaper.

Tasks let you automate data workflows right within Shaper.

A few examples:

  • Load the latest data from a database or API
  • Transform data to speed up dashboard queries
  • Archive old data to S3
  • Send a notification to Slack for critical data insights
  • Email a monthly Excel report to your customers

With the power of DuckDB and its extensions, you can do all this and more with simple SQL queries.
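
To make that concrete, here is a rough sketch of what the S3 archiving example could look like as a Task, reusing the bsky.posts table from the Bluesky post above. The bucket name and retention period are placeholders, and it assumes DuckDB’s httpfs extension plus configured S3 credentials:

-- Hypothetical Task: archive posts older than 90 days to S3, then delete them locally
SELECT (date_trunc('day', now()) + INTERVAL '1 day')::SCHEDULE;  -- run once a day

INSTALL httpfs;
LOAD httpfs;

COPY (
  SELECT * FROM bsky.posts
  WHERE created_at < (now() - INTERVAL '90 days')::TIMESTAMP
) TO 's3://my-archive-bucket/bsky_posts.parquet' (FORMAT parquet);

DELETE FROM bsky.posts
WHERE created_at < (now() - INTERVAL '90 days')::TIMESTAMP;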

Since the beginning I wanted Shaper to be a complete data platform in a single tool - from ingesting and storing data to visualizing and sharing it.

The Tasks feature is the missing piece in the middle - transforming data into a shape that you can actually visualize and share.

You can think of Tasks as cron jobs with more flexible scheduling, integrated into the platform. But there are a lot of things Tasks don’t do and likely never will. Shaper’s goal is simplicity when starting out with data projects. Once your data needs become more complex, you can introduce a dedicated data processing stack to complement Shaper.

Find all the details in the docs.