Run Your Data Project
Run Your Project
First, change into the project directory you just created.
Then run the following command to start the end-to-end data pipeline you’ve just created:
sidetrek start
If you’re running this command for the first time, it will take a few minutes to download all the necessary Docker images. Superset in particular is quite heavy and can take a while to start, so please be patient! 🧘🏽‍♀️
There are two places where you can check the status of your services:
- Dagster server: you’ll see its logs in the terminal as soon as you run sidetrek start.
- Docker Desktop: you can check the status of all the other services in the Docker Desktop app.
Once everything is running, you can access the Dagster dashboard at http://localhost:3000 and the Superset dashboard at http://localhost:8088.
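If you’d rather script the wait than keep refreshing the browser, here is a minimal sketch that polls a URL with curl until it responds. The function name wait_for_url, the 300-second default timeout, and the 2-second polling interval are our own assumptions, not part of Sidetrek.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sidetrek): poll a URL until it answers.
set -euo pipefail

# wait_for_url URL [TIMEOUT_SECONDS]
# Returns 0 once URL responds with a non-error HTTP status, 1 on timeout.
wait_for_url() {
  local url="$1" timeout="${2:-300}" waited=0
  # -s: silent, -f: treat HTTP >= 400 as failure, --max-time: cap each attempt.
  # Redirects (e.g. to a login page) count as "up" because -L is not used.
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up on $url after ${waited}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$url is up"
}
```

After sourcing it, wait_for_url http://localhost:3000 && wait_for_url http://localhost:8088 returns once both dashboards answer, or fails after the timeout.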
What’s Happening Underneath?
Underneath, this command simply runs three separate commands:
1. dagster dev to start the Dagster server. This runs the user code (Dagster, Meltano, and DBT).
2. docker-compose up -d in the project root to run Minio, Iceberg, and Trino in the background.
3. docker-compose up -d inside the superset directory to run Superset in the background.
Step 1 runs all of your user code, and step 2 runs all the other core services of your data pipeline.
Step 3 runs Superset separately because it isn’t part of the core data pipeline. This way, you can easily swap it out for another data visualization solution.
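To make the three steps concrete, here is a minimal sketch of running them by hand. It is not Sidetrek’s actual implementation: it assumes you run it from the project root with the superset directory directly inside it, and it starts the two background stacks before dagster dev so that Dagster’s logs stay in the foreground. The run and run_in helpers and the DRY_RUN variable are hypothetical, added only so you can preview the commands.

```shell
#!/usr/bin/env bash
# Hypothetical manual equivalent of `sidetrek start` (not Sidetrek's code).
# Assumes: run from the project root; Superset's compose file is in ./superset.
set -euo pipefail

# Print a command, then execute it unless DRY_RUN=1 (our own variable).
run() {
  echo "$*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Same as run, but executes the command inside the given directory.
run_in() {
  local dir="$1"
  shift
  echo "(in $dir) $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    (cd "$dir" && "$@")
  fi
}

start_pipeline() {
  # Step 2: core services (Minio, Iceberg, Trino) detached in the background.
  run docker-compose up -d
  # Step 3: Superset detached in the background, from its own directory.
  run_in superset docker-compose up -d
  # Step 1: Dagster server in the foreground, so its logs stay in this terminal.
  run dagster dev
}
```

After sourcing the file, DRY_RUN=1 start_pipeline only prints the three commands, while start_pipeline actually runs them. Keeping Superset in its own call mirrors the design above: swapping visualization tools means changing a single line.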
Common Issues
Port collisions
Because Sidetrek runs multiple services on multiple ports, you may get a bind: address already in use error if another process is already using one of these ports.
If you encounter this error, free the occupied port before proceeding. Run sudo lsof -i :<port-number> to find the PID of the process occupying the port, then run sudo kill <PID> to free it up.
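To catch collisions before running sidetrek start, you could check the ports up front with a small helper like the sketch below. It reuses the lsof lookup described above; check_ports is a hypothetical name, and it defaults to 3000 (Dagster) and 8088 (Superset) only because those are the ports this guide names, so pass any other ports your services use as arguments. Run without sudo, lsof may not see processes owned by other users.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sidetrek): report which ports are taken.
set -euo pipefail

# check_ports [PORT...]  (defaults to 3000 and 8088)
check_ports() {
  if ! command -v lsof >/dev/null 2>&1; then
    echo "lsof not found; cannot check ports" >&2
    return 1
  fi
  # Fall back to the two ports named in this guide when none are given.
  if [ "$#" -eq 0 ]; then
    set -- 3000 8088
  fi
  local port pids
  for port in "$@"; do
    # -t: print PIDs only; -n/-P: skip DNS and port-name lookups.
    # lsof exits non-zero when nothing matches, hence "|| true".
    pids="$(lsof -t -n -P -i :"$port" 2>/dev/null | tr '\n' ' ' || true)"
    if [ -n "$pids" ]; then
      echo "port $port: in use by PID(s) ${pids% } (free it with: sudo kill <PID>)"
    else
      echo "port $port: free"
    fi
  done
}
```

For example, check_ports 3000 8088 prints one status line per port. The helper only reports; it deliberately leaves the kill step to you so nothing is terminated by accident.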
Questions or Problems?
If you have any questions or problems, please don’t hesitate to reach out to us on Slack or on GitHub.
We’ll do our best to help!
Next Steps
Awesome job! You’ve successfully created and run your first data project.
You can now start using this data pipeline to work with your own data or explore the example project we’ve included.
If you’re interested in exploring the example project, let’s move on to the next step.