Sidetrek Project Structure
Here’s the overall structure of a Sidetrek project.
"your_project" placeholder
We use "your_project" as a placeholder for the name of your project.
Root Directory
- .venv: All your project dependencies are installed here. Depending on your Poetry settings, this directory may not appear in the project root.
- .sidetrek: Sidetrek-specific configurations and metadata. It’s safe to push this directory to git.
- superset: Superset files. Superset is managed separately from the rest of the project in this directory for easy replacement.
- trino: Trino configuration files.
- your_project: All your project code goes here.
  - __init__.py: Initializes the your_project Python package.
  - dagster: Contains Dagster project.
  - data: Directory for example data files. Typically, you wouldn’t want to store your data in this repository - this is just for example data.
  - dbt: Contains DBT project.
  - meltano: Contains Meltano project.
- .env: Environment variables for the project.
- .gitignore: Specifies files and directories to be ignored by Git.
- docker-compose.yaml: Docker Compose file for core services like Trino, Iceberg, etc. Does NOT include Superset, which is managed separately in the /superset directory.
- poetry.lock: Lock file for Poetry managed dependencies.
- pyproject.toml: Project configuration for Poetry.
- README.md: Project overview and documentation.
- sidetrek.config.yaml: Configuration file for Sidetrek.
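As a quick sanity check, the root layout listed above can be verified with a few lines of Python. This is a minimal sketch and not part of Sidetrek itself: the function name missing_entries and the EXPECTED_ROOT_ENTRIES list are hypothetical, and the list only contains entries named on this page.

```python
from pathlib import Path

# Entries this page lists for the project root. ".venv" is left out because it
# may not be present, depending on Poetry settings. "your_project" is the
# placeholder used throughout this page; replace it with your project name.
EXPECTED_ROOT_ENTRIES = [
    ".sidetrek",
    "superset",
    "trino",
    "your_project",
    "docker-compose.yaml",
    "pyproject.toml",
    "sidetrek.config.yaml",
]


def missing_entries(root, expected=EXPECTED_ROOT_ENTRIES):
    """Return the expected entries that do not exist directly under `root`."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).exists()]
```

Calling missing_entries(".") from the project root returns an empty list when every listed entry is present.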
Why Are Trino and Superset Outside the Project Folder?
Trino and Superset are not included within the “your_project” folder because they are deployed as separate services rather than being part of the project code.
These tools are independently managed and deployed, ensuring that any changes made to the user code do not affect their deployment. This separation allows for independent updates and maintenance of Trino and Superset without impacting the core project, making your data project more modular and manageable.
“dagster” Directory
Contains Dagster project files.
- .logs_queue, .nux, .telemetry: Dagster-specific metadata and configurations.
- history: Stores historical data for runs.
  - runs: Directory for individual run data.
  - runs.db: SQLite database file for storing run history.
- logs: Directory for logs.
  - event.log: Log file for events.
- schedules: Directory for schedule data.
  - schedules.db: SQLite database for schedules.
- storage: Directory for storage files.
- your_project: Core Dagster project files.
  - __init__.py: Initializes the your_project module.
  - assets.py, dbt_assets.py, meltano.py: Python scripts defining various assets and integrations.
- your_project_tests: Contains tests for the Dagster project.
  - __init__.py: Initializes the your_project_tests module.
  - test_assets.py: Test cases for the assets.
- .env: Environment variables specific to Dagster.
- pyproject.toml, setup.cfg, setup.py: Configuration files for Dagster project.
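Because runs.db and schedules.db are SQLite files, they can be inspected with Python's standard library. The sketch below lists the tables in such a file without assuming anything about Dagster's schema. The helper name list_tables is hypothetical, and the exact location of the file in your project is an assumption based on the layout above.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted."""
    # Open the file read-only through a URI so that inspecting it cannot
    # modify it, and so that a missing file raises an error instead of
    # silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_tables("history/runs.db") run from inside the dagster directory would show which tables that file contains. Treat this as a read-only debugging aid rather than a supported interface.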
“data” Directory
Contains example data files for the project.
Do NOT put your actual data in this repository
Data typically should not be kept in the data project. It should live in proper storage elsewhere.
“dbt” Directory
Contains DBT project files.
- logs: Directory for log files.
- your_project: Core DBT project files.
  - analyses: Directory for analyses.
  - dbt_packages: Directory for DBT packages.
  - logs: Log files for DBT operations.
  - macros: Directory for DBT macros.
  - models: Directory for DBT models.
    - intermediate, marts, staging: Subdirectories for different stages of models.
  - seeds: Directory for seed data.
  - snapshots: Directory for snapshots.
  - target: Directory for target files.
    - compiled, dbt.log, graph.gpickle, manifest.json, etc.: Various files generated by DBT during runs.
  - tests: Directory for tests.
  - dbt_project.yml: DBT project configurations.
  - profiles.yml: This is where you define your query engine adapter configurations (e.g. Trino).
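The files in target are generated artifacts, but they can be useful for a quick look at what DBT compiled. As a rough sketch, the snippet below reads manifest.json and counts the entries under its "nodes" key by their "resource_type" field. The function name count_resource_types is hypothetical, and the snippet assumes the manifest has already been generated by a DBT run.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(manifest_path):
    """Count entries under the manifest's "nodes" key by resource type."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    # "nodes" maps unique IDs to node definitions; fall back to an empty
    # mapping so that a manifest without nodes yields an empty count.
    nodes = manifest.get("nodes", {})
    return Counter(node.get("resource_type", "unknown") for node in nodes.values())
```

For example, count_resource_types("target/manifest.json") run from the DBT project directory gives a per-type tally of the compiled nodes.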
“meltano” Directory
Contains Meltano project files.
- .meltano: This directory hosts Meltano-generated files as well as a virtual environment for each extractor and loader.
- .env, .gitignore: Environment variables and Git ignore file for Meltano.
- README.md: Documentation for the Meltano project.
- analyze, extract, load, notebook, orchestrate, output: Directories for different stages of ELT (Extract, Load, Transform) and analysis.
  - For example, you can add “example_csv_files_def.json” to the “extract” folder to define the files for the “tap-csv” extractor.
- plugins: Directory for Meltano plugins.
- transform: Directory for transformation files.
- meltano.yml: Main configuration file for Meltano.
- requirements.txt: Meltano-specific dependencies.
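To make the “example_csv_files_def.json” example more concrete, here is a minimal sketch that writes such a file from Python. It assumes the tap-csv files definition is a JSON list of objects with "entity", "path", and "keys" fields. The helper name write_csv_files_definition is hypothetical, so check the tap-csv documentation for the authoritative format.

```python
import json
from pathlib import Path

# Fields assumed to be required for each tap-csv file entry.
REQUIRED_FIELDS = {"entity", "path", "keys"}


def write_csv_files_definition(target, entries):
    """Validate the entries and write them to `target` as a JSON list."""
    for entry in entries:
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"entry {entry!r} is missing {sorted(missing)}")
    target_path = Path(target)
    # Create the parent folder (for example "extract") if it does not exist.
    target_path.parent.mkdir(parents=True, exist_ok=True)
    target_path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return target_path
```

Run from the meltano directory, a call such as write_csv_files_definition("extract/example_csv_files_def.json", [{"entity": "orders", "path": "orders.csv", "keys": ["id"]}]) would produce the file. The entity name, CSV path, and key column here are placeholders for your own values.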