A Ready-to-Deploy Data Science Project
A template for Data Science and Data Analytics projects
DSForge is a template designed to streamline the setup of Data Science projects. It includes separate environments for development and production, each with distinct purposes and configurations. The development environment leverages Jupyter Notebook for exploratory data analysis, while the production environment integrates tools like Streamlit and Airflow for automated workflows and user interfaces.
- `data`: Contains raw, processed, and final data in three distinct subfolders:
  - `raw`: Unprocessed source data.
  - `processed`: Cleaned and prepared data.
  - `final`: Aggregated or final datasets.
- `docs`: Contains documentation and additional resources.
- `models`: Stores trained models for predictions or further analysis.
- `notebooks`: Contains Jupyter notebooks for development, data exploration, and model training.
- `scripts`: Contains Python scripts used for automated data processing or analysis.
- `tests`: Contains unit tests and other test files for the project’s codebase.
- `shared`: Acts as a shared workspace between development and production, hosting the `data` directory.

- `manager.sh` script for container management.
- `notebooks` folder for development work.
- `shared/data` folder as `/data` in the container for seamless access to `raw`, `processed`, and `final` data.
- `requirements.txt`.
- `start`, `stop`, `restart`, `status`, `logs`, and `build`.

(To be implemented)
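
The `raw`, `processed`, and `final` folders described above can be located from a notebook or script with a small path helper. The following is a minimal sketch, not part of the template: the function name `data_dir`, the constants, and the fallback to `shared/data` when `/data` does not exist are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# The three stage names come from the folder layout described above.
STAGES = ("raw", "processed", "final")

# Assumption: inside the container the data is reachable at /data, while on
# the host it sits in shared/data relative to the project root.
CONTAINER_ROOT = Path("/data")
HOST_ROOT = Path("shared") / "data"


def data_dir(stage: str, root: Optional[Path] = None) -> Path:
    """Return the directory for one data stage (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    if root is None:
        # Prefer the container path when it exists, else fall back to the host path.
        root = CONTAINER_ROOT if CONTAINER_ROOT.is_dir() else HOST_ROOT
    return root / stage
```

Passing `root` explicitly keeps the helper usable in tests or in layouts that differ from the one assumed here.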
- `shared/data`.
- `manager.sh` script for container lifecycle management.
- `raw`, `processed`, and `final` data structure.
- `data` directory between development and production environments.
- `raw`.

`./manager.sh build`

`./manager.sh start`
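
If the build and start steps need to be triggered from Python (for example from a test or a scheduled job), they could be wrapped as sketched below. This is an illustration under assumptions: the helper names are hypothetical, the script is assumed to sit in the current working directory, and it is assumed to signal success with exit code 0. Only the subcommand names are taken from this README.

```python
import subprocess
from typing import List

# Subcommands named in this README.
MANAGER_COMMANDS = ("start", "stop", "restart", "status", "logs", "build")


def manager_argv(command: str, script: str = "./manager.sh") -> List[str]:
    """Build the argument list for one manager.sh subcommand (hypothetical helper)."""
    if command not in MANAGER_COMMANDS:
        raise ValueError(f"unsupported command {command!r}")
    return [script, command]


def run_manager(command: str, script: str = "./manager.sh") -> int:
    """Run the subcommand and return its exit code."""
    # check=False: the caller decides how to react to a non-zero exit code.
    return subprocess.run(manager_argv(command, script), check=False).returncode


def build_and_start() -> bool:
    """Run build, then start only if build succeeded (assumes 0 means success)."""
    return run_manager("build") == 0 and run_manager("start") == 0
```

Validating the subcommand before spawning a process means a typo fails fast with a clear error instead of depending on how the script handles unknown arguments.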
- `data/raw` folder.
- `notebooks` folder to process and clean data, saving results in `data/processed`.
- `data/final`.
- `scripts` directory with reusable Python functions for data transformations.
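
A reusable transformation in `scripts` might look like the sketch below, which reads a CSV file from `data/raw`, cleans it, and writes the result to `data/processed`. The function names, the cleaning rule (trim whitespace, drop rows with any empty field), and the CSV format are assumptions chosen for illustration; the template does not prescribe them. Only the standard library is used, so the example runs without extra dependencies.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List


def clean_rows(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Strip whitespace from every value and drop rows with any empty field."""
    cleaned = []
    for row in rows:
        # Missing trailing fields arrive as None from csv.DictReader.
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def process_csv(raw_path: Path, processed_path: Path) -> int:
    """Clean one CSV file and write it out; return the number of rows written.

    Assumes no row has more fields than the header.
    """
    with raw_path.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = clean_rows(reader)
    # Create the output folder on first use.
    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with processed_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping `clean_rows` separate from the file handling lets a notebook call it on in-memory rows and lets the `tests` folder cover it without touching the disk.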