A ready-to-deploy Data Science project
A template for Data Science and Data Analytics projects
DSForge is a template designed to streamline the setup of Data Science projects. It includes separate environments for development and production, each with its own purpose and configuration. The development environment uses Jupyter Notebook for exploratory data analysis, while the production environment integrates Streamlit for user interfaces and Airflow for automated workflows.
data: Contains raw, processed, and final data in three distinct subfolders:
  raw: Unprocessed source data.
  processed: Cleaned and prepared data.
  final: Aggregated or final datasets.
docs: Contains documentation and additional resources.
models: Stores trained models for predictions or further analysis.
notebooks: Contains Jupyter notebooks for development, data exploration, and model training.
scripts: Contains Python scripts used for automated data processing or analysis.
tests: Contains unit tests and other test files for the project’s codebase.
shared: Acts as a shared workspace between development and production, hosting the data directory.

manager.sh script for container management.
notebooks folder for development work.
shared/data folder as /data in the container for seamless access to raw, processed, and final data.
requirements.txt.
start, stop, restart, status, logs, and build.
(To be implemented)
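The folder layout listed above can be created with a short Python script. This is a minimal sketch, not part of DSForge itself: it assumes the raw, processed, and final subfolders live under shared/data (since the shared folder hosts the data directory) and that this folder appears as /data inside the container. The function names scaffold and resolve_data_root are hypothetical.

```python
from pathlib import Path

# Assumed layout: top-level folders from this README, with the data
# subfolders placed under shared/data (hypothetical helper, not DSForge code).
TOP_LEVEL_DIRS = ["docs", "models", "notebooks", "scripts", "tests"]
DATA_SUBDIRS = ["raw", "processed", "final"]


def scaffold(project_root: Path) -> list[Path]:
    """Create the project folders if missing and return their paths."""
    paths = [project_root / name for name in TOP_LEVEL_DIRS]
    paths += [project_root / "shared" / "data" / sub for sub in DATA_SUBDIRS]
    for path in paths:
        # parents=True also creates shared/ and shared/data/;
        # exist_ok=True makes re-running the script harmless.
        path.mkdir(parents=True, exist_ok=True)
    return paths


def resolve_data_root(project_root: Path, container_mount: Path = Path("/data")) -> Path:
    """Prefer the container mount (assumed /data); fall back to shared/data on the host."""
    if container_mount.is_dir():
        return container_mount
    return project_root / "shared" / "data"
```

Calling scaffold(Path.cwd()) from the project root creates any missing folders and leaves existing ones untouched, so it is safe to re-run. resolve_data_root lets the same notebook or script code find the data both inside the container and on the host.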
shared/data.
manager.sh script for container lifecycle management.
raw, processed, and final data structure.
data directory between development and production environments.
raw.
./manager.sh build
./manager.sh start
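The README does not show how manager.sh is implemented. Purely as an illustration, the Python sketch below assumes the script wraps Docker Compose and maps each listed command (start, stop, restart, status, logs, build) to a docker compose subcommand. The mapping and the names compose_argv and main are hypothetical.

```python
import subprocess
import sys

# Hypothetical mapping from manager.sh commands to Docker Compose arguments.
# The command names come from the README; the Compose equivalents are assumptions.
COMMANDS: dict[str, list[str]] = {
    "build": ["build"],
    "start": ["up", "--detach"],
    "stop": ["stop"],
    "restart": ["restart"],
    "status": ["ps"],
    "logs": ["logs", "--follow"],
}


def compose_argv(action: str) -> list[str]:
    """Translate a manager command into a full docker compose argument list."""
    try:
        return ["docker", "compose", *COMMANDS[action]]
    except KeyError:
        valid = ", ".join(COMMANDS)
        raise ValueError(f"unknown command {action!r}; expected one of: {valid}") from None


def main(argv: list[str]) -> int:
    """Validate the command line, then run Docker Compose and return its exit code."""
    if len(argv) != 2:
        choices = "|".join(COMMANDS)
        print(f"usage: {argv[0]} <{choices}>", file=sys.stderr)
        return 2
    try:
        cmd = compose_argv(argv[1])
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    # Only reached for a valid command; propagates Docker Compose's exit status.
    return subprocess.run(cmd).returncode
```

Calling sys.exit(main(sys.argv)) from a __main__ guard would give a Python entry point with the same shape as ./manager.sh start. Keeping the mapping in one dictionary makes the set of supported commands easy to check and extend.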
data/raw folder.
notebooks folder to process and clean data, saving results in data/processed.
data/final.
scripts directory with reusable Python functions for data transformations.
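The workflow above (raw data in data/raw, cleaned results in data/processed, outputs in data/final, reusable logic in scripts) can be sketched with the standard library alone. Everything here is an assumption for illustration: the CSV format, the category and value columns, and the helper names clean_rows, aggregate, and run_pipeline are hypothetical, and a real project might use pandas instead.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical helpers for the scripts folder (e.g. scripts/transforms.py).
# Assumes CSV files with "category" and "value" columns.


def clean_rows(rows):
    """Drop rows with an empty category or non-numeric value; normalise fields."""
    cleaned = []
    for row in rows:
        category = (row.get("category") or "").strip().lower()
        try:
            value = float(row.get("value", ""))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        if category:
            cleaned.append({"category": category, "value": value})
    return cleaned


def aggregate(rows):
    """Sum values per category, sorted by category name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["value"]
    return [{"category": c, "total": totals[c]} for c in sorted(totals)]


def read_csv(path: Path):
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_csv(path: Path, rows, fieldnames):
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(data_root: Path, name: str) -> Path:
    """raw/<name> -> processed/<name> -> final/<name>; returns the final file path."""
    cleaned = clean_rows(read_csv(data_root / "raw" / name))
    write_csv(data_root / "processed" / name, cleaned, ["category", "value"])
    final_path = data_root / "final" / name
    write_csv(final_path, aggregate(cleaned), ["category", "total"])
    return final_path
```

Because clean_rows and aggregate take and return plain rows, they can be imported from notebooks and covered by unit tests in the tests folder.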