A template creator for Python-based analytics projects.
This is highly untested and will probably destroy everything inside your computer. While it hasn't broken my Debian or OSX yet, there are probably some edge cases that will break things. Use at your own risk.
The repo contains two primary scripts for creating project templates:
- aq-init -- creates a somewhat overkill project template with a bunch of submodules and a bunch of other stuff typical of model-based analytics projects
- aq-minit -- creates a minimal project template with a single module and a few other things, good for iterating on small scripts and such
The ideal usage is to symlink the scripts into a bin directory on your PATH so that you can call them from anywhere and create a new project wireframe at will, with some nice utilities and metadata based on a few prompted details. This is heavily influenced by the cookiecutter data science template; my attempt is a more personalized version that better fits my own workflow and tooling.
Some core features included in both scripts are:
- poetry project initialization (manual setuptools initialization if poetry is not available)
- pyproject.toml file with project metadata
- README.md with a wireframe project description template
- LICENSE file with MIT/Unlicense text generation options and metadata
- .gitignore with some basic Python-related ignores
- .env for local environment variables (.gitignored)
- Makefile for running project-related tasks (e.g., creating the environment, running tests, etc.)
- __init__.py files in the main module and submodules (only in the main module for aq-minit)
- tests directory intended for coverage and pytest testing
- src/<project_name> directory for the main module
- data directory
- scripts/check_environment.py script for testing base needs for the environment
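The .env file is meant for local settings that stay out of version control. As a rough illustration of how such a file could be read, here is a minimal sketch that uses only the standard library. It is not what the templates ship: python-dotenv is the more robust option, and the function name parse_env_text and the sample keys are made up for this example.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (simplified)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Hypothetical contents of a project's .env file.
sample = """
# local settings, never committed
DATA_DIR=./data
API_TOKEN="not-a-real-token"
"""
settings = parse_env_text(sample)
print(settings)
```

This deliberately ignores things like multi-line values, export prefixes, and variable interpolation, which a real dotenv parser handles.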
Some additional features included in the full setup (aq-init) are:
- docs directory with a basic Sphinx template
- notebooks directory for Jupyter/quarto notebooks
- src/<project_name> directory with a basic module structure
- data subdirectories for raw, interim, processed, and external data
- reports directory (with subdirectories for figures and tables)
- references directory
- configs directory
- assets directory
- build directory (with subdirectories for models and temporary build files)
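One way to make use of the data subdirectories from project code is to resolve them once from the project root instead of hardcoding paths in every script. The sketch below assumes the layout listed above; the helper name data_paths and the placeholder root my_project are illustrative and not part of the generated template.

```python
from pathlib import Path

# Stage names taken from the data subdirectories listed above.
DATA_STAGES = ("raw", "interim", "processed", "external")


def data_paths(project_root: Path) -> dict[str, Path]:
    """Map each data stage to its directory under <project_root>/data."""
    return {stage: project_root / "data" / stage for stage in DATA_STAGES}


# "my_project" is a placeholder for wherever the project was created.
paths = data_paths(Path("my_project"))
print(paths["raw"])
```

Scripts under src/<project_name> could then read from paths["raw"] and write to paths["processed"] without caring where the project lives on disk.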
This should mostly work on UNIX-like systems (e.g., Linux, OSX, etc.). I haven't tested it properly on Windows, and it probably won't work there. It requires a proper bash-like interpreter and some Python, and ideally you'd also have access to make to be able to benefit from the created project utilities.
Some python packages you'll need to have for project initiation are:
- toml
- packaging
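If you want to confirm up front that these packages are importable in the Python you plan to use, something like the following would do. It is a small standalone sketch, not part of the repo, and the helper name missing_packages is made up.

```python
import importlib.util

# The two packages named above as requirements for project initiation.
REQUIRED = ("toml", "packaging")


def missing_packages(names=REQUIRED) -> list[str]:
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Anything reported as missing can be installed with your package manager of choice before running the scripts.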
The created project templates will prioritize / prefer:
- conda for environment creation for specific Python versions
- poetry for project initialization and package management
- pytest for unit testing
- coverage for test coverage reporting
- sphinx for documentation generation
- quarto for notebooks
- black for code formatting
- isort for import sorting
- flake8 for linting
- pyright for static type checking
Some of these have fallback alternatives set, others do not.
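The general idea of preferring one tool and falling back to another can be sketched with shutil.which, which looks an executable up on PATH. This is only an illustration of the pattern: the actual scripts are shell-based and may detect tools differently, and the candidate orderings below are assumptions rather than the fallbacks the scripts really use.

```python
import shutil


def first_available(candidates):
    """Return the first executable name found on PATH, or None if none are."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


# Preferred tool first; these fallback orderings are assumptions.
env_tool = first_available(["conda", "python3", "python"])
packaging_tool = first_available(["poetry", "pip", "pip3"])
print("environment tool:", env_tool)
print("packaging tool:", packaging_tool)
```

A result of None means nothing in the list was found, which is the case where a tool without a fallback would simply be unavailable.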
If you really want to risk trying it, it is recommended to test it first in a docker container or something before deploying anywhere real so that you're aware of how it works and what it might break. For the brave, to get started first clone the repo:
    git clone https://github.com/JPK85/analysis_quickstart
or download a release with wget or curl and unpack it with tar. First create a directory where you want to unpack the contents, cd into it, and then use either wget:
    wget https://github.com/JPK85/analysis_quickstart/releases/download/v0.1.0/aq.tar.gz
or curl:
    curl -LO https://github.com/JPK85/analysis_quickstart/releases/download/v0.1.0/aq.tar.gz
And extract:
    tar -xzf aq.tar.gz
Then cd to the project root and make the main script executable:
    chmod +x ./aq-init
Or if you wish to use the lite-version aq-minit, make that executable:
    chmod +x ./aq-minit
Both of these will make the subscripts executable as well, and using both is okay too.
I would strongly recommend symlinking whichever script you wish to use to a directory in your PATH for ease of use, e.g.:
    ln -s /path/to/repo/aq-init /usr/local/bin/aq-init
or, for the lite-version:
    ln -s /path/to/repo/aq-minit /usr/local/bin/aq-minit
The script will attempt to handle paths relative to the current working directory, so you can run the symlinked script from anywhere you have access to. cd into the desired parent folder of where you want your newfound project, and run:
    aq-init <my_project>
where <my_project> is the name of the project you want to create. This will prompt you for some metadata, create the project template in a subdirectory with the project name, and initialize a local git repository with all the generated files committed. If you want to use the lite-version, just replace aq-init with aq-minit in the above command.
Once the wireframe has been created, cd into the project directory. The ideal start sequence (assuming conda) is:
- make create_env -- creates a virtual conda environment for the project using the Python version stated on metadata creation;
- conda activate <project_name> -- activates the environment before other setup procedures;
- make setup-all -- installs the project in editable mode with all dependencies including some standard dev dependencies like pynvim and python-dotenv;
- make test_env -- runs the scripts/check_environment.py script to check that the environment is set up correctly;
- make help -- lists all the available make commands for the project so you can see what's what
- Go to src/<project_name> and start going to town.
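To give a feel for the kind of check the test_env step performs, here is a minimal sketch of a Python version check. It is not the contents of scripts/check_environment.py; the function name check_python and the default minimum of 3.9 are assumptions chosen for the example.

```python
import sys


def check_python(minimum=(3, 9), current=None):
    """Return (ok, message) for a minimum (major, minor) Python version."""
    if current is None:
        current = tuple(sys.version_info[:2])
    found = ".".join(map(str, current))
    wanted = ".".join(map(str, minimum))
    # Tuples compare element by element, so (3, 11) >= (3, 9) is True.
    if tuple(current) >= tuple(minimum):
        return True, f"Python {found} satisfies >= {wanted}"
    return False, f"Python {found} is older than the required {wanted}"


ok, message = check_python()
print(message)
```

A real check script would typically exit with a non-zero status when the check fails so that make reports the failure.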
If you don't have access to make, you can also set up the environment manually, which is also useful if you don't like poetry and want more control over your environment.
The full setup (aq-init) creates the following structure:
.
├── .env  # for local environmental variables (.gitignored)
├── .git
├── .gitignore  # for gitignore (basic ignores and template related files)
├── LICENSE
├── Makefile  # for project utilities (e.g., environment creation, testing, etc.)
├── README.md  # for project description, initialized with metadata
├── assets  # for images, etc.
├── build  # for build artifacts (consider .gitignoring)
│   ├── models  # for production model artifacts and versions (consider .gitignoring)
│   └── tmp  # for temporary build files (.gitignored by default)
├── configs  # for local config files (.gitignored by default)
├── data  # for data files (.gitignored by default; use dvc for data)
│   ├── external  # from external sources
│   ├── interim  # for intermediate/transformed data
│   ├── processed  # for final, canonical data
│   └── raw  # for raw immutable source data
├── docs  # Sphinx documentation
│   ├── Makefile
│   ├── _build
│   ├── make.bat
│   └── source
│       ├── _static
│       ├── _templates
│       ├── conf.py
│       └── index.rst
├── notebooks  # exploratory notebooks
├── pyproject.toml  # project metadata
├── references  # local project references (.gitignored by default)
├── reports  # for project reports and their assets
│   ├── figures
│   └── tables
├── src  # source code
│   └── <project_name>  # main module
│       ├── __init__.py
│       ├── data  # scripts for data processing
│       │   ├── README.md
│       │   └── __init__.py
│       ├── features  # scripts for features/transformations
│       │   ├── README.md
│       │   └── __init__.py
│       └── models  # scripts for model training and evaluation
│           ├── README.md
│           └── __init__.py
├── test_env.py  # test check for environment handling
└── tests  # unit tests
    └── __init__.py
And for the lite-version we have a little less gunk:
.
├── .env
├── .git
├── .gitignore
├── LICENSE
├── Makefile
├── README.md
├── data
│   └── .gitkeep
├── pyproject.toml
├── scripts
│   └── check_environment.py
├── src
│   └── <project_name>
│       └── __init__.py
└── tests
    └── __init__.py
- cookiecutter data science template -- for the original idea, and the contents of the created .env file + the Makefile help function
The Unlicense -- see LICENSE for details.
