decipher_batch_correction

This repository implements batch correction for Decipher, a deep generative model developed to compare and visualize diverging cell trajectories. Decipher was originally developed as a collaboration between the labs of Elham Azizi (Columbia), Dana Pe’er (Sloan Kettering), and David Blei (Columbia).

Overall Methodology

Two separate batch-correction approaches were implemented: simple concatenation and an attention mechanism.
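To make the difference between the two approaches concrete, here is a minimal, dependency-free sketch of how batch information could condition a latent vector in each case. It is not the code in this repository: the function names (`concat_condition`, `attention_condition`) and the per-batch key/value embeddings are hypothetical, and in a real model those embeddings would be learned parameters. Simple concatenation appends a one-hot batch vector to the latent vector; the attention variant uses the latent vector as a query in scaled dot-product attention over per-batch embeddings and appends the resulting context vector.

```python
import math
from typing import List, Tuple


def one_hot(batch_index: int, n_batches: int) -> List[float]:
    """Encode a batch label as a one-hot vector."""
    return [1.0 if i == batch_index else 0.0 for i in range(n_batches)]


def concat_condition(z: List[float], batch_index: int, n_batches: int) -> List[float]:
    """Simple concatenation: append the one-hot batch vector to the latent vector z."""
    return z + one_hot(batch_index, n_batches)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_condition(
    z: List[float],
    batch_keys: List[List[float]],    # hypothetical per-batch key embeddings
    batch_values: List[List[float]],  # hypothetical per-batch value embeddings
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention with z as the query.

    Returns z concatenated with the attention-weighted context vector,
    plus the attention weights over batches.
    """
    d = len(z)
    scores = [sum(q * k for q, k in zip(z, key)) / math.sqrt(d) for key in batch_keys]
    weights = softmax(scores)
    context = [
        sum(w * value[j] for w, value in zip(weights, batch_values))
        for j in range(len(batch_values[0]))
    ]
    return z + context, weights


if __name__ == "__main__":
    print(concat_condition([0.5, -1.0], batch_index=1, n_batches=3))
    out, w = attention_condition([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
    print(out, w)
```

The concatenation variant gives every cell in a batch the same fixed offset, while the attention variant lets each cell weight the batch embeddings according to its own latent state.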

Important folders

decipher_batch_correction/decipher-batch-correction

decipher-bc contains the core model implementations
decipher-bc-concat houses our implementation of simple-concatenation Decipher BC. Foundational models can be found in /Core_Implementation. Visualizations of running simple concatenation on simulated data can be found in /Visualizations
simulated-data-analysis contains the results of running attention-based Decipher BC on simulated data. These correspond to the results shown in the final presentation

decipher_batch_correction/Bone_Marrow_Hematopoiesis: Many folders here are artifacts from tweaking the model architecture. Most were not used for the final report; the parts that matter are:

Preprocessing reduces the original .h5ad dataset to a smaller subset
Initial_Training contains the results of running attention-based Decipher BC on the 20K subset
Beta_1.0_AttentionHeads_2_Training houses the training and results for the version of attention-based Decipher BC with beta = 1.0 and 2 attention heads
AttentionHeads_2_Training contains the training and results for the version of attention-based Decipher BC with beta = 0.1 and 4 attention heads
Decipher_BC_Concat_Training shows the application of simple-concatenation Decipher BC to the 20K BoneMarrowMap dataset
Subset_by_cell_type shows the results of subsetting to Erythroid cells only, along with the downstream analysis. Visualizations can be found in /Small Subset Training/Visualizations

simulation

Contains all files relevant to simulated data generation

simulation-requirements.txt : requirements for running scripts within the [simulation] folder
[concat_adata] : concatenated adata with both alpha and delta shifts, used for Native Decipher vs Decipher-BC comparison (Fig. 3). Generated using shift_titration.py
[titration] : concatenated adata generated by shift_titration.py, saved in the "adata" folder, with the metrics and visualization plots saved in the "results" folder
params.py : fixed parameters for simulated data generation (seed, n_samples, etc.)
shift_titration.py : uses functions from simulations.py to simulate data for each shift, repeats this across multiple shift magnitudes, concatenates the results, calculates metrics for each concatenated dataset, and generates visualizations (Fig. 2)
simulations.py : core functions for simulated data generation, used by shift_titration.py
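The titration loop described for shift_titration.py can be sketched as follows. This is a simplified, hypothetical stand-in rather than the repository's script: `SEED` and `N_SAMPLES` mimic the kind of values kept in params.py, each "batch" is one-dimensional Gaussian data, the shift is a plain additive mean shift (the alpha and delta shifts used in this project are not modeled), and the "metric" is just the gap between batch means, in place of whatever metrics the real script computes.

```python
import random
import statistics
from typing import Dict, List, Tuple

SEED = 0         # hypothetical stand-in for the seed in params.py
N_SAMPLES = 200  # hypothetical stand-in for n_samples in params.py


def simulate_batch(n: int, shift: float, rng: random.Random) -> List[float]:
    """Draw n one-dimensional values from N(shift, 1) as a toy 'batch'."""
    return [rng.gauss(shift, 1.0) for _ in range(n)]


def titrate(magnitudes: List[float], n_repeats: int = 3) -> Dict[float, float]:
    """For each shift magnitude: simulate a reference batch and a shifted batch,
    concatenate them with batch labels, and compute a placeholder metric
    (difference of batch means), averaged over repeats."""
    rng = random.Random(SEED)
    results: Dict[float, float] = {}
    for mag in magnitudes:
        gaps = []
        for _ in range(n_repeats):
            ref = simulate_batch(N_SAMPLES, 0.0, rng)
            shifted = simulate_batch(N_SAMPLES, mag, rng)
            # Concatenate both batches, keeping a batch label for each observation.
            combined: List[Tuple[float, int]] = [(x, 0) for x in ref] + [(x, 1) for x in shifted]
            batch0 = [x for x, b in combined if b == 0]
            batch1 = [x for x, b in combined if b == 1]
            gaps.append(statistics.mean(batch1) - statistics.mean(batch0))
        results[mag] = statistics.mean(gaps)
    return results


if __name__ == "__main__":
    for mag, gap in titrate([0.0, 1.0, 2.0, 5.0]).items():
        print(f"shift magnitude {mag}: mean batch gap {gap:.3f}")
```

Sweeping the magnitude while holding everything else fixed is what lets a batch-correction method be evaluated as the batch effect grows from negligible to severe.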

