# NanoSplicer
A program for accurately identifying splice junctions using Oxford Nanopore sequencing data.
## Keywords:
Oxford Nanopore sequencing, Transcriptomics, Splice junctions
# Overview
NanoSplicer utilises the raw output from nanopore sequencing (measures of electrical current commonly known as squiggles) to improve the identification of splice junctions. Instead of identifying splice junctions by mapping basecalled reads, NanoSplicer compares the squiggle from a read with the predicted squiggles of potential splice junctions to identify the best match and likely junction. Mapped nanopore reads are still required in order to identify the presence of splice junctions for downstream NanoSplicer analysis. See [You et al. (2021)][NanoSplicer-bioRxiv] for a fuller description of the method.
The program contains 3 modules, `JWR_checker.py`, `JWR_subset.py` and `NanoSplicer.py`, which are run in that order to get the final result. The first and third modules are required. `JWR_subset.py` is optional but recommended, as it significantly decreases the run time of `NanoSplicer.py`.
Example input files for running all of the modules below can be found in `example/`. Example code is also available at `example/script.sh` (see `example/README.md` for the expected output).
`JWR_checker.py`: Find junctions within reads (JWRs) from a spliced mapping result (BAM).
`JWR_subset.py`: Subset the result from `JWR_checker.py`. Based on our performance assessment, JWRs with high junction alignment quality (JAQ) are usually accurately mapped to their respective splice junctions. This module filters JWRs based on their JAQ, allowing selection of JWRs with lower JAQs that will most benefit from analysis with the NanoSplicer module. By default, `JWR_subset.py` selects JWRs with a JAQ of less than 0.95.
`NanoSplicer.py`: Perform splice junction identification on the `JWR_checker.py` (or `JWR_subset.py` if applied) output.
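The JAQ filter described for `JWR_subset.py` can be illustrated with a minimal sketch. This is not the module's actual implementation: it uses plain Python dictionaries instead of the HDF5 table, and the field names `read_id` and `JAQ` are assumptions made for illustration. Only the default threshold of 0.95 comes from the description above.

```python
# Hypothetical records: the field names "read_id" and "JAQ" are assumptions,
# not the actual column names written by JWR_checker.py.
DEFAULT_JAQ_THRESHOLD = 0.95  # default threshold stated in this README


def select_low_jaq(jwrs, threshold=DEFAULT_JAQ_THRESHOLD):
    """Return the JWRs whose junction alignment quality is below threshold."""
    return [jwr for jwr in jwrs if jwr["JAQ"] < threshold]


example_jwrs = [
    {"read_id": "read_a", "JAQ": 1.00},
    {"read_id": "read_b", "JAQ": 0.80},
    {"read_id": "read_c", "JAQ": 0.95},
    {"read_id": "read_d", "JAQ": 0.50},
]

selected = select_low_jaq(example_jwrs)
print([jwr["read_id"] for jwr in selected])  # prints ['read_b', 'read_d']
```

Note that `read_c` is excluded because the comparison is strict ("less than 0.95"). JWRs that pass the filter are the ones whose initial mapping is least certain, so they are the ones worth the extra run time of `NanoSplicer.py`.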
# Installation
```
git clone https://github.com/shimlab/NanoSplicer.git
```
## Conda environment
All [dependencies](https://github.com/shimlab/NanoSplicer/blob/master/#dependencies) can be installed using `conda`. The configuration file can be found at `conda_env/`:
```
conda env create -f conda_env/environment.yml
conda activate NanoSplicer
```
**Note:** The conda environment might not be created successfully on Windows. See [container access](https://github.com/shimlab/NanoSplicer/blob/master/#container) for an alternative way to access the environment.
## Container access
If there are any problems installing the dependencies above, an alternative way of setting up the environment is available via a container. The dependencies required for running NanoSplicer have been packaged into a container and can be accessed using `singularity`, which is supported by most high-performance computing environments:
```
singularity pull NanoSplicer_container.sif docker://youyupei/nanosplicer:v1
```
**For people not familiar with containers**: You can run Linux commands within the container by using `singularity shell` or `singularity exec`. These commands automatically bind your home directory to the home directory inside the container, which means everything under `~/` (including sub-directories) will be accessible without extra steps. If your data are saved in a different directory, you'll need to bind that directory with `-B <host_dir>:<container_dir>` when running `singularity shell` or `singularity exec`. For example, if your data are in `/data`, you need to add `-B /data:/folder_in_container`. Everything in `/data` is then accessible in `/folder_in_container`. You may use the same name for convenience (e.g. `-B /data:/data`). A more formal introduction to Singularity can be found at https://sylabs.io/singularity/.
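For scripted workflows, the bind syntax above can be assembled programmatically. The following is a minimal sketch, not part of NanoSplicer: the helper name `singularity_exec_command` is hypothetical, and `ls /data` is only a stand-in command for checking that the bound directory is visible inside the container.

```python
import shlex


def singularity_exec_command(image, command, binds=()):
    """Build an argument list for `singularity exec` with optional bind mounts.

    binds is an iterable of (host_dir, container_dir) pairs; each pair
    becomes one `-B host_dir:container_dir` option.
    """
    args = ["singularity", "exec"]
    for host_dir, container_dir in binds:
        args += ["-B", f"{host_dir}:{container_dir}"]
    args.append(image)
    args += list(command)
    return args


cmd = singularity_exec_command(
    "NanoSplicer_container.sif",
    ["ls", "/data"],  # stand-in command: list the bound directory
    binds=[("/data", "/data")],
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: singularity exec -B /data:/data NanoSplicer_container.sif ls /data
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.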
## Dependencies
For `JWR_checker.py` and `JWR_subset.py`:
* `pandas`
* `pysam`
* `numpy`
* `tqdm`
* `h5py`
* `tables`
Additional requirements for `NanoSplicer.py`:
* `ont-tombo`
* `ont_fast5_api`
* `matplotlib`
* `intervaltree`
* `scipy`
* `scikit-image`
**Note**:
NanoSplicer has been tested on Python 3.6 and 3.7. There might be problems when installing `tombo` with Python 3.8. Downgrading to Python 3.7 should resolve these issues.
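Before running the modules, it can help to confirm that the environment matches the requirements above. The sketch below is not shipped with NanoSplicer, and the function names are hypothetical. It checks whether each dependency is importable and whether the interpreter is one of the tested versions. Two import names differ from the package names: `ont-tombo` is imported as `tombo` and `scikit-image` as `skimage`.

```python
import importlib.util
import sys

# Import names for the packages listed above.
JWR_MODULES = ["pandas", "pysam", "numpy", "tqdm", "h5py", "tables"]
NANOSPLICER_MODULES = [
    "tombo",          # installed as ont-tombo
    "ont_fast5_api",
    "matplotlib",
    "intervaltree",
    "scipy",
    "skimage",        # installed as scikit-image
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_tested(version_info=sys.version_info):
    """True if the interpreter is Python 3.6 or 3.7, the versions tested."""
    return tuple(version_info[:2]) in {(3, 6), (3, 7)}


print("Tested Python version:", python_version_tested())
print("Missing for JWR_checker.py / JWR_subset.py:", missing_modules(JWR_MODULES))
print("Missing for NanoSplicer.py:", missing_modules(NANOSPLICER_MODULES))
```

An empty list means every module in that group was found. This only checks that the modules can be located, not that their versions are compatible.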
# Modules
All of the following scripts can be found in `bin/`.
## JWR_checker.py
Find junctions within reads (JWRs) from a spliced mapping result (BAM). For each JWR, `JWR_checker` reports the junction alignment quality for the initial mapping.
**Note:** Currently `JWR_checker` outputs a table in HDF5 format, which allows easier read-in in the following steps. A CSV format table can also be output with `--output_csv`, which is easier for users to read. The downstream scripts `JWR_subset.py` and `NanoSplicer.py` only accept HDF5 as input.
```
Finding junctions within reads (JWRs) from a spliced mapping result (BAM).
Usage: python3 JWR_checker.py [OPTIONS]