Set up Squirls
Squirls is a desktop Java application that requires several external files to run. This document explains how to download these files and prepare to run Squirls.
Note: Squirls is written in Java 11 and runs and compiles under Java 11 or newer.
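Before going further, it can help to confirm that the `java` on your PATH meets this requirement. The bash sketch below is one way to do it, assuming a POSIX-like environment with `awk`. The function names `java_major` and `check_java` are hypothetical helpers, not part of Squirls. The sketch reads the first quoted version string printed by `java -version` (which writes to stderr) and handles both the legacy `1.8.0_x` scheme and the modern `11.0.x` scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Squirls) for checking the Java requirement.

# Print the major version for a Java version string, e.g. "11.0.2" -> 11, "1.8.0_292" -> 8.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;  # legacy scheme: "1.8.0_292" -> "8.0_292"
  esac
  echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Return 0 if the `java` on PATH is at least the given major version (default 11).
check_java() {
  local required="${1:-11}" raw major
  # `java -version` writes to stderr; the version is the first quoted string on the "version" line
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$raw")
  case "$major" in
    ''|*[!0-9]*) echo "Could not determine the Java version" >&2; return 1 ;;
  esac
  if [ "$major" -lt "$required" ]; then
    echo "Found Java $raw, but Squirls needs Java $required or newer" >&2
    return 1
  fi
  echo "Found Java $raw"
}
```

For example, `check_java 11 || exit 1` at the top of a setup script stops early when an older Java is found.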
Squirls downloadable resources
There are several external files that must be downloaded before running Squirls.
Prebuilt Squirls executable
To download the prebuilt Squirls JAR file, go to the Releases section on the Squirls GitHub page and download the latest precompiled version of Squirls.
Squirls database files
Squirls database files are available for download from:
- hg19/GRCh37: download 2102_hg19 (~10.5 GB for download, ~15 GB unpacked)
- hg38/GRCh38: download 2102_hg38 (~11.1 GB for download, ~16.5 GB unpacked)
After downloading, unzip the contents of the archive(s) into a folder and note the folder path.
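For example, the unpacking step could look like the sketch below. It assumes `unzip` is installed, that the archives were saved under names such as `2102_hg19.zip` and `2102_hg38.zip`, and that each archive unpacks into a folder named after the release (for example 2102_hg19), as the directory layout described under Mandatory parameters suggests. The archive names, the target folder, and the `unpack_squirls_archives` function are illustrative assumptions; adjust them to what you actually downloaded.

```shell
# Hypothetical helper: unpack downloaded Squirls ZIP archives into one data folder.
# Usage: unpack_squirls_archives <target-dir> <archive.zip>...
unpack_squirls_archives() {
  local target="$1"
  shift
  mkdir -p "$target" || return 1
  local archive
  for archive in "$@"; do
    if [ -f "$archive" ]; then
      # -q: quiet, -d: extract into the target folder
      unzip -q "$archive" -d "$target" || return 1
    else
      echo "Skipping missing archive: $archive" >&2
    fi
  done
  # Print the absolute folder path; this is the value to note for the configuration
  (cd "$target" && pwd)
}
```

A typical call would be `unpack_squirls_archives "$HOME/squirls_data" 2102_hg19.zip 2102_hg38.zip`, which prints the folder path to record.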
Jannovar transcript databases
Functional annotation of variants, which is required for certain Squirls tasks, is performed using the Jannovar library.
To run the annotation, Jannovar transcript database files need to be provided. The Jannovar v0.35 database files have been tested to work with Squirls.
For your convenience, files containing UCSC, RefSeq, or Ensembl transcripts for the hg19 and hg38 genome assemblies are available for download (~330 MB for download, ~330 MB unpacked).
Build Squirls from source
As an alternative to the prebuilt JAR file, Squirls can be built from its Java sources.
Run the following commands to clone the Squirls source code from the GitHub repository and build the JAR file:
$ git clone https://github.com/TheJacksonLaboratory/Squirls
$ cd Squirls
$ ./mvnw package
Note
To build Squirls from source, JDK 11 or newer must be available in the environment.
After a successful build, the JAR file is located at squirls-cli/target/squirls-cli-1.0.0.jar.
To verify that the build succeeded, run:
$ java -jar squirls-cli/target/squirls-cli-1.0.0.jar --help
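The version in the JAR name (1.0.0 here) changes from release to release. If you script the build, a small helper like the hypothetical `find_squirls_jar` below can locate the freshly built CLI JAR instead of hard-coding the file name. It assumes the default Maven output folder squirls-cli/target and skips `-sources`, `-javadoc`, and `-tests` JARs; that naming convention is an assumption about the build, not something this guide specifies.

```shell
# Hypothetical helper: print the path of the built Squirls CLI JAR.
# Usage: find_squirls_jar [target-dir]   (defaults to squirls-cli/target)
find_squirls_jar() {
  local dir="${1:-squirls-cli/target}" jar
  for jar in "$dir"/squirls-cli-*.jar; do
    [ -f "$jar" ] || continue  # the glob matched nothing
    case "$jar" in
      *-sources.jar|*-javadoc.jar|*-tests.jar) continue ;;  # not the executable JAR
    esac
    echo "$jar"
    return 0
  done
  echo "No squirls-cli-*.jar found in $dir" >&2
  return 1
}
```

Used from the repository root, `jar=$(find_squirls_jar) && java -jar "$jar" --help` performs the same check as the command above.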
generate-config - Generate and fill the configuration file
Squirls needs to know about the locations of the external files. The locations are provided in a YAML configuration file.
The generate-config command generates an empty configuration file:
$ java -jar squirls-cli.jar generate-config squirls-config.yml
The command above generates an empty configuration file, squirls-config.yml, in the working directory.
The configuration file has the following content:
# Required properties template, the file follows YAML syntax.
squirls:
  # path to directory with Squirls files
  data-directory:
  # Genome assembly - choose from {hg19, hg38}
  genome-assembly:
  # Exomiser-like data version (1902 in examples above)
  data-version:
  # Variant with longer REF allele will not be evaluated
  #max-variant-length: 100
  #classifier:
    # Which classifier to use
    #version: v0.4.6
  #annotator:
    # Which splicing annotator to use
    # version: agez
Mandatory parameters
Open the file in your favorite text editor and provide the following three bits of information:
squirls.data-directory
- location of the folder with Squirls data. The directory is expected to have a structure like:
    squirls_folder
     |- 1902_hg19
     |   |- 1902_hg19.fa
     |   |- 1902_hg19.fa.dict
     |   |- 1902_hg19.fa.fai
     |   |- 1902_hg19.phylop.bw
     |   \- 1902_hg19_splicing.mv.db
     \- 1902_hg38
         |- 1902_hg38.fa
         ...
  where 1902_hg19 and 1902_hg38 correspond to the contents of the ZIP files downloaded in the Squirls database files section
squirls.genome-assembly
- which genome assembly to use, choose from {hg19, hg38}
squirls.data-version
- which data version to use; the data version corresponds to 1902 in the example above
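Before running Squirls, you may want to check that the data folder matches this layout. The sketch below is a hypothetical helper (not part of Squirls) that, given the three mandatory values, looks for the five files shown in the example tree inside `<data-directory>/<data-version>_<genome-assembly>`. The file list is taken from that example and is an assumption for other data releases.

```shell
# Hypothetical helper: check the Squirls data folder layout shown in the example tree.
# Usage: check_squirls_data <data-directory> <data-version> <genome-assembly>
check_squirls_data() {
  local dir="$1" version="$2" assembly="$3"
  case "$assembly" in
    hg19|hg38) ;;
    *) echo "genome-assembly must be hg19 or hg38, got '$assembly'" >&2; return 1 ;;
  esac
  local prefix="${version}_${assembly}"
  local sub="$dir/$prefix" missing=0 suffix
  # Files expected in e.g. squirls_folder/1902_hg19/
  for suffix in .fa .fa.dict .fa.fai .phylop.bw _splicing.mv.db; do
    if [ ! -f "$sub/$prefix$suffix" ]; then
      echo "Missing: $sub/$prefix$suffix" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the example above, `check_squirls_data /path/to/squirls_folder 1902 hg19` reports any missing file on stderr and returns a non-zero status if something is absent.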
Optional parameters
squirls.max-variant-length
- maximum length of the variant to be analyzed (100 bp by default); variants with a longer REF allele are not evaluated
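Putting it together, a filled-in configuration could be written as in the sketch below instead of editing the generated file by hand. The data directory path is a placeholder, and the values `1902` and `hg38` simply follow the example tree; replace them with the folder you unpacked and the data version you actually downloaded. `write_squirls_config` is a hypothetical helper. It assumes a path without YAML-special characters, and it keeps the optional max-variant-length line commented out so the 100 bp default applies.

```shell
# Hypothetical helper: write a Squirls configuration with the three mandatory values.
# Usage: write_squirls_config <output.yml> <data-directory> <data-version> <genome-assembly>
write_squirls_config() {
  local out="$1" dir="$2" version="$3" assembly="$4"
  cat > "$out" <<EOF
squirls:
  # path to directory with Squirls files
  data-directory: $dir
  # Genome assembly - choose from {hg19, hg38}
  genome-assembly: $assembly
  # data version, i.e. the prefix of the unpacked folders (e.g. 1902 for 1902_hg38)
  data-version: $version
  # optional: variants with a longer REF allele are not evaluated (default 100)
  #max-variant-length: 100
EOF
}
```

For example, `write_squirls_config squirls-config.yml /path/to/squirls_folder 1902 hg38` produces a file with the same keys as the generated template, with the mandatory values filled in.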