Running locally¶
The Reservoir Genome Browser can be run locally to view files on your computer without uploading to the hosted instance at resgen.io.
Usage requires a license. If you already have a resgen subscription, you can get a license from resgen.io.
Without a license, you can still run locally and host directories containing up to 10 datasets.
Starting locally¶
Starting locally runs resgen as a server on your computer. You can access the UI at http://localhost:1807. The port can be changed using the --port flag in the commands below.
resgen manage start ~/my-directory
If you are on an Apple Silicon (M-series) Mac, add --platform=linux/arm64/v8 so that the right Docker image is used.
resgen manage start ~/my-directory --platform=linux/arm64/v8
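The choice of platform flag can be scripted. The following is a minimal sketch, not part of resgen itself: the helper names platform_flag_for and start_command are hypothetical, and it assumes that --port takes a space-separated value placed after the directory. It only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: choose the --platform flag from the OS name and CPU architecture.
# Prints the flag for Apple Silicon Macs and nothing otherwise.
platform_flag_for() {
    os="$1"
    arch="$2"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        printf '%s\n' "--platform=linux/arm64/v8"
    fi
}

# Hypothetical helper: print (not run) the start command for a directory and port.
# Assumption: --port is accepted after the directory as "--port <number>".
start_command() {
    dir="$1"
    port="$2"
    flag="$(platform_flag_for "$(uname -s)" "$(uname -m)")"
    printf '%s\n' "resgen manage start $dir --port $port${flag:+ $flag}"
}

start_command "$HOME/my-directory" 1807
```

Note that a shell running under Rosetta may report x86_64 from uname -m, in which case the flag would need to be added by hand.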
Create local user¶
Even when running locally you need to set up a user:
resgen manage create-user ~/my-directory
Synchronize datasets¶
Resgen maintains its own database of datasets, so you need to synchronize manually whenever anything is added to the directory outside of resgen.
resgen manage sync-datasets ~/my-directory
Syncing from S3¶
You can sync datasets directly from S3 in two ways:
1. Mount S3 folders within a local project¶
Mount S3 paths as folders that sync alongside your local files:
# Add an S3 mount (folder name defaults to last path component)
resgen manage s3 add s3://my-bucket/reference-data ~/my-directory
# Add with custom folder name
resgen manage s3 add s3://my-bucket/data ~/my-directory --folder refs
# List configured S3 mounts
resgen manage s3 list ~/my-directory
# Remove an S3 mount
resgen manage s3 remove reference-data ~/my-directory
# Sync both local files and S3 mounts
resgen manage sync-datasets ~/my-directory
S3 mount configuration is stored in ~/my-directory/.resgen/mounts.yml. When you sync, both local files and S3-mounted data appear in your project.
Note: S3 mount folder names cannot conflict with existing local folders.
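A name conflict can be checked before adding a mount. This is a rough sketch under stated assumptions: the helpers default_mount_folder and mount_name_is_free are hypothetical, and the check only looks for an existing entry with the same name in the project directory, which may not match resgen's own validation exactly.

```shell
#!/bin/sh
# Hypothetical helper: the default folder name is the last component of the S3 path.
default_mount_folder() {
    basename "${1%/}"
}

# Hypothetical helper: succeed only if nothing with that name exists in the project directory.
mount_name_is_free() {
    project_dir="$1"
    folder="$2"
    [ ! -e "$project_dir/$folder" ]
}

s3_path="s3://my-bucket/reference-data"
project_dir="$HOME/my-directory"
folder="$(default_mount_folder "$s3_path")"

if mount_name_is_free "$project_dir" "$folder"; then
    echo "OK to run: resgen manage s3 add $s3_path $project_dir"
else
    echo "'$folder' already exists in $project_dir; consider --folder <other-name>"
fi
```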
2. Sync directly from an S3 path¶
Sync an entire S3 tree without any local files:
# Start a local resgen instance
resgen manage start .
# Sync from S3 (project name = last path component)
resgen manage sync-datasets s3://my-bucket/genomics-data
This creates a project named "genomics-data" containing all objects under the S3 prefix.
AWS Credentials¶
S3 operations require AWS credentials. Configure them using the AWS CLI:
aws configure
Or set environment variables:
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
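Before syncing from S3, it can help to confirm that one of the two methods above is in place. The sketch below uses a hypothetical helper, have_aws_credentials, and checks only environment variables and the shared credentials file written by aws configure. Other credential sources (SSO, instance roles, named profiles) are not covered.

```shell
#!/bin/sh
# Hypothetical pre-flight check; covers only the two methods described above.
have_aws_credentials() {
    # Method 1: both environment variables are set and non-empty.
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        return 0
    fi
    # Method 2: `aws configure` writes to ~/.aws/credentials by default;
    # AWS_SHARED_CREDENTIALS_FILE overrides that location.
    [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]
}

if have_aws_credentials; then
    echo "AWS credentials found"
else
    echo "No AWS credentials found; run 'aws configure' or export the variables" >&2
fi
```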
Getting logs¶
If an error occurs, you can get the latest logs:
resgen manage logs ~/my-directory [nginx,uwsgi,celery]
The last parameter specifies the service to get logs for (nginx, uwsgi or celery). In most cases this will be uwsgi.
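When reporting a problem, it may be convenient to save the logs of every service at once. This is a sketch only: collect_resgen_logs is a hypothetical helper, and it assumes the service names are passed one at a time as the last argument.

```shell
#!/bin/sh
# Hypothetical helper: save the latest logs of each service into an output directory.
# Assumption: the services are named nginx, uwsgi and celery, one per invocation.
collect_resgen_logs() {
    project_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for service in nginx uwsgi celery; do
        resgen manage logs "$project_dir" "$service" > "$out_dir/$service.log" 2>&1 \
            || echo "could not fetch $service logs" >&2
    done
}

# Usage (requires a running local instance):
#   collect_resgen_logs ~/my-directory ./resgen-logs
```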
Advanced functionality¶
Creating a superuser¶
A superuser can use the admin interface to modify users, projects and datasets.
resgen manage create-superuser ~/my-directory
Starting with a non-standard image¶
One may wish to start resgen with an older image:
resgen manage start --image <image_name> ~/my-directory
Viewing a dataset¶
Use the resgen manage view command to view local datasets.
Displaying a sequence logo¶
The following command generates a sequence logo plot from the 4th (1-based) column of the given CSV file. It does this by running a multiple sequence alignment with ClustalO. Be careful with how many sequences are provided and how long they are, or the command may take too long to finish.
resgen manage view simulated_sars2_spike_15.csv \
-t colnum:4 \
-tt sequence-logo \
-tp top
The extra options are:
-t - Add a tag to the created dataset saying to use column number 4. It's also possible to use -t colname:blah to tell it to use the column named blah
-tt - The track type to use (sequence-logo)
-tp - Position this track at the top, as opposed to on the left or right
Other options
-t header:false - Indicate that the CSV file has no header
-t colname:<column_name> - Indicate that a named column should be used. This will take precedence over colnum:<column number>
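Because the alignment time depends on the number and length of the sequences, it can be worth inspecting the column before running the command. The following sketch uses a hypothetical helper, column_stats, and naive CSV parsing: it assumes a header row and no quoted fields containing commas.

```shell
#!/bin/sh
# Hypothetical helper: report the header name, row count and longest value
# of a 1-based CSV column. Naive parsing: header row present, no quoted commas.
column_stats() {
    csv="$1"
    col="$2"
    awk -F',' -v c="$col" '
        NR == 1 { name = $c; next }
        { n++; if (length($c) > max) max = length($c) }
        END { printf "column=%s rows=%d max_length=%d\n", name, n, max }
    ' "$csv"
}

# Usage:
#   column_stats simulated_sars2_spike_15.csv 4
```

The printed column name also confirms that column number 4 is the one intended for -t colnum:4.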
Displaying a pileup plot¶
The following command generates a pileup plot from the column named 'sequence' (-t colname:sequence) in the CSV file. It aligns all values in the sequence column against the value in the first row (-t refrow:1).
resgen manage view simulated_sars2_spike_15.csv \
-t colname:sequence \
-t refrow:1 \
-dt reads \
-ft pileup-csv \
--platform linux/amd64
The extra options are:
-t colname:sequence - Use the values in the sequence column of the csv file
-t refrow:1 - Use the first row as the reference to align reads against. If this is omitted, the reads will be aligned to any dataset with a filetype:fasta_seq tag in the current project. If there is more than one such file, it will throw an error saying that there are two potential assemblies.
-dt reads - Indicate that these are to be treated as "reads" to be displayed in a pileup track
-ft pileup-csv - Treat the file as containing pileup data
--platform linux/amd64 - Use the linux/amd64 architecture for the resgen Docker image
Displaying a pileup against a reference FASTA¶
To align sequences against an external FASTA reference file instead of an inline reference row, use the pileup subcommand:
resgen manage pileup simulated_sars2_spike_15.csv \
-ref sars2_spike_reference.fa \
-t colname:sequence \
--platform linux/amd64
The reference FASTA filename (without extension) is used as the assembly name to link the two files. Multiple FASTA files can coexist in the same project without ambiguity — each is distinguished by its filename-derived assembly name.
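To see which assembly name a given reference file would map to, the rule "filename without extension" can be approximated in the shell. This is an illustration only: assembly_name_for is a hypothetical helper that drops the final suffix, and how resgen treats multi-part extensions such as .fa.gz is not specified here.

```shell
#!/bin/sh
# Hypothetical helper: approximate "filename without extension"
# by removing the directory part and the final suffix.
assembly_name_for() {
    name="$(basename "$1")"
    printf '%s\n' "${name%.*}"
}

assembly_name_for sars2_spike_reference.fa
```

For the file used above, this prints sars2_spike_reference.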