Running locally¶

The Reservoir Genome Browser can be run locally to view files on your computer without uploading them to the hosted instance at resgen.io.

Usage requires a license. If you already have a resgen subscription, you can get a license from resgen.io.

If you don't have a license, you can still run locally and host directories with a maximum of 10 datasets.

Starting locally¶

Starting locally runs resgen as a server on your local computer. You can access the UI at http://localhost:1807. The port can be changed using the --port flag in the commands below.

resgen manage start ~/my-directory

If you are on an Apple Silicon (M-series) Mac, you'll need to add --platform=linux/arm64/v8 so that the right Docker image is used.

resgen manage start ~/my-directory --platform=linux/arm64/v8
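If the same start script is shared between machines, the platform flag can be chosen automatically. The following is a minimal sketch, not part of resgen: `platform_flag` is a hypothetical helper, and it assumes that an Apple Silicon Mac reports `Darwin` and `arm64` from `uname`. A shell running under Rosetta may report `x86_64` instead.

```shell
# Print the --platform flag for Apple Silicon Macs, or nothing otherwise.
# Hypothetical helper; arguments default to this machine's `uname` output.
platform_flag() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    printf '%s\n' '--platform=linux/arm64/v8'
  fi
}

# Show the command that would be run on this machine (printed, not executed).
echo "resgen manage start ~/my-directory $(platform_flag)"
```

On other machines the helper prints nothing, so the command is the same as the plain start command above.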

Create local user¶

Even when running locally, you need to set up a user:

resgen manage create-user ~/my-directory

Synchronize datasets¶

Resgen maintains its own database of datasets, so you'll need to synchronize manually whenever anything is added to the directory outside of resgen.

resgen manage sync-datasets ~/my-directory

Syncing from S3¶

You can sync datasets directly from S3 in two ways:

1. Mount S3 folders within a local project¶

Mount S3 paths as folders that sync alongside your local files:

# Add an S3 mount (folder name defaults to last path component)
resgen manage s3 add s3://my-bucket/reference-data ~/my-directory

# Add with custom folder name
resgen manage s3 add s3://my-bucket/data ~/my-directory --folder refs

# List configured S3 mounts
resgen manage s3 list ~/my-directory

# Remove an S3 mount
resgen manage s3 remove reference-data ~/my-directory

# Sync both local files and S3 mounts
resgen manage sync-datasets ~/my-directory

S3 mount configuration is stored in ~/my-directory/.resgen/mounts.yml. When you sync, both local files and S3-mounted data appear in your project.

Note: S3 mount folder names cannot conflict with existing local folders.
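To catch a name conflict before adding a mount, the check can be sketched in plain shell. The helper names below are hypothetical and not part of resgen. The sketch assumes the default folder name is the last path component, as described above, and that a conflict means a directory of that name already exists at the top level of the project.

```shell
# Derive the default mount folder name: the last component of the S3 path.
default_mount_folder() {
  path="${1%/}"                 # drop one trailing slash, if present
  printf '%s\n' "${path##*/}"   # keep the text after the last slash
}

# Succeed if no local folder with that name exists in the project directory.
mount_name_is_free() {
  project_dir="$1"
  folder="$2"
  [ ! -d "$project_dir/$folder" ]
}

# Demo in a throwaway directory that already contains a "refs" folder.
project="$(mktemp -d)"
mkdir "$project/refs"
name="$(default_mount_folder s3://my-bucket/reference-data)"
if mount_name_is_free "$project" "$name"; then
  echo "'$name' is free"
else
  echo "'$name' conflicts with an existing local folder"
fi
rm -r "$project"
```

If the default name is taken, the --folder option shown above lets you pick a different one.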

2. Sync directly from an S3 path¶

Sync an entire S3 tree without any local files:

# Start a local resgen instance
resgen manage start .

# Sync from S3 (project name = last path component)
resgen manage sync-datasets s3://my-bucket/genomics-data

This creates a project named "genomics-data" containing all objects under the S3 prefix.
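To see which project name a given S3 path would produce, the naming rule can be mirrored in shell. This is an illustrative sketch only: `split_s3_url` is a hypothetical helper, and how resgen itself treats trailing slashes or a bucket with no prefix is an assumption here.

```shell
# Split an S3 URL into bucket, key prefix, and the derived project name.
split_s3_url() {
  url="${1#s3://}"        # strip the scheme
  url="${url%/}"          # drop one trailing slash, if present
  bucket="${url%%/*}"     # text before the first slash
  case "$url" in
    */*) prefix="${url#*/}" ;;   # text after the first slash
    *)   prefix="" ;;            # bucket only, no prefix
  esac
  project="${url##*/}"    # last path component
  printf 'bucket=%s prefix=%s project=%s\n' "$bucket" "$prefix" "$project"
}

split_s3_url s3://my-bucket/genomics-data
```

For nested paths, only the final component becomes the project name, so two different prefixes that end in the same component would map to the same name under this rule.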

AWS Credentials¶

S3 operations require AWS credentials. Configure them using the AWS CLI:

aws configure

Or set environment variables:

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
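Before running an S3 operation, it can help to confirm that one of the two credential sources above is in place. The following sketch uses a hypothetical helper, `aws_credentials_source`, and only checks the environment variables and the shared credentials file that `aws configure` writes. Other sources, such as SSO profiles or instance roles, are not covered, and whether resgen picks up credentials in exactly this order is an assumption.

```shell
# Report where AWS credentials appear to come from; fail if none are found.
aws_credentials_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "environment"
  elif [ -f "${AWS_SHARED_CREDENTIALS_FILE:-${HOME:-}/.aws/credentials}" ]; then
    echo "credentials-file"
  else
    echo "none"
    return 1
  fi
}

aws_credentials_source || echo "No AWS credentials found; run 'aws configure' first." >&2
```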

Getting logs¶

If an error occurs, you can get the latest logs:

resgen manage logs ~/my-directory [nginx,uwsgi,celery]

The last parameter specifies the service to get logs for. In the majority of cases this will be uwsgi.

Advanced functionality¶

Creating a superuser¶

A superuser can use the admin interface to modify users, projects and datasets.

resgen manage create-superuser ~/my-directory

Starting with a non-standard image¶

One may wish to start resgen with an older image:

resgen manage start --image <image_name> ~/my-directory

Viewing a dataset¶

Use the resgen manage view command to view local datasets.

Displaying a sequence logo¶

The following command will generate a sequence logo plot from the 4th (1-based) column in the given CSV file. It does this by performing a multiple sequence alignment using ClustalO. Be careful with how many sequences are provided and how long they are, lest the command take too long to finish.

resgen manage view simulated_sars2_spike_15.csv \
  -t colnum:4 \
  -tt sequence-logo \
  -tp top

The extra options are:

-t - Add a tag to the created dataset telling it to use column number 4. It's also possible to use -t colname:blah to tell it to use the column named blah
-tt - The track type to use (sequence-logo)
-tp - Position this track at the top, as opposed to on the left or right

Other options:

-t header:false - Indicate that the CSV file has no header
-t colname:<column_name> - Indicate that a named column should be used. This will take precedence over colnum:<column number>
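Because the alignment time grows with the number and length of the sequences, it can be worth checking the input before viewing it. The sketch below counts the sequences in a 1-based column and reports the longest one. `sequence_stats` is a hypothetical helper, the demo file is made up, and the parsing assumes a simple CSV with a header row and no quoted fields containing commas.

```shell
# Print the number of sequences and the longest sequence length found in a
# given 1-based column of a CSV file, skipping the header row.
sequence_stats() {
  awk -F',' -v col="$2" '
    NR > 1 && $col != "" {
      n++
      if (length($col) > max) max = length($col)
    }
    END { printf "sequences=%d longest=%d\n", n, max }
  ' "$1"
}

# Demo with a tiny, made-up file whose 4th column holds the sequences.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
id,lineage,date,sequence
s1,A,2021-01-01,MFVFLVLLPLVSS
s2,B,2021-01-02,MFVFLVLLPLVS
s3,B,2021-01-03,MFVFFVLLPLVSSQ
EOF
sequence_stats "$demo" 4
rm "$demo"
```

The column number passed here corresponds to the value you would give to -t colnum.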

Displaying a pileup plot¶

The following command will generate a pileup plot from the column named 'sequence' (-t colname:sequence) in the CSV file. It will align all values in the sequence column against the value in the first row (-t refrow:1).

resgen manage view simulated_sars2_spike_15.csv \
  -t colname:sequence \
  -t refrow:1 \
  -dt reads \
  -ft pileup-csv \
  --platform linux/amd64

The extra options are:

-t colname:sequence - Use the values in the sequence column of the csv file
-t refrow:1 - Use the first row as the reference to align reads against. If this is omitted, the reads will be aligned to any dataset with a filetype:fasta_seq tag in the current project. If there is more than one such file, it will throw an error saying that there are two potential assemblies.
-dt reads - Indicate that these are to be treated as "reads" to be displayed in a pileup track
-ft pileup-csv - Treat the file as containing pileup data
--platform linux/amd64 - Use the linux/amd64 architecture for the resgen docker image

Displaying a pileup against a reference FASTA¶

To align sequences against an external FASTA reference file instead of an inline reference row, use the pileup subcommand:

resgen manage pileup simulated_sars2_spike_15.csv \
  -ref sars2_spike_reference.fa \
  -t colname:sequence \
  --platform linux/amd64

The reference FASTA filename (without extension) is used as the assembly name to link the two files. Multiple FASTA files can coexist in the same project without ambiguity, because each is distinguished by its filename-derived assembly name.
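The naming rule can be previewed in shell before adding a reference. This is a sketch under stated assumptions: `assembly_name` is a hypothetical helper, the `refs/` directory is illustrative, and only the final extension is stripped, so the name resgen derives for a compressed file such as `.fa.gz` may differ.

```shell
# Derive the assembly name from a FASTA path: the file name without its
# final extension.
assembly_name() {
  file="${1##*/}"               # drop any leading directories
  printf '%s\n' "${file%.*}"    # drop the last extension
}

assembly_name refs/sars2_spike_reference.fa
```

Two reference files whose names differ only by extension would map to the same assembly name under this rule, so it is safest to give each reference a distinct base name.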