Pipeline Testing
Downloading Genome files
# Set local download directory
genomedata="/opt/viafoundry/run_data"
## Main Genomes
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/human/hg38/ -P ${genomedata}/genome_data/human/hg38/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/mouse/mm10/ -P ${genomedata}/genome_data/mouse/mm10/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/mousetest/mm10/ -P ${genomedata}/genome_data/mousetest/mm10/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
## Optional Genomes
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/human/hg19/ -P ${genomedata}/genome_data/human/hg19/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/rat/rn6/ -P ${genomedata}/genome_data/rat/rn6/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/c_elegans/ce11/ -P ${genomedata}/genome_data/c_elegans/ce11/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/d_melanogaster/BDGP6_32/ -P ${genomedata}/genome_data/d_melanogaster/BDGP6_32/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/dog/canFam3/ -P ${genomedata}/genome_data/dog/canFam3/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/e_coli/ASM584v2_NC_000913_3/ -P ${genomedata}/genome_data/e_coli/ASM584v2_NC_000913_3/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/n_vectensis/jaNemVect1_1/ -P ${genomedata}/genome_data/n_vectensis/jaNemVect1_1/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/s_cerevisiae/sacCer3/ -P ${genomedata}/genome_data/s_cerevisiae/sacCer3/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/s_pombe/ASM294v2/ -P ${genomedata}/genome_data/s_pombe/ASM294v2/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
wget https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data/zebrafish/GRCz11/ -P ${genomedata}/genome_data/zebrafish/GRCz11/ -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
# If you're using AWS Cloud, sync the data with your S3 bucket (aws s3 sync is recursive by default)
aws s3 sync ${genomedata}/genome_data s3://DEST_BUCKET/viafoundry/run_data/genome_data
# Remove the local data after syncing with AWS S3
rm -rf ${genomedata}
# If you're using Google Cloud, sync the data with your Google Cloud Storage bucket
gsutil rsync -r ${genomedata}/genome_data gs://DEST_BUCKET/viafoundry/run_data/genome_data
# Remove the local data after syncing with Google Cloud Storage
rm -rf ${genomedata}
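The download commands above differ only in the species/build path, so they can also be driven from a loop. The following is a minimal sketch rather than part of the official instructions: fetch_genome and DRY_RUN are hypothetical names introduced here, while the base URL and the wget flags are copied unchanged from the commands above. By default the sketch only prints the commands it would run; set DRY_RUN=0 to actually download.

```shell
# Sketch: build the same wget command for each genome instead of repeating it.
# fetch_genome and DRY_RUN are hypothetical names used only for illustration.
genomedata="${genomedata:-/opt/viafoundry/run_data}"
base_url="https://web.dolphinnext.com/umw_biocore/dnext_data/genome_data"

fetch_genome() {
  build="$1"   # species/build path, e.g. human/hg38
  # Same flags as the individual commands above.
  set -- wget "${base_url}/${build}/" -P "${genomedata}/genome_data/${build}/" \
    -l inf -nc -nH --cut-dirs=4 -r --no-parent -R "index.html*"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # dry run: print the command instead of running it
  else
    "$@"                 # run wget with the arguments built above
  fi
}

# Main genomes; append optional ones (e.g. rat/rn6) as needed.
for genome in human/hg38 mouse/mm10 mousetest/mm10; do
  fetch_genome "${genome}"
done
```

Reviewing the printed commands before setting DRY_RUN=0 is a cheap way to catch a mistyped build name before a long download starts.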
Configuration of Run Environment
Once logged in, click the profile tab at the top right of the screen. The profile page contains several tabs:
- SSH Keys: Configure an SSH key to set up a connection, then click the "hide from user" button.
- Run environments: This is the main section for creating connection profiles for users.
- Run environments -> SSH Keys: Select the SSH key.
- Run environments -> Profile Variables: Set DOWNDIR to the directory that contains genome_data:
  ## A. For Google Cloud:
  ## if genome_data is located at gs://DEST_BUCKET/viafoundry/run_data/genome_data
  params.DOWNDIR = "gs://DEST_BUCKET/viafoundry/run_data"
  ## B. For AWS Cloud:
  ## if genome_data is located at s3://DEST_BUCKET/viafoundry/run_data/genome_data
  params.DOWNDIR = "s3://DEST_BUCKET/viafoundry/run_data"
  ## C. For Local Execution (e.g. clusters):
  ## if genome_data is located at /opt/viafoundry/run_data/genome_data
  params.DOWNDIR = "/opt/viafoundry/run_data"
- Run environments -> Default Working Directory: Set the following directory for runs:
  /opt/runs
- Run environments -> Default Bucket Location for Publishing: Set the following directory for cloud environments:
  s3://DEST_BUCKET/viafoundry/runs or gs://DEST_BUCKET/viafoundry/runs
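Before saving the DOWNDIR profile variable, it can help to confirm that genome_data really sits directly under the chosen location. The following is a minimal sketch, assuming the aws and gsutil CLIs are installed and authenticated for the cloud cases; check_downdir is a hypothetical helper name, not a Foundry command. It relies on `aws s3 ls` and `gsutil ls` returning a non-zero exit status when nothing matches the given path.

```shell
# Sketch: succeed only if <DOWNDIR>/genome_data exists.
# check_downdir is a hypothetical helper used only for illustration.
check_downdir() {
  downdir="$1"   # the value you plan to use for params.DOWNDIR
  case "$downdir" in
    s3://*) aws s3 ls "${downdir}/genome_data/" >/dev/null ;;   # AWS S3
    gs://*) gsutil ls "${downdir}/genome_data/" >/dev/null ;;   # Google Cloud Storage
    *)      [ -d "${downdir}/genome_data" ] ;;                  # local path
  esac
}

# Example call for local execution:
# check_downdir /opt/viafoundry/run_data && echo "genome_data found"
```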
Executing Test Run
To test a pipeline, you can follow these steps:
- Visit the following link: RNA-Seq pipeline.
- On the webpage, locate the "download pipeline" button and click it. This initiates the download of a zip file.
- Once the zip file is downloaded, extract its contents. Inside the extracted folder, you will find a file named main.dn. This file contains the pipeline definition and can be used to import the pipeline.
- Now, go to the "pipelines" tab on the ViaFoundry website.
- Look for the "Create Pipeline" button and click it. This opens the pipeline creation interface.
- In the pipeline creation interface, you will find an option to import a pipeline. Drag and drop the main.dn file that you extracted earlier into the designated area.
- The pipeline will be imported, and you can proceed with further customization and execution.
Automating Run Environment Creation
- Access the Foundry profile page and navigate to the "Run Environment" section.
- Identify the ID of the run environment that was utilized in a successful test run.
- Update the DEFAULT_RUN_ENVIRONMENT parameter in the [CONFIG] section of the /export/vpipe/config/.sec file. This run environment will serve as the template for generating new run environments whenever a new user registers.
[CONFIG]
DEFAULT_RUN_ENVIRONMENT=YOUR_RUN_ENV_ID
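If you prefer to script this change instead of editing the file by hand, something like the following could work. It is a sketch under stated assumptions, not a Foundry tool: set_default_run_env is a hypothetical helper, the key is assumed to already exist on its own line in KEY=VALUE form as shown above, and the ID is assumed to contain no "/" or "&" characters.

```shell
# Sketch: replace the value of DEFAULT_RUN_ENVIRONMENT in a .sec file.
# set_default_run_env is a hypothetical helper used only for illustration.
set_default_run_env() {
  sec_file="$1"   # path to the .sec file
  env_id="$2"     # ID of the run environment to use as the template
  if ! grep -q '^DEFAULT_RUN_ENVIRONMENT=' "$sec_file"; then
    echo "DEFAULT_RUN_ENVIRONMENT not found in $sec_file" >&2
    return 1
  fi
  # -i.bak edits in place and keeps the previous version as <file>.bak
  sed -i.bak "s/^DEFAULT_RUN_ENVIRONMENT=.*/DEFAULT_RUN_ENVIRONMENT=${env_id}/" "$sec_file"
}

# Example call (replace YOUR_RUN_ENV_ID with the ID from the profile page):
# set_default_run_env /export/vpipe/config/.sec YOUR_RUN_ENV_ID
```

The helper deliberately refuses to append the key when it is missing, so a typo in the file path or an unexpected file layout shows up as an error rather than a silently modified file.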