Process Guide
This guide will walk you through the creation of Via Foundry processes.
Process Window
Once you've logged into your Via Foundry account, open the page of the pipeline you'd like to add a new process to: click the Pipeline tab in the top left of the screen, then click the name of the desired pipeline. You'll notice several buttons on the left sidebar's menu. For now, the most important one is the button containing the image of an unbroken circle, second from the left. This is the New Process button, which enables you to create new processes.
Basics
Once you click the New Process button, a new window will appear for you to define the components of your new process.
- Name: Enter the name of your process in this field. After creating the process, its name will appear in the pipeline's left sidebar menu under the selected menu group.
- Description: Provide a brief explanation of how the process works in this field.
- Menu group: Select a menu group so that the process can be properly categorized within the left sidebar menu. If necessary, you can add new menu groups by clicking the Add Menu Group button. You can also edit or delete these new groups by using the Edit Menu Group or Delete Menu Group buttons.
Parameters
This section is for you to create parameters for your process, which will be used while defining inputs and outputs. You can create new parameters via the Add Parameter button.
- Identifier: The identifier is simply the parameter name. Once a parameter is created, you can reuse it in other processes.
- Qualifier: Five main types of qualifiers (file, set/tuple, each, env and val) exist in Via Foundry:
  File: This qualifier is used when the identifier represents a file. Example usage in a Nextflow file:
  file genome
  Set/Tuple: This qualifier enables you to handle a group of input values with other qualifiers. Example usage:
  set val(name), file(genome)
  Val: With this qualifier, you can access the received input value by its name in the process script. Example usage:
  val script_path
  Each: This qualifier enables the execution of the process for each entry in the input collection. Example usage:
  each bed_list
  Env: This qualifier uses the received value to set an environment variable with the same name as the identifier.
- File type: If the qualifier is set to file or set, a "File Type" option will appear. This option is used to filter available nodes when generating pipelines.
Tip: For example, you can create the genome parameter by entering the identifier as "genome", the qualifier as "file", and the file type as "fasta". Similarly, to create the script_path parameter, you can define the identifier as "script_path" and the qualifier as "val".
Note: When the qualifier is set to val, the identifier is used to filter available nodes when connecting each node. If file or set is selected as the qualifier, the file type is used to filter available nodes. When the qualifier is set to each, you can enter both the file type (if connecting to file nodes) and the identifier (if connecting to val nodes).
Inputs
This section is where you define all of your process's input nodes. Clicking the Add input... dropdown shows a list of all existing parameters, from which you can select those you wish to use. After adding a selected parameter as an input node, you'll see an Input Name box, in addition to the Add/Remove Operator (wrench icon) and Optional buttons.
The Input Name box is used to define Nextflow variables that will be used in the Scripts section. For example, if you enter the input name as "genome", you can call this variable as ${genome} in the Script field. Here are some other examples:
Qualifier | Input name | How to Reference in the Script |
---|---|---|
val | script_path | ${script_path} |
each | bed_list | ${bed_list} |
file | genome | ${genome} |
set | val(name), file(genome) | ${genome} |
set | val(name), file(genome) | ${name} |
Additionally, if you need to transform values emitted by a channel, you can click the Add/Remove Operator button, then select operators from the Operators dropdown menu that appears. For more detailed information, refer to the Operators section.
- Optional Inputs:
If you want to use an optional input parameter, you can check the Optional checkbox. This gives the user flexibility, as the process will still be executed even if the input parameter is not provided. Here's an example use case:
filter = genome.name.startsWith('NO_FILE') ? "" : "--filter ${genome}"
Here, when the optional genome input is not provided, its file name starts with the NO_FILE prefix, so the filter parameter is set to an empty string. When the genome parameter is provided, the filter parameter is defined as "--filter ${genome}".
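The decision made by that Groovy ternary can be modeled in Python to make both branches explicit. This is only an illustrative sketch: the helper name build_filter_flag is hypothetical, and genome is modeled as a plain path string rather than a Nextflow file object.

```python
from pathlib import Path


def build_filter_flag(genome: str) -> str:
    """Hypothetical helper mirroring the Groovy ternary for the optional genome input."""
    # Like genome.name in Groovy, look only at the file name, not the directory.
    if Path(genome).name.startswith("NO_FILE"):
        return ""  # placeholder file: the optional input was not provided
    return f"--filter {genome}"


print(repr(build_filter_flag("NO_FILE")))  # ''
print(build_filter_flag("/data/hg19.fa"))  # --filter /data/hg19.fa
```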
Outputs
All the output nodes for a new process are defined in this section. As with inputs, you can click the Add output... dropdown menu to select from a list of output parameters. Once an output has been chosen, you'll see an Output Name box, as well as an Add/Remove Operator button (wrench icon) and an Optional checkbox.
You can select the output files created by the process by using the Output Name box. Entering a glob pattern selects all matching files; for instance, entering "genome.index*" in this field grabs all files whose names begin with "genome.index". You can also use Nextflow variables defined in the Inputs and Scripts sections. Here are some examples of input and output qualifiers and names:
Input Qualifier | Input name | Output Qualifier | Output name |
---|---|---|---|
file | genome | file | "${genome}_out.txt" |
set | val(name), file(reads) | set | val(name), file("${name}.unmap*.fq") |
set | val(name), file(reads) | file | "${name}.bam" |
set | val(name), file(genome) | file | "genome.index*" |
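To see which files a pattern such as "genome.index*" would pick up, the matching can be approximated in Python with the standard fnmatch module. This is a sketch: the file names are made up, and Nextflow's own glob rules (for example around hidden files or ** patterns) may differ in edge cases.

```python
from fnmatch import fnmatchcase


def select_outputs(files, pattern):
    """Return the file names that match a glob-style Output Name pattern."""
    return [f for f in files if fnmatchcase(f, pattern)]


# Hypothetical contents of a process work directory.
work_dir = ["genome.index.1.bt2", "genome.index.rev.1.bt2", "genome.fa",
            "sample1.unmap_1.fq", "sample1.bam"]

print(select_outputs(work_dir, "genome.index*"))
# ['genome.index.1.bt2', 'genome.index.rev.1.bt2']

name = "sample1"  # stands in for the Nextflow variable ${name}
print(select_outputs(work_dir, f"{name}.unmap*.fq"))
# ['sample1.unmap_1.fq']
```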
Additionally, if you need to transform values emitted by a channel, you can click the Add/Remove Operator button, then select operators from the Operators dropdown menu that appears. For more detailed information, refer to the Operators section.
- A Note on Optional Outputs:
If your process doesn't generate some outputs in certain circumstances, you can check the Optional checkbox, which tells Nextflow not to fail and exit the process if the given output is not created.
Scripts
All scripts for the process are defined in this region. Nextflow supports three different modes: Script, Shell, and Exec. These modes define how the command gets executed by the process.
A. Script:
For simplicity, Via Foundry uses the script format by default. This means that the defined commands will be executed as a BASH script on the host machine, which is the same as placing three double quotes """ at the start and the end of the command block. If you put three double quotes on separate lines, Via Foundry will interpret the area between those lines as a command block. As such, the following two blocks will be executed the same way:
script:
"""
tophat2 -o . ${indexPath} ${reads}
"""
OR
tophat2 -o . ${indexPath} ${reads}
The script block can contain any command or script that is typically used in a terminal shell or BASH script.
Let's analyze a more complicated version of the script block. Here, the Nextflow variable name is defined between the script: keyword and the three opening double quotes """:
script:
name = reads.toString() - '.fastq' // local scope for nextflow variables
"""
newPath="/mypath" ## bash variables defined inside the command block are local to the script
STAR -o . \${newPath} ${name}
"""
Note:
- The newPath variable is defined in the bash script and used in the STAR command as \${newPath}. (Note that bash variables need to be escaped with a backslash in the script block.)
- The name variable is defined in Groovy's scope as a Nextflow variable, and used in the STAR command as ${name}.
Conditional Scripts:
Conditional scripts can be used with control statements such as "if" and "switch." To use this feature, you need to start the code block with the keyword "script:". This tells the interpreter to evaluate all the subsequent statements as one code block and identifies the script string to be executed. Here's an example to illustrate this:
script:
name = reads.toString() - '.fastq'
// This block runs if the mate variable is initialized with the value "pair"
if (mate == "pair") {
"""
bowtie2 -x genome.index -1 ${reads.join(' -2 ')} -S ${name}_alignment.sam --un-conc ${name}_unalignedreads
"""
}
// This block runs if the mate variable is initialized with value "single"
else if (mate == "single") {
"""
bowtie2 -x genome.index -U $reads -S ${name}_alignment.sam --un ${name}_unalignedreads
"""
}
Tip: As shown in the example above, you can run Bowtie2 based on the mate status of the reads. To enable this capability, you must add the mate parameter as an input. When you run the pipeline, you can then choose single or pair based on your needs.
B. Shell:
Alternatively, you can use the shell block, wherein Nextflow variables are referenced with an exclamation mark, as in !{variable}. This feature allows you to use both Nextflow and BASH variables in the same code without using an escape character. To use the shell mode, add shell: at the beginning of your code, and three single quotes (''') at the start and the end of the command block, like so:
shell:
'''
echo $PATH and !{new_path}
'''
Here is an example use case of a Perl script block with correct syntax:
shell:
name = reads.toString() - '.fastq' //local scope for nextflow variables
'''
#!/usr/bin/env perl
# Perl (or other language) variables defined inside the command block are local to the script
$newPath="/mypath";
system("STAR -o . ${newPath} !{name}");
'''
Note:
- The $newPath variable is defined in the Perl script and used in the STAR command as ${newPath}. (Note that variables don't need to be escaped with a backslash when the shell: keyword is used.)
- The name variable is defined in Groovy's scope as a Nextflow variable, and used in the STAR command as !{name}.
C. Exec:
Nextflow processes also allow you to execute native (Groovy) code instead of system commands, which gives you direct access to local Nextflow variables. To use exec mode, add an exec: block at the beginning of the script, like this:
exec:
println "${genome}"
Operators
If you need to transform values emitted by a channel, you can click the Operators button and select operators from the dropdown menu. This allows you to apply various operations or modifications to the emitted values. Optionally, you can specify the operator content to customize its behavior according to your requirements. Additionally, you can add multiple operators by enclosing them within parentheses ( ).
Here are some examples of operators and their usage:
Operator | Operator Content | Usage |
---|---|---|
flatMap | ().buffer(size:3) | Groups every 3 emitted items into a single list item. |
mode flatten | | Emits each output file as a sole item. |
groupTuple | | Collects tuples (or lists) of emitted values and groups them based on their key value. |
Tip: For more information about operators, see the operators section of Nextflow's documentation.
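To build intuition for what these operators do to a stream of emitted items, here is a small Python model that works on plain lists instead of Nextflow channels. It is a sketch based on our reading of Nextflow's documented behavior, not Nextflow code: buffer drops an incomplete trailing group unless remainder is true, flatten emits nested items one by one (similar in spirit to mode flatten), and group_tuple handles only (key, value) pairs.

```python
def buffer(items, size, remainder=False):
    """Group consecutive items into lists of `size`, like buffer(size: n)."""
    groups = [list(items[i:i + size]) for i in range(0, len(items), size)]
    if groups and len(groups[-1]) < size and not remainder:
        groups.pop()  # an incomplete last group is dropped unless remainder=True
    return groups


def flatten(items):
    """Emit every element of nested lists as a separate item."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def group_tuple(pairs):
    """Collect (key, value) pairs into (key, [values]) in first-seen key order."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return list(grouped.items())


print(buffer([1, 2, 3, 4, 5, 6, 7], 3))     # [[1, 2, 3], [4, 5, 6]]
print(flatten([["a.fq", "b.fq"], "c.fq"]))  # ['a.fq', 'b.fq', 'c.fq']
print(group_tuple([("s1", "a.fq"), ("s2", "c.fq"), ("s1", "b.fq")]))
# [('s1', ['a.fq', 'b.fq']), ('s2', ['c.fq'])]
```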
Foundry Variables
Foundry Variables are provided by the Foundry platform and are intended to enhance the flexibility and adaptability of your workflows. These variables (e.g. FOUNDRY_RUN_ID, FOUNDRY_PIPELINE_VERSION, etc.) can be used in your process scripts or configuration files.
Variable | Description |
---|---|
{{FOUNDRY_PIPELINE_ID}} | The identifier assigned to each pipeline. |
{{FOUNDRY_PIPELINE_URL}} | The URL assigned to each pipeline. |
{{FOUNDRY_PIPELINE_VERSION}} | The version assigned to each pipeline. |
{{FOUNDRY_RUN_ID}} | The unique identifier assigned to each run. |
{{FOUNDRY_RUN_URL}} | The URL that links to the detailed information and logs for the current run. |
{{FOUNDRY_WEB_REPORT_DIR}} | The URL where run reports are located. |
{{FOUNDRY_WEB_RUN_DIR}} | The URL where run logs are located. |
{{FOUNDRY_PUBLISH_DIR}} | The directory path where all reported files are located. |
{{FOUNDRY_LAB}} | The lab associated with the Foundry account under which the current run is running. |
{{FOUNDRY_USERNAME}} | The username associated with the Foundry account under which the current run is running. |
{{FOUNDRY_EMAIL}} | The email address associated with the Foundry account under which the run is running. |
Usage: When scripting within the Foundry platform, you can directly reference these variables by enclosing them in double curly braces. For example:
echo "Current Run ID: {{FOUNDRY_RUN_ID}}"
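Conceptually, each {{...}} placeholder is replaced by the corresponding value before the script runs. The Python sketch below models that substitution with a regular expression; the run ID value is made up, and leaving unknown names untouched is an assumption of the sketch rather than documented Foundry behavior.

```python
import re

# Matches placeholders such as {{FOUNDRY_RUN_ID}} and captures the variable name.
PLACEHOLDER = re.compile(r"\{\{(FOUNDRY_[A-Z_]+)\}\}")


def render(script: str, values: dict) -> str:
    """Replace {{FOUNDRY_*}} placeholders with values; unknown names stay as-is."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), script)


run_values = {"FOUNDRY_RUN_ID": "1234"}  # hypothetical run ID
print(render('echo "Current Run ID: {{FOUNDRY_RUN_ID}}"', run_values))
# echo "Current Run ID: 1234"
```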
Header Script
This section, found within the Advanced Options menu, allows you to add additional scripts or comments before the beginning of the process block. For example, you can define a function once in the header script and then call it several times in the Script section.
Process Options
Via Foundry supports a clean separation between a process's main inputs and optional, user-adjustable parameters; the latter are controlled with the Process Options feature. This section explains how to configure process options. On the run page, all process options are displayed below the Inputs section, where users can edit them as needed.
To create these forms, use the following syntax in the Script or Process Header section of the Add New Process or Edit/Delete Process page for your process:
variableName = defaultValue //* @formType @description:"..." @tooltip:"..." @options:"..."
Note that when defining a defaultValue for a variable in Via Foundry, you can use single or double quotes for strings, or no quotes for numbers. If you want to define an array of default values for a variable that is grouped with the @array option of the @style tag, you can use the following format: variableName = ["defaultValue1", "defaultValue2"]. This allows you to set multiple default values for the variable.
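To make the anatomy of a process option line concrete, here is a small Python parser that splits it into the variable name, the default value, the form type and the remaining tags. It is only a sketch of the syntax described above, not Via Foundry's parser: it evaluates defaults as Python literals, treats tags without a quoted value (such as @optional or @file) as flags, and does not handle the multi-value @options tag.

```python
import ast
import re

# name = default //* @tags...   (the name may contain dots, e.g. params.tsv_input)
OPTION_LINE = re.compile(r"^\s*(?P<name>[\w.]+)\s*=\s*(?P<default>.*?)\s*//\*\s*(?P<tags>.*)$")
# @tag or @tag:"single quoted value"
TAG = re.compile(r'@(\w+)(?::"([^"]*)")?')
FORM_TYPES = {"input", "textbox", "checkbox", "dropdown"}


def parse_option(line):
    """Split one process option line into its parts; return None if it does not match."""
    match = OPTION_LINE.match(line)
    if match is None:
        return None
    raw_default = match.group("default")
    try:
        default = ast.literal_eval(raw_default)  # strings, numbers and lists become Python values
    except (ValueError, SyntaxError):
        default = raw_default  # keep anything that is not a simple literal as text
    option = {"name": match.group("name"), "default": default,
              "form": None, "flags": [], "attrs": {}}
    for tag, value in TAG.findall(match.group("tags")):
        if tag in FORM_TYPES:
            option["form"] = tag
        elif value:
            option["attrs"][tag] = value
        else:
            option["flags"].append(tag)
    return option


print(parse_option('params_tophat = "-N 4" //* @input @description:"Tophat parameters"'))
# {'name': 'params_tophat', 'default': '-N 4', 'form': 'input', 'flags': [], 'attrs': {'description': 'Tophat parameters'}}
```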
@formType
In Via Foundry's Process Options section, you can use four different commands to specify the type of form that will be displayed for a given parameter: @input, @textbox, @checkbox, and @dropdown. Here are brief descriptions of each:
@input: Creates a single-line text field. Example usage:
readsPerFile = 5000000 //* @input @description:"The number of reads per file"
params_tophat = "-N 4" //* @input @description:"Tophat parameters" @tooltip:"parameters for Tophat2 version 2.6"
@textbox: Creates a multi-line text field. Example usage:
Adapter_Sequence = "" //* @textbox @description:"You can enter a single sequence or multiple sequences in different lines."
@checkbox: Creates a checkbox field whose value is either true or false. Example usage:
run_rRNA_Mapping = "false" //* @checkbox @description:"Check the box to activate rRNA mapping."
rRNA_filtering = "true" //* @checkbox @description:"Check the box to filter rRNA reads."
@dropdown: Creates a dropdown menu, whose options can be specified with the @options tag. Example usage:
genomeType = "" //* @dropdown @description:"Genome type for pipeline" @options:"hg19","mm10", "custom"
Non-Form Type Specifiers
@description: Provide brief, informative descriptions of inputs with the @description tag. For example, in the @dropdown example above, "Genome type for pipeline" is provided as the value of @description, so on the run page, the blurb beneath the dropdown menu says "Genome type for pipeline".
@tooltip: Provide more detailed descriptions of inputs and/or supplement your @description with the @tooltip tag. See the example below for possible usage:
params_tophat = "-N 4" //* @input @tooltip:"parameters for Tophat2 version 2.6" @description:"Tophat parameters"
@title: You can create a header on top of a variable by using the @title tag. This allows you to organize complex form structures more easily. Here's an example:
params_tophat = "-N 4" //* @input @title:"Alignment Section" @description:"Tophat parameters"
In this example, the params_tophat variable is grouped under the "Alignment Section" header in the form, making it easier to distinguish and organize related variables.
@optional: When defining an input, you can designate it as optional by using the @optional tag. This gives users the freedom to leave certain inputs empty if they so desire. See the example below:
params.tsv_input = "" //* @input @optional @description:"TSV input for process"
params.csv_input = "" //* @input @optional @description:"CSV input for process"
@file: You can specify the type of window that will appear for a selected input. By default, a value modal is used. However, if you add the @file tag, the file modal will be displayed. This allows users to upload files and use them as a parameter. Here's an example:
params.tsv_input = "" //* @input @file @description:"TSV file path for process"
Explanation: In this example, the params.tsv_input parameter is defined as a file input. When the user selects this input on the run page, a file modal will be displayed, allowing them to upload a TSV file to be used as the parameter value.
@options: When you define a dropdown form field using the @dropdown formType, you should also specify the available options using the @options tag. Here's a simple example:
genomeType = "" //* @dropdown @options:"hg19","mm10","custom"
Here, the genomeType parameter is defined as a dropdown field. The available options for selection are "hg19", "mm10", and "custom". When the user selects this input on the run page, a dropdown menu will be displayed with these options to choose from. The selected option will be used as the parameter value.
Conditional Options - Version 1 (Advanced Usage):
Since the same process can be viewed differently based on the pipeline being used, in order to control the visibility of dropdown options in different pipelines, you can define variables in the pipeline header starting with an underscore. For example:
_nucleicAcidType = "rna" //In RNA-seq pipeline header
_nucleicAcidType = "dna" //In ChIP-seq pipeline header
You can then use these variables to control which options will be visible in the dropdown fields of different pipelines using the following format:
param = "" //* @dropdown @options:{_nucleicAcidType="rna","rRNA","miRNA","snRNA"},{_nucleicAcidType="dna", "ercc","rmsk"}
Now, the param dropdown will have three options ("rRNA", "miRNA", "snRNA") in the RNA-seq pipeline and two options ("ercc", "rmsk") in the ChIP-seq pipeline. You can also define default options by not assigning any condition, as shown in the example below:
param = "" //* @dropdown @options:{"rRNA","miRNA","snRNA"},{_nucleicAcidType="dna","ercc","rmsk"}
In this case, the three options ("rRNA", "miRNA", "snRNA") will be visible by default, unless the pipeline header defines _nucleicAcidType="dna".
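The selection rule for these conditional options can be summarized in a few lines of Python. This is an illustrative model, not Via Foundry code: each {...} group is represented as a (condition, options) pair, a group without a condition acts as the default, and returning an empty list when nothing matches and no default exists is an assumption.

```python
def visible_options(groups, header_vars):
    """Return the dropdown options to show for the current pipeline header.

    groups: list of (condition, options) pairs, where condition is a
    (variable, value) tuple such as ("_nucleicAcidType", "dna"), or None for
    a default group. header_vars: underscore variables from the pipeline header.
    """
    default = []
    for condition, options in groups:
        if condition is None:
            default = options
        elif header_vars.get(condition[0]) == condition[1]:
            return options
    return default


with_default = [
    (None, ["rRNA", "miRNA", "snRNA"]),
    (("_nucleicAcidType", "dna"), ["ercc", "rmsk"]),
]
print(visible_options(with_default, {}))                           # ['rRNA', 'miRNA', 'snRNA']
print(visible_options(with_default, {"_nucleicAcidType": "dna"}))  # ['ercc', 'rmsk']
```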
Conditional Options - Version 2 (Advanced Usage):
To control the visibility of dropdown options based on the selected value in another dropdown, you can use the following syntax. In this example, the dropdown called sequence controls the visible options of the dropdown aligner:
aligner = "" //* @dropdown @options:{sequence=("rRNA","miRNA","snRNA"),"bowtie","bowtie2"},{sequence="genome", "star"}
When sequence is set to one of the options "rRNA", "miRNA", or "snRNA", the aligner dropdown will display the options "bowtie" and "bowtie2". Similarly, when sequence is set to "genome", the aligner dropdown will display the "star" option.
This allows you to dynamically control the available options in a dropdown based on the selected value of another dropdown parameter.
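The same idea applies when the controlling value comes from another dropdown on the run page. In this Python sketch (again a model, not Foundry code), each group lists the sequence values that activate it; returning no options for an unmatched value is an assumption.

```python
def dependent_options(groups, selected):
    """Return the options of the first group whose trigger values include `selected`.

    groups: list of (trigger_values, options) pairs for the dependent dropdown.
    """
    for trigger_values, options in groups:
        if selected in trigger_values:
            return options
    return []


aligner_groups = [
    ({"rRNA", "miRNA", "snRNA"}, ["bowtie", "bowtie2"]),
    ({"genome"}, ["star"]),
]
print(dependent_options(aligner_groups, "miRNA"))   # ['bowtie', 'bowtie2']
print(dependent_options(aligner_groups, "genome"))  # ['star']
```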
Styles for Process Options
Via Foundry supports the use of additional tags to shape the layouts of form fields: @multicolumn, @array, and @condition.
@multicolumn
The @multicolumn style tag enables you to place variables on the same row for a more compact layout. Example usage:
var1 = "" //* @input @description:"description of var1"
var2 = "" //* @input @description:"description of var2"
var3 = "" //* @input @description:"description of var3"
var4 = "" //* @input @description:"description of var4"
var5 = "" //* @input @description:"description of var5"
var6 = "" //* @input @description:"description of var6"
//* @style @multicolumn:{var1, var2, var3}, {var5, var6}
Here, var1, var2, and var3 will be displayed on the same row, as they are grouped in the same list in the @multicolumn value. Similarly, var5 and var6 will share a row, and since var4 isn't listed in the @multicolumn argument, it will fill a single row by default.
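The row layout produced by @multicolumn can be modeled as follows. The Python function is a sketch of the rule described above, not Via Foundry's renderer; placing each grouped row where its first member appears in the declaration order is an assumption.

```python
def layout_rows(fields, multicolumn_groups):
    """Arrange fields into rows: grouped fields share a row, the rest get one each."""
    group_of = {}
    for group in multicolumn_groups:
        for name in group:
            group_of[name] = list(group)
    rows, placed = [], set()
    for name in fields:
        if name in placed:
            continue  # already shown as part of an earlier multicolumn row
        row = group_of.get(name, [name])
        rows.append(row)
        placed.update(row)
    return rows


fields = ["var1", "var2", "var3", "var4", "var5", "var6"]
print(layout_rows(fields, [["var1", "var2", "var3"], ["var5", "var6"]]))
# [['var1', 'var2', 'var3'], ['var4'], ['var5', 'var6']]
```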
@array
The @array style tag lets you group variables together and link them with add/remove buttons. Here's an example usage:
var1 = "" //* @input @description:"description of var1" @title:"Step 1"
var2 = "" //* @input @description:"description of var2"
var3 = "" //* @input @description:"description of var3"
var4 = "" //* @input @description:"description of var4" @title:"Step 2"
//* @style @array:{var1, var2}, {var4}
In this example, var1 and var2 are grouped together and linked to add/remove buttons. Clicking the add button will create new var1 and var2 fields just below the existing ones. Similarly, the remove button will remove the generated copies of the form fields. The same features apply to var4 as well.
A helpful tip: You can combine multiple style options on the same variables. For example, the following line combines both the multicolumn and array features for var1 and var2:
//* @style @array:{var1, var2}, {var4} @multicolumn:{var1, var2}
You can also define multiple default values by using the following syntax:
var1 = ["defVal1", "defVal2"] //* @input @description:"description of var1"
With this line, on the run page, two rows will be displayed for var1, and their default values will be "defVal1" and "defVal2".
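The add/remove behavior of an @array group, together with list defaults, can be modeled with a small Python class. This is a sketch: the class name ArrayGroup is hypothetical, and repeating a scalar default in every seeded row and always keeping at least one row are assumptions, not documented Via Foundry behavior.

```python
class ArrayGroup:
    """Hypothetical model of an @array group: each row holds one copy of every field."""

    def __init__(self, fields, defaults):
        self.fields = list(fields)
        # A list default seeds one row per value; otherwise a single row is shown.
        lengths = [len(defaults[f]) for f in self.fields if isinstance(defaults.get(f), list)]
        self.rows = []
        for i in range(max(lengths, default=1)):
            row = {}
            for field in self.fields:
                value = defaults.get(field, "")
                if isinstance(value, list):
                    value = value[i] if i < len(value) else ""
                row[field] = value
            self.rows.append(row)

    def add_row(self):
        """Add an empty copy of every grouped field below the existing rows."""
        self.rows.append({field: "" for field in self.fields})

    def remove_row(self):
        """Remove the last generated copy, always keeping the original row."""
        if len(self.rows) > 1:
            self.rows.pop()


group = ArrayGroup(["var1", "var2"], {"var1": ["defVal1", "defVal2"], "var2": ""})
print([row["var1"] for row in group.rows])  # ['defVal1', 'defVal2']
```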
@condition
The @condition tag allows you to bind the value of one form field to the visibility of other form fields. Here's an example:
var1 = "" //* @dropdown @description:"description of var1" @options:"yes", "no" @title:"Step 1"
var2 = "" //* @input @description:"description of var2"
var3 = "" //* @input @description:"description of var3"
var4 = "" //* @input @description:"description of var4"
var5 = "" //* @input @description:"description of var5" @title:"Step 2"
//* @style @condition:{var1="yes", var2}, {var1="no", var3, var4}
In this example, the value of var1 is linked to other form fields. When var1 is set to "yes", the field for var2 will be shown. When var1 is changed to "no", the var2 field will disappear, and the fields for var3 and var4 will appear. Since var5 is not referenced in the @condition tag, it will always be visible regardless of changes in other fields.
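The visibility rule behind @condition can be written as a short Python function. It models the example above and is not Via Foundry code; conditions are represented as hypothetical (controller, value, dependent_fields) tuples.

```python
def visible_fields(all_fields, conditions, values):
    """Return the fields shown on the run page for the current form values.

    conditions: list of (controller, required_value, dependent_fields) tuples.
    Fields that appear in no condition are always visible.
    """
    controlled = {f for _, _, deps in conditions for f in deps}
    active = {f for ctrl, required, deps in conditions
              if values.get(ctrl) == required for f in deps}
    return [f for f in all_fields if f not in controlled or f in active]


fields = ["var1", "var2", "var3", "var4", "var5"]
conditions = [("var1", "yes", ["var2"]), ("var1", "no", ["var3", "var4"])]
print(visible_fields(fields, conditions, {"var1": "yes"}))  # ['var1', 'var2', 'var5']
print(visible_fields(fields, conditions, {"var1": "no"}))   # ['var1', 'var3', 'var4', 'var5']
```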
You can combine multiple style options on the same variable, including @condition. For example, the following line combines all the features just discussed:
//* @style @condition:{var1="yes", var2}, {var1="no", var3, var4} @array:{var1, var2, var3, var4} @multicolumn:{var1, var2, var3, var4}
Autofill Feature for Process
To automate the filling of executor properties in Via Foundry, you can utilize the autofill feature. There are two types of autofill options available: hostname-independent autofill and hostname-dependent autofill.
Hostname-Independent Autofill:
To define executor properties that will be automatically filled regardless of the hostname, you can use the following syntax:
//* autofill
<executor properties>
//* autofill
Hostname-Dependent Autofill:
If you need to overwrite the default executor properties based on specific hostnames, you can use hostname-dependent executor properties. Here's the syntax:
//* autofill
<executor properties>
if ($HOSTNAME == "ghpcc06.umassrc.org"){
<hostname dependent executor properties>
}
//* autofill
In this example, the <executor properties> section will be filled for all hostnames. However, if the hostname is "ghpcc06.umassrc.org", the <hostname dependent executor properties> section will additionally be filled.
The $HOSTNAME variable in Via Foundry represents the selected hostname in the run environment. By using this variable, you can apply specific executor properties based on the hostname.
Executor Properties:
There are five types of executor properties available to autofill the Executor Settings for All Processes: $TIME, $CPU, $MEMORY, $QUEUE and $EXEC_OPTIONS, which respectively define the Time, CPU, Memory, Queue/Partition and Other Options fields in the executor settings. See the example below:
//* autofill
$TIME = 1000
if ($HOSTNAME == "ghpcc06.umassrc.org"){
$TIME = 3000
$CPU = 4
$MEMORY = 100
$QUEUE = "long"
$EXEC_OPTIONS = '-E "file /home/garberlab"'
}
//* autofill
In this example, the default $TIME value is 1000. If the run environment is set to "ghpcc06.umassrc.org", the autofill feature overwrites it with 3000 and also fills in the $CPU, $MEMORY, $QUEUE and $EXEC_OPTIONS fields.
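The override behavior can be expressed as a dictionary merge. This Python sketch mirrors the example above; it is a model of the described rule rather than Via Foundry code, the function name autofill is hypothetical, and "other.host" is a made-up hostname.

```python
def autofill(defaults, host_overrides, hostname):
    """Start from the hostname-independent properties, then apply host-specific ones."""
    settings = dict(defaults)
    settings.update(host_overrides.get(hostname, {}))
    return settings


defaults = {"TIME": 1000}
host_overrides = {
    "ghpcc06.umassrc.org": {"TIME": 3000, "CPU": 4, "MEMORY": 100,
                            "QUEUE": "long", "EXEC_OPTIONS": '-E "file /home/garberlab"'},
}
print(autofill(defaults, host_overrides, "ghpcc06.umassrc.org")["TIME"])  # 3000
print(autofill(defaults, host_overrides, "other.host"))                   # {'TIME': 1000}
```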
Platform Tag:
To isolate platform-dependent parameters in Via Foundry, you can use the platform tag. This allows you to exclude platform-specific parameters from the exported process, and when the process is imported, existing platform-dependent parameters will not be overwritten. Here's an example of how to use the platform tag:
//* autofill
$MEMORY = 32
$CPU = 1
//* platform
if ($HOSTNAME == "ghpcc06.umassrc.org"){
$TIME = 3000
$CPU = 4
$MEMORY = 100
$QUEUE = "long"
$EXEC_OPTIONS = '-E "file /home/garberlab"'
}
//* platform
//* autofill
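One way to picture the export step is as a filter that removes everything between the //* platform markers. The Python sketch below is an assumption about the outcome, not a description of how Via Foundry implements export.

```python
def strip_platform_blocks(autofill_text):
    """Drop the //* platform markers and every line between them."""
    kept, inside_platform = [], False
    for line in autofill_text.splitlines():
        if line.strip() == "//* platform":
            inside_platform = not inside_platform  # toggle at each marker
            continue
        if not inside_platform:
            kept.append(line)
    return "\n".join(kept)


example = """//* autofill
$MEMORY = 32
$CPU = 1
//* platform
if ($HOSTNAME == "ghpcc06.umassrc.org"){
    $TIME = 3000
}
//* platform
//* autofill"""
print(strip_platform_blocks(example))
```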
Permissions, Groups and Publish
By default, all new processes are only seen by their owner. You have the option to share your process with a specific group that you have created in the profile's "Groups" tab. To do this, choose "Only my group" and select the name of the desired group. Members of that group will then be able to view the process on their pipeline page.
When collaborating with multiple individuals on a shared pipeline, you have the option to grant write permission to a group by utilizing the "Group Permission to Write" dropdown. This allows you to specify a group and authorize them to make changes and modifications to the process.
Copying and Revisions
You can easily create a copy of your process by clicking the Settings button, identifiable by an image of three dots and located at the top right corner of the process window, and then selecting "Copy Process" from the dropdown menu. This allows you to create a duplicate of the process while retaining the original one.
It's important to note that once your process becomes public or is being used by other group members, you are not allowed to make changes to the same revision directly. Instead, a new revision of the process will be created, and any modifications or updates can be made to this new revision. This ensures that the original process remains intact and unchanged, while allowing you to iterate and improve upon a separate copy.
How To Cite Us
If you use Via Foundry (formerly DolphinNext) in your research, please cite:
Yukselen, O., Turkyilmaz, O., Ozturk, A.R. et al. DolphinNext: a distributed data processing platform for high throughput genomics. BMC Genomics 21, 310 (2020). https://doi.org/10.1186/s12864-020-6714-x
Support
For any questions or help, please reach out to support@viascientific.com with your name and question.