Tutorial

This tutorial will guide you through creating and running your first Workforce workflow.

Getting Started

Installation

First, install Workforce:

pip install workforce

Verify the installation:

wf --help

Your First Workflow

Let’s create a simple data processing pipeline that:

Downloads a dataset
Processes the data
Generates a report

Step 1: Launch the GUI

Start Workforce:

wf

This opens the visual workflow editor.

Step 2: Create Nodes

Add the first node:

Double-click on the canvas (empty area)
A popup dialog will appear
Enter the bash command:

echo "Downloading data..." && sleep 2 && echo "Data downloaded" > data.txt
Click “Save” or press Enter

Add the second node:

Double-click on the canvas again
Enter:

echo "Processing..." && sleep 1 && cat data.txt | wc -l > processed.txt
Click “Save”

Add the third node:

Double-click on the canvas
Enter:

echo "Report: $(cat processed.txt) lines processed" > report.txt
Click “Save”

Step 3: Connect Nodes

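Create dependencies between nodes. By default, edges are blocking and enforce sequential execution. For advanced workflows, you can create non-blocking edges using keyboard modifiers. The steps below refer to the three nodes you just created as download_data, process_data, and generate_report.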

Creating Blocking Edges (Default)

Blocking edges enforce strict dependencies. A node only executes when all incoming blocking edges are ready. This is the standard edge type for sequential pipelines.

Method 1: Right-click and drag

Right-click on the first node (download_data) and hold
Drag to the second node (process_data)
Release to create a blocking edge (solid line)

Repeat for the second dependency:

Right-click on process_data and drag to generate_report
Release to create the edge

Method 2: Select and press ‘E’

Click on the first node to select it
Hold Shift and click the second node (multi-select)
Continue selecting nodes in order
Press ‘E’ to connect them in sequence with blocking edges

Your workflow should now show:

[download_data] ─→ [process_data] ─→ [generate_report]
                (blocking)       (blocking)

This ensures download_data completes before process_data starts, and process_data completes before generate_report starts.

Creating Non-Blocking Edges (Optional)

Non-blocking edges are soft triggers that allow nodes to execute without waiting for all dependencies. Use this for advanced patterns like node re-execution or fan-out workflows.

To create a non-blocking edge:

Hold Ctrl+Shift
Right-click and drag from source node to target node
Release to create a non-blocking edge (dashed line)

Example: If you wanted process_data to be re-triggered externally without waiting for download_data, you could:

Right-click download_data and drag to process_data → blocking edge
Then Ctrl+Shift + right-click external_trigger and drag to process_data → non-blocking edge

Now process_data will execute when:

ALL blocking edges are ready (download_data completed), OR
The non-blocking edge triggers (external_trigger is ready)

This allows flexible execution patterns beyond strict sequential order. See Re-Triggering and Dependency Resolution for more details.
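
The readiness rule described above can be modelled in a few lines of Python. This is only an illustrative sketch of the rule as stated in this tutorial, not Workforce's actual scheduler; the Edge class, the is_ready function, and the set of completed nodes are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    blocking: bool = True  # True models a solid line, False a dashed line


def is_ready(node, edges, completed):
    """Return True if `node` may run under the rule described above."""
    incoming = [e for e in edges if e.target == node]
    if not incoming:
        return True  # a node without incoming edges has nothing to wait for

    blocking = [e for e in incoming if e.blocking]
    non_blocking = [e for e in incoming if not e.blocking]

    # Every blocking predecessor must have completed (and at least one exists).
    all_blocking_done = bool(blocking) and all(e.source in completed for e in blocking)
    # A single completed non-blocking predecessor is enough to trigger the node.
    any_trigger_fired = any(e.source in completed for e in non_blocking)
    return all_blocking_done or any_trigger_fired


edges = [
    Edge("download_data", "process_data"),
    Edge("external_trigger", "process_data", blocking=False),
]
print(is_ready("process_data", edges, completed=set()))                 # False
print(is_ready("process_data", edges, completed={"download_data"}))     # True
print(is_ready("process_data", edges, completed={"external_trigger"}))  # True
```

The third call shows the difference a non-blocking edge makes: process_data becomes runnable even though download_data has not completed.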

Step 4: Save the Workflow

Save your workflow:

Press Ctrl+S or use File → Exit (which saves automatically)
If this is a new workflow, it will be saved as Workfile in the current directory
Or specify a different path when starting: wf myworkflow.graphml

Step 5: Run the Workflow

Execute the workflow:

Click the “Run” button or press ‘R’
Watch as nodes change color:
- Light gray → Not started
- Light cyan → Ready to run
- Light blue → Currently running
- Light green → Completed successfully
- Light coral → Failed (if error occurs)
The workflow will execute in order:
- First node runs first
- Second node runs after first completes
- Third node runs last

Step 6: View Logs

Check the output from any node:

Left-click a node to select it
Press ‘S’ to view logs
See the combined stdout and stderr from the command execution
In the log popup, press ‘S’ or Escape to close it

Verify your files were created:

cat data.txt
cat processed.txt
cat report.txt
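
The same check can be scripted, which is handy once a workflow produces more than a few files. The sketch below is a minimal example; read_outputs is a hypothetical helper, and the file names are the ones written by the three commands in this tutorial.

```python
from pathlib import Path

# Files written by the three tutorial nodes.
EXPECTED_FILES = ("data.txt", "processed.txt", "report.txt")


def read_outputs(directory="."):
    """Map each expected file name to its stripped contents, or None if missing."""
    results = {}
    for name in EXPECTED_FILES:
        path = Path(directory) / name
        results[name] = path.read_text().strip() if path.is_file() else None
    return results


for name, content in read_outputs().items():
    print(f"{name}: {content if content is not None else 'MISSING'}")
```

If the workflow ran as written, data.txt should contain "Data downloaded" and processed.txt should contain a line count of 1.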

Working with the CLI

The same workflow can be created and run using the command line.

Creating via CLI

Create a new workflow file:

# Start with the GUI to create graphically
wf

# Or create nodes via CLI (requires existing Workfile or path)
wf edit add-node Workfile "echo 'Downloading...' && sleep 2 && echo 'Data downloaded' > data.txt" --x 100 --y 100
wf edit add-node Workfile "echo 'Processing...' && cat data.txt | wc -l > processed.txt" --x 200 --y 100
# Single quotes keep $(...) unexpanded until the node itself runs
wf edit add-node Workfile 'echo "Report: $(cat processed.txt) lines" > report.txt' --x 300 --y 100

Add dependencies (note: requires node UUIDs, easier via GUI):

# You'll need the actual node UUIDs from the graph
# wf edit add-edge Workfile <source-uuid> <target-uuid>

# It's much easier to create edges in the GUI by dragging
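
One way to look up node UUIDs without opening the GUI is to read the workflow file directly. The sketch below assumes the Workfile is standard GraphML using the usual GraphML XML namespace, and that each node's id attribute holds its UUID. Which data key carries the command is not specified in this tutorial, so the hypothetical list_nodes helper simply returns every data value stored on each node.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace; assumed to be the one used in the Workfile.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def list_nodes(path):
    """Return {node_id: {data_key: text}} for every <node> in a GraphML file."""
    root = ET.parse(path).getroot()
    nodes = {}
    for node in root.iterfind(".//g:node", GRAPHML_NS):
        data = {
            d.get("key"): (d.text or "")
            for d in node.iterfind("g:data", GRAPHML_NS)
        }
        nodes[node.get("id")] = data
    return nodes
```

Calling list_nodes("Workfile") and printing the result should show each node ID next to its stored values, which makes it easier to pick the source and target for wf edit add-edge.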

Running via CLI

Execute the complete workflow:

wf run Workfile

Run specific nodes only:

wf run Workfile --nodes process_data,generate_report

Advanced Tutorial

Running Subsets

Select specific nodes in the GUI:

Left-click to select a node
Shift + Left-click to add more nodes to selection
Press ‘R’ to run only the selected nodes
Only selected nodes (and their dependencies within the selection) execute
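
"Dependencies within the selection" can be read as: an edge only counts when both of its endpoints are selected. The sketch below illustrates that reading; it is an interpretation of the sentence above, not Workforce code, and edges_within_selection is a hypothetical name.

```python
def edges_within_selection(edges, selected):
    """Keep only the (source, target) edges whose endpoints are both selected."""
    chosen = set(selected)
    return [(src, dst) for src, dst in edges if src in chosen and dst in chosen]


edges = [
    ("download_data", "process_data"),
    ("process_data", "generate_report"),
]
print(edges_within_selection(edges, ["process_data", "generate_report"]))
# [('process_data', 'generate_report')]
```

Under this reading, process_data no longer waits for download_data, because that edge leaves the selection.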

Resume Failed Nodes

If a node fails:

Fix the issue (edit the command by double-clicking the node, or fix external resources)
Select the failed node(s)
Press ‘C’ to clear the status (resets it from fail to empty)
Press ‘R’ to run again, which will re-execute failed nodes

Using Command Wrappers

Example: Docker Wrapper

Run all commands in a Docker container:

wf run Workfile --wrapper "docker run -v \$(pwd):/work -w /work ubuntu bash -c '{}'"

Example: Remote Execution

Execute workflow on a remote server:

wf run Workfile --wrapper 'ssh user@remote-server "{}"'

Example: Tmux Integration

Send commands to tmux panes:

wf run Workfile --wrapper 'tmux send-keys -t mysession "{}" C-m'
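
Before running a workflow through a wrapper, it can help to preview the command line that results. The sketch below assumes the {} placeholder is replaced by plain text substitution. That is an assumption about Workforce's behavior, and apply_wrapper is a hypothetical helper, not part of the Workforce API.

```python
def apply_wrapper(wrapper, command):
    """Preview a wrapped command, assuming `{}` is replaced textually."""
    return wrapper.replace("{}", command)


wrapper = 'ssh user@remote-server "{}"'
print(apply_wrapper(wrapper, "cat data.txt | wc -l > processed.txt"))
# ssh user@remote-server "cat data.txt | wc -l > processed.txt"
```

Under this assumption, a node command that contains its own double quotes would collide with the quotes in the wrapper, so it is worth trying a new wrapper on a simple command first.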

Complex Workflow Example

Let’s create a more realistic bioinformatics pipeline.

Scenario

Process multiple sample files through quality control, alignment, and variant calling.

Workflow Structure

download_samples → quality_control → trim_adapters → align_to_reference
                                                          ↓
                                                    call_variants → merge_results
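
The structure above is a simple chain, so its execution order can be derived with a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later) to illustrate the ordering; it does not use Workforce itself, and the dependencies dictionary is written by hand from the diagram.

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on (its predecessors).
dependencies = {
    "quality_control": {"download_samples"},
    "trim_adapters": {"quality_control"},
    "align_to_reference": {"trim_adapters"},
    "call_variants": {"align_to_reference"},
    "merge_results": {"call_variants"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(order))
# download_samples -> quality_control -> trim_adapters -> align_to_reference -> call_variants -> merge_results
```

Because every node has exactly one predecessor, only one valid order exists. Workflows with independent branches would admit several.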

Creating the Workflow

# Note: These are simplified examples
# In practice, create nodes in GUI or use UUIDs for edges

# Create nodes with commands
wf edit add-node Workfile "wget https://example.com/samples.tar.gz && tar -xzf samples.tar.gz"
wf edit add-node Workfile "fastqc samples/*.fastq -o qc_reports/"
wf edit add-node Workfile "for f in samples/*.fastq; do trim_galore \$f -o trimmed/; done"

# Connect nodes in GUI or use node UUIDs with add-edge
# Edges require source and target node IDs (UUIDs)

Running with Conda

Activate a conda environment for all commands:

wf run Workfile --wrapper "conda run -n biotools"

Parallel Processing

Process multiple samples in parallel using GNU Parallel:

wf run Workfile --wrapper "parallel -j 4" --suffix ":::" --suffix "sample1 sample2 sample3 sample4"

Python API Tutorial

You can also work with workflows programmatically.

Loading and Modifying Workflows

from workforce.edit.graph import (
    load_graph,
    save_graph,
    add_node_to_graph,
    add_edge_to_graph,
    edit_node_label_in_graph
)

# Load an existing workflow
G = load_graph('tutorial_workflow.graphml')

# Add a new node - returns {'node_id': '<uuid>'}
result = add_node_to_graph(
    'tutorial_workflow.graphml',
    label='test -f report.txt && echo "Validation passed"',
    x=400,
    y=100
)
new_node_id = result['node_id']

# Add an edge (requires UUIDs of source and target)
# You'd need to get the node UUID from the graph first
# add_edge_to_graph('tutorial_workflow.graphml', source_uuid, new_node_id)

# Modify a node's command (requires node UUID)
# edit_node_label_in_graph(
#     'tutorial_workflow.graphml',
#     node_id,
#     'curl -O https://example.com/data.csv'
# )

# Note: Each function automatically saves the graph
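
The commented-out calls above need node UUIDs. A small helper can look them up by searching for a fragment of the command. The sketch below is hypothetical: find_node_ids is not part of Workforce, and it assumes that load_graph returns a NetworkX-style graph whose nodes carry a label attribute, so that G.nodes(data=True) yields (node_id, attributes) pairs.

```python
def find_node_ids(nodes, fragment, attr="label"):
    """Return IDs of nodes whose `attr` value contains `fragment`.

    `nodes` is any iterable of (node_id, attribute_dict) pairs, for example
    G.nodes(data=True) if G is a NetworkX graph (an assumption here).
    """
    return [
        node_id
        for node_id, attrs in nodes
        if fragment in str(attrs.get(attr, ""))
    ]


# Stand-in data with made-up IDs, shaped like G.nodes(data=True).
sample_nodes = [
    ("uuid-1", {"label": "echo 'Downloading...'"}),
    ("uuid-2", {"label": "test -f report.txt && echo ok"}),
]
print(find_node_ids(sample_nodes, "report.txt"))  # ['uuid-2']
```

With a real graph, the returned ID could then be passed as the source or target argument of add_edge_to_graph.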

Programmatic Execution

from workforce import utils

# Compute workspace ID from file path
workspace_id = utils.compute_workspace_id('tutorial_workflow.graphml')

# Get workspace URL (auto-discovers or starts server)
workspace_url = utils.get_workspace_url(workspace_id)
print(f"Workspace URL: {workspace_url}")

# To run the workflow, use the CLI:
# wf run tutorial_workflow.graphml

# The run client connects via SocketIO and executes nodes
# when it receives NODE_READY events from the server

Best Practices

Workflow Design

Keep commands atomic: Each node should do one thing well
Use meaningful names: Node names should describe their purpose
Check dependencies: Ensure nodes have proper input/output relationships
Handle errors: Use && chains to fail fast: command1 && command2
Test incrementally: Run subsets to verify each step works
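
The fail-fast behavior of && chains mentioned in the list above can be checked with a short script. This sketch assumes a POSIX shell, where the false command exits with status 1; it uses only Python's standard subprocess module and is independent of Workforce.

```python
import subprocess

# `false` exits with status 1, so the command after && never runs.
result = subprocess.run(
    "false && echo 'second step ran'",
    shell=True,
    capture_output=True,
    text=True,
)
print(result.returncode)    # 1
print(repr(result.stdout))  # ''
```

The non-zero return code is what lets a chained command report failure instead of continuing with missing inputs.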

File Management

Use absolute paths, or ensure the working directory is correct
Create output directories before running: mkdir -p output && ...
Clean up temporary files in final nodes
Use Workfile as the default name for easy discovery

Performance

Parallelize independent nodes: Design workflows with multiple independent branches
Use wrappers for resource management: Docker, HPC schedulers, etc.
Monitor resource usage: Large parallelism may overwhelm the system
Consider subset execution: Test with small datasets first

Debugging

Check logs frequently: Select a node and press ‘S’ in the GUI to view its output
Test commands in isolation: Verify each command works before adding to workflow
Use echo for debugging: Add echo statements to track progress
Resume from failures: After a fix, clear the failed status with ‘C’, then press ‘R’ to retry

Next Steps

Now that you’ve completed the tutorial, you can:

Read the Usage guide for a comprehensive CLI reference
Explore the Architecture to understand how Workforce works internally
Check the API Reference for programmatic workflow manipulation
Visit the GitHub repository for examples and issues

Happy workflow building!