Tutorial
This tutorial will guide you through creating and running your first Workforce workflow.
Getting Started
Installation
First, install Workforce:
pip install workforce
Verify the installation:
wf --help
Your First Workflow
Let’s create a simple data processing pipeline that:
Downloads a dataset
Processes the data
Generates a report
Step 1: Launch the GUI
Start Workforce:
wf
This opens the visual workflow editor.
Step 2: Create Nodes
Add the first node (referred to as download_data below):
Double-click on the canvas (empty area)
A popup dialog will appear
Enter the bash command:
echo "Downloading data..." && sleep 2 && echo "Data downloaded" > data.txt
Click “Save” or press Enter
Add the second node (process_data):
Double-click on the canvas again
Enter:
echo "Processing..." && sleep 1 && cat data.txt | wc -l > processed.txt
Click “Save”
Add the third node (generate_report):
Double-click on the canvas
Enter:
echo "Report: $(cat processed.txt) lines processed" > report.txt
Click “Save”
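Before wiring the nodes together, you can check that the three commands behave as expected outside Workforce (the Best Practices section recommends testing commands in isolation). The following is a minimal standard-library sketch, assuming `bash` is on your PATH; it runs the commands in order in a throwaway directory and is not how Workforce itself executes nodes.

```python
import subprocess
import tempfile
from pathlib import Path

# The three node commands from Step 2, in dependency order.
COMMANDS = [
    'echo "Downloading data..." && sleep 2 && echo "Data downloaded" > data.txt',
    'echo "Processing..." && sleep 1 && cat data.txt | wc -l > processed.txt',
    'echo "Report: $(cat processed.txt) lines processed" > report.txt',
]


def run_in_isolation(commands, workdir):
    """Run each command with bash in workdir, stopping at the first failure."""
    for cmd in commands:
        result = subprocess.run(["bash", "-c", cmd], cwd=workdir,
                                capture_output=True, text=True)
        print(f"$ {cmd}\n{result.stdout}{result.stderr}", end="")
        if result.returncode != 0:
            raise RuntimeError(f"command failed ({result.returncode}): {cmd}")
    # Return the final artifact so it can be inspected.
    return (Path(workdir) / "report.txt").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_in_isolation(COMMANDS, tmp))
```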
Step 3: Connect Nodes
Create dependencies between nodes. By default, edges are blocking edges that enforce sequential execution. For advanced workflows, you can create non-blocking edges using keyboard modifiers.
Creating Blocking Edges (Default)
Blocking edges enforce strict dependencies. A node only executes when all incoming blocking edges are ready. This is the standard edge type for sequential pipelines.
Method 1: Right-click and drag
Right-click on the first node (download_data) and hold
Drag to the second node (process_data)
Release to create a blocking edge (solid line)
Repeat for the second dependency:
Right-click on process_data and drag to generate_report
Release to create the edge
Method 2: Select and press ‘E’
Click on the first node to select it
Hold Shift and click the second node (multi-select)
Continue selecting nodes in order
Press ‘E’ to connect them in sequence with blocking edges
Your workflow should now show:
[download_data] ─(blocking)→ [process_data] ─(blocking)→ [generate_report]
This ensures download_data completes before process_data starts, and process_data completes before generate_report starts.
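The blocking-edge rule (a node becomes ready only once every blocking predecessor has completed) is what produces this order. The sketch below models the rule with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), using the tutorial's node names; it illustrates the rule and is not Workforce's scheduler.

```python
from graphlib import TopologicalSorter

# Blocking edges from Step 3, written as {node: {its blocking predecessors}}.
BLOCKING = {
    "process_data": {"download_data"},
    "generate_report": {"process_data"},
}


def execution_waves(blocking):
    """Group nodes into waves; every node in a wave has all predecessors done."""
    sorter = TopologicalSorter(blocking)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all blocking predecessors satisfied
        waves.append(ready)
        sorter.done(*ready)  # mark completed so successors can become ready
    return waves


print(execution_waves(BLOCKING))
# [['download_data'], ['process_data'], ['generate_report']]
```

Nodes that land in the same wave have no blocking edge between them, which is why independent branches can run side by side (see Parallelize independent nodes under Performance).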
Creating Non-Blocking Edges (Optional)
Non-blocking edges are soft triggers that allow nodes to execute without waiting for all dependencies. Use this for advanced patterns like node re-execution or fan-out workflows.
To create a non-blocking edge:
Hold Ctrl+Shift
Right-click and drag from source node to target node
Release to create a non-blocking edge (dashed line)
Example: If you wanted process_data to be re-triggered externally without waiting for download_data, you could:
Right-click download_data and drag to process_data → blocking edge
Then hold Ctrl+Shift, right-click external_trigger (another node in your graph), and drag to process_data → non-blocking edge
Now process_data will execute when:
ALL blocking edges are ready (download_data completed), OR
The non-blocking edge triggers (external_trigger is ready)
This allows flexible execution patterns beyond strict sequential order. See Re-Triggering and Dependency Resolution for more details.
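The two conditions above can be expressed as a single readiness check. This is a hypothetical model for illustration only: the function name, the edge tuple layout, and the `done` set are assumptions, not Workforce's internal API.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge representation: (source, target, is_blocking)
Edge = Tuple[str, str, bool]


def is_ready(node: str, edges: Iterable[Edge], done: Set[str]) -> bool:
    """All blocking sources done, OR any non-blocking source done."""
    incoming = [(src, blocking) for src, dst, blocking in edges if dst == node]
    # all() of an empty sequence is True, so a node with no blocking edges
    # counts as ready here; Workforce's actual handling may differ.
    all_blocking_done = all(src in done for src, blocking in incoming if blocking)
    any_soft_fired = any(src in done for src, blocking in incoming if not blocking)
    return all_blocking_done or any_soft_fired


EDGES = [
    ("download_data", "process_data", True),      # blocking (solid line)
    ("external_trigger", "process_data", False),  # non-blocking (dashed line)
]

print(is_ready("process_data", EDGES, done=set()))                 # False
print(is_ready("process_data", EDGES, done={"download_data"}))     # True
print(is_ready("process_data", EDGES, done={"external_trigger"}))  # True
```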
Step 4: Save the Workflow
Save your workflow:
Press Ctrl+S or use File → Exit (which saves automatically)
If this is a new workflow, it will be saved as Workfile in the current directory. To use a different path, specify it when starting:
wf myworkflow.graphml
Step 5: Run the Workflow
Execute the workflow:
Click the “Run” button or press ‘R’
Watch as nodes change color:
Light gray → Not started
Light cyan → Ready to run
Light blue → Currently running
Light green → Completed successfully
Light coral → Failed (if error occurs)
The workflow will execute in order:
First node runs first
Second node runs after first completes
Third node runs last
Step 6: View Logs
Check the output from any node:
Left-click a node to select it
Press ‘S’ to view its log
The log popup shows the combined stdout and stderr from the command execution
Press ‘S’ or Escape to close the popup
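Here “combined” means both streams are interleaved into a single log in the order they were written. A minimal standard-library sketch of capturing output that way, assuming `bash` is on your PATH; it illustrates the idea and does not read Workforce's own log storage.

```python
import subprocess


def run_and_capture(cmd: str) -> str:
    # stderr=subprocess.STDOUT merges the error stream into stdout,
    # so both end up in one string, in the order they were written.
    result = subprocess.run(["bash", "-c", cmd], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.stdout


print(run_and_capture('echo "to stdout"; echo "to stderr" >&2'), end="")
```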
Verify your files were created:
cat data.txt
cat processed.txt
cat report.txt
Working with the CLI
The same workflow can be created and run using the command line.
Creating via CLI
Create a new workflow file:
# Start with the GUI to create graphically
wf
# Or create nodes via CLI (requires existing Workfile or path)
wf edit add-node Workfile "echo 'Downloading...' && sleep 2 && echo 'Data downloaded' > data.txt" --x 100 --y 100
wf edit add-node Workfile "echo 'Processing...' && cat data.txt | wc -l > processed.txt" --x 200 --y 100
wf edit add-node Workfile "echo 'Report: \$(cat processed.txt) lines' > report.txt" --x 300 --y 100
Add dependencies (note: requires node UUIDs, easier via GUI):
# You'll need the actual node UUIDs from the graph
# wf edit add-edge Workfile <source-uuid> <target-uuid>
# It's much easier to create edges in the GUI by dragging
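If you do want to script edges, you first need the node IDs. Because the workflow file is GraphML (an XML format), a short standard-library sketch can list each node's ID next to its command. It assumes the command is stored in a GraphML attribute named `label` (suggested by the Python API's `label=` argument, but not confirmed here); `list_nodes.py` is a hypothetical file name.

```python
import sys
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def node_ids_by_label(path: str) -> dict:
    """Map each node's label (its command) to its node id in a GraphML file.

    Assumes the command is stored in a node attribute named "label".
    Nodes with identical labels overwrite one another in the result.
    """
    root = ET.parse(path).getroot()
    # <key> elements declare attributes; find the id(s) used for "label".
    label_keys = {
        key.get("id")
        for key in root.findall("g:key", NS)
        if key.get("attr.name") == "label" and key.get("for") in ("node", "all")
    }
    mapping = {}
    for node in root.iterfind(".//g:node", NS):
        for data in node.findall("g:data", NS):
            if data.get("key") in label_keys:
                mapping[data.text or ""] = node.get("id")
    return mapping


if __name__ == "__main__":
    # Usage: python list_nodes.py Workfile
    for path in sys.argv[1:]:
        for label, node_id in node_ids_by_label(path).items():
            print(f"{node_id}\t{label}")
```

Run it as `python list_nodes.py Workfile`, then pass the printed IDs to `wf edit add-edge Workfile <source-uuid> <target-uuid>`.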
Running via CLI
Execute the complete workflow:
wf run Workfile
Run specific nodes only:
wf run Workfile --nodes process_data,generate_report
Advanced Tutorial
Running Subsets
Select specific nodes in the GUI:
Left-click to select a node
Shift + Left-click to add more nodes to selection
Press ‘R’ to run only the selected nodes
Only selected nodes (and their dependencies within the selection) execute
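In graph terms, a subset run keeps only the blocking edges whose endpoints are both selected and orders the selected nodes by those edges. A standard-library sketch of that idea, using the tutorial's node names; treating edges from unselected nodes as ignored is an assumption drawn from the wording above, not a statement of Workforce's implementation.

```python
from graphlib import TopologicalSorter


def subset_order(blocking: dict, selected) -> list:
    """Order the selected nodes using only blocking edges between selected nodes.

    blocking maps each node to the set of its blocking predecessors. Edges
    coming from unselected nodes are dropped (an assumption), and unselected
    nodes never appear in the result.
    """
    chosen = set(selected)
    induced = {n: {p for p in blocking.get(n, set()) if p in chosen} for n in chosen}
    return list(TopologicalSorter(induced).static_order())


PIPELINE = {"process_data": {"download_data"}, "generate_report": {"process_data"}}
print(subset_order(PIPELINE, ["generate_report", "process_data"]))
# ['process_data', 'generate_report']
```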
Resume Failed Nodes
If a node fails:
Fix the issue (edit the command by double-clicking the node, or fix external resources)
Select the failed node(s)
Press ‘C’ to clear the status (changes "fail" to "")
Press ‘R’ to run again, which will re-execute the failed nodes
Using Command Wrappers
Example: Docker Wrapper
Run all commands in a Docker container:
wf run Workfile --wrapper "docker run -v \$(pwd):/work -w /work ubuntu bash -c '{}'"
Example: Remote Execution
Execute workflow on a remote server:
wf run Workfile --wrapper 'ssh user@remote-server "{}"'
Example: Tmux Integration
Send commands to tmux panes:
wf run Workfile --wrapper 'tmux send-keys -t mysession "{}" C-m'
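The wrapper examples suggest that `{}` is a placeholder for each node's command. The following hypothetical sketch shows that substitution; verbatim replacement of `{}` is inferred from the examples, and appending the command when there is no `{}` (as in the conda example further down) is an assumption, not documented behaviour.

```python
def apply_wrapper(wrapper: str, command: str) -> str:
    """Hypothetical: build the final shell command from a --wrapper template.

    "{}" is replaced verbatim by the node's command. Appending the command
    when the template has no "{}" is an assumption.
    """
    if "{}" in wrapper:
        return wrapper.replace("{}", command)
    return f"{wrapper} {command}"


cmd = "echo 'hello from node'"
print(apply_wrapper('ssh user@remote-server "{}"', cmd))
print(apply_wrapper("docker run -v $(pwd):/work -w /work ubuntu bash -c '{}'", cmd))
print(apply_wrapper("conda run -n biotools", "fastqc sample.fastq"))
```

If the substitution really is verbatim, as assumed here, the command's quotes must not clash with the wrapper's: the Docker wrapper puts the command inside single quotes, so a command that itself contains single quotes (like the one above) ends the quoted string early. Using double quotes inside such commands avoids this.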
Complex Workflow Example
Let’s create a more realistic bioinformatics pipeline.
Scenario
Process multiple sample files through quality control, alignment, and variant calling.
Workflow Structure
download_samples → quality_control → trim_adapters → align_to_reference
↓
call_variants → merge_results
Creating the Workflow
# Note: These are simplified examples
# In practice, create nodes in GUI or use UUIDs for edges
# Create nodes with commands
wf edit add-node Workfile "wget https://example.com/samples.tar.gz && tar -xzf samples.tar.gz"
wf edit add-node Workfile "mkdir -p qc_reports && fastqc samples/*.fastq -o qc_reports/"
wf edit add-node Workfile "for f in samples/*.fastq; do trim_galore \$f -o trimmed/; done"
# Connect nodes in GUI or use node UUIDs with add-edge
# Edges require source and target node IDs (UUIDs)
Running with Conda
Activate a conda environment for all commands:
wf run Workfile --wrapper "conda run -n biotools"
Parallel Processing
Process multiple samples in parallel using GNU Parallel:
wf run Workfile --wrapper "parallel -j 4" --suffix ":::" --suffix "sample1 sample2 sample3 sample4"
Python API Tutorial
You can also work with workflows programmatically.
Loading and Modifying Workflows
from workforce.edit.graph import (
load_graph,
save_graph,
add_node_to_graph,
add_edge_to_graph,
edit_node_label_in_graph
)
# Load an existing workflow
G = load_graph('tutorial_workflow.graphml')
# Add a new node - returns {'node_id': '<uuid>'}
result = add_node_to_graph(
'tutorial_workflow.graphml',
label='test -f report.txt && echo "Validation passed"',
x=400,
y=100
)
new_node_id = result['node_id']
# Add an edge (requires UUIDs of source and target)
# You'd need to get the node UUID from the graph first
# add_edge_to_graph('tutorial_workflow.graphml', source_uuid, new_node_id)
# Modify a node's command (requires node UUID)
# edit_node_label_in_graph(
# 'tutorial_workflow.graphml',
# node_id,
# 'curl -O https://example.com/data.csv'
# )
# Note: Each function automatically saves the graph
Programmatic Execution
from workforce import utils
# Compute workspace ID from file path
workspace_id = utils.compute_workspace_id('tutorial_workflow.graphml')
# Get workspace URL (auto-discovers or starts server)
workspace_url = utils.get_workspace_url(workspace_id)
print(f"Workspace URL: {workspace_url}")
# To run the workflow, use the CLI:
# wf run tutorial_workflow.graphml
# The run client connects via SocketIO and executes nodes
# when it receives NODE_READY events from the server
Best Practices
Workflow Design
Keep commands atomic: Each node should do one thing well
Use meaningful names: Node names should describe their purpose
Check dependencies: Ensure nodes have proper input/output relationships
Handle errors: Use && chains to fail fast: command1 && command2
Test incrementally: Run subsets to verify each step works
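Why `&&` matters: assuming a node is marked failed when its command exits with a non-zero status (the usual shell convention, not confirmed above), a chain joined with `&&` stops at the first failing step and returns that failure, while `;` keeps going and reports only the last step's status. A quick check with Python's standard library, assuming `bash` is available:

```python
import subprocess


def run(cmd: str) -> subprocess.CompletedProcess:
    # Run a shell snippet with bash and capture its output as text.
    return subprocess.run(["bash", "-c", cmd], capture_output=True, text=True)


chained = run("false && echo 'never printed'")
sequenced = run("false; echo 'printed anyway'")
print(chained.returncode, repr(chained.stdout))      # 1 ''
print(sequenced.returncode, repr(sequenced.stdout))  # 0 'printed anyway\n'
```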
File Management
Use absolute paths or ensure working directory is correct
Create output directories before running: mkdir -p output && ...
Clean up temporary files in final nodes
Use Workfile as the default name for easy discovery
Performance
Parallelize independent nodes: Design workflows with multiple independent branches
Use wrappers for resource management: Docker, HPC schedulers, etc.
Monitor resource usage: Large parallelism may overwhelm the system
Consider subset execution: Test with small datasets first
Debugging
Check logs frequently: Select a node and press ‘S’ in the GUI to view its output
Test commands in isolation: Verify each command works before adding to workflow
Use echo for debugging: Add echo statements to track progress
Resume from failures: Use Shift+R to retry failed nodes after fixes
Next Steps
Now that you’ve completed the tutorial, you can:
Read the Usage guide for comprehensive CLI reference
Explore the Architecture to understand how Workforce works internally
Check the API Reference for programmatic workflow manipulation
Visit the GitHub repository for examples and issues
Happy workflow building!