Auto-Typing Engine#
SWC-Studio’s auto-labeling is a single ML engine. It runs everywhere:
the CLI, both Auto Label Editing GUI panels, and the Python API all call
into the same code. There is no backend switch; the engine is the engine.
What the engine does#
Four stages run in order on every SWC:
| Stage | Purpose | Implementation |
|---|---|---|
| Stage 1 | Cell-type detection (pyramidal vs interneuron) | sklearn ensemble over 49 whole-cell features. A soft handoff runs Stages 2 and 3 for both cell types when confidence is below threshold and picks the higher-confidence outcome. |
| Stage 2 | Per-subtree classification (axon / basal / apical) | sklearn ensemble. Labels are propagated to all branches in the same primary subtree, with no mid-track type switches. |
| Stage 2b | Apical-vs-basal re-decision on pyramidal dendrites | GraphSAGE GNN over the branch graph. |
| Stage 3 | Topology refinement | Smooths short islands, enforces hard constraints (one primary axon, one primary apical) at the soma boundary. |
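The Stage 1 soft handoff in the table can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `label_as` callback stands in for Stages 2 and 3, and the 0.8 threshold is a made-up value, not the engine's actual setting.

```python
from typing import Callable, Dict, Tuple

CELL_TYPES = ("pyramidal", "interneuron")

# Hypothetical callback: runs the downstream stages under an assumed cell
# type and returns (labels, confidence of the resulting labeling).
LabelFn = Callable[[str], Tuple[list, float]]


def soft_handoff(
    stage1_probs: Dict[str, float],
    label_as: LabelFn,
    threshold: float = 0.8,  # assumed value, for illustration only
) -> Tuple[str, list]:
    """Pick a cell type, comparing both outcomes when Stage 1 is unsure."""
    best_type = max(stage1_probs, key=stage1_probs.get)
    if stage1_probs[best_type] >= threshold:
        # Confident: run the downstream stages once.
        labels, _ = label_as(best_type)
        return best_type, labels

    # Not confident: label the cell under both hypotheses and keep the
    # labeling that reports the higher confidence.
    outcomes = {ct: label_as(ct) for ct in CELL_TYPES}
    winner = max(outcomes, key=lambda ct: outcomes[ct][1])
    return winner, outcomes[winner][0]
```

The point of the design is that a borderline Stage 1 call does not lock in a cell type; the downstream labeling itself decides.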
All four stages are required. The torch / torch_geometric runtime comes along through pip in every install path. The trained model files reach the engine differently depending on how it was installed:
| Install method | How the engine finds the model files |
|---|---|
| Bundled desktop app ( | bundled inside the app under |
| | downloaded from GitHub Releases on first auto-label call (~21 MB, one-time, cached locally) |
| Source install ( | bundled in the working tree under |
If any stage’s model file is missing — or torch / torch_geometric fails to import — the engine refuses to run and surfaces a clear search-path diagnostic instead of silently degrading.
The engine always emits soma + axon + basal labels and detects apical automatically — no class-selection flags. Apical detection requires both a learned per-subtree apical score and a minimum root radius; files without an apical subtree get 3-class output.
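The two-condition apical gate can be sketched as follows. The `Subtree` record, both threshold values, and the tie-breaking rule are hypothetical; the engine's real cutoffs are not stated here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subtree:
    name: str
    apical_score: float  # learned per-subtree score, assumed to lie in [0, 1]
    root_radius: float   # radius of the subtree's first node, in SWC units


def pick_apical(
    subtrees: List[Subtree],
    min_score: float = 0.5,   # hypothetical threshold
    min_radius: float = 0.5,  # hypothetical threshold
) -> Optional[Subtree]:
    """Return the single apical subtree, or None for 3-class output."""
    # A candidate must pass BOTH gates: learned score and root radius.
    candidates = [
        s for s in subtrees
        if s.apical_score >= min_score and s.root_radius >= min_radius
    ]
    if not candidates:
        return None  # soma + axon + basal only
    # At most one primary apical: keep the highest-scoring candidate.
    return max(candidates, key=lambda s: s.apical_score)
```

A high score on a thin subtree is rejected, which is why some files legitimately come back with only three classes.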
Setup#
For the bundled desktop app and the source install, every dependency
and every model file is in place after install — no separate
download step needed. For pip install swcstudio from PyPI, models
download from GitHub Releases on the first auto-label call (~21 MB,
one-time, cached) — Python dependencies still come along through pip
itself.
The bundled defaults live in swcstudio/data/models/ (source install)
or Contents/Resources/models/ (bundled desktop app):
| Filename | Stage | Bundled? |
|---|---|---|
| | Stage 1 | yes (~15 MB) |
| | Stage 2 | yes (~45 MB) |
| | Stage 2b | yes (~0.3 MB) |
To verify the engine is ready on your machine:
swcstudio models status
You’ll see a search-path diagnostic and a JSON summary indicating which model files were resolved and whether torch is available for the GNN.
Model resolution order#
When the engine looks up a model file, it checks paths in this order and uses the first match:
1. Explicit override (--model-dir on the CLI, the GUI’s Model dir field, or the model_dir argument in Python)
2. The SWCSTUDIO_MODEL_DIR environment variable
3. The user data directory:
   - Windows: %APPDATA%\swcstudio\models
   - macOS: ~/Library/Application Support/swcstudio/models
   - Linux: ~/.local/share/swcstudio/models
4. The bundled swcstudio/data/models/ directory inside the installed package
Because first hit wins, dropping custom-trained models into the user data directory makes them the default with no extra flags. The bundled models are a fallback, not a lock-in.
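The lookup order above amounts to a first-match search over a short list of directories. The sketch below is an illustration under that reading, not the engine's code: the function names and the `bundled_dir` parameter are hypothetical, and the fallback used when `APPDATA` is unset is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import List, Optional


def user_model_dir() -> Path:
    """User data directory for models, following the platform list above."""
    if sys.platform.startswith("win"):
        # Falling back to the home directory is an assumption of this sketch.
        base = Path(os.environ.get("APPDATA", str(Path.home())))
        return base / "swcstudio" / "models"
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "swcstudio" / "models"
    return Path.home() / ".local" / "share" / "swcstudio" / "models"


def candidate_dirs(model_dir: Optional[str], bundled_dir: Path) -> List[Path]:
    """Directories to search, highest priority first."""
    dirs: List[Path] = []
    if model_dir:                      # 1. explicit override
        dirs.append(Path(model_dir))
    env_dir = os.environ.get("SWCSTUDIO_MODEL_DIR")
    if env_dir:                        # 2. environment variable
        dirs.append(Path(env_dir))
    dirs.append(user_model_dir())      # 3. user data directory
    dirs.append(bundled_dir)           # 4. bundled fallback
    return dirs


def resolve(filename: str, model_dir: Optional[str], bundled_dir: Path) -> Path:
    """Return the first existing match; otherwise raise with the search path."""
    searched = candidate_dirs(model_dir, bundled_dir)
    for directory in searched:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{filename} not found; searched: " + ", ".join(map(str, searched))
    )
```

Because each file is resolved independently in this sketch, a directory that holds only one custom file would mix custom and bundled models; whether the real engine behaves that way is not stated here, so `swcstudio models status` remains the authoritative check.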
Running auto-labeling#
CLI — single file#
swcstudio auto-label cell.swc
To use a custom model directory just for this run:
swcstudio auto-label cell.swc --model-dir /path/to/my-models
CLI — folder#
swcstudio auto-typing ./folder
swcstudio auto-typing ./folder --model-dir /path/to/my-models
The folder command prints a short engine summary before processing, then writes one output SWC per input plus a per-folder report.
GUI#
Both Auto Label Editing panels (Batch Processing → Auto Label Editing, Validation → Auto Label Editing) show a single Run button. The Model dir picker next to it is optional — leave it blank to use the bundled / user-data defaults. A small green “models OK” / red “models missing” indicator next to the field tells you in real time whether the engine can run with the chosen settings.
Python#
from swcstudio.core.auto_typing import (
    BatchOptions, run_file, run_batch, is_available, backend_status,
)
ok, reason = is_available()
print(backend_status())  # diagnostic dict
opts = BatchOptions(soma=True, axon=True, basal=True, apic=False, rad=False, zip_output=False)
res = run_file("cell.swc", opts)   # single file
res = run_batch("./folder", opts)  # folder
If the engine cannot find the Stage 1 / Stage 2 pickles it raises
FileNotFoundError with the search-path diagnostic — the GUI surfaces
the same message so you don’t need to read the traceback.
Training your own models#
The engine is fully retrainable on user data; you do not have to live with the bundled defaults.
Required dataset layout#
my_dataset/
├── pyramidal/
│ ├── cell_001.swc
│ ├── cell_002.swc
│ └── ...
└── interneuron/
├── cell_a.swc
└── ...
Subfolder names are the cell-type labels. Filenames don’t matter. The
SWC type column (1=soma, 2=axon, 3=basal, 4=apical) is the per-node
ground truth — make sure the labels in your training files are
correct.
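Since the type column is the ground truth, it can be worth scanning a dataset before training. The sketch below walks the layout shown above and counts node types per cell-type folder. The function name is hypothetical, and treating any code outside 1 to 4 as an error is an assumption of this sketch, not a documented rule of the trainer.

```python
from collections import Counter
from pathlib import Path
from typing import Dict

VALID_TYPES = {1, 2, 3, 4}  # soma, axon, basal, apical


def summarize_dataset(root: str) -> Dict[str, Counter]:
    """Count node types per cell-type subfolder; raise on unexpected codes."""
    summary: Dict[str, Counter] = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts: Counter = Counter()
        for swc in sorted(class_dir.glob("*.swc")):
            for line in swc.read_text().splitlines():
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and SWC header comments
                fields = line.split()
                node_type = int(fields[1])  # second column of an SWC row
                if node_type not in VALID_TYPES:
                    raise ValueError(f"{swc}: unexpected type code {node_type}")
                counts[node_type] += 1
        summary[class_dir.name] = counts
    return summary
```

A pyramidal folder whose counts contain no type 4 nodes, for example, is a hint that apical dendrites were never labeled in those files.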
One-command training#
swcstudio train auto-typing --data-dir my_dataset --output-dir my_models
Training writes three files into my_models/:
- cell_type_classifier.pkl (Stage 1)
- branch_classifier.pkl (Stage 2)
- gnn_apical_basal.pt (Stage 2b)
Tunable flags:
| Flag | Default | Meaning |
|---|---|---|
| | (off) | skip Stage 2b GNN training (refresh Stages 1+2 only; the existing GNN checkpoint must already be in |
| | 42 | random seed for splits and models |
| | 128 | GraphSAGE hidden dim |
| | 3 | GraphSAGE depth |
| | 0.0 | dropout |
| | 200 | max epochs per fold |
| | 25 | early-stopping patience |
Training runs Stage 1 (fast — seconds), then Stage 2 (minutes per ~1000 cells), then the Stage 2b GNN (a few minutes on CPU; faster with CUDA). Memory and time scale with dataset size and morphology complexity.
Using your custom models#
There are three ways to swap your trained models in for the bundled defaults; pick whichever fits your workflow.
Make them the default (recommended for everyday use):
Drop them into the user data directory. Because the resolver checks that directory before the bundled defaults, every CLI / GUI / Python call uses your models from then on, with no flags.
# Windows
$dst = "$env:APPDATA\swcstudio\models"
New-Item -ItemType Directory -Force -Path $dst | Out-Null
Copy-Item my_models\*.pkl $dst\
Copy-Item my_models\*.pt $dst\
# macOS
mkdir -p ~/Library/Application\ Support/swcstudio/models
cp my_models/*.pkl my_models/*.pt ~/Library/Application\ Support/swcstudio/models/
# Linux
mkdir -p ~/.local/share/swcstudio/models
cp my_models/*.pkl my_models/*.pt ~/.local/share/swcstudio/models/
Set per-shell:
# macOS / Linux
export SWCSTUDIO_MODEL_DIR=/path/to/my_models
swcstudio auto-label cell.swc
# Windows (PowerShell)
$env:SWCSTUDIO_MODEL_DIR = "C:\path\to\my_models"
swcstudio auto-label cell.swc
One-off override:
swcstudio auto-label cell.swc --model-dir /path/to/my_models
The GUI panels’ Model dir picker behaves the same as --model-dir
for that session.
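The copy into the user data directory (the first option above) can also be scripted in Python. This is a minimal sketch: `install_models` is a hypothetical helper, not part of swcstudio, and refusing to copy unless all three files are present is a choice made here to avoid ending up with a mixed set of models.

```python
import shutil
from pathlib import Path
from typing import List

# The three files that training writes into the output directory.
MODEL_FILES = (
    "cell_type_classifier.pkl",  # Stage 1
    "branch_classifier.pkl",     # Stage 2
    "gnn_apical_basal.pt",       # Stage 2b
)


def install_models(src: str, dst: str) -> List[Path]:
    """Copy all three model files from src to dst, or copy nothing."""
    src_dir, dst_dir = Path(src), Path(dst)
    missing = [name for name in MODEL_FILES if not (src_dir / name).is_file()]
    if missing:
        # Refuse a partial install before touching the destination.
        raise FileNotFoundError(f"missing in {src_dir}: {', '.join(missing)}")
    dst_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(src_dir / name, dst_dir / name)) for name in MODEL_FILES]
```

Pass the user data directory for your platform as `dst`, then run swcstudio models status to confirm the resolved paths.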
Verifying which models are in use#
swcstudio models status
swcstudio models status --model-dir /path/to/my_models
The output shows the resolved path for each of the three model files, so you can confirm whether the engine is hitting your custom models or the bundled fallbacks.
Troubleshooting#
“Auto-typing is missing required model files”#
This error means the resolver couldn’t find one of the three required
model files: cell_type_classifier.pkl (Stage 1),
branch_classifier.pkl (Stage 2), or gnn_apical_basal.pt
(Stage 2b). Run swcstudio models status to see the full search
path. The most common cause is a typo or a non-existent path passed to
--model-dir or SWCSTUDIO_MODEL_DIR.
“Auto-typing requires torch and torch_geometric”#
torch and torch_geometric are required dependencies of the package. Seeing this error means your install is broken — most often a venv was created with a different Python version than the one currently active. Reinstall:
pip install -e .
If that still fails, recreate the venv from scratch (see Getting Started).
Pickle deserialization errors#
Stage 1 and Stage 2 pickles are sensitive to the sklearn version they
were trained on. The bundled pickles are pinned to scikit-learn>=1.5,<1.8
in pyproject.toml. If you upgrade sklearn outside that range and see
deserialization errors, either re-pin sklearn or retrain your models
under the new version.
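A quick way to tell whether a deserialization error is a version problem is to compare the installed sklearn version against the pinned range. The helper below is a sketch with hypothetical names; it compares only the major and minor numbers and ignores pre-release tags, so it is coarser than pip's own resolution.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.6.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_pinned_range(version: str) -> bool:
    """True when the version falls within scikit-learn>=1.5,<1.8."""
    return (1, 5) <= parse_major_minor(version) < (1, 8)
```

With sklearn installed, calling `in_pinned_range(sklearn.__version__)` after `import sklearn` tells you whether the bundled pickles are expected to load.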
Bundled models are out of date#
To replace the bundled models with custom-trained ones, drop your new files into the location that wins for your install:
| Install method | Where to put your custom models |
|---|---|
| Source install ( | |
| | the user model dir (macOS: |
| Bundled desktop app | the user model dir above (overrides the bundled copy) |
Or pass --model-dir /path/to/your/models to swcstudio auto-label
for one-off use without modifying any directory permanently.
Note: the pip wheel intentionally does not ship model files
(see pyproject.toml’s [tool.setuptools] block). Model files are
distributed as a separate swcstudio-models-vX.Y.Z.zip GitHub Release
asset and downloaded on first use.
Reference#
| Public symbol | Lives in |
|---|---|
| | |
| | |
| | |
| | |
| Pipeline internals ( | |
CLI reference: see CLI Reference for the full
flag list on swcstudio auto-label, swcstudio auto-typing, and
swcstudio train auto-typing.