Configuration & Setup
How toolkits collect configuration from users — API keys, paths, large data files, custom setup logic.
Why configuration matters
Many toolkits need information beyond their bundled code. A few real examples:
- An astrophysics toolkit needs a path to ~2GB of opacity data files.
- An LLM-querying toolkit needs an OpenAI API key.
- A simulation toolkit needs to know how many CPU workers to spawn.
- A file-handling toolkit needs a designated workspace directory.
SciToolkit gives you a consistent way to collect this configuration from users at install time — so every toolkit looks and feels the same to the scientist installing it, no matter who wrote it.
Two tiers, one system
There are two layers:
Declarative — toolkit.yaml
For simple values: paths, strings, integers, secrets, env var checks. Just list them. SciToolkit prompts the user, validates types, and stores the values. No logic.
Script — setup.py
For everything else: file validation, downloads, multi-step setup, custom detection. Plain Python with a helper called ctx for consistent UI.
Use the simplest tier that works for your toolkit. Most toolkits will only need declarative.
Tier 1: Declarative configuration
Add a config section to your toolkit.yaml:
config:
  data_dir:
    type: path
    description: "Where to store outputs"
    default: "~/.my-toolkit/data"
  max_workers:
    type: integer
    description: "Number of parallel workers"
    default: 4
  api_endpoint:
    type: string
    description: "API endpoint URL"
    default: "https://api.example.com"
When the user installs your toolkit, SciToolkit prompts them for each value. They can press Enter to accept defaults. Values are saved to a .env file alongside the toolkit.
Supported types
| Type | Description |
|---|---|
| path | File or directory path. ~ is expanded. |
| string | Plain text value |
| secret | Hidden input (for API keys, passwords) |
| integer | Whole number |
| boolean | true / false |
| choice | One of a fixed list of options |
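To make the table concrete, here is a minimal sketch of how raw prompt input could be turned into typed values. It is not SciToolkit's implementation: the helper name `coerce_value`, its `choices` parameter, and the accepted boolean spellings are assumptions made for illustration only.

```python
from pathlib import Path


def coerce_value(type_name, raw, choices=None):
    """Convert raw prompt text into a typed value (hypothetical helper)."""
    if type_name == "path":
        # The table says ~ is expanded for path values.
        return Path(raw).expanduser()
    if type_name in ("string", "secret"):
        # A secret differs only in how it is entered (hidden input).
        return raw
    if type_name == "integer":
        return int(raw)  # raises ValueError for input such as "four"
    if type_name == "boolean":
        lowered = raw.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"expected true or false, got {raw!r}")
    if type_name == "choice":
        if choices is None or raw not in choices:
            raise ValueError(f"expected one of {choices}, got {raw!r}")
        return raw
    raise ValueError(f"unsupported type: {type_name}")
```

Rejecting bad input with a `ValueError` at prompt time is the design point: the user can be asked again immediately, before a wrong value reaches the stored configuration.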
Environment variables
For values the user must set in their shell themselves (like API keys you don't want to store on disk), declare them under env_vars:
env_vars:
  required:
    - OPENAI_API_KEY
  optional:
    - OPENAI_ORG_ID
SciToolkit checks that required env vars exist before serving your toolkit. It does not store them; they are read from the user's shell at serve time.
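The check described here amounts to a lookup against the process environment. A minimal sketch of that idea follows. `missing_env_vars` is a hypothetical helper, not a SciToolkit function, and treating an empty value as missing is an assumption.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real shell environment.
missing = missing_env_vars(
    ["OPENAI_API_KEY"],
    environ={"OPENAI_ORG_ID": "org-example"},
)
print(missing)
```

Passing the environment in as a parameter keeps the check easy to test without touching the real shell.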
What declarative does not do
- File existence or content validation
- Downloading data
- Multi-step or conditional flows
- Any custom logic
If you need any of those, use Tier 2.
Tier 2: Setup scripts (setup.py)
Drop a setup.py file at the root of your toolkit and add this to your toolkit.yaml:
setup_script: true
The file exposes two functions:
# setup.py
from scitoolkit.setup import SetupContext


def setup(ctx: SetupContext) -> bool:
    """
    Interactive setup. Called when the user runs:
        scitoolkit setup my-toolkit
    Return True on success, False on cancel/failure.
    """
    return validate(ctx)


def validate(ctx: SetupContext) -> bool:
    """
    Check whether the toolkit is ready to serve. Called by:
        scitoolkit serve
    Return True if everything's in place, False otherwise.
    """
    return True
The ctx object is a toolbox SciToolkit gives you. Using it is what makes every toolkit's setup feel consistent: you don't need to import Rich, manage .env files, or roll your own download progress bars.
The SetupContext API
The ctx object is passed into your setup and validate functions.
Output
ctx.info("Checking dependencies...") # Blue info line
ctx.warn("This will take a while") # Yellow warning
ctx.error("Path not found") # Red error
ctx.hint("Try: scitoolkit setup aster") # Dim hint after an error
ctx.success("Setup complete!") # Green success
Input
# Strings
name = ctx.prompt("Enter name:", default="aster")
# Typed prompts
path = ctx.prompt_path("Data path:", must_exist=True)
port = ctx.prompt_int("Port:", default=8080, min=1, max=65535)
key = ctx.prompt_secret("API key:")
# Yes/no
proceed = ctx.confirm("Download 2GB?", default=False)
# Menu
choice = ctx.choice("How would you like to proceed?", [
    ("download", "Download automatically"),
    ("path", "I have the data, let me provide the path"),
    ("cancel", "Cancel"),
])
Reading and writing config
# Read
path = ctx.get_config('data_path')
path = ctx.get_config('data_path', default='~/data')
# Write (auto-saves to .env)
ctx.set_config('data_path', '/data/foo')
Downloads
ctx.download(
    url="https://data.scitoolkit.org/aster/opacity.tar.gz",
    destination=ctx.data_dir / 'opacity',
    description="Opacity data",
    size_hint="2.3GB",
    extract=True,        # Auto-extract .tar.gz, .zip
    sha256="abc123...",  # Optional checksum
)
Progress bars, extraction, and checksum verification are handled for you.
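To fill in the sha256 argument, a toolkit author needs the digest of the published archive. One way to compute it with the standard library is sketched below. `sha256_of_file` is a hypothetical helper, and the page does not state the digest format, so the lowercase hex string it returns is an assumption.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it once against the archive you publish and paste the result into your setup.py, so every later download can be verified against the same value.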
Useful paths
| Path | Description |
|---|---|
| ctx.toolkit_path | Where the toolkit is installed |
| ctx.data_dir | Per-toolkit data directory (auto-created) |
| ctx.cache_dir | Per-toolkit cache directory |
Raw Python
SetupContext is not a walled garden. You can drop to plain Python whenever you want: call subprocess, query a database, inspect the environment, anything. Use ctx when you want consistent UI; ignore it when you don't.
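As one illustration of plain Python inside a setup script, the sketch below detects the machine's CPU count to suggest a worker default. `suggested_workers` and its `reserve` parameter are hypothetical names, not part of SciToolkit.

```python
import os


def suggested_workers(reserve=1):
    """Suggest a worker count: detected CPUs minus a reserve, never below 1."""
    detected = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, detected - reserve)
```

Inside setup(), the result could feed a prompt default, for example `ctx.prompt_int("Workers:", default=suggested_workers(), min=1)`, so the detection logic is custom while the prompt still looks like every other toolkit's.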
A complete example
Here's how a toolkit that needs a large data file might look. It offers to download the file, or lets the user point to a copy they already have:
# setup.py
from pathlib import Path

from scitoolkit.setup import SetupContext

DATA_URL = "https://data.scitoolkit.org/my-toolkit/dataset_v1.tar.gz"


def setup(ctx: SetupContext) -> bool:
    existing = ctx.get_config('dataset_path')
    if existing and Path(existing).expanduser().exists():
        ctx.info(f"Dataset already configured: {existing}")
        return True

    choice = ctx.choice(
        "Dataset is required. How would you like to proceed?",
        [
            ("download", "Download automatically (~500MB)"),
            ("path", "I have the data, provide the path"),
            ("cancel", "Cancel"),
        ]
    )
    if choice == "download":
        dest = ctx.data_dir / 'dataset'
        ctx.download(
            url=DATA_URL, destination=dest,
            description="Dataset", size_hint="500MB", extract=True,
        )
        ctx.set_config('dataset_path', str(dest))
    elif choice == "path":
        path = ctx.prompt_path("Path to dataset:", must_exist=True)
        ctx.set_config('dataset_path', str(path))
    else:
        return False
    return validate(ctx)


def validate(ctx: SetupContext) -> bool:
    path = ctx.get_config('dataset_path')
    if not path:
        ctx.error("dataset_path not configured")
        ctx.hint("Run: scitoolkit setup my-toolkit")
        return False
    if not Path(path).expanduser().exists():
        ctx.error(f"Dataset path missing: {path}")
        return False
    return True
More patterns: see the Recipes page.
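validate() can also inspect what a path contains, not only whether it exists. The sketch below uses the opacity_path key and the ".h5 files" message that appear in the serve output under User experience. `StubContext` is a hypothetical stand-in for SetupContext, included only so the function can be run without SciToolkit installed; it implements just the three calls used here.

```python
from pathlib import Path


class StubContext:
    """Hypothetical stand-in for SetupContext, for exercising validate()."""

    def __init__(self, config):
        self.config = dict(config)
        self.messages = []  # records (level, text) pairs instead of printing

    def get_config(self, key, default=None):
        return self.config.get(key, default)

    def error(self, text):
        self.messages.append(("error", text))

    def hint(self, text):
        self.messages.append(("hint", text))


def validate(ctx) -> bool:
    """Check that opacity_path is a directory holding at least one .h5 file."""
    raw = ctx.get_config("opacity_path")
    if not raw:
        ctx.error("opacity_path not configured")
        ctx.hint("Run: scitoolkit setup aster")
        return False
    folder = Path(raw).expanduser()
    if not folder.is_dir() or not any(folder.glob("*.h5")):
        ctx.error("opacity_path does not contain .h5 files")
        ctx.hint("Run: scitoolkit setup aster")
        return False
    return True
```

Because the stub records messages rather than printing them, a unit test can assert on exactly what the user would have been told.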
User experience
Here's what users see when your toolkit needs setup:
On install
$ scitoolkit install aster
📥 Installing toolkit: aster
✓ Downloaded and extracted
✓ Environment created (venv, Python 3.12)
✓ Dependencies installed
⚠️ ASTER requires additional setup
Required configuration:
• opacity_path (path to opacity data)
• max_workers (default: 4)
Run: scitoolkit setup aster
On setup
$ scitoolkit setup aster
ASTER Setup
───────────
1. opacity_path — Path to opacity data files
[no current value]
How would you like to proceed?
[1] Download automatically (~2.3GB)
[2] I have the data, provide the path
[3] Skip
Choice: 1
Downloading opacity data...
[████████████████████] 2.3 GB / 2.3 GB
✓ Found 47 opacity files
✅ ASTER setup complete!
On serve
If a toolkit isn't configured, SciToolkit skips it rather than crashing. The user sees exactly what's wrong:
$ scitoolkit serve
Checking toolkits...
✗ aster — Setup incomplete
└─ opacity_path does not contain .h5 files
└─ Run: scitoolkit setup aster
✓ simple-api — Ready (3 tools)
Starting MCP server with 1 toolkit (3 tools)...
Anti-patterns
A few things to avoid — these break consistency and the validator will warn about them:
- Don't use input() in setup.py. Use ctx.prompt(). Otherwise your prompts won't look like every other toolkit.
- Don't write to .env directly. Use ctx.set_config(). Direct writes can corrupt the file.
- Don't skip validate(). If your toolkit needs config, validate will save your users from cryptic runtime errors.
- Don't crash on setup failure. Catch exceptions and report nicely with ctx.error(). Network errors and missing files should be surfaced as actionable messages, not tracebacks.
Next
Configuration is half the picture. The other half is stateful tools — how config values get injected into your tool functions without the AI agent ever seeing them.