Configuration & Setup

How toolkits collect configuration from users — API keys, paths, large data files, custom setup logic.

Why configuration matters

Many toolkits need information beyond their bundled code. A few real examples:

  • An astrophysics toolkit needs a path to ~2GB of opacity data files.
  • An LLM-querying toolkit needs an OpenAI API key.
  • A simulation toolkit needs to know how many CPU workers to spawn.
  • A file-handling toolkit needs a designated workspace directory.

SciToolkit gives you a consistent way to collect this configuration from users at install time — so every toolkit looks and feels the same to the scientist installing it, no matter who wrote it.

Two tiers, one system

There are two layers:

Declarative — toolkit.yaml

For simple values: paths, strings, integers, secrets, env var checks. Just list them. SciToolkit prompts the user, validates types, and stores the values. No logic.

Script — setup.py

For everything else: file validation, downloads, multi-step setup, custom detection. Plain Python with a helper called ctx for consistent UI.

Use the simplest tier that works for your toolkit. Most toolkits will only need declarative.

Tier 1: Declarative configuration

Add a config section to your toolkit.yaml:

```yaml
config:
  data_dir:
    type: path
    description: "Where to store outputs"
    default: "~/.my-toolkit/data"

  max_workers:
    type: integer
    description: "Number of parallel workers"
    default: 4

  api_endpoint:
    type: string
    description: "API endpoint URL"
    default: "https://api.example.com"
```

When the user installs your toolkit, SciToolkit prompts them for each value. They can press Enter to accept defaults. Values are saved to a .env file alongside the toolkit.
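
For the example above, the resulting .env might look like the sketch below. The upper-cased key names are an assumption for illustration; SciToolkit's actual naming scheme may differ.

```text
DATA_DIR=~/.my-toolkit/data
MAX_WORKERS=4
API_ENDPOINT=https://api.example.com
```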

Supported types

| Type | Description |
| --- | --- |
| `path` | File or directory path. `~` is expanded. |
| `string` | Plain text value |
| `secret` | Hidden input (for API keys, passwords) |
| `integer` | Whole number |
| `boolean` | `true` / `false` |
| `choice` | One of a fixed list of options |
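
The remaining types follow the same pattern as the earlier example. A hedged sketch using secret, boolean, and choice (the `options` field name is an assumption for illustration, not confirmed SciToolkit syntax):

```yaml
config:
  api_token:
    type: secret
    description: "Service API token (input is hidden)"

  verbose:
    type: boolean
    description: "Enable verbose logging"
    default: false

  backend:
    type: choice
    description: "Compute backend"
    options: [cpu, gpu]
    default: cpu
```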

Environment variables

For values the user must set in their shell themselves (like API keys you don't want to store on disk), declare them under env_vars:

```yaml
env_vars:
  required:
    - OPENAI_API_KEY
  optional:
    - OPENAI_ORG_ID
```

SciToolkit checks that required env vars exist before serving your toolkit. It does not store them — they're read from the user's shell at serve time.
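
Conceptually, the check is just an environment lookup at serve time. A minimal sketch of the idea (not SciToolkit's actual implementation):

```python
import os

REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["OPENAI_ORG_ID"]

def missing_required(env=os.environ):
    """Names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

def ready_to_serve(env=os.environ):
    """Serve-time gate: the toolkit is only served if nothing is missing."""
    return not missing_required(env)
```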

What declarative does not do

  • File existence or content validation
  • Downloading data
  • Multi-step or conditional flows
  • Any custom logic

If you need any of those, use Tier 2.

Tier 2: Setup scripts (setup.py)

Drop a setup.py file at the root of your toolkit and add this to your toolkit.yaml:

```yaml
setup_script: true
```

The file exposes two functions:

```python
# setup.py
from scitoolkit.setup import SetupContext

def setup(ctx: SetupContext) -> bool:
    """
    Interactive setup. Called when the user runs:
      scitoolkit setup my-toolkit

    Return True on success, False on cancel/failure.
    """
    return validate(ctx)


def validate(ctx: SetupContext) -> bool:
    """
    Check whether the toolkit is ready to serve. Called by:
      scitoolkit serve

    Return True if everything's in place, False otherwise.
    """
    return True
```

The ctx object is a toolbox SciToolkit gives you. Using it is what makes every toolkit's setup feel consistent — you don't need to import Rich, manage .env files, or roll your own download progress bars.

The SetupContext API

The ctx object is passed into your setup and validate functions.

Output

```python
ctx.info("Checking dependencies...")    # Blue info line
ctx.warn("This will take a while")      # Yellow warning
ctx.error("Path not found")             # Red error
ctx.hint("Try: scitoolkit setup aster") # Dim hint after an error
ctx.success("Setup complete!")          # Green success
```

Input

```python
# Strings
name = ctx.prompt("Enter name:", default="aster")

# Typed prompts
path = ctx.prompt_path("Data path:", must_exist=True)
port = ctx.prompt_int("Port:", default=8080, min=1, max=65535)
key  = ctx.prompt_secret("API key:")

# Yes/no
proceed = ctx.confirm("Download 2GB?", default=False)

# Menu
choice = ctx.choice("How would you like to proceed?", [
    ("download", "Download automatically"),
    ("path",     "I have the data, let me provide the path"),
    ("cancel",   "Cancel"),
])
```

Reading and writing config

```python
# Read
path = ctx.get_config('data_path')
path = ctx.get_config('data_path', default='~/data')

# Write (auto-saves to .env)
ctx.set_config('data_path', '/data/foo')
```
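
Under the hood, a .env file is plain KEY=value lines. A simplified sketch of the round-trip, illustrative only; the real storage format, key naming, and escaping are SciToolkit's concern:

```python
from pathlib import Path

def set_config(env_file: Path, key: str, value: str) -> None:
    """Upsert KEY=value in a .env-style file (sketch of what ctx.set_config might do)."""
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    name = key.upper()
    lines = [ln for ln in lines if not ln.startswith(name + "=")]
    lines.append(f"{name}={value}")
    env_file.write_text("\n".join(lines) + "\n")

def get_config(env_file: Path, key: str, default=None):
    """Read a value back, or return the default if unset."""
    if not env_file.exists():
        return default
    for ln in env_file.read_text().splitlines():
        name, sep, value = ln.partition("=")
        if sep and name == key.upper():
            return value
    return default
```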

Downloads

```python
ctx.download(
    url="https://data.scitoolkit.org/aster/opacity.tar.gz",
    destination=ctx.data_dir / 'opacity',
    description="Opacity data",
    size_hint="2.3GB",
    extract=True,        # Auto-extract .tar.gz, .zip
    sha256="abc123...",  # Optional checksum
)
```

Progress bars, extraction, and checksum verification are handled for you.
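
The checksum step, for instance, is ordinary incremental hashing. A sketch of the general technique (not SciToolkit's source), which streams the file so a multi-gigabyte download never has to fit in memory:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, expected: str) -> bool:
    """Compare against the checksum declared in ctx.download(sha256=...)."""
    return sha256_of(path) == expected
```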

Useful paths

| Path | Description |
| --- | --- |
| `ctx.toolkit_path` | Where the toolkit is installed |
| `ctx.data_dir` | Per-toolkit data directory (auto-created) |
| `ctx.cache_dir` | Per-toolkit cache directory |

Raw Python

SetupContext is not a walled garden. You can drop to plain Python whenever you want — call subprocess, query a database, sniff environment, anything. Use ctx when you want consistent UI; ignore it when you don't.
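
For example, detecting an external binary is plain shutil/subprocess work, with ctx used only for reporting. A sketch, where detect_tool is a hypothetical helper, not part of the SetupContext API:

```python
import shutil
import subprocess

def detect_tool(ctx, tool: str) -> bool:
    """Plain Python: find a binary on PATH, report the result through ctx."""
    exe = shutil.which(tool)
    if exe is None:
        ctx.error(f"{tool} not found on PATH")
        return False
    out = subprocess.run([exe, "--version"], capture_output=True, text=True)
    first_line = out.stdout.splitlines()[0] if out.stdout else exe
    ctx.info(f"Found {tool}: {first_line}")
    return True
```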

A complete example

Here's how a toolkit that needs a large data file might look, downloading it by default but letting the user point at an existing copy instead:

```python
# setup.py
from pathlib import Path
from scitoolkit.setup import SetupContext

DATA_URL = "https://data.scitoolkit.org/my-toolkit/dataset_v1.tar.gz"


def setup(ctx: SetupContext) -> bool:
    existing = ctx.get_config('dataset_path')
    if existing and Path(existing).expanduser().exists():
        ctx.info(f"Dataset already configured: {existing}")
        return True

    choice = ctx.choice(
        "Dataset is required. How would you like to proceed?",
        [
            ("download", "Download automatically (~500MB)"),
            ("path",     "I have the data, provide the path"),
            ("cancel",   "Cancel"),
        ]
    )

    if choice == "download":
        dest = ctx.data_dir / 'dataset'
        ctx.download(
            url=DATA_URL, destination=dest,
            description="Dataset", size_hint="500MB", extract=True,
        )
        ctx.set_config('dataset_path', str(dest))
    elif choice == "path":
        path = ctx.prompt_path("Path to dataset:", must_exist=True)
        ctx.set_config('dataset_path', str(path))
    else:
        return False

    return validate(ctx)


def validate(ctx: SetupContext) -> bool:
    path = ctx.get_config('dataset_path')
    if not path:
        ctx.error("dataset_path not configured")
        ctx.hint("Run: scitoolkit setup my-toolkit")
        return False
    if not Path(path).expanduser().exists():
        ctx.error(f"Dataset path missing: {path}")
        return False
    return True
```

More patterns — see the Recipes page.

User experience

Here's what users see when your toolkit needs setup:

On install

```text
$ scitoolkit install aster
📥 Installing toolkit: aster
✓ Downloaded and extracted
✓ Environment created (venv, Python 3.12)
✓ Dependencies installed

⚠️  ASTER requires additional setup

Required configuration:
  • opacity_path  (path to opacity data)
  • max_workers   (default: 4)

Run: scitoolkit setup aster
```

On setup

```text
$ scitoolkit setup aster
ASTER Setup
───────────

1. opacity_path — Path to opacity data files
   [no current value]

How would you like to proceed?
  [1] Download automatically (~2.3GB)
  [2] I have the data, provide the path
  [3] Skip

Choice: 1

Downloading opacity data...
[████████████████████] 2.3 GB / 2.3 GB

✓ Found 47 opacity files

✅ ASTER setup complete!
```

On serve

If a toolkit isn't configured, SciToolkit skips it rather than crashing. The user sees exactly what's wrong:

```text
$ scitoolkit serve
Checking toolkits...
  ✗ aster — Setup incomplete
      └─ opacity_path does not contain .h5 files
      └─ Run: scitoolkit setup aster
  ✓ simple-api — Ready (3 tools)

Starting MCP server with 1 toolkit (3 tools)...
```

Anti-patterns

A few things to avoid — these break consistency and the validator will warn about them:

  • Don't use input() in setup.py. Use ctx.prompt(). Otherwise your prompts won't look like every other toolkit.
  • Don't write to .env directly. Use ctx.set_config(). Direct writes can corrupt the file.
  • Don't skip validate(). If your toolkit needs config, validate will save your users from cryptic runtime errors.
  • Don't crash on setup failure. Catch exceptions and report nicely with ctx.error(). Network errors and missing files should be surfaced as actionable messages, not tracebacks.

Next

Configuration is half the picture. The other half is stateful tools — how config values get injected into your tool functions without the AI agent ever seeing them.