Developing Custom Boefjes for OpenKAT: A Developer Guide

One of OpenKAT’s greatest strengths is its modular architecture. Boefjes — the scanning plugins that collect data — can be extended with custom implementations for your specific needs. Being actively involved in OpenKAT development and maintenance, we build boefjes both for the community and for our clients. Here’s how the system works and how you can build your own.

Understanding the Boefje Architecture

OpenKAT’s scanning pipeline consists of three components that work together:

  • Boefjes — Collect raw data by calling external tools, APIs or running custom scripts
  • Whiskers (Normalizers) — Parse the raw output and convert it into structured objects in the Octopoes data model
  • Bits (Business Rules) — Analyze the structured objects and generate findings based on security policies

When you build a “custom boefje”, you typically create both a boefje (data collection) and a whisker (normalization). The bits layer usually uses existing rules unless you have custom compliance requirements.
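
To make the data flow concrete, here is a toy sketch of the three stages as plain Python functions. It does not use any OpenKAT APIs: the function names, the `boefje/toy-scanner` MIME type and the port rule are all made up for illustration, and a real scan is replaced by a canned response.

```python
import json


def boefje(hostname: str) -> tuple[set[str], bytes]:
    """Stage 1: collect raw data (a canned response stands in for a real scan)."""
    raw = json.dumps({"hostname": hostname, "open_ports": [22, 8080]}).encode()
    return {"boefje/toy-scanner"}, raw


def whisker(raw: bytes) -> list[dict]:
    """Stage 2: parse raw bytes into structured objects."""
    data = json.loads(raw)
    return [
        {"type": "Port", "hostname": data["hostname"], "port": port}
        for port in data["open_ports"]
    ]


def bit(objects: list[dict]) -> list[str]:
    """Stage 3: apply a rule to the structured objects and emit findings."""
    allowed_ports = {22, 443}  # hypothetical policy
    return [
        f"Unexpected open port {obj['port']} on {obj['hostname']}"
        for obj in objects
        if obj["type"] == "Port" and obj["port"] not in allowed_ports
    ]


mime_types, raw = boefje("example.com")
findings = bit(whisker(raw))
```

The point of the split is that each stage only depends on the output of the previous one, which is why a custom boefje usually needs a matching whisker but can often reuse existing bits.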

Anatomy of a Boefje

Every boefje lives in the boefjes/boefjes/plugins/ directory and consists of:

kat_my_custom_scanner/
├── __init__.py
├── boefje.json        # Metadata: name, description, input/output types
├── main.py            # The actual scanning logic
├── normalizer.py      # Whisker: parses raw output into OOIs
└── requirements.txt   # Python dependencies (if any)

boefje.json — The manifest

This file tells the katalogus what your boefje does, what input it expects and what output it produces:

{
  "id": "kat_my_custom_scanner",
  "name": "My Custom Scanner",
  "description": "Checks for specific configuration issues",
  "consumes": ["Hostname"],
  "produces": ["boefje/kat_my_custom_scanner"],
  "scan_level": 1,
  "enabled": true
}

Key fields:

  • consumes — The OOI (Object of Interest) types this boefje needs as input. Common types: Hostname, IPAddressV4, URL, Network
  • produces — The MIME types of the raw output. The normalizer will look for these.
  • scan_level — Minimum clearance level needed (0–4). Use 0–1 for passive checks, 2+ for active scanning.
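
A quick sanity check on the manifest can catch typos before the katalogus ever sees the file. The sketch below is a standalone helper, not part of OpenKAT; the `check_manifest` name and the list of required fields are assumptions based on the example manifest above.

```python
import json

# Assumed to be required, based on the example manifest; adjust to your OpenKAT version.
REQUIRED_FIELDS = {"id", "name", "consumes", "produces", "scan_level"}


def check_manifest(text: str) -> list[str]:
    """Return a list of human-readable problems found in a boefje.json document."""
    manifest = json.loads(text)
    missing = sorted(REQUIRED_FIELDS - set(manifest))
    problems = [f"missing field: {field}" for field in missing]

    level = manifest.get("scan_level")
    if isinstance(level, int) and not 0 <= level <= 4:
        problems.append(f"scan_level {level} is outside 0-4")
    return problems


example = """
{
  "id": "kat_my_custom_scanner",
  "name": "My Custom Scanner",
  "description": "Checks for specific configuration issues",
  "consumes": ["Hostname"],
  "produces": ["boefje/kat_my_custom_scanner"],
  "scan_level": 1,
  "enabled": true
}
"""
problems = check_manifest(example)
```

An empty list means the manifest passed these basic checks; it says nothing about whether the katalogus accepts it.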

main.py — The scanning logic

The main module must implement a run() function. It receives the job metadata as a dictionary (boefje_meta), which includes the serialized input OOI, and returns the raw results:

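import json

import requests


def run(boefje_meta: dict) -> list[tuple[set, bytes | str]]:
    """Main entry point for the boefje."""
    # The boefje consumes Hostname, so "input" is the serialized Hostname OOI itself.
    hostname = boefje_meta["arguments"]["input"]["name"]

    # Your scanning logic here; the timeout keeps a slow API from blocking the job.
    response = requests.get(f"https://api.example.com/check/{hostname}", timeout=30)
    response.raise_for_status()

    # Each tuple pairs a set of MIME types with the raw output.
    return [({"boefje/kat_my_custom_scanner"}, json.dumps(response.json()))]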

The return value is a list of tuples: each tuple contains a set of MIME types and the raw data. This data gets stored in Bytes (OpenKAT’s object storage) and is then picked up by the normalizer.
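
Boefjes often need configuration such as an API endpoint or a timeout. The sketch below assumes such settings reach the boefje as environment variables; the variable names `MY_SCANNER_API_URL` and `MY_SCANNER_TIMEOUT` are hypothetical. It also assumes the serialized Hostname OOI exposes its name directly under "input". The extra `fetch` parameter is not part of the boefje interface: it only replaces the HTTP call so the example runs offline.

```python
import json
from os import getenv
from typing import Callable, Union


def fake_fetch(url: str, timeout: float) -> dict:
    """Stand-in for the real HTTP call so the sketch runs without network access."""
    return {"url": url, "timeout": timeout, "vulnerable": False}


def run(
    boefje_meta: dict,
    fetch: Callable[[str, float], dict] = fake_fetch,  # illustration only
) -> list[tuple[set, Union[bytes, str]]]:
    # Assumption: for a Hostname input, the name sits directly under "input".
    hostname = boefje_meta["arguments"]["input"]["name"]

    # Hypothetical setting names; defaults apply when they are unset.
    base_url = getenv("MY_SCANNER_API_URL", "https://api.example.com/check")
    timeout = float(getenv("MY_SCANNER_TIMEOUT", "10"))

    payload = fetch(f"{base_url.rstrip('/')}/{hostname}", timeout)

    # One tuple per raw output: (set of MIME types, raw data).
    return [({"boefje/kat_my_custom_scanner"}, json.dumps(payload))]


results = run({"arguments": {"input": {"name": "example.com"}}})
```

Reading settings at call time rather than at import time means a changed setting takes effect on the next job without reloading the module.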

normalizer.py — Parsing raw output into OOIs

The normalizer transforms raw scanner output into structured objects that Octopoes can store and analyze:

import json
from collections.abc import Iterable

from octopoes.models import OOI, Reference
from octopoes.models.ooi.findings import Finding, KATFindingType


def run(input_ooi: dict, raw: bytes) -> Iterable[OOI]:
    """Normalize raw boefje output into OOIs."""
    data = json.loads(raw)

    if data.get("vulnerable"):
        # Link the finding to the OOI that was scanned.
        ooi_ref = Reference.from_str(input_ooi["primary_key"])
        finding_type = KATFindingType(id="KAT-MY-FINDING-001")
        yield finding_type
        yield Finding(
            finding_type=finding_type.reference,
            ooi=ooi_ref,
            description=f"Issue found: {data.get('detail', 'no detail provided')}",
        )
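
Raw output is not always well-formed, so it can help to keep the parsing decision in a small pure function that the normalizer calls before it builds any OOIs. The helper below is a sketch with a hypothetical name (`extract_issue`); it has no Octopoes dependency and assumes the same `vulnerable` and `detail` keys as the example above.

```python
import json
from typing import Optional


def extract_issue(raw: bytes) -> Optional[str]:
    """Return a finding description, or None when there is nothing to report."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None  # malformed output: nothing to normalize

    # Only a JSON object with a truthy "vulnerable" flag counts as an issue.
    if not isinstance(data, dict) or not data.get("vulnerable"):
        return None
    return f"Issue found: {data.get('detail', 'no detail provided')}"


description = extract_issue(b'{"vulnerable": true, "detail": "debug endpoint exposed"}')
```

Whether malformed output should be silently skipped or raised as an error is a design choice; skipping is used here only to keep the sketch short.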

Real-World Examples

Here are some custom boefjes we’ve built for clients:

  • Internal API health checker — Monitors internal microservices for configuration drift and exposed debug endpoints
  • EPD system scanner — Healthcare-specific checks for Electronic Patient Dossier systems (via IP-Zorg)
  • Supply chain DNS monitor — Tracks DNS changes across supplier domains to detect potential hijacking
  • Custom compliance checker — Validates specific BIO/NEN7510 controls beyond the built-in checks

Testing Your Boefje

Before deploying a custom boefje to production:

  1. Unit test the run() function with mocked inputs
  2. Test the normalizer with sample raw output to verify OOIs are created correctly
  3. Run in a Docker dev environment first — use make kat to spin up a local instance
  4. Check the katalogus to verify your boefje appears and can be enabled
  5. Run a scan on a test hostname and verify findings appear in the report
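
Step 1 can be sketched with the standard library's unittest.mock. To stay self-contained, the example defines a simplified stand-in for main.run that takes the HTTP function as a parameter; that parameter is an assumption made for testability and is not part of the boefje interface. In a real test you would more likely patch requests.get in your own module.

```python
import json
from unittest.mock import Mock


def run(boefje_meta: dict, http_get) -> list:
    """Simplified stand-in for main.run with the HTTP client injected."""
    hostname = boefje_meta["arguments"]["input"]["name"]
    response = http_get(f"https://api.example.com/check/{hostname}", timeout=10)
    return [({"boefje/kat_my_custom_scanner"}, json.dumps(response.json()))]


def test_run_returns_raw_json():
    # Fake response object whose .json() returns a fixed payload.
    response = Mock()
    response.json.return_value = {"vulnerable": True, "detail": "debug endpoint exposed"}
    http_get = Mock(return_value=response)

    results = run({"arguments": {"input": {"name": "example.com"}}}, http_get)

    # The boefje should call the API exactly once, with the hostname in the URL.
    http_get.assert_called_once_with(
        "https://api.example.com/check/example.com", timeout=10
    )
    mime_types, raw = results[0]
    assert mime_types == {"boefje/kat_my_custom_scanner"}
    assert json.loads(raw)["vulnerable"] is True


test_run_returns_raw_json()
```

With pytest, the final call is unnecessary because the test function is discovered by name.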

Containerized Boefjes

Since release 1.17, boefjes can also run as separate containers. This is useful when your boefje has complex dependencies or needs to run isolated from the main system. We contributed the feature that makes katalogus settings available to containerized boefjes, so configuration parameters can be passed to boefjes running in their own containers.

To run a boefje as a container, create a Dockerfile in your boefje directory and register it in the katalogus with the container image reference.

Need Custom Boefjes?

Building boefjes requires understanding of both the OpenKAT framework and the security domain you’re targeting. Our direct involvement in the OpenKAT codebase gives us a thorough understanding of how boefjes interact with the scheduler, katalogus and Octopoes. We develop custom boefjes for clients across government, healthcare and enterprise — from initial concept to production deployment.

