This page documents the technical protections built into Guardian Bot following the security audit of March 26, 2026. It is aimed at administrators who want to understand the bot’s internal workings.

Overview

Guardian Bot implements defense-in-depth with multiple independent layers. If one layer is bypassed, the next takes over.
Incoming message
           │
           ▼
┌──────────────────────┐
│   Whitelist check    │ ◄── Trusted domains/IPs
└──────────┬───────────┘
           │ not whitelisted
           ▼
┌──────────────────────┐
│  Safe decompression  │ ◄── Decompression bomb prevention
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Recursive resolution │ ◄── Redirect chain resolution (max 3 hops)
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Multi-source scan   │ ◄── PhishTank + GSB + VirusTotal
└──────────┬───────────┘
           │ threat detected
           ▼
┌──────────────────────┐
│ Trust Score & Action │ ◄── Score-adapted sanction
└──────────────────────┘
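The layered flow can be modeled as a sequential dispatcher in which each layer either short-circuits (whitelist hit, threat found) or hands off to the next. The sketch below is illustrative only: Verdict, run_pipeline, and the scanner callback are hypothetical names, not Guardian's actual internals.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    action: str                          # "allow" or "block"
    reasons: list = field(default_factory=list)

def run_pipeline(url: str, whitelist: set, scanner) -> Verdict:
    # Layer 1: a whitelist hit short-circuits every later check
    if url in whitelist:
        return Verdict("allow", ["whitelisted"])
    # Layers 2-4 (safe decompression, redirect resolution, multi-source
    # scan) are collapsed into a single scanner callback for brevity.
    threat = scanner(url)
    if threat:
        # Layer 5: the sanction is adapted to the member's Trust Score (elided)
        return Verdict("block", [threat])
    return Verdict("allow", [])
```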

1. Recursive Anti-Phishing

Problem addressed

Attackers use redirect chains to hide malicious URLs. A naïve scanner checks bit.ly/xyz (which looks harmless) but never sees the evil-phishing.com it ultimately redirects to.

Implementation

Guardian recursively resolves each URL by following HTTP redirects until the final destination.
# Resolution with cycle detection and depth limit.
# http_head and normalize_url are Guardian's internal HTTP helpers.
MAX_HOPS = 3
TIMEOUT_PER_HOP = 5  # seconds

async def resolve_redirects(url: str) -> tuple[str, list[str]]:
    visited = set()
    chain = [url]
    current = url

    for _ in range(MAX_HOPS):
        if current in visited:
            break  # Cycle detected
        visited.add(current)

        response = await http_head(current, timeout=TIMEOUT_PER_HOP,
                                    allow_redirects=False)
        if response.status not in (301, 302, 303, 307, 308):
            break

        next_url = response.headers.get("Location")
        if not next_url:
            break

        current = normalize_url(next_url, base=current)
        chain.append(current)

    return current, chain

Opaque domain handling

Some redirect domains do not reveal the destination URL without user interaction (e.g., get-qr.com, captcha gates). Guardian handles this case explicitly:
  • Resolution successful → scan the final URL against phishing databases
  • Resolution impossible (opaque domain) → generate a warning without automatic blocking, to avoid false positives
Avoiding false positives takes priority: a warning without blocking on an opaque domain is preferable to unjustifiably blocking a legitimate URL.

Cycle detection

A visited set is maintained for each resolution. If a URL appears twice in the chain, resolution stops immediately to prevent infinite loops.

2. Decompression Bomb Prevention

Problem addressed

A decompression bomb is a small compressed file (a few KB) that expands to several gigabytes. If an attacker sends a crafted .zip archive or encoded image, a naïve scanner might attempt to decompress the content into memory and crash the bot.

Implementation

Guardian enforces strict limits when processing attachments:
MAX_FILE_SIZE = 10 * 1024 * 1024          # 10 MB maximum
MAX_DECOMPRESSED_SIZE = 50 * 1024 * 1024  # 50 MB decompression limit
MAX_IMAGE_PIXELS = 4096 * 4096            # 16 MP rendering limit
Applied controls:
  1. Size check before download: The Content-Length header is verified. If the size exceeds MAX_FILE_SIZE, the file is ignored without downloading.
  2. Pixel limit for QR images: Before passing an image to the QR decoder, dimensions are checked. An image of 1px × 4 billion pixels would be rejected.
  3. Decompression timeout: Each decompression operation runs in a context with timeout (asyncio.wait_for). If decompression exceeds the time limit, the operation is cancelled.
  4. Error isolation: Decompression exceptions are caught locally and logged without crashing the main worker.
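Control 2's key idea, enforcing the output limit while decompressing rather than trusting the archive's declared sizes, can be sketched as follows. This is a minimal synchronous sketch: safe_extract_bytes is a hypothetical name, and the download-size check and asyncio.wait_for timeout described above are elided.

```python
import io
import zipfile

MAX_DECOMPRESSED_SIZE = 50 * 1024 * 1024  # mirrors the limit above
CHUNK = 64 * 1024

class DecompressionBomb(Exception):
    pass

def safe_extract_bytes(zip_bytes: bytes) -> dict[str, bytes]:
    """Decompress a zip in memory, aborting once output exceeds the cap."""
    out: dict[str, bytes] = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            buf = io.BytesIO()
            with zf.open(info) as member:
                # Read in chunks and count actual output bytes, because a
                # crafted bomb can falsify the sizes in its zip header.
                while chunk := member.read(CHUNK):
                    total += len(chunk)
                    if total > MAX_DECOMPRESSED_SIZE:
                        raise DecompressionBomb(info.filename)
                    buf.write(chunk)
            out[info.filename] = buf.getvalue()
    return out
```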

3. API Anti-Spam Protection

Problem addressed

Guardian queries up to 3 external APIs (PhishTank, Google Safe Browsing, VirusTotal) per scanned URL. Without rate limiting, an attacker could send thousands of URL-bearing messages to exhaust API quotas or overload the bot.

Implementation

An asyncio.Semaphore limits the number of simultaneous API requests:
# Limited concurrent scanning
api_semaphore = asyncio.Semaphore(5)  # Max 5 parallel requests

async def scan_url_with_ratelimit(url: str) -> ScanResult:
    async with api_semaphore:
        return await _scan_url_internal(url)
Combined mechanisms:
Mechanism                  Implementation        Goal
Global semaphore           Semaphore(5)          Limit simultaneous API calls
Result cache               TTL 1h per URL        Avoid scanning the same URL twice
Probabilistic Trust Score  Score × 0.3           Reduce scans for trusted members
Preemptive whitelist       Check before network  Short-circuits all scans
Per-request timeout        5s max per hop        Prevents infinite waits

Probabilistic scanning

To avoid scanning every URL sent by a trusted member, Guardian applies probabilistic sampling based on Trust Score:
Scan probability = max(0.1, 1.0 - (trust_score / 100) * 0.7)
A member with score 90 has only a 37% chance of being scanned on each message. A member with score 10 is scanned 93% of the time.
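The sampling formula above can be written directly as a function; using the secrets CSPRNG for the draw means a member cannot predict which of their messages will be scanned. The function names here are illustrative:

```python
import secrets

def scan_probability(trust_score: int) -> float:
    # The documented formula: floor of 10%, 0.7% less per Trust Score point
    return max(0.1, 1.0 - (trust_score / 100) * 0.7)

def should_scan(trust_score: int) -> bool:
    # Unpredictable draw in [0, 10000) compared against the probability
    return secrets.randbelow(10_000) < scan_probability(trust_score) * 10_000
```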

4. Memory Leak Prevention

Problem addressed

Long-running sessions (captcha, anti-spam) accumulate data in memory if not cleaned up. An attacker can create thousands of unfinished captcha sessions to exhaust the bot’s RAM.

Implementation

Periodic cleanup tasks run in the background for each affected cog:
# Example in captcha.py
@tasks.loop(minutes=10)
async def cleanup_expired_sessions(self):
    now = asyncio.get_running_loop().time()
    expired = [
        session_id
        for session_id, session in self.active_sessions.items()
        if now - session.created_at > SESSION_TIMEOUT
    ]
    for session_id in expired:
        del self.active_sessions[session_id]
Cogs with automatic cleanup:
Cog          Data cleaned               Interval
captcha.py   Expired captcha sessions   10 minutes
automod.py   Anti-spam message history  5 minutes
report.py    Expired report cooldowns   30 minutes
antiraid.py  Raid detection windows     1 minute

5. Database Pool Isolation

Problem addressed

Direct database access from multiple cogs simultaneously can create race conditions and hanging connections on error.

Implementation

All queries go through a centralized pool with context management:
# utils/database.py
class Database:
    def __init__(self, pool):
        self._pool = pool  # Access restricted via property

    @property
    def pool(self):
        return self._pool

    async def fetch_one(self, query: str, *args):
        async with self._pool.acquire() as conn:
            return await conn.fetchrow(query, *args)
Independent queries are parallelized with asyncio.gather to reduce latency:
# Parallel loading instead of sequential
trust_score, warnings, infractions = await asyncio.gather(
    db.fetch_trust_score(guild_id, user_id),
    db.fetch_warnings(guild_id, user_id),
    db.fetch_infractions(guild_id, user_id)
)

6. Captcha Verification

Algorithm

The math captcha uses secrets.choice (CSPRNG) instead of random to prevent answer prediction:
import secrets

def generate_captcha() -> tuple[str, int]:
    a = secrets.choice(range(1, 20))
    b = secrets.choice(range(1, 20))
    op = secrets.choice(['+', '-', '×'])
    # ...
    return question, correct_answer
A per-session asyncio.Lock prevents race conditions if the user clicks multiple times simultaneously.

Limits

  • 3 attempts maximum per session
  • 5-minute timeout per session
  • Automatic expiration: unfinished sessions are cleaned up every 10 minutes
  • Result: +15 Trust Score (success) or -20 Trust Score + kick (failure)
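The attempt limit and the per-session asyncio.Lock mentioned above interact as sketched below: because submissions are serialized by the lock, a double-click cannot race the attempt counter past its limit. CaptchaSession and submit_answer are hypothetical names, and the Trust Score side effects are elided.

```python
import asyncio
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # matches the documented limit

@dataclass
class CaptchaSession:
    answer: int
    attempts: int = 0
    # One lock per session serializes concurrent answer submissions
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)

async def submit_answer(session: CaptchaSession, guess: int) -> str:
    async with session.lock:
        if session.attempts >= MAX_ATTEMPTS:
            return "expired"  # would trigger the -20 score + kick path
        session.attempts += 1
        return "success" if guess == session.answer else "retry"
```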

7. Role Hierarchy and Escalation Prevention

Guardian systematically verifies the role hierarchy before any moderation action:
Check before ban/kick/mute:
  1. Is the target a bot? → Refuse
  2. Is the target the server owner? → Refuse
  3. Target's highest role ≥ bot's highest role? → Refuse
  4. Target's highest role ≥ moderator's highest role? → Refuse
  5. Action authorized ✓
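The five checks can be condensed into one predicate. In this sketch roles are modeled as integer positions (higher = more privileged), mirroring how Discord compares role hierarchy; the real implementation operates on Discord member and role objects, and can_moderate is an illustrative name.

```python
def can_moderate(target_is_bot: bool, target_is_owner: bool,
                 target_top_role: int, bot_top_role: int,
                 moderator_top_role: int) -> bool:
    if target_is_bot or target_is_owner:
        return False                       # checks 1 and 2
    if target_top_role >= bot_top_role:
        return False                       # check 3
    if target_top_role >= moderator_top_role:
        return False                       # check 4
    return True                            # check 5: action authorized
```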

Moderator abuse detection

If a moderator performs too many actions in a short time (configurable threshold, default: 3 actions/10s), Guardian:
  1. Logs the event as suspicious
  2. Notifies administrators
  3. Can restrict the moderator account’s permissions if the threshold is exceeded
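The "3 actions / 10 s" threshold is a sliding-window rate check, which can be sketched as below. The threshold and window are the documented defaults; the AbuseDetector class itself is illustrative, and the logging/notification steps are elided.

```python
from collections import deque

class AbuseDetector:
    """Flags a moderator who exceeds `threshold` actions per `window`."""

    def __init__(self, threshold: int = 3, window_seconds: float = 10.0):
        self.threshold = threshold
        self.window = window_seconds
        self._events: dict[int, deque] = {}

    def record_action(self, moderator_id: int, timestamp: float) -> bool:
        events = self._events.setdefault(moderator_id, deque())
        events.append(timestamp)
        # Evict actions that fell out of the sliding window
        while events and timestamp - events[0] > self.window:
            events.popleft()
        return len(events) > self.threshold
```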

8. Global Ban Confidence Scores

Every entry in the global blacklist carries a confidence score:
Category  Confidence  Description
scammer   90%         Confirmed scammer
raider    85%         Identified raider
spammer   75%         Documented spammer
other     60%         Generic category
Servers can configure a minimum confidence threshold below which automatic banning on join is not triggered, avoiding false positives on low-confidence entries.
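The join-time decision then reduces to comparing the entry's confidence against the server's threshold. The category scores below mirror the table; should_autoban and the fallback of 0 for unknown categories are illustrative assumptions.

```python
CONFIDENCE = {"scammer": 90, "raider": 85, "spammer": 75, "other": 60}

def should_autoban(category: str, server_min_confidence: int) -> bool:
    # Below the server's threshold, the entry would be surfaced to
    # moderators for review instead of triggering an automatic ban.
    return CONFIDENCE.get(category, 0) >= server_min_confidence
```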

Security Parameters Summary

Protection              Key parameter     Default value
Max redirects           MAX_HOPS          3
Timeout per hop         TIMEOUT_PER_HOP   5s
Max file size           MAX_FILE_SIZE     10 MB
Max image pixels        MAX_IMAGE_PIXELS  16 MP (4096²)
API semaphore           api_semaphore     5 concurrent
URL scan cache          TTL               1 hour
Max captcha attempts                      3 attempts / 5 min
Session cleanup         Interval          10 minutes
Default spam threshold  messages/window   5 msgs / 5s
PBKDF2 iterations       Backups           480,000