
Bot attacks on TYPO3 websites: what really works – and what isn’t enough

When a website goes down several weekends in a row, the team has to step in even on Saturdays, and it still takes months for things to stabilize, it becomes clear that conventional web security advice isn’t enough. That’s exactly what we experienced with one of our clients: a scientific research institute whose TYPO3 website came under repeated, targeted attacks starting in October 2025.

In this article, we show how we protected the site using a multi-layered approach, why traditional measures alone weren’t sufficient – and what went wrong along the way.

The problem: search pages can’t be cached

Most TYPO3 pages are well protected through caching. A page is rendered once and stored, and all subsequent requests are served from the cache, with no database queries and hardly any server load. Thousands of concurrent requests? No issue.

Search results work differently. Every query is unique: different search terms, filters, sorting, facets. TYPO3 search extensions such as Solr have to process each request live. Ten bots running searches trigger ten real backend queries; a hundred bots trigger a hundred. At some point the server collapses under the load, and this is exactly what the attackers exploited.

Search results can of course be cached temporarily, but the cache has to be invalidated whenever content changes. In our case, with Solr, the facet combinations alone produced so many cache variations that entries were practically never reused.
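A rough count shows why caching breaks down. The sketch below assumes a hypothetical search page with three multi-select facets, where every facet value can be toggled independently. The facet names, value counts, sort orders and page depth are invented for illustration and are not the institute's real configuration.

```python
from math import prod
from typing import Dict

# Hypothetical search page: facet -> number of selectable values.
# Illustrative numbers only, not the institute's real configuration.
FACETS: Dict[str, int] = {"year": 10, "type": 6, "institute": 8}
SORT_ORDERS = 3   # assumed, e.g. relevance, newest, oldest
PAGES = 10        # assumed number of result pages a bot walks through


def filter_states(facets: Dict[str, int]) -> int:
    """Distinct filter states when every facet value can be toggled
    on or off independently: 2**n subsets per facet, multiplied."""
    return prod(2 ** n for n in facets.values())


def cache_variants(facets: Dict[str, int], sort_orders: int, pages: int,
                   search_terms: int = 1) -> int:
    """Distinct result pages, i.e. potential cache keys."""
    return search_terms * filter_states(facets) * sort_orders * pages


if __name__ == "__main__":
    print(f"{cache_variants(FACETS, SORT_ORDERS, PAGES):,}")  # 503,316,480
```

Even with these modest numbers, a single search term yields over 500 million possible result pages, and every additional free-text term multiplies that again. A bot that steps through filter combinations will almost never hit a cached entry.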

An attack that behaved differently than expected

The first attack in October 2025 could still be contained with classic measures: geo-blocking at the network level and enabling ModSecurity on the web server. Things calmed down.

But the attacker returned – with a different strategy. In February 2026, the attack was more intense than ever: multiple outages in a single weekend, with the team on standby. This time, requests came from constantly rotating IP addresses across different countries, making simple blocklists ineffective.

After careful log analysis, the picture became clearer. This was most likely not a traditional DDoS attack but an automated content scraper, a system designed to harvest scientific publications. The server crashes were not the goal but collateral damage.

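This distinction matters. A scraper simply switches IPs when it is blocked, but it slows down once scraping becomes inefficient. Making the site slow for it can therefore work better than blocking it outright.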

Our layered approach

We gradually built up protection – from basic filtering to custom middleware inside TYPO3.

Layer 0: User-Agent filtering (.htaccess)

The first instinct in many setups is User-Agent filtering. Bots often reveal themselves through strings like python-requests/2.x or curl/7.x, or through an empty User-Agent header. These can be blocked via .htaccess or ModSecurity rules.

The problem: User-Agents are trivial to fake. Any serious attacker can impersonate a normal browser in seconds. So this layer offers no real protection against determined attackers.

Still, it has value as Layer 0. It removes obvious noise from poorly configured crawlers and harmless scripts, which keeps the logs readable and lets the higher layers focus on the traffic that actually matters.
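The logic of such a filter fits in a few lines. The sketch below is written in Python for readability; in practice the rule would live in .htaccess or ModSecurity. The pattern list is an assumption for illustration and is not the rule set we deployed.

```python
import re
from typing import Optional

# Illustrative patterns for obvious automation clients (assumed list,
# not exhaustive). Matched at the start of the User-Agent string.
SUSPICIOUS_UA = re.compile(
    r"^(python-requests|curl|wget|go-http-client|libwww-perl)/",
    re.IGNORECASE,
)


def is_obvious_bot(user_agent: Optional[str]) -> bool:
    """Layer 0 check: flag empty User-Agents and well-known HTTP client
    libraries. Anything that claims to be a browser passes."""
    if user_agent is None or not user_agent.strip():
        return True
    return bool(SUSPICIOUS_UA.match(user_agent.strip()))
```

Any request that sends a browser-like User-Agent passes this check. That is why this layer only reduces noise and offers no real protection.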

Layer 1: Geo-blocking (hoster level)

We initially blocked traffic from specific regions at the network level. This is efficient and cheap, because blocked requests never reach the server.

However, it only works as long as attackers don’t switch to proxies in other countries. In our case, it was also politically sensitive, because legitimate users from Asia were affected.
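The weakness is easy to see when geo-blocking is reduced to its core: an address is mapped to a country, and the country is checked against a block list. The sketch below assumes a tiny hypothetical prefix table. Real hosters use a GeoIP database. The RFC 5737 documentation ranges and the placeholder country codes "XA" and "XB" are stand-ins.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix -> country table. Real geo-blocking uses a GeoIP
# database at the hoster; RFC 5737 documentation ranges stand in here.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "XA",
    ipaddress.ip_network("198.51.100.0/24"): "XB",
    ipaddress.ip_network("203.0.113.0/24"): "DE",
}
BLOCKED_COUNTRIES = {"XA", "XB"}  # placeholder codes for blocked regions


def country_of(ip: str) -> Optional[str]:
    """Look up the country for an address, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None


def geo_blocked(ip: str) -> bool:
    """Block only addresses that resolve to a blocked country."""
    return country_of(ip) in BLOCKED_COUNTRIES
```

A scraper routed through a proxy in an allowed country, such as an address in 203.0.113.0/24 here, looks exactly like a regular visitor at this layer. Meanwhile, every legitimate user inside a blocked region is rejected along with the bots.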

Layer 2: ModSecurity (Apache)

At the web server level, we enabled ModSecurity with the OWASP Core Rule Set and added TYPO3-specific rules.

This blocks known attack patterns before any PHP code is executed. It is effective against known threats but blind to new or slightly modified attack methods.
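Signature matching misses variations, and a short example makes this concrete. The Python sketch below uses two toy patterns in the spirit of a WAF rule. They are not OWASP CRS rules, which are far more extensive and normalize input much more aggressively.

```python
import re
from urllib.parse import unquote_plus

# Two toy signatures in the spirit of a WAF rule set (NOT OWASP CRS rules).
SIGNATURES = [
    re.compile(r"union\s+select", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
]


def matches_signature(query_string: str) -> bool:
    """Return True if the URL-decoded query string matches a known pattern."""
    decoded = unquote_plus(query_string)
    return any(sig.search(decoded) for sig in SIGNATURES)


if __name__ == "__main__":
    for qs in ("q=1+UNION+SELECT+password",          # known pattern
               "q=1+UNION/**/SELECT+password",       # comment-obfuscated variant
               "q=climate+change&facet=year:2024"):  # ordinary search request
        print(qs, matches_signature(qs))
```

The first query is caught, but the comment-obfuscated variant slips through. A plain search request matches nothing at all, and that is all a scraper needs to send.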

Layer 3: f7firewall (TYPO3 extension)

We introduced a custom TYPO3 extension that integrates live IP blocklists covering known bots, Tor exit nodes, and addresses from abuse reports.

Crucially, the check runs before TYPO3 bootstraps, so rejecting a request costs very little.

Its limitation: addresses from rotating bot networks often aren’t on any list yet.
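This article does not describe how f7firewall works internally. The Python sketch below shows the kind of lookup such a layer performs against a blocklist feed. The feed format (one IP or CIDR range per line, with `#` comments) and all addresses are hypothetical. The lookup needs no framework services, so it can run before any bootstrap.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_blocklist(entries: Iterable[str]) -> List[Network]:
    """Parse single IPs and CIDR ranges into network objects, skipping comments.
    A bare IP becomes a /32 (IPv4) or /128 (IPv6) network."""
    networks: List[Network] = []
    for line in entries:
        line = line.split("#", 1)[0].strip()
        if line:
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_blocklisted(ip: str, networks: List[Network]) -> bool:
    """Linear scan; mixed IPv4/IPv6 comparisons simply evaluate to False."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


feed = [
    "# hypothetical feed",
    "192.0.2.0/24",
    "198.51.100.7   # single host",
    "2001:db8::/32",
]
blocklist = load_blocklist(feed)
```

In this example, an address from a freshly rotated range such as 203.0.113.9 passes straight through. That is exactly the limitation described above. With very large feeds, a production version would want a faster structure than a linear scan.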

Layer 4: custom behavior detection middleware

The core of our solution is a PSR-15 middleware that evaluates behavior instead of IPs.

We look at:

  • JavaScript challenge tokens (a missing token suggests no JavaScript was executed, which is suspicious)
  • Request timing (human vs. machine rhythm)
  • Behavioral history (an established browsing session vs. a cold entry straight into search)
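This article does not describe how the challenge works internally. One common pattern, assumed here, is a short-lived HMAC-signed token that the page's JavaScript obtains and sends back with later requests, typically as a cookie. A client that never executes JavaScript never presents a valid token. The Python sketch below covers only the server-side issue and verify steps. The secret, the `<timestamp>.<signature>` format, the binding to the client IP, and the 30-minute lifetime are all hypothetical choices.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-real-secret"  # hypothetical; load from configuration
TOKEN_TTL = 30 * 60                     # seconds; assumed token lifetime


def _sign(client_ip: str, ts: int) -> str:
    return hmac.new(SECRET, f"{client_ip}|{ts}".encode(), hashlib.sha256).hexdigest()


def issue_token(client_ip: str, now: Optional[float] = None) -> str:
    """Token handed to the JavaScript challenge: '<timestamp>.<signature>'."""
    ts = int(time.time() if now is None else now)
    return f"{ts}.{_sign(client_ip, ts)}"


def verify_token(token: Optional[str], client_ip: str,
                 now: Optional[float] = None) -> bool:
    """False for missing, malformed, forged, expired or foreign-IP tokens."""
    if not token or "." not in token:
        return False
    ts_str, sig = token.split(".", 1)
    if not ts_str.isdigit():
        return False
    ts = int(ts_str)
    current = time.time() if now is None else now
    if current - ts > TOKEN_TTL:
        return False
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig, _sign(client_ip, ts))
```

Binding the token to the client IP means a bot network cannot solve the challenge once and then reuse the token from every rotating address. The trade-off is that users whose IP changes mid-session have to pass the challenge again.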

Trusted academic networks (DFN, universities) are whitelisted by default.
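The middleware itself is written in PHP inside TYPO3, and its rules are not published here. The Python sketch below only illustrates how the three signals and the whitelist could be combined into a single decision. All weights and cut-offs are invented for illustration. The trusted range is a documentation prefix standing in for DFN or university networks.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical trusted range standing in for DFN / university networks.
TRUSTED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
BLOCK_THRESHOLD = 70  # assumed cut-off


@dataclass
class RequestSignals:
    ip: str
    has_valid_token: bool                # passed the JavaScript challenge
    seconds_since_last: Optional[float]  # None on a session's first request
    pages_before_search: int             # non-search pages seen this session


def suspicion_score(s: RequestSignals) -> int:
    """Higher means more bot-like. All weights are illustrative assumptions."""
    addr = ipaddress.ip_address(s.ip)
    if any(addr in net for net in TRUSTED_NETWORKS):
        return 0  # whitelisted before any scoring happens
    score = 0
    if not s.has_valid_token:
        score += 50  # no JavaScript executed
    if s.seconds_since_last is not None and s.seconds_since_last < 1.0:
        score += 30  # faster than a human reads a result list
    if s.pages_before_search == 0:
        score += 20  # cold entry straight into search
    return score


def is_suspicious(s: RequestSignals) -> bool:
    return suspicion_score(s) >= BLOCK_THRESHOLD
```

With these weights, a researcher who runs JavaScript but searches quickly without browsing first already scores 50 of the 70 points needed for a block. Small changes to the weights therefore decide whether such users are affected.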

Rate limiting and defense mode

In addition to behavioral analysis, we implemented two technical mechanisms that actively slow down the attack:

Rate limiting: Repeated requests from the same /24 subnet within a 6-second window are detected, and the whole subnet is then blocked for one hour. New visitors on search pages also get a small artificial delay of 0.7 to 1.2 seconds. A human barely notices it, but it adds up for a bot that sends thousands of requests.
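The window and block duration come from the text above, but this article does not state how many requests count as "repeated". The threshold below is therefore a hypothetical placeholder. The Python sketch keeps a sliding window of timestamps per /24 subnet (IPv4 only) and draws the new-visitor delay from the stated 0.7 to 1.2 second range. Timestamps are passed in explicitly so the logic is easy to test.

```python
import ipaddress
import random
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

WINDOW_SECONDS = 6.0     # from the text
BLOCK_SECONDS = 3600.0   # one hour, from the text
MAX_REQUESTS = 10        # hypothetical: the real threshold is not given


def subnet_of(ip: str) -> str:
    """Collapse an IPv4 address to its /24, e.g. 192.0.2.17 -> 192.0.2.0/24."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


class SubnetRateLimiter:
    def __init__(self) -> None:
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked_until: Dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        net = subnet_of(ip)
        if self.blocked_until.get(net, 0.0) > now:
            return False  # the whole subnet is still blocked
        window = self.hits[net]
        window.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_REQUESTS:
            self.blocked_until[net] = now + BLOCK_SECONDS
            window.clear()
            return False
        return True


def new_visitor_delay(rng: Optional[random.Random] = None) -> float:
    """Artificial delay in seconds for first-time visitors on search pages."""
    return (rng or random).uniform(0.7, 1.2)
```

In a request handler, the delay would be applied with something like `time.sleep(new_visitor_delay())` before rendering. Counting per /24 rather than per address is what catches rotation within one provider's range. The flip side is that everyone in the same /24 shares one budget, which is one more reason to whitelist partner networks up front.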

Defense mode: If incoming traffic exceeds 40 requests within 5 seconds, the system automatically switches to a “defense mode.” From then on, every additional request is delayed by 1.5 seconds. This deliberately makes the site so slow for the attacker that large-scale scraping no longer pays off. After 5 minutes without the threshold being exceeded, the system automatically returns to normal operation.
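The defense mode is essentially a small state machine. The Python sketch below uses the thresholds from the text: more than 40 requests in 5 seconds, 1.5 seconds of delay, and 5 minutes of cooldown. Two simplifying assumptions apply: all incoming traffic is counted in one global window, and the cooldown is checked only when a request arrives.

```python
from collections import deque
from typing import Deque

THRESHOLD = 40    # more than this many requests ...
WINDOW = 5.0      # ... within this many seconds activates defense mode
DELAY = 1.5       # seconds added to each request while active
COOLDOWN = 300.0  # 5 minutes without exceeding the threshold


class DefenseMode:
    def __init__(self) -> None:
        self.timestamps: Deque[float] = deque()
        self.active = False
        self.last_exceeded = float("-inf")

    def on_request(self, now: float) -> float:
        """Record a request and return the delay in seconds to apply to it.
        The request that crosses the threshold is already delayed."""
        self.timestamps.append(now)
        while self.timestamps and now - self.timestamps[0] > WINDOW:
            self.timestamps.popleft()
        if len(self.timestamps) > THRESHOLD:
            self.active = True
            self.last_exceeded = now
        elif self.active and now - self.last_exceeded >= COOLDOWN:
            self.active = False  # quiet for long enough: back to normal
        return DELAY if self.active else 0.0
```

The arithmetic behind this is simple. A client that waits for each response before sending the next can make at most 60 / 1.5 = 40 requests per minute, or 2,400 per hour, before any processing time. Parallel connections weaken that effect, which is why the per-subnet rate limit runs alongside it.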

The unavoidable issue: false positives

In March 2026, a researcher from a partner institution was blocked during intensive literature searches. From the system’s perspective, their behavior was indistinguishable from that of a bot.

This highlights a fundamental trade-off: protection mechanisms can also affect real users. That’s not an argument against them, but an argument for whitelisting, clear escalation paths, and transparent communication from the start.

What we learned

Several key insights became particularly clear during this incident:

  • Prepare early: The most effective security measures are the ones already in place before the first attack. In a real incident, there is no time for careful implementation.
  • Identify non-cacheable pages: Search pages, filtered list views, and Extbase plugins without caching represent the main attack surface. These should be known and accounted for at launch.
  • Whitelist before the attack, not after: Trusted partner networks and institutions should be whitelisted proactively, before any blocking logic takes effect.
  • Short monitoring intervals: When a site goes down, the team needs to know within minutes – not hours later when the client reports it.
  • Communication is part of security: An attack is also a communication challenge. Clients who understand what is happening and why measures are taken tend to respond more cooperatively – even when a false positive occurs.

Conclusion

Bot defense in TYPO3 is not a one-time setup, but an ongoing process. No single measure is sufficient – only a layered approach across network, server, extensions, and middleware creates real resilience.

If you wait for the first attack before acting, you’ll pay for it with downtime, emergency interventions, and weekend operations.

If you want to know how well your TYPO3 setup is protected against bot attacks, feel free to reach out – we’re happy to take a look with you.
