
Fixing TLS timeout handling in the cside Edge

How cside improved TLS reliability in the Rust Edge by moving handshake work out of the Axum accept path and into bounded Tokio tasks.

Jun 16, 2026 • 6 min read

Taym Haddadi Software Engineer


TL;DR: TLS accept loop isolation on the Rust cside Edge for Layer 4 signal collection

Bigger timeout is not the fix: Raising the TLS timeout is the usual reaction when handshakes hang, but it only lets one bad connection block good ones for longer. The real fix moves handshake work out of the accept path, so a stalled peer pays its own bill.
Isolate each handshake: cside's Rust Edge now separates accept from handshake by spawning each TLS handshake into its own Tokio task, bounded by a Semaphore permit tied to the handshake's lifetime. Slow handshakes cannot delay healthy ones and cannot create unbounded work. This matters because the Layer 4 signals collected on this path feed products like VPN detection and bot detection.
Bound the slow work: If your reliability model is 'catch it with a bigger timeout,' you are shifting the pain, not fixing it. Isolate the slow work, bound the concurrency, and make failure visible, or keep operating a stack that quietly hangs whenever the internet is untidy.


At cside, we place a strong emphasis on memory safety and performance. All of our core services are written in Rust, including the Edge service.

The Edge sits on a sensitive boundary. It has to be fast, safe, and resilient, because it is part of the path where we collect high-quality signals for products like VPN detection and bot detection.

For many web applications, the simplest TLS setup is to let a cloud load balancer terminate TLS and forward plain HTTP to the application. That is a good default when the application only needs the final HTTP request.

The Edge has a different job.

Some detection signals exist before the request becomes ordinary HTTP, at Layer 4 (the transport layer) rather than Layer 7. If TLS is fully terminated before traffic reaches the Edge, those signals are no longer available in the same form. So for this part of the platform, the Edge has to operate at Layer 4 and manage the TLS path directly.

That gives cside better signal quality for detection, but it also means the Edge needs to handle real-world TLS behavior itself.

This post is about one small reliability fix in that path: moving TLS handshake work out of the Axum accept path and into bounded Tokio tasks.

What started failing

The patch itself was not large, but understanding the failure took some digging.

The Edge was healthy. Certificates were loading. The port was reachable. Most traffic behaved normally.

But under heavier load, some HTTPS checks and client connections appeared to hang or time out.

At first, that can look like a TLS problem. In practice, the important pattern was more specific: some clients opened a connection but did not complete the TLS handshake.

That is normal on an internet-facing service. Public endpoints see incomplete connections all the time:

client connects, then sends nothing
client starts TLS, then disappears
client starts TLS, then stalls

Those failures were not the surprising part.

The surprising part was how much impact one incomplete handshake could have on nearby healthy traffic.

Before the fix

Before the fix, one part of the Edge TLS path did too much work in a single step.

Conceptually, it behaved like this:

accept one connection
finish TLS work for that connection
then accept the next connection

In simplified Rust, the shape was roughly this:

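// Simplified: associated types, local_addr(), and error handling are elided.
impl axum::serve::Listener for EdgeTlsListener {
  async fn accept(&mut self) -> (TlsStream, SocketAddr) {
    // Take the next TCP connection off the listener.
    let (mut tcp_stream, addr) = self.tcp_listener.accept().await;

    // Both awaits below depend on the peer. Until they resolve, accept()
    // cannot return, so Axum cannot pull the next connection.
    let hello = read_tls_hello(&mut tcp_stream).await;
    let tls_stream = complete_tls_handshake(hello, tcp_stream).await;

    (tls_stream, addr)
  }
}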

That code is easy to reason about, but it puts the whole TLS handshake inside accept().

For Axum, accept() is the front door. If it is busy waiting on one connection, the server is not receiving the next completed connection from that listener.

The structure is simple, but it creates head-of-line blocking.

If one connection started TLS and then stalled, the Edge waited for that connection's timeout before moving on. During that wait, healthy connections could be delayed behind it.

The problem was not:

TLS cannot work

It was:

one incomplete TLS handshake can delay later healthy handshakes

That distinction mattered.

Increasing the timeout would not have fixed the issue. It would have made the slow path hold the line for longer.
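The cost of a longer timeout is easy to work out by hand. The sketch below is a small model, not Edge code: it uses only the standard library, assumes every connection arrives at the same moment, and uses hypothetical names (`Peer`, `serial_wait`, `isolated_wait`, `cost`). It compares when each connection is resolved if handshakes run one at a time inside the accept path versus each on its own.

```rust
use std::time::Duration;

/// How long a peer needs to finish its handshake; `None` means it never finishes.
type Peer = Option<Duration>;

/// What one connection costs on its own: its handshake time, capped by the timeout.
fn cost(peer: Peer, timeout: Duration) -> Duration {
    match peer {
        Some(d) if d < timeout => d,
        _ => timeout,
    }
}

/// Serial model: one handshake must finish (or time out) before the next starts,
/// so each connection also pays for everything queued ahead of it.
fn serial_wait(peers: &[Peer], timeout: Duration) -> Vec<Duration> {
    let mut clock = Duration::ZERO;
    peers
        .iter()
        .map(|peer| {
            clock += cost(*peer, timeout);
            clock
        })
        .collect()
}

/// Isolated model: every handshake runs independently, so each connection
/// pays only its own cost. (This model ignores any concurrency limit.)
fn isolated_wait(peers: &[Peer], timeout: Duration) -> Vec<Duration> {
    peers.iter().map(|peer| cost(*peer, timeout)).collect()
}
```

Take a stalled peer queued ahead of a healthy one that needs 20 ms. With a 10 s timeout, the healthy connection is resolved after about 10.02 s in the serial model; raising the timeout to 30 s pushes that to about 30.02 s. In the isolated model it takes 20 ms either way. The timeout values are illustrative, not the Edge's configuration.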

The fix

The fix was to separate accepting a connection from completing its TLS handshake.

The Edge now accepts new connections quickly and handles each TLS handshake independently. A stalled or incomplete handshake can still time out, but it does not block later healthy connections from making progress.

Conceptually, the new flow looks like this:

accept connections quickly
handle each TLS handshake independently
return only completed secure connections to request handling

The new shape uses Tokio tasks plus a channel of completed secure connections:

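// Bounded channel of connections whose TLS handshake has already completed.
let (ready_tx, ready_rx) = tokio::sync::mpsc::channel(limit);

// Accept loop: its only job is to take TCP connections off the listener.
// Simplified: error handling for accept() is elided.
tokio::spawn(async move {
  loop {
    let (tcp_stream, addr) = tcp_listener.accept().await;
    let ready_tx = ready_tx.clone();

    // One task per handshake, so a stalled peer only stalls its own task.
    tokio::spawn(async move {
      // None here means the handshake did not complete (failed or timed out).
      if let Some(tls_stream) = finish_tls(tcp_stream).await {
        // A send error means the receiver is gone; the connection is dropped.
        let _ = ready_tx.send((tls_stream, addr)).await;
      }
    });
  }
});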

Then the Axum-facing listener becomes much smaller:

impl axum::serve::Listener for EdgeTlsListener {
  async fn accept(&mut self) -> (TlsStream, SocketAddr) {
    self.ready_rx
      .recv()
      .await
      .expect("TLS accept loop terminated")
  }
}

The important part is the boundary: accept() no longer performs the slow handshake work itself. It only receives connections whose handshakes have already completed.

So the failure mode changed from this:

one stalled handshake
-> delays the next connection

to this:

one stalled handshake
-> times out independently
-> healthy connections continue

That is the important reliability improvement.

The fix did not remove timeouts. Timeouts are still necessary. An incomplete handshake should not live forever.

The fix changed where the timeout is paid. A bad connection now pays its own timeout instead of making other connections pay for it.
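The same boundary can be tried out without Tokio. The following is a std-only model of the flow above, using OS threads and `std::sync::mpsc` in place of Tokio tasks and `tokio::sync::mpsc`. It is a sketch for experimenting, not how the Edge is implemented. `Conn`, `finish_handshake`, and `spawn_accept_loop` are hypothetical names, and the peer is modeled as a channel that the handshake blocks on. It deliberately leaves out any limit on concurrent handshakes.

```rust
use std::sync::mpsc::{Receiver, SyncSender};
use std::thread;

/// Stand-in for an accepted TCP connection (hypothetical type).
/// The handshake can only complete once the peer sends on `peer`.
struct Conn {
    id: u32,
    peer: Receiver<()>,
}

/// Stand-in for the TLS handshake: blocks until the peer acts.
/// Returns `None` if the peer goes away without acting.
fn finish_handshake(conn: Conn) -> Option<u32> {
    conn.peer.recv().ok().map(|()| conn.id)
}

/// Accept loop: takes connections as they arrive and gives each one its own
/// worker thread, so a blocked handshake never delays the next accept.
fn spawn_accept_loop(incoming: Receiver<Conn>, ready_tx: SyncSender<u32>) {
    thread::spawn(move || {
        for conn in incoming {
            let ready_tx = ready_tx.clone();
            thread::spawn(move || {
                // Only completed handshakes reach the ready channel.
                if let Some(id) = finish_handshake(conn) {
                    let _ = ready_tx.send(id);
                }
            });
        }
    });
}
```

If connection 1 is queued first and its peer never acts, connection 2 still shows up on the ready channel as soon as its own peer does. In the serial shape it would have waited behind connection 1.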

Keeping it bounded

There is a second important part of the fix.

If every new connection can create unlimited work, then the service becomes responsive but not safe under pressure. So the Edge also bounds the number of TLS handshakes that can be in flight at the same time.

The simplified version looks like this:

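// At most `limit` handshakes may be in flight at once.
let permits = Arc::new(tokio::sync::Semaphore::new(limit));

loop {
  let (tcp_stream, addr) = tcp_listener.accept().await;
  // Wait for capacity before spawning. acquire_owned() only fails if the
  // semaphore has been closed.
  let permit = permits.clone().acquire_owned().await.expect("TLS semaphore closed");
  let ready_tx = ready_tx.clone();

  tokio::spawn(async move {
    // Moved into the task so it is released when the task ends,
    // whether the handshake succeeds, fails, or times out.
    let _permit = permit;

    if let Some(tls_stream) = finish_tls(tcp_stream).await {
      let _ = ready_tx.send((tls_stream, addr)).await;
    }
  });
}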

The permit is owned by the task. When the task finishes, Rust drops the permit and returns capacity to the semaphore. That keeps the concurrency bound tied to the lifetime of the actual handshake work.

That gives us both properties we wanted:

slow handshakes do not block healthy handshakes

and:

slow handshakes cannot create unbounded work

This is the kind of tradeoff we care about in the Edge: improve reliability without giving up predictable resource usage.
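The permit behavior described above rests on Rust's drop semantics rather than on anything specific to Tokio. As a sketch of that idea, here is a minimal counting semaphore built only on the standard library, standing in for `tokio::sync::Semaphore`. The `Permits` and `Permit` types are hypothetical, and they block a thread instead of suspending a task, so this illustrates the ownership pattern and not the Edge's code.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Minimal counting semaphore (hypothetical, std-only).
struct Permits {
    available: Mutex<usize>,
    freed: Condvar,
}

/// Holding a `Permit` means one unit of capacity is in use.
/// Dropping it gives the unit back, however the holder exits.
struct Permit(Arc<Permits>);

impl Permits {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(Permits {
            available: Mutex::new(limit),
            freed: Condvar::new(),
        })
    }

    /// Blocks the calling thread until a unit is free, then takes it.
    fn acquire(self: &Arc<Self>) -> Permit {
        let mut free = self.available.lock().unwrap();
        while *free == 0 {
            free = self.freed.wait(free).unwrap();
        }
        *free -= 1;
        Permit(Arc::clone(self))
    }

    /// Number of units currently free.
    fn available(&self) -> usize {
        *self.available.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Return the unit and wake one waiter, if any.
        *self.0.available.lock().unwrap() += 1;
        self.0.freed.notify_one();
    }
}
```

Moving a `Permit` into a worker and letting it fall out of scope returns capacity on every exit path, including early returns, which is why the bound tracks the handshake work itself and not a separate counter that someone has to remember to decrement.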

Making failure visible

We also tightened an internal failure path.

If the part of the Edge responsible for accepting secure connections ever stops unexpectedly, the service should not silently wait forever. Silent hangs are hard to operate and hard to reason about.

The channel makes this state explicit. If all senders are gone, receiving from the channel returns None. That should not be treated like normal idle time, so the final accept() body replaces the earlier .expect() with an explicit, logged failure:

self.ready_rx.recv().await.unwrap_or_else(|| {
  tracing::error!("TLS accept loop terminated");
  panic!("TLS accept loop terminated")
})

The failure path now becomes visible immediately instead of turning into a hidden wait.

That does not change normal customer traffic, but it makes the system easier to trust during incidents.
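The standard library's channel exposes the same signal, which makes the idea easy to try in isolation. In this sketch `std::sync::mpsc` stands in for `tokio::sync::mpsc`: its `recv()` returns `Err(RecvError)` where Tokio's returns `None`. `Next` and `next_ready` are hypothetical names used only for illustration.

```rust
use std::sync::mpsc::{Receiver, RecvError};

/// What the accept side can observe on the ready channel.
#[derive(Debug, PartialEq)]
enum Next<T> {
    /// A connection whose handshake has completed.
    Ready(T),
    /// Every sender is gone: the accept loop has stopped and nothing
    /// will ever arrive. This is a failure, not idle time.
    LoopStopped,
}

/// Blocks until a connection is ready or the channel is closed,
/// and keeps those two outcomes distinct for the caller.
fn next_ready<T>(ready_rx: &Receiver<T>) -> Next<T> {
    match ready_rx.recv() {
        Ok(conn) => Next::Ready(conn),
        Err(RecvError) => Next::LoopStopped,
    }
}
```

One detail worth keeping in mind: a channel only reports closure once every sender clone has been dropped and the buffer has been drained. In the Tokio shape shown earlier, clones held by in-flight handshake tasks count too, so the signal arrives after those tasks finish.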

What we learned

The main lesson is that timeouts are not enough if the timeout is paid in the wrong place.

A timeout around TLS work sounds reasonable. But if one slow connection can make unrelated connections wait behind it, the timeout becomes shared pain.

The better model is:

accept quickly
isolate slow work
bound concurrency
make unexpected failure visible

Another lesson is that internet-facing services should treat incomplete connections as normal. Clients disconnect. Health checks retry. Networks flap. Some handshakes never finish.

The Edge should not assume the internet is tidy.

Before the fix:

one incomplete handshake
-> nearby healthy traffic can wait

After the fix:

one incomplete handshake
-> isolated timeout
-> healthy traffic continues

The final patch did not make bad connections disappear.

It made the Edge handle them in the right place, with the right bounds, while keeping the performance and memory safety guarantees we expect from our Rust services.

Taym Haddadi is a software engineer at cside, working on the Rust services behind the cside Edge.


Frequently Asked Questions

Some cside detection products rely on connection-level signals that are only available before a request has been flattened into ordinary HTTP by managed TLS termination. The Edge handles that path directly so those signals can be used safely and consistently.

The fix did not move TLS handling out of the Edge service. The change improved how incomplete handshakes are isolated so they do not affect healthy traffic.
