IT Brief India - Technology news for CIOs & IT decision-makers
Cloudflare builds Town Lake data platform with Skipper

Fri, 29th May 2026
Sean Mitchell, Publisher

Cloudflare has built an internal data platform called Town Lake and an AI data agent called Skipper to give staff a single way to query internal data using either SQL or plain English.

The move addresses a long-standing problem at Cloudflare, where operational and analytics data had been spread across production databases, ClickHouse clusters, Kafka streams, cloud storage buckets and BigQuery datasets. Employees often needed to know which system held the data they wanted, what credentials to use and whether the information was current, sampled or outdated.

The problem became more acute as the business grew and more teams needed reliable access to detailed data for billing, support, security analysis and internal reporting. Sampled analytics data could work for dashboards, but not for functions such as billing, where exact figures are required.

Town Lake acts as a unified analytics layer across those systems. It is built around Apache Trino as the query engine and uses R2 object storage and an Iceberg-based catalog to store and manage data over time.

That lets a single SQL query pull information from different sources without first moving all the intermediate data into a separate system. The platform can combine data from systems such as Postgres, ClickHouse and Iceberg tables stored in R2 within one query.
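
As a rough illustration of what a single cross-source query looks like, the sketch below joins two separate databases through one connection. It uses SQLite's ATTACH purely as a runnable stand-in for a federated engine; it is not Trino, and the source names, tables and columns (postgres_src, iceberg_src, accounts, usage) are hypothetical. In Trino itself, tables in different systems are addressed in a similar way, as catalog.schema.table.

```python
import sqlite3

# Analogy only: SQLite ATTACH stands in for a federated engine's catalogs.
# All database, table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS postgres_src")
conn.execute("ATTACH DATABASE ':memory:' AS iceberg_src")

# Account records live in one source, usage rows in another.
conn.execute("CREATE TABLE postgres_src.accounts (account_id INTEGER, plan TEXT)")
conn.execute("CREATE TABLE iceberg_src.usage (account_id INTEGER, requests INTEGER)")
conn.executemany(
    "INSERT INTO postgres_src.accounts VALUES (?, ?)", [(1, "free"), (2, "pro")]
)
conn.executemany(
    "INSERT INTO iceberg_src.usage VALUES (?, ?)", [(1, 100), (2, 250), (2, 50)]
)

# One statement joins both sources; nothing is copied to a third system first.
FEDERATED_SQL = """
SELECT a.plan, SUM(u.requests) AS total_requests
FROM postgres_src.accounts AS a
JOIN iceberg_src.usage AS u ON u.account_id = a.account_id
GROUP BY a.plan
ORDER BY a.plan
"""


def run_federated_query(connection):
    """Run the cross-source join and return (plan, total_requests) rows."""
    return connection.execute(FEDERATED_SQL).fetchall()


rows = run_federated_query(conn)
```

The point of the sketch is the shape of the query: the person writing it names two sources in one statement and leaves the engine to work out where each table lives.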

Governance model

A central part of the system is access control. Town Lake is closed by default, meaning tables cannot be queried until they have been reviewed.

A service called Skimmer scans tables and classifies columns for personally identifiable information, while another service, Lifeguard, manages access rules and permission checks. Sensitive data is redacted by default, and users must have the right permissions to view raw personal data in a live session.

Users can still discover that tables exist even if they cannot see unreviewed columns. This is intended to stop new columns from disrupting existing dashboards while still limiting exposure to sensitive information.
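
The governance rules described in this section can be sketched in a few lines of code. The example below is a simplified illustration under stated assumptions, not Cloudflare's implementation: the class names, the name-based PII patterns and the "[REDACTED]" placeholder are all hypothetical, and a real classifier would probably inspect values as well as column names. It shows tables being listable but unreadable until reviewed, unreviewed columns being hidden, and PII being redacted unless the user holds the relevant permission.

```python
import re
from dataclasses import dataclass, field

# Hypothetical name-based patterns; assumed for illustration only.
PII_PATTERNS = [re.compile(p) for p in (r"email", r"phone", r"ip_address")]
REDACTED = "[REDACTED]"


def classify_pii(columns):
    """Skimmer-like step: flag column names that look like personal data."""
    return {c for c in columns if any(p.search(c.lower()) for p in PII_PATTERNS)}


@dataclass
class TablePolicy:
    """Lifeguard-like record: nothing is reviewed until someone adds it."""
    reviewed_columns: set = field(default_factory=set)
    pii_readers: set = field(default_factory=set)


@dataclass
class Catalog:
    schemas: dict = field(default_factory=dict)   # table -> list of column names
    policies: dict = field(default_factory=dict)  # table -> TablePolicy

    def list_tables(self):
        # Discovery is allowed even for tables with no reviewed columns.
        return sorted(self.schemas)

    def visible_columns(self, table):
        policy = self.policies.get(table, TablePolicy())
        return [c for c in self.schemas[table] if c in policy.reviewed_columns]

    def read(self, user, table, rows):
        columns = self.visible_columns(table)
        if not columns:
            # Closed by default: an unreviewed table cannot be queried.
            raise PermissionError(f"{table} has no reviewed columns")
        policy = self.policies[table]
        pii = classify_pii(columns)
        can_see_raw = user in policy.pii_readers
        return [
            {c: (row[c] if can_see_raw or c not in pii else REDACTED) for c in columns}
            for row in rows
        ]


catalog = Catalog(
    schemas={
        "tickets": ["ticket_id", "email", "new_unreviewed_col"],
        "raw_events": ["payload"],
    },
    policies={
        "tickets": TablePolicy(
            reviewed_columns={"ticket_id", "email"}, pii_readers={"alice"}
        )
    },
)
ticket_rows = [{"ticket_id": 1, "email": "a@example.com", "new_unreviewed_col": "x"}]
```

In this sketch a newly added column simply never appears in results until it is reviewed, so a dashboard built on the reviewed columns keeps returning the same shape of data.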

Natural language

On top of Town Lake, Cloudflare has built Skipper, a conversational AI tool that translates natural-language questions into SQL queries and returns answers as tables or charts. The tool can search its metadata catalog, inspect schema details, write SQL, run queries and refine them if the result appears wrong.
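
The write, run and refine cycle can be pictured as a small loop. The sketch below is a minimal illustration, assuming a hypothetical generate_sql callable in place of the language model and an in-memory SQLite database in place of Town Lake; the table, the stub and the retry rule (retry on an error or an empty result) are assumptions, not details from Cloudflare.

```python
import sqlite3


def answer_question(question, conn, generate_sql, max_attempts=3):
    """Write SQL, run it, and retry with feedback if the query fails."""
    # Schema inspection: collect the CREATE TABLE statements as context.
    schema = [
        r[0]
        for r in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    ]
    feedback = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, schema, feedback)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            feedback = f"query failed: {exc}"
            continue
        if rows:
            return {"sql": sql, "rows": rows, "attempts": attempt}
        feedback = "query returned no rows"
    raise RuntimeError(f"no usable answer after {max_attempts} attempts: {feedback}")


def stub_generate_sql(question, schema, feedback):
    # Hypothetical stand-in for a model call: the first guess uses a
    # column that does not exist, the second is corrected after feedback.
    if feedback is None:
        return "SELECT COUNT(*) FROM signups WHERE tld = 'com'"
    return "SELECT COUNT(*) FROM signups WHERE domain LIKE '%.com'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (domain TEXT)")
conn.executemany("INSERT INTO signups VALUES (?)", [("a.com",), ("b.org",), ("c.com",)])

result = answer_question("How many .com sign-ups?", conn, stub_generate_sql)
```

Here the first attempt fails because the guessed column is missing, the error text is passed back as feedback, and the second attempt succeeds.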

The main challenge in building the AI layer was context. Large language models can produce plausible but incorrect answers if they do not have enough grounding in table definitions, joins, business logic and internal data models.

To address that, Skipper draws on several sources of context, including metadata from DataHub, human-written annotations, SQL used to build transformed tables, curated internal data-model documents and live inspection queries. That combination helps the agent avoid incorrect joins and misuse of columns.
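
One way to picture how several context sources might be combined before the agent writes a query is shown below. This is a sketch only: the ContextSource type, the priority ordering and the character budget are assumptions made for illustration, and the article does not describe how Skipper actually ranks or trims its context.

```python
from dataclasses import dataclass


@dataclass
class ContextSource:
    label: str
    text: str
    priority: int  # assumed convention: lower number is included first


def build_context(sources, max_chars=2000):
    """Concatenate sources in priority order, skipping any that do not fit."""
    parts, used = [], 0
    for src in sorted(sources, key=lambda s: s.priority):
        block = f"## {src.label}\n{src.text.strip()}\n"
        if used + len(block) > max_chars:
            continue  # skip whole blocks rather than cutting a definition in half
        parts.append(block)
        used += len(block)
    return "\n".join(parts)


# Hypothetical inputs standing in for the kinds of context named above.
example_sources = [
    ContextSource("Schema metadata", "orders(id INTEGER, total REAL)", 3),
    ContextSource("Table-building SQL", "CREATE TABLE orders AS SELECT ...", 1),
    ContextSource("Human annotation", "total is in US dollars, tax included", 2),
    ContextSource("Oversized document", "x" * 5000, 2),
]
context = build_context(example_sources, max_chars=500)
```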

Skipper can also package query results into dashboards that can be shared across internal applications. Access checks still apply when another employee views a saved query or dashboard, rather than only when it is created.
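
The difference between checking access when a dashboard is created and checking it when the dashboard is viewed can be shown in a short sketch. The names below (SavedQuery, view_saved_query, the grants mapping) are hypothetical and the permission model is reduced to a set of table names per user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedQuery:
    owner: str
    sql: str
    tables: frozenset  # tables the query reads


def view_saved_query(saved, viewer, grants, run_query):
    """Re-check the viewer's grants on every view, not the owner's at save time."""
    missing = saved.tables - grants.get(viewer, frozenset())
    if missing:
        raise PermissionError(f"{viewer} lacks access to: {sorted(missing)}")
    return run_query(saved.sql)


# Hypothetical grants: alice may read the usage table, bob may not.
grants = {"alice": frozenset({"billing.usage"}), "bob": frozenset()}
saved = SavedQuery(
    owner="alice",
    sql="SELECT SUM(requests) FROM billing.usage",
    tables=frozenset({"billing.usage"}),
)


def fake_run_query(sql):
    # Stand-in for the query engine; returns a fixed result.
    return [(400,)]
```

With this arrangement, sharing a dashboard link does not widen anyone's access: a viewer without the right grant gets an error instead of the owner's results.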

Internal uses

Billing has become one of the biggest uses for Town Lake. Cloudflare said its billable usage dashboard, which shows pay-as-you-go customers what they owe, is backed by Iceberg tables in R2 queried through Trino.

The dashboard API uses the same usage rows as the invoicing system, so the figures shown to customers match the billed amounts. In a recent measurement period, billing-related work made up 53% of all queries served by Town Lake, amounting to 91,760 queries from 324 employees.
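
Those figures imply a rough total for the period. The calculation below assumes that the 91,760 queries are the billing-related ones and that 53% is a rounded share, so the results are estimates rather than reported numbers: a total of roughly 173,000 queries and an average of about 283 billing queries per employee.

```python
# Figures quoted in the article.
billing_queries = 91_760
billing_share = 0.53  # assumed to be rounded to the nearest percent
employees = 324

# Implied total number of queries served in the period.
total_queries_estimate = billing_queries / billing_share

# Range consistent with a share that rounds to 53% (52.5% to 53.5%).
total_low = billing_queries / 0.535
total_high = billing_queries / 0.525

# Average billing-related queries per employee.
queries_per_employee = billing_queries / employees

print(f"estimated total queries: {total_queries_estimate:,.0f}")
print(f"plausible range: {total_low:,.0f} to {total_high:,.0f}")
print(f"billing queries per employee: {queries_per_employee:.1f}")
```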

Teams are also using the system for business intelligence, customer support and security work. Examples include revenue rankings, domain sign-up analysis, support-ticket triage and analysis of bot-management machine-learning scoring events filtered by network and geography.

The system reflects a wider push by technology companies to bring together fragmented data estates and build AI tools on top of them for non-technical users. For Cloudflare, it also serves as an internal test of products it sells externally, including R2, Workers, Workflows, Access, D1 and Workers AI.

One lesson from the project was that the difficult part was not the core query technology but the controls around it, including access policies, auditing, permission limits, schema changes and data classification. The quality of the AI agent also improved when it ingested the SQL that created a table, rather than relying only on schema metadata.

Brian Brunner, Dmitry Alexeenko and Matt Moen, the Cloudflare engineers who outlined the system, said: “The boring infrastructure is the hard part. Trino + Iceberg is not new technology. The hard work is in the boring stuff: per-row access control, default-closed table allowlisting, query auditing, time-bound credentials, PII detection, idempotent ingestion, schema evolution. Those are the things that make a data platform safe to actually use.”