This post shares our experience scaling MES analytics by replacing legacy SQL Server systems with ClickHouse. We cover the journey from pain points to real-time visibility, and the key lessons for any manufacturing data team.
Introduction
For more than 14 years, our analytics stack, anchored by Microsoft SQL Server, was a dependable part of our MES infrastructure. It powered production reports, shift summaries, and dashboards that kept operations running smoothly. It was stable, well-understood, and tightly integrated into our processes.
But as our manufacturing footprint expanded and data volumes exploded, especially with the growth of data collected from equipment, the cracks started to show:
- Dashboards lagged behind real-time events.
- Historical queries took minutes or even hours to complete.
- Maintaining indexes, partitions, and aggregation jobs became a constant battle.
And, as user expectations shifted from static reports to instant insights, our legacy architecture began to hold us back.
Our system wasn’t broken. It was outgrown. What had once been a strength was now a bottleneck.
We needed an analytics layer built for the real-time, high-volume, event-driven reality of modern manufacturing. That’s when we began exploring ClickHouse.
1. The Challenge: MES at Scale
Critical Manufacturing MES generates large volumes of time-series and event-based data: resource events, operator actions, data collections, and shift-level KPIs. With the standard relational approach, we faced several growing pains:
- Slow analytics on historical production data (shift reports, OEE, downtime analysis).
- High storage costs due to the need for large indexes and partitions.
- Painful query tuning to handle growing tables (millions to billions of records).
- Limited real-time visibility. Aggregations lagged or required overnight batch processing.
2. What We Needed from Our Next-Gen Operational Data Store
Our MES setup needed a database that could:
- Efficiently handle time-series and event-based data at high frequency.
- Enable real-time dashboards and shift reports with sub-second queries.
- Be cost-effective, scalable, and easy to maintain as our data volumes doubled.
- Support fast aggregations and filtering for KPIs like OEE, scrap rate, and throughput.
3. Why We Chose ClickHouse
During our evaluation of OLAP (Online Analytical Processing) and time-series database solutions, we considered Apache Pinot, Apache Druid, PostgreSQL with TimescaleDB, and other TSDBs. Each had strengths: Pinot and Druid offered low-latency ingestion; PostgreSQL brought familiarity; TSDBs were optimized for time-series data.
ClickHouse stood out early due to its remarkably straightforward installation and lack of heavyweight dependencies. A key feature that attracted us was its native Kafka table engine, allowing seamless integration with our existing data streams.
As our use case evolved, we implemented our own Kafka sinker, which gave us finer control over batching, error handling, and ingestion tuning.
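For reference, a minimal version of the native Kafka ingestion path we started with looks like the sketch below. The topic, broker address, and column names are illustrative, not our production schema:

```sql
-- Kafka engine table: a streaming consumer, not a storage table.
CREATE TABLE mes_events_queue
(
    event_time  DateTime64(3),
    resource_id String,
    event_type  LowCardinality(String),
    payload     String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'mes-events',
         kafka_group_name  = 'clickhouse-mes',
         kafka_format      = 'JSONEachRow';

-- Target table that actually stores the events.
CREATE TABLE mes_events
(
    event_time  DateTime64(3),
    resource_id String,
    event_type  LowCardinality(String),
    payload     String
)
ENGINE = MergeTree
ORDER BY (resource_id, event_time);

-- Materialized view that continuously moves rows from the queue into storage.
CREATE MATERIALIZED VIEW mes_events_mv TO mes_events AS
SELECT event_time, resource_id, event_type, payload
FROM mes_events_queue;
```

A custom sinker effectively replaces the queue-plus-view pair with an application that writes batched `INSERT`s into the storage table directly, which is where the finer control over batching and error handling comes from.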
We chose the `ReplacingMergeTree` engine over the standard `MergeTree` to enable efficient deduplication of records, critical in our environment, where late-arriving or duplicate events can occur.
By carefully designing our `ORDER BY` keys to include accurate identifiers, we maintain both State and History tables:
- State tables reflect the latest record per entity.
- History tables retain the full change log.
This setup ensures accurate analytics while simplifying data management and minimizing complex deduplication logic at query time.
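As an illustration of the State/History split, the sketch below uses hypothetical table and column names, with a timestamp as the `ReplacingMergeTree` version column:

```sql
-- History table: full change log, one row per recorded state change.
CREATE TABLE resource_history
(
    resource_id String,
    updated_at  DateTime64(3),
    state       LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (resource_id, updated_at);

-- State table: ReplacingMergeTree keeps the latest row per ORDER BY key,
-- using updated_at as the version column to resolve duplicates.
CREATE TABLE resource_state
(
    resource_id String,
    updated_at  DateTime64(3),
    state       LowCardinality(String)
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY resource_id;

-- Deduplication happens at merge time, so queries that must see exactly
-- one row per key apply FINAL:
SELECT resource_id, state
FROM resource_state FINAL
WHERE state = 'Down';
```

Because replacement only occurs during background merges, `FINAL` (or an equivalent `argMax`-style aggregation) is the safety net for reads that cannot tolerate a stray duplicate.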
Combined with ClickHouse’s columnar storage, vectorized execution, and high compression ratios, this architecture delivers consistently fast analytical performance at scale.
Another powerful feature we leveraged was ClickHouse’s TTL (Time-To-Live) functionality. By setting TTL rules at the table level, we can automatically move or delete older data without manual intervention, which simplifies our archiving strategy considerably: no more dedicated jobs to purge old records. Instead, we define TTL policies directly in the table schema, keeping hot data readily accessible while seamlessly expiring or relocating cold data to cheaper storage tiers.
Additionally, ClickHouse’s robust JSON support has proven invaluable in dealing with semi-structured MES data. In our migration, we chose to flatten many of our tables and embed formerly foreign-keyed records as JSON documents within a single column. This significantly simplified our queries. What used to require complex multi-table joins can now be expressed with straightforward access to structured fields inside the JSON. With functions like `JSONExtract` and `JSONExtractKeysAndValues`, and support for nested objects, we can parse, filter, and aggregate JSON content on the fly without preprocessing or ETL transformations.
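For example, a query in this style might read a nested field out of an embedded JSON column. The table, event type, and payload shape below are illustrative assumptions, not our actual schema:

```sql
-- Assuming an events table with a JSON 'payload' column shaped like
-- {"operator": {"id": "op-17"}, "qty": 5}
SELECT
    JSONExtractString(payload, 'operator', 'id') AS operator_id,
    sum(JSONExtractInt(payload, 'qty'))          AS total_qty
FROM mes_events
WHERE event_type = 'TrackOut'
GROUP BY operator_id
ORDER BY total_qty DESC;
```

The extra path arguments to `JSONExtractString` walk into nested objects directly, which is what lets a former multi-table join collapse into a single-table scan.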
4. How We Migrated
We took an incremental, phased approach:
- Identified key use cases: production reporting, machine state tracking, and quality metrics.
- Transformed historical MES data into ClickHouse-friendly schemas (denormalized, event-based).
- Implemented a process to backfill historical data from SQL Server into ClickHouse.
- Rewrote reporting queries and dashboards to use ClickHouse SQL.
- Validated results by running SQL Server and ClickHouse in parallel.
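To give a flavor of the query-rewrite step, a typical rewritten report against a denormalized, event-based table might look like this sketch (the table and columns are hypothetical):

```sql
-- Downtime minutes per resource per day over the last week,
-- read from a flat state-change table instead of joined OLTP tables.
SELECT
    resource_id,
    toDate(started_at)         AS day,
    sum(duration_seconds) / 60 AS downtime_minutes
FROM resource_state_changes
WHERE state = 'Down'
  AND started_at >= now() - INTERVAL 7 DAY
GROUP BY resource_id, day
ORDER BY day, downtime_minutes DESC;
```

Running the same report against both systems during the parallel-validation phase is what gave us confidence to cut over.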
5. The Results
Since migrating, the improvements have been dramatic:
- Query performance improved by 50–100x, even on massive datasets.
- Shift and daily production dashboards now update in real time.
- Storage footprint dropped by over 50%, thanks to compression.
- Maintenance overhead reduced. No more managing massive indexes or partition schemes manually.
- We can now analyze long-term trends (e.g., by line, product, operator) instantly instead of waiting hours.
6. Lessons Learned for MES Teams
- Model your data around events and time. ClickHouse thrives on flat, time-series-like tables.
- It’s not a replacement for OLTP. Keep transactional operations in the MES Online Database; use ClickHouse for analytics.
- Indexing works differently! Learn about primary keys, partitions, and data skipping indexes.
- Use materialized views for fast access to high-level metrics.
- Integrate with tools like Grafana for visualizations.
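The materialized-view lesson can be sketched as an aggregating rollup; every name below is illustrative:

```sql
-- Source table of raw production events.
CREATE TABLE production_events
(
    line_id    LowCardinality(String),
    event_time DateTime,
    qty        UInt32
)
ENGINE = MergeTree
ORDER BY (line_id, event_time);

-- Rollup table holding partial aggregation states per line and hour.
CREATE TABLE line_throughput_hourly
(
    line_id LowCardinality(String),
    hour    DateTime,
    units   AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (line_id, hour);

-- Materialized view that populates the rollup on every insert.
CREATE MATERIALIZED VIEW line_throughput_hourly_mv
TO line_throughput_hourly AS
SELECT
    line_id,
    toStartOfHour(event_time) AS hour,
    sumState(toUInt64(qty))   AS units
FROM production_events
GROUP BY line_id, hour;

-- Dashboards read the rollup with the matching -Merge combinator.
SELECT line_id, hour, sumMerge(units) AS units
FROM line_throughput_hourly
GROUP BY line_id, hour
ORDER BY line_id, hour;
```

Because the view maintains partial states incrementally, a Grafana panel querying the rollup stays fast regardless of how large the raw event table grows.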
Conclusion
ClickHouse has fundamentally changed the way we manage and analyze manufacturing execution data. By embracing a modern, event-driven, and time-series-friendly architecture, we’ve unlocked real-time visibility into production, simplified data modeling, and dramatically improved query performance. What once took hours can now be done in seconds with less overhead and more flexibility.
This transformation has empowered our teams with faster insights, streamlined analytics, and a scalable platform built for the future of MES. For operations facing similar data challenges, ClickHouse is more than a powerful engine. It’s a game changer for smart manufacturing at scale.
Author
Hi! My name is Ricardo Magalhães. 🤘
You can find me on LinkedIn.