Service Performance: Analyzing through SQL

June 30, 2025 · 10 min · César Meira

Overview and SQL scripts for diagnosing service performance issues in MES systems

Analysis Performance

Table of Contents

Overview and Motivation
Finding Problematic Services
- Script
- Example
Finding a Request to Analyze
- Script
- Example
Operation-Level Performance Analysis
- Script
- Example + Deep Dive
Final Takeaways
Author
- Hello, my name is César Meira 👋

As services grow in complexity, pinpointing performance issues becomes more challenging. This post shows how to analyze the behavior of your service by querying the database. We’ll explore the SQL scripts used, interpret the results, and understand what should be reviewed for possible enhancements.

Overview and Motivation

In a fully extensible system like ours — with numerous extension points and layers of custom logic — it’s not uncommon for performance issues to sneak in through one of many possible code paths. When customer-specific logic is added on top, pinpointing the exact line (or feature) responsible for slowness can quickly become a daunting task.

📝 While this SQL-based approach helps identify heavy dead time and performance gaps, it’s only an alternative approach instead of using observability. With observability, you gain access to a full execution timeline, including failed requests (which SQL doesn’t capture) and read operations (loads) that often go unnoticed but can hurt performance. In other words, observability provides a complete, real-time view of your system’s behavior, allowing you to identify and address issues more effectively.

Finding Problematic Services

This is your starting point for identifying slow or potentially problematic services.

If you’re looking to get a high-level overview of your system’s performance — and quickly spot which services might be worth optimizing — this script is for you.

The script analyzes service execution times and highlights the ones that may require further investigation. Naturally, some services are inherently slow due to the complexity of their operations, and cluttering your report with those isn’t helpful. That’s why the script allows you to exclude specific services by adding them to an ignore list.

The script outputs a summary table with the following metrics:

Service Name
Total Number of Calls
Minimum Execution Time (ms)
Maximum Execution Time (ms)
Average Execution Time (ms)

By default, the script analyzes service calls from the last 24 hours, but you can easily adjust the timeframe to suit your needs.

Reminder: The script is a base for your analysis feel free to adapt it to your needs.

Script

DECLARE @From DATETIME = DATEADD(DD, -1, CAST(GETDATE() as date))   -- ANALYZE ONLY LAST DAY
;WITH ServiceData AS(
  SELECT
    -- Service Name  
    SH.ServiceName
    -- Start Date/Time of the Service Call    
    , SH.ServiceStartTime
    -- End Date/Time of the Service Call    
    , SH.ServiceEndTime
    -- Total elapsed time of the Service Call in milliseconds
    , DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) AS ElapsedMilliSeconds
  FROM dbo.T_ServiceHistory SH
  WHERE 
    ServiceStartTime > @FROM
    AND
    -- THIS IS THE IGNORE LIST
    --  ADD ANY OTHER SERVICE THAT IS NOT RELEVANT FOR THE ANALYSIS
    ServiceName NOT IN (
      'ReseedIdentityColumns',
      'Cmf.FullBackup.Ods 10.2.7 Installation',
      'Cmf.FullBackup.Dwh 10.2.7 Installation',
      'Cmf.Database.Jobs 10.2.7 Installation',
      'ProcessIntegrationEntry'
    )
)
SELECT
  ServiceName, 
  COUNT(*) AS NumberOfCalls, 
  MIN(ElapsedMilliSeconds) AS MinElapsedMilliSeconds,
  MAX(ElapsedMilliSeconds) AS MaxElapsedMilliSeconds,
  AVG(ElapsedMilliSeconds) AS AvgElapsedMilliSeconds
FROM ServiceData
GROUP BY ServiceName
ORDER BY AvgElapsedMilliSeconds DESC

Example

Service Name	# of Calls	Min Elapsed (ms)	Max Elapsed (ms)	Avg Elapsed (ms)
StartPrepareEquipment	10	3667	5214	4506
ComplexReworkMaterial	64	31	78051	4322
ComplexMoveMaterialsToNextStep	3	562	8286	3467
AbortMaterialProcessing	38	13	30692	1994

Finding a Request to Analyze

Once you’ve identified a service that deserves a closer look, the next step is to dive into specific requests — individual executions that might reveal what’s really going on under the hood.

The script in this section assumes you’ve already selected a target service (and you know the Service Name) and want to examine how it’s behaving in real-world scenarios.

This script returns a summary of recent executions of the selected service, including:

Service History Id
Service Name
Start Date/Time
End Date/Time
Elapsed Time (ms)

This gives you a short list of candidates to dig deeper into — especially those with unusually high execution times.

Bonus: Need to focus only on the most recent runs? Just uncomment the ServiceStartTime WHERE clause in the script to narrow down to the latest executions. Super handy when you’re checking the effect of recent performance improvements.

Script

DECLARE @CriticalServiceTime BIGINT = 5000 -- CONSIDER ONLY EXECUTIONS THAT TOOK LONGER THAN
DECLARE @ServiceName NVARCHAR(MAX) = 'ComplexReworkMaterial'  -- HERE YOU SELECT THE SERVICE YOU WANT TO CHECK
SELECT
  -- Service History Id of the Service Call  
  SH.ServiceHistoryId
  -- Service Name  
  , SH.ServiceName
  -- Start Date/Time of the Service Call    
  , SH.ServiceStartTime
  -- End Date/Time of the Service Call    
  , SH.ServiceEndTime
  -- Total elapsed time of the Service Call in milliseconds
  , DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) AS ElapsedMilliSeconds
FROM dbo.T_ServiceHistory SH
WHERE 
  ServiceName = @ServiceName
  -- AND ServiceStartTime > '2025-04-23 09:07:33.037' -- UNCOMMENT IF YOU WANT TO CHECK LATEST EXECUTIONS ONLY
AND DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) > @CriticalServiceTime
ORDER BY ElapsedMilliSeconds DESC

Example

Service History Id	Service Name	Start Date/Time	End Date/Time	Elapsed (ms)
2504230000010152315	StartPrepareEquipment	2025-04-23 09:30:00.567	2025-04-23 09:30:05.781	5214
2504230000010129953	StartPrepareEquipment	2025-04-23 07:48:43.373	2025-04-23 07:48:48.303	4930
2504160000010112003	StartPrepareEquipment	2025-04-16 06:04:31.553	2025-04-16 06:04:36.276	4723
2504230000010146189	StartPrepareEquipment	2025-04-23 09:04:10.897	2025-04-23 09:04:15.554	4657

Operation-Level Performance Analysis

In this section, we dive into the operation execution time of a specific service request. The goal?
To spot which operations are taking longer than expected — and might be silently sabotaging your performance. 🕵️‍♂️

This script returns a detailed breakdown of all operations involved in a service execution.
While it brings a lot of data, the real gold lies in identifying two key performance villains:

Slow operations: Operations with high elapsed times.
Dead time: Gaps between operations where nothing (at least, nothing tracked) is happening — but time keeps ticking.

But what is dead time? Great question! As you may know, the system only logs operations that write to the database (like Track-In, LogEvent, etc.). So not everything leaves a trace.
Individual loads in collection, uncontrolled lazy loading, expensive queries, inefficient loops, and other non-persistent operations can consume serious time — without ever appearing in your logs.

That gap between one operation ending and the next starting?
Yep, that’s your dead time — sneaky and invisible, but very real.

The script returns a breakdown of operations for the selected Service History ID, including:

Service History Id
Service Name
Operation Name
- The <Service> entry is added by the script to represent the root operation — the entire service scope.
Entity Type Name
Operation Start Time
Operation End Time
Operation History Seq
Parent Operation Sequence (ID of the operation that triggered this one)
LagOperationStartTime (dead time: time between previous op end and current op start)
Operation Elapsed Milliseconds (includes time of any sub-operations)

Think of this as an x-ray of your service request — a step-by-step timeline of what’s really happening under the hood.
Perfect for spotting hidden inefficiencies, mysterious lags, and operations that deserve a little tough love 💪

Script

DECLARE @ServiceHistoryId BIGINT = 2504230000010132939  -- CHANGE THIS FOR THE SERVICE ID TO BE ANALYZED
DECLARE @OverMs BIGINT = 0

;WITH Ops AS (
    SELECT
        SH.ServiceHistoryId,
        SH.ServiceName,
        '<Service>' AS OperationName,
        '-' AS EntityTypeName,
        SH.ServiceStartTime AS OperationStartTime,
        SH.ServiceEndTime AS OperationEndTime,
        0 AS OperationHistorySeq,
        NULL AS ParentOperationSequence,
        DATEDIFF(MILLISECOND, SH.ServiceStartTime, SH.ServiceEndTime) AS OperationElapsedMilliseconds
    FROM dbo.T_ServiceHistory SH
    WHERE SH.ServiceHistoryId = @ServiceHistoryId
    UNION ALL
    SELECT
        DISTINCT
        OH.ServiceHistoryId,
        SH.ServiceName,
        OH.OperationName,
        OH.EntityTypeName,
        OH.OperationStartTime,
        OH.OperationEndTime,
        OH.OperationHistorySeq,
        CASE
            WHEN OH.ParentOperationSequence IS NULL THEN 0
            ELSE OH.ParentOperationSequence
        END AS ParentOperationSequence,
        DATEDIFF(MILLISECOND, OH.OperationStartTime, OH.OperationEndTime) AS OperationElapsedMilliseconds
    FROM dbo.T_OperationHistory OH
    INNER JOIN dbo.T_ServiceHistory SH
        ON OH.ServiceHistoryId = SH.ServiceHistoryId
    WHERE SH.ServiceHistoryId = @ServiceHistoryId
),
OpsWithRowNumber AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY OperationHistorySeq ORDER BY OperationStartTime) AS rn
    FROM Ops
),
SiblingLag AS (
    SELECT
        O.*,
        P.OperationStartTime AS ParentStartTime,
        LAG(O.OperationEndTime) OVER (
            PARTITION BY O.ParentOperationSequence
            ORDER BY O.OperationStartTime, O.OperationHistorySeq
        ) AS PrevSiblingEndTime,
        P.rn AS ParentRowNumber
    FROM OpsWithRowNumber O
    LEFT JOIN OpsWithRowNumber P
        ON O.ParentOperationSequence = P.OperationHistorySeq AND P.rn = 1
),
FinalData AS (
    SELECT
        ServiceHistoryId,
        ServiceName,
        OperationName,
        EntityTypeName,
        OperationStartTime,
        OperationEndTime,
        OperationHistorySeq,
        ParentOperationSequence,
        
        -- Calculate Lag:
        CASE
            WHEN ParentOperationSequence IS NULL THEN NULL
            WHEN PrevSiblingEndTime IS NULL THEN DATEDIFF(MILLISECOND, ParentStartTime, OperationStartTime)  -- First sibling
            ELSE DATEDIFF(MILLISECOND, PrevSiblingEndTime, OperationStartTime)  -- Sibling lag
        END AS LagOperationStartTime,
        OperationElapsedMilliseconds
    FROM SiblingLag
)
SELECT *
FROM FinalData
WHERE 
    OperationElapsedMilliseconds > @OverMs OR LagOperationStartTime > @OverMs
ORDER BY OperationStartTime, OperationHistorySeq;

Example + Deep Dive

💡 Quick note:

The provided data is filtered to show relevant operations.
Some columns have been excluded from the table to reduce clutter and focus on the most relevant performance indicators (ServiceHistoryId, ServiceName, OperationStartTime, OperationEndTime).

OperationName	EntityTypeName	OperationHistorySeq	ParentOperationSequence	LagOperationStartTime	OperationElapsedMilliseconds
`<Service>`	-	0	NULL	NULL	19267
Rework	Material	62	0	80	643
ExecuteAction	CustomExecuteMaterialActions	74	62	0	357
Create	MaterialOffFlow	219	62	275	2
ExecuteAction	CustomExecuteMaterialActions	288	0	0	18529
ExecuteAction	CustomRequestNewMaterialOnMoveToReworkStep	311	288	0	18017
Sort	SortRuleSet	1343	311	570	7
ChangeProductionOrder	Material	17355	311	16087	217
EvaluateAction	CustomSetAttributesOnProductionOrderChange	17356	17355	0	210
ExecuteAction	CustomTrackObjectStateModelHandler	17395	311	0	225
SetMainStateModel	Material	17417	17395	3	221
ChangeProductionOrder	Material	17526	311	0	206
Create	MaterialProductionOrder	17563	17526	202	1
ChangeFlowAndStep	Material	17656	311	0	255
CheckAndPerformFutureActions	Material	17896	311	0	197
EvaluateAction	CustomTriggerRequestOnMaterialSelection	17979	311	0	202
ExecuteAction	CustomChangeMaterialProductionOrderRework	18228	288	0	241
ChangeProductionOrder	Material	18272	18228	226	9
EvaluateAction	CustomDeleteQualityCertificateOnMoveToRework	18503	288	220	0

Deep Dive

From the example above, the following code/logic areas should be reviewed:

CustomExecuteMaterialActions DEE Action (Operation 288)

Consuming over 96% of the total execution time (~18.5 out of 19 seconds).
This operation clearly dominates the execution timeline. We need to dig deeper and understand where this time is being spent.
- 💡 Reminder: The elapsed time includes all sub-operation times — the slowness might be hiding in nested calls.
Operation 288 triggers Operation 311, which is another DEE Action: CustomRequestNewMaterialOnMoveToReworkStep
- Inside 311, we hit a major lag spike: 16,087 ms — nearly ~84% of the total execution time!
- We’ve got a lead: the DEE is a prime suspect 🕵️
- Analyze the DEE thoroughly — trace what happens from the start until the ChangeProductionOrder operation.
- Then? Optimize it, and re-run the analysis to validate improvements. 🔁

Rework Operation (Operation 62)

Responsible for around 3% of the total execution time (~1 second out of 19 seconds)
Inside this:
- CustomExecuteMaterialActions (Operation 74) takes over 50% of Rework’s time (~0.3 seconds).
  - No sub-operations are recorded — this could mean inefficient data access, excessive loops, or invisible logic bottlenecks.
- Create MaterialOffFlow (Operation 219) took only 2 ms — but:
  - There’s a dead time of 275 ms between op 74 and 219.
  - That’s idle time — probably burned on non-database-writing logic like expensive loads, loops, or waits.

TL;DR

You’re staring at two hot zones: Operation 288 and Operation 74
Both are DEE actions — mysterious black boxes until opened
One’s taking too much time (288), the other seems to do too little (74)
Time to investigate the DEE logic, optimize the flows, and run the scripts again 🛠️✨

Final Takeaways

🔍 Start with the biggest bottleneck
- Focus on the slowest operation first — it might not just be slow on its own, but could be dragging other operations down by hogging resources or causing delays further down the chain.
🧾 Track your improvements
- Performance tuning is often a time sink. Make sure you’re capturing before-and-after metrics so you can see what’s working (and prove that it is). It’s your own version of a “glow-up” 📈.
📣 Share what you learn
- If you stumble upon a common pitfall or a sneaky performance killer, document it! Blog it! Help your team (and your future self) avoid déjà vu slowdowns.
📋 Don’t sacrifice functionality for speed
- A fast service that skips steps is just a glorified return true;. Always double-check that your fixes don’t break what the system is supposed to be doing.
🛠️ Use the scripts — they’re your friend
- Dive in, tweak, explore. If you get stuck or something doesn’t make sense, don’t suffer in silence — shoot me a message and let’s untangle it together. 🧵😄

Author

Hello, my name is César Meira 👋

I’ve been working for some years at Critical Manufacturing. You can check me at LinkedIn

Service Performance: Analyzing through SQL

Overview and Motivation#

Finding Problematic Services#

Script#

Example#

Finding a Request to Analyze#

Script#

Example#

Operation-Level Performance Analysis#

Script#

Example + Deep Dive#

Final Takeaways#

Author#

Hello, my name is César Meira 👋#

Overview and Motivation

Finding Problematic Services

Script

Example

Finding a Request to Analyze

Script

Example

Operation-Level Performance Analysis

Script

Example + Deep Dive

Final Takeaways

Author

Hello, my name is César Meira 👋