As services grow in complexity, pinpointing performance issues becomes more challenging. This post shows how to analyze the behavior of your service by querying the database. We’ll explore the SQL scripts used, interpret the results, and understand what should be reviewed for possible enhancements.
Overview and Motivation
In a fully extensible system like ours — with numerous extension points and layers of custom logic — it’s not uncommon for performance issues to sneak in through one of many possible code paths. When customer-specific logic is added on top, pinpointing the exact line (or feature) responsible for slowness can quickly become a daunting task.
📝 While this SQL-based approach helps identify heavy dead time and performance gaps, it’s only an alternative approach instead of using observability. With observability, you gain access to a full execution timeline, including failed requests (which SQL doesn’t capture) and read operations (loads) that often go unnoticed but can hurt performance. In other words, observability provides a complete, real-time view of your system’s behavior, allowing you to identify and address issues more effectively.
Finding Problematic Services
This is your starting point for identifying slow or potentially problematic services.
If you’re looking to get a high-level overview of your system’s performance — and quickly spot which services might be worth optimizing — this script is for you.
The script analyzes service execution times and highlights the ones that may require further investigation. Naturally, some services are inherently slow due to the complexity of their operations, and cluttering your report with those isn’t helpful. That’s why the script allows you to exclude specific services by adding them to an ignore list.
The script outputs a summary table with the following metrics:
- Service Name
- Total Number of Calls
- Minimum Execution Time (ms)
- Maximum Execution Time (ms)
- Average Execution Time (ms)
By default, the script analyzes service calls from the last 24 hours, but you can easily adjust the timeframe to suit your needs.
Reminder: The script is a base for your analysis feel free to adapt it to your needs.
Script
DECLARE @From DATETIME = DATEADD(DD, -1, CAST(GETDATE() as date)) -- ANALYZE ONLY LAST DAY
;WITH ServiceData AS(
SELECT
-- Service Name
SH.ServiceName
-- Start Date/Time of the Service Call
, SH.ServiceStartTime
-- End Date/Time of the Service Call
, SH.ServiceEndTime
-- Total elapsed time of the Service Call in milliseconds
, DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) AS ElapsedMilliSeconds
FROM dbo.T_ServiceHistory SH
WHERE
ServiceStartTime > @FROM
AND
-- THIS IS THE IGNORE LIST
-- ADD ANY OTHER SERVICE THAT IS NOT RELEVANT FOR THE ANALYSIS
ServiceName NOT IN (
'ReseedIdentityColumns',
'Cmf.FullBackup.Ods 10.2.7 Installation',
'Cmf.FullBackup.Dwh 10.2.7 Installation',
'Cmf.Database.Jobs 10.2.7 Installation',
'ProcessIntegrationEntry'
)
)
SELECT
ServiceName,
COUNT(*) AS NumberOfCalls,
MIN(ElapsedMilliSeconds) AS MinElapsedMilliSeconds,
MAX(ElapsedMilliSeconds) AS MaxElapsedMilliSeconds,
AVG(ElapsedMilliSeconds) AS AvgElapsedMilliSeconds
FROM ServiceData
GROUP BY ServiceName
ORDER BY AvgElapsedMilliSeconds DESC
Example
Service Name | # of Calls | Min Elapsed (ms) | Max Elapsed (ms) | Avg Elapsed (ms) |
---|---|---|---|---|
StartPrepareEquipment | 10 | 3667 | 5214 | 4506 |
ComplexReworkMaterial | 64 | 31 | 78051 | 4322 |
ComplexMoveMaterialsToNextStep | 3 | 562 | 8286 | 3467 |
AbortMaterialProcessing | 38 | 13 | 30692 | 1994 |
Finding a Request to Analyze
Once you’ve identified a service that deserves a closer look, the next step is to dive into specific requests — individual executions that might reveal what’s really going on under the hood.
The script in this section assumes you’ve already selected a target service (and you know the Service Name) and want to examine how it’s behaving in real-world scenarios.
This script returns a summary of recent executions of the selected service, including:
- Service History Id
- Service Name
- Start Date/Time
- End Date/Time
- Elapsed Time (ms)
This gives you a short list of candidates to dig deeper into — especially those with unusually high execution times.
Bonus: Need to focus only on the most recent runs? Just uncomment the ServiceStartTime WHERE clause in the script to narrow down to the latest executions. Super handy when you’re checking the effect of recent performance improvements.
Script
DECLARE @CriticalServiceTime BIGINT = 5000 -- CONSIDER ONLY EXECUTIONS THAT TOOK LONGER THAN
DECLARE @ServiceName NVARCHAR(MAX) = 'ComplexReworkMaterial' -- HERE YOU SELECT THE SERVICE YOU WANT TO CHECK
SELECT
-- Service History Id of the Service Call
SH.ServiceHistoryId
-- Service Name
, SH.ServiceName
-- Start Date/Time of the Service Call
, SH.ServiceStartTime
-- End Date/Time of the Service Call
, SH.ServiceEndTime
-- Total elapsed time of the Service Call in milliseconds
, DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) AS ElapsedMilliSeconds
FROM dbo.T_ServiceHistory SH
WHERE
ServiceName = @ServiceName
-- AND ServiceStartTime > '2025-04-23 09:07:33.037' -- UNCOMMENT IF YOU WANT TO CHECK LATEST EXECUTIONS ONLY
AND DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) > @CriticalServiceTime
ORDER BY ElapsedMilliSeconds DESC
Example
Service History Id | Service Name | Start Date/Time | End Date/Time | Elapsed (ms) |
---|---|---|---|---|
2504230000010152315 | StartPrepareEquipment | 2025-04-23 09:30:00.567 | 2025-04-23 09:30:05.781 | 5214 |
2504230000010129953 | StartPrepareEquipment | 2025-04-23 07:48:43.373 | 2025-04-23 07:48:48.303 | 4930 |
2504160000010112003 | StartPrepareEquipment | 2025-04-16 06:04:31.553 | 2025-04-16 06:04:36.276 | 4723 |
2504230000010146189 | StartPrepareEquipment | 2025-04-23 09:04:10.897 | 2025-04-23 09:04:15.554 | 4657 |
Operation-Level Performance Analysis
In this section, we dive into the operation execution time of a specific service request. The goal?
To spot which operations are taking longer than expected — and might be silently sabotaging your performance. 🕵️♂️
This script returns a detailed breakdown of all operations involved in a service execution.
While it brings a lot of data, the real gold lies in identifying two key performance villains:
- Slow operations: Operations with high elapsed times.
- Dead time: Gaps between operations where nothing (at least, nothing tracked) is happening — but time keeps ticking.
But what is dead time? Great question!
As you may know, the system only logs operations that write to the database (like Track-In
, LogEvent
, etc.).
So not everything leaves a trace.
Individual loads in collection, uncontrolled lazy loading, expensive queries, inefficient loops, and other non-persistent operations can consume serious time — without ever appearing in your logs.
That gap between one operation ending and the next starting?
Yep, that’s your dead time — sneaky and invisible, but very real.
The script returns a breakdown of operations for the selected Service History ID, including:
Service History Id
Service Name
Operation Name
- The
<Service>
entry is added by the script to represent the root operation — the entire service scope.
- The
Entity Type Name
Operation Start Time
Operation End Time
Operation History Seq
Parent Operation Sequence
(ID of the operation that triggered this one)LagOperationStartTime
(dead time: time between previous op end and current op start)Operation Elapsed Milliseconds
(includes time of any sub-operations)
Think of this as an x-ray of your service request — a step-by-step timeline of what’s really happening under the hood.
Perfect for spotting hidden inefficiencies, mysterious lags, and operations that deserve a little tough love 💪
Script
DECLARE @ServiceHistoryId BIGINT = 2504230000010132939 -- CHANGE THIS FOR THE SERVICE ID TO BE ANALYZED
DECLARE @OverMs BIGINT = 0
;WITH Ops AS (
SELECT
SH.ServiceHistoryId,
SH.ServiceName,
'<Service>' AS OperationName,
'-' AS EntityTypeName,
SH.ServiceStartTime AS OperationStartTime,
SH.ServiceEndTime AS OperationEndTime,
0 AS OperationHistorySeq,
NULL AS ParentOperationSequence,
DATEDIFF(MILLISECOND, SH.ServiceStartTime, SH.ServiceEndTime) AS OperationElapsedMilliseconds
FROM dbo.T_ServiceHistory SH
WHERE SH.ServiceHistoryId = @ServiceHistoryId
UNION ALL
SELECT
DISTINCT
OH.ServiceHistoryId,
SH.ServiceName,
OH.OperationName,
OH.EntityTypeName,
OH.OperationStartTime,
OH.OperationEndTime,
OH.OperationHistorySeq,
CASE
WHEN OH.ParentOperationSequence IS NULL THEN 0
ELSE OH.ParentOperationSequence
END AS ParentOperationSequence,
DATEDIFF(MILLISECOND, OH.OperationStartTime, OH.OperationEndTime) AS OperationElapsedMilliseconds
FROM dbo.T_OperationHistory OH
INNER JOIN dbo.T_ServiceHistory SH
ON OH.ServiceHistoryId = SH.ServiceHistoryId
WHERE SH.ServiceHistoryId = @ServiceHistoryId
),
OpsWithRowNumber AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY OperationHistorySeq ORDER BY OperationStartTime) AS rn
FROM Ops
),
SiblingLag AS (
SELECT
O.*,
P.OperationStartTime AS ParentStartTime,
LAG(O.OperationEndTime) OVER (
PARTITION BY O.ParentOperationSequence
ORDER BY O.OperationStartTime, O.OperationHistorySeq
) AS PrevSiblingEndTime,
P.rn AS ParentRowNumber
FROM OpsWithRowNumber O
LEFT JOIN OpsWithRowNumber P
ON O.ParentOperationSequence = P.OperationHistorySeq AND P.rn = 1
),
FinalData AS (
SELECT
ServiceHistoryId,
ServiceName,
OperationName,
EntityTypeName,
OperationStartTime,
OperationEndTime,
OperationHistorySeq,
ParentOperationSequence,
-- Calculate Lag:
CASE
WHEN ParentOperationSequence IS NULL THEN NULL
WHEN PrevSiblingEndTime IS NULL THEN DATEDIFF(MILLISECOND, ParentStartTime, OperationStartTime) -- First sibling
ELSE DATEDIFF(MILLISECOND, PrevSiblingEndTime, OperationStartTime) -- Sibling lag
END AS LagOperationStartTime,
OperationElapsedMilliseconds
FROM SiblingLag
)
SELECT *
FROM FinalData
WHERE
OperationElapsedMilliseconds > @OverMs OR LagOperationStartTime > @OverMs
ORDER BY OperationStartTime, OperationHistorySeq;
Example + Deep Dive
💡 Quick note:
- The provided data is filtered to show relevant operations.
- Some columns have been excluded from the table to reduce clutter and focus on the most relevant performance indicators (ServiceHistoryId, ServiceName, OperationStartTime, OperationEndTime).
OperationName | EntityTypeName | OperationHistorySeq | ParentOperationSequence | LagOperationStartTime | OperationElapsedMilliseconds |
---|---|---|---|---|---|
<Service> | - | 0 | NULL | NULL | 19267 |
Rework | Material | 62 | 0 | 80 | 643 |
ExecuteAction | CustomExecuteMaterialActions | 74 | 62 | 0 | 357 |
Create | MaterialOffFlow | 219 | 62 | 275 | 2 |
ExecuteAction | CustomExecuteMaterialActions | 288 | 0 | 0 | 18529 |
ExecuteAction | CustomRequestNewMaterialOnMoveToReworkStep | 311 | 288 | 0 | 18017 |
Sort | SortRuleSet | 1343 | 311 | 570 | 7 |
ChangeProductionOrder | Material | 17355 | 311 | 16087 | 217 |
EvaluateAction | CustomSetAttributesOnProductionOrderChange | 17356 | 17355 | 0 | 210 |
ExecuteAction | CustomTrackObjectStateModelHandler | 17395 | 311 | 0 | 225 |
SetMainStateModel | Material | 17417 | 17395 | 3 | 221 |
ChangeProductionOrder | Material | 17526 | 311 | 0 | 206 |
Create | MaterialProductionOrder | 17563 | 17526 | 202 | 1 |
ChangeFlowAndStep | Material | 17656 | 311 | 0 | 255 |
CheckAndPerformFutureActions | Material | 17896 | 311 | 0 | 197 |
EvaluateAction | CustomTriggerRequestOnMaterialSelection | 17979 | 311 | 0 | 202 |
ExecuteAction | CustomChangeMaterialProductionOrderRework | 18228 | 288 | 0 | 241 |
ChangeProductionOrder | Material | 18272 | 18228 | 226 | 9 |
EvaluateAction | CustomDeleteQualityCertificateOnMoveToRework | 18503 | 288 | 220 | 0 |
Deep Dive
From the example above, the following code/logic areas should be reviewed:
CustomExecuteMaterialActions DEE Action (Operation 288)
Consuming over 96% of the total execution time (~18.5 out of 19 seconds).
This operation clearly dominates the execution timeline. We need to dig deeper and understand where this time is being spent.
- 💡 Reminder: The elapsed time includes all sub-operation times — the slowness might be hiding in nested calls.
Operation 288 triggers Operation 311, which is another DEE Action: CustomRequestNewMaterialOnMoveToReworkStep
- Inside 311, we hit a major lag spike:
16,087 ms
— nearly ~84% of the total execution time! - We’ve got a lead: the DEE is a prime suspect 🕵️
- Analyze the DEE thoroughly — trace what happens from the start until the ChangeProductionOrder operation.
- Then? Optimize it, and re-run the analysis to validate improvements. 🔁
- Inside 311, we hit a major lag spike:
Rework Operation (Operation 62)
Responsible for around 3% of the total execution time (~1 second out of 19 seconds)
Inside this:
- CustomExecuteMaterialActions (Operation 74) takes over 50% of Rework’s time (~0.3 seconds).
- No sub-operations are recorded — this could mean inefficient data access, excessive loops, or invisible logic bottlenecks.
- Create MaterialOffFlow (Operation 219) took only 2 ms — but:
- There’s a dead time of
275 ms
between op 74 and 219. - That’s idle time — probably burned on non-database-writing logic like expensive loads, loops, or waits.
- There’s a dead time of
- CustomExecuteMaterialActions (Operation 74) takes over 50% of Rework’s time (~0.3 seconds).
TL;DR
- You’re staring at two hot zones: Operation 288 and Operation 74
- Both are DEE actions — mysterious black boxes until opened
- One’s taking too much time (288), the other seems to do too little (74)
- Time to investigate the DEE logic, optimize the flows, and run the scripts again 🛠️✨
Final Takeaways
- 🔍 Start with the biggest bottleneck
- Focus on the slowest operation first — it might not just be slow on its own, but could be dragging other operations down by hogging resources or causing delays further down the chain.
- 🧾 Track your improvements
- Performance tuning is often a time sink. Make sure you’re capturing before-and-after metrics so you can see what’s working (and prove that it is). It’s your own version of a “glow-up” 📈.
- 📣 Share what you learn
- If you stumble upon a common pitfall or a sneaky performance killer, document it! Blog it! Help your team (and your future self) avoid déjà vu slowdowns.
- 📋 Don’t sacrifice functionality for speed
- A fast service that skips steps is just a glorified
return true;
. Always double-check that your fixes don’t break what the system is supposed to be doing.
- A fast service that skips steps is just a glorified
- 🛠️ Use the scripts — they’re your friend
- Dive in, tweak, explore. If you get stuck or something doesn’t make sense, don’t suffer in silence — shoot me a message and let’s untangle it together. 🧵😄
Author
Hello, my name is César Meira 👋
I’ve been working for some years at Critical Manufacturing.
You can check me at
LinkedIn