Home » Blog

Service Performance: Analyzing through SQL

 · 10 min · César Meira

Overview and SQL scripts for diagnosing service performance issues in MES systems

Analysis Performance

As services grow in complexity, pinpointing performance issues becomes more challenging. This post shows how to analyze the behavior of your service by querying the database. We’ll explore the SQL scripts used, interpret the results, and understand what should be reviewed for possible enhancements.

Overview and Motivation

In a fully extensible system like ours — with numerous extension points and layers of custom logic — it’s not uncommon for performance issues to sneak in through one of many possible code paths. When customer-specific logic is added on top, pinpointing the exact line (or feature) responsible for slowness can quickly become a daunting task.

📝 While this SQL-based approach helps identify heavy dead time and performance gaps, it’s only an alternative approach instead of using observability. With observability, you gain access to a full execution timeline, including failed requests (which SQL doesn’t capture) and read operations (loads) that often go unnoticed but can hurt performance. In other words, observability provides a complete, real-time view of your system’s behavior, allowing you to identify and address issues more effectively.

Finding Problematic Services

This is your starting point for identifying slow or potentially problematic services.

If you’re looking to get a high-level overview of your system’s performance — and quickly spot which services might be worth optimizing — this script is for you.

The script analyzes service execution times and highlights the ones that may require further investigation. Naturally, some services are inherently slow due to the complexity of their operations, and cluttering your report with those isn’t helpful. That’s why the script allows you to exclude specific services by adding them to an ignore list.

The script outputs a summary table with the following metrics:

  • Service Name
  • Total Number of Calls
  • Minimum Execution Time (ms)
  • Maximum Execution Time (ms)
  • Average Execution Time (ms)

By default, the script analyzes service calls from the last 24 hours, but you can easily adjust the timeframe to suit your needs.

Reminder: The script is a base for your analysis feel free to adapt it to your needs.

Script

DECLARE @From DATETIME = DATEADD(DD, -1, CAST(GETDATE() as date))   -- ANALYZE ONLY LAST DAY
;WITH ServiceData AS(
  SELECT
    -- Service Name  
    SH.ServiceName
    -- Start Date/Time of the Service Call    
    , SH.ServiceStartTime
    -- End Date/Time of the Service Call    
    , SH.ServiceEndTime
    -- Total elapsed time of the Service Call in milliseconds
    , DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) AS ElapsedMilliSeconds
  FROM dbo.T_ServiceHistory SH
  WHERE 
    ServiceStartTime > @FROM
    AND
    -- THIS IS THE IGNORE LIST
    --  ADD ANY OTHER SERVICE THAT IS NOT RELEVANT FOR THE ANALYSIS
    ServiceName NOT IN (
      'ReseedIdentityColumns',
      'Cmf.FullBackup.Ods 10.2.7 Installation',
      'Cmf.FullBackup.Dwh 10.2.7 Installation',
      'Cmf.Database.Jobs 10.2.7 Installation',
      'ProcessIntegrationEntry'
    )
)
SELECT
  ServiceName, 
  COUNT(*) AS NumberOfCalls, 
  MIN(ElapsedMilliSeconds) AS MinElapsedMilliSeconds,
  MAX(ElapsedMilliSeconds) AS MaxElapsedMilliSeconds,
  AVG(ElapsedMilliSeconds) AS AvgElapsedMilliSeconds
FROM ServiceData
GROUP BY ServiceName
ORDER BY AvgElapsedMilliSeconds DESC

Example

Service Name# of CallsMin Elapsed (ms)Max Elapsed (ms)Avg Elapsed (ms)
StartPrepareEquipment10366752144506
ComplexReworkMaterial6431780514322
ComplexMoveMaterialsToNextStep356282863467
AbortMaterialProcessing3813306921994

Finding a Request to Analyze

Once you’ve identified a service that deserves a closer look, the next step is to dive into specific requests — individual executions that might reveal what’s really going on under the hood.

The script in this section assumes you’ve already selected a target service (and you know the Service Name) and want to examine how it’s behaving in real-world scenarios.

This script returns a summary of recent executions of the selected service, including:

  • Service History Id
  • Service Name
  • Start Date/Time
  • End Date/Time
  • Elapsed Time (ms)

This gives you a short list of candidates to dig deeper into — especially those with unusually high execution times.

Bonus: Need to focus only on the most recent runs? Just uncomment the ServiceStartTime WHERE clause in the script to narrow down to the latest executions. Super handy when you’re checking the effect of recent performance improvements.

Script

DECLARE @CriticalServiceTime BIGINT = 5000 -- CONSIDER ONLY EXECUTIONS THAT TOOK LONGER THAN
DECLARE @ServiceName NVARCHAR(MAX) = 'ComplexReworkMaterial'  -- HERE YOU SELECT THE SERVICE YOU WANT TO CHECK
SELECT
  -- Service History Id of the Service Call  
  SH.ServiceHistoryId
  -- Service Name  
  , SH.ServiceName
  -- Start Date/Time of the Service Call    
  , SH.ServiceStartTime
  -- End Date/Time of the Service Call    
  , SH.ServiceEndTime
  -- Total elapsed time of the Service Call in milliseconds
  , DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) AS ElapsedMilliSeconds
FROM dbo.T_ServiceHistory SH
WHERE 
  ServiceName = @ServiceName
  -- AND ServiceStartTime > '2025-04-23 09:07:33.037' -- UNCOMMENT IF YOU WANT TO CHECK LATEST EXECUTIONS ONLY
AND DATEDIFF(MILLISECOND, ServiceStartTime, ServiceEndTime) > @CriticalServiceTime
ORDER BY ElapsedMilliSeconds DESC

Example

Service History IdService NameStart Date/TimeEnd Date/TimeElapsed (ms)
2504230000010152315StartPrepareEquipment2025-04-23 09:30:00.5672025-04-23 09:30:05.7815214
2504230000010129953StartPrepareEquipment2025-04-23 07:48:43.3732025-04-23 07:48:48.3034930
2504160000010112003StartPrepareEquipment2025-04-16 06:04:31.5532025-04-16 06:04:36.2764723
2504230000010146189StartPrepareEquipment2025-04-23 09:04:10.8972025-04-23 09:04:15.5544657

Operation-Level Performance Analysis

In this section, we dive into the operation execution time of a specific service request. The goal?
To spot which operations are taking longer than expected — and might be silently sabotaging your performance. 🕵️‍♂️

This script returns a detailed breakdown of all operations involved in a service execution.
While it brings a lot of data, the real gold lies in identifying two key performance villains:

  • Slow operations: Operations with high elapsed times.
  • Dead time: Gaps between operations where nothing (at least, nothing tracked) is happening — but time keeps ticking.

But what is dead time? Great question! As you may know, the system only logs operations that write to the database (like Track-In, LogEvent, etc.). So not everything leaves a trace.
Individual loads in collection, uncontrolled lazy loading, expensive queries, inefficient loops, and other non-persistent operations can consume serious time — without ever appearing in your logs.

That gap between one operation ending and the next starting?
Yep, that’s your dead time — sneaky and invisible, but very real.

The script returns a breakdown of operations for the selected Service History ID, including:

  • Service History Id
  • Service Name
  • Operation Name
    • The <Service> entry is added by the script to represent the root operation — the entire service scope.
  • Entity Type Name
  • Operation Start Time
  • Operation End Time
  • Operation History Seq
  • Parent Operation Sequence (ID of the operation that triggered this one)
  • LagOperationStartTime (dead time: time between previous op end and current op start)
  • Operation Elapsed Milliseconds (includes time of any sub-operations)

Think of this as an x-ray of your service request — a step-by-step timeline of what’s really happening under the hood.
Perfect for spotting hidden inefficiencies, mysterious lags, and operations that deserve a little tough love 💪

Script

DECLARE @ServiceHistoryId BIGINT = 2504230000010132939  -- CHANGE THIS FOR THE SERVICE ID TO BE ANALYZED
DECLARE @OverMs BIGINT = 0

;WITH Ops AS (
    SELECT
        SH.ServiceHistoryId,
        SH.ServiceName,
        '<Service>' AS OperationName,
        '-' AS EntityTypeName,
        SH.ServiceStartTime AS OperationStartTime,
        SH.ServiceEndTime AS OperationEndTime,
        0 AS OperationHistorySeq,
        NULL AS ParentOperationSequence,
        DATEDIFF(MILLISECOND, SH.ServiceStartTime, SH.ServiceEndTime) AS OperationElapsedMilliseconds
    FROM dbo.T_ServiceHistory SH
    WHERE SH.ServiceHistoryId = @ServiceHistoryId
    UNION ALL
    SELECT
        DISTINCT
        OH.ServiceHistoryId,
        SH.ServiceName,
        OH.OperationName,
        OH.EntityTypeName,
        OH.OperationStartTime,
        OH.OperationEndTime,
        OH.OperationHistorySeq,
        CASE
            WHEN OH.ParentOperationSequence IS NULL THEN 0
            ELSE OH.ParentOperationSequence
        END AS ParentOperationSequence,
        DATEDIFF(MILLISECOND, OH.OperationStartTime, OH.OperationEndTime) AS OperationElapsedMilliseconds
    FROM dbo.T_OperationHistory OH
    INNER JOIN dbo.T_ServiceHistory SH
        ON OH.ServiceHistoryId = SH.ServiceHistoryId
    WHERE SH.ServiceHistoryId = @ServiceHistoryId
),
OpsWithRowNumber AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY OperationHistorySeq ORDER BY OperationStartTime) AS rn
    FROM Ops
),
SiblingLag AS (
    SELECT
        O.*,
        P.OperationStartTime AS ParentStartTime,
        LAG(O.OperationEndTime) OVER (
            PARTITION BY O.ParentOperationSequence
            ORDER BY O.OperationStartTime, O.OperationHistorySeq
        ) AS PrevSiblingEndTime,
        P.rn AS ParentRowNumber
    FROM OpsWithRowNumber O
    LEFT JOIN OpsWithRowNumber P
        ON O.ParentOperationSequence = P.OperationHistorySeq AND P.rn = 1
),
FinalData AS (
    SELECT
        ServiceHistoryId,
        ServiceName,
        OperationName,
        EntityTypeName,
        OperationStartTime,
        OperationEndTime,
        OperationHistorySeq,
        ParentOperationSequence,
        
        -- Calculate Lag:
        CASE
            WHEN ParentOperationSequence IS NULL THEN NULL
            WHEN PrevSiblingEndTime IS NULL THEN DATEDIFF(MILLISECOND, ParentStartTime, OperationStartTime)  -- First sibling
            ELSE DATEDIFF(MILLISECOND, PrevSiblingEndTime, OperationStartTime)  -- Sibling lag
        END AS LagOperationStartTime,
        OperationElapsedMilliseconds
    FROM SiblingLag
)
SELECT *
FROM FinalData
WHERE 
    OperationElapsedMilliseconds > @OverMs OR LagOperationStartTime > @OverMs
ORDER BY OperationStartTime, OperationHistorySeq;

Example + Deep Dive

💡 Quick note:

  • The provided data is filtered to show relevant operations.
  • Some columns have been excluded from the table to reduce clutter and focus on the most relevant performance indicators (ServiceHistoryId, ServiceName, OperationStartTime, OperationEndTime).
OperationNameEntityTypeNameOperationHistorySeqParentOperationSequenceLagOperationStartTimeOperationElapsedMilliseconds
<Service>-0NULLNULL19267
ReworkMaterial62080643
ExecuteActionCustomExecuteMaterialActions74620357
CreateMaterialOffFlow219622752
ExecuteActionCustomExecuteMaterialActions2880018529
ExecuteActionCustomRequestNewMaterialOnMoveToReworkStep311288018017
SortSortRuleSet13433115707
ChangeProductionOrderMaterial1735531116087217
EvaluateActionCustomSetAttributesOnProductionOrderChange17356173550210
ExecuteActionCustomTrackObjectStateModelHandler173953110225
SetMainStateModelMaterial17417173953221
ChangeProductionOrderMaterial175263110206
CreateMaterialProductionOrder17563175262021
ChangeFlowAndStepMaterial176563110255
CheckAndPerformFutureActionsMaterial178963110197
EvaluateActionCustomTriggerRequestOnMaterialSelection179793110202
ExecuteActionCustomChangeMaterialProductionOrderRework182282880241
ChangeProductionOrderMaterial18272182282269
EvaluateActionCustomDeleteQualityCertificateOnMoveToRework185032882200

Deep Dive

From the example above, the following code/logic areas should be reviewed:

CustomExecuteMaterialActions DEE Action (Operation 288)

  • Consuming over 96% of the total execution time (~18.5 out of 19 seconds).

  • This operation clearly dominates the execution timeline. We need to dig deeper and understand where this time is being spent.

    • 💡 Reminder: The elapsed time includes all sub-operation times — the slowness might be hiding in nested calls.
  • Operation 288 triggers Operation 311, which is another DEE Action: CustomRequestNewMaterialOnMoveToReworkStep

    • Inside 311, we hit a major lag spike: 16,087 ms — nearly ~84% of the total execution time!
    • We’ve got a lead: the DEE is a prime suspect 🕵️
    • Analyze the DEE thoroughly — trace what happens from the start until the ChangeProductionOrder operation.
    • Then? Optimize it, and re-run the analysis to validate improvements. 🔁

Rework Operation (Operation 62)

  • Responsible for around 3% of the total execution time (~1 second out of 19 seconds)

  • Inside this:

    • CustomExecuteMaterialActions (Operation 74) takes over 50% of Rework’s time (~0.3 seconds).
      • No sub-operations are recorded — this could mean inefficient data access, excessive loops, or invisible logic bottlenecks.
    • Create MaterialOffFlow (Operation 219) took only 2 ms — but:
      • There’s a dead time of 275 ms between op 74 and 219.
      • That’s idle time — probably burned on non-database-writing logic like expensive loads, loops, or waits.

TL;DR

  • You’re staring at two hot zones: Operation 288 and Operation 74
  • Both are DEE actions — mysterious black boxes until opened
  • One’s taking too much time (288), the other seems to do too little (74)
  • Time to investigate the DEE logic, optimize the flows, and run the scripts again 🛠️✨

Final Takeaways

  • 🔍 Start with the biggest bottleneck
    • Focus on the slowest operation first — it might not just be slow on its own, but could be dragging other operations down by hogging resources or causing delays further down the chain.
  • 🧾 Track your improvements
    • Performance tuning is often a time sink. Make sure you’re capturing before-and-after metrics so you can see what’s working (and prove that it is). It’s your own version of a “glow-up” 📈.
  • 📣 Share what you learn
    • If you stumble upon a common pitfall or a sneaky performance killer, document it! Blog it! Help your team (and your future self) avoid déjà vu slowdowns.
  • 📋 Don’t sacrifice functionality for speed
    • A fast service that skips steps is just a glorified return true;. Always double-check that your fixes don’t break what the system is supposed to be doing.
  • 🛠️ Use the scripts — they’re your friend
    • Dive in, tweak, explore. If you get stuck or something doesn’t make sense, don’t suffer in silence — shoot me a message and let’s untangle it together. 🧵😄






Author

Hello, my name is César Meira 👋

I’ve been working for some years at Critical Manufacturing. You can check me at Linkedin LinkedIn

César Meira
Senior Software Engineer