Issue - DB Server CPU Utilization reaching 100%

OIM Version - 7.1

We have a scripted data import that is triggered to populate few custom tables. The input data for this comes from a simple select query which has a join of 3 OIM tables and result of that query will be used to performs insert update delete operations in ESetHasEntitlement table. Since our system has high amount data, we trigger processes in batches of 100 and for this particular import step all the 100 processes execute at the same time. When things are working fine, the import process for one batch takes about 4-5 minutes to complete.

 

We have encountered recently that sometimes the job gets loaded and is stuck in the queue showing "Loading records from source system..."(source is the above mentioned select query). When this happens the DB server CPU goes to 100% utilization and none of the other jobs get processed. When we check in the DB server, none of the queries are causing any blocking but the CPU time is very high. We have also noticed that the 'Reads' attribute in sp_whoisactive keeps increasing to a very large number even when nothing is being processed. We have tried introducing NO LOCK in the select query that is used as input but haven't had any success with that either. 

 

When we have this issue, in order to sort the queue we have to restart the services in the job servers and make the job freeze. Also there hasn't been any particular pattern in the occurrence of this issue. We have tried monitoring the other jobs that run simultaneously, check for DB issues but so far haven't been able to identify why or when this is happening. Some days everything processes successfully and some days it pushes the CPU to 100. 

 

Please suggest if there is anything else that we can potentially troubleshoot to identify the cause or if there is a fix available. 

  • You should check which session is consuming excessive CPU time. Analyze the actual execution plan of the queries run by the session.
    Possible reasons might be CPU, Memory or IO bandwidth shortage by excessive parallelisation or bad execution plans due to bad or outdated statistics.
    Bad query design like usage of unindexed columns in where clauses or join conditions are another possible reason.