Exadata - Oracle DB Performing Slowly due to High I/O

Issue #:

People are complaining that the DB is performing slowly, and we are observing high I/O. Earlier today, we could see a few sets of functions related to the application module load causing the slowness. Can you please look into this further and provide your guidance?

 Analysis #1:

From the AWR report below, we can see the cells getting exhausted by full table scans. Can you please provide the cell statistics (load) between 1 PM and 2 PM (today and yesterday)?
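The cell-side load itself has to come from the storage servers (CellCLI / ExaWatcher), but a rough DB-side view of how much time went into smart scans in that window can be pulled from the ASH history; something along these lines (a sketch only; the time window needs adjusting, and each sampled row represents roughly 10 seconds of session time):

-- Approximate time spent on "cell smart table scan" between 1 PM and 2 PM today
-- (shift the window by one day to compare with yesterday)
SELECT instance_number,
       COUNT(*)      AS ash_samples,
       COUNT(*) * 10 AS approx_seconds_in_wait
  FROM dba_hist_active_sess_history
 WHERE event = 'cell smart table scan'
   AND sample_time >= TRUNC(SYSDATE) + 13/24
   AND sample_time <  TRUNC(SYSDATE) + 14/24
 GROUP BY instance_number
 ORDER BY instance_number;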

 @App lead:

We cannot blame job load every time just because more data comes in; we need to go into detail on where exactly the problem was caused.

Over the last week I have seen a maximum load of 12, but today it went up to 15, and both instances are above 10.


This time User I/O is high and the timeout percentage is 47% (not a good sign).



Wait events from the AWR report (Total(s)/Avg(ms) are the wait time; Avg/Min/Max/Std Dev/Cnt are the summary average wait time in ms):

I#  Wait Class  Event                              Waits       %Timeouts  Total(s)    Avg(ms)  %DB time  Avg    Min    Max    Std Dev  Cnt
--  ----------  ---------------------------------  ----------  ---------  ----------  -------  --------  -----  -----  -----  -------  ---
*   User I/O    cell smart table scan              85,678,559  47.82      251,734.83  2.94     65.15     2.94   2.93   2.95   0.01     2
*   User I/O    cell single block physical read    75,291,249  0.00       45,166.16   0.60     11.69     0.60   0.60   0.60   0.00     2
*               DB CPU                                                    33,067.17            8.56                                    2
*   User I/O    cell multiblock physical read      2,605,214   0.00       8,380.55    3.22     2.17      3.12   2.47   3.77   0.92     2
*   User I/O    cell list of blocks physical read  5,474,088   0.00       5,052.89    0.92     1.31      0.94   0.89   0.99   0.07     2
*   User I/O    direct path read temp              262,112     0.00       4,348.86    16.59    1.13      16.65  15.94  17.36  1.00     2
*   User I/O    direct path read                   1,017,655   0.00       3,059.73    3.01     0.79      2.49   1.92   3.05   0.80     2
*   Network     SQL*Net more data from client      578,089     0.00       2,593.47    4.49     1.26      4.49   4.49   4.49            1
*   Scheduler   resmgr:cpu quantum                 203,446     0.00       1,426.67    7.01     0.37      7.20   6.11   8.29   1.54     2
*   Cluster     gc cr grant 2-way                  10,534,866  0.00       1,402.35    0.13     0.36      0.17   0.12   0.22   0.08     2


For the highlighted SQL_IDs below, CPU and I/O wait are very high. We need to talk to the development team to tune this PL/SQL.


SQL Id         Execs  SQL Text
-------------  -----  --------------------------------
3x4h2q7rhhq30  1      DECLARE job BINARY_INTEGER := ...
fdp0brc3xgcpy  1      DECLARE job BINARY_INTEGER := ...

Total          CPU (s)   Elapsed (s)  IOWait (s)  Gets        Reads       Rows  Cluster (s)
-------------  --------  -----------  ----------  ----------  ----------  ----  -----------
3x4h2q7rhhq30  1,147.15  3,381.13     2,315.04    28,337,764  12,787,826  1     156.77
fdp0brc3xgcpy  1,146.47  2,641.43     1,483.05    53,540,177  42,737,097  1     87.61

Per Execution  CPU (s)   Elapsed (s)  IOWait (s)  Gets           Reads          Rows  Cluster (s)
-------------  --------  -----------  ----------  -------------  -------------  ----  -----------
3x4h2q7rhhq30  1,147.15  3,381.13     2,315.04    28,337,764.00  12,787,826.00  1.00  156.77
fdp0brc3xgcpy  1,146.47  2,641.43     1,483.05    53,540,177.00  42,737,097.00  1.00  87.61

Percentage of Total  DB CPU  DB time  IO Wait  Gets  Reads  Cluster  Execs
-------------------  ------  -------  -------  ----  -----  -------  -----
3x4h2q7rhhq30        3.47    0.88     0.72     0.20  0.09   4.82     0.00
fdp0brc3xgcpy        3.47    0.68     0.46     0.37  0.31   2.70     0.00

Below we can see the highlighted table being read with a full table scan.
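The historical plan that shows the full scan can also be pulled straight from AWR, for example for one of the SQL_IDs above (a minimal sketch; on Exadata the offloaded scan shows up as TABLE ACCESS STORAGE FULL):

-- Pull the AWR plan history for the highlighted statement
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('3x4h2q7rhhq30', NULL, NULL, 'ALL'));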


@core DBA:

It seems there is no metric created in OEM for the Exadata machine. If we can enable the IOPS metric, it will be easy for us to analyse Exadata cell I/O whenever we see "cell smart table scan" waits.

Please find the attached Oracle document for the metric details along with the creation steps.

After creating the metric, the expected output will be:

Analysis #2:

 

Requested by app team:

 

Please find the current SGA and CPU values below. The request is to increase the SGA to 60 GB and the CPU count to 24.

NAME                                               TYPE        VALUE
-------------------------------------------------- ----------- -----
sga_max_size                                       big integer 34G
sga_target                                         big integer 28G
cpu_count                                          integer     12
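For reference, the same values can be queried directly and, once the client approves, changed along these lines (a sketch using the requested targets of 60 GB SGA and 24 CPUs; sga_max_size is not dynamic, so the instances need a restart after the SPFILE change):

-- Current values
SELECT name, value
  FROM v$parameter
 WHERE name IN ('sga_max_size', 'sga_target', 'cpu_count');

-- Proposed changes (do not run until approved)
-- ALTER SYSTEM SET sga_max_size = 60G SCOPE = SPFILE SID = '*';
-- ALTER SYSTEM SET sga_target   = 60G SCOPE = SPFILE SID = '*';
-- ALTER SYSTEM SET cpu_count    = 24  SCOPE = BOTH   SID = '*';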

                                                                                                  

You can also change large pages based on the SGA size. This will help us with the LOB segments.

NAME                                               SID   VALUE
-------------------------------------------------- ----- -----
use_large_pages                                    *     TRUE


@core DBA

The app team accidentally confirmed that the DB is facing an issue because of a memory/CPU crunch. The client then asked how much extra memory and CPU are needed so that he can approve it, so I suggested the values above.

The DB is using ASMM only; no value is set for memory_target.

NAME                                               TYPE        VALUE
-------------------------------------------------- ----------- -----
memory_max_target                                  big integer 0
memory_target                                      big integer 0 

The DB parameter use_large_pages is set to TRUE, so I suggest checking the HugePages configuration at the server level and changing it accordingly, so that we benefit from large pages whenever memory is requested in chunks bigger than the default 4 KB page size. Can you please provide the kernel parameter values from the server and check the current HugePages value set at the server level?

 https://docs.oracle.com/database/121/UNXAR/appi_vlm.htm#UNXAR403

https://www.carajandb.com/en/blog/2016/7-easy-steps-to-configure-hugepages-for-your-oracle-database-server/
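As a rough starting point for sizing, the number of 2 MB HugePages needed to back this instance's SGA can be estimated from inside the database; a sketch (assumes a 2 MB Hugepagesize, and the result should be validated with Oracle's hugepages_settings.sh script and /proc/meminfo on the server, adding up all instances on the host):

-- Estimated 2 MB HugePages required for the full SGA of this instance
SELECT name,
       ROUND(bytes / 1024 / 1024 / 1024, 1) AS size_gb,
       CEIL(bytes / (2 * 1024 * 1024))      AS est_hugepages_2mb
  FROM v$sgainfo
 WHERE name = 'Maximum SGA Size';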

Analysis #3:

Please find below my analysis of the right-hand graph.

1. There is a spike in IORM and IO after 12 PM on 7th Jan, but not on 8th Jan.

2. IO is fairly constant for the whole period, and after 12 PM on 8th Jan it also stays constant.

3. The IORM wait time is 12 ms on 7th Jan, whereas on 8th Jan it is below 2 ms, and small IO also increased during that time.

Please have a look at the Oracle doc below; it is not easy to identify an I/O bottleneck using the AWR report alone. We also need an Exadata health check.

http://www.aioug.org/ODevCYatra/2018/Karan_Oracle_DISK_IO.pdf

If they configure Exadata monitoring for us (read-only access), then we can easily analyse the cells whenever we have an issue in the DB.

https://docs.oracle.com/cd/E24628_01/doc.121/e27442/ch7_metrics.htm#EMXIG356

 Core DBA: Has anyone done a TFA analysis? I would like to see the OSWatcher logs between 12 and 3PM that day. If we have raised an SR for the elevated cell smart scan event, we should get a TFA diagcollect and send that to Oracle for analysis.

Venu: We already requested this from the SR team, but they refused to look at the TFA report and told us to first test the given hidden parameters.

Core DBA:

It seems the cell server's bad health started on Nov 19th, and it is still RED. For deeper troubleshooting/analysis we may need to initiate a discussion with the L3 DBA team.

Final Solution:

We collected the SQLs that were performing full table scans and driving the high I/O. We ran SQL Tuning Advisor, collected historical SQL plan statistics, and manually checked for missing column indexes. Finally, we created new indexes for the impacted SQLs to reduce I/O.
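For reference, the advisor runs were along these lines for each captured statement (a sketch against one of the SQL_IDs above; the task name is illustrative):

-- Create and run a SQL Tuning Advisor task for one of the high-IO statements
DECLARE
  l_task VARCHAR2(64);
BEGIN
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_id     => '3x4h2q7rhhq30',
              scope      => DBMS_SQLTUNE.SCOPE_COMPREHENSIVE,
              time_limit => 3600,
              task_name  => 'tune_3x4h2q7rhhq30');   -- illustrative task name
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/

-- Review the findings (index, SQL profile and restructuring recommendations)
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('tune_3x4h2q7rhhq30') AS report FROM dual;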


