http://oraperf.sourceforge.net/seminar/sevt_anal.html
A typical waits event section of a report.txt looks like
SQLDBA> Rem System wide wait events.
SQLDBA> select n1.event “Event Name”,
2> n1.event_count “Count”,
3> n1.time_waited “Total Time”,
4> (n1.time_waited/n1.event_count) “Average Time”
5> from stats$event n1
6> where n1.event_count > 0
7> order by n1.time_waited desc;
Event Name Count Total Time Average Time
————————— ———— ———— ————
client message 1770692 26676762 15.065726846
rdbms ipc message 7626 782385 102.59441385
pmon timer 535 158481 296.22616822
smon timer 7 146587 20941
db file sequential read 83662 79166 .94625995075
db file scattered read 5224 26120 5
db file parallel write 6982 6848 .98080779146
inactive session 60 6060 101
latch free 15653 4442 .28377946719
enqueue 28 2268 81
log file sync 3638 2245 .61709730621
log file parallel write 3892 2004 .51490236382
rdbms ipc reply 19 1813 95.421052632
control file parallel write 843 850 1.0083036773
log file space/switch 5 236 47.2
db file single write 726 210 .28925619835
control file sequential rea 1196 22 .01839464883
library cache pin 24 10 .41666666667
buffer busy waits 28 6 .21428571429
library cache load lock 2 5 2.5
write complete waits 5 5 1
log file sequential read 1 2 2
log file single write 2 0 0
23 rows selected.
SQLDBA>
(To see and interpretation of all the wait events possible see: wait events )
We can immediately simplify this list by eliminating the wait events that are not related to users
Event Name Count Total Time Average Time
————————— ———— ———— ————
client message 1770692 26676762 15.065726846
rdbms ipc message 7626 782385 102.59441385
pmon timer 535 158481 296.22616822
smon timer 7 146587 20941
db file sequential read 83662 79166 .94625995075
db file scattered read 5224 26120 5
db file parallel write 6982 6848 .98080779146
inactive session 60 6060 101
latch free 15653 4442 .28377946719
enqueue 28 2268 81
log file sync 3638 2245 .61709730621
log file parallel write 3892 2004 .51490236382
rdbms ipc reply 19 1813 95.421052632
control file parallel write 843 850 1.0083036773
log file space/switch 5 236 47.2
db file single write 726 210 .28925619835
control file sequential rea 1196 22 .01839464883
library cache pin 24 10 .41666666667
buffer busy waits 28 6 .21428571429
library cache load lock 2 5 2.5
write complete waits 5 5 1
log file sequential read 1 2 2
log file single write 2 0 0
The above output gives a profile of the “bottleknecks” in the database. We can elemenate most of the lines because they are usually irrelevant. For example:
grey colored lines are idle events. The processes are idle and waiting for work to do.
violete colored lines are background waits, and not users shadow process waits, thus, at least in the first analysis, we can ignore them. We are interested in why users have to wait and not why the background process (pmon,smon,lgwr, dbwr) have to wait
red colored lines are the events that users have to wait for.
To see a list of wait events broken down by type see: wait events by type
This leaves us with:
Event Name Count Total Time Average Time
————————— ———— ———— ————
db file sequential read 83662 79166 .94625995075
db file scattered read 5224 26120 5
latch free 15653 4442 .28377946719
enqueue 28 2268 81
log file sync 3638 2245 .61709730621
rdbms ipc reply 19 1813 95.421052632
log file space/switch 5 236 47.2
library cache pin 24 10 .41666666667
buffer busy waits 28 6 .21428571429
library cache load lock 2 5 2.5
write complete waits 5 5 1
The TIME_WAITED is measured in 100ths of seconds.
The basic easily tunable events are:
buffer_busy_waits – rbs problems, or need free list, or maybe just I/O throughput
free_buffer_waits – dbwr not working hard enough or buffer cache too small
write_complete_waits – dbwr maybe working too hard
db_file_scattered_read – read (full table scan) disk wait, buffer cache too small
db_file_sequential_read – rowid (index) block read, read disk wait, buffer cache too small
enqueue – lock waits
latch_free – waits for latch, see v$latch to see which latch exactly
log_buffer_space – log buffer needs to be larger
log_file_space_switch – log buffer space wait or log file switch wait (pre 7.3)
log_file_switch_checkpoint_incomplete – redo logs too small or not enough redo logs
log_file_switch_completion – wait for log file while switching log file
log_file_sync – write to log file wait, raw devices are faster for log files (no RAID5!!)
So by far the biggest waits from the example above are on
db file scattered read
db file sequential read
and the sum of every time any users ever waited for this event during the period of bstat/estat is
This leaves us with:
Event Name Count Total Time Average Time
————————— ———— ———— ————
db file sequential read 83662 79166 .94625995075
db file scattered read 5224 26120 5
= 79166 + 26120
= 105286 hundreths of second
= 1052 seconds
= 17.5 minutes
Now the question, so is 17.5 minutes significant? Well thats harder to decide.
Probably the easiest thing to do is look at the statistic
CPU used by this session
in the statistic section of the bstat/estat report . Compare the ratio of the CPU time to the WAIT time.
for the lovers of generic rules:
CPU > 2 * WAIT : probably ok
CPU > WAIT : probably some tuning to do
CPU < WAIT : problem area
Another way to look at the relative importance of the wait events is to look at how many users were connected during the bstat/estat to figure out the average time waited by users., First one needs to know over what period of time the bstat/estat was run. This is given at the bottom of the report.txt … time period of stats
Second, one needs to know how many users were working (thus possible waiting for these event). The file report.txt doesn’t really help here. It does give how many users logged on during the period, but not how many were connected. You can run the following for the number of users connected:
select value from v$sysstat s, v$statname n where s.statistic#=n.statistic# and n.name=’logons current’;
OK, say you know over what period the bstat/estat was taken and you know roughly how many users were connected, then you can get some idea of whether the wait time is important. For example:
stat time 18 minutes with one user ==> probably needs tuning
stat time 18 hours with 1000 users ==> probaly not a big deal
The importance of the time waited is relative to the time we spent working. So in order to determine the importance of the wait events we need to know how much time the users spent active or working (and not idle). There are two ways to determine this
1. assume active time is given by “CPU used by this session” statistic, which is actually CPU used by all users in v$sysstat table
2. assume active time is ( connect time – idle time). This is difficult because we not only need to know how many process were connected for but also the idle time. THe idle time comes from “client message” and “rdbms ipc message” in both the table “v$session_waits” and “v$system_event” which is not exactly obvious
The first method I think is probably the best and easiest to use. The stat “CPU used by this session” is the sum of time spent by all Oracle processes using the CPU measured in 100ths of a second. So:
wait_time/(wait_time+cpu_time)
So assuming user active time = wait_time+cpu_time, then the amount waited So for generic rules lovers given the above formula, if the result is:
< .5 : probably ok
> .5 and < 1 : probably some tuning to do
> 1 : problem area