CRS 10g Diagnostic Collection Guide
===================================
This document contains guidelines for collecting diagnostic information from a CRS installation.
The diagnostics listed in the sections below are needed by development to help resolve TARs,
bugs, and other problems that arise in the field.
CRS
Collect all CRS log files, trace files and core dumps from the following three directory trees, on every node in the cluster.
Use a tool such as tar, gzip or some equivalent method. Note that some of the files under these directories are binary files
and as such will not transfer well unless they are zipped.
The following commands will accomplish this on a Unix system. First log in as the root user:
HOST=`hostname`
# Clusterware home: daemon logs, init output files and core dumps
cd $CRS_HOME
tar cvf crsData_${HOST}.tar css/log css/init crs/log crs/init evm/log evm/init srvm/log racg/dump log
gzip crsData_${HOST}.tar
# RDBMS home: RACG traces and hdump directories
cd $ORACLE_HOME
tar cvf oraData_${HOST}.tar racg/dump admin/*/hdump
gzip oraData_${HOST}.tar
# Oracle base: hdump directories
cd $ORACLE_BASE
tar cvf basData_${HOST}.tar admin/*/hdump
gzip basData_${HOST}.tar
If the CRS stack does not start after root.sh has been run, check the following:
/tmp/crsctl.
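As a quick check, list these files and review the most recent one (a minimal sketch; it assumes the output files carry a suffix after the dot, which varies by release):
# list CRS startup output files, oldest first
ls -ltr /tmp/crsctl.*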
Another way to get all the trace files packaged is to use $CRS_HOME/bin/diagcollection.pl (see the TRACE COLLECTION section at the end of this document).
OCR
Check for the existence of the ocr.loc file and collect it. On Linux this file is in /etc/oracle/ and
on Solaris it is in /var/opt/oracle.
The content should be:
ocrconfig_loc=
local_only=FALSE
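For example, on Linux (a sketch; the device path recorded in your file will differ):
# display the OCR location file (use /var/opt/oracle/ocr.loc on Solaris)
cat /etc/oracle/ocr.loc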
Perform an ocrdump as root so we can access the configuration data. This is done as follows:
cd $CRS_HOME/bin
ocrdump ${HOST}_OCRDUMP
This is an ASCII file. It is useful to repeat this on every node, in case one of the nodes is configured incorrectly
and is referring to a different OCR device.
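Because the dumps are plain ASCII, they can be compared directly across nodes; for example (hypothetical file names from two nodes):
# any difference may indicate a node referring to the wrong OCR device
diff node1_OCRDUMP node2_OCRDUMP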
Run $CRS_HOME/bin/ocrconfig -showbackup to find out the location of the most recent backup. Go to the host in question,
and back up the file. The default location is $CRS_HOME/cdata/
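A minimal sketch of collecting the most recent automatic backup (the cluster name and backup file name below are placeholders; take the actual path from the -showbackup output):
cd $CRS_HOME/bin
./ocrconfig -showbackup
# copy the newest backup reported above; the path below is a placeholder
cp $CRS_HOME/cdata/<cluster_name>/backup00.ocr /tmp/ocrBackup_${HOST}.ocr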
CSS
In all cases the following data will be required from each of the configured nodes:
CSS daemon logs, found in $CRS_HOME/css/log; both ocssd*.log and ocssd*.blg
CSS daemon init output files, found in the $CRS_HOME/css/init directory
CSS daemon startup files (if any), e.g. /etc/init.d/init.cssd; also /etc/inittab and /etc/hosts for system configuration data
Stack traces of any relevant core files found in a subdirectory of the $CRS_HOME/css/init directory.
Some core files may be old and irrelevant; compare the core file timestamp to the time of the error.
The stacks of all threads will be required.
Startup output files: /tmp/crsctl.
CRSD
The relevant data is in the following files. You may want to examine these files for information about the problem.
CRS daemon logs: $CRS_HOME/crs/log/
CRS daemon core files: $CRS_HOME/crs/init/
PSTACKs or GCOREs or other available stack information for running daemons.
This is useful in situations where crs commands are hanging.
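A minimal sketch for a Unix system (the grep pattern and <pid> are illustrative; pstack is available on Solaris and most Linux distributions):
# locate the daemon process
ps -ef | grep crsd | grep -v grep
# capture the stacks of all threads without stopping the process
pstack <pid>
# optionally write a core image of the running process
gcore <pid>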
EVMD
The relevant data is in the following files. You may want to examine these files for information about the problem.
EVM daemon logs: $CRS_HOME/evm/log/
EVM daemon init output files: $CRS_HOME/evm/init/
EVM event log archives for relevant timeframes: $CRS_HOME/evm/log/
On at least one occasion in the past, EVM logs have been corrupted in transit. To avoid this, you can produce text versions of
the event traces with this command:
$CRS_HOME/bin/evmshow -t "@timestamp @@"
or
$CRS_HOME/bin/evmshow -t "@timestamp [@priority] @name"
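For example, to convert one event log to plain text before transfer (a sketch; the event log file name is a placeholder):
# render a binary EVM event log as text so it survives transfer intact
$CRS_HOME/bin/evmshow -t "@timestamp @@" <event_log_file> > evmlog_${HOST}.txt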
For more detailed information on how to read the EVM log files, please see Note 279165.1.
CORE DUMPS
Core dumps may be found in multiple locations. For each core dump that occurred at a relevant time, obtain a stack trace for each thread.
$CRS_HOME/css/init/ may contain CSS daemon core dumps.
$CRS_HOME/crs/init/${HOST}/ may contain CRS daemon (crsd) core dumps.
$CRS_HOME/evm/init may contain EVM daemon core dumps.
$CRS_HOME/bin may contain racgmain or racgimon core dumps.
$ORACLE_HOME/bin may contain racgmain or racgimon core dumps.
To be thorough, you may wish to use:
cd $CRS_HOME ; find . -name "*core*"
cd $ORACLE_HOME ; find . -name "*core*"
The "strings" command may be useful to determine the pathname of the file that generated the core dump. The command
"strings core | more" typically prints the pathname of the binary in the first few lines.
A command like "gdb $CRS_HOME/bin/crsd core" can be used to examine the core dump. The "bt" command will generate a stack trace.
To switch threads, list them with "info threads", select one with a command like "thread 2", and then use the "where" command to dump the stack of that thread.
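Since the stacks of all threads are required, a non-interactive invocation can capture them in one pass (a sketch, assuming gdb is available and the core was produced by crsd):
# dump every thread's stack from the core file into a text file
gdb --batch -ex "thread apply all bt" $CRS_HOME/bin/crsd core > crsd_core_stacks.txt 2>&1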
In some cases it is desirable to obtain PSTACKs or GCOREs or other available stack information for running CRS and CSS daemons.
This is particularly useful when operations are hanging.
NETWORK
In some non-default network diagnostic modes, additional trace files will be created in the following locations.
CSS NETWORK tracing: $CRS_HOME/css/log/ocsns*.log
Other NETWORK tracing: $CRS_HOME/log/clscns*.trc
CLSC tracing: $CRS_HOME/log/clsc*.trc
OPROCD
Oprocd is another daemon, frequently run in configurations without vendor clusterware.
If the $CRS_HOME/log directory has been created, oprocd will write a small amount of diagnostics there.
OPROCD log: $CRS_HOME/log/oprocd*.log
RACX
Collect RACX trace files on all configured nodes. The files are in the following directories
(some directories may not exist in some configurations):
$CRS_HOME/racg/dump
$ORACLE_HOME/racg/dump
hdump directories under $ORACLE_HOME/admin and $ORACLE_BASE/admin
On UNIX platforms, collect any core files found in the following directories:
$CRS_HOME/bin
$ORACLE_HOME/bin
RDBMS
Data collection should include data from all instances, not just the instance that experienced the problem.
In all cases the alert logs and background process trace files from all instances will be required.
If the problem is experienced by one or more foreground processes, the trace files of those foregrounds will also be required.
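A minimal sketch for packaging these, assuming the default 10g layout in which alert logs and background traces are written to bdump and foreground traces to udump:
cd $ORACLE_BASE
tar cvf rdbmsData_${HOST}.tar admin/*/bdump admin/*/udump admin/*/cdump
gzip rdbmsData_${HOST}.tar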
NETWORK AND VIP
The following information is required from all configured nodes (a collection sketch follows the list):
The NIC configuration: this varies by platform; on Solaris it is the output of the "ifconfig -a" command, on Linux the output of the "ifconfig" command.
Information from /etc/hosts or equivalent data.
Information about the various NICs and how they are configured.
netstat -ia / netstat -i for the customer network information.
netstat -r
/usr/sbin/traceroute 172.16.1.31 (substitute the private interconnect IP)
/usr/bin/find /etc/rc.d -type l
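A minimal sketch that gathers this output into one file on Linux (adjust the commands per platform; the interconnect IP is a placeholder):
# collect network configuration and routing data in one pass
(
  ifconfig
  cat /etc/hosts
  netstat -i
  netstat -r
  /usr/sbin/traceroute <private_interconnect_ip>
  /usr/bin/find /etc/rc.d -type l
) > netinfo_${HOST}.txt 2>&1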
CLUSTER
Please collect information on any vendor clusterware in use. The version of the clusterware, the configuration it sees, and the output of its diagnostic tools will be useful.
In some cases it is useful to know whether any resources are being heavily consumed. Normally, DBAs and system administrators track at least disk, RAM,
and CPU utilization via a process like sar, or by periodically running commands like "df" or "ps -leaf". If possible, this type of information should
be made available from around the time of the event and earlier. (Historical information is useful so that we can attempt to correlate the failure
with spikes in resource consumption.) The exact data to collect will depend on the normal system administration processes. The types
of questions we want to answer are listed below, with a simple capture sketch following the list:
Did we run out of disk space?
Did we run out of memory to allocate?
Were too many processes running preventing additional processes from spawning?
Were too many file descriptors open in a single process or system wide?
Was the cpu being heavily used?
Were I/O rates extremely high?
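Where no monitoring is already in place, even a simple periodic capture helps answer these questions (a minimal sketch; the interval and command set are illustrative):
# append basic resource usage every 5 minutes; stop with Ctrl-C
while true
do
  date
  df -k
  vmstat 1 2
  ps -elf | wc -l
  sleep 300
done >> sysres_${HOST}.log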
TRACE COLLECTION 11.1 ONWARDS
From 10.2 and 11.1 onwards, we recommend using $CRS_HOME/bin/diagcollection.pl to collect all Clusterware-related log files.
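A minimal sketch of running it (as root; option names vary slightly between releases, so check the script's usage output first):
# gathers Clusterware logs into archive files in the current directory
cd /tmp
$CRS_HOME/bin/diagcollection.pl --collect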