In this Document
Purpose
Scope
How To Manually Start Oracle CRS Clusterware
References
Applies to:
Oracle Server – Enterprise Edition – Version: 10.1.0.2 to 11.1.0.7
Information in this document applies to Unix platforms only.
Purpose
This article is intended for DBAs who have problems starting the CRS clusterware.
Scope
This note is intended to help debug CRS clusterware startup problems linked to the CRS startup scripts. When the CRS clusterware does not start, several root causes are possible. Starting it manually instead of automatically can help narrow down why the clusterware, or one of its components, does not start.
How To Manually Start Oracle CRS Clusterware
General CRS clusterware scripts info
————————————
The startup of the Oracle clusterware daemons is based on start scripts that are executed as the root user.
The environment variables are set in the scripts themselves. The clusterware start scripts reside in
/etc/init.d (Sun, Linux), /sbin (HPUX, HP Itanium, Tru64) or /etc (AIX) and are named
init.cssd: start of ocssd.bin, oclsomon, oprocd and oclsvmon daemons
init.evmd: start of the evmd.bin daemon
init.crsd: start of the crsd.bin daemon
init.crs : enabler/start/disabler script
The automatic startup of the clusterware daemons relies on two Unix OS mechanisms plus a check
of whether the clusterware is startable.
1. the run of the rc*.d scripts, which enable or disable the clusterware start after a reboot.
Running 'init.crs disable' prevents the clusterware from starting automatically
at system reboot; running 'init.crs enable' lets the clusterware autostart at system reboot
(the default setting), e.g.
./init.crs disable
Automatic startup disabled for system boot.
./init.crs enable
Automatic startup enabled for system boot.
When the rc*.d scripts are executed in the correct OS runlevel, the
'init.crs start' execution (run in runlevel 3 or 5, depending on the Unix version)
checks the automatic setting (enabled/disabled) and sets the clusterware
in startable or non-startable mode.
A correctly working rc*.d 'init.crs start' execution should update the clusterware startup flag kept under
/var/opt/oracle/scls_scr/
or
/etc/oracle/scls_scr/
(the location depends on the platform) with the value 'run' or 'norun'.
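To see the current flag value, a minimal sketch (the exact file names and directory layout under the scls_scr directory vary by platform, release and node name) is to list every file below those directories together with its content:
find /etc/oracle/scls_scr /var/opt/oracle/scls_scr -type f 2>/dev/null |
while read f; do
  # print each flag file together with its content, e.g. '...: norun'
  echo "$f: `cat $f`"
done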
2. the inittab mechanism via the startup of three respawnable scripts, e.g.
h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1
h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1
h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1
These wrapper scripts remain visible in the process list even when the clusterware does not respond, e.g.
ps -ef | grep init
root 16554 1 0 Dec17 ? 00:00:56 /bin/sh /etc/init.d/init.cssd fatal
root 16555 1 0 Dec17 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 18241 16554 0 Dec17 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 18245 16554 0 Dec17 ? 00:00:00 /bin/sh /etc/init.d/init.cssd daemon
root 24940 16554 0 14:08 ? 00:00:00 /bin/sh /etc/init.d/init.cssd runcheck
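To confirm that the three respawn entries are actually present in the inittab (a quick sketch; the script directory differs per platform, as listed above):
egrep 'init\.(cssd|crsd|evmd)' /etc/inittab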
3. The clusterware scripts also run a check script before launching the clusterware daemons,
to determine whether the basic prerequisites are met and the clusterware is allowed to start.
This is done via:
sh -x init.cssd startcheck
That last script needs to return code 0 for the clusterware to be allowed to start. In case of errors, /tmp/crsctl.xxxx log files are written with the error message.
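A minimal sketch for capturing that return code and locating the most recent error files (the exact log file suffix is generated at run time):
sh init.cssd startcheck
# a return code of 0 means the clusterware is allowed to start
echo "startcheck returned: $?"
# on failure, look at the newest crsctl trace files in /tmp
ls -lt /tmp/crsctl.* 2>/dev/null | head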
It executes, as the oracle user, the command:
crsctl check boot
via a 'su -l oracle'; that command needs to return nothing for the clusterware to be allowed to start. Otherwise 'crsctl check boot'
can return errors like 'no read access to the ocr', 'clustered ip is not defined', '$CRS_HOME
is not mounted', … and prevent the clusterware from starting.
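To run that same check by hand, a sketch (the clusterware owner 'oracle' and the CRS home path below are only examples; substitute the values of your installation):
# run the boot check the same way the init scripts do
su -l oracle -c "/opt/app/oracle/product/crs/bin/crsctl check boot"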
Once the above three prerequisites are met, the clusterware starts its *.bin executables, which are visible via 'ps -ef', e.g.
haclu 19611 19610 0 Dec17 ? 00:10:51 /opt/app/oracle/product/crs/bin/oclsomon.bin
haclu 19547 18245 0 Dec17 ? 00:17:03 /opt/app/oracle/product/crs/bin/ocssd.bin
haclu 18215 18005 0 Dec17 ? 00:00:45 /opt/app/oracle/product/crs/bin/evmd.bin
root 18649 16555 0 Dec17 ? 00:12:51 /opt/app/oracle/product/crs102/bin/crsd.bin
…
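A one-line filter to verify that the expected daemons are up (a sketch; the exact set of *.bin processes depends on the platform and configuration):
ps -ef | egrep '(ocssd|evmd|crsd|oclsomon)\.bin' | grep -v grep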
Procedure to manually start the clusterware
——————————————-
1. Make sure the inittab mechanism cannot start the clusterware daemons, i.e. comment out the three respawn entries in /etc/inittab (see the sketch after step 2 for making init re-read the file):
#h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1
#h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1
#h3:3:respawn:/sbin/init.d/init.crsd run >/dev/null 2>&1
2. Make the clusterware startable, i.e. run the 'init.crs start' check described above, via
sh -x init.crs start
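As referenced in step 1, after editing /etc/inittab you can make init re-read the file and verify that none of the wrapper scripts is still running (a sketch; on some platforms the equivalent command is 'telinit q'):
# make init re-read /etc/inittab so the commented entries take effect
init q
# check whether any of the wrapper scripts is still being run
ps -ef | egrep 'init\.(cssd|crsd|evmd)' | grep -v grep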
3. Start the oclsomon daemon. The purpose of oclsomon is to assist the CSS daemon
by monitoring it for hangs. In the event of a CSS daemon hang, the remote nodes
may evict the current node, so oclsomon is needed to terminate the local node.
sh -x init.cssd oclsomon
4. In case a third party clusterware is installed (HP Serviceguard, IBM HACMP, Sun Cluster, Veritas Cluster Server, etc.), start the oclsvmon daemon in another shell. The purpose of oclsvmon is to assist the CSS daemon in monitoring the vendor clusterware and to allow additional diagnostics
to be obtained in the case of system failures.
sh -x init.cssd oclsvmon
5. When no third party clusterware is installed, start the oprocd daemon in another shell.
There is no 'oprocd' daemon on Linux systems up to and including 10.2.0.3; it is included starting with the 10.2.0.4 patchset. It exists on all other Unix platforms. The purpose of oprocd is to detect system hangs, which often occur due to faulty drivers or hardware.
sh -x init.cssd oprocd
6. Start the Oracle CRS ocssd.bin daemon in another shell, via
sh -x init.cssd daemon
=> check that the ocssd.bin daemon responds to requests, via
crsctl check cssd
7. Start the evmd.bin daemon in another shell, via
sh -x init.evmd run
=> check that evmd.bin responds to requests, via
crsctl check evmd
8. Start the crsd.bin daemon in another shell, via
sh -x init.crsd run
=> check that crsd.bin responds to requests, via
crsctl check crsd
9. Check the clusterware, via
crsctl check crs
crs_stat -t