pvmehta.com

Resolving RMAN Hung Jobs

Posted on 04-Nov-2005 By Admin

PURPOSE
=======

This note discusses how to resolve a hung RMAN job.

SCOPE & APPLICATION
===================

Anyone involved in running RMAN jobs.

Resolving an RMAN Hung Job
==========================

Components of an RMAN Session

The nature of an RMAN session depends on the operating system. In UNIX,
an RMAN session has the following processes associated with it:

- The RMAN process itself.
- The catalog connection to the recovery catalog database, if a recovery
  catalog is used; otherwise there is none.
- The connection to the target database, also called the default channel.
- A polling connection to the target database, used for RPC testing, for each
  different connect string used in the allocate channel command. By default
  there is no connect string in allocate channel, so there is only one
  RPC connection.
- One target connection to the target database for each allocated channel.
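As a rough illustration of the list above, the following Python sketch counts the processes one would expect to see for a single RMAN session. The function name and its parameters are hypothetical and only restate the rules above; they are not part of any Oracle tool. Treating "no connect string" as one shared default connect string is an assumption.

```python
def expected_session_processes(uses_catalog, channel_connect_strings):
    """Count the processes of one RMAN session, following the list above.

    channel_connect_strings holds one entry per allocated channel; use None
    for a channel allocated without a connect string (hypothetical convention).
    """
    rman_process = 1
    catalog_connections = 1 if uses_catalog else 0
    default_channel = 1
    # One polling connection per distinct connect string. Assumption: channels
    # without a connect string share one, and there is always at least one.
    polling_connections = max(1, len(set(channel_connect_strings)))
    channel_connections = len(channel_connect_strings)
    return (rman_process + catalog_connections + default_channel
            + polling_connections + channel_connections)
```

Under these assumptions, a job that uses a recovery catalog and allocates one channel without a connect string, `expected_session_processes(True, [None])`, gives 5.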

Process Behavior During a Hung Job

RMAN usually hangs because one of the channel connections is waiting in the
media manager code for a tape resource. The catalog connection and the default
channel seem to hang because they are waiting for RMAN to tell them what to do.
Polling connections seem to be in an infinite loop while polling the RPC under
the control of the RMAN process.

If you kill the RMAN process itself, then you also kill the catalog connection,
the default channel, and the polling connections. Target connections that are not
hung in the media manager code also terminate; only the target connection executing
in the media management layer remains active. You must manually kill this process
because terminating its session does not kill it. Even after termination, the media
manager may keep resources busy or continue processing because it does not realize
that the Oracle process is gone. This behavior is media manager-dependent.

Terminating the catalog connection does not cause RMAN to finish because RMAN is
not performing catalog operations. Removing the default channel and polling
connections causes the RMAN process to detect that one of the channels has died
and then proceed to exit. In this case, the connections to the hung channels
remain active as described above.
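The behavior described above when the RMAN process itself is killed can be restated as a small Python sketch. The function and the shape of its input are hypothetical; it models the description only and does not inspect a real system.

```python
def survivors_after_killing_rman(channels):
    """Return the channels whose server process is still alive.

    channels maps a channel name to True if its target connection is
    currently executing in the media management layer (hypothetical input).
    """
    # The catalog connection, the default channel, the polling connections and
    # every channel not inside the media manager go away with the RMAN process.
    return sorted(name for name, in_media_manager in channels.items()
                  if in_media_manager)
```

Each name returned corresponds to a process that still has to be killed manually.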

Terminating an RMAN Session

The best way to terminate RMAN when the connections for the allocated channels
are hung in the media manager is to kill the Oracle process of the connections.
The RMAN process detects this termination and proceeds to exit, removing all
connections except target connections that are still operative in the media
management layer. The caveat about the media manager resources still applies
in this case.

To identify and terminate an Oracle process that is hung in the media manager code:

This procedure is system-specific. See your operating system-specific documentation
for the relevant commands.

1. Obtain the current stack trace for the desired process ID using a system-specific
   utility. For example, on Sun Solaris you can use the command pstack located in
   /usr/proc/bin to obtain the stack.

2. After the stack is obtained, look for the process with SBTxxxx (normally sbtopen)
   as one of its top calls. Note that other layers may appear on top of it.

3. Obtain the stack again after a few minutes. If the same stack trace is returned,
   then you have identified the hung process.

4. Kill the hung process using a system-specific utility. For example,
   on Sun Solaris execute a kill -9 command.

5. Repeat this procedure for all hung channels in the media management code.

6. Check that the media manager also clears its processes; otherwise the next
   backup or restore may still hang because of the previous hang. In some media
   managers, the only solution is to shut down and restart the media manager
   daemons. If the documentation from the media manager is unhelpful, ask the
   media manager technical support for the correct solution.
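Steps 1 to 3 can be sketched in Python as a comparison of two stack samples saved a few minutes apart. This is a minimal sketch, not a definitive tool: the helper names are hypothetical, and the assumed line shape (a hexadecimal address followed by a function name) should be checked against what the stack utility on your platform actually prints.

```python
import re

# Assumed line shape: "<hex address> <function name> (<arguments>) ..."
FRAME_LINE = re.compile(r"\s*[0-9a-fA-F]+\s+(\w+)")


def frame_names(stack_text):
    """Return the function names found in a stack dump, top of stack first."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def looks_hung_in_media_manager(first_sample, second_sample, top_n=5):
    """True if both samples show the same stack with an sbt* call near the top."""
    first = frame_names(first_sample)
    second = frame_names(second_sample)
    # Other layers may sit above the sbt call, so inspect the top few frames.
    has_sbt = any(name.lower().startswith("sbt") for name in first[:top_n])
    return bool(first) and first == second and has_sbt
```

The value of `top_n` is an arbitrary choice for illustration. A True result only suggests which process to examine; killing it (step 4) remains a manual decision.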

Backup Job Is Hanging

In this scenario, an RMAN backup job starts as normal and then pauses inexplicably:

    Recovery Manager: Release 8.1.5.0.0 - Production

    RMAN-06005: connected to target database: TORPEDO
    RMAN-06008: connected to recovery catalog database

    RMAN> run {
    2> allocate channel t1 type "SBT_TAPE";
    3> backup
    4> tablespace system,users; }

    RMAN-03022: compiling command: allocate
    RMAN-03023: executing command: allocate
    RMAN-08030: allocated channel: t1
    RMAN-08500: channel t1: sid=16 devtype=SBT_TAPE

    RMAN-03022: compiling command: backup
    RMAN-03023: executing command: backup
    RMAN-08008: channel t1: starting datafile backupset
    RMAN-08502: set_count=15 set_stamp=338309600
    RMAN-08010: channel t1: including datafile 2 in backupset
    RMAN-08010: channel t1: including datafile 1 in backupset
    RMAN-08011: channel t1: including current controlfile in backupset
    # Hanging here for 30 min now
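The RMAN-08500 line in this output reports the session ID (sid) of each channel, which is useful later when checking what the channel's server session is waiting on. As a minimal sketch, assuming the message format shown above, the following Python helper (a hypothetical name) extracts the channel-to-SID mapping from saved RMAN output.

```python
import re

# Matches lines such as: RMAN-08500: channel t1: sid=16 devtype=SBT_TAPE
CHANNEL_LINE = re.compile(r"RMAN-08500: channel (\w+): sid=(\d+) devtype=(\S+)")


def channel_sessions(rman_output):
    """Map each channel name to a (sid, device type) tuple."""
    return {
        match.group(1): (int(match.group(2)), match.group(3))
        for match in CHANNEL_LINE.finditer(rman_output)
    }
```

For the output above, this gives channel t1 with SID 16 and device type SBT_TAPE.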

Diagnosis of the Cause

If a backup job is hanging, that is, not proceeding, then several scenarios
are possible:

- The job abnormally terminated.
- A server-side or media management error occurred.
- RMAN is waiting for an event such as the insertion of a new cassette into
  the tape device.

Your first task is to try to determine which of these scenarios is the most
likely cause.

To determine the cause of the hang:

1. If you are using a media manager, examine media manager process, log, and trace
   files for signs of abnormal termination or other errors (see the description of
   message files in “Identifying Types of Message Output”). If this information is
   not helpful, proceed to the next step.

2. Restart RMAN and turn on debugging, making sure to specify a trace file to
   contain the output. For example, enter:

    % rman target / catalog rman/rman@catdb debug trace = /oracle/log

3. Re-execute the job:

    run {
      allocate channel c1 type 'sbt_tape';
      backup tablespace system;
    }

4. Examine the debugging output to determine where RMAN is hanging.
   The output will most likely indicate that the last RPC sent from the
   client to the server was SYS.DBMS_BACKUP_RESTORE.BACKUPPIECECREATE,
   which is the call that causes the server to interact with the media
   manager to write the backup data:

    krmxrpc: xc=6897512 starting long running RPC #13 to target: DBMS_BACKUP_RESTORE.BACKUPPIECECREATE
    krmxr: xc=6897512 started long running rpc

5. Check to see what the server processes performing the backup are doing.
   How many processes are hanging? If only one, check to see what it is doing
   by querying V$SESSION_WAIT. For example, to determine what the session
   with SID 12 is doing, enter:

    SELECT * FROM v$session_wait WHERE wait_time = 0 AND sid = 12;

6. If a backup to tape stalls at the beginning, issue the following query:

    SELECT * FROM v$session_longops WHERE compnam = 'dbms_backup_restore';   -- for 8.0
    SELECT * FROM v$session_longops WHERE substr(opname,1,4) = 'RMAN';       -- for 8.1 and 9.0

   If Oracle returns no information, then the PL/SQL program performing the backup
   is hung.
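Steps 4 to 6 can be sketched as a few Python helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, the trace pattern is taken only from the two sample lines in step 4 (with the call name on the same line as the RPC number), and the helpers only parse text and build the query strings from steps 5 and 6. Running the queries with a database client is left out.

```python
import re

RPC_START = re.compile(
    r"krmxrpc: xc=(\d+) starting long running RPC #(\d+) to target: ([\w.$]+)"
)


def last_long_running_rpc(trace_text):
    """Step 4: return the last long-running RPC started in the trace, or None."""
    last = None
    for match in RPC_START.finditer(trace_text):
        last = {"xc": int(match.group(1)),
                "rpc": int(match.group(2)),
                "call": match.group(3)}
    return last


def session_wait_query(sid):
    """Step 5: build the V$SESSION_WAIT query for one session ID."""
    return f"SELECT * FROM v$session_wait WHERE wait_time = 0 AND sid = {int(sid)}"


def longops_query(release):
    """Step 6: pick the V$SESSION_LONGOPS query for a release such as '8.1.5'."""
    major_minor = tuple(int(part) for part in release.split(".")[:2])
    if major_minor == (8, 0):
        return "SELECT * FROM v$session_longops WHERE compnam = 'dbms_backup_restore'"
    if major_minor in ((8, 1), (9, 0)):
        return "SELECT * FROM v$session_longops WHERE substr(opname,1,4) = 'RMAN'"
    # This note only gives queries for 8.0, 8.1 and 9.0.
    raise ValueError(f"no query given in this note for release {release}")
```

If `last_long_running_rpc` reports DBMS_BACKUP_RESTORE.BACKUPPIECECREATE as the last call, the hang is most likely in the interaction with the media manager, as described in step 4.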

Solution

Because the causes of a hung backup job vary, so do the solutions.
The best practice is to look for the simplest solutions first. For example,
it is quite common for backup jobs to hang simply because the tape device has
completely filled the current cassette and is waiting for a new tape to be
inserted. When problems occur, look for the obvious in all components used
for the backup.

Copyright © 2025 pvmehta.com.
