pvmehta.com


Resolving RMAN Hung Jobs

Posted on 04-Nov-2005 by Admin

PURPOSE
=======

This note discusses how to resolve a hung RMAN job.

SCOPE & APPLICATION
===================

Anyone involved in running RMAN jobs.

Resolving an RMAN Hung Job
==========================

Components of an RMAN Session

The nature of an RMAN session depends on the operating system. In UNIX,
an RMAN session has the following processes associated with it:

- The RMAN process itself.
- The catalog connection to the recovery catalog database, if a recovery
  catalog is used; none otherwise.
- The connection to the target database, also called the default channel.
- A polling connection to the target database, used for RPC testing, for each
  different connect string used in the allocate channel command. By default
  there is no connect string in allocate channel, so there is only one
  RPC (polling) connection.
- One target connection to the target database for each allocated channel.
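
The list above amounts to a simple count of the processes to expect for one session. The following Python sketch only illustrates that arithmetic: the function name and its parameters are hypothetical (not part of RMAN), and it assumes that all channels allocated without a connect string share the single default polling connection.

```python
def expected_rman_processes(uses_catalog, channel_connect_strings):
    """Count the processes the list above associates with one RMAN session.

    channel_connect_strings holds one entry per allocated channel: the
    connect string given in ALLOCATE CHANNEL, or None when none was given.
    Hypothetical helper for illustration; assumes at least one channel.
    """
    counts = {
        "rman": 1,                                 # the RMAN process itself
        "catalog": 1 if uses_catalog else 0,       # only with a recovery catalog
        "default_channel": 1,                      # connection to the target
        # One polling connection per distinct connect string; channels with
        # no connect string all share the default one (assumption).
        "polling": len(set(channel_connect_strings)),
        "channels": len(channel_connect_strings),  # one per allocated channel
    }
    counts["total"] = sum(counts.values())
    return counts


# One channel allocated without a connect string, recovery catalog in use
print(expected_rman_processes(True, [None]))
```

Comparing this count with what the operating system process list shows is one way to notice a connection that is missing or left over.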

Process Behavior During a Hung Job

RMAN usually hangs because one of the channel connections is waiting in the
media manager code for a tape resource. The catalog connection and the default
channel seem to hang because they are waiting for RMAN to tell them what to do.
Polling connections seem to be in an infinite loop while polling the RPC under
the control of the RMAN process.

If you kill the RMAN process itself, then you also kill the catalog connection,
the default channel, and the polling connections. Target connections that are not
hung in the media manager code also terminate; only the target connection executing
in the media management layer remains active. You must kill this process manually
because terminating its session does not kill it. Even after termination, the media
manager may keep resources busy or continue processing because it does not realize
that the Oracle process is gone. This behavior is media manager-dependent.

Terminating the catalog connection does not cause RMAN to finish because RMAN is
not performing catalog operations. Removing the default channel and polling
connections causes the RMAN process to detect that one of the channels has died
and then proceed to exit. In this case, the connections to the hung channels remain
active as described above.
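
The behavior described above can be summarized as a small model of which connections outlive the RMAN process. This Python sketch is only an illustration of the rule stated in the text: the Connection class, its field names, and the channel names t1 and t2 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str
    kind: str                       # "catalog", "default", "polling" or "channel"
    in_media_manager: bool = False  # True if executing in the media management layer


def survivors_after_killing_rman(connections):
    """Connections expected to stay alive once the RMAN process is killed.

    Per the description above, only channel connections that are executing
    in the media management layer remain; these must be killed manually.
    """
    return [c for c in connections if c.kind == "channel" and c.in_media_manager]


# Hypothetical session: t1 is waiting in the media manager, t2 is not
session = [
    Connection("catalog", "catalog"),
    Connection("default", "default"),
    Connection("poll", "polling"),
    Connection("t1", "channel", in_media_manager=True),
    Connection("t2", "channel"),
]
print([c.name for c in survivors_after_killing_rman(session)])
```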

Terminating an RMAN Session

The best way to terminate RMAN when the connections for the allocated channels
are hung in the media manager is to kill the Oracle processes of those connections.
The RMAN process detects this termination and proceeds to exit, removing all
connections except target connections that are still operative in the media
management layer. The caveat about the media manager resources still applies
in this case.

To identify and terminate an Oracle process that is hung in the media manager code:

This procedure is system-specific. See your operating system-specific documentation
for the relevant commands.

1. Obtain the current stack trace for the desired process ID using a system-specific
   utility. For example, on Sun Solaris you can use the command pstack located in
   /usr/proc/bin to obtain the stack.
2. After the stack is obtained, look for the process with SBTxxxx (normally sbtopen)
   as one of its top calls. Note that other layers may appear on top of it.
3. Obtain the stack again after a few minutes. If the same stack trace is returned,
   then you have identified the hung process.
4. Kill the hung process using a system-specific utility. For example,
   on Sun Solaris execute a kill -9 command.
5. Repeat this procedure for all channels hung in the media management code.
6. Check that the media manager also clears its processes; otherwise the next
   backup or restore may still hang because of the previous hang. In some media
   managers, the only solution is to shut down and restart the media manager
   daemons. If the documentation from the media manager is unhelpful, ask the
   media manager technical support for the correct solution.
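
Steps 1 to 4 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive tool: it assumes a pstack utility is on the PATH, the function names are hypothetical, the search for "sbt" in the first five lines is a rough heuristic for "one of its top calls", and the sample stack text is made up for illustration.

```python
import os
import signal
import subprocess


def capture_stack(pid):
    """Step 1: return the stack of a process as text (assumes pstack on PATH)."""
    result = subprocess.run(
        ["pstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def has_sbt_call(stack_text, top_frames=5):
    """Step 2: True if an SBT call (for example sbtopen) is among the top frames.

    top_frames=5 is an arbitrary allowance for other layers on top of it.
    """
    frames = [line for line in stack_text.splitlines() if line.strip()]
    return any("sbt" in frame.lower() for frame in frames[:top_frames])


def looks_hung(first_stack, second_stack):
    """Steps 2 and 3: an SBT call near the top and an unchanged stack."""
    return has_sbt_call(first_stack) and first_stack.strip() == second_stack.strip()


def kill_hung_process(pid):
    """Step 4: the equivalent of kill -9 on UNIX."""
    os.kill(pid, signal.SIGKILL)


# Made-up stack text, for illustration only (not real pstack output)
sample = "read\nsbtopen\nbackup_piece_io\nmain\n"
print(looks_hung(sample, sample))           # same stack captured twice
print(looks_hung(sample, "write\nmain\n"))  # stack changed between captures
```

In practice the two stacks would come from capture_stack called a few minutes apart, and kill_hung_process would be called only after a person has reviewed the result.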

Backup Job Is Hanging

In this scenario, an RMAN backup job starts as normal and then pauses inexplicably:

Recovery Manager: Release 8.1.5.0.0 - Production
RMAN-06005: connected to target database: TORPEDO
RMAN-06008: connected to recovery catalog database
RMAN> run {
2> allocate channel t1 type "SBT_TAPE";
3> backup
4> tablespace system,users; }
RMAN-03022: compiling command: allocate
RMAN-03023: executing command: allocate
RMAN-08030: allocated channel: t1
RMAN-08500: channel t1: sid=16 devtype=SBT_TAPE
RMAN-03022: compiling command: backup
RMAN-03023: executing command: backup
RMAN-08008: channel t1: starting datafile backupset
RMAN-08502: set_count=15 set_stamp=338309600
RMAN-08010: channel t1: including datafile 2 in backupset
RMAN-08010: channel t1: including datafile 1 in backupset
RMAN-08011: channel t1: including current controlfile in backupset
# Hanging here for 30 min now

Diagnosis of the Cause

If a backup job is hanging, that is, not proceeding, then several scenarios
are possible:

- The job abnormally terminated.
- A server-side or media management error occurred.
- RMAN is waiting for an event such as the insertion of a new cassette into
  the tape device.

Your first task is to try to determine which of these scenarios is the most
likely cause.

To determine the cause of the hang:

1. If you are using a media manager, examine media manager process, log, and trace
   files for signs of abnormal termination or other errors (see the description of
   message files in “Identifying Types of Message Output”). If this information is
   not helpful, proceed to the next step.

2. Restart RMAN and turn on debugging, making sure to specify a trace file to
   contain the output. For example, enter:

   % rman target / catalog rman/rman@catdb debug trace = /oracle/log

3. Re-execute the job:

   run {
   allocate channel c1 type 'sbt_tape';
   backup tablespace system;
   }

4. Examine the debugging output to determine where RMAN is hanging.
   The output will most likely indicate that the last RPC sent from the
   client to the server was SYS.DBMS_BACKUP_RESTORE.BACKUPPIECECREATE,
   which is the call that causes the server to interact with the media
   manager to write the backup data:

   krmxrpc: xc=6897512 starting long running RPC #13 to target: DBMS_BACKUP_RESTORE.BACKUPPIECECREATE
   krmxr: xc=6897512 started long running rpc

5. Check to see what the server processes performing the backup are doing.
   How many processes are hanging? If only one, check to see what it is doing
   by querying V$SESSION_WAIT. For example, to determine what the session with
   SID 12 is doing, enter:

   SELECT * FROM v$session_wait WHERE wait_time = 0 AND sid = 12;

6. If a backup to tape stalls at the beginning, issue the following query.

   For release 8.0:

   SELECT * FROM v$session_longops WHERE compnam = 'dbms_backup_restore';

   For releases 8.1 and 9.0:

   SELECT * FROM v$session_longops WHERE substr(opname,1,4) = 'RMAN';

   If Oracle returns no information, then the PL/SQL program performing the backup
   is hung.
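
Step 4 above, finding the last RPC that RMAN started, can be automated when the debug trace is long. The Python sketch below is an illustration only: the function name is hypothetical, and the pattern is modelled solely on the sample trace line shown in step 4, so it is an assumption that real trace files from a given release use the same wording and keep the call name on one line.

```python
import re

# Pattern modelled on the sample trace line in step 4; real trace files may
# differ between releases, so treat this as an assumption.
RPC_START = re.compile(r"starting long running RPC #(\d+) to (\w+): (\S+)")


def last_long_running_rpc(trace_text):
    """Return (rpc_number, destination, call_name) of the last RPC started, or None."""
    last = None
    for line in trace_text.splitlines():
        match = RPC_START.search(line)
        if match:
            last = (int(match.group(1)), match.group(2), match.group(3))
    return last


# The two trace lines quoted in step 4
trace = (
    "krmxrpc: xc=6897512 starting long running RPC #13 to target: "
    "DBMS_BACKUP_RESTORE.BACKUPPIECECREATE\n"
    "krmxr: xc=6897512 started long running rpc\n"
)
print(last_long_running_rpc(trace))
```

If the last call reported is BACKUPPIECECREATE, the server is interacting with the media manager, which points the investigation at steps 5 and 6.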

Solution

Because the causes of a hung backup job can be varied, so are the solutions.
The best practice is to look for the simplest solutions first. For example,
it is quite common for backup jobs to hang simply because the tape device has
completely filled the current cassette and is waiting for a new tape to be
inserted. Look for the obvious in all components used for the backup when
problems occur.

Oracle, rman-dataguard
