Oracle DBA Basement

Friday, July 5, 2013

Connect to RMAN over SQL*Net fails with ORA-27211: Failed to load Media Management Library

Connecting to RMAN works locally (target /) but fails over SQL*Net with "ORA-27211: Failed to load Media Management Library". Media manager software in this case was EMC Data Domain Boost.

Encountered this error when RMAN cloning (DUPLICATE) an RAC target database. Solution was to set the LD_LIBRARY_PATH in the LISTENER.ORA for the target instance to reference the ORACLE_HOME/lib essentially, media manager libraries. If the LD_LIBRARY_PATH is not required to be set for your Media Management software there are other things to also check, read on.

Allocating a channel from a local connection OK:

$ export LD_LIBRARY_PATH=/app/oracle/product/11.2.0/db_1/lib:${LD_LIBRARY_PATH}
$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Mon Jun 24 16:52:02 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: REMODB (DBID=3746514001)

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

using target database control file instead of recovery catalog
allocated channel: ch1
channel ch1: SID=85 instance=REMODB1 device type=SBT_TAPE
channel ch1: Data Domain Boost API
released channel: ch1

Allocate a channel via an SQL*Net connection results in error:

$ rman target sys@remodb

Recovery Manager: Release 11.2.0.3.0 - Production on Mon Jun 24 16:52:56 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

target database Password:
connected to target database: REMODB (DBID=3746514001)

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on ch1 channel at 06/24/2013 16:53:18
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library
Additional information: 3977

First check - Process of elimination. Since the target is a RAC instance, connecting to a cluster instance could end up on any node. Eliminate the possbility of inconsistent media manager installations across nodes by creating a tnsnames entry (setup TNS_ADMIN) and connect only to one instance for testing. Then try another instance to test any differences until all the instances are fine.

Second check - Since connections over SQL*Net create a dedicated server process as the same user of the running listener, and in this case the listener was running under Grid Infrastructure (Clusterware) as the 'grid' user, verify the media manager shared objects are readable by the listener "owner", 'grid' in this case under the database product (ORACLE_HOME of target database). You can test by creating a seperate listener for the instance on a different port as the 'oracle' user (or whichever user owns the database product), create a new tnsnames entry (like in the first check) and test. After trying this, in our case the same error still occurred but eliminated another possibility.

Third check - Does LD_LIBRARY_PATH get used over SQL*NET when setting as an environment variable? The ENVS parameter can be set in the LISTENER.ORA for the target instance. As it turns out LD_LIBRARY_PATH set locally does not apply over SQL*Net.

listener.ora

LISTENER_DUP =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = myserver01)(PORT = 1522))
)
)

SID_LIST_LISTENER_DUP =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = REMODB)
(ORACLE_HOME = /app/oracle/product/11.2.0/db_1)
(SID_NAME = REMODB1)
(ENVS="LD_LIBRARY_PATH=/app/oracle/product/11.2.0/db_1/lib")
)
)

oracle@:/home/oracle/dba/keskins/tns_admin$ lsnrctl start listener_dup

LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 05-JUL-2013 15:07:51

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Starting /app/oracle/product/11.2.0/db_1/bin/tnslsnr: please wait...

TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
...
The command completed successfully

oracle@:/home/oracle/dba/keskins/tns_admin$ tnsping remodb_dup
...
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = remodb01)
(PORT = 1522)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = REMODB)))
OK (30 msec)

Now test:


oracle@:/home/oracle/dba/keskins/tns_admin$ rman target sys@remodb_dup

Recovery Manager: Release 11.2.0.3.0 - Production on Fri Jul 5 15:08:03 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

target database Password:
connected to target database: REMODB (DBID=3746514001)

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

using target database control file instead of recovery catalog
allocated channel: ch1
channel ch1: SID=118 instance=REMODB1 device type=SBT_TAPE
channel ch1: Data Domain Boost API
released channel: ch1

RMAN>

Solution - set ENVS with LD_LIBRARY_PATH in LISTENER.ORA for target instance. LD_LIBRARY_PATH does not get loaded over SQL*Net when set locally at the SHELL.

One last thing I came across in testing, loading a dependant media manager library first (libDDBoost.so) which is not a "libobk.so" object, and then libobk library allocated a channel OK. Even without setting LD_LIBRARY_PATH RMAN can load shared objects by force using this workaround - wouldn't call it supported.

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libDDBoost.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on ch1 channel at 06/24/2013 16:53:34
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27212: some entrypoints in Media Management Library are missing
SVR4 Error: 2: No such file or directory

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

allocated channel: ch1
channel ch1: SID=69 instance=REMODB2 device type=SBT_TAPE
channel ch1: Data Domain Boost API
released channel: ch1

Thursday, June 13, 2013

Solaris 10 Memory usage

Quick guide to determine if Solaris 10 is low on memory:

1.Firstly, how much “physical” memory is allocated:

$ /usr/sbin/prtconf
System Configuration:  Oracle Corporation  sun4v
Memory size: 16384 Megabytes
....

2. Is revolutions of the clock hand non-zero and increasing (say 1+ a minute). Revolutions of the clock is set to 0 on system startup. If it is non-zero but everything appears fine now then it indicates there may have been a problem in the past.

vmstat -s
....
      843 revolutions of the clock hand
....

3. Scan rate in vmstat, consistently non-zero value in the hundreds+

vmstat 1 1000 | awk '{printf("sr = %d\n",$12);}'

4.Is current memory free less than a few hundred MB? pagesize = 8192, used for calculation below (sr in vmstat reported in pages).

$ /usr/bin/pagesize
8192

(in GB):

$ sar -r 1 1000 | awk '{printf("memfree %.2f\n",($2*8192)/1024/1024/1024);}'
...
memfree 0.31
memfree 0.30
memfree 0.28
...

5. Is the ‘w’ (wait) column in vmstat consistently non-zero? w column:

$ vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr vc vc vc vc   in   sy   cs us sy id
 0 0 105 20514416 6943200 379 532 328 702 1539 0 26970 35 26 31 2 5342 14534 8187 3 4 93
 1 0 93 30418504 12141008 33 32 0 0 0 0  0  0  0  0  0 1212 3252 1163  0  1 99
 0 0 93 30418312 12141072 30 14 0 0 0 0  0  0  0  1  0 1172 1657 1089  0  1 99
 1 0 93 30410120 12136976 30 14 0 0 0 0  0  0  0  0  0 1149 1618 1080  0  0 99

If yes to all above, what is using the most memory?

prstat -s size -Z
...

  9705 grid      487M  470M sleep   59    0   0:00:00 0.0% oracle/1
  9695 grid      487M  472M sleep   59    0   0:00:00 0.0% oracle/1
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     0      182   13G   13G    84%   6:07:50 6.2% global
...

Thursday, February 21, 2013

Packaging an incident using ADRCI for Oracle Support

Sending a packaged incident to Oracle Support.

adrci> show homes
ADR Homes:
diag/clients/user_grid/host_1359114779_80
diag/clients/user_oracle/host_1359114779_80
diag/clients/user_oracle/host_1359114779_11
diag/rdbms/mydba/MYDBA1
diag/rdbms/mydbb/MYDBB1
diag/asm/+asm/+ASM1

adrci> set homepath diag/rdbms/mydba/MYDBA1

adrci> ips create package
Created package 1 without any contents, correlation level typical

adrci> ips show package
   PACKAGE_ID             1
   PACKAGE_NAME
   PACKAGE_DESCRIPTION
   DRIVING_PROBLEM        N/A
   DRIVING_PROBLEM_KEY    N/A
   DRIVING_INCIDENT       N/A
   DRIVING_INCIDENT_TIME  N/A
   STATUS                 New (0)
   CORRELATION_LEVEL      Typical (2)
   PROBLEMS               0 main problems, 0 correlated problems
   INCIDENTS              0 main incidents, 0 correlated incidents
   INCLUDED_FILES         0



adrci> ips add incident 7930 package 1;
Added incident 7930 to package 1

adrci> ips show package 1
DETAILS FOR PACKAGE 1:
   PACKAGE_ID             1
   PACKAGE_NAME
   PACKAGE_DESCRIPTION
   DRIVING_PROBLEM        N/A
   DRIVING_PROBLEM_KEY    N/A
   DRIVING_INCIDENT       N/A
   DRIVING_INCIDENT_TIME  N/A
   STATUS                 New (0)
   CORRELATION_LEVEL      Typical (2)
   PROBLEMS               1 main problems, 1 correlated problems
   INCIDENTS              1 main incidents, 11 correlated incidents
   INCLUDED_FILES         28
   SEQUENCES              Last 0, last full 0, last base 0
   UNPACKED               FALSE
   CREATE_TIME            2012-10-16 10:37:13.227548 +10:00
   UPDATE_TIME            2012-10-16 11:10:24.762948 +10:00
   BEGIN_TIME             N/A
   END_TIME               N/A
   FLAGS                  0
adrci> 

adrci> ips finalize package 1
Finalized package 1
adrci> show packages

adrci> ips generate package 1 in /mydir/mypackages/ips
Generated package 1 in file /mydir/mypackages/ips/IPSPKG_20121016103713_COM_1.zip, mode complete
adrci> exit

Thursday, February 7, 2013

Data pump, ORA-12161 TNS bad packet, ORA-12592 internal error.

Datapump import over database link failed with:
ORA-12161: TNS:internal error: partial data received
ORA-12592: TNS:bad packet

Solution was to disable ALG on the Juniper firewall. This is similar to another problem when using SCAN listeners and ALG enabled in the firewall.

Processing object type SCHEMA_EXPORT/TABLE/TABLE
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [TABLE:"ME"."MYTAB"]
ORA-12161: TNS:internal error: partial data received
ORA-02063: preceding line from MYDB_LINK

ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
3f7ed28b0     20462  package body SYS.KUPW$WORKER
3f7ed28b0      9028  package body SYS.KUPW$WORKER
3f7ed28b0     10935  package body SYS.KUPW$WORKER
3f7ed28b0      2728  package body SYS.KUPW$WORKER
3f7ed28b0      9697  package body SYS.KUPW$WORKER
3f7ed28b0      1775  package body SYS.KUPW$WORKER
3f7c070a8         2  anonymous block

ORA-39126: Worker unexpected fatal error in KUPW$WORKER.DISPATCH_WORK_ITEMS [TABLE:"ME"."MYOTHERTAB"]
ORA-31600: invalid input value 100001 for parameter HANDLE in function CLOSE
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: at "SYS.DBMS_METADATA", line 544
ORA-06512: at "SYS.DBMS_METADATA", line 4187
ORA-06512: at "SYS.DBMS_METADATA", line 1475
ORA-06512: at "SYS.DBMS_METADATA", line 7481
ORA-06512: at "SYS.KUPW$WORKER", line 2792
ORA-02067: transaction or savepoint rollback required
ORA-06512: at "SYS.DBMS_METADATA", line 1475
ORA-06512: at "SYS.DBMS_METADATA", line 7481
ORA-06512: at "SYS.KUPW$WORKER", line 10928
ORA-12592: TNS:bad packet
ORA-02055: distributed update operation failed; rollback required
ORA-02063: preceding lines from MYDB_LINK

ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
3f7ed28b0     20462  package body SYS.KUPW$WORKER
3f7ed28b0      9028  package body SYS.KUPW$WORKER
3f7ed28b0      9831  package body SYS.KUPW$WORKER
3f7ed28b0      1775  package body SYS.KUPW$WORKER
3f7c070a8         2  anonymous block

Job "ME"."IMPDP_TEST_JOB" stopped due to fatal error at 10:52:42

Thursday, October 4, 2012

ORA-03135: connection lost contact

A firewall timeout may cause an idle session to receive an "ORA-03135: connection lost contact" when running an SQL.

SQL> select to_char(sysdate,'DD.MON.YYYY HH24:MI:SS') from dual

TO_CHAR(SYSDATE,'DD.MON.YYYYH
-----------------------------
28.SEP.2012 10:52:01

Elapsed: 00:00:00.01

(Delay for 35+ minutes)

SQL> r
  1* select to_char(sysdate,'DD.MON.YYYY HH24:MI:SS') from dual
select to_char(sysdate,'DD.MON.YYYY HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-03135: connection lost contact


Elapsed: 00:05:22.63

Note the elapsed time of 5 mintes 22 seconds.

To debug enable a client side SQL*net trace and reproduce the error. In this case after several tests the error occured after 30 minutes.

Oracle 11.1+ database versions mask the "ORA-03135" error and return a result after the 5 minutes which may lead the DBA down the wrong path believing there is a problem with the database.

For 11g the DIAG_ADR_ENABLED=off must be set otherwise trace files will go to the DIAGNOSTICS destination.

To not interfer with other users create the SQLNET.ORA and TNSNAMES.ORA in your own directory and use them by setting the TNS_ADMIN variable.

SQLNET.ORA


TNSPING.TRACE_LEVEL=admin
TNSPING.TRACE_DIRECTORY=/home/oracle/dba/user1/tns_admin/trace/
TRACE_LEVEL_CLIENT=admin
TRACE_DIRECTORY_CLIENT=/home/oracle/dba/user1/tns_admin/trace/
TRACE_UNIQUE_CLIENT=on
TRACE_TIMESTAMP_CLIENT=on
TRACE_TIMESTAMP_SERVER=true
TRACE_FILE_CLIENT=client_trace
DIAG_ADR_ENABLED=off

Information gathered in the trace file:

[28-SEP-2012 11:29:21:622] nioqsm: send-break: failed to send oob break...
[28-SEP-2012 11:29:21:622] nioqper:  error from send-marker
[28-SEP-2012 11:29:21:622] nioqper:    ns main err code: 12547
[28-SEP-2012 11:29:21:622] nioqper:    ns (2)  err code: 12560
[28-SEP-2012 11:29:21:622] nioqper:    nt main err code: 517
[28-SEP-2012 11:29:21:622] nioqper:    nt (2)  err code: 32
[28-SEP-2012 11:29:21:622] nioqper:    nt OS   err code: 0
[28-SEP-2012 11:29:21:622] nioqsm: exit
[28-SEP-2012 11:29:21:622] nioqer: entry
[28-SEP-2012 11:29:21:622] nioqce: entry
[28-SEP-2012 11:29:21:622] nioqce: exit
[28-SEP-2012 11:29:21:623] nioqer: exit
[28-SEP-2012 11:29:21:623] nioqrs: nioqrs: Couldn't send break. returning 3135

The trace file data above matches what is in MOS note:

ORA-3135 with Recovery Catalog Creation Across the Network (Firewall included) [ID 805088.1]

"where nt[1]=32 is Operating System Dependent(OSD) error code.

OSD error is Err#32 Broken pipe. This OSD error is also defined in errno.h:
* #define EPIPE 32 /* Broken pipe */"

Resolution was to set an SQLNET.EXPIRE_TIME=15 in the database product $ORACLE_HOME/network/admin/sqlnet.ora (choose 15 minutes because it was less than the 30 minutes the firewall was closing the session).

Other references

Troubleshooting ORA-3135 Connection Lost Contact [ID 787354.1]