Friday, July 5, 2013

Connect to RMAN over SQL*Net fails with ORA-27211: Failed to load Media Management Library

Connecting to RMAN works locally (target /) but fails over SQL*Net with "ORA-27211: Failed to load Media Management Library". Media manager software in this case was EMC Data Domain Boost.

Encountered this error when RMAN cloning (DUPLICATE) an RAC target database. Solution was to set the LD_LIBRARY_PATH in the LISTENER.ORA for the target instance to reference the ORACLE_HOME/lib essentially, media manager libraries. If the LD_LIBRARY_PATH is not required to be set for your Media Management software there are other things to also check, read on.

Allocating a channel from a local connection OK:

$ export LD_LIBRARY_PATH=/app/oracle/product/11.2.0/db_1/lib:${LD_LIBRARY_PATH}
$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Mon Jun 24 16:52:02 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: REMODB (DBID=3746514001)

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

using target database control file instead of recovery catalog
allocated channel: ch1
channel ch1: SID=85 instance=REMODB1 device type=SBT_TAPE
channel ch1: Data Domain Boost API
released channel: ch1
Allocate a channel via an SQL*Net connection results in error:
$ rman target sys@remodb

Recovery Manager: Release 11.2.0.3.0 - Production on Mon Jun 24 16:52:56 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

target database Password:
connected to target database: REMODB (DBID=3746514001)

RMAN> run
2> {
3> allocate channel ch1 device type sbt TRACE 2 PARMS 
'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
4> }

using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on ch1 channel at 06/24/2013 16:53:18
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library
Additional information: 3977
  • First check - Process of elimination. Since the target is a RAC instance, connecting to a cluster instance could end up on any node. Eliminate the possbility of inconsistent media manager installations across nodes by creating a tnsnames entry (setup TNS_ADMIN) and connect only to one instance for testing. Then try another instance to test any differences until all the instances are fine.
  • Second check - Since connections over SQL*Net create a dedicated server process as the same user of the running listener, and in this case the listener was running under Grid Infrastructure (Clusterware) as the 'grid' user, verify the media manager shared objects are readable by the listener "owner", 'grid' in this case under the database product (ORACLE_HOME of target database). You can test by creating a seperate listener for the instance on a different port as the 'oracle' user (or whichever user owns the database product), create a new tnsnames entry (like in the first check) and test. After trying this, in our case the same error still occurred but eliminated another possibility.
  • Third check - Does LD_LIBRARY_PATH get used over SQL*NET when setting as an environment variable? The ENVS parameter can be set in the LISTENER.ORA for the target instance. As it turns out LD_LIBRARY_PATH set locally does not apply over SQL*Net.

    listener.ora

    LISTENER_DUP =
    (DESCRIPTION_LIST =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myserver01)(PORT = 1522))
    )
    )
    
    SID_LIST_LISTENER_DUP =
    (SID_LIST =
    (SID_DESC =
    (GLOBAL_DBNAME = REMODB)
    (ORACLE_HOME = /app/oracle/product/11.2.0/db_1)
    (SID_NAME = REMODB1)
    (ENVS="LD_LIBRARY_PATH=/app/oracle/product/11.2.0/db_1/lib")
    )
    )
    
    oracle@:/home/oracle/dba/keskins/tns_admin$ lsnrctl start listener_dup
    
    LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 05-JUL-2013 15:07:51
    
    Copyright (c) 1991, 2011, Oracle. All rights reserved.
    
    Starting /app/oracle/product/11.2.0/db_1/bin/tnslsnr: please wait...
    
    TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
    ...
    The command completed successfully
    
    oracle@:/home/oracle/dba/keskins/tns_admin$ tnsping remodb_dup
    ...
    Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = remodb01)
    (PORT = 1522)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = REMODB)))
    OK (30 msec)
    
    
    Now test:
    
    oracle@:/home/oracle/dba/keskins/tns_admin$ rman target sys@remodb_dup
    
    Recovery Manager: Release 11.2.0.3.0 - Production on Fri Jul 5 15:08:03 2013
    
    Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
    
    target database Password:
    connected to target database: REMODB (DBID=3746514001)
    
    RMAN> run
    2> {
    3> allocate channel ch1 device type sbt TRACE 2 PARMS 
    'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
    BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
    4> }
    
    using target database control file instead of recovery catalog
    allocated channel: ch1
    channel ch1: SID=118 instance=REMODB1 device type=SBT_TAPE
    channel ch1: Data Domain Boost API
    released channel: ch1
    
    RMAN> 
    
    
    Solution - set ENVS with LD_LIBRARY_PATH in LISTENER.ORA for target instance. LD_LIBRARY_PATH does not get loaded over SQL*Net when set locally at the SHELL.

    One last thing I came across in testing, loading a dependant media manager library first (libDDBoost.so) which is not a "libobk.so" object, and then libobk library allocated a channel OK. Even without setting LD_LIBRARY_PATH RMAN can load shared objects by force using this workaround - wouldn't call it supported.

    RMAN> run
    2> {
    3> allocate channel ch1 device type sbt TRACE 2 PARMS 
    'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libDDBoost.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
    BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
    4> }
    
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03009: failure of allocate command on ch1 channel at 06/24/2013 16:53:34
    ORA-19554: error allocating device, device type: SBT_TAPE, device name:
    ORA-27212: some entrypoints in Media Management Library are missing
    SVR4 Error: 2: No such file or directory
    
    RMAN> run
    2> {
    3> allocate channel ch1 device type sbt TRACE 2 PARMS 
    'SBT_LIBRARY=/app/oracle/product/11.2.0/db_1/lib/libddobk.so,BLKSIZE=1048576,ENV=(STORAGE_UNIT=rmanbackups,
    BACKUP_HOST=bkdom01,ORACLE_HOME=/app/oracle/product/11.2.0/db_1)';
    4> }
    
    allocated channel: ch1
    channel ch1: SID=69 instance=REMODB2 device type=SBT_TAPE
    channel ch1: Data Domain Boost API
    released channel: ch1
    

  • Thursday, June 13, 2013

    Solaris 10 Memory usage

    Quick guide to determine if Solaris 10 is low on memory:

    1.Firstly, how much “physical” memory is allocated:

    $ /usr/sbin/prtconf
    System Configuration:  Oracle Corporation  sun4v
    Memory size: 16384 Megabytes
    ....
    

    2. Is revolutions of the clock hand non-zero and increasing (say 1+ a minute). Revolutions of the clock is set to 0 on system startup. If it is non-zero but everything appears fine now then it indicates there may have been a problem in the past.

    vmstat -s
    ....
          843 revolutions of the clock hand
    ....
    

    3. Scan rate in vmstat, consistently non-zero value in the hundreds+

    vmstat 1 1000 | awk '{printf("sr = %d\n",$12);}' 
     

    4.Is current memory free less than a few hundred MB? pagesize = 8192, used for calculation below (sr in vmstat reported in pages).

    $ /usr/bin/pagesize
    8192
    
    (in GB):
    $ sar -r 1 1000 | awk '{printf("memfree %.2f\n",($2*8192)/1024/1024/1024);}'
    ...
    memfree 0.31
    memfree 0.30
    memfree 0.28
    ...
    

    5. Is the ‘w’ (wait) column in vmstat consistently non-zero? w column:

    $ vmstat 1
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr vc vc vc vc   in   sy   cs us sy id
     0 0 105 20514416 6943200 379 532 328 702 1539 0 26970 35 26 31 2 5342 14534 8187 3 4 93
     1 0 93 30418504 12141008 33 32 0 0 0 0  0  0  0  0  0 1212 3252 1163  0  1 99
     0 0 93 30418312 12141072 30 14 0 0 0 0  0  0  0  1  0 1172 1657 1089  0  1 99
     1 0 93 30410120 12136976 30 14 0 0 0 0  0  0  0  0  0 1149 1618 1080  0  0 99
    

    If yes to all above, what is using the most memory?

    prstat -s size -Z
    ...
    
      9705 grid      487M  470M sleep   59    0   0:00:00 0.0% oracle/1
      9695 grid      487M  472M sleep   59    0   0:00:00 0.0% oracle/1
    ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
         0      182   13G   13G    84%   6:07:50 6.2% global
    ...
    

    Thursday, February 21, 2013

    Packaging an incident using ADRCI for Oracle Support

    Sending a packaged incident to Oracle Support.

    adrci> show homes
    ADR Homes:
    diag/clients/user_grid/host_1359114779_80
    diag/clients/user_oracle/host_1359114779_80
    diag/clients/user_oracle/host_1359114779_11
    diag/rdbms/mydba/MYDBA1
    diag/rdbms/mydbb/MYDBB1
    diag/asm/+asm/+ASM1
    
    adrci> set homepath diag/rdbms/mydba/MYDBA1
    
    adrci> ips create package
    Created package 1 without any contents, correlation level typical
    
    adrci> ips show package
       PACKAGE_ID             1
       PACKAGE_NAME
       PACKAGE_DESCRIPTION
       DRIVING_PROBLEM        N/A
       DRIVING_PROBLEM_KEY    N/A
       DRIVING_INCIDENT       N/A
       DRIVING_INCIDENT_TIME  N/A
       STATUS                 New (0)
       CORRELATION_LEVEL      Typical (2)
       PROBLEMS               0 main problems, 0 correlated problems
       INCIDENTS              0 main incidents, 0 correlated incidents
       INCLUDED_FILES         0
    
    
    
    adrci> ips add incident 7930 package 1;
    Added incident 7930 to package 1
    
    adrci> ips show package 1
    DETAILS FOR PACKAGE 1:
       PACKAGE_ID             1
       PACKAGE_NAME
       PACKAGE_DESCRIPTION
       DRIVING_PROBLEM        N/A
       DRIVING_PROBLEM_KEY    N/A
       DRIVING_INCIDENT       N/A
       DRIVING_INCIDENT_TIME  N/A
       STATUS                 New (0)
       CORRELATION_LEVEL      Typical (2)
       PROBLEMS               1 main problems, 1 correlated problems
       INCIDENTS              1 main incidents, 11 correlated incidents
       INCLUDED_FILES         28
       SEQUENCES              Last 0, last full 0, last base 0
       UNPACKED               FALSE
       CREATE_TIME            2012-10-16 10:37:13.227548 +10:00
       UPDATE_TIME            2012-10-16 11:10:24.762948 +10:00
       BEGIN_TIME             N/A
       END_TIME               N/A
       FLAGS                  0
    adrci> 
    
    adrci> ips finalize package 1
    Finalized package 1
    adrci> show packages
    
    adrci> ips generate package 1 in /mydir/mypackages/ips
    Generated package 1 in file /mydir/mypackages/ips/IPSPKG_20121016103713_COM_1.zip, mode complete
    adrci> exit
    

    Thursday, February 7, 2013

    Data pump, ORA-12161 TNS bad packet, ORA-12592 internal error.

    Datapump import over database link failed with:
    ORA-12161: TNS:internal error: partial data received
    ORA-12592: TNS:bad packet

    Solution was to disable ALG on the Juniper firewall. This is similar to another problem when using SCAN listeners and ALG enabled in the firewall.

    Processing object type SCHEMA_EXPORT/TABLE/TABLE
    ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [TABLE:"ME"."MYTAB"]
    ORA-12161: TNS:internal error: partial data received
    ORA-02063: preceding line from MYDB_LINK
    
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
    ORA-06512: at "SYS.KUPW$WORKER", line 9001
    
    ----- PL/SQL Call Stack -----
      object      line  object
      handle    number  name
    3f7ed28b0     20462  package body SYS.KUPW$WORKER
    3f7ed28b0      9028  package body SYS.KUPW$WORKER
    3f7ed28b0     10935  package body SYS.KUPW$WORKER
    3f7ed28b0      2728  package body SYS.KUPW$WORKER
    3f7ed28b0      9697  package body SYS.KUPW$WORKER
    3f7ed28b0      1775  package body SYS.KUPW$WORKER
    3f7c070a8         2  anonymous block
    
    ORA-39126: Worker unexpected fatal error in KUPW$WORKER.DISPATCH_WORK_ITEMS [TABLE:"ME"."MYOTHERTAB"]
    ORA-31600: invalid input value 100001 for parameter HANDLE in function CLOSE
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
    ORA-06512: at "SYS.DBMS_METADATA", line 544
    ORA-06512: at "SYS.DBMS_METADATA", line 4187
    ORA-06512: at "SYS.DBMS_METADATA", line 1475
    ORA-06512: at "SYS.DBMS_METADATA", line 7481
    ORA-06512: at "SYS.KUPW$WORKER", line 2792
    ORA-02067: transaction or savepoint rollback required
    ORA-06512: at "SYS.DBMS_METADATA", line 1475
    ORA-06512: at "SYS.DBMS_METADATA", line 7481
    ORA-06512: at "SYS.KUPW$WORKER", line 10928
    ORA-12592: TNS:bad packet
    ORA-02055: distributed update operation failed; rollback required
    ORA-02063: preceding lines from MYDB_LINK
    
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
    ORA-06512: at "SYS.KUPW$WORKER", line 9001
    
    ----- PL/SQL Call Stack -----
      object      line  object
      handle    number  name
    3f7ed28b0     20462  package body SYS.KUPW$WORKER
    3f7ed28b0      9028  package body SYS.KUPW$WORKER
    3f7ed28b0      9831  package body SYS.KUPW$WORKER
    3f7ed28b0      1775  package body SYS.KUPW$WORKER
    3f7c070a8         2  anonymous block
    
    Job "ME"."IMPDP_TEST_JOB" stopped due to fatal error at 10:52:42
    

    Thursday, October 4, 2012

    ORA-03135: connection lost contact

    A firewall timeout may cause an idle session to receive an "ORA-03135: connection lost contact" when running an SQL.
    SQL> select to_char(sysdate,'DD.MON.YYYY HH24:MI:SS') from dual
    
    TO_CHAR(SYSDATE,'DD.MON.YYYYH
    -----------------------------
    28.SEP.2012 10:52:01
    
    Elapsed: 00:00:00.01
    
    (Delay for 35+ minutes)
    SQL> r
      1* select to_char(sysdate,'DD.MON.YYYY HH24:MI:SS') from dual
    select to_char(sysdate,'DD.MON.YYYY HH24:MI:SS') from dual
    *
    ERROR at line 1:
    ORA-03135: connection lost contact
    
    
    Elapsed: 00:05:22.63
    
    Note the elapsed time of 5 mintes 22 seconds.

    To debug enable a client side SQL*net trace and reproduce the error. In this case after several tests the error occured after 30 minutes.

    Oracle 11.1+ database versions mask the "ORA-03135" error and return a result after the 5 minutes which may lead the DBA down the wrong path believing there is a problem with the database.

    For 11g the DIAG_ADR_ENABLED=off must be set otherwise trace files will go to the DIAGNOSTICS destination.

    To not interfer with other users create the SQLNET.ORA and TNSNAMES.ORA in your own directory and use them by setting the TNS_ADMIN variable.

    SQLNET.ORA

    
    TNSPING.TRACE_LEVEL=admin
    TNSPING.TRACE_DIRECTORY=/home/oracle/dba/user1/tns_admin/trace/
    TRACE_LEVEL_CLIENT=admin
    TRACE_DIRECTORY_CLIENT=/home/oracle/dba/user1/tns_admin/trace/
    TRACE_UNIQUE_CLIENT=on
    TRACE_TIMESTAMP_CLIENT=on
    TRACE_TIMESTAMP_SERVER=true
    TRACE_FILE_CLIENT=client_trace
    DIAG_ADR_ENABLED=off
    
    Information gathered in the trace file:
    [28-SEP-2012 11:29:21:622] nioqsm: send-break: failed to send oob break...
    [28-SEP-2012 11:29:21:622] nioqper:  error from send-marker
    [28-SEP-2012 11:29:21:622] nioqper:    ns main err code: 12547
    [28-SEP-2012 11:29:21:622] nioqper:    ns (2)  err code: 12560
    [28-SEP-2012 11:29:21:622] nioqper:    nt main err code: 517
    [28-SEP-2012 11:29:21:622] nioqper:    nt (2)  err code: 32
    [28-SEP-2012 11:29:21:622] nioqper:    nt OS   err code: 0
    [28-SEP-2012 11:29:21:622] nioqsm: exit
    [28-SEP-2012 11:29:21:622] nioqer: entry
    [28-SEP-2012 11:29:21:622] nioqce: entry
    [28-SEP-2012 11:29:21:622] nioqce: exit
    [28-SEP-2012 11:29:21:623] nioqer: exit
    [28-SEP-2012 11:29:21:623] nioqrs: nioqrs: Couldn't send break. returning 3135
    
    The trace file data above matches what is in MOS note:
  • ORA-3135 with Recovery Catalog Creation Across the Network (Firewall included) [ID 805088.1]
    "where nt[1]=32 is Operating System Dependent(OSD) error code.
    
    OSD error is Err#32 Broken pipe. This OSD error is also defined in errno.h:
    * #define EPIPE 32 /* Broken pipe */"
    
    Resolution was to set an SQLNET.EXPIRE_TIME=15 in the database product $ORACLE_HOME/network/admin/sqlnet.ora (choose 15 minutes because it was less than the 30 minutes the firewall was closing the session).

    Other references

  • Troubleshooting ORA-3135 Connection Lost Contact [ID 787354.1]