Published on June 25, 2023
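1.1 Login Procedure
A CUDB node contains three types of boards: GEP3 boards, SCXB (DMX) boards and NWI-E boards.
To collect logs you need to log in to these boards, using SecureCRT, a terminal, or any other SSH client.
There are two ways to log in to CUDB:
1) Direct console connection
A direct console connection is not recommended for routine operation and maintenance. It is normally used only for hardware operations, such as replacing a board.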
CUDB console connection settings:
Hardware   Baud rate   Data bits   Parity   Stop bits   Flow control
SCXB       115200      8           None     1           None
GEP3       115200      8           None     1           None
NWI-E      9600        8           None     1           None
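2) Connecting to CUDB through the management network
For routine CUDB operation and maintenance, the recommended method is to log in from the OSS over the management network: use SSH for the SC boards and DMX boards, and Telnet for the NWI.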
CUDB management-network login details:
Node        Method   Port   Username   Password   Login command
CUDB GEP3   SSH      -      root       root       ssh root
DMX         SSH      2024   expert     expert     ssh expert@
NWI         Telnet   23     admin      -          telnet
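As a rough illustration of the login details above, the helper below prints the command line for each board type. It is only a sketch: the function name login_cmd, the board keywords (gep3, dmx, nwi) and the host argument are hypothetical, and the port and user pairs are assumed from the table (2024/expert for DMX, 23 for NWI, default SSH port for GEP3).

```shell
#!/bin/sh
# Sketch: print the login command for a board type.
# Board keywords and the host argument are illustrative assumptions.
login_cmd() {
    board="$1"
    host="$2"
    case "$board" in
        gep3) printf 'ssh root@%s\n' "$host" ;;            # SC/PL boards, SSH as root
        dmx)  printf 'ssh -p 2024 expert@%s\n' "$host" ;;  # DMX, SSH on port 2024
        nwi)  printf 'telnet %s 23\n' "$host" ;;           # NWI, Telnet on port 23
        *)    printf 'unknown board type: %s\n' "$board" >&2; return 1 ;;
    esac
}
```

For example, login_cmd dmx 192.0.2.10 prints the SSH command for a DMX reachable at that (documentation-range) address.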
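1.2 CUDB System Checks
The following checks should normally be included in the daily health check.
1.2.1 Overall CUDB System Check
Verify the status of the whole system. Run the command on one of the CUDB SC boards.
Command:
# cudbSystemStatus
Command description:
This command automatically runs the system status checks shown in the output below.
Expected result: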
Execution date: Tue Mar 25 11:29:36 CST 2014
CUDB Software Version:
!- CUDB DESIGN DISTRIBUTION: CUDB13B CXP9020214/6 R1K
Checking BC clusters:
[Site 1]
SM leader: Node 1 OAM2
Node 10.173.0.2
BC server in SC_2_1 ......... running
BC server in SC_2_2 ......... running (Leader)
BC server in PL_2_5 ......... running
[Site 2]
NoLeader
Node 10.173.0.34
BC server in SC_2_1 ......... running
BC server in SC_2_2 ......... running
BC server in PL_2_5 ......... running
Checking System Monitor BC status in local node:
SM-BC in OAM1 ......... running
SM-BC in OAM2 ......... running
Checking Clusters status:
Node 1:
PL Cluster (2%) ..............................OK
DSG1 Cluster (1%) ............................OK
DSG2 Cluster (1%) ............................OK
DSG3 Cluster (1%) ............................OK
DSG4 Cluster (1%) ............................OK
DSG5 Cluster (1%) ............................OK
DSG6 Cluster (1%) ............................OK
DSG7 Cluster (1%) ............................OK
DSG8 Cluster (1%) ............................OK
DSG9 Cluster (1%) ............................OK
DSG10 Cluster (1%) ...........................OK
DSG11 Cluster (1%) ...........................OK
DSG12 Cluster (1%) ...........................OK
DSG13 Cluster (1%) ...........................OK
Node 2:
PL Cluster (2%) ..............................OK
DSG1 Cluster (1%) ............................OK
DSG2 Cluster (1%) ............................OK
DSG3 Cluster (1%) ............................OK
DSG4 Cluster (1%) ............................OK
DSG5 Cluster (1%) ............................OK
DSG6 Cluster (1%) ............................OK
DSG7 Cluster (1%) ............................OK
DSG8 Cluster (1%) ............................OK
DSG9 Cluster (1%) ............................OK
DSG10 Cluster (1%) ...........................OK
DSG11 Cluster (1%) ...........................OK
DSG12 Cluster (1%) ...........................OK
DSG13 Cluster (1%) ...........................OK
Checking NDB status:
PL NDB's (6/6) ...............................OK
DS1 NDB's (2/2) ..............................OK
DS2 NDB's (2/2) ..............................OK
DS3 NDB's (2/2) ..............................OK
DS4 NDB's (2/2) ..............................OK
DS5 NDB's (2/2) ..............................OK
DS6 NDB's (2/2) ..............................OK
DS7 NDB's (2/2) ..............................OK
DS8 NDB's (2/2) ..............................OK
DS9 NDB's (2/2) ..............................OK
DS10 NDB's (2/2) .............................OK
DS11 NDB's (2/2) .............................OK
DS12 NDB's (2/2) .............................OK
DS13 NDB's (2/2) .............................OK
Checking Replication Channels in the System:
Node | 1 | 2
====================
PLDB ___|__M__|__S1_
DSG 1 __|__M__|__S1_
DSG 2 __|__M__|__S2_
DSG 3 __|__M__|__S1_
DSG 4 __|__M__|__S1_
DSG 5 __|__M__|__S2_
DSG 6 __|__M__|__S2_
DSG 7 __|__M__|__S1_
DSG 8 __|__M__|__S2_
DSG 9 __|__M__|__S1_
DSG 10 _|__M__|__S2_
DSG 11 _|__M__|__S2_
DSG 12 _|__M__|__S1_
DSG 13 _|__M__|__S2_
[Mar 23 12:50:05]( Preventive Maintenance Logchecker has found major error(s). )
Checking MySQL server connection:
MySQL Master Servers connection ..............OK
MySQL Slave Servers connection ...............OK
MySQL Access Servers connection ..............OK
Checking Process:
<Running
System Running
Running in: OAM2
Running
Management Server Process (ndb_mgmd)..........Running
Running
Running
Running
Log Running
<Storage Engine process (ndbd).................Running
Running
Running
MySQL server process (Master).................Running
MySQL server process (Slave)..................Running
MySQL server process (Access).................Running
Running
LDAP FE .
Storage Engine process (ndbd).................Running
Running
Running
MySQL server process (Master).................Running
MySQL server process (Slave)..................Running
MySQL server process (Access).................Running
LDAP FE .Running
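Because the status report is long, it can help to filter a saved copy for lines that do not end in a healthy state. The sketch below assumes that every status line contains a run of at least five dots followed by the state, and that OK, running and Running (optionally followed by "(Leader)") are the healthy states, as in the sample above. The function name find_not_ok is hypothetical.

```shell
#!/bin/sh
# Sketch: read saved cudbSystemStatus output on stdin and print every
# dotted status line whose state is not OK / running / Running.
# Exit status is 1 when at least one such line was printed, otherwise 0.
find_not_ok() {
    awk '
        /\.\.\.\.\./ && !/\.\.\.\.\. ?(OK|[Rr]unning)( \(Leader\))?[ \t]*$/ {
            print
            bad = 1
        }
        END { exit bad }'
}
```

A possible use is cudbSystemStatus | find_not_ok, which prints nothing when every dotted line reports a healthy state. Lines without a dotted leader, such as the replication channel table, are not examined.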
1.2.2 HA State Check
On the active CUDB OAM board, verify that all GEP3 boards have joined the cluster.
Command:
# cudbHaState
Expected result:
LOTC cluster uptime:
--------------------
Thu Mar 27 18:13:44 2014
LOTC cluster state:
-------------------
Node safNode=SC_2_1 joined cluster | Thu Mar 27 18:13:44 2014
Node safNode=SC_2_2 joined cluster | Thu Mar 27 18:14:23 2014
Node safNode=PL_2_3 joined cluster | Thu Mar 27 18:15:21 2014
Node safNode=PL_2_4 joined cluster | Thu Mar 27 18:15:25 2014
…..
AMF cluster state:
------------------
saAmfNodeAdminState."safAmfNode=SC-1,safAmfCluster=myAmfCluster": Unlocked
saAmfNodeOperState."safAmfNode=SC-1,safAmfCluster=myAmfCluster": Enabled
saAmfNodeAdminState."safAmfNode=SC-2,safAmfCluster=myAmfCluster": Unlocked
saAmfNodeOperState."safAmfNode=SC-2,safAmfCluster=myAmfCluster": Enabled
saAmfNodeAdminState."safAmfNode=PL-3,safAmfCluster=myAmfCluster": Unlocked
saAmfNodeOperState."safAmfNode=PL-3,safAmfCluster=myAmfCluster": Enabled
……
CoreMW HA state:
----------------
CoreMW is assigned as ACTIVE in controller SC-1
CoreMW is assigned as STANDBY in controller SC-2
COM state:
----------
COM is assigned as ACTIVE in controller SC-1
COM is assigned as STANDBY in controller SC-2
SI HA state:
------------
saAmfSISUHAState."safSu=SC-1,safSg=2N,safApp=ERIC-CUDB_BC_SERVER_MONITOR"."safSi=2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=2N,safApp=ERIC-CUDB_LDAPFE_MONITOR"."safSi=2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS3_2N,safApp=ERIC-CUDB_CS"."safSi=DS3_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS4_2N,safApp=ERIC-CUDB_CS"."safSi=DS4_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS13_2N,safApp=ERIC-CUDB_CS"."safSi=DS13_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS12_2N,safApp=ERIC-CUDB_CS"."safSi=DS12_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS11_2N,safApp=ERIC-CUDB_CS"."safSi=DS11_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS2_2N,safApp=ERIC-CUDB_CS"."safSi=DS2_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS1_2N,safApp=ERIC-CUDB_CS"."safSi=DS1_2N-1": active(1)
saAmfSISUHAState."safSu=SC-1,safSg=DS7_2N,safApp=ERIC-CUDB_CS"."safSi=DS7_2N-1": active(1)
saAmfSISUHAState."safSu=Control1,safSg=2N,safApp=ERIC-EVIP"."safSi=2N": active(1)
…..
SU States:
----------
Status OK
1.2.3 CMW Status Query
Check the CMW status of the CUDB servers from one of the SC boards.
Command:
# cmw-status app csiass comp node sg si siass su pm
Command description:
Checks the CMW status.
1.2.4 Disk Usage Check
On one of the SC boards, print the disk usage of all CUDB servers (OAM, PL and DS).
Command:
for a in `awk '/^node/ { print $4 }' /cluster/etc/`;do
echo $a; ssh $a df -h;
done;
Command description:
Checks disk usage.
Expected result:
SC_2_1
Filesystem Size Used Avail Use% Mounted on
rootfs 2.0G 1.5G 543M 74% /
/root 2.0G 1.5G 543M 74% /
tmpfs 12G 740K 12G 1% /dev/shm
shm 12G 740K 12G 1% /dev/shm
/dev/sdb1 4.0G 220M 3.6G 6% /boot
/dev/sdb2 9.9G 3.5G 6.0G 37% /var/log
/dev/mapper/cluster_vg-data_lv 63G 11G 50G 18% /.cluster
192.168.0.100:/.cluster 63G 11G 50G 18% /cluster
/dev/sdb7 136G 1.2G 128G 1% /local
com_fuse_module 2.0G 1.5G 543M 74% /var/filem/nbi_root
SC_2_2
Filesystem Size Used Avail Use% Mounted on
rootfs 2.0G 1.5G 544M 74% /
/root 2.0G 1.5G 544M 74% /
tmpfs 12G 740K 12G 1% /dev/shm
shm 12G 740K 12G 1% /dev/shm
/dev/sdb1 4.0G 220M 3.6G 6% /boot
/dev/sdb2 9.9G 3.5G 5.9G 38% /var/log
192.168.0.100:/.cluster 63G 11G 50G 18% /cluster
/dev/sdb7 136G 1.1G 128G 1% /local
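The loop above prints one df -h table per node, which is tedious to read by eye. The following sketch filters that combined output and prints only the file systems at or above a usage threshold. The function name check_disk_usage and the default threshold of 80 are assumptions for illustration; the source does not state an alarm level.

```shell
#!/bin/sh
# Sketch: read the combined "echo node; df -h" output on stdin and print
# "node mountpoint use%" for every file system at or above the threshold.
# $1: threshold in percent (assumed default: 80)
check_disk_usage() {
    awk -v limit="${1:-80}" '
        NF == 1 { node = $1; next }      # bare node name printed by the echo in the loop
        $5 ~ /^[0-9]+%$/ {               # data rows only; the header has "Use%" here
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print node, $6, $5
        }'
}
```

With the sample output above and a threshold of 70, the root file system of SC_2_1 (74%) would be reported while /boot and /var/log would not.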
1.2.5 Network Status Check
Print the network status of every interface on all CUDB servers (OAM, PL and DS).
Command:
for a in `awk '/^node/ { print $4 }' /cluster/etc/`;do
echo $a; ssh $a netstat -i;
done;
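Command description: netstat prints the system's network connections, routing tables, interface information and multicast memberships. With the -i option it displays a status table for all network interfaces.
Expected result: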
CUDB1 SC_2_1 # netstat -i
warning: no inet socket available: Success
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0 1500 0 29292700 0 0 0 20908491 0 0 0 BMmRU
bond1 1500 0 62795 0 0 0 2407895 0 0 0 BMmRU
bond1:1 1500 0 - no statistics available - BMmRU
bond1:2 1500 0 - no statistics available - BMmRU
eth0 1500 0 28197145 0 0 0 20908491 0 0 0 BMsRU
eth1 1500 0 31394 0 0 0 2407895 0 0 0 BMsRU
eth2 1500 0 1095555 0 0 0 0 0 0 0 BMsRU
eth3 1500 0 31401 0 0 0 0 0 0 0 BMsRU
lo 16436 0 313493589 0 0 0 313493589 0 0 0 LRU
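In a healthy system the error, drop and overrun columns are all zero, as in the sample above. The sketch below reads netstat -i output and reports any interface where the sum of those six counters is non-zero. It assumes the 12-column layout shown in the header line (one interface per line, not wrapped); the function name check_iface_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: read "netstat -i" output on stdin and report interfaces whose
# RX-ERR, RX-DRP, RX-OVR, TX-ERR, TX-DRP or TX-OVR counters are non-zero.
check_iface_errors() {
    awk '
        # Data rows have 12 fields with a numeric RX-OK column; this skips
        # the header and the "no statistics available" alias rows.
        NF == 12 && $4 ~ /^[0-9]+$/ {
            errs = $5 + $6 + $7 + $9 + $10 + $11
            if (errs > 0) print $1, "errors/drops/overruns:", errs
        }'
}
```

It could be combined with the loop above by piping each node's netstat -i output through the function.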
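1.2.6 CPU Load Check
Log in to one of the SC boards to query the CUDB CPU load.
Command (press Ctrl+C to exit and return to CLI mode):
# cudbMpstat
Command description:
This command collects and reports CPU performance statistics for every board.
Expected result: none
1.2.7 LDAP TPS Check
Log in to one of the SC boards to query the CUDB LDAP TPS.
Command (press Ctrl+C to exit and return to CLI mode):
# cudbTpsStat -d
Expected result:
None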
1.2.8 Active Alarm Check
On one of the SC boards, check which alarms are currently active.
Command:
# fmactivealarms
1.2.9 Alarm History Check
Check the alarm history on both SC boards.
Command:
# cat /var/log/ESA/ | grep -c "Alarm Raise"
# cat /var/log/ESA/ | grep -c "Alarm Clear"
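The two grep -c commands give separate totals; comparing them shows roughly how many raised alarms were never cleared in the log. The sketch below does both counts in one pass over log text supplied on stdin. Only the marker strings "Alarm Raise" and "Alarm Clear" come from the commands above; the function name alarm_balance and the output format are assumptions.

```shell
#!/bin/sh
# Sketch: count "Alarm Raise" and "Alarm Clear" lines on stdin and print
# both totals plus their difference.
alarm_balance() {
    awk '
        /Alarm Raise/ { raised++ }
        /Alarm Clear/ { cleared++ }
        END {
            printf "raised=%d cleared=%d outstanding=%d\n",
                   raised, cleared, raised - cleared
        }'
}
```

The difference is only a rough indicator, since a log file may start after an alarm was raised or end before it was cleared.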
1.2.10 DHCP Status Check
Log in to each SC board and run:
# /etc/init.d/dhcpd status
1.2.11 LDAP FE Process Status Check
Log in to each SC board and run:
# for a in `awk '/^node/ { if (substr($4,1,2) == "PL") {print $4} }' /cluster/etc/`;do ssh $a "echo $a ;/etc/init.d/cudbLDAPFrontEnd status";done;
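The awk expression in these loops selects lines that begin with "node" and prints their fourth field, optionally keeping only names whose first two characters are "PL". The sketch below wraps that logic in a function that reads the cluster configuration from stdin. The function name list_nodes is hypothetical, and the line layout used in the usage note (keyword, id, type, host name) is an assumption inferred from the awk expression, not taken from a real file.

```shell
#!/bin/sh
# Sketch: print the host name (4th field) of every "node" line on stdin.
# $1: optional two-character prefix such as PL or SC; empty lists all nodes.
list_nodes() {
    awk -v prefix="${1:-}" '
        /^node/ && (prefix == "" || substr($4, 1, 2) == prefix) { print $4 }'
}
```

Given hypothetical input lines such as "node 1 control SC_2_1" and "node 3 payload PL_2_3", list_nodes PL would print only PL_2_3, which is the set of hosts the LDAP FE loop visits.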
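1.2.12 ESA Process Status Check
Log in to one of the SC boards and run:
# esaclusterstatus
Log in to each SC board and run:
# esa status
1.2.13 CUDB Cluster Configuration Check
Run the following command on one of the SC boards:
# cat /cluster/etc/
1.2.14 CUDB vipconfig Configuration Check
Run the following command on one of the SC boards: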
# cat /cluster/storage/system/config/*/
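1.2.15 Key CUDB Configuration Check
Run the following commands on the active SC board:
# /opt/com/bin/cliss
# show ManagedElement=1,CudbSystem=1,backboneReliability
# show ManagedElement=1,CudbSystem=1,CudbLocalNode=1,CudbLdapAccess=1,ldapAttrIndexes
1.2.16 CUDB Database Backup
Run the following command on one of the SC boards:
# ls -l /home/cudb/automatedBackupStorage/*/*
1.2.17 Software and Configuration Backup
Run the following commands on one of the SC boards: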
# cudbSwBackup -l
# cudbSwBackup -p
# ls -lrt /cluster/home/cudb/swbackup
# ls -l /cluster/storage/no-backup
# ls -l /cluster/home/cudb/oam/configMgmt/
1.2.18 Retrieving CUDB Counters
Run the following commands on SC_2_1:
# pmreadcounter
# pmreadcounter | wc -l
# ls -lrt /home/cudb/oam/performanceMgmt/output |tail -n 200
1.2.19 CUDB Software
Run the following command on one of the SC boards:
for a in `awk '/^node/ { print $4 }' /cluster/etc/`;do
echo $a; cmw-repository-list --node $a;
done;
1.2.20 LOTC crontab Check
Run the following command on each SC board:
# crontab -l
Expected result (example):
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/7Hie7D installed on Fri Mar 28 11:31:56 2014)
# (Cron version V5.0 -- $Id: crontab.c,v 1.12 2004/01/23 18:56:42 vixie Exp $)
25 0,12 * * * /bin/bash /opt/ericsson/cudb/OAM/bin/cudbGetLogs
50 0,12 * * * /bin/bash /opt/ericsson/cudb/OAM/bin/cudbAnalyser --auto-check --send-alarm --save-counter > /home/cudb/monitoring/preventiveMaintenance/cron__2_
37 0 * * * /bin/bash /opt/ericsson/cudb/Monitors/bin/cudbCheckConsistency --locked --alarms >/dev/null 2>&1 || true
7 0 * * * /bin/bash /opt/ericsson/cudb/Monitors/bin/cudbCheckReplication --locked --alarms >/dev/null 2>&1 || true
0,15,30,45 * * * * /home/cudb/oam/performanceMgmt/appCounters/scripts/ >> /dev/null
0 2 * * * /cluster/home/cudb// 2>&1
*/1 * * * * /opt/ericsson/cudb/Monitors/keepAlive/bin/keepAlive_ >/dev/null 2>&1
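One way to use the example listing is to confirm that the scheduled maintenance scripts are still present on each SC board. The sketch below reads crontab -l output on stdin and prints the names of expected jobs that are missing. The list of expected names is taken from the example above (cudbGetLogs, cudbAnalyser, cudbCheckConsistency, cudbCheckReplication); treating exactly these four as mandatory is an assumption, and the function name missing_cron_jobs is hypothetical.

```shell
#!/bin/sh
# Sketch: read "crontab -l" output on stdin and print each expected job
# name that does not appear in any non-comment line.
missing_cron_jobs() {
    awk -v jobs="cudbGetLogs cudbAnalyser cudbCheckConsistency cudbCheckReplication" '
        BEGIN { n = split(jobs, want, " ") }
        /^[ \t]*#/ { next }                  # ignore comment lines
        {
            for (i = 1; i <= n; i++)
                if (index($0, "/" want[i])) seen[want[i]] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }'
}
```

Running crontab -l | missing_cron_jobs on a board with the example crontab would print nothing; any name it does print points at an entry worth checking by hand.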
1.2.21 CUDB Log Check
Run the following commands on one of the SC boards:
cudbAnalyser -a -w 0
ls -lrt /home/cudb/monitoring/preventiveMaintenance/