Friday, July 27, 2012

OMSA 7.0 firmware update issue or holdover public key problem from 2010? 

Updated Dell's OMSA from 6.5.1 to 7.0 via the standard yum process. Checked for any new goodies via "yum install $(bootstrap_firmware)". Tried to update the firmware, but it failed:
# update_firmware  --yes
Running system inventory...
Searching storage directory for available BIOS updates...
Checking BIOS - 6.1.0
Available: dell_dup_componentid_00159 - 6.1.0
Did not find a newer package to install that meets all installation checks.
Checking SAS/SATA Backplane 0:0 Backplane Firmware - 1.07
Available: dell_dup_componentid_11204 - 1.07
Did not find a newer package to install that meets all installation checks.
Checking PERC 6/i Integrated Controller 0 Firmware - 6.3.1-0003
Available: pci_firmware(ven_0x1000_dev_0x0060_subven_0x1028_subdev_0x1f0c) - 6.3.1-0003
Did not find a newer package to install that meets all installation checks.
Checking OS Drivers - 0
Available: dell_dup_componentid_18981 - 7.0.0.4
Found Update: dell_dup_componentid_18981 - 7.0.0.4
Checking Dell Lifecycle Controller - 1.5.1.57
Available: dell_dup_componentid_18980 - 1.5.2.32
Found Update: dell_dup_componentid_18980 - 1.5.2.32
Checking NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth1) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Found Update: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Checking NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth0) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Found Update: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Checking ST3450857SS Firmware - es65
Available: dell_dup_componentid_20795 - es65
Did not find a newer package to install that meets all installation checks.
Checking iDRAC6 - 1.80
Available: dell_dup_componentid_20137 - 1.85
Found Update: dell_dup_componentid_20137 - 1.85
Checking NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth2) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Found Update: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Checking NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth3) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639) - 6.2.16
Available: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Found Update: pci_firmware(ven_0x14e4_dev_0x1639_subven_0x1028_subdev_0x0235) - 7.0.47
Checking Dell 32 Bit Diagnostics - 5154a0
Available: dell_dup_componentid_00196 - 5154a0
Did not find a newer package to install that meets all installation checks.
Checking System BIOS for PowerEdge R710 - 6.1.0
Available: system_bios(ven_0x1028_dev_0x0235) - 3.0.0
Did not find a newer package to install that meets all installation checks.
Found firmware which needs to be updated.
Running updates...
/ Installing dell_dup_componentid_18981 - 7.0.0.4
Installation failed for package: dell_dup_componentid_18981 - 7.0.0.4
aborting update...
The error message from the low-level command was:
Update Failure: Partition Failure - The Delete Dynamic Partition has failed
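(For reference, the "standard yum process" mentioned up top boils down to roughly the following; srvadmin-all and dell_ft_install are the metapackage names I believe Dell's repo uses, so treat them as assumptions:)
# upgrade OMSA itself and make sure firmware-tools is in place
yum -y upgrade srvadmin-all
yum -y install dell_ft_install
# then pull the per-hardware firmware packages and try to apply them
yum -y install $(bootstrap_firmware)
update_firmware --yes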
Tried the /etc/redhat-release fix without success. Tried a reboot to flush out any oddities...
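For context, the /etc/redhat-release fix is the usual CentOS workaround for Dell update packages that check the OS string; the exact release string below is an assumption, adjust it to your box:
# pretend to be RHEL just long enough for the DUP's OS check, then put it back
cp -p /etc/redhat-release /etc/redhat-release.orig
echo "Red Hat Enterprise Linux Server release 6.3 (Santiago)" > /etc/redhat-release
update_firmware --yes
mv /etc/redhat-release.orig /etc/redhat-release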

After much google'n, it seems I had some kind of missing Dell public key issue; importing the keys fixed it:
rpm --import http://linux.dell.com/files/libsmbios/download/RPM-GPG-KEY-libsmbios
rpm --import http://lists.us.dell.com/linux-security-publickey.txt
Now update_firmware works again as expected.
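If you want to double-check that the keys actually landed before re-running, something like this should do (the grep pattern is just a guess at how the key summaries read):
# list the GPG keys rpm knows about and look for the Dell ones
rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n' | grep -i dell
# then kick off the update again
update_firmware --yes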

Tuesday, July 10, 2012

CentOS 6.3 mdadm won't start older md arrays.

For some of us, the drive setups we create stay with a system for a long time. Keeping the same data disk array untouched even across major revision changes is common (like an OS rebuild from 5.x -> 6.x). Sometimes that long-term usage bites back. Here is my failure case while upgrading from CentOS 6.2 -> CentOS 6.3.

Symptoms:
A simple md RAID 1 extra data array will not come up at boot. The system drops to recovery mode with a missing (md) drive to mount and an fsck request. The extra file system's two "linux_raid_member" drives show up under both fdisk and blkid, yet "cat /proc/mdstat" shows no arrays. If I run, as root:
mdadm --auto-detect
then /proc/mdstat will finally show the array info.
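A quick recap of that check sequence from a root shell (nothing new here, just the commands in one place):
blkid | grep linux_raid_member    # both member partitions are visible
cat /proc/mdstat                  # ...yet no arrays are listed
mdadm --auto-detect               # manually kick the kernel's autodetection
cat /proc/mdstat                  # now the array shows up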

Solutions:
  1. Make sure that /etc/mdadm.conf contains the array info from:
    mdadm --examine --scan >> /etc/mdadm.conf
  2. There is an mdadm technical note about arrays created with the older, now-deprecated 0.90 metadata (BZ 788022 is implicated): a "+0.90" needs to be added to the AUTO line in /etc/mdadm.conf, but you also need to get rid of the "+1.x -all" options! A sketch of the resulting file follows below.
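Roughly what my /etc/mdadm.conf ended up looking like; the ARRAY line is whatever --examine --scan spit out (the UUID below is a placeholder), and the AUTO line is the technical note's fix with the restrictive bits dropped:
ARRAY /dev/md0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
# allow auto-assembly of the old 0.90-metadata array (BZ 788022);
# the stock "+1.x -all" restriction had to go for mine
AUTO +imsm +0.90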
How do you tell in advance that you will have an issue *before* an upgrade? If you run, as root,
mdadm -E /dev/sdc1 | grep Version
and get output like: "Version : 0.90.00", you will want to make a change *before* you reboot!
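And if there is more than one candidate member, a quick sweep like this (adjust to taste) saves some typing:
# check the metadata version of every device blkid thinks is a raid member
for dev in $(blkid -t TYPE=linux_raid_member -o device); do
    echo -n "$dev: "; mdadm -E "$dev" | grep Version
done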

It is interesting to note that this md array just worked in 6.0, 6.1 and 6.2 because, as the technical note explains:
"In Red Hat Enterprise Linux 6.1 and 6.2, mdadm always assembled version 0.90 RAID arrays automatically due to a bug."