RH5 Bare Metals Failure
I am a fairly new user of Unitrends, and I hit a snag trying to do a Bare Metals restore of a Red Hat 5 Web server. This is going to get long, but I want to document the whole situation.
The server (called Web1) was giving me trouble, so I made a virtual clone of it, took the physical server offline, and moved it to my testing network. I renamed the physical server (Web2) and gave it a new IP address to match the new subnet and avoid conflicting with the virtual clone. I was able to get the server up long enough to make a Bare Metals ISO disk (with the credentials of Web2 and the test network IP) as well as replace the failing hard drives.
I moved the server back onto the production network, changed its name and IP address back to Web1, took the Virtual machine offline, and started regular backups with the Unitrends agent. Of course, a few days later, the server failed again, and this time in an unrecoverable state.
I again brought up the virtual copy of the server as Web1, moved the physical server to the test network, and replaced all of the hard drives in the failing RAID. I restarted the server with the Bare Metals CD. The Bare Metals interface came up and I tried the Test. The test failed, saying (paraphrasing) it was unable to make a TCP connection. I checked the hosts file and it was correct; I checked the IP/Gateway/Unit IPs and they were all correct (showing the test network address and Web2). I dropped into the shell and confirmed the right NIC had the right IP address. I was able to ping my own address and my gateway, but every time I tried to ping off the subnet (namely the Unitrends unit), I kept getting a Destination Unreachable error.
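For anyone retracing this, the checks from the Bare Metals shell went roughly like the sketch below. The addresses are placeholders for my anonymized 10.x.x.x network, and the tool names assume the standard net-tools utilities the boot environment ships:

ifconfig eth0            # confirm the NIC carries the expected IP and netmask
ping -c 3 10.x.x.149     # my own address - replies fine
ping -c 3 10.x.x.254     # the gateway - replies fine
ping -c 3 10.y.y.10      # the Unitrends unit, off-subnet - Destination Unreachable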
Unitrends support was closed yesterday, so I figured I would at least try manually rebuilding the filesystems. I did the Restore > Format Partitions step, which seemed to work. I then tried to create the filesystems, but the process errored out into what looked to me like an fsck run. It stalled on inode blocks multiple times and I finally bailed out of the process (maybe not the best idea).
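In hindsight, the inode blocks it stalled on sound a lot like mkfs.ext3 writing its inode tables, which would point at the drives or the RAID rather than the tool. For reference, the manual rebuild amounts to something like the sketch below, assuming the stock RH5 layout (a small /boot partition plus VolGroup00 holding LogVol00 for / and LogVol01 for swap); the device and group names are illustrative, and in a stripped-down boot shell the LVM commands may need an `lvm` prefix:

vgscan                               # discover volume groups on the drives
vgchange -ay VolGroup00              # activate the group
mkfs.ext3 /dev/sda1                  # /boot
mkfs.ext3 /dev/VolGroup00/LogVol00   # root filesystem
mkswap /dev/VolGroup00/LogVol01      # swap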
This morning I attempted to start the server again from the Bare Metals CD, but now I can't even get to the Restore interface; it basically just drops me to a bash prompt and fails to start. So now I'm at a loss and not sure how to proceed, besides starting over with a fresh RH5 install and trying a folder-by-folder restore. Below is a summary of the error messages that flash by, as best as I can catch them:
modprobe: fatal error inserting hid_dummy
udevd-event: wait_for_sysfs: waiting for /sys/devices/ ... ioerr_cnt failed
/sbin/init: line 59: 3378 segmentation fault
mdadm: no arrays found in config file
/dev/cdrom: open failed: Read-only file system
can't find device uuid
refusing activation of partial LV LogVol00 use --partial to override
Found volume groups Volgroup00
Found volume groups Volgroup01
could not load host key /etc/ssh/ssh_host_key
Disabling protocol version 1
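The partial LV message suggests the volume group metadata still referenced physical volumes that vanished when I swapped the drives. If you hit the same message and just need to look at such a group from the shell, LVM can be told to proceed anyway; a sketch of the override (use with care, since it activates whatever is left of the volume):

lvm pvscan                              # show which physical volumes are present or missing
lvm vgchange -ay --partial VolGroup00   # activate the group despite missing PVs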
So to continue with the saga:
On the suggestion of Unitrends tech support, I moved the Backup appliance to the same subnet as the virtual Web server (with the correct name and IP address), made a new Bare Metals ISO using the virtual version of the Web server, and then connected just the physical server and the backup appliance through a small hub. Now the information for the physical Web server contained on the Bare Metals ISO matched the Web server's client information on the Backup unit, and the two machines were on the same subnet. Unfortunately, the Bare Metals ISO still did not load properly. I then tried booting a completely different server from both of the Bare Metals ISO disks, and the Bare Metals interface loaded with no problem. This showed that something written to the Web server's disks during my attempts at manual recovery was being read by the ISO and preventing it from loading and running properly.
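In hindsight, wiping the stale metadata off the drives before booting the ISO again might have gotten around this. A sketch of what that could look like from any rescue shell; this is destructive, and the device names are illustrative:

dd if=/dev/zero of=/dev/sda bs=1M count=1   # zero the first MB: partition table and LVM labels
mdadm --zero-superblock /dev/sda1           # clear any lingering software-RAID superblock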
I ended up simply reinstalling Red Hat on the physical server, did a file-by-file restore, and made the necessary configuration changes to get the Web server back online. Not the ideal outcome, but at least it is now up and running.
Has anyone else had problems with the Linux Bare Metals not communicating via a gateway or across a subnet?
I then decided to do a little testing. I took an old server, connected it to my DMZ subnet (where the Web servers are), and put a fresh copy of Red Hat on it. I installed the Unitrends agent, confirmed that the server was separated from the appliance by a firewall, opened the necessary ports 1743 and 1745 in the server's firewall software, and set up the client on the server. I then successfully ran a Master backup of the test server and created a Bare Metals ISO disk for it. I booted the server from the disk and, again, it was unable to communicate with anything off the local subnet and failed the Test. For these tests I left the Unitrends appliance on version 5.0.2-1 (which is where it was when I was working with the Web servers). I then upgraded the unit to 5.1.0 (which supposedly included updates to the Bare Metals process) and repeated the entire test, including making a new Bare Metals ISO disk. I ended up with the same results: communication between the test server and the backup appliance was fine when the server was booted into its OS, but failed when it was booted to the Bare Metals ISO.
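For reference, opening those agent ports on a RH5 box with iptables looks roughly like the sketch below; I am inserting at the top of the INPUT chain so the rules land ahead of any reject rules in the stock RHEL firewall config:

iptables -I INPUT -p tcp --dport 1743 -j ACCEPT   # Unitrends agent port
iptables -I INPUT -p tcp --dport 1745 -j ACCEPT   # Unitrends agent port
service iptables save                             # persist the rules across reboots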
So that's where I am so far. My plan for today is to move my test server to the same subnet as the Backup appliance, make a new Bare Metals ISO, and try doing a restore from there to see if it works.
Well, I did some more testing today. I moved my test server from the DMZ network to an internal subnet. My goal was to see if the problem might be a firewall issue, i.e. a closed port that the Recovery process needed (even though backups and Bare Metals ISO creation were working fine). That would not explain the inability to ping between the Backup appliance and the server booted into the ISO, but it was worth a shot. Now the test server and the Backup appliance were separated only by a Layer 3 switch that acts purely as a gateway and does no filtering or packet inspection. I created a new Bare Metals ISO disk with the new credentials, booted the server with the new disk, confirmed it had the right network settings, and again the ISO was unable to contact anything off the local subnet. So that at least ruled out a firewall issue.
Finally, I moved the test server to the SAME subnet as the Backup appliance, burned a new Bare Metals ISO, and booted the server from that disk. I did notice something different flash by on the screen while the Bare Metals Recovery interface was loading: there was a moment when I saw IP Address, Subnet Mask, and Gateway go by, all set to 0.0.0.0. I can't say for certain whether that was present when I started any of the other ISOs, but I thought it might be a clue. From the Bare Metals Recovery interface I was then able to successfully run a Test and start the Restore process.
I think I have finally figured out what is going on here. After some more tests and a few discussions with Tech Support, it would seem that the Bare Metals OS really does not set a default gateway. From the Hot Bare Metal menu I looked under "View Info" --> "Network routes", with this result:
eth0 10.x.x.149 10.x.x.255 255.255.255.0 43100000
lo 127.0.0.1 255.0.0.0 255.0.0.0 49000000
0.0.0.0 0.0.0.0 10.x.x.254 09cd09c0
10.x.x.0 255.255.255.0 0.0.0.0 09cd0900
169.254.0.0 255.255.0.0 0.0.0.0 09cd0960
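I can't tell exactly where that view pulls its data from; the hex in the last column looks like the raw fields the kernel exposes in /proc/net/route, which you can compare against from the shell:

cat /proc/net/route   # raw kernel table, addresses in little-endian hex
route -n              # the same table, decoded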
So to me that sure looked like the default route (0.0.0.0 0.0.0.0) was pointing to the correct gateway address. On the advice of Tech Support, I dropped into the shell of the boot CD and looked at the routes there. From the shell, the "route" command returned this:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.x.x.0 * 255.255.255.0 U 0 0 0 eth0
No default gateway address. I manually added the address and checked the routes again:
sh-3.2# route add default gw 10.x.x.254
sh-3.2# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.x.x.0 * 255.255.255.0 U 0 0 0 eth0
default 10.x.x.254 0.0.0.0 UG 0 0 0 eth0
After adding the route, I was able to communicate with devices off the subnet, including the DPU. I have not tried a test Restore yet (that is next), but I am pretty confident this should at least be an easy workaround for now.
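To recap the workaround for anyone else who lands here: from the shell of the Bare Metals boot CD, add the default route by hand before running the Test. The gateway address below is a placeholder for your own, and since the boot CD runs from memory, the route presumably has to be re-added on every boot:

route -n                          # confirm the default route is missing
route add default gw 10.x.x.254   # substitute your real gateway
ping -c 3 10.y.y.10               # placeholder for the backup appliance; off-subnet traffic should now work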