Friday, August 26, 2011

Using AI to do an unaided install on OpenIndiana b148

If you have any questions, comments, concerns, corrections, etc, feel free to email me. I'd love to know what I did horribly wrong. :)

So, I was looking into how to automatically install OpenIndiana on systems in a manner similar to the Kickstart mechanism for RHEL-alikes or Jumpstart for Solaris proper.

I received instructions that I should not go down this road unless I enjoyed pain and suffering.

I got it working anyway, so here's how so you can too (without having OpenIndiana or any Solaris system already running).

I started out with the instructions found here. His instructions were very helpful, but ultimately, they were incomplete - whether this is because of differences in version or something else, I do not know.

Start out as his instructions do - make an /export/install (or your favorite path; I'll refer to it as $INSTALL_BASE from here on), and run
git clone git://github.com/rincebrain/illumos-misc.git $INSTALL_BASE
(I forked his repository. At present, my changes are mostly cosmetic, though I changed the root p/w in the generated ai_instance to be "jack" as well. Feel free to pull his repo instead; things may just require more tweaking.)

You're going to need

  • an Apache-like webserver (you need CGI support for the scripts in the above, and you'll ideally be using it to serve a local mirror of the OpenIndiana repo as well.)
  • A working DHCP setup with PXE capability, which includes the ability to serve a couple of files over TFTP (I'm going to refer to the base of your TFTP path as $TFTP_BASE - for me, that's /tftproot; YMMV)
  • About 4 GB for an OI local package repo, according to my current copy
  • An OpenIndiana Automated Installer CD (in Joshua's guide above, he explains how to make one; since then, OI has started providing prebuilt ones, so I used the b148 CD available here)
  • The CGI scripts in this example use /bin/ksh - you can probably use another shell with them with not too much work, but I didn't need to.
Got everything mentioned above? Great.

First, loopback mount the OI CD you grabbed above somewhere - I'll use $INSTALL_BASE/ai_image, like the guide I'm basing this on did - and copy the required boot files off into a convenient place to serve them (I used $TFTP_BASE/oi):
cp -r $INSTALL_BASE/ai_image/boot $TFTP_BASE/oi
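
One thing that can bite you with the cp -r above: if $TFTP_BASE/oi already exists when you run it, cp puts the files in $TFTP_BASE/oi/boot instead, and pxegrub ends up one level deeper than the oi/grub/pxegrub path that the DHCP config further down hands out. Here is a minimal sketch of a sanity check for that. The function name check_tftp_layout is made up for illustration, and the mount line in the comment assumes a Linux host and a placeholder ISO name.

```shell
#!/bin/sh
# Assumption: on a Linux host the loopback mount looks something like
#   mount -o loop,ro <your-ai-iso> "$INSTALL_BASE/ai_image"
# (the ISO name is a placeholder, not taken from the post).

# check_tftp_layout is a hypothetical helper, not part of any tool.
# Returns 0 if pxegrub sits at <oi_dir>/grub/pxegrub, which is where the
# DHCP "filename" option points; prints a hint and returns 1 otherwise.
check_tftp_layout() {
    oi_dir=$1
    if [ -f "$oi_dir/grub/pxegrub" ]; then
        echo "ok: $oi_dir/grub/pxegrub"
        return 0
    fi
    if [ -f "$oi_dir/boot/grub/pxegrub" ]; then
        # cp -r nests the source directory when the destination already exists
        echo "nested: found $oi_dir/boot/grub/pxegrub instead" >&2
    else
        echo "missing: no pxegrub under $oi_dir" >&2
    fi
    return 1
}

# Example (TFTP_BASE falls back to /tftproot, as in the post):
# check_tftp_layout "${TFTP_BASE:-/tftproot}/oi"
```

If it reports "nested", either move the contents of oi/boot up a level or remove oi and rerun the copy.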
As I said, you'd also probably like a local OI package repo - it'll save you a fair amount of time on install.

I'm making one at $INSTALL_BASE/repo - again, feel free to change it and change instructions appropriately. :)
rsync -a pkg-origin.openindiana.org::pkgdepot-dev $INSTALL_BASE/repo/
Configure Apache with vhost directives appropriately to serve this up - the example would be here, and that'll work if you've used all of the paths mentioned in this example. Feel free to find/replace /export/install with whatever you used for $INSTALL_BASE instead, but be sure to do it consistently here and in PXEgrub (later).

Now the most customization you'll probably want to do - the Automated Installer manifest file.

Joshua has nicely provided a CGI script which serves up the manifest to our target machine. My modifications do a few things - they add git to the default installed package list (NBD), make it diff more cleanly against the stock example (which was helpful to me for debugging), and most importantly, in my opinion, make it partition and install to the root disk:

    <target>
      <target_device>
        <disk>
          <disk_keyword key="boot_disk" />
          <partition name="1" action="delete" />
          <partition name="2" action="delete" />
          <partition name="3" action="delete" />
          <partition name="4" action="delete" />
          <partition name="0" action="create" part_type="191" />
          <slice name="0" action="create" is_root="true" force="true" />
        </disk>
      </target_device>
    </target>
As a warning, it forcibly nukes the partitions on whichever disk it detects as the "boot disk", then creates a single full-disk slice as root, which AI defaults to using as the root for rpool. Caveat: some BIOSes lie, in which case you'll need to provide more explicit criteria. Be warned that the syntax for anything beyond plain disk specification has changed, so fuller examples written for other versions won't work for you as-is.

Configure your DHCP daemon like so (this is for dhcpd; if you use something else and know the correct syntax, by all means adapt it):

# before any shared-network or similar statements; e.g. the first few lines
option grubmenu code 150 = text;
# Inside of whatever group or shared-network or similar "set" that you want this to apply to
# (the 96 below is hexadecimal for 150, the grubmenu option code defined above):
if exists dhcp-parameter-request-list {
       option dhcp-parameter-request-list = concat(option dhcp-parameter-request-list,96);
}
# And now an example host:
host nosuch1 {
   hardware ethernet 00:11:22:33:44:55;
   fixed-address nosuch1;
   next-server whatever_dhcp_server_does_tftp;
   filename "oi/grub/pxegrub";
   option grubmenu "oi/menu.lst"; 
}
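
If you have more than a couple of machines to install, typing those host blocks by hand gets old. A minimal sketch of a generator follows; the function name dhcp_host_stanza and the server name tftp.example.org are made up for illustration, and it just fills in the same fields as the example host above.

```shell
#!/bin/sh
# dhcp_host_stanza is a hypothetical helper: it prints one dhcpd host block
# in the same shape as the example above.
#   $1 = hostname (must resolve, since it is reused as fixed-address)
#   $2 = MAC address
#   $3 = TFTP server name or IP (used for next-server)
dhcp_host_stanza() {
    name=$1
    mac=$2
    tftp_server=$3
    cat <<EOF
host $name {
   hardware ethernet $mac;
   fixed-address $name;
   next-server $tftp_server;
   filename "oi/grub/pxegrub";
   option grubmenu "oi/menu.lst";
}
EOF
}

# Example; tftp.example.org is a placeholder:
# dhcp_host_stanza nosuch1 00:11:22:33:44:55 tftp.example.org >> hosts.conf
```

Paste the output into the right group in your dhcpd configuration, or write it to a separate file and pull that in with an include statement.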

And now all you need to do is boot it, and wait!

Errata and oddities:
- On the system I did this on (a Dell R815), the BMC would drop and entirely stop responding on the network once the OI kernel started. Someone suggested this was fastreboot's fault, but this happens on a cold boot of the network install too, not just fastreboot.
- I did not figure out how to make it such that it would automagically reconfigure to not network install on reboot, so either manually force a PXEboot on the machine and have your root disk higher in boot order (ugh) or edit the file it gets on PXE to produce a "boot to first hard drive" response (either PXEgrub with boot first hard drive chainloader, or just remove its PXE response entirely).
- If your webserver is not running correctly or your manifest is incorrect, and you have the console redirected using console=, NO OTHER DISPLAYS WILL RECEIVE NOTIFICATION THAT IT FAILED - they will just print "...................." and never progress to the "OpenIndiana oi_148 ..." banner.
- My example configuration has livessh=enable on the boot line, allowing you to remotely SSH in using jack - you probably want to disable this once you're sure the installation is working, or at least change the password. :)
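
For the "boot to first hard drive" workaround in the second item, here is a minimal sketch of a script that writes such a menu. It assumes PXEgrub reads GRUB legacy syntax and that the freshly installed disk is the first BIOS disk, (hd0); the function name write_localboot_menu is made up for illustration.

```shell
#!/bin/sh
# write_localboot_menu is a hypothetical helper: it writes a GRUB legacy
# menu whose only entry chainloads the first hard disk instead of starting
# the network installer.
#   $1 = path of the menu file to (over)write
write_localboot_menu() {
    menu_path=$1
    cat > "$menu_path" <<'EOF'
default 0
timeout 5

title Boot from first hard disk
    rootnoverify (hd0)
    chainloader +1
EOF
}

# Example; keep a copy of the install menu so you can swap it back later:
# cp "$TFTP_BASE/oi/menu.lst" "$TFTP_BASE/oi/menu.lst.install"
# write_localboot_menu "$TFTP_BASE/oi/menu.lst"
```

Since the DHCP config hands every host the same menu path, swapping the file this way affects every machine that PXE boots afterwards, so put the install menu back before installing the next one.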


Finally
Thanks to everyone in #openindiana who put up with my uninformed questions and occasional ranting, and to Joshua Clulow for doing most of the hard work required. :)

Tuesday, June 14, 2011

Strange Bug of the Day

Today, I got to debug an extremely odd bug.

The behavior observed was that, on nearly all of the intelligent SAS/SATA controllers we had, the OCZ Vertex 3 drives we possessed did not show up in Windows at all, and they only showed up on the "dumbest" controllers (bargain-basement onboard motherboard chipsets).

I had taken a look at this a while ago, when it first cropped up, but hadn't found anything obvious, so I put it down in favor of other, time-sensitive things.

I started by plugging the SSD into my workstation - a Dell Precision T5500 running a modern Apt-based Linux distribution, at the moment. It happily appeared on the bus and did IO fairly well at a quick glance, though SATA 2 could only go so fast. (We actually have two identical Vertex 3s, so I tested both. They performed essentially identically.)

Huh, that was lucky. Try it in my ThinkPad T61p (my personal laptop), shows up fine in the BIOS.

Okay, fine, apparently these are stupid controllers. Drag them over to some servers in the testbed room. Plug it into the LSI SAS2008-based controller in the server I've been playing with ZFS on Linux on... yup, shows up fine, does ~500 MB/s sequential read.

At this point, I'm rather befuddled, as I have not yet encountered any machines where these have failed.

I plug them into some Supermicro disk enclosures we have lying around, with SAS expanders on their backplanes, running to more LSI SAS2008-based controllers, but this time on our OpenIndiana test system. Yup, they show up great, speed is great.

...finally, I have a sinking feeling in my stomach, and plug them into the Windows Server 2008 R2 box in the room, with its LSI SAS2008-based controllers.

Nothing shows up in Disk Management. Nor in Event Viewer. I interrogate the card using SAS2IRCU, a CLI utility that LSI provides for reaching out and touching your card and extracting information from it, and lo, it shows the disk in the appropriate port and marked "Ready (RDY)", just like the rest.

Thoroughly confused, I have a more in-depth look in Device Manager. Rescan does nothing, nothing under the Disk drives subtree, no devices marked as needing a driver...hey, wait, what's this "Other Devices" subtree?

Lo and behold, Windows was marking the Vertex 3s as "Other Devices" and reported that it did not know what driver to use to make them go. If you forced it to use the generic "Disk drive" driver, the Vertex 3s began playing happily and zippily in the sandbox.

No, I still don't know why this is reasonable behavior, or what caused it. But I couldn't find any references to it online, so I thought I'd list it here.

Edit: This is fixed in the latest firmware available on OCZ's site for the Vertex 3 and any relatives. Since the only thing that I can see that changed in the update was that the drive no longer reports a blank in the "revision" field of the ATA response, I would presume that this is what was causing Windows to dislike identifying the drive. If you have a drive that you've updated with the firmware currently on their site, and it's still not showing up, try uninstalling it in Device Manager and then scanning for hardware changes - it should work.

Tuesday, June 7, 2011

Quick shell hack

As documented here, I once wrote a piece of bashrc that would color your shell's hostname consistently across logins, but differently across hosts, letting you tell at a glance which host you're on.

The backstory of this particular hack is relatively simple - a friend of mine, during one of our conversations, had come up with the idea, and asked me to implement it, because he was too {busy,lazy}, as he wanted a way to distinguish which of his systems he was logged into easily. I thought about it for a while, and came up with a number of stupid hacks, but eventually settled on the following:


namedec=$(uname -n | $MD5 | cut -c 1-32)

hex=$namedec
length=$(echo $hex | wc -c | tr -d " ")
power=1

while [ $length -gt 2 ]
do
  length=$(( $length - 1 ))
done

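# Sum the hex digits of the hash into a single decimal number.
decimal=0
while [ "%$hex%" != "%%" ]
do
  digit=$(echo "$hex" | cut -b 1)

  # 16# makes bash read the digit as base 16, so a-f count as 10-15
  value=$(( 16#$digit ))

  decimal=$(( decimal + value ))
  hex=$(echo "$hex" | sed 's/^.//')
done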

#echo "Decimal value: $decimal"
namecol=$(($decimal % 17))
#echo $namecol
color=$(($namecol + 30))

# PS1="\[\033[01;31m\]\u\[\033[00m\]@\[\033[01;${color}m\]\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] "
# green user vs red user for root
if [[ ${EUID} == 0 ]]
        then PS1="\[\033[01;31m\]\u\[\033[00m\]@\[\033[01;${color}m\]\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] "
        PS1=${PS1}"# "
else
        PS1="\[\033[01;32m\]\u\[\033[00m\]@\[\033[01;${color}m\]\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] "
        PS1=${PS1}"$ "
fi

# this actually works so um
#if [ $(which md5sum 2>&1 | grep "command not found" | wc -l) = "1" ]
#       then PS1='OMG THIS SYSTEM SUCKS \$ '
#elif [ $(which md5sum 2>&1 | grep "no md5sum in " | wc -l) = "1" ]
#       then PS1='OMG THIS SYSTEM SUCKS \$ '
#fi

You may note a few things here - one is the lack of syntax highlighting. As a result, have a screenshot for some clarity of the end result.



Another is that the source of the highlighting is rather silly and overkill. I played with a few variants on hashing the hostname until I settled on this, because a lot of the more "intelligent" ideas I came up with ended up with too many collisions, for reasons I was far too lazy to figure out for a Q&D hack like this.
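
For comparison, here is a much shorter sketch of the same idea. It is not what I actually run: it swaps MD5 for cksum, which POSIX guarantees is there, and it folds the result onto the six ANSI foreground colors 31 through 36 (red through cyan), so the hostname never lands on black or white. The function name host_color is made up for illustration.

```shell
#!/bin/sh
# host_color is a hypothetical helper: it prints an ANSI foreground color
# code between 31 and 36 that depends only on the given name.
host_color() {
    # cksum prints "<crc> <byte count>"; keep only the CRC
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    # six usable colors: red, green, yellow, blue, magenta, cyan
    echo $(( crc % 6 + 31 ))
}

# Same host, same color, every login.
color=$(host_color "$(uname -n)")

# In a bashrc this would slot into the prompt the same way as above, e.g.
# PS1="\u@\[\033[01;${color}m\]\h\[\033[00m\]:\w \$ "
```

With only six buckets, collisions between hosts are guaranteed once you have more than six of them, so this trades distinctness for brevity.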

It works, and it works well enough that my friend took it, threw it in his bashrc, and as far as I know, still keeps it there on his systems. I do as well, simply because it's convenient.

(Footnote - I have some terrible wrappers which try to set up the environment correctly to have a hash sourcing program based on the uname -v output and checking various paths, which I elided here because they're not part of the "interesting code", so to speak.)
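
A rough idea of what such a wrapper could look like, without the uname sniffing: this is a sketch, not the elided code, and the function name pick_md5 is made up. It asks the shell whether each candidate exists with command -v, rather than parsing the error text of which as the commented-out block above does.

```shell
#!/bin/sh
# pick_md5 is a hypothetical helper: it prints a command that reads stdin
# and writes an MD5 hash, or returns 1 if none of the candidates exist.
pick_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        echo "md5sum"           # GNU coreutils
    elif command -v md5 >/dev/null 2>&1; then
        echo "md5"              # BSD and macOS
    elif command -v digest >/dev/null 2>&1; then
        echo "digest -a md5"    # Solaris and illumos
    else
        return 1
    fi
}

# Leave MD5 empty if nothing was found, so the caller can fall back
# to a plain prompt.
MD5=$(pick_md5) || MD5=""
```

All three candidates print the hash first when reading from a pipe, so the cut -c 1-32 in the snippet above works unchanged with any of them.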

Welcome

Hello.

If you've found this, congratulations, I have apparently achieved relevance in some fashion.

I am Rich Ercolani. I am a Computer Scientist/Programmer turned System Administrator, at present. I also have a number of eclectic interests, ranging from tabletop games to certain classes of terrible writing. All of these things will likely influence the content.

For those of you who are intrigued and attracted by programming, I'm slowly placing various things I've done for my own use or things I'm allowed to share from work on GitHub, though I have a lot of stuff that's not there yet or poorly organized. As a warning, anything in the sysadmin repository is likely to include a lot of terrible hacks, as my goal there is to Make It Work, not necessarily Make It Good.

I wanted an outlet for public posting of various things - sysadmin things, code snippets I'm allowed to share from work, horrible hardware bugs (for the hardware that is in general deployment and finalized, anyway), and random personal project stuff. I didn't think any of the other various places I am available on the Internet were suitable, so here we have said outlet.

I will try to keep things tagged in a useful fashion, so that you can filter the portions you are uninterested in.

All opinions and content expressed on this blog are my own, and do not have any bearing or relevance to any opinions held by any of my employers - past, present, or future.

All statements made on this blog, unless otherwise noted, should be taken as opinions - I will endeavor to make it clear when I believe I am stating provable fact, based on citation or repeated personal experience in some reproducible fashion, and when I am simply making shit up to justify behavior I have no idea how to explain.

Please feel free to correct me (extending me the same courtesy as outlined above in distinguishing fact, opinion, and bullshit) or demand references whenever I inevitably slip up in this endeavor or am mistaken - I may be annoyed, but it is almost certainly at myself.

With all of that over with, enjoy your stay, and I hope you find something useful here.