Tuesday, June 14, 2011

Strange Bug of the Day

Today, I got to debug an extremely odd bug.

The behavior observed was that, on nearly all of the intelligent SAS/SATA controllers we had, the OCZ Vertex 3 drives we possessed did not show up in Windows at all, and they only showed up on the "dumbest" controllers (bargain-basement onboard motherboard chipsets).

I had taken a look at this a while ago, when it first cropped up, but hadn't found anything obvious, so I put it down in favor of other, time-sensitive things.

I started by plugging the SSD into my workstation - a Dell Precision T5500 running a modern Apt-based Linux distribution, at the moment. It happily appeared on the bus and did IO fairly well at a quick glance, though SATA 2 could only go so fast. (We actually have two identical Vertex 3s, so I tested both. They performed essentially identically.)

Huh, that was lucky. I try it in my ThinkPad T61p (my personal laptop), and it shows up fine in the BIOS.

Okay, fine, apparently these are stupid controllers. Drag them over to some servers in the testbed room. Plug one into the LSI SAS2008-based controller in the server I've been playing with ZFS on Linux on...yup, shows up fine, does ~500 MB/s sequential read.

At this point, I'm rather befuddled, as I have not yet encountered any machines where these have failed.

I plug them into some Supermicro disk enclosures we have lying around, with SAS expanders on their backplanes, running to more LSI SAS2008-based controllers, but this time on our OpenIndiana test system. Yup, they show up great, speed is great.

...finally, I have a sinking feeling in my stomach, and plug them into the Windows Server 2008 R2 box in the room, with its LSI SAS2008-based controllers.

Nothing shows up in Disk Management. Nor in Event Viewer. I interrogate the card using SAS2IRCU, a CLI utility that LSI provides for reaching out and touching your card and extracting information from it, and lo, it shows the disk in the appropriate port and marked "Ready (RDY)", just like the rest.

Thoroughly confused, I have a more in-depth look in Device Manager. Rescan does nothing, nothing under the Disk drives subtree, no devices marked as needing a driver...hey, wait, what's this "Other Devices" subtree?

Lo and behold, Windows was marking the Vertex 3s as "Other Devices" and reported that it did not know what driver to use to make them go. If you forced it to use the generic "Disk drive" driver, the Vertex 3s began playing happily and zippily in the sandbox.

No, I still don't know why this is reasonable behavior, or what caused it. But I couldn't find any references to it online, so I thought I'd list it here.

Edit: This is fixed in the latest firmware available on OCZ's site for the Vertex 3 and any relatives. Since the only thing that I can see that changed in the update was that the drive no longer reports a blank in the "revision" field of the ATA response, I would presume that this is what was causing Windows to dislike identifying the drive. If you have a drive that you've updated with the firmware currently on their site, and it's still not showing up, try uninstalling it in Device Manager and then scanning for hardware changes - it should work.
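For anyone wanting to check what revision string a drive reports, here is a minimal sketch from the Linux side. It assumes a typical Linux sysfs layout where each disk exposes a revision string at /sys/block/sdX/device/rev, and is_blank_rev is a hypothetical helper name. It only shows what the drive reports, and does nothing about how Windows reacts to it.

```shell
#!/usr/bin/env bash
# Sketch: flag disks whose reported revision string is empty or all
# whitespace. is_blank_rev is a hypothetical helper, and the sysfs
# path below is an assumption about a typical Linux system.
is_blank_rev() {
    [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

for dev in /sys/block/sd*; do
    # Skip anything without a readable revision file (including the
    # unexpanded glob when no sd* devices exist).
    [ -r "$dev/device/rev" ] || continue
    rev=$(cat "$dev/device/rev")
    if is_blank_rev "$rev"; then
        echo "${dev##*/}: blank revision field"
    else
        echo "${dev##*/}: revision '$rev'"
    fi
done
```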

Tuesday, June 7, 2011

Quick shell hack

As documented here, I once wrote a piece of bashrc that colors your shell's hostname consistently across logins, but differently across hosts, so you can tell at a glance which host you are on.

The backstory of this particular hack is relatively simple - a friend of mine came up with the idea during one of our conversations. He wanted an easy way to distinguish which of his systems he was logged into, and asked me to implement it, because he was too {busy,lazy}. I thought about it for a while, and came up with a number of stupid hacks, but eventually settled on the following:


# Hash the hostname. $MD5 is set elsewhere to md5sum or an equivalent.
namedec=$(uname -n | $MD5 | cut -c 1-32)

hex=$namedec

# Sum the characters of the hash, one at a time. Note that
# $(( $digit )) treats the letters a-f as (normally unset) variable
# names, so they contribute 0 and only the digits 0-9 are summed.
decimal=0
while [ "%$hex%" != "%%" ]
do
  digit=$(echo $hex | cut -b 1)

  value=$(( $digit ))

  decimal=$(( $decimal + value ))
  hex=$(echo $hex | sed 's/^.//')
done

#echo "Decimal value: $decimal"
namecol=$(($decimal % 17))
#echo $namecol
color=$(($namecol + 30))

# PS1="\[\033[01;31m\]\u\[\033[00m\]@\[\033[01;${color}m\]\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] "
# green user vs red user for root
if [[ ${EUID} == 0 ]]
        then PS1="\[\033[01;31m\]\u\[\033[00m\]@\[\033[01;${color}m\]\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] "
        PS1=${PS1}"# "
else
        PS1="\[\033[01;32m\]\u\[\033[00m\]@\[\033[01;${color}m\]\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] "
        PS1=${PS1}"$ "
fi

# this actually works so um
#if [ $(which md5sum 2>&1 | grep "command not found" | wc -l) = "1" ]
#       then PS1='OMG THIS SYSTEM SUCKS \$ '
#elif [ $(which md5sum 2>&1 | grep "no md5sum in " | wc -l) = "1" ]
#       then PS1='OMG THIS SYSTEM SUCKS \$ '
#fi

You may note a few things here - one is the lack of syntax highlighting. As a result, have a screenshot for some clarity of the end result.



Another is that the source of the highlighting is rather silly and overkill. I played with a few variants on hashing the hostname until I settled on this, because a lot of the more "intelligent" ideas I came up with ended up with too many collisions, for reasons I was far too lazy to figure out for a Q&D hack like this.

It works, and it works well enough that my friend took it, threw it in his bashrc, and as far as I know, still keeps it there on his systems. I do as well, simply because it's convenient.
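The same idea, hashing the hostname to pick a color, can be sketched more compactly. This is not the code above: it swaps MD5 for the POSIX cksum utility and restricts the result to the six ANSI foreground colors 31-36, so it will not pick the same colors as the original hack. host_color is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: map a hostname to one of the ANSI foreground colors 31-36
# (red through cyan). host_color is a hypothetical helper name.
host_color() {
    local name=$1 sum
    # cksum prints "<crc> <byte count>" for stdin; keep the CRC only.
    sum=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
    echo $(( sum % 6 + 31 ))
}

color=$(host_color "$(uname -n)")
# The trailing '\$ ' is single-quoted so bash sees the \$ prompt
# escape (# for root, $ otherwise) instead of a literal dollar sign.
PS1="\u@\[\033[01;${color}m\]\h\[\033[00m\]:\w "'\$ '
```

With only six buckets, collisions between hosts are guaranteed once you have more than six machines, so this trades variety for colors that stay readable on most terminals.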

(Footnote - I have some terrible wrappers which try to set up the environment correctly to have a hash sourcing program based on the uname -v output and checking various paths, which I elided here because they're not part of the "interesting code", so to speak.)
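As a rough stand-in for those wrappers, one way to fill in $MD5 is to probe PATH for a known MD5 command. This is a sketch under that assumption, not the wrappers described above, which work from uname -v and path checks. pick_md5 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: choose whichever MD5 command is on PATH. pick_md5 is a
# hypothetical helper name, not the wrapper the post refers to.
pick_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        echo md5sum     # GNU coreutils, typical on Linux
    elif command -v md5 >/dev/null 2>&1; then
        echo md5        # BSD and macOS
    else
        return 1        # nothing suitable found
    fi
}

MD5=$(pick_md5) || echo "no MD5 command found" >&2
```

The two commands format their output differently, but each is consistent on a given host, which is all the prompt hack needs.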

Welcome

Hello.

If you've found this, congratulations, I have apparently achieved relevance in some fashion.

I am Rich Ercolani. I am a Computer Scientist/Programmer turned System Administrator, at present. I also have a number of eclectic interests, ranging from tabletop games to certain classes of terrible writing. All of these things will likely influence the content.

For those of you who are intrigued and attracted by programming, I'm slowly placing various things I've done for my own use or things I'm allowed to share from work on GitHub, though I have a lot of stuff that's not there yet or poorly organized. As a warning, anything in the sysadmin repository is likely to include a lot of terrible hacks, as my goal there is to Make It Work, not necessarily Make It Good.

I wanted an outlet for public posting of various things - sysadmin things, code snippets I'm allowed to share from work, horrible hardware bugs (for the hardware that is in general deployment and finalized, anyway), and random personal project stuff. I didn't think any of the other various places I am available on the Internet were suitable, so here we have said outlet.

I will try to keep things tagged in a useful fashion, so that you can filter the portions you are uninterested in.

All opinions and content expressed on this blog are my own, and do not have any bearing or relevance to any opinions held by any of my employers - past, present, or future.

All statements made on this blog, unless otherwise noted, should be taken as opinions - I will endeavor to make it clear when I believe I am stating provable fact, based on citation or repeated personal experience in some reproducible fashion, and when I am simply making shit up to justify behavior I have no idea how to explain.

Please feel free to correct me (extending me the same courtesy as outlined above in distinguishing fact, opinion, and bullshit) or demand references whenever I inevitably slip up in this endeavor or am mistaken - I may be annoyed, but it is almost certainly at myself.

With all of that over with, enjoy your stay, and I hope you find something useful here.