Sunday, November 13, 2011

Crapping all over my raid

I have a recently dead machine with 4 disks in a raid 5 set.  And I have a working machine with disks and free space for more disks. So I moved the disks from my dead machine to the working machine.

When booting, something unremarkable happened: Ubuntu said one of the disks needed checking, and it fsck'ed. For a long time. The machine already had some 2TB disks so it would take time.  A bit later the boot sequence told me that /boot was broken, would I like a shell to fix it? What?

Get shell. Run "fsck /boot", answer "y" to all. Fsck terminates claiming there are still errors. Run again. Still errors? Have a look in /etc/fstab. Ooooh.

  /dev/sda1 /boot ext2 defaults 0   2


Oh shit. New disk controller on board, new disks. /dev/sda1 isn't the boot partition anymore, it's one of my raid disks. I'm still not really getting the consequences of what has happened. But I look in /dev/disk/by-uuid and see this:

   lrwxrwxrwx 1 root root  10 2011-11-13 15:16 c33afa82-c287-47fb-9b10-aca0524cfbc1 -> ../../sde1

Everything else is LVM. So I edit my fstab to use UUID=c33afa82-c287-47fb-9b10-aca0524cfbc1 for the device name, so that never happens again - on this machine.

Lesson: Always mount by UUID (or LVM device name) because device names change for the simplest reasons - like the kernel changing the probe order on the PCI bus. Most (all) distributions get this right, but I didn't and I'm too much of an old timer to have UUID as a knee-jerk reflex when I edit fstab.
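
A minimal sketch of how one could check an fstab for this mistake before moving disks around. It is only an illustration: the function name, the regular expression and the sample fstab are my own assumptions (the /media mount point and ext3 type in the sample are made up), and it only knows about sd/hd/vd style names.

```python
import re

# Kernel-assigned names such as /dev/sda1 or /dev/hdb2 can change between
# boots. This pattern is an assumption and only covers sd/hd/vd devices.
UNSTABLE_DEVICE = re.compile(r"^/dev/(sd|hd|vd)[a-z]+\d*$")


def unstable_fstab_entries(fstab_text):
    """Return (device, mount_point) pairs that use a raw kernel device name."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not a usable fstab entry
        device, mount_point = fields[0], fields[1]
        if UNSTABLE_DEVICE.match(device):
            flagged.append((device, mount_point))
    return flagged


# Hypothetical sample: the first entry is the one that bit me, the other
# two show the UUID and LVM forms that survive a change in probe order.
SAMPLE_FSTAB = """\
# <device> <mount point> <type> <options> <dump> <pass>
/dev/sda1 /boot ext2 defaults 0 2
UUID=c33afa82-c287-47fb-9b10-aca0524cfbc1 /boot ext2 defaults 0 2
/dev/mapper/DiskMd0-media /media ext3 defaults 0 2
"""

if __name__ == "__main__":
    for device, mount_point in unstable_fstab_entries(SAMPLE_FSTAB):
        print(f"{device} on {mount_point}: consider UUID= instead")
```

To run it against a real machine, read /etc/fstab into a string and pass that in instead of the sample.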

Get the raid running. All the lvm devices that should be there appear. I had better check the filesystems then after this upset. The most important first: /dev/mapper/DiskMd0-media - that's the filesystem with all the family videos of the kids growing up on. The most important one.  Fsck shows errors, lots of them. I press "y".  After a little while it dawns on me, "hmmm, these would be errors introduced because fsck scribbled all over the disk earlier, right?" Ctrl-C! Fsck reports that the filesystem was modified. So fsck.ext2 has crapped all over one raid disk. Damn. What now? It's a raid 5. One failed disk is survivable. Only by this time there is a graphical login on the console and the VC with the original fstab on has scrolled long past it. And my head is like teflon when it comes to stuff like that. So I don't remember which drive had been crapped all over. Sdb? Sda?

Angst follows. Lots of reading and re-reading the mdadm man page way too impatiently. Can I remove one drive from the raid and run fsck to check if the filesystem is now consistent - proving that I removed the right drive? And if I removed the wrong drive, can I add it back causing no new problems?

    mdadm md0 --fail /dev/sda1
    fsck.ext2 -n /dev/mapper/DiskMd0-misc
    ...

Note the -n, it causes fsck to NOT modify the filesystem no matter what. Because if it modifies a filesystem while the wrong disk is removed, matters will get worse. ... Whew. No errors. Had there been errors I planned to do

  mdadm md0 --re-add /dev/sda1

which should put the disk back in the raid with no new problems - provided that I didn't change the raid since it was removed (see the mdadm man page). Instead I could do

  mdadm md0 --remove /dev/sda1
  mdadm md0 --add /dev/sda1

And in /proc/mdstat I could see that the raid was rebuilding /dev/sda1, the disk that demonstrably was the one that had been crapped all over because of my fstab stupidity.
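
For those who would rather not squint at /proc/mdstat, here is a rough Python sketch that pulls out the member count and the rebuild progress. The sample text is a made-up illustration of the usual layout, not the output from my machine, and the function name and regular expressions are my own assumptions.

```python
import re

ARRAY_LINE = re.compile(r"^(md\d+) : ")
# "[4/3] [_UUU]" means 4 members expected, 3 active; "_" marks the missing one.
STATUS = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
RECOVERY = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%")


def md_status(mdstat_text):
    """Map array name -> expected members, active members, rebuild progress."""
    arrays = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"expected": None, "active": None, "progress": None}
            continue
        if current is None:
            continue  # still in the preamble before the first array
        status = STATUS.search(line)
        if status:
            arrays[current]["expected"] = int(status.group(1))
            arrays[current]["active"] = int(status.group(2))
        recovery = RECOVERY.search(line)
        if recovery:
            arrays[current]["progress"] = float(recovery.group(2))
    return arrays


# Illustrative sample only; all sizes and speeds are invented.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[4] sdb1[1] sdc1[2] sdd1[3]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  1.2% (11723456/976759936) finish=170.3min speed=94400K/sec

unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in md_status(SAMPLE_MDSTAT).items():
        print(name, info)
```

An array is degraded whenever "active" is lower than "expected", and "progress" stays None when no rebuild is running.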

I have to add that I take more care at work. But not at home, since it's not work. And I do have a copy of the movies of the kids growing up on another disk.

So that's what I used (some of) my Sunday for.

But at least I could recover.

Tuesday, September 27, 2011

dnspython

So, I'm fiddling with various stuff at work, and need to retrieve DNS zones.  The suite I'm currently working on is in Python so 2 minutes later it turns out that dnspython ("import dns") is the powerful choice.


10 minutes later it turns out that it is very poorly documented. The documentation is auto-generated and there seems to be no introductory material to any part of the gargantuan class hierarchy other than some very sparse examples. Printing the objects returned isn't much help, they all seem to be very clever generators and stuff or have as_text methods. This is probably my Python newbieness shining through, but I just can't seem to find accessor methods for the DNS records (objects) returned even.

As ever on the Internet, someone has felt the pain already, and done a good deed.  Already in 2005 even. So presenting http://agiletesting.blogspot.com/2005/08/managing-dns-zone-files-with-dnspython.html 

(Mumble.  Someone ought to start a wiki to document it, but unfortunately that won't be me)

Wednesday, September 21, 2011

A book that is also a phone!

Earlier I wrote about my experience with a tablet/pad and that I thought an Android phone might be a better fit for my needs since the tab was too heavy to use as a book for example.  And so I got a large screen Android phone.  And it is.

But it's not really a phone.  I use it mostly as a book: On the bus, on the train, in bed before I sleep.  I wish I could use it more as an MP3 player too, BUT I will have to shop for something with more battery life than a Galaxy S II otherwise mine will give out before dinner.

And I can make phone calls with it too.

Monday, July 11, 2011

Email is so 1995!

The other day I was made aware that the friprog center is struggling a bit; they explain why in Farvel epost ("Goodbye email").


That it should be easier to follow up inquiries on twitter/linkedin/facebook seems strange, to put it mildly. That it should make it easier for them to ignore or say no to inquiries they ought to ignore or say no to also seems strange. I can't see how it would do anything but make the overview of inquiries, and of what has been answered, less clear. The status sphere is a social space, not really a space for handling cases and inquiries. I simply assume that those who seriously use twitter/facebook/... for this kind of thing make sure to pull the inquiries into their case and inquiry system, so they can see what they have considered and handled.

Oh well, I'm curious what they will have to do for this to succeed - for values of "succeed" other than "I don't keep up with twitter" >:-)

Monday, June 6, 2011

Pad?!

Pads are everywhere. As soon as Elkjøp got the price of the Samsung Galaxy Tab below 4000 kroner this spring, I was ready and bought one for me and one for the wife. So now I have been carrying a pad around for the last few months. The nice thing about the 7" format is that you can carry the gadget with you without the backpack or "man bag" that you soon need in order to bring an iPad along.

It is a very nice / perfectly okay tablet/pad. The screen has excellent contrast and high resolution, and it is fine to read on for long stretches. And it is big enough to read on. But as an electronic book it is quite heavy. It works excellently as an MP3 player, but as an MP3 player too it is on the heavy side. I still get a kick out of wireless MP3 playback to the bluetooth headset I bought at the same time, even if the bass is probably a bit lacking.

I have read that some people have returned theirs for various reasons. To the first article I must remark that I went through my Internet addiction phase in the mid-1990s (I am an early adopter, after all :-p ) and that since before 2000 I have been able to leave the Internet for many weeks at a time without feeling empty. So I am not emotionally dependent on following Twitter or Facebook all the time. Exactly that is probably one of the killer apps for tabs and smartphones as they are turning out. The other is probably reading email continuously and being able to watch all the funny videos you get links to all the time - since it is not an Apple product it has Flash support. Which works just fine.

To the second article I must say that I am not sure they are really useful for anything either. They are no better for taking notes on than the Palm Pilots were in the early 2000s. Not even with Swype or similar input methods. On a work trip in the early 2000s I brought a Palm Pilot and a folding keyboard to take notes. It worked fine, and something similar is probably needed to make tabs a meeting room fixture. ... But then a laptop is probably more conventional. So the most useful thing you can do with them in meetings is to surf and read email throughout the meeting. But apparently you can do that with a laptop too.

Trafikanten and Gule Sider have nice apps for Android, but they are handier on a mobile phone. What could have been nice is using the radio and TV services. But since I watch and listen to more or less only NRK, and NRK's radio streaming app is unstable, and a third-party application for streaming NRK TV wasn't much fun either, and the mobile pages for NRK nett-tv suck (most of all they are a bit poorly and randomly maintained), nothing comes of that either.

My conclusion has to be that tabs, for my part, lack a killer app, and that an Android phone and a fairly light laptop will cover my needs better. So now I have ordered a Samsung Galaxy S II phone - a laptop I already have.

Friday, March 18, 2011

Perl debugging

I use and love the perl debugger, it can do lots of nice and useful things, and is very useful for experimenting with perl ad-hoc to help un-muddy waters. When I use a new library and I'm not sure about the data-structures or the exact contents of return values I can break the code at the right place, try some Data::Dumper-assisted print statements and formulate new code that helps me complete the code section.


But sometimes I find myself working in code where a dozen or more variables hold various values that need to go in a dozen places. If I've used the wrong one in one place I'm of course screwed. WTF, which variable held the right variant of the value? Working on router configurations I find I start with "interface FastEthernet0/1.346" and after a call or two the name has been exploded into "FastEthernet0/1.346", "fa", "fa0/1", "fa0/1.346", "346", and half a dozen other variables hold various facts about the interface. And then the right name must be used in the right SQL queries. I hate having to type all the "p" statements to figure out which value to use where.

This made me dream of the Turbo Pascal 5-6 IDE I used on MS-DOS back in the 1980s. It had a watch variable window pane with continuous updates. I often run perldb inside emacs to make some modes of debugging smoother, and the Emacs debugger mode supports watch variables - BUT only with gdb as debugger. After some googling I found "ptkdb". It's a Tk based GUI debugger for perl. It looks rather clunky and reminiscent of X programs of the early 1990s but it has the watch variable window pane and convenient step over/step into/return/run buttons to control program execution. Lines can be made breakpoints by clicking on the line number. And if you mouse over a variable name there is a popup showing the contents of that variable. Excellent!

$ perl -d:ptkdb ./tester

Project page at sourceforge.

Tuesday, January 11, 2011

Bring some fiber

So, I work with network people now.


The first thing I learnt: Always bring some fiber in case you get lost. You can be sure a backhoe will turn up to dig where there is fiber.