Resurrecting a lost hard disk... The Sequel.

In a previous post I documented the use of ddrescue to recover the data from a failing hard drive.  A follow-up a few months later noted that the second drive had started failing, but that time I was able to copy the data before needing to resort to the rescue tools.  As I promised, here's a follow-up.

After using the second replacement drive for just a couple of weeks, I started noticing the same errors creeping into the "dmesg" output.  Though I know some manufacturers have a bad line, I've never experienced drives failing that rapidly, especially across production runs that differed as much as these did.

My first thought was that the motherboard might be failing.  Unfortunately I wasn't able to find an inexpensive SATA disk controller, so I did the next best thing and moved the disk to a different SATA port.  The move helped a bit, but the errors still came back after a bit of hard disk activity.  On a whim I decided to swap the SATA cable for a different one in my collection.  Neither cable was especially "high quality" compared to the other, but since I put the new cable in on Dec 2 I haven't seen an error.

I'm at a loss as to what has happened to the old cable - the drive and cable are well inside the case and not touching anything so I don't think it's a problem with wear, but it's possible there is some oxidation/rust on the cable that I can't see.

I hope this is my last update on this issue, but I'll continue the saga if it's necessary.

Programming the ATTiny chips using an Arduino Duemilanove and the Arduino IDE.

My two girls and I are making personalized home-made "Arduino Blinkies" this year.  We're making the "64 pixels" display that is written up here:

This project only requires three components:


  • An Atmel ATTiny2313 microcontroller
  • An 8x8 LED grid
  • A two-AA battery holder and two batteries


Up to now, all of my Arduino experience has been playing with a Duemilanove with the Atmel ATMega328 in the socket.  I have seen descriptions of how to use the chip "bare", but at $3-$5 I didn't really feel like experimenting with them that much.  (Plus if I did use one in a project, I would have to flash the bootloader onto its replacement and I haven't tackled that yet, either.)

While poking around on the Internet looking for a fun project to introduce my girls to the other side of computers and how they work, I came across the 64pixels project, and that introduced me to the ATTiny2313.  This chip (also by Atmel) is on the small end of their line of compatible chips, and costs a whopping $0.95 per chip!  The entire cost of the 64pixels project is below $5 each, so I can afford to let the girls experiment a bit and not break the bank.

So, the first thing I had to do was determine how to program the ATTiny chips on my Duemilanove.  The pins on the ATTiny aren't the same as the ATMega so I can't just plug it in.  Terms such as ISP (In-System Programming) and JTAG (Joint Test Action Group) were tossed around and friends on my mailing lists offered to loan me their programmers - but that was like loaning a pair of snow skis to a Texan.  I didn't know how to use one, or whether I even really needed one.

Thankfully a few nights of searching the Internet found people had documented bits and pieces of it.  Through a lot of reading and trial-and-error, I've put together my notes on how to flash a common Arduino Processing-based program onto any Atmel AVR-based chip.

  1. Downloaded latest Arduino IDE (1.0.3).
    1. On my system, I'm running Linux, so I extracted it in $HOME/arduino-1.0.3/.  On a Windows system, you will install it as normal (presumably to the C:\Program Files\ directory).
    2. The "Arduino IDE" is the "Integrated Development Environment" that can be used to write, debug, and upload Arduino programs (called "sketches") to the chips.
  2. Downloaded latest “arduino-tiny” library files to add the necessary support files to the Arduino IDE so it knows how to create the proper code for the ATTiny line of processors.
    1. Followed readme.txt in .../tiny/readme.txt
      1. Extracted ZIP file into ~/arduino-1.0.3/hardware/
      2. Confirmed the boards.txt “upload.using” lines all read “arduino:arduinoisp”
  3. Set up the Duemilanove to act as an ISP, which will forward the programming the IDE does across to the ATtiny processor.
    1. Basic steps:
      1. Connect the Duemilanove to my computer
      2. Start the Arduino IDE
        1. For me I ran "~/arduino-1.0.3/arduino"
      3. Confirmed the Duemilanove was seen and communicating with the Arduino IDE
      4. Opened the “ArduinoISP” example program
        1. File -> Examples -> ArduinoISP
      5. Uploaded this program to my Duemilanove
      6. Leave the ATMega chip in the Duemilanove
        1. This step wasn't clear in many on-line tutorials.  Given that you have to upload a bit of code to the ATMega328 chip, leaving it in the Duemilanove programming board makes sense.
  4. Chose the correct ATTiny chip you wish to program from the Tools -> Board menu within the IDE.
    1. I tried both 8MHz and 1MHz, both with success.
  5. Connected the header pins on the Duemilanove to the pins of the ATTiny2313
    1. This is another step that wasn't clear in the other on-line tutorials.  Most walked you through which jumper wires went where for a specific chip, but no one ever really explained what each wire was going to.  In short, there are four programming pins (plus GND and VCC) on the ATTiny chips that need to be connected: SCK, MISO, MOSI, and Reset.  If you have a different chip ("Introducing the NEW ATTiny9876"), as long as you match the "SCK" port from the Duemilanove to the SCK port on the new chip, and do the same for MISO, MOSI, and Reset, these steps should work.  (Assuming the "arduino-tiny" library has been updated, too.)
    2. Here's a quick grid showing the connection from the Duemilanove header ports to the ATTiny2313 pins:
      • The text on the right is from the ATTiny2313 data sheet describing the pinouts of the chip.  The bolded words should match the pins on the Duemilanove headers.
      • Duemilanove header <--> ATTiny2313 pins
      • Pin 13 (SCK) <--> Pin 19 (USCK/SCL/SCK/PCINT7)
      • Pin 12 (MISO) <--> Pin 18 (MISO/DO/PCINT6)
      • Pin 11 (MOSI) <--> Pin 17 (MOSI/DI/SDA/PCINT5)
      • Pin 10 (Reset) <--> Pin 1 (PCINT10/RESET/dW)
      • 5v <--> Pin 20 (VCC)
      • Ground <--> Pin 10 (GND)
    3. Again, for non-ATTiny2313 chips, find the SCK/MISO/MOSI/Reset pins and connect them the same.
  6. Upload a test program
    1. For my first test I used the basic Arduino “blink” program.  The program performs a digitalWrite to output #13, but pin 13 on the ATTiny didn't blink my LED.  After some poking around on the chip, I found that it was actually “pin 16” on the ATTiny2313.  Based on my testing I made a quick map of the "pinMode()" pin number to the actual pin on the chip.
      1. Outputs 0 through 7 map to pin (output+2)
        1. ex: output 3 -> pin 5
      2. Outputs 8 through 16 map to pin (output+3)
        1. ex: output 11 -> pin 14
  7. To run the chip standalone, supply appropriate voltage and ground to pins 20 and 10 of chip
    1. It may need to have the reset pin (pin 1) pulled high.
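Sample test code:
int pin = 8;
int value = HIGH;

// the setup routine runs once when you press reset:
void setup() {
  // initialize the digital pin as an output.
  pinMode(pin, OUTPUT);
}

// the loop routine runs over and over again forever:
void loop() {
  // write the current level, then flip it for the next pass
  digitalWrite(pin, value);
  if (value == HIGH) {
    value = LOW;
  } else {
    value = HIGH;
  }
  // The listing is cut off above; this pause is an assumed addition
  // so the blink is slow enough to see.
  delay(1000);
}
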
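The output-to-pin map from step 6 can be written as a small helper.  This is a minimal sketch in plain C++, not part of the arduino-tiny core: the name physicalPin is made up for illustration, and the ranges are only the ones found by testing on the ATTiny2313 (other chips will differ).

```cpp
// Hypothetical helper: translate an arduino-tiny "output" number for the
// ATTiny2313 into the physical pin on the 20-pin package, using the map
// worked out by testing in step 6.
// Returns -1 for output numbers outside the tested 0..16 range.
int physicalPin(int output) {
    if (output < 0 || output > 16) {
        return -1;          // not covered by the map
    }
    if (output <= 7) {
        return output + 2;  // outputs 0..7  -> pins 2..9
    }
    return output + 3;      // outputs 8..16 -> pins 11..19 (pin 10 is GND)
}
```

For example, physicalPin(13) gives 16, which matches where the "blink" LED actually showed up.
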
I was also interested in fading an LED in/out using PWM output.  From what I can deduce, the Arduino standard “analogWrite(pin, value)” only works on specific PWM pins that are marked with “OCxx” on the datasheet.  On the ATTiny2313, these include pins 14, 15, and 16 (OC0A, OC1A, and OC1B); the datasheet also lists OC0B on pin 9.
Sample test code:
// Define the pin to test for analog features
int anapin = 13;
// Define a digital pin to flash each time the 0..255 analog cycle has completed.
int digipin = 2;
int value = 0;

// the setup routine runs once when you press reset:
void setup() {
  // initialize both pins as outputs.
  pinMode(anapin, OUTPUT);
  pinMode(digipin, OUTPUT);
}

// the loop routine runs over and over again forever:
void loop() {
  analogWrite(anapin, value);
  value += 2;
  digitalWrite(digipin, LOW);
  if (value >= 255) {
    // cycle complete: start over and flash the digital pin
    value = 0;
    digitalWrite(digipin, HIGH);
  }
  // The listing is cut off above; this pause is an assumed addition
  // so the ramp and the flash are slow enough to see.
  delay(10);
}
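
The sketch above ramps the brightness up and then snaps back to 0.  To fade both in and out, the value can bounce between 0 and 255 instead.  Here is a minimal sketch of that idea in plain C++, assuming a helper named nextFadeValue (hypothetical, not an Arduino function) and a step size of your choosing:

```cpp
// Hypothetical helper: advance a PWM duty cycle by one step and reverse
// direction whenever it reaches either end of the 0..255 range.
// `step` is passed by reference so the direction persists between calls.
int nextFadeValue(int value, int &step) {
    value += step;
    if (value >= 255) {
        value = 255;    // clamp at full brightness
        step = -step;   // start fading out
    } else if (value <= 0) {
        value = 0;      // clamp at off
        step = -step;   // start fading in
    }
    return value;
}
```

In a sketch, loop() would call value = nextFadeValue(value, step), pass the result to analogWrite(anapin, value), and then pause briefly with delay().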

Cell phone companies double-dipping into my personal life.

"Dear Verizon - I know my monthly $150 donation is barely adequate for you to resolve the spotty reception and poor data connection quality I experience, so please make additional money by selling my private calling and location information."

I don't mind companies making a profit, even when they are profiting from my personal information.  Case in point: Facebook, Google search, GMail, YouTube, Yahoo, CNN, HotMail, etc.  All of these "free" sites have a hidden cost - when we enter our information (name, age, email, address) or even just use the site (thus supplying them with our "usage pattern" information, possibly location, etc), they can collect that information and start drawing highly intelligent conclusions about us.  For example, "Dan checks his e-mail and Facebook over lunch while he's sitting in Burger King 90% of the time, so let's put ads for 'weight loss' and 'Subway' along the side."

But, there is also some more devious information contained within our on-line checkups.  "At 12:15, Dan logged into his personal e-mail and Facebook pages from the Burger King at 114th and Dodge, and will probably be there for the next 40 minutes.  He is currently 20.1 miles and 24 minutes away from home."  Based on that bit of information, it would be extremely easy to break into our house and be highly certain that I wouldn't return.  Thankfully, this type of location information is restricted to the sites' marketing departments....yeah, right. Google and Facebook sell this information en masse - it's a big portion of their business model.

As I said, I don't mind the free sites I use paying their bills by selling ad space - in this case, I'm the 'product' being sold.  But, when I pay for a service, I don't want them double-dipping and selling my personal information on top of charging me for their services.  Case in point, the cellular telephone industry.

Quick question: Who wants to sign up to let a large company track our every move 24 hours a day for two years?  This may include information about our web browsing history, private communications via voice, text messages and e-mail, exact location, etc.  Sounds like a dictator state dream situation?  Me too, but I signed up anyway...and I see you've joined too.  You're carrying the only piece of equipment necessary to do this - your cell phone.  In my case, I've even opted into the advanced photo documenting feature since I take most of my family photos with the integrated camera - each of them is geo-tagged with the location, and Google does a good job of facial recognition.

Again, I'm ok with Google doing this since I use their service to store and share the photos with friends and family far away.  I'm sure they could scan for a child in a birthday hat, and put up ads for toys to send.  Now to my main beef and the subject of this post.  I feel that a service that I pay for shouldn't be reselling my private information, too.  Case in point, the industry 'Customer Proprietary Network Information' (CPNI).

For my family, we pay Verizon Wireless over $150 a month for three phones (two smart-phones, and a feature phone for our daughter).  Contained within our CPNI information are nuggets of valuable information such as who we contacted (via voice and text), how long we talked, where we were when we used these services, etc.  Unfortunately the CPNI information pages at Verizon and AT&T aren't specific in the exact details, but one can surmise that there is other additional information contained that would be valuable to better "know us" for marketing purposes.  (And by "know us" I'm not meaning they want to give us gifts...)

I'd suggest everyone with a cellphone go to your provider's site and update your CPNI options so that it is kept private.  Here are the links I've been able to dig up:

  * See the section titled "How to Limit the Sharing and Use of Your Information".  You'll have to call the CPNI phone number for your state from each phone that you want to opt out.

  * See the section titled "Restricting our use of your CPNI" for the contact number to call.

T-Mobile: e-mail (Please reply if you find a better opt-out URL.)

Remember, the "C" in CPNI stands for Customer - remind your carrier that you're paying for the services and believe their re-sale of your information is irresponsible.

Resurrecting a lost hard disk...

My main desktop system had been powered off for a few weeks and when I powered it on last weekend it wouldn't boot up completely.  It was complaining about errors, specifically errors on the hard drive that contains all my documents.  The 1.5TB hard drive had been my home directory repository for the past 2 years and never gave any signs of trouble.

I grabbed my copy of SpinRite and set it loose, but it too had troubles and would get a "divide by zero" error less than 30 seconds after starting.  Not good.  I got a bit further by removing it and trying to access it via a USB-to-SATA adapter I had.  This time SpinRite could work on the drive (albeit VERY slowly), but I still couldn't mount it or see the data.

After playing with it for most of last weekend I resigned myself to the fact that it was dead.  Thankfully, when I purchased it via in November 2010, I spent the extra money and got the Western Digital Caviar Black 1.5TB SATA drive - it came with a 5-year warranty.  It was well within the WD warranty period, and when I got to their on-line support page, all I had to do was enter the serial number and choose cross-ship, and the new drive was waiting for me two days later.

While waiting for the HDD, I did some research and came across the "ddrescue" utility - a Linux utility that tries to read raw sectors from one place and dump them to a new file or other drive.  The tool will attempt to re-read sectors a few times, but it will log the bad sectors in a file, then continue with the rest of the disk.  I decided to give it a try, and I'm glad I did.

When the replacement drive came, I put it into the case and attached the failing drive to the USB adapter.  After running SpinRite on the new drive for a couple days (even level two took 20+ hours - I couldn't really start on recovery until Friday night), I decided to try the ddrescue tool.  I ended up booting from an Ubuntu "LiveCD" and running a very simple command:

ddrescue -v /dev/sdc /dev/sda disk_rescue.001.wri

That set up ddrescue to be verbose ("-v"), try copying everything from /dev/sdc (the failing drive) to /dev/sda (the new drive), and save its notes in a log file named "disk_rescue.001.wri".

It was slow going at first.  It was maxing out at about 500KByte/second - that was going to take a long time.  After I let it sit for about 15 minutes, I noticed that it started leveling out at a better 6MByte/second.  Still not great but better.

Since ddrescue will resume where it left off if it has a log file to reference, I killed it and re-ran it with the "-c 1024" option.  Using "-c" told the program to take a bigger "bite" each time it copied from the disk, and that seemed to bring the speed up to 32MByte/second.  Playing with different values didn't help the speed much, so I let it run overnight and it was still running Saturday morning.  It ran all day Saturday, but finally finished sometime early Sunday morning.
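
To see why the copy speed mattered so much, here is a rough back-of-the-envelope estimate in C++.  It assumes the drive holds 1.5e12 bytes, that "MByte" means 1e6 bytes, and that the rate is sustained for the whole disk, none of which ddrescue guarantees.  The name hoursToCopy is made up for this example.

```cpp
// Rough estimate only: hours needed to copy `bytes` at a steady
// `bytesPerSecond`.  Real recoveries slow down around bad sectors,
// so treat the result as a best case.
double hoursToCopy(double bytes, double bytesPerSecond) {
    const double secondsPerHour = 3600.0;
    return bytes / bytesPerSecond / secondsPerHour;
}
```

With these assumptions, hoursToCopy(1.5e12, 32e6) comes to about 13 hours, against roughly 69 hours at 6MByte/second and over a month at 500KByte/second.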

I could tell that some stuff was being copied - I used "dd if=/dev/sda count=25 bs=1024k | strings | more" and could see some text - but mounting the new filesystem reported a lot of errors about the superblock.

I did some additional searching and found this blog entry:

How to repair a broken ext4 superblock in Ubuntu 

In short I ran "mke2fs -n /dev/sda1" and found a number of superblocks listed so I chose one more in the middle.  I then ran "e2fsck -b {block_number} /dev/sda1" and let it run.  The first time I ran it, there were a lot of errors that I had to answer "y" to, and a large but smaller set the second time.  I then re-ran it with the "-y" option to tell it to fix everything and it finally came up clean enough for testing.
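
For a sense of where those backup superblocks come from: on ext filesystems with the sparse_super feature, backups are kept in block group 1 and in groups whose number is a power of 3, 5, or 7.  The sketch below lists the matching block numbers in C++, assuming 4KiB blocks with 32768 blocks per group (a common default, but not confirmed for this drive) and a first data block of 0.  The function names are made up, and "mke2fs -n" remains the thing to trust for a real disk.

```cpp
#include <vector>

// True if n is 1 or an exact power of base (base must be > 1).
static bool isPowerOf(unsigned long n, unsigned long base) {
    while (n > 1 && n % base == 0) {
        n /= base;
    }
    return n == 1;
}

// Hypothetical helper: block numbers of the backup superblocks among the
// first `groups` block groups, assuming sparse_super is enabled.
// Group 0 holds the primary superblock, so the scan starts at group 1.
std::vector<unsigned long> backupSuperblocks(unsigned long groups,
                                             unsigned long blocksPerGroup = 32768) {
    std::vector<unsigned long> blocks;
    for (unsigned long g = 1; g < groups; ++g) {
        if (isPowerOf(g, 3) || isPowerOf(g, 5) || isPowerOf(g, 7)) {
            blocks.push_back(g * blocksPerGroup);
        }
    }
    return blocks;
}
```

Under those assumptions, the first few candidates are 32768, 98304, 163840, and 229376, any of which could be passed to "e2fsck -b".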

I was finally able to mount it and look around.  To my initial surprise, the home directory wasn't there, only a "lost+found" directory.  Checking there, I found a few new entries, and under one of them was my "dan" home directory.  I moved it to the correct location on the disk and rebooted my system.

WOW!  My system came up and I was able to login with my "dan" account.  Much more surprising, all of the videos and bigger files I have checked so far have been fine.

I think I dodged a bullet this time - I really need to look into a true RAID NAS solution for my home directory.

P.S. I know that a RAID system does NOT take the place of a true backup.  We do have an off-site backup solution that I use to keep important files in: JungleDisk with Amazon Simple Storage Service.  It's not the cheapest solution, but it encrypts with my private key BEFORE sending it out to the Amazon servers, and I can use the program on as many systems as I need.  Highly recommended.

Geezer-Geek and the Kids

On a mailing list I am on, the discussion turned to the new "Raspberry Pi" sub-$40 computer on a card.

It was mentioned that part of the reason it was produced was to give school kids something that was powerful enough for them to play with and inexpensive enough to use in a classroom setting.  The old "back in my day" discussion came up - "We had to type in hundreds of lines of code from a magazine to get a stick-figure person to walk" and other such nostalgia.

I tended to agree with that, given that I cut my teeth on a Timex/Sinclair 1000 and remember learning skills while debugging the magazine programs that still serve me today (patience and careful observation being two key lessons).

At the same time I came across this article: Is raspberry pi a mid-life crisis?

The gist of this article is that we're old codgers, and our children are probably going to blaze trails just as new as the ones we did.  Mostly because of the advanced tools they have now that were built on what we learned.

Hmm...not what I wanted to hear, but it made me think about my experiences with tech as a kid and how it relates to my daughters.

I've struggled trying to find a geek interest that I can share with my daughters (11 and 6), but so far their only interest in computers is to use them to view "Cute Kitten" YouTube videos and play some on-line games for school.

But, what I'm missing is that I was very much into the "hands-on technical" tinkering on anything (building carts, modifying my bike, building with Legos, playing with my dad's electronics tools, etc).  My 11 year old is showing an aptitude for science and literature - I need to change *myself* to find something in those fields that I can apply my technical interests in and show her how to use computers/technology to do the next great thing in those fields.


Now, I just need to find a way to make the Raspberry Pi help my daughter produce great literature while displaying videos of cute kittens...



Cutting the cord - not for the faint of heart...

We're taking the plunge!

After talking about it for years, I'm finally getting things set up to cut the cord with our cable TV provider and try living with the local over-the-air (OTA) and Internet-provided video content.

Before doing this, I had to ensure that the rest of the family was going to make the transition with me.  We had a number of deciding factors:

  1. The cost of the Cable TV services is around $3/day for our fairly basic package.  This fact alone was enough to convince my wife (the chief financial officer of our house) to allow me to spend a bit of extra money to make this all happen.  Some of the pieces that I had to purchase were in the $80-$100 range, but that's just about the cost of one month's service, so they will pay for themselves in just a few months.
  2. We're not a "glued to the latest sitcoms or cable-only TV shows" house.  A few friends of mine are absolutely glued to the latest installments of some for-pay-only channels (i.e. HBO "Game of Thrones"), or cable-only channels ("CourtTV", "Spike", etc) (see "Bit-torrent shows" below).
  3. The shows we do find hard to live without are easy to get through other means.  My wife and I don't watch much, but when we do it's usually on one of the local channels ("Big Bang Theory" on CBS, or "Masterpiece" on PBS).
  4. The shows that are not broadcast locally can usually be found through other legal means.  We're currently subscribed to NetFlix, mostly for the entertainment of our two daughters (10 and 5).  For them, NetFlix has enough variety that they enjoy being able to watch multiple installments of a show at a time.  As a bonus, they can continue to watch their shows when we're on long trips or waiting at the doctor/dentist office.
  5. There are a number of streaming services that can provide a broader variety of shows.  The previously mentioned NetFlix is one popular example, but another one that has caught on is the Amazon OnDemand service.  If I subscribe to the streaming-only options for both, it comes to roughly $14.60/month ($79/year for Amazon, and $8/month for NetFlix).  We've opted for the NetFlix "stream plus one DVD" so our total monthly cost for these videos is $22.60.  Compare that to the $90 we were paying monthly and it starts to make sense quickly.
  6. The ability to DVR shows was an extra cost that was driving me insane.  We originally got along with the basic cable package, but when our stand-alone DVR died and we had to go to the one provided by our cable company, we had to upgrade to the basic digital package (extra $$), then pay an additional fee for the DVR-capable set-top-box - in total about $30/month in additional fees.  Considering the old DVR we had been using (a very solid ReplayTV system) only initially cost $250, I should have looked into a replacement rather than diving further into the Cable TV trap.
  7. We also want to move all the DVD videos to a "media server".  This isn't a direct replacement of an existing cable TV solution, but it is going to be a side-effect of getting this all set up.

So, those are all good benefits, but there are some drawbacks.


  1. There will be something that we don't get that will be "popular" or a "must see" that we will miss.  For our family, the Disney channel might be that one channel that our daughters will miss.  But then, I lived without all the drama that HBO's "The Sopranos" provided and they are now available via DVD on NetFlix - I can get hooked on them if I really want to.
  2. The technology to make this all happen is going to be completely home built.  I am technical enough that the maintenance doesn't scare me.  What I don't want to have happen is failures to record the special TV show (i.e. daughter on TV, or "final episode" of a series).  But then that has happened with our current cable company provided equipment too, so I've been through that pain already.
  3. The complexity of this may not be acceptable to my family.  To quote Iyaz Akhtar of "This Old Nerd", the "Partner Acceptance Rating" is paramount to making this all work.  I have to make sure everything I put in place isn't overly complex.  Thankfully, all three are fairly geeky so I don't think this will be too much of a problem.


To make all this come together, I'm starting a list of the items we need:


  • TV tuner - First and foremost is the ability to bring in TV signals and convert them to something a computer can use.
  • DVR software - This is the software that will schedule the shows to record, plus provide a method to view them.
  • Media server system - The actual hardware and OS that will store the files and run the DVR software.
  • Media "extender" - Since I will want to put the recorded video on other TVs...


The "Third Rail" of topics, bit-torrents for shows.

When I first started discussing cutting the cord with my brother-in-law a number of years ago, I lamented the lack of programming available on the local OTA stations.  (At the time, NetFlix and Amazon OnDemand hadn't been created yet.)  He had just created a bit-torrent downloader, and was having good luck with it, even setting up an automatic DropBox site to let him start downloads at home just by adding a torrent link via his smartphone.  That was working well, and the automation to download the files and get the latest episodes worked great...until the cable company sent him a letter telling him to stop.  I'm undecided how I will proceed on this front.  I will probably save the bit-torrent setup for a later experiment since most of what we watch is available OTA, but I'm sure there will be a time when the cable TV and networks lock us out and I'm tempted to look at less than socially accepted means.

Jan 2014 - An Update

For anyone coming across this post 18 months after it originally went up, I wanted to say that my wife and I dropped cable TV a few months ago (October or November 2013 if I recall correctly).  

I won't lie, there are times I wish we still had it so we could see the "hot show" that co-workers are talking about, or the DVR with multiple tuners so we could record multiple conflicting shows at once.  But then, we're spending more time reading and finding other things to do and that's the bigger advantage.

In a future blog post I'll give you my impressions of the DVR solution I'm using, the Simple.TV 1.

A SOPA-opera...

It's late in the afternoon.  Mike had just turned off the television finding nothing of interest to watch, when there was a loud knock at his front door.

Mike opened the door and saw three men dressed in black commando fatigues on his doorstep.  "May I help you?" he asked.

"Mr. Godwin?" the first agent asks.

"Yes, I'm Mike Godwin[i]...who are you?"

"We're with the SOPA enforcement department of the government."

Mike stares back blankly, "The what..?"

"SOPA - Suppress Offending Pizza Assembly.  We have a report of a few local pizza restaurants that are using patented sauce application methods, so we're here to update your phone book as mandated by the law.  Please step aside."  And with that, the three agents push past Mr. Godwin and move toward the phone book on a table across the room.

Mike had been through this a couple of times in the past under different department names.  The first few visits replaced the pages of the phone book for "escort" services and some shady pharmacies selling counterfeit drugs.  Mike knew he didn't want his kids having access to those sites so he and his neighbors readily accepted the changes.  But lately the changes had been for other things that didn't seem all that necessary...

"So, what is going on with these pizza places?  Are they fronts for drug dealers or human trafficking?" Mike asked.  "I have a couple tips on some places I drive by on my way to work that we're certain are up to no good - I can give you their addresses if you want..."

"Hmm, no," the second agent responded, cutting off Mike.  "We need a reputable source[ii] to take this sort of action, not hear-say."  He resumed thumbing through the phone book, tearing out pages and replacing them with pre-printed pages they brought along.

"So, what were these pizza restaurants doing?"

Agent one turned toward Mike, obviously agitated by his questions.  "They were found to have been infringing on highly guarded trade secrets."

"What?  You mean they stole the recipe for the dough or sauce...?"

"No, much more insidious.  We have reports from highly trusted individuals that they were infringing on the sauce application procedure as documented in the patent held by Pizza Shack and cross-licensed to Dice Pizza[iii]."

Mike remembered that case, it made news a couple years ago.  Pizza Shack had settled a case out of court against another national pizza chain, Dice Pizza.  The argument by Dice Pizza was that the crust/sauce/cheese was the logical order to begin making a pizza, but Pizza Shack had received a patent for just that process a few years earlier and had successfully put a number of smaller pizza restaurants out of business.  The end result of this settlement was that Dice Pizza could put the sauce on between the dough and cheese, and they would allow Pizza Shack to use their patented "automobile pizza delivery" method Dice Pizza had patented around the same time.

Mike looked over their shoulders and saw one page being replaced.  "Wow, I had no idea 'Paul's Pizza' and 'Kevin's Restaurant' were doing this.  Were any of you involved in the surveillance or a sting operation?"

"No, we're too busy changing out the phone book pages to do that work.  Thankfully Pizza Shack provided us with a list of infringing sites, so that makes our job a lot easier."

"Wait, what?  You're just taking their word without checking into it yourself?"

"Oh, it's all legal - SOPA and the DMCA laws allow for this, and since there's no way our department could check all of these reports, we're grateful for their assistance."

After a few minutes agent one reported, "Our work is done for now.  Please remember it's up to each person to be vigilant in the war on terror...I mean intellectual property piracy.  Please inform us of any neighbors who might be using outdated or other phone books so we can keep them updated, too."  And with that, the agents closed the phone book and walked out the door.

Mike picked up the phone book and turned to the section marked "Pizza".  The page that used to contain ads for the offending restaurants had been replaced with a new page showing only "Pizza Shack" and "Dice Pizza".

"What about 'Mr. Levi's Pizzeria'?  That's one of our favorite places?!?"  Mike knew they would be in the clear with SOPA because their big claim was that they didn't use pizza sauce[iv], so they couldn't possibly break this law.

Mike's young daughter entered carrying a small phone book, but this one was from a foreign city.  "Here dad, I got this from a friend in school.  She was able to snag a copy when she was visiting her grandmother in Romania - they don't have SOPA or the DMCA there, so this is a complete uncensored phone book."

She opened the page and showed him hundreds of listings for pizza places all over the world.  After a few seconds she found the listing for Levi's and Mike started dialing the phone.

Before he could finish dialing, the line went dead and the door burst open.  Agent one was again in the room.

"Mr. Godwin! We just updated your phone book, but we caught you using an illegal foreign one.  Can you explain yourself?"

"Your update removed the entry for our favorite restaurant.  There must be some mistake!"

"Highly doubtful considering the source, but if you insist, you can file a formal complaint and work it out through the legal system."

"A lawsuit?!?  That will take weeks if not years to complete - what do I do if I want pizza until then?"

"That's why we left Pizza Shack and Dice Pizza - they aren't infringing on anything so they are perfectly legal."

Mike was speechless, the agent continued.  "And there's the small matter of the circumvention clause of the law."

Mike saw the agent's gaze move to his daughter, and at the same time the two other agents grabbed her arms and restrained her.

"We've been keeping an eye on you and your friends at school.  Our counterparts in Romania have been watching your friend's grandmother - she's claiming her innocence, but thankfully the ACTA treaty has allowed them to use our evidence in her trial.  I suppose you're one of those Free Software protesters as well, eh?"

And with that, the two agents pulled the young girl out the door to their awaiting van.

"I'm sorry for the interruption Mr. Godwin.  She'll be processed at the federal courthouse since this is a federal offense, but you can visit her some time tomorrow after she's been processed in.  I'd suggest hiring a lawyer.  The approved lawyers are still listed in the phone book."

[i] Homage paid to Mike Godwin - formulator of "Godwin's Law".

[ii] See the Universal Music Group's use of the existing DMCA law and how it may have been abused.  Also see how Warner Brothers misused the existing DMCA law.  Or just use your favorite search engine to search for "DMCA abuses".

[iii] I'm envisioning the "Dice Pizza" to mirror the "Domino's Pizza" logo, but using two dice with the values of "3" and "4" - the year in the 20th century when Hitler became dictator of Nazi Germany.  Again, a nod to Godwin's Law.

[iv] Too much MSG in the sauces of other pizzas...

Merry Christmas - my "Three Gift" rule

Merry Christmas!

Sorry for the delay this week - I'm on vacation for the Christmas Holidays, and I totally lost track of the time!  In the busy season, it's easy to overlook the true meaning of the season.  I won't get all sappy over the meaning - it's been covered by others much more eloquent than I.  Rather, I'd like to note how we try to keep the meaning with our two girls.

With the world of inexpensive plastic toys, free delivery from Amazon, and the numerous advertisements for thousands of nearly disposable toys - have you ever priced the cost of the special batteries some toys use?!? - it's all too easy for parents to get on the slippery slope of buying their children lots of presents and finding that they are expecting to get the same number as their siblings.  (And of course the price of the toy makes no difference - the 1 year old will love the box more than the $30 toy that came in it.)

To help remind our daughters of the reason, we've implemented a gift rule based on what "Baby Jesus" received on his birthday.  No, we're not forcing them to exchange gold, frankincense, and myrrh (though Kris would not turn down anything made of gold), rather we've implemented the "three gift" rule.  Each child receives three gifts: one from Santa, one from Mom and Dad, and one from a sibling.  There's a significance for each:


  • One gift from a sibling.  This works out easily in our family since we only have two children - it's a simple exchange in their mind.  In larger families, it will limit the expense that the family will experience plus keep each child guessing which sibling got their name and what they were given.  This allows each child some time to reflect on what they want to give to their sibling - you might be surprised what ideas some will come up with (and it won't leave us parents struggling to decide).
  • The gift from Mom and Dad symbolizes the love of parent and child.  This relationship is special and significant enough that it's not a gift lumped in with the other gifts they might receive.  My wife and I also use this gift as the "big gift" for the year to that child.  For instance, one year we decided to get one daughter a video game she wanted.  Knowing that the inevitable "I got this gift from ____" might be uttered, we decided that we could bypass that argument completely if we gave it to her ourselves.
  • The final gift from Santa will change over time.  Right now, our youngest still believes in him but our oldest has figured it out so for now we're keeping the big guy in our list.  Once our youngest finds out, we'll have a family discussion about this gift.  Since "Santa" easily represents a caring individual to the youngest people, this present will transform into a gift to symbolize the season.

To keep this a tech-oriented blog, here's how I used some tech this holiday season.


  1. I entered all of my family's Christmas lists into Evernote so I could easily update them wherever I was (on my laptop at work, on my iPad in the living room, from my phone in a store, etc).
  2. I set a reminder on my calendar (synced to my phone and GMail account) that will remind me to purchase a "stocking stuffer" gift for my wife next year (calendar reminder for Dec 14, 2012).  (Yeah, I forgot and had to scramble for something Christmas Eve...)
  3. I used the Amazon gift lists to help make sure I pointed other family members to the correct item, and I bought from the lists my sisters stored on-line.  No more guessing!
  4. I set up the "X10" receiver to control the Christmas tree lights so they weren't left on (or off) for a long period of time.

Many of the gifts I received this year are also tech related:


I'll wrap up for now - hope your Christmas was great!


Folders for managing (email) floods.

For years I've had to keep track of communication with many customers at numerous sites covering a wide variety of questions and conversations.  Thankfully most have well defined start and end points to the conversations, so it is possible to use folders to segregate down to this level.  The ability to do this is key to my ability to juggle numerous conversations and not drop details.  For my work e-mails, I use the corporate standard "Microsoft Outlook", but almost all e-mail clients (GMail, Yahoo, etc) support some level of mail automation.

For me, the key to this filing system is reflected in what our mothers always told us: "Put your things away."  Taking this to the on-line world, I try to keep my in-box free of clutter, holding only one-offs or unique conversations that aren't expected to last long.  Where possible, I suggest automating the filing.  Most if not all modern e-mail clients allow for fairly complex rules to handle incoming e-mail.  Even the most basic clients can sort on the From: address to file every e-mail from a specific person into a common folder.  Beyond that, automation might take additional skill but should not be insurmountable.

For my needs (I'm a consultant and support engineer for my company), I have broken down my job functions by team, customer, and trouble ticket.  The team folders are general categories (such as "consulting", "support", etc).  At this level I've also created a few unique folders for "Inactive Customers" and for e-mails that don't fall into the categories but still need some long-term attention.  Since Outlook sorts folders by name by default, I prepend an "@" to the name of folders that I want to appear at the top of the list.

Within those team groups, there are sub-folders based on customer names.  Eg: Appleton, Bakers, CostCo, Dales, Google, etc.  (It is at this level that I would suggest setting up filters based on the From: address if you do any automation.)  For the rare case that two customers might have the same general name (i.e. two separate divisions within the same parent organization), you might want to create a "Company-SubGroup" name (eg: "Google-SearchTeam" and a "Google-AdSenseTeam")

Before continuing, I'll take a quick departure into sorting and dates.  The standard US method of writing the date, Month/Day/Year, doesn't sort easily on computers.  Sure, they can be set up to sort this correctly, but the chronology of the folders is broken in the simple views the email client provides.  Over the years I've got into the habit of writing the date as Year/Month/Day, and having battled programming glitches introduced during the "Y2K" event, I've also got into the habit of using a full four-digit year in my folder names.  Using a dash ("-") for the field separator, I write the date for "June 14, 2011" as "2011-06-14".  These date strings automatically sort correctly and don't require any additional work on the part of ourselves or the email system.
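To see the difference, here is a small shell sketch that sorts the same dates both ways as plain text, which is effectively what a folder list does.  The helper names (us_to_iso, us_dates) and the sample dates are made up for illustration.

```shell
#!/bin/bash
# Convert a US-style M/D/YYYY date into a sortable YYYY-MM-DD string.
us_to_iso() {
	month=${1%%/*}           # text before the first "/"
	rest=${1#*/}             # everything after the first "/"
	day=${rest%%/*}
	year=${rest#*/}
	# ${month#0} strips a leading zero so "08" is not mistaken for octal.
	printf '%04d-%02d-%02d\n' "$year" "${month#0}" "${day#0}"
}

# Made-up sample dates, deliberately out of order.
us_dates() {
	printf '%s\n' '12/25/2010' '06/14/2011' '01/03/2011'
}

echo "Month/Day/Year, sorted as plain text:"
us_dates | LC_ALL=C sort

echo "Year-Month-Day, sorted as plain text:"
us_dates | while read -r d; do us_to_iso "$d"; done | LC_ALL=C sort
```

In the first listing the December 2010 date lands last even though it is the oldest; in the second the order is chronological.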

Most people might be able to stop at this level.  For my job I may have numerous open support tickets with each customer, so keeping these different trains of thought separated really helps track the current state of each problem.  For my needs, each time a new ticket comes in I create a folder named with the date of the initial contact, the support ticket number and a brief description of the problem.  This allows me to quickly scan for the problem should a co-worker need a quick update on my current workload, or quickly dive into the project without having to request and search for the un-friendly "ticket number".  So, a problem that came in on "June 14, 2011" and a subsequent ticket of "ID1234" was created to resolve "Problem updating server address" would look like this: "2011-06-14:ID1234 - Problem updating server address".

When I finally resolve a problem tracked in the folder, I then move the entire folder to a "Closed" folder inside the customer name folder.  Depending on the amount of information tracked about each customer, I'll create folders within here, too.  For example, for most customers I create a "year" folder (i.e. 2010, 2011) and move the ticket folder directly there.  For some customers I've found it necessary to break this folder into additional levels so I create folders based on the quarter (i.e. Jan-March = First Quarter, April-June = Second Quarter, etc).  So, using the example above, I'd move this into the "Closed/2011/2011Q2" folder.
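The naming scheme is mechanical enough to sketch in a few lines of shell.  This is only an illustration: the function names are made up, and it assumes the quarter is taken from the date in the folder name.

```shell
#!/bin/bash
# Build a ticket folder name: "<date>:<ticket> - <description>".
ticket_folder() {
	printf '%s:%s - %s\n' "$1" "$2" "$3"
}

# Work out where a closed ticket goes, e.g. "Closed/2011/2011Q2",
# from the YYYY-MM-DD date at the front of the folder name.
closed_folder() {
	year=${1%%-*}            # text before the first dash
	month=${1#*-}            # drop the year...
	month=${month%%-*}       # ...and the day
	month=${month#0}         # strip a leading zero so "08" is not read as octal
	quarter=$(( (month + 2) / 3 ))
	printf 'Closed/%s/%sQ%s\n' "$year" "$year" "$quarter"
}

ticket_folder 2011-06-14 ID1234 "Problem updating server address"
closed_folder 2011-06-14
```

The integer division maps months 1-3 to quarter 1, 4-6 to quarter 2, and so on.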

This organizational method has allowed me to keep abreast of a huge volume of tasks and conversations.  No, it's not perfect.  There are times when the flow of incoming messages increases and the time required to sort them into their folders can be substantial, but continuing to do this allows me to quickly review the latest updates and resume when time permits (or the project has become the latest "high priority").

Some people have mentioned "Inbox Zero" as a possible solution.  I haven't read the details, but from what I understand it looks to be another method to keep the flow of electronic thoughts organized in our daily lives.

Podcasts from the command line.

I work from my home office, so I don't have to listen to what the guy in the cubicle next to me likes.  That's good and bad, but in my case it's a moot point - my office in the basement can barely pick up any local radio stations.  Just a few short years ago I would have had to resort to a collection of CDs or tapes (or running a long set of speaker wires from the living room radio down to the office).  Thankfully, the technology came about and rescued me from the boredom of the same CDs on endless repeat - enter the Podcast.

From the Wikipedia entry, the term came about in early 2004.  I must have been right on the cusp, because it wasn't too much after that time I was finishing our basement and ran into the entertainment problem.  Somehow I came across some tech related podcasts (DailySourcecode, TWiT), so I downloaded a few and played them through my laptop.  That all worked well but it meant each time I finished one, I had to take the laptop back up to the network connection (WiFi router died and hadn't been replaced) and download the next one.  A podcast is nothing more than an MP3 file, so copying the files to the laptop is quick but still another step that I had to do manually to make sure I didn't re-download a show I had already listened to.  After a couple evenings of this I started searching for a way to download them in the background when I was at work so I could have hours of un-interrupted geek-talk while working in the basement.

A quick bit of Googling led me to BashPodder.  Since I was running Linux on my home system, this was a great fit.  (Though the BashPodder website says that it runs on many other OS's including MacOSX, Windows, etc.)  There are only three real files you need to make it all work:

1: The script - this is the main program that retrieves the requested podcast files.

2: The parse_enclosure.xsl file - this is used by the script to extract the podcast file names and download URLs.

3: The bp.conf file - This is a simple text file containing a list of URLs pointing to some website feeds for their podcasts.

Download these files from the BashPodder website, or you're welcome to use my tweaked version here.

Finally, to listen to them from the command line I wrote a script I cleverly call "Play And Delete" or "pad" for short.

Here's the script I am currently using:

#!/bin/bash
# By Linc 10/1/2004
# Find the latest script at
# Revision 1.21 12/04/2008 - Many Contributers!
# If you use this and have made improvements or have comments
# drop me an email at linc dot fessenden at gmail dot com
# and post your changes to the forum at
# I'd appreciate it!

# wget verbosity flag used further down (assumed value - it was missing from this listing).
QUIET="-q"

if [ -e /var/tmp/bashpodder.FAIL ] ; then
	echo Will not run - /var/tmp/bashpodder.FAIL exists.
	exit 1
fi
# Make script crontab friendly:
cd "$(dirname "$0")"

# datadir is the directory you want podcasts saved to:
datadir=$(date +%Y-%m-%d)

# create datadir if necessary:
mkdir -p "$datadir"

# Delete any temp file:
rm -f temp.log

# Read the bp.conf file and wget any url not already in the podcast.log file:
date >> ordered.log
while read podcast
	do
	file=$(xsltproc parse_enclosure.xsl $podcast 2> /dev/null | sed 's# #%20#g' || wget -q $podcast -O - | tr '\r' '\n' | tr \' \" | sed -n 's/.*url="\([^"]*\)".*/\1/p')
	for url in $file
		do
		echo $url >> temp.log
		if ! grep "$url" podcast.log > /dev/null ; then
			name=$(echo "$url" | awk -F'/' {'print $NF'} | awk -F'=' {'print $NF'} | awk -F'?' {'print $1'})
			# Fixes for different URLs that parse to incorrect file names.
			# Buzz Out Loud has the name first but it's a redirect URL...
			if echo "$url" | grep -q 'dl_dlnow$' ; then
				name=$(echo $url | awk -F? '{ print $1 }' | awk -F'/' '{ print $NF }')
				#echo FIXING: $url
				#echo NEWNAME: $name
			fi

			wget -t 10 -U BashPodder -c $QUIET -O "$datadir/$name" "$url"

			touch "$datadir/$name"
			echo "$url" >> ordered.log
		fi
		done
	done < bp.conf
# Move dynamically created log file to permanent log file:
EC=0
cat podcast.log >> temp.log || EC=1
cp podcast.log podcast.log.previous || EC=1
sort temp.log | uniq > podcast.log || EC=1
rm temp.log || EC=1
if [ $EC -gt 0 ] ; then
	echo FAILED to update podcast.log file. > /var/tmp/bashpodder.FAIL
	touch /var/tmp/bashpodder.FAIL
	exit 9
fi
# Create an m3u playlist:
ls "$datadir" | grep -v m3u > "$datadir/podcast.m3u"

# Misc cleanup
mv */*JPG /home/dan/Pictures/Backgrounds/ 2>/dev/null

Most of the changes I have made were to fix problems on my system.  One update I made was to better handle a filled-up hard drive - this really got the BashPodder script all confused as to what to download.  The script writes a "podcast.log" file that it uses each time it runs to determine if it needs to download a podcast or not.  If the podcast URL doesn't exist in the podcast.log file, it downloads it and adds that URL to the file.  That works great until the drive fills up and it is unable to update this file.  In my case, the log file got erased so when I did free up space, BashPodder had to start over and tried to re-download everything.  (Some day I'll document how I fixed that, but not today.)
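Stripped of the download itself, the podcast.log bookkeeping boils down to "skip it if the URL is already in the log, otherwise fetch it and record it".  The sketch below shows just that idea; the function names and URLs are made up, and unlike BashPodder (which rebuilds the log at the end of a run) it appends to the log right away and matches whole lines exactly.

```shell
#!/bin/bash
# Succeed if the URL ($1) is already recorded in the log file ($2).
already_fetched() {
	# -F: match the URL literally, -x: whole line only, -q: no output
	[ -f "$2" ] && grep -Fxq -- "$1" "$2"
}

# Record a URL ($1) in the log file ($2) so the next run skips it.
remember() {
	printf '%s\n' "$1" >> "$2"
}

# Demo against a throwaway log; the URLs are made-up examples.
log=$(mktemp)
for url in http://example.com/a.mp3 http://example.com/b.mp3 http://example.com/a.mp3; do
	if already_fetched "$url" "$log"; then
		echo "skip  $url"
	else
		echo "fetch $url"      # the real script calls wget here
		remember "$url" "$log"
	fi
done
rm -f "$log"
```

The third URL is a repeat of the first, so it is skipped.  If the log file disappears, every URL looks new again, which is exactly the re-download problem described above.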

My changes start at line 13. If the 'magic' bashpodder.FAIL file exists, it means there was a problem in a previous run and the system needs human intervention.

Line 30 adds a simple date to my log file named "ordered.log".  I wanted to keep track of when a file was downloaded, so this helped me track that for later review.

Lines 38 through 50 are a mixture of original and new code.

  • Line 38 tries to pull out the file name that will be used later.  Some podcast URLs confuse the parsing done by the parse_enclosure.xsl template, so this helps lines 41 through 45 fix the name if necessary.
  • Line 47 was modified slightly to use the new name if necessary
  • Line 49 makes sure the date of the file matches the current system time.  The 'pad' script sorts the files by their timestamp so this keeps them accurate.
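The three awk calls that pull the file name out of the URL can also be written with plain shell parameter expansion.  This is just a sketch of the same steps (the function name and URLs are made up), and the last example shows the kind of URL that confuses the parsing:

```shell
#!/bin/bash
# Mirror the awk pipeline: last "/" field, then last "=" field,
# then everything before the first "?".
url_to_name() {
	name=${1##*/}        # keep what follows the last "/"
	name=${name##*=}     # ...then what follows the last "=" (if any)
	name=${name%%\?*}    # ...then drop any "?query" suffix
	printf '%s\n' "$name"
}

url_to_name "http://example.com/shows/ep42.mp3"
url_to_name "http://example.com/download?file=ep42.mp3"
url_to_name "http://example.com/shows/ep42.mp3?source=feed"
```

The first two calls give "ep42.mp3", but the third gives "feed" because the "=" step runs before the "?" step.  URLs like that are why a per-feed fix-up is sometimes needed.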

Lines 56 through 64 have a lot of additional error checking done on them.  If any one fails, the script creates the bashpodder.FAIL file mentioned earlier, then exits to let a human fix what's wrong.
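A related trick for log files that must survive a full disk is to build the new version in a scratch file and only rename it over the real log once the write has succeeded.  This is a sketch of that general technique, not the code in the script above; update_log and the scratch file name are made up for illustration.

```shell
#!/bin/bash
# Merge the URLs in NEWFILE into LOGFILE without ever truncating LOGFILE.
# Usage: update_log LOGFILE NEWFILE
update_log() {
	log=$1
	new=$2
	tmp="$log.tmp.$$"
	touch "$log" || return 1
	# Build the merged, de-duplicated list in a scratch file first.
	if sort -u "$log" "$new" > "$tmp"; then
		mv "$tmp" "$log"       # replace the old log only on success
	else
		rm -f "$tmp"           # leave the old log exactly as it was
		return 1
	fi
}

# Demo in a scratch directory with made-up URLs.
dir=$(mktemp -d)
printf '%s\n' http://example.com/a.mp3 > "$dir/podcast.log"
printf '%s\n' http://example.com/b.mp3 http://example.com/a.mp3 > "$dir/temp.log"
if update_log "$dir/podcast.log" "$dir/temp.log"; then
	cat "$dir/podcast.log"
else
	echo "update failed; the old podcast.log is untouched" >&2
fi
rm -rf "$dir"
```

If the sort cannot write its output, the function returns non-zero and the caller can drop a flag file such as bashpodder.FAIL, while the previous log stays intact.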

Line 69 is a hack, but it works for me.  Some URLs I have BashPodder monitor have backgrounds uploaded to them.  I have these files moved to my Backgrounds folder rather than manually moving them myself.  (I'm lazy, so sue me!)

The parse_enclosure.xsl file I use is un-changed from the official BashPodder version.

My listening is also done at the command line, using VLC to play the video or audio file.  After listening to a night's worth of 20-30 minute podcasts, I could have a number of files and directories to clean up.  I wrote my "Play And Delete" script to take care of that for me.

#!/bin/bash

# VLC Options:
OPTIONS=""	# the original flags were missing from this listing; add your own VLC flags here

if [ -z "$(which vlc)" ] ; then
	echo Could not find vlc: `which vlc`
	exit 1;
fi

FILE="$1"
EXT=`echo "$FILE" | rev | awk -F\. '{ print $1 }' | rev`
echo Playing: $FILE \($EXT\)

# Set the size of the new VLC we open.
# Note: if file ends in .mp4, use a different size.
# SIZING is "gravity,x,y,width,height" for wmctrl -e (placeholder values).
SIZING="0,0,600,400,300"
if [ "$EXT" = "mp3" ] ; then
  echo Resizing screen for $EXT extension.
  (sleep 2.0 ; wmctrl -i -r `wmctrl -l | grep VLC | awk '{ print $1 }'` -e $SIZING ) &
else
  echo "Not resizing an $EXT file."
fi

echo RUNNING: vlc $OPTIONS $FILE vlc://quit
vlc $OPTIONS "$FILE" vlc://quit 2> vlc.err
EC=$?
echo Exit code: $EC
if [ $EC -le 0 ] ; then 
    echo Deleting $FILE
    sleep 2
    rm "$FILE"
fi
rm -f "`dirname "$FILE"`"/*.m3u
rm -f "`dirname "$FILE"`"/.directory
rmdir --ignore-fail-on-non-empty "`dirname "$FILE"`"/../* 2>/dev/null

PAD basically takes a path/filename and tries to play the file with VLC.

Line 6 tries to confirm you have VLC installed and available in the path, otherwise it exits.

Line 12 gets the extension of the file (mp3, mp4, avi, etc) so lines 19-21 can move and re-size the vlc GUI to the lower-left corner of my screen.  I don't resize video files, so if it isn't an MP3 I don't do anything.

Line 27 calls the VLC command to play the file.

Lines 28 through 34 monitor the exit code for VLC, and if it exited normally (i.e. got to the end of the podcast), then the script deletes the podcast from the disk.  This autocleanup is great, especially for some of the larger video podcasts that can be 200+ MB in size.
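The "only delete after a clean exit" idea works with any player, so here it is as a small stand-alone sketch.  The function name is made up, and the demo uses the standard "true" and "false" commands as stand-in players so it can run without VLC.

```shell
#!/bin/bash
# Play a file with the given player command and delete the file only if
# the player exits cleanly.  Usage: play_and_delete FILE PLAYER [ARGS...]
play_and_delete() {
	file=$1
	shift
	if "$@" "$file"; then
		rm -- "$file"
	else
		echo "Keeping $file (player exited with an error)" >&2
		return 1
	fi
}

# Demo with stand-in "players": false always fails, true always succeeds.
f=$(mktemp)
play_and_delete "$f" false || echo "file was kept"
play_and_delete "$f" true && echo "file was deleted"
```

With a real player the call would look something like "play_and_delete episode.mp3 vlc --play-and-exit".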

Lines 35 through 37 try to do some additional cleanup.  Since I don't use an MP3 player, I don't need the M3U files, and I also try to remove all of the empty directories.  (BashPodder saves the files into directories named for the year/month/day the download was performed.)

My bp.conf has a lot of additional entries.  I won't clutter up this page with it, but if you're interested in what I'm pulling down you're welcome to contact me for a copy.  (I'll give you a hint - I'm a big fan - Hi Leo, Tom, Iyaz, Sarah, and Steve!)

A big thanks to Linc and his work on the initial BashPodder script.  Once I had that framework I was able to add and tweak it to fit my needs - I hope it helps others too.

Speeding up the daily grind...

I telecommute for my current job. Yes, it is nice to be home when the kids get out of school, but it also means that I'm only a minute away from being back at work. And since my current position is on-call support for our customers, there are evenings that I make multiple trips "to the office" to assist customers. Making the most of my time matters, so speeding up the time needed to get the job done was important. One of the biggest pains was restarting the VPN and associated tools after logging off for the day.

For most organizations, telecommuters are a small fraction of the user base and thus a small fraction of the time the IT department has to devote to supporting their unique needs. In the course of the day, there are a number of resources I need to access (support ticketing system, e-mail, file shares on servers, the Internet, etc). The documented method that the IT team provides is aimed at the average user who needs to get in from the hotel or on the weekend, but not for those of us who rely on using it 8+ hours a day. As necessity is the mother of invention, I came up with this script to automate my regular login steps on my Windows 7 laptop.

Script named "SetMappings.bat":

@echo off
rem Set %USERNAME% to your login ID on the %DOMAIN%, and replace every {Placeholder} with your own value.
set VPNNAME="MyCompanyVPN"
set USERNAME={UserName}
set PASSWORD={Password}
set DOMAIN={CompanyDomain}
rem NTPSVR and PERSISTENT are used further down; the values here are placeholders.
set NTPSVR={NtpServer}
set PERSISTENT=/persistent:no

rasdial %VPNNAME% %USERNAME% %PASSWORD% /domain:%DOMAIN%
echo VPN connection returned: %ERRORLEVEL%

rem Errorlevel 800 == failed to connect.
if "%ERRORLEVEL%"=="800" (
	echo Exiting on error %ERRORLEVEL%...
	goto EOF
)

echo Pause for 3 seconds until the VPN settles down..
ping -n 3 -w 1000 {UnusedIP} 2>NUL: >NUL:

echo Setting time
net stop W32Time
net start W32Time
w32tm /config /syncfromflags:manual /manualpeerlist:%NTPSVR%
w32tm /config /update
w32tm /register
ping -n 3 -w 1000 {UnusedIP} 2>NUL: >NUL:
echo Using NTP server:
w32tm /query /source /verbose

echo "Mapping to a domain controller"
net use \\{DomainController}\ipc$ /user:%DOMAIN%\%USERNAME% %PASSWORD% %PERSISTENT%

echo "Mapping to ex2k3"
net use \\{ExchangeServer}\ipc$ /user:%DOMAIN%\%USERNAME% %PASSWORD% %PERSISTENT%

rem Shared data.
echo Mapping M...
net use M: /delete
net use M: \\{FileServer}\dfs\data /user:%DOMAIN%\%username% %PERSISTENT% %PASSWORD%

rem Home directories.
echo Mapping N...
net use N: /delete
net use N: \\{FileServer}\dfs\home /user:%DOMAIN%\%username% %PERSISTENT% %PASSWORD%

rem Applications.
echo Mapping O...
net use O: /delete
net use O: \\{FileServer}\dfs\apps /user:%DOMAIN%\%username% %PERSISTENT% %PASSWORD%

rem Tools.
echo Mapping P...
net use P: /delete
net use P: \\{FileServer}\dfs\tools /user:%DOMAIN%\%username% %PERSISTENT% %PASSWORD%

start "Outlook" "c:\Program Files (x86)\Microsoft Office\Office12\OUTLOOK.EXE" /recycle

cd "C:\Program Files (x86)\Avaya\Avaya one-X Communicator\"
start "SoftPhone" "C:\Program Files (x86)\Avaya\Avaya one-X Communicator\onexcui.exe"

cd "%HomePath%"

:EOF


Before you can run this on your system, you'll need to configure a couple of things:

  • Configure the VPN and name it "MyCompanyVPN"
    • For configuring the VPN under Windows 7, go to the "Network and Sharing Center" and click on "Set up a new connection or network".
    • Choose "Connect to a workplace", and choose "No, create a new connection".
    • Choose the "Use my Internet connection", then fill in the remainder of the windows with information provided by your IT department.
    • The "Destination Name" field needs to match the value in line 3.
  • Run the batch file as an administrator.
    • To run the batch file as an administrator, open a command prompt as an administrator user by right-clicking on the "Command Prompt" under the Accessories folder, then choosing "Run as administrator".
    • It will open up in the "C:\Windows\system32\" directory - place the SetMappings.bat file here.

Now to explain the script in detail:

Lines 3-9 : The script begins by defining the values used throughout the script so we only have to do it once. Some people might consider the password stored in the batch file a security risk. I agree, but the batch file is stored on my laptop which is never out of my possession, has a password to login, is firewalled, and is not sharing anything with the outside world. No, it's not impervious but the stored password is the method I chose. If you don't like it, you are free to modify the code to prompt for the password each time it is run.

Lines 11-18 : This tells the system to use "rasdial" to connect to the named VPN connection, supplying the name/password for the connection. If "rasdial" exits with error code 800, the script jumps to the end and exits.

Line 21 : Assuming the IP address does NOT exist, this ping command pauses for roughly three seconds: each echo request waits 1000 milliseconds ("-w 1000") for a reply that never arrives. Note that on Windows the number of requests is set with "-n"; the "-r" option records the route rather than retrying.

Lines 23-31 : This section makes sure my laptop clock is in sync with a good known source. Since the servers at work use an internal NTP server, I make sure my laptop is synchronized with that same server. This ensures that timestamps on e-mails and meeting invitations are consistent.

Lines 33-57 : My company uses many network drives to store data and other shared resources. There is a small compiled VisualBasic script that was provided, but it had issues whenever I ran it so I re-implemented the drive mappings using the "net use" command. Theoretically the mapping of the "IPC$" share (lines 34 and 37) should ensure my login credentials are shared throughout the servers in the domain. Additionally, I map the required drive letters manually each time to ensure they are available.

Line 59 : I live most of my day responding to e-mails that customers and co-workers send me. This line runs Microsoft Outlook for me; the "/recycle" option should ensure that only a single copy of Outlook is started (just in case I had to re-run the script and Outlook was already running).

Line 62 : Since I am remote, I also use a softphone so I can talk with customers and co-workers. This line starts it for me - it prefers to start after the system is completely up and the VPN is stable, and running this last helps ensure that.

Brother, can you scan a dime?

I really like my Brother printers.  I've had only two Brother printers in the past 6+ years, but that's a fraction of the number of HP DeskJet printers I've had in the 4-5 years previous.  For my purposes, they are rock solid and a great bang-for-the-buck especially when it comes to the consumables (i.e. ink or toner).

Last year I started working from home and needed a FAX of some sort.  The Brother MFC-295CN was only $60 on Amazon, and was everything I needed:

  • Printer
  • Scanner
  • FAX
  • Network attached

My last HP laserjet cost around $100, and was not network attached (the HP JetDirect box was over $100 separately).  The Brother is a great little device for an all-in-one.

And the best thing??  Brother is very Linux friendly - a big plus in my book.

Right after I got it a year ago (2010), I jumped through some hoops to get the Scan-to-PC feature to work but it was possible.  Recently, I re-installed my workstation with Ubuntu 11.10 and had to re-setup the configuration.

To my pleasant surprise, this was extremely easy to set up.  Here are the steps I took to install the Brother scanning software under Ubuntu 11.10:

  1. Download the drivers from the Brother website.
  2. Install the driver:
    • sudo dpkg -i /home/dan/Downloads/brscan3-0.2.11-4.amd64.deb
  3. Configure the driver:
    • brsaneconfig3  -a  name=BrotherScanner model=MFC-295CN ip=
    • (My printer is at
  4. Try a test scan:
    1. For this I used "Simple Scan": Applications -> Graphics -> Simple scan.
    2. I placed a document on the scanner, then clicked the "Scan" button.
    3. After ~20 seconds of thinking, the scanner started working and the document appeared on my screen.

I almost feel silly writing this down - it was the same series of steps I would have had to go through if this was an "easy Windows install".  It was very easy to configure.  Way to go Brother!



Fork()s for Thanksgiving

A long time ago I inherited some Perl code to take care of.  It's good code, it doesn't stay out late, cleans up after itself most of the time, and does things pretty well without a lot of fanfare.  Unfortunately, that was also a big drawback.

The code was originally written when a "big system" was one with 512MB RAM and two CPUs (note not "two cores" - we're talking two big CPU chips on the motherboard).  This code would process incoming data one at a time, storing the data back to a directory structure on the disk.  The incoming data was all contained in a single archive file, and the resulting data stored to disk was mostly self contained too.  As time went by, the code never had to speed up; thankfully, Moore's Law kept increasing the speed of the CPUs as the size and volume of incoming data increased.  That is, until a customer upgraded to a new quad-core CPU with gobs of RAM and lots of fast hard drives.  Everything about the new system was easily eight times the old system: four cores running at 1.2GHz, up from 600MHz, 16 GB RAM, up from 2GB, and a fast SAN drive for the data and not the old SCSI internal disks.  Yes, it was a big upgrade in all the key areas except one: the old code.

Don't get me wrong, the old code ran perfectly on the new system - but the rate it was processing the incoming data was barely twice the old system.  And with the projected growth of the data center, they were expecting to increase the incoming data by ten times over the next 18 months.  Something had to be done - Moore had got us this far, but it was time for a change.

As I described earlier, the data is really a best case scenario for parallel processing.  In only one function does the data from one incoming set ever need to interact with pre-existing data.  I set about learning as much as I could about the fork() subroutine in Perl.

For those who haven't dealt with it, fork() allows your program to clone itself in memory and have two running copies.  It's really easy to create a "fork bomb" to bring down some systems if the programmer isn't careful.  Just remember that right after the fork() subroutine is called, there are two identical copies of your code running in memory.  It's up to the programmer to add code right after the fork() to ensure that each copy knows what its role is.

For anyone wanting some boilerplate fork() code to play with, here's a bit of code that I wrote to demonstrate this:

#!/usr/bin/perl
use strict;
use warnings;
my @array = qw(AA BB CC DD EE FF GG);
my $sleep = 10;
my %children;
for my $A (0..$#array) {
        my $pid = fork();
        if (!defined $pid) {
                die "couldn't fork: $!\n";
        } elsif ($pid == 0) {
                # child
                my $X = $sleep*rand();
                my $now = localtime();
                printf "$now Executing %s for %5.3f seconds.\n",$array[$A],$X;
                sleep $X;
                exit 0;         # the child must stop here, or it would carry on around the for-loop
        } else {
                # parent: remember which element this child is handling
                $children{$pid} = $array[$A];
        }
}

my $exited;
while (($exited = wait()) && ($exited > 0 )) {
        my $now = localtime();
        printf "$now EXITED: $exited(%s)", $children{$exited};
        delete $children{$exited};
        if (keys %children) {
                printf ", waiting for";
                foreach my $B (sort keys(%children)) {
                        printf ": %5i(%s) ",$B, $children{$B};
                }
        }
        printf "\n";
}

Line 8 is where the magic begins.  The call to fork() clones the program in memory making a parent and child.  The parent copy has the $pid set to the process ID of the child that was just forked, and the child has $pid set to zero.  In this example, the parent saves the child's PID in a hash for later reference and goes through the for-loop (line 7) until it's forked a child for each element in the array named @array.

Each child does its own thing - it prints that it is going to execute (sleep) for a few seconds, sleeps, then dies.

While the children are sleeping, the parent watches over them in the while loop at line 25.  It's kinda morbid, but the parent uses the wait() call to signal when a child exits (dies), then it prints what it knew about the child process, and also prints the list of children it's still waiting for.

When that code is executed, it will produce output something like this:

$ perl
Mon Apr 20 22:36:25 2009 Executing AA for 0.356 seconds.
Mon Apr 20 22:36:25 2009 Executing BB for 9.797 seconds.
Mon Apr 20 22:36:25 2009 Executing CC for 4.411 seconds.
Mon Apr 20 22:36:25 2009 Executing DD for 7.816 seconds.
Mon Apr 20 22:36:25 2009 Executing EE for 5.170 seconds.
Mon Apr 20 22:36:25 2009 Executing FF for 8.632 seconds.
Mon Apr 20 22:36:25 2009 Executing GG for 6.502 seconds.
Mon Apr 20 22:36:25 2009 EXITED: 12343(AA), waiting for: 12344(BB) : 12345(CC) : 12346(DD) : 12347(EE) : 12348(FF) : 12349(GG)
Mon Apr 20 22:36:29 2009 EXITED: 12345(CC), waiting for: 12344(BB) : 12346(DD) : 12347(EE) : 12348(FF) : 12349(GG)
Mon Apr 20 22:36:30 2009 EXITED: 12347(EE), waiting for: 12344(BB) : 12346(DD) : 12348(FF) : 12349(GG)
Mon Apr 20 22:36:31 2009 EXITED: 12349(GG), waiting for: 12344(BB) : 12346(DD) : 12348(FF)
Mon Apr 20 22:36:32 2009 EXITED: 12346(DD), waiting for: 12344(BB) : 12348(FF)
Mon Apr 20 22:36:33 2009 EXITED: 12348(FF), waiting for: 12344(BB)
Mon Apr 20 22:36:34 2009 EXITED: 12344(BB)

In my situation, the fork() loop (lines 7..22) was a bit more involved.  I added code to limit the number of child processes it would fork at a time - basically keeping track of the active number of children and using wait() when it reached the threshold but still had more to process.
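The same throttling idea can be sketched with shell background jobs.  This is a coarser variation than what I described: it waits for the whole batch to finish rather than for a single child, and the worker function is a made-up stand-in for the real processing.

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-item processing.
worker() {
	sleep 1
	echo "processed $1"
}

# Run "worker ITEM" for every item, never more than LIMIT at a time.
# Usage: run_limited LIMIT ITEM...
run_limited() {
	limit=$1
	shift
	running=0
	for item in "$@"; do
		worker "$item" &
		running=$((running + 1))
		if [ "$running" -ge "$limit" ]; then
			wait          # block until the current batch has exited
			running=0
		fi
	done
	wait                  # collect the final, possibly partial, batch
}

run_limited 3 AA BB CC DD EE FF GG
```

With a limit of 3 the seven items run in batches of three, three, and one.  Waiting for one child at a time, as the Perl version does, keeps the slots fuller when some children finish much sooner than others.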

The data also had some situations where new incoming data was added to existing data.  If two children are processing data that has to update the data on the disk, it's entirely possible for both children to read the data at the same time, add their own bit of data to the mix, and write the file back to disk.  The end result would be that one child would end up writing last and overwriting the other child's data.  That led to the use of the flock() command - which has its own quirks and deserves its own space.

So, when you're sitting down to your Thanksgiving dinner remember how useful the fork() is!

"...we now return you to your regular show, already in progress."

Excuse the interruption - yes, I know this is a blank blog so far.

I'm going to be posting a new tech related blog post here each week. Some weeks it will be about a handy website I found, other times it may be a programming snippet, other times it will be tech related but in non-technical fields.

I hope you'll subscribe to the RSS feed and keep coming back.

If you have a tech topic you'd like to have me look into, leave a comment. I'll geek out about most anything.