13 Apr 2008
Quick and Dirty Perl – #001

Recently a coworker asked for a quick and dirty way to extract a list of successfully delivered email addresses from a qmail log file. A message had been sent to several hundred recipients at example.com. Unfortunately, one of the domain’s two MX hosts would permanently reject messages, while the other would defer some messages because it was too busy. Most of the deferred messages were accepted after several retries, but others timed out of the queue.

I wanted to leave my coworker with a simple script that could easily be modified for similar processing in the future, so this script is not as compact as it could be. I built it up step by step, much as described here, explaining what each addition accomplished and what alternatives might suit future modifications.

Keep in mind that this is a one-off, quick and dirty solution, meant to be hacked on iteratively whenever it is needed again, so it is not as clean or robust as I would make a production script.

First, the typical loop-through-a-file skeleton:

    #!/usr/bin/perl

    while (<>) {

        # do stuff per-line here

    }

    __END__
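The empty diamond “<>” reads each file named on the command line in turn, or standard input if there are none. To try the skeleton without a real log, the same loop can read from an explicit filehandle opened on an in-memory string. A minimal sketch; the log lines are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real qmail log.
my $log = <<'END_LOG';
TIMESTAMP starting delivery 1: message 100 to remote alice@example.com
TIMESTAMP delivery 1: success: ok
END_LOG

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$log or die "cannot open in-memory log: $!";

my $lines = 0;
while (<$fh>) {
    # do stuff per-line here; $_ holds the current line, newline included
    $lines++;
}
close $fh;

print "$lines lines read\n";
```

Swapping “<$fh>” back to “<>” restores the command-line behavior.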

While there are a variety of entry types in the qmail log, the only entries of interest look like this:

    TIMESTAMP starting delivery DDDDD: message MMMMM to remote ADDRESS
    TIMESTAMP delivery DDDDD: STATUS: DETAILS

where STATUS is typically “success”, “deferral”, or “failure”.

Only the “starting delivery” entries contain the email address. The varying number of intermediate and final status entries for delivery DDDDD are interspersed with other entries in the log.
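Each of the two entry shapes can be recognized with its own pattern. A minimal sketch, assuming the fields are laid out exactly as in the templates above; the helper name “parse_line” and the sample lines are hypothetical, and real log lines may need looser patterns:

```perl
use strict;
use warnings;

# Classify one log line. Returns a hash ref for the two entry types of
# interest, or undef (via a bare return) for anything else.
sub parse_line {
    my ($line) = @_;
    if ($line =~ m/ starting delivery (\d+): message (\S+) to remote (\S+)$/) {
        return { type => 'start', id => $1, message => $2, address => $3 };
    }
    if ($line =~ m/ delivery (\d+): (\w+): /) {
        return { type => 'status', id => $1, status => $2 };
    }
    return;
}

my $s = parse_line("TIMESTAMP starting delivery 12345: message 678 to remote bob\@example.com\n");
print "$s->{type} $s->{id} $s->{address}\n";

my $t = parse_line("TIMESTAMP delivery 12345: deferral: too busy\n");
print "$t->{type} $t->{id} $t->{status}\n";
```

The starting-entry pattern is tried first; a status pattern needs a second colon after the ID, so the two cannot be confused.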

The regular expression “/ delivery \d+:/” selects precisely the entries we’re interested in, which I verified by printing them out (remember, this was an exploratory session):

    next unless m/ delivery \d+:/;
    print $_;
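Running the same filter over a handful of lines shows what it keeps and what it drops. A sketch with invented sample lines; the “new msg” and “status:” lines stand in for the other entry types in the log:

```perl
use strict;
use warnings;

# Hypothetical sample lines: two delivery entries and two unrelated ones.
my @log = (
    "TIMESTAMP new msg 678\n",
    "TIMESTAMP starting delivery 12345: message 678 to remote bob\@example.com\n",
    "TIMESTAMP status: local 0/10 remote 1/20\n",
    "TIMESTAMP delivery 12345: success: accepted\n",
);

# Keep only the lines the filter would let through.
my @kept = grep { m/ delivery \d+:/ } @log;
print @kept;
```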

At this point, I’m only interested in the delivery ID, so I add parentheses to capture the ID number, and save it in a variable for future use:

    next unless m/ delivery (\d+):/;
    $id = $1;

Because I want to collect all the log entries for each delivery, I’ll use a hash, keyed by the ID, to accumulate them:

    $msg{$id} .= $_;
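A new hash key starts out undefined, and the first “.=” treats it as an empty string, so no initialization step is needed; perlsyn lists “.=” among the operators exempt from “uninitialized value” warnings. A small sketch with two invented lines for one delivery:

```perl
use strict;
use warnings;

my %msg;
my @lines = (
    "TIMESTAMP starting delivery 7: message 1 to remote a\@example.com\n",
    "TIMESTAMP delivery 7: success: ok\n",
);

for (@lines) {
    next unless m/ delivery (\d+):/;
    my $id = $1;
    # The first append to $msg{7} starts from undef, which .= treats
    # as an empty string without warning.
    $msg{$id} .= $_;
}

print $msg{7};
```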

Here’s the complete processing loop for the log file:

    while (<>) {

        next unless m/ delivery (\d+):/;
        $id = $1;

        $msg{$id} .= $_;

    }
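Because the hash is keyed by delivery ID, it does not matter how entries for different deliveries are interleaved in the log. A sketch of the same loop wrapped in a hypothetical “group_deliveries” subroutine that reads from any filehandle, fed an invented log with two interleaved deliveries:

```perl
use strict;
use warnings;

# Group log lines by delivery ID. Works on any open filehandle, so it can
# be fed a real log file or an in-memory string as below.
sub group_deliveries {
    my ($fh) = @_;
    my %msg;
    while (my $line = <$fh>) {
        next unless $line =~ m/ delivery (\d+):/;
        $msg{$1} .= $line;
    }
    return \%msg;
}

# Hypothetical log with deliveries 1 and 2 interleaved.
my $log = <<'END_LOG';
T1 starting delivery 1: message 10 to remote a@example.com
T2 starting delivery 2: message 10 to remote b@example.com
T3 delivery 2: failure: no such user
T4 delivery 1: deferral: busy
T5 delivery 1: success: ok
END_LOG

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my $msg = group_deliveries($fh);
close $fh;

printf "%d deliveries\n", scalar keys %$msg;
```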

I now have a hash, keyed on delivery ID, with one entry for every distinct delivery in the log. It’s time for another loop. First, a sanity check that just displays the entries:

    foreach $k (keys %msg) {

        print "===== $k =====\n";
        print $msg{$k};

    }
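“keys” returns a hash’s keys in no particular order, so this display can come out in a different order from run to run. A variant, sketched with invented entries, sorts the IDs numerically first; “<=>” compares numbers, whereas a plain “sort” would compare the IDs as strings and put 9999 after 23456:

```perl
use strict;
use warnings;

# Hypothetical grouped entries, as the first loop would build them.
my %msg = (
    23456 => "TIMESTAMP delivery 23456: failure: rejected\n",
    12345 => "TIMESTAMP delivery 12345: success: ok\n",
    9999  => "TIMESTAMP delivery 9999: success: ok\n",
);

# Sort the delivery IDs numerically so the report is repeatable.
my @order = sort { $a <=> $b } keys %msg;
foreach my $k (@order) {
    print "===== $k =====\n";
    print $msg{$k};
}
```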

This gives output similar to the following:

    ===== 12345 =====
    TIMESTAMP starting delivery 12345: message MMMMM to remote ADDRESS
    TIMESTAMP delivery 12345: deferral: …
    TIMESTAMP delivery 12345: success: …
    ===== 23456 =====
    TIMESTAMP starting delivery 23456: message MMMMM to remote ADDRESS
    TIMESTAMP delivery 23456: failure: …

However, I’m only interested in messages to example.com, so let’s get selective:

    foreach $k (keys %msg) {
        # \@ stops Perl from interpolating an (empty) array named @example
        if ( $msg{$k} =~ m/\@example\.com/m ) {
            # do something
        }
    }

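Note the “m” modifier on the regular expression. It makes “^” and “$” match at the start and end of every line within the string, not just at the start and end of the whole string. Each hash value holds several log lines, in an unknown number and order, so I’ll use it on all future matches. It has no effect on a pattern like this one, which contains no “^” or “$”, but it matters when extracting the address.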
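A quick way to see what the “m” modifier actually changes is to match the same multi-line string with and without it. A minimal sketch, using a made-up two-line entry of the kind stored in each hash value:

```perl
use strict;
use warnings;

# Two log lines stored in one string.
my $entry = "T1 starting delivery 1: message 10 to remote a\@example.com\n"
          . "T2 delivery 1: success: ok\n";

# Without /m, $ matches only at the very end of the string (or before a
# final newline), so the address on the first line is not found.
my ($without) = $entry =~ m/to remote (\S+)$/;

# With /m, $ also matches just before each embedded newline.
my ($with) = $entry =~ m/to remote (\S+)$/m;

# A pattern with no anchors behaves the same either way.
my $plain = ($entry =~ m/success:/) ? 1 : 0;

print defined $without ? "without /m: $without\n" : "without /m: no match\n";
print "with /m: $with\n";
print "plain pattern: ", ($plain ? "match" : "no match"), "\n";
```

To let “.” match a newline, Perl uses the separate “s” modifier.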

Now that I have the right destination, let’s look for just the successful deliveries. I could include this test in the same if statement, but for the sake of clarity and future hacks, I’ll use a nested if:

    foreach $k (keys %msg) {
        if ( $msg{$k} =~ m/\@example\.com/m ) {
            if ( $msg{$k} =~ m/success:/m ) {
                # extract the address here
            }
        }
    }
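For comparison, the two nested tests can be collapsed into a single condition joined with “&&”. The sketch below uses “grep” over invented entries to pick out the matching IDs; the “@” is escaped as “\@” so Perl does not try to interpolate an array named “@example” into the pattern:

```perl
use strict;
use warnings;

# Hypothetical grouped entries: one hit, one failure, one other domain.
my %msg = (
    1 => "T1 starting delivery 1: message 10 to remote a\@example.com\nT2 delivery 1: success: ok\n",
    2 => "T1 starting delivery 2: message 10 to remote b\@example.com\nT2 delivery 2: failure: no such user\n",
    3 => "T1 starting delivery 3: message 10 to remote c\@example.org\nT2 delivery 3: success: ok\n",
);

# Both conditions in one test.
my @hits = grep { $msg{$_} =~ m/\@example\.com/ && $msg{$_} =~ m/success:/ } sort keys %msg;
print "@hits\n";
```

The nested form stays easier to extend one condition at a time, which is why the script keeps it.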

Now that I have a successful delivery, I need to extract the email address from the initial entry. I’m assuming that the starting entry exists; a more robust script would complain if it did not, and would be more careful in matching the address part.

    $msg{$k} =~ m/to remote (\S+)$/m;

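The “(\S+)$” captures the run of non-whitespace characters at the end of the “to remote” line. Because of the “m” modifier, “$” matches just before that line’s newline rather than only at the end of the whole string.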
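As noted above, a more robust script would complain when the starting entry is missing and would be more careful about the address. One way to do both, sketched with a hypothetical “recipient” helper: it warns and returns undef when no starting line is found, and it only accepts an address containing an “@”:

```perl
use strict;
use warnings;

# Return the recipient address from one grouped delivery entry, or undef
# (with a warning) when no usable "starting delivery" line is present.
sub recipient {
    my ($id, $entry) = @_;
    # .* cannot cross a newline (no /s), so the match stays on one line;
    # /m lets $ match at the end of that line.
    if ($entry =~ m/ starting delivery \d+: .* to remote (\S+\@\S+)$/m) {
        return $1;
    }
    warn "delivery $id: no starting entry with an address\n";
    return;
}

my $addr = recipient(1, "T1 starting delivery 1: message 10 to remote a\@example.com\n"
                      . "T2 delivery 1: success: ok\n");
print "$addr\n";
```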

Here is the complete script, which outputs the email addresses to standard output and a count to standard error:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %msg;

    while (<>) {

        next unless m/ delivery (\d+):/;
        my $id = $1;

        $msg{$id} .= $_;

    }

    my $cnt = 0;
    foreach my $k (keys %msg) {

        if ( $msg{$k} =~ m/\@example\.com/m ) {

            if ( $msg{$k} =~ m/success:/m ) {

                # silently skips entries with no "starting delivery" line
                if ( $msg{$k} =~ m/to remote (\S+)$/m ) {
                    print $1, "\n";
                    $cnt++;
                }

            }
        }
    }

    print STDERR "\n$cnt addresses\n";

    __END__
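Because the script is meant to be hacked, here is one possible variation, sketched under the same assumptions: it lists the example.com recipients whose deliveries never logged a success, which would also include any still being retried when the log ends. The subroutine name “undelivered” and the sample log are hypothetical, and it reads from a filehandle so it can be tried on in-memory data:

```perl
use strict;
use warnings;

# List example.com recipients whose delivery never logged "success:".
sub undelivered {
    my ($fh) = @_;
    my %msg;
    while (my $line = <$fh>) {
        next unless $line =~ m/ delivery (\d+):/;
        $msg{$1} .= $line;
    }
    my @addresses;
    foreach my $k (sort { $a <=> $b } keys %msg) {
        next unless $msg{$k} =~ m/\@example\.com/;
        next if $msg{$k} =~ m/success:/;
        # The condition runs first, so $1 is set before push sees it.
        push @addresses, $1 if $msg{$k} =~ m/to remote (\S+)$/m;
    }
    return @addresses;
}

# Hypothetical log: delivery 1 succeeds, delivery 2 fails.
my $log = <<'END_LOG';
T1 starting delivery 1: message 10 to remote a@example.com
T2 starting delivery 2: message 10 to remote b@example.com
T3 delivery 1: success: ok
T4 delivery 2: failure: mailbox unavailable
END_LOG

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my @failed = undelivered($fh);
close $fh;

print "$_\n" for @failed;
```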
Category: programming

Site last updated 2015-01-12 @ 13:31:07; This content last updated 2008-04-13 @ 07:25:03