Flogging your Files

Submitted by adchen on Thu, 2007/08/02 - 22:13

Why Grep When You can Flog?

As a UNIX sysadmin I spend a significant amount of time sifting through log files with greps, pipes, and more greps. A typical session looks something like this:


% cat syslog |  grep "Jul 27"

[...hundreds of lines...]
Jul 27 03:12:19 server2 sendmail[20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Jul 27 03:12:19 server2 sendmail[20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Jul 27 03:12:22 server2 sendmail[20574]: [ID 801593 mail.info] l6R7BYWq020574: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
[...still more lines...]

That's usually followed by progressively tacking on more greps to whittle the output down to what I'm looking for:


% cat syslog |  grep "Jul 27" | grep -v 2006 | grep sendmail | grep l6R7BUF0020573
Jul 27 03:12:19 server2 sendmail[20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA

If you use a shell with command history and command-line editing (like tcsh or bash), it's not that big a deal to hit up-arrow or Ctrl-P and keep tacking on extra greps. But it's still clunky, and limited to what you can feasibly cram onto the command line.

So recently, after 15 years of being lazy, I finally got tired of doing the grep-edit-grep cycle.

I decided to write an interactive "grepping" shell, which I call flog. You can think of it as short for "filter log", although I prefer to think of it as the verb, as in flogging the data out of your files. flog reads in the file(s) once and then keeps the data ready for you to grep repeatedly, without paying much of a penalty for re-reading (potentially) large files.
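The read-once, search-many idea can be sketched in a few lines of Python. This is only an illustration, not flog's actual code (flog ships as flog.pl); the names `load_lines` and `grep` are made up here, and the sketch assumes plain substring matching with line numbers counted across everything loaded.

```python
def load_lines(paths):
    """Read each file once and return all of its lines as one in-memory list."""
    lines = []
    for path in paths:
        with open(path, errors="replace") as handle:
            lines.extend(line.rstrip("\n") for line in handle)
    return lines


def grep(lines, needle):
    """Yield (line_number, text) for lines containing needle, ignoring case."""
    needle = needle.lower()
    for number, text in enumerate(lines, start=1):
        if needle in text.lower():
            yield number, text
```

Every later search is just another pass over the list, so the files are never touched again until you ask for them to be read.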

So now I can do this:


% flog syslog
(14,234 lines) flog> m "Jul 27" & sendmail
[...hundreds of lines...]
LINE 1234: Jul 27 03:12:19 server2 sendmail [20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
LINE 1235: Jul 27 03:12:19 server2 sendmail [20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
LINE 1236: Jul 27 03:12:22 server2 sendmail [20574]: [ID 801593 mail.info] l6R7BYWq020574: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
[...still more lines...]
(3 lines matched)

(14,234 lines) flog>

I threw in some old-school VT100 formatting for inverse/color highlighting, so that the strings you are searching for are clearly tagged. This is especially handy for data that tends to span many lines and/or if you're matching multiple strings.
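For the curious, inverse-video highlighting needs nothing more than two escape sequences. The sketch below is an assumption about how such tagging could be done, not a copy of flog's code; `highlight` is a hypothetical helper, and it uses the standard reverse-video (`ESC[7m`) and reset (`ESC[0m`) sequences.

```python
import re

INVERSE = "\x1b[7m"  # VT100/ANSI reverse-video attribute
RESET = "\x1b[0m"    # turn all attributes off again


def highlight(line, terms):
    """Wrap each case-insensitive occurrence of any search term in inverse video."""
    # Try longer terms first so that a short term cannot split a longer one.
    wanted = sorted((t for t in terms if t), key=len, reverse=True)
    if not wanted:
        return line
    pattern = re.compile("|".join(re.escape(t) for t in wanted), re.IGNORECASE)
    return pattern.sub(lambda m: INVERSE + m.group(0) + RESET, line)
```

Printing `highlight(line, ["Jul 27", "sendmail"])` on a VT100-compatible terminal shows both strings in reverse video.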

Want to repeat the previous search? Just double-tap with "mm":


(14,234 lines) flog> mm 

[...output not shown to save space...]
LINE 1234: Jul 27 03:12:19 server2 sendmail [20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
LINE 1235: Jul 27 03:12:19 server2 sendmail [20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
LINE 1236: Jul 27 03:12:22 server2 sendmail [20574]: [ID 801593 mail.info] l6R7BYWq020574: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
[...still more lines...]
(3 lines matched)

Let's repeat the previous search but add in another search term:


(14,234 lines) flog> mm & l6R7BUF0020573

[...output not shown to save space...]
LINE 1234: Jul 27 03:12:19 server2 sendmail [20573]: [ID 801593 mail.info] l6R7BUF0020573: host48-184.pppoe.inetcomm.ru did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA

[...still more lines...]
(3 lines matched)

How about inverse (i.e., grep -v) matching? Let's show anything that doesn't match "aol.com":


(14,234 lines) flog> m !aol.com 

[...output not shown to save space...]

Or mix and match:


(14,234 lines) flog> m sendmail & Jul & !aol.com

which matches any line containing "sendmail" and "Jul" but not "aol.com".

Use "M" instead of "m" for case-sensitive matching:


(14,234 lines) flog> M sendmail & Jul & !aol.com
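The "&" and "!" syntax above boils down to "every plain term must be present, every negated term must be absent". Here is a rough Python sketch of that logic. It assumes terms are plain substrings (flog may well treat them as patterns), and `parse_terms` and `matches` are hypothetical names, not functions from flog.

```python
def parse_terms(expression):
    """Split 'a & b & !c' into (text, negated) pairs, dropping surrounding quotes."""
    terms = []
    for raw in expression.split("&"):
        term = raw.strip()
        negated = term.startswith("!")
        if negated:
            term = term[1:].strip()
        term = term.strip('"')
        if term:
            terms.append((term, negated))
    return terms


def matches(line, terms, case_sensitive=False):
    """True when every plain term is present and every negated term is absent."""
    haystack = line if case_sensitive else line.lower()
    for term, negated in terms:
        needle = term if case_sensitive else term.lower()
        # A plain term that is missing, or a negated term that is found, rejects the line.
        if (needle in haystack) == negated:
            return False
    return True
```

With this, `matches(line, parse_terms('sendmail & Jul & !aol.com'))` plays the role of "m", and passing `case_sensitive=True` plays the role of "M".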

At this point we've mostly replicated what we can already do on the command line with the grep-pipe chain. But since we have our flog environment to work in, we can make it a bit more useful.

Here we read in 2 other files and search all 3 files at the same time:


(14,234 lines) flog> read+ /var/log/syslog.0 /var/log/syslog.1.gz

/var/log/syslog.0: 13,298 lines read
/var/log/syslog.1.gz: 15,011 lines read

Compressed files (gzip, zip, UNIX compress) are all fine, assuming you have gzip/zip installed, which most UNIX platforms do by default. No need to uncompress the files first.
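Reading a compressed log transparently is simple to sketch. flog relies on the external gzip/zip tools; the Python illustration below instead uses the standard `gzip` module and only covers ".gz" files, so treat it as a simplified stand-in. `open_log` and `read_lines` are hypothetical names.

```python
import gzip


def open_log(path):
    """Open a log for text reading, decompressing on the fly when it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def read_lines(path):
    """Return the file's lines without trailing newlines, compressed or not."""
    with open_log(path) as handle:
        return [line.rstrip("\n") for line in handle]
```

The caller never needs to know which kind of file it was handed.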

Hmm, okay maybe the file syslog doesn't have what I'm looking for. Let's blow away whatever was read in before and start fresh:


(14,234 lines) flog> read /var/log/mail.log /var/log/mail.log.0.gz

/var/log/mail.log: 952 lines read
/var/log/mail.log.0.gz: 1255 lines read

(2,207 lines) flog>

So it's "read+" (shortcut: r+) to read in additional data, and just plain "read" (shortcut: r) to replace the existing data with new data.

Hmm, what if some new data came in while we were flogging? Just "reread" (shortcut: rr) any of the files:


(14,234 lines) flog> reread /var/log/syslog
5,203 lines unread from /var/log/system.log
/var/log/system.log: 5,344 lines read

(14,375 lines) flog>

During a "reread" flog will purge any lines previously read from the targeted file(s) and reread the entire file again.
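One way to picture "read", "read+", "reread" and the purge step is a store that keeps a separate list of lines per file. The Python sketch below is an assumption about the bookkeeping, not flog's real data structure; the class `LogStore` and its method names are invented for illustration.

```python
class LogStore:
    """Hypothetical container: one list of lines per file, kept in read order."""

    def __init__(self):
        self.files = {}  # path -> list of lines (dicts preserve insertion order)

    def _load(self, path):
        with open(path, errors="replace") as handle:
            return [line.rstrip("\n") for line in handle]

    def read_plus(self, *paths):
        """read+: add files to whatever is already loaded."""
        for path in paths:
            self.files[path] = self._load(path)

    def read(self, *paths):
        """read: discard everything, then load the given files."""
        self.files.clear()
        self.read_plus(*paths)

    def unread(self, path):
        """Drop one file's lines and report how many were purged."""
        return len(self.files.pop(path, []))

    def reread(self, path):
        """reread: purge the file's old lines, then load it again from disk."""
        purged = self.unread(path)
        self.files[path] = self._load(path)
        return purged, len(self.files[path])

    def total(self):
        """Number of lines currently held, as shown in the prompt."""
        return sum(len(lines) for lines in self.files.values())
```

Keeping the lines grouped by file is what makes purging a single file cheap: there is no need to scan everything else that was read.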

Can't remember what files you've read in so far? Just type "files" (shortcut: f):


(14,234 lines) flog> files
Files being flogged:
  /var/log/syslog : 5,344 lines,  [last read: Tue Jul  31 22:52:30 2007]
  /var/log/system.log.3.gz : 9031 lines,  [last read: Tue Jul  31 22:49:33 2007]

(14,234 lines) flog>

Let's say we don't want to search for anything and just want to view the file. For this flog sticks to the UNIX commands we already know:


(2,207 lines) flog> cat

[...output not shown to save space...]

Whoops, those 2,207 lines scrolled past the screen way too fast, so let's paginate the output:

(2,207 lines) flog> more

Aug  3 19:47:20 server2 sendmail[18380]: [ID 801593 mail.info] l73NkSTW018380: from=<2.7189.32333137313035363133.b@e.delta.com>, size=17570, class=0, nrcpts=1, msgid=<11223641.1186184893903.JavaMail.root@10.64.19.133>, proto=ESMTP, daemon=MTA, relay=dragonfire2.delta.com [205.174.22.21]
Aug  3 19:47:20 server2 sendmail[18383]: [ID 801593 mail.info] l73NkSTW018380: to=adchen, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=47803, dsn=2.0.0, stat=Sent
Aug  3 19:47:32 server2 sendmail[18382]: [ID 702911 mail.notice] ruleset=check_relay, arg1=[86.69.217.163], arg2=127.0.0.10, relay=163.217.69-86.rev.gaoland.net [86.69.217.163] (may be forged), reject=550 5.7.1 Mail from 86.69.217.163 rejected, see dul.dnsbl.sorbs.net
--More--

Okay, just show me lines 100-150:


(2,207 lines) flog> 100-150

[...50 lines of output not shown to save space...]

Hmm, all right, don't include output from a certain file anymore, using "unread" (shortcut: ur):


(2,207 lines) flog> unread /var/log/mail.log

Another option is to read in the output of a command and filter on that instead:


% flog - niscat passwd.org_dir
Command "niscat passwd.org_dir": 210 lines read

(210 lines) flog>

You can mix and match command output and files. You can "read+" in one or more files and grep through both the output and the files at the same time:


(210 lines) flog> read+ /etc/passwd /etc/shadow

/etc/passwd:  128 lines read
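Treating command output as just another source of lines is easy to sketch. The Python below is an illustration under assumptions, not flog's implementation: `read_command` is a hypothetical helper, and the label format simply mimics the 'Command "..."' line shown above.

```python
import subprocess


def read_command(argv):
    """Run a command and return (label, lines) built from its standard output."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    label = 'Command "%s"' % " ".join(argv)
    return label, result.stdout.splitlines()
```

The returned lines can then be searched exactly like lines that came from a file, for example `read_command(["niscat", "passwd.org_dir"])` on a machine that has niscat.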

flog also caches your most recent search string, so if you accidentally exit flog you can just type "mm" in the next session to repeat your previous search. Caching is disabled for the root user, since I didn't want several admins sharing the root account to overwrite each other's flog caches.
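A search cache like this can be as small as a one-line file. The Python sketch below is a guess at the general shape, not flog's actual mechanism: the function names are hypothetical, the caller chooses where the cache file lives, and root is detected by user ID 0.

```python
import os


def save_search(expression, path, uid=None):
    """Persist the latest search unless running as root (uid 0)."""
    uid = os.getuid() if uid is None else uid
    if uid == 0:
        return False  # root sessions leave no cache behind
    with open(path, "w") as handle:
        handle.write(expression + "\n")
    return True


def load_search(path):
    """Return the cached search, or None when nothing has been saved."""
    try:
        with open(path) as handle:
            return handle.readline().rstrip("\n") or None
    except FileNotFoundError:
        return None
```

On startup, a repeat-search command would call `load_search` and fall back to "nothing to repeat" when it returns None.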

Anyway, that's a brief overview of flog. The examples above are somewhat contrived, but I've been using flog a while now during my work as a sysadmin, and while still not perfect it's been a great time-saver.

I tried to make the output as readable as possible. For output with lines that are longer than the terminal width, I add an extra blank line between data lines.
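The spacing rule can be sketched as follows. This Python version assumes the blank line goes after each line that is wider than the terminal (flog's exact rule may differ), and `format_lines` is a hypothetical name.

```python
import shutil


def format_lines(lines, width=None):
    """Join lines for display, adding a blank line after any line that would wrap."""
    if width is None:
        width = shutil.get_terminal_size().columns
    out = []
    for line in lines:
        out.append(line)
        if len(line) > width:
            out.append("")  # visual separator after a wrapped line
    return "\n".join(out)
```

Short lines stay packed together, while long syslog entries get some breathing room.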

Note that flog isn't meant to scale all that much. This is just a quick-n-dirty program. If you read in 10 million lines of data, don't expect it to be as fast as a database.

I hope flog is of use to others. I have other features in mind, such as true command-history scrolling (à la the up-arrow in tcsh/bash), but I don't know how much more time I'd save vs. the coding time. If you make any enhancements to flog I'd love to hear about them and possibly incorporate them into the main code. Thanks and happy flogging.

Update: Version 0.6.8 is attached.

Attachment	Size
flog.pl.gz	7.06 KB
