15 May 2012

Working with log files in the /var/log/ directory Part 1


The point of this tutorial is to give a better understanding of how to look at the log files in your /var/log directory. Using the built-in command line tools available, if you are not using some special purpose monitoring suit. This tool is in the intermediate level, all though those with lost of *nix experiance will find certain sections tedious as I describe what I typed in detail, for novice first time users. This is not how ever a tutorial on vim or any of the other tools.

I will be looking at apache logs and dmesg or syslog files. But with a good understanding of these tools you can adapt them to other log files.

The basic command you should be familiar with are:
vim a text editor
cat a tool to out put the contents of a text file. 
sed a stream editor
grep a text search tool
cut a tool to cut words or characters from lines of text
head a tool to out put the first n lines of a text file. 
tail a tool to out put the last n lines of a text file.
less a tool to out put the contents of a text file.
sort tool to sort lines of text
uniq tool to find unique lines of from sorted text
tee tool to out put text passing through a pipe
wc count the lines, characters and bytes in file. 

Some other tools that would be worth looking at are:
ssh
bash especially how redirects and pipes work. "> < | ".

At some point you need to go and read the man pages on all those command, and applications.

Part 1 Using Vim

Why use vim, because you can use some very power full text search and replace filters to get some useful info from you log files. The other tools allow you to do the same thing in a much more automated way.

Counting occurrences of a particular line or error message. One of the most useful things you can do is be able to get some stats like a counts of a particular error message.

In vim we use a very similar syntax to what we use in sed.
First we open the log file in a read only mode. You could use view, but in some environments view is compiled with out certain options. Which makes using the visual selection commands impossible.

vim -R /var/log/syslog

Next I looked for some text that I want to get a count of. "local kernel:" seamed to stand out. I entered the visual mode using the "v" key and selected the text I want to search for "local kernel". I yanked the text with "y", which is similar to copying. Next I stared the global search tool by typing in ":g/" now I want to past the text I just copied so I use `ctlr+r+"` to past the text I just copied into the command.

g/local kernel:/

After I pressed enter, global brings up a list of all occurrences of the provided text "local kernel:". It was a long list. To see how long that list is I can use the substitute command. Global allows you to add the substitute command to the end of your search. So I entered the command ":" mode again and pressed the up arrow to get to the previous command and add "s///n" at the end. Which will give me a count for my search string.

g/local kernel:/s///n 
47918 substitutions on 47918 lines 

Now that I know how many times that lines with local kernel is found in the file, I want to remove those lines to see what else might be needing my attention. Again I enter command mode ":" and press the up arrow. I then change my global command to look some thing link this.

:g/local kernel:/d

W10: Warning: Changing a readonly file
47918 fewer lines


That warning about being read only can be ignored, because that is how I want this process to work. I don't want to remove vital information from a production log file. It would probably be a good Idea to copy these files some where else before doing what I am doing but with some 4 GB monster log files this may not be an option. Repeat this process until you are left with and empty log file or unique strings. 

This is process is useful to see how many times certain errors are cropping up. Some times it is easier to focus on fixing on problem that  accounts for 90% of the errors. But using vim in this way gets tiresome very quickly. The faster and more automated way of doing this is to use your command line and eventually to script the processes. 

Part 2 Using just the command line

The tool that makes command line processing of log files to get stats about the number of error messages is uniq. The problem with most log files is that unique on every line because of date information and process information.  So we need a way to get around this problem.

For starters I generate some 404 page not found errors in my apache log file.


 wc /var/log/apache2/error.log
  28  386 3421 /var/log/apache2/error.log

 wc /var/log/apache2/error.log
  81 1075 8863 /var/log/apache2/error.log


wc gives me some stats on the file like how many lines are in it how many charters and lastly how big the file is in bytes. So when I try to run the file through uniq it comes out  the same size. 


uniq /var/log/apache2/error.log | wc
     81    1075    8863


uniq needs the files sorted inorder to be able to give us the count of duplicate lines in a file or to produce a list of only those lines that are uniq. So just buy sorting the file I am already removing some of the duplicates. Now if I want to get some stats on the duplicates I can by telling "uniq -c" to count lines and "uniq -d" to only show duplicates. Read the man page for more info. 


sort /var/log/apache2/error.log | uniq | wc
     76    1010    8358


sort /var/log/apache2/error.log | uniq -dc | wc
      5      70     545


sort /var/log/apache2/error.log | uniq -dc 
      2 [Tue May 15 09:54:10 2012] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico
      2 [Tue May 15 09:54:13 2012] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico
      2 [Tue May 15 09:54:14 2012] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico
      2 [Tue May 15 09:54:23 2012] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico
      2 [Tue May 15 09:54:27 2012] [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico

As  you can see favicon.ico is not found, but because of the date it is not producing an accurate count of that "File does not exist error". I need some way to remove that information from file before sending it to uniq.

The first and simplest way to do that is with the "cut" command.  Using the magic of pipes I am going to use head and cut together to so I don't get my terminal scrolling while working out what parameters cut needs. "head -n 2" tells head to only show the first 2 lines.  `cut -d" " -f2-` tells cut to use the delimiter of a single " " space and to cut from the 2nd space "-" "-f2-" to the end of the line.


head -n 2 /var/log/apache2/error.log 
[Mon May 14 09:36:07 2012] [error] python_init: Python version mismatch, expected '2.7.2+', found '2.7.3'.
[Mon May 14 09:36:07 2012] [error] python_init: Python executable found '/usr/bin/python'.


head -n 2 /var/log/apache2/error.log | cut -d" " -f2-
May 14 09:36:07 2012] [error] python_init: Python version mismatch, expected '2.7.2+', found '2.7.3'.
May 14 09:36:07 2012] [error] python_init: Python executable found '/usr/bin/python'.


head -n 2 /var/log/apache2/error.log | cut -d" " -f5-
2012] [error] python_init: Python version mismatch, expected '2.7.2+', found '2.7.3'.
2012] [error] python_init: Python executable found '/usr/bin/python'.


head -n 2 /var/log/apache2/error.log | cut -d" " -f6-
[error] python_init: Python version mismatch, expected '2.7.2+', found '2.7.3'.
[error] python_init: Python executable found '/usr/bin/python'.

Now there is none of the date string left in the file. I can sort it and try uniq again. The only problem is that even though I have the counts of each error, it is not showing with the most common first or last. I could also remove those lines that only occur once using the "-d" duplicates only option on uniq.



cut -d" " -f6- /var/log/apache2/error.log | sort | uniq -c | wc
     14     130    1273


cut -d" " -f6- /var/log/apache2/error.log | sort | uniq -c
      6 [error] [client 127.0.0.1] File does not exist: /usr/share/doc/apache2-doc/manual/en/index.html21
      5 [error] [client 127.0.0.1] File does not exist: /var/www/html/conf/12
     29 [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico
      6 [error] [client 127.0.0.1] File does not exist: /var/www/html/isos.html12
      7 [error] [client 127.0.0.1] File does not exist: /var/www/html/tmp/12
      4 [error] python_init: Python executable found '/usr/bin/python'.
      4 [error] python_init: Python path being used '/usr/lib/python2.7/:/usr/lib/python2.7/plat-linux2:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload'.
      4 [error] python_init: Python version mismatch, expected '2.7.2+', found '2.7.3'.
      4 [notice] Apache/2.2.22 (Ubuntu) PHP/5.3.10-1ubuntu3.1 with Suhosin-Patch mod_python/3.3.1 Python/2.7.3 configured -- resuming normal operations
      2 [notice] caught SIGTERM, shutting down
      1 [notice] Graceful restart requested, doing restart
      4 [notice] mod_python: Creating 8 session mutexes based on 150 max processes and 0 max threads.
      4 [notice] mod_python: using mutex_directory /tmp 



cut -d" " -f6- /var/log/apache2/error.log | sort | uniq -dc | sort -n
      2 [notice] caught SIGTERM, shutting down
      4 [error] python_init: Python executable found '/usr/bin/python'.
      4 [error] python_init: Python path being used '/usr/lib/python2.7/:/usr/lib/python2.7/plat-linux2:/usr/lib/python2.7/lib-tk:/usr/lib/python2.7/lib-old:/usr/lib/python2.7/lib-dynload'.
      4 [error] python_init: Python version mismatch, expected '2.7.2+', found '2.7.3'.
      4 [notice] Apache/2.2.22 (Ubuntu) PHP/5.3.10-1ubuntu3.1 with Suhosin-Patch mod_python/3.3.1 Python/2.7.3 configured -- resuming normal operations
      4 [notice] mod_python: Creating 8 session mutexes based on 150 max processes and 0 max threads.
      4 [notice] mod_python: using mutex_directory /tmp 
      5 [error] [client 127.0.0.1] File does not exist: /var/www/html/conf/12
      6 [error] [client 127.0.0.1] File does not exist: /usr/share/doc/apache2-doc/manual/en/index.html21
      6 [error] [client 127.0.0.1] File does not exist: /var/www/html/isos.html12
      7 [error] [client 127.0.0.1] File does not exist: /var/www/html/tmp/12
     29 [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico

That is now very useful. I can still take it further buy removing the information I don't want more selectively using sed. Sed is a very advanced tool, it will take a while to get the hang of it, as learning regular expressions is learning a new language. There are lots of really good tutorials on sed that can be found through google. 

In my case I want to remove every thing inside brackets, as that information is not really useful to me. The first example only removes the first set of brackets, the second removes all brackets. The difference is the "g" at the end of the sed regex. Which tell sed to go global and use the whole line. 


sed -e "s/\[[^][]*\]//" /var/log/apache2/error.log | sort | uniq -dc | sort -n | tail -n 3

      6  [error] [client 127.0.0.1] File does not exist: /var/www/html/isos.html12
      7  [error] [client 127.0.0.1] File does not exist: /var/www/html/tmp/12
     29  [error] [client 127.0.0.1] File does not exist: /var/www/html/favicon.ico

sed -e "s/\[[^][]*\]//g" /var/log/apache2/error.log | sort | uniq -dc | sort -n | tail -n 3
      6    File does not exist: /var/www/html/isos.html12
      7    File does not exist: /var/www/html/tmp/12
     29    File does not exist: /var/www/html/favicon.ico



One thing worth noting here is that some messages in my error log are error level notice and other are error level error. Reading the Apache documentation there are a couple more like waning, and critical.
If I want to only see stats for messages with are of the level error I need to reduce the input I am sending to the whole long pipe. I will use grep  to only process those lines in the file that are containg some special text, like "[notice]" or "[error]". The problem is that with regular expressions "[]" are special characters, so I need to escape them using the "\"


grep "\[notice\]" /var/log/apache2/error.log | sed -e "s/\[[^][]*\]//g" - | sort | uniq -dc | sort -n | tail -n 3
      4   Apache/2.2.22 (Ubuntu) PHP/5.3.10-1ubuntu3.1 with Suhosin-Patch mod_python/3.3.1 Python/2.7.3 configured -- resuming normal operations
      4   mod_python: Creating 8 session mutexes based on 150 max processes and 0 max threads.
      4   mod_python: using mutex_directory /tmp 



grep "\[error\]" /var/log/apache2/error.log | sed -e "s/\[[^][]*\]//g" - | sort | uniq -dc | sort -n | tail -n 3
      6    File does not exist: /var/www/html/isos.html12
      7    File does not exist: /var/www/html/tmp/12
     29    File does not exist: /var/www/html/favicon.ico

The only thing to be a ware of the more complex the pipe become the cpu intensive the work to be done. As your text files get larger into the Giga byte range you may want to be selective in what you do and when. 

Part 3 working with scripts and over a network. 


Part 3 will deal with using these commands in scripts and over ssh to get some information on remote servers.


No comments: