The problem is this: you have a set of log files that must be scanned periodically, retrieving only the messages added since the previous scan. You are allowed to write a marker into the logs, and the log files are rotated frequently. Here is what can be done:
- Sort the files by modification time, newest first.
- Put a marker in the current syslog file.
- Go through the files one at a time, newest first. For each file:
  - Count the markers in the file.
  - If the file contains no markers, none of its messages have been read yet, so read all of them.
  - If the file contains 1 marker and no marker has been found so far, this is the marker that was just written, so read the messages from the marker back to the beginning of the file.
  - If the file contains 2 or more markers and no marker has been found so far, both the new marker and the previous one are in this file: read the messages between the last two markers and return.
  - If the file contains 1 or more markers and 1 marker has already been found, this file was partly read during the previous scan: read the messages from the end of the file back to its last marker and return. There is no need to scan further files.
  - Once 2 or more markers have been found in total, return; there is no need to search further.
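The marker procedure can be sketched in Python roughly as follows. This is a minimal sketch under assumptions not in the original: the marker is a hypothetical fixed line (`MARKER`) appended directly to the newest file (a real syslog setup may need to route it through the logging daemon), each file fits in memory, and the names `files_newest_first`, `append_marker`, `read_lines`, `new_messages` and `scan` are made up for illustration. The sketch treats the marker just written and the previous scan's marker as the two boundaries of the new messages.

```python
import os
from typing import Iterable, List, Sequence

# Hypothetical marker text; any line that never appears in real log output works.
MARKER = "--- LOGSCAN MARKER ---"


def files_newest_first(paths: Sequence[str]) -> List[str]:
    """Sort log files by modification time, most recently modified first."""
    return sorted(paths, key=os.path.getmtime, reverse=True)


def append_marker(path: str, marker: str = MARKER) -> None:
    """Append the marker as a line of its own to the active log file."""
    with open(path, "a") as f:
        f.write(marker + "\n")


def read_lines(path: str) -> List[str]:
    """Read a whole log file as a list of lines without newlines."""
    with open(path) as f:
        return f.read().splitlines()


def new_messages(files: Iterable[List[str]], marker: str = MARKER) -> List[str]:
    """Return the lines logged between the previous marker and the new one.

    `files` yields each file's lines, newest file first; the newest file is
    expected to contain the marker that was just appended.
    """
    chunks: List[List[str]] = []  # new lines per file, newest file first
    seen = 0  # markers found in the files visited before this one
    for lines in files:
        positions = [i for i, line in enumerate(lines) if line == marker]
        if not positions:
            # No marker: nothing in this file has been read yet.
            chunks.append(list(lines))
        elif seen == 0 and len(positions) == 1:
            # The marker just written: keep everything before it.
            chunks.append(lines[:positions[0]])
        elif seen == 0:
            # Old and new markers share this file: keep what lies between them.
            chunks.append(lines[positions[-2] + 1:positions[-1]])
        else:
            # The last marker here was left by the previous scan.
            chunks.append(lines[positions[-1] + 1:])
        seen += len(positions)
        if seen >= 2:
            break  # both boundaries found; older files are not read
    # Files were visited newest first; return the lines in chronological order.
    return [line for chunk in reversed(chunks) for line in chunk]


def scan(paths: Sequence[str], marker: str = MARKER) -> List[str]:
    """One periodic scan: mark the newest file, then collect the new lines."""
    if not paths:
        return []
    ordered = files_newest_first(paths)
    append_marker(ordered[0], marker)
    # A generator, so files older than the previous marker are never opened.
    return new_messages((read_lines(p) for p in ordered), marker)
```

Because `new_messages` works on lists of lines, the rules can be checked without touching the disk: `new_messages([["d", MARKER], ["a", MARKER, "b"]])` returns `["b", "d"]`.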
If we cannot put a marker in the files, we can do the following instead:
- Keep a config file of key=value pairs. The key identifies a file (say, its inode number or a checksum of its file name) and the value is the number of lines read from that file so far.
- At the start of each incremental scan, read the config file and load the key=value pairs into a hash.
- For each file:
  - If the file's key is found in the hash, get the number of lines already read, read the remaining lines, and update the count.
  - If the file's key is not found in the hash, read all the lines and add an entry for the file with the number of lines read.
- Write the updated hash back to the config file on disk.
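A minimal Python sketch of the config-file approach follows, with a dict playing the role of the hash. It assumes the key is an MD5 checksum of the file name, the config file path is supplied by the caller, and each file fits in memory; `file_key`, `load_state`, `save_state` and `scan_incrementally` are hypothetical names. The check for a file that has shrunk is an extra assumption, not one of the steps above.

```python
import hashlib
import os
from typing import Dict, List, Sequence


def file_key(path: str) -> str:
    """Identify a file by a checksum of its name (the inode is another option)."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def load_state(config_path: str) -> Dict[str, int]:
    """Load the key=value pairs (file key -> lines read so far) into a dict."""
    state: Dict[str, int] = {}
    if not os.path.exists(config_path):
        return state  # first run: nothing has been read yet
    with open(config_path) as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep:
                state[key] = int(value)
    return state


def save_state(config_path: str, state: Dict[str, int]) -> None:
    """Write the dict back to the config file, one key=value pair per line."""
    with open(config_path, "w") as f:
        for key, count in sorted(state.items()):
            f.write(f"{key}={count}\n")


def scan_incrementally(paths: Sequence[str], config_path: str) -> Dict[str, List[str]]:
    """Return the lines added to each file since the previous scan."""
    state = load_state(config_path)
    new_lines: Dict[str, List[str]] = {}
    for path in paths:
        key = file_key(path)
        with open(path) as f:
            lines = f.read().splitlines()
        already_read = state.get(key, 0)  # unknown file: read every line
        if already_read > len(lines):
            # Assumption beyond the steps above: a file shorter than its
            # recorded count was truncated or replaced, so start over.
            already_read = 0
        new_lines[path] = lines[already_read:]
        state[key] = len(lines)
    save_state(config_path, state)
    return new_lines
```

Keying on the file name means a new log that reuses an old name inherits the old count; using `os.stat(path).st_ino` as the key instead lets the count follow a file when rotation renames it.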