Mass search logs from S3

June 28, 2019

Nowadays, most companies store their log files in S3, either keeping them indefinitely or expiring them over time with S3 Lifecycle policies. The S3 console isn’t great for downloading and analyzing these logs, but you can do so easily with the AWS CLI, gzcat, and grep.
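As an aside, expiring old logs is just a lifecycle rule on the bucket. The sketch below uses the same bucket name as the examples later in this post, and the rule ID, prefix, and 90-day window are placeholders for whatever retention you actually want:

# placeholder rule: expire objects under the cloudflare/ prefix after 90 days
aws s3api put-bucket-lifecycle-configuration --bucket mylogbucketname.logs --profile AWSPROFILE1 --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "expire-old-cloudflare-logs",
      "Filter": { "Prefix": "cloudflare/" },
      "Status": "Enabled",
      "Expiration": { "Days": 90 }
    }
  ]
}'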

The first thing you’ll want to do is download all of the files from S3 to your local storage. Most log files are compressed with gzip. A sample command to do so is below; note that the AWS CLI’s --include flag only re-includes files after an --exclude, so to grab just the .gz files you exclude everything and then include the pattern you want:

aws s3 cp s3://mylogbucketname.logs/cloudflare/websitename/2019-06-18/ . --recursive --exclude '*' --include '*.gz' --profile AWSPROFILE1
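If you end up pulling the same prefix more than once, say re-running the search as new logs land, aws s3 sync is a reasonable alternative to cp since it skips files you already have locally (same placeholder bucket, path, and profile as above):

# sync only downloads objects that are missing or changed locally
aws s3 sync s3://mylogbucketname.logs/cloudflare/websitename/2019-06-18/ . --exclude '*' --include '*.gz' --profile AWSPROFILE1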

Once the files have been downloaded, you can get a count of all the lines matching a specific value using the following command. Thanks to gzcat, you don’t even need to decompress the files first.

gzcat *.gz | grep "/?cid=" | grep subdomain.websitename.com | grep SomeImportantString | grep 403 | wc -l
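gzcat ships with macOS and the BSDs; on most Linux distributions the equivalent is zcat, and zgrep will decompress and run the first grep in one step. A rough equivalent of the pipeline above (the -h flag suppresses filename prefixes so the later greps only see log content):

zgrep -h "/?cid=" *.gz | grep subdomain.websitename.com | grep SomeImportantString | grep 403 | wc -l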

The above command searches for multiple string values, including the 403 response code, and then outputs the count. You could also write the matching lines to a file:

gzcat *.gz | grep "/?cid=" | grep subdomain.websitename.com | grep SomeImportantString | grep 403 > datafile.txt
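One caveat with a bare grep 403 is that it matches the string anywhere in the line, including byte counts or IDs that happen to contain 403. If your Cloudflare logs are gzipped JSON Lines (the format Logpush delivers), jq can filter on actual fields instead; this is just a sketch, and the EdgeResponseStatus and ClientRequestURI field names are assumptions about which fields your logs include:

# print the request URI for every 403 response (field names assumed)
gzcat *.gz | jq -r 'select(.EdgeResponseStatus == 403) | .ClientRequestURI' > datafile.txt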

And that’s all there is to it. Ideally you have a dedicated tool for searching logs, but if you don’t have Datadog, Splunk, Kibana, or the like, then plain text searching is the next best thing.