Splunk Queries for Identifying Data Exfiltration

I've been working with Splunk at my job and wanted to share some queries that may help with network data analytics for cybersecurity. They are aimed specifically at behavior that could indicate data exfiltration. The queries can be adapted to any time frame; I've been running them against the last 30 days of data. These are heavy searches, and given the current limit on my allotted disk space, I'm thinking about shortening the window to two or three weeks. You can also use the "table" search command to specify which fields appear in the output. For each query, I've included the output I like to see.

Users with a Large Increase in Web Traffic Moving out of the Network. The query below will output the user, the time, the source IP, the aggregated bytes sent out, the number of data samples, the number of standard deviations away from the source's average bytes sent per day, and the number of standard deviations away from the organization's average bytes sent per day. It returns a row only when the bytes out for the latest day is more than 3 standard deviations above both the source's average and the organization's average over the latest 30 days, and the source has at least 4 data samples.

((tag=network tag=communicate) OR (index=pan_logs sourcetype=pan*traffic) OR (index=* sourcetype=opsec) OR (index=* sourcetype=cisco:asa) ) (src_ip=10.0.0.0/8 OR src_ip=172.16.0.0/12 OR src_ip=192.168.0.0/16) AND action=allowed AND (dest_port=80 OR dest_port=443) NOT (dest_ip=10.0.0.0/8 OR dest_ip=172.16.0.0/12 OR dest_ip=192.168.0.0/16)
| bucket _time span=1d
| stats sum(bytes*) as bytes* by user _time src_ip
| eventstats max(_time) as maxtime avg(bytes_out) as avg_bytes_out stdev(bytes_out) as stdev_bytes_out
| eventstats count as num_data_samples avg(eval(if(_time < relative_time(maxtime, "@h"), bytes_out, null()))) as per_source_avg_bytes_out stdev(eval(if(_time < relative_time(maxtime, "@h"), bytes_out, null()))) as per_source_stdev_bytes_out by src_ip
| where num_data_samples >=4 AND bytes_out > avg_bytes_out + 3 * stdev_bytes_out AND bytes_out > per_source_avg_bytes_out + 3 * per_source_stdev_bytes_out AND _time >= relative_time(maxtime, "@h")
| eval num_standard_deviations_away_from_org_average = round(abs(bytes_out - avg_bytes_out) / stdev_bytes_out,2), num_standard_deviations_away_from_per_source_average = round(abs(bytes_out - per_source_avg_bytes_out) / per_source_stdev_bytes_out,2)
| fields - maxtime per_source* avg* stdev*
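
To make the thresholding concrete, here is a minimal Python sketch of the same arithmetic outside Splunk. It is only an illustration, not how Splunk runs the search: the function name `flag_sources`, the input layout, and the sample numbers are all made up. It assumes one total per source per day (the real query also groups by user), uses the sample standard deviation, and, like the query, computes the organization-wide statistics over every row while the per-source baseline leaves out the latest day.

```python
from statistics import mean, stdev

def flag_sources(daily_bytes, sigmas=3, min_samples=4):
    """daily_bytes: {src_ip: [bytes_out per day, oldest first]} (hypothetical layout).

    Returns the sources whose latest day exceeds both the organization-wide
    threshold and the source's own threshold.
    """
    # Organization-wide statistics use every daily total, latest day included.
    all_values = [v for days in daily_bytes.values() for v in days]
    org_limit = mean(all_values) + sigmas * stdev(all_values)

    flagged = []
    for src_ip, days in daily_bytes.items():
        if len(days) < min_samples:
            continue  # not enough history, like num_data_samples >= 4
        *baseline, latest = days  # per-source baseline excludes the latest day
        src_limit = mean(baseline) + sigmas * stdev(baseline)
        if latest > org_limit and latest > src_limit:
            flagged.append(src_ip)
    return flagged

# Made-up sample data: six daily totals for three internal hosts.
sample = {
    "10.0.0.5": [100, 110, 90, 105, 95, 10000],
    "10.0.0.6": [100, 100, 100, 100, 100, 100],
    "10.0.0.7": [200, 210, 190, 205, 195, 200],
}
print(flag_sources(sample))  # ['10.0.0.5']
```

Because the organization-wide average and standard deviation include the spike itself, a single outlier can only clear a 3-sigma bar when there are enough other rows to dilute it, which is one reason a long window such as 30 days helps.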

Users with a Sudden Increase in Sending Many DNS Requests. The query below will output the user, the time, the source IP, the destination IP, the number of DNS requests, the number of data samples, the number of standard deviations away from the source's average of DNS requests per day, and the number of standard deviations away from the organization's average of DNS requests per day. It returns a row only when the number of DNS requests for the latest day is more than 3 standard deviations above both the source's average and the organization's average over the latest 30 days, and the source has at least 4 data samples.

index=* dest_port=53
| bucket _time span=1d
| stats count by user _time src_ip dest_ip
| eventstats max(_time) as maxtime avg(count) as avg_count stdev(count) as stdev_count
| eventstats count as num_data_samples avg(eval(if(_time < relative_time(maxtime, "@h"), count, null()))) as per_source_avg_count stdev(eval(if(_time < relative_time(maxtime, "@h"), count, null()))) as per_source_stdev_count by src_ip
| where num_data_samples >=4 AND count > avg_count + 3 * stdev_count AND count > per_source_avg_count + 3 * per_source_stdev_count AND _time >= relative_time(maxtime, "@h")
| eval num_standard_deviations_away_from_org_average = round(abs(count - avg_count) / stdev_count,2), num_standard_deviations_away_from_per_source_average = round(abs(count - per_source_avg_count) / per_source_stdev_count,2)
| fields - maxtime per_source* avg* stdev*
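
The first two steps of this query (bucket by day, then count) are what turn raw DNS events into one number per source per day. A rough Python equivalent is sketched below, purely as an illustration. The function name `daily_counts` and the `(epoch_seconds, src_ip)` input layout are assumptions, the sketch groups only by day and source (the query also groups by user and destination IP), and it uses UTC days, whereas Splunk snaps day buckets according to the timezone in effect for the search.

```python
from collections import Counter
from datetime import datetime, timezone

def daily_counts(events):
    """events: iterable of (epoch_seconds, src_ip) pairs (hypothetical layout).

    Roughly mirrors `bucket _time span=1d | stats count by _time src_ip`.
    """
    counts = Counter()
    for epoch, src_ip in events:
        # Truncate the timestamp to its UTC calendar day (the "bucket").
        day = datetime.fromtimestamp(epoch, tz=timezone.utc).date()
        counts[(day, src_ip)] += 1
    return counts

# Made-up events: two requests on day one and one on day two from the same host.
events = [(0, "10.0.0.5"), (3600, "10.0.0.5"), (86400, "10.0.0.5"), (10, "10.0.0.9")]
for (day, src_ip), n in sorted(daily_counts(events).items()):
    print(day, src_ip, n)
```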

Users with a Sudden Increase in Non-Corporate Emails Sent. The query below will output the email sender, the number of data samples, the count of emails sent within the last day, the per-day average and standard deviation of emails sent over the last 30 days, and the upper bound. As written, the bounds sit 6 standard deviations from the average (the query multiplies the standard deviation by 6; lower the multiplier to 3 for a tighter threshold). Results appear only when the last day's count is above the upper bound and greater than 10, and the sender has at least 7 data samples.

(index=* sourcetype=cisco:esa* OR sourcetype=MSExchange*:MessageTracking OR tag=email) cef_signature=Message (from=*include_part_of_email_domain_here*) AND (from!=*Brocade* AND from!=*Storage_Alerts*) NOT (to=*include_part_of_email_domain_here*)
| bucket _time span=1d
| stats count by from, _time
| eval maxtime=now()
| stats count as num_data_samples max(eval(if(_time >= relative_time(maxtime, "-1d@h"), 'count', null()))) as "count" avg(eval(if(_time < relative_time(maxtime, "-1d@h"), 'count', null()))) as avg stdev(eval(if(_time < relative_time(maxtime, "-1d@h"), 'count', null()))) as stdev by "from"
| eval lowerBound=(avg-stdev*6), upperBound=(avg+stdev*6)
| eval isOutlier=if(('count' < lowerBound OR 'count' > upperBound) AND num_data_samples >=7, 1, 0)
| where isOutlier=1 AND count>10 AND count>upperBound
| table from, num_data_samples, count, avg, stdev, upperBound
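
The outlier test at the end of the email queries boils down to a few comparisons. The Python sketch below restates it under stated assumptions: the function name `is_email_outlier` and its parameters are made up for illustration, the baseline is passed in as a plain list of per-day counts that excludes the latest day, and the standard deviation is the sample standard deviation.

```python
from statistics import mean, stdev

def is_email_outlier(baseline_counts, latest_count, multiplier=6,
                     min_samples=7, min_count=10):
    """baseline_counts: emails per day before the latest day (hypothetical input)."""
    num_samples = len(baseline_counts) + 1  # baseline days plus the latest day
    if num_samples < min_samples or len(baseline_counts) < 2:
        return False  # too little history to trust the statistics
    upper_bound = mean(baseline_counts) + multiplier * stdev(baseline_counts)
    # Same combined condition as the final `where`: above the bound and above 10.
    return latest_count > min_count and latest_count > upper_bound

# Made-up history: a sender who normally sends about five emails a day.
history = [4, 5, 6, 5, 4, 6, 5]
print(is_email_outlier(history, 40))  # True
print(is_email_outlier(history, 9))   # False
```

The fixed floor of 10 emails keeps very quiet senders, whose standard deviation is close to zero, from being flagged for a jump from one email to three.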

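Users Suddenly Sending Excessive Email. The query below will output the email sender, the number of data samples, the count of emails sent within the last day, the per-day average and standard deviation of emails sent over the last 30 days, and the upper bound. As written, the bounds sit 6 standard deviations from the average (the query multiplies the standard deviation by 6; lower the multiplier to 3 for a tighter threshold). Results appear only when the last day's count is above the upper bound and greater than 10, and the sender has at least 7 data samples.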

(index=* sourcetype=cisco:esa* OR sourcetype=MSExchange*:MessageTracking OR tag=email) cef_signature=Message (from=*include_part_of_email_domain_here*)
| bucket _time span=1d
| stats count by from, _time
| eval maxtime=now()
| stats count as num_data_samples max(eval(if(_time >= relative_time(maxtime, "-1d@h"), 'count', null()))) as "count" avg(eval(if(_time < relative_time(maxtime, "-1d@h"), 'count', null()))) as avg stdev(eval(if(_time < relative_time(maxtime, "-1d@h"), 'count', null()))) as stdev by "from"
| eval lowerBound=(avg-stdev*6), upperBound=(avg+stdev*6)
| eval isOutlier=if(('count' < lowerBound OR 'count' > upperBound) AND num_data_samples >=7, 1, 0)
| where isOutlier=1 AND count>10 AND count>upperBound
| table from, num_data_samples, count, avg, stdev, upperBound