
Hash Specification

Karthik Kumar Viswanathan edited this page Apr 6, 2021 · 13 revisions

This document describes how we currently hash data and harden it using Chainkit, the current differences between integrations, and how we intend to unify them in all integrations going forward.


Logstash to ELK Messages (Individual)

{
   "@timestamp": "2020-10-16T01:24:33.083Z",
   "@version": 1,
   "assetId": 6265766886241071153,
   "hash": "2a03d663eaa6f6556bb5a2ac34b7433beaac6c7d2e982bd5bcd9941eef44dd0f",
   "host": "localhost",
   "message": "hello world",
   "port": 60733,
   "uuid": "c19019fe-af24-456c-bd64-639898aa766b",
   "verified": true
}
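Under the normalization rules described later in this page, the `hash` field of a single ELK message could be computed roughly as follows. This is a minimal sketch, not the actual integration code: the exact excluded-field set and the canonical JSON form (sorted keys, compact separators) are assumptions, and `hash_elk_message` is a hypothetical helper name.

```python
import hashlib
import json

def hash_elk_message(message: dict) -> str:
    # Exclude fields that are generated after hashing (assumed set,
    # based on the Normalization Algorithm section of this page).
    generated = {"assetId", "hash", "verified"}
    payload = {k: v for k, v in message.items() if k not in generated}
    # Canonical JSON with sorted keys, then SHA-256 of the bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

msg = {
    "@timestamp": "2020-10-16T01:24:33.083Z",
    "@version": 1,
    "host": "localhost",
    "message": "hello world",
    "port": 60733,
    "uuid": "c19019fe-af24-456c-bd64-639898aa766b",
}
print(hash_elk_message(msg))  # 64-character hex digest
```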

Splunk to Splunk Messages (Batched)

{
   "_time": "2020-10-13T23:07:44.977137-0700",
   "assetId": 6288434214814227861,
   "earliest_time": "2020-10-13T22:45:00",
   "export_logs": "no_space",
   "hash": "a69f52a6075f6cb405d81f93764786882a5f81f8332f2ba1b520b5328d2489c1",
   "input_source": "svr_reg01",
   "latest_time": "2020-10-13T22:55:00",
   "query": "search index=\"servers\"",
   "running_script": "2020-10-13T23:07:44.977137-0700",
   "verified": true
}

Description of Message Fields

| Field name | Example | Description |
| --- | --- | --- |
| `@timestamp` (ELK) / `_time` (Splunk) | 2020-10-16T01:24:33.083Z | When the data is sent |
| `@version` (ELK) | 1 | Our document version for ELK |
| `assetId` (ELK/Splunk) | 6265766886241071153 | Unique id generated by the Register API |
| `hash` (ELK/Splunk) | 2a03d663eaa6f6556bb5a2ac34b7433beaac6c7d2e982bd5bcd9941eef44dd0f | SHA-256. Single message (ELK): we hash all attributes. Multiple messages (Splunk): depends on the mode; see below. |
| `host` (ELK) | localhost | |
| `message` (ELK) | hello world | Raw message sent by users |
| `port` (ELK) | 60733 | |
| `uuid` (ELK) | c19019fe-af24-456c-bd64-639898aa766b | Stamped on the message before Register; prevents duplicate messages |
| `verified` (ELK/Splunk) | true | Result of tamper detection |
| `earliest_time` (Splunk) | 2020-10-13T22:45:00 | Start of the user-given time range (Register/Verify runs in this range) |
| `latest_time` (Splunk) | 2020-10-13T22:55:00 | End of the user-given time range (Register/Verify runs in this range) |
| `input_source` (Splunk) | svr_reg01 | Name of the input source (used for the Splunk dashboard) |
| `query` (Splunk) | search index="servers" | Search query used for Register/Verify |
| `running_script` (Splunk) | 2020-10-13T23:07:44.977137-0700 | Duplicate of the `_time` field |
| `export_logs` (Splunk) | no_space | Option for handling exported data |

Splunk Hashing Description

We hash the string form of the JSON or XML export.

Oneshot Mode/Export Mode

We export all logs, then read them event by event and store and sort all events in a list. We convert this list to a string and hash it.

{
		 "_bkt":"servers~4~27481611-AA86-4E38-9A08-11D1763A097B", 
		 "_cd":"4:986060", 
		 "_indextime":"1602812642", 
		 "_raw":" Action Normal :Notify: web event Production 
 Action Normal :Notify: web event Module 
 Action Normal :Notify: web event Audit Collector 
 Action Normal :Notify: web event Inventory Check 
 Action Normal :Notify: web event Security 
 Action Normal :Notify: web event Accounting 
 Action Normal :Notify: web event Compliance 
 Action Normal :Notify: web event Logger 
 Action Normal :Notify: web event Portal 
 Action Normal :Notify: web event Payment Processing 
 Sub web event Testing 
Cisco SNMP Polling event 
 Sub web event Dev Module 
Cisco SNMP Polling event 
 Sub web event Shipping 
Cisco SNMP Polling event ", 
		 "_serial":"19", 
		"_si":[
			"ip-172-31-47-29.us-east-2.compute.internal", 
			"servers" 
		],
		 "_sourcetype":"servers", 
		 "_time":"2020-10-15T18:44:02.000-07:00", 
		 "host":"server01", 
		 "index":"servers", 
		 "linecount":"16", 
		 "source":"/tmp/main/main.txt", 
		 "sourcetype":"servers", 
		 "splunk_server":"ip-172-31-47-29.us-east-2.compute.internal" 
	}
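The oneshot/export-mode flow above could be sketched as follows. This is an illustrative sketch, not the shipped code: the sort key and the string form of each event are assumptions, and `hash_exported_events` is a hypothetical helper name.

```python
import hashlib

def hash_exported_events(events: list) -> str:
    # Sort the exported events (sort key is an assumption),
    # join them into one string, and SHA-256 the result.
    ordered = sorted(str(e) for e in events)
    blob = "".join(ordered)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Because the events are sorted before hashing, the digest does not depend on the order in which Splunk returned them.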

Blocking Mode

We export this XML at once, convert the XML to a string, and hash it.

<?xml version="1.0" encoding="UTF-8"?>
<results preview="0">
   <meta>
      <fieldOrder>
         <field>_bkt</field>
         <field>_cd</field>
         <field>_indextime</field>
         <field>_raw</field>
         <field>_serial</field>
         <field>_si</field>
         <field>_sourcetype</field>
         <field>_time</field>
         <field>host</field>
         <field>index</field>
         <field>linecount</field>
         <field>source</field>
         <field>sourcetype</field>
         <field>splunk_server</field>
      </fieldOrder>
   </meta>
   <result offset="0">
      <field k="_bkt">
         <value>
            <text>servers~4~27481611-AA86-4E38-9A08-11D1763A097B</text>
         </value>
      </field>
      <field k="_cd">
         <value>
            <text>4:982730</text>
         </value>
      </field>
      <field k="_indextime">
         <value>
            <text>1602811558</text>
         </value>
      </field>
      <field k="_raw">
         <v xml:space="preserve" trunc="0"> Sub web event PayPal
Cisco SNMP Polling event
 Sub web event Compliance
Cisco SNMP Polling event </v>
      </field>
      <field k="_serial">
         <value>
            <text>0</text>
         </value>
      </field>
      <field k="_si">
         <value>
            <text>ip-172-31-47-29.us-east-2.compute.internal</text>
         </value>
         <value>
            <text>servers</text>
         </value>
      </field>
      <field k="_sourcetype">
         <value>
            <text>servers</text>
         </value>
      </field>
      <field k="_time">
         <value>
            <text>2020-10-15T18:25:57.000-07:00</text>
         </value>
      </field>
      <field k="host">
         <value>
            <text>server01</text>
         </value>
      </field>
      <field k="index">
         <value>
            <text>servers</text>
         </value>
      </field>
      <field k="linecount">
         <value>
            <text>4</text>
         </value>
      </field>
      <field k="source">
         <value>
            <text>/tmp/main/main.txt</text>
         </value>
      </field>
   </result>
</results>
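Blocking mode is the simplest case: the whole XML export is hashed as one string. A minimal sketch (the helper name is hypothetical):

```python
import hashlib

def hash_xml_export(xml_text: str) -> str:
    # The entire XML export, exactly as exported, is hashed as one string.
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
```

Note that any whitespace or formatting change in the export changes the digest, so the export must be byte-for-byte reproducible for verification to succeed.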

QRadar Format

QRadar messages are pipe-delimited key=value pairs:

a=b|c=d
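A record in this format can be parsed into fields with a one-liner. This sketch assumes keys contain no `|` or `=` characters; `parse_qradar` is a hypothetical helper name.

```python
def parse_qradar(record: str) -> dict:
    # Split on '|' into pairs, then split each pair on the first '='.
    return dict(pair.split("=", 1) for pair in record.split("|"))

assert parse_qradar("a=b|c=d") == {"a": "b", "c": "d"}
```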

Normalization Algorithm

  1. Convert the message to its normalized form.
  2. Remove the generated fields: assetId, hash, start_time, end_time, verified, uuid.
  3. If registering, add a uuid.
  4. Convert the normalized message to JSON with sorted keys.
  5. Hash the JSON.
  6. Append the hash, assetId, and other details to the original message format.
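The steps above could be sketched as follows. This is a minimal illustration under stated assumptions: `normalize_and_hash` is a hypothetical name, and the compactness of the JSON serialization is not specified by this page.

```python
import hashlib
import json
import uuid

# Step 2: fields that are generated and must be stripped before hashing.
GENERATED_FIELDS = {"assetId", "hash", "start_time", "end_time", "verified", "uuid"}

def normalize_and_hash(message: dict, registering: bool = False):
    # Steps 1-2: normalize by dropping generated fields.
    normalized = {k: v for k, v in message.items() if k not in GENERATED_FIELDS}
    # Step 3: stamp a uuid when registering.
    if registering:
        normalized["uuid"] = str(uuid.uuid4())
    # Step 4: JSON with sorted keys.
    canonical = json.dumps(normalized, sort_keys=True)
    # Step 5: hash the JSON.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return normalized, digest
```

Step 6 (appending the hash, assetId, and other details back onto the original message) is integration-specific and omitted here.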

Range Appending Algorithm

  1. Build the query that fetches messages in the time range.
  2. Get the start and end time.
  3. Fetch the messages from start to end.
  4. Normalize each message as above, without a UUID.
  5. Add each normalized message to a running hash until the iteration ends.
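Steps 4-5 above amount to feeding each normalized message into one running digest. A self-contained sketch (helper names are hypothetical; the normalization mirrors the Normalization Algorithm, minus the uuid stamping):

```python
import hashlib
import json

def normalize(message: dict) -> str:
    # Normalization without uuid stamping, as the range algorithm requires.
    generated = {"assetId", "hash", "start_time", "end_time", "verified", "uuid"}
    return json.dumps(
        {k: v for k, v in message.items() if k not in generated},
        sort_keys=True,
    )

def hash_time_range(messages) -> str:
    # One running SHA-256, updated with each normalized message in turn.
    h = hashlib.sha256()
    for msg in messages:
        h.update(normalize(msg).encode("utf-8"))
    return h.hexdigest()
```

Because the digest is updated message by message, the order of messages in the range affects the result, so verification must iterate in the same order as registration.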

TODOs

  1. Ensure all systems are logically unified: Azure Sentinel, ELK, QRadar, Splunk.
  2. Unify the format of all fields so we can easily verify any log against any SIEM.
  3. Add a field for the hash algorithm. If unspecified, we use SHA-256. We also support stronger hash algorithms, e.g. SHA-RSA 4096, HMAC-SHA256 (with a user-supplied key), SHA3, SHA5, and SSDeep (fuzzy hashing).
  4. Ensure we use secure UUID-based noncing to avoid replays.