# kJQ filters

## Overview

[JQ](https://stedolan.github.io/jq/) is a popular, practical language described as 'like sed for JSON data'.

Data inspect supports JQ-like filters on Kafka topics. We call this kJQ.

kJQ is **fast**, easily scanning tens of thousands of messages from a Kafka topic each second.

![Sample KJQ Query](/img/assets/kjqFilter.png)

The kJQ input field provides context highlighting, auto-completion, command memory (press **up-arrow** to view previous filters) and fast-execution (press **shift-enter** to execute the search).

Normally your kJQ filters will start with **.key**, **.value**, or **.header**, but you can search on any field returned with a Kafka record, including topic, offset, etc.

Kpow implements a subset of JQ allowing you to search JSON, Avro, Transit, EDN, String, and Custom Serdes with complex queries on structured data.

## Language

kJQ filters can be applied to keys, values, and headers.

Kpow will scan **tens of thousands of messages a second** to find matching data.

kJQ is **not whitespace sensitive.**

### kJQ Grammar

A kJQ filter is a limited version of [a basic JQ filter](https://stedolan.github.io/jq/manual/v1.4/#Basicfilters).

#### Filters

A _filter_ consists of a _selector_ optionally followed by a _transform_ then either a _comparator_ or a _function._

A filter can optionally be _negated_ and joined with other filters with a logical operator.

#### Selectors

A _selector_ is a JQ dot notation Object Index or **zero-based** Array Index.

e.g. `.user.name, .[0], .transactions[1].amount`

A _selector_ can also be a quoted string or generic object index, just like normal JQ.

e.g. `.user."first.name"` matches a key containing a period, i.e. `{"user": {"first.name": 1}}`

e.g. `.user.[:status_code]` matches an explicit Clojure keyword, i.e. `{"user" {:status_code 2}}`

Simple dot notation selectors match both String and (Clojure) Keyword keys.
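As a rough mental model only (Python, not Kpow's implementation), a selector can be read as a path of object keys and zero-based array indexes walked through the record. The `select` helper and the sample record below are hypothetical, and returning `None` for a missing step is an assumption made for this sketch.

```python
from typing import Any, Sequence, Union

PathStep = Union[str, int]  # an object key or a zero-based array index


def select(record: Any, path: Sequence[PathStep]) -> Any:
    """Walk `path` through nested dicts and lists; return None if a step is missing."""
    current = record
    for step in path:
        if isinstance(current, list) and isinstance(step, int):
            if not 0 <= step < len(current):
                return None
            current = current[step]
        elif isinstance(current, dict):
            current = current.get(step)
        else:
            return None
        if current is None:
            return None
    return current


record = {
    "user": {"name": "Ada", "first.name": 1},
    "transactions": [{"amount": 5}, {"amount": 12}],
}

print(select(record, ["user", "name"]))               # .user.name
print(select(record, ["transactions", 1, "amount"]))  # .transactions[1].amount
print(select(record, ["user", "first.name"]))         # .user."first.name"
print(select(record, ["user", "email"]))              # missing field
```

The quoted selector is a single path step, which is why it can address a key that itself contains a period.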

#### Transforms

A _transform_ converts the value of a field, often in use with a comparator or function.

Valid transforms: `length`, `to-long`, `to-double`, `from-date`, `min`, `max`, `to-string`

E.g. `| to-long`, `| min`
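To reason about what a transform does before the comparison runs, the following Python sketch may help. It is an analogy under stated assumptions, not kJQ's implementation: the `TRANSFORMS` table and `apply_transform` are hypothetical names, `length` is assumed to count array elements, and `from-date`, `min`, and `max` are left out.

```python
# Hypothetical Python stand-ins for a few kJQ transforms.
TRANSFORMS = {
    "length": len,        # assumed: number of elements (or characters)
    "to-long": int,       # parse or convert to an integer
    "to-double": float,   # parse or convert to a floating point number
    "to-string": str,     # render the value as text
}


def apply_transform(name, value):
    """Apply the named transform to an already-selected field value."""
    return TRANSFORMS[name](value)


# .value.tx.discount | to-double < 20.20
print(apply_transform("to-double", "3.98") < 20.20)

# .value.transaction_id | to-long % 10 == 0
print(apply_transform("to-long", "1230") % 10 == 0)

# .value.tx.labels | length > 1
print(apply_transform("length", ["URGENT", "PENDING"]) > 1)
```

The point of the sketch is the ordering: the selector yields a value, the transform converts it, and only then is the comparator applied.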

#### Comparators

A _comparator_ is an _operator_ followed by a _selector_ or a _scalar._

Valid operators: `==`, `!=`, `<`, `<=`, `>`, `>=`

e.g. `>= 10`, `!= false`, `== "text"`, `== nil`, `!= null`, `< .tx.baz`

#### Function

A _function_ is a _pipe_ followed by a _function-name_ with a _text, keyword, number, or regex_ parameter.

Valid function names: `startswith`, `endswith`, `inside`, `has`, `test`, `within`, `contains`

e.g. `| test(".*tx")`, `| startswith("text")`, `| endswith("text")`, `| contains("text")`

### kJQ Query Evaluation

Multiple kJQ filters can be joined with a logical **AND** or **OR**, just like normal JQ.

kJQ also supports standard explicit logical operator precedence with parentheses.

e.g. `(.key.id or .key.currency == "GBP") and .value.tx.discount | to-double < 20.20`

### kJQ Query Negation

A kJQ query filter can be negated. Negation can be applied to logically combined filters.

e.g. `| not`
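The evaluation and negation rules above can be pictured with ordinary boolean logic. The sketch below is a Python analogy, with hypothetical function names and sample records, of the filter `(.key.id or .key.currency == "GBP") and .value.tx.discount | to-double < 20.20` and of that same combined filter negated as a whole.

```python
def matches(key: dict, value: dict) -> bool:
    """(.key.id or .key.currency == "GBP") and .value.tx.discount | to-double < 20.20"""
    has_id = key.get("id") is not None               # truthy filter: field is not null
    is_gbp = key.get("currency") == "GBP"            # scalar comparator
    cheap = float(value["tx"]["discount"]) < 20.20   # transform, then comparator
    return (has_id or is_gbp) and cheap              # parentheses set the precedence


def matches_negated(key: dict, value: dict) -> bool:
    """The same combined filter with negation applied to the whole expression."""
    return not matches(key, value)


value = {"tx": {"discount": "3.98"}}
print(matches({"currency": "GBP"}, value))           # GBP key, small discount
print(matches({"currency": "USD"}, value))           # no id and not GBP
print(matches_negated({"currency": "USD"}, value))   # negation flips the result
```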

## Examples

### Truthy Filter

```
.value.tx.status
```

Matches where the selector is not null.

E.g. `{ "tx": { "status": true }}` or `{ "tx": { "status": 1 }}` will match, `{ "meta": { "status": true }}` will not match.

### Scalar Comparator Filter

```
.value.tx.amount > 10
```

Matches where the selector is greater than 10.

E.g. `{ "tx": { "amount": 11 }}` will match, `{ "tx": { "amount": 8 }}` will not

### Selector Comparator Filter

```
.value.tx.amount == .value.tx.discount
```

Matches where both selectors are equal.

E.g. `{ "tx": { "amount": 10, "discount": 10 }}` will match, `{ "tx": { "amount": 10, "discount": 7 }}` will not

### Function Filter

```
.value.tx.labels[0] | contains("URGENT")
```

Matches where the selector contains the given text.

E.g. `{ "tx": { "labels": ["URGENT-PENDING"] }}` will match, `{ "tx": { "labels": ["PENDING"] }}` will not.

### Regex Tests

Just like JQ you can test if a regex matches a field

```
.key.id | test(".*tx")
```

True when `.key.id` matches the regex `.*tx`.

### Negated Filter

```
.[0].tx.status | contains("PENDING") | not
```

Matches where the selector does not contain text.

### Quoted and Clojure Selectors / Scalars

```
.value."price.with.tax" > 10 and
.value."category!" == :seasonal
```

kJQ understands quoted keys and Clojure data such as keywords.

### Multiple Filters (And)

```
.value.tx.amount > 10 and
.value.tx.amount == .value.tx.discount and
.value.tx.labels[0] | contains("URGENT")
```

Matches where **every** filter is true.

### Multiple Filters (Or)

```
.value.tx.amount == .value.tx.discount or
.value.tx.labels[0] | contains("URGENT")
```

Matches where **any** filter is true.

### Multiple Filters (Mixed)

Combine multiple filters with **and**, **or**, and **explicit precedence**.

```
(.key.currency == "GBP" and
 .value.tx.price | to-double < 16.50 and
 .value.tx.pan | endswith("8649")) or 
(.key.currency == "GBP" and 
 .value.tx.discount == "3.98")
```

### UUIDs

kJQ supports UUID types out of the box, for example from the `UUID` deserializer, `AVRO` with logical types, or the `Transit / JSON` and `EDN` deserializers that have richer data types.

To compare against literal UUID strings, prefix them with `#uuid` to coerce them into a UUID:

#### Basic usage

```kjq
.key == #uuid "fc1ba6a8-6d77-46a0-b9cf-277b6d355fa6"
```
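One way to see why the literal needs coercing is a typed comparison in another language. The Python analogy below is not kJQ itself: it only illustrates that a UUID value and its string form are different types, and how kJQ treats an uncoerced string literal is not specified here.

```python
import uuid

# What a UUID-aware deserializer would hand back for the record key.
record_key = uuid.UUID("fc1ba6a8-6d77-46a0-b9cf-277b6d355fa6")
literal = "fc1ba6a8-6d77-46a0-b9cf-277b6d355fa6"

# A UUID value is not equal to its plain string form.
print(record_key == literal)

# It is equal once the literal is coerced, which is the role #uuid plays in the filter.
print(record_key == uuid.UUID(literal))
```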

### Date Filtering with `from-date`

kJQ supports converting field values into ISO-8601 timestamp strings using the `from-date` transform. This includes:

* **AVRO logicalType** `date` fields (encoded as int = days since epoch)

* **UNIX timestamps** (number of seconds or milliseconds since epoch)

* **ISO-8601 formatted strings**

To compare against literal date strings, prefix them with `#dt` to coerce them into a timestamp.

#### Basic usage

```kjq
.value.tx.start_date | from-date > #dt "2023-01-01T00:00:00Z"
```

Matches records where `.value.tx.start_date` is after January 1st, 2023 (00:00 UTC).

```kjq
.value.tx.start_date | from-date >= #dt "2023-01-01T00:00:00Z" and .value.tx.start_date | from-date <= #dt "2023-12-31T00:00:00Z"
```

Matches records where `.value.tx.start_date` falls within 2023 (inclusive). Note that the upper bound is midnight UTC at the start of December 31st.

```kjq
.value.tx.start_date | from-date < #dt "2023-03-01T00:00:00Z"
```

Matches records where `.value.tx.start_date` is before March 1st, 2023.
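The date filters above compare points in time, so it can help to see the three supported input shapes reduced to a common form. The Python sketch below is a model under assumptions, not kJQ's implementation: `from_date` and its `kind` hint are hypothetical (kJQ needs no such hint), and comparison on exact instants is assumed.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_date(value, kind: str) -> datetime:
    """Normalise a field value to a UTC datetime. `kind` is a hypothetical hint."""
    if kind == "avro-date":        # int: days since epoch
        return EPOCH + timedelta(days=value)
    if kind == "unix-seconds":     # number of seconds since epoch
        return EPOCH + timedelta(seconds=value)
    if kind == "unix-millis":      # number of milliseconds since epoch
        return EPOCH + timedelta(milliseconds=value)
    if kind == "iso-8601":
        # Older Python versions reject a trailing "Z", so spell out the offset.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    raise ValueError(f"unknown kind: {kind}")


start = from_date("2023-01-01T00:00:00Z", "iso-8601")
end = from_date("2023-12-31T00:00:00Z", "iso-8601")

# All three encodings of 2023-01-01 land on the same instant.
print(from_date(19358, "avro-date") == start)
print(from_date(1672531200, "unix-seconds") == start)
print(from_date(1672531200000, "unix-millis") == start)

# Midday on December 31st is later than the 00:00 upper bound.
print(from_date("2023-12-31T12:00:00Z", "iso-8601") <= end)
```

Under that assumption, a range meant to cover a whole final day needs an upper bound later than midnight at the start of that day.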


### Filtering by Record Size

kJQ supports filtering based on metadata fields such as record size, key size, and value size.

#### Record size

```kjq
.size > 1500
```

Matches records whose total serialized size exceeds 1500 bytes.

#### Key size

```kjq
.key-size | to-long < 500
```

Matches records with key payloads under 500 bytes.

#### Value size

```kjq
.value-size >= 1024
```

Matches records where the value is at least 1 KB (1024 bytes) in size.

These metadata fields are always available in the record envelope, regardless of key/value format.

#### Combined size

```kjq
(.key-size + .value-size) < 30
```

Filters records where the total key + value size is less than 30 bytes.

#### Null checks

```kjq
.key == null
```

Filters records that have no key (null keys).

## String Slice Examples

### Basic String Slicing

```kjq
.value.transaction_id[0:3] == "TXN"
```

Matches where the first 3 characters of `transaction_id` equal "TXN".

E.g. `{ "transaction_id": "TXN12345" }` will match, `{ "transaction_id": "ORD12345" }` will not.

### Missing Start Index (from beginning)

```kjq
.key.account_number[:4] == "ACCT"
```

Matches where the first 4 characters equal "ACCT".

E.g. `{ "account_number": "ACCT9876" }` will match.

### Missing End Index (to end)

```kjq
.value.filename[4:] | endswith(".json")
```

Matches where everything after the 4th character ends with ".json".

E.g. `{ "filename": "data2023-export.json" }` will match.

### Full Slice (entire string)

```kjq
.value.message[:] | contains("ERROR")
```

Matches where the entire message contains "ERROR".

Equivalent to `.value.message | contains("ERROR")`.

### Negative Indices (from end)

```kjq
.value.log_entry[-5:] == "ERROR"
```

Matches where the last 5 characters equal "ERROR".

E.g. `{ "log_entry": "2023-06-20T10:00:00 FATAL_ERROR" }` will match.

### Complex Nested String Slicing

```kjq
.value.events[0].timestamp[0:10] == "2023-06-20"
```

Matches where the first event's timestamp starts with the date "2023-06-20".

E.g. `{ "events": [{"timestamp": "2023-06-20T10:00:00Z"}] }` will match.

### String Slice Comparisons

```kjq
.value.order_id[0:3] == .value.customer_id[0:3]
```

Matches where the first 3 characters of `order_id` match the first 3 characters of `customer_id`.

E.g. `{ "order_id": "ABC123", "customer_id": "ABC456" }` will match.

### Combined with Other Operations

```kjq
.value.user.email[-10:] | endswith(".com") and .value.user.email[0:5] != "admin"
```

Matches where the last 10 characters of the email end with ".com" and the first 5 characters are not "admin".
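The slice examples above behave like half-open `[start:end]` ranges with optional bounds and negative indexes counted from the end. Python slices follow the same convention, so the sketch below replays the documented examples as a cross-check. It is an analogy only; how kJQ handles strings shorter than the requested slice is not covered here.

```python
# Each entry pairs a kJQ filter from this section with its Python equivalent.
checks = {
    '.value.transaction_id[0:3] == "TXN"': "TXN12345"[0:3] == "TXN",
    '.key.account_number[:4] == "ACCT"': "ACCT9876"[:4] == "ACCT",
    '.value.filename[4:] | endswith(".json")': "data2023-export.json"[4:].endswith(".json"),
    '.value.log_entry[-5:] == "ERROR"': "2023-06-20T10:00:00 FATAL_ERROR"[-5:] == "ERROR",
    '.value.events[0].timestamp[0:10] == "2023-06-20"': "2023-06-20T10:00:00Z"[0:10] == "2023-06-20",
    '.value.order_id[0:3] == .value.customer_id[0:3]': "ABC123"[0:3] == "ABC456"[0:3],
}

for filter_text, matched in checks.items():
    print(f"{matched!s:5} {filter_text}")

# A record that should not match the first filter.
non_match = "ORD12345"[0:3] == "TXN"
print(f"{non_match!s:5} transaction_id ORD12345 against [0:3] == \"TXN\"")
```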

## Alternatives and Arithmetic Examples

### Alternative Operator (//)

```kjq
.value.customer_name // .value.customer_id
```

Uses `customer_name` if it exists and is not null, otherwise falls back to `customer_id`.

E.g. `{ "customer_id": "C123" }` will return "C123", `{ "customer_name": "John", "customer_id": "C123" }` will return "John".

### Alternative with Default Values

```kjq
.value.discount // 0 > 5
```

Uses `discount` if present, otherwise defaults to 0, then checks if the result is greater than 5.

E.g. `{ "price": 100 }` will use 0 and return false, `{ "discount": 10 }` will return true.

### Chained Alternatives

```kjq
.value.primary_email // .value.secondary_email // .value.contact_email | endswith(".com")
```

Uses the first non-null email address and checks if it ends with ".com".
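The alternative operator can be modelled as "take the first candidate that is not null". The Python sketch below follows the descriptions in this section. The `alt` helper and sample records are hypothetical, and treating only null as missing is an assumption drawn from the wording above (standard JQ also falls back when the left side is `false`).

```python
def alt(*candidates):
    """Return the first candidate that is not None, or None if all are missing."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


# .value.customer_name // .value.customer_id
value = {"customer_id": "C123"}
print(alt(value.get("customer_name"), value.get("customer_id")))

# .value.discount // 0 > 5
print(alt({"price": 100}.get("discount"), 0) > 5)
print(alt({"discount": 10}.get("discount"), 0) > 5)

# .value.primary_email // .value.secondary_email // .value.contact_email | endswith(".com")
contact = {"secondary_email": "ops@example.com"}
email = alt(
    contact.get("primary_email"),
    contact.get("secondary_email"),
    contact.get("contact_email"),
)
print(email.endswith(".com"))
```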

## Arithmetic Operations

### Basic Addition

```kjq
(.value.base_price + .value.tax) > 100
```

Matches where the sum of base price and tax exceeds 100.

E.g. `{ "base_price": 80, "tax": 25 }` will match.

### Subtraction for Calculations

```kjq
(.value.credit_limit - .value.current_balance) < 1000
```

Matches where remaining credit is less than 1000.

E.g. `{ "credit_limit": 5000, "current_balance": 4500 }` will match.

### Multiplication for Totals

```kjq
(.value.quantity * .value.unit_price) >= .value.minimum_order
```

Matches where the total order value meets the minimum requirement.

E.g. `{ "quantity": 5, "unit_price": 20, "minimum_order": 90 }` will match.

### Division for Ratios

```kjq
(.value.successful_requests / .value.total_requests) >= 0.95
```

Matches where the success rate is at least 95%.

E.g. `{ "successful_requests": 950, "total_requests": 1000 }` will match.

### Modulo for Patterns

```kjq
.value.transaction_id | to-long % 10 == 0
```

Matches transactions whose numeric ID is divisible by 10 (IDs ending in 0).

E.g. `{ "transaction_id": "1230" }` will match.

### Complex Arithmetic with Size Calculations

```kjq
(.key-size + .value-size) * 8 > 1024
```

Matches where the combined key and value size in bits exceeds 1024 (that is, more than 128 bytes). This combines record metadata with arithmetic operations.

### Arithmetic with Alternatives

```kjq
.value.discount // 0 + .value.coupon_value // 0 > 50
```

Calculates total savings using fallback values for missing fields.

E.g. `{ "discount": 30 }` gives 30 + 0 = 30 and will not match, `{ "discount": 40, "coupon_value": 15 }` gives 40 + 15 = 55 and will match.
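The worked numbers above can be checked with a small Python model. This is a sketch under assumptions rather than kJQ's evaluator: `total_savings` is a hypothetical helper, and it groups the expression the way the prose describes, with each fallback applied before the addition and the comparison applied last.

```python
def total_savings(value: dict):
    """(.value.discount // 0) + (.value.coupon_value // 0), grouped as the prose describes."""
    discount = value.get("discount")
    coupon = value.get("coupon_value")
    discount = discount if discount is not None else 0   # .value.discount // 0
    coupon = coupon if coupon is not None else 0         # .value.coupon_value // 0
    return discount + coupon


samples = [
    {"discount": 30},
    {"discount": 40, "coupon_value": 15},
    {},
]

for value in samples:
    savings = total_savings(value)
    # The final comparison decides whether the record matches the filter.
    print(savings, savings > 50)
```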
