Basic searching and string operators

By Gianni Castaldi

In this blog post, we will learn which string operator to use and when to use. We will also learn some basic queries to discover the amount of data in a Log Analytics Workspace.

The basic string operators that we can use are:

  • ==
  • has
  • contains
  • startswith
  • endswith
  • matches regex
  • has_any

In the SQL to KQL blog post, we used the evaluation data of the MITRE ATP29 test to test our queries. Because this blog post will also be about performance we want to use bigger data set in the form of the Log Analytics Demo environment. The importance of performance and optimizing queries comes from the limits in the Log Analytics. The time limit for queries is 10 minutes, and there are also limits on the amount of data a query can return. If we open the environment we have to sign in with a Microsoft account. For the demo we will change the time range to 7 days and run the following search:

search *
| summarize count()

So we have a lot of data, but how many tables do we have?

search *
| summarize dcount($table)

And which tables do we have and can we also sort them by count in descending order?

search *
| summarize count() by $table
| sort by count_ desc

Today the largest $table is the ContainerLog so let’s see which Computers are reporting container data.

ContainerLog
| summarize count() by Computer
| sort by count_ desc

So the Computer with the largest log count is the aks-agentpool-13012534-1. The next step is to query 10 random records to view the contents. To do this we will use the equal operator.

ContainerLog
| where Computer == "aks-agentpool-13012534-1"
| take 10

If we want to query 10 records where Computer is not aks-agentpool-13012534-1 we will use the following not equal query:

ContainerLog
| where Computer != "aks-agentpool-13012534-1"
| take 10

In the earlier query, we summarized the names and saw that all the names were in lower case. But we can also use the equal operator case insensitive.

ContainerLog
| where Computer =~ "AKS-AGENTpool-13012534-1"
| take 10

We can also do a not equal case insensitive with the following string operator: !~

So what about the performance? I have compared the following 2 queries by running them 5 times:

ContainerLog
| where Computer =~ "aks-agentpool-13012534-1"
| count
ContainerLog
| where Computer == "AKS-AGENTpool-13012534-1"
| count
Case insensitive search in secondsCase sensitive search in seconds
2,8752,022
2,5292,249
2,4852,213
2,1271,952
2,5571,916
avg = 2,515 secondsavg = 2,070 seconds

This is not a scientific test, but it does show that a simple query can be 20% faster if you can be case sensitive.

Let’s move on to the following step. We have queried aks-agentpool-13012534-1, but there is also aks-agentpool-13012534-0. We can query on both by using the has operator.

ContainerLog
| where Computer has "13012534"

Since the query only returns a lot of lines, we will improve it by summarizing the results by Computer and LogEntry. This way we can get a feeling about how mine times a certain log entry was logged on one of the two agent pools.

ContainerLog
| where Computer has "13012534"
| summarize count() by Computer, LogEntry
| sort by count_ desc

I think someone should take a look at aks-agentpool-13012534-1 for the log entry: “Process is terminating due to StackOverflowException.” Let’s make sure that the terminating error is not in the other pools.

ContainerLog
| where Computer !has "13012534"
| where LogEntry == "Process is terminating due to StackOverflowException. "
| summarize count() by Computer, LogEntry

Where the equal operator is case sensitive, the has operator is case insensitive. To add case sensitivity we can use the _cs. The performance gain for case sensitivity is a bit smaller when using the has operator.

I have compared the following two queries:

ContainerLog
| where Computer has "AKS-agentpool-"
| count
ContainerLog
| where Computer has_cs "aks-agentpool-"
| count
Case insensitive search in secondsCase sensitive search in seconds
2,6542,235
2,5242,720
2,7982,317
2,9142,016
2,4722,071
avg = 2,672 secondsavg = 2,272 seconds

Instead of 20% (== vs =~) its 17% (has_cs vs has) faster.

So we know how to use has but how does it work? In the workspace, strings are broken into terms. A term is a set of 4 or more ASCII alphanumeric characters. So the item: aks-agentpool-13012534-1, has the following terms: agentpool and 13012534.

If we want to display all items where the computer has the term: pool, then the following command will not work because there is no term pool.

ContainerLog
| where Computer has "pool"

So instead of using has operator we can also use hasprefix or hassuffix operators. In the case of the agentpool we can use hassuffix to query all items with terms that end on pool.

ContainerLog
| where Computer hassuffix "pool"

The hassuffix and hasprefix operators can also be used with ! to negate the operator, and used with _cs for case sensitivity.

After discovering the equal and has operator the next step is the contains operator. This operator is great when you know that something exists somewhere. We will discover which tables contain aks-agentpool-13012534-1.

search *
| where * contains "aks-agentpool-13012534-1"
| distinct $table

The contains operator also uses _cs and ! for case sensitivity and negates.

After the contains operator we will look at the startswith and endswith operator. If you only want to query the start of an item and not the start of each term, then this is the way.

ContainerLog
| where Computer startswith "aks"

As the has and the contains operators the startswith and endswith operators are case insensitive. To make them case sensitive you can append _cs. To negate you can prepend !.

The next operator we will use is the matches regex operator. Which can be used to do complex queries on single items. For example what if we wanted to see all Computers which start with aks and contain 19 followed by 6 digits.

ContainerLog
| where Computer matches regex "^aks.*-19([0-9]){6}"
| distinct Computer

While we are nearing the end of this blog post we still have to operators left. The in and the has_any operator. We will continue with the in operator.

The in operator is case sensitive by itself so if we want case insensitivity we have to use the in~ operator, and you can use !in to negate. In the next query, we will do a count of all records for the aks-linuxpool-19400979-vmss000000 and the aks-linuxpool-19400979-vmss000001.

ContainerLog
| where Computer in ("aks-linuxpool-19400979-vmss000000","aks-linuxpool-19400979-vmss000001")
| summarize count() by Computer

The last operator of this blog post is the has_any operator. We will look for terms containing 19400979 and 19037964 and count the records by Computer.

ContainerLog
| where Computer has_any ("19400979","19037964")
| summarize count() by Computer

And with the last query, we will wrap up this blog post. We have started with some basic searches to determine which table we would use for this blog post. After that we learned what the following operators do: ==, has, contains, startswith, endswith, matches regex, has_any and that case sensitive searches are faster than case insensitive searches. The reason you want to optimize your queries is because queries have a time limit and they have an output limit.

Thanks for reading and if you have any questions or ideas for a blog post let me know.

Alternative Text

By Gianni Castaldi

MVP | NinjaCat | Researching and Engineering Cyber Security @ KustoWorks

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close