Use Where Clause in Cassandra - Cassandra / CQL Tutorial

 

How to use Where Clause in Cassandra (CQL)

Where clause in Cassandra Query Language is used to filter the records.  To filter the records, you will be using 

  • Partition Key ( Single or composite)
  • A cluster column with Partition Key
  • A column on which you have created the index
  • A column which is not part of Partition Key or no index is created on it, you can use in where clause but you have to use with Allow Filtering.

Allow Filtering:

Allow filtering is not a good choice, as Cassandra will load all the records and then filter the records which you do not need. If you have table with alot of records, the query can be very expensive as resource usage and can take long time to run. Better options could be designing the partition key as per your input queries or create indexes on columns which you need to use in where clause.

 

Example:

Let's create employee table with composite Parition key( Employeeid,Fname) and cluster column LName and insert sample data.

CQLSH:techbrotherstutorials>CREATE TABLE employee 
             ( 
                          employeeid INT , 
                          fname TEXT, 
                          lname TEXT, 
                          address TEXT, 
                          age TINYINT, 
                          PRIMARY KEY((employeeid,fname),lname) 
             ) 
             WITH clustering 
ORDER BY     ( 
                          lname DESC);

 

Insert records in employee table by using CQL

INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (1,'John','Doe','ABC Address',23);
INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (2,'Robert','Ladson',' Address',40);
INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (3,'M','Raza','New Address',38); 
INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (4,'M','john','New Address',38); 

 

Where Clause with Partition Key: 

As we have composite Partition Key ( Employeeid and Fname), we can filter the records by using these columns. If I try to use one of these columns then we will get below error.


InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

We can use Allow Filtering and it will work but remember it is going to bring all the records and then filter the records which we don't need. That can be expensive operation. In our case we have only 3 records so no big deal but think if you have millions of records in table.

Here is right way to get the data from our table, I have two columns as Partition Key so I am using both of them in where clause. if you have single partition key column, you will be using that.

CQLSH:techbrotherstutorials>>SELECT * 
FROM   employee 
WHERE  employeeid=3 
AND    fname='M';

Output

 employeeid | fname | lname | address     | age
------------+-------+-------+-------------+-----
          
3 |     M |  Raza | New Address |  26

(1 rows)

 

Use Clustering Column in Where Clause: 

From table definition you can see that we have clustering column "lname". To use clustering column properly, we should use Partition key columns first in where and then clustering column to get results fast. If we will use clustering column alone, Cassandra will throw the warning and ask to use Allow Filtering. As discusses above, it can be expensive operation.

Let's write our query by using cluster column. Couple of things to remember, 

The values you provide in where clauses are case senstive

 

CQLSH:techbrotherstutorials>SELECT employeeid,
       fname,
       lname,
       address,
       age
FROM   employee
WHERE  employeeid=1
AND    fname='John'
AND    lname='Doe';

Output

 

 employeeid | fname | lname | address      | age
------------+-------+-------+--------------+-----
      
    1 |  John |   Doe |  ABC Address |  23

Use Index Column in Where Clause:

Now if we would like to filter the data by using Age,  you can use without index on the age column but we have to use Allow Filtering and the can be expensive operation. Let's create index on Age column by using below query and then we will be able to use Age column in where clause without Allowing Filtering.

 

CQLSH:techbrotherstutorials>CREATE INDEX indexname ON tablename  (columnname );

 

CQLSH:techbrotherstutorials>CREATE INDEX employee_age ON employee (age);

 

Now we are good to use the age column in our where clause. Below script will get us all the records where age=40.

CQLSH:techbrotherstutorials>SELECT * 
FROM   employee 
WHERE  age=40;

Output 

 employeeid | fname  | lname  | address      | age
------------+--------+--------+--------------+-----
         
2 | Robert | Ladson |  Xyz Address |  40