Get Row Count from Table in Cassandra - Cassandra / CQL Tutorial

How to get Row Count from Table in Cassandra by using CQL

You can use Count(*) for Count(1) to get row numbers from table. 

Syntax:

SELECT Count(* |1 ) 
FROM   tablename;

 

Example:

Let's create employee table by using below CQL Statement.

CQLSH:techbrotherstutorials>CREATE TABLE employee 
             ( 
                          employeeid INT , 
                          fname TEXT, 
                          lname TEXT, 
                          address TEXT, 
                          age TINYINT, 
                          PRIMARY KEY((employeeid,fname),lname) 
             ) 
             WITH clustering 
ORDER BY     ( 
                          lname DESC);

Insert sample data by using CQL insert statement

INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (1,'John','Doe','ABC Address',23);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (2,'Robert','Ladson',' Address',40);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Raza','New Address',38);
            INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Ali','New Address',38);  

 

Get Total Row Count from Table

If we want to get total row count from employee table, we can use below statement

CQLSH:techbrotherstutorials>SELECT Count(*) 
FROM   employee;

Or 

 CQLSH:techbrotherstutorials>SELECT Count(1) 
FROM   employee;

 

Output

 count
-------
     4

(1 rows)

Warnings :
Aggregation query used without partition key

 

Get  Row Count with Where Clause:

You can use where clause in your Select query when geting the row count from table.If you are using where clause with partition keys , you will be good but if you try to use where clause with non partition key columns you will get a warning and will have to use Allow Filtering in select query to get row count. This can be expensive operation if you have large table.

Let's get row count where employeeid=3 and fname in ('M','J'), Remeber employeeid and fname are the partition key columns.

CQLSH:techbrotherstutorials>SELECT Count(1) 
FROM   employee 
WHERE  employeeid=3 
AND    fname IN('M', 
                'J')
;

output

 count
-------
     2

(1 rows)

Warnings :
Aggregation query used on multiple partition keys (IN restriction) 

 

Let's say now if we want to get the row count where age> 25, Notice that Age is not Partition Key column, we have to use Allow Filtering otherwise we will get below warning.

CQLSH:techbrotherstutorials>SELECT Count(*) 
FROM   employee 
WHERE  age>25;

 

Ouput:

Above query is going to throw a warning.

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
 

Let's write the query with allow filtering to get the desired records.

SELECT Count(*) 
FROM   employee 
WHERE  age>25 allow filtering;

Output

 count
-------
 
   3

(1 rows)

Warnings :
Aggregation query used without partition key

How to Limit returned rows in Cassandra - Cassandra / CQL Tutorial

How to Limit returned rows in Cassandra Query Language

You can use LIMIT option in query to control the number of rows you want to get. 

Syntax:

CQLSH:techbrotherstutorials>SELECT column1, 
       column2, 
       ...FROM tablename LIMIT numberofrow;

 

Example:

Let's create employee table by using below CQL Statement.

CQLSH:techbrotherstutorials>CREATE TABLE employee 
             ( 
                          employeeid INT , 
                          fname TEXT, 
                          lname TEXT, 
                          address TEXT, 
                          age TINYINT, 
                          PRIMARY KEY((employeeid,fname),lname) 
             ) 
             WITH clustering 
ORDER BY     ( 
                          lname DESC);

Insert sample data by using CQL insert statement

INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (1,'John','Doe','ABC Address',23);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (2,'Robert','Ladson',' Address',40);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Raza','New Address',38);
            INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Ali','New Address',38);  

 

Now if we want to get only 2 records then we can use below statement. Cassandra is going to return first 2 records.

CQLSH:techbrotherstutorials>SELECT * 
FROM   employee LIMIT 2;

 

Output

 employeeid | fname | lname | address     | age
------------+-------+-------+-------------+-----
       
  3 |     M |  Raza | New Address |  38
          3 |     M |   Ali | New Address |  38

Use Distinct in Cassandra - Cassandra / CQL Tutorial

How to use Distinct in Cassandra Query Language

In cassandra you can only select the distinct records from Partition Key column or columns.  If Partition key consists of multiple columns, you have to provide all of the columns otherwise you will get an error.

Syntax:

CQLSH:techbrotherstutorials>SELECT DISTINCT partitioncolumn1, 
                partitioncolumn2... 
from            TableName;

 

Example:

Let's create employee table by using below CQL Statement.

CQLSH:techbrotherstutorials>CREATE TABLE employee 
             ( 
                          employeeid INT , 
                          fname TEXT, 
                          lname TEXT, 
                          address TEXT, 
                          age TINYINT, 
                          PRIMARY KEY((employeeid,fname),lname) 
             ) 
             WITH clustering 
ORDER BY     ( 
                          lname DESC);

Notice that I have primary key with columns employeeid,fname and lname. Our partiton key consist of employeeid and fname. We can insert duplicate records in partiton key but all the records will be unique according to primary key (employeeid,fname,lname).

INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (1,'John','Doe','ABC Address',23);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (2,'Robert','Ladson',' Address',40);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Raza','New Address',38);
            INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Ali','New Address',38);  

 

 I inserted 4 records in table, we have duplicate record 3 and 'M'  by partition key columns. If we want to get distinct records from Partition Key column, we can use below statement. 

CQLSH:techbrotherstutorials>SELECT DISTINCT employeeid, 
                fname 
FROM            employee;

 

Output

 employeeid | fname
------------+--------
          3 |      M
          1 |   John
          2 | Robert

(3 rows)

Use Where Clause in Cassandra - Cassandra / CQL Tutorial

 

How to use Where Clause in Cassandra (CQL)

Where clause in Cassandra Query Language is used to filter the records.  To filter the records, you will be using 

  • Partition Key ( Single or composite)
  • A cluster column with Partition Key
  • A column on which you have created the index
  • A column which is not part of Partition Key or no index is created on it, you can use in where clause but you have to use with Allow Filtering.

Allow Filtering:

Allow filtering is not a good choice, as Cassandra will load all the records and then filter the records which you do not need. If you have table with alot of records, the query can be very expensive as resource usage and can take long time to run. Better options could be designing the partition key as per your input queries or create indexes on columns which you need to use in where clause.

 

Example:

Let's create employee table with composite Parition key( Employeeid,Fname) and cluster column LName and insert sample data.

CQLSH:techbrotherstutorials>CREATE TABLE employee 
             ( 
                          employeeid INT , 
                          fname TEXT, 
                          lname TEXT, 
                          address TEXT, 
                          age TINYINT, 
                          PRIMARY KEY((employeeid,fname),lname) 
             ) 
             WITH clustering 
ORDER BY     ( 
                          lname DESC);

 

Insert records in employee table by using CQL

INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (1,'John','Doe','ABC Address',23);
INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (2,'Robert','Ladson',' Address',40);
INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (3,'M','Raza','New Address',38); 
INSERT INTO employee 
            
(employeeid,fname,lname,address,age) 
            
VALUES
 (4,'M','john','New Address',38); 

 

Where Clause with Partition Key: 

As we have composite Partition Key ( Employeeid and Fname), we can filter the records by using these columns. If I try to use one of these columns then we will get below error.


InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

We can use Allow Filtering and it will work but remember it is going to bring all the records and then filter the records which we don't need. That can be expensive operation. In our case we have only 3 records so no big deal but think if you have millions of records in table.

Here is right way to get the data from our table, I have two columns as Partition Key so I am using both of them in where clause. if you have single partition key column, you will be using that.

CQLSH:techbrotherstutorials>>SELECT * 
FROM   employee 
WHERE  employeeid=3 
AND    fname='M';

Output

 employeeid | fname | lname | address     | age
------------+-------+-------+-------------+-----
          
3 |     M |  Raza | New Address |  26

(1 rows)

 

Use Clustering Column in Where Clause: 

From table definition you can see that we have clustering column "lname". To use clustering column properly, we should use Partition key columns first in where and then clustering column to get results fast. If we will use clustering column alone, Cassandra will throw the warning and ask to use Allow Filtering. As discusses above, it can be expensive operation.

Let's write our query by using cluster column. Couple of things to remember, 

The values you provide in where clauses are case senstive

 

CQLSH:techbrotherstutorials>SELECT employeeid,
       fname,
       lname,
       address,
       age
FROM   employee
WHERE  employeeid=1
AND    fname='John'
AND    lname='Doe';

Output

 

 employeeid | fname | lname | address      | age
------------+-------+-------+--------------+-----
      
    1 |  John |   Doe |  ABC Address |  23

Use Index Column in Where Clause:

Now if we would like to filter the data by using Age,  you can use without index on the age column but we have to use Allow Filtering and the can be expensive operation. Let's create index on Age column by using below query and then we will be able to use Age column in where clause without Allowing Filtering.

 

CQLSH:techbrotherstutorials>CREATE INDEX indexname ON tablename  (columnname );

 

CQLSH:techbrotherstutorials>CREATE INDEX employee_age ON employee (age);

 

Now we are good to use the age column in our where clause. Below script will get us all the records where age=40.

CQLSH:techbrotherstutorials>SELECT * 
FROM   employee 
WHERE  age=40;

Output 

 employeeid | fname  | lname  | address      | age
------------+--------+--------+--------------+-----
         
2 | Robert | Ladson |  Xyz Address |  40

 

Select Data from Table in Cassandra - Cassandra / CQL Tutorial

Select Data from Table in Cassandra

To Select the data from Cassandra , Select statement is used. If you need to select all columns , you can use "*" or provide list of column names. I am not using Where clause and other options in this post, Let's discuss them in next posts.

Syntax:

cqlsh>SELECT * FROM   tablename; 

 Or

cqlsh>SELECT column1,  column2,  column3, ....FROM tablename;

ALias:

You can also Alias the column names, you will be using "AS AliasName". if you want to use space in Alias Column name, you need to use double quotes around it e.g "My Alias Column Name"

cqlsh>SELECT column1 AS ColumnOne,        column2 AS ColumnTwo,        column3 AS "Column Three" FROM   tablename;

 

Example:

Let's create employee table by using CQL create statement and insert  sample data and then we will use Select statement to query the data.

CQLSH:techbrotherstutorials>CREATE TABLE employee 
             ( 
                          employeeid INT , 
                          fname TEXT, 
                          lname TEXT, 
                          address TEXT, 
                          age TINYINT, 
                          PRIMARY KEY((employeeid,fname),lname) 
             ) 
             WITH clustering 
ORDER BY     ( 
                          lname DESC);

 

Insert data into table by using CQL Insert command

INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (1,'John','Doe','ABC Address',23);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (2,'Robert','Ladson',' Address',40);
INSERT INTO employee 
            (employeeid,fname,lname,address,age) 
            VALUES (3,'M','Raza','New Address',38); 

 

Let's select all the records from employee table. Employee table in my case is in TechBrothersTutorials keyspace, I am already in TechBrothersTutorial keyspace. If you need to be in keyspace where your table exists, you can use "USE KEYSPACENAME".

CQLSH:techbrotherstutorials>SELECT * FROM   employee; 

Output

  employeeid | fname  | lname  | address      | age

------------+--------+--------+--------------+-----
         
3 |      M |   Raza |  New Address |  26
          1 |   John |    Doe |  ABC Address |  23
          2 | Robert | Ladson |  Xyz Address |  40

(3 rows)
 

 

Let's use column names in our Select query and alias , I am going to fname to FirstName, lname to LastName and address to "Home Address".

CQLSH:techbrotherstutorials>SELECT employeeid, 
       fname   AS FirstName, 
       lname   AS LastName, 
       address AS "Home Address", 
       age 
FROM   employee; 

Output 

 

  employeeid | firstname  | lastname  | Home Address      | age

------------+--------+--------+--------------+-----
         
3 |      M |   Raza |  New Address |  26
          1 |   John |    Doe |  ABC Address |  23
          2 | Robert | Ladson |  Xyz Address |  40

(3 rows)