HBase: quickly count the number of rows

Tags : BigData hadoop HBase

Rishi Pandya

Right now I count rows by iterating over a ResultScanner like this:
```
long number = 0;
// Walk every row returned by the scanner and count it on the client side
for (Result rs = scanner.next(); rs != null; rs = scanner.next()) {
    number++;
}
```
Once the table holds millions of rows, this takes a long time. I want to compute the count in real time, so I don't want to use MapReduce.

How can I quickly count the number of rows?
September 28, 2020 1:27 PM IST

Viaan Prakash

Go to the HBase home directory and run this command:

```
./bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'namespace:tablename'
```

This launches a MapReduce job, and its output shows the number of rows in the HBase table.

August 18, 2021 2:29 PM IST

Vaibhav Mali

If you're using a scanner, have it return as few qualifiers as possible, and make the qualifier(s) you do return the smallest (in bytes) that you have available; for example, a FirstKeyOnlyFilter returns only the first cell of each row. This can speed up your scan considerably.

Unfortunately, this only scales so far (millions to billions of rows?). To go further and get the count in real time, you first need to run a MapReduce job that counts all rows once.

Store the MapReduce result in a cell in HBase. Every time you add a row, increment that counter by 1; every time you delete a row, decrement it.

When you need the row count in real time, read that cell.

Otherwise, there is no fast way to count rows that scales: a full scan can only go so fast.
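A minimal sketch of the counter bookkeeping described above, written as an in-memory Java model rather than HBase client code: a HashMap stands in for the table and a long field stands in for the counter cell (in HBase you would update that cell with incrementColumnValue). The class and method names are hypothetical. The model makes one subtlety explicit: an HBase Put on an existing row key overwrites that row instead of adding a new one, so incrementing on every write would overcount. The sketch only counts a key the first time it appears, and only decrements for keys that actually existed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model (not HBase code) of keeping a row count next to a table.
 * The map plays the role of the table; the long field plays the role of the counter cell.
 */
final class CountedTable {
    private final Map<String, String> rows = new HashMap<>();
    private long rowCount = 0; // stands in for the HBase counter cell

    /** Insert or overwrite a row; only a brand-new row key bumps the counter. */
    void put(String rowKey, String value) {
        boolean isNew = !rows.containsKey(rowKey);
        rows.put(rowKey, value);
        if (isNew) {
            rowCount++;
        }
    }

    /** Delete a row; only a row that actually existed lowers the counter. */
    void delete(String rowKey) {
        if (rows.containsKey(rowKey)) {
            rows.remove(rowKey);
            rowCount--;
        }
    }

    /** Constant-time read of the maintained count, instead of scanning every row. */
    long rowCount() {
        return rowCount;
    }
}
```

In a real cluster, the counter would first be seeded from a one-off RowCounter run, as the answer describes. Also, the data write and the counter update are separate operations on different rows, so a client that fails between them can leave the count slightly off.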

January 1, 2022 2:14 PM IST

Shivakumar Kota
Use RowCounter. In HBase, RowCounter is a MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency. It runs the MapReduce job in a single process, but it runs faster if you have a MapReduce cluster in place for it to exploit.
```
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>

Usage: RowCounter [options] 
    <tablename> [          
        --starttime=[start] 
        --endtime=[end] 
        [--range=[startKey],[endKey]] 
        [<column1> <column2>...]
    ]
```
September 28, 2020 5:49 PM IST

Nitara Bobal
A simple, effective, and efficient way to count rows in HBase:

1. Whenever you insert a row, call this API to increment a dedicated counter cell:
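```
// table: an open HTable/Table instance; counter cell = row "count", family "details", qualifier "count"
table.incrementColumnValue(Bytes.toBytes("count"), Bytes.toBytes("details"), Bytes.toBytes("count"), 1);
```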
2. To check the number of rows in the table, use the Get or Scan API to read that particular row, 'count'.

With this method you can get the row count in less than a millisecond.
(This post was edited by Nitara Bobal at September 28, 2020 5:54 PM IST.)
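When you read the counter back with a Get, the cell value is not a human-readable number: incrementColumnValue stores it as an 8-byte big-endian long, which the Java client decodes with Bytes.toLong(result.getValue(family, qualifier)). The sketch below reproduces that encoding and decoding with java.nio.ByteBuffer only, so it runs without the HBase jars; the class name CounterCodec is hypothetical. In the HBase shell, get_counter does the same decoding for you.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper mirroring how HBase stores counter cells:
 * 8 bytes, big-endian, the same layout as Bytes.toBytes(long) / Bytes.toLong(byte[]).
 */
final class CounterCodec {
    private CounterCodec() {}

    /** Decode the raw value of a counter cell returned by a Get. */
    static long decode(byte[] cellValue) {
        if (cellValue == null || cellValue.length != Long.BYTES) {
            throw new IllegalArgumentException("counter cells hold exactly 8 bytes");
        }
        // ByteBuffer defaults to big-endian byte order, matching HBase's encoding.
        return ByteBuffer.wrap(cellValue).getLong();
    }

    /** Encode a long into the 8-byte big-endian form a counter cell holds. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }
}
```

For example, the raw bytes 00 00 00 00 00 00 01 2C decode to 300.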
September 28, 2020 5:51 PM IST

Jainew Nanda
You can use the count command in the HBase shell to count the number of rows, but counting the rows of a large table can be slow. Syntax: count 'tablename' [interval]

The return value is the number of rows.

This operation may take a LONG time (run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting MapReduce job instead). By default, the current count is shown every 1000 rows; the count interval can optionally be specified. Scan caching is enabled on count scans by default, with a default cache size of 10 rows. If your rows are small, you may want to increase this parameter.
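To see why raising CACHE helps, here is a rough estimate. Scanner caching is the number of rows returned per RPC to the region server, so counting R rows with caching N takes about ceil(R / N) round trips. The small Java helper below (hypothetical name ScanCost) only computes that ceiling. It ignores region boundaries and result-size limits, which add extra round trips in practice.

```java
/**
 * Hypothetical back-of-the-envelope helper: approximate number of scanner RPCs
 * needed to count `rows` rows when each RPC returns up to `caching` rows.
 */
final class ScanCost {
    private ScanCost() {}

    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        // Ceiling division without overflow: full batches plus one partial batch if any.
        return rows / caching + (rows % caching == 0 ? 0 : 1);
    }
}
```

For 1,000,000 rows, the default of 10 rows per RPC means about 100,000 round trips, while CACHE => 1000 needs about 1,000.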

Examples:
```
hbase> count 't1'

hbase> count 't1', INTERVAL => 100000

hbase> count 't1', CACHE => 1000

hbase> count 't1', INTERVAL => 10, CACHE => 1000
```
The same commands can also be run on a table reference. If t is a reference to table 't1', the corresponding commands are:
```
hbase> t.count

hbase> t.count INTERVAL => 100000

hbase> t.count CACHE => 1000

hbase> t.count INTERVAL => 10, CACHE => 1000
```
September 29, 2020 1:10 PM IST

- Viaan Prakash: This counter runs very slowly and can only be accessed from the HBase shell. It is not recommended for large tables.
  September 29, 2020
Rakesh Racharla
To count the rows of an HBase table on a YARN cluster that uses scheduler queues, you also have to set the MapReduce job queue name:
```
# <queue_name>: a YARN queue you have SUBMIT access to
hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dmapreduce.job.queuename=<queue_name> <table_name>
```
September 29, 2020 1:12 PM IST
