kudu
Filtering a specific row in kudu using kudu scanner
The target table in Kudu is huge. I have the following Scala code and I would like to check whether a row exists in Kudu. The four columns below make up the primary key of the Kudu table, but when I define an upper bound I seem to get all the rows. How do I select one particular row in Kudu? Here I expect only one row to be returned.

    val table2: KuduTable = kuduClient.openTable("event-sets")
    val eventColumns: util.List[String] = List(
      OccurrenceSchema.SetId.name,
      OccurrenceSchema.Period.name,
      OccurrenceSchema.Event.name,
      OccurrenceSchema.Date.name).asJava

    val end: PartialRow = table2.getSchema.newPartialRow()
    end.addInt(OccurrenceSchema.Period.name, 1476)
    end.addInt(OccurrenceSchema.SetId.name, 82)
    end.addInt(OccurrenceSchema.Event.name, 3195167)
    end.addLong(OccurrenceSchema.Date.name, 1367922840000L)

    val kuduScanner: KuduScanner = kuduClient.newScannerBuilder(table2)
      .setProjectedColumnNames(eventColumns)
      .lowerBound(end)
      .exclusiveUpperBound(end)
      .build()

    assert(kuduScanner.hasMoreRows)
    while (kuduScanner.hasMoreRows) {
      val resultIterator: RowResultIterator = kuduScanner.nextRows()
      while (resultIterator.hasNext) {
        val result: RowResult = resultIterator.next()
        assert(result != null)
        logger.info(" : SetId Value -- " + result.getInt(OccurrenceSchema.SetId.name))
        logger.info(" : Period Value -- " + result.getInt(OccurrenceSchema.Period.name))
        logger.info(" : Event Value -- " + result.getInt(OccurrenceSchema.Event.name))
        logger.info(" : Date Value -- " + result.getLong(OccurrenceSchema.Date.name))
      }
    }
From my understanding, you are looking for exactly one record in your table. Using a scanner with bounds and/or a limit didn't work for me either. Instead, I solved the problem by defining a KuduPredicate for each primary key column. Below you will find my solution.

    val schema = table2.getSchema
    val builder: KuduScannerBuilder = kuduClient.newScannerBuilder(table2)

    // define the columns you want to select
    builder.setProjectedColumnNames(eventColumns)

    // add predicates to select a record by primary key;
    // newComparisonPredicate expects a ColumnSchema, so each column is looked up in the table schema
    val pkPeriod: KuduPredicate = KuduPredicate.newComparisonPredicate(
      schema.getColumn(OccurrenceSchema.Period.name), KuduPredicate.ComparisonOp.EQUAL, 1476L)
    builder.addPredicate(pkPeriod)
    val pkSetId: KuduPredicate = KuduPredicate.newComparisonPredicate(
      schema.getColumn(OccurrenceSchema.SetId.name), KuduPredicate.ComparisonOp.EQUAL, 82L)
    builder.addPredicate(pkSetId)
    val pkEvent: KuduPredicate = KuduPredicate.newComparisonPredicate(
      schema.getColumn(OccurrenceSchema.Event.name), KuduPredicate.ComparisonOp.EQUAL, 3195167L)
    builder.addPredicate(pkEvent)
    val pkDate: KuduPredicate = KuduPredicate.newComparisonPredicate(
      schema.getColumn(OccurrenceSchema.Date.name), KuduPredicate.ComparisonOp.EQUAL, 1367922840000L)
    builder.addPredicate(pkDate)

    val kuduScanner: KuduScanner = builder.build()
    while (kuduScanner.hasMoreRows) {
      val resultIterator: RowResultIterator = kuduScanner.nextRows()
      while (resultIterator.hasNext) {
        val result: RowResult = resultIterator.next()
        // do whatever you have to do with the selected record
        logger.info(" : SetId Value -- " + result.getInt(OccurrenceSchema.SetId.name))
      }
    }

I'm new to Kudu, so I'm not sure whether this solution is the most efficient one. At least it returns the expected result. My original code was written and tested in Java. I have ported it manually to Scala, but I haven't tested the Scala version so far!
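Since the question is really about checking whether a row exists, the predicate approach could probably be wrapped in a small helper. The following is only a sketch, not tested against a real cluster: `KuduRowLookup`, `keyPredicates` and `rowExists` are hypothetical names (they are not part of the Kudu client), and it assumes that every key column is integer-typed so that the `long` overload of `newComparisonPredicate` applies.

```scala
import org.apache.kudu.Schema
import org.apache.kudu.client.{KuduClient, KuduPredicate, KuduTable}
import org.apache.kudu.client.KuduPredicate.ComparisonOp
import scala.collection.JavaConverters._

// Hypothetical helper object, introduced only for illustration.
object KuduRowLookup {

  // Builds one EQUAL predicate per (column name -> value) pair.
  // Assumption: every listed column has an integer type (INT8 to INT64).
  def keyPredicates(schema: Schema, key: Seq[(String, Long)]): Seq[KuduPredicate] =
    key.map { case (name, value) =>
      KuduPredicate.newComparisonPredicate(schema.getColumn(name), ComparisonOp.EQUAL, value)
    }

  // Returns true as soon as the scan yields at least one matching row.
  def rowExists(client: KuduClient, table: KuduTable, key: Seq[(String, Long)]): Boolean = {
    val builder = client.newScannerBuilder(table)
    // Project only the key columns so that no other column data is requested.
    builder.setProjectedColumnNames(key.map(_._1).asJava)
    keyPredicates(table.getSchema, key).foreach(p => builder.addPredicate(p))
    val scanner = builder.build()
    try {
      var found = false
      // A batch can be empty, so keep fetching until a row shows up or the scan ends.
      while (!found && scanner.hasMoreRows) {
        found = scanner.nextRows().getNumRows > 0
      }
      found
    } finally {
      scanner.close()
    }
  }
}
```

With the names from the question, a call might look like `KuduRowLookup.rowExists(kuduClient, table2, Seq(OccurrenceSchema.SetId.name -> 82L, OccurrenceSchema.Period.name -> 1476L, OccurrenceSchema.Event.name -> 3195167L, OccurrenceSchema.Date.name -> 1367922840000L))`. Stopping at the first non-empty batch avoids reading more than necessary, although with equality predicates on the full primary key at most one row should match anyway.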