Simple Bloom Filter Implementation Part 2
Introduction
In the last blog, we introduced the initial version of bloom filter. In the first implementation, we only tested our bloom filter on built in type String
. This time, we tested in against a custom class Person
. The idea is simple: we use thrift to define our custom data structure. Thrift will help us create a corresponding Person
class. Then we define our own Hashable<Person>
to be passed into our bloom filter.
Implementation details
Here is our new thrift file. Notice that we need to assign identifier to each data field, since they would be used for serialize de deserialize data instead of the actual variable name(space consideration).
If we run the command thrift -r -gen java bloomfilterservice.thrift
, the content in gen-java
would also contain a generated Person
class. We need to implement a Hashable<Person>
type to hash a Person
object into several integers. Here, the method I used is very simple, it still take a prime and the result would be a combination of all data fields.
The remaining is all every simple, we passed in a list of PersonHash
object into handler and replace String
with Person
at correct places. Then all is done.
Notes
- When I was trying to implement this part, I was planed to create a general bloom filter, that is utilize Java’s template feature. However, I failed. Since I don’t know how to define an API with template type in thrift.
- The package in Java is quite important and I have always ignore it before. During the implementation, I initially put
PersonHash
underbloomfilter
package. This class need to accessPerson
class generated by thrift, however,Person
class is under the root package andPersonHash
could not access it. I was confused by this problem for a long time.
Next step
- Try if thrift support function overload. Have function for
String
andPerson
at the same time to see if it still works. - Currently, I only tried it with toy data. I decided to move on to some real data to test the performance of bloom filter.
- Add some performance measure code.
Leave a Comment