Hunting the Pertinency of Bloom Filter in Computer Networking and Beyond: A Survey

Bloom filter is a probabilistic data structure for testing the membership of an element in a set. A Bloom filter returns “true” or “false”, with a tolerated error, depending on the presence of the element in the set. It is used to boost the performance of a system with a small space overhead, and it has been used extensively since its inception. The Bloom filter has found a wide area of applications and is used across the computing field irrespective of application and research domain. The Bloom filter possesses (i) high adaptability, (ii) low memory space overhead as compared to hashing algorithms, (iii) high scalability, and (iv) high performance. In this article, we uncover the application areas of the Bloom filter in computer networking and its related domains.


Introduction
Big Data is a disruptive technology in the field of data-intensive computing. Data are generated everywhere. Therefore, the volume of data is growing at an exponential pace, and it continues to grow at the same pace [1,2]. A variety of data is generated by various technologies, for instance, IoT. About 90% of these data are unstructured. The high volume of data requires huge memory space to process. Therefore, it is important to engage the Bloom filter in Big Data. The Bloom filter is like a cog in the machine of a large-scale system. Most importantly, the Bloom filter is used to enhance the performance of a system with a small space overhead. Bloom filters are deployed in various fields to enhance lookup performance. For instance, BigTable uses a Bloom filter to avoid unnecessary HDD accesses, which boosts the lookup performance [3]. Figures 1 and 2 depict the popularity of the Bloom filter.
The Bloom filter is applied in diverse areas, where it improves lookup performance dramatically. However, the applications of the Bloom filter are limited to membership filtering. The Bloom filter [4] is a variant of the hash data structure that implements membership queries, and it is one of the most extensively used and studied data structures.
The article is organized as follows: Section 2 illustrates the Bloom filter. Section 6 discusses metadata server design using the Bloom filter. Sections 7, 4, 8, 3.3, 9.1, 3.2, 8.1, 9, and 5 present applications of the Bloom filter in duplicate filtering, network security, database, peer-to-peer, error correction, wireless sensor network, plagiarism checking, biomedical, and Internet of Things, respectively. Finally, the paper draws a conclusion in Section 10.

Bloom Filter
The Bloom filter (BF) [4] introduces an error tolerance to increase lookup performance and space efficiency. The Bloom filter returns either true or false, as illustrated in Figure 3. Thus, the result of a Bloom filter query belongs to one of the following classes: true positive, false positive, true negative, and false negative. Most Bloom filters exhibit false positives. A false positive introduces overhead to a system. Similarly, a false negative also introduces an overhead for a system.
The Bloom filter uses an array to store the information of an element. Let $S$ be a set defined as $S = \{k_1, k_2, k_3, \ldots, k_n\}$. Let a random query element be $k_i$, where $i = 1, 2, 3, \ldots$. The Bloom filter returns true if $k_i \in S$; otherwise, it returns false. A false positive is defined as follows: the Bloom filter returns true when $k_i \notin S$ holds. Similarly, a false negative is defined as follows: the Bloom filter returns false when $k_i \in S$ holds. Thus, the Bloom filter is a probabilistic data structure.
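The insert and query operations described above can be illustrated with a minimal sketch. The class name `BloomFilter`, the use of salted SHA-256 digests as the hash functions, and the parameter names `m` and `h` as constructor arguments are assumptions made for illustration; they are not taken from the cited works.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and h hash functions (illustrative)."""

    def __init__(self, m: int, h: int):
        self.m, self.h = m, h
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item: str):
        # Assumption: h hash functions are simulated by salting SHA-256.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        # True only if every probed bit is set; may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(item))
```

Because bits are only ever set in this sketch, an inserted element is always reported as present, while an element that was never inserted may still be reported as present, which is the false positive case.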
Figure 4 depicts the false positive probability. The false positive probability of a Bloom filter depends on the number of hash functions $h$. Figure 4 represents the relation among the memory size $m = 2^{32}$, the number of inputs $n$, and $h$.
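Let $n$ be the total number of elements inserted into the Bloom filter; then, the probability that a particular bit is still 0 is
$$\left(1 - \frac{1}{m}\right)^{nh}, \tag{1}$$
where $m$ is the size of the Bloom filter and $h$ is the total number of hash functions used. Now, the probability that the particular bit is 1 is
$$1 - \left(1 - \frac{1}{m}\right)^{nh}. \tag{2}$$
The probability that all $h$ bits probed by a query are 1, that is, the false positive probability, is shown in the following equation:
$$\left(1 - \left(1 - \frac{1}{m}\right)^{nh}\right)^{h} \approx \left(1 - e^{-hn/m}\right)^{h}. \tag{3}$$
The optimal number of hash functions is
$$h = \frac{m}{n} \ln 2. \tag{4}$$
The number of 1s depends on the number of hash functions $h$. However, $h$ must be optimal to reduce false positives.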
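As a worked illustration of the false positive probability and of the optimal number of hash functions, the following sketch evaluates the standard expressions numerically. The function names and the example sizes ($m = 1000$, $n = 100$) are assumptions chosen for illustration.

```python
import math


def false_positive_probability(m: int, n: int, h: int) -> float:
    """(1 - (1 - 1/m)^(n*h))^h for m bits, n elements, h hash functions."""
    return (1.0 - (1.0 - 1.0 / m) ** (n * h)) ** h


def approximate_fpp(m: int, n: int, h: int) -> float:
    """Exponential approximation (1 - e^(-h*n/m))^h."""
    return (1.0 - math.exp(-h * n / m)) ** h


def optimal_hash_count(m: int, n: int) -> float:
    """h = (m / n) * ln 2, the value that minimises the approximation."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    m, n = 1000, 100
    print("optimal h:", optimal_hash_count(m, n))
    for h in (2, 7, 12):
        print(h, false_positive_probability(m, n, h), approximate_fpp(m, n, h))
```

With ten bits per element, the optimum lies near seven hash functions; using far fewer or far more hash functions increases the false positive probability.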
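Grandi [52] calculated the false positive probability (FPP) through $\gamma$-transformations. Let $X$ be the random variable representing the total number of set bits in the Bloom filter array; then
$$E[X] = m\left(1 - \left(1 - \frac{1}{m}\right)^{nh}\right). \tag{5}$$
Let us condition on the random variable $X = x$ to determine the false positive; then
$$\Pr(\mathrm{FP} \mid X = x) = \left(\frac{x}{m}\right)^{h}. \tag{6}$$
The total probability theorem gives the exact false positive probability as follows:
$$\Pr(\mathrm{FP}) = \sum_{x=0}^{m} \Pr(\mathrm{FP} \mid X = x)\,\Pr(X = x) = \sum_{x=0}^{m} \left(\frac{x}{m}\right)^{h} f(x), \tag{7}$$
where $f(x)$ is the probability mass function of $X$. Grandi [52] performed $\gamma$-transformations to calculate the value of $f(x)$, and the exact false positive probability is given in the following equation:
$$\Pr(\mathrm{FP}) = \sum_{x=0}^{m} \left(\frac{x}{m}\right)^{h} \binom{m}{x} \sum_{j=0}^{x} (-1)^{j} \binom{x}{j} \left(\frac{x-j}{m}\right)^{nh}. \tag{8}$$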
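The exact false positive probability can be checked numerically for small filters. The sketch below assumes the standard occupancy distribution for $f(x)$, that is, the probability that exactly $x$ of $m$ bits are set after $nh$ independent bit writes; the function names are hypothetical, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction
from math import comb


def set_bit_pmf(m: int, n: int, h: int, x: int) -> Fraction:
    """f(x): probability that exactly x of m bits are set after n*h writes."""
    inner = sum(
        (-1) ** j * comb(x, j) * Fraction(x - j, m) ** (n * h)
        for j in range(x + 1)
    )
    return comb(m, x) * inner


def exact_fpp(m: int, n: int, h: int) -> Fraction:
    """Sum over x of (x/m)^h * f(x), by the total probability theorem."""
    return sum(
        Fraction(x, m) ** h * set_bit_pmf(m, n, h, x) for x in range(m + 1)
    )


def classic_fpp(m: int, n: int, h: int) -> float:
    """The usual closed form, shown for comparison."""
    return (1.0 - (1.0 - 1.0 / m) ** (n * h)) ** h
```

Since $(x/m)^h$ is convex in $x$, the exact value is never smaller than the closed form, and the gap is most visible for very small $m$.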

Applications of Bloom Filter.
The Bloom filter is deployed in diverse domains to increase performance and reduce memory consumption. The prominent areas of Bloom filter applications are depicted in Figure 5 and are discussed in the following sections.

Applications in Computer Network
3.1. Network Packet Filter. In spatiotemporal and secure communication, transmission of duplicate packets is not acceptable. For instance, duplicate multicasting is a serious issue which is addressed using the Bloom filter [37]. Spatiotemporal approaches are those which convey information within time and with minimum space. This is achieved by maintaining the packet speed during transmission and reducing duplicate packets. The presence of duplicate packets increases congestion, which adds overhead on the link and opens the possibility of attacks on the communication links.
Varshney and Verma [38] proposed an arrangement in which a Bloom filter is inserted in the packet header at the time of packet broadcasting. This arrangement helps in eliminating duplicate packet transmission. In the source node, a Bloom filter is added to the packet header. Initially, the Bloom filter is empty to hide the location of the destination from the attacker. The Bloom filter hashes the packet unique identifier (Pid) and stores it. Then, the packet is transmitted to the neighboring node. When a router or a node receives the packet, it checks whether this packet was received earlier by querying the Bloom filter for the Pid. If the Bloom filter returns true, the packet is checked again using its unique public key (Pkey), as the Bloom filter sometimes returns a false positive.

Journal of Computer Networks and Communications
If it is a false positive, then the Pkey is stored in the routing table and the packet is transmitted to its neighbor. If it is a duplicate packet, then it is discarded. Moreover, if the Bloom filter returns a true negative and the packet is received for the first time, then the Pkey value is stored in the Bloom filter and in the routing table. Then, the packet is transmitted to its neighbor until it reaches its destination node.
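The duplicate check performed at each node can be sketched as follows. This is a simplified illustration and not the implementation of Varshney and Verma [38]: a Python set stands in for the routing table, the Bloom filter is kept at the node rather than in the packet header, and the names `DuplicateFilter` and `should_forward` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and h salted SHA-256 hashes."""

    def __init__(self, m: int = 4096, h: int = 3):
        self.m, self.h, self.bits = m, h, 0

    def _positions(self, item: str):
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


class DuplicateFilter:
    """Per-node duplicate packet check (illustrative assumptions only)."""

    def __init__(self):
        self.seen_pids = BloomFilter()
        self.routing_table = set()  # exact record of packet keys (Pkey)

    def should_forward(self, pid: str, pkey: str) -> bool:
        # A Bloom filter hit is confirmed against the exact Pkey record,
        # because the hit may be a false positive.
        if pid in self.seen_pids and pkey in self.routing_table:
            return False  # confirmed duplicate: discard
        self.seen_pids.add(pid)
        self.routing_table.add(pkey)
        return True  # first sighting or false positive: forward
```

The exact table is consulted only after a Bloom filter hit, so most first-time packets are handled by the cheap membership test alone.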
Every second, millions of packets are generated and transmitted on the network. Hence, the Bloom filter is used for efficient packet filtering, which also improves the performance of the network. Fernandez-Del-Carpio et al. propose a hybrid technique [23] that combines two techniques: shared multicast trees and stateless switching based on Bloom filters (BFs). In this technique, the header of the packet has two fields: one is the Multiprotocol Label Switching (MPLS) label corresponding to the shared tree, and the other is the Bloom filter. In addition, every node maintains two forwarding tables: the standard MPLS forwarding table and the link IDs table. In the MPLS forwarding table, each entry is used for shared tree forwarding. In the link IDs table, each entry is an outgoing interface encoded for the Bloom filter. So, when a packet reaches a node, the node first checks the MPLS forwarding table; if the lookup is successful, then the Bloom filter performs the membership check. This reduces the number of outgoing links to be evaluated and hence reduces traffic overhead.
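The Bloom filter membership check on outgoing links can be sketched as below. The encoding is an assumption made for illustration: each link ID is taken to be a bit mask with a few bits set, the header filter is the OR of the link IDs on the delivery tree, and a node forwards on every link whose mask is fully contained in the header. The function names are hypothetical and the MPLS lookup is omitted.

```python
import hashlib


def link_id(name: str, m: int = 64, k: int = 3) -> int:
    """Return an m-bit mask with up to k bits set for one outgoing link."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return mask


def build_header(link_ids) -> int:
    """OR together the link IDs of every link on the delivery tree."""
    header = 0
    for mask in link_ids:
        header |= mask
    return header


def outgoing_links(header: int, link_table: dict) -> list:
    """Links whose ID is contained in the header filter (may over-match)."""
    return [name for name, mask in link_table.items()
            if (header & mask) == mask]
```

A link that is on the tree is always selected; a link that is not on the tree may occasionally be selected as well, which corresponds to a false positive and results in an unnecessary transmission.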

Wireless Sensor Network.
A wireless network consists of mobile nodes whose locations keep changing. Therefore, a collaborative distributed cache system is proposed to provide efficient services [36]. It is used to store locally useful information in a wireless distributed storage system. These storage systems are located in a particular geographic area of interest. Such a system comprises a set of wireless mobile nodes in an infrastructure-less system, and it can be used efficiently in numerous location-based applications. However, it is quite difficult to maintain the connection among the storage nodes. Sasaki et al. propose a distributed Bloom filter table (DBFT) [36], which is a two-layered structure overlay. It reduces the overhead imposed by a highly dynamic P2P distributed storage system. DBFT indexes the information packets with their storage nodes using two parameters: cluster-specific Bloom filters and information packet-specific DBFT weights. Each cluster has its own Bloom filter, and these filters are periodically exchanged among neighboring clusters. For the information packet-specific DBFT weights, a weight vector of the same size as the Bloom filter is calculated using the ID of the stored information packet and a set of hash functions. These hash functions are different from the hash functions used in the Bloom filter. When an information packet is sent, a sequence of integers is calculated from the Bloom filter and the DBFT weights. Then, the cluster ID is discovered using the packet's location. Using this information, the storage node of the packet is discovered.
Zhang [53] proposes a simultaneous query technique. This query technique allows an access point (AP) in a wireless network to collect key control information from active nodes at low overhead. It uses a special Bloom filter called the analog Bloom filter (ABF), which is a Bloom filter that handles analog signals. Implementing this Bloom filter is very challenging as the analog signal has random noise and fading. ABF is used in this technique to check whether a node is active or idle. If the ABF returns 1, then the node is active, and if the ABF returns 0, then the node is idle.

Peer-to-Peer.
Peer-to-peer (P2P) is a file-sharing application. It has the potential to become a popular file-sharing protocol. Hence, the performance of the P2P protocol needs to be increased, and the Bloom filter has the ability to enhance it. Chen et al. [31] propose a hybrid P2P network. It uses a gossiping algorithm to gather global statistical information and a Bloom filter in an overlay based on DHT global inverted indexes. It uses the Bloom filter to reduce the communication cost of multikeyword searching. This is done by adjusting the Bloom filter parameters to the optimal setting according to the statistical popularity of the keywords in the query. For a multikeyword search, a distributed intersection operation is conducted in a wide area network. For this, the AND operation is employed. In this operation, first, the query is divided into words, e.g., (X, Y). Second, the Bloom filter is searched for documents containing the first word X. Third, for the second word, the intersection of the output BF(X) and the documents containing the second word Y is found, i.e., Y ∩ BF(X). This process is repeated for all words in the query. Finally, the Bloom filter output is sent to the DHT peer to remove false positives. This is done by finding X ∩ (Y ∩ BF(X)). Then, this list is sent to the client. Moreover, sometimes the search applies both AND and OR operations for multikeyword searching. In the OR operation, first, the query is divided into words (X, Y). Second, the Bloom filter is searched for documents containing the first word X; this list is sent to the client. Third, Y − BF(X) is also computed and sent to the client.
In the P2P traffic management scheme based on the queue Bloom filter (QBF) [32], the data pieces constructed by EGRESS are searched in the Bloom filter. If absent, the data piece is inserted into the QBF. If present, then EGRESS counts the number of hits in the flow. If the number of hits exceeds the threshold for adding a cache entry, then EGRESS knows that the flow is transmitting duplicate content. Thus, the data pieces are cached. Hence, it removes redundancy in P2P traffic.
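The multikeyword AND operation described in this section, Y ∩ BF(X) followed by X ∩ (Y ∩ BF(X)), can be sketched as below. The document sets are plain Python sets, the two peers are simulated inside one function, and the names `and_query`, `docs_x`, and `docs_y` are assumptions for illustration rather than part of the scheme of Chen et al. [31].

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and h salted SHA-256 hashes."""

    def __init__(self, m: int = 1024, h: int = 3):
        self.m, self.h, self.bits = m, h, 0

    def _positions(self, item: str):
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def and_query(docs_x: set, docs_y: set) -> set:
    """Documents containing both keywords, exchanging a filter, not a list."""
    # Peer holding X sends BF(X) instead of its full document list.
    bf_x = BloomFilter()
    for doc in docs_x:
        bf_x.add(doc)
    # Peer holding Y computes Y ∩ BF(X); false positives may slip in.
    candidates = {doc for doc in docs_y if doc in bf_x}
    # Peer holding X removes the false positives: X ∩ (Y ∩ BF(X)).
    return docs_x & candidates
```

Only the compact filter and the candidate list cross the network in this sketch, and the final intersection with the exact list for X guarantees that no false positive reaches the client.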

Distributed Hashtable.
Ariyoshi and Fujita propose a distributed algorithm to process conjunctive queries in P2P DHTs [33]. In this scheme, search results of past queries are cached and used for improving the efficiency of query processing. For this, it uses the Bloom filter. When a requester issues a conjunctive query, a list of peers having that file is obtained. Then, each peer in that list conducts a search operation in its Bloom filter. In the Bloom filter, searching for a file is performed based on the index of that file. The Bloom filter returns the list of indices matching the whole conjunctive query or matching some words in the conjunctive query. Lists of indices from all peers are collected, and an intersection operation is done to obtain the final result. If no Bloom filter corresponding to the conjunctive query exists, then a Bloom filter is created and multicast to all peers. If a peer already has a Bloom filter corresponding to the conjunctive query, then it performs a union operation to add the new indices. In addition, each peer maintains the Bloom filter using the LRU policy.

Network Security.
The Bloom filter is used to address many network security issues [25,54], and a few solutions are discussed as follows. Zhu and Mutka [26] propose a message notification protocol. It reduces the power consumption and the wireless wide area network access cost for instant messaging (IM). Here, a compressed Bloom filter is used to store and represent the message notifications exchanged between the IM server and the peer group. It provides privacy and security and also reduces the overhead of the protocol. Maccari et al. [27] propose an application of the Bloom filter to create a distributed firewall. In this scheme, each node has a Bloom filter which stores the packets accepted by the node. Every node sends its Bloom filter to the other nodes. When a node wants to send a packet, it checks the packet against the Bloom filters it has received. If the packet matches, then the node sends the packet to the node to which the Bloom filter belongs. If no match is found, then the packet is dropped.

Big Data Security.
The crucial point of big data security analytics (BDSA) is the extraction of value. This extraction becomes more crucial when it is related to threat monitoring and incident investigation to discover both known and unknown cyber attack patterns. In such cases, the Bloom filter is used in many applications related to Big Data security. Furthermore, the Bloom filter is used to profile the network stream and detect malicious patterns. Alsuhibany [28] proposes a Bloom filter called CouBF. The CouBF engages an unsupervised learning engine and also integrates an open digest-based hashing technique [29]. This technique solves the problems in indexing Big Data and BDSA. It counts the occurrences of spam email messages. Note that a coincidental hit occurs whenever a single cell is used by two or more email messages, and such hits increase the counters rapidly.
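The idea of counting occurrences with per-cell counters, and the effect of coincidental hits, can be illustrated with a generic counting Bloom filter. This sketch is not CouBF itself: the class name, the salted SHA-256 hashing, and the minimum-counter estimate are assumptions for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Generic counting Bloom filter: each cell is a counter, not a bit."""

    def __init__(self, m: int = 1024, h: int = 3):
        self.m, self.h = m, h
        self.counters = [0] * m

    def _positions(self, item: str):
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def estimate(self, item: str) -> int:
        # Cells shared with other items (coincidental hits) inflate the
        # counters, so the minimum is an upper bound on the true count.
        return min(self.counters[p] for p in self._positions(item))
```

The estimate never falls below the true number of insertions of an item, but it can exceed it when other messages share all of the item's cells.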

Biometric.
The Bloom filter is a well-suited data structure for biometrics [55][56][57][58][59]. Nowadays, the Bloom filter is used to achieve efficient biometric systems, and it drastically enhances their performance. Fingerprint matching techniques extract the minutiae points, which define the uniqueness of the fingerprint. The minutiae points of a fingerprint are matched against the fingerprint databases. Fingerprint matching is enhanced using the Bloom filter. Moreover, there are numerous such biometric techniques for security purposes, for example, iris and face [30]. Numerous works have been done on biometric enhancement, and the Bloom filter drastically enhances the performance of the biometric system.
Drozdowski et al. [60] propose an iris indexing scheme using BF. The iris-code template is mapped to a BF. The 2D iris code is separated into fixed, equal-sized blocks. In a BF, a single hash function is used for hashing. Using the BF, two iris templates are compared using the Hamming distance. Further processing is done using a binary search tree.
Rathgeb et al. [57] propose the application of adaptive Bloom filters (ABFs) to represent binary iris biometric feature vectors. The use of ABF enables biometric template protection and compression of the biometric data and also speeds up the biometric identification process. The iris codes are defined in a vector of size $W \times H$. The codes are divided into $K$ equal-sized blocks, and each column consists of $w \leq H$ bits. The entire bit sequence of a column is transformed to the appropriate location in the ABF. So, $K$ different ABFs are used. This transform makes the iris codes alignment-free. Furthermore, the ABF is parameterized to the desired template size. Therefore, this compact alignment-free representation of the iris codes helps in efficient biometric identification.
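The column-to-location transform and its alignment-free property can be illustrated with a small sketch. It assumes that each column of $w$ bits is read as a $w$-bit integer that indexes a filter of $2^w$ bits, and that two filters are compared by the Hamming distance; the function names and the toy block are hypothetical.

```python
def block_to_filter(block) -> int:
    """Map one block (w rows of bits) to a filter of 2**w bits.

    Each column is read top to bottom as a w-bit integer, and the bit at
    that index is set. The filter is returned as a Python int.
    """
    bloom = 0
    for column in zip(*block):
        index = int("".join(str(bit) for bit in column), 2)
        bloom |= 1 << index
    return bloom


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two filters differ."""
    return bin(a ^ b).count("1")


# A toy block with w = 3 rows and 4 columns, and a cyclically shifted copy.
block = [[1, 0, 1, 1],
         [0, 0, 1, 0],
         [1, 1, 0, 0]]
shifted = [row[1:] + row[:1] for row in block]
```

Because the filter records only which column patterns occur and not where they occur, the shifted block yields the same filter as the original one in this sketch, which is the sense in which the representation is alignment-free.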
Sadhya and Singh [55] propose a framework based on a modified Bloom filter. It provides all the desirable security measures required for a biometric template protection scheme. The key idea is perfect secrecy, which means that the encoded data should not leak any information about the original data. In other words, given the encoded data, the a priori distribution of the original data should remain the same as its a posteriori distribution. In this framework, a key matrix of size $K \times n$ is first constructed, where $K$ is the number of keys, each having length $n = 2^{w}$ ($w$ = length of codeword). Each block corresponds to a different Bloom filter. So, there are $K$ Bloom filters of length $n$. Then, these Bloom filters are encoded. During the identification operation, the encoded Bloom filters are compared with the Bloom filters obtained from the features given during authentication.
Stokkenes et al. [58] propose a multibiometric template protection scheme based on the Bloom filter and binarized statistical image features (BSIF). The features are extracted from the face and both periocular regions. The feature vector constructed from these features is given as an input to Bloom filters. For comparison, the Hamming distance between the constructed Bloom filters and the Bloom filters obtained from the features given for comparison is calculated. Three different dissimilarity scores are calculated from the three different modalities (face, and left and right periocular). These scores are analyzed to obtain the final result.
Abe et al. [59] propose an irreversible template creation technique using the minutiae relation code (MRC) and the Bloom filter. In this technique, the Bloom filter is used to provide the irreversibility feature. After the masked MRC is obtained, it is given to the Bloom filter to check for its existence.

IoT Environment
In the IoT environment, lots of devices are connected to the Internet. They produce a huge volume of data every day, and much of these data is not even valuable. Hence, storage and processing of these data consume a lot of time and space. In this field, the Bloom filter can be very helpful. Singh et al. [47] propose an accommodative Bloom filter (ABF) which helps in the insertion of the huge data produced by IoT devices. ABF consists of $b$ buckets of $l$ bits each. Each bucket has a Bloom filter of size $m' = m/b$. During the insertion operation, the new element is hashed to $p$ bits. Two hash functions are used: the first $q$ bits are used for hashing to the bucket index and the remaining $r = p - q$ bits for hashing into the Bloom filter. Whenever a Bloom filter exceeds its capacity, a new Bloom filter is added. During the query operation, membership checking is done at two levels: bucket level and Bloom filter level. This two-level checking reduces the query time and enhances the accuracy.
Gundogan et al. propose an optimized low-power routing protocol for IoT devices called BloomRPL [61]. It is an optimized RPL protocol using BF. In BloomRPL, the routers are represented using a destination-oriented directed acyclic graph (DODAG). The root of the graph is the sink. The sink is the central control point which gathers information in the IoT network. BloomRPL uses DODAG Information Solicitation (DIS)/DODAG Information Object (DIO) [62] message handshakes for checking the link between parent and child in the DODAG. The parent stores all the information about its children. This information becomes huge; hence, BloomRPL uses BF to compress it. This compression makes BloomRPL more appropriate for multicast dissemination. However, the use of BF also leads to loss of information and introduces false positives during link checking. Still, the merits of using BF exceed the demerits in BloomRPL.
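The two-level addressing used by the accommodative Bloom filter earlier in this section, where the first $q$ bits of a $p$-bit hash select the bucket and the remaining $r = p - q$ bits select a position inside it, can be sketched as follows. This is a simplified illustration under stated assumptions: SHA-256 supplies the $p$ bits, each element sets a single bit in its bucket, and the growth step that adds a new filter when capacity is exceeded is omitted. The names `split_hash` and `BucketedFilter` are hypothetical.

```python
import hashlib


def split_hash(item: str, p: int = 32, q: int = 4):
    """Hash item to p bits; return (first q bits, remaining r = p - q bits)."""
    digest = hashlib.sha256(item.encode()).digest()
    value = int.from_bytes(digest, "big") >> (256 - p)  # keep the top p bits
    r = p - q
    return value >> r, value & ((1 << r) - 1)


class BucketedFilter:
    """2**q buckets, each holding a small bit array of l bits."""

    def __init__(self, q: int = 4, bits_per_bucket: int = 256):
        self.q, self.l = q, bits_per_bucket
        self.buckets = [0] * (1 << q)

    def _locate(self, item: str):
        bucket, rest = split_hash(item, q=self.q)
        return bucket, rest % self.l

    def add(self, item: str) -> None:
        bucket, pos = self._locate(item)
        self.buckets[bucket] |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # Level 1: pick the bucket. Level 2: test the bit inside it.
        bucket, pos = self._locate(item)
        return bool((self.buckets[bucket] >> pos) & 1)
```

A query touches exactly one bucket, so the cost of a lookup does not grow with the number of buckets in this sketch.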

Hierarchical Bloom Filter Array (HBA).
Hierarchical Bloom filter array (HBA) [19,20] uses the Bloom filter (BF) for metadata management. It uses Bloom filters in two levels of hierarchy to reduce the memory overhead. At the first level, a small Bloom filter is used to find the destination metadata server (MDS). At the second level, a pure Bloom filter array is used which stores the MDS information for all files. For faster lookup, every MDS has a replica of these two Bloom filters.

Group-Based Hierarchical Bloom Filter Array (G-HBA).
G-HBA [63] is an extension of HBA. It is a Bloom filter array that stores the MDS information. This Bloom filter array is used to route directly to the MDS with high accuracy. In this MDS scheme, all the MDSs are divided into groups. In each group, every MDS stores only the information of its local files and the Bloom filter replicas of other groups. So, when the information of each MDS in the group is combined, the group as a whole has the whole file image.

Multidimensional Bloom Filters (MBFS).
The MBFS [21] is a MapReduce-based parallel metadata search technique based on multidimensional Bloom filters (MDBFs). It creates a Bloom filter for each metadata attribute. Then, these Bloom filters are combined to form the MDBF. The MDBF prunes the subdirectory tree partition, which helps in narrowing the search namespace. This leads to fast and accurate search results. Also, MBFS runs concurrently in every MDS to improve the metadata query efficiency. Additionally, MBFS is lightweight, and the query is executed in the MDBF while other metadata services are running. However, it does not provide a deletion operation for metadata. Over a period of time, the Bloom filter becomes exhausted, which results in erroneous output.
In the cloud Bloom filter (CBF) array of [22], the global Bloom filter (GBF) is used for uploading and downloading data from the data server. GBF is a single-layered Bloom filter storing information about all MDSs in the cluster. During upload, the file name is given as input to the GBF; the GBF hashes the file name, and the produced output gives the location of the MDS. Similarly, the local Bloom filter (LBF) is used during metadata updates, and the Bloomier matrix filter (BMF) is used for analyzing the metadata.

Deduplication
Data deduplication is a prominent research area in datacenters. An enormous data set contains duplicate data, and most of the data are unstructured. A huge RAM size is required to process such an enormous data silo for deduplication. The Bloom filter reduces the requirement for huge RAM space. Still, the RAM is unable to hold a large data set, and the Bloom filter itself may not fit in the main memory. Therefore, secondary storage is used to achieve the goal. HDD is much slower than RAM. Hence, it is very time-consuming when the Bloom filter is stored on the HDD. Moreover, reads and writes on the HDD are very costly, whereas reading and writing in a Bloom filter touch only a few bits. Thus, HDD is not a good option for the Bloom filter. The alternative solution is NAND flash memory. Therefore, SSD/flash is used to store the Bloom filter instead of HDD. BloomFlash [64] implements the Bloom filter in RAM and flash. Initially, the Bloom filter is implemented in RAM, and then the Bloom filter is expanded to flash memory. BloomFlash removes duplicate key/value pairs. Random reads are allowed to access the flash memory directly. However, random writes are costlier than random reads, so they are buffered to form large chunks and then written to the flash memory. Reducing random writes increases the performance of BloomFlash dramatically. BloomFlash uses a hierarchical structure to maintain the list of filters. Similarly, Lu et al. [65] proposed the forest-based Bloom filter (FBF), which performs deduplication based on RAM and flash memory. FBF is further extended in BloomStore to index key/value pairs using SHA [66].

Database
Structured databases grow in size over a period of time, and a table can become enormous. A table size can cross petabytes, where a conventional system stops working. Therefore, Google Inc. developed BigTable [3] to address this issue. BigTable is a very large-scale database. It uses a Bloom filter to reduce disk accesses [3]. Chang et al. [3] claim that using the Bloom filter drastically reduces the number of disk accesses.
BigTable examines the presence of data before accessing the HDD. BigTable accesses the disk if and only if the Bloom filter returns true. Otherwise, BigTable concludes that the data are not present on the HDD. HDD access takes more time than RAM access. Thus, time is wasted by a disk access when the data are not present on the HDD. However, if the data are present on the HDD, then an overhead is added by looking up the Bloom filter. But, in a practical scenario, the access overhead of the Bloom filter is negligible. Therefore, BigTable enhances its performance by deploying the Bloom filter to eliminate unnecessary disk accesses.
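The read path described above, in which the disk is touched only when the Bloom filter returns true, can be sketched as follows. This is an illustration of the guard pattern and not BigTable's implementation: a dictionary stands in for the on-disk store, and the names `Table`, `put`, `get`, and `disk_reads` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and h salted SHA-256 hashes."""

    def __init__(self, m: int = 8192, h: int = 4):
        self.m, self.h, self.bits = m, h, 0

    def _positions(self, item: str):
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


class Table:
    """Key-value table whose reads are guarded by an in-memory filter."""

    def __init__(self):
        self.disk = {}  # stands in for the slow on-disk store
        self.bloom = BloomFilter()
        self.disk_reads = 0

    def put(self, key: str, value: str) -> None:
        self.disk[key] = value
        self.bloom.add(key)

    def get(self, key: str):
        if key not in self.bloom:
            return None  # definitely absent: no disk access needed
        self.disk_reads += 1  # present, or a false positive
        return self.disk.get(key)
```

A lookup for an absent key costs only the in-memory filter check unless the filter returns a false positive, in which case one unnecessary disk read is made and the result is still correct.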
8.1. Plagiarism Checking. Plagiarism has become a huge problem. With the introduction of new methods to share knowledge, the problem of stealing written knowledge has also increased. Comparing a document with such a large volume of existing documents is nearly impossible and also time-consuming. The Bloom filter has the ability to solve this problem. The matrix Bloom filter (MBF) [39] is used for similar document detection. In this matrix, each row is a standard Bloom filter, and the MBF as a whole stores all N documents. During the insertion phase, the new document is first preprocessed, for example, by removing stop words and stemming words, and is then separated into substrings called chunks. After generation of the chunks, they are hashed into the corresponding row in the MBF: each chunk is given to k hash functions to generate k bit positions, and these k bits are set to 1 in the MBF. During the query operation, the new document is separated into chunks and the same procedure is followed up to the setting of the k bits to 1. Then, this row of the MBF is compared with each existing document row. The comparison is done by an AND operation, which produces a vector of bits. If the vector contains many 1s, then the new document is likely copied from an existing document.
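The row construction and the AND comparison of the matrix Bloom filter can be sketched as below. Several details are assumptions made for illustration: chunks are overlapping runs of three words, preprocessing is reduced to lowercasing, the similarity score is the fraction of the new row's set bits that survive the AND, and all function names are hypothetical.

```python
import hashlib


def chunks(text: str, size: int = 3):
    """Overlapping word chunks; stop-word removal and stemming are omitted."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def row_filter(text: str, m: int = 2048, k: int = 3) -> int:
    """Build one MBF row: every chunk sets k bits of an m-bit filter."""
    bits = 0
    for chunk in chunks(text):
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{chunk}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def similarity(new_row: int, existing_row: int) -> float:
    """Fraction of the new row's 1s that are also set in the existing row."""
    shared = bin(new_row & existing_row).count("1")
    return shared / max(1, bin(new_row).count("1"))
```

A score close to 1 suggests that most chunks of the new document also occur in the existing one; the threshold at which a document is flagged is a policy choice that this sketch leaves open.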

Biomedical Data
The Bloom filter plays a vital role in biomedical data engineering too. For example, the genome database of millions of people is very large. A single genome is almost 3.2 GB in size. A database of millions of such genomes is enormous, and searching for a single piece of information takes a huge amount of time. Fortunately, the BF gives the results faster than other existing systems. Therefore, the performance of genome processing is drastically improved by deploying the Bloom filter [40,41]. Moreover, there are numerous biomedical systems which engage BF, for instance, MRI databases. Jackman et al. [40] present an article on ABySS 2.0. It uses the BF for resource-efficient assembly of large genomes. The use of BF reduces the overall memory requirements and makes it possible to assemble large genomes on a single machine. It is used to represent the de Bruijn graph (C-DBG). In the BF, the hash functions map each k-mer to a set of slots. This set of slots is referred to as the bit signature. For the traversal operation, the BF is repeatedly queried to find the successor along the path. Moreover, BF helps in avoiding duplicate sequences of k-mers.
A Bloom filter trie (BFT) [67] structure has also been proposed to store genome data. The counting quotient filter (CQF) is similar to the counting BF with many more features. It stores multiset elements. In the insert operation, the k-mer counter Squeakr reads and parses the input files and then inserts k-mers into a CQF. In the query operation, the CQF returns the number of instances of an element currently in the multiset.
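The traversal step, in which the BF is queried repeatedly to find the successors of a k-mer, can be sketched as follows. This is an illustration of the general idea of a Bloom filter de Bruijn graph and not the ABySS 2.0 implementation; the function names, the DNA alphabet ordering, and the filter sizes are assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and h salted SHA-256 hashes."""

    def __init__(self, m: int = 4096, h: int = 3):
        self.m, self.h, self.bits = m, h, 0

    def _positions(self, item: str):
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def kmers(sequence: str, k: int):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_graph(sequence: str, k: int) -> BloomFilter:
    """Store every k-mer of the sequence; edges are implicit."""
    bf = BloomFilter()
    for kmer in kmers(sequence, k):
        bf.add(kmer)
    return bf


def successors(kmer: str, bf: BloomFilter):
    """Query the four possible extensions of a k-mer."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]
```

No edge list is stored in this sketch: a successor exists if the filter reports the extended k-mer as present, so a false positive shows up as a spurious branch in the graph.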
Mustafa et al. [69] propose two approaches for compression of a colored graph: one is lossless, and the other is lossy compression using BF. The colored de Bruijn graph (cDBG) is constructed using the input sequences and the annotation associated with the k-mers generated from the input sequences. The annotation is represented using a binary matrix, and the BF is used to compress this binary matrix.
Decouchant et al. [70] proposed a filtering approach using BF to classify raw genomic data into privacy-sensitive and non-privacy-sensitive information. When an attack is made on a privacy-sensitive region of the genome, more information about the individual is extracted. An approach called long read filtering uses BF to create a dictionary of every (k, i)-sensitive sequence in a genome. A (k, i)-sensitive sequence means that, in a sequence of k nucleotides, the ith nucleotide is sensitive. The long read filter creates several BFs. The sequence is inserted into the BF, and the BF checks the sequence in a sliding window of size k against the previously defined dictionary. If a match is found, one sensitive nucleotide is detected. Several BFs are used in parallel to increase the throughput.
9.1. Error Correction. In cloud and virtualized environments, data are increasing exponentially. These data are also replicated for load balancing and security purposes. However, this leads to the requirement for fast error correction and data reconciliation, even for very small errors. Motivated by cloud reconciliation problems, the Biff code is proposed by Mitzenmacher and Varghese [34]. Biff codes use invertible Bloom lookup tables (IBLT) [34]. In the IBLT, each key-value pair is inserted into the slots selected by the hash functions. Each slot contains three fields: keysum, valuesum, and count. The keysum field contains the XOR of all the keys that are mapped to that slot. The valuesum field contains the XOR of all the values of the keys mapped to that slot. The count field contains the number of keys mapped to that slot. Biff is used to reconcile differences in data between users. For that, a user constructs its own IBLT and shares that IBLT, along with the hash functions, with the other user with whom it wants to reconcile. After receiving the IBLT, the latter user deletes its own key-value pairs from the received IBLT, so that the remaining entries represent the differences between the two data sets. Moreover, Biff is also used for error correction. Suppose a user sends a message to another user. With the message, it also sends reconciliation information, i.e., an IBLT. After receiving the message, the receiver uses the reconciliation information to find the erroneous positions in the received data. Furthermore, these operations are sped up by parallelizing them.
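The three slot fields and the reconciliation step can be illustrated with a small IBLT sketch. It is a simplified illustration and not the Biff code construction: keys and values are assumed to be non-negative integers, each of the hash functions addresses its own sub-table so that a key occupies distinct slots, and a slot is treated as recoverable when its count is 1 or -1 and its keysum hashes back to that slot. The class and method names are hypothetical.

```python
import hashlib


class IBLT:
    """Invertible Bloom lookup table with [count, keysum, valuesum] slots."""

    def __init__(self, slots_per_hash: int = 16, h: int = 3):
        self.h, self.w = h, slots_per_hash
        self.slots = [[0, 0, 0] for _ in range(h * slots_per_hash)]

    def _positions(self, key: int):
        # One slot per sub-table, so a key maps to h distinct slots.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield i * self.w + int.from_bytes(digest[:8], "big") % self.w

    def _update(self, key: int, value: int, delta: int) -> None:
        for p in self._positions(key):
            slot = self.slots[p]
            slot[0] += delta   # count
            slot[1] ^= key     # keysum
            slot[2] ^= value   # valuesum

    def insert(self, key: int, value: int) -> None:
        self._update(key, value, +1)

    def delete(self, key: int, value: int) -> None:
        self._update(key, value, -1)

    def list_entries(self):
        """Peel recoverable slots; returns (extra, missing). Destructive."""
        extra, missing = {}, {}
        for _ in range(len(self.slots)):  # bounded number of passes
            progress = False
            for p, (count, key, value) in enumerate(self.slots):
                if count in (1, -1) and p in self._positions(key):
                    (extra if count == 1 else missing)[key] = value
                    self._update(key, value, -count)
                    progress = True
            if not progress:
                break
        return extra, missing
```

To reconcile, one user inserts all of its pairs and sends the table; the other user deletes its own pairs from the received table and calls `list_entries`. Pairs held only by the sender appear in the first dictionary and pairs held only by the receiver in the second, provided the number of differences is small relative to the table size.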

Conclusion
The Bloom filter is one of the most extensively used data structures in large-scale computing. However, the false positive is an overhead. The false positive rate can be decreased by increasing the size of the Bloom filter, but the probability of a false positive can never be zero. Thus, there are numerous variants of the Bloom filter that reduce the problem of false positives for specific applications. Moreover, there is still room to modify the Bloom filter, although it has been rigorously experimented with. The Bloom filter is adopted in various fields for membership queries, for instance, search engines. Besides, there are still numerous fields that can adopt the Bloom filter. The Bloom filter is a tiny data structure, yet very powerful. This data structure can change the performance of a system. Thus, studying the Bloom filter is worthwhile irrespective of application area.

Figure 1: Popularity measurement based on publication of Bloom filter.

Figure 4: Theoretical probability of false positive. X-axis represents the number of input items, and Y-axis represents the probability of false positive for a memory size of $m = 2^{32}$.

Figure 3: Conventional Bloom filter with $k = 3$ that illustrates the true positive, false positive, and true negative.

Figure 5: Application of Bloom filter in prominent areas.
3.3.1. P2P Traffic Management. Sasaki and Nakao propose a new Bloom filter called queue Bloom filter (QBF) [32] to introduce a new flow classification in P2P. The use of QBF reduces the memory consumption of the P2P cache. QBF is a Bloom filter organized as a time-series queue. After every fixed time interval, a new Bloom filter is enqueued in the QBF. When the number of Bloom filters exceeds the size of the queue, the oldest Bloom filter is dequeued from the QBF. During the insertion operation, the new elements are inserted into the newest Bloom filter. During the query operation, all Bloom filters are searched. The flow classification process checks for the flows transmitting duplicate content. This flow classification runs on EGRESS. EGRESS is a filter that monitors and restricts the flow of information from one network to another. Here, it uses QBF for flow monitoring. When EGRESS receives P2P packets, it constructs data pieces.
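The time-series queue behaviour of the QBF can be sketched with a bounded deque of Bloom filters. The class names, the `rotate` method that stands in for the fixed time interval, and the sizes are assumptions for illustration and are not taken from Sasaki and Nakao [32].

```python
import hashlib
from collections import deque


class BloomFilter:
    """Minimal Bloom filter with m bits and h salted SHA-256 hashes."""

    def __init__(self, m: int = 1024, h: int = 3):
        self.m, self.h, self.bits = m, h, 0

    def _positions(self, item: str):
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


class QueueBloomFilter:
    """Queue of Bloom filters; the oldest filter expires when the queue is full."""

    def __init__(self, queue_size: int = 3):
        # deque(maxlen=...) drops the oldest filter on overflow.
        self.filters = deque([BloomFilter()], maxlen=queue_size)

    def rotate(self) -> None:
        """Call once per time interval to enqueue a fresh filter."""
        self.filters.append(BloomFilter())

    def add(self, item: str) -> None:
        self.filters[-1].add(item)  # insert into the newest filter only

    def __contains__(self, item: str) -> bool:
        return any(item in f for f in self.filters)  # search all filters
```

Old entries disappear without an explicit delete operation: once the filter that holds a data piece has been dequeued, the piece is no longer reported as present unless it was seen again in a later interval.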
The scheme in [22] models an efficient MDS to retrieve data from cloud data servers. It uses a Bloom filter array called the cloud Bloom filter (CBF) array. CBF consists of a global Bloom filter (GBF), a local Bloom filter (LBF), and a Bloomier matrix filter (BMF).
A Bloom filter trie (BFT) [67] structure is also proposed to store the pan-genome. It uses a colored graph for efficient storage and traversal of genome data. It compresses the colored graph and stores it in a single data structure, which helps in efficient graph traversal. Recently, Pandey et al. [68] proposed a weighted de Bruijn graph to store genome information. It uses Squeakr, which is a k-mer counter. Squeakr is built using the counting quotient filter (CQF).