With the growth in popularity and complexity of streaming applications, there is a rising need for sophisticated analyses of the massive, high-speed data such applications generate. These analyses often must be performed in near real time, using limited system resources. Under such conditions, it is important to strike an appropriate balance between processing efficiency and the accuracy of the produced results. A common technique is to filter the stream with suitable conditions so that the resulting data size is manageable while the analyses remain accurate.
The work presented in this thesis focuses on a number of complex filtering techniques that are of interest in data stream processing in general and in network traffic monitoring in particular. These techniques allow the analyst to define a filtering condition that is better suited to the particular query at hand than simple uniform random sampling.
First, we propose a single operator that captures the common evaluation pattern of sampling queries; it can be specialized to implement a wide variety of sophisticated stream sampling algorithms within an operational data stream management system, while scaling in performance to line speeds. Additionally, we propose a flow sampling mechanism that integrates the logic of flow aggregation and flow sampling into one procedure operating directly on IP traffic.
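As an illustration only (the thesis' operator and its integration into a data stream management system are more general), the idea of one generic operator hosting many sampling algorithms can be sketched in Python. The names `sampling_operator` and `make_reservoir` are hypothetical; the specialization shown is classic uniform reservoir sampling, one of the many algorithms such an operator could host:

```python
import random

def sampling_operator(stream, state, process):
    # Generic per-record operator: all algorithm-specific logic lives in
    # the user-supplied `process(state, record)` callback; the operator
    # itself only drives the stream and returns the final state.
    for record in stream:
        process(state, record)
    return state

def make_reservoir(k):
    # Specialization: uniform reservoir sampling of size k.
    state = {"sample": [], "seen": 0, "k": k}
    def process(st, rec):
        st["seen"] += 1
        if len(st["sample"]) < st["k"]:
            st["sample"].append(rec)          # fill the reservoir first
        else:
            j = random.randrange(st["seen"])  # replace with prob. k/seen
            if j < st["k"]:
                st["sample"][j] = rec
    return state, process

state, process = make_reservoir(10)
result = sampling_operator(range(1000), state, process)
# result["sample"] is a uniform sample of 10 records from the stream
```

Swapping in a different `process` callback (e.g. weighted or stratified sampling) reuses the same operator unchanged, which is the point of factoring the evaluation this way.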
Next, we introduce the notion of the inverse distribution for massive data streams, and present algorithms that draw a uniform sample from the inverse distribution in the presence of inserts and deletes to the stream; such a sample can be used for a variety of summarization and filtering/mining tasks.
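To make the notion concrete, a draw from the inverse distribution is a uniform draw over the *distinct* values currently in the stream, regardless of how frequently each occurs; the following sketch (with hypothetical names, using exact counts rather than the small-space streaming algorithms the thesis develops) shows the semantics under inserts and deletes:

```python
import random
from collections import Counter

def inverse_sample(updates, rng=random):
    # Maintain exact multiplicities under inserts (+1) and deletes (-1).
    counts = Counter()
    for item, delta in updates:
        counts[item] += delta
        if counts[item] == 0:
            del counts[item]  # item deleted back out of the stream
    # A draw from the inverse distribution: every value with nonzero
    # frequency is equally likely, independent of its frequency.
    return rng.choice(list(counts)), counts

updates = [("a", +1), ("b", +1), ("a", +1), ("c", +1), ("c", -1)]
item, counts = inverse_sample(updates)
# item is "a" or "b" with equal probability, even though "a" occurs
# twice as often; "c" was deleted and cannot be drawn.
```

The streaming versions must achieve the same sampling semantics without storing the full `counts` table, which is what makes the presence of deletes challenging.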
Another contribution of this thesis is the development of a filter join operator, which makes it feasible to evaluate, in an efficient, stable, and accurate manner, a common type of join query that searches high-speed data streams for records matching dynamic criteria. We also present analyses of query transformations that expose filter join opportunities in conventional join queries.
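The intuition behind such an operator can be sketched as follows: rather than a full join, the high-speed stream is reduced with a cheap membership test against the current join criteria, which may change while the stream is running. This is a minimal illustration with hypothetical names, not the thesis' operator itself:

```python
def filter_join(stream, keys):
    # `keys` holds the current join criteria (e.g. a set of suspect IP
    # addresses) and is a live, mutable set, so the criteria are
    # dynamic: updating `keys` affects records seen afterwards.
    # Each record costs one set lookup instead of a full join probe.
    for record in stream:
        if record[0] in keys:
            yield record

suspects = {"10.0.0.1"}
packets = [("10.0.0.1", 1), ("10.0.0.2", 2), ("10.0.0.1", 3)]
matched = list(filter_join(packets, suspects))
# only the two records keyed by "10.0.0.1" pass the filter
```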
Finally, we study the problem of matching regular expressions whose matches can span multiple data records in a stream, in the presence of stream quality problems such as duplicates and out-of-order records; we present a number of algorithms that match regular expressions across multiple data stream records without stream reassembly, by maintaining partial state of the data in the stream.
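The core idea of matching without reassembly, carrying the automaton's partial state from record to record rather than buffering and re-concatenating the payload, can be illustrated for the simple case of an in-order stream and a plain substring pattern (a degenerate regular expression). The names are hypothetical, and the out-of-order and duplicate handling the thesis addresses is omitted:

```python
def make_dfa(pattern):
    # KMP-style automaton: the state is the length of the longest
    # pattern prefix matching a suffix of the input seen so far.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    def step(state, ch):
        while state and ch != pattern[state]:
            state = failure[state - 1]
        if ch == pattern[state]:
            state += 1
        return state
    return step

def match_across_records(records, pattern):
    # Carry the automaton state across record boundaries instead of
    # reassembling the stream; a match may span several records.
    step = make_dfa(pattern)
    state = 0
    for rec in records:
        for ch in rec:
            state = step(state, ch)
            if state == len(pattern):
                return True
    return False

spans = match_across_records(["xxa", "bcy"], "abc")  # "abc" spans records
```

The small integer `state` is the only thing kept between records, which is what makes the approach attractive at line speed; a full regular expression would carry an NFA state set instead of a single prefix length.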
The ideas presented in this thesis are motivated by actual practical problems that arise in data stream processing, and are further validated by the presented experimental studies.