A simple and space efficient segment tree implementation

The segment tree is an extremely versatile data structure. In this paper, a new array-based implementation of segment trees is proposed. In such an implementation, the structural information associated with the tree nodes can be removed completely. Some primary computational geometry problems, such as stabbing counting queries, the measure of a union of intervals, and the maximum clique size of intervals, are used to demonstrate the efficiency of the new array-based segment tree implementation. Each interval in a set S = {I_1, I_2, ⋯, I_n} of n intervals can be inserted into or deleted from the heap-based segment tree in O(log n) time. All of these primary computational geometry problems can be solved efficiently.


Introduction
The segment tree structure, originally discovered by Bentley [1,9,11], is used as a one-dimensional data structure for intervals whose endpoints are fixed or known a priori. The segment tree is very important in solving some primary computational geometry problems because the sets of intervals stored with the nodes can be structured in any manner convenient for the problem at hand. Therefore, there are many extensions of segment trees that deal with 2- and higher-dimensional objects [2,3,12,13]. The segment tree can also easily be adapted to stabbing counting queries: report the number of intervals containing the query point. Instead of storing a list of the intervals in each node, an integer representing the number of those intervals is stored. A query with a point is answered by adding the integers on one search path. Such a segment tree for stabbing counting queries uses only linear storage, and queries require O(log n) time, so it is optimal. The segment tree structure can also be useful in finding the measure of a set of intervals, that is, the length of the union of a set of intervals. It can also be used to find the maximum clique of a set of intervals [5,7,8,10]. Segment trees are generally known as semi-dynamic data structures: new intervals may only be inserted if their endpoints are chosen from a restricted universe. By using a dynamization technique, van Kreveld and Overmars proposed a concatenable version of the segment tree [4,6]. This structure can be used to answer one-dimensional stabbing queries. In addition to stabbing queries and standard updates (insertion and deletion of segments), the data structure can support split and concatenate operations.
We will discuss the implementation issues of segment trees in this paper. A very simple and space-efficient segment tree implementation is presented.
The organization of the paper is as follows.
In the following four sections, we describe the proposed segment tree implementation. In Section 2, the preliminary knowledge needed to present our implementation is discussed. In Section 3, a heap-based segment tree implementation is proposed; in such an implementation, the structural information associated with the tree nodes can be removed completely. In Section 4, we discuss a simpler non-recursive implementation of a heap-based segment tree. Some concluding remarks are provided in Section 5.

The set S = {I_1, I_2, ⋯, I_n} of n intervals, I_i = [l_i, r_i], is represented by a data array D(S), whose entries correspond to the endpoints l_i or r_i and are sorted in non-decreasing order. This sorted array is denoted x[0..N]. In the following discussion, the indexes in the range [0, N] are used to refer to the entries in the sorted array x[0..N]. A comparison involving a point q ∈ R and an index i, 0 ≤ i ≤ N, is performed in the original domain R. For instance, q < i is interpreted as q < x[i]. Consider the partitioning of the real line induced by x[0..N]. The regions of this partitioning are called elementary intervals. Thus, the elementary intervals are, from left to right,

[x[0], x[1]), [x[1], x[2]), ⋯, [x[N − 1], x[N]).

That is, the list of elementary intervals consists of the half-open intervals between two consecutive endpoints x[i] and x[i + 1]. The segment tree for the set x[0..N] is a rooted augmented binary search tree, in which each node v is usually associated with the following information:

v.b, v.e: the interval of indexes [v.b, v.e] associated with v,
v.key: the split key,
v.left, v.right: the left and right tree pointers,
v.aux: an auxiliary pointer.   (1)

Here, v.b and v.e are used to represent [v.b, v.e], an interval of indexes from v.b to v.e. The key v.key splits the interval [v.b, v.e] into two subintervals, each of which is associated with one child of v. The two tree pointers v.left and v.right point to the left and right subtrees, respectively. v.aux is an auxiliary pointer to an augmented data structure.
Given integers s and t, with 0 ≤ s < t ≤ N, the corresponding segment tree T(s, t) can be built recursively as follows.
Algorithm 2.1: build(s, t)

In the algorithm, a new node v is created first. The parameters v.b and v.e associated with node v are then set to s and t, which define an interval [v.b, v.e], called the standard interval associated with node v. The standard interval associated with a leaf node is also called an elementary interval.
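To make Algorithm 2.1 concrete, here is a minimal Python sketch of a pointer-based build. It assumes the split key is v.key = ⌊(v.b + v.e)/2⌋, which reproduces the node numbering used in Fig. 1; the class name `Node`, the extra `num` field (the Fig. 1 node number) and the helper `find` are our own illustrative additions, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    b: int                      # left index of the standard interval [b, e]
    e: int                      # right index of the standard interval [b, e]
    num: int                    # node number as in Fig. 1 (illustrative only)
    key: Optional[int] = None   # split key; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    aux: list = field(default_factory=list)  # augmented data (here: a list)


def build(s: int, t: int, num: int = 1) -> Node:
    """Build T(s, t) recursively; assumes 0 <= s < t."""
    v = Node(b=s, e=t, num=num)
    if t - s > 1:                       # not yet an elementary interval
        v.key = (s + t) // 2            # assumed split rule
        v.left = build(s, v.key, 2 * num)
        v.right = build(v.key, t, 2 * num + 1)
    return v


def find(v: Optional[Node], num: int) -> Optional[Node]:
    """Locate a node by its number (linear search, for demonstration)."""
    if v is None or v.num == num:
        return v
    return find(v.left, num) or find(v.right, num)


root = build(0, 13)
print((find(root, 9).b, find(root, 9).e))    # (1, 3)
print((find(root, 11).b, find(root, 11).e))  # (4, 6)
```

With this split rule, node 9 of T(0, 13) has standard interval [1, 3] and node 11 has [4, 6], matching the paths described for Fig. 1.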
Definition 1 Let b and e be two integers with 0 ≤ b < e ≤ N. A node v in the segment tree T(0, N) is said to be in the canonical covering of the interval [b, e] if its associated standard interval satisfies [v.b, v.e] ⊆ [b, e], while that of its parent node does not.
It is obvious that if a node v is in the canonical covering, then its sibling node u, the node with the same parent node as v, is not, for otherwise the common parent node would have been in the canonical covering. Thus, at each level of the segment tree, at most two nodes belong to the canonical covering of an interval [b, e], and for each interval [b, e] the number of nodes in its canonical covering is O(log N).

Algorithm 2.2: insert(b, e, v)

The insertion of interval [b, e] into segment tree T(0, N) corresponds to a tour in T(0, N) with the following general structure: a (possibly empty) initial path, called P_IN, from the root to a node v*, called the fork, from which two (possibly empty) paths P_l and P_r issue. Either the interval being inserted is allocated entirely to the fork (in which case P_l and P_r are both empty), or all right children of nodes of P_l that are not themselves on P_l, as well as all left children of nodes of P_r that are not themselves on P_r, identify the fragmentation of [b, e]. See Fig. 1 for an illustration. In Fig. 1, each node has a node number, assigned as follows: the root node is numbered 1, and if a node is numbered i, then its left and right children are numbered 2i and 2i + 1, respectively. In the insertion of interval [2, 5] into segment tree T(0, 13), the initial path P_IN runs from the root to node 2, and node 2 is the fork. The path P_l goes from the fork node 2 to node 9, and the path P_r goes from the fork node 2 to node 11. Node 19 is allocated to the interval as the right child of node 9 on the path P_l, and nodes 10 and 22 are allocated to the interval as the left children of nodes 5 and 11, respectively, on the path P_r.
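The tour of Algorithm 2.2 can be sketched in Python by reducing it to its traversal: instead of modifying an auxiliary structure, the function below returns the node numbers of the canonical covering. It assumes the same split rule ⌊(v.b + v.e)/2⌋ as the Fig. 1 numbering, and the name `canonical_cover` is hypothetical.

```python
def canonical_cover(b: int, e: int, s: int, t: int, num: int = 1) -> list[int]:
    """Node numbers of the canonical covering of [b, e] in T(s, t).

    Node `num` has standard interval [s, t]; children of node i are 2i, 2i + 1.
    Invariant: [b, e] overlaps [s, t] with positive length.
    """
    if b <= s and t <= e:           # [s, t] is contained in [b, e]: assign here
        return [num]
    key = (s + t) // 2              # assumed split key, as in Fig. 1
    nodes = []
    if b < key:                     # [b, e] overlaps the left child [s, key]
        nodes += canonical_cover(b, e, s, key, 2 * num)
    if key < e:                     # [b, e] overlaps the right child [key, t]
        nodes += canonical_cover(b, e, key, t, 2 * num + 1)
    return nodes


print(canonical_cover(2, 5, 0, 13))  # [19, 10, 22]
```

For [2, 5] in T(0, 13) this reproduces the three canonical covering nodes 19, 10 and 22 of Fig. 1; replacing `return [num]` by an update of a counter or of `v.aux` gives the actual insertion.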
Assigning [b, e] to a node v can take different forms, depending upon the requirements of the application. Frequently, all we need to know is the cardinality of the set of intervals allocated to a given node v. This can be managed by a single nonnegative integer parameter v.cnt denoting this cardinality, so that the allocation of [b, e] to v becomes v.cnt ← v.cnt + 1. In other applications, we need to preserve the identity of the intervals allocated to a node v. Then the interval I = [b, e] is inserted into the auxiliary structure associated with node v to indicate that the standard interval of v is in the canonical covering of I. If the auxiliary structure v.aux associated with node v is an array, the operation assign [b, e] to v can be implemented as v.aux[i++] = I.
The insertion algorithm described above can be used to represent a set S of n intervals in a segment tree by performing the insertion operation n times, once for each interval. As each interval I has at most O(log n) nodes in its canonical covering, at most O(log n) assign operations are performed per insertion, so the total amount of space required in the auxiliary data structures for all the canonical covering nodes is O(n log n).
Deletion of an interval [b, e] can be done similarly. The assign operation is replaced by its inverse operation remove, which removes the interval from the auxiliary structure associated with each canonical covering node. Note that correctness is guaranteed only for deletions of previously inserted intervals.
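Combining the counter form of assign with deletion gives a segment tree for stabbing counting queries: insert and delete adjust v.cnt on the canonical covering nodes, and a query adds the counters on one root-to-leaf search path. The Python sketch below is our own illustration, not the paper's code. It identifies nodes by their Fig. 1 numbers in a dictionary, and it assumes intervals are treated as half-open [l, r) with l < r taken from the endpoint list, because elementary intervals are half-open; a query point equal to a right endpoint is therefore not counted.

```python
from bisect import bisect_left


class CountingSegmentTree:
    """Segment tree T(0, N) whose nodes store only v.cnt (Section 2 style)."""

    def __init__(self, endpoints):
        self.x = sorted(set(endpoints))   # x[0..N]
        self.N = len(self.x) - 1
        self.cnt = {}                     # node number -> v.cnt

    def _modify(self, b, e, s, t, num, k):
        if b <= s and t <= e:             # node is in the canonical covering
            self.cnt[num] = self.cnt.get(num, 0) + k
            return
        key = (s + t) // 2
        if b < key:
            self._modify(b, e, s, key, 2 * num, k)
        if key < e:
            self._modify(b, e, key, t, 2 * num + 1, k)

    def insert(self, l, r):
        b, e = bisect_left(self.x, l), bisect_left(self.x, r)
        self._modify(b, e, 0, self.N, 1, +1)   # assign: v.cnt <- v.cnt + 1

    def delete(self, l, r):
        # Correct only for an interval that was previously inserted.
        b, e = bisect_left(self.x, l), bisect_left(self.x, r)
        self._modify(b, e, 0, self.N, 1, -1)   # remove: v.cnt <- v.cnt - 1

    def stab_count(self, q):
        """Add the counters on the search path from the root to q's leaf."""
        if not self.x[0] <= q < self.x[-1]:
            return 0
        s, t, num, total = 0, self.N, 1, 0
        while True:
            total += self.cnt.get(num, 0)
            if t - s == 1:                # reached an elementary interval
                return total
            key = (s + t) // 2
            if q < self.x[key]:
                t, num = key, 2 * num
            else:
                s, num = key, 2 * num + 1


T = CountingSegmentTree([1, 3, 4, 7, 9])
for l, r in [(1, 4), (3, 9), (4, 7)]:
    T.insert(l, r)
print(T.stab_count(3.5), T.stab_count(5), T.stab_count(9))  # 2 2 0
T.delete(3, 9)
print(T.stab_count(5))  # 1
```

Only one integer per node is stored, and a query touches one node per level, in line with the linear-storage, O(log n)-query bound mentioned in the Introduction.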

A Heap Based Implementation
It is straightforward to see that the segment tree built by the algorithm build(0, N) described above is balanced and has height $\lceil \log_2 N \rceil$. If a heap is used to store the segment tree nodes, then the structural information associated with the tree nodes can be removed completely. The heap mentioned above will be defined shortly; it differs somewhat from the heap in the heap sort algorithm, where a heap order is defined.
Definition 2 A nearly complete binary tree, or heap, can be defined as follows.
• The depth of a node v in a binary tree is the length (number of edges) of the path from the root to v.
• The height (or depth) of a binary tree is the maximum depth of any node, or −1 if the tree is empty. Any binary tree can have at most $2^d$ nodes at depth d.
• A complete binary tree of height h is a binary tree that contains exactly $2^d$ nodes at depth d, 0 ≤ d ≤ h. In this tree, every node at depth less than h has two children. The nodes at depth h are the leaves. The relationship between n (the number of nodes) and h (the height) is given by $n = \sum_{d=0}^{h} 2^d = 2^{h+1} - 1$, and thus $h = \log_2(n + 1) - 1$.
• A nearly complete binary tree of height h is a binary tree of height h in which (1) There are $2^d$ nodes at depth d for 0 ≤ d ≤ h − 1.
(2) The nodes at depth h are as far left as possible.
(3) The relationship between the height and the number of nodes in a nearly complete binary tree is given by $2^h \le n \le 2^{h+1} - 1$, or $h = \lfloor \log_2 n \rfloor$.
• A heap is a nearly complete binary tree T stored in breadth-first order as an implicit data structure in an array A, where
(1) A[1] is the root of T.
(2) The left and right children of A[i] are A[2i] and A[2i + 1], respectively, if they exist.
(3) The parent of A[i], i > 1, is A[⌊i/2⌋]; if i is even, A[i] is a left child and A[i + 1] is its right sibling.
Definition 3 A heap-based segment tree T(0, N) is defined as an array tree[1..2N − 1] of tree node elements satisfying the following:
• The information associated with a tree node v is
v.cnt: the number of intervals allocated to node v,
v.aux: an augmented data structure.   (2)
• The index i of node tree[i] is called its node number, 1 ≤ i ≤ 2N − 1.
• The N leaf nodes corresponding to the N elementary intervals are stored in tree[N..2N − 1] in increasing order of their left endpoints. In other words, the node tree[N + i] corresponds to the elementary interval [i, i + 1], 0 ≤ i < N.
• The parent node of node tree[i] is tree[⌊i/2⌋] for all 1 < i ≤ 2N − 1. The node tree[1] is the root of the heap-based segment tree. For each non-leaf node i, 1 ≤ i < N, its left and right children are 2i and 2i + 1, respectively.
For example, Fig. 2 shows the heap-based segment tree T(0, 13). It can be seen from Fig. 2 and the definition of a heap-based segment tree that there are three kinds of nodes in the tree: complete binary tree nodes, nearly complete binary tree nodes, and leaf nodes.
A complete binary tree node v, called a C node, is a node such that the subtree rooted at v is a complete binary tree. A nearly complete binary tree node v (yellow nodes in Fig. 2), called a Y node, is a node such that the subtree rooted at v is a nearly complete binary tree but not a complete binary tree. A leaf node v (green nodes in Fig. 2) corresponds to a leaf of the tree; the elementary interval [i, i + 1] is associated with the leaf node N + i. The N leaf nodes are also C nodes. There are a total of 2N − 1 nodes in T(0, N): N leaf nodes and N − 1 non-leaf nodes. Furthermore, these three kinds of nodes satisfy the following properties.
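As a cross-check on Fig. 2, the standard interval of any node can be recovered by brute force directly from Definition 3: walk the subtree level by level and keep the node numbers in [N, 2N − 1], which are exactly the leaves. The Python sketch below does this; the function name is hypothetical, and this O(N) enumeration is only a checking aid, not the paper's O(1) method. Leaves found at different depths are reported as separate pieces, which is how the two-part intervals of Y nodes appear.

```python
def standard_interval(x: int, N: int) -> list[tuple[int, int]]:
    """Standard interval piece(s) of node x in a heap-based segment tree T(0, N).

    Brute force: at each depth, the subtree of x occupies the contiguous node
    range [lo, hi]; its leaves there are the numbers in [N, 2N - 1], and leaf
    N + i stands for the elementary interval [i, i + 1].
    """
    pieces = []
    lo, hi = x, x                       # node range of the subtree at this depth
    while lo <= 2 * N - 1:
        leaves = range(max(lo, N), min(hi, 2 * N - 1) + 1)
        if leaves:                      # leaves of the subtree at this depth
            pieces.append((leaves[0] - N, leaves[-1] - N + 1))
        lo, hi = 2 * lo, 2 * hi + 1     # descend one level
    return sorted(pieces)


for node in (1, 2, 3, 6):
    print(node, standard_interval(node, 13))
```

For N = 13, nodes 1, 3 and 6 yield [(0, 3), (3, 13)], [(0, 3), (11, 13)] and [(0, 1), (11, 13)], while the C node 2 yields the single interval [(3, 11)].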
Theorem 1 Let T(0, N) be a heap-based segment tree whose nodes are stored in the array tree[1..2N − 1] as in Definition 3. Then
(1) If node x is a C node, let h(x) be the height of the subtree rooted at x, and let l(x) and r(x) be the leftmost and rightmost nodes (leaves) of that subtree. Then $h(x) = \lceil \log_2(N/x) \rceil$, $l(x) = x \cdot 2^{h(x)}$ and $r(x) = (x + 1) \cdot 2^{h(x)} - 1$, and thus the standard interval associated with node x is $[l(x) - N,\ r(x) - N + 1]$.
(2) Let t(N) be the number of trailing zeros of N in its binary expression. Then the lowest Y node of T(0, N) is the node $y(N) = \lfloor N / 2^{1+t(N)} \rfloor$, and all of the Y nodes of T(0, N) are on the path from the root node 1 to node y(N).
Proof of (2). In the case of $N = 2^k$, the segment tree T(0, N) is a complete binary tree of height k, and t(N) = k. It follows that y(N) = 0, and thus there is no Y node in T(0, N). The claim is true for this trivial case.
In the general case, N is not a power of 2, so $N < 2^{h(N)}$, where $h(N) = \lceil \log_2 N \rceil$, and the segment tree T(0, N) is a nearly complete binary tree of height h(N), as shown in Fig. 3. The nodes on the left spine of the tree are numbered $1, 2, 2^2, \cdots, 2^{h(N)}$, and the nodes on the right spine of the tree are numbered $1, 3, 7, \cdots, 2^{h(N)} - 1$. The leaves are distributed at depths h(N) and h(N) − 1. It is clear that a node x is a Y node if and only if its subtree contains a leaf u at depth h(N) and a leaf v at depth h(N) − 1. The parent of x contains these two leaves as well, and thus it is also a Y node. It follows by induction that the nodes on the path from root 1 to x are all Y nodes. Node 2N − 1 is the rightmost leaf at depth h(N), and node N is the leftmost leaf at depth h(N) − 1. At depth h(N) − 1, the subtree of a Y node x covers a contiguous range of node numbers containing both a non-leaf node (numbered at most N − 1) and a leaf (numbered at least N); hence it contains nodes N − 1 and N, and therefore also node 2N − 1, the right child of N − 1. Let node y(N) be the lowest common ancestor of nodes N and 2N − 1. It follows that the Y nodes of the segment tree T(0, N) are all on the path from root 1 to y(N). Since node N − 1 is the parent of node 2N − 1, y(N) is also the lowest common ancestor of nodes N and N − 1. It is readily seen that if node u is the parent of node v, then u is exactly v shifted right by 1 bit in its binary expression. It follows that y(N) is exactly the longest common prefix of N and N − 1 in their binary expressions. Let t(N) be the number of trailing zeros of N and $n = \lfloor \log_2 N \rfloor$; then the number of trailing ones of N − 1 is also t(N), and the binary expressions of N and N − 1 must be

$N = (b_n b_{n-1} \cdots b_{t(N)+1}\, 1\, \underbrace{0 \cdots 0}_{t(N)})_2, \qquad N - 1 = (b_n b_{n-1} \cdots b_{t(N)+1}\, 0\, \underbrace{1 \cdots 1}_{t(N)})_2.$

It follows that $y(N) = (b_n b_{n-1} \cdots b_{t(N)+1})_2$; in other words, $y(N) = \lfloor N / 2^{1+t(N)} \rfloor$. The proof is complete.

It follows from Theorem 1 that if node x is a Y node, then its leaves lie at two depths: the leftmost leaf at the deeper level is $l(x) = x \cdot 2^{h(x)}$, and the rightmost leaf at the shallower level is $r'(x) = (x + 1) \cdot 2^{h(x) - 1} - 1$, with $h(x) = \lceil \log_2(N/x) \rceil$. In this case, the standard interval associated with node x is no longer a single interval, but usually two separated intervals [0, r′(x) − N + 1] and [l(x) − N, N]. For example, in the case of N = 13 (see Fig. 2), the three Y nodes of T(0, N) are 1, 3 and 6. It is readily seen that l(1) = 16, l(3) = l(6) = 24, and r′(1) = r′(3) = 15, r′(6) = 13. The intervals associated with nodes 1, 3 and 6 are {[0, 3], [3, 13]}, {[0, 3], [11, 13]} and {[0, 1], [11, 13]}, respectively.
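It follows from Theorem 1 that y(N) is the key number for classifying the nodes: a node x is a Y node if and only if x is a prefix of y(N) in binary expression, that is, $x = \lfloor y(N) / 2^{k} \rfloor$ for some integer k ≥ 0; otherwise, x is a C node. Since $2^{t(N)} = N \,\&\, (-N)$ in two's-complement arithmetic, it also follows from Theorem 1 that $y(N) = \lfloor N / (2\,(N \,\&\, (-N))) \rfloor$, where & is the bitwise AND operation of two numbers.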
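The bit manipulations above translate directly into code. This Python sketch (function names hypothetical) computes y(N) from N & −N, relying on Python's two's-complement semantics for negative integers, and classifies a node as a Y node when it is a binary prefix of y(N).

```python
def lowest_y_node(N: int) -> int:
    """y(N) = N >> (1 + t(N)), where 2**t(N) = N & -N (Theorem 1)."""
    t = (N & -N).bit_length() - 1      # number of trailing zeros of N
    return N >> (1 + t)


def is_y_node(x: int, N: int) -> bool:
    """x is a Y node iff x is a binary prefix of y(N); otherwise a C node."""
    y = lowest_y_node(N)
    if y == 0 or x > y:                # no Y nodes, or x cannot be a prefix
        return False
    return (y >> (y.bit_length() - x.bit_length())) == x


print(lowest_y_node(13))                                # 6
print([x for x in range(1, 26) if is_y_node(x, 13)])   # [1, 3, 6]
```

Both functions use only a constant number of word operations, so the classification of a node costs O(1) time, as claimed for the heap-based representation.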
Based on Theorem 1, the structural information of any node x in a heap-based segment tree T(0, N) can now be computed in O(1) time as follows.

Input: a C node x of T(0, N).
Output: the left end of the standard interval of x.
h ← ⌈log₂(N/x)⌉.
return x·2^h − N.

Input: a C node x of T(0, N).
Output: the right end of the standard interval of x.
h ← ⌈log₂(N/x)⌉.
return (x + 1)·2^h − N.
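A Python sketch of these O(1) computations follows (function names hypothetical). It evaluates h = ⌈log₂(N/x)⌉ exactly with integer arithmetic rather than floating-point logarithms, uses the two returns above for C nodes, and, following the discussion after Theorem 1, returns the two pieces [0, r′(x) − N + 1] and [l(x) − N, N] for Y nodes.

```python
def subtree_height(x: int, N: int) -> int:
    """h(x) = ceil(log2(N / x)), i.e. the smallest h with x * 2**h >= N."""
    m = -(-N // x)                       # ceil(N / x) using integers only
    return (m - 1).bit_length()          # smallest h with 2**h >= m


def standard_interval_o1(x: int, N: int) -> list[tuple[int, int]]:
    """Standard interval piece(s) of node x in T(0, N) via Theorem 1."""
    h = subtree_height(x, N)
    t = (N & -N).bit_length() - 1        # trailing zeros of N
    y = N >> (1 + t)                     # lowest Y node y(N)
    is_y = 0 < x <= y and (y >> (y.bit_length() - x.bit_length())) == x
    if not is_y:                         # C node: [l(x) - N, r(x) - N + 1]
        return [(x * 2**h - N, (x + 1) * 2**h - N)]
    l = x * 2**h                         # leftmost leaf at the deeper level
    r1 = (x + 1) * 2 ** (h - 1) - 1      # rightmost leaf at the shallower level
    return [(0, r1 - N + 1), (l - N, N)]


print(standard_interval_o1(2, 13))  # [(3, 11)]
print(standard_interval_o1(3, 13))  # [(0, 3), (11, 13)]
```

Integer arithmetic avoids the rounding problems that `math.log2` could cause when N/x is an exact power of two.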
The following three primary computational geometry problems are used to demonstrate the efficiency of the new heap based segment tree implementation.
• Stabbing Counting Queries: Given a set S = {I_1, I_2, ⋯, I_n} of n intervals, each of which is represented by I_i = [l_i, r_i], and a query point q, count all those intervals containing q; that is, find the subset F ⊆ S such that F = {I_i | l_i ≤ q ≤ r_i}. The problem is to find |F|.
• Measure of Union of Intervals: Given a set S = {I_1, I_2, ⋯, I_n} of n intervals, let U = I_1 ∪ I_2 ∪ ⋯ ∪ I_n. The problem is to find the measure of U.
• Maximum Clique Size of Intervals: Given a set S = {I_1, I_2, ⋯, I_n} of n intervals, a clique is a subset C_I ⊆ S such that the common intersection of the intervals in C_I is non-empty, and a maximum clique is a clique of maximum size. That is, $\bigcap_{I_i \in C_I} I_i \neq \emptyset$ and |C_I| is maximized. The problem is to find the maximum size |C_I|.

The three problems are to be solved simultaneously by using a segment tree to store a set S = {I_1, I_2, ⋯, I_n} of n intervals. In a heap-based segment tree T(0, N), structural information is no longer maintained; only application-related information is associated with the tree nodes. For the three problems to be solved, the tree nodes are associated with the following information:

v.cnt: the number of intervals assigned to the node,
v.uni: the measure of the union of intervals assigned to the node,
v.clq: the maximum clique size of intervals assigned to the node.   (7)

Since there is no structural information to be maintained, the heap-based segment tree is not built explicitly by a procedure like Algorithm 2.1. The important step is to insert the interval set S into the segment tree T(0, N), that is, to call the following algorithm once for each interval of S.

Algorithm 3.5: insert(b, e, v)

If the next canonical covering node of [b, e] is also a right child of a node on the path P_l, then it must be located on the path from the root to node l + 1 (see Fig. 4). The search can now be moved up to the parent node of l + 1. The movement of node r is totally symmetric to the movement of node l. The search stops when l > r.

In the above algorithm, modify(v, k) is invoked to assign the interval [b, e] to the canonical covering node v. The parameter k is +1 in algorithm insert() and −1 in algorithm delete().

Algorithm 4.2: modify(v, k)
Input: an integer k, either +1 or −1, and a node v of T(0, N).
change(v, k).
update(v).

To complete the insertion, the information associated with the nodes on the paths P_l and P_r must also be updated. This is done at the end of the algorithm by performing two calls to the following algorithm.

Figure 4: Search for canonical covering nodes
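The pieces of this section can be assembled into one heap-based structure. The Python sketch below is our own illustration under stated assumptions, not the paper's code: class and method names are hypothetical; intervals are treated as half-open [l, r) with endpoints from x, so intervals that only touch at a point do not form a clique; and node lengths are precomputed in an extra `length` array for brevity, whereas the paper derives standard intervals from Theorem 1. The search uses two indices l and r that move up the heap and stop when l > r, modify is change followed by update, and the two boundary paths are refreshed at the end.

```python
from bisect import bisect_left, bisect_right


class HeapSegmentTree:
    """Heap-based segment tree over x[0..N]: nodes 1..2N-1, leaf N+i = [x[i], x[i+1])."""

    def __init__(self, x):
        self.x = sorted(set(x))
        N = self.N = len(self.x) - 1
        self.cnt = [0] * (2 * N)              # v.cnt
        self.uni = [0] * (2 * N)              # v.uni
        self.clq = [0] * (2 * N)              # v.clq
        self.length = [0] * (2 * N)           # measure of each node's leaves
        for i in range(N):
            self.length[N + i] = self.x[i + 1] - self.x[i]
        for v in range(N - 1, 0, -1):
            self.length[v] = self.length[2 * v] + self.length[2 * v + 1]

    def _update(self, v):
        """Recompute uni and clq of node v from its counter and children."""
        if v >= self.N:                       # leaf node
            self.uni[v] = self.length[v] if self.cnt[v] else 0
            self.clq[v] = self.cnt[v]
        else:
            kids = (2 * v, 2 * v + 1)
            self.uni[v] = (self.length[v] if self.cnt[v]
                           else sum(self.uni[c] for c in kids))
            self.clq[v] = self.cnt[v] + max(self.clq[c] for c in kids)

    def _modify_node(self, v, k):             # modify(v, k): change, then update
        self.cnt[v] += k
        self._update(v)

    def _modify(self, l, r, k):
        lo = self.N + bisect_left(self.x, l)       # leaf of first elementary interval
        hi = self.N + bisect_left(self.x, r) - 1   # leaf of last elementary interval
        a, b = lo, hi
        while a <= b:                         # bottom-up search; stops when a > b
            if a % 2 == 1:                    # right child: in the covering
                self._modify_node(a, k)
                a += 1
            if b % 2 == 0:                    # left child: in the covering
                self._modify_node(b, k)
                b -= 1
            a, b = a // 2, b // 2
        for v in (lo // 2, hi // 2):          # refresh the two boundary paths
            while v >= 1:
                self._update(v)
                v //= 2

    def insert(self, l, r):
        self._modify(l, r, +1)

    def delete(self, l, r):                   # only for previously inserted intervals
        self._modify(l, r, -1)

    def measure(self):
        return self.uni[1]

    def max_clique(self):
        return self.clq[1]

    def stab_count(self, q):
        if not self.x[0] <= q < self.x[-1]:
            return 0
        v, total = self.N + bisect_right(self.x, q) - 1, 0
        while v >= 1:                         # add counters from leaf to root
            total += self.cnt[v]
            v //= 2
        return total


T = HeapSegmentTree([1, 3, 4, 7, 9])
for l, r in [(1, 4), (3, 9), (4, 7)]:
    T.insert(l, r)
print(T.measure(), T.max_clique(), T.stab_count(5))  # 8 2 2
```

No node stores its interval or child pointers; every structural relation is recovered from the node number, which is the point of the heap-based representation.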