ALGORITHM FOR GENERATING TRAIN CALENDAR TEXTS

The article describes a way of generating train calendar texts for compiling the annual timetable in the Czech Republic. Based on an analysis of the types of calendar texts that appear in various print outputs, a heuristic algorithm was designed to generate a text from a set of calendar days. The algorithm is part of an application that also provides a tool for defining the calendar text by means of a mask of sub-periods and the calendars to be displayed in them. The algorithm was tested on real timetable data. In most cases, it gives the same or better results than the previously used tools; in several cases, however, the user can obtain a better result. The described algorithm is part of a program used for compiling the train timetable in the Czech Republic.


INTRODUCTION
One of the basic attributes of a train is its calendar, which contains the days on which the train operates. When the annual timetable is compiled, the calendar covers a period of approximately one year. Passengers and rail workers learn about restrictions on a train's running through a textual formulation that usually is not a simple list of days but a shorter and more meaningful text.
The role of a modern information system is not only to record the set of train operation days, but also to provide a corresponding textual representation that the user does not have to modify for the output sets.
The article describes the possibilities of generating train calendar texts for designing the annual timetable in the Czech Republic.

TYPES OF CALENDAR TEXTS
The annual train timetable diagram in the Czech Republic is compiled using the KANGO information system [1]. The train calendar is part of the train data that the railway undertaking enters in the KANGO-Vlak module [2].
The annual timetable is in most cases valid from a specified Sunday in December of one year to a specified Saturday in December of the following year, usually in the second week of December.
The following examples of calendar texts refer to the 2008/2009 timetable period that was valid from December 14, 2008 to December 12, 2009.
The calendar texts appear in various outputs of the KANGO system. The main ones are the following:
- Book timetable: the timetable in the form of a book for passengers. It contains passenger trains only. It is created in the KANGO-GVD program and requires adjustment by the user before printing. The current version in PDF format is available on the Web [3].
- Marshalling plan for freight trains. It is printed from the KANGO-Vlak program without any further adjustment.
- Overview of restrictions on running of trains: an annex to the order to implement the train transport diagram. It includes calendar texts of both passenger and freight trains and is intended for official use only. It is printed from the KANGO-Vlak program, usually without any further adjustment.
In addition to individual days of the year, the calendar texts contain symbols of predefined calendars. They are listed in Table 1. Each day of the year is written in the calendar text as the day of the month in Arabic numerals and the month in Roman numerals, each followed by a period, such as 23.IV.
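The day format described above is easy to produce programmatically. The following minimal Python sketch uses a hypothetical function `format_day`; it corresponds to the textual representation f(d) defined later in the article.

```python
from datetime import date

# Roman numerals for the months I..XII, indexed by month - 1
ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]


def format_day(d: date) -> str:
    """Calendar-text form of a day: day of month, period, Roman month, period."""
    return f"{d.day}.{ROMAN_MONTHS[d.month - 1]}."


print(format_day(date(2009, 4, 23)))   # 23.IV.
print(format_day(date(2008, 12, 20)))  # 20.XII.
```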
As public holidays differ from country to country, the symbols x and + are not used for international trains; the day-of-week symbols are used instead.
Calendar texts can be divided into the following types:
The main problem in text generation is determining the sub-periods. Ideally, a text would be generated for every possible division into sub-periods and the shortest one selected. However, the number of variants is very large, and it is not possible to explore all of them in real time. Therefore, a heuristic algorithm has been designed.
The literature briefly describes the algorithm for calendar text generation used on the railways of the Slovak Republic, but it does not handle consecutive sub-periods [4].
For days (calendar dates) $d_1$ and $d_2$ the following relations and operations are defined:
- $d_1 < d_2$: day $d_1$ precedes day $d_2$. The other relational operators are defined similarly.
- $d_2 - d_1$: the number of days between day $d_1$ and day $d_2$. If $d_2 = d_1$, the difference is zero.
- $d_1 + n$: day $d_1$ shifted by n days forward ($n > 0$) or backward ($n < 0$), where n is an integer.
The following notation is introduced for the symbols of the predefined calendars: $b_1, b_2, \ldots, b_7$ are the calendars for symbols 1 to 7; $b_x$ is the calendar for symbol x; $b_+$ is the calendar for symbol +.
The following text uses these symbols and functions:
- $|A|$: the number of elements of set A;
- D: the set of all days within TVP;
- C: the set of calendar days for which the text is to be generated;
- B: the set of all predefined calendars, $B = \{b_1, b_2, \ldots, b_7, b_x, b_+\}$;
- $G(x)$: the set of days of the predefined calendar $x \in B$ that belong to set D. For example, $G(b_1)$ includes all Mondays in set D;
- $C(x, y)$: the set of all days in the period from x to y;
- $f(d)$: the textual representation of day $d \in D$, for example 20.XII.
Sub-periods occurring in the text are represented by a set of sub-calendars. A sub-calendar is an ordered quadruple $[x, y, B^+, B^-]$, where:
- x: the starting day of the sub-calendar period;
- y: the final day of the sub-calendar period;
- $B^+$: the set of predefined calendars on which the train operates;
- $B^-$: the set of predefined calendars on which there is no service.
For the sub-calendar $[x, y, B^+, B^-]$ … The set of sub-calendars is ordered: its elements are sorted in ascending order by x.
K. Greiner: Algorithm for Generating Train Calendar Texts
The set of days for sub-calendar $[x, y, B^+, B^-]$ … The textual representation of a calendar that contains a set of days C and a set of sub-calendars X is provided by the function $g(X)$.
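The notation maps naturally onto Python's `datetime` module: day comparison, $d_2 - d_1$ and $d_1 + n$ are date comparison, subtraction and addition of a `timedelta`. The sketch below also models C(x, y) and G(b). It assumes, as is usual in Czech timetables, that x denotes working days (Monday to Friday except public holidays) and + denotes Sundays and public holidays; the function names and the holiday list are hypothetical and for illustration only.

```python
from datetime import date, timedelta

WEEKDAY_SYMBOLS = {str(j) for j in range(1, 8)}  # '1' = Monday ... '7' = Sunday


def period(x: date, y: date) -> set[date]:
    """C(x, y): all days from x to y inclusive."""
    return {x + timedelta(days=i) for i in range((y - x).days + 1)}


def G(symbol: str, D: set[date], holidays: set[date]) -> set[date]:
    """Days of D that belong to the predefined calendar `symbol`.

    Assumption: x = Monday to Friday except holidays, + = Sundays and holidays.
    """
    if symbol in WEEKDAY_SYMBOLS:
        return {d for d in D if d.isoweekday() == int(symbol)}
    if symbol == "x":
        return {d for d in D if d.isoweekday() <= 5 and d not in holidays}
    if symbol == "+":
        return {d for d in D if d.isoweekday() == 7 or d in holidays}
    raise ValueError(f"unknown predefined calendar: {symbol!r}")


# TVP 2008/2009: Sunday 14.XII.2008 to Saturday 12.XII.2009
D = period(date(2008, 12, 14), date(2009, 12, 12))
# Hypothetical, incomplete holiday list used only for illustration
HOLIDAYS = {date(2008, 12, 24), date(2008, 12, 25), date(2008, 12, 26), date(2009, 1, 1)}

d1, d2 = date(2008, 12, 24), date(2009, 1, 1)
print(d1 < d2, (d2 - d1).days, d1 + timedelta(days=-3))  # True 8 2008-12-21
```

Because this TVP spans exactly 52 weeks, each day-of-week calendar $G(b_1)$ to $G(b_7)$ contains 52 days.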
The algorithm uses the constants listed in Table 2. Their values were determined by analyzing the calendar texts in the output sets and by consulting the users.

Table 2
- The minimum number of days of a sub-calendar period representing daily operation that is not merged with adjacent sub-calendars: 28.
- The maximum difference between the periods of two sub-calendars between which there is no empty week. If the first sub-calendar ends on Monday of week i and the second sub-calendar begins on Sunday of week i + 1, the difference between these days is 13.
- c4, the minimum number of days of a period between adjacent sub-calendars with no train service, i.e. a period whose days do not belong to set C: 28. The constant is used in one stage of merging groups of sub-calendars.
- c5, the minimum percentage of occurrence of the days of a predefined calendar in a sub-calendar for its symbol to appear in the calendar text.

The main part of the algorithm
The procedure to generate a calendar text from set C is as follows:
1. If $C = \emptyset$, the result is the text "operates on demand".
2. If $C = D$, the result is the text "operates daily".
3. If $C = G(b)$ for some predefined calendar $b \in B$, the result is the text containing the symbol of that predefined calendar, for example "operates x".
4. If the calendar contains all the days from the beginning of TVP until a date within TVP and, at the same time, the number of calendar days is greater than 2 and the difference between the number of TVP days and the number of calendar days is greater than 2, i.e.
$$C = C(\min D, \max C), \quad |C| > 2, \quad |D| - |C| > 2, \qquad (4)$$
the result is the text "operates until $f(\max C)$", for example "operates until 30.III.". If the number of calendar days is less than or equal to 2, the result is a list of the calendar days, for example "operates 14., 15.XII.". If the difference between the number of TVP days and the number of calendar days is less than or equal to 2, the result is a list of the days that are not included in the calendar, for example "no service 29., 30.XI.".
5. If the calendar includes all the days from a date within TVP until the end of TVP and the same conditions as in Step 4 are met, i.e.
$$C = C(\min C, \max D), \quad |C| > 2, \quad |D| - |C| > 2,$$
the result is the text "operates from $f(\min C)$", for example "operates from 1.IV.".
6. We create an initial set of sub-calendars S (see Chapter 3.2), ensuring that two requirements are met: every day of set C belongs to the period of some sub-calendar of set S, i.e.
$$\forall d \in C \ \exists\, [x, y, B^+, B^-] \in S : x \le d \le y,$$
and at the same time the sub-calendars in set S …
7. We merge the groups of sub-calendars from set S (see Chapter 3.5).
8. The resulting calendar text is obtained through the function $g(\hat{S})$ (see Chapter 3.7).
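Steps 1 to 5 are simple enough to sketch directly. The Python below is a minimal illustration, assuming days are `datetime.date` objects, C and D are sets, and `predefined` is a hypothetical mapping from a calendar symbol to its set of days G(b). The day-list format (the month written only after the last day of each month) is inferred from examples such as "14., 15.XII.". Steps 6 to 8 are not covered; the function returns None when sub-calendars are needed.

```python
from datetime import date, timedelta

ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]


def format_day(d: date) -> str:
    return f"{d.day}.{ROMAN_MONTHS[d.month - 1]}."


def format_days(days) -> str:
    """List of days; the month is written only after the last day of each month."""
    days = sorted(days)
    parts = []
    for i, d in enumerate(days):
        nxt = days[i + 1] if i + 1 < len(days) else None
        same_month = nxt is not None and (nxt.year, nxt.month) == (d.year, d.month)
        parts.append(f"{d.day}." if same_month else format_day(d))
    return ", ".join(parts)


def period(x: date, y: date) -> set[date]:
    """C(x, y): all days from x to y inclusive."""
    return {x + timedelta(days=i) for i in range((y - x).days + 1)}


def simple_text(C: set[date], D: set[date], predefined: dict[str, set[date]]):
    """Steps 1 to 5 of the main algorithm; None means 'continue with step 6'."""
    if not C:
        return "operates on demand"                      # step 1
    if C == D:
        return "operates daily"                          # step 2
    for symbol, days in predefined.items():              # step 3
        if C == days:
            return f"operates {symbol}"
    if C == period(min(D), max(C)):                      # step 4
        main = f"operates until {format_day(max(C))}"
    elif C == period(min(C), max(D)):                    # step 5
        main = f"operates from {format_day(min(C))}"
    else:
        return None
    if len(C) <= 2:                                      # few operating days
        return f"operates {format_days(C)}"
    if len(D) - len(C) <= 2:                             # few days without service
        return f"no service {format_days(D - C)}"
    return main


D = period(date(2008, 12, 14), date(2009, 12, 12))
print(simple_text(period(date(2008, 12, 14), date(2009, 3, 30)), D, {}))  # operates until 30.III.
print(simple_text({date(2008, 12, 14), date(2008, 12, 15)}, D, {}))      # operates 14., 15.XII.
```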

Creating an initial set of sub-calendars
The initial set of sub-calendars S is created as follows:
1. We set $S = \emptyset$.
2. Into set S we add each sub-calendar $[x, y, \{b_1, \ldots, b_7\}, \emptyset]$ that represents a maximal subset of consecutive days of set C and has at least $c_2$ days, i.e.
$$C(x, y) \subseteq C, \quad x - 1 \notin C, \quad y + 1 \notin C, \quad y - x + 1 \ge c_2. \qquad (9)$$
These sub-calendars represent the daily operation periods.
3. In set C we go through the periods that do not belong to the period of any sub-calendar contained in set S. For each such period we go through the individual weeks (including the partial weeks with which the period begins or ends) and find consecutive weeks in which set C contains the same days of the week (Monday to Sunday). From these consecutive weeks we create sub-calendars $s = [x, y, B^+, \emptyset]$, where x is the first day of the first of these consecutive weeks and y is the last day of the last of them. For the sub-calendar s it also holds that … We add these sub-calendars to set S. After completion of this step it holds that …
4. For each sub-calendar $[x, y, B^+, B^-] \in S$ we perform these operations on set $B^+$:
- If set $B^+$ contains calendars $b_1$ to $b_5$ and, within the sub-calendar period, the set of days of calendars $b_1$ to $b_5$ is equal to the set of days of calendar $b_x$, i.e. $C(x, y) \cap \bigcup_{j=1}^{5} G(b_j) = C(x, y) \cap G(b_x)$, we replace calendars $b_1$ to $b_5$ with calendar $b_x$.
- If set $B^+$ contains calendar $b_7$ and, within the sub-calendar period, the set of days of calendar $b_7$ is equal to the set of days of calendar $b_+$, i.e. $C(x, y) \cap G(b_7) = C(x, y) \cap G(b_+)$, we replace calendar $b_7$ with calendar $b_+$.
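Steps 2 and 4 of this construction can be sketched in Python. This is an illustration under stated assumptions: sub-calendars are plain tuples (x, y, B+, B-), calendar symbols are strings ('1' to '7', 'x', '+'), x means Monday to Friday except public holidays and + means Sundays and public holidays, and c2 is passed in as a parameter. The function names are hypothetical, and step 3 (grouping weeks with the same days of the week) is not sketched.

```python
from datetime import date, timedelta

DAILY = frozenset("1234567")           # B+ of a daily-operation sub-calendar
WORKDAY_SYMBOLS = frozenset("12345")   # b1 .. b5


def daily_subcalendars(C: set[date], c2: int):
    """Step 2: one sub-calendar [x, y, {1..7}, {}] per maximal run of
    consecutive days of C that is at least c2 days long."""
    result = []
    days = sorted(C)
    start = None
    for i, d in enumerate(days):
        if start is None:
            start = d
        nxt = days[i + 1] if i + 1 < len(days) else None
        if nxt is None or (nxt - d).days != 1:          # the run ends at d
            if (d - start).days + 1 >= c2:
                result.append((start, d, DAILY, frozenset()))
            start = None
    return result


def normalize_symbols(x: date, y: date, b_plus: frozenset, holidays: set[date]) -> frozenset:
    """Step 4: b1..b5 -> x and b7 -> + when they cover the same days in [x, y]."""
    in_period = [h for h in holidays if x <= h <= y]
    # b1..b5 equal x on [x, y] exactly when no holiday falls on Monday to Friday
    if WORKDAY_SYMBOLS <= b_plus and not any(h.isoweekday() <= 5 for h in in_period):
        b_plus = (b_plus - WORKDAY_SYMBOLS) | {"x"}
    # b7 equals + on [x, y] exactly when every holiday there is a Sunday
    if "7" in b_plus and all(h.isoweekday() == 7 for h in in_period):
        b_plus = (b_plus - {"7"}) | {"+"}
    return b_plus
```

For instance, with a hypothetical holiday on Monday 13.IV.2009, the set {1, ..., 5} becomes {x} for the period 2.II. to 31.III.2009 but stays unchanged for a period in April.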

5. We merge each group of sub-calendars $s_i, \ldots, s_{i+k}$ for which it holds that … First we merge adjacent pairs of sub-calendars ($k = 1$) and only then adjacent triads ($k = 2$). Set $B^+$ of the merged sub-calendar $s^*$ contains at least one of the predefined calendars $b_x$ or $b_+$, possibly supplemented with predefined calendars of the days of the week. There is no point in merging more than 3 adjacent sub-calendars, because there are never more than 3 consecutive weeks that include public holidays (for example, week 1: Sunday 24.XII., week 2: Monday 25.XII. and Tuesday 26.XII., week 3: Monday 1.I.). From the merging we exclude:
- sub-calendars $s_i$ whose set …;
- pairs of sub-calendars $s_i$ and $s_{i+1}$ between which there is at least one empty week, i.e. the difference $x_{i+1} - y_i$ exceeds the maximum difference between two sub-calendar periods with no empty week between them (see Table 2).
For example, for the calendar shown in Figure 1, set S contains three sub-calendars before this step and only one sub-calendar after it. In the calendar figures, the days of set C are highlighted in bold on a grey background, and the holidays are displayed in a frame.
6. For each sub-calendar from set S we perform the algorithm to extend its period (see Chapter 3.3).
7. For each sub-calendar from set S we perform the algorithm to modify its period (see Chapter 3.4). If the period of sub-calendar $s_i \in S$ has been extended to the beginning of TVP, we exclude from set S all sub-calendars $s_j$ with $j < i$. Similarly, if the period of sub-calendar $s_i$ has been extended to the end of TVP, we exclude from set S all sub-calendars $s_j$ with $j > i$.
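The empty-week exclusion and the trimming at the end of step 7 can be sketched as follows. The sketch assumes that weeks run from Monday to Sunday, as in the holiday example, and that an "empty week" is a whole calendar week lying strictly between the two periods. Under this definition the largest gap without an empty week is 13 days (Monday to the Sunday of the following week), which matches Table 2. The function names are hypothetical.

```python
from datetime import date, timedelta


def has_empty_week(y_i: date, x_next: date) -> bool:
    """True if a whole Monday-to-Sunday week lies strictly between y_i and x_next."""
    first_monday = y_i + timedelta(days=8 - y_i.isoweekday())   # first Monday after y_i
    return first_monday + timedelta(days=6) < x_next


def trim_to_tvp(S, tvp_start: date, tvp_end: date):
    """Drop sub-calendars before one that starts at the beginning of TVP and
    after one that ends at the end of TVP. S holds (x, y, B+, B-) tuples sorted by x."""
    starts = [i for i, s in enumerate(S) if s[0] <= tvp_start]
    if starts:
        S = S[starts[-1]:]
    ends = [i for i, s in enumerate(S) if s[1] >= tvp_end]
    if ends:
        S = S[:ends[0] + 1]
    return S
```

For example, a sub-calendar ending on Monday 6.VII.2009 and another starting on Sunday 19.VII.2009 have no empty week between them, whereas a start on Monday 20.VII.2009 leaves the week 13.VII. to 19.VII. empty.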

Extending the period of a sub-calendar
The algorithm is used to extend the period of a sub-calendar from set S that has at least $c_1$ days. The extension is made at the expense of the preceding or following sub-calendar whose period is at most one week long, and it covers only days that belong to set C and also to the set of days of the predefined calendars of the extended sub-calendar.
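The backward half of this extension (towards the preceding sub-calendar) can be sketched in Python. This is a simplified illustration, not the author's implementation. Sub-calendars are lists [x, y, B+] restricted to day-of-week symbols, `covered` is a hypothetical membership test, and shortening the preceding sub-calendar to its last day of C before the new start (or dropping it) is an assumption modelled on the Figure 2 example.

```python
from datetime import date


def covered(d: date, b_plus) -> bool:
    """Hypothetical simplification: only day-of-week symbols '1'..'7'."""
    return str(d.isoweekday()) in b_plus


def extend_backward(prev, cur, C: set[date], c1: int):
    """Extend cur = [x, y, B+] at the expense of the preceding sub-calendar prev.

    Returns (prev_or_None, cur)."""
    x_p, y_p, b_p = prev
    x_i, y_i, b_i = cur
    if (y_i - x_i).days + 1 < c1 or (y_p - x_p).days + 1 > 7:
        return prev, cur                     # cur too short or neighbour longer than a week
    candidates = sorted((d for d in C if x_p <= d < x_i), reverse=True)
    d_min = None
    for d in candidates:                     # walk back from x_i - 1
        if not covered(d, b_i):
            break                            # any earlier start would include this day
        d_min = d                            # all C-days in [d, x_i - 1] are covered
    if d_min is None:
        return prev, cur
    cur = [d_min, y_i, b_i]
    remaining = [e for e in C if x_p <= e < d_min]
    prev = [x_p, max(remaining), b_p] if remaining else None
    return prev, cur
```

With hypothetical days of C on Wednesday 1.VII. and Friday 3.VII.2009, a preceding sub-calendar [1.VII., 3.VII., {3, 5}] and a following one [6.VII., 24.VII., {1, 5}], the Friday is taken over and the first sub-calendar shrinks to [1.VII., 1.VII., {3, 5}], as in the Figure 2 example.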
The procedure to extend the period of the sub-calendar $s_i = [x_i, y_i, B_i^+, B_i^-]$ is as follows:
1. If … , we do not perform the extension and the algorithm ends.
2. If S includes the sub-calendar $s_{i-1}$, we proceed with Step 3, otherwise with Step 7.
3. If … , we proceed with Step 7.
4. We look for the minimum value $d \in C$ in the period of sub-calendar $s_{i-1}$ for which it holds that
$$C \cap C(d, x_i - 1) \subseteq \bigcup_{b \in B_i^+} G(b).$$
This condition means that all the days by which the period of sub-calendar $s_i$ is extended and that belong to set C also belong to the set of days of the predefined calendars of sub-calendar $s_i$. If d does not exist, we proceed with Step 7.
5. The period of sub-calendar $s_i$ begins with day d, i.e. we set $x_i = d$.
6. We look for the maximum value e …
For example, for the calendar shown in Figure 2, set S contains three sub-calendars before this algorithm is started; the first is $[1.VII., 3.VII., \{b_3, b_5\}, \emptyset]$ and the second covers the period from 6.VII. to 24.VII. After completion of the algorithm, the period of the second sub-calendar is extended at the expense of the adjacent sub-calendars, the third sub-calendar ceases to exist, and the first sub-calendar becomes $[1.VII., 1.VII., \{b_3, b_5\}, \emptyset]$. The set of predefined calendars of the first sub-calendar still contains calendar $b_5$, which could be omitted. However, because it is a short period that will not appear in the final text of the calendar, we can leave it in the set to speed up the algorithm.

Modifying the period of the sub-calendar
In the first phase, we modify the period of the sub-calendar so that the starting and final days of its period belong to the set of days of its predefined calendars and also to set C, as follows: …
In the second phase, we try to extend the period of a sub-calendar that has at least $c_1$ days and does not represent a daily operation period, as follows: … the algorithm ends.
5. If between the beginning of TVP and the beginning of the sub-calendar period there is a period at most one week long that does not include any day belonging to the set of days of the predefined calendars of the sub-calendar, i.e. …
6. If between the end of the sub-calendar period and the end of TVP there is a period at most one week long that does not include any day belonging to the set of days of the predefined calendars of the sub-calendar, i.e. …
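Both phases can be sketched in Python under simplifying assumptions. Phase 1 is written directly from its stated goal: the period is shrunk to the first and last days of C covered by the sub-calendar's calendars. For steps 5 and 6 of phase 2, the action taken when the condition holds is not preserved in the text. The sketch assumes that the period is extended to the TVP boundary, consistent with step 7 of the construction of the initial set ("extended to the beginning of TVP"). `covered` and the other names are hypothetical, and only day-of-week symbols are handled. The caller is expected to check the $c_1$ and non-daily conditions.

```python
from datetime import date, timedelta


def covered(d: date, b_plus) -> bool:
    """Hypothetical simplification: only day-of-week symbols '1'..'7'."""
    return str(d.isoweekday()) in b_plus


def shrink_period(sub, C: set[date]):
    """Phase 1: make the first and last days of [x, y, B+] days of C covered by B+."""
    x, y, b_plus = sub
    days = sorted(d for d in C if x <= d <= y and covered(d, b_plus))
    return [days[0], days[-1], b_plus] if days else None


def stretch_to_tvp(sub, tvp_start: date, tvp_end: date):
    """Phase 2, steps 5-6 (assumed outcome): if a gap of at most one week without
    any day of B+ separates the period from a TVP boundary, extend to that boundary."""
    x, y, b_plus = sub

    def gap_is_free(a: date, b: date) -> bool:
        n = (b - a).days + 1
        return 0 < n <= 7 and not any(covered(a + timedelta(days=i), b_plus) for i in range(n))

    if gap_is_free(tvp_start, x - timedelta(days=1)):
        x = tvp_start
    if gap_is_free(y + timedelta(days=1), tvp_end):
        y = tvp_end
    return [x, y, b_plus]
```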

Merging the group of sub-calendars
The algorithm is used to merge groups of consecutive sub-calendars from set $S = \{s_1, s_2, \ldots, s_n\}$ so that the text of the calendar is as short as possible. The procedure is as follows:
1. We set $t^* = g(S)$ and $k = 1$.
2. We set $m = 1$.
3. We set $S^* = S$.
4. … and the period of this sub-calendar does not overlap with the periods of the other sub-calendars in set $\hat{S}$.
c) We set $t = g(\hat{S})$.
d) If text t is shorter than text $t^*$ (by string length), we set $t^* = t$ and $S^* = \hat{S}$.
5. We set $m = m + 1$ and $S = S^*$.
6. If $m \le 3$, we proceed with Step 3, otherwise we set $k = k + 1$.
7. If $k \le 2$, we proceed with Step 2, otherwise the algorithm ends.
The algorithm merges at most 4 consecutive sub-calendars. Testing showed that for $m > 3$ no further merging of sub-calendars takes place.
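The loop structure of this greedy search can be sketched generically. This is an interpretation, not the original code. It assumes that step 4 tries every group of m + 1 consecutive sub-calendars $s_i, \ldots, s_{i+m}$ in turn. `text_of` stands for the function g, and `try_merge` is a hypothetical callback that creates the merged sub-calendar (Chapter 3.6) and returns None when it cannot be created or would overlap other periods.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def merge_groups(
    S: Sequence[T],
    text_of: Callable[[Sequence[T]], str],
    try_merge: Callable[[Sequence[T]], Optional[T]],
    max_m: int = 3,
    passes: int = 2,
):
    """Greedy search for merges of m + 1 consecutive sub-calendars that shorten g(S)."""
    S = list(S)
    best_text = text_of(S)                       # step 1: t* = g(S)
    for _k in range(passes):                     # steps 1 and 7: k = 1, 2
        for m in range(1, max_m + 1):            # steps 2 and 6: m = 1, 2, 3
            best = S                             # step 3: S* = S
            for i in range(len(S) - m):          # step 4 (assumed): group s_i .. s_{i+m}
                merged = try_merge(S[i:i + m + 1])
                if merged is None:
                    continue                     # merged sub-calendar not created
                candidate = S[:i] + [merged] + S[i + m + 1:]
                t = text_of(candidate)           # step 4c: t = g(S^)
                if len(t) < len(best_text):      # step 4d: keep the shorter text
                    best_text, best = t, candidate
            S = best                             # step 5: S = S*
    return S, best_text
```

With toy string "sub-calendars" that merge only when identical, `merge_groups(["a", "a", "b"], "|".join, ...)` shortens the text from "a|a|b" to "a|b".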

Creating a sub-calendar
The algorithm is used to create a sub-calendar $s = [x, y, B^+, \emptyset]$. The procedure is as follows:
1. We set $B^+ = \emptyset$.
2. To set $B^+$ we add every predefined calendar $b \in \{b_1, \ldots, b_7\}$ for which it holds that
$$\frac{|C \cap C(x, y) \cap G(b)|}{|C(x, y) \cap G(b)|} \cdot 100 \ge c_5.$$
3. If $B^+ = \emptyset$, the sub-calendar is not created and the algorithm ends.
4. If … , the sub-calendar is not created and the algorithm ends.
5. If $b_1, \ldots, b_5 \in B^+$, in set $B^+$ we replace calendars $b_1$ to $b_5$ with calendar $b_x$ if the number of positive and negative exception days when using calendar $b_x$ is less than or equal to the number of positive and negative exception days when using calendars $b_1$ to $b_5$.
6. If $b_7 \in B^+$, we replace calendar $b_7$ in set $B^+$ with calendar $b_+$ if the number of positive and negative exception days when using calendar $b_+$ is less than or equal to the number of positive and negative exception days when using calendar $b_7$.
7. …
8. We modify the period of sub-calendar s (see Chapter 3.4).
For example, for the calendar shown in Figure 3, at some point the algorithm creates a sub-calendar for the period from 2.II. until 31.III. In this period the predefined calendars occur with the following percentages: $b_1$ 100%, $b_2$ 89%, $b_4$ and $b_5$ 13%. The percentage limit $c_5$ is met by calendars $b_1$ and $b_2$, which are included in set $B^+$. The algorithm creates the sub-calendar $[2.II., 31.III., \{b_1, b_2\}, \emptyset]$.
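The percentage test of step 2 can be sketched in Python. The sketch assumes that the percentage for a day-of-week calendar is the share of its days in the period that belong to C. This reading reproduces the figures of the example: 8 of the 9 Tuesdays give 89%, and 1 of the 8 Thursdays gives 12.5%, reported as 13%. The calendar C below is a hypothetical one built to match those percentages, and the value 50 for c5 is an assumption, since the value of c5 is not given.

```python
from datetime import date, timedelta


def weekday_percentages(C: set[date], x: date, y: date) -> dict[int, float]:
    """For b1..b7: share (in %) of the calendar's days in [x, y] that belong to C."""
    days = [x + timedelta(days=i) for i in range((y - x).days + 1)]
    result = {}
    for j in range(1, 8):                         # j = 1 (Monday) .. 7 (Sunday)
        cal_days = [d for d in days if d.isoweekday() == j]
        if cal_days:
            result[j] = 100.0 * sum(d in C for d in cal_days) / len(cal_days)
    return result


def select_calendars(C: set[date], x: date, y: date, c5: float) -> set[int]:
    """Step 2: day-of-week calendars whose percentage reaches the limit c5."""
    return {j for j, p in weekday_percentages(C, x, y).items() if p >= c5}


# Hypothetical calendar: all Mondays, all Tuesdays but one, one Thursday, one Friday
x, y = date(2009, 2, 2), date(2009, 3, 31)
period_days = [x + timedelta(days=i) for i in range((y - x).days + 1)]
C = {d for d in period_days if d.isoweekday() == 1}
C |= {d for d in period_days if d.isoweekday() == 2 and d != date(2009, 2, 10)}
C |= {date(2009, 3, 5), date(2009, 3, 6)}
print(sorted(select_calendars(C, x, y, c5=50)))  # [1, 2]
```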

Generating text
This algorithm is implemented by the function $g(\hat{S})$ for the set of sub-calendars $\hat{S} = \{s_1, s_2, \ldots, s_n\}$. First, we modify set S as follows:
1. From set S we exclude short sub-calendars $s_i = [x_i, y_i, B_i^+, B_i^-]$ for which it holds that … The excluded sub-calendars are included in set P.
4. We determine a set of positive exception days $A^+$ and a set of negative exception days $A^-$ belonging to the periods of the sub-calendars included in set P.
5. For the sub-calendars from set P we generate the text of the sub-periods and then append the list of textual representations of the positive and negative exception days contained in sets $A^+$ and $A^-$. We add the text to the end of t.
The text of the sub-periods first contains the beginnings and ends of the periods and then a list of the symbols of the predefined calendars included in sets $B_k^+$ … is represented by the text "daily".
6. If $S \neq \emptyset$, we proceed with Step 2.
7. We determine a set of positive exception days $A^+$ and a set of negative exception days $A^-$ that do not belong to the period of any sub-calendar included in set S.
8. We generate a text containing the list of textual representations of the positive and negative exception days contained in sets $A^+$ and $A^-$. We add this text to the end of t.
9. Text t contains the resulting text of the calendar.
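Formatting the lists of exception days is a self-contained part of this step. The following sketch is inferred from the examples in this article and is not taken from the original program. Runs of at least three consecutive days become ranges ("24. -26.XII.", "29.XII. -2.I."), shorter runs are listed day by day, and the month is written only once for neighbouring single days of the same month ("1., 8.V."). The threshold of three days and the " -" separator (copied from the examples as printed) are assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta

ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]


def format_day(d: date) -> str:
    return f"{d.day}.{ROMAN_MONTHS[d.month - 1]}."


def consecutive_runs(days):
    """Split the days, sorted, into runs of consecutive dates."""
    runs = []
    for d in sorted(days):
        if runs and (d - runs[-1][-1]).days == 1:
            runs[-1].append(d)
        else:
            runs.append([d])
    return runs


def format_day_list(days, min_range: int = 3) -> str:
    """Text of a list of exception days, using ranges for long runs."""
    items = []
    for run in consecutive_runs(days):
        if len(run) >= min_range:
            items.append(run)                       # keep the run as one range
        else:
            items.extend([d] for d in run)          # split into single days
    parts = []
    for i, item in enumerate(items):
        a, b = item[0], item[-1]
        if len(item) > 1:                           # a range from a to b
            same = (a.year, a.month) == (b.year, b.month)
            start = f"{a.day}." if same else format_day(a)
            parts.append(f"{start} -{format_day(b)}")
        else:                                       # omit the month if the next single day shares it
            nxt = items[i + 1] if i + 1 < len(items) else None
            share = (nxt is not None and len(nxt) == 1
                     and (nxt[0].year, nxt[0].month) == (a.year, a.month))
            parts.append(f"{a.day}." if share else format_day(a))
    return ", ".join(parts)


print(format_day_list([date(2009, 5, 1), date(2009, 5, 8)]))  # 1., 8.V.
```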
If S contains only one element $s = [x, y, B^+, B^-]$, the procedure for generating the calendar text is similar to the algorithm described above, except that the period of the sub-calendar is not mentioned in the text. The result is, for example, the text "operates x, 6 and 24.XII., 1., 8.V., no service 31.XII.". If the calendar is used for a freight train, we also generate a negative text of the calendar, and if it is shorter, it becomes the resulting text. We generate the negative text for freight trains if $b_x \in B^+$ … and the set of positive exception days $A^+ = \emptyset$. In such a case, we negate the predefined calendars of set $B^+$ and include them in set $B^-$ … Then we set … and generate the text, for example, "no service 6, + and 29.XII. -2.I.".
If $S = \emptyset$, the result is the text of type 5 or 6 given in Chapter 2.

TEXT CREATED BY THE USER
Since the algorithm generating the calendar text from a set of days does not always produce a text that suits the users, the KANGO-Vlak program allows the user to define sub-periods and the calendars to be displayed in them. Figure 4 shows a part of the dialog box used to enter the sub-calendars. In the "Operates" field we can enter the names of the predefined calendars on which the train operates, and in the "No Service" field the names of the predefined calendars on which there is no train service. These fields correspond to the sets $B^+$ and $B^-$ of the sub-calendar. After clicking the appropriate button, the corresponding days are marked in the calendar control and the calendar text is generated. For the sub-calendars shown in Figure 4, the result is the text "operates 2 -6 from 1.I. until 28.II., from 2. until 31.III. operates x". After the text is generated, the user can mark or unmark individual days in the calendar control, adding or removing positive or negative exception days. The calendar text itself is read-only; the user can only affect the definition of the sub-calendars and the individual days included in the calendar.
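Marking the days of a user-defined sub-calendar can be illustrated with a short sketch. It assumes that a day is marked when it lies in the period, belongs to at least one calendar from the "Operates" field ($B^+$), and belongs to none from the "No Service" field ($B^-$). It also assumes that x means working days and + means Sundays and public holidays. The holiday list and all names are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = {str(j) for j in range(1, 8)}


def in_calendar(d: date, symbol: str, holidays: set[date]) -> bool:
    """Assumed meaning: '1'..'7' days of the week, x working days, + Sundays and holidays."""
    if symbol in WEEKDAYS:
        return d.isoweekday() == int(symbol)
    if symbol == "x":
        return d.isoweekday() <= 5 and d not in holidays
    if symbol == "+":
        return d.isoweekday() == 7 or d in holidays
    raise ValueError(f"unknown predefined calendar: {symbol!r}")


def subcalendar_days(x: date, y: date, operates, no_service, holidays: set[date]) -> set[date]:
    """Days in [x, y] that are in some 'Operates' calendar and in no 'No Service' calendar."""
    days = (x + timedelta(days=i) for i in range((y - x).days + 1))
    return {d for d in days
            if any(in_calendar(d, b, holidays) for b in operates)
            and not any(in_calendar(d, b, holidays) for b in no_service)}


HOLIDAYS = {date(2009, 1, 1)}  # hypothetical, incomplete list
# The two sub-calendars of the Figure 4 example: "2 -6" in 1.I.-28.II., "x" in 2.III.-31.III.
first = subcalendar_days(date(2009, 1, 1), date(2009, 2, 28), "23456", "", HOLIDAYS)
second = subcalendar_days(date(2009, 3, 2), date(2009, 3, 31), "x", "", HOLIDAYS)
print(len(first), len(second))  # 43 22
```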

VERIFYING THE ALGORITHM
The text-generating algorithm was tested on real data of the 2008/2009 timetable. Before the described algorithm was introduced, calendar texts were created either by the user with the tool described in Chapter 4 or by a simplified algorithm that failed to identify sub-calendars immediately following each other. The simplified algorithm was part of the CEV program [5], which was used before the KANGO system was implemented.
The proposed algorithm was applied to all the calendars contained in the test database. The texts were generated on a computer with an Intel Core i7 3 GHz processor; the results are listed in Table 3.

Table 3
- Total number of calendars in the database: 2,835
- Number of calendars whose text was shortened: 359
- Average number of characters by which the texts were shortened: 20
- Number of calendars whose text was lengthened: 76
- Number of calendars whose text was lengthened and whose original text did not contain a daily operation sub-period while the new text does: 27
- Average number of characters by which the texts were lengthened, excluding calendars whose original text did not contain a daily operation sub-period while the new text does: 6
- Number of calendars with the same text length but a different text: 22
- Average time of generating a calendar text: 5 ms
- Maximum time of generating a calendar text: 231 ms

The time needed to generate a calendar text (both average and maximum) allows the algorithm to be used in real time. When marking days in the calendar, the user is not held up by text generation, so the text can be regenerated automatically after each change of the calendar days.
The introduction of the algorithm significantly shortened the texts of calendars that contained multiple sub-periods (by 20 characters on average).
However, about 3% of the user-defined calendar texts were shorter than the generated ones. The first group consists of calendars that differ in the daily operation period. The test database contains calendars whose text does not include a daily operation period, while the algorithm included such a period in the text. For example:
- User-defined text: "operates 5 -7 and 24.XII. -1.I., 1.VII. -31.VIII."
- Text generated by the algorithm: "operates 5 -7 until 28.VI. and from 4.IX. and 24.XII. -1.I., from 1.VII. until 31.VIII. operates daily".
The generation of daily operation periods cannot be removed from the algorithm, because a large number of calendars do contain a daily operation period.
If we exclude the calendars that differ in the presence of a daily operation period, there are 49 calendars whose text is longer when the algorithm is used, by 6 characters on average. Of these, 26 calendars differ in the use of the predefined calendars of working days and holidays. The algorithm uses these calendars if this does not increase the number of positive and negative exception days; in some cases, however, this results in a longer text. For example:
- User-defined text: "operates x, 6 and 24. -26.XII., 1.I., 12.IV., 1., 8.V., 5.VII., 27.IX., 28.X., 15.XI."
- Text generated by the algorithm: "operates 1 -6 and 12.IV., 5.VII., 27.IX., 15.XI., no service 13.IV., 6.VII., 28.IX., 17.XI."
The remaining 23 calendars with a longer text usually differ in the boundaries of the sub-periods. Sub-periods that do not represent daily operation are primarily delimited by full weeks, but sometimes it is better to move the boundaries. For example, the algorithm produces the text "operates 1 and 2 from 1. until 23.VI., from 29.VI. until 31.VII. operates 1 -5", whereas the text "operates 1 and 2 from 1. until 30.VI., from 1. until 31.VII. operates 1 -5" is more suitable.
Another problem is that the algorithm always forms a daily operation sub-period from the maximal run of consecutive days. If it is immediately followed by another sub-period, it can sometimes be more suitable to include the last day or days of the daily operation sub-period in the following sub-period. For example, the algorithm produces the text "operates from 1.VI. until 1.VII. daily, from 8.VII. until 26.VIII. operates 3", whereas a text with sub-periods beginning at the start of a month is more suitable: "operates from 1.VI. until 30.VI. daily, from 1.VII. until 26.VIII. operates 3".

CONCLUSION
The information system for the train timetable must not only work with the set of train operation days but also provide a corresponding textual representation. The calendar text usually does not contain a simple list of days but rather shorter and more meaningful formulations, which require a dedicated generating algorithm.
First, the types of calendar texts appearing in various print outputs in the Czech Republic were analyzed; the calendar texts were divided into 11 types.