PlanetLab Europe as Geographically-Distributed Testbed for Software Development and Evaluation

In this paper, we analyse the use of PlanetLab Europe for the development and evaluation of geographically-oriented Internet services. PlanetLab is a global research network whose main purpose is to support the development of new Internet services and protocols. PlanetLab is divided into several branches; one of them is PlanetLab Europe. PlanetLab Europe consists of about 350 nodes at 150 geographically different sites. The nodes are accessible by remote login, and users can run their software on them. We study the properties of PlanetLab that are significant for its use as a geographically distributed testbed: node position accuracy, service availability, and service stability. We find a considerable number of location inaccuracies and a number of services that cannot be considered reliable. Based on the results, we propose a simple approach to node selection in testbeds for the development and evaluation of geographically-oriented Internet services.


Introduction
PlanetLab (www.planet-lab.org) [3], [14] is a network of dedicated servers running services accessible to users. Its primary purpose is to support the development and evaluation of new Internet technologies such as peer-to-peer systems, overlay routing, distributed storage, and topology mapping. The PlanetLab nodes are grouped into sets called slices. A node associated with a slice runs a Linux virtual machine (Linux distribution CentOS, www.centos.org), called a sliver. A user can log in remotely to a sliver and run their code on it. Users run their code in separate working environments by accessing different slivers on the same physical PlanetLab node. Various tools exist to simplify the management of slices and to upload users' code to the slivers, such as Plush, PIMan, Stork, and PLDeploy (www.planet-lab.org/tools).
PlanetLab consists of about 1100 nodes that are associated with over 500 sites distributed over the world. For each site, the hosting organization name, web page, number of running nodes, delivery address, and geographical position are provided. The location of the nodes is inherited from the site's position information.
PlanetLab is divided into four main autonomous branches based on the geographical distribution of the sites [20]. PlanetLab Central or USA (referred to as PLC, www.planet-lab.org) is the main authority covering the nodes in the USA and also the nodes not assigned to any specific branch. The European nodes belong to PlanetLab Europe (PLE, www.planet-lab.eu), the nodes in Japan are associated with PlanetLab Japan (PLJ, www.planet-lab-jp.org), and the nodes in Korea are associated with Private PlanetLab Korea (PPK, www.planet-lab.kr). PlanetLab Europe currently consists of about 350 nodes at 150 sites; the geographical distribution of the PlanetLab Europe sites is shown in Fig. 1. The figure omits a site in Iceland for better visibility of the nodes.
A PlanetLab site runs one or more nodes. These nodes share some of the information provided for the PlanetLab site, such as its geographical location. Linux virtual machines (slivers) run on each node to implement the PlanetLab services. The commonly used services are remote login to a sliver using SSH (Secure Shell) and node echo request/reply (ping). PlanetLab creates a new sliver and runs it on a node when the node is assigned to a slice by a user, as shown in Fig. 2. A slice is created by the PlanetLab administrators based on the user's request. After a slice is established, the user selects the nodes to be included in it based on the research purpose of the slice. Nodes from different sites can be added to a slice, and it is common for a node to be assigned to a number of slices and thus to run a set of slivers at the same time. This scenario can be seen in Fig. 3. However, from the point of view of the user (slice), a sliver equals a node. By assigning nodes to a slice, a user is able to operate the nodes by remote login. The nodes assigned to a slice can be further grouped into subsets by assigning them different purpose tags.
Due to the remote access and geographical diversity of the nodes, PlanetLab is commonly used as a testbed for research and evaluation of geographically-oriented Internet services [6], [13]. However, correct and non-misleading results of such experiments and evaluations require trustworthy input data such as the nodes' position (commonly referred to as ground-truth data).
In this paper, we analyse the use of PlanetLab Europe as a geographically distributed testbed for Internet research by studying the accuracy of the location information provided for the PlanetLab sites and, consequently, for the associated nodes. We also address the problem of the availability and stability of the PlanetLab services.
We used a number of techniques to evaluate the nodes' location accuracy:
• the location uncertainty caused by the number of digits used for the coordinates (latitude and longitude),
• the distance between the site's position (coordinates) and the site's postal address,
• the distance between the site's position (coordinates) and the location (coordinates) provided by public location databases.
The PlanetLab services we analysed were echo request/reply (ping) and remote login (SSH). We inspected the availability and performance of these two services. We evaluated the availability of the services over a period of time to capture changes over time. For remote login, we additionally evaluated the SSH connection establishment delay, since it is critical for Internet services that require timely actions on a large number of remote nodes.
The evaluations identified a considerable number of inaccuracies in the location information and a number of services that cannot be considered reliable. Based on the results, we propose a simple method to assist with the selection of nodes for geographically-distributed testbeds.
The paper is structured as follows. The section 'Motivation and related work' describes in more detail the need for a proper selection of PlanetLab nodes for geographically-distributed testbeds. Examples of geographically-oriented research based on PlanetLab are described, and related papers concerning the geographical and service properties of PlanetLab are discussed. In the section 'PlanetLab analysis description', we present the techniques used in the paper and give an overview of the nodes covered by the analysis. The results are described in the section 'PlanetLab analysis results'. First, we focus on the geographical properties; then we continue with service performance and stability. The section 'PlanetLab nodes preferences' covers the proposal for the use of PlanetLab nodes and presents statistics for the nodes meeting the proposed criteria. Finally, we conclude the paper.

Motivation and Related Work
PlanetLab nodes have been used in a large body of Internet research. Research into geographically-oriented Internet services covers a broad variety of fields, such as estimation of an IP node's location (geolocation), network topology design and discovery, enforcement of local copyright and intellectual property laws for resources available on-line, geographical organization of P2P VoIP networks, overhead traffic reduction by predicting the communication parameters, and prevention of Internet security attacks. Some of the papers based on PlanetLab are [4], [5], [8], [10], [11], [18], [19], [21]. PlanetLab information is also commonly used as ground-truth data for comparison of different geolocation tools and applications [1], [6], [16].
However, the research papers we reviewed neither inspected nor considered the location accuracy of the PlanetLab nodes used. To achieve correct results and facilitate geographically-oriented Internet research, it is important to use error-free input data and services that are stable over time. We therefore address this gap in this paper. This problem has also been identified by other research teams, and we next present an overview of the related papers.

The ELTE's Location Survey (www.planetlab.eu/node/220) focused on the position accuracy of the PlanetLab nodes. The finding was that a relatively large number of the PlanetLab nodes had very inaccurate location information with errors of varying magnitude. After some improvements, the result was that 90 % of the PlanetLab sites had a correct location, although the error distance limits used to consider a position correct were not mentioned. The authors of paper [12] studied the geographical position of the PlanetLab nodes. They used the community-based geolocation database 'hostip.info' to retrieve the positions of the PlanetLab nodes, and they determined the distance between the coordinates obtained from the database and the coordinates provided for the PlanetLab sites. The conclusion was that, in some cases, the PlanetLab location information was too unreliable to be used in geographically-oriented Internet experiments. Paper [2] made recommendations for improving PlanetLab's geographical diversity. The approach described was to establish new PlanetLab sites to reflect the behaviour of the Internet. The authors suggested using the routing and topological data provided by CAIDA and Skitter to expand PlanetLab's geographical diversity. Paper [17] studied the geographical distribution and service stability of the PlanetLab nodes. The authors proposed recommendations for node selection for long-term research or network measurement purposes.

In our previous work [9], we analysed the accuracy of PlanetLab Europe coordinates using geolocation databases. This paper extends work [9] with a new analysis of the location accuracy and with a new study of the performance of the PlanetLab services and their changes over time.

Description of PlanetLab Analysis
In this section, we describe our analysis methodology.
For our work, we used different information sources. We used information from the official PlanetLab Europe site (www.planet-lab.eu) as well as information from outside PlanetLab, such as the geolocation databases MaxMind, IP2Location and IPligence. We developed a number of Linux shell scripts to communicate with the PlanetLab nodes and to implement the related geolocation procedures. For geocoding (transformation of a postal address into coordinates), we used the Google API.

Figure 4 shows the source of location information for the PlanetLab sites. It shows location data for a site named CESNET, located in Prague, Czech Republic. This site has three associated PlanetLab nodes (listed in the figure below the location data). All the nodes share the same data, namely the hosting organization URL and the geographical coordinates. We used the URL to obtain the postal address of the organization. For the geocoding procedures we used the API (the code used is based on The Google Geocoding API manual available at developers.google.com/maps/documentation/geocoding/) as shown in the command below:

    wget -O siteLoc.json "http://maps.googleapis.com/maps/api/geocode/json?address=Zikova 4, 160 00 Praha 6, Czech Republic&sensor=false"

with this output (excerpt):

    "formatted_address" : "Zikova 1905/4, Czech Technical University in Prague, 160 00 Prague 6-Dejvice, Czech Republic",
    "geometry" : {
       "location" : {
          "lat" : 50.1016729,
          "lng" : 14.3907131
       }

The listing demonstrates how we transformed the PlanetLab site postal addresses into geographical coordinates. It shows the coordinates obtained for the CESNET site shown in Fig. 4. The address 'Zikova 1905/4, Czech Technical University in Prague, 160 00 Prague 6-Dejvice, Czech Republic' was obtained from the URL specified for the site in the same figure (www.ces.net). This technique was used for the absolute error distance evaluations.
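The absolute error distance between a site's published coordinates and its geocoded postal address can be computed as a great-circle distance. The following is a minimal sketch, assuming the haversine formula on a spherical Earth with a mean radius of 6371 km; the paper does not state which distance formula was used, and the function name haversine_km is hypothetical.

```shell
#!/bin/sh
# haversine_km LAT1 LON1 LAT2 LON2
# Prints the great-circle distance in kilometres between two points given in
# decimal degrees. Assumption: spherical Earth with a mean radius of 6371 km.
haversine_km() {
    awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
        pi = atan2(0, -1)
        rad = pi / 180                      # degrees -> radians
        dlat = (lat2 - lat1) * rad
        dlon = (lon2 - lon1) * rad
        a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
        if (a > 1) a = 1                    # guard against rounding above 1
        printf "%.2f\n", 2 * 6371 * atan2(sqrt(a), sqrt(1 - a))
    }'
}
```

For example, `haversine_km 50.1016729 14.3907131 50.10 14.39` compares the geocoded CESNET address from the listing with the pair 50.10, 14.39, which stands in for a site's published coordinates (illustrative values, not taken from PlanetLab).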
In the analysis, we also studied the effect of coordinate rounding on location inaccuracy, since fewer digits in the coordinate values mean a greater area covered. For this purpose, we needed to calculate the distance uncertainty for each number of digits specified. In our analysis we used Eq. (1) for latitude uncertainty and Eq. (2) for longitude uncertainty. We used a latitude of 50°, since this parallel passes through central Europe.
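To illustrate how the number of decimal digits translates into a distance, the sketch below converts one unit of the last decimal place into kilometres. It rests on assumptions that are not taken from Eq. (1) and Eq. (2) of the paper: a spherical Earth with a radius of 6371 km (about 111.2 km per degree of latitude), a longitude degree shortened by the cosine of the latitude, and an uncertainty equal to one full unit of the last digit. The function name rounding_uncertainty_km is hypothetical.

```shell
#!/bin/sh
# rounding_uncertainty_km DECIMALS [LATITUDE_DEG]
# Prints "<latitude_km> <longitude_km>": the distance covered by one unit of
# the last decimal place of a coordinate. LATITUDE_DEG defaults to 50.
# Assumptions: spherical Earth (R = 6371 km); uncertainty = 10^-DECIMALS degrees.
rounding_uncertainty_km() {
    awk -v d="$1" -v lat="${2:-50}" 'BEGIN {
        pi = atan2(0, -1)
        km_per_deg = 6371 * pi / 180        # length of one degree of latitude
        step = 10 ^ (-d)                    # one unit of the last decimal digit
        printf "%.3f %.3f\n", step * km_per_deg, step * km_per_deg * cos(lat * pi / 180)
    }'
}
```

Calling `rounding_uncertainty_km 1` gives the latitude and longitude spans for coordinates with one decimal digit at 50° latitude; under these assumptions, each additional digit shrinks both spans by a factor of ten.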
In the analysis, we used location databases to find the geographical location for a given IP address.
We used the demo pages of these databases: www.maxmind.com/en/home, www.ip2location.com/demo, and www.ipligence.com/. The demo pages allow a limited number of IP addresses to be geolocated per day. The MaxMind demo service is based on the GeoIP2 Precision product (www.maxmind.com/en/web_services) and allows up to 25 addresses per day to be geolocated. The limit for the IP2Location demo service is 20 addresses per day, and the product used is DB24 (www.ip2location.com/demo). The IPligence demo service allows 50 queries per day and, according to the information provided, is based on the MAX product (www.ipligence.com/products/?max#max).
We started with 367 PlanetLab nodes that appeared to be related to Europe. An overview of the nodes used is given in Tab. 2. After the first check of the nodes, we noticed some flaws in this set. We failed to resolve IP addresses for 19 domain names specified by PlanetLab (we call these nodes zombies). Next, we identified 14 nodes assigned to PlanetLab Europe (PLE) but geographically located outside Europe, for example in Tunisia, Australia, Israel, and Thailand. On the other hand, we identified 29 additional PlanetLab nodes geographically located in Europe but not assigned to PlanetLab Europe [15]. After leaving out the nodes not geographically located in Europe and the nodes without an IP address, we obtained 334 nodes.

Results of PlanetLab Analysis
We divided our analysis into two parts. The first part covers the geographical properties of the PlanetLab sites, and the second part covers the PlanetLab services.

Geographical Properties
The distribution of the PlanetLab nodes across the European countries is shown in Tab. 3. The list shows that there are great differences in the node counts of different European countries. The countries with the largest number of PlanetLab nodes are Germany, Spain, France, Great Britain, Italy, and Poland. For the use of PlanetLab as a geographically-distributed testbed, a uniform distribution of nodes would be beneficial [2]. However, in practice, the nodes cover only specific areas. Following the idea of the ITU ICT development index (published by the ITU, the International Telecommunication Union, and used to measure the status of the information society), which measures and compares the IT performance of different countries, we evaluated how strongly the countries support Internet research by providing resources in the form of PlanetLab nodes. For each country, we calculated the 'population/PlanetLab nodes' ratio (Ppl/Plb) as shown in Tab. 3. The result is that Switzerland, Cyprus, Finland, Hungary, and Slovenia are the most PlanetLab-supportive countries.

Tab. 3 column headers: Country, Nodes, Popul. (×10…)

The countries in Tab. 3 are highlighted in Fig. 5. We divided the countries into four European parts (west, east, north, south) to indicate the diversity of PlanetLab nodes across Europe. Table 4 shows that the western part of Europe contains the majority of the nodes. A related paper [2] recommended a way to improve PlanetLab's unbalanced geographical diversity by adding new nodes/sites belonging to commercial organizations. These nodes should be assigned to a dedicated PlanetLab block to keep the routing distances small within the commercial and educational networks. Another suggested approach to improving PlanetLab's diversity was to identify the existing GREN (Global Research and Educational Network) sites by examining routing data and to motivate them to add PlanetLab nodes.

After the initial insight into the geographical properties of PlanetLab Europe, we continued with an analysis of the location information provided for the sites by the PlanetLab website [9]. We started with the position uncertainty caused by the number of digits used for specifying the coordinates (latitude and longitude). Some of the PlanetLab sites had the latitude and longitude specified using a low number of digits, which resulted in location errors of, in some cases, tens of kilometres. The effect of coordinate rounding is shown in Tab. 5.
We observed that rounding is not as major a problem as it might have seemed. Approximately 90 % of the nodes have a coordinate rounding error of less than 5 km. However, care should be taken not to use nodes with a possible rounding location error of over 30 km, which is about 5 % of the nodes.

The next piece of information we incorporated into our analysis was the postal address of the organizations/sites running the PlanetLab nodes. We used this information to measure the difference between the provided coordinates and the site's postal address. We based this on the idea that the organization's address and the location of its associated nodes should be reasonably close to each other for a node's location to be considered trusted. We were aware that an organization can reside in a different location than the nodes it owns, for example when the servers are placed in a dedicated server-room facility at another organization. However, we did not assume that this is common among the PlanetLab organizations, since the vast majority are universities and large research centres. To obtain the distance between the nodes and their organization, we geocoded the postal address of each PlanetLab organization into coordinates. The result is shown in Fig. 6. The cumulative probability function indicates that about 70 % of the nodes are within 5 km of their organization's address. However, much greater distances were also measured.

Up to this point, we used only information provided by the PlanetLab website. In practice, a common way to find the location of an Internet node is to use a geolocation database. Geolocation databases maintain geographic locations for blocks of IP addresses. Upon a location request for an IP address, the corresponding block of IP addresses is found, and the node is assigned the position stored for this block. There is a broad variety of geolocation databases, both public and private. In our analysis, we used three public databases to find the positions of the PlanetLab nodes. Then we compared the locations (coordinates) from the geolocation databases with the locations (coordinates) provided by PlanetLab. The databases used were:
• MaxMind (www.maxmind.com),
• IP2Location (www.ip2location.com),
• IPligence (www.ipligence.com).
The results are plotted using the cumulative probability function in Fig. 7. The databases gave similar results, with IP2Location returning coordinates closest to those provided by PlanetLab. In terms of the distance difference between the PlanetLab location information and the information from the databases, about 45 % of the nodes have a difference lower than 5 km. The result of the location accuracy analysis is shown in Tab. 6. The column 'node count' displays the number of nodes whose PlanetLab location information difference (inaccuracy, in the case of GPS rounding) was less than 5 km for the corresponding estimation method.
The probability function of the median of distance differences is shown in Fig. 8.The graph was generated using the location information sources shown in Tab. 6.
Tab. 6: Summary of geographical accuracy analysis.

Location estimation method | Node count | Of all nodes (%)

Services Performance and Availability
In this analysis, we inspected the PlanetLab services used for geographically-oriented Internet research. We focused on the echo-reply service and remote login, as they are fundamental to any measurement-based geolocation service. We developed a number of BASH scripts, distributed them, and ran them on the PlanetLab nodes. We used the standard ping and SSH (remote login) commands from the command line. We connected remotely to the PlanetLab nodes using private/public SSH key pairs managed by the PlanetLab remote-login service. We evaluated the services over a period of time to capture changes in their performance and availability. In total, we performed around 400 measurements for each PlanetLab node at different times of the day.
Table 7 shows the node numbers for the SSH and echo-reply services. We considered a service reliable if a node replied to at least 95 % of the attempts over a period of three months. The numbers show the surprising fact that only about half of the nodes were reliable for periodic latency measurements using the echo-reply service. An even lower number of nodes can be considered reliable for long-term use of the SSH service.
We combined the ping and SSH services (logical AND) and found that only 38 % of the nodes could be considered reliable/stable for longer-term use.
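The 95 % reliability criterion and the logical AND of both services can be expressed as a short filter over a measurement log. The sketch below assumes a hypothetical log format with one line per attempt, "<node> <ping_ok> <ssh_ok>", where 1 marks a successful attempt and 0 a failed one; neither the format nor the function name reliable_nodes comes from the paper.

```shell
#!/bin/sh
# reliable_nodes THRESHOLD_PERCENT < LOG
# LOG (assumed format): "<node> <ping_ok> <ssh_ok>" per attempt, 1 = success.
# Prints the nodes whose ping AND SSH success rates both reach the threshold.
reliable_nodes() {
    awk -v thr="$1" '
        { n[$1]++; ping[$1] += $2; ssh[$1] += $3 }
        END {
            for (node in n) {
                p = 100 * ping[node] / n[node]   # ping success rate in percent
                s = 100 * ssh[node] / n[node]    # SSH success rate in percent
                if (p >= thr && s >= thr) print node
            }
        }' | sort
}
```

Running `reliable_nodes 95 < attempts.log` (with attempts.log as a placeholder file name) lists the nodes that would count as reliable for both services under this criterion.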
We also inspected how the performance of the services changed over time. For the evaluation period of three months, we plotted the graph shown in Fig. 9. The node numbers show that there is not a strong correlation between the ping and SSH services (r = 0.64). This indicates that the status of both services should be checked independently. The graph also shows that about 250 nodes ran the echo-reply service at any given time, with the number varying from 240 to 260. For the SSH service, about 190 nodes were available at any given time, with the number varying from 180 to 200. When remotely accessing the PlanetLab nodes, we noticed another issue that should be considered: the SSH connection establishment delay. Some delays were unexpectedly long, which significantly degrades the usefulness of the nodes in time-sensitive Internet applications. This is critical for applications that require a timely response from commands run on different machines, especially timed measurements initiated from the PlanetLab nodes. In these cases, a delay of several seconds when connecting to a node plays an important role. We therefore measured the SSH connection delays using BASH scripts. The script used to measure the SSH connection time to PlanetLab nodes is shown in the command below [7].
    # start time in nanoseconds (the %N format requires GNU date)
    T="$(date +%s%N)"
    # key-only login that exits immediately; timeout=1 flags a failed or timed-out attempt
    ssh -o PreferredAuthentications=publickey -o PasswordAuthentication=no -o ConnectTimeout=30 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -l sliceNameB -i privateKey user@IP exit || timeout=1
    # elapsed time in nanoseconds, then converted to milliseconds
    T="$(($(date +%s%N)-T))"
    M="$((T/1000000))"

Figure 10 plots a cumulative probability function of the maximum, average, and minimum values of the SSH connection establishment delays. The graph covers the nodes that we were able to log in to at least once during the evaluation period, which was 204 nodes. We defined a maximum delay of 30 seconds for a login to be considered successful. When a login was not established within this limit, or the connection failed for other reasons, we set the delay to indefinite. This can be seen in the red line in the graph, which shows that the maximum connection delay was exceeded at least once for about 40 % of the nodes (81 nodes).
We observed that the average SSH connection delay over all the nodes was about 5 seconds and the median about 4.5 seconds. Our observations suggest that the long delays are caused by the reverse DNS lookup (determination of the domain name of the SSH client). We measured the time for reverse DNS lookups on a local server and found that the time needed was around 20 seconds.
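Per-node minimum, average, and maximum delays of the kind plotted in Fig. 10 can be summarised from raw measurements as sketched below. The input format is an assumption made for illustration: one line per login attempt, "<node> <delay_ms>", with the word "fail" in place of the delay when the login timed out or failed. The function name delay_stats is hypothetical.

```shell
#!/bin/sh
# delay_stats < MEASUREMENTS
# MEASUREMENTS (assumed format): "<node> <delay_ms|fail>" per login attempt.
# Prints "<node> <min_ms> <avg_ms> <max_ms> <failed_attempts>" per node;
# nodes without any successful login get "-" in the three delay columns.
delay_stats() {
    awk '
        $2 == "fail" { failed[$1]++; seen[$1] = 1; next }
        {
            seen[$1] = 1
            d = $2 + 0
            if (!($1 in cnt) || d < min[$1]) min[$1] = d
            if (!($1 in cnt) || d > max[$1]) max[$1] = d
            sum[$1] += d
            cnt[$1]++
        }
        END {
            for (node in seen) {
                if (cnt[node] > 0)
                    printf "%s %d %.0f %d %d\n", node, min[node], sum[node] / cnt[node], max[node], failed[node] + 0
                else
                    printf "%s - - - %d\n", node, failed[node] + 0
            }
        }' | sort
}
```

Failed attempts are counted separately rather than folded into the average, which mirrors the treatment of failed logins as having no definite delay.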

Method for PlanetLab Nodes Selection
The results of the previous section show that both the accuracy of the geographical information and the availability and performance of the services of the PlanetLab nodes should be considered in geographically-oriented Internet research. Consequently, in this section, we combine the analysis results to give a more complete picture of the PlanetLab nodes. We propose a simple method to assist with the selection of PlanetLab Europe nodes for testbeds used in geographically-oriented Internet research and evaluations.
We start with the geographical aspects. We consider the location information from PlanetLab to be correct if it is within a 5 km distance difference from the median of all the location accuracy results for a node. We used the median of all location results for a node to filter out extreme values. Other distance difference thresholds can be set to satisfy the needs of a specific application.
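The median used to suppress extreme values can be computed with standard tools. The following sketch reads one distance difference per line (for example, the differences obtained for one node from the individual location sources) and prints their median; the function name median is hypothetical.

```shell
#!/bin/sh
# median < VALUES
# Reads one number per line and prints the median: the middle value for an
# odd count, the mean of the two middle values for an even count.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

A single outlier, such as one database placing a node hundreds of kilometres away, shifts the mean considerably but leaves the median close to the remaining estimates.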
For the services, we keep our previous assumption that a service is reliable if it is available for at least 95 % of the attempts over a specified period of time. We consider this an important property, since the same nodes should be used throughout longer evaluation periods to produce correct outputs. We also included the SSH establishment delay in our proposal. We empirically set the delay limit to less than 5 seconds to filter out the nodes that might cause problems in time-sensitive applications.
Table 8 gives the details of the parameters. We obtained 211 nodes with correct location information. That was 60 % of the nodes geographically located in Europe. From this set, 83 nodes (40 %) provided reliable services. Additionally, these 83 nodes were assigned to the correct European PlanetLab branch (PLE), had an IP address, and their domain names could be resolved. Finally, from this set of nodes, we obtained 56 nodes (67 %) with an acceptable average SSH login delay.
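The selection steps above amount to a chain of threshold checks per node. The sketch below applies them to a hypothetical summary file with one line per node, "<node> <median_location_diff_km> <ping_pct> <ssh_pct> <avg_ssh_delay_s>"; the file format, the function name select_nodes, and the exact comparison operators at the boundaries are assumptions, while the default thresholds (5 km, 95 %, 5 s) follow the text.

```shell
#!/bin/sh
# select_nodes [MAX_KM] [MIN_AVAIL_PCT] [MAX_DELAY_S] < SUMMARY
# SUMMARY (assumed format):
#   "<node> <median_location_diff_km> <ping_pct> <ssh_pct> <avg_ssh_delay_s>"
# Prints nodes with a trusted location, reliable ping and SSH services, and an
# acceptable average SSH login delay. Defaults: 5 km, 95 %, 5 s.
select_nodes() {
    awk -v max_km="${1:-5}" -v min_avail="${2:-95}" -v max_delay="${3:-5}" '
        $2 <= max_km && $3 >= min_avail && $4 >= min_avail && $5 < max_delay { print $1 }'
}
```

Passing other thresholds, for example `select_nodes 10 90 8`, adapts the filter to applications with looser requirements.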

Conclusion
PlanetLab as a geographically distributed testbed has some issues that should be considered prior to its use. Based on our experience with PlanetLab and on the related work, we observed that location information and service stability play an important role in Internet research experiments. In the paper, we evaluated the accuracy of the location information provided for the PlanetLab Europe nodes. The results show that the location of many nodes could not be trusted as correct input data (ground truth) for experiments. Our other consideration was the availability and stability of

Fig. 4: Example of location information at PlanetLab Europe site.

Fig. 5: Countries in four European parts with one or more PlanetLab sites.
Fig. 6: Difference of site location and site postal address.

Fig. 7: Difference of site location and locations from geolocation databases.

Fig. 9: Availability of ping and SSH services, changes in time.
Tab. 1: Latitudinal and longitudinal lengths for selected latitudes.
The locations used in the table were the coordinates of the PlanetLab sites provided by the PlanetLab Europe's website.
Tab. 5: Site location uncertainty due to coordinates rounding.