Abstract
Correctly mapping Autonomous Systems (ASes) to their owner organizations is critical for connecting AS-level and organization-level research. Unfortunately, constructing an accurate dataset of AS-to-organization mappings is difficult due to a lack of ground truth information. CAIDA AS-to-organization (CA2O), the current state-of-the-art dataset, relies heavily on Whois databases maintained by Regional Internet Registries (RIRs) to infer the AS-to-organization mappings. However, inaccuracies in Whois data can dramatically impact the accuracy of CA2O, particularly for inferences involving ASes owned by the same organization (referred to as sibling ASes).
In this work, we leverage PeeringDB (PDB) as an additional data source to detect potential errors in the sibling relations of CA2O. By conducting a meticulous semi-manual investigation, we discover two pitfalls of using Whois data that result in incorrect inferences in CA2O. We then systematically analyze how these pitfalls influence CA2O. We also build an improved dataset of sibling relations, which corrects the mappings of 12.5% of CA2O organizations with sibling ASes (1,028 CA2O organizations, associated with 3,772 ASNs). To make this process reproducible and scalable, we design an automated approach to recreate our manually-built dataset with high fidelity. The approach is able to automatically improve inferences of sibling ASes for each new version of CA2O.
Notes
- 1.
- 2. RIPE NCC, ARIN, APNIC, LACNIC, AFRINIC.
- 3. IDNIC, CNNIC, JPNIC, KRNIC, TWNIC, VNNIC, IRINN.
- 4. NIC Mexico, NIC.br.
- 5. aut refers to autonomous system.
- 6. The official romanization system for Standard Mandarin Chinese in China.
References
Bgp.tools. https://bgp.tools/
The CAIDA AS Organizations Dataset (downloaded on July 1, 2022). https://www.caida.org/data/as-organizations
CAIDA AS Rank. https://as-rank.caida.org/
Daily snapshots of PeeringDB data (downloaded on April 4, 2022). https://publicdata.caida.org/datasets/peeringdb/
Global Routing Intelligence Platform (GRIP). https://grip.inetintel.cc.gatech.edu/
GTT acquired Interoute in 2018. https://www.gtt.net/us-en/media-center/press-releases/gtt-to-acquire-interoute/
The Internet registry system. https://www.ripe.net/participate/internet-governance/internet-technical-community/the-rir-system
Introduction to ARIN’s databases. https://www.arin.net/resources/guide/account/database/
Mapping autonomous systems to organizations: CAIDA’s inference methodology. https://www.caida.org/archive/as2org/
The merge of France IX and Rezopole A.D. https://www.linkedin.com/company/france-ix
The merger of Nutrien in 2018. https://www.nutrien.com/investors/news-releases/2018-agrium-and-potashcorp-merger-completed-forming-nutrien-leader-global/
News of the acquisition of CNS by Beeks. https://beeksgroup.com/news/beeks-acquires-vps-provider-cns/
News of the acquisition of Linode by Akamai. https://www.akamai.com/newsroom/press-release/akamai-completes-acquisition-of-linode
Process of ASN application of RIPE. https://www.ripe.net/manage-ips-and-asns/as-numbers/request-an-as-number
The public datasets of ISI ANT lab. https://ant.isi.edu/datasets/all.html
Template of APNIC Whois. https://www.apnic.net/manage-ip/using-whois/guide/aut-num/
Cai, X., Heidemann, J., Krishnamurthy, B., Willinger, W.: Towards an AS-to-organization Map. In: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pp. 199–205 (2010)
Cai, X., Heidemann, J., Krishnamurthy, B., Willinger, W.: An organization-level view of the internet and its implications (extended), p. 26 (2012)
Dainotti, A., et al.: Lost in space: improving inference of IPv4 address space utilization. IEEE J. Sel. Areas Commun. 34(6), 1862–1876 (2016)
Jin, Y., Scott, C., Dhamdhere, A., Giotsas, V., Krishnamurthy, A., Shenker, S.: Stable and practical AS relationship inference with ProbLink, pp. 581–598 (2019). https://www.usenix.org/conference/nsdi19/presentation/jin
Konte, M., Perdisci, R., Feamster, N.: ASwatch: An AS reputation system to expose bulletproof hosting ASes (2015)
Liu, J., Yang, B., Liu, J., Lu, Y., Zhu, K.: A method of route leak anomaly detection based on heuristic rules, pp. 662–666. Atlantis Press (2017). https://doi.org/10.2991/ammee-17.2017.127. https://www.atlantis-press.com/proceedings/ammee-17/25878482. ISSN: 2352-5401
Luckie, M., Huffaker, B., Dhamdhere, A., Giotsas, V., claffy, k.: AS relationships, customer cones, and validation. In: Proceedings of the 2013 conference on Internet measurement conference - IMC 2013, pp. 243–256. ACM Press, Barcelona, Spain (2013). https://doi.org/10.1145/2504730.2504735. https://dl.acm.org/citation.cfm?doid=2504730.2504735
Nemmi, E.N., Sassi, F., La Morgia, M., Testart, C., Mei, A., Dainotti, A.: The parallel lives of Autonomous Systems: ASN allocations vs. BGP. In: Proceedings of the 21st ACM Internet Measurement Conference, pp. 593–611. IMC 2021, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3487552.3487838
Padmanabhan, R., et al.: A multi-perspective view of Internet censorship in Myanmar. In: Proceedings of the ACM SIGCOMM 2021 Workshop on Free and Open Communications on the Internet, pp. 27–36 (2021)
Testart, C., Richter, P., King, A., Dainotti, A., Clark, D.: Profiling BGP serial hijackers: capturing persistent misbehavior in the global routing table. In: Proceedings of the Internet Measurement Conference on - IMC 2019, pp. 420–434. ACM Press, Amsterdam, Netherlands (2019). https://doi.org/10.1145/3355369.3355581. https://dl.acm.org/doi/10.1145/3355369.3355581
Zhao, X., et al.: An analysis of BGP multiple origin AS (MOAS) conflicts. In: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pp. 31–35 (2001)
Ziv, M., Izhikevich, L., Ruth, K., Izhikevich, K., Durumeric, Z.: ASdb: a system for classifying owners of autonomous systems. In: Proceedings of the 21st ACM Internet Measurement Conference, pp. 703–719. ACM, Virtual Event (2021). https://doi.org/10.1145/3487552.3487853. https://dl.acm.org/doi/10.1145/3487552.3487853
Appendices
A Information of RIR/NIR Whois
APNIC. The bulk Whois data of APNIC is public, while among the 7 NIRs, only JPNIC and KRNIC publish their bulk Whois data. We learned from the APNIC helpdesk that if NIRs make further assignments within the NIR-maintained Whois database, they may not be reflected in the APNIC Whois database.
20,127 ASes are delegated in the APNIC region, including those delegated by the NIRs. In APNIC Whois, aut-num (i.e., autonomous system number) and organisation are the AS-object and org-object, respectively; they are linked through the org field of the aut-num object, which holds the org-id of the organization. However, 8,781 ASes in APNIC do not have an org field (i.e., they have no related organization object), and 99.4% of these ASes are registered in the countries of the 7 NIRs. For these ASes, the descr (i.e., description) field of the AS-object carries the name of the owner organization, without any association by org-id. The descr field is mandatory [16], so all AS-objects have it, including those with associated organization-objects.
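The org-or-descr fallback described above can be illustrated with a minimal sketch. It assumes RPSL-style text in which objects are separated by blank lines and each line is an `attribute: value` pair; the function names, the sample records, the org-id `ORG-EX1-AP`, and the organization names are all hypothetical, and the parser ignores continuation lines for simplicity. It is not the parser used in this work.

```python
import re
from typing import Dict, List, Tuple

Attrs = List[Tuple[str, str]]


def parse_objects(bulk_text: str) -> List[Attrs]:
    """Split RPSL-style text into objects, each a list of (attribute, value) pairs."""
    objects: List[Attrs] = []
    for block in re.split(r"\n\s*\n", bulk_text.strip()):
        attrs: Attrs = []
        for line in block.splitlines():
            # Skip blanks, comments, and continuation lines (a simplification).
            if not line.strip() or line[0] in "#% \t+" or ":" not in line:
                continue
            key, _, value = line.partition(":")
            attrs.append((key.strip().lower(), value.strip()))
        if attrs:
            objects.append(attrs)
    return objects


def owner_reference(aut_num: Attrs) -> Tuple[str, str]:
    """Return ("org", org-id) when an org field exists, else ("descr", first descr value)."""
    first: Dict[str, str] = {}
    for key, value in aut_num:
        first.setdefault(key, value)  # keep only the first value of each attribute
    if "org" in first:
        return ("org", first["org"])
    return ("descr", first.get("descr", ""))


def map_ases(bulk_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each aut-num to the field that identifies its owner organization."""
    result: Dict[str, Tuple[str, str]] = {}
    for obj in parse_objects(bulk_text):
        if obj[0][0] == "aut-num":
            result[obj[0][1]] = owner_reference(obj)
    return result


# Hypothetical records using ASNs reserved for documentation.
SAMPLE = """
aut-num:        AS64496
as-name:        EXAMPLE-A
descr:          Example Networks Pty Ltd
org:            ORG-EX1-AP
source:         APNIC

aut-num:        AS64497
as-name:        EXAMPLE-B
descr:          Example University
source:         APNIC
"""

if __name__ == "__main__":
    for asn, (field, value) in map_ases(SAMPLE).items():
        print(asn, field, value)
```

In this sketch the first AS resolves through its org-id, while the second, which has no org field, falls back to the free-text name in descr.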
RIPE NCC. The bulk Whois data of RIPE is public. 37,672 ASes are delegated in the RIPE region, the most among the 5 RIRs. RIPE Whois has a structure similar to that of APNIC: aut-num and organization objects are associated by org-id. Although no NIR exists in the RIPE region, a small number of ASes (108 ASNs) still have no associated organization; their holder organization appears in the descr field. Unlike in APNIC, the descr field is not mandatory, and only 3,962 ASes in RIPE have it.
AFRINIC. The bulk Whois data of AFRINIC is public. AFRINIC has allocated the fewest AS numbers among the RIRs: only 2,168 ASes are delegated in the AFRINIC region. The Whois structure of AFRINIC is similar to that of APNIC and RIPE but more consistent: all aut-num objects have org fields associating them with org-objects, and the descr field is also mandatory.
ARIN. Access to ARIN bulk Whois data requires an application (we obtained access for this work). 31,446 ASes are delegated in the ARIN region. ARIN uses its own Whois format [8]: ASHandle and OrgName are the two main objects, associated by OrgID. AS-objects do not have a descr field, and every AS-object has an associated org-object.
LACNIC. Access to LACNIC bulk Whois data requires an application (we did not obtain access for this work). 12,740 ASes are delegated in the LACNIC region. To compare CA2O with LACNIC Whois, we scraped the official LACNIC Whois web page to collect the Whois mappings.
B Details of Keywords Function
We implement two lists of stop-words: the first contains words that cannot be used to identify an organization, while the second contains words that may sometimes be useful. The first list contains apnic, enterprise, asn, sas, as, information, ap, pvt, university, jpnic, jsco, telecom, and, bvba, autonomous, ltda, services, for, op, backbone, telekom, based, ohg, de, gmbh, technologies, lacnic, pt, legacy, inc, company, the, technology, of, llc, sdn, organization, afrinic, com, idnic, bhd, da, international, corporation, twnic, limited, research, or, aka, pty, service, solutions, me, arin, ltd, jsc, in, org, ripe.
The second list contains health, communication, tecnologia, data, network, comunicacao, center, coop, hospital, australia, bank, servi, servers, sg, telecomunica, el, northern, north, net, en, me, systems, sdn, telecommunications, telecomunices, telecommunication, east, eu, uab, education, info, de, public, silva, exchange, world, serv, college, communications, eng, western, digital, hosting, apac, city, southern, yue, internet, broadband, asia, link, route, uk, consumo, provedora, networks, japan, tech, ag, west, sp, cloud, web, co, telecomunicacoes, os, servicos, ab, ix, comunica, tel, publicos, telefon, experimental, yu, europe, connect, eastern, south, computing, group, county, global. In addition, we add the names of countries and the two-letter country codes to the second list.
For each set of extracted English keywords, we first filter out the words in the first list. We then check whether all the remaining words appear in the second list. If so, we do not apply the second list, since doing so would leave no keywords; otherwise, we use the second list to filter out the matching words.
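The two-stage filtering above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the two sets are short excerpts of the lists given in this appendix, the function name `filter_keywords` is hypothetical, and lowercasing the input is an assumption.

```python
from typing import Iterable, List, Set

# Excerpt of the first list: words that never identify an organization.
STOP_ALWAYS: Set[str] = {"inc", "ltd", "llc", "the", "of", "telecom", "university"}

# Excerpt of the second list: words that are only sometimes useful.
STOP_SOMETIMES: Set[str] = {"network", "networks", "internet", "cloud", "global", "bank"}


def filter_keywords(words: Iterable[str]) -> List[str]:
    """Apply the first list always, and the second list only if something would survive it."""
    # Lowercasing is an assumption; the lists in this appendix are lowercase.
    remaining = [w for w in (w.lower() for w in words) if w not in STOP_ALWAYS]
    # If every remaining word is in the second list, keep them all.
    if all(w in STOP_SOMETIMES for w in remaining):
        return remaining
    # Otherwise drop the second-list words and keep the distinctive ones.
    return [w for w in remaining if w not in STOP_SOMETIMES]


if __name__ == "__main__":
    print(filter_keywords(["Acme", "Networks", "Inc"]))
    print(filter_keywords(["Global", "Cloud", "Ltd"]))
```

In the first call, "acme" is distinctive, so the second-list word "networks" is dropped; in the second call, every surviving word is in the second list, so both are kept rather than leaving an empty keyword set.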
C Manual Input Knowledge
C.1 Manual Input Pools in Sect. 4
During the semi-manual investigation, we identified 8 CA2O.orgs that are likely to be APNIC LIRs (211 ASes involved). The pool detection did not recognize them because none of the involved ASes maintain information in PDB. Their names are: REANNZ Education and Schools; Internet Thailand Company Ltd.; ePLDT Inc.; CS Loxinfo Public Company Limited; Globe Telecom (GMCR,INC); Sky Internet; KSC Commercial Internet Co.Ltd.; Philippine Long Distance Telephone Co.
C.2 Manual Knowledge of admin-c in Sect. 5
We identified several pools in which the CA2O.orgs are very likely to be APNIC LIRs but the involved ASes share the same admin-c field. To preserve the accuracy of our dataset, we do not add admin-c as a feature for the ASes in these pools:
One Pool Containing an NIR. IRINN (Indian Registry for Internet Names and Numbers) put its org-handle (RB486-AP) in the admin-c fields of 11 ASes. We contacted IRINN, which confirmed that this was a technical glitch: the system automatically set the IRINN nic-handle on ASes delegated by IRINN whenever a Whois server issue occurred.
Six Pools Containing APNIC LIRs. We list the names of the APNIC LIRs here: United Information Highway; Eastern Telecommunications Philippines, Inc.; SingTel Optus Pty Ltd; True Internet Co.,Ltd. and TRUE INTERNET; Communications & Communicate Nepal Pvt Ltd; VOCUS PTY LTD.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Z., Bischof, Z.S., Testart, C., Dainotti, A. (2023). Improving the Inference of Sibling Autonomous Systems. In: Brunstrom, A., Flores, M., Fiore, M. (eds) Passive and Active Measurement. PAM 2023. Lecture Notes in Computer Science, vol 13882. Springer, Cham. https://doi.org/10.1007/978-3-031-28486-1_15
DOI: https://doi.org/10.1007/978-3-031-28486-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28485-4
Online ISBN: 978-3-031-28486-1
eBook Packages: Computer Science, Computer Science (R0)