Improving the Inference of Sibling Autonomous Systems

  • Conference paper
Passive and Active Measurement (PAM 2023)

Abstract

Correctly mapping Autonomous Systems (ASes) to their owner organizations is critical for connecting AS-level and organization-level research. Unfortunately, constructing an accurate dataset of AS-to-organization mappings is difficult due to a lack of ground truth information. CAIDA AS-to-organization (CA2O), the current state-of-the-art dataset, relies heavily on Whois databases maintained by Regional Internet Registries (RIRs) to infer the AS-to-organization mappings. However, inaccuracies in Whois data can dramatically impact the accuracy of CA2O, particularly for inferences involving ASes owned by the same organization (referred to as sibling ASes).

In this work, we leverage PeeringDB (PDB) as an additional data source to detect potential errors of sibling relations in CA2O. By conducting a meticulous semi-manual investigation, we discover two pitfalls of using Whois data that result in incorrect inferences in CA2O. We then systematically analyze how these pitfalls influence CA2O. We also build an improved dataset on sibling relations, which corrects the mappings of 12.5% of CA2O organizations with sibling ASes (1,028 CA2O organizations, associated with 3,772 ASNs). To make this process reproducible and scalable, we design an automated approach to recreate our manually-built dataset with high fidelity. The approach is able to automatically improve inferences of sibling ASes for each new version of CA2O.

Notes

  1. https://github.com/InetIntel/Improving-Inference-of-Sibling-ASes

  2. RIPE NCC, ARIN, APNIC, LACNIC, AFRINIC.

  3. IDNIC, CNNIC, JPNIC, KRNIC, TWNIC, VNNIC, IRINN.

  4. NIC Mexico, NIC.br.

  5. aut refers to autonomous system.

  6. The official romanization system for Standard Mandarin Chinese in China.

References

  1. Bgp.tools. https://bgp.tools/

  2. The CAIDA AS Organizations Dataset. (Downloaded on July 1, 2022). https://www.caida.org/data/as-organizations

  3. CAIDA AS Rank. https://as-rank.caida.org/

  4. Daily snapshots of PeeringDB data. (Downloaded on April 4, 2022). https://publicdata.caida.org/datasets/peeringdb/

  5. Global Routing Intelligence Platform (GRIP). https://grip.inetintel.cc.gatech.edu/

  6. GTT acquired Interoute in 2018. https://www.gtt.net/us-en/media-center/press-releases/gtt-to-acquire-interoute/

  7. The Internet registry system. https://www.ripe.net/participate/internet-governance/internet-technical-community/the-rir-system

  8. Introduction to ARIN’s databases. https://www.arin.net/resources/guide/account/database/

  9. Mapping autonomous systems to organizations: CAIDA’s inference methodology. https://www.caida.org/archive/as2org/

  10. The merge of France IX and Rezopole A.D. https://www.linkedin.com/company/france-ix

  11. The merger of Nutrien in 2018. https://www.nutrien.com/investors/news-releases/2018-agrium-and-potashcorp-merger-completed-forming-nutrien-leader-global/

  12. News of the acquisition of CNS by Beeks. https://beeksgroup.com/news/beeks-acquires-vps-provider-cns/

  13. News of the acquisition of Linode by Akamai. https://www.akamai.com/newsroom/press-release/akamai-completes-acquisition-of-linode

  14. Process of ASN application of RIPE. https://www.ripe.net/manage-ips-and-asns/as-numbers/request-an-as-number

  15. The public datasets of ISI ANT lab. https://ant.isi.edu/datasets/all.html

  16. Template of APNIC Whois. https://www.apnic.net/manage-ip/using-whois/guide/aut-num/

  17. Cai, X., Heidemann, J., Krishnamurthy, B., Willinger, W.: Towards an AS-to-organization Map. In: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pp. 199–205 (2010)

  18. Cai, X., Heidemann, J., Krishnamurthy, B., Willinger, W.: An organization-level view of the internet and its implications (extended), p. 26 (2012)

  19. Dainotti, A., et al.: Lost in space: improving inference of IPv4 address space utilization. IEEE J. Sel. Areas Commun. 34(6), 1862–1876 (2016)

  20. Jin, Y., Scott, C., Dhamdhere, A., Giotsas, V., Krishnamurthy, A., Shenker, S.: Stable and practical AS relationship inference with ProbLink, pp. 581–598 (2019). https://www.usenix.org/conference/nsdi19/presentation/jin

  21. Konte, M., Perdisci, R., Feamster, N.: ASwatch: An AS reputation system to expose bulletproof hosting ASes (2015)

  22. Liu, J., Yang, B., Liu, J., Lu, Y., Zhu, K.: A method of route leak anomaly detection based on heuristic rules, pp. 662–666. Atlantis Press (2017). https://doi.org/10.2991/ammee-17.2017.127. https://www.atlantis-press.com/proceedings/ammee-17/25878482. ISSN: 2352-5401

  23. Luckie, M., Huffaker, B., Dhamdhere, A., Giotsas, V., claffy, k.: AS relationships, customer cones, and validation. In: Proceedings of the 2013 conference on Internet measurement conference - IMC 2013, pp. 243–256. ACM Press, Barcelona, Spain (2013). https://doi.org/10.1145/2504730.2504735. https://dl.acm.org/citation.cfm?doid=2504730.2504735

  24. Nemmi, E.N., Sassi, F., La Morgia, M., Testart, C., Mei, A., Dainotti, A.: The parallel lives of Autonomous Systems: ASN allocations vs. BGP. In: Proceedings of the 21st ACM Internet Measurement Conference, pp. 593–611. IMC 2021, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3487552.3487838

  25. Padmanabhan, R., et al.: A multi-perspective view of Internet censorship in Myanmar. In: Proceedings of the ACM SIGCOMM 2021 Workshop on Free and Open Communications on the Internet, pp. 27–36 (2021)

  26. Testart, C., Richter, P., King, A., Dainotti, A., Clark, D.: Profiling BGP serial hijackers: capturing persistent misbehavior in the global routing table. In: Proceedings of the Internet Measurement Conference on - IMC 2019, pp. 420–434. ACM Press, Amsterdam, Netherlands (2019). https://doi.org/10.1145/3355369.3355581. https://dl.acm.org/doi/10.1145/3355369.3355581

  27. Zhao, X., et al.: An analysis of BGP multiple origin AS (MOAS) conflicts. In: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pp. 31–35 (2001)

  28. Ziv, M., Izhikevich, L., Ruth, K., Izhikevich, K., Durumeric, Z.: ASdb: a system for classifying owners of autonomous systems. In: Proceedings of the 21st ACM Internet Measurement Conference, pp. 703–719. ACM, Virtual Event (2021). https://doi.org/10.1145/3487552.3487853. https://dl.acm.org/doi/10.1145/3487552.3487853

Author information

Corresponding author

Correspondence to Zhiyi Chen.

Appendices

A Information of RIR/NIR Whois

APNIC. The bulk Whois data of APNIC is public, but among the 7 NIRs, only JPNIC and KRNIC publish their bulk Whois data. We learned from the APNIC helpdesk that further assignments that NIRs make within their own NIR-maintained Whois databases may not be reflected in the APNIC Whois database.

20,127 ASes are delegated in the APNIC region, including those delegated by the NIRs. In APNIC Whois, aut-num (i.e., autonomous system number) and organisation are the AS-object and org-object, respectively; an aut-num is linked to its organisation through its org field (i.e., the org-id of the organization). However, 8,781 ASes in APNIC have no org field (i.e., no related organization object), and 99.4% of these ASes are registered in the countries of the 7 NIRs. For these ASes, the descr (i.e., description) field of the AS-object carries the name of the owner organization, with no association by org-id. The descr field is mandatory [16], so all AS-objects have it, including those with associated organization-objects.
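The fallback described above (use the org field when present, otherwise read the owner name from descr) can be sketched as follows. This is a minimal illustration, not the parser used in this work: it assumes a simplified "key: value" layout with blank-line-separated objects, and the sample records, org-id, and names are invented. The ASNs come from the range reserved for documentation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample in a simplified RPSL-like "key: value" layout.
# The ASNs, org-id, and names are made up and are not real Whois records.
SAMPLE = """\
aut-num:        AS64500
as-name:        EXAMPLE-NET
descr:          Example Networks Pty Ltd
org:            ORG-EX1-AP

aut-num:        AS64501
as-name:        SAMPLE-NIR-AS
descr:          Sample Telecom Co.
"""


def parse_objects(text: str) -> List[Dict[str, List[str]]]:
    """Split blank-line-separated objects into {attribute: [values]} dicts."""
    objects = []
    for block in text.strip().split("\n\n"):
        obj: Dict[str, List[str]] = {}
        for line in block.splitlines():
            # Skip comment lines and anything that is not "key: value".
            if ":" not in line or line.startswith(("#", "%")):
                continue
            key, value = line.split(":", 1)
            obj.setdefault(key.strip(), []).append(value.strip())
        if obj:
            objects.append(obj)
    return objects


def owner_reference(aut_num: Dict[str, List[str]]) -> Optional[Tuple[str, str]]:
    """Return ("org", org-id) if the org field exists, else ("descr", name)."""
    if aut_num.get("org"):
        return ("org", aut_num["org"][0])
    if aut_num.get("descr"):
        return ("descr", aut_num["descr"][0])
    return None


owners = {
    obj["aut-num"][0]: owner_reference(obj)
    for obj in parse_objects(SAMPLE)
    if "aut-num" in obj
}
```

Here the first object resolves through its org-id, while the second, which has no org field, falls back to the free-text descr value. Real bulk Whois files may contain continuation lines and multiple descr attributes, which this sketch does not handle.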

RIPE NCC. The bulk Whois data of RIPE is public. 37,672 ASes are delegated in the RIPE region, the most among the 5 RIRs. The RIPE NCC Whois has a structure similar to that of APNIC: aut-num and organization objects are associated by org-id. Although no NIR exists in the RIPE region, a small number of ASes (108 ASNs) still have no associated organization; their holder organization is given in the descr field. Unlike in APNIC, the descr field is not mandatory, and only 3,962 ASes in RIPE have it.

AFRINIC. The bulk Whois data of AFRINIC is public. AFRINIC allocates the fewest AS numbers among the RIRs: only 2,168 ASes are delegated in the AFRINIC region. The Whois structure of AFRINIC is similar to that of APNIC and RIPE but more consistent: all aut-num objects have org fields associated with org-objects, and the descr field is also mandatory.

ARIN. Access to ARIN bulk Whois data requires an application (we obtained access for this work). 31,446 ASes are delegated in the ARIN region. ARIN uses its own Whois format [8]: ASHandle and OrgName are the two main objects, associated by OrgID. AS-objects do not have a descr field, and every AS-object has an associated org-object.

LACNIC. Access to LACNIC bulk Whois data requires an application (we did not obtain access for this work). 12,740 ASes are delegated in the LACNIC region. To compare CA2O with LACNIC Whois, we scraped the Whois pages of the official LACNIC website to collect the Whois mappings.

B Details of Keywords Function

We implement two lists of stop-words: the first contains words that cannot be used to identify an organization, while the words in the second may sometimes be useful. The first list contains apnic, enterprise, asn, sas, as, information, ap, pvt, university, jpnic, jsco, telecom, and, bvba, autonomous, ltda, services, for, op, backbone, telekom, based, ohg, de, gmbh, technologies, lacnic, pt, legacy, inc, company, the, technology, of, llc, sdn, organization, afrinic, com, idnic, bhd, da, international, corporation, twnic, limited, research, or, aka, pty, service, solutions, me, arin, ltd, jsc, in, org, ripe.

The second list contains health, communication, tecnologia, data, network, comunicacao, center, coop, hospital, australia, bank, servi, servers, sg, telecomunica, el, northern, north, net, en, me, systems, sdn, telecommunications, telecomunices, telecommunication, east, eu, uab, education, info, de, public, silva, exchange, world, serv, college, communications, eng, western, digital, hosting, apac, city, southern, yue, internet, broadband, asia, link, route, uk, consumo, provedora, networks, japan, tech, ag, west, sp, cloud, web, co, telecomunicacoes, os, servicos, ab, ix, comunica, tel, publicos, telefon, experimental, yu, europe, connect, eastern, south, computing, group, county, global. In addition, we add the names of countries and the two-letter country codes to the second list.

For each set of extracted English keywords, we first filter out the words in the first list. We then check whether all the remaining words appear in the second list. If so, we do not apply the second list (otherwise no keyword would remain); if not, we use the second list to filter out the matching words.
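The two-stage filtering above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact implementation: the two sets are short excerpts of the lists in this appendix, the function name filter_keywords is hypothetical, and lowercasing the input is our assumption.

```python
from typing import Iterable, List

# Abbreviated excerpts of the two stop-word lists given in this appendix.
FIRST_LIST = {"apnic", "inc", "ltd", "llc", "gmbh", "the", "of",
              "telecom", "limited", "company"}
SECOND_LIST = {"network", "networks", "internet", "cloud", "global",
               "group", "data", "hosting"}


def filter_keywords(words: Iterable[str]) -> List[str]:
    """Apply the first list always, and the second list only if words survive it."""
    # Assumption: keywords are compared case-insensitively.
    lowered = [w.lower() for w in words]

    # Stage 1: drop words that can never identify an organization.
    remaining = [w for w in lowered if w not in FIRST_LIST]

    # Stage 2: if every remaining word is in the second list, keep them all,
    # since filtering would leave no keyword for this organization.
    if all(w in SECOND_LIST for w in remaining):
        return remaining

    # Otherwise drop the second-list words and keep the distinctive ones.
    return [w for w in remaining if w not in SECOND_LIST]
```

For example, filter_keywords(["Acme", "Networks", "Inc"]) keeps only "acme", whereas filter_keywords(["Global", "Cloud", "Ltd"]) keeps "global" and "cloud" because no more distinctive word is available.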

C Manual Input Knowledge

C.1 Manual Input Pools in Sect. 4

During the semi-manual investigation, we identified 8 CA2O.orgs that are likely to be APNIC LIRs (211 ASes involved). The pool detection did not recognize them because none of the involved ASes maintains information in PDB. Their names are: REANNZ Education and Schools; Internet Thailand Company Ltd.; ePLDT Inc.; CS Loxinfo Public Company Limited; Globe Telecom (GMCR,INC); Sky Internet; KSC Commercial Internet Co.Ltd.; Philippine Long Distance Telephone Co.

C.2 Manual Knowledge of admin-c in Sect. 5

We identified several pools in which the CA2O.orgs are very likely to be APNIC LIRs but the involved ASes share the same admin-c field. To preserve the accuracy of our dataset, we do not use admin-c as a feature for the ASes in these pools:

One Pool Containing an NIR. IRINN (Indian Registry for Internet Names and Numbers) put its org-handle (RB486-AP) in the admin-c fields of 11 ASes. We contacted IRINN, which confirmed that this was a technical glitch: the system automatically set the IRINN nic-handle on ASes delegated by IRINN whenever a Whois server issue occurred.

Six Pools Containing APNIC LIRs. We list the names of the APNIC LIRs here: United Information Highway; Eastern Telecommunications Philippines, Inc.; SingTel Optus Pty Ltd; True Internet Co.,Ltd. and TRUE INTERNET; Communications & Communicate Nepal Pvt Ltd; VOCUS PTY LTD.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, Z., Bischof, Z.S., Testart, C., Dainotti, A. (2023). Improving the Inference of Sibling Autonomous Systems. In: Brunstrom, A., Flores, M., Fiore, M. (eds) Passive and Active Measurement. PAM 2023. Lecture Notes in Computer Science, vol 13882. Springer, Cham. https://doi.org/10.1007/978-3-031-28486-1_15

  • DOI: https://doi.org/10.1007/978-3-031-28486-1_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28485-4

  • Online ISBN: 978-3-031-28486-1

  • eBook Packages: Computer Science, Computer Science (R0)
