Examining the Integrity of Apple’s Privacy Labels: GDPR Compliance and Unnecessary Data Collection in iOS Apps

Abstract: This study investigates the effectiveness of Apple’s privacy labels, introduced in iOS 14, in promoting transparency around app data collection practices with respect to the GDPR. Specifically, we address two key research questions: (1) What special categories of personal data, as regulated by the GDPR, are collected and used by apps, and for which purposes? (2) What disparities exist between app-stated permissions and the apparent unnecessary data gathering across various categories in the iOS App Store? By analyzing a comprehensive dataset of 541,662 iOS apps, we identify common practices related to prevalent use of sensitive and special categories of personal data, revealing widespread instances of unnecessary data collection, misuse, and potential GDPR violations. Furthermore, our analysis uncovers significant inconsistencies between the permissions stated by apps and the actual data they gather, highlighting a critical gap in user privacy protection within the iOS ecosystem. These findings underscore the need for stricter regulatory oversight of app stores and the necessity of effective privacy notices to build accountability and trust and ensure transparency. This study offers actionable insights for regulators, app developers, and users towards creating secure and transparent digital ecosystems.


Introduction
Privacy has become a major problem for society in modern times due to the pervasiveness of technology in our daily lives. While cellphones and associated apps have significantly improved communication and convenience, they have also given rise to serious concerns regarding the security of personal data. When people use a variety of apps, each requesting access to their personal information, the opaqueness of the regulations governing data usage is a serious worry. Privacy regulations are more important than ever, since people are demanding more transparency and control over their personal data. Nevertheless, consumers frequently find it difficult to completely understand these activities due to the complex nature of typical privacy regulations. Privacy labels surfaced as a possible remedy in reaction to these issues. Privacy nutrition labels were first introduced by Kelley et al. [1] with the goal of providing a clear and succinct summary of privacy policies to improve users' visual comprehension.
Apple introduced Apple Privacy Labels in the App Store in 2020, and Google released Data Safety Sections in the Google Play Store soon after. Apple launched its app privacy labels, which purportedly help users better understand an app's privacy practices before they download the app on any Apple platform [2]. All software products available on the App Store now have privacy labels, even desktop applications. However, for the sake of this paper, we will only be discussing mobile apps. Apple divides data into fourteen categories, each with a unique name and icon, to make it easier for developers to summarize how an app handles private, sensitive data. These data types combine related or comparable pieces of information; for instance, the identifiers category includes User ID and Device ID [3]. Next, the privacy labels page groups the app's data practices into three primary categories, each of which is displayed on a distinct card according to how it is used: "data used to track you", "data linked to you", and "data not linked to you" [4].
These labels are intended to make it easier for end users to understand how apps handle data [5]; they provide an alternative to wading through lengthy privacy policies that are rarely read. But just as they frequently do with privacy policies, there is a worry that users may ignore these new, potentially simplified privacy labels. This could result in a false sense of security or a lack of awareness regarding the impact on their privacy, which varies greatly from person to person. Furthermore, it is possible that developers fail to disclose their true data practices, even when the labels appear to comply with the GDPR.
The EU General Data Protection Regulation (GDPR), in Article 9, defines special categories of data [6], which require higher protection due to their sensitive nature. These categories include data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, as well as genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person's sex life or sexual orientation. The GDPR prohibits the processing of these categories unless one of the exceptions is fulfilled, such as obtaining explicit consent. The GDPR, in Article 35 [7], also requires conducting a data protection impact assessment (DPIA) to assess the risks to the "rights and freedoms" of individuals that the processing of such data may produce.
Additionally, from the perspective of individuals, certain categories are considered particularly sensitive. These include data related to photos and videos, finance and navigation. Photos and videos can reveal a lot about a person's private life, finance data include sensitive information about an individual's economic activities and status, and navigation data can track a person's location and movements. The importance of protecting such sensitive data highlights the need for effective privacy labels and regulations to ensure that users are fully aware of how their data are being collected and used.
The labels also bring up new research concerns, like examining the effectiveness, usefulness and usability of the labels in real-world settings and the discrepancy between the privacy choices that mobile app users have access to in permission managers and the disclosures made on the labels. However, there has not been much research conducted to determine how well people's privacy concerns and inquiries are addressed by the content of the present mobile app privacy labels.
In light of the aforementioned findings, the following research questions were identified and are examined in this paper:
1. What sensitive categories of personal data, as might be understood by individuals, and special categories, as regulated by the GDPR, are collected and used by apps, and for which purposes?
2. What disparities exist between app-stated permissions and the apparent unnecessary data gathering across various app categories in the iOS App Store?

Background
Privacy labels on Apple's App Store [8] provide users with clear information about the data collected by apps, including fourteen distinct types of data, such as identifiers, location data, contact info, user content and financial information. These data types, each represented with a unique name and icon, help users understand what personal information an app might collect. Data collected by apps are categorized into three main groups to clarify how personal information is handled (Figure 1):

Data Used to Track You: This includes data used for tracking user or device data across different companies for targeted advertising, such as browsing history or app usage patterns.

Data Linked to You: This involves data that can be directly tied to the user's identity, including account, device or contact details like email addresses and phone numbers.

Data Not Linked to You: This covers anonymized data where user identity references have been removed, such as anonymized usage statistics.

Each category is designed to highlight the varying privacy risks associated with different types of data usage. The privacy labels also indicate the purpose behind data collection, which can range from analytics and app functionality to product personalization and advertising. Users can find detailed information about the collected data and their purposes by accessing the "details" section on the privacy labels page.
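A small sketch can make the three-card structure concrete. The schema below is hypothetical (Apple does not publish label data in this form); it simply groups a developer's declared (data type, card) pairs under the card on which they would be displayed:

```python
# Illustrative sketch, not Apple's actual schema: grouping a privacy-label
# disclosure under the three cards shown on an App Store product page.
from collections import defaultdict

# The three cards Apple displays on the privacy labels page.
CARDS = ("DATA_USED_TO_TRACK_YOU", "DATA_LINKED_TO_YOU", "DATA_NOT_LINKED_TO_YOU")

# A hypothetical per-app disclosure: (data type, card) pairs as a developer
# might declare them when submitting an app.
disclosures = [
    ("Device ID", "DATA_USED_TO_TRACK_YOU"),
    ("Browsing History", "DATA_USED_TO_TRACK_YOU"),
    ("Email Address", "DATA_LINKED_TO_YOU"),
    ("Usage Data", "DATA_NOT_LINKED_TO_YOU"),
]

def group_by_card(disclosures):
    """Group declared data types under the card they are displayed on."""
    cards = defaultdict(list)
    for data_type, card in disclosures:
        if card not in CARDS:
            raise ValueError(f"unknown card: {card}")
        cards[card].append(data_type)
    return dict(cards)

grouped = group_by_card(disclosures)
print(grouped["DATA_USED_TO_TRACK_YOU"])  # ['Device ID', 'Browsing History']
```

The same data type (e.g., Device ID) can legitimately appear on more than one card, which is one reason users find the labels hard to interpret.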
With the introduction of iOS 14.5, Apple required app developers to seek user consent for tracking through the App Tracking Transparency (ATT) framework, ensuring users have control over their privacy settings.

Special Categories of Data Under GDPR
Under Article 9 [9], certain personal data are classified as "special categories" due to their sensitive nature, requiring enhanced protection. These include data on racial or ethnic origin, political opinions, religious beliefs, genetic and biometric data, health information and data concerning sexual orientation.
Processing such data is generally prohibited unless explicit consent is obtained or if it is necessary for substantial public interest.
In contrast, "sensitive data" refers to personal data that, while not classified as special under the GDPR, still require protection due to their nature, such as financial data or contact details. The regulatory requirements for these data vary depending on the jurisdiction.
Special categories under the GDPR are strictly regulated due to their potential impact on individual rights and freedoms, whereas sensitive data, though important, do not fall under these stringent GDPR conditions unless they relate to the defined special categories. This distinction underlines that while all GDPR special categories are sensitive, not all sensitive data are categorized as special under the GDPR.
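This distinction can be sketched as a simple lookup. The category lists below paraphrase Article 9 and the examples in this paper; the data-type names and classification strings are our own shorthand for illustration, not legal advice:

```python
# Illustrative sketch: distinguishing GDPR Article 9 "special categories"
# from data that individuals consider sensitive but that fall outside Art. 9.
SPECIAL_CATEGORIES = {
    "racial or ethnic origin", "political opinions", "religious beliefs",
    "trade union membership", "genetic data", "biometric data",
    "health data", "sex life or sexual orientation",
}

# Sensitive to individuals, but not Article 9 special categories.
SENSITIVE_NOT_SPECIAL = {
    "financial data", "contact details", "precise location", "photos and videos",
}

def gdpr_class(data_type: str) -> str:
    """Classify a data type under the special/sensitive/ordinary distinction."""
    if data_type in SPECIAL_CATEGORIES:
        return "special (Art. 9: processing prohibited unless an exception applies)"
    if data_type in SENSITIVE_NOT_SPECIAL:
        return "sensitive (protected, but outside Art. 9)"
    return "ordinary personal data"

print(gdpr_class("health data"))
print(gdpr_class("financial data"))
```

The point of the sketch is the asymmetry: every special category is sensitive, but the converse mapping does not hold.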

Literature Review and State of the Art
In this section, we discuss material pertinent to our study across four primary themes: the literature on iOS privacy labels, comparisons between iOS and Android labels, whether privacy labels are helpful to users, and why low-popularity apps have missing privacy labels.

iOS Privacy Labels
Understanding privacy labels is essential for users to make informed decisions. The four-level hierarchy of privacy labels provides a structured way to interpret the information presented. Users can start by checking the top-level privacy types to determine if an app collects any user data. If an app falls under the "No Data Collected" category, it means the app does not collect any user data [10]. However, numerous iOS applications continue to collect machine data that could be used to track users, according to Kollnig et al. (2022) [5]. In that study [5], 22.2% of apps said they did not gather user data, a claim that is frequently refuted by closer investigation. Analysis revealed a disparity between stated and actual practices: 68.6% of these apps transferred data to a monitoring site upon first start, and 80.2% of these apps contained at least one tracker library. These apps, on average, had fewer tracking libraries and made less contact with tracking businesses than those that acknowledged collecting data; this indicates a notable lack of transparency in privacy procedures. Reference [3] also observed that, on average, 37,450 apps were newly added and 38,053 apps were removed per week. By the end of the collection period, 60.5% of apps had a privacy label and the remaining 39.5% did not. The authors considered the apps that have privacy labels and noticed that the number tracking user data increased by 3733 apps on average each week, for a total increase of 47,658 apps.
The number of apps that link data to users' identities increased by 9169 apps on average each week, for a total increase of 103,886 apps. However, most new apps use the Data Not Collected privacy type; these increased by 12,597 apps per week on average, for a total increase of 125,333 apps [3]. Users have a difficult time finding apps that are truly concerned about their privacy, as seen by the limited percentage of such apps featured in the App Store charts. The discrepancy between apps' stated permissions and their actual data collection practices offers a critical entry point for our research on unnecessary data gathering across various iOS App Store categories. This gap not only raises questions about the integrity of app privacy disclosures but also underscores the importance of scrutinizing the privacy practices of apps that claim not to collect data, revealing a crucial area for deeper investigation and analysis in the quest for genuine transparency and user protection in digital environments.
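The kind of consistency check behind these figures can be sketched as follows. The app records are invented, and real studies derive the tracker evidence from static and dynamic analysis of app binaries and network traffic rather than hand-entered fields:

```python
# Illustrative sketch: flag apps whose label claims "Data Not Collected"
# even though analysis found tracker libraries or observed tracking traffic.
apps = [
    {"id": "app.a", "label": "Data Not Collected", "tracker_libs": 2, "contacted_trackers": True},
    {"id": "app.b", "label": "Data Not Collected", "tracker_libs": 0, "contacted_trackers": False},
    {"id": "app.c", "label": "Data Linked to You", "tracker_libs": 5, "contacted_trackers": True},
]

def label_contradicted(app: dict) -> bool:
    """A 'Data Not Collected' claim is contradicted by observed tracking."""
    return (app["label"] == "Data Not Collected"
            and (app["tracker_libs"] > 0 or app["contacted_trackers"]))

flagged = [a["id"] for a in apps if label_contradicted(a)]
print(flagged)  # ['app.a']
```

Run over a whole store crawl, the share of flagged apps among "Data Not Collected" apps yields percentages of the kind reported in [5].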
It was found that, among free applications, 40.55% are used for developer advertising, 78.92% for collecting data and 36.41% for third-party advertising. Free applications declare, on average, 2.72 uses of linked data, whereas paid applications declare 0.43 (Scoccia et al. 2022) [4]. Therefore, it can be said that paid applications have a better ratio than free applications on the Apple store. The results of [11] show that 51.6% of iOS applications did not have any privacy labels as of 2021. Though 35.5% of applications had already created privacy labels, just 2.7% of iOS applications had created privacy labels without an accompanying app update (Li et al. 2022 [11]). Moreover, the rate of change appeared to slow over time. The overall low level of adoption of privacy labels makes this label system comparatively less useful for customers (users). The fact that roughly half of iOS applications lack privacy labels underscores a critical problem in how app developers approach the transparency and disclosure of their data collection practices. This situation is further compounded by developers' passive attitude towards privacy [12] and a lack of awareness or misconceptions about how to create effective privacy labels [11]. Such attitudes and misunderstandings can lead to inadequate compliance with privacy regulations, including the General Data Protection Regulation (GDPR), which is especially critical when it comes to handling sensitive data categories.

How Developers Talk about Personal Data and What It Means for User Privacy: A Case Study of a Developer Forum on Reddit
Ref. [12] examines the discussions on personal data by Android developers on the /r/androiddev forum on Reddit, exploring how these discussions relate to user privacy. The paper employs qualitative analysis of 207 threads (4772 unique posts) to develop a typology of personal data discussions and identify when privacy concerns arise. The research highlights that developers rarely discuss privacy concerns in the context of specific app designs or implementation problems unless prompted by external events like new privacy regulations or OS updates. When privacy is discussed, developers often view these issues as burdensome, citing high costs with little personal benefit.
Ref. [12] suggests that privacy-related discussions are reactive rather than proactive among developers, who tend to address privacy concerns only when they are forced to do so by external factors. Risky data practices, such as sharing data with third parties or transferring data off the device, are frequently mentioned without corresponding discussions on privacy implications. The study concludes by offering recommendations for improving privacy practices, such as better communication of privacy rationales by the Android OS and app stores, and encouraging more privacy-focused discussions in developer forums.
The research contributes to understanding the challenges developers face regarding privacy and highlights the need for better tools, guidance and community practices to support more privacy-conscious app development. The study underscores the importance of proactive privacy discussions and offers actionable suggestions for enhancing privacy practices within the Android development community.

Comparison of Privacy Labels in iOS and Google Store
There has been research comparing privacy labels in the Apple App Store and Google Play Store. Since these are different platforms, privacy label policies also differ. A comparison between the two can help users understand how each platform handles data collection and privacy practices, and which platform better supports their privacy.
In [10,13], the authors analyzed the privacy labels on the Apple App Store and Google Play Store to understand how mobile apps handle user privacy. Reference [10] compares the privacy labels on both platforms; ref. [13], however, also deep-dives into the practices reported in privacy labels alongside the cross-platform comparison. Considering that both research papers are based on a comparison of privacy labels across the two platforms, it is observed that [13] provides more detailed insights, based on app popularity, age rating and price as well. Since the authors were performing a comparison between both platforms, it was essential first to identify the apps present on both. In [10], the authors were only able to compare 822 apps present on both platforms, and the findings showed mismatches in the data types collected on iOS and Android. Precise location is collected more on iOS and less on Android, while approximate location is collected more by Android than iOS. This suggests that iOS apps are more oriented towards precise location and Android apps towards approximate location. Another mismatch can be seen regarding device ID and user ID: Android apps collect the device ID more than iOS apps, while iOS apps collect the user ID more than Android apps.
The figure below shows the difference between the data types collected on both platforms in [10].
As we can see in Figure 2, there is a big gap between precise location, coarse location, user ID and device ID. Also, when it comes to sensitive info, iOS has only 7 apps that match the disclosure, while the other 16 differ. In [13], the authors filtered out the data and performed a comparison analysis on 100k apps. They found that, when comparing Apple privacy labels with Google Play Store privacy labels, 60% of the cross-listed apps had at least one inconsistency. Inconsistencies are highest for the sensitive information, browsing history, and email or text message data types [13]. The authors also identified an inconsistency wherein developers report data collection for two different purposes: for the app Twitch TV, the purpose of the purchase history data type is listed as app functionality on the Google Play Store, but as analytics and personalization on the iOS App Store.
The overall conclusion of [13] suggests that Apple's privacy label does not distinguish between data collection and sharing. Apple's privacy label is more explicit about data practices such as linkability, third-party advertising and tracking. In contrast, data safety sections lack these details but do inform users about the safety of their data (data encryption) and the choices they have with developers (a data deletion option) [13]. In terms of security practices on the Google Play Store, the authors revealed that 23% of apps do not provide any details of their security practices [12] and 65% of apps encrypt the data they collect or share while in transit [13]. It was also found that 42% of apps declare data collection on the Google Play Store, while 58% do so on the iOS platform. In terms of popularity, for the Google Play Store, 76% of high-popularity apps have privacy labels, while only 42% of low-popularity apps do. The app KineMaster-Video Editor, a video editing application with over 400M+ downloads on the Google Play Store, claims not to collect any data in the Play Store, but on the App Store it asserts the collection of sensitive data such as location and identifiers [13]. In terms of price, in the Google Play Store, 68% of paid apps have labels, whereas only 46% of free apps do; on the Apple iOS App Store, a similar trend was observed, where paid apps have more privacy labels than free apps [13]. Here, we can conclude that, apart from the privacy label comparison, [13] examined more granular details than [10]. Both research papers showed the data practices on both platforms; however, the authors did not cover whether the data practices disclosed on both platforms are in line with the GDPR. Also, refs. [10,13] did not provide any findings that would help us identify whether any of the apps were performing unnecessary data collection.
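A minimal sketch of such a cross-store consistency check, assuming the labels have already been scraped into per-app sets of declared data types (the example entries are hypothetical, not Twitch TV's actual labels):

```python
# Illustrative sketch of the cross-store comparison in [10,13]: for an app
# listed on both stores, compare the sets of declared data types.
# The label contents below are hypothetical examples.
ios_labels = {"Twitch TV": {"Purchase History", "Device ID", "Precise Location"}}
play_labels = {"Twitch TV": {"Purchase History", "Device ID"}}

def label_mismatches(app: str) -> dict:
    """Report which declared data types differ between the two stores."""
    ios, play = ios_labels[app], play_labels[app]
    return {
        "ios_only": sorted(ios - play),    # declared on iOS but not on Play
        "play_only": sorted(play - ios),   # declared on Play but not on iOS
        "consistent": ios == play,
    }

print(label_mismatches("Twitch TV"))
```

Counting apps with a non-empty `ios_only` or `play_only` set across a crawl yields the "at least one inconsistency" rate reported in [13]; a fuller check would also compare the declared purposes per data type, as in the Twitch TV example.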

Do Privacy Labels Help Users?
It is very important to understand whether privacy labels can answer the questions users have. The main purpose of the introduction of privacy labels is to allow users to know an app's data practices. For the same reason, a research paper was published by Shikun Zhang and Norman Sadeh from Carnegie Mellon University [14] in 2023 to analyze whether privacy labels resolve users' privacy questions. The authors used a corpus of questions that was published in a research paper [15] combining computational and legal perspectives. The authors first tried to understand the nature of those questions and organized them into themes, ending with 18 themes and 67 codes. The study aimed to evaluate whether the questions asked by users regarding privacy can be answered using the privacy labels on the Google Play Store and the Apple iOS App Store. The authors analyzed each code to determine whether questions under each sub-theme could be answered using the privacy labels provided by Google or Apple. The authors of [14] found that the most common questions asked by users related to data collection. Other themes comprised app security, data sharing, data selling, permissions and app-specific privacy features. It was found that approximately 40% of question themes could be answered by the labels. Google Play labels provided more coverage, addressing additional data types and security-related questions compared to iOS labels. However, iOS labels provided more information regarding data selling practices. Several question themes, such as permissions, data retention, external access, account requirements and cookies policy, were not addressed by either iOS or Google Play labels. These themes represent areas where users' privacy concerns may not be adequately addressed by current label designs.
The findings provide insights into potential improvements needed in label design to better align with users' mental models and address their privacy concerns effectively. Based on the study, only 40% of the question themes could be answered by current privacy labels, which shows that there is still room for improvement. Reference [14] indicates that there are significant gaps in addressing users' privacy concerns; thus, we attempt to fill this research gap through an understanding of user perception. In our research, we explore the formats of privacy labels across different apps and identify inconsistencies. As part of our research, we also employ campaigns/surveys that help us understand user perception, raise awareness among users in terms of understanding privacy labels, and empower users to make more informed decisions about their app usage.

Why Do Low-Popularity Apps Have Privacy Labels Missing?
One striking observation revealed in a research paper [13] is that 76% of high-popularity apps had privacy labels, whereas only 42% of low-popularity apps did. While popular apps provide robust privacy disclosure, a disparity exists where less popular apps lack privacy label disclosures. Reference [13] makes clear that more than half of the less popular apps had privacy labels missing. This brings out the need to analyze the factors that influence privacy labelling practices for apps that are less popular. Reference [16], published in 2022, helps us understand the challenges faced by small enterprise businesses regarding privacy labels. The authors conducted interviews with three SMEs from the retail, culture and media sectors. Two sessions were arranged for each SME: the objective of the first session was to introduce the SMEs to online SERIOUS tools, which helped in creating a sensible SERIOUS privacy label. The second session was an interview with each SME, with questions based on the first session, the use of privacy labels and the facilitation of privacy label deployment.
The results of the second session highlighted some points. The first was that since the participants belonged to different sectors, they should use separate, sector-specific privacy labels. The second point brought to light was that these SMEs did not have in-house capacity for privacy label generation and thus used online third parties. It was also noticed that the SMEs did not have a specific person responsible for handling privacy labels. The authors recommended that service providers develop automated systems, tools and architectures that help estimate privacy practices/labels based on the operational behavior of the corresponding service [16]. The authors mentioned that privacy labels and the tools used to develop them need to be adopted by both label-issuing enterprises and label-consuming parties. They also expressed the need for further research on possible ways of enhancing label adoption in label-issuing enterprises.
The research paper shows us that there is still a need for better privacy labels. SMEs providing online services and applications need to offer transparency regarding each privacy practice. Moreover, there is a need to introduce a trusted third party to monitor and supervise ongoing processes.

How Usable Are iOS Privacy Labels?
Reference [17] provides a comprehensive analysis of the usability and effectiveness of Apple's iOS app privacy labels, introduced with iOS 14, within the broader landscape of privacy communication. These labels were designed as a more accessible alternative to traditional privacy policies, which are often criticized for their length, complexity and general lack of engagement from users. The study situates itself within the "Notice and Choice" framework of U.S. privacy law, which traditionally relies on users being informed and making choices based on detailed privacy policies.
The survey begins by reviewing the key literature on privacy notices, identifying essential criteria for effective privacy communication: readability, comprehensibility, salience, relevance and actionability. It highlights the shortcomings of traditional privacy policies and explores the emergence of alternative approaches, such as standardized and simplified notices, which aim to make privacy information more digestible and actionable for users.
Ref. [17] then focuses specifically on the concept of privacy labels, which are intended to function similarly to nutrition labels on food products, offering a concise summary of how an app handles user data. The authors in [17] discuss the theoretical potential of these labels to improve user understanding and control over their privacy, particularly in the mobile app ecosystem, where privacy concerns are especially pertinent.
To evaluate the real-world effectiveness of iOS privacy labels, the authors conducted an empirical study involving in-depth interviews with 24 iPhone users. The findings reveal a range of user experiences, with many participants expressing confusion and frustration with the labels. Despite their intended purpose, the labels often failed to clearly communicate the necessary information or to empower users to make informed privacy decisions. Common issues included misunderstandings of the labels' content, perceived inconsistencies with app behaviors, and a general lack of actionable guidance.
The survey concluded by offering recommendations for enhancing the design and implementation of privacy labels. These included using clearer and more straightforward language, better integrating the labels with app permission settings, and refining the visual and interactive aspects of the labels to make them more user-friendly. The paper's findings contribute to the ongoing discussion on improving privacy notice design, particularly in the context of mobile applications, and highlight the persistent challenges in creating privacy tools that effectively bridge the gap between technical information and user understanding.

Helping Mobile Application Developers Create Accurate Privacy Labels
Reference [18] explores the complex challenge of ensuring that mobile application developers can produce accurate privacy labels, a requirement introduced by Apple in December 2020. These privacy labels are designed to inform users about the data collection and sharing practices of the apps they use. However, many developers struggle to complete these labels accurately due to a lack of expertise in privacy regulations and the complexities introduced by third-party software development kits (SDKs) and libraries, which are often integral to app development.
The authors identify several key obstacles that developers face when attempting to create these privacy labels. One significant issue is the difficulty in understanding the data collection behaviors of third-party components within their apps. These components can introduce data practices that developers might not be fully aware of, leading to inaccuracies in the privacy labels. Additionally, developers often lack the necessary privacy expertise to correctly interpret and implement the requirements for these labels, resulting in widespread inaccuracies that could have legal, regulatory, and reputational consequences.
To address these challenges, the authors developed and evaluated a tool called Privacy Label Wiz (PLW). PLW is an enhanced version of an earlier tool, PrivacyFlash Pro, and is designed to help iOS developers create more accurate privacy labels by integrating static code analysis with interactive user prompts. The tool scans the app's codebase to identify potential data collection practices and then guides the developer through a series of questions and prompts to clarify and confirm these practices. This process is intended to help developers better understand their apps' data flows and ensure that the privacy labels they produce accurately reflect these practices.
Reference [18] details the iterative development process of PLW, which involved gathering feedback from semi-structured interviews with developers. These interviews provided valuable insights into the difficulties developers face and informed several key design decisions for PLW. For example, the tool was designed to integrate seamlessly into developers' existing workflows, minimizing disruption and making it easier for developers to use it effectively. The authors also discuss the tool's evaluation, which showed that PLW could significantly improve the accuracy of the privacy labels generated by developers.
In addition to describing the tool and its development, the paper makes several broader contributions to privacy engineering. It highlights the need for tools that are tailored to the specific challenges developers face when working with privacy regulations and underscores the importance of aligning these tools with typical software development practices. The paper concludes with suggestions for future work, including further refinement of tools like PLW and expanding support for other mobile platforms beyond iOS.
Overall, the study emphasizes the importance of providing developers with the right tools and resources to help them navigate the complexities of privacy regulations, thereby improving the accuracy of privacy labels and enhancing user trust in mobile applications.

Keeping Privacy Labels Honest
Reference [19] explores the effectiveness and reliability of Apple's privacy labels, which were introduced in December 2020. These labels require app developers to disclose the types of data their apps collect and the purposes for which the data are used. The study primarily investigates whether these privacy labels accurately reflect the data collection practices of the apps and whether developers comply with these self-declared labels. The authors conducted an exploratory statistical analysis of 11,074 apps across 22 categories from the German App Store. They found that a significant number of apps either did not provide privacy labels or self-declared that they did not collect any data. A subset of 1687 apps was selected for a "no-touch" traffic collection study. This involved analyzing the data transmitted by these apps to determine if they matched the information disclosed in their privacy labels. The study revealed that at least 276 of these apps violated their privacy labels by transmitting data without declaring it. Reference [19] also assessed the apps' compliance with the General Data Protection Regulation (GDPR), particularly regarding the display of privacy consent forms. Numerous potential violations of the GDPR were identified. The authors developed infrastructure for large-scale iPhone traffic interception and a system for automatically detecting privacy label violations through traffic analysis. Ref. [19] concluded that Apple's privacy labels are often inaccurate, with many apps transmitting data not disclosed in their labels. The findings suggest that there is no validation of these labels during the Apple App Store approval process, leading to potential privacy violations and non-compliance with GDPR. The paper emphasizes the need for more rigorous enforcement and verification of privacy labels to protect users' data effectively. The study provides a critical evaluation of the effectiveness of privacy labels and highlights significant gaps in their implementation and enforcement.

ATLAS: Automatically Detecting Discrepancies between Privacy Policies and Privacy Labels
Ref. [20] introduces a novel tool, the ATLAS (Automated Privacy Label Analysis System), which is designed to identify discrepancies between privacy policies and privacy labels in iOS apps using advanced natural language processing (NLP) techniques. The study reveals a concerning finding: 88% of the apps analyzed that have both a privacy policy and a privacy label exhibit at least one discrepancy, with an average of 5.32 potential issues per app. These discrepancies often involve the types of data collected, the purposes for data use, and data-sharing practices, pointing to significant gaps between what apps disclose in their privacy labels and what is outlined in their privacy policies.
The ATLAS serves as a critical resource for developers, regulators and researchers, providing a way to automatically detect and address these inconsistencies, thereby improving privacy transparency and compliance in the mobile app ecosystem. The study highlights the potential of the ATLAS to enhance user trust by ensuring that privacy labels accurately reflect the practices detailed in privacy policies, thus supporting regulatory efforts to protect user privacy.
In conclusion, ref. [20] underscores the importance of addressing the identified discrepancies to improve the accuracy of privacy disclosures in mobile apps. The researchers suggest that the ATLAS could be further developed to cover more platforms and languages, potentially broadening its impact. They also call for stronger regulatory oversight to ensure that privacy labels are not just a formality but a true reflection of an app's data practices. The authors believe that by using tools like the ATLAS, the industry can move towards greater transparency and accountability, ultimately fostering a more privacy-respecting digital environment.

Study Design
The main aim of our study is to explore how iOS apps on the Apple App Store handle special, sensitive data, providing insights that can aid regulators, app store management and users in making better-informed decisions. This study covers 541,662 apps published on the iOS App Store as of November 2023. The goal of this study is to analyze privacy labels to identify the limitations they pose.
Initially, we conducted a literature review to identify existing research papers on privacy labels. Although we found relevant studies, none of them included accompanying data. This led us to contact other academics who were also collecting data on the Apple App Store. These academics provided us with data in JSON format, collected in November 2023, encompassing privacy information for 541,662 iOS apps. Each app's privacy details are stored in separate JSON files, specifying the categories of data collected. To promote further research, we will upload these data to GitHub in the interest of reproducibility and reuse.
The JSON files we received comprised 541,662 individual JSON files, one for each app. We developed a Python function to extract information from these JSON files and convert it into a structured format suitable for analysis, combining respective fields such as data linked to you, data not linked to you and data tracking you. This process involved parsing each JSON file to identify various privacy-related aspects, such as the types of data collected (e.g., location, contact info), the purposes of data collection (e.g., analytics, app functionality) and the categories of data usage (e.g., data used to track you, data linked to you, data not linked to you). We then used binary encoding to indicate the presence or absence of specific data types and purposes within each app's privacy label. For instance, if an app collected location data, it was encoded as "1"; otherwise, it was encoded as "0". Similarly, data used for analytics were encoded as "1" if applicable. The encoded data were then stored in a .csv file for further analysis.
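The extraction and binary-encoding step can be sketched in Python as follows. This is a minimal illustration rather than the study's actual code: the JSON key names (`data_linked_to_you`, `data_type`, `purposes`, `app_id`) and the subset of data types and purposes are assumptions, since the exact schema of the collected files is not specified.

```python
import csv
import json
from pathlib import Path

# Illustrative subset of the privacy-label vocabulary (hypothetical names).
DATA_TYPES = ["Location", "Contact Info", "Identifiers", "Usage Data"]
PURPOSES = ["Analytics", "App Functionality", "Developer's Advertising"]
SECTIONS = ("data_linked_to_you", "data_not_linked_to_you",
            "data_used_to_track_you")

def encode_app(json_path: Path) -> dict:
    """Flatten one app's privacy-label JSON into a binary-encoded row."""
    with open(json_path, encoding="utf-8") as f:
        label = json.load(f)
    # Collect every declared data type and purpose across the three sections.
    seen_types, seen_purposes = set(), set()
    for section in SECTIONS:
        for entry in label.get(section, []):
            seen_types.add(entry.get("data_type"))
            seen_purposes.update(entry.get("purposes", []))
    row = {"app_id": label.get("app_id", json_path.stem)}
    # 1 if the app declares the data type / purpose, 0 otherwise.
    row.update({t: int(t in seen_types) for t in DATA_TYPES})
    row.update({p: int(p in seen_purposes) for p in PURPOSES})
    return row

def encode_all(json_dir: str, out_csv: str) -> None:
    """Encode every per-app JSON file in a directory into one CSV."""
    rows = [encode_app(p) for p in sorted(Path(json_dir).glob("*.json"))]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

One row per app keeps the CSV directly comparable across categories, which is what the subsequent frequency analyses rely on.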
The choice of binary encoding was driven by its simplicity and clarity, which aids in straightforward statistical analyses and visualizations, and ensures consistency across the dataset, making comparisons between different apps and categories feasible. Additionally, this method enhances analytical flexibility, allowing for various analytical techniques such as frequency analysis, cross-tabulations, and visual representations. This comprehensive approach enables a thorough examination of data practices. Lastly, the encoded data in the .csv file were stored in an SQLite database, which facilitated efficient querying and management, helping aggregate data and extract meaningful insights regarding privacy practices across different app categories.
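The SQLite step can be illustrated with a small sketch. The table layout and column names here are hypothetical simplifications; the point is how binary-encoded rows support frequency analysis and per-category aggregation with plain SQL.

```python
import sqlite3

def aggregate_by_category(rows):
    """Load binary-encoded rows into SQLite and compute, per category,
    the app count, the number of apps declaring location collection,
    and the share of apps declaring analytics use."""
    con = sqlite3.connect(":memory:")  # a file path would persist the DB
    con.execute("""CREATE TABLE privacy_labels (
                       app_id    TEXT,
                       category  TEXT,
                       location  INTEGER,
                       analytics INTEGER)""")
    con.executemany("INSERT INTO privacy_labels VALUES (?, ?, ?, ?)", rows)
    # Frequency analysis: per-category counts and percentages.
    cur = con.execute("""
        SELECT category,
               COUNT(*)      AS total_apps,
               SUM(location) AS collect_location,
               ROUND(100.0 * SUM(analytics) / COUNT(*), 2) AS pct_analytics
        FROM privacy_labels
        GROUP BY category
        ORDER BY total_apps DESC""")
    return cur.fetchall()
```

Because every encoded column is 0 or 1, `SUM` counts apps and `SUM/COUNT` yields the category-level percentages reported in the tables below.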
The collected and structured data were then analyzed by identifying specific queries through analysis of the literature and performing a requirements analysis. In this exercise, we analyzed the information available to us in the App Store and the privacy label, and formulated questions whose answers would illuminate the state of data collection and privacy within the app's use of data and the impact on privacy. To ensure the questions were diverse and reflected a well-grounded research approach, we compared them with the existing literature that analyzed iOS apps to identify which questions were repeated (i.e., prior work that had also investigated them) and which were novel. For the questions that were repeated, we sought to validate existing analysis in terms of findings, and for novel ones, we attempted to formulate theories as to why they occur and what their impacts are.

Analysis and Results
The results of our analysis are shown in this section, arranged in accordance with the relevant research question. Table 1 shows the information related to our dataset. Based on the GDPR, we found that the "medical" and "health and fitness" categories contain special categories of personal data, while the "finance", "photo and video" and "navigation" categories contain sensitive categories of personal data. Here is a detailed breakdown of the data usage within these categories:

Medical: In this category (Table 2), data linked to users are distributed across various purposes, with the majority (83.4%) being used for developer advertising. This significant portion, amounting to 109,917 apps, highlights the emphasis on supporting in-app advertisements and marketing efforts by developers. Additionally, 9.6% (12,610 apps) of the data ensures app functionality, crucial for the app's operational effectiveness. Analytics data account for 3.3% (4373 apps), used to understand user interaction and improve the app experience. Minor portions are dedicated to other purposes (1.0%, 1340 apps), product personalization (2.3%, 3089 apps) and third-party advertising (0.4%, 463 apps), ensuring a tailored user experience and supporting external marketing.

Health and Fitness: In contrast, the health and fitness category demonstrates a more balanced data distribution. App functionality is the largest segment, comprising 53.1% (65,535 apps) of the data, emphasizing the importance of maintaining a smooth and effective service. Analytics data make up 21.3% (26,242 apps), crucial for monitoring and enhancing user interaction and performance. Product personalization, accounting for 12.3% (15,149 apps), plays a significant role in tailoring the user experience. Developer advertising constitutes 9.4% (11,588 apps), and other purposes take up 1.9% (2290 apps). Third-party advertising is relatively minimal, at 2.1% (2609 apps), indicating a lesser focus on external marketing compared to internal app functionality and personalization efforts.
In comparison to these special categories, the sensitive categories of finance, photo and video, and navigation exhibit different patterns of data usage.
Finance: In the finance category (Table 3), data linked to users are distributed as follows: 47.87% (46,201 apps) ensures app functionality, highlighting the critical need for operational effectiveness. Developer advertising constitutes 13.26% (12,789 apps), supporting in-app advertisements and marketing efforts. Analytics data account for 19.25% (18,580 apps), used to understand user interaction and improve the app experience. Other purposes cover 3.83% (3700 apps), while product personalization takes up 13.86% (13,379 apps) to tailor the user experience. Third-party advertising represents a smaller portion at 1.93% (1861 apps).

Photo and Video: In this category, the data distribution is as follows: App functionality comprises 34.97% (3048 apps), essential for the app's smooth operation. Developer advertising accounts for 7.33% (639 apps), and analytics to enhance user interaction make up 26.54% (2313 apps). Other purposes represent 2.93% (255 apps), while product personalization is 9.66% (842 apps). Third-party advertising constitutes a significant portion at 18.56% (1618 apps), indicating a strong emphasis on external marketing.
In summary, while both special categories (medical, and health and fitness) and sensitive categories (finance, photo and video, and navigation) prioritize app functionality and developer advertising, their focus and distribution patterns vary. The medical category is heavily skewed towards advertising, with minimal emphasis on personalization and other purposes. The health and fitness category balances its data usage across functionality, analytics and personalization, with a smaller proportion dedicated to advertising. In contrast, the sensitive categories like finance, photo and video, and navigation exhibit a more varied distribution, reflecting different operational, marketing and user engagement strategies employed by apps in these categories. These differences highlight the distinct approaches taken by apps in managing and utilizing user data to meet their specific operational and marketing objectives.
In the landscape of iOS app categories, medical apps and health and fitness apps stand out for their notably abundant data collection practices. These categories gather significantly more data than other app categories such as finance, navigation, and photo and video, suggesting a strong emphasis on analytics, app functionality, and advertising, and raising privacy concerns among users. Our findings reveal that medical apps and health and fitness apps collect almost 10-15 times more data than navigation apps and photo and video apps. Such extensive data collection raises serious privacy concerns, underlining the need for app developers to adopt transparent data usage policies and enhance user consent mechanisms. Users are advised to exercise caution with permissions and settings when using these apps due to their high data collection rates.

"Data Not Linked to You" Using Sensitive and Special Categories
For data not linked to you, we used the same approach as outlined in Section 5.1.1 to identify special and sensitive categories of data.
Medical Category: In the medical category (Table 4), data not linked to users are distributed across various purposes. The majority (47.09%) are used for app functionality, amounting to 6042 apps. Analytics data account for 35.10% (4505 apps), used to understand user interaction and improve the app experience. Minor portions are dedicated to other purposes (5.67%, 728 apps), product personalization (7.64%, 980 apps) and third-party advertising (2.43%, 312 apps). Developer advertising constitutes the smallest portion (2.06%, 264 apps). The total count of data entries across these apps is 12,831, with app functionality being the highest count at 6042.

Health and Fitness Category: In the health and fitness category, data not linked to users are distributed with app functionality taking the largest share at 46.61% (16,631 apps). Analytics data account for 32.63% (11,641 apps), used for monitoring and enhancing user interaction. Product personalization makes up 8.72% (3110 apps) and developer advertising constitutes 3.29% (1173 apps). Other purposes cover 4.08% (1454 apps), and third-party advertising represents 4.68% (1670 apps). The total count of data entries across these apps is 35,679, with app functionality being the highest count at 16,631.
In comparison to these special categories, the sensitive categories of finance, photo and video, and navigation show the following patterns:

Finance Category: In the finance category (Table 5), data not linked to users are distributed as follows: 45.09% (13,681 apps) ensure app functionality, highlighting the critical need for operational effectiveness. Analytics data constitute 32.23% (9780 apps), used to understand user interaction and improve the app experience. Developer advertising accounts for 3.20% (972 apps), while other purposes cover 9.01% (2732 apps). Product personalization takes up 7.05% (2140 apps), and third-party advertising represents a smaller portion at 3.42% (1037 apps). The total count of data entries across these apps is 30,342, with app functionality being the highest count at 13,681.

Photo and Video: In the photo and video category, the data distribution is as follows: App functionality comprises 31.61% (3579 apps) of the total. Analytics data make up 37.26% (4218 apps), highlighting their importance in understanding user behavior. Developer advertising accounts for 5.16% (584 apps), while other purposes cover 3.19% (361 apps). Product personalization represents 6.00% (679 apps), and third-party advertising constitutes 16.79% (1900 apps). The total count of data entries across these apps is 11,321, with analytics being the highest count at 4218.

Navigation: In the navigation category, data not linked to users are mainly focused on app functionality, which makes up 45.61% (3149 apps). Analytics data represent 33.14% (2288 apps), crucial for improving app performance. Developer advertising accounts for 2.74% (189 apps), and other purposes cover 4.01% (277 apps). Product personalization constitutes 6.68% (461 apps), while third-party advertising represents 7.82% (540 apps). The total count of data entries across these apps is 6904, with app functionality being the highest count at 3149.
While both special categories (medical, and health and fitness) and sensitive categories (finance, photo and video, and navigation) prioritize app functionality, their focus and distribution patterns vary. The medical category is more focused on app functionality and analytics, with minimal emphasis on advertising and other purposes. The health and fitness category balances its data usage across functionality, analytics and personalization, with a smaller proportion dedicated to advertising. In contrast, the sensitive categories like finance, photo and video, and navigation exhibit a more varied distribution, reflecting different operational, marketing and user engagement strategies employed by apps in these categories. These differences highlight the distinct approaches taken by apps in managing and utilizing user data to meet their specific operational and marketing objectives.

"Data Used to Track You" Using Sensitive and Special Categories
Based on the GDPR, we analyzed the data used to track users across various categories.
The analysis reveals distinct patterns in data usage for tracking users across different categories. In the special categories, the medical category primarily uses identifiers and usage data, reflecting a focus on personal identification and user interaction. Similarly, the health and fitness category is dominated by identifiers and usage data, with additional emphasis on diagnostics and contact information to monitor and enhance user experience. In the sensitive categories, finance relies heavily on identifiers and usage data, essential for secure and efficient financial transactions, with significant roles for diagnostics and location data to ensure operational efficiency. The photo and video category also sees identifiers and usage data as predominant, highlighting the need for user identification and interaction tracking, while diagnostics and location data support app performance and user engagement. The navigation category mirrors this distribution, with identifiers and usage data as the largest segments. These differences highlight the distinct approaches taken by apps in managing and utilizing user data to meet their specific operational and marketing objectives.

RQ2: Disparity between App-Stated Permissions and Apparent Unnecessary Data Gathering
To address RQ2, we began by analyzing a comprehensive set of app categories available on the iOS App Store. Our study encompasses a total of 25 distinct app categories, including books, music, travel, social networking, shopping, games, entertainment, reference, medical, lifestyle, sports, finance, education, business, news, navigation, health and fitness, photo and video, utilities, productivity, food and drink, graphics and design, weather, magazines and newspapers, and developer tools. This broad classification enables us to explore data tracking practices across a wide array of application types, providing a thorough examination of how different categories justify or do not justify their data collection practices. By considering such a diverse range of app categories, we aim to gain a nuanced understanding of data tracking trends and justifications within the mobile app ecosystem.
Categories such as sports, education, books, medical, business, news, utilities, reference, productivity, graphics and design, magazines and newspapers, and developer tools typically do not require location tracking to deliver their core functionalities. Therefore, we specifically examined the data tracking practices for apps within these categories using the dataset from the iOS App Store's "Data Used to Track You" feature. The rationale behind excluding location tracking for these categories is that their primary functions are not inherently dependent on geographic information. For instance, an app designed for productivity or reference purposes does not need to access a user's location to offer its services effectively. By focusing on these categories, we aimed to identify and understand the types of data that are being collected and to assess whether such practices align with the actual needs of the app's functionality.
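This screening step can be expressed as a simple filter over the structured dataset. The category list below comes from the study's own classification; the record keys (`app_id`, `category`, `tracking_data_types`) are illustrative assumptions about how the encoded data might be represented.

```python
# Categories whose core functionality does not plausibly require
# location data, per the study's classification (assumed spellings).
NO_LOCATION_NEEDED = {
    "Sports", "Education", "Books", "Medical", "Business", "News",
    "Utilities", "Reference", "Productivity", "Graphics and Design",
    "Magazines and Newspapers", "Developer Tools",
}

def flag_unjustified_location(apps):
    """Return the IDs of apps that declare location under
    'Data Used to Track You' despite belonging to a category
    whose core functionality does not need it."""
    return [
        app["app_id"]
        for app in apps
        if app["category"] in NO_LOCATION_NEEDED
        and "Location" in app["tracking_data_types"]
    ]
```

Running this filter over every category yields the per-category percentages of apparently unjustified location tracking reported below.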
We also investigated categories such as travel, social networking, entertainment, navigation, health and fitness, photo and video, and weather. Although location tracking may seem justifiable for some of these categories, we assessed them to determine whether it was indeed necessary or overextended. Our analysis aimed to reveal the extent and nature of data tracking within these categories to better understand how often location data are collected and whether their use aligns with the intended functionality of the apps.

Justification for Tracking
Books: Justified data tracking includes usage data, identifiers, purchases and diagnostics. However, unjustified tracking includes location (18.83%), contact info (8.83%), other data (3.83%), browsing history (2%), sensitive info (0.17%) and financial info (0.17%). The high percentage of location tracking is especially concerning, as book apps typically do not need location data to function effectively.
Music: Justified data tracking consists of usage data, identifiers, user content, purchases and diagnostics. However, the high percentages of location (38.17%) and contact info (5.52%) tracking are unjustified, given that music apps generally do not need such data. Additionally, other data (3.27%), browsing history (1.52%), search history (1.31%), contacts (0.15%), financial info (0.07%) and sensitive info (0.07%) also appear to be tracked unnecessarily.
Stickers: We noticed that nothing is tracked for the stickers app category.

Out of a total of 541,662 iOS apps analyzed, 237 apps were identified to have a significant gap in their privacy labels. Specifically, these 237 apps are categorized under "data linked to you" and "data tracking you", yet they lack the mandatory privacy policy URLs. This omission raises several critical concerns regarding user privacy and compliance with regulatory standards.

Missing Privacy Labels
Among the 541,662 iOS apps that were examined, 83,618 were found to have no privacy labels at all. This significant discrepancy suggests a widespread lack of transparency and compliance in user data management practices. Privacy labels are crucial for informing consumers about the data that apps collect and how they are used, and thereby for promoting openness and building user trust.
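Both kinds of gap noted above (apps with no privacy label at all, and apps that declare data practices but omit a privacy policy URL) could be detected with a sketch like the following; the record layout is a hypothetical simplification of the dataset, not its actual schema.

```python
def audit_label_gaps(apps):
    """Partition apps into (a) those with no privacy label at all and
    (b) those that declare data practices but lack a privacy policy URL.
    Each record is a dict with assumed keys 'app_id',
    'privacy_label' (list of declared sections, possibly empty)
    and 'privacy_policy_url' (string or None)."""
    no_label, missing_policy_url = [], []
    for app in apps:
        if not app.get("privacy_label"):
            no_label.append(app["app_id"])
        elif not app.get("privacy_policy_url"):
            # Declares data collection yet omits the mandatory policy URL.
            missing_policy_url.append(app["app_id"])
    return no_label, missing_policy_url
```

Applied to the full dataset, the first list corresponds to the 83,618 apps without labels and the second to the 237 apps missing policy URLs.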

User Survey on App Usage and Privacy Concerns
To identify how our analysis of the privacy labels corresponded to actual use of apps by individuals, we created a user survey to identify which apps are commonly used by people and what privacy concerns they have. Through this survey, we hoped to identify which of our analyzed privacy labels, and their shortcomings, had the most impact on individuals, and which of the identified privacy concerns were legitimate and could be addressed by the privacy label.
This study focused on people between the ages of 20 and 35 to identify app usage trends and privacy issues among Irish app users. This particular age range was selected to provide important insights into the habits and privacy concerns of this digitally active demographic of young adults, who are frequently early adopters of technology and digital trends. Fifty individuals in this age range who lived in Ireland participated in this study to precisely document the preferences and behaviors specific to this demographic. Online questionnaires were used to gather data, and to successfully reach and engage the intended audience, the surveys were distributed via social media platforms. Quantitative techniques were used in the response analysis to enable a thorough investigation of common app usage patterns and major privacy issues among those taking part.
A number of the respondents' primary privacy concerns are highlighted in the survey responses. The main findings are listed in brief below.

Most Frequently Used App Categories: The survey reveals the most frequently used categories of apps among respondents (Figure 3): Social media: 63.6% of respondents primarily use social media apps, such as Facebook, Instagram, Twitter and Snapchat. Entertainment: 17.1% favor entertainment apps, including streaming services and gaming apps. Shopping: 7.3% of respondents predominantly use shopping apps. News and information, health and fitness, travel and navigation: Each of these categories is used by 2.4% of respondents, indicating a lower frequency of use compared to other app categories.

Concern About Privacy (Figure 4): The degree of concern about privacy among app users is as follows: Very concerned: 50% of respondents are very concerned about their personal information's privacy. Somewhat concerned: 45.2% express some concern, showing that a significant majority hold reservations about privacy. Other levels of concern: A minority are slightly concerned, neutral, or not concerned, underlining that privacy remains a predominant issue for most.

Familiarity with Privacy Labels (Figure 5): Respondents' familiarity with privacy labels in the iOS App Store is categorized as follows: Very familiar: 39% are well-acquainted with the privacy labels. Heard of them but do not know much: 43.9% have heard of them but lack detailed knowledge. Not familiar: 17.1% are not familiar with privacy labels, suggesting a need for increased awareness and educational efforts.

Many respondents expressed significant concerns about the privacy of their personal information while using apps. Specifically, they were worried about how their data are collected, stored and used. Key concerns included the types of data being collected, the potential for these data to be shared with third parties, and how securely the data are stored to prevent unauthorized access or breaches. Additionally, there was apprehension regarding whether apps track users' activities across other applications and services, and whether these practices comply with existing privacy regulations. To address these concerns, respondents seek greater transparency about privacy practices and more control over their data through app settings. This indicates a strong desire for assurances that apps are adhering to legal and ethical standards in handling personal information.
Detailed App Analysis: In response to the survey, we conducted a detailed analysis of the three most frequently mentioned apps by our respondents, which collectively represent 55% of the total usage among participants: Leap Top Up, TFI Live, and AIB. Here is a brief overview of each app: Leap Top Up is an application that allows users to manage their Leap Card, a smart card used for public transportation across Ireland. It provides functionalities such as easy top-ups and balance checks without tracking user data, catering to privacy-conscious consumers. TFI Live, operated by Transport for Ireland, offers real-time bus route information and uses a user's location to provide relevant route directions, but it does not engage in continuous location tracking. This approach prioritizes user privacy and challenges the conventional norms of location-based services. Lastly, the AIB app from Allied Irish Banks delivers a comprehensive suite of mobile banking services, including balance checks, fund transfers and bill payments. It is specifically designed to protect user data and ensure transaction security without tracking user activities, thus building trust among its users. These apps demonstrate how various sectors are increasingly considering user privacy in their service offerings.

Conclusions and Future Work
This work analyzed a large corpus of iOS apps (n = 541,662) and identified the prevalence of sensitive and special categories of personal data being used based on the application of the GDPR. Our work shows how a large number of apps use such sensitive/special categories and in many cases do so without a sufficient apparent justification for why the app needs to use that information. Our work also shows the prevalence of using these categories to track individuals without any transparent information, in contravention of the requirements of the GDPR.
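To illustrate the kind of aggregation behind this analysis, the sketch below counts the special-category data types that apps declare in one privacy-label section and computes a percentage breakdown, in the style of Tables 2-7. The record layout, section names, and category list here are simplified assumptions for illustration, not Apple's actual privacy-label schema or our full pipeline.

```python
from collections import Counter

# Illustrative set of special-category data types (assumed labels, not
# Apple's exact taxonomy).
SPECIAL_CATEGORIES = {"Health", "Fitness", "Sensitive Info"}

def category_breakdown(apps, section):
    """Percentage breakdown of special-category data types declared by
    a collection of apps in one privacy-label section."""
    counts = Counter()
    for app in apps:
        for data_type in app.get(section, []):
            if data_type in SPECIAL_CATEGORIES:
                counts[data_type] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

# Two hypothetical privacy-label records, one dict per app, mapping each
# label section to the data types the app declares there.
apps = [
    {"data_linked_to_you": ["Health", "Location"],
     "data_used_to_track_you": ["Fitness"]},
    {"data_linked_to_you": ["Health", "Fitness"]},
]

print(category_breakdown(apps, "data_linked_to_you"))
# e.g. {'Health': 66.7, 'Fitness': 33.3}
```

Run over the full corpus, one such breakdown per label section ("data linked to you", "data not linked to you", "data used to track you") yields the distributions summarized in Tables 2-7.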
This study identified significant flaws in the implementation of iOS privacy labels, revealing substantial data collection under the guise of app functionality and data analytics, with the health and fitness category posing particular concerns due to high levels of data collection linked and unlinked to users. Such excessive data gathering poses a grave risk if breached, potentially leading to misuse of sensitive health information, identity theft, fraud and unwanted exposure of private data, as underscored by the 2018 MyFitnessPal breach compromising 150 million accounts. Users must proactively safeguard their data by managing app permissions, reviewing privacy labels before downloading apps, and disabling the "Allow Apps to Request to Track" option in privacy settings. Additionally, this study observed unjustified location tracking in categories such as sports, education and business. The TFI Live app serves as a positive example of delivering location-based services without intrusive tracking, suggesting many apps engage in unnecessary data collection practices. This calls for Apple's App Store to bolster its app review process, ensuring developers provide compelling justifications for location tracking, with regular audits to ensure compliance with privacy standards.
Our research advances the state of the art by providing an empirical analysis of data collection practices across app categories, confirming and extending the findings of previous work by Scoccia et al. (2022) [4]. We highlight inconsistencies in privacy labels and user behavior regarding privacy settings, emphasizing the need for improved transparency and user vigilance. In conclusion, our study underscores the urgent need for better privacy practices in the app ecosystem, offering insights and recommendations to create a more secure and privacy-conscious environment, ultimately aiming to enhance user trust and protect sensitive information from potential breaches and misuse.
Future Work: This study identified numerous areas that should be the focus of future research. First and foremost, longitudinal studies are required to assess how the efficacy of privacy labels changes over time and how they impact developer practices and user behavior. Furthermore, broadening the scope of the investigation to encompass a greater number of app categories would offer a more thorough understanding of data-gathering practices used throughout the App Store. Examining different privacy label designs and user education techniques could improve users' knowledge of and control over app permissions. The effects of improved app review procedures and regulatory changes on privacy label accuracy and developer compliance should also be investigated in future studies. Lastly, researching how well-publicized data breaches affect user privacy settings and trust could provide insights into how to increase security and transparency.

Figure 1. An example of Apple's App Store privacy labels.


Figure 2. The number of data items collected in privacy labels based on 100k apps [10].


Figure 3. Most used categories of apps.


Table 1. Information about the dataset.

Table 2. Special category distribution in "data linked to you" with percentage breakdown.

Table 3. Sensitive category distribution in "data linked to you" with percentage breakdown.

Table 4. Special category distribution in "data not linked to you" with percentage breakdown.

Table 5. Sensitive category distribution in "data not linked to you" with percentage breakdown.

Table 6. Special category distribution in "data used to track you" with percentage breakdown.

Table 7. Sensitive category distribution in "data used to track you" with percentage breakdown.