Improving Visualization of Female Breast Cancer Survival Estimates: Analysis Using Interactive Mapping Reports

Background: The Missouri Cancer Registry collects population-based cancer incidence data on Missouri residents diagnosed with reportable malignant neoplasms. The Missouri Cancer Registry wanted to produce data that would be of interest to lawmakers as well as public health officials at the legislative district level on breast cancer, the most common non-skin cancer among females. Objective: The aim was to measure and interactively visualize survival data of female breast cancer cases in the Missouri Cancer Registry. Methods: Female breast cancer data were linked to Missouri death records and the Social Security Death Index. Unlinked female breast cancer cases were crossmatched to the National Death Index. Female breast cancer cases in subcounty senate districts were geocoded using TIGER/Line shapefiles to identify their district. A database was created and analyzed in SEER*Stat. Senatorial district maps were created using US Census Bureau’s cartographic boundary files. The results were loaded with the cartographic data into InstantAtlas software to produce interactive mapping reports. Results: Female breast cancer survival profiles of 5-year cause-specific survival percentages and 95% confidence intervals, displayed in tables and interactive maps, were created for all 34 senatorial districts. The maps visualized survival data by age, race, stage, and grade at diagnosis for the period from 2004 through 2010. Conclusions: Linking cancer registry data to the National Death Index database improved accuracy of female breast cancer survival data in Missouri and this could positively impact cancer research and policy. The created survival mapping report could be very informative and usable by public health professionals, policy makers, at-risk women, and the public.


Introduction
In the United States, it is estimated that 12% of women will be diagnosed with breast cancer at some point in their lives [1]. Nationally, breast cancer accounted for an estimated 14% of all new cancer cases and 7% of all cancer deaths in 2013 [2].
Traditionally, incidence and mortality rates have been presented in data tables, a format that is easily understood by epidemiologists and statisticians, but one that does not meet the needs of all potential users of the data. Data visualization is an alternative means of portraying the burden of breast cancer at various levels (eg, county, region, state).
There is a critical need to build accurate fact sheets in the form of interactive and dynamic map reports of the breast cancer burden at the substate level in Missouri. Several studies emphasize the efficiency and importance of matching National Death Index (NDI) data to cancer registry data to ensure high-quality and accurate population-based cancer survival statistics [3][4][5]. We matched the registry breast cancer data to the Social Security Death Index (SSDI) and the NDI. This linkage is important because, with more complete data to analyze, we can estimate survival for the State of Missouri more accurately.
Numerous evidence-based studies have concluded that the use of geographic mapping software allows users to interact in a timely manner with the datasets and publish high-quality interactive reports [6][7][8]. Web-based mapping systems are valuable because they enable users to visualize cancer data easily and to share these data with contributors in fields related to the visualized cancer. Distribution of geospatial health data could help public health leaders and decision makers in designing, developing, and adopting effective and efficient strategies and programs to improve public health outcomes targeting specific subpopulations within geographical areas [6][7][8].
A study by Koenig et al [9] recognized the impact of interactive mapping visualization of health data on the public health field and on health care-related laws and decisions. The study identified the need for more interaction between mapmakers and the beneficiaries of the mapping reports [9].
The Missouri General Assembly includes 34 senators, each representing one of Missouri's 34 districts. Every senate district included an annual average population of approximately 90,000 female residents (176,000 total residents) between 2004 and 2010 (study period). Most of the districts included whole counties. In high population density areas, including the Kansas City metropolitan area, Saint Louis metropolitan area, and the city of Springfield, district limits do not follow county boundaries [10,11].
We aim to measure the survival proportions of female breast cancer cases in the Missouri Cancer Registry database and to further analyze these survival data by stage and grade at diagnosis, by race, by age, and by senatorial district in Missouri for the period from 2004 through 2010. We also aim to visualize the survival data by Missouri state senatorial district by creating interactive mapping reports.

Methods
This was an observational longitudinal epidemiological study. The Missouri Cancer Registry and Research Center updated the vital status of female breast cancer cases by linking with death records from the Missouri Department of Health and Senior Services and the SSDI [12]. We extracted female breast cancer cases (59,674 covering all years in the Missouri Cancer Registry database) without a known date and cause of death and submitted a formatted file containing required fields to the National Center for Health Statistics for NDI linkage [13]. The NDI staff returned the search results. We assessed the results to identify true matches. Partially matched records were reviewed manually using specific criteria (eg, possible typos, use of spouse's social security number, change of surname, use of compound names in a different order, use of nicknames). We then updated the database with the linkage results.
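The triage of linkage results into true matches, partial matches for manual review, and nonmatches can be sketched as follows. This is a minimal illustration only: the field names (`ssn`, `dob`, `first`, `last`), the similarity threshold, and the agreement-count rule are assumptions made for the example and are not the criteria used by the registry or by the NDI.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def classify_candidate(registry: dict, ndi: dict, name_threshold: float = 0.85) -> str:
    """Return 'match', 'review', or 'nonmatch' for one registry/NDI record pair.

    Hypothetical rule: all four identifiers agree -> match;
    two or three agree -> manual review; otherwise nonmatch.
    """
    # A missing social security number can never count as an agreement.
    ssn_known = bool(registry.get("ssn")) and bool(ndi.get("ssn"))
    ssn_same = ssn_known and registry["ssn"] == ndi["ssn"]
    dob_same = registry["dob"] == ndi["dob"]
    # Fuzzy comparison tolerates typos; a changed surname will still disagree.
    first_ok = name_similarity(registry["first"], ndi["first"]) >= name_threshold
    last_ok = name_similarity(registry["last"], ndi["last"]) >= name_threshold

    agreements = sum([ssn_same, dob_same, first_ok, last_ok])
    if agreements == 4:
        return "match"
    if agreements >= 2:
        return "review"
    return "nonmatch"
```

Under this hypothetical rule, a record with a changed surname or a missing social security number falls into the review group rather than being accepted or rejected automatically, which mirrors the manual review step described above.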
The female breast cancer cases in counties split by senate districts were loaded into Esri's ArcMap [14] with the Census Bureau's TIGER/Line Shapefiles [15] to determine their district based on their latitude and longitude. For this project, we used the State Senate districts that were defined by the redistricting following Census 2010 [16].
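Assigning a geocoded case to a district is, in essence, a point-in-polygon test. The study performed this step in ArcMap with TIGER/Line shapefiles; the sketch below is not that workflow but a self-contained illustration of the underlying idea using the standard ray-casting algorithm. The function names and the toy rectangular "districts" are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point toward the east.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_district(lon, lat, districts):
    """Return the id of the first district containing the point, else None."""
    for district_id, polygon in districts.items():
        if point_in_polygon(lon, lat, polygon):
            return district_id
    return None


# Toy example: one square "county" split into two rectangular districts.
toy_districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
```

Real district boundaries are far more complex than these rectangles and may include multipart polygons and holes, which a GIS package handles and this sketch does not.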
A database was created in SEER*Stat, a statistical software package for analyzing cancer data [17]; this database included cases diagnosed from 2004 through 2010 in which the tumor was the first reportable in situ or malignant tumor diagnosed in the woman's lifetime. This resulted in a total of 24,908 malignant cases for most of the survival calculations and an additional 5130 in situ cases included only in stage-specific survival calculations. The 5-year cause-specific survival proportions and their 95% confidence intervals were calculated for female breast cancer cases diagnosed from 2004 through 2010. Survival was measured in terms of cause-specific survival using the Surveillance, Epidemiology, and End Results (SEER) program's cause-specific death classification recode as the endpoint [18]. The 5-year female breast cancer survival was calculated by age, race, stage, and grade for each senate district. To protect patient confidentiality, we suppressed cells with small numbers, employing a commonly used threshold of five or fewer cases [19].
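To make the calculation concrete, the sketch below estimates 5-year cause-specific survival with a Kaplan-Meier estimator and a Greenwood-type 95% confidence interval, and then applies the small-cell suppression rule. It is an illustration under stated assumptions: the study used SEER*Stat, whose estimator and interval method may differ, and the function names and input layout here are hypothetical. Deaths from causes other than breast cancer are treated as censored, which is what makes the measure cause-specific.

```python
import math


def cause_specific_survival(cases, horizon_months=60, z=1.96):
    """Kaplan-Meier survival at the horizon with a Greenwood confidence interval.

    cases: list of (months_followed, died_of_breast_cancer) tuples.
    Cases alive at last contact or dead of other causes are censored.
    Returns (survival, lower, upper) as proportions.
    """
    # Distinct times, up to the horizon, at which a breast cancer death occurred.
    event_times = sorted({t for t, died in cases if died and t <= horizon_months})
    survival = 1.0
    greenwood_sum = 0.0
    for t in event_times:
        at_risk = sum(1 for followed, _ in cases if followed >= t)
        deaths = sum(1 for followed, died in cases if died and followed == t)
        survival *= 1 - deaths / at_risk
        if at_risk > deaths:
            greenwood_sum += deaths / (at_risk * (at_risk - deaths))
    std_error = survival * math.sqrt(greenwood_sum)
    lower = max(0.0, survival - z * std_error)
    upper = min(1.0, survival + z * std_error)
    return survival, lower, upper


def suppress_small_cells(n_cases, estimate, threshold=5):
    """Return None in place of the estimate when the cell has `threshold` or fewer cases."""
    return None if n_cases <= threshold else estimate
```

In practice the same function would be applied separately to each combination of district and subgroup (age, race, stage, or grade), with suppression applied to the resulting table cells before display.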
The US Census Bureau's cartographic boundary files were used to create maps showing 115 Missouri counties (including the City of St Louis, a county-equivalent entity) and 34 state senatorial districts [20]. Five-year survival statistics were loaded, along with cancer incidence and mortality data and the cartographic boundary files, into InstantAtlas software to produce interactive mapping reports that display our study's results [21]. The interactive reports included maps, graphs, and tables for each county and Missouri senatorial district as well as for 20 regions formed by aggregating senate districts by county boundaries. The senate districts grouped to county boundaries were created because mortality data were not available at the subcounty level.
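The text does not spell out the rule used to form the 20 regions. One plausible reading, assumed here purely for illustration, is that districts sharing any county are merged until every region is a union of whole counties. Under that assumption the grouping is a connected-components problem, sketched below with a small union-find structure; the function name, district labels, and county labels are hypothetical.

```python
def build_regions(district_counties):
    """Group districts that share any county.

    district_counties: dict mapping district id -> set of county names
    that the district covers in whole or in part.
    Returns a sorted list of regions, each a sorted list of district ids.
    """
    parent = {district: district for district in district_counties}

    def find(district):
        # Follow parent links to the region root, compressing the path.
        while parent[district] != district:
            parent[district] = parent[parent[district]]
            district = parent[district]
        return district

    first_district_for_county = {}
    for district, counties in district_counties.items():
        for county in counties:
            if county in first_district_for_county:
                # The county is split between districts: merge their regions.
                other = first_district_for_county[county]
                parent[find(district)] = find(other)
            else:
                first_district_for_county[county] = district

    regions = {}
    for district in district_counties:
        regions.setdefault(find(district), []).append(district)
    return sorted(sorted(members) for members in regions.values())
```

A district made up only of whole counties that it shares with no other district forms a region by itself, whereas districts that carve up a metropolitan county collapse into a single region for which county-level mortality data can be summed.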
The years of female breast cancer diagnoses we chose for this study were from January 1, 2004 through December 31, 2010, with survival calculated by including follow-up through December 31, 2011. When this project was started, 2011 was the most recent year with complete survival follow-up for female breast cancer cases. The case selection criteria we used for survival excluded cases diagnosed in 2011 because a relatively large number of cases diagnosed in that year may have been reported too late to be included in the death linkages or even too late to be included in the Missouri Cancer Registry database. The beginning year of the case selection criteria, 2004, was chosen such that relatively stable estimates could be obtained for a wide variety of demographic groups of interest while still covering a relatively recent set of years (7 years total). We classified female breast cancer cases as "early-stage" if the stage at diagnosis was in situ or localized according to the Derived SEER Summary Stage 2000 field [22]; "late-stage" female breast cancer cases included regional and distant cases. Low-grade female breast cancer cases involved grades I and II; high-grade female breast cancer cases included grades III and IV.
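The case selection window and the stage and grade groupings described above can be expressed as simple rules. The sketch below is illustrative: the function names and the text labels used for stage and grade are assumptions, and actual registry data store these fields as coded values rather than as the strings shown here.

```python
from datetime import date

EARLY_STAGES = {"in situ", "localized"}
LATE_STAGES = {"regional", "distant"}
LOW_GRADES = {"I", "II"}
HIGH_GRADES = {"III", "IV"}


def in_study_window(diagnosis_date):
    """True for diagnoses from January 1, 2004 through December 31, 2010."""
    return date(2004, 1, 1) <= diagnosis_date <= date(2010, 12, 31)


def stage_group(summary_stage):
    """Map a summary stage label to 'early', 'late', or None if unstaged."""
    label = summary_stage.strip().lower()
    if label in EARLY_STAGES:
        return "early"
    if label in LATE_STAGES:
        return "late"
    return None


def grade_group(grade):
    """Map a grade (I to IV) to 'low', 'high', or None if unknown."""
    label = grade.strip().upper()
    if label in LOW_GRADES:
        return "low"
    if label in HIGH_GRADES:
        return "high"
    return None
```

Returning `None` for unstaged or ungraded cases keeps them out of the stage-specific and grade-specific groups without dropping them from the overall counts.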

Results
The senatorial districts' 5-year cause-specific survival proportions of female breast cancer were categorized, as shown in Tables 1-4, according to the following groupings: all malignant cases, cases younger than 50 years, cases 50 to 64 years, cases 65 years or older, white cases, African-American cases, early-stage (in situ and local) cases, late-stage (regional and distant) cases, low-grade cases, and high-grade cases. These tables include female breast cancer case counts and survival data for all 34 senatorial districts and Missouri and the 95% confidence intervals of the measured survival data for all the previously mentioned categories. Using these tables, the reader can compare each district with the others, as well as with the state's survival proportion.
The reports we created displayed survival data results in two layouts: an "area profile" focused on displaying many indicators for one or a small number of selected districts along with results from statistical hypothesis testing (Figure 1) and a "double map" that displays two indicators simultaneously along with a district-level scatterplot (Figure 2). These reports include combined maps and statistical data. The area profile map displays a single map and presents many indicators for each senatorial district and compares each district's results to the State of Missouri. The double map centers on assessing the statistical associations (correlation coefficient, R², and the simple linear regression equation) among the chosen survival indicators. The screenshots displayed in Figures 1 and 2 show the final formats of the interactive mapping reports we built at the Missouri Cancer Registry and Research Center to display Missouri female breast cancer survival data along with other incidence and mortality data [23,24].
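The statistics shown alongside the double map scatterplot (correlation coefficient, R², and the simple linear regression equation) follow from standard least-squares formulas. The sketch below computes them for two district-level indicators; it illustrates the formulas only and is not InstantAtlas's implementation. The function name and the example values are hypothetical.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    x, y: equal-length lists with one value per district.
    Returns (slope, intercept, r, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r, r * r


# Hypothetical indicator values for four districts.
slope, intercept, r, r_squared = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

With only 34 districts, and with district-level aggregates standing in for individuals, such associations are exploratory and should not be read as individual-level relationships.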

Principal Findings
The Missouri Cancer Registry needs to measure female breast cancer survival proportions to be able to evaluate the impact of Missouri's breast cancer control program and the burden of female breast cancer in Missouri. The measured and visualized survival data will transform our registry from being an incidence registry to becoming a survival registry for breast cancer.
Survival data mirror female breast cancer prognosis in a specific period [25]. We used the Missouri Cancer Registry records because the Missouri Cancer Registry is a nationally recognized, population-based registry with data that originate from diverse sources including hospitals, ambulatory surgical centers, freestanding cancer treatment centers, pathology laboratories, long-term care facilities, and physician offices. It also contains cases obtained through case-sharing agreements with 19 states. The Missouri Cancer Registry data undergo a strict quality control process and the data are evaluated following specific national measures [26]. Several studies have revealed the significance of linking NDI data to a central cancer registry's data to obtain more accurate population-based cancer survival data [3][4][5].
From this study's results, as shown in Tables 1-4, we can create female breast cancer survival profiles for the 34 Missouri senate districts. By creating these profiles, we can compare each district's results to the state and to other districts' results and give more detailed information to public health practitioners and decision makers about female breast cancer in their district.

Mapping Reports
Cancer incidence and mortality data have traditionally been presented in tabular and descriptive statistics formats; these are easily understood by health professionals with specific knowledge and experience in statistics and epidemiology. At the Missouri Cancer Registry and Research Center, we strive to present our data in formats that meet the needs of a wide range of potential data users. That is why we chose to combine our survival data with geographical data to produce interactive mapping reports at the Missouri senate district level. InstantAtlas is an interactive, internet-based mapping tool licensed to the Missouri Cancer Registry and Research Center that allows users to visually display data gathered from the registry and other databases. Use of interactive data visualization and mapping software allows users to interact with the datasets. We built two interactive mapping reports that include our senate district-level female breast cancer survival data [23,24]. The two maps, the area profile map and the double map, have not yet been published on the Missouri Cancer Registry and Research Center's website. The area profile report shows a single map and focuses on displaying many indicators for a selected state senate district and compares the district's findings to other districts and to Missouri. The double map focuses on exploring the relationships between selected indicators; it displays two indicators simultaneously along with a scatterplot or a table.
The InstantAtlas reports can facilitate communication between collaborators from different fields related to breast cancer, enhance female breast cancer research and policy, and inform public health professionals and policy makers. These maps can be used as educational tools at the community level for women at risk and the public about the distribution of female breast cancer in Missouri by age, race, stage and grade at diagnosis, and by senatorial district. These data could be used as a knowledge base at Missouri oncology facilities to assess management plan decisions taken by providers and by female breast cancer cases.

Study Challenges and Limitations
During the matching processes, some cases did not have a social security number, which is the best available unique identifier. Also, some identifiers, such as date of birth and last and/or first name, showed differences when the NDI database and the registry database were compared, possibly due to data entry errors or a changed last name. Such cases were manually reviewed. Manual review of all partial matches was done by more than one Missouri Cancer Registry and Research Center staff member, including at least one certified tumor registrar, to reduce possible mistakes. Survival was measured using cause-specific survival rather than relative survival (another common net measure of survival) to avoid the need for detailed population life tables by senatorial district. A potential disadvantage of using cause-specific survival is that, unlike relative survival, it relies on having the cause of death rather than just the fact and date of death, and on accurate coding of the cause of death [18]. To decrease the number of known decedents with unknown cause of death in the Missouri Cancer Registry database, these cases were included in the NDI linkage to try to obtain their cause of death. To lessen the impact of miscoded cause of death (eg, a breast cancer death being misattributed to the location of a metastatic site), "breast cancer death" was defined according to the SEER cause-specific death classification recode variable [18]. It should be noted that this will miss indirect deaths originating from a diagnosis of breast cancer, such as toxic effects of chemotherapy. Moreover, the use of this death classification variable is limited to first primary tumors only as used in these analyses and cannot be used to analyze second and subsequent tumors.
Due to aggregating the cases to areal units, this study is subject to the modifiable areal unit problem [27]. State senate areal units were selected for this project because they would be relevant to policy makers making decisions at the senate district level and to constituents within those districts. It should be noted that the modifiable areal unit problem implies that differing conclusions can potentially be drawn from the same data had different areal units been used.
The survival rates presented in these mapping reports are the observed percentages rather than rates that have been spatially smoothed. Observed percentages may be more directly interpretable and relevant to the residents of each of the individual senate districts; however, observed percentages have the disadvantage of being less stable and more prone to spurious high and low values than spatially smoothed survival rates. This instability is mitigated somewhat by the fact that, with the notable exception of African Americans in many districts, survival was calculated with fairly large sample sizes: always more than 100 and generally 200 or more.
For the selected cases, only approximately 1% had the district imputed. Due to the relatively small number of cases, a sensitivity analysis was not performed.

Future Directions
In the future, by combining mortality and incidence data in the survival profiles, we will be able to inform every district's decision makers about the full picture of female breast cancer burden by district and we could help them assess female breast cancer interventions and policies on geographical bases. Due to small sample sizes, we do not have county-level results from the Behavioral Risk Factor Surveillance System, a state-based health survey that annually gathers data on health events, behaviors, preventive practices, and access to health care. A similar Behavioral Risk Factor Surveillance System-based survey known as the "County-Level Study" has been conducted at the county level in Missouri [28]. In the future, we hope to combine these results with female breast cancer survival data and create InstantAtlas mapping reports at the senatorial district level that include survival and other measured contextual indicators (eg, demographic, environment, and socioeconomic), similar to the currently published county-level maps. This kind of mapping report could be used to explore the relationship between female breast cancer and other measured contextual indicators all over Missouri.
In this paper, we measured 5-year cause-specific survival proportions of female breast cancer for the 34 senate districts in Missouri. In the future, we will consider the feasibility of measuring the same data for all 163 Missouri legislative districts [29,30]. We will also consider measuring 5-year cause-specific survival for other screening-amenable cancers (eg, colorectal cancers) and for cancers that impact many residents (eg, lung cancer).
Before we publish senate district maps on our website, we aim to test the usability of the survival maps using a pilot sample of actual users, similar to one we conducted with our previously published maps [31], in order to make them more user friendly.

Conclusions
Net measures of survival factor out other causes of death and are useful from a policy-based perspective. These measures enable comparisons of cancer survival across geographical regions and between groups of patients without differences in background mortality rates of other causes impacting the results.
Cancer registry data are very rich and can be used in the exploration of many scientific theories and models. Registry data are a valuable source for survival data on breast cancer by race, age, and stage at diagnosis. Using cancer registry data supplemented by SSDI and NDI information will be beneficial and can improve accuracy of breast cancer survival data by age, stage, or race, as well as by geographic area (counties and senatorial districts).

Table 1.
Five-year cause-specific female breast cancer survival across different age groups by state senatorial district, Missouri, 2004-2010.

Table 2.
Five-year cause-specific female breast cancer survival data among whites and African Americans by state senatorial district, Missouri, 2004-2010.

Table 3.
Five-year cause-specific female breast cancer survival data by stage at diagnosis and state senatorial district, Missouri, 2004-2010.

Table 4.
Five-year cause-specific female breast cancer survival data for low- and high-grade cases, Missouri, 2004-2010.