Study on Urban Spatial Function Mixture and Individual Activity Space From the Perspectives of Resident Activity

Research on the relationship between residents’ daily activities and urban spatial structure is of considerable significance to urban planning and the organization of urban functions. However, little research considers the micro-spatial scale or resident perception. The growing volume of user-generated check-in data in social networks provides a data source for this research. In this study, we first divided the urban space into nine functions that satisfy residents’ activities, then partitioned the city blocks with a small-scale grid and used information entropy to evaluate the degree of mixing of land-use functions. We then introduced the latent Dirichlet allocation (LDA) topic model to identify 15 mixed patterns of land-use functions and each spatial unit’s topic distribution. Moreover, we employed the JS divergence index to measure the similarity of spatial units, fitted the distance-activity intensity decay curve, and studied the influence of the spatial function distribution on individual choice. We demonstrate that in urban space, residents’ daily activities mold the blending of urban area functions and shift single-function urban planning to mixed use, consisting of single-function dominant and multi-function mixed areas. Besides, the functional complementarity between activity units weakens the distance attenuation effect of the activity-space interaction intensity to some extent. This research on the interaction between activity space and spatial activities is expected to support the combination of urban land-use types, the layout of facilities, and the guidance of residents’ activities.


I. INTRODUCTION
With the development of the transportation industry and information and communication technologies (ICTs), human mobility is increasing, resulting in the compression of activities in time and space [1]. Spatial distance is no longer a barrier to activities, making people's activities more fragmented, personalized, and sophisticated in time and space [2], [3]. Mobile phones and the Global Positioning System (GPS) have recorded large volumes of urban population activity data, providing opportunities for exploring the urban spatial structure and functional zoning [4]. The ''human'' factor has received more attention in urban space research, which emphasizes the relationship between human behaviors and space from the perspective of human subjectivity. Human activities can utilize the function of land with fewer space-time limits, and the structure of land functions becomes complex, which changes the formation of urban multi-functional activity areas and then affects traditional urban functional zoning.
The associate editor coordinating the review of this manuscript and approving it for publication was Guido Lombardi.
The relationships among residents, places, and activities are considerably complicated, and the traditional space theory based on the law of distance decay is being challenged. Therefore, the study of the interactions between urban space and residents' behavior should be based not only on the conditions of space and land but also on personal behavior. The interaction between urban spatial functional organization and residents' behavior has become a research hotspot. More and more scholars have paid attention to optimizing the functional structure of urban living space and balancing urban infrastructure resources from people's perspective. Material space and activity space explain urban space from static and dynamic angles [5]. Urban spatial structure is the geographical position of the various urban functional regions and their combination. For urban material space, the urban area is represented by kinds of functional regions and their combinations, while for urban activity space, the urban area is represented by various types of residential space and their combination [6]. Besides, the people affected by ICTs often have frequent network activities and high spatial mobility. The submission and sharing of geo-tagged information supported by social media sites have facilitated crowdsourced data. Since much of the information is user-generated and location-specific, Goodchild coined the term volunteered geographic information (VGI) for it [7]. More and more platforms offer location-based services (LBS), making it easy for users to record and share time, location, content, and emotion. Objectively, these records retain the traces of users' space-time activities to the greatest extent. People's activities play an essential role in reshaping the functional structure of urban space [8].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Therefore, analyzing the spatial distribution of resident check-in data is essential to studying the urban spatial structure and function. Although an individual's activities may appear random, potential patterns are likely to emerge as the number of activities increases. On the one hand, at the level of physical space, these data enable us to study the relationship between the spatial distribution of activities in a region and its spatial functions. On the other hand, at the level of residents' activities, this paper studies the relationship between the distribution of spatial functions and the distribution of people's activities.
In this study, we examine the functional zoning of urban space and land use beyond the traditional space theory, which assumes stronger limits on the scope of activities. We draw on the data resources brought about by ICT development and consider the impact of subjective human activities. We used social media check-in data to examine how urban residents' activities and urban spaces mold each other. On the one hand, the mixed use of urban land functions is discussed based on the distribution and intensity of residents' activities in urban space. On the other hand, the relationship between activity intensity and activity distance is discussed against the background of mixed urban space functions.
The remainder of this paper is organized as follows. Section 2 presents a brief overview of current studies related to spatial behavior and behavioral space, region function, and space-behavior interactions. Section 3 introduces the study area and data. Section 4 describes the methodology for modeling the mixed function patterns of urban units and the activity-space interaction under spatial similarity. Section 5 focuses on the analysis and results. Section 6 summarizes the main findings of this study and points out directions for future studies.

II. RELATED WORK
In this section, the relevant research is reviewed in three categories, namely, spatial behavior and behavioral space, urban region function extraction, and space-behavior interactions.

A. SPATIAL BEHAVIOR AND BEHAVIORAL SPACE
Spatial behavior is the result of a resident's choices under spatial restrictions. Behavioral space is the space in which individuals meet their needs and carry out their daily activities [9].
Horton and Reynolds proposed a spatial conceptual model of travel behavior [10], which holds that personal factors play an essential role in the perception of external objective spatial structures. Activity space is a subset of the behavioral space in which individuals carry out most of their daily activities. The perspective of urban space research has changed from material space to social space, from functional space to behavior space, and from the rational allocation of land use to the spatial expression of human behavior. Empirical research on the influence of individual behavior on cities has been carried out from three aspects. First, studies of the spatial-temporal structure of urban residents' daily activities, such as commuting, shopping, and leisure, accentuate the characteristics of spatial behavior and decision-making mechanisms [11]. Second, studies focus on residents' cognitive-behavioral space and urban intention [12] and seek the formation rules of specific activity spaces [13]. Third, studies illustrate how to use travel behavior to analyze the structure of city commercial centers at the micro level [14]. Generally speaking, most of these studies are confined to a single activity, ignore diversity, emphasize the space-time continuity of activity behavior, and lack discussion of the macro-spatial interaction mechanism. Under the paradigm of behavior-based urban space research, the theory of ''space-behavior interaction'' is considered the core.
Research on the interaction between urban space and human activities began in the 1980s. Torsten Hägerstrand combined time and space at the micro-individual level and constructed a theoretical framework through spatial-temporal paths and spatial-temporal prisms. However, the framework overemphasizes Euclidean distance constraints and absolute time and ignores dynamics [15]. Since 1984, research has turned to the role of subjective initiative under time-space constraints [16]. The behaviorist geography of individual cognitive decision making and the time geography of objective constraints draw on each other. Together they formed the methodological basis for studying space-behavior interactions.
The activity-based approach unifies daily activities in time and space through travel activities and highlights the interaction between travel behavior and urban functional structure. It offers a research angle for studying behavior and the environment by exploring residents' daily activities [17]. Individuals' satisfaction with the activity space and its surroundings is the push behind spatial adjustment. At the same time, adjusting the activity space creates surplus space. This lagging supply becomes new space supply and acts as the pull that attracts behavior adjustment. Under the two forces, the urban space reaches a dynamic equilibrium [18]. The activity-based approach is insufficient for analyzing the pull of space supply, and the stated geographical preference analysis method makes up for this deficiency [19].
The rapid development of ICT and LBS technology provides a new opportunity for data collection in research on spatial behavior and behavioral space. Large-scale real-time activity tracking, scene interaction, and exploratory analysis have produced many spatial behavior-based mining algorithms, analysis tools, and software platforms [20]. ICT has reshaped activity features and weakened the spatial and temporal constraints on human activities [21]. At present, studies of the impact of ICT on spatial behavior mainly focus on the macro scale, including the urban spatial structure, influencing factors [22], and the urban system in network space [23].
Using space-behavior interaction theory for case studies, model analysis, and methodology construction is an important research direction. Research on spatial behavior has expanded to cover the use of ICT and spatial constraints. Big data from social networks are combined with individual behavior, drawing on databases with different geographic scales, different accuracy, and rich attributes. In terms of research methods, the city is regarded as a collection of individual activities, behaviors, and reactions and is described by ''what happened'' rather than by the quantitative characteristics of land-use types. Integrating decision-making activities and behavior interactions can help explain the evolution of urban behavior space from micro-individual actions. The study of non-work behavior such as shopping and leisure has essential theoretical and practical significance for optimizing commercial space, public service space, and transportation [24].

B. EXTRACTING URBAN REGION FUNCTION FROM LOCATION BASED NETWORK
Human activity usually occurs at different types of points of interest (POIs), and the region function depends on the activity type [25]. The same type of POI can be located in different land-use types that support different functions, which helps to bridge the semantic gap between land-use classification and urban function. Integrated circuit card (IC card) data, GPS data, mobile phone data, and social media data have fine-grained spatial analysis capabilities, helping to understand urban space utilization levels for individuals and groups. Long et al. [26] identified the urban functional regions in Beijing using IC card records and POI data. Yuan et al. [27] adopted a topic model using GPS trajectory data and POI data to find the different urban functional regions. As crowdsourced content providers, social media contain a wealth of information about residents' spatial interactions and location semantics, which has been widely used to understand the urban functional structure and how residential activities mold urban systems [28]. At present, many scholars use social media data [29] to study urban dynamics. Using Twitter messages, Han et al. [30] determined users' spatiotemporal distribution, analyzed changes in people's co-occurrence density, and revealed the relationships between urban regions. Long et al. [31] used land functions reflected by POI data and Sina Weibo check-in data to overlay Beijing's current and planned land use. After dividing the research area into grids, they used the mixed land-use index method proposed by Frankie et al. [32] to calculate the mixing degree of each grid, thereby evaluating the mixing degree of land-use functions at a fine scale.
Under the joint influence of land use and economic activities, similar activities show a high degree of spatial aggregation and form functional zones. To meet residents' needs, different functional units have gradually formed in the city. The spatial distribution and combination pattern of functional regions reflects the spatial distribution and combination of the city's material elements and is an integral part of urban spatial structure. The change in the amount of resident activity in urban space partly reflects a region's function; that is, similar functional regions have similar resident activity patterns. Therefore, the spatial characteristics of resident movement can be applied to the study of urban functional regions [33], and functional regions can be divided from the perspective of resident perception. Liu et al. [34] studied the change of taxi activity intensity at different city locations and then used cluster analysis to find six functional regions. Combined with POI data, Yuan et al. [27] used taxi data to quantify the pedestrian flow between regions and built a D-Model to find the different functional regions. With the help of Weibo data, Wang Bo et al. [35] analyzed the dynamic change of residents' movement space in terms of time, space, and activity, and divided urban space into functional regions.
Past research mainly evaluated mixed land-use functions according to different land-use types at the regional scale, and the research scale was coarse. Research based on big crowdsourced data mainly focuses on identifying or evaluating whole functional regions and lacks in-depth quantitative analysis of the mixed functions of refined region units. Besides, due to the lack of human-land interaction, region units are rarely classified from residents' perception. Few studies have employed POI information and human activities in LBS networks simultaneously to obtain urban functional regions [36].

C. SPACE-BEHAVIOR INTERACTIONS
There are two paths to constructing the coupling relationship between population heterogeneity and geospatial environmental heterogeneity in urban research. One uses the similarity of individual behavior characteristics [37] or the strength of social connections [38] to classify the population, handle the heterogeneity, and determine the spatial distribution of different population categories. The other divides the urban space, quantifies the behavior characteristics in different spatial units, and classifies the spatial units. Due to the difficulty of obtaining data at individual granularity and the lack of attribute data [39], current research mainly follows the second route. (1) Geographical data reflect the spatial distribution of the corresponding activity events. Assuming that the proportion of people performing certain activities at different locations is similar over time, the extracted point distribution can be used as an estimate of the population distribution [40]. (2) Texts and photos with spatiotemporal labels can be analyzed for semantic information that reflects people's cognition of the associations between different geographical units [41]. (3) Geographic big data can quantify the spatial interaction intensity between geographic units [42]. Second-order features expressed by interaction patterns between geographic units reflect their role in the urban spatial flow [43]. Spatial interaction between geographical entities helps us understand the region's spatial structure and plan effective spatial forms. The interaction intensity is measured in terms of pedestrian flows [44], trade flows, currency flows, and even the co-occurrence of place names [45]. Due to the complexity of spatial interactions involving multiple spatial node pairs, various studies have sought to visualize spatial interactions effectively and delineate meaningful sub-regions [46], [47].
Most spatial interactions are affected by distance decay [48]; the gravity model and the power function are used to fit the relationship between distance and the intensity of interaction between geographical entities [49], [50]. Human behavior is affected by distance and by the functional characteristics of the destination, which is especially important in daily activities. As the main body of behavior, residents form specific cognition of and preferences for space and are constrained by space [51].
At present, many studies use data with spatiotemporal labels to illustrate the interaction patterns among geographical units. Fewer studies focus on the interaction between residential activities and geographical units. By looking for the anchor point of human activity in the human activity space, we can use it as a virtual geographical entity and extend the interaction mode to that between humans and space. Besides, from individual residents' perspectives, we can consider the relationship and rationality of different behavior spaces projected onto the city's physical space.

III. STUDY AREA AND DATA
A. STUDY AREA
New York City covers about 1214.4 square kilometers, is divided into five administrative districts, and has complex land types and abundant functions. Therefore, New York is chosen as the research city in this paper. Manhattan is the core of New York City and has the highest population density. Moreover, Manhattan was the first district to develop from an urban perspective, followed by the Bronx and Brooklyn, while Queens and Staten Island were the last. Because the population of Staten Island is very sparse, only the other four districts are selected as the study area.

B. DATA SETS
POIs provide the necessary information about each functional unit in a city and represent the distribution of the city's functional regions to some extent; they also serve as spatial anchor points when other types of data are fused. To start the research, we need a series of check-in data. This study used a Foursquare check-in data set from New York City containing 227,428 records. Using ArcGIS software to select the data, we obtained 180,074 records in the study area. Each check-in record includes user ID, venue ID, venue category, latitude, longitude, and time stamp. The POI data in this research come from the OpenStreetMap open data platform. There are 29,418 POIs in the study area that match the check-in locations. Each POI record includes the attributes name, latitude, longitude, category, and address. We formalize the resident check-in points as a set $P = \{p_1, p_2, \ldots, p_n\}$. Each $p_i \in P$ includes latitude ($p_i.Lat$), longitude ($p_i.Lon$), and position category label ($p_i.Category$).
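As a minimal sketch of this formalization, a check-in point can be held in a small record type with the three fields named above. The bounding-box filter is only a stand-in for the ArcGIS selection, and the sample coordinates and category names are hypothetical, not taken from the data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    """One check-in point p_i with the three fields used in the text."""
    lat: float      # p_i.Lat
    lon: float      # p_i.Lon
    category: str   # p_i.Category


def in_bbox(p, south, west, north, east):
    """Return True if the check-in lies inside a latitude/longitude box."""
    return south <= p.lat <= north and west <= p.lon <= east


def select_in_bbox(points, south, west, north, east):
    """Keep only the check-ins inside the box (a crude study-area filter)."""
    return [p for p in points if in_bbox(p, south, west, north, east)]


# Hypothetical sample points, for illustration only.
P = [
    CheckIn(40.7580, -73.9855, "Entertainment"),
    CheckIn(40.6782, -73.9442, "Dining"),
    CheckIn(40.5795, -74.1502, "Home"),
]
```

A real study area is a polygon rather than a box, so a point-in-polygon test would be needed to reproduce the selection more faithfully.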

IV. METHOD
The study is based on check-in data from social media and POI data. We discuss the mixed use of urban land functions at a fine scale, determine the mixing degree and mode of land-use functions according to the characteristics of activity intensity, and establish a functional area identification model. The interactive characteristics between residents and land functions are illustrated based on the mixture of regional functions and the distribution of individual residents' activities.
First, the study area was divided into grids. The land-use functions were then integrated according to activity type and regarded as POI labels. Information entropy was used to describe the degree of mixing of land use in terms of quantity ratios. Based on the proportion of activities in each spatial unit, a series of mixed land-use patterns was generated using a topic modeling method, latent Dirichlet allocation (LDA), which identifies the mixed function patterns of spatial units and represents each unit by its topic ratios using the function distribution of that unit. Finally, the Jensen-Shannon divergence (JS divergence), which is based on the Kullback-Leibler divergence (KL divergence), is employed to evaluate the similarity of function distributions among units, and the effect of unit function on the activity intensity-distance relationship is discussed at the individual level.

A. THE MIXED USE OF URBAN SPACE UNIT
The impacts of behavior on space are mainly reflected in the continuous adjustment of activity space by people seeking satisfying activities. Choosing the daily activity space, adjusting the long-term living space, and reconstructing the urban space (including the material space and the social space) in the process of choosing all affect the organizational structure of urban space. From the perspective of people's activities, urban functional regions are represented by various types of residential space. Our study examines mixed functions based on the functional division of urban activity space and on the content and intensity of activities in each region.

1) ACTIVITY CATEGORY CLASSIFICATION
The advantage of check-in data from social media is the ability to identify the purpose of an activity. Each check-in record reports the functional category of the place visited. The functional categories in the check-in data are very detailed, with over 100 categories. When discussing land functions at a fine scale, such a detailed functional classification would make the feature distribution pattern hard to extract. Therefore, we classify the check-in location functions into the nine categories in Table 1 according to the purpose of the activities [52].
The activity function cannot be determined entirely by the check-in location category. For instance, a place marked as entertainment could be the workplace of a resident, and a place marked as a home may not be the resident's own home but a friend's home. In such cases, we should be cautious in discussing the distribution of residents' activities. For other location categories, classifying the activity function by check-in is reliable.

2) BUILD A GRID LAND SEGMENTATION
Some studies of spatial-temporal data preprocessing divide the study area into regular grids of equal size. When studying the spatial distribution of the urban population, the grid size is set to 200 m, 500 m, or 1000 m [53].
Then, the human mobility characteristics reflected by big geographical data are used to illustrate the spatial distribution and interaction among grid units.
To estimate the spatial change of urban land-use patterns, we delineate the urban functional units at a fine scale. We divide the research area into 4006 regular square grids of 300 m × 300 m. The grids are represented by $G = \{A_{0,0}, A_{0,1}, \ldots, A_{i,j}\}$, in which $A_{i,j}$ is a unique unit grid, and $i, j \in \{0, 1, 2, \ldots, n\}$ are the serial numbers in the horizontal and vertical directions. In this way, residents' check-in records are mapped to the grids according to activity purposes. We then obtain the spatial distribution map of the intensity of various activities shown in Fig. 1.
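The mapping of check-in records to 300 m grid cells can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: it uses a local flat-earth approximation around a hypothetical south-west origin `(lat0, lon0)`, and the constant `M_PER_DEG_LAT` is an approximate value.

```python
import math
from collections import Counter, defaultdict

CELL_M = 300.0              # grid size from the text
M_PER_DEG_LAT = 111_320.0   # approximate metres per degree of latitude (assumption)


def grid_index(lat, lon, lat0, lon0, cell_m=CELL_M):
    """Map a point to (i, j): i counts cells eastward (horizontal) and
    j counts cells northward (vertical) from the origin (lat0, lon0)."""
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(lat0))
    x = (lon - lon0) * m_per_deg_lon
    y = (lat - lat0) * M_PER_DEG_LAT
    return int(x // cell_m), int(y // cell_m)


def count_by_cell(checkins, lat0, lon0):
    """checkins: iterable of (lat, lon, category) tuples.
    Returns {(i, j): Counter(category -> number of check-ins)}."""
    cells = defaultdict(Counter)
    for lat, lon, category in checkins:
        cells[grid_index(lat, lon, lat0, lon0)][category] += 1
    return cells
```

For a city-sized area a projected coordinate system would give more uniform cells; the approximation here only keeps the sketch free of external dependencies.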

3) IDENTIFY THE FUNCTION MIXING DEGREE OF SPATIAL UNITS
For each spatial unit, there are one or more POIs, and the POIs' functions determine the spatial unit's function. Therefore, for each unit, we build a POI activity category ratio (CR) to identify the functional properties, as in Equation (1) [54]:
$$CR_{s,ij} = \frac{A_{s,ij}}{A_{ij}} \qquad (1)$$
in which $ij$ represents the horizontal and vertical serial numbers of the spatial unit, $A_{ij}$ is the number of POIs in the unit, $s$ is the activity category, and $A_{s,ij}$ is the number of $s$-type activities in the spatial unit. Obviously, $\sum_{s} CR_{s,ij} = 1$. The CR value of each spatial unit is calculated, and 80% is taken as the criterion to judge the functional properties. When the proportion of one activity is over 80%, we define the unit as a single-function area; otherwise, it is a mixed-function area.
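The category ratio and the 80% rule can be illustrated with a short sketch. It assumes the per-unit counts are available as a mapping from category to count; the function names are ours, and the treatment of empty units (returning `None`) is an assumption, since the text does not discuss them.

```python
def category_ratios(counts):
    """counts: mapping category -> number of records in one spatial unit.
    Returns category -> share of the unit total; the shares sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: n / total for s, n in counts.items()}


def unit_type(counts, threshold=0.8):
    """Label a unit 'single' if one category's share is over the threshold,
    otherwise 'mixed'. Units without records return None."""
    cr = category_ratios(counts)
    if not cr:
        return None
    return "single" if max(cr.values()) > threshold else "mixed"
```

For example, a unit with nine records of one category and one of another has a largest share of 0.9 and is labelled single-function.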
The CR values are calculated for all spatial units, and the single-function areas and the mixed-function areas are marked in gray and blue, respectively, as shown in Fig. 2.
Information entropy is a physical concept that measures the complexity and equilibrium of a system. The city is also a system, whose structure and form are the mapping of the system's functions onto space. The level of information entropy can reflect the degree of equilibrium of land use. The higher the entropy value, the more land functions are present and the smaller the difference in area between function types, i.e., the more balanced the land distribution [55]. To quantify the mixed distribution of functions, we introduce the information entropy of the urban functional space to investigate the spatial distribution of each function class. The information entropy $H_{ij}$ is calculated as follows:
$$H_{ij} = -\sum_{s} CR_{s,ij} \log CR_{s,ij}$$
Traversing the POI data in each grid and computing $H_{ij}$ gives a clear picture of the mix of different spatial functions in the city, as shown in Table 2. As we can see, the functions of some spatial units are highly blended, so it is necessary to understand how the functions of spatial units are mixed.
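The entropy of a unit's category shares can be sketched as below. The natural logarithm is an assumption on our part (the base only rescales the values), and categories with zero records are skipped so that they contribute nothing to the sum.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the category shares in one unit.
    counts: mapping category -> number of records in the unit."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h
```

A unit with a single category has entropy 0, while a unit whose records are spread evenly over all nine categories reaches the maximum, ln 9.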

B. ACTIVITY DENSITY-BASED TOPIC MODEL
To treat the city as a whole and make it convenient for managers to classify land use, we put forward the topic ratio to describe the mixed use of land. The topic ratio uses the proportion of each spatial unit's activity throughout the city to generate a series of topics for mixed land-use patterns and gives the probability that a particular spatial unit matches each topic. Probabilistic topic models have been widely used in recent years. Among them, latent Dirichlet allocation (LDA) is the most popular; it is an unsupervised generative probabilistic model. Blei et al. [56] proposed LDA in 2003 to infer the topic distribution of articles. It generates a topic probability distribution for each document in the document set, so that topic clustering or text classification can be performed according to the extracted topic distributions. If every article in the corpus has multiple topics and each word in an article supports these topics, then, taking all words in all articles as observed values, the model is able to discover the hidden topics. This idea can be adopted to discover a region's function. Regarding a spatial unit as a document, the check-in records in the spatial unit as the words in the document, and the spatial unit's functions as the document's topics, we can obtain each spatial unit's functional distribution using the topic model. The generative process in the topic model does not assume any particular word order; it only depends on word counts. This assumption also applies to the proposed mixed land-use model. Let us assume that document $d_m$ includes $k$ topics. These topics follow a multinomial distribution $z \sim \mathrm{Multi}(z \mid \vec{\theta}_m)$, parameterized by $\vec{\theta}_m$. $\vec{\theta}_m$ is a random variable subject to a Dirichlet prior with parameter $\vec{\alpha}$, i.e., $\vec{\theta}_m \sim \mathrm{Dirichlet}(\vec{\alpha})$.
As for the relationship between topic and word, the words in a topic follow a multinomial distribution $w \sim \mathrm{Multi}(w \mid \vec{\varphi}_k)$, where $\vec{\varphi}_k$ represents the word distribution of topic $k$, which is also a random variable subject to a Dirichlet prior. The prior parameter is $\vec{\beta}$, i.e., $\vec{\varphi}_k \sim \mathrm{Dirichlet}(\vec{\beta})$. The graphical representation of latent Dirichlet allocation is shown in Fig. 3.
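Overall, the LDA generative process can be represented as a joint distribution, as shown in the following formula:
$$p(\vec{w}_m, \vec{z}_m, \vec{\theta}_m, \varphi \mid \vec{\alpha}, \vec{\beta}) = \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\theta}_m) \cdot p(\vec{\theta}_m \mid \vec{\alpha}) \cdot p(\varphi \mid \vec{\beta})$$
where $N_m$ is the number of words in document $d_m$ and $\varphi = \{\vec{\varphi}_k\}$ collects the word distributions of all topics. Parameter estimation determines the topic-word parameter matrix $\varphi$ and the document-topic matrix $\theta$. This manuscript uses the Gibbs sampling method to estimate the parameters.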

LDA process:
Step 1: Randomly generate topics. Traverse all words of all documents in the corpus and randomly assign a topic to each word, $z_{m,n} = k \sim \mathrm{Multi}(1/K)$, where $K$ is the number of topics.
Step 2: Variable initialization. Let $n_m^{(k)}$ be the number of words in document $d_m$ assigned to topic $k$, $n_m$ the total number of topic assignments in document $d_m$, $n_k^{(t)}$ the number of times word $t$ is assigned to topic $k$, and $n_k$ the total number of words assigned to topic $k$.
Step 3: Iteration. Iterate over all the words in the corpus, calculate the topic probabilities of each word, and sample a new topic according to these probabilities. If the word $t$ in the current document $d_m$ corresponds to topic $k$, first decrease the four counts $n_m^{(k)}$, $n_m$, $n_k^{(t)}$, and $n_k$ by one, then sample the new topic and increase the corresponding counts by one.
Step 4: Repeat iteration. Repeat Step 3 until the estimates of $\vec{\varphi}$ and $\vec{\theta}$ converge.
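The four steps can be sketched as a collapsed Gibbs sampler. This is a minimal illustration under stated assumptions, not the implementation used in the study: the symmetric hyperparameter values `alpha` and `beta`, the fixed number of sweeps `n_iter` (used in place of a convergence check), and the function name `lda_gibbs` are all ours. In the land-use setting, each document would be a spatial unit and each word id an activity category.

```python
import random


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word ids in range(n_words).
    Returns (theta, phi): document-topic rows and topic-word rows."""
    rng = random.Random(seed)
    n_mk = [[0] * n_topics for _ in docs]            # words in doc m assigned to topic k
    n_kt = [[0] * n_words for _ in range(n_topics)]  # times word t is assigned to topic k
    n_k = [0] * n_topics                             # total words assigned to topic k
    z = []                                           # topic of every word position
    # The per-document total n_m never changes, so len(doc) is used for it.

    # Steps 1 and 2: assign a random topic to each word and initialise the counts.
    for m, doc in enumerate(docs):
        z_m = []
        for t in doc:
            k = rng.randrange(n_topics)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(z_m)

    # Steps 3 and 4: resample every word's topic for a fixed number of sweeps.
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                k = z[m][n]
                n_mk[m][k] -= 1   # remove the current assignment
                n_kt[k][t] -= 1
                n_k[k] -= 1
                weights = [
                    (n_mk[m][j] + alpha) * (n_kt[j][t] + beta) / (n_k[j] + n_words * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[m][n] = k
                n_mk[m][k] += 1   # record the new assignment
                n_kt[k][t] += 1
                n_k[k] += 1

    theta = [
        [(n_mk[m][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for m, doc in enumerate(docs)
    ]
    phi = [
        [(n_kt[k][t] + beta) / (n_k[k] + n_words * beta) for t in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi
```

Each row of `theta` is then the topic distribution of one spatial unit, and each row of `phi` describes one mixed-use pattern as a distribution over activity categories.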

2) REGIONAL FUNCTION HYBRID PATTERN RECOGNITION BASED ON LDA
Within a region, a specific type of POI may be dominant in quantity, but it does not always represent the region's function. For example, an area where many universities congregate is likely to be an education center, but it also contains many restaurants, simply because restaurants are generally more numerous in the city. Therefore, we need to consider the POIs at which residents' activities actually happen. This question is similar to topic discovery, where both word frequency and word contribution should be considered. Similarly, we use the POI quantity and the check-in frequency to build the topic model.
The application of LDA to urban functional area identification first appeared in the work of Yuan et al. [57], [58] at Microsoft Research Asia (MSRA), and the method has since been widely used [59]. POIs and function labels are usually used as auxiliary data for classification, but in this paper, they are treated as the words of a document and modeled directly by LDA. Besides, the function labels in the third-party data set, classified according to activity type, reduce the data complexity and reflect residents' activities.

3) DETERMINE TOPICS SIZE
In the LDA model, it is essential to find a suitable number of latent topics. The general solution is to use perplexity [60], a standard performance measure of probabilistic models in machine learning, defined as:
$$\mathrm{perplexity}(D_{test}) = \exp\left(-\frac{\sum_{d=1}^{M} \sum_{n=1}^{N_d} \log \sum_{z=1}^{K} p(w_{d,n} \mid z)\, p(z \mid \gamma_d)}{\sum_{d=1}^{M} N_d}\right)$$
in which $M$ is the number of documents in the testing set, $N_d$ is the number of words in document $d$, $K$ is the topic size, $w$ stands for a specific word, $z$ is a particular topic, and $\gamma_d$ is the document-topic distribution of document $d$ in the test set. Perplexity is the exponential of the negative average per-word log-likelihood of the test set, so the better the model, the smaller the value of perplexity. For each candidate $K$, the following steps are repeated and the results averaged: 1) divide the documents into a training set and a testing set; 2) train the topic model on the training set; 3) compute the corresponding perplexity on the testing set. The relationship between perplexity and topic size $K$ can then be plotted. The smaller the perplexity, the better the model, so the optimal topic size can be determined.
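The perplexity computation and the choice of topic size can be sketched as follows. The sketch assumes the common definition of perplexity as the exponential of the negative average per-word log-likelihood, and it assumes that the document-topic rows for the test documents and the topic-word rows are already available as nested lists; the helper `pick_topic_size` is a hypothetical name.

```python
import math


def perplexity(test_docs, doc_topic, topic_word):
    """test_docs: list of documents, each a list of word ids.
    doc_topic[d][z]: probability of topic z in test document d.
    topic_word[z][w]: probability of word w under topic z.
    Returns exp(-(total log-likelihood) / (total number of words))."""
    log_lik = 0.0
    n_words = 0
    for d, doc in enumerate(test_docs):
        for w in doc:
            p_w = sum(doc_topic[d][z] * topic_word[z][w] for z in range(len(topic_word)))
            log_lik += math.log(p_w)
            n_words += 1
    if n_words == 0:
        raise ValueError("test set contains no words")
    return math.exp(-log_lik / n_words)


def pick_topic_size(scores):
    """scores: mapping K -> list of perplexities from repeated train/test splits.
    Returns the K with the lowest mean perplexity."""
    return min(scores, key=lambda k: sum(scores[k]) / len(scores[k]))
```

As a sanity check, a model that spreads probability evenly over four words has a perplexity of 4 on any test document drawn from those words.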

C. ACTIVITY-SPACE INTERACTION UNDER THE INFLUENCE OF SPATIAL SIMILARITY
Once the mixed-use patterns of urban space are obtained, they can be used to discuss further whether individual residents' activities are affected. To discuss the interactive features of individual residents' activities and spaces, we first judge the similarity of grid spaces, since similar grids can provide similar functions for residents. Distance and functional demand for travel destinations are two competing factors, so both must be considered when analyzing spatial attraction from the individual perspective.

1) SPATIAL UNIT SIMILARITY MEASURE
After running the LDA topic model, each spatial unit is represented as a multinomial distribution over $k$ mixed topics, that is, as a probability vector $A_i = [p_i^1, p_i^2, \cdots, p_i^m, \cdots, p_i^k]$ whose entries sum to 1. Therefore, we can apply a variety of probabilistic distance or similarity measures to quantify how similar the functional distributions of spatial units are, such as the Hellinger distance [61], cosine similarity [62], and Jensen-Shannon divergence (JSD) [63]. Specifically, cosine similarity measures the similarity between two feature vectors, and the Hellinger distance is a metric that satisfies the triangle inequality.
KL divergence is an asymmetric measure of the difference between two probability distributions $A_i$ and $A_j$. JS divergence, a variant of KL divergence, measures the similarity of two probability distributions and removes this asymmetry. With base-2 logarithms, JS divergence reaches its maximum of 1 when the two distributions do not overlap at all and its minimum of 0 when they are identical.
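The entropy of the probability distribution $A_i$ is:

$$H(A_i) = -\sum_{m=1}^{k} p_i^m \log_2 p_i^m$$

The KL divergence of $A_i$ and $A_j$ is:

$$KL(A_i \,\|\, A_j) = \sum_{m=1}^{k} p_i^m \log_2 \frac{p_i^m}{p_j^m}$$

Based on KL divergence, the JS divergence of $A_i$ and $A_j$ is:

$$JS(A_i \,\|\, A_j) = \frac{1}{2} KL\left(A_i \,\Big\|\, \frac{A_i + A_j}{2}\right) + \frac{1}{2} KL\left(A_j \,\Big\|\, \frac{A_i + A_j}{2}\right)$$

To measure the similarity between spatial units more intuitively, define the JS divergence similarity matrix, whose entries range from 0 to 1 and equal 1 on the diagonal:

$$D_{JS}(i, j) = 1 - JS(A_i \,\|\, A_j)$$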
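The similarity measure can be sketched in a few lines. The sketch below assumes base-2 logarithms, so that the JS divergence lies in [0, 1], and assumes that similarity is taken as one minus the JS divergence, which is consistent with the diagonal values of 1 reported for the similarity matrix in the results. The function names are illustrative only.

```python
import math

def kl_divergence(a, b):
    """KL(a || b) in bits; terms with a[m] == 0 contribute nothing."""
    return sum(p * math.log2(p / q) for p, q in zip(a, b) if p > 0)

def js_divergence(a, b):
    """Jensen-Shannon divergence of two topic distributions."""
    # The midpoint is positive wherever a or b is positive,
    # so the KL terms below never divide by zero.
    mid = [(p + q) / 2 for p, q in zip(a, b)]
    return 0.5 * kl_divergence(a, mid) + 0.5 * kl_divergence(b, mid)

def similarity_matrix(units):
    """Assumed form of the similarity matrix: 1 - JS for every pair."""
    return [[1.0 - js_divergence(a, b) for b in units] for a in units]

# Toy topic distributions for three spatial units (assumed values).
toy_units = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]
toy_similarity = similarity_matrix(toy_units)
```

Because the midpoint distribution is shared by both KL terms, the result is symmetric, which is what allows the matrix to be read in either direction when looking up the most or least similar unit.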

2) INTERACTIVE CHARACTERISTICS BETWEEN THE INDIVIDUAL ACTIVITIES AND SPATIAL UNITS
If each activity space unit is regarded as a trip node, an individual can produce trips between nodes. However, the travel distribution among nodes and the relationship among spatial units cannot be inferred from the spatial distribution of activities alone. Because individual activities gather unevenly within a specific spatial scope, we can cluster each resident's check-in positions, so that the activities are distributed around a core point. Taking this core point as the focal point, this paper discusses the spatial distribution features of residents' activities, that is, the interaction between the core point and the spatial units.
In travel activities, the farther a location is from the center of the activity distribution, the lower the activity probability. In the interaction between an individual and a spatial unit, the strength of the interaction between the core point and the spatial unit decreases as the distance increases; this phenomenon is called distance decay [64]. On the macro level, the activity intensity between residents and spatial units correlates negatively with distance when other variables are relatively stable. On the micro level, the probability of an individual moving to a spatial unit correlates negatively with distance. The relation $G_{ij} = C P_i P_j f(d_{ij})$ is usually used to quantify the effect of distance on spatial interaction, where $C$ is a constant coefficient, $G_{ij}$ represents the interaction strength between the core point $i$ and the spatial unit $j$, and $P_i$ and $P_j$ reflect the sizes of the core point and the spatial unit. The distance decay function $f(d_{ij})$ takes the distance $d_{ij}$ as its independent variable and describes the influence of distance quantitatively. The most commonly used distance decay functions are the exponential [65], power [66], and Gaussian [67] functions.
In their standard forms, with $\beta > 0$ denoting the decay parameter, these are as follows. Exponential distance decay function:

$$f(d_{ij}) = e^{-\beta d_{ij}}$$

Power distance decay function:

$$f(d_{ij}) = d_{ij}^{-\beta}$$

Gaussian distance decay function:

$$f(d_{ij}) = e^{-\beta d_{ij}^{2}}$$

For individual residents, choosing a spatial unit means choosing a land function. Residents' self-initiated activities are affected not only by distance but also by other subjective factors. Residents care about whether multiple functional requirements can be met within a smaller footprint and whether the functionality provided by one location can be replaced by other frequently visited or familiar locations. This means that spatial units with similar functions in different locations compete, while spatial units with different functions in similar locations are complementary [62].
Based on the topic distribution similarity of spatial units and the distribution of individual activities, we can characterize the interaction between daily activities and the functions of spatial units. It is then possible to determine whether functional complementarity or competition among spatial units affects the distance attenuation of the interaction between individual activities and spatial units, and how strong that effect is.
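The gravity-type interaction model and the three distance decay families discussed in this subsection can be sketched as follows. The decay functions are written in their common one-parameter forms; the parameter name `beta` and the default `c=1.0` are assumptions made for illustration and may differ from the parameterization used by the authors.

```python
import math

def exponential_decay(d, beta):
    """Exponential form: exp(-beta * d)."""
    return math.exp(-beta * d)

def power_decay(d, beta):
    """Power form: d ** (-beta); not defined at d = 0."""
    return d ** (-beta)

def gaussian_decay(d, beta):
    """Gaussian form: exp(-beta * d^2)."""
    return math.exp(-beta * d * d)

def interaction_strength(p_i, p_j, d_ij, decay, beta, c=1.0):
    """G_ij = C * P_i * P_j * f(d_ij) for a chosen decay function f."""
    return c * p_i * p_j * decay(d_ij, beta)

# Toy comparison (assumed sizes and beta): same units, two distances.
near = interaction_strength(10, 20, 1, gaussian_decay, 0.1)
far = interaction_strength(10, 20, 5, gaussian_decay, 0.1)
```

Passing the decay function as an argument keeps the gravity relation unchanged while the three families are compared against the same distance-frequency data.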

V. RESULT
A. LDA TOPIC MODEL RESULT
1) TOPICS QUANTITY DETERMINATION
We first count the check-ins of each function in each spatial unit to build the document-word matrix (the spatial unit-function occurrence matrix). In the LDA input matrix, each row holds the check-in counts of one spatial unit, and each column corresponds to one activity type.
To obtain a proper number of topics for the region-function hybrid patterns, we test the perplexity on the dataset. We randomly select 70% of the data as the training set, take the remainder as the test set, and calculate the perplexity; this operation is repeated five times for each topic size $K$. Fig. 4 shows the relationship between perplexity and topic size for $K$ over the range 0 to 25. A lower perplexity indicates better model performance. In all five tests, the perplexity decreased gradually as the topic size increased. Once the topic size exceeds 15, the perplexity decreases only slowly, so we chose 15 as the topic size.
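A minimal sketch of how such a spatial unit-function matrix could be assembled from raw check-in records is given below. The record layout `(unit_id, function_label)` and the helper name are hypothetical; the nine labels are those that appear in the results of this paper, listed here in an arbitrary order.

```python
from collections import Counter

# Function labels as they appear in the results; the order is arbitrary.
FUNCTIONS = ["social service", "eating", "entertainment", "recreation",
             "transportation", "shopping", "home", "work", "education"]

def build_unit_function_matrix(checkins, functions):
    """Count check-ins per (spatial unit, function).

    checkins : iterable of (unit_id, function_label) tuples (assumed layout)
    returns  : (sorted unit ids, matrix with one row per unit and
               one column per function)
    """
    counts = Counter(checkins)
    units = sorted({unit for unit, _ in counts})
    # Counter returns 0 for (unit, function) pairs that never occur.
    matrix = [[counts[(unit, f)] for f in functions] for unit in units]
    return units, matrix

# Toy records (assumed): two spatial units, four check-ins.
toy_checkins = [(3, "eating"), (3, "eating"), (3, "home"), (1, "work")]
toy_units, toy_matrix = build_unit_function_matrix(toy_checkins, FUNCTIONS)
```

Each row of the resulting matrix plays the role of a document and each column the role of a word, which is the form a topic model expects as input.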
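The repeated 70/30 evaluation described above can be sketched as follows. The callable `fit_and_score` is a hypothetical placeholder for training a topic model with `k` topics and returning its test perplexity. The stopping rule in `first_k_with_small_gain` is an assumed numeric stand-in for reading the flattening point off the curve; the study itself reports choosing the topic size from the plot.

```python
import random

def split_documents(docs, train_fraction=0.7, rng=random):
    """Randomly split documents into a training and a testing set."""
    shuffled = list(docs)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def mean_perplexity_by_k(docs, candidate_ks, fit_and_score, repeats=5, seed=0):
    """Average test perplexity over repeated random splits for each k."""
    rng = random.Random(seed)
    result = {}
    for k in candidate_ks:
        scores = []
        for _ in range(repeats):
            train, test = split_documents(docs, 0.7, rng)
            scores.append(fit_and_score(train, test, k))
        result[k] = sum(scores) / len(scores)
    return result

def first_k_with_small_gain(mean_perplexity, tolerance=0.01):
    """Smallest k after which the relative drop in perplexity is below
    the tolerance (an assumed rule, not the authors' criterion)."""
    ks = sorted(mean_perplexity)
    for prev, nxt in zip(ks, ks[1:]):
        gain = (mean_perplexity[prev] - mean_perplexity[nxt]) / mean_perplexity[prev]
        if gain < tolerance:
            return prev
    return ks[-1]
```

Averaging over several random splits reduces the chance that a single favorable partition drives the choice of topic size.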

2) FUNCTION HYBRID PATTERN RECOGNITION
After running the LDA model, the distribution of the 9 functions over the 15 topics is obtained. In some topics, a single function accounts for most of the probability and the other functions contribute very little. In other topics, two or three functions each account for a high proportion, and function mixing is prominent.
The 15 topics in Fig. 5 are divided into two categories according to the degree of function mixing. The first category is single-function dominant: social service-dominant (topic #1), eating-dominant (topic #2), entertainment-dominant (topic #3), transportation-dominant (topic #6), recreation-dominant (topic #7), shopping-dominant (topic #10), and home-dominant (topic #13). Each of the above-mentioned single function types has its corresponding dominant topic. An important reason for the lack of a work-dominant topic may be residents' general reluctance to share their work status.
The second category is multi-functional: entertainment, social service, transportation mixed (topic #4); entertainment, recreation, transportation mixed (topic #8); shopping, eating, transportation mixed (topic #9); transportation, recreation, entertainment, eating, shopping mixed (topic #12); transportation, social service, shopping, recreation, eating, home mixed (topic #14); social, eating mixed (topic #15). The probability of the transportation type is always high in multi-functional topics, so transportation is an essential driver of the mixing of land use functions. Land use affects transportation demand primarily through the activities it generates. Besides, non-commuting activities such as leisure, entertainment, shopping, and eating tend to appear together. The education and home functions are less intermingled with other functions, possibly because these two types of land use typically occupy an area larger than a single spatial unit.

3) TOPIC DISTRIBUTION FOR EACH SPATIAL UNIT
The LDA model yields the topic distribution of each spatial unit. Each spatial unit is represented as a multinomial distribution over the k mixed topics. Ten randomly selected spatial units and their topic distributions are displayed in Table 3, which shows that most spatial units have one topic whose probability is much higher than that of the other topics.
Across the topic distributions of all spatial units, only 746 grid units have a second-largest topic probability over 0.05, which means that 81% of the spatial units can be represented by their highest-probability topic alone. Moreover, only 22 grid units have a third-largest topic probability over 0.05, which means that 99.4% of the spatial units can be represented by their two most probable topics.
For New York City, the urban space can be divided into two categories with different degrees of functional mixing, and further into 15 meaningful mixing patterns. Each spatial unit can be represented by one or two of these 15 patterns. This provides a suitable classification method for space management from a micro perspective.
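The coverage figures above follow from a simple count over the per-unit topic distributions. The sketch below shows that count under the 0.05 threshold stated in the text; the function name and the toy distributions are hypothetical, and each distribution is assumed to have at least three topics.

```python
def coverage_by_top_topics(topic_dists, threshold=0.05):
    """Share of units represented by their top one or top two topics.

    A unit counts as represented by its top topic alone when its
    second-largest probability does not exceed the threshold, and by its
    top two topics when its third-largest does not exceed the threshold.
    """
    n = len(topic_dists)
    second_over = 0
    third_over = 0
    for dist in topic_dists:
        ranked = sorted(dist, reverse=True)
        if ranked[1] > threshold:
            second_over += 1
        if ranked[2] > threshold:
            third_over += 1
    return 1 - second_over / n, 1 - third_over / n

# Toy topic distributions for three spatial units (assumed values).
toy_dists = [[0.98, 0.01, 0.01], [0.60, 0.35, 0.05], [0.40, 0.30, 0.30]]
top1_share, top2_share = coverage_by_top_topics(toy_dists)
```

The same routine can be rerun with other thresholds to check how sensitive the "one or two patterns per unit" conclusion is to the cut-off.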

B. ACTIVITY-SPACE INTERACTION RESULT
1) SPATIAL UNIT SIMILARITY MEASURE
According to the proposed spatial unit similarity metric $D_{JS}$, we can analyze the pairwise similarity between spatial units. Fig. 6 shows the $D_{JS}$ similarity matrix for all the spatial units in New York City, in which each unit is represented by a 15-dimensional vector of topic probabilities. The similarity value ranges from 0 to 1; the higher the value, the more similar the topic distributions of the two units. The values on the diagonal are all 1. When the similarity matrix is visualized with the grid number as both abscissa and ordinate, it is easy to identify some blue bars with low similarity values in Fig. 6. They indicate that the functional distributions of these spatial units differ significantly from those of the others. Further investigation reveals that these units are mainly located in developed areas, such as Wall Street and Broadway. The frequent co-occurrence of different types of POIs in these spatial units leads to a low similarity to other locations. Given any spatial unit, we can therefore find its most or least similar spatial units from this similarity matrix.

2) SPATIAL UNIT FUNCTIONAL INTERACTION
To study the interaction between residents and spatial units, we use the grid number as the distance unit, denoted by $d_{gn}$, treat each spatial unit $A_{ij}$ as a node, and place each resident's activities on the grid graph. The grid coordinates of the core point $C_p$ are obtained by averaging the grid numbers $G_p = \{A_{ij}, \text{User ID} = p\}$ of each resident $p$'s activities.
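The core point computation described above can be sketched as follows. The sketch assumes that each check-in is given as a (row, column) grid cell and that distance in $d_{gn}$ is the straight-line distance between grid coordinates; the text does not state the distance definition, so this is an assumption. The conversion uses the 300 m grid size reported in this paper.

```python
import math

def core_point(cells):
    """Average the grid coordinates of one resident's check-ins."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return sum(rows) / len(rows), sum(cols) / len(cols)

def grid_distance(cell, core):
    """Straight-line distance in grid units (assumed definition of d_gn)."""
    return math.hypot(cell[0] - core[0], cell[1] - core[1])

def grid_to_km(d_gn, cell_km=0.3):
    """Convert grid units to kilometres for 300 m cells."""
    return d_gn * cell_km

# Toy check-in cells for one resident (assumed values).
toy_cells = [(0, 0), (2, 0), (4, 6)]
toy_core = core_point(toy_cells)
toy_distances = [grid_distance(cell, toy_core) for cell in toy_cells]
```

Repeated check-ins in the same cell should be kept as repeated entries in `cells`, so that frequently visited units pull the core point toward them.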
We calculate the distance between each check-in activity and the core point, compute the relative frequency of check-in activity at each distance, and thus obtain the distance-frequency graph. The exponential, power, and Gaussian distance decay functions are then fitted to the distance-frequency curve, as shown in Fig. 7. The R-square values of the three functions are 0.91389, 0.79364, and 0.93926, respectively. The Gaussian distance decay function fits best, so we employ it in this paper.
The 180,074 records in the check-in dataset correspond to 1,085 residents. However, the check-in data are unevenly distributed across individuals because each resident has a different propensity to actively share their location on the social network. The more check-in data a resident has, the more distinct that resident's activity locations become. To better discuss the distance decay of spatial interaction at the individual level, we first excluded residents with fewer than 200 check-ins. The remaining residents were divided into four equal groups according to the extent of their activity space, and one resident was randomly selected from each group. The four selected residents, numbered 354, 293, 315, and 84, were used for quantitative analysis and are referred to as A, B, C, and D.
First, by placing their activities on the grid coordinates, we find that the spatial ranges of the four residents' activities are quite different. Resident A's activities are distributed over 275 $d_{gn}$ in the range ((2, 72), (11, 125)). Resident B's activities are distributed over 58 grids within ((9, 62), (16, 108)). Resident C's activities are distributed over 121 $d_{gn}$ within ((10, 70), (20, 124)), and Resident D's activities over 80 $d_{gn}$ within ((8, 55), (36, 117)). Although residents' activities can reach far-off spatial units, the access rate to accessible spatial units is low and does not exceed 5 percent. Activities therefore concentrate on a small number of spatial units, and the interaction intensity between spatial units and individuals does not entirely follow the distance decay. The effects of individual selection are discussed in the following.
For Resident A, as shown in Fig. 8, the distance-interaction frequency curve differs significantly from the Gaussian curve. The primary activity radius is 50 $d_{gn}$, or 15 km, which is close to the city radius. There is no apparent decay between distances 0 and 50 $d_{gn}$, and there are four peaks at distances of 10, 18, 27, and 42 $d_{gn}$, of which the last is the highest. By observing the activity distribution, we find spatial clustering at the four distances where the interaction intensity peaks occur. We then select the spatial units with high interaction intensity in the clustered areas and look up their pairwise similarities in the similarity matrix. The main reason for the fourth peak is that the similarity of function distribution between the fourth peak and the other peaks lies in the third and fourth grades, so their function distributions form a complementary relationship. Resident A's demand for location functions largely overcomes the resistance of distance.
For Resident B, as shown in Fig. 9, the interaction intensity decreases with distance, and the trend is steeper than the Gaussian curve. The primary activity radius is 20 $d_{gn}$, which is equivalent to 6 km. There are also clusters of active regions with high interaction intensity at several distances.
For Resident C, as shown in Fig. 10, activities mostly fall within 3 $d_{gn}$, or 0.9 km, a very short activity distance. By observing the activity distribution, we find that the spatial units near the core point have a high degree of functional mixing. The functional similarity between the core point and some distant spatial units is in the first grade. This resident can meet the needs of daily activities within a minimal distance, so the distance factor largely determines the intensity of activity-space interaction.
For Resident D, as shown in Fig. 11, most activities are concentrated within 15 $d_{gn}$, or 4.5 km, and some activities also occur at a distance of 40 $d_{gn}$. By observing the activity distribution, we find three activity clusters, and the activity intensity of the cluster at 5 $d_{gn}$ is much higher than that at 40 $d_{gn}$. The similarity of the three activity regions is at the first level, and the functional distribution forms a complementary relationship.
Combining the intensity-distance distributions of the four residents, we find that the distance decay curve does not have the same effect for every individual. The range of activity varies significantly from person to person. For a person with an extensive activity range, distance has little influence as a limit on activity frequency. All residents show demand for, and access to, the complementary functions provided by different urban spaces. People show a strong desire to visit locations that are far away but meet their functional needs. The subjective demand for activities is an essential factor in overcoming the distance limit.
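The fitting step can be illustrated with a small sketch that builds the distance-frequency table and fits a Gaussian decay curve. Note that the fit below uses ordinary least squares on the log-transformed frequencies, a simple linearization chosen here to keep the example dependency-free; the fitting procedure used in the study is not stated and may differ. The amplitude `a` and the binning to integer grid distances are assumptions.

```python
import math
from collections import Counter

def relative_frequency(distances):
    """Share of check-ins at each integer grid distance."""
    bins = Counter(int(round(d)) for d in distances)
    total = sum(bins.values())
    return {d: n / total for d, n in sorted(bins.items())}

def fit_gaussian_decay(freq):
    """Fit f(d) = a * exp(-beta * d^2) by least squares on ln f vs d^2.

    All frequencies must be positive, which holds for the output of
    relative_frequency because empty bins are not stored.
    """
    xs = [d * d for d in freq]
    ys = [math.log(f) for f in freq.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

def r_squared(freq, a, beta):
    """Coefficient of determination of the fitted curve on the raw scale."""
    observed = list(freq.values())
    predicted = [a * math.exp(-beta * d * d) for d in freq]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

The exponential and power families can be fitted the same way by regressing the log-frequency on the distance or on the log-distance, after which the three R-square values can be compared.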

VI. CONCLUSION
The utilization of urban space function is an essential task in the field of urban planning and design. This paper provides a new perspective, the footprint of residents' activities, for exploring land use patterns and urban activity space patterns. This work uses the vast amount of check-in data from social media to explore how spatial functions and activities shape each other from the perspective of urban residents' activities. This paper quantitatively analyzed the degree of mixed use of urban functions and studied the mixed modes of functions in small-scale spatial units. After applying information entropy to evaluate the mix of land-use functions, we introduced the LDA topic model to identify the mix patterns. Using the intensity of residents' activities as the model input, we obtained 15 mixed land use pattern topics and the topic distribution of each spatial unit.
On that basis, we discussed the effects of spatial function on individual activity based on the distance feature in activity-space interaction. Using JS divergence as the similarity measure for spatial units makes it easy to obtain the similarity matrix. After fitting the Gaussian distance decay function, we discussed the variation of the interaction intensity between the individual and space. The results show that the similarity of spatial function distributions leads to complementation and competition in residents' subjective decision-making. Subjective choice influences activity-space interaction through the need to satisfy the functional requirements of the activity. This study provides a new idea for studying urban land use patterns and urban spatial function from a multi-disciplinary perspective. It brings activity location information from social network check-in data into urban space and integrates methods from GIS, information entropy, pattern recognition, text mining, and spatial similarity.
We hope that this approach can be applied to other areas of future urban development, such as benefiting urban planning, public services, and location-based recommendations.
We used only a predefined grid system (300 m × 300 m) as the spatial analysis unit for urban land use. The size was adopted following previous research and the size of the check-in data. This spatial scale meets the requirements of micro-scale quantitative identification of urban functional areas, and the number of check-ins in each grid is reasonable for topic model training. However, the sensitivity of our methods to different grid sizes and different sizes of check-in data sets remains to be examined. Although our study has successfully used check-in data in New York City to identify 15 functional urban distribution topics, LDA topic modeling is an unsupervised method with some limitations in finding proper urban functions. In the future, we plan to study whether a supervised topic model can better discover the functional distribution. Although the method used in this study applies to different cities, the topic model results will differ from city to city, reflecting each city's spatial characteristics. Due to the lack of check-in data for other cities, we could not compare the spatial characteristics of different cities through contrast experiments. Data from a single social network can only represent the location where an activity occurs; they ignore time sequence and transportation and thus conceal some characteristics of the activities. In subsequent research, multi-source data should be used to further mine the characteristics of individual residents' whole-day trip-activity chains. Finally, the analysis of the interaction between residents and spatial units is mainly based on qualitative description and lacks quantitative surveys of residents' travel choice tendencies.