Data-driven agriculture for rural smallholdings

Spatial information science has a critical role to play in meeting the major challenges facing society in the coming decades, including feeding a population of 10 billion by 2050, addressing environmental degradation, and acting on climate change. Agriculture and agri-food value-chains, which depend on spatial information, are also central to these challenges. Due to agriculture’s dual role as not only a producer of food, fibre, and fuel but also a major consumer of land, water, and energy, agriculture is at the centre of both the food-water-energy-environment nexus and resource security debates. The recent confluence of advances in data analytics, cloud computing, remote sensing, computer vision, robotic and drone platforms, and Internet of Things (IoT) sensors and networks has led to a significant reduction in the cost of acquiring and processing data for decision support in the agricultural sector. When combined with cost-effective automation through the development of swarm farming technologies, these technologies have the potential to decouple productivity and cost efficiency from economies of size, reducing the need to increase farm size to remain economically viable. We argue that these pressures and opportunities are driving agricultural value-chains towards high-resolution data-driven decision-making, where even decisions made by small rural landowners can be data-driven. We survey recent innovations in data, focusing especially on sensor, spatial, and data mining technologies with a view to their agricultural application; discuss economic feasibility for small farmers; and identify some technical challenges that must be solved to reap the benefits. Flexibly composable information resources, sophisticated data sharing technologies, and machine learning with transparently embedded spatial and aspatial methods are all required.


Introduction
Several of the 17 UN sustainable development goals, including zero hunger; decent work and economic growth; industry, innovation, and infrastructure; sustainable cities and communities; responsible consumption and production; and life on land, require very significant changes to global agriculture. The food sector accounts for around 22% of greenhouse gas emissions, while 26% of workers are engaged in agriculture and 821 million people are undernourished [50]. While agricultural sustainability has traditionally focused on on-farm efficiency in food and fibre production, the focus has shifted to integrated value-chain approaches that address the full life-cycle impacts of production, transport, logistics, sales, consumption, and disposal [37].
In developed nations, industrialization, technological advancement, and mechanization have led to agriculture becoming more capital-intensive. However, until recently it has remained information-poor. Decision-making has traditionally relied on historical narrative developed over generations, peer-to-peer local farmer networks, state-supported education and extension programs, private sector advisor services, personal connection to the landscape, intuition, and subjective visual assessment. While farming remains a labor-intensive occupation, it is becoming a knowledge-intensive industry [49]. There has also been a strong trend towards specialization, intensification, and consolidation into larger farming enterprises to achieve economies of size and scale, with better knowledge management and technology to increase efficiency and reduce the cost of production [17]. However, consolidation has come at the cost of a decline in the number and value of smaller commercial farming enterprises and, in turn, the decline of rural and regional services and economies.
As the industry moves into the next wave of agricultural innovation, "Agriculture 4.0", the confluence of advances in data analytics, cloud computing, remote sensing, computer vision, robotics and drone platforms, and Internet of Things (IoT) sensors and networks is poised to reshape agricultural production. Progress in precision agriculture over the past three decades has allowed farmers to increase their productivity and reduce costs based on insights drawn from the analysis of spatial and biophysical data (e.g., [10,29]). However, application of these advances remains capital-intensive, as they have been tied to minimizing cost through deployment on large, expensive machinery that maximizes the efficiency of each labor unit. Robotic automation and deployment via "swarm farming" removes the need for a human operator in the machine, allowing agricultural operations to decouple production cost efficiency from economies of size [26]. The scalable nature of swarm farming reduces the capital required to achieve production efficiency, allowing smaller farmers to maintain their economic viability. Swarm farming technologies may also become affordable for farming applications in the developing world. Their deployment relies on spatial information systems for navigation, robot-to-robot coordination, sensing, route optimization, precision application of inputs, assessing and managing field heterogeneity, and post-operational data analysis.
The rapid advancement of IoT in manufacturing and building management shows a way forward towards an ecosystem of real-time sensing, predictive analytics, and decision making [2]. However, these developments do not immediately translate to the agriculture sector, where capital investment and access to capital are comparatively lower than in other sectors, and where labor is comparatively inexpensive in the developing world. In developed nations, farmers operating smaller rural holdings are often willing to accept a lower return on their capital investment in their land and business to sustain their rural lifestyle and their custodial responsibility for sustainability. They may also fail to adequately value the time of unpaid family labor (as it does not incur an obvious cash cost), or the opportunity cost of other income-generating activities they could pursue if they were freed from menial tasks.
www.josis.org
Another significant barrier to the agricultural application of IoT lies in the vertically siloed nature of typical IoT applications. Products are being developed primarily for vertical markets, such as computer-integrated manufacturing, vehicles and transportation, homes and buildings, or healthcare, which suits the structure of these vertically organized industries. In agricultural enterprises, by contrast, the business operates holistically. Although the locus of control may be limited in geographical extent, the melange of activities within the farm boundaries is highly varied and varies with seasons, market prices, short- and long-range weather forecasts, developments in technology, and consumer demand. While the market for agricultural apps has expanded rapidly in the past 5 years, and there are "many apps" available to farmers, too many of them are independent, special-purpose apps that lack interoperability (e.g., [31,55]). This limits the ability to consolidate and pool the large amounts of data being generated on-farm, and to analyse the data for valuable insights. It is imperative that data arising from sensing services on farm equipment, for example, interoperate with data from in-situ soil moisture probes, local weather stations, and remote weather forecasts, so that inferences can be made on the basis of the joint observations. Independent apps for analysing and viewing each data source separately are of limited benefit.
This drives the imperative for interoperable data-driven platforms for agricultural decision making. This in turn drives the need for flexible data-representation standards, inference systems, and visualization technologies that permit plug'n'play components to work together in a specific farming setting. In particular, by leveraging modern sensing technology, interoperable public open data, and pluggable machine learning technologies, highly customized platforms for farm-scale agricultural decision making appear to be within reach.

The crucial role of data semantics
In 1999, the W3C first standardized the Resource Description Framework (RDF) [21], a graph data model that became the underpinning layer for two decades of work on the Web of Data, or Semantic Web. This work may be most visible today through Google's knowledge panels, which provide extended information about people, places, organizations, events, and other things alongside Web search results. The panels draw from an internal, highly scalable knowledge graph [16], a graph database populated from a range of sources, including information published on Web pages using semantic standards. Knowledge graphs are also used for enterprise data integration in, for example, manufacturing [38] and agriculture [40]. A knowledge graph may incorporate other semantic languages and knowledge representation tools developed through the W3C standards process, such as the Web Ontology Language [53], the SPARQL Protocol and RDF Query Language [34], and the Shapes Constraint Language [19]. These platform technologies have been customized for spatial information, too. From 2017, the first author co-chaired the Spatial Data on the Web Working Group, jointly established by the Open Geospatial Consortium (OGC), the major standards body for the spatial industry, and the W3C [47]. Among other things, the Working Group aimed to encourage the uptake of public spatial data by prescribing approaches to spatial data publication that make it easier for Web developers to build spatial data into applications [45,51,52].
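To make the graph data model concrete, the following sketch stores a tiny, hypothetical farm knowledge graph as subject-predicate-object triples and answers a conjunctive query in the spirit of a SPARQL basic graph pattern. It is a toy under stated assumptions: prefixed names such as `ex:probe7` and `ex:locatedIn` are invented for illustration and stand in for full IRIs, and a real deployment would use an RDF store and a SPARQL engine rather than this naive nested-loop matcher.

```python
# Toy triple store and basic-graph-pattern matcher (illustration only, not an RDF library).
# Terms are prefixed-name strings; query variables start with "?".
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:  # variable already bound to a different term
                return None
            extended[p] = t
        elif p != t:  # constant terms must match exactly
            return None
    return extended


def query(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join triple patterns left to right (naive nested-loop evaluation)."""
    triples = list(graph)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            b
            for s in solutions
            for t in triples
            if (b := match(pattern, t, s)) is not None
        ]
    return solutions


# Hypothetical farm data
graph = [
    ("ex:paddock1", "rdf:type", "ex:Paddock"),
    ("ex:probe7", "rdf:type", "ex:SoilMoistureProbe"),
    ("ex:probe7", "ex:locatedIn", "ex:paddock1"),
]
print(query(graph, [("?probe", "rdf:type", "ex:SoilMoistureProbe"),
                    ("?probe", "ex:locatedIn", "?paddock")]))
# [{'?probe': 'ex:probe7', '?paddock': 'ex:paddock1'}]
```

Because every fact has the same three-part shape, sensor metadata, paddock boundaries, and farm records from different sources can be pooled into one graph and queried together.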
Semantic technologies also make data amenable to machine reasoning. They trace their heritage to the AI discipline of knowledge representation but are also highly suitable for inductive machine learning. While agricultural research has been firmly founded in the statistical sciences, broader machine learning research is having an impact there, too (e.g., [24,54]). Machine learning techniques are particularly useful when (a) it is not clear which features or relationships are driving sought-after behaviors, (b) the patterns being sought are not well understood in terms of biophysical processes, or (c) spatial and temporal data are sparse with respect to the problem being studied and the decision being made. Machine learning methods designed for semantic data can leverage expert or background data expressed in a knowledge base to help navigate the space of important relationships expressed in data. And, because they express inductive hypotheses as rules or class expressions, they are relatively transparent about the reasons for the inferences they make.
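The transparency argument can be illustrated with a deliberately tiny learner that searches single-condition rules of the form `high_yield(P) :- attr(P, value)` over labeled paddocks and keeps the rule with the best precision, breaking ties by coverage. The attributes, values, and labels are hypothetical, and this greedy search is a stand-in for, not an example of, the relational learners cited above; its only point is that the learned hypothesis comes out as a readable rule.

```python
# Minimal single-condition rule search (illustration only; data are hypothetical).
from typing import Dict, Set, Tuple

Facts = Dict[str, Dict[str, str]]  # entity -> {attribute: value}
Rule = Tuple[str, str]             # (attribute, value)


def score(rule: Rule, facts: Facts, pos: Set[str], neg: Set[str]) -> Tuple[float, int]:
    """Return (precision, true positives) of `target(P) :- attr(P, value)`."""
    attr, value = rule
    covered = {e for e, attrs in facts.items() if attrs.get(attr) == value}
    tp, fp = len(covered & pos), len(covered & neg)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, tp


def learn_rule(facts: Facts, pos: Set[str], neg: Set[str]) -> Rule:
    """Pick the candidate rule with the highest (precision, coverage) score."""
    candidates = sorted({(a, v) for attrs in facts.values() for a, v in attrs.items()})
    return max(candidates, key=lambda r: score(r, facts, pos, neg))


# Hypothetical paddock attributes and yield labels
facts = {
    "p1": {"soil": "loam", "pasture": "ryegrass"},
    "p2": {"soil": "loam", "pasture": "kikuyu"},
    "p3": {"soil": "clay", "pasture": "ryegrass"},
    "p4": {"soil": "clay", "pasture": "kikuyu"},
}
attr, value = learn_rule(facts, pos={"p1", "p3"}, neg={"p2", "p4"})
print(f"high_yield(P) :- {attr}(P, {value})")
# high_yield(P) :- pasture(P, ryegrass)
```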
In earlier work, we have shown how, with semantic technologies, live sensor data can be processed automatically by combining multiple sensor feeds in real time with more slowly changing background data to generate alerts customized to a farmer's needs [46,47]. Analytical processing can also be built into the pipeline [9,11]. This approach relies on decades of research into distributed stream management systems but, more importantly, uses the data integration techniques of ontologies and linked data to integrate heterogeneous data sources.
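In greatly simplified form, such a pipeline can be sketched as a generator that joins each incoming soil-moisture reading with slowly changing background knowledge (which paddock a probe sits in, and a dryness threshold for that paddock) and with a rainfall forecast, emitting an alert only when the combined evidence warrants it. The probe names, thresholds, and 5 mm rain cut-off are hypothetical, and plain Python dictionaries stand in for the semantic stream processing and linked background data of [46,47].

```python
# Simplified stream-plus-background join (illustration; all values are hypothetical).
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    probe: str
    time: str
    moisture_pct: float


# Slowly changing background knowledge
PROBE_PADDOCK: Dict[str, str] = {"probe7": "paddock1", "probe9": "paddock2"}
DRY_THRESHOLD_PCT: Dict[str, float] = {"paddock1": 25.0, "paddock2": 30.0}


def alerts(readings: Iterable[Reading], rain_forecast_mm: Dict[str, float],
           min_rain_mm: float = 5.0) -> Iterator[str]:
    for r in readings:
        paddock = PROBE_PADDOCK.get(r.probe)
        if paddock is None:
            continue  # no background knowledge for this sensor, so nothing to join
        threshold = DRY_THRESHOLD_PCT[paddock]
        dry = r.moisture_pct < threshold
        rain_expected = rain_forecast_mm.get(paddock, 0.0) >= min_rain_mm
        if dry and not rain_expected:
            yield (f"{r.time} {paddock}: soil moisture {r.moisture_pct}% is below "
                   f"{threshold}% and no significant rain is forecast")


stream = [Reading("probe7", "06:00", 22.5),
          Reading("probe9", "06:00", 28.0),
          Reading("probe3", "06:00", 10.0)]
for message in alerts(stream, rain_forecast_mm={"paddock1": 0.0, "paddock2": 12.0}):
    print(message)
# 06:00 paddock1: soil moisture 22.5% is below 25.0% and no significant rain is forecast
```

Here `probe9` reports dry soil but is suppressed because rain is forecast for its paddock, and `probe3` is ignored because nothing is known about where it is installed: neither decision could be made from the sensor feed alone.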
More recently, distributed streaming systems for native semantic data have been developed (e.g., [8,25,56]), ontologies for agriculture have become widely adopted [18], and the well-known cross-domain ontology for sensor data [6] has become a formal standard [13,14,44]. This means that the approach of [46], which required customized engineering effort for wrapping and mapping sensor data, can leverage these developments for easier on-farm implementation. We might reasonably expect, for example, that commercial agricultural sensors would be semantics-powered out of the box, with RDF digital datasheets.
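As an indication of what an out-of-the-box semantic sensor might emit, the sketch below serializes a single soil-moisture reading as an observation in Turtle using the SOSA vocabulary of the W3C/OGC Semantic Sensor Network ontology (`sosa:madeBySensor`, `sosa:observedProperty`, `sosa:hasFeatureOfInterest`, `sosa:hasSimpleResult`, `sosa:resultTime`). The `http://example.org/farm/` IRIs and the timestamp are hypothetical, and plain string formatting is used only to keep the example dependency-free; a production system would use an RDF library that handles escaping and validation.

```python
# Serialize one reading as a SOSA observation in Turtle (illustration; IRIs are hypothetical).
SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def observation_turtle(obs: str, sensor: str, prop: str, feature: str,
                       value: float, result_time: str) -> str:
    # Assumes `value` prints in plain decimal notation (e.g. 22.5, not 1e-05).
    return "\n".join([
        f"@prefix sosa: <{SOSA}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"<{obs}> a sosa:Observation ;",
        f"    sosa:madeBySensor <{sensor}> ;",
        f"    sosa:observedProperty <{prop}> ;",
        f"    sosa:hasFeatureOfInterest <{feature}> ;",
        f'    sosa:hasSimpleResult "{value}"^^xsd:decimal ;',
        f'    sosa:resultTime "{result_time}"^^xsd:dateTime .',
    ])


FARM = "http://example.org/farm/"
print(observation_turtle(FARM + "obs/1", FARM + "probe7", FARM + "soilMoisture",
                         FARM + "paddock1", 22.5, "2019-06-01T06:00:00+10:00"))
```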
Although they change less rapidly than live sensor feeds, large-scale coverage data such as soil maps, aerial imagery, satellite imagery, and products derived from satellite imagery are needed at farm-scale resolution and currency. Because these data are large and regularly updated, they are best served through public Web architectures via dynamic queries for decision making. Conveniently, [3] has shown how this can be done: very large-scale satellite imagery can be published as semantic linked data with query-driven spatial resolution, and served for ready integration into analytical systems, such as those that fuse it with, or ground-truth it against, in-situ observations.
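Query-driven spatial resolution can be illustrated generically: a server holding a fine grid returns it aggregated to the coarsest resolution that does not exceed the cell size a client requests, by averaging square blocks of cells. This is a hypothetical sketch of the general idea, not the specific method of [3]; the 10 m native cell size and the grid values are invented, and the grid dimensions are assumed to be divisible by the aggregation factor.

```python
# Serve a coverage at a requested resolution by block averaging (illustration only).
from typing import List

Grid = List[List[float]]


def downsample(grid: Grid, factor: int) -> Grid:
    """Average non-overlapping factor x factor blocks of cells."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return [
        [
            sum(grid[i + di][j + dj] for di in range(factor) for dj in range(factor))
            / (factor * factor)
            for j in range(0, cols, factor)
        ]
        for i in range(0, rows, factor)
    ]


def serve(grid: Grid, native_cell_m: float, requested_cell_m: float) -> Grid:
    """Use the largest whole-number aggregation whose cell size does not exceed the request."""
    factor = max(1, int(requested_cell_m // native_cell_m))
    return downsample(grid, factor)


fine = [[1.0, 1.0, 2.0, 2.0],
        [1.0, 1.0, 2.0, 2.0],
        [3.0, 3.0, 4.0, 4.0],
        [3.0, 3.0, 4.0, 4.0]]  # hypothetical 10 m cells
print(serve(fine, native_cell_m=10, requested_cell_m=20))  # [[1.0, 2.0], [3.0, 4.0]]
```

A farm-scale client can thus ask for exactly the resolution its decision needs, and the volume of data transferred shrinks with the square of the aggregation factor.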

Ongoing challenges for decision support
In order to properly support small-farm decision-making in a context of increasing availability and decreasing cost of IoT devices for agriculture, there are some critical problems to solve.

Composable information resources
Given the variety in both the inputs (size, landscape, soil, climate, livestock, seeds, etc.) and outputs across the agricultural sector, there can be no universal solution for agricultural decision support. While large-scale farming enterprises can and do develop software customized to their needs, this is not feasible for small, adaptive family farms, for whom multiple independent, siloed apps are also unhelpful. There is an urgent need for a point-and-click capability for selecting and customizing components on farmer desktops, potentially relying on a cloud back-end for storage and computation. What is required is a composable GUI that displays not independent windows of independent data streams, but customized, focused views on a linked information space. There are existing research efforts in this direction, such as plug'n'play sensor integration at the back end [4,32], cloud APIs for connecting data to analysis services including physical models [7,48], declarative service composition [5,23,27], and ontology-driven user interfaces at the front end [1,15]. Is it possible to draw these ideas together so that a farmer can describe what they have and what they need, and thereby declaratively compose the information resources into a custom dashboard desktop? The spatial frame of the farm offers a strong organising principle that will assist. Given the tree-structured limitation on expressiveness in the OWL ontology language, more expressive declarative rule languages are likely to be needed (e.g., [28]) to interface components together. These languages will need to be graphically editable [42] for use by farmers without training in formal declarative languages.
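One way to read the idea of a farmer declaring "what they have and what they need" is as forward chaining over a catalogue of components, each advertising the information types it requires and provides. The sketch below is a hypothetical illustration of that reading: the component names and information types are invented, it uses a simple greedy search rather than a full rule language such as [28], and it may include components that are not strictly needed for the requested views.

```python
# Greedy forward-chaining composition of hypothetical dashboard components.
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Component:
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]


def compose(have: Set[str], need: Set[str], catalogue: List[Component]) -> List[str]:
    """Add every component whose inputs are available until `need` is covered.
    Greedy: the plan may contain components that do not contribute to `need`."""
    available: Set[str] = set(have)
    plan: List[str] = []
    progress = True
    while not need <= available and progress:
        progress = False
        for c in catalogue:
            if c.name not in plan and c.requires <= available and not c.provides <= available:
                plan.append(c.name)
                available |= c.provides
                progress = True
    if not need <= available:
        raise LookupError(f"cannot provide {sorted(need - available)}")
    return plan


def comp(name: str, requires: Set[str], provides: Set[str]) -> Component:
    return Component(name, frozenset(requires), frozenset(provides))


catalogue = [
    comp("soil_water_balance", {"soil_moisture", "rainfall"}, {"plant_available_water"}),
    comp("pasture_growth_model", {"plant_available_water", "temperature"}, {"pasture_growth"}),
    comp("feed_budget_view", {"pasture_growth", "herd_size"}, {"feed_budget_dashboard"}),
    comp("milk_yield_view", {"milk_meter"}, {"milk_dashboard"}),
]
print(compose({"soil_moisture", "rainfall", "temperature", "herd_size"},
              {"feed_budget_dashboard"}, catalogue))
# ['soil_water_balance', 'pasture_growth_model', 'feed_budget_view']
```

When a request cannot be met, the `LookupError` names the missing information types, which is the kind of feedback a farmer-facing composer would need in order to suggest, for example, an additional sensor.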
Agri-food value-chain efficiency
Consumers of food and fibre produced on small farms are increasingly demanding clear provenance of their purchases from the farm and through the supply chain. While research in the food industry is already leveraging semantics for this purpose [41], farmers can also benefit directly from sharing data across a data-transparent supply chain to optimize services, logistics, and sales for major seasonal activities, including harvesting and shearing [12]. While this is already the practice in some large-scale, corporately dominated industries such as sugar cane cropping [12], the financial benefit could instead be returned to farmers to defray the on-farm costs of sensor deployment. There may be competition barriers to sharing on-farm data amongst a local community. A net benefit to small farmers could be returned through a rejuvenation of the once-popular farming co-operatives, this time driven by shared data. Technically, architectures to monetize data holdings delivered through integrated services have been explored [43], and this approach may also be worth trying. Otherwise, without a clear return on investment (ROI) at small scale, small farmers may miss out on the benefits of data-driven decision-making and be pushed out of the industry altogether.
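Provenance from the farm and through the supply chain can be represented with W3C PROV-O terms such as `prov:wasDerivedFrom` and `prov:wasAttributedTo`. The hypothetical sketch below stores a short dairy chain as triples (all entity and agent names are invented for illustration) and walks derivation links back from a retail product to the farm. It follows only the first source at each step and stops if it meets a cycle; real value-chains branch and could instead be queried with SPARQL property paths.

```python
# Walk PROV-O derivation links back through a hypothetical dairy supply chain.
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def first_object(graph: List[Triple], s: str, p: str) -> Optional[str]:
    """Return the first object of (s, p, ?o), or None if there is none."""
    return next((o for (s2, p2, o) in graph if s2 == s and p2 == p), None)


def trace(graph: List[Triple], entity: str) -> List[Tuple[str, str]]:
    """Return (entity, responsible agent) pairs from product back to origin."""
    chain: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    current: Optional[str] = entity
    while current is not None and current not in seen:  # `seen` guards against cycles
        seen.add(current)
        agent = first_object(graph, current, "prov:wasAttributedTo") or "unknown"
        chain.append((current, agent))
        current = first_object(graph, current, "prov:wasDerivedFrom")
    return chain


supply_chain = [
    ("ex:cheeseBlock42", "prov:wasDerivedFrom", "ex:milkBatch17"),
    ("ex:cheeseBlock42", "prov:wasAttributedTo", "ex:processor"),
    ("ex:milkBatch17", "prov:wasDerivedFrom", "ex:farmMilkPickup"),
    ("ex:milkBatch17", "prov:wasAttributedTo", "ex:milkTanker"),
    ("ex:farmMilkPickup", "prov:wasAttributedTo", "ex:familyFarmA"),
]
for entity, agent in trace(supply_chain, "ex:cheeseBlock42"):
    print(f"{entity} attributed to {agent}")
```

The same triples that let a consumer trace a product back to `ex:familyFarmA` could, if shared within a co-operative, also support the logistics optimization described above.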
Machine learning of spatial relations
In agriculture, predictions of variables such as yield or optimal harvest date necessarily combine representations of spatial vector, topological, and coverage data (e.g., respectively, paddock boundaries, paddock adjacency, and rainfall) with classical aspatial data (e.g., fertilization rate, water level, soil tests, and pasture species). Yet it is very early days for predictive analytics with a strong spatial component in the data or in the target variables [39]. While there is a developing toolbox for learning semantic relational models (e.g., [22,30,35]), we need to move towards methods where spatial is not special: representations of spatial objects and relations should be transformed into representations to which general-purpose relational methods apply, just as they do for aspatial relations. While the semantic predicates for spatial relations are well defined [33,52], along with an underlying algebra (e.g., [36]) and application demands (e.g., [20]), we lack representations of spatial semantics that are accessible to general-purpose relational learning algorithms and to composable information resources. Meeting this need is urgent if the benefits of predictive analytics are to reach agriculture at any scale.
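A minimal illustration of turning spatial objects into plain relational facts: paddocks are represented here as sets of raster cells and sensors as single cells, and the sketch derives `overlaps`, `adjacentTo` (sharing a cell edge), and `locatedIn` tuples that a general-purpose relational learner could consume alongside aspatial facts. The raster representation, the predicate names, and the data are assumptions made for brevity; they cover only a small fragment of the topological relations for which formal predicates exist [33,52], and vector boundaries would need proper geometric predicates.

```python
# Derive relational spatial facts from a raster representation (illustration only).
from itertools import combinations
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
Fact = Tuple[str, str, str]


def neighbours(cell: Cell) -> Set[Cell]:
    """The four edge-sharing neighbours of a grid cell."""
    r, c = cell
    return {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}


def spatial_facts(paddocks: Dict[str, Set[Cell]], sensors: Dict[str, Cell]) -> List[Fact]:
    facts: List[Fact] = []
    # Symmetric relations are emitted once per unordered pair, in sorted name order.
    for a, b in combinations(sorted(paddocks), 2):
        cells_a, cells_b = paddocks[a], paddocks[b]
        if cells_a & cells_b:
            facts.append(("overlaps", a, b))
        elif any(neighbours(cell) & cells_b for cell in cells_a):
            facts.append(("adjacentTo", a, b))
    for s, cell in sorted(sensors.items()):
        for p in sorted(paddocks):
            if cell in paddocks[p]:
                facts.append(("locatedIn", s, p))
    return facts


# Hypothetical paddocks on a small grid
paddocks = {
    "p1": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "p2": {(0, 2), (1, 2)},
    "p3": {(3, 0), (3, 1)},
}
sensors = {"probe7": (1, 1), "probe9": (3, 0)}
for fact in spatial_facts(paddocks, sensors):
    print(fact)
# ('adjacentTo', 'p1', 'p2')
# ('locatedIn', 'probe7', 'p1')
# ('locatedIn', 'probe9', 'p3')
```

Once expressed this way, `adjacentTo(p1, p2)` has the same shape as an aspatial fact such as a soil test result, so a learner can combine the two without any special spatial machinery.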

Conclusion
Small-scale agriculture is desirable for its ability to contribute to international goals for zero hunger; decent work and economic growth; industry, innovation, and infrastructure; sustainable cities and communities; responsible consumption and production; and life on land. Small-scale agriculture will be left behind large-enterprise agriculture if it cannot employ developing information technologies at low cost and fine-scale temporal and spatial resolution. In turn, new spatial information technologies are needed.
We are building an experimental IoT testbed across a collection of family-run dairy farms in the Bega Valley of New South Wales, Australia, to test our ideas.