Position Paper Data Governance
JoinData’s core belief is that trust and transparency are key in data governance.
A farmer invests in (expensive) machinery to manage his operations. By doing so, the farmer produces more data and can utilize the insights gained from these (new) data sources to achieve better results.
Getting insights from data often requires an investment in data analytics or data science capabilities. Typically, these investments are made by third parties or by the equipment manufacturer itself. They require data, both from the farmer and from other parties, to generate these insights and to create new data sets. These investments need to be protected.
At JoinData, we have created a model that we believe fits the needs of all parties.
This position paper aims to clarify our view on data governance and how that model works.
Sharing Data
Data can easily be copied. This makes it hard to control who has access to your data. There’s not just a single copy that you can keep track of. To give access to data to someone means that you have to accept that a copy of that data can be made by that person. And that means that you have to trust him or her to use the data in a proper way since you cannot control that copy.
Once you have shared data, there is no turning back. You cannot unshare data in the same way that you cannot force someone to forget something.
It is our strong belief that for innovation to happen and to increase transparency in the chain, data needs to be shared. And for data to be shared, trust is essential. For JoinData, trust is thus essential.
We see a worldwide trend where companies and consumers wonder if they have control over ‘their’ data.
In JoinData’s vision, sharing someone’s data without his or her consent is not a future-proof model. We need to create an ecosystem where data is shared with explicit permission.
Ownership
Who really owns the data? Which parties were involved in producing the data? Since a copy is easily made, who owns the copy? In most cases, multiple parties are required to produce the data, so each party could claim partial “ownership”. Therefore, it is easier to reason about permission to use the data, and about who needs to give that permission. Within the JoinData platform, data is categorized based on who controls the right to use it. In most cases, two parties have a say in this: the farmer, from whose farm the data originates, and the data source, which manages that data.
If the data is related to the farm, the farmer needs to give consent to share. If the data is not related to a farm, only the data source itself needs to give consent. In some cases, the data source wants to retain some control over the distribution of data, including farmer-specific data. That may be for commercial reasons or to assure a certain quality standard in the usage of that data. JoinData supports three forms of consent. Free data requires consent for distribution from the farmer, and the farmer alone. Licensed data requires consent from both the data source and the farmer. Aggregated data requires consent only from the data source. The data source determines whether the data it provides is free or licensed.
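The three forms of consent can be read as a simple lookup from data category to the parties whose consent is required. The sketch below is only an illustration of that rule, not JoinData's implementation; the names `Category`, `Party`, `REQUIRED_CONSENT` and `may_distribute` are hypothetical.

```python
from enum import Enum


class Party(Enum):
    FARMER = "farmer"
    DATA_SOURCE = "data_source"


class Category(Enum):
    FREE = "free"
    LICENSED = "licensed"
    AGGREGATED = "aggregated"


# Hypothetical mapping of each category to the parties that must consent,
# following the rules described in the text above.
REQUIRED_CONSENT = {
    Category.FREE: frozenset({Party.FARMER}),
    Category.LICENSED: frozenset({Party.FARMER, Party.DATA_SOURCE}),
    Category.AGGREGATED: frozenset({Party.DATA_SOURCE}),
}


def may_distribute(category, consents_given):
    """Return True when every party required for this category has consented."""
    return REQUIRED_CONSENT[category] <= set(consents_given)


if __name__ == "__main__":
    # Licensed data with only the farmer's consent may not be distributed yet.
    print(may_distribute(Category.LICENSED, {Party.FARMER}))
    # Once the data source also consents, distribution is allowed.
    print(may_distribute(Category.LICENSED, {Party.FARMER, Party.DATA_SOURCE}))
```

Modelling the rule as a subset check keeps the categories declarative: consent from a party that is not required neither helps nor blocks distribution.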
JoinData challenges parties to share as much data as possible and to be transparent on reasons why some data is licensed.
Authentication
For data to be distributed, the proper parties have to give consent. That consent needs to be stored digitally so that the distribution can happen digitally. Not everybody can give consent on behalf of a company, so JoinData needs to establish that someone is authorized to do so. For this, a per-country approach is appropriate. Each country where JoinData is active has its own system or systems for storing company information. We follow national standards to make sure that proper authentication of persons is guaranteed, within the security norms of that country. That also allows us to negotiate with national registries to establish mutual trust. This will lead to less administrative overhead and more stable distribution, since consent, once given, only needs to be stored in a single register that is automatically trusted by other registers.
Purpose
When data is shared, it is shared under the assumption that it is used properly. What that means depends on the reason why the data is being shared in the first place. If no restriction is placed on the usage of data, anything goes. And there is no turning back. It is only reasonable that if someone asks for your data, they need to explain what they are using it for and what restrictions will be placed on that usage. If no purpose and restrictions are given, you are less likely to share that data, since there is no way to undo that decision. For this reason, JoinData always asks for a purpose when access to data is requested. So, by using a system built on trust and transparency, you become resilient to future trends in data sharing and keep control over your data. A consumer of data may aggregate and transform that data into new data, and in turn offer the new data for distribution again as free, licensed or aggregated data, as long as that fits within the constraints of the original purpose for which the data was acquired.
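As a rough illustration of the purpose requirement, the sketch below rejects an access request that states no purpose and checks whether a later use of the data stays within the restrictions agreed at request time. It is a minimal sketch under stated assumptions: `AccessRequest`, `validate_request`, `use_is_permitted` and the example use labels are hypothetical and do not describe JoinData's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    consumer: str    # party asking for the data
    dataset: str     # data being requested
    purpose: str     # why the data is requested
    allowed_uses: frozenset = frozenset()  # uses agreed under that purpose


def validate_request(request):
    """Reject a request that does not state a purpose."""
    if not request.purpose.strip():
        raise ValueError("a purpose is required when requesting access to data")
    return request


def use_is_permitted(request, intended_use):
    """Check that a later use, such as redistribution, fits the agreed uses."""
    return intended_use in request.allowed_uses


if __name__ == "__main__":
    request = validate_request(
        AccessRequest(
            consumer="Climate Inc",
            dataset="stable-temperature",
            purpose="build weather forecasts",
            allowed_uses=frozenset({"forecasting", "aggregation"}),
        )
    )
    print(use_is_permitted(request, "aggregation"))
    print(use_is_permitted(request, "advertising"))
```

Keeping the agreed uses attached to the request means a derived data set can be checked against the original purpose before it is offered for distribution again.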
General Data Protection Regulation and Trends in Transparency
In Europe, the GDPR (General Data Protection Regulation) governs how personal data should be treated. JoinData uses this as a norm for how to govern data distribution. The reason for this is twofold. First, data that we distribute may contain personal data. In some cases, the distinction between a company and a person is a very small one. Business-to-business data involving a small family business or one-person shop is easily related to a person, and this is quite common in the agricultural sector. To keep things simple and consistent, JoinData treats all data as if it could contain personal data. By doing so, we create a system that is sustainable and resilient.
But more importantly, we embrace the GDPR since we simply believe that it is the proper way of transparently handling data. Without proper controls, businesses are less likely to trust other parties and thus less likely to share data. Questions regarding ‘data ownership’ or ‘control over data’ are not initiated by the GDPR. This is a worldwide trend which will force companies to reorganize and redefine their business and data strategy. With proper controls, sharing data will become more and more commonplace.
Adapters and Standards
With only consent to distribute data we still have nothing. The data has to be accessible as well. Data sources (data custodians) need to open up their platform and make distribution of data possible. And they need to do it willingly. Without their cooperation, accessing the data is difficult. Even if the data can be retrieved without cooperation, via a custom adapter or hardware plugin, that adapter or plugin needs to be maintained. Changes on the data source side will interrupt the working of that adapter. Without cooperation, the service level will be low, interruptions will be plentiful and maintenance expensive. One could even expect a data source to actively undermine the operation of such an adapter, legally or technically, and maybe rightfully so.
JoinData actively engages with data sources and negotiates with them to willingly open up and distribute their data. For that, a data source needs to be able to trust that its interests are protected as well as those of the farmer and data consumers. JoinData strongly believes that this is the only future-proof way to ensure high-quality and economical data distribution. There are plenty of examples where third parties deploy self-made adapters or ‘hack’ equipment. They cannot offer a high-quality and highly available data flow in a cost-efficient way because equipment manufacturers do not support these adapters. We believe that building adapters without the support of equipment manufacturers is not a sustainable business model.
Access to data is still useless if the data is not readable. JoinData actively works with standardization committees and data sources to make sure that data conforms to national and international standards. Again, cooperation of a data source is essential; only the data source can correctly interpret and transform its data to comply with a standard.
Conclusion
JoinData firmly believes that a transparent and fair way of distributing data is essential. All parties (farmers, data sources and application providers) need to adopt the same standards in data governance. Without transparency and cooperation, there is no trust. And without trust, data will not be shared and innovation will slow down. JoinData aims to establish trust via transparency, bringing innovation to the entire sector.
Example
Assume a farmer, John, has a temperature sensor in his stable. This temperature sensor is developed by a company, Temp Inc. Their business model is selling these sensors. Another company, Climate Inc, is interested in the temperature readings to make weather predictions and sell those. None of these parties has data distribution and consent management as its core business, so all of them want to use JoinData.
Company | The farm is the legal entity whose data is being shared. Typically, we use a Chamber of Commerce or similar number to identify the farm. |
Farmer | John is the sole owner of his farm. He as a natural person is legally authorized to sign on behalf of his farm. JoinData facilitates per country a way for him to authenticate himself and to prove that he is authorized to sign for his company. |
Advisor | John also has advisors and employees that help him with the administration for the farm. John has authorized the advisor to perform certain tasks for him. For JoinData this works in the same way as for John himself: the advisor is authorized to sign for the company. The reason why the advisor is authorized is not relevant for JoinData. |
Raw data | The temperature sensor measures all kinds of raw signals in the stable. These signals need to be interpreted before they are useful for farmer John. JoinData plays no role in transporting this data. The processing of these signals is typically done inside the sensor itself. |
Free data | When the raw data is interpreted, the sensor knows the temperature. This temperature can be displayed on the sensor itself but is also transported to the Temp Inc database. From there, other parties such as Climate Inc could access that data as well. Since farmer John has paid for the sensor, Temp Inc does not have any objections to sharing that data. The data is related to farmer John’s business, so John (or his authorized advisor) will have to give consent. Climate Inc will need to explain why they need that data and how they are going to use it: the purpose. Based on that, John can decide if he wants to share his temperature readings with Climate Inc. |
Licensed data | Climate Inc wants to use this data to build local and global weather forecasts and sell those. But the people at Climate Inc understand that they need the cooperation of both the farmer and the data source to be able to get a reliable data stream. In this case, Temp Inc has already made their business case by selling the sensor, and they see the sharing of the temperature as an additional feature of their sensor, so they have classified the data as free. Climate Inc decides to give something back to the farmer: the farmer can get warning signs if the temperature in the stable is not OK. They develop a model that can predict temperatures and send alerts faster than competing products. John decides to agree with Climate Inc’s purpose and gives consent. The farmer can use these alerts, but Climate Inc wants to be able to control which other companies have access to these alerts. They thus classify these alerts as licensed. If another company, e.g. a farm management system, wants to use these alerts, it needs the consent of both farmer John and Climate Inc. |
Aggregated data | Climate Inc creates their weather models based on the temperatures and other sources and offers weather predictions as a service. Since these are not granular enough to pick out individual farmers, there is no consent required of John, even though John’s sensor has contributed to that data. The purpose that John agreed to allows for this usage. Climate Inc needs to make their business case based on these models and thus this data requires their consent before it will be distributed. This data is thus classified as aggregated. |