A line in Dire Straits’ classic Telegraph Road goes something like “Then came the lawyers and then came the rules”. Is the sequence significant? Cause and effect? Do we need lawyers to manage the rules, or rules to manage the lawyers?
When personal computing began, it took a while for the concept of a “database” to develop, but as soon as it did, it was very closely followed by the concept of GIGO. That sequence (cause and effect) is also particularly significant. GIGO, of course, is that particularly database-centric phrase: Garbage In, Garbage Out. So, do we have garbage because we have databases, or do we simply have databases so that we can manage the garbage? The real question is: do we really have to have the garbage at all?
This is the third in a sequence of blogs considering what the mining sector can learn from the Oil and Gas and Processing sectors when it comes to data management. My first blog, here, looked at the background, and the second, here, looked at the concepts behind the use of Class Libraries to structure data. This blog looks at how the Process sector tries to address and manage data quality.
The $5m walk
You cannot do anything about the GO part of GIGO: if you have GI, you are going to get GO. So we have to concentrate on the GI part. This becomes especially tricky when you are capturing large volumes of data from several different sources.
In any large process sector project, much of the data comes from a multitude of international vendors, in a multitude of formats, coordinate systems, units and cultures (decimal points vs decimal commas, and so on). The balance of the data is often added afterwards as as-built data, captured on an incremental basis. If the project has been executed without the benefit of a formalised Engineering/Information Handover specification, this latter process is often referred to as the $5m walk! (You can read more about the 7 Deadly Sins of Information Handover here.)
To ensure the data is captured rigorously, use is often made of an ETL (extract, transform, load) pre-process, followed by loading the data into a quarantine area. The data is Extracted from the source the vendor has provided (often simply an Excel spreadsheet), Transformed via a reusable data mapping process which manages the class (object) names, units and formats, and then Loaded into the holding/quarantine area. Once in the quarantine area, the data can be verified and validated by internal processes and the QA/QC process executed. The process industry typically checks the data back against the Data Handover Specification for compliance, and the results are used to drive the data correction process. The data set can also be viewed by other parties who might be affected by the new data. Once all parties are satisfied that the data is good, it is added to the main dataset, where it becomes accepted project data.
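The extract–transform–quarantine flow described above can be sketched in a few lines of code. This is a minimal, illustrative sketch only: the class-name mapping, unit table and field names are all hypothetical, not taken from any real project or tool.

```python
# Minimal ETL-to-quarantine sketch. CLASS_MAP, UNIT_TO_SI and the
# record fields are hypothetical examples, not a real handover spec.

CLASS_MAP = {"PMP": "Pump", "VLV": "Valve"}   # vendor code -> project class
UNIT_TO_SI = {"mm": 0.001, "m": 1.0}          # unit -> metres

def transform(record):
    """Apply the reusable mapping: class names, units, decimal commas."""
    cls = CLASS_MAP.get(record["class"], record["class"])
    # Normalise a decimal comma (e.g. "12,5") to a decimal point.
    raw = record["length"].replace(",", ".")
    length_m = float(raw) * UNIT_TO_SI[record["unit"]]
    return {"tag": record["tag"], "class": cls, "length_m": length_m}

quarantine = []   # holding area: data awaiting QA/QC
accepted = []     # the accepted project dataset

# Extract: here from an in-memory list standing in for a vendor spreadsheet.
vendor_rows = [
    {"tag": "P-101", "class": "PMP", "length": "12,5", "unit": "mm"},
    {"tag": "V-201", "class": "VLV", "length": "0.8", "unit": "m"},
]

for row in vendor_rows:
    quarantine.append(transform(row))   # Transform, then Load into quarantine

def promote(check):
    """Move records that pass verification from quarantine to accepted."""
    for rec in list(quarantine):
        if check(rec):
            quarantine.remove(rec)
            accepted.append(rec)

promote(lambda r: r["length_m"] > 0)    # a stand-in validation rule
```

The key point the sketch illustrates is the separation of concerns: nothing reaches `accepted` except via the transform step and an explicit promotion rule.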
What are the benefits?
The process sounds a little long-winded and laborious. So is it? And what are the benefits?
In most cases, the actual process is managed by the database software system in use. The ETL mappings can be pre-set and the relevant mapping selected as each data set is processed. Once transformed, the data is automatically loaded into the holding area by default, so the process can be very smooth and simple.
By applying this rigid process, the initial data verification is done as part of the Transform step, while the data set remains outside the accepted database. Whilst in the holding area, a full set of verification and validation processes can be applied, ensuring minimal GI.
In addition, because the process is so rigid, a full audit trail can be generated automatically, adding value to the data collection. Any user of the overall database can see the data as it sits in the quarantine area and can provide input to the QA/QC process before the data becomes a permanent part of the database. When the authorised owner of the dataset is satisfied that the data is fully QA/QC’d, it is released from the holding area and becomes part of the rigorously controlled database. This workflow is fundamental to the Management of Change (MOC) and applies equally to the management of data.
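The audit trail and the owner-controlled release step can also be sketched simply. Again this is an assumption-laden illustration (the record fields, event names and `release` helper are all invented for this example), not any particular product’s API.

```python
# Sketch: releasing a QA/QC'd record from quarantine, with an automatic
# audit trail. All names here are hypothetical, for illustration only.
import datetime

audit_log = []   # every state change is recorded here automatically

def log(event, tag, user):
    audit_log.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "event": event,
        "tag": tag,
        "user": user,
    })

def release(record, owner, holding, database):
    """Owner releases a QA/QC'd record from the holding area to the main set."""
    holding.remove(record)
    database.append(record)
    log("released", record["tag"], owner)

quarantine = [{"tag": "P-101", "status": "qa_passed"}]
database = []

release(quarantine[0], owner="data.owner", holding=quarantine, database=database)
```

After the call, `audit_log` records who released which record and when, which is exactly the kind of automatically generated trail that underpins Management of Change.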
And in the mining industry…
Is this process viable, or even applicable, to the day-to-day collection of mining data? Well, mining data certainly comes from a multitude of sources: geology, survey, environmental, drill and blast and so on. Units and coordinate grids certainly vary, and the data collected very often affects, and needs to be shared with, other departments.
Is it applicable?
Well, in the process sector the data initially moves in a single direction: the workflow is generally data handover, from projects to operations. In the mining environment, the workflow is multi-directional between the various departments, i.e. a data sharing workflow. The argument is easily made that it is even more important in the mining sector for GI to be avoided, or at least managed. Of course, in the process sector, once the project data has been handed over it becomes the responsibility of the owner/operator to maintain, and the process there becomes one of data sharing too.
And is it viable?
The process sector has aggressively engaged with the BIM (Building Information Modelling) approach, which relies entirely on the integration and sharing of all digital content from all disciplines contributing to a project. This is where the absolute requirement for, and the true value of, well QA/QC’d data becomes critical.
Do we have integrated data sets in the mining industry? Probably not: the ubiquitous departmental data silos prohibit it, and so the value and benefits of the ETL/quarantine process remain unattainable. Could the mining sector benefit from integrated, centralised data management and the associated ETL/quarantine approach? Well, there is enough in that question to merit its own blog, so look out for my next one – The case for application agnostic centralised data – coming soon to the screen in front of you!