Enhanced data governance and improved access.
In the 1960s, the area now known as East Trinity Reserve was a natural wetland rich in native plants and animals. In the 1970s, it was turned over to growing sugar cane.
After cane farming ended in the late 1990s, efforts began to remediate the land. By 2016 East Trinity Reserve's ecosystem was recovering well, but progress is still being closely monitored and managed by Queensland's Department of Environment and Science.
Today East Trinity Reserve serves not only as evidence of how much scientists can do to improve the environment, but also as the pilot for an important cloud-based data and analytics transformation. That transformation is designed to break open the Department's on-premises information silos and make the data more widely accessible and useful to accredited scientists.
Data is the lifeblood of science. The Department of Environment and Science has six directorates focussed on areas ranging from environmental management to climate modelling. Its data collections cover water, air and soil measurements as well as flora and fauna surveys, and encompass remote sensing and satellite imagery.
Although the Department has its own high-performance computing environment for data processing, its data collections have largely been siloed, with different scientists using different systems to store, manage and analyse their data. The Department was concerned that, as a result, it was missing opportunities to share data among scientists and risking duplication of effort.
There was also a risk that the key people in the Department who knew where data was stored and how it could be accessed might retire or leave the organisation, taking years of accumulated knowledge with them.
To encourage greater data access, enhance analytic opportunities and improve the governance and security of its collections, the Department is modernising its approach to data and analytics, starting with a proof of concept using data from East Trinity Reserve.
Enhanced data governance and improved access
Working with Microsoft partner Versor, the Department identified East Trinity Reserve as a good candidate for a data and analytics project designed to demonstrate the value of cracking open data silos.
Since remediation of East Trinity Reserve began in 2001, the Department has collected data every ten minutes from sensors at 15 stations. Over 20 years this has built up a significant collection of data, but it has always been siloed and hard to analyse.
In the past, scientists relied on manual processes for sensor data cleansing, error analysis, storage and version management. As a result, there was a huge backlog of unvalidated monitoring data and low confidence in the overall quality of the data.
This meant scientists spent more time wrangling data than doing scientific work, which frustrated innovation efforts.
The data project, rolled out in around two months, involved transferring scientific data into Azure Data Lake, cleansing it with Azure Databricks and loading it into a SQL database, where it is available for analysis in Power BI.
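The figures above give a rough sense of scale. The following is a back-of-envelope sketch in Python, assuming one reading per sensor channel every ten minutes with no gaps and an average year of 365.25 days. The source does not say how many channels each station records, so every total is per channel, and `estimate_readings` is a hypothetical helper.

```python
# Back-of-envelope volume estimate for a network of 10-minute sensors.
# Assumptions: one reading per sensor channel per interval, no gaps,
# 365.25 days per year. Channels per station are unknown, so all
# figures are "per channel".

def estimate_readings(stations: int, years: int, interval_minutes: int = 10) -> dict:
    readings_per_day = (24 * 60) // interval_minutes
    per_station = readings_per_day * 365.25 * years
    return {
        "per_day": readings_per_day,
        "per_station": round(per_station),
        "all_stations": round(per_station * stations),
    }


if __name__ == "__main__":
    print(estimate_readings(stations=15, years=20))
    # {'per_day': 144, 'per_station': 1051920, 'all_stations': 15778800}
```

On these assumptions, each channel at each station has produced roughly a million readings, or close to 16 million across all 15 stations, which is far more than a manual workflow can comfortably validate.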
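Much of the manual cleansing and error analysis described above can be expressed as simple automated rules. The following is a minimal Python sketch of rule-based quality-control flags for a single 10-minute sensor channel. It checks for gaps, out-of-range values and "flatlines" (a stuck sensor repeating the same value). The `Reading` type, the flag names and all thresholds are hypothetical illustrations and are not the Department's actual validation rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Reading:
    timestamp: datetime
    value: float


def qc_flags(
    readings: List[Reading],
    lo: float,
    hi: float,
    interval: timedelta = timedelta(minutes=10),
    flat_run: int = 6,
) -> List[Tuple[datetime, str]]:
    """Flag gaps, out-of-range values and flatlines in a time-sorted series.

    lo, hi and flat_run are hypothetical per-channel thresholds.
    """
    flags: List[Tuple[datetime, str]] = []
    prev = None
    run = 1  # length of the current run of identical values
    for r in readings:
        if prev is not None:
            gap = r.timestamp - prev.timestamp
            if gap > interval:
                # Number of expected 10-minute readings that never arrived
                missing = round(gap / interval) - 1
                flags.append((r.timestamp, f"gap_before:{missing}_missing"))
            run = run + 1 if r.value == prev.value else 1
        if not lo <= r.value <= hi:
            flags.append((r.timestamp, "out_of_range"))
        if run == flat_run:
            # Flag once, when a run of identical values reaches flat_run readings
            flags.append((r.timestamp, "flatline"))
        prev = r
    return flags


if __name__ == "__main__":
    start = datetime(2021, 1, 1)
    series = [
        Reading(start, 7.0),
        Reading(start + timedelta(minutes=10), 7.1),
        Reading(start + timedelta(minutes=20), 99.0),  # implausible spike
        Reading(start + timedelta(minutes=50), 7.2),   # two readings missing before this
    ]
    # Bounds of 0-14 are illustrative, as for a pH channel
    for ts, flag in qc_flags(series, lo=0.0, hi=14.0):
        print(ts, flag)
```

Flagging rather than deleting suspect readings keeps the raw record intact. A scientist can then review the flags instead of scanning every graph by hand.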
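The flow in that paragraph (land raw data, cleanse it, load it into a SQL database, then analyse it) can be illustrated end to end with the Python standard library. This is a sketch of the pattern only, not the Department's pipeline. Parsing a CSV string stands in for the Azure Data Lake landing zone, a plain Python function stands in for the Azure Databricks cleansing step, and an in-memory SQLite database stands in for the SQL database that Power BI reports query. The station IDs, the `water_level_m` column and the `-9999` "no data" sentinel are all hypothetical.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical raw export as it might land in the lake: it contains a
# duplicate row, a blank value and a -9999 "no data" sentinel.
RAW_CSV = """station,timestamp,water_level_m
ST01,2021-01-01T00:00,1.20
ST01,2021-01-01T00:10,1.22
ST01,2021-01-01T00:10,1.22
ST01,2021-01-01T00:20,
ST02,2021-01-01T00:00,0.95
ST02,2021-01-01T00:10,-9999
"""

SENTINEL = -9999.0  # assumed logger "no data" code


def cleanse(rows: Iterable[dict]) -> Iterator[Tuple[str, str, float]]:
    """Drop blanks, sentinel values and duplicate (station, timestamp) rows."""
    seen = set()
    for row in rows:
        raw = (row["water_level_m"] or "").strip()
        if not raw:
            continue
        value = float(raw)
        key = (row["station"], row["timestamp"])
        if value == SENTINEL or key in seen:
            continue
        seen.add(key)
        yield row["station"], row["timestamp"], value


def run_pipeline(raw_csv: str) -> List[Tuple[str, int, float]]:
    # Extract: parse the raw file (stand-in for reading from the lake)
    rows = csv.DictReader(io.StringIO(raw_csv))
    # Load: an in-memory SQLite table stands in for the SQL database
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (station TEXT, ts TEXT, water_level_m REAL, "
        "PRIMARY KEY (station, ts))"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", cleanse(rows))
    # Analyse: the kind of aggregate a dashboard might run against the table
    result = conn.execute(
        "SELECT station, COUNT(*), AVG(water_level_m) FROM readings "
        "GROUP BY station ORDER BY station"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for station, n, mean in run_pipeline(RAW_CSV):
        print(station, n, round(mean, 3))
```

The primary key on `(station, ts)` means the database itself rejects any duplicate that slips past the cleansing step. This is one way a structured store can add confidence on top of file-based storage.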
Daniel Brough, Science Leader for Science Information Services at the Department, adds that there is also a push to improve data governance across the organisation.
“This is a way to kickstart that modernisation of not just our process, but also our governance and how we deal with data as well.”
More robust data governance also creates opportunities to augment data collections with external information. For example, adding weather forecasts would give scientists early warning of overcast conditions that might prompt them to replace sensor batteries that are normally charged by solar panels.
Improved data governance also brings peace of mind about sharing data, both within the Department and externally. In the East Trinity Reserve case, for example, Indigenous-owned ecotourism businesses are hungry for data about the health of the environment.
The Department now feels much more confident about sharing, because Power BI dashboards can be developed for specific users to deliver the information they need, with assurance that accurate and timely data has been used to generate the reports.
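The battery example amounts to a simple rule that combines an external forecast with the station's own telemetry. The following is a minimal Python sketch of such a rule. The forecast feed, the `StationStatus` type and every threshold (12.2 V, 80% cloud cover, two consecutive days) are hypothetical placeholders. Real values would depend on the batteries, panels and loggers actually deployed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StationStatus:
    station: str
    battery_volts: float
    cloud_cover_forecast: List[float]  # forecast cloud fraction (0-1), one value per day


def longest_overcast_run(forecast: List[float], overcast: float) -> int:
    """Length of the longest run of consecutive days at or above `overcast`."""
    longest = current = 0
    for cover in forecast:
        current = current + 1 if cover >= overcast else 0
        longest = max(longest, current)
    return longest


def battery_alerts(
    statuses: List[StationStatus],
    low_volts: float = 12.2,       # hypothetical "battery getting low" threshold
    overcast: float = 0.8,         # hypothetical "overcast day" threshold
    min_overcast_days: int = 2,    # hypothetical run length that stops recharging
) -> List[str]:
    """Stations whose battery is already low AND that face a stretch of overcast days."""
    return [
        s.station
        for s in statuses
        if s.battery_volts < low_volts
        and longest_overcast_run(s.cloud_cover_forecast, overcast) >= min_overcast_days
    ]


if __name__ == "__main__":
    statuses = [
        StationStatus("ST01", 12.0, [0.9, 0.85, 0.3]),    # low battery, 2 grey days
        StationStatus("ST02", 12.0, [0.9, 0.2, 0.9]),     # low battery, broken cloud
        StationStatus("ST03", 12.8, [0.95, 0.95, 0.95]),  # grey, but battery healthy
    ]
    print(battery_alerts(statuses))
```

Requiring both conditions keeps alerts rare. A healthy battery can ride out a grey spell, and a low battery is less urgent when sunshine is forecast.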
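Power BI has its own row-level security features for restricting what each report viewer sees. The Python below is not that mechanism. It is a minimal sketch of the underlying governance idea: each audience has a policy that controls which fields it can see and whether it receives only records that have passed validation. The audience names, fields and `qc_status` values are hypothetical.

```python
from typing import Dict, List

# Hypothetical sharing policies: which columns each audience sees and
# whether it is restricted to records that have passed validation.
POLICIES: Dict[str, dict] = {
    "internal_scientist": {
        "columns": {"station", "timestamp", "value", "qc_status"},
        "validated_only": False,
    },
    "external_partner": {
        "columns": {"station", "timestamp", "value"},
        "validated_only": True,
    },
}


def view_for(audience: str, records: List[dict]) -> List[dict]:
    """Return the records an audience may see, trimmed to its allowed columns."""
    policy = POLICIES[audience]  # unknown audience raises KeyError: deny by default
    return [
        {k: v for k, v in rec.items() if k in policy["columns"]}
        for rec in records
        if not policy["validated_only"] or rec.get("qc_status") == "validated"
    ]


if __name__ == "__main__":
    records = [
        {"station": "ST01", "timestamp": "2021-01-01T00:00", "value": 7.0, "qc_status": "validated"},
        {"station": "ST01", "timestamp": "2021-01-01T00:10", "value": 7.4, "qc_status": "pending"},
    ]
    print(view_for("external_partner", records))
    print(len(view_for("internal_scientist", records)))
```

Because the policy is data rather than code, adding a new audience, such as another external partner, means adding an entry instead of building a new report from scratch.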
Evan Thomas, Science Leader Soil and Land Resources, adds: “The East Trinity team is looking forward to being able to do more with the data than we have in the past. And that’s partly because of the access on a single platform to all the tools. So, there’s a bit of skilling up that we’ve got to do, but I think the opportunity there is now to do it, and to be a little more able to service our own needs internally. And reach out to others within the department for more ready and reliable support, which I think with the model we were on, it just wasn’t possible.”
Michelle Martens, Land Resource Officer, says: “It’s given me, I think, a new way to look at the data I’m collecting. Data management today is completely different to what it was 20 years ago. I think even restructuring the data in a way that makes it more accessible for a lot of people will make a big difference to the amount of wrangling required to get data in and out and do things with it.”
Nigel Rablin, Principal Consultant at Versor, adds: “There were some existing environments in place that the guys are using. But a lot of it was manual work in respect to that. Pretty much, Michelle and the groups around her for the past 20 years have been collecting and analysing data with whatever tools they could get their hands on. So obviously they can only analyse so much data at a time.”
Results from the initial workload indicate that once the system is in full production with all data loaded, authorised scientists will have unfettered access to data and tools that dramatically simplify analysis, allowing them to focus on the science.
Widespread opportunities to improve science outcomes
Department business analyst Jennifer Richards says that, following the East Trinity pilot, there are plans to broaden the strategy to support the entire science division.
She says the focus is on “thinking about ‘what are the things that we’re aiming for in our whole program?’ and it’s to get better use from our data. So, using the FAIR principles, making the data more findable, accessible, interoperable, reusable.”
Brough adds that the modern data and analytics platform also frees up scientists so they “have more time to do science rather than chasing data and converting data. Michelle was saying before that she’s really looking forward to being able to get in and analyse the data. What’s the data telling you about what’s going on at the East Trinity site? Not just looking at a graph every morning to work out if everything’s still working.”
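One common way to make the FAIR principles operational is to check each dataset's catalogue metadata against them. The following is a minimal Python sketch. The mapping of principles to fields is a simplified, hypothetical one. It is not the full FAIR guidance and not the Department's catalogue schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    title: str
    identifier: Optional[str] = None   # persistent ID such as a DOI (Findable)
    keywords: List[str] = field(default_factory=list)  # search terms (Findable)
    access_url: Optional[str] = None   # where and how to retrieve it (Accessible)
    data_format: Optional[str] = None  # open, documented format such as CSV (Interoperable)
    license: Optional[str] = None      # terms of reuse (Reusable)
    provenance: Optional[str] = None   # how the data was produced and processed (Reusable)


# Hypothetical, simplified mapping from each principle to required fields
REQUIRED: Dict[str, List[str]] = {
    "Findable": ["identifier", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["data_format"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record: DatasetRecord) -> Dict[str, List[str]]:
    """Map each FAIR principle to the metadata fields that are still empty."""
    gaps: Dict[str, List[str]] = {}
    for principle, names in REQUIRED.items():
        missing = [name for name in names if not getattr(record, name)]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    rec = DatasetRecord(title="Example station series", keywords=["wetland"], data_format="CSV")
    print(fair_gaps(rec))
```

A report like this gives data stewards a concrete to-do list per dataset, rather than leaving "make it FAIR" as an abstract goal.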
Martens agrees: “That’s it for me. I’m really looking forward to having more time to not just wrangle equipment and keep things operating on site, but to actually look at some of the longer-term trends. In the stuff we were graphing before, I couldn’t graph more than three months at a time, or it would fall over because of the data set.”
The pilot saved effort equivalent to half of one full-time employee.
At present, two of the 20 years of East Trinity Reserve data have been loaded onto the platform, with the remaining 18 to follow.
Martens adds: “I’m so excited at the prospect of looking at 20 years of data from the one station. That’s something I haven’t been able to do like all in one big, long series before.”
For scientists, this transparency, access to entire longitudinal data sets and the ability to use cutting-edge analytical tools to see patterns and trends emerge are critically important, allowing them to make evidence-backed decisions with real and lasting impact.
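A 10-minute series grows quickly. Three months is roughly 13,000 readings per channel, while 20 years is roughly a million. Aggregating before plotting is one standard way to keep long series manageable. Daily means reduce 20 years to about 7,300 points. The following is a minimal Python sketch of that aggregation. It is not necessarily what the new platform does, and the synthetic series is invented for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import fmean
from typing import Dict, List, Tuple


def daily_means(readings: List[Tuple[datetime, float]]) -> List[Tuple[date, float]]:
    """Collapse a high-frequency series into one mean value per calendar day."""
    buckets: Dict[date, List[float]] = defaultdict(list)
    for ts, value in readings:
        buckets[ts.date()].append(value)
    return [(day, fmean(values)) for day, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Synthetic 10-minute series: 3 days x 144 readings, value = day index
    start = datetime(2020, 1, 1)
    series = [(start + timedelta(minutes=10 * i), float(i // 144)) for i in range(3 * 144)]
    summary = daily_means(series)
    print(len(series), "->", len(summary))
    for day, mean in summary:
        print(day, mean)
```

Daily means hide short events such as a single tide or storm, so a long-term overview is usually paired with the ability to drill back down into the full-resolution data for a chosen period.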