More on the surprisingly awesome power of AI in coding for hydrology

After the last post (https://horritt.blogspot.com/2025/04/llms-for-coding-surprisingly-helpful.html) describing my first foray into using AIs for coding, I've been using Claude a lot more. It's now my go-to starting point for any "blank page" projects, and for a lot of reuse of existing code too.

Earlier this week I was reminded of the NRFA API. For non-hydrologists, the NRFA is the National River Flow Archive, which hosts a load of useful information from gauging stations in the UK. And for non-programmers, an API is an Application Programming Interface, which tells us how to write code to access that information.

Accessing this data is the sort of thing I'd usually have to set aside half a day to get my head round - so I thought I'd see if an LLM could help speed up this process. The NRFA has provided a really detailed description (https://nrfaapps.ceh.ac.uk/nrfa/nrfa-api.html) of the API - so I've given that to Claude and asked for a library to access it: 

 

This generates a ~600-line Python library - I ask Claude to fix a couple of things and it's working. That's taken about 5 minutes.
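At its core the library just wraps a couple of GET requests to the NRFA web service. Here's a minimal sketch of that pattern (my own simplification, not Claude's ~600 lines - the endpoint and parameter names are based on the API documentation linked above and should be checked against it):

```python
# Minimal sketch of an NRFA API client - not the generated library.
# Endpoint/parameter names and the "data" key are based on the NRFA API
# documentation and should be verified against it.
import requests

BASE_URL = "https://nrfaapps.ceh.ac.uk/nrfa/ws"

def get_station_info(station="*", fields="id,name,catchment-area"):
    """Fetch metadata and catchment descriptors for one or all stations."""
    params = {"station": station, "fields": fields, "format": "json-object"}
    r = requests.get(f"{BASE_URL}/station-info", params=params, timeout=30)
    r.raise_for_status()
    return r.json()["data"]

def get_time_series(station, data_type="gdf"):
    """Fetch a time series (e.g. gauged daily flow) for a single station."""
    params = {"station": station, "data-type": data_type, "format": "json-object"}
    r = requests.get(f"{BASE_URL}/time-series", params=params, timeout=30)
    r.raise_for_status()
    return r.json()
```

Something like get_station_info(station="39001", fields="id,name,catchment-area") then pulls back the basics for the Thames at Kingston.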

Next I want to start playing around with the data:

 

This gives me some code that accesses the median annual flood, QMED (something useful for generating estimates of more extreme floods like the 1-in-100-year event), and relates it to the catchment area:
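Stripped down, that amounts to a log-log fit of QMED against area - something like this sketch (not the generated code; the "qmed" and "catchment-area" field names are assumptions to check against the API docs):

```python
import numpy as np
import matplotlib.pyplot as plt

# Reuses get_station_info() from the client sketch above.
stations = get_station_info(station="*", fields="id,qmed,catchment-area")
rows = [s for s in stations if s.get("qmed") and s.get("catchment-area")]

area = np.array([s["catchment-area"] for s in rows])
qmed = np.array([s["qmed"] for s in rows])

# Fit log(QMED) = a + b*log(AREA) by least squares
b, a = np.polyfit(np.log(area), np.log(qmed), 1)

xs = np.geomspace(area.min(), area.max(), 100)
plt.loglog(area, qmed, ".", alpha=0.4, label="gauged catchments")
plt.loglog(xs, np.exp(a) * xs**b, "r-", label=f"QMED = {np.exp(a):.2f} * AREA^{b:.2f}")
plt.xlabel("Catchment area (km²)")
plt.ylabel("QMED (m³/s)")
plt.legend()
plt.show()
```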

 

There's a lot of scatter there, mostly because we're not taking catchment geology into account: there are a load of catchments with a much lower flow than we'd expect based on area alone. Let's try to represent that using the baseflow index catchment descriptor (BFIHOST), which is also listed in the NRFA:

 

(Notice the different prompt format - I've moved to Claude Code for this bit, which is much better at integrating my changes with LLM outputs, and can do loads of other housekeeping for programmers too). 
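Under the hood this is just adding a second descriptor to the regression, i.e. fitting ln(QMED) = a + b ln(AREA) + c ln(BFIHOST). A rough sketch of what that fit amounts to (again not the generated code, and the "bfihost" field name is an assumption to check against the API docs):

```python
import numpy as np

# Reuses get_station_info() from the client sketch above.
stations = get_station_info(station="*", fields="id,qmed,catchment-area,bfihost")
rows = [s for s in stations
        if s.get("qmed") and s.get("catchment-area") and s.get("bfihost")]

y = np.log([s["qmed"] for s in rows])
X = np.column_stack([
    np.ones(len(rows)),
    np.log([s["catchment-area"] for s in rows]),
    np.log([s["bfihost"] for s in rows]),
])

# Ordinary least squares on the log-transformed descriptors
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"QMED ~ {np.exp(a):.2f} * AREA^{b:.2f} * BFIHOST^{c:.2f}")
```

A negative exponent on BFIHOST is what produces the pattern in the plot below: permeable, high-BFIHOST catchments give a lower QMED for the same area.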

Adding BFIHOST gives a better fit - you can see on this plot how higher BFIHOST values lead to lower QMEDs:

 

And this plot shows how much including both area and BFIHOST improves the fit between the fitted model and the measured QMED:

 

Finally, we can do the same with four parameters:
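The machinery is identical - the design matrix just gains a column per extra descriptor. As a sketch (the descriptors used here, area, BFIHOST, SAAR and FARL, are illustrative FEH-style choices rather than necessarily the four used above, and the field names are assumptions to check against the API docs):

```python
import numpy as np

# Illustrative FEH-style descriptors - not necessarily the four used above,
# and the field names need checking against the NRFA API docs.
DESCRIPTORS = ["catchment-area", "bfihost", "saar", "farl"]

# Reuses get_station_info() from the client sketch above.
stations = get_station_info(station="*", fields="id,qmed," + ",".join(DESCRIPTORS))
rows = [s for s in stations if s.get("qmed") and all(s.get(d) for d in DESCRIPTORS)]

y = np.log([s["qmed"] for s in rows])
X = np.column_stack(
    [np.ones(len(rows))] + [np.log([s[d] for s in rows]) for d in DESCRIPTORS]
)

# ln(QMED) = a + sum_i b_i * ln(descriptor_i), fitted by least squares
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept:", round(coeffs[0], 2))
print("exponents:", {d: round(b, 2) for d, b in zip(DESCRIPTORS, coeffs[1:])})
```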

 

      

All the plots were generated by Claude - I didn't ask for all of them, but it made them anyway.

Of course there's an issue here - it's so easy to get things running that you can churn out a load of analysis without really understanding what's going on. So you still need to put in the time to understand the data and check the outputs you're getting. But the LLM is so powerful at grabbing and organising the data that it's a definite productivity boost, leaving me free to focus on the data itself.

 

 

 
