The Problem

Sound is one of the hardest signals to work with.

Unlike a controlled lab environment, a construction site is acoustically chaotic. Tools run at different speeds and intensities. Sounds bounce off walls, get muffled by distance, and blend with background noise that has nothing to do with the activity being monitored. On top of that, every building has its own baseline acoustic properties.

Teaching a machine learning model to make sense of that, reliably and in real time, is a genuinely difficult problem. It was not clear at the outset whether it was solvable at all within the constraints available.

The Challenge

No existing dataset to work from

There was no off-the-shelf dataset of construction tool sounds suitable for training a model. Everything needed to be built from scratch: recorded, sourced or generated.

Worse, the model needed to recognise not just the tools, but everything that wasn't a tool. Without that, any unfamiliar sound would be misclassified as the closest tool match rather than correctly identified as background noise.

Tools that sound nearly identical

Some of the sounds that mattered most were also the hardest to distinguish. A drill and a bathroom hand dryer, for instance, produce very similar acoustic signatures. Differentiating between similar tools running at different speeds, intensities or distances required far more training data than was available.

Equipment that didn't behave the way you'd expect

The scissor lift on site, a Mobile Elevating Work Platform (MEWP), emitted a beep only intermittently while moving and made no sound at all while stationary. That made it extremely difficult for a model to recognise continuous use, since the sound itself was not continuous.
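One way to picture the problem is to treat each detected beep as a timestamp and join nearby beeps into a single period of use. The Python sketch below is illustrative only and is not the approach taken on the project: the function name `merge_beeps`, the sample timestamps and the 3-second gap are all assumptions made for the example.

```python
def merge_beeps(beep_times, max_gap=3.0):
    """Group beep timestamps (in seconds) into (start, end) intervals of use.

    Beeps separated by at most max_gap seconds are treated as one
    continuous movement; a longer silence starts a new interval.
    The 3-second default is an assumed value, not a measured one.
    """
    intervals = []
    for t in sorted(beep_times):
        if intervals and t - intervals[-1][1] <= max_gap:
            intervals[-1][1] = t  # extend the current interval
        else:
            intervals.append([t, t])  # start a new interval
    return [tuple(interval) for interval in intervals]


# Hypothetical beep detections: three close together, then two much later.
print(merge_beeps([0.0, 2.0, 4.0, 30.0, 31.5]))
# → [(0.0, 4.0), (30.0, 31.5)]
```

Even with this kind of grouping, a lift that is occupied but stationary produces no beeps at all, so sound alone cannot show that it is in use.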

Working within a tight budget

This was an internal R&D initiative for GSST, funded in small increments rather than as a fully resourced project. Work was scoped in short bursts, a handful of days at a time, which shaped how much could realistically be tested and refined.

The Solution

The Curve used Edge Impulse, a machine learning platform designed for embedded devices, to build and train an audio classification model intended to run on an Arduino Nicla Voice board. The approach combined supervised learning with a custom-built dataset drawn from multiple sources, aiming to distinguish between distinct categories of construction tool sounds and background noise on a small, low-power device suitable for deployment on site.

Our Approach

Building a dataset from nothing

With no existing data to draw on, The Curve assembled training data from several sources: recordings of power tools made in-house, relevant clips sourced from YouTube, and open datasets including Google's Speech Commands dataset.

Critically, the dataset also had to include extensive examples of what wasn't a tool: voices, ambient noise and silence. Without that negative training data, the model would default to matching every sound to the nearest tool category, however poor the match.
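The role of a "not a tool" class can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's code: the label names, the `background` fallback and the 0.6 confidence threshold are all hypothetical.

```python
def classify(scores, threshold=0.6, fallback="background"):
    """Return the top-scoring label, or the fallback when the model is unsure.

    scores: dict mapping label -> probability (hypothetical model output).
    threshold: assumed minimum confidence needed to accept a label.
    """
    label, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence < threshold:
        return fallback
    return label


# A confident detection keeps its label.
print(classify({"drill": 0.85, "impact": 0.05, "background": 0.10}))
# An ambiguous sound is not forced into the nearest tool category.
print(classify({"drill": 0.45, "impact": 0.35, "background": 0.20}))
```

A threshold like this only helps if the model has actually seen enough non-tool audio to score it sensibly, which is why the negative examples mattered as much as the tool recordings.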

Training and testing iteratively

Using Edge Impulse's supervised learning workflow, audio samples were labelled and run through a configurable neural network. The model was tested for its ability to distinguish between broad categories of tool sound, such as reciprocating, drilling-style tools versus impact tools like hammers. Early results were genuinely promising at this broad category level.

Diagnosing the data problem in real time

As testing progressed, it became clear that data quality and quantity were the binding constraint. One particularly memorable issue involved a microphone mounted to a pole housing equipment with a large fan. From the office, recordings sounded fine. On site, every recording was overwhelmed by the fan's vibration, and the model began classifying nearly everything as that humming sound.

That kind of real-world interference is exactly the type of problem that only becomes visible once you start testing in the actual environment, not a controlled one.
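A simple check on the model's output can help flag this kind of collapse early. The Python sketch below is a hypothetical diagnostic, not something taken from the project: the `hum` label, the function name `dominant_label` and the 80% cut-off are assumptions for illustration.

```python
from collections import Counter


def dominant_label(predictions, max_share=0.8):
    """Return (label, share) if one label exceeds max_share of predictions.

    Returns None when the predictions are reasonably spread out or empty.
    The 0.8 default is an assumed cut-off, not a tuned value.
    """
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    share = count / len(predictions)
    return (label, share) if share > max_share else None


# Hypothetical on-site results: nine of ten windows labelled as fan hum.
print(dominant_label(["hum"] * 9 + ["drill"]))
# → ('hum', 0.9)
```

A result like this does not say what went wrong, but it is a prompt to listen to the raw recordings from site rather than trust the ones made in the office.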

The Results

01.

Feasibility demonstrated at a broad level: The proof of concept showed that broad categories of construction tool sounds, such as drilling versus impact tools, could be distinguished by an embedded machine learning model. That was a genuine, validated finding.

02.

A clear picture of what more it would take: The work also surfaced, clearly and specifically, what stood between a promising proof of concept and a reliable product. More data, captured across a far wider range of distances, intensities and real-world conditions, was the single biggest factor limiting further progress. Distinguishing acoustically similar tools required significantly more training samples than were available. Intermittent equipment sounds, like the scissor lift's beep, needed a different modelling approach to detect continuous use. The microphone picked up ambient vibrations from a nearby cooling fan, which skewed the audio input and caused the model to misclassify most sounds. That kind of interference only became visible through on-site testing.

03.

A foundation GSST could build on: The R&D phase gave GSST a working demonstration of feasibility, a documented methodology and a clear view of next steps. Given the scale of data collection required to take the concept further, GSST progressed the next phase of work with Edge Impulse's own specialist audio and machine learning team, who were better positioned to support that scale of dataset development.

Not every R&D project ends in a finished product. Some end in a clear, evidence-based answer to the question that was actually being asked: is this possible, and what would it take? For GSST, The Curve's work proved the core idea had real merit while identifying exactly what stood in the way of scaling it. That clarity, delivered within a tight budget and a genuinely difficult technical domain, gave GSST the foundation to make an informed decision about how to take the idea forward.

Their Thoughts

The team were very knowledgeable, quick to turn things around and genuinely engaged in developing ideas and proposing solutions. It was a complex, unconventional problem and they approached it with real curiosity.

Mike Joseph

GSST