
From Detection to Decision: How Cyanite Approaches AI-Generated Music

  • Writer: Andrea Zuckermann
  • 15 hours ago
  • 9 min read

We recently sat down with Roman Gebhardt, Chief Artificial Intelligence Officer (CAIO) of Cyanite, to discuss the complex frontier of AI music detection. As generative AI continues to blur the lines between human and machine-made content, the need for clarity in music catalogs has never been higher.


How has Cyanite’s mission evolved with the rise of generative AI, and how has this changed your view on audio analysis?

"We’ve actually been working in Music AI long before generative models became a major topic. From the beginning, our focus has been on understanding music at scale, for example through tagging, search, and similarity. That’s the space we’ve been building in.


When generative AI started to emerge, we were of course interested. But it was never a direction we wanted to move into ourselves. What became interesting is something else: at some point, these models reached a level where their output is no longer easily distinguishable from human-created music.


That’s where our analytical perspective suddenly becomes very relevant.


So in a way, this is not a pivot for us. It’s more that the space has evolved to a point where analysis plays a new role. Detection is where these two worlds meet.

At the same time, it also changes the nature of the problem quite a bit.


Before, much of what we worked on was inherently subjective. Describing music, capturing semantics, helping people navigate catalogs. Even though we always aimed to be precise, there was always some level of interpretation involved.


With detection of AI-generated content, we’re dealing with something much more objective. It becomes less about semantics and more about identifying stable, technical patterns in the signal. In that sense, it’s closer to anomaly detection than to traditional music understanding, because we’re not interpreting music, but identifying patterns stemming from the generation process of musical audio.


That shift has consequences for how we build the system.


Our customers don’t just want helpful metadata; they need to be able to rely on the result in a very concrete way."



What is the most difficult aspect of defining whether a piece of music is “AI-generated” in practice?

"The hardest part is that it is not a clean category to begin with.


As soon as you look at real-world cases, the question becomes quite blurry. Is a track AI-generated if the musical idea comes from a human but is then expanded, polished, or even just mastered by a generative system? Is it AI-generated if you take a composition from a system like Suno and re-record it yourself? Where exactly do you draw that line?


In a way, the question already contains the problem. As soon as you ask “is this AI-generated”, you assume that there is a clear definition behind it. In practice, there often isn’t.


We are not trying to judge authorship or creativity. We are not looking at the composition or the musical idea. We look solely at the signal.


More specifically, we assess whether the audio itself shows characteristics of having been generated by a generative model, regardless of what the musical content is or where it originally came from. This is important because musical information can easily bias a system if you rely on it.


Instead of treating this as a yes-or-no question, we approach it as a question of signal strength and confidence. The output reflects how strongly the audio exhibits characteristics of generated content, not a final statement about what the piece is or how it was created."



Is there a clear boundary between “AI-assisted” and “AI-generated” music, or is this better understood as a spectrum?

"Certainly, AI's role in music creation exists on a spectrum, from AI-assisted sample selection, mixing, and mastering, to full AI composition or instrument design. Our product does not attempt to define where the line is; that is for our clients and users to determine.


What we have observed increasingly, however, is a clear-cut use case at the far end of that spectrum: music that is entirely generated by a generative AI platform, with no further human intervention. This is unambiguous AI generation.


Our approach is to analyze the audio signal and assess whether it exhibits characteristics consistent with model-generated content, and how strong those indicators are. When the indicators are strong, we can say with a high degree of confidence that the track was most likely AI-generated. How that determination is then applied within a customer's operations is entirely their decision."



How can Cyanite’s detection tools support real-world decisions, for example for a Sync agent pitching to a brand or studio?

"In a sync workflow, the question is often not what a track is in an absolute sense, but whether there is anything that should be taken into account before moving forward. Detection becomes useful at exactly that point.


For example, a high detection score can flag a track that might require clarification before pitching to a brand with strict AI policies. A low score, on the other hand, can provide reassurance that there are no strong indicators in the audio.

It adds an additional layer of awareness. 


How that signal is used depends on the context. For some, it is about risk management before pitching. For others, it becomes part of a broader evaluation process.


So in practice, we support decision making. We help people act more consciously and with a better understanding of what they are working with."



How should detection results be interpreted in practice? What does a high or low score actually mean for decision-making?

"In very clear cases, for example fully AI-generated and unprocessed tracks, the signal can be very strong. In those situations, the system will typically produce a very high score, and given our conservative calibration, that can be treated as a reliable indicator that will be correct in 99.99% of the cases. 


On the other end, very low scores mean that there are no strong indications in the audio. It does not prove that something is entirely human-created, but it does mean that it is very unlikely to come from the models we actively cover.


That part is important, because our coverage is not random. We focus on models that are actually used in practice, based on continuous evaluation and feedback from both the industry and musicians.


Between those extremes, interpretation becomes more nuanced. The score reflects how strongly the audio exhibits characteristics of generated content, not a binary classification.


So in practice, very high scores are actionable, very low scores are reassuring within the scope of what we cover, and everything in between should be understood as a matter of degree rather than a definitive answer."
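The tiers described here, actionable highs, reassuring lows, and degrees in between, can be sketched as a simple triage rule. This is purely illustrative: the threshold values, the `triage` helper, and the 0.0–1.0 score range are assumptions for the sake of the example, not Cyanite's published calibration or API.

```python
def triage(score: float) -> str:
    """Map an AI-detection score (assumed 0.0-1.0) to a workflow action.

    The thresholds are hypothetical, chosen only to illustrate the tiers:
    very high scores are actionable, very low scores are reassuring
    within the scope of covered models, and everything in between is
    a matter of degree rather than a verdict.
    """
    if score >= 0.95:
        return "flag"    # strong indicators: clarify before acting on the track
    if score <= 0.05:
        return "clear"   # no strong indicators among the covered models
    return "review"      # mixed signal: weigh it, don't treat it as binary
```

A sync agent, for instance, might auto-escalate only "flag" results and leave "review" tracks to human judgment.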



Why is minimizing false positives more important than maximizing detection rates?

"Because the consequences are very different.


If you miss an AI-generated track, that can be an issue depending on the use case. But in most cases, it does not immediately break trust in the system.


Mislabelling a human-created track as AI-generated has real consequences. It directly impacts artists, and it creates operational and compliance headaches for our customers. Picture a distributor suddenly having to deal with a misclassification issue at scale: the reputational and legal exposure alone makes that a problem nobody wants on their hands. That's exactly what we're here to prevent.


Once that trust is gone, people stop relying on the output.


That’s why we take a conservative approach. We don’t try to detect everything. Instead, we focus on signals that are strong and consistent enough to act on.


The upside of that is that when we do produce a very strong signal, it tends to be meaningful. You can treat it as a reliable indicator, not just a probabilistic guess.


So it’s not about being cautious everywhere. It’s about being very confident in the cases where it matters, and accepting that there will be cases where you don’t make a strong call.


In the end, I believe the bigger risk is not the use of AI itself, but the possibility of fully automated generation workflows where content is produced at scale without meaningful human involvement. In those scenarios, detection becomes important to maintain transparency and prevent a purely quantity-driven dynamic."
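In classification terms, the trade-off described above is precision versus recall: a conservative detector accepts missed detections (lower recall) to keep false positives near zero (high precision). The counts below are invented purely to illustrate the arithmetic, not taken from any Cyanite evaluation.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged tracks that really are AI-generated."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Share of AI-generated tracks that actually get flagged."""
    return tp / (tp + fn)

# Aggressive detector: catches more AI tracks, but 1 in 10 flags
# hits a human-created track, which is what erodes trust.
aggressive = (precision(tp=90, fp=10), recall(tp=90, fn=10))   # (0.90, 0.90)

# Conservative detector: fewer detections overall, but every flag
# is safe to act on.
conservative = (precision(tp=60, fp=0), recall(tp=60, fn=40))  # (1.00, 0.60)
```

The conservative column is why a very strong signal "tends to be meaningful": when false positives are held near zero, each flag can be treated as actionable rather than as a probabilistic guess.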



Many detection approaches rely on training on known models. What are the limitations of this approach as new models emerge?

"The main limitation is that these systems are tied to what they have seen before.

If you train on a fixed set of models, you are essentially learning patterns that are specific to those systems. As soon as new models or new versions emerge, those patterns can change, and reliability can drop.


At the same time, we see something encouraging in our detectors in practice. Certain indicators tend to persist across versions of the same platform. Even when a model is updated, some of the underlying characteristics remain surprisingly stable. That gives us some room, because it means detection does not completely break every time a new version appears.


But you cannot rely on that alone. The space is evolving too quickly for a static solution.


We have also experimented with building more generalized detectors that aim to function across many models and platforms. These can work quite well in controlled settings, but in practice they often become brittle as new models emerge or the generation process changes. That is where I think there is sometimes a tendency in the industry to overclaim what these systems can do.


We treat detection as an ongoing process rather than a fixed model. We continuously evaluate new systems and update our understanding of which signals remain reliable."



How do you balance enabling AI-driven workflows with protecting the integrity of music catalogs?

"I don’t really see those as opposing goals.


AI is becoming part of how music is created, and there is a lot of value in that. Trying to block it entirely is neither realistic nor desirable. 


At the same time, the more these tools are used, the more important it becomes to maintain transparency around how content is produced.


That’s where we position ourselves. We are not trying to restrict workflows or take a stance against AI. We provide a technical layer that makes it possible to understand what is happening in the audio.


Through this, different stakeholders are given the possibility to make their own decisions, based on their context. A platform might have certain policies, a label might have different requirements, and a sync agent might look at it from a risk perspective.


In that sense, enabling AI workflows and protecting catalog integrity both depend on having trustworthy information about the content."



What does responsible AI use look like in day-to-day practice for a company like Cyanite?

"Frankly speaking, I am glad not to have to answer this question from the position of a company that builds generative music models. Defining what responsible use looks like there is a very complex and, in many ways, still unresolved topic where not even ethical and legal perspectives are always aligned.


Our position is different from companies building generative models. We focus on analysis, which means building tools that help people make sense of what is happening, rather than generating content.


For us, responsible AI is less about defining rules and more about what kind of systems we choose to build. 


We aim to create tools that are actually useful for musicians and people working with music, especially in a landscape that is becoming more complex through the use of generative models.


A big part of that also is understanding what these systems are actually doing. We spend a lot of time on research, for example, looking into structural biases in representation models, and more generally questioning what kind of signals these models produce and why. That way of thinking carries directly into our detection work.


It’s not just about building something that performs well, but about building something where we understand the underlying behavior well enough to rely on it in practice.


That continuous questioning is a core part of how we work. Not taking outputs at face value, but trying to understand where they come from and where they might break. That mindset is what we try to bring into everything we do, and it’s a big part of why we feel comfortable working in this space."



What is one misconception about AI music detection that you think the industry still gets wrong?

"That it is possible to build a system today that reliably detects all AI-generated music and always gives a clear “this is AI” or “this is not”, across all models and all scenarios.


I think that assumption is misleading. The space is evolving too quickly, and the signals are too diverse for a single, universal solution.


What you can do is build systems that are very reliable in the cases where strong and well-understood signals exist. In those situations, you can make a clear call, and we are proud that we are able to do that for the most widely used, state-of-the-art models in practice.


But outside of that, there will always be cases where the signal is weaker or mixed, and where a definitive answer is not possible.


So, for me, the misconception is treating detection as a solved, universal classification problem, when in reality it is about working with uncertainty and being very deliberate about where you can actually be confident."


