The Cuvave Cube Baby is one of those mystery items which pop up on AliExpress under different brand names and at a price tag which seems almost too good to be true. I found out about it through a friend who’s into IR-based guitar effects. There are a lot of review videos online which attest to its usefulness, and the price was low enough (~35 EUR) that I could easily buy one to play with. The Cube Baby offers the following functionality:
- 48000 Hz, 24-bit audio input/output
- Amplifier modelling engine based on Impulse Response data
- Built-in delay/modulation section
- Built-in distortion/overdrive effects
Let’s be realistic, this is no NeuralDSP Quad Cortex, and the distortion effects are pretty bad. But the price point is two orders of magnitude lower and the IR feature alone is well worth it.
The Cube Baby is configurable via a purpose-made software client (CubeSuite), which allows all settings, including the IR parameters, to be updated live, and preset values to be tuned.
I immediately noticed that there doesn’t seem to be an open source tool capable of doing CubeSuite’s job, which is a shame. While it seems to run quite happily on Wine, having a native Linux client and even a WebMIDI-based interface would be great. So, I decided I would find out how the client application and the device communicate behind the scenes, and try to reproduce it.
Spoiler: I managed to do it, but it wasn’t as simple as it seemed.
CubeSuite communicates with the device via MIDI SysEx messages, which are sent over USB MIDI.
The MIDI Messages
The thing with SysEx messages is that their content is simply outside of the MIDI standard. They’re meant as System Exclusive extensions to the basic communications protocol. While that allows for lots of flexibility, it means that we need to know exactly what the information inside them means if we want to be able to decipher them. It so happens that, as with any AliExpress find worth its salt, the manufacturers of this product seem to provide absolutely zero information on the SysEx messages involved. I wasn’t able to find anything, but if you do, please feel free to share it with me on Mastodon or Matrix.
So, my first idea was: “let’s hook up a MIDI logger to this thing, check what is being sent in either direction and try to understand what the messages mean!” Well, that’s what I did!
I found a nice tool called terminal-midi-monitor, which does exactly what it says. I booted CubeSuite, sampled a few messages and started trying to make sense of them.
Making sense of the messages
A first example:
Preset A: set Amp cabinet type to 1
Message sent: f000320949000040020900000018000000015e01f7
The F0 and F7 bytes are easy to identify, since they mark the beginning and end of the SysEx message:
F0 <body> F7
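As a small illustration of that framing, here is a minimal sketch that turns the logged hex dump into bytes and strips the two delimiters. The helper names `parse_hex` and `sysex_body` are hypothetical and only serve this example.

```rust
/// Parse a contiguous hex dump such as "f000...f7" into raw bytes.
/// Returns None on odd length or non-hex characters.
fn parse_hex(dump: &str) -> Option<Vec<u8>> {
    if !dump.is_ascii() || dump.len() % 2 != 0 {
        return None;
    }
    (0..dump.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&dump[i..i + 2], 16).ok())
        .collect()
}

/// Return the bytes between the SysEx start (0xF0) and end (0xF7) markers.
fn sysex_body(message: &[u8]) -> Option<&[u8]> {
    match message {
        [0xF0, body @ .., 0xF7] => Some(body),
        _ => None,
    }
}

fn main() {
    // The message logged while setting the amp cabinet type on preset A.
    let raw = parse_hex("f000320949000040020900000018000000015e01f7").expect("valid hex");
    let body = sysex_body(&raw).expect("framed SysEx message");
    println!("{} body bytes: {:02x?}", body.len(), body);
}
```

Everything that follows in this post concerns the body only, with the delimiters removed.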
I started by trying to find some sense in the messages which were sent back and forth. A few patterns were easily recognizable, such as IDs for particular parameters and their values.
Example of a message being sent from the client, and what I gathered were the meanings of some of the bytes:
[F0] 00 32 09 49 00 00 40 02 00 00 00 00 18 00 00 00 01 70 01 [F7]
                                         ^           ^  ^  ^
                                         |- PARAM    |  |--|- CHECKSUM?
                                                     |- VALUE
In the message above, we have parameter 0x18 being set to 0x01. That sounds easy enough to figure out, right?
The checksum
The initial feeling of having solved a problem which should have been harder was quickly replaced by one of confusion. I noticed, as I started analysing more and more messages, that I couldn’t make sense of the last two bytes before the SysEx end delimiter (F7). Was that a 16-bit checksum? How was it calculated?
I started looking into CRC algorithms, as those are the usual choices when it comes to checksums for error detection. That said, I couldn’t see why anyone would add a checksum to a message which gets sent over a USB-MIDI link, which is supposed to be reliable by definition, thanks to the underlying serial protocol layers.
The answer is: whoever designed this protocol wanted to make it harder for others to make sense of it, perhaps because they would like to avoid competition. The bad news is that if I could break it this easily, it’s very likely many more will. And you don’t even need that much determination, just curiosity.
After scratching my head over the supposed CRC, I decided I had to change strategy and be a little bit more efficient in my approach. That’s when I decided to reverse-engineer the code. I used Ghidra on the macOS version, which includes less OS-specific boilerplate crap likely to add to the confusion. Making sense of disassembled/decompiled code is already hard enough as it is, no need to make it more difficult.
I’m not a reverse engineer by any means, so it took me a few hours of work over a couple of days to figure out how to identify the spots I had to look at and find the function at the centre of everything. That’s CMIDITranfer::send_long_event (it’s not my typo, look below), which is supposed to process a string of bytes to be sent as a SysEx MIDI Message. We can clearly see, in line 18, 0xf0 being produced; and further down, 0xf7. Those are the SysEx delimiters we talked about before.
What comes in-between is a clever, even if pointless, attempt at obfuscating the protocol in order to discourage any curious minds. It took me a bit to disentangle the mess of disassembled variables, but I managed to produce a working equivalent Rust implementation:
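What follows is a minimal sketch reconstructed from the description in this post and checked against the sample message, not a copy of the decompiled routine. The name `encode_7bit` is hypothetical. It treats the input as a stream of bits, least significant first, and cuts that stream into 7-bit groups, which yields the sample bytes shown earlier; the behaviour on inputs whose length is a multiple of 7 is an assumption.

```rust
/// Pack 8-bit bytes into 7-bit groups, least significant bits first.
/// Every output byte has its top bit clear, as SysEx data bytes must.
fn encode_7bit(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + input.len() / 7 + 1);
    let mut acc: u16 = 0; // bits shifted out of earlier bytes, waiting to be emitted
    let mut bits: u32 = 0; // number of valid bits currently held in `acc`

    for &byte in input {
        // Place the new byte to the left of the pending bits.
        acc |= u16::from(byte) << bits;
        bits += 8;
        // Emit as many full 7-bit groups as are available (one or two).
        while bits >= 7 {
            out.push((acc & 0x7F) as u8);
            acc >>= 7;
            bits -= 7;
        }
    }
    // Flush whatever was shifted out of the last input byte.
    if bits > 0 {
        out.push((acc & 0x7F) as u8);
    }
    out
}

fn main() {
    let plain = [
        0x00, 0x59, 0x22, 0x09, 0x00, 0x00, 0x05, 0x00,
        0x00, 0x00, 0x80, 0x01, 0x00, 0x00, 0x01, 0x78,
    ];
    println!("{:02x?}", encode_7bit(&plain));
}
```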
Can you see what’s going on? Each successive byte is being shifted left N times, with N increasing along the way (from 0 to 7 and then back to 0, or mod 8 for the mathematically-minded), with the bits which are shifted out added to some sort of “accumulator” which is then added on the right of the next successive byte. This will, of course, lead to there being one additional byte at the end, which will take the “shifted out” bits of the last input one. Is this cool in its own way? Yes. Is it useful at all? Not really.
I then produced an equivalent decoding function:
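Again as a hedged sketch rather than the original listing, the inverse operation can be written as the mirror image of the bit-stream view: collect 7 bits from each incoming byte and emit a full byte whenever 8 are available. The name `decode_7bit` is hypothetical, and discarding the leftover padding bits at the end is an assumption.

```rust
/// Unpack 7-bit groups (least significant bits first) back into 8-bit bytes.
fn decode_7bit(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 7 / 8);
    let mut acc: u16 = 0; // bits collected so far
    let mut bits: u32 = 0; // number of valid bits currently held in `acc`

    for &byte in input {
        // Only the low 7 bits of each SysEx data byte carry information.
        acc |= u16::from(byte & 0x7F) << bits;
        bits += 7;
        if bits >= 8 {
            out.push((acc & 0xFF) as u8);
            acc >>= 8;
            bits -= 8;
        }
    }
    // Fewer than 8 bits remain here; they are treated as padding.
    out
}

fn main() {
    let wire = [
        0x00, 0x32, 0x09, 0x49, 0x00, 0x00, 0x40, 0x02, 0x00, 0x00,
        0x00, 0x00, 0x18, 0x00, 0x00, 0x00, 0x01, 0x70, 0x01,
    ];
    println!("{:02x?}", decode_7bit(&wire));
}
```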
With this piece of information in hand, I started decoding the messages I had seen before. For instance, the message we analyzed above:
[F0] 00 32 09 49 00 00 40 02 00 00 00 00 18 00 00 00 01 70 01 [F7]
becomes:
00 59 22 09 00 00 05 00 00 00 80 01 00 00 01 78
----- -- -------- -------------------------- --
HEADER | LENGTH   MSG_CONTENT                CHECKSUM
       MSG_TYPE
See how the checksum is now a single byte? It’s actually a simple overflowing byte add followed by a bitwise NOT:
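A minimal sketch of that calculation, checked against the decoded sample message. The function name `checksum` is hypothetical; the range it is applied to (index 6 up to, but not including, the checksum byte itself) is what reproduces the sample value.

```rust
/// Sum all bytes with 8-bit wrap-around, then invert every bit.
fn checksum(payload: &[u8]) -> u8 {
    !payload.iter().fold(0u8, |sum, &b| sum.wrapping_add(b))
}

fn main() {
    // Decoded sample message, without the SysEx delimiters.
    let frame = [
        0x00, 0x59, 0x22, 0x09, 0x00, 0x00, 0x05, 0x00,
        0x00, 0x00, 0x80, 0x01, 0x00, 0x00, 0x01, 0x78,
    ];
    // The last byte is the checksum; it covers everything from index 6 up to it.
    let (covered, received) = frame.split_at(frame.len() - 1);
    let computed = checksum(&covered[6..]);
    println!("computed {:02x}, received {:02x}", computed, received[0]);
    assert_eq!(computed, received[0]);
}
```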
And guess what? It works! I managed to craft messages which are accepted by the device, and to easily intercept all traffic while properly validating the checksum. From what I saw, the checksum is calculated over MESSAGE[6..], that is, from the 7th byte on (not counting the SysEx delimiter bytes).
Soon, I was figuring out what each message field meant, and which message types there were. I have to confess that, in order to do that, I also disassembled the APK of the Cuvave Android app. Although it doesn’t support the Cube Baby, the protocol used by the products it supports is quite similar, and the message types closely resemble what I had seen going back and forth.
A little insight into the MSG_CONTENT field above: that will depend on the MSG_TYPE, and for a message with type 0x22 (“write”), that would be:
05 00 00 00 80 01 00 00 01
__ ___________ ________ __
|  ADDRESS     LEN      CONTENT
MEM
I’m not 100% sure what to call the MEM field - I have a feeling it’s meant to specify what kind of memory we’re writing/reading from, but then that should already be encoded by the ADDRESS field itself. Anyway, the message above means “write a 1 to address 0x80000000”, which should be the device preset RAM (current settings).
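To make the layout concrete, here is a minimal sketch that splits a decoded “write” message into its fields. The struct and function names are hypothetical, and reading LENGTH and LEN as 3-byte little-endian integers is an assumption inferred from the sample (the address bytes 00 00 00 80 do read as 0x80000000 that way). Checksum validation is left out to keep it short.

```rust
/// Fields of a decoded "write" (type 0x22) message; names are illustrative.
#[derive(Debug, PartialEq)]
struct WriteRequest {
    mem: u8,
    address: u32,
    data: Vec<u8>,
}

/// Read a 3-byte little-endian integer (assumed encoding of the length fields).
fn le_u24(b: &[u8]) -> usize {
    usize::from(b[0]) | usize::from(b[1]) << 8 | usize::from(b[2]) << 16
}

/// Layout: HEADER(2) MSG_TYPE(1) LENGTH(3) MSG_CONTENT(LENGTH) CHECKSUM(1),
/// where MSG_CONTENT is MEM(1) ADDRESS(4) LEN(3) CONTENT(LEN).
fn parse_write(frame: &[u8]) -> Option<WriteRequest> {
    if frame.len() < 7 || frame[2] != 0x22 {
        return None;
    }
    let content_len = le_u24(&frame[3..6]);
    if frame.len() != 6 + content_len + 1 {
        return None;
    }
    let content = &frame[6..6 + content_len];
    if content.len() < 8 {
        return None;
    }
    let address = u32::from_le_bytes([content[1], content[2], content[3], content[4]]);
    let data_len = le_u24(&content[5..8]);
    if content.len() != 8 + data_len {
        return None;
    }
    Some(WriteRequest {
        mem: content[0],
        address,
        data: content[8..].to_vec(),
    })
}

fn main() {
    let frame = [
        0x00, 0x59, 0x22, 0x09, 0x00, 0x00, 0x05, 0x00,
        0x00, 0x00, 0x80, 0x01, 0x00, 0x00, 0x01, 0x78,
    ];
    println!("{:x?}", parse_write(&frame));
}
```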
It so happened that my initial intuition about parameter “IDs” being specified in the message body was only partially correct: what is specified is actually a memory position, which we can of course interpret as a sort of ID. Nonetheless, the model which is used is EEPROM-oriented, rather than a higher level API.
Rust crate
I’ve put together a bare-bones Rust crate which illustrates how these streams of bytes can be decoded and encoded. I also tried to include some information about the memory layout in the README. Bear in mind that it is still very incomplete and probably inaccurate. But you should be able to use this information to make your own Cube Baby client, interact with the device or even emulate it. Future plans include a WebMIDI app which allows presets to be managed (whenever time allows).
The IR files
Of course the main point of the Cuvave is to provide amplifier emulation through IR files. I managed to figure out the following:
- Sampling rate: 48000Hz
- Number of channels: 1 (mono)
- IR Memory: 2048 bytes, which means 512 float32 samples (~10ms)
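The arithmetic behind those figures, plus a minimal sketch of how a longer third-party IR could be cut down to fit. The helper `pack_ir` is hypothetical, and both the little-endian byte order and the plain truncation (no windowing or resampling) are assumptions, not something confirmed for the device.

```rust
const SAMPLE_RATE_HZ: f64 = 48_000.0;
const IR_BYTES: usize = 2048;
// 2048 bytes / 4 bytes per float32 = 512 samples.
const IR_SAMPLES: usize = IR_BYTES / std::mem::size_of::<f32>();

/// Truncate or zero-pad a mono impulse response to exactly IR_SAMPLES samples
/// and serialise it as float32 values (byte order is an assumption).
fn pack_ir(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(IR_BYTES);
    for i in 0..IR_SAMPLES {
        let sample = samples.get(i).copied().unwrap_or(0.0);
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}

fn main() {
    // 512 samples at 48 kHz last 512 / 48000 s, i.e. a little over 10 ms.
    let duration_ms = IR_SAMPLES as f64 / SAMPLE_RATE_HZ * 1000.0;
    println!("{} samples, {:.2} ms", IR_SAMPLES, duration_ms);

    let blob = pack_ir(&[1.0, 0.5, 0.25]);
    println!("{} bytes", blob.len());
}
```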
As I said at the beginning, it’s by no means a NeuralDSP Quad Cortex, but that’s what you get for 35 bucks. I was able to take third-party IRs and load them into my device using my own code. I might release a tool to do that if there is enough demand.
Conclusion
This was a fun project and, as far as I could find, this is the first documented case of someone reverse engineering this little AliExpress gem.
Don’t hesitate to ping me on the Fediverse if you have any questions/ideas/comments!