Good day Guys
Im working on migrating data from a PostgreSQL Roundcube database into MariaDB, and I've run into an issue with the collected_addresses table that I'd like some clarification on.
In PostgreSQL, the email column is defined as varchar(255) with default collation, which performs strict byte-wise comparisons. Because of this, PostgreSQL correctly treats accented characters as distinct. For example:
kardiológia@removed.
kardiologia@removed.
These coexist without conflict.
However, after importing into MariaDB, I receive a "Duplicate entry" error when inserting an address like kardiologia@..., because MariaDB's default collations (utf8mb4_unicode_ci, utf8mb4_general_ci, etc.) are accent-insensitive and fold characters like ó into o. This means MariaDB considers the two addresses equivalent at the index level.
To resolve this, I am considering changing the email column and the related UNIQUE index to use COLLATE utf8mb4_bin
This would ensure that email addresses are compared strictly based on their binary/Unicode code points and would match PostgreSQL's behavior.
Before I make this change, I want to check with the Roundcube team and community:
1) Is using utf8mb4_bin on collected_addresses.email safe and compatible with Roundcube's expected behavior?
2) Are there any pitfalls, side effects, or Roundcube features that may break when using binary collation on this column?
Any guidance, recommendations, or experiences would be greatly appreciated.
Thank you in advance.