I've just nailed a problem that's been bugging me for WEEKS. I cannot explain how satisfying this is. :)
To make a long story short: I've got the markasjunk2 plugin installed. (https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2)
I noticed that when I marked emails in RoundCube as spam, they were learned, but if I ran "sa-learn" via shell on that message, it was learned again. Odd, since SpamAssassin is not supposed to relearn things it already knows as ham or spam.
After MUCH digging, I discovered that markasjunk2's call to $rcmail->storage->get_raw_body($uid) was returning a body with CRs (^M) at the end of each line. These CRs are *NOT* in the actual email files stored in /var/vmail! get_raw_body() is appending them. This resulted in a message with an entirely different SHA1 hash, which was throwing off SpamAssassin and hence registering as a different email.
So I've fixed this by using str_replace() in markasjunk2 to remove the \r characters. But I'm wondering if get_raw_body() is supposed to be appending CRs like that? Is it a bug? Do other parts of RoundCube depend on those CRs being added?
Quote from: CaptSaltyJack on April 21, 2013, 12:42:08 PM
$rcmail->storage->get_raw_body($uid) was returning a body with CRs (^M) at the end of each line.
I think this is not true, at least in 0.9.
I can almost guarantee it's true. Take a look at this file in the markasjunk2 plugin:
https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/blob/master/drivers/cmd_learn.php
Specifically:
if (preg_match('/%f/', $command)) {
$tmpfname = tempnam($temp_dir, 'rcmSALearn');
file_put_contents($tmpfname, $rcmail->storage->get_raw_body($uid));
$tmp_command = str_replace('%f', $tmpfname, $command);
}
The resulting file created from file_put_contents() definitely has CRs inserted at the ends of each line which is causing problems when sa-learn creates a SHA1 hex hash made from the first 1024 characters of the body. Unless PHP's file_put_contents() is somehow inserting CRs. Otherwise, get_raw_body() is most certainly adding CRs because those CRs are not present in the mail files in /var/vmail.
I applied a fix to markasjunk2 by changing this:
file_put_contents($tmpfname, $rcmail->storage->get_raw_body($uid));
To this:
file_put_contents($tmpfname, str_replace("\r", "", $rcmail->storage->get_raw_body($uid)));
And now SpamAssassin's Bayes.pm plugin is getting the right msg_id hash whether I mark an email as spam via RoundCube, or use sa-learn in the shell.
This is also causing the "bounce" plugin to send bad-formatted messages. The extra line-break is causing most headers to be rendered as a part of the body in the redirected/bounced messages.
in bounce.php, line 90
$msg_body = $rcmail->imap->get_raw_body($msg_uid);
changed to
$msg_body = str_replace("\r", "", $rcmail->imap->get_raw_body($msg_uid));
Cool. Seems the problem lies with roundcube, perhaps.