Roundcube Community Forum

 

Huge problem with rcube_imap get_raw_body function?

Started by CaptSaltyJack, April 21, 2013, 12:42:08 PM

Previous topic - Next topic

CaptSaltyJack

I've just nailed a problem that's been bugging me for WEEKS. I cannot explain how satisfying this is. :)

To make a long story short: I've got the markasjunk2 plugin installed. (https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2)

I noticed that when I marked emails in RoundCube as spam, they were learned, but if I ran "sa-learn" via shell on that message, it was learned again. Odd, since SpamAssassin is not supposed to relearn things it already knows as ham or spam.

After MUCH digging, I discovered that markasjunk2's call to $rcmail->storage->get_raw_body($uid) was returning a body with CRs (^M) at the end of each line. These CRs are *NOT* in the actual email files stored in /var/vmail! get_raw_body() is appending them. This resulted in a message with an entirely different SHA1 hash, which was throwing off SpamAssassin and hence registering as a different email.

So I've fixed this by using str_replace() in markasjunk2 to remove the \r characters. But I'm wondering if get_raw_body() is supposed to be appending CRs like that? Is it a bug? Do other parts of RoundCube depend on those CRs being added?

alec

Quote from: CaptSaltyJack on April 21, 2013, 12:42:08 PM
$rcmail->storage->get_raw_body($uid) was returning a body with CRs (^M) at the end of each line.
I think this is not true, at least in 0.9.

CaptSaltyJack

I can almost guarantee it's true. Take a look at this file in the markasjunk2 plugin:

https://github.com/JohnDoh/Roundcube-Plugin-Mark-as-Junk-2/blob/master/drivers/cmd_learn.php

Specifically:


if (preg_match('/%f/', $command)) {
    $tmpfname = tempnam($temp_dir, 'rcmSALearn');
    file_put_contents($tmpfname, $rcmail->storage->get_raw_body($uid));
    $tmp_command = str_replace('%f', $tmpfname, $command);
}


The resulting file created from file_put_contents() definitely has CRs inserted at the ends of each line which is causing problems when sa-learn creates a SHA1 hex hash made from the first 1024 characters of the body. Unless PHP's file_put_contents() is somehow inserting CRs. Otherwise, get_raw_body() is most certainly adding CRs because those CRs are not present in the mail files in /var/vmail.

CaptSaltyJack

I applied a fix to markasjunk2 by changing this:

file_put_contents($tmpfname, $rcmail->storage->get_raw_body($uid));

To this:

file_put_contents($tmpfname, str_replace("\r", "", $rcmail->storage->get_raw_body($uid)));

And now SpamAssassin's Bayes.pm plugin is getting the right msg_id hash whether I mark an email as spam via RoundCube, or use sa-learn in the shell.

jmouse888

#4
This is also causing the "bounce" plugin to send bad-formatted messages.  The extra line-break is causing most headers to be rendered as a part of the body in the redirected/bounced messages.

in bounce.php, line 90

    $msg_body = $rcmail->imap->get_raw_body($msg_uid);

changed to

    $msg_body = str_replace("\r", "", $rcmail->imap->get_raw_body($msg_uid));


CaptSaltyJack

Cool. Seems the problem lies with roundcube, perhaps.