This module is for developing and delivering extensions to James for Rspamd (the spam filtering system) and ClamAV (the antivirus engine).
- The Rspamd extension requires an extra configuration file, rspamd.properties, to configure the Rspamd connection. Configuration parameters:
  - rSpamdUrl: URL of the Rspamd server. E.g.: http://rspamd:11334
  - rSpamdPassword: password used to authenticate requests to the Rspamd server. E.g.: admin
  - rspamdTimeout: timeout for HTTP requests to Rspamd. Defaults to 15 seconds.
  - perUserBayes: use per-user Bayes for mail scanning/feedback. Defaults to false.
- Declare the extensions.properties for this module:
guice.extension.module=org.apache.james.rspamd.module.RspamdModule
guice.extension.task=org.apache.james.rspamd.module.RspamdTaskExtensionModule
- Declare the Rspamd mailbox listener in listeners.xml. E.g.:
<listener>
<class>org.apache.james.rspamd.RspamdListener</class>
</listener>
This listener can report mails to per-user Bayes when perUserBayes is configured in rspamd.properties.
- Declare the Rspamd mailet for custom mail processing.
  - You can specify the virusProcessor if you want to enable virus scanning for mail. The configurable virusProcessor lets you specify how James processes mail containing a virus.
  - You can specify the rejectSpamProcessor. Emails marked as rejected by Rspamd will be redirected to this processor. These are the emails with the highest spam score, for which delivering them to users, even marked as spam, might not be desirable.
  - The rewriteSubject option allows rewriting subjects when requested by Rspamd.
  - This mailet can scan mails against per-user Bayes when perUserBayes is configured in rspamd.properties. This is achieved through the Rspamd Deliver-To HTTP header. If true, Rspamd is called for each recipient of the mail, which comes at a performance cost, and subjects are not rewritten; virusProcessor and rejectSpamProcessor are then honored per user, at the cost of email copies. Defaults to false.
  We provide a sample Rspamd mailet and virusProcessor configuration:
<processor state="local-delivery" enableJmx="true">
<mailet match="All" class="org.apache.james.rspamd.RspamdScanner">
<rewriteSubject>true</rewriteSubject>
<virusProcessor>virus</virusProcessor>
<rejectSpamProcessor>spam</rejectSpamProcessor>
<onMailetException>ignore</onMailetException>
</mailet>
<mailet match="IsMarkedAsSpam=org.apache.james.rspamd.status" class="WithStorageDirective">
<targetFolderName>Spam</targetFolderName>
</mailet>
<mailet match="All" class="LocalDelivery"/>
</processor>
<!--Choose one of the two following virus processors, or configure a custom one if you want-->
<!--Hard reject virus mail-->
<processor state="virus" enableJmx="false">
<mailet match="All" class="ToRepository">
<repositoryPath>file://var/mail/virus/</repositoryPath>
</mailet>
</processor>
<!--Soft reject virus mail-->
<processor state="virus" enableJmx="false">
<mailet match="All" class="StripAttachment">
<remove>all</remove>
<pattern>.*</pattern>
</mailet>
<mailet match="All" class="AddSubjectPrefix">
<subjectPrefix>[VIRUS]</subjectPrefix>
</mailet>
<mailet match="All" class="LocalDelivery"/>
</processor>
<!--Store rejected spam emails (with a very high score) -->
<processor state="spam" enableJmx="false">
<mailet match="All" class="ToRepository">
<repositoryPath>cassandra://var/mail/spam</repositoryPath>
</mailet>
</processor>
RspamdScanner supports additional rspamdUrl, rspamdPassword, rspamdTimeout and perUserBayes properties that override the values defined in rspamd.properties, which allows running several mailet instances against distinct Rspamd instances. A possible use case is to use one Rspamd cluster to scan incoming user mail, trained in perUserBayes mode, and another Rspamd cluster configured to check outgoing email for spam with a tolerant threshold and a specific configuration.
- Declare the Rspamd webadmin routes in webadmin.properties:
extensions.routes=org.apache.james.rspamd.route.FeedMessageRoute
For how to use the admin endpoints, see Additional webadmin endpoints.
- Declare the Rspamd healthcheck in healthcheck.properties:
additional.healthchecks=org.apache.james.rspamd.healthcheck.RspamdHealthCheck
- Docker compose file examples: docker-compose.yml or docker-compose-distributed.yml.
Please configure the ClamAV integration into Rspamd if you want to enable virus scanning.
- The sample configuration: sample-configuration
- To run docker-compose, first compile this project:
mvn clean install -DskipTests
then run it: docker-compose up
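As a sketch, the rspamd.properties file described in the first step can be generated with a shell heredoc. The values are the examples listed in that step. Writing the file to the current directory is an assumption, so move it into your James configuration directory; the 15s syntax for rspamdTimeout is also an assumption, which is why that line stays commented out and the 15-second default applies.

```shell
#!/bin/sh
# Writes a sample rspamd.properties into the current directory using the
# example values from the setup steps; move it into your James configuration directory.
cat > rspamd.properties <<'EOF'
# URL of the Rspamd server
rSpamdUrl=http://rspamd:11334
# Password used to authenticate requests to the Rspamd server
rSpamdPassword=admin
# HTTP request timeout, left commented so the 15-second default applies
# (the "15s" duration syntax is an assumption)
#rspamdTimeout=15s
# Per-user Bayes for scanning and feedback (defaults to false)
perUserBayes=false
EOF
```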
One can use this route to schedule a task that reports spam messages to Rspamd for its spam classification learning.
This task can be configured to report spam messages to per-user Bayes via perUserBayes in rspamd.properties.
curl -XPOST 'http://ip:port/rspamd?action=reportSpam'
This endpoint has the following parameters:
- action (required): must be reportSpam
- messagesPerSecond (optional): concurrent learns performed against Rspamd. Defaults to 10.
- period (optional): duration (many time units are supported; defaults to seconds). Only messages received between now and now - duration are reported. By default, all messages are reported. These inputs represent the same duration: 1d, 1day, 86400 seconds, 86400...
- samplingProbability (optional): float between 0 and 1, representing the chance that each given message is reported to Rspamd. By default, all messages are reported.
- classifiedAsSpam (optional): boolean; true to include only messages tagged as spam by Rspamd, false to include only messages tagged as ham by Rspamd. If omitted, all messages are included.
- rspamdTimeout (optional): duration of the timeout for HTTP requests to Rspamd while learning. Defaults to 15 seconds.
Will return the task id. E.g.:
{
"taskId": "70c12761-ab86-4321-bb6f-fde99e2f74b0"
}
Response codes:
- 201: Task generation succeeded. Corresponding task id is returned.
- 400: Invalid arguments supplied in the user request.
More details about endpoints returning a task.
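The optional parameters are plain query-string parameters, so a request can be assembled as in this minimal sketch. The report_url helper and the WEBADMIN variable are hypothetical names, and the default http://127.0.0.1:8000 address assumes WebAdmin listens on its default port; substitute your own ip:port.

```shell
#!/bin/sh
# Base WebAdmin address: a placeholder assuming the default port 8000;
# override it by exporting WEBADMIN (the "ip:port" of the examples above).
WEBADMIN="${WEBADMIN:-http://127.0.0.1:8000}"

# report_url ACTION [NAME=VALUE ...]   (hypothetical helper)
# Prints the URL of the Rspamd feed route for ACTION (reportSpam or reportHam),
# appending each optional NAME=VALUE pair as a query parameter.
# Values are not URL-encoded, so only pass plain values such as 1d or 0.5.
# The body runs in a subshell so its variables do not leak into the caller.
report_url() (
  url="$WEBADMIN/rspamd?action=$1"
  shift
  for param in "$@"; do
    url="$url&$param"
  done
  printf '%s\n' "$url"
)
```

For example, to report a 50% sample of the last day's messages that Rspamd tagged as spam:
curl -XPOST "$(report_url reportSpam period=1d samplingProbability=0.5 classifiedAsSpam=true)"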
The scheduled task will have the following type FeedSpamToRspamdTask and the following additionalInformation:
{
"errorCount": 1,
"reportedSpamMessageCount": 2,
"runningOptions": {
"messagesPerSecond": 10,
"rspamdTimeoutInSeconds": 15,
"periodInSecond": 3600,
"samplingProbability": 1.0
},
"spamMessageCount": 4,
"timestamp": "2007-12-03T10:15:30Z",
"type": "FeedSpamToRspamdTask"
}
One can use this route to schedule a task that reports ham messages to Rspamd for its spam classification learning.
This task can be configured to report ham messages to per-user Bayes via perUserBayes in rspamd.properties.
curl -XPOST 'http://ip:port/rspamd?action=reportHam'
This endpoint has the following parameters:
- action (required): must be reportHam
- messagesPerSecond (optional): concurrent learns performed against Rspamd. Defaults to 10.
- period (optional): duration (many time units are supported; defaults to seconds). Only messages received between now and now - duration are reported. By default, all messages are reported. These inputs represent the same duration: 1d, 1day, 86400 seconds, 86400...
- samplingProbability (optional): float between 0 and 1, representing the chance that each given message is reported to Rspamd. By default, all messages are reported.
- classifiedAsSpam (optional): boolean; true to include only messages tagged as spam by Rspamd, false to include only messages tagged as ham by Rspamd. If omitted, all messages are included.
- rspamdTimeout (optional): duration of the timeout for HTTP requests to Rspamd while learning. Defaults to 15 seconds.
Will return the task id. E.g.:
{
"taskId": "70c12761-ab86-4321-bb6f-fde99e2f74b0"
}
Response codes:
- 201: Task generation succeeded. Corresponding task id is returned.
- 400: Invalid arguments supplied in the user request.
More details about endpoints returning a task.
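Both report routes return a task id, which can then be followed through the generic WebAdmin task routes. This sketch pulls the id out of the JSON response with sed, so the call can be scripted without a JSON tool such as jq; task_id_from_json is a hypothetical helper name, and ip:port remains a placeholder.

```shell
#!/bin/sh
# task_id_from_json   (hypothetical helper)
# Reads a task-creation response such as {"taskId": "70c12761-..."} on stdin
# and prints the taskId value. A plain sed expression is used instead of a
# JSON parser, so it only handles this simple, flat response shape.
task_id_from_json() {
  sed -n 's/.*"taskId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage, waiting for the ham report task to complete and printing its details:
TASK_ID=$(curl -s -XPOST 'http://ip:port/rspamd?action=reportHam' | task_id_from_json)
curl -XGET "http://ip:port/tasks/$TASK_ID/await"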
The scheduled task will have the following type FeedHamToRspamdTask and the following additionalInformation:
{
"errorCount": 1,
"reportedHamMessageCount": 2,
"runningOptions": {
"messagesPerSecond": 10,
"rspamdTimeoutInSeconds": 15,
"periodInSecond": 3600,
"samplingProbability": 1.0
},
"hamMessageCount": 4,
"timestamp": "2007-12-03T10:15:30Z",
"type": "FeedHamToRspamdTask"
}
Alternatively, ham/spam can be reported by using a mailbox listener. To do so, enable RspamdListener in the listeners.xml configuration file:
<listeners>
<listener>
<class>org.apache.james.rspamd.RspamdListener</class>
<async>true</async>
</listener>
</listeners>
Note that you can turn off reportAdded (which reports incoming messages as ham), resulting in less work:
<listeners>
<listener>
<class>org.apache.james.rspamd.RspamdListener</class>
<async>true</async>
<configuration>
<reportAdded>false</reportAdded>
</configuration>
</listener>
</listeners>
Note: Kvrocks integration is currently a work in progress and under triage on a realistic setup. As of today, the Apache James PMC does not endorse its use in production environments.
The Rspamd extension can use Apache Kvrocks as storage. Apache Kvrocks is a more suitable option for Rspamd storage compared to Redis for several reasons:
- Kvrocks stores data on disk, which is beneficial when dealing with large datasets that may not fit entirely in memory. This ensures that you can handle more extensive spam training data without running into Redis memory limitations.
- Kvrocks is compatible with the Redis API.
We document the corresponding docker compose setup:
- Apache James + Rspamd + Apache Kvrocks Sentinel
Please note that to make Rspamd work well with Kvrocks Sentinel:
- Configure slave-read-only no in the kvrocks.conf file (this allows Rspamd to execute the read-only Lua script that fetches its Bayes statistics from the Kvrocks replicas, which Kvrocks is strict about by default).
- Use Rspamd 3.10 or later.
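Setting the slave-read-only directive can be scripted. This sketch edits kvrocks.conf in place, rewriting an existing slave-read-only line or appending one if none exists; the file location and the KVROCKS_CONF variable are hypothetical placeholders for your own setup.

```shell
#!/bin/sh
# Path to the Kvrocks configuration file (hypothetical placeholder).
KVROCKS_CONF="${KVROCKS_CONF:-kvrocks.conf}"
touch "$KVROCKS_CONF"

if grep -q '^[[:space:]]*slave-read-only[[:space:]]' "$KVROCKS_CONF"; then
  # Rewrite the existing directive; a temp file keeps this portable
  # across sed implementations that differ on in-place editing.
  sed 's/^[[:space:]]*slave-read-only[[:space:]].*/slave-read-only no/' \
    "$KVROCKS_CONF" > "$KVROCKS_CONF.tmp" && mv "$KVROCKS_CONF.tmp" "$KVROCKS_CONF"
else
  # No active directive yet (commented-out lines are ignored): append one.
  echo 'slave-read-only no' >> "$KVROCKS_CONF"
fi
```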
The following sample uses RedisShake to migrate data from Redis to Kvrocks.
Sample command:
docker run --network=emaily \
--entrypoint "/bin/sh" \
-v ${PWD}/sample-configuration/redis-shake/shake.toml:/app/shake.toml \
-e SHAKE_SRC_ADDRESS=redis:6379 \
-e SHAKE_DST_ADDRESS=kvrocks:6379 \
ghcr.io/tair-opensource/redisshake:4.4.0 \
-c "./redis-shake /app/shake.toml"