| |||
| Spam Checking
I do not know if I should be posting in this section, but I added a routine to the libs/link.php file that does a surbl check on the link for a news article. It can also check the contents of a user-supplied comment for a spam link. To start, I added the following switch to config.php:
Code:
define('CHECK_SPAM', true);
Code:
function check_spam($text ) {
    // get site names found in the body of the passed text
    $regex_url = "/(www\.)([^\/\"<\s]*)/im";
    $mk_regex_array = array();
    preg_match_all($regex_url, $text, $mk_regex_array);
    for( $cnt=0; $cnt < count($mk_regex_array[2]); $cnt++ ) {
        $domain_to_test = rtrim($mk_regex_array[2][$cnt],"\\");
        if (strlen($domain_to_test) > 3)
        {
            $domain_to_test = $domain_to_test . ".multi.surbl.org";
            if( strstr(gethostbyname($domain_to_test),'127.0.0')) {
                return true;
            }
        }
    }
    return false;
}
It can handle either a full comment or just a URL. I implemented it to look only at the URL by adding a check to the get function:
Code:
// spam check -- return invalid URL if spam
if( CHECK_SPAM && $this->check_spam( $url))
{ $this->valid = false; return; }
The downside is that the domain check makes each URL submission take a little longer. The upside is that it reduces the workload for the admin and could lessen the amount of spam being placed in the system. Should I continue? |
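For anyone trying the surbl check above, here is a minimal sketch of the same idea split into testable pieces. The function names (`surbl_query_name`, `surbl_is_listed`, `host_is_spam`) are hypothetical and not part of Pligg. It assumes, as the posted code does, that any 127.0.0.x answer means the domain is listed, and it relies on the documented behavior that `gethostbyname()` returns the unmodified name when the lookup fails. Stripping a leading "www." is also an assumption about what should be queried.

```php
<?php
// Hypothetical helpers; names are illustrative and not part of Pligg.

// Build the DNS name to query, e.g. "example.com.multi.surbl.org".
function surbl_query_name($host, $zone = 'multi.surbl.org')
{
    // Lower-case, then drop trailing dots/backslashes and a leading "www.".
    $host = strtolower(trim($host));
    $host = rtrim($host, ".\\");
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    return $host . '.' . $zone;
}

// Decide whether a lookup result means "listed".
// gethostbyname() returns the name it was given when resolution fails,
// so only a literal 127.0.0.x address counts as a hit.
function surbl_is_listed($lookupResult)
{
    return preg_match('/^127\.0\.0\.\d{1,3}$/', $lookupResult) === 1;
}

// Tie the two together. $resolver can be swapped out so the logic
// can be exercised without touching the network.
function host_is_spam($host, $resolver = 'gethostbyname')
{
    $answer = call_user_func($resolver, surbl_query_name($host));
    return surbl_is_listed($answer);
}
```

Anchoring the match on the whole answer, rather than searching for '127.0.0' anywhere in it, avoids a false hit if the unresolved name itself happens to contain that text.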
| ||||
| Also, if you have some time to enhance this, please do. Thanks again!
__________________ - Ash |
| |||
| I will try to get at it. How should I let you know about the hacks I have come up with? Right now I am trying to figure out how to make it easy for pligg to be installed in sub-directories. Well, it is easy; it just takes a lot of editing and testing to make sure everything still works. I am also, for my own purposes, taking a hard look at the templates. I am trying to model things after some nice open source work being done. My goal is to prevent any questions about the origin of the css. I have also found that some of the css does not actually work as intended. Figuring out why is another matter. Today, I added a singular variable for votes cast. It is just a style thing, but my journalistic side kicks in from time to time. These are just a couple of things I am playing with. Every day seems to take me to another part of the source. |
| |||
| I updated the code for the function so that it can use a rule set as well as contacting the surbl list to check domain names. I created three more global variables to hold the names of a local and an imported spam rules file and a log file. I have a master file assembled. I am just working on a way to update it based on some of the blacklist projects which are around. I will host the rules as part of another antispam project I have undertaken. I need to work out an updating method. Instead of replacing the full file, people would just update the file they have in hand. People need to maintain their own local rules. I need to extend the program on which this is based so that it integrates better into pligg and people can make these changes through a web interface. At any rate, following is the latest version of the antispam module functions:
Code:
function check_spam($text )
{
    global $MAIN_SPAM_RULESET;
    global $USER_SPAM_RULESET;
    $regex_url = "/(http:\/\/|https:\/\/|ftp:\/\/|www\.)([^\/\"<\s]*)/im";
    $mk_regex_array = array();
    preg_match_all($regex_url, $text, $mk_regex_array);
    for( $cnt=0; $cnt < count($mk_regex_array[2]); $cnt++ )
    {
        $test_domain = rtrim($mk_regex_array[2][$cnt],"\\");
        // check the length of the domain just extracted ($domain_to_test is not set yet at this point)
        if (strlen($test_domain) > 3)
        {
            $domain_to_test = $test_domain . ".multi.surbl.org";
            if( strstr(gethostbyname($domain_to_test),'127.0.0'))
            { logSpam( "surbl rejected $test_domain"); return true; }
        }
    }
    $retVal = check_spam_rules( $MAIN_SPAM_RULESET, $text);
    if(!$retVal) { $retVal = check_spam_rules( $USER_SPAM_RULESET, $text); }
    return $retVal;
}
#####################################
# check a file of local rules
# . . the rules are written in a regex format for php
# . . or one entry per line eg: bigtimespammer.com on one line
# . . anything after #### on a line is treated as a comment
####################
function check_spam_rules( $ruleFile, $text)
{
    if(!file_exists( $ruleFile)) { echo $ruleFile . " does not exist\n"; return false; }
    $handle = fopen( $ruleFile, "r");
    while (!feof($handle))
    {
        $buffer = fgets($handle, 4096);
        if ($buffer === false) { break; }
        $splitbuffer = explode("####", $buffer);
        $expression = $splitbuffer[0];
        $explodedSplitBuffer = explode("/", $expression);
        // trim first so a blank line does not become an empty pattern, which would match everything
        $expression = trim($explodedSplitBuffer[0]);
        if (strlen($expression) > 0)
        {
            if(preg_match("/".$expression."/", $text))
            { logSpam( "$ruleFile violation: $expression"); fclose($handle); return true; }
        }
    }
    fclose($handle);
    return false;
}
##############
## log date, time, IP address and rule which triggered the spam
##############
function logSpam($message)
{
    global $SPAM_LOG_BOOK;
    $ip = "127.0.0.0";
    if(!empty($_SERVER["REMOTE_ADDR"])) { $ip = $_SERVER["REMOTE_ADDR"]; }
    $date = date('M-d-Y');
    $timestamp = time();
    $message = $date . "\t" . $timestamp . "\t" . $ip . "\t" . $message . "\n";
    $file = fopen( $SPAM_LOG_BOOK, "a");
    fwrite( $file, $message );
    fclose($file);
}
Code:
$MAIN_SPAM_RULESET = "antispam.txt";
$USER_SPAM_RULESET = "local-antispam.txt";
$SPAM_LOG_BOOK = "spamlog.log"; |
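To make the rules-file format a little more concrete, here is a rough sketch of how one line could be turned into a pattern. The names `parse_spam_rule` and `text_matches_rules` are hypothetical. The only thing taken from the posted code is that "####" starts a comment. Unlike the posted function, this sketch uses "~" as the regex delimiter so that a rule may contain "/", which is a design choice of the sketch, not something the original does.

```php
<?php
// Hypothetical helper: turn one line of a rules file into a pattern,
// or return null when the line holds no rule.
function parse_spam_rule($line)
{
    // Everything after "####" is treated as a comment.
    $parts = explode('####', $line, 2);
    $expression = trim($parts[0]);
    if ($expression === '') {
        return null; // blank or comment-only line
    }
    // Use "~" as the delimiter and escape it inside the rule.
    return '~' . str_replace('~', '\~', $expression) . '~';
}

// Return true as soon as any rule matches the text.
function text_matches_rules(array $lines, $text)
{
    foreach ($lines as $line) {
        $pattern = parse_spam_rule($line);
        // preg_match() returns false for an invalid pattern; "@" hides
        // the warning so one bad rule does not break the whole check.
        if ($pattern !== null && @preg_match($pattern, $text) === 1) {
            return true;
        }
    }
    return false;
}
```

A rules file in this format might contain a line such as `bigtimespammer\.com #### known spammer`, and `file($ruleFile)` could supply the `$lines` array.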
| ||||
| The code needed a few changes, such as changing this:
$retVal = check_spam_rules( $MAIN_SPAM_RULESET, $text);
to this:
$retVal = $this->check_spam_rules($MAIN_SPAM_RULESET, $text);
But it blocked a domain that I put into the antispam.txt file, and logged it. I'm a bit confused as to why there is both an antispam.txt and a local-antispam.txt. Thanks!
__________________ - Ash |
| |||
| Sorry to take so long to get back. I have been busy. My idea is to create a master file of spam rules which could be added to over time. This would allow the system to draw in rules from other projects and those that I come across in my own work. Individual users would be able to create a local file containing rules which are tailored to their specific needs and which might not work well in a global rule set. I keep meaning to draft the global rule set and post a link. I will incorporate that into my main antispam project and post a link as soon as I am done. |
| |||
| I've been having a look at the antispam implementation in this project. I think it's possible that this script could move away from a flat-file system to a MySQL-based antispam system, giving you more flexibility and fewer files. This would render the following files unnecessary: antispam.txt, local-antispam.txt, and spamlog.log. Would anyone be interested in my implementation of a database-driven version? Also, just a point to be made: this is perhaps not a very proactive method of spam reduction. It might be worth giving users the ability to use regex (regular expressions), as regex is possibly the best method of matching spam. If you are interested in this, let me know and I will implement it. |
| |||
| I was thinking a plugin for SpamAssassin would be a good thing. It supports surbl and adds a bunch of other tests. Plus you can always write your own rules. |
| |||
| Quote:
How would you propose to implement this? |