  #1 (permalink)  
Old 01-18-2006, 03:17 AM
New Pligger
 
Join Date: Jan 2006
Posts: 18
Thanks: 0
Thanked 0 Times in 0 Posts
Spam Checking

I do not know if I should be posting in this section, but I incorporated a routine in the libs/link.php file to do a SURBL check on the link for a news article. It can also check the contents of a user-supplied comment for spam links.

To start, I added the following switch to config.php:

Code:
define('CHECK_SPAM', true);
I added the following function to libs/link.php:

Code:
        function check_spam($text ) {
                //get site names found in body of passed text
                $regex_url   = "/(www\.)([^\/\"<\s]*)/im";
                $mk_regex_array = array();
                preg_match_all($regex_url, $text, $mk_regex_array);

                for( $cnt=0; $cnt < count($mk_regex_array[2]); $cnt++ ) {
                        $domain_to_test = rtrim($mk_regex_array[2][$cnt],"\\");

                        if (strlen($domain_to_test) > 3)
                        {
                                $domain_to_test = $domain_to_test . ".multi.surbl.org";
                                if( strstr(gethostbyname($domain_to_test),'127.0.0')) {
                                        return true;
                                }
                        }
                }
                return false;
        }

It can handle either a full comment or just a URL. For now I only check the URL, by adding this test to the get function:

Code:
                // spam check -- return invalid URL if spam
                if( CHECK_SPAM && $this->check_spam( $url))
                       { $this->valid = false; return; }
This code is adapted from NP_Blacklist, which is used by the Nucleus CMS. It would be easy to also add a local blacklist if this is of interest.

The downside is this takes a little longer to do the domain check on a URL link.
The upside is it reduces the workload for the admin and could lessen the amount of spam being placed in the system.
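
One way to keep the lookup cost down is to resolve each distinct domain only once per submission. What follows is a minimal sketch of that idea, not the code posted above: `extract_domains`, `is_listed`, and the `$resolver` parameter are hypothetical names introduced here, and the resolver is injectable so the logic can be exercised without real DNS traffic.

```php
<?php
// Hypothetical helper: collect each distinct host once, so repeated links
// in one comment do not trigger repeated DNS lookups.
function extract_domains($text) {
    $matches = array();
    // Capture the host after a scheme and/or a leading "www."
    $regex = '/(?:(?:https?|ftp):\/\/(?:www\.)?|www\.)([^\/"<\s]+)/i';
    preg_match_all($regex, $text, $matches);

    $hosts = array();
    foreach ($matches[1] as $host) {
        $host = strtolower(rtrim($host, "\\."));
        if (strlen($host) > 3) {
            $hosts[] = $host;
        }
    }
    return array_values(array_unique($hosts));
}

// Hypothetical helper: ask the SURBL zone about one domain.
// $resolver defaults to gethostbyname(), which returns its argument
// unchanged when the name does not resolve.
function is_listed($domain, $resolver = 'gethostbyname') {
    $query  = $domain . '.multi.surbl.org';
    $answer = call_user_func($resolver, $query);
    return $answer !== $query && strpos($answer, '127.0.0.') === 0;
}
```

A caller would loop over `extract_domains($text)` and stop at the first domain for which `is_listed()` returns true.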

Should I continue?
  #2 (permalink)  
Old 01-23-2006, 11:29 PM
AshDigg's Avatar
Coder
 
Join Date: Dec 2005
Posts: 1,574
Thanks: 235
Thanked 345 Times in 206 Posts
This will be implemented in beta 5.9. It will be off by default, but it'll be there.

Thank you for helping with this!
__________________
- Ash
  #3 (permalink)  
Old 01-23-2006, 11:30 PM
AshDigg's Avatar
Coder
 
Join Date: Dec 2005
Posts: 1,574
Thanks: 235
Thanked 345 Times in 206 Posts
Also, if you have some time to enhance this, please do.

thanks again!
__________________
- Ash
  #4 (permalink)  
Old 01-24-2006, 04:58 PM
New Pligger
 
Join Date: Jan 2006
Posts: 18
Thanks: 0
Thanked 0 Times in 0 Posts
I will try to get at it.

How should I approach letting you know about the hacks I have come up with? Right now I am trying to figure out how to make it easy for Pligg to be installed in sub-directories. Well, it is easy, it just takes a lot of editing and testing to make sure everything still works.
I am also, for my own purposes, taking a hard look at the templates. I am trying to model things after some nice open source work being done. My goal is to prevent any questions about the origin of the CSS.
I have also found that some of the CSS actually does not work as intended. Figuring out why is another matter.

Today, I added a singular variable for votes cast. It is just a style thing, but my journalistic side kicks in from time to time.

These are just a couple of things I am playing with. Every day seems to take me to another part of the source.
  #5 (permalink)  
Old 01-30-2006, 10:30 AM
New Pligger
 
Join Date: Jan 2006
Posts: 18
Thanks: 0
Thanked 0 Times in 0 Posts
I updated the code for the function so that it can apply a rule set as well as query the SURBL list to check domain names.

I created three more global variables to hold the names of a main (imported) spam rules file, a local spam rules file, and a log file. I have a master file assembled. I am just working on a way to update it based on some of the blacklist projects which are around. I will host the rules as part of another antispam project I have undertaken. I need to work out an updating method. Instead of downloading the full file again, people would just update the file they have in hand.

People need to maintain their own local rules. I need to extend the program on which this is based to better integrate into pligg so they can make these changes through a web interface.

At any rate, following is the latest version of the antispam module functions:

Code:
function check_spam($text )
{
global $MAIN_SPAM_RULESET;
global $USER_SPAM_RULESET;

$regex_url   = "/(http:\/\/|https:\/\/|ftp:\/\/|www\.)([^\/\"<\s]*)/im";
$mk_regex_array = array();
preg_match_all($regex_url, $text, $mk_regex_array);

for( $cnt=0; $cnt < count($mk_regex_array[2]); $cnt++ )
    {
    $test_domain = rtrim($mk_regex_array[2][$cnt],"\\");
    if (strlen($test_domain) > 3)
        {
        $domain_to_test = $test_domain . ".multi.surbl.org";
        if( strstr(gethostbyname($domain_to_test),'127.0.0'))
            { logSpam( "surbl rejected $test_domain");  return true; }
        }
    }
$retVal = check_spam_rules( $MAIN_SPAM_RULESET, $text);
if(!$retVal) { $retVal = check_spam_rules( $USER_SPAM_RULESET, $text); }

return $retVal;
}

#####################################
# check a file of local rules
# . . the rules are written in a regex format for php
#     . . or one entry per line eg: bigtimespammer.com on one line 
####################

function check_spam_rules( $ruleFile, $text)
{
if(!file_exists( $ruleFile)) { echo $ruleFile . " does not exist\n"; return false; }
$handle = fopen( $ruleFile, "r");
while (($buffer = fgets($handle, 4096)) !== false)
    {
    // drop anything after "####" (comment) and after the first "/"
    $splitbuffer = explode("####", $buffer);
    $explodedSplitBuffer = explode("/", $splitbuffer[0]);
    $expression = trim($explodedSplitBuffer[0]);
    // skip blank lines: an empty pattern would match every text
    if (strlen($expression) > 0)
        {
        if(preg_match("/".$expression."/", $text))
            { fclose($handle); logSpam( "$ruleFile violation: $expression"); return true; }
        }
    }
fclose($handle);
return false;
}

##############
## log date, time, IP address and rule which triggered the spam
##############

function logSpam($message)
{
global $SPAM_LOG_BOOK;

$ip = "127.0.0.0";
if(!empty($_SERVER["REMOTE_ADDR"])) { $ip = $_SERVER["REMOTE_ADDR"]; }
$date = date('M-d-Y');
$timestamp = time();

$message = $date . "\t" . $timestamp . "\t" . $ip . "\t" . $message . "\n";

$file = fopen( $SPAM_LOG_BOOK, "a");
fwrite( $file, $message );
fclose($file);
}
The three new global variables are:

Code:
$MAIN_SPAM_RULESET = "antispam.txt";
$USER_SPAM_RULESET = "local-antispam.txt";
$SPAM_LOG_BOOK = "spamlog.log";
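
To make the rule-file format concrete, here is a small sketch of how a single line could be reduced to a pattern and matched. It is an illustration rather than the module's code: `parse_rule_line` and `matches_any_rule` are hypothetical names, the sample rules are invented, and it assumes that text after `####` is a comment.

```php
<?php
// Hypothetical helper: reduce one rule-file line to a bare pattern.
// Mirrors the splits used above: drop anything after "####", then
// anything after the first "/", then surrounding whitespace.
function parse_rule_line($line) {
    $parts = explode('####', $line);
    $parts = explode('/', $parts[0]);
    return trim($parts[0]);
}

// Returns the first rule that matches $text, or null if none does.
// Blank lines are skipped, because an empty pattern matches every input.
function matches_any_rule(array $lines, $text) {
    foreach ($lines as $line) {
        $pattern = parse_rule_line($line);
        if ($pattern === '') {
            continue;
        }
        if (preg_match('/' . $pattern . '/', $text) === 1) {
            return $pattern;
        }
    }
    return null;
}

// Invented sample rules, for illustration only.
$rules = array(
    "bigtimespammer\\.com #### plain domain entry\n",
    "\n",
    "cheap-(pills|loans)\n",
);
```

Calling `matches_any_rule($rules, $text)` returns the pattern that fired, which is the piece of information the log line records.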
  #6 (permalink)  
Old 01-30-2006, 11:22 PM
AshDigg's Avatar
Coder
 
Join Date: Dec 2005
Posts: 1,574
Thanks: 235
Thanked 345 Times in 206 Posts
The code needed a few changes like changing from this

$retVal = check_spam_rules( $MAIN_SPAM_RULESET, $text);

to this

$retVal = $this->check_spam_rules($MAIN_SPAM_RULESET, $text);

But it blocked a domain that I put into the antispam.txt file, and logged it.


I'm a bit confused as to why there is antispam.txt and local-antispam.txt.

thanks!
__________________
- Ash
  #7 (permalink)  
Old 02-09-2006, 12:03 AM
New Pligger
 
Join Date: Jan 2006
Posts: 18
Thanks: 0
Thanked 0 Times in 0 Posts
Sorry to take so long to get back. I have been busy. My idea is to create a master file of spam rules which could be added to over time. This would allow the system to draw in rules from other projects and those that I come across in my own work.

Individual users would have the ability to create a local file, which would contain rules which are tailored to their specific needs and which might not work well in a global rule set.

I keep meaning to draft the global rule set and post a link. I will incorporate that into my main antispam project and post a link as soon as I am done.
  #8 (permalink)  
Old 11-16-2006, 05:33 PM
Casual Pligger
 
Join Date: Oct 2006
Posts: 81
Thanks: 0
Thanked 8 Times in 6 Posts
I've been having a look at the antispam implementation in this project.

I think it's possible that this script could move away from a flat-file-based to a MySQL-based antispam system, giving you more flexibility and fewer files.

This would render the following files unnecessary: antispam.txt, local-antispam.txt, and spamlog.log

Would anyone be interested in my implementation of a database driven version?

Also, just a point to be made, this is perhaps not a very pro-active method of spam reduction. It might be worth looking at giving the ability for users to use regex (regular expressions), as regex is possibly the best method of matching spam.

If you are interested in this, let me know, I will implement it.
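
If users were allowed to enter their own regular expressions, each pattern would need checking before it is stored, since a malformed one makes `preg_match` fail with a warning. Below is a minimal sketch of such a check. `is_valid_rule` is a hypothetical name, and the `~` delimiter is an assumption chosen so that patterns may contain `/`; a pattern with an unescaped `~` is simply rejected.

```php
<?php
// Hypothetical helper: accept a user-supplied pattern only if PCRE can
// compile it. Empty patterns are refused because they match everything.
function is_valid_rule($pattern) {
    if (trim($pattern) === '') {
        return false;
    }
    $wrapped = '~' . $pattern . '~';
    // preg_match() returns false and raises a warning when the pattern
    // does not compile; @ suppresses that warning.
    return @preg_match($wrapped, '') !== false;
}
```

The same check works whether the accepted rules end up in a text file or in a database table.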
  #9 (permalink)  
Old 11-17-2006, 01:17 PM
Casual Pligger
 
Join Date: Oct 2006
Posts: 35
Thanks: 1
Thanked 6 Times in 4 Posts
I was thinking a plugin for SpamAssassin would be a good thing. It supports SURBL and brings a bunch of other tests. Plus you can always write your own rules.
  #10 (permalink)  
Old 11-20-2006, 08:51 AM
Casual Pligger
 
Join Date: Oct 2006
Posts: 81
Thanks: 0
Thanked 8 Times in 6 Posts
Quote:
Originally Posted by aixelsyd View Post
I was thinking a plugin for SpamAssassin would be a good thing. It supports SURBL and brings a bunch of other tests. Plus you can always write your own rules.
Sounds fantastic.

How would you propose to implement this?