Auto Tagging

  #1 (permalink)  
Old 11-08-2007, 07:10 AM
New Pligger
Pligg Version: 9.8.2
Pligg Template: Emerald
 
Join Date: Nov 2007
Posts: 29
OK.... another day... another step closer to the vision ..... more questions

People are goddam lazy when it comes to tagging. I know I am, and don't doubt all submitters will be too. So, does anyone know a way to auto-insert-tags?

I would like this to happen in two different scenarios which would require distinct processes as far as I can tell:

1. During submission ... take all words in the title and put them in the tags box (perhaps with an ignore list for words like 'the' 'and' 'it' etc). The tag box would remain editable so if the user didn't like the suggestions they could change them.

2. RSS imports - same thing: take all words from the title and put them in the tags. Obviously there is no user involved, so these would simply be saved.

However there is a 3rd possibility... a job that I can run occasionally that runs on all untagged stories in the db. It could run once an hour and do:
- find stories where tagfield = null
- parse title for those stories for unique words in story title excluding those words we store in a file somewhere (if, the, and, what, etc)
- write remaining words to tagfield
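
The hourly job outlined above could be sketched roughly as follows. This is only an illustration under stated assumptions: the stories are plain PHP arrays rather than rows read from the Pligg database, the keys `link_title` and `link_tags` are borrowed from field names mentioned elsewhere in this thread, and `load_stop_words()`, `title_to_tags()` and `tag_untagged_stories()` are made-up helper names, not Pligg functions.

```php
<?php
// Hypothetical helper: read one stop word per line from a text file.
function load_stop_words(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    return array_map('strtolower', array_map('trim', $lines));
}

// Hypothetical helper: unique lowercase title words minus the stop words.
function title_to_tags(string $title, array $stopWords): string
{
    $clean = preg_replace('/[^a-z0-9 ]+/', ' ', strtolower($title));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    $words = array_diff(array_unique($words), $stopWords);
    return implode(', ', $words);
}

// Fill in tags only for stories whose tag field is null or empty.
function tag_untagged_stories(array $stories, array $stopWords): array
{
    foreach ($stories as $i => $story) {
        if (empty($story['link_tags'])) {
            $stories[$i]['link_tags'] = title_to_tags($story['link_title'], $stopWords);
        }
    }
    return $stories;
}

// Demo with a temporary stop-word file and two in-memory stories.
$stopFile = tempnam(sys_get_temp_dir(), 'stop');
file_put_contents($stopFile, "if\nthe\nand\nwhat\nabout\nyour\n");
$stopWords = load_stop_words($stopFile);
unlink($stopFile);

$stories = [
    ['link_title' => 'The Truth about Your Seasonal Allergies', 'link_tags' => null],
    ['link_title' => 'What is Pligg?', 'link_tags' => 'pligg, cms'],
];
$tagged = tag_untagged_stories($stories, $stopWords);
echo $tagged[0]['link_tags'], "\n";
```

Stories that already have tags are left alone, so a job like this could be run repeatedly without overwriting anything a user typed in.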

Now here is the biggest problem.... I am a terrible programmer... I can 'hello world' in a million languages but not much more lol.
So is there a solution out there for any of these scenarios? Anyone have something similar that could be adapted?
Or am I screwed in which case which do you think would be the easiest approach for me to try and tackle?
  #2 (permalink)  
Old 11-08-2007, 10:34 AM
New Pligger
Pligg Version: 9.8.2
Pligg Template: Emerald
 
Join Date: Nov 2007
Posts: 29
Well I have been playing around with the RSS feed side of things today with reasonable success.

Simply creating a field link between the link_title of the feed and the tag field of Pligg seems to work most of the time on the few feeds I tried it on. However, it occasionally says there was an error in the SQL, and I think this is due to a symbol such as ? in the title confusing the query. Plus I get a lot of the crap words like 'the' and 'and' in the tags.... there is obviously no intelligence in this method.

So right now I would like to focus on the user side of things if anyone has ideas?
  #3 (permalink)  
Old 11-16-2007, 05:49 AM
New Pligger
Pligg Version: 9.8.2
Pligg Template: Emerald
 
Join Date: Nov 2007
Posts: 29
OK, not that I think many are paying attention but as the vast majority of my content so far is pulled in from RSS feeds I am fairly happy with the solution of creating a field link between the title of the feed and the pligg tag field. It works and puts all the words from the title into the tags (I only have it enabled on a couple of the imported fields atm but that will change soon).

My concern now is only getting rid of all the words from the tags that are not important.

e.g. the, and, at, to, some, many, a, then, that, etc

My only blunt instrument to do this is for me to run a delete query on the db as a cron job. However my host only allows me one cron job (lol) which is being used for the RSS import itself.

So I was wondering if anyone could be my saviour and suggest the code to run this query from the RSS import itself, i.e. appended to the end of the import.php

e.g. delete link_tags from pligg_links where link_tags contains 'at' or 'or' or 'and' or 'that' or 'they' .... etc etc
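
One way to get a similar effect without a second cron job might be to clean each tag string in PHP before it is saved, instead of deleting afterwards with SQL. A rough sketch, assuming the tags are held as a comma-separated string; `strip_stop_tags()` and the stop-word list are made up for illustration, and where exactly this would hook into import.php is not something this thread establishes.

```php
<?php
// Hypothetical helper: drop unimportant words from a comma-separated tag string.
function strip_stop_tags(string $tags, array $stopWords): string
{
    $kept = [];
    foreach (explode(',', $tags) as $tag) {
        $tag = trim($tag);
        // Compare case-insensitively and skip blanks left by stray commas.
        if ($tag !== '' && !in_array(strtolower($tag), $stopWords, true)) {
            $kept[] = $tag;
        }
    }
    return implode(', ', $kept);
}

$stopWords = ['the', 'and', 'at', 'to', 'some', 'many', 'a', 'then', 'that', 'or', 'they'];
echo strip_stop_tags('the, Cat, sat, at, that, mat', $stopWords), "\n";
```

Because each tag is compared as a whole word, removing 'at' does not touch 'Cat' or 'sat', which a plain "contains" match in SQL would.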
  #4 (permalink)  
Old 11-16-2007, 06:38 AM
Constant Pligger
 
Join Date: Feb 2007
Posts: 226
Sorry, can't help you with your problem but want to ask something.

When you're using the feed title as tags, I think the whole title becomes one tag because there are no commas. Am I wrong?
Do the tags that you import from feeds appear in your tag cloud? At least a few months ago there was some kind of bug with this.
  #5 (permalink)  
Old 11-16-2007, 08:43 AM
New Pligger
Pligg Version: 9.8.2
Pligg Template: Emerald
 
Join Date: Nov 2007
Posts: 29
Crap you are right

so now the query needs to replace spaces with comma & space perhaps? I'll delve into the db itself and see exactly what is written and whether it is correctly showing in the tag cloud.
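
Turning a space-separated title into a comma-separated list is a small string operation in PHP. A minimal sketch, where `spaces_to_commas()` is a made-up name rather than anything from Pligg:

```php
<?php
// Hypothetical helper: split on any run of whitespace and rejoin with ", ".
function spaces_to_commas(string $title): string
{
    $words = preg_split('/\s+/', trim($title), -1, PREG_SPLIT_NO_EMPTY);
    return implode(', ', $words);
}

echo spaces_to_commas('Pligg  auto tagging'), "\n";
```

This leaves punctuation and filler words in place, so it only addresses the "whole title is one tag" problem.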
  #6 (permalink)  
Old 11-19-2007, 11:11 AM
Pligg Donor
Pligg Version: 9.9.5
 
Join Date: Sep 2007
Posts: 192
Well, I'm sure this won't exactly be helpful, but try using "category" as your tag from the imported feeds. I know it will be only one keyword usually, but at least it will be a useful tag.

On another note, I have been editing the rss feed I import by creating my own xml file for it. I edit the original file and manually insert the tag <keywords> for each item. So my new xml file looks like this:

Code:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Somesite.com</title>
<link>http://www.somesite.com</link>
<description>Here is my description</description>
<language>en</language>
<copyright>Somesite.com</copyright>
<lastBuildDate>Fri, 16 Nov 2007 08:05:02 +0300</lastBuildDate>
<managingEditor>info@somesite.com</managingEditor>
<item>
 <title>My first title</title>
 <link>http://www.somesite.com/my-first-link.html</link>
 <keywords>keyword 1, keyword 2, keyword 3</keywords>
 <description>Pligg is a great script....</description>
 <category>Category 1</category>
 <pubDate>Fri, 16 Nov 2007 08:06:02 +0300</pubDate>
</item>
<item>
 <title>My second title</title>
 <link>http://www.somesite.com/my-second-link.html</link>
 <keywords>keyword 4, keyword 5, keyword 6</keywords>
 <description>Don't you just love pligg?!!</description>
 <category>Category 2</category>
 <pubDate>Fri, 16 Nov 2007 08:07:02 +0300</pubDate>
</item>
</channel>
</rss>
I then save the file, upload it to my server, and import that file as an RSS feed. I then use the keywords_field as a tags_field. It's a lot of extra work (especially if you have a lot of items), but I find it necessary. It takes me about an hour to do roughly 100 items. I only use one RSS feed though, so I am sure this is not the best approach if you import many RSS feeds, as it would be a lot of extra work.
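
The manual step of adding <keywords> to each item could probably be scripted. The sketch below assumes PHP's SimpleXML extension is available and that deriving keywords from the item title is acceptable; `add_keywords_to_feed()` and the stop-word list are illustrative only and not part of Pligg.

```php
<?php
// Hypothetical helper: add a <keywords> element to every <item> lacking one.
function add_keywords_to_feed(string $xml, array $stopWords): string
{
    $rss = simplexml_load_string($xml);
    if ($rss === false) {
        throw new InvalidArgumentException('Feed is not well-formed XML');
    }
    foreach ($rss->channel->item as $item) {
        if (isset($item->keywords)) {
            continue; // keep keywords that were entered by hand
        }
        // Only letters, digits and spaces survive, so the value is XML-safe.
        $clean = preg_replace('/[^a-z0-9 ]+/', ' ', strtolower((string) $item->title));
        $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
        $words = array_diff(array_unique($words), $stopWords);
        $item->addChild('keywords', implode(', ', $words));
    }
    return $rss->asXML();
}

$feed = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Somesite.com</title>
<item>
 <title>My first title</title>
 <link>http://www.somesite.com/my-first-link.html</link>
</item>
</channel>
</rss>
XML;

$out = simplexml_load_string(add_keywords_to_feed($feed, ['my', 'the', 'and']));
echo $out->channel->item[0]->keywords, "\n";
```

Items that already carry a hand-written <keywords> element are skipped, so the script could be combined with manual editing for the items that matter most.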

Sorry I can't be more helpful.
  #7 (permalink)  
Old 11-19-2007, 01:10 PM
clems365
Casual Pligger
Pligg Version: 9.8.2
Pligg Template: CMS Theme
 
Join Date: May 2007
Location: santiago chile
Posts: 56
Maybe another way is to compare the words in the title with the tags that already exist in the database. For example, if the title of the story is:
"The Truth about Your Seasonal Allergies"

a function could look up "the", "truth", "about", "your", "seasonal" and "allergies" among the existing tags.

If the function finds a word that already exists as a tag, it puts that word in the tag field. Say "allergies" already exists as a tag: then the tag field would contain the word "allergies".

If nothing matches, nothing is added, so we would need to update the tag field ourselves.

I think it is a lot of work at first, but later, with a large set of existing tags, it should mostly take care of itself, and we would only need to update a few stories with new tags.

I'm sure it's possible to make this mod...
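
The idea of keeping only title words that already exist as tags could look something like this. The list of known tags is hard-coded here as an assumption; on a real site it would presumably be read from wherever Pligg stores its tags. `match_known_tags()` is a hypothetical name.

```php
<?php
// Hypothetical helper: keep only the title words that are already known tags.
function match_known_tags(string $title, array $knownTags): array
{
    $clean = preg_replace('/[^a-z0-9 ]+/', ' ', strtolower($title));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    $known = array_map('strtolower', $knownTags);
    // array_intersect keeps the values of $words found in $known; reindex after.
    return array_values(array_unique(array_intersect($words, $known)));
}

$knownTags = ['allergies', 'health', 'pollen'];
$tags = match_known_tags('The Truth about Your Seasonal Allergies', $knownTags);
echo implode(', ', $tags), "\n";
```

With this approach no ignore list is needed, since words like "the" never match unless someone has already used them as a tag.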
  #8 (permalink)  
Old 11-19-2007, 04:56 PM
Casual Pligger
 
Join Date: Oct 2007
Posts: 38
If you're trying to get keywords from the title, and your title is stored in the variable $theTitle for example:

Build an array of keywords you don't want put into tags

Code:
$search = array('the', 'and', 'of', 'or');
Convert your title to lowercase

Code:
$theTitle = strtolower($theTitle);
Remove any special characters

Code:
$str = preg_replace('#[^\w ]#', '', $theTitle);
Convert it to an array

Code:
$arr = explode(' ', $str);
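Remove the unwanted words using the $search array created above (array_diff matches whole words only, so 'or' is not cut out of a word like 'world', which str_replace would do)

Code:
$keywords = array_diff($arr, $search);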
Unset the empty values in the array

Code:
foreach($keywords as $key => $value) {
	if($value == ''){
		unset($keywords[$key]);
    }
}
Now you should have a clean array of keywords you can use for your tags. If you want to make it a comma delimited list then you can implode it

Code:
$keyString = implode(",", $keywords);
Hope that helps someone.
  #9 (permalink)  
Old 11-20-2007, 05:02 AM
New Pligger
Pligg Version: 9.8.2
Pligg Template: Emerald
 
Join Date: Nov 2007
Posts: 29
Really awesome feedback there, THANKS!

I will see what I can do with it today and post back my results.
  #10 (permalink)  
Old 11-20-2007, 08:53 AM
New Pligger
Pligg Version: 9.8.2
Pligg Template: Emerald
 
Join Date: Nov 2007
Posts: 29
The linking to category option may actually be the simplest solution for what I want (I just want RSS imported items to have at least one tag).

I will try this with a feed and let you all know how it goes. In the meantime I will try and construct something more complex along the lines that Rob suggested.