| |||
| Idea for scalable story promotion

I've been thinking about the best way to handle promoting a story/article from queued status onto the main page, and I've had a few thoughts I wanted to share with everyone.

The current scheme for promotion is very simple: a story is promoted once its number of votes passes a threshold defined in the config file (and the story is fresher than X days). While this will work for very low-volume sites, it doesn't scale well. On a site with a large number of users, and presumably a proportionately large number of votes, it breaks down.

Our ideal system would be able to quickly determine that a story is of high value and promote it to the homepage based on the frequency of its votes compared to a large sample population. Using some basic statistics we can determine whether a story is an "above average performer" and promote it quickly.

In order to accomplish this we need to take several variables into consideration, including:

1) The number of stories submitted in a given time period
2) The average number of votes over a given time period for all "active" stories in the queue
3) The standard deviation of votes over a given time period for all "active" stories in the queue
4) The target number of stories to be promoted to the home page in a given time period

For the purposes of explaining my ideas I am going to define the time period as one day.

We need to calculate and store the answers to some of these questions on a regular basis. Ideally we create a new DB table that stores this information for easy lookup. This way we can also track the information statistically and show trends. I plan on implementing a cron job that runs daily, calculates the required values, and stores them as follows.

1) Calculating the number of stories submitted is trivial.
The SQL query I am using is:

select count(*) from links where link_status = 'queued' and date(link_date) = '2006-04-16';

2 & 3) Calculating the average and standard deviation can be done with the following query:

select avg(link_votes) as 'average', stddev(link_votes) as 'stdev' from links where link_status = 'queued' and date(link_date) = '2006-04-16';

4) This can be set in the config.php file and is up to the site administrator.

These values will be used to "predict" the future for our new stories. Each story will have a new variable that stores a floating point number: the number of standard deviations the story sits above or below the mean (average). We need an additional check running at a much more frequent interval (I plan on using every 5 minutes) to update the items in the database with their new "score" and promote them once they pass a specific threshold. I plan on calculating this as follows:

$score = //Z-score that indicates whether a story is above/below the mean
$numofvotes = //The number of votes that a given story has received
$stddev = //Standard deviation for the given time period as calculated above
$average = //Average number of votes for the given time period as calculated above
$numberofstories = //The number of stories submitted in the given time period as calculated above
$desirednumofstories = //Setting from config.php

The cut-off value determines a "score" threshold, or essentially a percentile that a story must fall into in order to be promoted. I plan on calculating this as follows:

PHP Code:
select link_id, ((link_votes - $average)/$stddev) from links where link_status = 'queued' and date(link_date) = '2006-04-16';

Once we have this score we need to decide whether the story should be promoted. This is done by first calculating where in the rank order the story is likely to fall for the day, using some basic probability statistics.

PHP Code:

PHP Code:

This method should work well assuming that traffic is fairly stable from day to day.
Since we are using the previous day's data to predict the current day's volumes, a sudden traffic spike will mean that more stories are promoted than desired. This can be mitigated on higher-volume websites by shortening the "given period of time" from a day to something shorter.

Additionally, this whole idea could be retooled to calculate these variables on a per-category level. Some sites might have much higher traffic, and therefore more votes, for one category than another.

I'm currently working on implementing the above for a site that I plan on launching in the near future, and I would love feedback and criticism before I do.

Last edited by jvallery : 04-17-2006 at 02:34 PM. |
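The daily statistics and the frequent score check described in the post above could be sketched in PHP roughly as follows. This is only an illustration: the function names (`voteStats`, `zScore`, `normalCdf`, `expectedRank`, `shouldPromote`) and the sample vote counts are made up, and nothing here is Pligg code. The rank step assumes vote counts are roughly normally distributed, so a story with z-score z is expected to rank at about N × (1 − Φ(z)) out of N stories. That is one reading of "basic probability statistics", not necessarily the formula the post used.

```php
<?php
// Sketch only: helper names and sample numbers are hypothetical, not Pligg code.

/** Count, mean and population standard deviation (like MySQL's STDDEV) of vote counts. */
function voteStats(array $votes): array
{
    $n = count($votes);
    if ($n === 0) {
        return ['count' => 0, 'average' => 0.0, 'stddev' => 0.0];
    }
    $average = array_sum($votes) / $n;
    $sumSq = 0.0;
    foreach ($votes as $v) {
        $sumSq += ($v - $average) ** 2;
    }
    return ['count' => $n, 'average' => (float) $average, 'stddev' => sqrt($sumSq / $n)];
}

/** Z-score: how many standard deviations a story sits above or below the mean. */
function zScore(int $numofvotes, float $average, float $stddev): float
{
    // Guard against a zero spread (every story had the same vote count).
    return $stddev > 0 ? ($numofvotes - $average) / $stddev : 0.0;
}

/** Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf. */
function normalCdf(float $z): float
{
    $x = abs($z) / M_SQRT2;
    $t = 1.0 / (1.0 + 0.3275911 * $x);
    $poly = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
          + $t * (-1.453152027 + $t * 1.061405429))));
    $erf = 1.0 - $poly * exp(-$x * $x);
    return $z >= 0 ? 0.5 * (1.0 + $erf) : 0.5 * (1.0 - $erf);
}

/** Expected rank among $numberofstories, ASSUMING normally distributed votes. */
function expectedRank(float $score, int $numberofstories): float
{
    return $numberofstories * (1.0 - normalCdf($score));
}

/** Promote when the expected rank falls inside the desired number of stories. */
function shouldPromote(float $score, int $numberofstories, int $desirednumofstories): bool
{
    return expectedRank($score, $numberofstories) <= $desirednumofstories;
}

// Usage with made-up numbers: yesterday's vote counts and a story with 9 votes.
$stats = voteStats([2, 4, 4, 4, 5, 5, 7, 9]);
$score = zScore(9, $stats['average'], $stats['stddev']);
printf("score=%.2f promote=%s\n", $score, shouldPromote($score, 200, 10) ? 'yes' : 'no');
```

In a real cron script the array passed to `voteStats` would come from the `links` table, or the daily query above would supply the average and standard deviation directly. The zero-spread guard matters on very quiet days, when every queued story may have the same vote count.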
| |||
| Minor correction to the rank formula. I think this would be more accurate: PHP Code: |
| |||
| I really like the way you are thinking here... I have also expressed concerns about the promotion algorithm. |
| ||||
| Love it. Great idea and well executed. |
| |||
| This does indeed look like a good way to handle high-traffic sites! But you should keep in mind the "karma" that is implemented in Pligg. I think karma influences the power of your vote (I'm not sure, though); if that's the case, you should factor it into your script. Anyway, would you be willing to share it then? |
| |||
| I'll hopefully be working on it more this weekend and will post the script then. It will be a completely separate cron script that manages the whole process separately from the Pligg core. -J |
| |||
| I was thinking about this today, and I came up with a similar idea to Laurent's. I think it would be very beneficial to take karma into account. As a SIMPLE example, it would be cool if a story reached the front page once it hit a certain "credit" point threshold. For example, when someone who is logged in and has 100 karma points votes for a story, the story is given a certain number of credits (let's say 10). When a user with 10 karma points votes for the story, it gets a smaller number of credits (let's say 1). That means that as the site grows you can depend on your top "pliggers" to help move the good stories up, but they are not required. Now I think you could add your equation to the mix to make the formula a lot better. |
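The credit idea in the post above could be sketched in PHP along these lines. The karma-to-credit rule (credits = karma / 10) is an assumption chosen only because it reproduces the two examples given (100 karma gives 10 credits, 10 karma gives 1 credit). The function names and the threshold value are hypothetical as well.

```php
<?php
// Sketch only: the karma-to-credit rule and the threshold are assumptions.

/** Credits contributed by a single vote, scaled by the voter's karma. */
function creditsForVote(float $karma): float
{
    // ASSUMPTION: linear rule matching the examples (100 karma -> 10, 10 karma -> 1).
    return max(0.0, $karma / 10.0);
}

/** Total credits for a story, given the karma of each user who voted for it. */
function storyCredits(array $voterKarma): float
{
    $total = 0.0;
    foreach ($voterKarma as $karma) {
        $total += creditsForVote((float) $karma);
    }
    return $total;
}

/** A story reaches the front page once its credits hit the threshold. */
function reachesFrontPage(array $voterKarma, float $creditThreshold): bool
{
    return storyCredits($voterKarma) >= $creditThreshold;
}

// Usage with made-up numbers: one 100-karma voter and one 10-karma voter.
printf("credits=%.1f\n", storyCredits([100, 10]));
```

To combine this with the z-score approach, the per-story credit total could take the place of the raw vote count when computing the average and standard deviation.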

