Scoop -- the swiss army chainsaw of content management
scoop site statistics Feature Requests
By sleeper22 , Section Wishlist []
Posted on Tue Jan 08, 2002 at 12:00:00 PM PST
It seems that every popular site eventually gives rise to the desire to get site usage stats. Due to Scoop's particular design, we should consider having powerful built-in stats reporting.

There already exist many sites that chart the origin, browser, and access time of the front page and possibly other pages within the site. I haven't looked around much, but NedStat seems to work ok.

But what about descriptions of time spent on each page and typical paths taken through the site? Does anyone know of log file analysers that can do the job as well as a simple in-house solution?

I envision a link at the bottom of each page that leads to the stats for that page, e.g.
view stats

clicking that link would lead to something like:
Stats for /special/faq:
median time spent on page: 30 sec
mean time spent on page: 30 sec
Departure paths, ordered by popularity:
path                rate
/                   18%
/section/__all__    12%
/logout             11%
/section/sec1       10%
/section/sec2       9%
/section/sec3       8%
/search             7%
...

HOW TO LOG
We might want a new table to keep track of visits, logging the series of locations hit within each visit. When should we expire visits and create new ones? E.g. we could create a new visit when the HTTP Referer is external to the site, or (perhaps better yet) when the Referer does not equal the last logged location for that session id.
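The second rule can be sketched roughly as follows. This is Python rather than Scoop's Perl, and it assumes each logged hit is a (session_id, referer, location) tuple with referers already normalised to site-relative paths where they are internal; the function name and tuple layout are hypothetical.

```python
def split_into_visits(hits):
    """Group hits into visits.

    `hits` is an iterable of (session_id, referer, location) tuples in
    chronological order. A new visit starts whenever the referer differs
    from the last location logged for that session id.
    """
    last_location = {}   # session_id -> last logged location
    current = {}         # session_id -> list of locations in the open visit
    visits = []          # (session_id, locations) in order of visit start

    for session_id, referer, location in hits:
        if last_location.get(session_id) != referer:
            # Referer does not match the previous page: start a new visit.
            current[session_id] = []
            visits.append((session_id, current[session_id]))
        current[session_id].append(location)
        last_location[session_id] = location
    return visits


# Made-up sample data: session s1 browses, leaves, and comes back later.
hits = [
    ("s1", "http://example.com/", "/"),
    ("s1", "/", "/special/faq"),
    ("s2", "", "/search"),
    ("s1", "/special/faq", "/section/sec1"),
    ("s1", "http://example.com/", "/"),
]
visits = split_into_visits(hits)
```

With this rule the first hit of any session always opens a visit, and a session that returns via an external link (the last s1 hit above) gets a second visit without needing a timeout.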

Maybe enough information could be culled from access and referrer logs. If so, anyone know any good free programs that would do the job?


Poll
best option for site stats in scoop
external log analyser 0%
external hit counter 0%
builtin solution 50%
hybrid external and built-in 50%
don't need stats 0%

Votes: 2


Story Views
  51 Scoop users have viewed this story.

scoop site statistics | 13 comments (13 topical, 0 hidden)
Kinda funny timing. (none / 0) (#1)
by dram on Tue Jan 08, 2002 at 03:26:59 PM PST

One of my friends told me yesterday that Ingenuitas.org needed some sort of site stats, preferably in a box to fill some of the blank space. Since then (late yesterday) I have been looking into this idea a bit. What I was originally thinking was to try and incorporate a script like AWStats or Perl WebStats into its own box.

What I would like to see is a box similar in design to hotlist_flex where there are multiple views in the one box. The default view would be a hit counter for both unique and non-unique hits. Another view could be what percentage of users use which browser. And a third view could be how long the average stay on the site is or what the most viewed sections are or something of that nature.

My only problem with this is that I am just learning perl and I am finding that this little project puts me in way over my head. I think I am still going to try and work on making this box, most likely at first it will be something very simple. But if people have ideas on what should be included or how to go about doing anything I have mentioned here I would like to know them.

-dram
[Ingenuitas.org]



For my money, I'd keep it out(side) of scoop (5.00 / 1) (#2)
by hillct on Tue Jan 08, 2002 at 03:32:31 PM PST

Everybody loves stats. Stats are your friend. They tell you how popular you are. It'd be interesting to see per-story stats, perhaps, but it seems to me that such a system should run independently of scoop, because the most efficient source of stats data is that generated by apache itself (unless you want to write a log handler, which I presume you could do with mod_perl if you wanted to), and by using an external datasource you've already changed the scoop operational model.

For my money, products like Webalizer, when carefully configured, provide all the site data I could ever want. But it might be interesting to see data on particular stories or other types of scoop-centric constructs, so to this end I'd suggest writing a simple script which grabs all the story IDs out of a scoop DB and greps apart an apache access_log, then passes the pieces to your favorite log parsing program. Or, if you really wanted to do the statistics yourself, you could parse the log file any way you like.
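A rough sketch of the "grep apart" step, in Python for illustration rather than as a Scoop component. It assumes the story IDs have already been fetched from the database and that story URLs look like /story/<story_id>; both the URL layout and the sample IDs are assumptions to check against a real site.

```python
from collections import defaultdict


def split_log_by_story(log_lines, story_ids):
    """Return {story_id: [log lines whose request path is that story]}.

    Assumes common/combined log format, where the request line is the
    first double-quoted field, and story URLs of the form /story/<id>.
    """
    needles = {sid: "/story/" + sid for sid in story_ids}
    buckets = defaultdict(list)
    for line in log_lines:
        quoted = line.split('"')
        if len(quoted) < 2:
            continue                      # no request field on this line
        fields = quoted[1].split()        # e.g. ["GET", "/story/...", "HTTP/1.0"]
        if len(fields) < 2:
            continue
        path = fields[1].split("?", 1)[0]  # drop any query string
        for sid, needle in needles.items():
            # Exact match or a sub-path, so one ID is never a bare prefix of another.
            if path == needle or path.startswith(needle + "/"):
                buckets[sid].append(line)
                break
    return dict(buckets)


# Hypothetical story IDs and log lines, for illustration only.
story_ids = ["2002/1/8/12000/1234", "2002/1/7/9000/42"]
log_lines = [
    '1.2.3.4 - - [08/Jan/2002:12:00:00 -0800] "GET /story/2002/1/8/12000/1234 HTTP/1.0" 200 5120',
    '1.2.3.4 - - [08/Jan/2002:12:00:05 -0800] "GET /special/faq HTTP/1.0" 200 2048',
    '5.6.7.8 - - [08/Jan/2002:12:01:00 -0800] "GET /story/2002/1/7/9000/42?mode=nested HTTP/1.0" 200 100',
]
by_story = split_log_by_story(log_lines, story_ids)
```

Each bucket can then be written to its own file and handed to whatever log analyser is already in use.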

The nice thing about apache log files is they're extremely flexible and powerful. There really is no reason to re-create that functionality since it will be built into any system where scoop is deployed anyway. I would suggest this as a guideline when considering scoop statistics: "Can I provide this statistic without recording the statistical data inside scoop?" If the answer is yes, then the statistic is probably appropriate for inclusion within scoop. If the answer is no, then consider generating the statistic using externally available data before adding logging functions to scoop.

That's my 2 cents
--CTH


--
ScoopHost.com - Premier Scoop Hosting and custom development from the lead developers.


exploring our needs (none / 0) (#3)
by sleeper22 on Wed Jan 09, 2002 at 07:17:20 AM PST

i agree that at this point we should let existing log analysers do the job that they can do. Now what about the need for additional stats functionality built-in to scoop? To begin an outline...

A. Potential features
1) Feeding a log analyser subsets of the log files (a good idea!)
a. comparing stories
2) Making a box that shows very select information about the current page, for usability engineering purposes.
a. mean and median time spent on the current page
b. top links from the current page
c. percentage of entrances to this page (coming from outside the site)
d. percentage of exits from this page (leaving the site)
3) general stats, such as those from webalizer, but in a simplified form suited to a box.


B. Considerations for making a stats box:
- desired stats are succinct, conveniently located, and relevant to the current page
- flex box can support all features mentioned in A.


C. Considerations for adding more logging functionality to scoop:
- extra logging would slow down scoop. any idea how much?
- leveraging username (or sessionid) allows us to identify "visits" better than log analysers can. This is important for the features of A.2.
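Once hits are grouped into visits, the A.2 figures fall out of a single pass. The following is only a sketch in Python, assuming each visit is a chronological list of (unix_time, location) pairs; the function name and data layout are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def page_stats(visits, page, top_n=3):
    """Summarise one page from a list of visits.

    Each visit is a chronological list of (unix_time, location) pairs.
    Time on page is measured up to the next hit in the same visit, so
    views that end a visit contribute no duration.
    """
    durations = []          # seconds spent on `page` before the next hit
    next_pages = Counter()  # where visitors went from `page`
    views = entrances = exits = 0

    for visit in visits:
        for i, (t, loc) in enumerate(visit):
            if loc != page:
                continue
            views += 1
            if i == 0:
                entrances += 1            # visit began on this page
            if i == len(visit) - 1:
                exits += 1                # visit ended on this page
            else:
                next_t, next_loc = visit[i + 1]
                durations.append(next_t - t)
                next_pages[next_loc] += 1

    if views == 0:
        return None
    return {
        "mean_time": mean(durations) if durations else None,
        "median_time": median(durations) if durations else None,
        "top_links": next_pages.most_common(top_n),
        "entrance_pct": 100 * entrances / views,
        "exit_pct": 100 * exits / views,
    }


# Made-up visits for illustration.
visits = [
    [(0, "/"), (10, "/special/faq"), (40, "/")],
    [(100, "/special/faq"), (120, "/search")],
    [(200, "/"), (230, "/special/faq")],
    [(300, "/special/faq"), (340, "/")],
]
stats = page_stats(visits, "/special/faq")
```

Note the built-in limitation: the time spent on the last page of a visit cannot be measured this way, whichever side does the logging.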

--
babelguides.com <<world literature in translation>>


Why add overhead? (none / 0) (#9)
by panner on Wed Jan 09, 2002 at 05:02:11 PM PST

Apache is already writing a perfectly good access log entry for every single request. Instead of wasting time on each request stuffing extra data into a database, just add it to the access_log using mod_log_config (which I'm sure you're already using).

Somewhere in scoop, during the request (a box would work for this, even), put whatever you want to log into Apache's notes table (using $S->{APACHE}->notes). Then in your httpd.conf, where the log format is set up for what you use (probably combined, though I think common is the default), you'd add them in using the %{NOTE}n syntax.

Any number of fields could be added to the access_log this way, or a log could be created that is made from various notes set by scoop. But with this approach, the two logs would have to be combined by whatever parses them to get all the data.

Also, in any case, a parser will have to be written specifically for the extended format, since I'm fairly sure none are customizable to the point that they can work with scoop's data. But then, maybe some are, I'm not sure.
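Such a parser need not be large. Here is a sketch in Python for the combined format with one note appended as a final quoted field. The note name "scoop_uid" and the LogFormat line in the comment are hypothetical, and the regex does not cope with double quotes embedded in a field.

```python
import re

# Assumed httpd.conf line; "scoop_uid" is a hypothetical note name:
#   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{scoop_uid}n\"" scoop
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<note>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# Made-up log line for illustration.
sample = (
    '10.0.0.1 - - [09/Jan/2002:17:02:11 -0800] '
    '"GET /special/faq HTTP/1.0" 200 2048 '
    '"http://example.org/" "Mozilla/4.0" "42"'
)
record = parse_line(sample)
```

Further notes would each need another quoted group at the end of the pattern, in the same order as in the LogFormat line.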

In any case, keeping stats in the database isn't a good idea when scoop is already way too database-intensive, especially when Apache already provides an easily-customized logging facility.



--
Keith Smiley



here's a working demo (none / 0) (#12)
by sleeper22 on Tue Jan 15, 2002 at 10:57:59 AM PST

You can try out the page stats box i just made, at a demo site (log in as guest/guest, who is an editor -- i've designated in a Var that editors can see the box). I haven't done much testing, so let me know what you find, as well as any opinions about usability, more features, etc.

A cron function parses the httpd access log, which should have a modified combined log format that includes the cookie string. Currently the location of the log (CustomLog directive) and its exact format (LogFormat directive) are specified in Vars, since i don't know any way to access those values thru mod_perl.

The added field to the log is the only addition to scoop's logging.

The box currently makes a total of two selects, on one table each.

The relevant Vars, as well as the new table structures, are given by the statements below.
The stats box code can currently be accessed here, and the cron function is in this version of Cron.pm. But like i said, it needs testing.

INSERT INTO vars (name,value,description,type,category) VALUES ('stats_skip_groups', '', 'List of groups to ignore when calculating page statistics. E.g. Superuser, Admins, Anonymous', 'text', 'Stats');
INSERT INTO vars (name,value,description,type,category) VALUES ('stats_pageview_timeout', '30', 'Number of minutes a page viewing can take before being ignored by page statistics calculator', 'num', 'Stats');
INSERT INTO vars (name,value,description,type,category) VALUES ('stats_logformat', '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"', 'Format of log, as specified by the Apache directive', 'text', 'Stats');
INSERT INTO vars (name,value,description,type,category) VALUES ('stats_logfile', '', 'full path of log file (should be access log)', 'text', 'Stats');
INSERT INTO vars (name,value,description,type,category) VALUES ('stats_displayto_groups', 'Superuser, Admins', 'List of groups that can see the stats box', 'text', 'Stats');
INSERT INTO vars (name,value,description,type,category) VALUES ('stats_links_number', 6, 'How many of the most popular links should be shown in the list', 'num', 'Stats');

CREATE TABLE page_route (loc1 TEXT, loc2 TEXT, label TEXT, period CHAR(1), count MEDIUMINT UNSIGNED, percent TINYINT);
CREATE INDEX per_loc ON page_route (period,loc1(20));

CREATE TABLE page_views (loc TEXT, period CHAR(1), count MEDIUMINT UNSIGNED, elapsed MEDIUMINT UNSIGNED, avg TINYINT UNSIGNED);
CREATE INDEX per_loc ON page_views (period,loc(20));

INSERT INTO cron (name,func,run_every,enabled) VALUES ('stats', 'cron_stats', 86400, 1);


--
babelguides.com <<world literature in translation>>


