Saturday, April 30, 2005

Bugathon 2

Usually London's weather is perfect for programming. But yesterday the sun came out and Meg and I decided to take the afternoon off!

Here is where we got to:
  • Tested the BETA version of the IE TrailBar - it's looking good!
  • The IE TrailBar and Firefox TrailBar are very close to completion
  • Designed the download and installation process
  • The distributed TrailBucket subsystem caused havoc with the WebTrailBar which now needs to be refactored
  • We discussed the brief for the graphic design of the backend pages. Meg's partner, Marty, has helped us with the graphic design of all our products (e1mail, Zoomtickets.com and Turbo10) and has offered to take a look at Trexy this weekend. We're excited about seeing what designs he comes up with!
So, here's what's left to do:
  • Final testing for IE and Firefox TrailBars
  • Implement and test the download process
  • Refactor and test the WebTrailBar
  • Create templates of Marty's design
  • Fix the look and feel of backend pages and forms
  • Alpha Testing phase for IE and Firefox
  • More Testing!!

Thursday, April 28, 2005

Turbo10 Trails

It's really ironic that trailblazing works on just about all search engines except for our favourite metasearch engine: Turbo10.com! This is due to the complicated mechanics behind the scenes of Turbo10.

We want Trexy to become a universal searching memory that works with as many engines as possible. This means you don't get locked into using any one search engine and you can safely and anonymously share your trails with others - across all the engines you use!
So today, we're doing in-depth debugging of trailblazing, including complex search engines like Turbo10 - fingers crossed we can get it to work. However, there will always be a small number of engines that, for whatever reason, we can't connect to!

Wednesday, April 27, 2005

Distributed Trail Bucket

Things went well today. Meg and I created a 'distributed trail bucket'.

For the last six months I've been tossing up the best way to effectively distribute Trexy's database load across the cluster.

After discussing the pros and cons we finally decided to bite the bullet and settled upon a distributed RAM bucket for storing incoming trail steps. Scalability and speed are crucial to this part of the system. When you blaze a trail, each step you make lands in a temporary RAM bucket on one of the servers in our cluster. Soon after that the bucket is emptied into the database. By distributing the trail buckets around the cluster we share the load and guarantee speed and responsiveness while protecting the scalability of the overall system.
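To make the idea concrete, here is a minimal Python sketch of a RAM bucket that collects trail steps and is later emptied into a database. It is an illustration under stated assumptions, not Trexy's actual code: the names TrailBucket, add_step, flush and pick_bucket are hypothetical, a plain list stands in for the database, and hashing the user ID is just one possible way to spread steps across servers.

```python
import zlib


class TrailBucket:
    """Hypothetical in-memory bucket that holds trail steps until flushed."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def add_step(self, user_id, query, url):
        # A step lands in RAM first, so the write is fast.
        self.steps.append((user_id, query, url))

    def flush(self, database):
        # Empty the bucket into the stand-in database; return how many steps moved.
        moved = len(self.steps)
        database.extend(self.steps)
        self.steps = []
        return moved


def pick_bucket(buckets, user_id):
    # Spread users across buckets with a stable hash (an assumption for this sketch).
    index = zlib.crc32(user_id.encode("utf-8")) % len(buckets)
    return buckets[index]


buckets = [TrailBucket(f"server-{i}") for i in range(3)]
database = []  # stand-in for the real database

for user_id, query, url in [
    ("alice", "jacaranda trees", "http://example.com/a"),
    ("bob", "goat trail", "http://example.com/b"),
    ("alice", "jacaranda trees", "http://example.com/c"),
]:
    pick_bucket(buckets, user_id).add_step(user_id, query, url)

# Some time later, every bucket is emptied into the database.
flushed = sum(bucket.flush(database) for bucket in buckets)
```

Keeping the fast write (add_step) separate from the slower database write (flush) is what lets the load be shared: each server only touches the database when its bucket is emptied.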

Phew! So what does all that mean? Basically we'll never again have a major meltdown moment!

Bugathon 2 - the Sequel

I'm getting psyched up for our next bugathon starting today!

Over the next three days we're taking on the following bugs:
  • Internet Explorer TrailBar - installation, packaging and testing
  • Consistency in interface presentation and usability across all TrailBars
  • WebTrailBar - lots of bugs here!
  • Various Javascript, DOM, Trailblazing bugs etc.
  • Distributed TrailBucket flushing subsystem
  • Test Delete, Share, Unshare MyTrails - and global index updating
  • Fix site presentation - graphics, text etc
These are the final things we need to complete before the BETA release.

Tuesday, April 26, 2005

Alpha Feedback

We've got some great feedback already! As a result we will certainly be making changes.

We need to clearly communicate what the engine does. For first-time users it's a bit of a downer to search on 'My Trails' and get no results, then to search on 'All Trails' and get not-so-good results, and finally to arrive at 'Blaze a New Trail' and wonder what the point is. This was the experience of one of our alpha testers.

Trexy starts with a database of over 700,000 All Trails but nonetheless some first-time searchers will be disappointed by the All Trails results. A high proportion of first-time searchers do a 'vanity search' on their own name or website. This is fine if your name is Britney Spears, or your website is Amazon.com, but a lot of first-time searchers may be surprised to see there is no relevant trail for them.

The challenge is to quickly get across how Trexy works, its benefits, and how the system naturally fits with searchers' current habits. Trexy encourages searchers to still use their favourite search engines (e.g., Google, Yahoo, MSN) to answer their search questions - but our service helps answer a whole new set of questions:
  1. "Have I found that before?" - Search My Trails.
  2. "Has someone else found it before?" - Search All Trails.
  3. "Ok. I'll go find it once and for all." - Blaze a New Trail.

Sunday, April 24, 2005

Vannevar's Trailblazer

Trexy's design is inspired by a visionary paper written way back in 1945, "As We May Think" by Vannevar Bush. Bush describes a machine called a Memex that augments your memory and searching powers by helping you to create and share 'trails of association' between things in the 'common record' (the Internet).

In 1945 this was a wild concept but since then lots has changed. Bush has influenced Ted Nelson (hypertext), Tim Berners-Lee (www) and many others to realise his vision of an interlinked 'common record'. However, there are still major parts of his functional specification missing. Trailblazing, for example, is a crucial, yet largely overlooked part of Vannevar's invention:
"There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which they were erected."
"As We May Think" 1945.
Trailblazing needs three things to work. First we need to navigate the 'common record.' Thanks to Ted Nelson's idea of hypertext, and Berners-Lee's implementation, the WWW provides an interlinked common record. Second, we need a way to remember useful trails of association through the common record. Third, the system should enable us to share our trails with others.

Trexy is designed to add the missing parts of Vannevar's specification. We carefully protect your privacy while enabling you to safely and anonymously share your trails with others. This is what we mean by trailblazing.

Saturday, April 23, 2005

Alpha Release

Meg and I are pleased to announce the alpha release of Trexy.

This release includes:
  • Trailblazing on Google with the Firefox TrailBar
  • All Trails search
  • My Trails search
  • Blaze a New Trail
  • Preferences
To download the Firefox TrailBar click here: http://trailbar.trexy.com/trailbar.html.

Friday, April 22, 2005

Day 4 Bugathon - Brain Drain

Meg and I are really tired today ... that was one of the most intensive debugging/coding sessions I've ever done. The last bugs are amongst the nastiest to fix and they're often buried deep in the code.

Here is where we've got to:
  • the trail index has been completely rebuilt and is optimised
  • response time for an All Trails search is now less than 500 ms! :-)
  • the Firefox TrailBar is working well and is ready for release
  • the preferences subsystem is completed
  • the Web TrailBar has been refactored but needs more testing
  • the bucket clearance system needs further testing but will work very soon
  • the system successfully moved from development to staging
  • the IE TrailBar needs more development and testing
Next week we're planning another bugathon - stay tuned for more developments. Even though we're tired we're also really excited that an ALPHA version is really close.

Thursday, April 21, 2005

Day 3 Bugathon - 99% Finished

It seems with software the last 1% takes 90% of the time!!??

Meg and I had a BIG day killing bugs yesterday - but we're still not there yet!

Stay tuned. It's day three of the bugathon and we're off and running again ...

Bugathon Day 2 - Completed

At the end of day two of the Bugathon Meg and I finished off testing and coding the following:
  • Firefox TrailBar event handling
  • Firefox TrailBar preferences handling
  • No result, pagination, and speed testing for My Trails and All Trails
  • Blaze a New Trail result page presentation
  • Web TrailBar interaction with Firefox
Some of the code is almost three years old so we had to do a bit of archaeology to get to some bugs. The nastiest bug award goes to a missing right parenthesis ')' that cost us forty minutes to find!

Day 3 we need to:
  • Prepare the Web TrailBar for ALPHA
  • Finalise IE TrailBar
  • Fix Trail Bucket handling

Wednesday, April 20, 2005

Still Running

Day two of the scheduled bugathon is soon to begin. Nige and I had a good day of bug killing yesterday.

We fixed the page links at the base of the All Trails results. This allows you to jump to any page in the result set.
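A minimal Python sketch of this kind of page jumping is below. It is only an illustration: the function names page_count and get_page are hypothetical, the twelve results are made up, and the page size of five is simply passed in as a parameter.

```python
import math


def page_count(total_results, per_page):
    """Number of pages needed to show every result."""
    return math.ceil(total_results / per_page)


def get_page(results, page, per_page):
    """Return the slice of results for a 1-based page number."""
    if page < 1 or page > max(1, page_count(len(results), per_page)):
        raise ValueError("page out of range")
    start = (page - 1) * per_page
    return results[start:start + per_page]


trails = [f"trail-{n}" for n in range(1, 13)]  # 12 made-up results
last_page = get_page(trails, page=3, per_page=5)  # jump straight to the final page
```

Because each page is computed directly from its number, jumping to a page near the end costs no more than showing the first page.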

I know sometimes if I can't find what I am looking for in the first or second page of results, I'll click on a page towards the end of the list. This is known as the 'random jump' where a user will randomly select a high page number to try their luck before typically typing in a new search query.

Result pagination is now working on All Trails. We now need to fix this on MyTrails. In order to create a trail and assign a user ID to the trail, we need to get one of the toolbars fully working. This is where we'll start today.

Tuesday, April 19, 2005

Bugathon

Today starts a three-day scheduled bugathon. Just like the runners at the London Marathon on Sunday, there's no stopping until we reach the finish line and all the bugs are killed.

The first finish line of our bug run is the Alpha release. We are aiming to have the alpha version ready by the end of this week. We need to get the toolbars in place and get the web bar ready too so people can start blazing their own trails.

Monday, April 18, 2005

This Saturday - ALPHA Release

Our ALPHA release date is looming: 23rd of April 2005.

This week Meg and I are going on a big bug run and doing last minute development in preparation for the ALPHA release this Saturday!

We've already selected a top team of alpha testers who will give Trexy its first public workout. Fortunately we have lots of interested friends and family we can call upon. The alpha testing group includes professional software testers, programmers, academics, non-programmers and search engine newbies. We rely on the alpha testing group to give us frank feedback. This feedback is crucial in planning the BETA release.

Business partners, friends and family keep asking us when Trexy is coming out. And it seems like we're forever saying "soon, soon ...".

But if the ALPHA release goes well the public BETA release will be soon!

Sunday, April 17, 2005

Seven plus or minus two

Part of making sure our search engine 'cognitively fits' means we have to respect how we're all wired up.

We all have limited mental bandwidth (for the moment) and a search engine should help you navigate through the morass of information by helping to manage your cognitive load.

In George A. Miller's classic psychological study (The Magical Number Seven, Plus or Minus Two) he found that we have a cognitive capacity for seven 'chunks' of information plus or minus two. We've all had to grab for a pen when taking down a phone number that's too long. Miller's 'seven plus or minus two' rule explains why.

What does this mean for the design of Trexy? It's tempting to follow all the other search engines and show ten results per page. But Miller's rule suggests we're more comfortable handling fewer than ten things at a time. Imagine driving your car and coming to an intersection with ten exits!? It's crazy.

This is the reason why we only show five trails per search result - so they can comfortably fit - cognitively speaking!

Saturday, April 16, 2005

Can you smell G.A.S.?

Meg and I had a laugh yesterday. She came back from speaking at the Search Engine Strategies (SES) Conference in Germany with a new acronym: G.A.S. - Google Anxiety Syndrome - the fear of not ranking highly on Google.

Apparently the smell of GAS was palpable at the conference! It seems that web marketeers the world over are obsessed with their Google result rank and SEO companies are cashing in. Hang on, what does S.E.O. stand for? Search Engine Optimisation. And now SEO is a mini-industry in itself.

SEO gurus are constantly inventing ways to manipulate Google's result page so their clients' websites rank highly. It seems they're winning. Have you searched on Google lately and found yourself caught in an SEO link farm? Unfortunately Google has been a victim of its own success and its index is under siege by SEO link farming, link spamming, keyword stuffing, proxy linking, cloaking, phreaking and other clandestine SEO techniques. And this is just the beginning!

Although this is bad news for Google, it's good news for us.

Trexy is inherently resistant to manipulation by SEO companies. We only show search trails generated by real searchers - not bots - real people. We want trailblazers to provide the 'authority' when it comes to deciding what's relevant on the Web - not an SEO company.

Large scale automated SEO techniques will not work on Trexy. But our system will still experience some problems - we expect a minority of people will leave 'vanity trails' that link to their own sites. To combat vanity trails our Goat Trail algorithm weeds them out - if a trail is not authoritative it fades - and a vanity trail will fade quickly. If a trailblazer abuses the system by leaving excessive vanity trails we will remove their trails from the 'All Trails' index altogether.
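One way to picture a trail "fading" is the Python sketch below. This post does not describe how the Goat Trail algorithm works internally, so everything here is an assumption made for illustration: the exponential decay, the 30-day half-life, the visibility threshold, and the names trail_strength and visible_trails.

```python
def trail_strength(follows, days_since_last_follow, half_life_days=30.0):
    """Strength grows with follows and halves every half_life_days without use."""
    return follows * 0.5 ** (days_since_last_follow / half_life_days)


def visible_trails(trails, threshold=1.0):
    """Keep trails whose strength is at or above the threshold, strongest first."""
    scored = [(trail_strength(f, d), name) for name, f, d in trails]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Each entry is (name, number of follows, days since the trail was last followed).
trails = [
    ("popular trail", 40, 10),  # many searchers, recently followed
    ("vanity trail", 1, 60),    # one follower, untouched for two months
]
shown = visible_trails(trails)
```

In this toy model a trail that only its author ever follows drops below the threshold after a couple of half-lives, while a trail that many searchers keep using stays visible.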

Wednesday, April 13, 2005

Viele Gruesse aus Muenchen!

I am at the Search Engine Strategies conference in Munich at the moment. Yesterday, I gave a short presentation about our metasearch engine, Turbo10, in the session: 'Neue Entwicklungen im Deutschen Suchmarkt.'

The purpose of the session was to introduce new search engine technologies and the speakers included managers from: MSN, InfoSpace, Seekport (previously Infoseek) and a new engine Neomo.

We only had 10 minutes to speak about our respective technologies. I introduced the concept of the Invisible Web which in German is - Das unsichtbare Web. This refers to information on the web that is contained in dynamic webpages such as online databases, which traditionally search engines have not been able to index. I also took everyone through a whirlwind tour of how users can create their own collections of engines to search the web with. Everything you need to know can be found here: http://turbo10.com/collections1.html

Would have loved to have introduced our new product Trexy. We are aiming to have a beta version by the 1st of May. I have, however, mentioned to a few journalists that we have something exciting waiting in the wings!

Tuesday, April 12, 2005

Psychology of Search - Once and For All

We've designed Trexy so it 'cognitively fits' on an individual scale and collectively.

At the moment, we search, sometimes find, and often forget. What did I search on again? Where did I find that? How many times have you gone back and searched for the same thing? Did you know that one of the most popular search queries is 'hotmail'!?

Our bookmarks/favourites list is so long we need a search engine just to search it. Browser history? No thanks, give me a search engine. And we're off again to the nearest search engine with our 2.5 search terms hoping to get lucky.

Here's how we want Trexy to 'cognitively fit' into this process:
  1. "Have I found that before?" - Search My Trails.
  2. "Has someone else found it before?" - Search All Trails.
  3. "Ok. I'll go find it once and for all." - Blaze a New Trail.
This models how we search in real life. We first consult our memories - "do I know the answer to that?" If we don't, then we ask someone who does. If that's too hard, or they don't know, we go looking for the answer ourselves.

This is how we naturally search and Trexy models this process. Simple, isn't it?
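The three questions above form a simple fallback chain, which can be sketched in a few lines of Python. This is a hedged illustration rather than Trexy's implementation: the function find, the dictionaries standing in for the My Trails and All Trails indexes, and the blaze callback are all hypothetical.

```python
def find(query, my_trails, all_trails, blaze):
    """Try My Trails, then All Trails, then blaze a new trail."""
    if query in my_trails:
        return "My Trails", my_trails[query]
    if query in all_trails:
        return "All Trails", all_trails[query]
    # Nothing remembered: search afresh and remember the result for next time.
    result = blaze(query)
    my_trails[query] = result
    return "Blaze a New Trail", result


def blaze(query):
    # Stand-in for going off to a search engine and finding the answer yourself.
    return f"http://example.com/found/{query}"


my_trails = {"goat trail": "http://example.com/goat"}
all_trails = {"jacaranda": "http://example.com/jacaranda"}

first = find("memex", my_trails, all_trails, blaze)   # nobody has found it yet
second = find("memex", my_trails, all_trails, blaze)  # now it is in My Trails
```

The second lookup is answered from memory, which is the "once and for all" part of the idea.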

Sunday, April 10, 2005

The Goat Trail

I spent my first five years at university trudging from the carpark (known then as the 'dust bowl') up a winding path to lectures. The winding path, colloquially called the 'goat trail', was etched into the grass thanks to the collective unconscious of all the students rushing to lectures - a planner could not have designed a more optimal route.

Around November each year the goat trail would inexplicably change route - it suddenly diverted around a large pair of jacaranda trees. The jacaranda trees at the University of Queensland flower beautifully and smell even better, but local legend has it they are deadly to students - especially around exam time. The legend warns that if a jacaranda flower lands on your head you're certain to fail your exams. New students who had never heard of the dangerous jacaranda flower were spared failure thanks to following the goat trail.


It must have been while walking along the goat trail after a philosophy lecture by Professor Priest (a long-haired member of the 'Hells Logicians' motorcycle club) that I marvelled at the high technology of a simple path on the ground - a meme filtering machine.

The goat trail has influenced Trexy in many ways. When we chose a mascot for Trexy we considered other trailblazing animals like ants (too ugly) and snails (too slimy) but it really had to be a Goat. The ranking algorithm that distills the optimal trail for a search is modelled on the simplicity of people finding an optimal route from points A to B - a 'goat trail'. But it needs to be smart enough to guide you past the flowering jacaranda trees too!

Saturday, April 09, 2005

The Need for Speed

A search engine must be fast - ideally we need a sub-second response time.

We time every search and you can see the time taken in the footer of the page. If you ever see this number above 2 seconds please contact me: nigel@trexy.com. "Tim the Timer", one of our dutiful robots, also helps us monitor speed.
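Timing a search and printing the result in a footer can be sketched in Python as follows. The names timed_search, footer and fake_search are hypothetical, and the footer wording is invented for the example; only the 2-second limit comes from the paragraph above.

```python
import time


def timed_search(search, query):
    """Run search(query) and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = search(query)
    elapsed = time.perf_counter() - start
    return results, elapsed


def footer(elapsed, slow_after=2.0):
    """Format the footer text; flag searches slower than slow_after seconds."""
    note = " (slow)" if elapsed > slow_after else ""
    return f"Search took {elapsed:.3f} seconds{note}"


def fake_search(query):
    # Stand-in for the real search call.
    return [f"trail for {query}"]


results, elapsed = timed_search(fake_search, "memex")
line = footer(elapsed)
```

time.perf_counter is used because it is a monotonic, high-resolution clock meant for measuring short intervals.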

Trexy has different storage and retrieval requirements to a traditional inverted index based system (e.g., Google, Yahoo etc). But we do share the same speed requirements - as fast as possible, please!

Recently I've been working on making sure we stay sub-second - this means optimisation, optimisation, optimisation! But the fateful words of the legendary computer scientist Donald Knuth (often attributed to C.A.R. Hoare) have been ringing in my ears:
"Premature optimisation is the root of all evil."
And normally he's right. But for us:
"A slow search engine is even more evil."
So it's back to the optimisation battle ... next time you search have a look at the footer of the results page to see how the war is going.

Sunday, April 03, 2005

Where do Trails start?

Trails start when you enter a search query into a search engine.

A trail is the path from what you're looking for to what you find.

At the moment we express what we're looking for in about 2.5 words by plugging them into the nearest search engine. It's a testament to the cool computer science behind the scenes that out of the 20,000,000,000+ documents out there anything relevant comes back at all!

So this is where the trail begins - 2.5 words and we're away. Diving into billions of documents:
Browsing like mad through the results, scrolling down the page. Is that a spam link? Hmmm. That one looks dodgy? Ok, I might follow that one ... I'll just keep looking to the bottom of the page. Scroll. Scroll. This looks better. Ok. Oops. No. No. No. That's not it. Better go back. Back. Back. Back. Where was the good one again? Is that it? That's it. Ok. Finally found IT!!
And that's only if you get lucky!

Browsing is a big part of the time spent searching, which is why much of Turbo10's design is aimed at making browsing faster.
Trexy is aimed at not wasting all the browsing effort that millions of human searchers put in every day. We can all benefit from each other's searching and browsing efforts, provided your privacy is protected - this is the idea behind All Trails.