Numb3rs -- How to calculate future hits on your website or blog

Rough, Smooth or Woven Palm Tree Trunks

I don't know why people Google the terms that they do. However, the fourth most popular Google search term on this blog has to do with rough versus smooth and/or woven palm tree trunks. I don't know why. This little factoid came to light as a result of my being bored at the airport. Let me explain.

I was looking for something to read on my return from a business trip to this archipelago in the sun. I don't like reading fiction any more, so I chose a book called Super Crunchers by Ian Ayres. This author has had almost as much impact on me as Ray Kurzweil and his writings on Artificial Intelligence. Once I started reading it, I couldn't put it down.

Ayres contends that today, data is the name of the game. Data mining and statistical analysis have suddenly become cool. I have decided to become a super cruncher myself, and the data set at hand is this blog. I am fortunate enough to have had an explosion of page views, largely due to posting pictures of a ghetto wedding reception at McDonald's here in the islands. There were other factors as well, including some links to my blog in the days before the jump that contributed to the rise.

According to Technorati, 7.4 million blogs were updated in the past 120 days. Before I had the explosion of hits, I was ranked at 2,631,291. That put me in roughly the top 36% of the blogosphere. However, within days, I had jumped 29,858 places and now I rank at 2,601,433 and climbing with a bullet. I decided to crunch the numbers and do some regression analysis.

The purpose was to come up with a formula for the number of hits one could reasonably expect each day after publishing a popular blog post or web page. The formula would cover a 20-day period and describe the hit pattern as word got out on the Internet.

This formula could be immensely valuable if one is doing a business plan and wants to predict pro forma advertising revenue for a website based on the number of page views. Recently we went through a business plan exercise and wished that we had some way of predicting some of the data on which we had to base revenue projections. Or this formula could be used just for fun.

This would be a valuable contribution to internet mathematics and prediction, because this is a brand new idea and a brand new field. When you google "internet hits calculator" or "page view calculator" you get nothing relevant. This is exciting for me -- breaking new ground. Maybe this will become The Cosmological Cabbage Constant, or The Cosmological Cabbage Hits Theorem, or something famous like Fermat's Last Theorem. Einstein had his E = mc^2 and I have my y = m * 0.6787 * x^1.87972.

So without further ado, let's get on with the math and prove that my university statistics courses were not a complete waste of time and money. The first step was to see what kind of graph my initial data set made. Plotting the data points looked something like this.

It was obvious that it was not a straight line, or linear. It curved upward like an exponential, so I had to use nonlinear regression analysis (strictly speaking, the formula I ended up with is a power curve in x rather than a true exponential). The steep growth phase went on for about twenty days (which is about double the half-life of interesting pages on the internet), so this formula is only good for twenty consecutive days. I could and will do a quartic regression to show the levelling-off curve once I get enough data points, but for now, my formula is just a predictor of your daily page hits over the first 20 days. So x(max) = 20. After that point the hits will start levelling off and might even start to drop. That will be the topic of a later blog article.
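For readers who want to try a fit like this themselves, here is a minimal Python sketch of one way it could be done. It assumes the curve has the shape y = a * x^b (the shape of the formula y = m * 0.6787 * x^1.87972 quoted earlier) and fits it with ordinary least squares on log(x) and log(y). This is not necessarily the exact procedure I used. The function name fit_power_curve is made up for illustration, and the sample points are generated from the published constants rather than taken from this blog's actual hit logs.

```python
import math

def fit_power_curve(xs, ys):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the line log(y) = log(a) + b * log(x)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Illustrative data only: noise-free points generated from the published constants
days = list(range(1, 21))
hits = [0.6787 * d ** 1.87972 for d in days]

a, b = fit_power_curve(days, hits)
print(a, b)
```

Because the sample points have no noise, the fit should simply hand back the constants it was fed. Real hit counts would scatter around the curve.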

I won't bore you with the math, but I did come up with a regression formula after super crunching the data set. The formula looks like this:

y = m * 0.6787 * x^1.87972

where

y = the resultant calculated number of hits per day,

m = the average number of hits per day that you had before the explosion, and

x is an element of the set {1, 2, 3, ..., 20}, where 20 represents the twentieth day after a blockbuster web page or hit blog article went on the internet.

You don't need a scientific calculator if you have Excel or OpenOffice Calc. Simply open a sheet. The first column is X, the day number. In this case, we want to calculate what the hits will be on the 14th day. M is the average number of hits per day that you were getting before. For the example, it is 250 hits per day. Cell A4 holds the formula, so you can change the number 14 and see different results.

After you have entered the formula, hit return and you will have your answer. In this case, if you start with an average of 250 hits a day, and all of a sudden you become very popular because folks are coming to your web page via email, Facebook, Google, AOL and other website links, then you should see your page views go to around 24,211 on day 14.
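For anyone who would rather check the arithmetic without a spreadsheet, here is a minimal Python sketch of the same calculation. The function name predicted_hits and the range check are my own additions for illustration. The constants are the ones from the formula y = m * 0.6787 * x^1.87972.

```python
def predicted_hits(m, x):
    """Predicted hits per day on day x, given m average daily hits beforehand."""
    # The formula is only meant to cover the first 20 days
    if not 1 <= x <= 20:
        raise ValueError("x must be a day number from 1 to 20")
    return m * 0.6787 * x ** 1.87972

# The worked example from the text: 250 hits a day beforehand, day 14
print(round(predicted_hits(250, 14)))

# The full 20-day curve for the same starting average
for day in range(1, 21):
    print(day, round(predicted_hits(250, day)))
```

The first print reproduces the spreadsheet example, and the loop shows how the prediction grows over the whole 20-day window.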

How good are the numbers? The correlation coefficient is r = 0.942378, which measures how closely the fitted curve tracks the actual data (a value of 1 would be a perfect fit).
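As a rough illustration of what that number means, here is a minimal Python sketch of a Pearson correlation coefficient. I have not spelled out how my r was computed, so treat the following as an assumption: for a curve of this shape, one common choice is to correlate log(day) with log(hits). The function name pearson_r and the sample numbers are made up for illustration and are not my real hit counts.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up sample, for illustration only
days = [1, 2, 3, 4, 5]
hits = [170, 620, 1350, 2300, 3500]

# Assumption: correlate the logs, since a power curve is a straight line in log-log space
r = pearson_r([math.log(d) for d in days], [math.log(h) for h in hits])
print(r)
```

Squaring r gives the share of the variation that the fit accounts for. For the r reported above, 0.942378 squared is roughly 0.89.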

The whole premise of Super Crunchers is that exercises like this are relatively accurate in predicting complex phenomena based on the data. Even though the hits do come from Facebook, email, search engines and a plethora of other sources, I do believe that the behaviour of website hit creep on the internet can be modeled by my formula. I am waiting for the Stockholm Committee to come calling to give me my prize in Internet Matheology -- the topology of Internet Hits. It is a brave new world out there.
