


Author Topic: System Update


Istock News

« on: September 26, 2007, 13:52 »
The iStock IT infrastructure is very large and complex.  We have about 45 database servers and over 35 web servers spread across two datacenters.  We have fairly large Fibre Channel and Ethernet networks.  The code base for the site is probably close to a hundred thousand lines of code - with thousands of database queries.  All of this needs to work together properly to generate a web page for you.

Sometimes there is not just a single problem - but a thousand things - that together form a system that is not quite up to the task.  Some of these little problems are easy to fix and give good returns when fixed.  Obviously these are the ones you try first.  Some are hard to fix and give small returns.  Some are hard to fix and give big returns.  You do not always know how much better things will get before you try something - you just know things will be better.

Over the past few weeks, we have been working on these problems, a few at a time.  We had a bottleneck in the network: we fixed it.  Once we fixed it, the site got better for a while, then a cluster of database servers got overloaded.  We found the worst queries on this cluster - the ones taking the most time - and fixed them.  The site got better for a while, then the master database server got overloaded.  We found the queries that were taking the most time on the master and fixed them - then we started hitting a locking problem within MySQL on a simple query, due to the sheer number of database updates we were getting.  We created a workaround to avoid these locks.  Then we ran into another locking issue.  We created a workaround for the second locking issue - and ran into a third.  We worked around the third, and ran into a fourth.  We have upgraded servers as far as they can be upgraded and doubled the number of servers that we have (some portions of the site can be spread across multiple servers - others cannot).
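
As a rough illustration of the "find the worst queries" step - this is a generic sketch, not iStock's actual code - one common approach is to time queries at the application layer and keep the slowest ones, the same idea as scanning MySQL's slow query log:

```python
# Generic sketch: record the slowest queries seen by the application.
# The cursor comes from whatever MySQL client library is in use; the
# threshold and "top N" size are arbitrary illustrative values.
import heapq
import time

SLOW_THRESHOLD = 0.5   # seconds; anything slower gets recorded
worst_queries = []     # min-heap of (elapsed, sql) tuples

def timed_query(cursor, sql, params=(), keep=20):
    """Run a SELECT-style query and remember it if it is among the slowest seen so far."""
    start = time.monotonic()
    cursor.execute(sql, params)
    rows = cursor.fetchall()
    elapsed = time.monotonic() - start
    if elapsed >= SLOW_THRESHOLD:
        heapq.heappush(worst_queries, (elapsed, sql))
        if len(worst_queries) > keep:
            heapq.heappop(worst_queries)   # drop the least-slow entry
    return rows

def report_worst():
    """Print the slowest recorded queries, worst first."""
    for elapsed, sql in sorted(worst_queries, reverse=True):
        print("%8.3fs  %s" % (elapsed, sql))
```

The slowest entries are the ones worth running through EXPLAIN and then indexing or rewriting first.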

Some site features were creating too much load on the servers - so they were disabled until they could be rewritten.  The view count for files was one of these cases.  Updating a counter seems like it should be easy and fast - but given the way the data was stored and the rate at which people view files, it placed critical strain on a key database table.  In order to fix this, we had to change the back-end database structure and all the code that dealt with it.  We knew that this was an important feature for many contributors - so we got to work rebuilding the database and the code to operate in a way that did not negatively affect the performance of the entire site.  The views that happened while this feature was disabled are gone forever - we were just not able to capture that data fast enough.  What you see now are the views from before it was disabled plus the views from after it was re-enabled.
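
For illustration only, here is one common shape such a rewrite can take - buffering view counts in the application and flushing them to the database in batches, so one hot row is no longer updated on every single page view. This is a sketch of the general technique, not iStock's actual design; the class, table and column names are made up:

```python
# Generic sketch of a batched view counter. "file_stats", "views" and
# "file_id" are illustrative names, not a real schema.
import threading
from collections import Counter

class ViewCounter:
    """Buffer per-file view counts in memory and flush them in batches."""

    def __init__(self, conn, flush_every=1000):
        self.conn = conn               # a DB-API connection (e.g. to MySQL)
        self.flush_every = flush_every
        self.pending = Counter()       # file_id -> views not yet written
        self.unflushed = 0
        self.lock = threading.Lock()

    def record_view(self, file_id):
        """Called once per page view; does no database work most of the time."""
        flush_now = False
        with self.lock:
            self.pending[file_id] += 1
            self.unflushed += 1
            if self.unflushed >= self.flush_every:
                flush_now = True
        if flush_now:
            self.flush()

    def flush(self):
        """Write all buffered counts in one batch - a handful of UPDATEs
        instead of one per page view."""
        with self.lock:
            batch, self.pending = self.pending, Counter()
            self.unflushed = 0
        if not batch:
            return
        cur = self.conn.cursor()
        cur.executemany(
            "UPDATE file_stats SET views = views + %s WHERE file_id = %s",
            [(count, fid) for fid, count in batch.items()],
        )
        self.conn.commit()
```

The trade-off is that a crash loses at most one unflushed batch of counts - which also matches the point above: views that never reach the database cannot be recovered later.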

Throughout these trials, the changes that we have made have greatly expanded our capacity.  When things are running smoothly, we serve far more requests now than we did a month ago.  But when traffic exceeds a threshold, it bogs down - like a freeway in rush hour.  The trip that should take you 15 minutes now takes you an hour.  It doesn't matter if the highway was expanded to handle twice the traffic - if you have twice the traffic plus one, the system breaks down and you get gridlock.  The problems within MySQL that we are seeing these days are like an accident in the ditch - one that causes gridlock where the freeway should have more than enough capacity for the demand.  Sometimes, when this "gridlock" occurs, the servers recover on their own.  Other times, they do not.  Then we have little choice right now but to shut down the site - to clear the roads and let people drive at full speed when we come back.  Unfortunately, sometimes when we are forced to do this we run into another MySQL bug and the server crashes - taking up to 45 minutes to recover.  This is what happened this morning.

The changes that we have been making over the past few weeks have been adding capacity to the road - and working around the limitations of our current architecture and software.  We will find the right combination of fixes that will buy us enough capacity to make it to the next phase.  Have we made it there yet?  Obviously, we have not.  Even though the site is far faster - and the servers are far less loaded under normal conditions - we still periodically get overloaded with table locking issues.  I will never say that our problems are 100% solved.  (Recent) history has shown us that once you fix one problem, another one will come and take its place.  It might be in 20 minutes, it might be in a week, or it might be in six months.  The changes that we made yesterday look promising for speeding up the system, buying us performance headroom and reducing the chances of deadlocks within MySQL.  But there are still portions of the site that are susceptible to the locking limitations of MySQL.  We will be able to work around them - but we are * with critical internal portions of the site and the deep dark guts of MySQL.  It takes time to create and test these workarounds.
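
As one concrete example of the kind of workaround involved - a sketch of a standard technique, not a description of iStock's actual fixes - applications often catch MySQL's deadlock and lock-wait-timeout errors and simply retry the transaction after a short pause:

```python
# Generic sketch: retry a transaction when MySQL reports a deadlock (1213)
# or a lock wait timeout (1205). How the error code is exposed depends on
# the client library, so the extraction below is an assumption.
import time

ER_LOCK_DEADLOCK = 1213      # "Deadlock found when trying to get lock"
ER_LOCK_WAIT_TIMEOUT = 1205  # "Lock wait timeout exceeded"

def run_with_retry(conn, work, max_attempts=3, backoff=0.1):
    """Run work(cursor) inside a transaction, retrying on lock errors."""
    for attempt in range(1, max_attempts + 1):
        cur = conn.cursor()
        try:
            work(cur)
            conn.commit()
            return
        except Exception as exc:
            conn.rollback()
            code = exc.args[0] if getattr(exc, "args", None) else None
            if code in (ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT) and attempt < max_attempts:
                time.sleep(backoff * attempt)   # brief pause, then try again
                continue
            raise

# usage: run_with_retry(conn, lambda cur: cur.execute("UPDATE ... WHERE ..."))
```

Retrying only papers over the contention, of course, which is why the queries and schema behind the hottest writes still have to change over time.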

We are still working on creating workarounds, optimizing queries, refactoring site features, and upgrading hardware and infrastructure.  We are not sitting around waiting for the MySQL company to solve all our problems for us.  Nor are we doing this all on our own.  We are using MySQL support to help diagnose and solve the internal MySQL problems that we are facing.  We are working with leading MySQL consultants from outside the MySQL corporation.  We are talking to other consultants, software vendors and hardware vendors about other solutions that could help us out.



Then there are solutions that promise huge returns, but would require re-architecting the entire system: rewriting a very large portion of the code and database queries while deploying a new server infrastructure.  There were reasons why these approaches were not taken seven years ago when the company was starting - with 5 people running a hobby site on a second-hand server.  Now that we are the size we are, it is impossible to switch architectures quickly.  To change database technologies, or to use MySQL in a fundamentally more scalable way, would require rewriting and retesting pretty much the entire site.

Can we continue to grow indefinitely with MySQL?  Absolutely.  We have looked at how other large sites using MySQL have re-architected to handle much larger loads than iStock currently gets.  We can see how this would fit into our system - but we cannot re-architect and rebuild the system overnight.

Can we switch to a different database technology?  Sure we can.  It would take at least the same amount of effort as re-architecting with MySQL.  SQL is supposed to be a standard, but there are enough differences - especially when optimizing for such a large system - that we would have to review and test pretty much every line of code.  Large commercial database systems may avoid some of the bugs and problems that we face with MySQL - but they would come with their own set of problems and limitations.  No system is perfect.  We would be trading our current set of problems for a new set of problems.

While these large projects are not fast or easy, they are essential to our long-term future.  We are not just hoping that our current fixes and optimizations will get us by forever.  I am not going to tell you here which approach we are taking - except to say that we are looking forward to growing the site to ten times, fifty times, five hundred times the size of what we are today.

      


« Reply #1 on: September 26, 2007, 14:30 »
Thirteen hundred and fifty-two words. Wow.  :D

« Reply #2 on: September 26, 2007, 14:40 »
I'll say!!
It is nice they are keeping people posted, though.  :)

« Reply #3 on: September 26, 2007, 15:09 »
Not that I really care much, but the upload limits have been reduced again (15/week for bronze).

This doesn't matter one bit to me (since I won't fill up the 15 either), but I was wondering: are any of you really maxing out the limit every week?

And if iStock buyers go somewhere else, it is not particularly bad for me, since I am on most of the other sites (and they will eliminate the competition from exclusive contributors ;D).

I am curious what is going to happen in the long run (i.e. over the next few months).


 

