Upload is such a fundamental aspect of any web service. And still, there's no simple, easy solution out there. It might seem I'm exaggerating, but for such basic functionality, once you add the expectation of a 'resume' upload, you'll see the problem I'm referring to.
So, for us at krossover.com, the upload process has been a challenge, for a number of reasons:
- The uploaded content is video. Raw, uncompressed videos are uploaded, usually directly from the video camera or from DVD. On average, an upload is around 4 to 5 GB
- We have always focused on providing the service as a web application (no installations, just a browser; our users are coaches who use school devices where they may not have administrative privileges to install applications) and do not have any desktop application that could help with the upload and resume process. Though desirable, and we may end up building one at some point, we have none for now
- We cannot really control upload speeds at the user's end, and most Internet Service Providers out there provide extremely low upload bandwidth
- So uploads take hours. And the laptops, desktops, or other machines used for uploads may go into sleep or hibernate (especially on Windows) or simply run out of battery (in the case of laptops). For umpteen reasons, the upload process might break!
- Hence we needed 'resume' upload functionality. We did have it. Or did we? Well, if an upload of 5 files broke on the 3rd, next time you would need to upload only the 3 remaining files, though the 3rd had to be started over from scratch. And you had to notify the user about the failed upload, and the user had to select the remaining files and re-initiate the upload
- So after spending an hour or more uploading, if it broke for any reason, it was quite frustrating for the end user to have to deal with the broken upload and figure out how to resume
- While the upload happens, the user cannot go to any other page. We tried to solve this problem with a small pop-up that gets triggered when the user starts the upload. But who doesn't hate pop-ups?
All of the above is quite complicated! You would hate for your end users to have to put up with it all. The browser and HTML do not help you come up with a solution. Well, HTML5 now supports WebSockets, which might help, but they are still not supported by all browsers.
Hmm. So what can be done? Use an embedded runtime component implemented in Java, Flash, or Silverlight, and implement the resumable upload functionality there! That adds a lot of complexity to this fundamental upload process. We'll see how. I am just going to go ahead and describe how we did it at Krossover.com.
For some good reasons, wide use and acceptance being one, we chose Silverlight. In fact, we had started with an open-source library, Plupload, during our first roll-out. But since then, we have revisited and tweaked the upload code-base so many times that it has turned out totally different and independent from it. So, we started with the Silverlight upload solution provided by Plupload. It's a great, ready-to-use library with many useful features; you can check out their site. But it did not have the 'resume' upload functionality, which was so very crucial for us.
Well, for an upload to be resumable, a few things need to happen:
- Save the start-state of your upload, with all the information about the upload, even before the upload starts
- Start the upload and constantly update the upload-state information as each part of the upload completes
- In case the upload breaks and has to be resumed, implement a server-side mechanism that:
  - automatically detects that there is a broken upload for a particular user when that user revisits (be user-friendly)
  - provides a server-side service (API) that can be called for information about the upload-state of a particular upload
  - triggers the resume-upload process
- Once the resume-upload process is triggered, it needs access to the content that did not get uploaded completely. The browser does NOT allow that, for obvious security reasons, so you need to tackle that problem
- Finally, having access to the content locally on the user's machine and the upload-state from the server-side API, resume the upload process
- In return for saving the start-state, the server-side API provides a GUID, i.e. the upload identifier, for this particular upload. The upload-state information gets saved and identified by that GUID henceforth
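To make the start-state and GUID idea concrete, here is a minimal sketch of what such a server-side service could look like. This is not our actual code (which lives behind a real database and web framework); all function and field names here are hypothetical:

```python
import uuid

# In-memory stand-in for the upload-state table in the database.
_uploads = {}

def register_upload(user_id, filename, total_size, chunk_size):
    """Save the start-state before the first byte is sent; return the GUID."""
    guid = str(uuid.uuid4())
    _uploads[guid] = {
        "user_id": user_id,
        "filename": filename,
        "total_size": total_size,
        "chunk_size": chunk_size,
        "bytes_received": 0,
        "complete": False,
    }
    return guid

def record_chunk(guid, chunk_length):
    """Update the upload-state each time a chunk arrives."""
    state = _uploads[guid]
    state["bytes_received"] += chunk_length
    state["complete"] = state["bytes_received"] >= state["total_size"]
    return state

def pending_upload_for(user_id):
    """Detect a broken (incomplete) upload when the user revisits."""
    for guid, state in _uploads.items():
        if state["user_id"] == user_id and not state["complete"]:
            return guid
    return None
```

The key point is that the GUID is minted before any content moves, so even an upload that dies on its very first chunk leaves a record that the revisit check can find.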
- The Silverlight component actually spawns 2 Silverlight worker threads: the 1st uploads the content to the remote server chunk by chunk, and the 2nd copies the content locally into the user's 'Isolated Storage', where Silverlight has access permissions (using Silverlight's Isolated Storage has been our answer to the browser's restriction against direct access to the user's file system)
- The local Isolated Storage copy is done wisely. It copies in reverse order (i.e. the last parts are copied first) and constantly checks whether it has reached a part that has already been uploaded. If so, it simply stops, because there is already enough content copied locally to be able to resume the upload
- On every page there is a light-weight service that checks whether there is any upload to be resumed for the user. If there is, it triggers it
- This gives the user the ability not just to resume the upload, but to browse to any page while the upload continues under the hood. Of course, this assumes the local copy has already finished (for most practical purposes, the local copy takes less than 5 minutes, so we simply tell the user that the 'upload is being prepared' during this time, and then start showing the upload progress)
- For the resume-upload, there is a server-side API that pulls the upload-state information from the database, as well as retrieves the uploaded-file information from the local filesystem. That helps the Silverlight client determine exactly which byte it should resume the upload at
- To add more to the mix, to make uploads scalable, they need to happen on different machines, so that many users uploading at the same time (burst traffic loads) don't bottleneck on one machine's upload bandwidth. This complicates resume, as an upload must be resumed on the particular server where it was started
- For video content, you will most likely want to transcode and compress the uploaded files. In our case, we even 'stitch' those videos into a single file that corresponds to a game. To be able to do all that, you definitely want a given upload to end up entirely on the same machine, to keep transcoding and stitching relatively simple
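The reverse-order copy can be sketched in a few lines. This is a simplified model (in Python rather than the Silverlight C# we actually use), where `uploaded_count` stands in for the live check of how many chunks the upload thread has already pushed to the server:

```python
def copy_for_resume(source_chunks, uploaded_count):
    """Copy chunks into local storage in reverse order (last chunk first),
    stopping as soon as we reach a chunk the server already has.

    source_chunks: the chunk payloads making up the file.
    uploaded_count: number of leading chunks already safely uploaded.
    Returns {chunk_index: payload} for every chunk a resume could still need.
    """
    local_store = {}
    for index in range(len(source_chunks) - 1, -1, -1):
        if index < uploaded_count:
            # Everything at and below this index is already on the server,
            # so there is no need to keep a local copy of it.
            break
        local_store[index] = source_chunks[index]
    return local_store
```

Copying from the end is what makes the early stop possible: the upload thread consumes chunks from the front while the copy thread works from the back, so the two meet in the middle and the copy never duplicates work the upload has already finished.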
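As an illustration of the "which byte" determination, here is a hedged sketch of how a server could reconcile what is actually on disk with a chunk size from the upload-state. The `<guid>.part` naming and the single-append-file layout are assumptions for this example, not necessarily our real storage layout:

```python
import os

def resume_offset(upload_dir, guid, chunk_size):
    """Determine the exact byte at which the client should resume.

    Trusts the filesystem over the database record, since a chunk may have
    been recorded but lost, or written but never recorded. Assumes chunks
    are appended in order to a single '<guid>.part' file.
    """
    part_path = os.path.join(upload_dir, guid + ".part")
    if not os.path.exists(part_path):
        return 0
    bytes_on_disk = os.path.getsize(part_path)
    # Resume at the start of the first incomplete chunk, discarding any
    # trailing partial chunk that was cut off mid-transfer.
    return (bytes_on_disk // chunk_size) * chunk_size
```

Rounding down to a chunk boundary trades re-sending at most one chunk for never having to validate a half-written tail.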
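The server-affinity requirement above can be captured in a small routing rule. The least-loaded choice for new uploads is one plausible policy, not necessarily the one we use; the essential part is that a resume is pinned to the server recorded in the upload-state:

```python
def route_upload(upload_state, servers, active_counts):
    """Route an upload request to an upload server.

    New uploads go to the least-loaded server; resumes MUST go back to the
    server already recorded in the upload-state, because the partial file
    lives on that machine's local filesystem (and transcoding/stitching
    later need the whole upload on one box).
    """
    if upload_state.get("server"):
        return upload_state["server"]
    chosen = min(servers, key=lambda s: active_counts.get(s, 0))
    upload_state["server"] = chosen  # pin all future chunks and resumes
    return chosen
```

Persisting the chosen server inside the upload-state is what lets any front-end node answer a resume request correctly, without sticky load-balancer sessions.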
I agree. It's NOT simple, for something so fundamental to any web service. Implementing this solution has been tough, a bit crazy, I would say. But we finally managed to do it all, and it has been worth it! Upload is now a simpler, less taxing part of our service for the user. I've heard a few compliments that made me feel happy and proud!