Wednesday, October 13, 2010

Copy, send or share?

Transferring bulk, user-defined data between computers connected over the Internet is one of the most crucial operations on the modern Internet. Its importance has grown in recent years, especially because of the wide adoption and growing complexity of user-generated-content practices.
File sizes are constantly growing, and the question of how to transfer data between computers is getting more and more attention in the community.
There are three possible solutions: send the file, share the file, or copy the file.
Each of them has its benefits and downsides. Let's take a closer look at them.

Sending the file is the common way to forward a small number of files to one or several dedicated recipients. The great benefit is that the files are preserved for a long time in a well-known place (the e-mail account), which makes them available from anywhere and easy to find, as they are naturally annotated by the message they are attached to.
What are the downsides here? If you have ever tried to send even a moderately sized file by e-mail, you already have a bad feeling in your stomach. Yes, the transmission is very slow and error-prone, it cannot be resumed on failure, the resulting data is blown up by about a third, and in most cases the recipient's server will reject files larger than several megabytes. Nothing surprising here: using e-mail to send big files is like laying asphalt with a sports car. E-mail services were developed to send text messages, not bulk data.
Even if it were possible to send the data by e-mail, it wouldn’t be much help, as the e-mail account would be bloated with big, rarely used files.
So, over the years, sending the file has remained most popular for small and “busy” files (small informative pictures or text documents), and it is not appropriate for any kind of serious file transfer.

In this sense, file sharing is the “big brother” of e-mail. File sharing services provide both space to store files on the Internet and the tools needed to upload, download and manage files in this space. The great benefit here is performance, especially when there are several potential recipients of the file. The file is uploaded once and can then be downloaded as often as you like by any number of recipients. The main bottleneck is the upload, as uplinks are often very slow, but file sharing shows its strong side precisely in downloading.
Great! Amazing technology, no doubt... what could be wrong here? There are a couple of problems. First of all, it is not really free of charge. To get all the benefits of such a service you must pay a monthly fee. And it takes great luck to find an honest provider that charges only as much as it should.
Free accounts are artificially limited in several respects: the maximum file size is limited, the total space available is limited, and the upload and download speeds are throttled by the provider. In the end you get conditions that offer no significant benefit over the notorious e-mail in either speed or file size.
Another problem with file sharing services concerns privacy. In order to use these services you must upload your data to a third-party server: you deliberately hand your property over to somebody else. For most of us, this is probably not exactly what we are happy to do. It is therefore important to read the terms and conditions of such sites carefully, especially for free accounts. Taking a look at what the community writes in forums about such a provider is the best idea here anyway.

What is left for us is file copying. These services allow you to copy data directly from one computer to another. No intermediate server space is used to store an intermediate “shadow” copy of the file. The disadvantage, of course, is that the file cannot be “shared” and downloaded later on, as in the case of file sharing. But is that what we need all the time? There are a lot of situations when you just need to transfer the file once, right now, to a person who is waiting for the transmission somewhere else on the Net. What you need in this case is full speed, the absence of any artificial limits, and a reliable transfer service. If that is the case, copying is exactly what you need. In addition to the limit-free mode of operation, neither your files nor any parts of them are retained or abused by any third party. So, let's try Click2Copy (http://click2copy.com), one of the few true file copy providers on the Net.

Sunday, October 10, 2010

What's wrong with FTP?

FTP is a very old and thoroughly proven protocol. It was developed in the early 1970s and later became part of the much larger TCP/IP family of protocols, which has its origins in American military research.
The idea was simple: in order to reduce risks and maintenance costs and to increase compatibility, the whole complex of network operations was divided into several layers, each of which abstracts some particular aspect of data transmission. On top of the lower layers, the TCP protocol was defined. This protocol allows any amount of data to be sent from one network location to another.
FTP uses TCP (the TCP layer) to transmit files over the provided channel.
Nice, great idea. But... as FTP uses TCP as its underlying layer, it also inherits the properties of TCP. And if we look at them very closely, we can see that TCP is not the best method for bulk data transmission.
TCP is a reliable stream protocol. This means that data is delivered sequentially: one byte after another. This is very important for real-time data such as video or audio streams, as the data must be presented on the screen in its natural order (that is, in the order in which the stream was generated). In general this is not bad, but for file transfer you need to transfer the whole file and not just some parts of it: the order of arrival does not matter for ordinary files, as you either get the whole file or nothing. In that sense, using TCP as the underlying service protocol is not really necessary...
The main problem with TCP is that it enforces the transmission sequence, which is not needed for a file transfer application layer such as FTP.
If you throw out the sequencing requirement and define a protocol that transmits the file as a whole, you will of course gain speed, as you are no longer bound by the additional condition of sequential transmission.
In that sense FTP is a very cheap file transfer protocol: it builds on a ready-to-use reliable data transfer service, but is not very efficient.
Thus, for use cases where the speed and performance of data transmission are very important (especially for large-scale data transfer), TCP-based protocols such as FTP should be replaced with more efficient UDP-based ones. That's why companies are buying commercial products that replace the old FTP-based infrastructure.
Refer to the MFT (managed file transfer) page on Wikipedia for more information on possible solutions.
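
The point about ordering can be illustrated with a small sketch. It is not a network protocol and not how any particular product works; it only shows that a file split into chunks tagged with their offsets can be rebuilt no matter in which order the chunks arrive. The function names and the chunk size are assumptions made up for this example, and the sketch assumes that no chunk is delivered twice.

```python
import random


def split_into_chunks(data: bytes, chunk_size: int) -> list[tuple[int, bytes]]:
    """Split data into (offset, payload) chunks that can travel independently."""
    return [
        (offset, data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def reassemble(chunks, total_size: int) -> bytes:
    """Rebuild the file from chunks received in any order (no duplicates assumed)."""
    buffer = bytearray(total_size)
    received = 0
    for offset, payload in chunks:
        # Each chunk carries its own position, so arrival order is irrelevant.
        buffer[offset:offset + len(payload)] = payload
        received += len(payload)
    if received != total_size:
        # All or nothing: an incomplete file is treated as no file at all.
        raise ValueError("file incomplete: some chunks are missing")
    return bytes(buffer)


original = bytes(range(256)) * 40          # 10,240 bytes of sample data
chunks = split_into_chunks(original, 1000)
random.shuffle(chunks)                     # simulate out-of-order arrival
restored = reassemble(chunks, len(original))
print(restored == original)
```

A real UDP-based transfer would additionally have to detect lost chunks and request them again; that part is left out here.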

Sunday, September 26, 2010

How to move large files around

Probably every one of us has at some point encountered the simple problem of sending an arbitrary file to another computer over the Internet. This sounds like an easy and well-known task which has been solved many thousands of times. However, the truth is different. So, let us consider this once again.

Of course, most people use e-mail services to send copies of their files on a daily basis. If the file is big, people commonly use file sharing services like RapidShare to upload and subsequently download the file. This looks familiar and common, but is it also the best solution? Let us take a closer look at how both of these methods work.

Both methods obviously rely on intermediate storage located somewhere on the Internet. This storage (an e-mail account or a shared file folder) holds a shadow copy of the file until the file is requested by the receiver. So, the whole process of sending the file to the receiver involves two file transmissions: the file is first copied to the remote storage, and then copied once again to the recipient’s computer. If we just need to copy the file once from one computer to another, why would we do that? Is it not irrational to copy the file twice?
If the cost of this intermediate copy is not too high and making multiple copies is fast enough, this is of course not a problem. But is it really like this? In most cases it is not.
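
To put rough numbers on the double copy, here is a back-of-the-envelope sketch. All figures (a 700 MB file, a 1 Mbit/s uplink, a 2 Mbit/s download rate) are assumptions chosen for illustration, and the model is deliberately simple: with intermediate storage the download starts only after the upload has finished, while a direct copy runs at the speed of the slower of the two links.

```python
def transfer_seconds(size_mb: float, rate_mbit_s: float) -> float:
    """Time to move size_mb megabytes over a link of rate_mbit_s megabits per second."""
    return size_mb * 8 / rate_mbit_s


def via_intermediate(size_mb: float, sender_up: float, recipient_down: float) -> float:
    """Upload to the storage first, then download from it (two transmissions)."""
    return transfer_seconds(size_mb, sender_up) + transfer_seconds(size_mb, recipient_down)


def direct_copy(size_mb: float, sender_up: float, recipient_down: float) -> float:
    """One transmission, limited by the slower of the two links."""
    return transfer_seconds(size_mb, min(sender_up, recipient_down))


size_mb = 700          # assumed file size
sender_up = 1.0        # assumed sender uplink, Mbit/s
recipient_down = 2.0   # assumed rate the recipient actually gets, Mbit/s

two_step = via_intermediate(size_mb, sender_up, recipient_down)
one_step = direct_copy(size_mb, sender_up, recipient_down)
print(f"via storage: {two_step / 60:.0f} min, direct: {one_step / 60:.0f} min")
# prints "via storage: 140 min, direct: 93 min"
```

The gap widens further when the provider throttles the download of a free account, since that lowers the rate the recipient actually gets.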

Most file sharing providers impose a lot of restrictions on free accounts: both the maximum file size and the bandwidth (i.e. speed) are limited in order to keep costs low. The paid accounts must cover the costs of the free accounts.
The usefulness of e-mail accounts is very limited, as e-mail transfer protocols are not optimized for bulk data transfer. The maximum file size is limited to 10-20 MB, and attaching files to e-mails blows them up by about 1/3 of their size (http://ask-leo.com/why_are_emailed_attachments_larger_than_the_original_file.html).
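
The one-third figure comes from the Base64 encoding commonly used for e-mail attachments, which represents every 3 bytes of the file as 4 text characters. The following sketch measures the effect with Python's standard library; the sample data and its size are arbitrary assumptions, and real messages add a little more on top (headers, CRLF line endings).

```python
import base64

# Arbitrary sample "attachment" of 3,000,000 bytes (an assumption for the demo).
raw = bytes(range(256)) * 11_719
raw = raw[:3_000_000]

# encodebytes produces MIME-style Base64 with a line break after every 76 characters.
encoded = base64.encodebytes(raw)

overhead = len(encoded) / len(raw) - 1
print(f"original: {len(raw)} bytes, encoded: {len(encoded)} bytes")
print(f"overhead: {overhead:.1%}")   # a bit more than one third
```

So a 15 MB attachment already counts as roughly 20 MB against the server's size limit.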

In addition: why would you want to copy things twice and rely on third-party providers and their contract conditions if you could do it directly? It is as if, in order to copy your photo archive to your friend’s computer, you archived the files, split the archive into several 1.44 MB chunks, copied them onto 3.5” diskettes, and then copied them back onto your friend’s disk and reassembled the fragments. What sounds odd happens every day on the Internet with file sharing and similar services.

What is really needed here is a service that allows you to copy the file peer-to-peer, directly from one computer to another, without any intermediate parties.
Easy to say, hard to do. For a long time people have had to use complex software on both computers in order to accomplish this simple task. That’s why true file copying hasn’t really caught on across the Internet until now.
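
To make the idea of a direct copy concrete, here is a minimal sketch in which one side listens and the other side connects and streams the file straight to it, with no storage in between. It is only an illustration under strong assumptions: both "computers" are simulated on the local machine (127.0.0.1), the connection is plain and unencrypted, and the receiver is directly reachable, which is often not the case behind home routers and firewalls. The function names are made up for this example, and it does not describe how any particular service is implemented.

```python
import hashlib
import socket
import threading


def receive_file(server: socket.socket, result: dict) -> None:
    """Accept one connection and read bytes until the sender closes it."""
    conn, _ = server.accept()
    received = bytearray()
    with conn:
        while True:
            block = conn.recv(65536)
            if not block:          # empty read: sender has finished
                break
            received.extend(block)
    result["data"] = bytes(received)


def send_file(host: str, port: int, data: bytes) -> None:
    """Connect directly to the receiver and stream the whole file."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(data)


payload = b"holiday photo archive " * 50_000   # about 1.1 MB of sample data

# Receiver side: listen on a free local port chosen by the operating system.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

result = {}
receiver = threading.Thread(target=receive_file, args=(server, result))
receiver.start()

# Sender side: one direct transmission, no intermediate copy.
send_file("127.0.0.1", port, payload)
receiver.join()
server.close()

# Compare checksums to confirm the copy is identical to the original.
same = hashlib.sha256(result["data"]).digest() == hashlib.sha256(payload).digest()
print(same)
```

The hard parts that make real peer-to-peer copying difficult (finding the other computer, getting through routers and firewalls, securing the channel) are exactly what this sketch leaves out.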

But... snow melts, things change... Recently, a new independent provider announced a service that would solve all of these problems: a true peer-to-peer and easy-to-use file copying service. Its name is Click2Copy (http://www.click2copy.com). This service works the way you would expect it to: it connects two computers on the Internet through a secured channel and copies files of any size between them. There are no limits: files are sent at the sender’s full upload bandwidth through a direct connection, free of charge, point-to-point. No special software is required: the communicating parties run in the Java VM as an applet in the browser.