# reliable-download

Reliable Download is designed to download large files over unreliable
networks via HTTP. It is especially effective when transferring files across
[the GFW](https://en.wikipedia.org/wiki/Great_Firewall). It has a server side
called rd-api and a client side called rd. To use it, run rd-api on the source
node to serve the file, and run rd on the target node to download it. You can
think of it as an enhanced HTTP static file server. To learn how reliable
download works, see `rd-api --help` below.

## Installation

Reliable download is developed and tested on Linux. Windows and macOS are not
supported. In theory the code can be ported to other operating systems, but I
don't have the time to handle testing and distribution.

Both the server side and the client side of reliable download are written in
Haskell. However, they are distributed on [PyPI](https://pypi.org/) so that
users can install them more easily, because Python and pip are usually bundled
with Linux systems. To install rd-api and rd, see their PyPI pages below. The
same doc is kept in git as well, see `./pypi/rd-api/README.rst`.

- [rd-api on pypi](https://pypi.org/project/rd-api/)
- [rd on pypi](https://pypi.org/project/rd/)

## Design and Command Line Help

Here is the command help:

```
$ rd-api --help
rd-api - reliable download server

Usage: rd-api [-h|--host HOST] [-p|--port PORT] [--redis-host REDIS_HOST] 
              [--redis-port REDIS_PORT] [-d|--web-root DIR] [-w|--worker INT] 
              [-v|--verbose] [-V|--version]
  rd-api is an HTTP file server that provides static file hosting and reliable
  download api for rd client.
  
  rd-api serves files under web-root. You can use it like python3 -m http.server
  
  In addition, if rd command line tool is used to do the download, it will
  download in a reliable way by downloading in 2MiB blocks and verify checksum
  for each block.
  
  Usage:
      server side:
          $ ls
          bigfile1 bigfile2
          $ rd-api --host 0.0.0.0 --port 8082

      client side:
          $ rd http://server-ip:8082/bigfile1
  
  Reliable download is implemented this way:
  
  - user uses rd client to request a resource to download.
  - rd client requests resource block metadata via the /rd/ api. block metadata
    contains block count, block id, block byte offset, block content sha1sum.
  - rd-api calculates and serves block metadata to rd client incrementally.
    block metadata is cached in redis after calculation.
  - rd client fetches block and verifies sha1sum incrementally. When all blocks
    are downloaded and verified, combine blocks to get the final resource.
  - rd client will retry on http errors and sha1sum verification failures.
  - rd client supports continuing a partial download. You can press Ctrl-C to
    stop download anytime, and continue later by running the same command again.

Available options:
  -h,--host HOST           http listen host (default: "::")
  -p,--port PORT           http listen port (default: 8082)
  --redis-host REDIS_HOST  redis host (default: "127.0.0.1")
  --redis-port REDIS_PORT  redis port (default: 6379)
  -d,--web-root DIR        web root directory (default: ".")
  -w,--worker INT          how many concurrent workers to calculator sha1sum for
                           file (default: 2)
  -v,--verbose             show more debug message
  -V,--version             show program version and exit
  -h,--help                Show this help text
```
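The block metadata described in the help text above can be sketched roughly as
follows. This is a minimal Python illustration, not the Haskell code that
rd-api actually runs. The function name `block_metadata` and the record field
names are hypothetical, and the real /rd/ api response format may differ. Only
the 2MiB block size and the use of SHA-1 are taken from the help text.

```python
import hashlib
import io

BLOCK_SIZE = 2 * 1024 * 1024  # 2MiB, as stated in the rd-api help text


def block_metadata(stream, block_size=BLOCK_SIZE):
    """Yield one metadata record per block of a buffered binary stream.

    The field names are hypothetical; the real /rd/ api format may differ.
    Records are produced lazily, one block at a time, so a caller can start
    using the first records before the whole file has been hashed.
    """
    block_id = 0
    offset = 0
    while True:
        data = stream.read(block_size)
        if not data:
            break  # end of stream
        yield {
            "id": block_id,
            "offset": offset,
            "size": len(data),
            "sha1": hashlib.sha1(data).hexdigest(),
        }
        block_id += 1
        offset += len(data)


if __name__ == "__main__":
    # A tiny block size keeps the demo readable; the last block is shorter.
    demo = io.BytesIO(b"a" * 10 + b"b" * 10 + b"c" * 5)
    for record in block_metadata(demo, block_size=10):
        print(record)
```

Because each block has its own checksum, a corrupted block can be detected and
fetched again without touching the rest of the file.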

```
$ rd --help
rd - reliable download client

Usage: rd [-r|--block-max-retry INT] [-k|--keep] [-l|--rolling-combine] 
          [-d|--temp-dir TEMP_DIR] [-o|--output-dir OUTPUT_DIR] 
          [-w|--worker INT] [-f|--force] [-i|--progress-interval N] 
          [-v|--verbose] [-V|--version] [URL...]
  Download large files across slow and unstable network reliably. Requires using
  rd-api on server side. For more information, see rd-api --help

Available options:
  -r,--block-max-retry INT max retry times for each block (default: 30)
  -k,--keep                keep block data when download has finished and
                           combined
  -l,--rolling-combine     delete each block data right after combine, conflict
                           with --keep
  -d,--temp-dir TEMP_DIR   the dir to keep block download data
                           (default: ".blocks")
  -o,--output-dir OUTPUT_DIR
                           the dir to keep the final combined file
                           (default: ".")
  -w,--worker INT          concurrent HTTP download worker (default: 5)
  -f,--force               overwrite exiting target file in OUTPUT_DIR
  -i,--progress-interval N how often to show download progress, in seconds
                           (default: 10)
  -v,--verbose             show more debug message
  -V,--version             show version number and exit
  -h,--help                Show this help text
```
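The client behaviour described above (verify each block, retry on failure,
resume a partial download, then combine) can be sketched in Python as below.
This is an illustration under stated assumptions, not how rd is implemented:
`download_verified`, `fetch_block`, the `.block` file naming, and the metadata
keys are all hypothetical, the sketch runs sequentially instead of using
concurrent workers, and `max_retry` is treated as the total number of attempts
per block.

```python
import hashlib
import os
import tempfile


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def download_verified(blocks, fetch_block, temp_dir, output_path, max_retry=30):
    """Fetch, verify, and combine blocks.

    blocks: list of dicts with hypothetical keys "id" and "sha1".
    fetch_block: callable taking a block id and returning bytes. It stands in
    for the HTTP request and may raise OSError or return corrupted data.
    """
    os.makedirs(temp_dir, exist_ok=True)
    for block in blocks:
        path = os.path.join(temp_dir, f"{block['id']}.block")
        # Resume support: keep a block already on disk only if it verifies.
        if os.path.exists(path):
            with open(path, "rb") as f:
                if sha1_hex(f.read()) == block["sha1"]:
                    continue
        for _attempt in range(max_retry):
            try:
                data = fetch_block(block["id"])
            except OSError:
                continue  # transport error: try again
            if sha1_hex(data) == block["sha1"]:
                with open(path, "wb") as f:
                    f.write(data)
                break
            # checksum mismatch: fall through and try again
        else:
            raise RuntimeError(f"block {block['id']} failed after {max_retry} tries")
    # Every block is verified at this point, so combine them in order.
    with open(output_path, "wb") as out:
        for block in blocks:
            with open(os.path.join(temp_dir, f"{block['id']}.block"), "rb") as f:
                out.write(f.read())


if __name__ == "__main__":
    parts = [b"hello ", b"reliable ", b"world"]
    blocks = [{"id": i, "sha1": sha1_hex(p)} for i, p in enumerate(parts)]
    calls = {"n": 0}

    def flaky_fetch(block_id):
        # Simulated unreliable network: every other call returns bad data.
        calls["n"] += 1
        return b"garbage" if calls["n"] % 2 == 1 else parts[block_id]

    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "result")
        download_verified(blocks, flaky_fetch, os.path.join(tmp, ".blocks"), out)
        with open(out, "rb") as f:
            print(f.read())
```

Running the function a second time with the same `temp_dir` fetches nothing,
because every block on disk already matches its checksum. This mirrors what
happens when a download is stopped with Ctrl-C and the same command is run
again.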

## Developer Notes, Build the Project by Yourself

See the `./operational` file for developer notes.

To build the project, install the [stack tool](https://www.haskellstack.org/),
then run:

```
stack build --pedantic
```

The binaries are produced in the `$(stack path --local-install-root)/bin/`
directory.

You may also run the built-in tests:

```
stack build --pedantic --test
```

## Difference With Other Similar Tools

BitTorrent can be used to transfer big files reliably across an unreliable
network. I like the protocol a lot. But you need to create a torrent file in
advance and either use a public tracker or run your own tracker server. That
is too much work just to share a single big file.

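curl, wget, and aria2 can be used to download files via HTTP. But given only a
plain URL, they have no block checksums to verify the downloaded data against.
On an unreliable network this makes "continue downloading a partially
downloaded file" unsafe, because corrupted data already on disk goes
undetected.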

## License

Reliable download is released under GPLv3+. Source code can be found at
https://gitlab.emacsos.com/sylecn/reliable-download