Yuanle Song's avatar
Yuanle Song committed
* COMMENT -*- mode: org -*-
#+Date: 2018-05-04
Time-stamp: <2024-04-08>
#+STARTUP: content
* notes                                                               :entry:
** 2022-03-14 project dir structure
- lib           shared code for server and client, the reliable-download library
- rd-api        rd-api cli tool, server side
- rd            rd cli tool, client side
- misc/         learning tools and temp code
- test/         tests
- package.yaml  stack project description
- pypi/         for release on pypi, see pypi/Makefile

** 2018-05-09 how to release latest code on PyPI? how to make a release?
- update version number in
  - lib/RD/CliVersion.hs (required, used by pypi pkg)
  - package.yaml (optional)
- build binary
  stack build --test --pedantic
  stack exec hlint -- -g
- update README file. add ChangeLog entry on *README.rst
  ./README.md
  ./pypi/rd-api/README.rst
  ./pypi/rd-client/README.rst
- build wheel and test it in production server
  make dist -C pypi

  wheel will be built in pypi/rd-api/dist, pypi/rd-client/dist dir.

- release binary on PyPI

  export RD_API_TWINE_TOKEN=xxx
  export RD_CLIENT_TWINE_TOKEN=xxx
  make all -C pypi

  To release only the server:
  export RD_API_TWINE_TOKEN=xxx
  make api -C pypi

  To release only the client:
  export RD_CLIENT_TWINE_TOKEN=xxx
  make client -C pypi

- problems
  - how to sync build files for pypi?
    rsync -n -air --files-from=pypi/build_files ./ s02:projects/reliable-download/
    rsync -air --files-from=pypi/build_files ./ s02:projects/reliable-download/

** 2018-05-08 example run in prod env
- try it on de03

  on ryzen5,
  cd ~/projects/reliable-download/
  FN=`stack exec which rd-api`
  gzip -k "$FN"
  scp "$FN.gz" de03:d/
  on de03,
  cd ~/d/
  gunzip rd-api.gz
  chmod +x rd-api
  env WEB_ROOT=$PWD ./rd-api
  curl -v http://de03.dev.emacsos.com:8082/rd/
  curl -I http://de03.dev.emacsos.com:8082/virtio-win-0.1.215.iso
  516M virtio-win-0.1.215.iso

  on ryzen5,
  tmake stack exec rd -- -d ~/d/.blocks -o ~/d/ http://de03.dev.emacsos.com:8082/virtio-win-0.1.215.iso
  tmake ~/d/rd -d ~/d/.blocks -o ~/d/ http://de03.dev.emacsos.com:8082/virtio-win-0.1.215.iso
  below is a run log from old rd version.
  #+BEGIN_SRC sh
    sylecn@ryzen5:~/projects/reliable-download$ tmake stack exec rd -- -d ~/d/.blocks -o ~/d/ http://138.201.95.248:8082/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    Tue May  8 00:45:31 CST 2018
    will start timed run in 3 sec
    running command: stack exec rd -- -d /home/sylecn/d/.blocks -o /home/sylecn/d/ http://138.201.95.248:8082/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    GET /rd/ api ok
    Downloading file: "gitlab-ce_10.3.5-ce.0_amd64_xenial.deb", 377 MiB, 189 blocks
    189 new block(s) ready on server side
    combining blocks to create /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    file downloaded to /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    all urls downloaded.
    started at 2018-05-08 00:45:34
    stopped at 2018-05-08 00:47:20
    Duration: 106 seconds
    sylecn@ryzen5:~/projects/reliable-download$
  #+END_SRC
  except for lacking progress info and download speed info.
  download works perfectly.
  sha1sum for the whole file matches.

- 2018-05-08 when using 5 threads for the download.
  #+BEGIN_SRC sh
    block 188 fetched
    combining blocks to create /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    file downloaded to /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    all urls downloaded.
    started at 2018-05-08 21:56:26
    stopped at 2018-05-08 21:57:42
    Duration: 76 seconds
  #+END_SRC

** 2018-05-06 how to run rd-api in dev env
- how to run rd-api

  cd ~/projects/reliable-download/
  env WEB_ROOT=/home/sylecn/persist/cache stack exec rd-api

- Test it is working:

  static file hosting:
  curl -XGET http://localhost:8082/sdkman.sh

  rd api:
  curl -XGET http://localhost:8082/rd/ideaIC-2018.1.tar.gz
  curl -XGET http://localhost:8082/rd/ideaIC-2018.1.tar.gz | jq .

  To clear cached file status for ideaIC-2018.1.tar.gz,
  redis-cli del "/home/sylecn/persist/cache/ideaIC-2018.1.tar.gz_2097152_status"
  the 2097152 there is 2MiB block size in bytes.

- client tool:
  cd ~/projects/reliable-download/
  curl http://localhost:8082/rd/ideaIC-2018.1.tar.gz
  stack exec rd -- -d ~/d/.blocks -o ~/d/ http://localhost:8082/ideaIC-2018.1.tar.gz

  test fresh block-not-ready state:
  redis-cli del "/home/sylecn/persist/cache/sdkman.sh_2097152_status"
  redis-cli del "/home/sylecn/persist/cache/sdkman.sh_2097152"
  stack exec rd -- -d ~/d/.blocks -o ~/d/ http://localhost:8082/sdkman.sh

** 2018-05-05 write the main logic of creating block metadata.
then make it work with a thread pool with a single thread.

env WEB_ROOT=/home/sylecn/persist/cache stack exec rd-api

curl -XGET http://localhost:8082/rd/ideaIC-2018.1.tar.gz

this should return json of the block metadata.

- block metadata looks like this:
  GET /rd/bigfile
  #+BEGIN_SRC sh
    {"ok": true,
     "block_size": "2MiB",    # this is a fixed value.
     "file_size": xxxx,       # file size in bytes
     "block_count": 24,
     "blocks": [
             [0, 0, 2097151, block1_sha1sum],
             [1, 2097152, 4194303, block2_sha1sum],
             ...
             [N, start, end, blockN_sha1sum]
     ]}
  #+END_SRC
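
The start/end offsets in the blocks list above can be derived from the file size alone. A minimal sketch, assuming the fixed 2 MiB block size and inclusive offsets; blockRanges is a hypothetical name, not the function used in lib, and the sha1sum column is left out because it needs a hashing library.

#+BEGIN_SRC haskell
-- Fixed block size in bytes (2 MiB), the same number used in the redis keys.
blockSize :: Integer
blockSize = 2097152

-- | (block id, start byte, end byte) for every block of a file.
-- Both offsets are inclusive. The size argument must be positive.
blockRanges :: Integer -> Integer -> [(Integer, Integer, Integer)]
blockRanges size fileSize =
  [ (i, start, min (start + size - 1) (fileSize - 1))
  | (i, start) <- zip [0 ..] [0, size .. fileSize - 1]
  ]

main :: IO ()
main = mapM_ print (blockRanges blockSize 5000000)
-- (0,0,2097151)
-- (1,2097152,4194303)
-- (2,4194304,4999999)
#+END_SRC

An empty file yields no blocks, and the last block is simply shorter than the others.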

** 2018-05-06 calculate sha1sum for blocks using a thread pool. design try 2.
- data protocol via redis.
  hset <filepath>_blockSize blockId sha1sum

  set <filepath>_blockSize_status working|done

- worker pool is there for calculating all blocks for one file.

         fileQueue
  Main ------------> fileWorker

  GET /rd/file
  if file status is None, push file to fileQueue.
  do normal logic.

  fileWorker:
  fetch file from fileQueue.
  start working on blocks one by one.
  if block already cached in redis, skip it.
  when all done, set <filepath>_blockSize_status done.

- this works and is easy to understand.
  WIP info is also kept in redis for each file's block.

- works on first try. excellent.

- problems
  - how to fail when redis hget or hset fail?
    just return False
    if some block fail, set status to error.
    next time a GET /rd/file, it will trigger the queue again.

    a cron job can also trigger a run.
  - mapM how to skip rest when some action failed?

    If there is a redis error halfway through the calculation, I don't want to
    calculate the remaining sha1sums, because the results can't be stored.
  - 
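
One possible answer to the "mapM how to skip rest" question above: run the per-block action inside ExceptT, where mapM_ stops at the first Left. A minimal sketch; storeBlock is a hypothetical stand-in for "calculate sha1sum and hset it in redis", and block 2 is made to fail on purpose.

#+BEGIN_SRC haskell
import Control.Monad.Trans.Except (ExceptT (..), runExceptT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for "hash one block and store the result in redis".
-- It records which blocks were processed and fails on block 2.
storeBlock :: IORef [Int] -> Int -> IO (Either String ())
storeBlock done blockId
  | blockId == 2 = pure (Left ("redis hset failed for block " ++ show blockId))
  | otherwise = do
      modifyIORef' done (blockId :)
      pure (Right ())

-- mapM_ in ExceptT short-circuits: blocks after the first failure are skipped.
processBlocks :: IORef [Int] -> [Int] -> IO (Either String ())
processBlocks done = runExceptT . mapM_ (ExceptT . storeBlock done)

main :: IO ()
main = do
  done <- newIORef []
  result <- processBlocks done [0 .. 4]
  processed <- readIORef done
  print result              -- Left "redis hset failed for block 2"
  print (reverse processed) -- [0,1]
#+END_SRC

On a Left the worker could set the file status to error, so the next GET /rd/file triggers the queue again.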

** 2018-05-06 how to run hlint
stack exec hlint -- src api client logtest

or run on all git files:
stack exec hlint -- -g
** 2018-05-05 it's impossible to do logging easily in haskell.
two problems

- no way to pass around the logger engine/configuration without a log monad in
  your stack. since you may want to log anywhere, you have to change all the
  data types.

  This requires you know monad and monad transformer really well.

  You can't just get a logger from "global env" like in other languages.

- no easy way to format a string as Data.Text.Text.
  both printf and Text.Format.format have too much noise when doing logging.

  I need to write:
  runLogT "Main" logger $ do
    logInfo_ $ LT.toStrict $ TF.format "will listen on {}:{}" ((host config), (port config))

  I would like to write:
  logInfo_ $ format "will listen on {}:{}" (host config) (port config)

  try https://hackage.haskell.org/package/formatting
  Formatting in Haskell
  https://chrisdone.com/posts/formatting

  logInfo_ $ sformat ("will listen on " % string % ":" % int) (host config) (port config)
  // this is better.

** 2018-05-05 writing tests in hspec
- Test WAI application using hspec
  hspec/hspec-wai: Helpers to test WAI application with Hspec
  https://github.com/hspec/hspec-wai
  It has examples. good.
  Hspec: A Testing Framework for Haskell
  http://hspec.github.io/

- Test.Hspec.Expectations
  https://hackage.haskell.org/package/hspec-expectations-0.8.2/docs/Test-Hspec-Expectations.html#v:Expectation
  shouldBe
  shouldStartWith
  shouldEndWith
  shouldContain
  etc

** 2018-05-04 make hoogle work in current project
stack build hoogle
requires building 54 pkgs. lots of dependencies.
DONE hoogle-5.0.14

- build local database
  stack hoogle

  DONE 64 pkgs

  Updating Haddock index for snapshot packages in
  /home/sylecn/.stack/snapshots/x86_64-linux-nopie/lts-10.3/8.2.2/doc/index.html

- stack hoogle html
  now it works.

- info: file size
  53M	.stack-work/

** 2018-05-04 for project notes, see GTD.org id002
** 2018-05-06 client design doc (moved from GTD.org id002)
- client side start downloading blocks using a thread pool or similar.
  block data is saved to .blocks/<fn>/blockN_<block_sha1sum> when it is
  fetched in whole and verified. block data is removed when all blocks are
  joined to the final file, unless the user specifies -k --keep-blocks on rd cli.

  show a nice progress bar.
  #+BEGIN_SRC sh
    downloading bigfile
    xxx blocks
    downloading block 1
    downloading block 2
    1/24 ready, X%
    downloading block 3
    downloading block 4
    2/24 ready, X%
    downloading block 5
    ...
    block N ready, 100%
    bigfile downloaded.
  #+END_SRC

  block download uses the HTTP/1.1 Range header to fetch that block.
  Range: bytes=0-499
  Range: bytes=500-999
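
A minimal sketch of building the Range value for one block, assuming the inclusive start/end offsets from the block metadata. The function names are hypothetical.

#+BEGIN_SRC haskell
-- | Value of the Range header for one block. Both offsets are inclusive,
-- the same convention as the start/end bytes in the block metadata.
rangeHeaderValue :: Integer -> Integer -> String
rangeHeaderValue start end = "bytes=" ++ show start ++ "-" ++ show end

-- | Number of bytes a complete response for that range should contain.
rangeLength :: Integer -> Integer -> Integer
rangeLength start end = end - start + 1

main :: IO ()
main = do
  putStrLn (rangeHeaderValue 0 2097151) -- bytes=0-2097151
  print (rangeLength 0 2097151)         -- 2097152
#+END_SRC

With http-client the value would go into the request's requestHeaders under the Range header name. Comparing the body length against rangeLength is a cheap check to do before verifying the block sha1sum.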
** 2018-05-05 API spec                                               :design:
- GET /<path>
  static file server. works even without redis.

- GET /rd/<path>
  provides rd API for given static file, requires redis-server.

  when <path> rd metadata is ready, it will return
  #+begin_src json
    {
      "ok": true,                      // metadata is ready or not
      "msg": "",                       // msg to user
      "path": "test/中文.txt",          // file path in URL
      "filepath": "./test/中文.txt",    // file path on server side
      "file_size": 8,
      "blocks": [
	[
	  0,    // block ID
	  0,    // start byte, inclusive
	  7,    // stop byte, inclusive
	  "69bca99b923859f2dc486b55b87f49689b7358c7"    // block sha1sum
	]
      ],
      "block_count": 1,
      "block_size": "2MiB"
    }
  #+end_src

- GET /test-rd/<path>
  test rd API path parsing without calculating sha1sum

* lib docs 							      :entry:
** 2018-05-06 http client docs
Making HTTP requests - http-client library
https://haskell-lang.org/library/http-client

Network.HTTP.Simple
https://www.stackage.org/haddock/lts-10.3/http-conduit-2.2.4/Network-HTTP-Simple.html

Network.HTTP.Client
https://www.stackage.org/haddock/lts-10.3/http-client-0.5.7.1/Network-HTTP-Client.html

how to handle exceptions in http-client?
http-client/TUTORIAL.md at master · snoyberg/http-client
https://github.com/snoyberg/http-client/blob/master/TUTORIAL.md#exceptions

** 2018-05-06 handle IO exceptions
- Control.Exception
  https://www.stackage.org/haddock/lts-10.3/base-4.10.1.0/Control-Exception.html
- System.IO.Error
  https://www.stackage.org/haddock/lts-10.3/base-4.10.1.0/System-IO-Error.html

** 2018-05-05 sol/hpack: hpack: An alternative format for Haskell packages
https://github.com/sol/hpack
** 2018-05-05 HUnit: A unit testing framework for Haskell
https://hackage.haskell.org/package/HUnit
** 2018-05-05 Web.Scotty
https://www.stackage.org/haddock/lts-11.7/scotty-0.11.1/Web-Scotty.html
html :: Text -> ActionM ()

scotty/examples at master · scotty-web/scotty
https://github.com/scotty-web/scotty/tree/master/examples

** 2018-05-05 Data.Aeson
http://hackage.haskell.org/package/aeson-1.3.1.0/docs/Data-Aeson.html
// this seems more readable than lts haskell's doc.
Aeson: the tutorial
https://artyom.me/aeson

** 2018-05-06 optparse-applicative :: Stackage Server
https://www.stackage.org/lts-10.3/package/optparse-applicative-0.14.0.0
* later                                                               :entry:
** 2022-03-12 drop redis-server as rd-api dependency.            :featurereq:
- use a built-in key-value db. such as Berkeley db, sqlite3, or leveldb.
  use a well known path for the db name.

  $HOME/.cache/reliable-downloader/rd-api.db
- 

** 2018-05-05 allow config app at runtime.
via env var and command line parameter.

- HOST
- PORT
- REDIS_HOST
- REDIS_PORT
- WEB_ROOT    web root dir, HTTP Path will be relative to this dir.
- WORKER

- 2018-05-10 I'm trying to write my parser for parsing env var, just like
  optparse-applicative.

  see ~/haskell/env-var-parser/src/Lib.hs

  it is difficult. I can parse a single parameter easily, but I don't know how
  to compose parsers to parse more complex data structure. read more about
  optparse-applicative. I can't understand the source code without more
  reading.

  pcapriotti/optparse-applicative: Applicative option parser
  https://github.com/pcapriotti/optparse-applicative

  An applicative Parser is essentially a heterogeneous list or tree of
  Options, implemented with existential types.

  See this blog post for a more detailed explanation based on a simplified
  implementation.

  Applicative option parser
  https://paolocapriotti.com/blog/2012/04/27/applicative-option-parser/
  it requires usage of GADTs, that's why I don't understand it on first look.

  The ConsP constructor is the combination of an Option returning a function,
  and an arbitrary parser returning an argument for that function. The
  combined parser applies the function to the argument and returns a result.

  the ConsP constructor is where all the magic happens.
  I can't define a data structure like this myself.
  because I don't understand what it is.

  In the end, the applicative is defined on the list structure. Not on any
  option itself. when creating parser, you are creating a list. see option and
  optionR. list is easily an instance of functor and applicative.

  I think I can make it work on env variable, although I can't write this code
  myself.

- how does it construct a value for any data type? I think the Applicative
  Parser already makes that work.

  how to do error handling for "parse" failures? just return Nothing.

  how to make all fields optional? if key not found, just return Nothing for
  that field.

- I think it really should happen inside optparse-applicative. otherwise
  default value, parsing data is duplicated.

  but that will be too difficult for me. This is the first time I see GADT
  used. and first time I see a value can be constructed for any data type.

  // wait. about "a value can be constructed for any data type", in non-record
  syntax, it's just a normal function call. should be easy to construct value
  using code.

- for my use case, I will just write non-portable functions.
  allow config rd-api using env variable.

  - how to construct a data value not using the constructor? just do a regular
    function call. the data constructor is a function.

  - can I update all records using: config1 {config2}? no.

  - how to clean it up, remove intermediate variables.

- test these commands:
  stack exec rd-api -- --host=127.0.0.1 --port=8060
  env HOST=127.0.0.1 PORT=8060 stack exec rd-api
  env HOST=127.0.1.1 PORT=8061 stack exec rd-api -- --host=127.0.0.1 --port=8060
  env HOST=127.0.1.1 PORT=abc stack exec rd-api -- --host=127.0.0.1 --port=8060
  env WORKER=1 stack exec rd-api
  v1.1.1.0 all works.

  but I don't like the code in updateRDConfigFromEnvPure
  it's unpleasant code.
  error handling shouldn't be this complicated.

- check how other people handle env variable.
  - envparse, this is similar to ~/haskell/env-var-parser/
  - envy
    https://www.stackage.org/lts-10.3/package/envy-1.3.0.2

    based on the doc. I can see why updateRDConfigFromEnvPure is too
    complex. it is trying to do too many things.

    the applicative style should be used to create values.

    just find another way to merge two values of the same type. or
    update the runParser to support it.

    envy also supports inferring the env var name from the record field name,
    by using GHC.Generics. So you no longer need to write a parser.

    - check how envy works.
      ~/haskell/testing/handle-params/app/Main.hs

      TODO envy doesn't handle Bool well.
      only True is accepted as true.
      yes, true, on, 1 not accepted as true.

  - read-env-var, this is a simple wrapper on lookupEnv and
    Text.Read.readMaybe. Not what I need.
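
A minimal sketch of the "applicative style to create values" idea from the envy notes above, using only base. This is not updateRDConfigFromEnvPure; the Config record, its field names and the default values are all hypothetical, and only HOST, PORT and WORKER are handled.

#+BEGIN_SRC haskell
import Data.Maybe (fromMaybe)
import System.Environment (getEnvironment)
import Text.Read (readMaybe)

-- Hypothetical config record; fields and defaults are for illustration only.
data Config = Config
  { cfgHost :: String
  , cfgPort :: Int
  , cfgWorker :: Int
  } deriving (Show, Eq)

defaultConfig :: Config
defaultConfig = Config {cfgHost = "127.0.0.1", cfgPort = 8082, cfgWorker = 2}

-- | Read one variable: a missing key keeps the default, a value that does
-- not parse is reported as an error.
readVar :: Read a => [(String, String)] -> String -> a -> Either String a
readVar env name def = case lookup name env of
  Nothing -> Right def
  Just raw -> maybe (Left ("invalid " ++ name ++ ": " ++ raw)) Right (readMaybe raw)

-- | Build the whole record in applicative style. The first bad field
-- becomes the error, no intermediate variables needed.
configFromEnv :: [(String, String)] -> Config -> Either String Config
configFromEnv env def =
  Config
    <$> Right (fromMaybe (cfgHost def) (lookup "HOST" env))
    <*> readVar env "PORT" (cfgPort def)
    <*> readVar env "WORKER" (cfgWorker def)

main :: IO ()
main = do
  env <- getEnvironment
  print (configFromEnv env defaultConfig)
#+END_SRC

Because the environment is passed in as a plain list, the function stays pure and is easy to test. With PORT=abc the result is Left "invalid PORT: abc".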

* current                                                             :entry:
** 
** 2024-04-07 when combining blocks to create final file, don't print
"No block fetched in last 10 seconds" log any more if there is no other files
in DL list.
** 2024-04-07 rd-api: when a file content is changed on disk, auto invalidate all
cached blocks.

- when user request rd metadata, and file's mtime changed, do a sha1sum on
  first 4M and last 4M of the file. if any of these sha1sum changes,
  invalidate cache in redis.

  when calculate metadata, cache file's mtime, first 4M and last 4M of the
  file's content's sha1sum.
- this will allow DL the correct file when file content for the same file name
  is changed.
- give some log in console when file content changes.
- 
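
A minimal sketch of the decision logic described above, with the hashing passed in as an action so it only runs when mtime differs. All names are hypothetical. RefreshMtime (mtime changed but both hashes still match) is an extra case assumed here, not something the notes above specify.

#+BEGIN_SRC haskell
-- What gets cached next to the block metadata (hypothetical layout).
data Fingerprint = Fingerprint
  { fpMtime :: Integer    -- file mtime, seconds since epoch
  , fpHeadSha1 :: String  -- sha1sum of the first 4M
  , fpTailSha1 :: String  -- sha1sum of the last 4M
  } deriving (Show, Eq)

data Decision = KeepCache | RefreshMtime | InvalidateCache
  deriving (Show, Eq)

-- | Compare the cached fingerprint with the file's current state.
-- The (head, tail) hashing action is only run when mtime has changed.
checkCache :: Monad m => Fingerprint -> Integer -> m (String, String) -> m Decision
checkCache cached mtimeNow hashHeadTail
  | mtimeNow == fpMtime cached = pure KeepCache
  | otherwise = do
      (h, t) <- hashHeadTail
      pure $
        if h == fpHeadSha1 cached && t == fpTailSha1 cached
          then RefreshMtime
          else InvalidateCache

main :: IO ()
main = do
  let cached = Fingerprint 1712500000 "aaa" "bbb"
  checkCache cached 1712500000 (pure ("zzz", "zzz")) >>= print -- KeepCache
  checkCache cached 1712500999 (pure ("aaa", "bbb")) >>= print -- RefreshMtime
  checkCache cached 1712500999 (pure ("aaa", "ccc")) >>= print -- InvalidateCache
#+END_SRC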

** 2024-03-12 rd-api, if file is already transferred block by block, I can support
live compression easily, if the client requests it via a param such as
?compress=zstd. default is no compression.

- when the source file is a plain tar, not a compressed format like squashfs
  or zip, this can speed up transfer by compressing at transfer time instead
  of beforehand.

** 2024-03-12 rd, add a DL remaining time estimate, based on estimated DL speed.
each block has a block size. I know when it is started and when it is
finished. I know how many more blocks to fetch. it should be easy to
estimate. calculate a moving avg speed using the last 5 blocks DLed.
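
A minimal sketch of the estimate, assuming each finished block is recorded as (bytes fetched, seconds taken) with the newest sample first. The names are hypothetical. It treats blocks as fetched one at a time; with several download threads the wall-clock interval covering the last 5 blocks would be the better divisor.

#+BEGIN_SRC haskell
-- | Average speed in bytes per second over the newest `window` samples.
-- Each sample is (bytes fetched, seconds taken); the list is newest first.
movingSpeed :: Int -> [(Double, Double)] -> Maybe Double
movingSpeed window samples
  | null recent || elapsed <= 0 = Nothing
  | otherwise = Just (bytes / elapsed)
  where
    recent = take window samples
    bytes = sum (map fst recent)
    elapsed = sum (map snd recent)

-- | Estimated seconds left, from block size, blocks still to fetch and
-- the samples of finished blocks. Nothing until there is usable data.
etaSeconds :: Double -> Int -> [(Double, Double)] -> Maybe Double
etaSeconds blockSize remaining samples =
  (\speed -> fromIntegral remaining * blockSize / speed) <$> movingSpeed 5 samples

main :: IO ()
main =
  -- five 2 MiB blocks at 2 seconds each, 10 blocks left
  print (etaSeconds 2097152 10 (replicate 5 (2097152, 2.0))) -- Just 20.0
#+END_SRC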

** 2022-03-15 rd client, is there a built-in repeat/loop function?
IO () -> IO ()

I should not need to write showProgressLoop explicitly.
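
Control.Monad.forever from base is probably what this entry is looking for: it repeats an action until the thread is killed or an exception is thrown. A minimal sketch; repeatEvery is a hypothetical helper, not code from rd.

#+BEGIN_SRC haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forever)

-- | Run an action repeatedly, pausing `delay` microseconds between runs.
-- Never returns normally, hence the polymorphic result type.
repeatEvery :: Int -> IO () -> IO a
repeatEvery delay action = forever (action >> threadDelay delay)

main :: IO ()
main = do
  -- run the progress loop in its own thread
  tid <- forkIO (repeatEvery 100000 (putStrLn "progress: ..."))
  threadDelay 350000 -- let the loop print a few lines
  killThread tid     -- stop it once the download is finished
#+END_SRC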

** 2018-05-09 test the app under unstable network.
I remember there are tools that can simulate packet loss.
policy in ovs can do it.

- 2018-05-11 tcp - Simulate delayed and dropped packets on Linux - Stack Overflow
  https://stackoverflow.com/questions/614795/simulate-delayed-and-dropped-packets-on-linux

  test this in vbox VM. stretch01
  see stretch01 daylog.

* done                                                                :entry:
** 2019-02-28 bug: rd-api -d java/
option -d: cannot parse value `java/'

- stack exec rd-api -- -d t1/
  this works fine.

  probably a python wrapper issue.

- problems
  - 2024-04-08 -d foo/ works fine.
    -d foo, -d ./foo/ also works.
  - but getFileStatus on ./test/中文.txt failed:
    ./test/中文.txt: getFileStatus: does not exist (No such file or directory)

    on s02,
    cd ~/projects/reliable-download
    rd-api -d ./test/

    on agem10,
    rd http://[240e:388:8a05:500:be24:11ff:fe06:be5f]:8082/中文.txt
  - try it on agem10,

    rd http://127.0.0.1:8082/中文.txt
    also fail.

    file name encoding issue?

    import System.FilePath ((</>))

    let filepath = webRoot (rcConfig rc) </> T.unpack reqFilePath
    getFileStatus filepath

    I see no encoding issue.
    try add some log.
    no issue.
    getFileStatus works in ghci.

    stack ghci reliable-download:exe:rd-api
    #+begin_src sh
      ghci> import System.Posix.Files (getFileStatus, fileSize)
      ghci> s1 <- getFileStatus("./test/TestApi.hs")
      ghci> fileSize s1
      7647
      ghci> s2 <- getFileStatus("./test/中文.txt")
      ghci> fileSize s2
      8
      ghci> :q
      Leaving GHCi.
    #+end_src
    so why did it fail?
    maybe it only fails when run via python?
    LANG/locale issue?

    try run via stack or just binary.
    ~/.local/pipx/venvs/rd-api/lib/python3.11/site-packages/rdapi/rd-api
    it also fail. not python wrapper issue.

  - use abs path in -d works.
    it's relative path and CWD issue.
    maybe I changed CWD somewhere. check it.

    git grep setCurrentDirectory

    yes.
    -- static app only support serving from PWD
    setCurrentDirectory (webRoot config)

    so just always use relative path, don't join with webroot dir.

** 2024-04-08 should I use absolute file path in redis key?
this can reduce some sha1 calculation if user run rd-api in different root dir.

e.g.

cd /foo/bar/
rd-api

cd /foo/
rd-api -d bar
# or
rd-api -d /foo/bar

** 2024-04-07 rd-api, rd: switch to fast-logger, use local datetime in logs, not UTC time. :logging:featurereq:
- tinylog is not maintained any more. no longer in latest stackage LTS.
- tinylog types don't allow using local time in logs. requires writing lots of code.
- create a demo project for using RIO and fast-logger.
  cd ~/projects/
  stack new rio-fastlogger-demo rio

  app/Main.hs
  withLogFunc lo

  https://www.stackage.org/haddock/lts-22.15/rio-0.1.22.0/RIO.html#v:withLogFunc

  RIO has built-in log ts support.

  RIO timestamp and level output is not pretty and not easily customizable.
  maybe later.

** 2023-11-12 rd, ipv6 based URL not supported.
- on de05,
  rd-api -h :: -p 8083 --redis-host 10.96.195.242

  on pve,
  cd /wh01/share/songs/
  rd http://[2a01:4f8:c0c:9c42::1]:8083/可一儿歌.tar
  BUG nope. rd doesn't support ipv6 URL.

- 2024-04-08
  stack exec rd-api -- -h ::

  curl http://[::1]:8082/test/中文.txt
  this works.
  stack exec rd -- -v http://127.0.0.1:8082/test/中文.txt
  this works.
  stack exec rd -- -v http://[::1]:8082/test/中文.txt
  this fails.

  getRDResponse :: RDClientRuntimeConfig -> T.Text -> IO RDResponse

  (do
    req <- parseRequest $ T.unpack url
    debugl rc $ "GET /rd" <> decodeUtf8 (path req)
    resp <- httpJSON $ req { path="/rd" <> path req }
    return $ getResponseBody resp)

  search: haskell http-client host ipv6 address support

  How to make a request to an IPv6 address using the http-client package in haskell? - Stack Overflow
  https://stackoverflow.com/questions/70863436/how-to-make-a-request-to-an-ipv6-address-using-the-http-client-package-in-haskel

  http-client 0.7.11 has the fix merged.

  lts-18.27 http-client-0.6.4.1

  try upgrade lts.

  lts-20.26 http-client-0.7.13.1

  yeah, that would work.

- problems
  - tinylog is not in lts-20.26

    build tinylog failed under lts-20.26

    #+begin_quote
    tinylog                      > Preprocessing library for tinylog-0.15.0..
    tinylog                      > Building library for tinylog-0.15.0..
    tinylog                      > [1 of 4] Compiling System.Logger.Message
    tinylog                      > 
    tinylog                      > /tmp/stack-f92234dbf5c8a904/tinylog-0.15.0/src/System/Logger/Message.hs:57:1: error:
    tinylog                      >     Could not find module ‘Data.ByteString.Lazy.Builder’
    tinylog                      >     Perhaps you meant
    tinylog                      >       Data.ByteString.Builder (from bytestring-0.11.4.0)
    tinylog                      >       Data.ByteString.Lazy.Char8 (from bytestring-0.11.4.0)
    tinylog                      >     Use -v (or `:set -v` in ghci) to see a list of the files searched for.
    tinylog                      >    |
    tinylog                      > 57 | import qualified Data.ByteString.Lazy.Builder        as B
    tinylog                      >    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    tinylog                      > 
    tinylog                      > /tmp/stack-f92234dbf5c8a904/tinylog-0.15.0/src/System/Logger/Message.hs:58:1: error:
    tinylog                      >     Could not find module ‘Data.ByteString.Lazy.Builder.Extras’
    tinylog                      >     Perhaps you meant
    tinylog                      >       Data.ByteString.Builder.Extra (from bytestring-0.11.4.0)
    tinylog                      >     Use -v (or `:set -v` in ghci) to see a list of the files searched for.
    tinylog                      >    |
    tinylog                      > 58 | import qualified Data.ByteString.Lazy.Builder.Extras as B
    tinylog                      >    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    #+end_quote
    last bytestring package that has this module is
    https://hackage.haskell.org/package/bytestring-0.10.12.1
    but this is a base package, many pkg fail when this is downgraded.

  - it's better to get rid of tinylog or maintain it myself to drop dependency
    on

    Data.ByteString.Lazy.Builder
    Data.ByteString.Lazy.Builder.Extras

  - should really get rid of tinylog.
    which logger does rd-api use?
    also tinylog.

  - FIXED fix tinylog turned out to be easy. only need to update the import.
    no other code change.
    I can maintain that.
    see agem10 ~/projects/tinylog/

  - FIXED after fix tinylog. one aeson API change.

    J.decode now returns KeyMap instead of HashMap.

** 2024-04-06 rd client: when server side doesn't support GET /rd/ api.
give a more clear msg to client side.
it's not client side's fault.

#+begin_quote
root@pve:/wh01/share/tv-series/Kingdom# rd http://1.116.206.228:8082/kingdom.tar
2024-04-06T07:14:31  E  GET /rd/ api failed: "No redis connection, GET /rd/ disabled"
2024-04-06T07:14:31  E  1 urls failed/skipped.
#+end_quote

** 2018-05-05 utf-8 character not working well in path.                 :bug:
curl http://localhost:8082/rd/%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.rar
{"ok":true,"path":"中"}
only first character is in path key.

- a scotty bug?
  full_path is also wrong. not regexp problem.
- well, since this encoding doesn't work well. I think I will pass the path in
  json body instead of in the URL.

  I don't need it in the URL anyway. download must be handled by a rd client.
- search: haskell scotty path utf-8 character

  check source code.
  Web.Scotty.Route
  https://www.stackage.org/haddock/lts-11.7/scotty-0.11.1/src/Web.Scotty.Route.html
  matchRoute
  path req

  req :: Request
  path :: Request -> T.Text
  path = T.fromStrict . TS.cons '/' . TS.intercalate "/" . pathInfo

  pathInfo is defined in Network.Wai!
  so problem is not in scotty. probably in Warp.

- check Network.Wai.pathInfo source code
  https://www.stackage.org/haddock/lts-11.7/wai-3.2.1.2/Network-Wai.html#v:pathInfo
  data Request = Request {
      ,  pathInfo             :: [Text]
  }

  check how warp fill this field.
  Network.Wai.Handler.Warp.Request
  https://www.stackage.org/haddock/lts-11.7/warp-3.2.22/src/Network.Wai.Handler.Warp.Request.html
  import qualified Network.HTTP.Types as H

  hdrlines <- headerLines firstRequest src
  (method, unparsedPath, path, query, httpversion, hdr) <- parseHeaderLines hdrlines
  pathInfo = H.decodePathSegments path

  H.decodePathSegments doesn't have problem. I already tested it.

- check parseHeaderLines
  https://www.stackage.org/haddock/lts-11.7/warp-3.2.22/src/Network.Wai.Handler.Warp.RequestHeader.html#parseHeaderLines

  try parse this line using warp's code:
  GET /rd/%E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6%E5%90%8D.rar HTTP/1.1

  stack repl
  import Network.Wai.Handler.Warp.RequestHeader (parseHeaderLines)
  :l ~/fromsource/wai/warp/Network/Wai/Handler/Warp/RequestHeader
  too many dependencies

  try build warp in its source dir.
  cd ~/fromsource/wai/warp
  stack build
  lts-10.0 plan.
  96 pkgs to build.

- problems
  - can't find warp source code.
    warp is inside wai repo.
    https://github.com/yesodweb/wai

    cloned to ~/fromsource/wai/

    just import Network.Wai.Handler.Warp.Internal
    it includes every module. but it doesn't export that function.

  - search: haskell how to use non exported function

** 2023-11-12 rd-api unicode path bug.
- on de05,
  rd-api -h :: -p 8083 --redis-host 10.96.195.242
  on pve,
  cd /wh01/share/songs/
  rd http://49.12.207.182:8083/可一儿歌.tar
  unicode in path is not supported. on rd-api server side.
- rd-api: .: openBinaryFile: inappropriate type (is a directory)
  2023-11-12T06:07:53  I  user request rd metadata for "."
  BUG: unicode character in PATH is not properly supported.
- 2024-04-08 this is a known bug in warp. see the 2018-05-05 utf-8 path bug entry in this file.
  check whether it's fixed in latest warp.
  check wai/warp changelog.
  search: wai/warp changelog path unicode
  https://hackage.haskell.org/package/wai-3.2.4/changelog
  nothing.
  wai 2.x doesn't have a changelog file.
- wait. curl on the resource works fine.
  only rd fail. it's my code's problem?
  I see, it's client issue. not server issue.
  client when send request to server, should encode URL first.
  it should be easy to fix.
  downloadFile :: RDClientRuntimeConfig -> T.Text -> MaybeT IO Bool
  I see, the url is of type T.Text, not properly encoded before sending via
  HTTP.
  req <- parseRequest $ T.unpack url
  I can add test case for this function.
  this works. it will auto encode URL unsafe characters.
- resp <- httpJSON $ req { path="/rd" <> path req }
  update this code to always parse from full URL. don't use <> on path segment.
  stack exec rd-api -- -v
  stack exec rd -- http://127.0.0.1:8082/中文.txt
  I do need the path segment.
  client is correct.
  now check server side, how it convert the path back to Text.
- getRdHandler :: RDRuntimeConfig -> ExceptT T.Text ActionM ()

- when using urlDecode on matched path.

  path <- lift $ param "1"
  debugl rc $ "path is " <> showt path

  it's incomplete.

-   get (regex "^/rd/(.*)") $ do
  this capture doesn't capture all of path.

  try just use raw path, no captures.

  how to get path inside ActionM?

  https://hackage.haskell.org/package/wai-3.2.4/docs/Network-Wai.html#g:3

  using a query param seems easier to parse in server side.
  in next major release, I think I can switch to use new API on client side.
  server side will serve both old and new /rd/ API.

- DL works now.
  log msg needs some work.
  DONE also the /test/rd/ URL requires some work. change to /test-rd/ should fix it.
  
  use T.pack instead of showt should fix the log msg.

  client side:
  #+begin_quote
  2024-04-08T07:20:22  I  GET /rd/ api ok for "\20013\25991.txt"
  2024-04-08T07:20:22  I  Downloading file: /rd/中文.txt, 0.0 MiB, 1 blocks
  2024-04-08T07:20:22  I  all 1 block(s) ready on server side
  2024-04-08T07:20:22  I  progress: [100%] 1/1 blocks, /rd/中文.txt
  2024-04-08T07:20:22  I  Combining blocks to create "/home/sylecn/d/t2/\20013\25991.txt"
  2024-04-08T07:20:22  I  File downloaded to "/home/sylecn/d/t2/\20013\25991.txt"
  2024-04-08T07:20:22  I  All urls downloaded. 1 files, 1 blocks.
  #+end_quote

  server side:
  #+begin_quote
  sylecn@agem10:~/projects/reliable-download$ stack exec rd-api -- -v
  2024-04-08T07:18:34  I  creating 2 file worker(s)
  2024-04-08T07:18:34  I  fileWorker is waiting for jobs...
  2024-04-08T07:18:34  I  fileWorker is waiting for jobs...
  2024-04-08T07:18:34  I  rd-api 1.4.0.0
  2024-04-08T07:18:34  I  webRoot is .
  2024-04-08T07:18:34  I  will listen on :::8082
  2024-04-08T07:19:15  D  path is "/rd/\20013\25991.txt"
  2024-04-08T07:19:15  D  decodedPath is "\20013\25991.txt"
  2024-04-08T07:19:15  D  filepath is "./\20013\25991.txt"
  2024-04-08T07:19:15  I  user request rd metadata for "./\20013\25991.txt"
  2024-04-08T07:19:15  I  "./\20013\25991.txt" is a new file, sending task to worker
  2024-04-08T07:19:15  I  fileWorker working on "./\20013\25991.txt"
  2024-04-08T07:19:15  D  fillSha1sum: redis hgetall "./-\135.txt_2097152" ok
  2024-04-08T07:19:15  D  redis hset "./-\135.txt_2097152" 0 ok
  2024-04-08T07:19:15  D  Set file status to done for "./\20013\25991.txt"
  2024-04-08T07:19:15  I  fileWorker done for ./中文.txt, 0.0 MiB, 1 blocks
  2024-04-08T07:19:15  I  fileWorker is waiting for jobs...
  2024-04-08T07:20:22  D  path is "/rd/\20013\25991.txt"
  2024-04-08T07:20:22  D  decodedPath is "\20013\25991.txt"
  2024-04-08T07:20:22  D  filepath is "./\20013\25991.txt"
  2024-04-08T07:20:22  I  user request rd metadata for "./\20013\25991.txt"
  2024-04-08T07:20:22  D  "./\20013\25991.txt" is not a new file
  2024-04-08T07:20:22  D  file status is done
  2024-04-08T07:20:22  D  fillSha1sum: redis hgetall "./-\135.txt_2097152" ok
  #+end_quote

- blockSha1sumHashKey fbp = Char8.pack (fbpFilepath fbp) <> "_" <> (Char8.pack . show) (fbpBlockSize fbp)
  redis key seems not well encoded.

- git grep -n showt
  check and fixed all usage of showt.

- MOVED should I use full path in redis key?
  this can reduce some sha1 calculation if user run rd-api in different root dir.
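
The two oddities in the server log above can be reproduced with bytestring and text alone. The redis key "./-\135.txt_2097152" is what Char8.pack produces, because it keeps only the low 8 bits of each character. The "\20013\25991.txt" in the log messages is what show does to non-ASCII characters. A minimal sketch; using encodeUtf8 for the key is one possible fix, assumed here rather than taken from the actual commit.

#+BEGIN_SRC haskell
import qualified Data.ByteString.Char8 as Char8
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

main :: IO ()
main = do
  let fp = "./中文.txt"
  -- Char8.pack truncates every Char to one byte, so the key is garbled.
  print (Char8.pack fp)          -- "./-\135.txt"
  -- Encoding as UTF-8 keeps all the information.
  print (encodeUtf8 (T.pack fp)) -- "./\228\184\173\230\150\135.txt"
  -- show escapes non-ASCII characters; printing the string directly does not.
  putStrLn (show fp)             -- "./\20013\25991.txt"
  putStrLn fp
#+END_SRC

Two different file names that share the same low bytes would map to the same truncated redis key, so this is more than a cosmetic log issue.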
** 2022-03-15 stack test should not rely on
/home/sylecn/persist/cache/ideaIC-2018.1.tar.gz

try use a smaller file within git tree.
** 2018-05-10 use a proper module hierarchy.
import RD.Utils
import RD.Api.Lib
import RD.Api.Config
import RD.Client.Lib
import RD.Client.Opts

- stack repl doesn't like duplicated Lib module. also for libs, correct module
  hierarchy is important.
- 

** 2022-03-14 build on debian 9. push a new release to pypi.
- binary built on ryzen5 won't work because of high libc version.
  #+BEGIN_SRC sh
    root@de03:~/d# ./rd-api --version
    ./rd-api: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by ./rd-api)
    ./rd-api: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./rd-api)
  #+END_SRC
- problems
  - is readme_renderer still required? does twine require it?
    yes. it's required for "twine check" command.
** 2022-03-16 io-thread-pool, github field in package.yaml is wrong.
how to use my own non-github url for source URL?
- search: haskell package.yaml github field

  package.yaml is from https://github.com/sol/hpack

  use git field.

** 2022-03-16 add io-thread-pool as a git submodule. so the project can be built by other people.
just use git URL in extra-deps.
** 2018-05-07 loopUntilAllBlocksReady, how to track progress?
use a thread pool to download blocks, print overall progress when some parts
done or some time elapsed.

- how to track progress?
  I used mapM to fetch block.
  results <- mapM (fetchBlockAsync opts rc url rdResp) newReadyBlocks

  how to show some progress info?
  I need a supervisor thread. and I need a shared data structure.

  a mapM is not enough to do this.
- 
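
A minimal sketch of the supervisor idea above: workers bump a shared counter, and a supervisor loop polls it and reports. It uses plain forkIO with one thread per block and a fake delay instead of a real fetch. runDemo and the timings are hypothetical; the real client would use its thread pool.

#+BEGIN_SRC haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | Start one fake download per block, then supervise until all are done.
-- Returns the final number of fetched blocks.
runDemo :: Int -> IO Int
runDemo total = do
  fetched <- newIORef 0
  -- Workers: each one "downloads" a block, then bumps the shared counter.
  forM_ [1 .. total] $ \i -> forkIO $ do
    threadDelay (i * 20000) -- stand-in for fetching block i
    atomicModifyIORef' fetched (\n -> (n + 1, ()))
  -- Supervisor: poll the counter and report until every block is in.
  let supervise = do
        n <- readIORef fetched
        putStrLn (show n ++ "/" ++ show total ++ " blocks fetched")
        if n >= total then pure n else threadDelay 50000 >> supervise
  supervise

main :: IO ()
main = do
  n <- runDemo 8
  putStrLn ("done: " ++ show n ++ " blocks")
#+END_SRC

The counter is the only shared state, so an IORef with atomicModifyIORef' is enough. A richer structure, for example a set of finished block ids, would need the same atomic update discipline.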

** 2022-03-15 client log, don't show each block fetch. show progress instead.
- log overall progress every 30s
- xx/xx blocks fetched, xx%
  percentage show integer.
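
A minimal sketch of the log line format above, with the percentage truncated to an integer. progressLine is a hypothetical name, and treating a file with zero blocks as 100% is an assumption.

#+BEGIN_SRC haskell
-- | "xx/xx blocks fetched, xx%" with an integer percentage (rounded down).
progressLine :: Int -> Int -> String
progressLine fetched total =
  show fetched ++ "/" ++ show total ++ " blocks fetched, " ++ show pct ++ "%"
  where
    -- a file with no blocks counts as complete (assumption)
    pct = if total <= 0 then 100 else fetched * 100 `div` total

main :: IO ()
main = do
  putStrLn (progressLine 63 189)  -- 63/189 blocks fetched, 33%
  putStrLn (progressLine 189 189) -- 189/189 blocks fetched, 100%
#+END_SRC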