#+Author: Yuanle Song
* COMMENT -*- mode: org -*-
#+Date: 2018-05-04
#+STARTUP: content
* notes                                                               :entry:
** 2022-03-14 project dir structure
- lib           shared code for server and client, the reliable-download library
- rd-api        rd-api cli tool, server side
- rd            rd cli tool, client side
- misc/         learning tools and temp codes
- test/         tests
- package.yaml  stack project description
- pypi/         for release on pypi, see pypi/Makefile

** 2018-05-09 how to release the latest code on PyPI? how to make a release?
- update version number in
  - lib/RD/CliVersion.hs (required, used by pypi pkg)
  - package.yaml (optional)
- build the binary on an old OS, like debian stretch,
  so the built binary can run everywhere (a binary linked against an old
  glibc also runs on newer systems).

  install the haskell stack tool.
  setup libncurses-dev (otherwise linking fails with "cannot find -ltinfo").

  stack build --test --pedantic
  stack exec hlint -- -g
- update the README files. add a ChangeLog entry to each *README.rst
  ./README.md
  ./pypi/rd-api/README.rst
  ./pypi/rd-client/README.rst

- build the wheel and test it on the production server
  make dist -C pypi

  wheels will be built in the pypi/rd-api/dist and pypi/rd-client/dist dirs.

- release binary on PyPI

  export RD_API_TWINE_TOKEN=xxx
  export RD_CLIENT_TWINE_TOKEN=xxx
  make all -C pypi

  To release only the server:
  export RD_API_TWINE_TOKEN=xxx
  make api -C pypi

  To release only the client:
  export RD_CLIENT_TWINE_TOKEN=xxx
  make client -C pypi

- problems
  - how to sync build files for pypi?
    rsync -n -air --files-from=pypi/build_files ./ s02:projects/reliable-download/  # dry run first
    rsync -air --files-from=pypi/build_files ./ s02:projects/reliable-download/

** 2018-05-08 example run in prod env
- try it on de03

  on ryzen5,
  cd ~/projects/reliable-download/
  FN=`stack exec which rd-api`
  gzip -k "$FN"
  scp "$FN.gz" de03:d/
  on de03,
  cd ~/d/
  gunzip rd-api.gz
  chmod +x rd-api
  env WEB_ROOT=$PWD ./rd-api
  curl -v http://de03.dev.emacsos.com:8082/rd/
  curl -I http://de03.dev.emacsos.com:8082/virtio-win-0.1.215.iso
  516M virtio-win-0.1.215.iso

  on ryzen5,
  tmake stack exec rd -- -d ~/d/.blocks -o ~/d/ http://de03.dev.emacsos.com:8082/virtio-win-0.1.215.iso
  tmake ~/d/rd -d ~/d/.blocks -o ~/d/ http://de03.dev.emacsos.com:8082/virtio-win-0.1.215.iso
  below is a run log from an old rd version.
  #+BEGIN_SRC sh
    sylecn@ryzen5:~/projects/reliable-download$ tmake stack exec rd -- -d ~/d/.blocks -o ~/d/ http://138.201.95.248:8082/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    Tue May  8 00:45:31 CST 2018
    will start timed run in 3 sec
    running command: stack exec rd -- -d /home/sylecn/d/.blocks -o /home/sylecn/d/ http://138.201.95.248:8082/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    GET /rd/ api ok
    Downloading file: "gitlab-ce_10.3.5-ce.0_amd64_xenial.deb", 377 MiB, 189 blocks
    189 new block(s) ready on server side
    combining blocks to create /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    file downloaded to /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    all urls downloaded.
    started at 2018-05-08 00:45:34
    stopped at 2018-05-08 00:47:20
    Duration: 106 seconds
    sylecn@ryzen5:~/projects/reliable-download$
  #+END_SRC
  except for lacking progress info and download speed info,
  the download works perfectly.
  the sha1sum of the whole file matches.

- 2018-05-08 when using 5 threads for the download.
  #+BEGIN_SRC sh
    block 188 fetched
    combining blocks to create /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    file downloaded to /home/sylecn/d/gitlab-ce_10.3.5-ce.0_amd64_xenial.deb
    all urls downloaded.
    started at 2018-05-08 21:56:26
    stopped at 2018-05-08 21:57:42
    Duration: 76 seconds
  #+END_SRC

** 2018-05-06 how to run rd-api in dev env
- how to run rd-api

  cd ~/projects/reliable-download/
  env WEB_ROOT=/home/sylecn/persist/cache stack exec rd-api

- Test that it is working:

  static file hosting:
  curl -XGET http://localhost:8082/sdkman.sh

  rd api:
  curl -XGET http://localhost:8082/rd/ideaIC-2018.1.tar.gz
  curl -XGET http://localhost:8082/rd/ideaIC-2018.1.tar.gz | jq .

  To clear the cached file status for ideaIC-2018.1.tar.gz,
  redis-cli del "/home/sylecn/persist/cache/ideaIC-2018.1.tar.gz_2097152_status"
  the 2097152 is the 2 MiB block size in bytes.
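  For reference, the status key is just <abs filepath>_<block size in bytes>_status.
  A sketch of the key construction (inferred from the redis-cli examples above,
  not copied from the server code):

  #+begin_src sh
    # rebuild the redis status key from its parts (2097152 = 2 * 1024 * 1024)
    WEB_ROOT=/home/sylecn/persist/cache
    FILE=ideaIC-2018.1.tar.gz
    BLOCK_SIZE=$((2 * 1024 * 1024))
    KEY="${WEB_ROOT}/${FILE}_${BLOCK_SIZE}_status"
    echo "$KEY"
  #+end_src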

- client tool:
  cd ~/projects/reliable-download/
  curl http://localhost:8082/rd/ideaIC-2018.1.tar.gz
  stack exec rd -- -d ~/d/.blocks -o ~/d/ http://localhost:8082/ideaIC-2018.1.tar.gz

  test fresh block-not-ready state:
  redis-cli del "/home/sylecn/persist/cache/sdkman.sh_2097152_status"
  redis-cli del "/home/sylecn/persist/cache/sdkman.sh_2097152"
  stack exec rd -- -d ~/d/.blocks -o ~/d/ http://localhost:8082/sdkman.sh

** 2018-05-05 write the main logic of creating block metadata.
then make it work with a thread pool with a single thread.

env WEB_ROOT=/home/sylecn/persist/cache stack exec rd-api

curl -XGET http://localhost:8082/rd/ideaIC-2018.1.tar.gz

this should return json of the block metadata.

- block metadata looks like this:
  GET /rd/bigfile
  #+BEGIN_SRC sh
    {"ok": true,
     "block_size": "2MiB",    # this is a fixed value.
     "file_size": xxxx,       # file size in bytes
     "block_count": 24,
     "blocks": [
             [0, 0, 2097151, block1_sha1sum],
             [1, 2097152, 4194303, block2_sha1sum],
             ...
             [N, start, end, blockN_sha1sum]
     ]}
  #+END_SRC
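  The block ranges above can be derived from file_size and the fixed 2 MiB
  block size alone. A shell sketch with an illustrative file size (the real
  computation lives in the Haskell server code):

  #+begin_src sh
    # derive block_count and per-block [start, end] byte ranges
    FILE_SIZE=5000000                 # example file, ~4.8 MiB
    BLOCK_SIZE=$((2 * 1024 * 1024))   # the fixed "2MiB" value
    BLOCK_COUNT=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))  # ceiling division
    i=0
    while [ "$i" -lt "$BLOCK_COUNT" ]; do
        START=$((i * BLOCK_SIZE))
        END=$((START + BLOCK_SIZE - 1))
        if [ "$END" -ge "$FILE_SIZE" ]; then END=$((FILE_SIZE - 1)); fi
        echo "block $i: $START-$END"
        i=$((i + 1))
    done
  #+end_src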

** 2018-05-06 calculate sha1sum for blocks using a thread pool. design try 2.
- data protocol via redis.
  hset <filepath>_blockSize blockId sha1sum

  set <filepath>_blockSize_status working|done

- the worker pool is there for calculating all blocks of one file.

         fileQueue
  Main ------------> fileWorker

  GET /rd/file:
  if the file status is None, push the file to fileQueue.
  then do the normal logic.

  fileWorker:
  fetch a file from fileQueue.
  start working on its blocks one by one.
  if a block is already cached in redis, skip it.
  when all are done, set <filepath>_blockSize_status to done.

- this works and is easy to understand.
  WIP info is also kept in redis for each file's block.

- works on first try. excellent.

- problems
  - how to fail when redis hget or hset fails?
    just return False.
    if some block fails, set the status to error.
    the next GET /rd/file will trigger the queue again.

    a cron job can also trigger a run.
  - how to make mapM skip the rest when some action fails?

    If there is a redis error halfway during calculation, I don't want to
    calculate the rest sha1sum, because the result can't be stored.
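  The error handling above amounts to a small state machine on the status key.
  A hedged sketch (status names are from the protocol above; the transition
  logic is my reading of these notes, not the actual server code):

  #+begin_src sh
    # next_status <current>: what happens to <filepath>_blockSize_status
    next_status() {
        case "$1" in
            "")      echo working ;;   # no status yet: queue the file, mark working
            working) echo working ;;   # already queued, nothing to do
            error)   echo working ;;   # next GET /rd/file (or a cron job) retries
            done)    echo done ;;      # all blocks cached, just serve metadata
        esac
    }
    next_status error
  #+end_src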

** 2018-05-06 how to run hlint
stack exec hlint -- src api client logtest

or run on all git files:
stack exec hlint -- -g
** 2018-05-05 it's impossible to do logging easily in haskell.
two problems

- no way to pass around the logger engine/configuration without a log monad in
  your stack. since you may want to log anywhere, you have to change all the
  data types.

  This requires you to know monads and monad transformers really well.

  You can't just get a logger from a "global env" like in other languages.

- no easy way to format a string as Data.Text.Text.
  both printf and Text.Format.format have too much noise when doing logging.

  I need to write:
  runLogT "Main" logger $ do
    logInfo_ $ LT.toStrict $ TF.format "will listen on {}:{}" ((host config), (port config))

  I would like to write:
  logInfo_ $ format "will listen on {}:{}" (host config) (port config)

  try https://hackage.haskell.org/package/formatting
  Formatting in Haskell
  https://chrisdone.com/posts/formatting

  logInfo_ $ sformat ("will listen on " % string % ":" % int) (host config) (port config)
  // this is better.

** 2018-05-05 writing tests in hspec
- Test WAI application using hspec
  hspec/hspec-wai: Helpers to test WAI application with Hspec
  https://github.com/hspec/hspec-wai
  This page has examples. good.
  Hspec: A Testing Framework for Haskell
  http://hspec.github.io/

- Test.Hspec.Expectations
  https://hackage.haskell.org/package/hspec-expectations-0.8.2/docs/Test-Hspec-Expectations.html#v:Expectation
  shouldBe
  shouldStartWith
  shouldEndWith
  shouldContain
  etc

** 2018-05-04 make hoogle work in current project
stack build hoogle
requires building 54 pkgs. lots of dependencies.
DONE hoogle-5.0.14

- build local database
  stack hoogle

  DONE 64 pkgs

  Updating Haddock index for snapshot packages in
  /home/sylecn/.stack/snapshots/x86_64-linux-nopie/lts-10.3/8.2.2/doc/index.html

- stack hoogle html
  now it works.

- info: file size
  53M	.stack-work/

** 2018-05-04 for project notes, see GTD.org id002
** 2018-05-06 client design doc (moved from GTD.org id002)
- client side starts downloading blocks using a thread pool or similar.
  block data is saved to .blocks/<fn>/blockN_<block_sha1sum> when it is
  fetched in whole and verified. block data is removed when all blocks are
  joined into the final file, unless the user specifies -k/--keep-blocks on the rd cli.

  show a nice progress bar.
  #+BEGIN_SRC sh
    downloading bigfile
    xxx blocks
    downloading block 1
    downloading block 2
    1/24 ready, X%
    downloading block 3
    downloading block 4
    2/24 ready, X%
    downloading block 5
    ...
    block N ready, 100%
    bigfile downloaded.
  #+END_SRC

  block download uses the HTTP/1.1 Range header to fetch each block.
  Range: bytes=0-499
  Range: bytes=500-999
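  Assuming the fixed 2 MiB block size, the Range header for block N follows
  directly from the block ranges in the metadata. A sketch (host and file name
  in the curl line are placeholders):

  #+begin_src sh
    # compute the HTTP/1.1 Range header for block N
    N=3
    BLOCK_SIZE=$((2 * 1024 * 1024))
    START=$((N * BLOCK_SIZE))
    END=$((START + BLOCK_SIZE - 1))
    RANGE="Range: bytes=${START}-${END}"
    echo "$RANGE"
    # a client would then do something like:
    # curl -H "$RANGE" -o "block${N}" http://localhost:8082/bigfile
  #+end_src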
** 2018-05-05 API spec                                               :design:
- GET /<path>
  static file server. works even without redis.

- GET /rd/<path>
  provides rd API for given static file, requires redis-server.

  when <path> rd metadata is ready, it will return
  #+begin_src json
    {
      "ok": true,                      // metadata is ready or not
      "msg": "",                       // msg to user
      "path": "test/中文.txt",          // file path in URL
      "filepath": "./test/中文.txt",    // file path on server side
      "file_size": 8,
      "blocks": [
	[
	  0,    // block ID
	  0,    // start byte, inclusive
	  7,    // stop byte, inclusive
	  "69bca99b923859f2dc486b55b87f49689b7358c7"    // block sha1sum
	]
      ],
      "block_count": 1,
      "block_size": "2MiB"
    }
  #+end_src

- GET /test-rd/<path>
  test rd API path parsing without calculating sha1sum
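A client can verify a fetched block by hashing exactly the [start, stop] byte
range and comparing against the sha1sum in the blocks array. A local sketch
(the demo file and values here are made up, not the spec's test/中文.txt):

#+begin_src sh
  # cut the byte range [START, STOP] out of a file and sha1 it, as a client
  # would do to verify a block. here the range covers the whole 8-byte file.
  DEMO=$(mktemp)
  printf '12345678' > "$DEMO"
  START=0
  STOP=7
  COUNT=$((STOP - START + 1))
  BLOCK_SHA=$(dd if="$DEMO" bs=1 skip="$START" count="$COUNT" 2>/dev/null \
              | sha1sum | cut -d' ' -f1)
  echo "$BLOCK_SHA"
#+end_src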

* lib docs 							      :entry:
** 2018-05-06 http client docs
Making HTTP requests - http-client library
https://haskell-lang.org/library/http-client

Network.HTTP.Simple
https://www.stackage.org/haddock/lts-10.3/http-conduit-2.2.4/Network-HTTP-Simple.html

Network.HTTP.Client
https://www.stackage.org/haddock/lts-10.3/http-client-0.5.7.1/Network-HTTP-Client.html

how to handle exceptions in http-client?
http-client/TUTORIAL.md at master · snoyberg/http-client
https://github.com/snoyberg/http-client/blob/master/TUTORIAL.md#exceptions

** 2018-05-06 handle IO exceptions
- Control.Exception
  https://www.stackage.org/haddock/lts-10.3/base-4.10.1.0/Control-Exception.html
- System.IO.Error
  https://www.stackage.org/haddock/lts-10.3/base-4.10.1.0/System-IO-Error.html

** 2018-05-05 sol/hpack: hpack: An alternative format for Haskell packages
https://github.com/sol/hpack
** 2018-05-05 HUnit: A unit testing framework for Haskell
https://hackage.haskell.org/package/HUnit
** 2018-05-05 Web.Scotty
https://www.stackage.org/haddock/lts-11.7/scotty-0.11.1/Web-Scotty.html
html :: Text -> ActionM ()

scotty/examples at master · scotty-web/scotty
https://github.com/scotty-web/scotty/tree/master/examples

** 2018-05-05 Data.Aeson
http://hackage.haskell.org/package/aeson-1.3.1.0/docs/Data-Aeson.html
// this seems more readable than lts haskell's doc.
Aeson: the tutorial
https://artyom.me/aeson

** 2018-05-06 optparse-applicative :: Stackage Server
https://www.stackage.org/lts-10.3/package/optparse-applicative-0.14.0.0
* later                                                               :entry:
** 2022-03-12 drop redis-server as rd-api dependency.            :featurereq:
- use an embedded key-value db, such as Berkeley DB, sqlite3, or leveldb.
  use a well-known path for the db file.

  $HOME/.cache/reliable-downloader/rd-api.db

** 2018-05-05 allow configuring the app at runtime.
via env vars and command line parameters.

- HOST
- PORT
- REDIS_HOST
- REDIS_PORT
- WEB_ROOT    web root dir, HTTP Path will be relative to this dir.
- WORKER

- 2018-05-10 I'm trying to write my own parser for parsing env vars, just
  like optparse-applicative.

  see ~/haskell/env-var-parser/src/Lib.hs

  it is difficult. I can parse a single parameter easily, but I don't know how
  to compose parsers to parse a more complex data structure. read more about
  optparse-applicative; I can't understand the source code without more
  reading.

  pcapriotti/optparse-applicative: Applicative option parser
  https://github.com/pcapriotti/optparse-applicative

  An applicative Parser is essentially a heterogeneous list or tree of
  Options, implemented with existential types.

  See this blog post for a more detailed explanation based on a simplified
  implementation.

  Applicative option parser
  https://paolocapriotti.com/blog/2012/04/27/applicative-option-parser/
  it requires usage of GADTs, that's why I didn't understand it at first look.

  The ConsP constructor is the combination of an Option returning a function,
  and an arbitrary parser returning an argument for that function. The
  combined parser applies the function to the argument and returns a result.

  the ConsP constructor is where all the magic happens.
  I can't define a data structure like this myself,
  because I don't understand what it is.

  In the end, the applicative is defined on the list structure, not on any
  option itself. when creating a parser, you are creating a list. see option
  and optionR. a list is easily an instance of functor and applicative.

  I think I can make it work on env variables, although I can't write this
  code myself.

- how does it construct a value for any data type? I think the Applicative
  Parser already makes that work.

  how to do error handling for "parse" failures? just return Nothing.

  how to make all fields optional? if a key is not found, just return Nothing
  for that field.

- I think it really should happen inside optparse-applicative. otherwise the
  default values and parsing logic are duplicated.

  but that will be too difficult for me. This is the first time I've seen
  GADTs used, and the first time I've seen a value constructed for any data type.

  // wait. about "a value can be constructed for any data type": in non-record
  syntax, it's just a normal function call. it should be easy to construct the
  value using code.

- for my use case, I will just write non-portable functions
  to allow configuring rd-api using env variables.

  - how to construct a data value without using record syntax? just do a
    regular function call. the data constructor is a function.

  - can I update all records using: config1 {config2}? no.

  - how to clean it up and remove the intermediate variables?

- test these commands:
  stack exec rd-api -- --host=127.0.0.1 --port=8060
  env HOST=127.0.0.1 PORT=8060 stack exec rd-api
  env HOST=127.0.1.1 PORT=8061 stack exec rd-api -- --host=127.0.0.1 --port=8060
  env HOST=127.0.1.1 PORT=abc stack exec rd-api -- --host=127.0.0.1 --port=8060
  env WORKER=1 stack exec rd-api
  v1.1.1.0 all works.

  but I don't like the code in updateRDConfigFromEnvPure.
  it's unpleasant code.
  error handling shouldn't be this complicated.

- check how other people handle env variable.
  - envparse, this is similar to ~/haskell/env-var-parser/
  - envy
    https://www.stackage.org/lts-10.3/package/envy-1.3.0.2

    based on the docs, I can see why updateRDConfigFromEnvPure is too
    complex. it is trying to do too many things.

    the applicative style should be used to create values.

    just find another way to merge two values of the same type, or
    update the runParser to support it.

    envy also supports inferring the env var name from the record field name,
    by using GHC.Generics. So you no longer need to write a parser.

    - check how envy works.
      ~/haskell/testing/handle-params/app/Main.hs

      TODO envy doesn't handle Bool well.
      only "True" is accepted as true.
      "yes", "true", "on", "1" are not accepted as true.

  - read-env-var, this is a simple wrapper over lookupEnv and
    Text.Read.readMaybe. Not what I need.
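The test commands earlier in this entry exercise a simple precedence rule: the
cli flag beats the env var, which beats the built-in default. A sketch of that
rule (function name and structure are illustrative, not the actual rd-api code;
8082 is the port used elsewhere in these notes):

#+begin_src sh
  # resolve_port <cli value>: pick cli flag, then PORT env var, then default
  resolve_port() {
      if [ -n "$1" ]; then
          echo "$1"            # --port=... wins
      elif [ -n "$PORT" ]; then
          echo "$PORT"         # env var is the fallback
      else
          echo 8082            # built-in default
      fi
  }
  PORT=8061 resolve_port 8060   # cli flag wins
#+end_src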

** 2024-03-12 rd-api, if the file is already transferred block by block, I can support
live compression easily. The client requests compression via a param such as
?compress=zstd. the default is no compression.

- when the source file is plain tar, not a compressed format like squashfs or
  zip, this can speed up the transfer by delaying compression.

** 2024-03-12 rd, add a DL remaining-time estimate, based on estimated DL speed.
each block has a block size. I know when it is started and when it is
finished. I know how many more blocks there are to fetch. it should be easy to
estimate. calculate a moving avg speed using the last 5 blocks DLed.
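A sketch of the estimate with made-up numbers (the real client would track
per-block timestamps; the names and the 5-block window values here are
illustrative):

#+begin_src sh
  # moving-average speed over the last 5 downloaded blocks, then ETA
  BLOCK_SIZE=$((2 * 1024 * 1024))            # fixed 2 MiB block size
  LAST5_BYTES=$((5 * BLOCK_SIZE))            # bytes moved by the last 5 blocks
  LAST5_SECONDS=4                            # wall time those blocks took
  SPEED=$((LAST5_BYTES / LAST5_SECONDS))     # bytes per second
  REMAINING_BLOCKS=20
  ETA=$((REMAINING_BLOCKS * BLOCK_SIZE / SPEED))
  echo "speed=${SPEED} B/s, eta=${ETA}s"
#+end_src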

** 2018-05-09 test the app under unstable network.
I remember there are tools that can simulate packet loss.
policy in ovs can do it.

- 2018-05-11 tcp - Simulate delayed and dropped packets on Linux - Stack Overflow
  https://stackoverflow.com/questions/614795/simulate-delayed-and-dropped-packets-on-linux

  test this in vbox VM. stretch01
  see stretch01 daylog.

* done                                                                :entry:
** 2024-04-26 log refinement.
- DONE 2024-04-26T19:06:17  I  fileWorker done for /home/sylecn/d/t2/foo, 0.0 MiB, 1 blocks
  don't use 0.0 MiB, just say <1 MiB or so.

  humanReadableSize

  add unit test for this.

- DONE 2024-04-26T19:12:02  I  fileWorker is waiting for jobs...
  2024-04-26T19:12:02  I  fileWorker is waiting for jobs...

  fileWorker-<N> is waiting for jobs

  2024-04-26T19:12:02  I  fileWorker working on /home/sylecn/d/t2/foo
  2024-04-26T19:12:02  I  fileWorker done for /home/sylecn/d/t2/foo, 0.0 MiB, 1 blocks

  fileWorker-<N> working on ...

  in startWorkers, give it a name.

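The "<1 MiB" idea above, sketched in shell (the real humanReadableSize is
Haskell; this only shows the intended behavior, with the 8-byte foo file case
from the log):

#+begin_src sh
  # print "<1 MiB" for sizes under 1 MiB instead of "0.0 MiB"
  human_readable_size() {
      MIB=$((1024 * 1024))
      if [ "$1" -lt "$MIB" ]; then
          echo "<1 MiB"
      else
          echo "$(($1 / MIB)) MiB"
      fi
  }
  human_readable_size 8
  human_readable_size 4194304
#+end_src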

** 2024-04-07 rd-api: when a file's content is changed on disk, auto invalidate all
cached blocks.

- when the user requests rd metadata and the file's mtime has changed, do a
  sha1sum on the first 4M and the last 4M of the file. if either of these
  sha1sums changed, invalidate the cache in redis.

  when calculating the metadata, cache the file's mtime and the sha1sums of
  the first 4M and last 4M of the file's content.

  2024-04-25 is it easy to get the last 4M? yes.
  in C, seek to byte (length - 4M) and read 4M.

  from a correctness point of view, this is not enough. the user could modify
  the middle of the file. How about always recalculating if the mtime changed?

  how about this? only do it for small files, using the full content hash as
  the fingerprint. For large files, the user is supposed to rename the file or
  clear the file metadata cache. the user can always rename the file before
  running rd again.

  just giving a warning will be enough.

  design:
  auto calculate and check fingerprint for small files (<200M), skip
  fingerprint check for large files.

- this will allow DLing the correct file when the file content for the same
  file name has changed.
- give some log output in the console when the file content changes.
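A shell stand-in for the fingerprint idea (the real implementation is Haskell;
the demo file and the exact fingerprint layout here are made up):

#+begin_src sh
  # fingerprint = size + sha1 of first 4M + sha1 of last 4M
  F=$(mktemp)
  printf 'hello' > "$F"                 # tiny demo file (< 4M, so head = tail)
  CHUNK=$((4 * 1024 * 1024))
  SIZE=$(stat -c %s "$F")
  HEAD_SHA=$(head -c "$CHUNK" "$F" | sha1sum | cut -d' ' -f1)
  TAIL_SHA=$(tail -c "$CHUNK" "$F" | sha1sum | cut -d' ' -f1)
  FINGERPRINT="${SIZE}_${HEAD_SHA}_${TAIL_SHA}"
  echo "$FINGERPRINT"
#+end_src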

*** auto calculate and check fingerprint for small files (<200M), skip fingerprint check for large files

- implement this now.

  this change should be backward compatible with older rd clients.
  it only affects server-side metadata caching.

- code in getRdHandler

  resultE <- liftIO $ runExceptT $ processNewFileAsyncMaybe rc fbp

  processNewFileAsyncMaybe :: RDRuntimeConfig -> FillBlockParam -> ExceptT T.Text IO ()

  resultE <- liftIO $ DB.insertIfNotExist rc strKey $ fsBytes FileStatusWorking
  throwOnLeft resultE

  here, if the file's sha1 has changed, use DB.set instead of DB.insertIfNotExist.

- add proper logging.

- create a unit test for this?
  it's all IO actions, maybe I will test it myself.

  on agem10,
  stack exec -- rd-api -d ~/d/t2/
  create ~/d/t2/foo with some content
  stack exec -- rd http://127.0.0.1:8082/foo

- problems
  - if I send the job to the worker, will the old redis key be rewritten?
    will it ever return expired cache data?

    it will call this:
    blocksWithSha1sum <- liftIO $ fillSha1sum rc fbp

    I need to delete the old redis key.

  - test the new code.
    when foo content changed.

    it works.

** 2024-04-25 haskell: when only executable code changes, should I increase the version in package.yaml?
I think I will defer the version change to avoid re-compilation of the base library.

** 2024-04-07 when combining blocks to create the final file, don't print the
"No block fetched in last 10 seconds" log any more if there are no other files
in the DL list.
** 2024-04-24 try building this project in gocd.
- build on a debian stretch agent node.
- building in docker is fine.
  cp the built binary to the artifacts dir.
  use a debian stretch base image.

  debian stretch is no longer supported. maybe try a supported old OS,
  like debian 10 buster or RHEL 7?

  Red Hat Universal Base Images for Docker users | Red Hat Developer
  https://developers.redhat.com/blog/2020/03/24/red-hat-universal-base-images-for-docker-users#introducing_red_hat_universal_base_images

  Life-cycle Dates
  https://access.redhat.com/support/policy/updates/errata/#Life_Cycle_Dates
  RHEL 7.9 is still supported, until June 30, 2024.

  I think I should just use debian 10 buster as a base.
- how to make the build reuse as much cache as possible?
  cache should be stored on host?

  nix is indeed a good match for this scenario. too bad it can't use a
  pre-built ghc.

  running the build on bare metal is best: everything can be debugged and
  purged at will. it is not as reproducible though. To build on go-agent, I
  need to run go-agent on a stretch or buster VM. It is similar to a manual
  build, just with some automation, and I can configure it to always build on
  the same go-agent node, and it can store the built artifacts for me.

  I think it's worth it. Just deploy go-agent on stretch02 VM.
- deploy go-agent on stretch02 VM.
  check how to specify which files should be copied to artifacts.

  I need to run go-agent as the sylecn user so I can reuse the stack cache and
  configuration. so I decided to use the zip file based agent.

  Generic Zip | GoCD User Documentation
  https://docs.gocd.org/current/installation/install/agent/zip.html

  Download zip
  https://www.gocd.org/download/#zip
  DL to /wh01/share/soft/cache/go-agent-23.5.0-18179.zip
  sha256
  74cfe036906b5ca246541fa3fe9c20e0ca04f80b27b5f6d03350cfbc1a2b3890

  create auto reg config file.
  specify go-server URL.

  ./config/autoregister.properties
  #+begin_quote
agent.auto.register.key=67c2fdfe-a9a3-490a-9c65-190fc2258483
agent.auto.register.resources=inside-gfw,stack
agent.auto.register.hostname=stretch02
  #+end_quote

  ./wrapper-config/wrapper-properties.conf
  #+begin_quote
wrapper.app.parameter.100=-serverUrl
wrapper.app.parameter.101=https://go.emacsos.com/go
# env variables
set.AGENT_STARTUP_ARGS=-Xms256m -Xmx1024m
  #+end_quote
  install openjdk 17. use nix? no. just use amazon corretto deb.

  ./bin/go-agent start

  it works perfectly. the agent is detected, resources are auto added.

- now config pipeline for reliable-download.

  allow go-server to access my git repos,
  using a read-only deploy key.

  what's the path of the built binary?
  /home/sylecn/projects/reliable-download/.stack-work/install/x86_64-linux-tinfo6/21acf9206f416a1e336a58fea37773f98a7eb66bb7c93dff696534ca35928f26/9.2.8/bin/rd-api
  /home/sylecn/projects/reliable-download/.stack-work/install/x86_64-linux-tinfo6/21acf9206f416a1e336a58fea37773f98a7eb66bb7c93dff696534ca35928f26/9.2.8/bin/rd
  is this path stable? nope.
  the s02 path is different. it has no -tinfo and a different hash.
  probably a different stack version. yes: s02 has stack 2.15.5, agem10 has stack 2.13.1.

  after upgrading stack, the paths still don't match.
  well, I need to cp the files in the gocd build stage, using scripts.

- build and upload artifacts works.

  add a new stage to create PyPI packages?
  I also want to render README.rst file and generate html file.

  readme-renderer · PyPI
  https://pypi.org/project/readme-renderer/
  requires py38+, stretch02 has py37.
  it's a library used by twine check. but I need html. how?
  search: how to render rst to html using readme-renderer

  https://github.com/pypa/readme_renderer/blob/main/readme_renderer/__main__.py
  here is a main, but it is not exposed via pyproject.toml.

  I will run readme_renderer on d12 bookworm go-agent nodes.
  preinstall it on the agent nodes?
  the agent runs in docker, so I have to rebuild the docker image.

  it's in d12 bookworm.
  DONE require readme-renderer agent resource.
  setup python3-readme-renderer
  python3 -m readme_renderer --help
  python3 -m readme_renderer -o README.html README.rst
  it works.

  created ./render-readme.sh to run in gocd.

- how to pull binary from artifacts before running release-rd-api and
  release-rd?

  how to make release-rd-api and release-rd not depend on each other?
  do I need to create more than one pipeline?
  read more about pipeline design.

  Concepts in GoCD | GoCD User Documentation
  https://docs.gocd.org/current/introduction/concepts_in_go.html

  create two new downstream pipelines to do the release.

  release-rd-api
  release-rd

  need to fix
  ~/projects/reliable-download/pypi/rd-api/Makefile
  ~/projects/reliable-download/pypi/rd-client/Makefile
  to
  support artifacts in build/
  and
  support debian bookworm and later.

  well, I can just create package on debian stretch.
  an older version of readme_renderer can work on older python.

- 3rd party storage (e.g. S3) is supported for artifacts.
  https://go.emacsos.com/go/admin/artifact_stores

- problems
  - git clone hangs.
    network issue?

    https://gitlab.emacsos.com/sylecn/reliable-download
    this URL clone okay.

    git@gitlab.emacsos.com:sylecn/reliable-download.git
    this URL clone fails.
    why? probably an ssh config issue.
    is the home dir properly mounted?

    go-server uses jgit?
    /wh01/share/apps/go-server/home/.config/jgit/config

    search: gocd jgit how to set StrictHostKeyChecking=no

  - stack cache broken?
    #+begin_quote
    Warning: Trouble loading CompilerPaths cache:

         Error: [S-2673]
         Global dump did not parse correctly.
    #+end_quote
    it tries to rebuild everything.

    #+begin_quote
    resource-pool                > Installing library in /home/sylecn/.stack/snapshots/x86_64-linux/77927a2d79232715851f143c4898fcc1c39e5573770dc4db45fc8afa42702d2e/9.2.8/lib/x86_64-linux-ghc-9.2.8/resource-pool-0.2.3.2-8RTY8siXmdJHwIRBi4f3K5
    #+end_quote

    search: haskell stack Warning: Trouble loading CompilerPaths cache Global dump did not parse correctly

    as long as the cache works after the initial build, it is fine.

  - stack build fails on the s02 VM.
    While building package hspec-core-2.9.7

    hspec-core                   > /usr/bin/ld.gold: error: cannot find -ltinfo

    I have seen this before.
    setup libncurses-dev
    it works.

    if using nix, I won't have this problem.

  - build okay. but artifacts are not collected by gocd.

    it is collected. just not shown in web UI.
    I can see it on disk.
    /wh01/share/apps/go-server/artifacts/pipelines/reliable-download/2/testAndBuild/1/run-stack

    it's also shown in web UI.
    in job details page.

    artifacts are associated with each job run.
    so there is no need to use sub folder.

    why are both build/reliable-download-1.5.0.1/ and reliable-download-1.5.0.1/
    included in the artifacts? probably because "/**" can match the empty string.

  - can I delete old artifacts?

    yes. it's a global thing. when disk space is low, old artifacts can be
    deleted, unless it's marked as never delete.

  - can I run readme_renderer via nix?
    is it in nix? yes.
    nix search nixpkgs readme-renderer$
    nix profile install nixpkgs#python312 nixpkgs#python312Packages.readme-renderer nixpkgs#python312Packages.readme_renderer

    python3.12 -m readme_renderer -- --help
    nope. can't find the lib. wtf?

    nix requires learning before I can use python packages.
    search: nix how to use python packages from shell?

    just don't use it for library and cli tools for now.

    I will run readme_renderer on d12 bookworm go-agent nodes.

  - DONE release-rd-api
    when I add two materials and trigger a run,
    #+begin_quote
    [go] Task: fetch artifact [build/rd-api] => [] from [reliable-download/testAndBuild/run-stack]
    [go] Fetching artifact [build/rd-api] from [reliable-download/5/testAndBuild/latest/run-stack]
    [go] Could not fetch artifact https://go.emacsos.com/go/remoting/files/reliable-download/5/testAndBuild/latest/run-stack/build/rd-api. Pausing 17 seconds to retry. Error was : Caught an exception 'pipelines/release-rd-api/rd-api: Is a directory'
    [go] Could not fetch artifact https://go.emacsos.com/go/remoting/files/reliable-download/5/testAndBuild/latest/run-stack/build/rd-api. Pausing 27 seconds to retry. Error was : Caught an exception 'pipelines/release-rd-api/rd-api: Is a directory'
    [go] Could not fetch artifact https://go.emacsos.com/go/remoting/files/reliable-download/5/testAndBuild/latest/run-stack/build/rd-api. Pausing 34 seconds to retry. Error was : Caught an exception 'pipelines/release-rd-api/rd-api: Is a directory'
    #+end_quote
    search: fetching artifact Caught an exception Is a directory

    check artifact API.
    https://go.emacsos.com/go/remoting/files/reliable-download/5/testAndBuild/latest/run-stack/build/rd-api

    web UI click file link is
    https://go.emacsos.com/go/files/reliable-download/5/testAndBuild/1/run-stack/build/rd-api
    path looks alright to me.

    is it the "source is a file" switch issue?
    when the source is build/, a directory, it seems to work.
    I just don't want to get both rd and rd-api when only one is needed.

    // Oh, I see the problem.
    I didn't set Destination. so it tries to copy ./build/rd-api to ./rd-api
    but ./rd-api already exists as a directory. just set destination to
    build/ should work.
    // yes. it works.

  - "make bootstrap" fails in the release-rd-api pipeline.

    python3 -c 'import wheel' || python3 -m pip install --user wheel
    this step fails in the docker image.

    try it,
    docker run -ti --rm pvereg.emacsos.com/sylecn/gocd-agent-debian-12-cn:v23.5.1 bash

  - "make bootstrap" fails on the s02 go-agent.
    if [ -f build/rd-api ] returns false. why?

    fetch artifact is okay.

    I see. in the Makefile the path is not build/rd-api,
    it's ../../build/rd-api

  - in setup.py, python3-setuptools is required.
    I think I should have a python3 build image and run the package commands
    in that docker image instead of rebuilding the go-agent image so frequently.
    use pre-built twine docker image maybe.
    I really hate docker-in-docker though.
    maybe just don't run go-agent in docker.
    that will fix everything.
    I can manage OS env using salt easily.
    but I also don't want to install compilers in production system.
    maybe run it in VMs?
    recreating the docker image is not slow when the change is small.
    give it another try.
    it works now.
    on next release, I can use gocd to build and deploy to PyPI.
** 2019-02-28 bug: rd-api -d java/
option -d: cannot parse value `java/'

- stack exec rd-api -- -d t1/
  this works fine.

  probably a python wrapper issue.

- problems
  - 2024-04-08 -d foo/ works fine.
    -d foo and -d ./foo/ also work.
  - but getFileStatus on ./test/中文.txt failed:
    ./test/中文.txt: getFileStatus: does not exist (No such file or directory)

    on s02,
    cd ~/projects/reliable-download
    rd-api -d ./test/

    on agem10,
    rd http://[240e:388:8a05:500:be24:11ff:fe06:be5f]:8082/中文.txt
  - try it on agem10,

    rd http://127.0.0.1:8082/中文.txt
    also fail.

    file name encoding issue?

    import System.FilePath ((</>))

    let filepath = webRoot (rcConfig rc) </> T.unpack reqFilePath
    getFileStatus filepath

    I see no encoding issue.
    try add some log.
    no issue.
    getFileStatus works in ghci.

    stack ghci reliable-download:exe:rd-api
    #+begin_src sh
      ghci> import System.Posix.Files (getFileStatus, fileSize)
      ghci> s1 <- getFileStatus("./test/TestApi.hs")
      ghci> fileSize s1
      7647
      ghci> s2 <- getFileStatus("./test/中文.txt")
      ghci> fileSize s2
      8
      ghci> :q
      Leaving GHCi.
    #+end_src
    so why did it fail?
    maybe it only fails when run via python?
    LANG/locale issue?

    try running via stack or the plain binary.
    ~/.local/pipx/venvs/rd-api/lib/python3.11/site-packages/rdapi/rd-api
    the plain binary also fails. so it's not a python wrapper issue.

  - using an abs path in -d works.
    so it's a relative path and CWD issue.
    maybe I changed the CWD somewhere. check it.

    git grep setCurrentDirectory

    yes.
    -- static app only support serving from PWD
    setCurrentDirectory (webRoot config)

    so just always use relative paths; don't join with the webroot dir.

** 2024-04-08 should I use absolute file paths in redis keys?
this can avoid some sha1 recalculation if the user runs rd-api from different
root dirs.

e.g.

cd /foo/bar/
rd-api

cd /foo/
rd-api -d bar
# or
rd-api -d /foo/bar
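Normalizing the -d argument to an absolute path before building redis keys
would make all three invocations above hit the same cache entries. A sketch
using realpath (illustrative; a temp dir stands in for /foo/bar):

#+begin_src sh
  # different spellings of the same dir normalize to one absolute path
  D=$(mktemp -d)
  mkdir -p "$D/bar"
  cd "$D"
  A=$(realpath bar)
  B=$(realpath "$D/bar/")
  C=$(realpath ./bar)
  echo "$A"
#+end_src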

** 2024-04-07 rd-api, rd: switch to fast-logger, use local datetime in logs, not UTC time. :logging:featurereq:
- tinylog is not maintained any more; it is no longer in the latest stackage LTS.
- tinylog's types don't allow using local time in logs without writing lots of code.
- create a demo project for using RIO and fast-logger.
  cd ~/projects/
  stack new rio-fastlogger-demo rio

  app/Main.hs
  withLogFunc lo

  https://www.stackage.org/haddock/lts-22.15/rio-0.1.22.0/RIO.html#v:withLogFunc