Notes during implementation

API (new attempt)

  • Hasher.hash()

    • if the hash does not exist yet...
      • result = ...
      • cache.put(hash_data)
    • cache.load(hash_data)
  • during cache.put

    • cache.evict()
    • ... ?
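A rough Python sketch of this flow follows. It assumes three hypothetical pieces: an in-memory `InMemoryCache` with `exists`/`put`/`load`/`evict`, a `hash_call` helper that stands in for `Hasher.hash()` by taking SHA-256 over the function name and its pickled arguments, and a `cached_call` wrapper. None of these names come from an existing library.

```python
import hashlib
import pickle
from collections import OrderedDict
from typing import Any, Callable


def hash_call(func: Callable, *args: Any, **kwargs: Any) -> str:
    """Hypothetical hasher: SHA-256 over the function's name and pickled arguments."""
    payload = pickle.dumps((func.__module__, func.__qualname__, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


class InMemoryCache:
    """Hypothetical cache that evicts least-recently-used entries during put()."""

    def __init__(self, max_count: int = 2) -> None:
        self.max_count = max_count
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def exists(self, key: str) -> bool:
        return key in self._items

    def load(self, key: str) -> Any:
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def evict(self) -> None:
        # Drop the oldest entries until there is room for one more.
        while self._items and len(self._items) >= self.max_count:
            self._items.popitem(last=False)

    def put(self, key: str, value: Any) -> None:
        self.evict()  # eviction runs before every write
        self._items[key] = value


def cached_call(cache: InMemoryCache, func: Callable, *args: Any, **kwargs: Any) -> Any:
    key = hash_call(func, *args, **kwargs)
    if not cache.exists(key):
        cache.put(key, func(*args, **kwargs))  # compute only on a miss
    return cache.load(key)
```

Running `evict()` inside `put()` keeps the cache bounded at all times, at the price of doing eviction work on every write.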

User story

run uncached request

  • hasher.hash(func) -> Request
  • cache.load(request) -> fails
  • run function -> result
  • cache.save(item)
    • save the result to a file: fs.save(request, result)
    • add a DB entry: db.save(request, result) (with request.rel_path)
  • return the result
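A minimal sketch of the uncached path, assuming a hypothetical `Request` dataclass whose `rel_path` is `<func_name>/<key>.pkl`, a pickle-based `FileStore`, and a SQLite-backed `MetadataDB`. In this reading the DB row stores only the relative path and a last-used timestamp, not the result itself.

```python
import pickle
import sqlite3
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class Request:
    """Hypothetical request: function name plus hash of the call."""
    func_name: str
    key: str

    @property
    def rel_path(self) -> str:
        # One sub-directory per function, one pickle file per request.
        return f"{self.func_name}/{self.key}.pkl"


class FileStore:
    def __init__(self, root: Path) -> None:
        self.root = root

    def save(self, request: Request, result: Any) -> None:
        path = self.root / request.rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(pickle.dumps(result))


class MetadataDB:
    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS requests ("
            " rel_path TEXT PRIMARY KEY, func_name TEXT, last_used REAL)"
        )

    def save(self, request: Request) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT OR REPLACE INTO requests VALUES (?, ?, ?)",
                (request.rel_path, request.func_name, time.time()),
            )


def save_uncached(fs: FileStore, db: MetadataDB, request: Request, result: Any) -> Any:
    fs.save(request, result)  # 1. the result goes to a file
    db.save(request)          # 2. the DB entry records where it lives
    return result
```

Writing the file before the DB entry means a crash in between leaves at worst an orphaned file, never a DB entry pointing at a missing file.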

run cached request and evict

  • hash request
  • return cache.load(request)
    • db.load(request) # to update last used date
    • fs.load(request)
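The cached path could look like the sketch below. It assumes a hypothetical `requests` table with `rel_path`, `func_name` and `last_used` columns and results pickled under a root directory. Touching the DB row first is what makes a later least-recently-used eviction see this access.

```python
import pickle
import sqlite3
import time
from pathlib import Path
from typing import Any


def load_cached(conn: sqlite3.Connection, root: Path, rel_path: str) -> Any:
    """Hypothetical cached load: update last_used in the DB, then read the result file."""
    with conn:  # commit the timestamp update
        cur = conn.execute(
            "UPDATE requests SET last_used = ? WHERE rel_path = ?",
            (time.time(), rel_path),
        )
    if cur.rowcount == 0:
        # No DB entry: treat as a cache miss so the caller computes the result.
        raise KeyError(rel_path)
    return pickle.loads((root / rel_path).read_bytes())
```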

eviction (not sure yet WHEN to run)

  • cache.evict(func_name=..., max_count=...)
    • check db.get_all_requests()
    • for each request to evict...
      • db.remove(request)
      • fs.remove(request)
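One possible policy for `cache.evict(func_name=..., max_count=...)` is least-recently-used. The sketch below assumes the same hypothetical `requests` table (`rel_path`, `func_name`, `last_used`). It keeps the `max_count` most recently used entries of one function and deletes the rest from both the DB and the file system.

```python
import sqlite3
from pathlib import Path
from typing import List


def evict(conn: sqlite3.Connection, root: Path, func_name: str, max_count: int) -> List[str]:
    """Hypothetical LRU eviction; returns the relative paths that were removed."""
    rows = conn.execute(
        "SELECT rel_path FROM requests WHERE func_name = ? ORDER BY last_used DESC",
        (func_name,),
    ).fetchall()
    to_evict = [rel_path for (rel_path,) in rows[max_count:]]
    for rel_path in to_evict:
        with conn:
            conn.execute("DELETE FROM requests WHERE rel_path = ?", (rel_path,))
        # Delete the file after the DB entry, so no entry ever points at a missing file.
        (root / rel_path).unlink(missing_ok=True)
    return to_evict
```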

when to run eviction?

  • before cache.put <-- maybe safest for now, although not performant?
  • after cache.put
  • as background process
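A minimal sketch of the background-process option, using a daemon thread instead of a separate process. `BackgroundEvictor` is a hypothetical helper that calls any zero-argument eviction function every `interval` seconds until it is stopped.

```python
import threading
from typing import Callable


class BackgroundEvictor:
    """Hypothetical helper: runs evict_fn every `interval` seconds on a daemon thread."""

    def __init__(self, evict_fn: Callable[[], None], interval: float) -> None:
        self._evict_fn = evict_fn
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (run eviction) and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._evict_fn()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

A caveat if the metadata lives in SQLite: a `sqlite3` connection by default refuses to be used from a thread other than the one that created it, so the eviction function would need its own connection. The cache would also have to tolerate entries disappearing between a successful `db.load` and the following `fs.load`.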

Alternatives

APIs

Metadata

TODOs

  • write a few proper test cases for the Cache (in particular with parallel processing!)
  • use logging statements instead of logfunc (maybe with different levels of verbosity?)
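For the logging TODO, the standard `logging` module already supports levels. In this sketch, `"mycache"` is a hypothetical package name and `load_or_compute` is a stand-in for the real cache lookup. Cache hits are logged at DEBUG and misses at INFO. Following the usual library convention, only a `NullHandler` is attached, so the application decides what gets shown.

```python
import logging
from typing import Any, Callable, Dict

# Library-side logger; "mycache" is a hypothetical package name.
logger = logging.getLogger("mycache")
logger.addHandler(logging.NullHandler())


def load_or_compute(key: str, compute: Callable[[], Any], store: Dict[str, Any]) -> Any:
    if key in store:
        logger.debug("cache hit: %s", key)
        return store[key]
    logger.info("cache miss, computing: %s", key)
    store[key] = compute()
    return store[key]
```

In a notebook, `logging.basicConfig(level=logging.DEBUG)` would then show both hits and misses, while `level=logging.INFO` shows only misses.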

MetadataStorage

  • How to store metadata?
    • id
    • byte_size
    • last_accessed
    • last_modified
    • ...
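One option is a single SQLite table holding the fields listed above. The table name `metadata` and the helpers `open_metadata`, `record_save` and `total_size` are hypothetical. Storing `byte_size` makes a size-based eviction limit a single query.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    id            TEXT PRIMARY KEY,  -- request hash or relative path
    byte_size     INTEGER NOT NULL,  -- size of the stored result file
    last_accessed REAL NOT NULL,     -- UNIX timestamp, to be updated on every load
    last_modified REAL NOT NULL      -- UNIX timestamp, updated on every save
)
"""


def open_metadata(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_save(conn: sqlite3.Connection, item_id: str, byte_size: int) -> None:
    now = time.time()
    with conn:  # commit on success
        conn.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)",
            (item_id, byte_size, now, now),
        )


def total_size(conn: sqlite3.Connection) -> int:
    # Total bytes currently cached; COALESCE turns an empty SUM (NULL) into 0.
    return conn.execute("SELECT COALESCE(SUM(byte_size), 0) FROM metadata").fetchone()[0]
```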

Implementation options:

Why is joblib not sufficient?

  • does not work across Jupyter notebook sessions because the cache directory is named after the kernel (this may well be intended, but it doesn't serve my use case)
  • unclear to me how exactly the hashing works

Example joblib cache layout from a notebook session:

    cache
    cache/joblib
    cache/joblib/__main__--tmp-ipykernel-4032011732
    cache/joblib/__main__--tmp-ipykernel-4032011732/get_gdf
    cache/joblib/__main__--tmp-ipykernel-4032011732/get_gdf/4652c901c0c669e4db83383b50f91968
    cache/joblib/__main__--tmp-ipykernel-4032011732/get_gdf/4652c901c0c669e4db83383b50f91968/output.pkl
    cache/joblib/__main__--tmp-ipykernel-4032011732/get_gdf/4652c901c0c669e4db83383b50f91968/metadata.json
    cache/joblib/__main__--tmp-ipykernel-4032011732/get_gdf/func_code.py

TODO

  • also hash the function code and at least warn when it has changed? Not sure yet...
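A sketch of the "warn when changed" idea that fingerprints the function's bytecode instead of its source text, since source is not always retrievable. `code_fingerprint`, `check_code_changed` and the `registry` dict are hypothetical. Known limits: bytecode differs between Python versions, and changes in helper functions that `func` calls go unnoticed.

```python
import hashlib
import types
import warnings
from typing import Callable, Dict


def code_fingerprint(func: Callable) -> str:
    """Hash the function's bytecode, referenced names and (non-code) constants."""
    code = func.__code__
    # Nested code objects have reprs containing memory addresses, so skip them.
    consts = tuple(c for c in code.co_consts if not isinstance(c, types.CodeType))
    payload = repr((code.co_code, code.co_names, consts)).encode()
    return hashlib.sha256(payload).hexdigest()


def check_code_changed(registry: Dict[str, str], func: Callable) -> bool:
    """Warn and return True if func's fingerprint differs from the stored one."""
    name = func.__qualname__
    fp = code_fingerprint(func)
    changed = name in registry and registry[name] != fp
    if changed:
        warnings.warn(f"code of {name} changed; cached results may be stale")
    registry[name] = fp
    return changed
```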