• Nomecks@lemmy.ca
    1 month ago

    Is there a benefit to doing CoW in pandas rather than offloading it to storage? Practically all modern storage systems support CoW snapshots. The pattern I'm used to (infra, not big data) is to use storage APIs to offload storage operations from client systems.

    • Sem@lemmy.ml
      1 month ago

      If you are processing data in pandas, CoW lets you avoid a lot of redundant computation at intermediate steps. Before CoW, any data processing in pandas required careful, manual coding to avoid the case described in the blog post. To be honest, I cannot imagine offloading the result of every operation in a pipeline to storage…
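
For what it's worth, here is a minimal sketch of what that looks like in memory. It assumes pandas 2.x or later; the frame and column names are made up for illustration and are not taken from the blog post. An intermediate step (`rename` here) hands back a new DataFrame that still shares its buffers with the original, and the copy only happens on the first write.

```python
import warnings

import numpy as np
import pandas as pd

# Opt in to copy-on-write. Newer pandas versions enable it by default and may
# warn that the option is deprecated, so warnings are silenced for this line.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    pd.options.mode.copy_on_write = True

# Hypothetical input data, for illustration only.
raw = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Intermediate pipeline step: with CoW this is a lazy copy, so the new frame
# still points at the same underlying data as "raw".
renamed = raw.rename(columns={"a": "id"})
shares_before = np.shares_memory(raw["a"].to_numpy(), renamed["id"].to_numpy())

# The first write to "renamed" triggers the actual copy of the affected data,
# so "raw" is never modified behind your back.
renamed.loc[0, "id"] = 100
shares_after = np.shares_memory(raw["a"].to_numpy(), renamed["id"].to_numpy())

print("shared before write:", shares_before)
print("shared after write:", shares_after)
print("raw is unchanged:", raw.loc[0, "a"] == 1)
```

This all happens per object inside one process, which is why it is hard to map onto storage-level snapshots: nothing in the pipeline above ever touches disk.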