Dataset batch hierarchies
Maturity labels
- Now: Stable and supported in current releases.
- Preview: Usable today, but behavior and APIs may evolve.
- Planned: Not yet implemented.
Note
Status: Now
1) What it solves
Batch processing often devolves into nested loops with weak typing and unclear structure.
2) The idea
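Dataset[...] gives keyed batch semantics: each entry maps a key to one item of the parametrized model type. A Dataset can itself be the item type, so datasets nest to represent hierarchy.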
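As a rough plain-Python analogy (not omnipy's implementation), a nested keyed batch behaves like a mapping of group names to mappings of keys to validated values. The helpers `parse_inner` and `parse_outer` below are hypothetical names used only for illustration; they sketch the keyed, typed structure that replaces positional nested loops.

```python
from typing import Dict

# Plain-Python stand-in for the idea, NOT omnipy's implementation:
# an inner keyed batch maps keys to ints, and an outer batch maps
# group names to inner batches.
InnerBatch = Dict[str, int]
OuterBatch = Dict[str, InnerBatch]


def parse_inner(raw: dict) -> InnerBatch:
    """Coerce every value of one keyed batch to int (hypothetical helper)."""
    return {str(key): int(value) for key, value in raw.items()}


def parse_outer(raw: dict) -> OuterBatch:
    """Apply parse_inner to each named group (hypothetical helper)."""
    return {str(group): parse_inner(items) for group, items in raw.items()}


grouped = parse_outer({'group1': {'a': '1', 'b': 2}, 'group2': {'x': 10}})
print(grouped)  # {'group1': {'a': 1, 'b': 2}, 'group2': {'x': 10}}
```

Every value is reachable by name (`grouped['group1']['a']`) rather than by loop position, and a value that cannot be read as an int fails at parse time instead of deep inside a later loop.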
3) Example
>>> from omnipy import Dataset, Model
>>> Inner = Dataset[Model[int]]
>>> Outer = Dataset[Inner]
>>> grouped = Outer({'group1': {'a': '1', 'b': 2}, 'group2': {'x': 10}})
>>> grouped.json()
4) Output / display
╭───┬────────────────┬────────────┬────────┬──────────────────╮
│ # │ Data file name │ Type       │ Length │ Size (in memory) │
│   │                │            │        │                  │
│ 0 │ a              │ Model[int] │ -      │ 589 Bytes        │
│ 1 │ b              │ Model[int] │ -      │ 589 Bytes        │
╰───┴────────────────┴────────────┴────────┴──────────────────╯
5) When to use / when not
Use it for record sets, grouped records, file collections, or keyed intermediate artifacts.
Skip it when you process a single scalar or record and need no grouping.
6) Gotchas
- Define stable key semantics early (sample id, filename, partition key, etc.).
- Very deep nesting usually means you need a clearer boundary between phases.
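One way to pin down key semantics early is to derive every key through a single function and reject collisions up front. The sketch below uses only the standard library; `stable_key` and `key_batch` are hypothetical helpers, and using the file stem as the key is an assumption chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def stable_key(path: str) -> str:
    """Derive a key from a file path: the file name without its last suffix."""
    return PurePosixPath(path).stem


def key_batch(paths: List[str]) -> Dict[str, str]:
    """Map stable keys to paths, failing loudly if two paths share a key."""
    keyed: Dict[str, str] = {}
    for path in paths:
        key = stable_key(path)
        if key in keyed:
            raise ValueError(
                f"duplicate key {key!r} for {keyed[key]!r} and {path!r}"
            )
        keyed[key] = path
    return keyed


batch = key_batch(['runs/2024/sample_01.csv', 'runs/2024/sample_02.csv'])
print(batch)
# {'sample_01': 'runs/2024/sample_01.csv', 'sample_02': 'runs/2024/sample_02.csv'}
```

Keeping the key rule in one place means a later change (for example, switching from file stem to sample id) touches a single function, and a collision surfaces before any batch is built.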
7) Links
- How-to: Mapping over datasets
- How-to: Parametrized models