Skip to content

Tutorial 5: Domain tabular formats

Maturity labels

  • Now: Stable and supported in current releases.
  • Preview: Usable today, but behavior and APIs may evolve.
  • Planned: Not yet implemented.

Note

Status: Now

Row-based parsing is Now. Column-based parsing support is Preview.

This tutorial shows BED/GFF-style row parsing using typed model specs.

Step 1: Parse BED-like rows

>>> import omnipy as om
>>> bed = "chrom\tstart\tend\tname\nchr1\t10\t20\tgeneA\nchr2\t5\t9\tgeneB\n"
>>> rows = om.TsvTableModel(bed)
>>> rows

Step 2: Define row model spec

>>> import pydantic as pyd
>>> class BedRow(pyd.v1.BaseModel):
...     chrom: str
...     start: int
...     end: int
...     name: str | None = None

Step 3: Parse into typed records

>>> import omnipy as om
>>> import pydantic as pyd
>>> class BedRow(pyd.v1.BaseModel):
...     chrom: str
...     start: int
...     end: int
...     name: str | None = None
>>> rows = om.TsvTableModel("chrom\tstart\tend\tname\nchr1\t10\t20\tgeneA\n")
>>> typed_rows = om.Model[list[BedRow]](rows)
>>> typed_rows

Optional: convert for table tooling

>>> import omnipy as om
>>> import pydantic as pyd
>>> class BedRow(pyd.v1.BaseModel):
...     chrom: str
...     start: int
...     end: int
...     name: str | None = None
>>> rows = om.TsvTableModel("chrom\tstart\tend\tname\nchr1\t10\t20\tgeneA\n")
>>> typed_rows = om.Model[list[BedRow]](rows)
>>> om.RowWiseTableWithColNamesModel(typed_rows).to(om.PandasModel)