A Python package for reading variably structured text files at scale
Tabbed is a Python library for reading variably structured text files. It automatically deduces where the data starts and the type of each column, and it supports iterative and value-based conditional reading of data rows.
Key Features
- Structural Inference:
  A common variant of the standard text file is one that contains metadata prior to a header or data section. Tabbed can locate the metadata, header and data sections of such a file.
- Type Inference:
  Tabbed can parse int, float, complex, time, date and datetime instances at high speed via a polling strategy.
- Conditional Reading:
  Tabbed can filter rows during reading with equality, membership, rich comparison, regular expression matching and custom callables via simple keyword arguments.
- Partial and Iterative Reading:
  Tabbed reads large text files iteratively, consuming only as much memory as you choose.
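To make the conditional and iterative reading features concrete, here is a minimal standard-library sketch of the general idea. It is not Tabbed's implementation or API: the helper names `matches` and `read_chunks`, their keyword-argument convention and the small in-memory sample are all assumptions made for illustration.

```python
import csv
import io
import re
from itertools import islice


def matches(row, **conditions):
    """Return True if a row (a dict of strings) satisfies every condition.

    A condition may be a callable, a compiled regex, a collection (tested
    for membership) or any other value (tested for equality).
    """
    for column, cond in conditions.items():
        value = row[column]
        if callable(cond):
            ok = bool(cond(value))
        elif isinstance(cond, re.Pattern):
            ok = cond.search(value) is not None
        elif isinstance(cond, (list, tuple, set, frozenset)):
            ok = value in cond
        else:
            ok = value == cond
        if not ok:
            return False
    return True


def read_chunks(stream, chunksize, delimiter='\t', **conditions):
    """Lazily yield lists of at most `chunksize` rows that pass the filters."""
    rows = csv.DictReader(stream, delimiter=delimiter)
    kept = (row for row in rows if matches(row, **conditions))
    # islice pulls only `chunksize` rows at a time, so memory use is bounded.
    while chunk := list(islice(kept, chunksize)):
        yield chunk


# Hypothetical in-memory sample standing in for a large file on disk.
sample = io.StringIO(
    "Number\tChannel\tAnnotation\n"
    "0\tALL\tStarted Recording\n"
    "1\tALL\tstart\n"
    "2\tALL\texploring\n"
    "3\tALL\tgrooming\n"
    "4\tALL\texploring\n"
)
chunks = list(read_chunks(
    sample,
    chunksize=2,
    Annotation=re.compile(r'explor|groom'),  # regular expression match
    Number=lambda v: int(v) >= 3,            # custom callable
))
print(chunks)
```

Because rows are filtered inside a generator and collected with `islice`, only one chunk is held in memory at a time, which is the property the last feature describes.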
Usage
Below is a sample file with a Metadata section and Header using the tab character as the delimiter.
annotations.txt
Experiment ID Experiment
Animal ID Animal
Researcher Test
Directory path
Number Start Time End Time Time From Start Channel Annotation
0 02/09/22 09:17:38.948 02/09/22 09:17:38.948 0.0000 ALL Started Recording
1 02/09/22 09:37:00.000 02/09/22 09:37:00.000 1161.0520 ALL start
2 02/09/22 09:37:00.000 02/09/22 09:37:08.784 1161.0520 ALL exploring
3 02/09/22 09:37:08.784 02/09/22 09:37:13.897 1169.8360 ALL grooming
4 02/09/22 09:37:13.897 02/09/22 09:38:01.262 1174.9490 ALL exploring
5 02/09/22 09:38:01.262 02/09/22 09:38:07.909 1222.3140 ALL grooming
6 02/09/22 09:38:07.909 02/09/22 09:38:20.258 1228.9610 ALL exploring
7 02/09/22 09:38:20.258 02/09/22 09:38:25.435 1241.3100 ALL grooming
8 02/09/22 09:38:25.435 02/09/22 09:40:07.055 1246.4870 ALL exploring
9 02/09/22 09:40:07.055 02/09/22 09:40:22.334 1348.1070 ALL grooming
10 02/09/22 09:40:22.334 02/09/22 09:41:36.664 1363.3860 ALL exploring
Dialect and Type Inference
Tabbed can detect the dialect via clevercsv and infer the data types.
from tabbed.samples import paths
from tabbed.reading import Reader
infile = open(paths.annotations, 'r')
reader = Reader(infile)
dialect = reader.sniffer.dialect
types, _ = reader.sniffer.types(poll=10)
print(dialect) # a clevercsv SimpleDialect
print('---')
print(types)
Output
SimpleDialect('\t', '"', None)
---
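[<class 'int'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'float'>, <class 'str'>, <class 'str'>]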
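The idea of inferring types by polling can be sketched with the standard library alone. The snippet below is an illustration under stated assumptions, not Tabbed's algorithm: the helpers `convert` and `poll_types` are hypothetical, the datetime format is hard-coded to match the sample file, and each column's type is taken to be the most common type among the polled rows.

```python
from collections import Counter
from datetime import datetime


def convert(text):
    """Convert a string to int, float, complex or datetime, else return it."""
    for caster in (int, float, complex):
        try:
            return caster(text)
        except ValueError:
            pass
    # Assumed format, chosen to match the timestamps in the sample file.
    try:
        return datetime.strptime(text, '%m/%d/%y %H:%M:%S.%f')
    except ValueError:
        return text


def poll_types(rows, poll=10):
    """Infer one type per column from the first `poll` rows by majority vote."""
    polled = rows[:poll]
    columns = zip(*polled)  # transpose rows into columns
    return [
        Counter(type(convert(cell)) for cell in column).most_common(1)[0][0]
        for column in columns
    ]


# A few pre-split rows taken from the sample file (End Time column omitted).
rows = [
    ['0', '02/09/22 09:17:38.948', '0.0000', 'ALL', 'Started Recording'],
    ['1', '02/09/22 09:37:00.000', '1161.0520', 'ALL', 'start'],
    ['2', '02/09/22 09:37:00.000', '1161.0520', 'ALL', 'exploring'],
]
types = poll_types(rows, poll=3)
print(types)
```

Polling only a handful of rows keeps inference cheap on large files, at the cost of possibly mislabeling a column whose early rows are unrepresentative.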
Metadata and Header detection
Tabbed can automatically locate the metadata, header and data rows.
print(reader.header)
print('---')
print(reader.metadata())
Output
Header(line=6,
names=['Number', 'Start_Time', 'End_Time', 'Time_From_Start', 'Channel', 'Annotation'],
string='Number\tStart Time\tEnd Time\tTime From Start\tChannel\tAnnotation')
---
MetaData(lines=(0, 6),
string='Experiment ID\tExperiment\nAnimal ID\tAnimal\nResearcher\tTest\nDirectory path\t\n\n')
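One simple way to think about locating these sections is sketched below using only the standard library. This is a deliberately naive heuristic and not how Tabbed detects structure: the function `locate_sections` is hypothetical, it assumes a header is always present, and it relies on data rows sharing a single field count. The input is a shortened variant of the sample file, so the line indices differ from the output above.

```python
def locate_sections(lines, delimiter='\t'):
    """Return ((metadata_start, metadata_stop), header_index) for text lines.

    Heuristic: data rows share one field count, so walk backward from the
    last line while the field count stays constant. The top of that run is
    taken to be the header, and every line before it is metadata.
    """
    counts = [len(line.rstrip('\n').split(delimiter)) for line in lines]
    header = len(lines) - 1
    while header > 0 and counts[header - 1] == counts[-1]:
        header -= 1
    return (0, header), header


# Shortened variant of the sample file with fewer columns and rows.
lines = [
    'Experiment ID\tExperiment\n',
    'Animal ID\tAnimal\n',
    'Researcher\tTest\n',
    'Directory path\t\n',
    '\n',
    'Number\tStart Time\tChannel\tAnnotation\n',
    '0\t02/09/22 09:17:38.948\tALL\tStarted Recording\n',
    '1\t02/09/22 09:37:00.000\tALL\tstart\n',
]
metadata, header = locate_sections(lines)
print(metadata, header)
```

A field-count heuristic like this breaks down when metadata lines happen to have as many fields as the data, which is one reason a real detector would also need to compare the types found in each row.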