swpt_lib.scan_table

class swpt_lib.scan_table.TableScanner

    A table-scanner super-class. Sub-classes may override class attributes.

    Example:

        from swpt_lib.scan_table import TableScanner
        from mymodels import Customer

        class CustomerScanner(TableScanner):
            table = Customer.__table__
            columns = [Customer.id, Customer.last_order_date]

            def process_rows(self, rows):
                for row in rows:
                    print(row['id'], row['last_order_date'])
    blocks_per_query = 40

        The number of database pages (blocks) to be retrieved per query. It
        might be a good idea to increase this number when table rows are
        large.
    columns = None

        An optional list of sqlalchemy.sql.expression.ColumnElement
        instances to be retrieved for each row. Most of the time it will be
        a list of Column instances. Defaults to all columns.
    process_rows(rows: list) → None

        Process a list of rows.

        Must be defined in the subclass.

        Parameters: rows – A list of table rows.
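To show what a process_rows hook might do with a batch, here is a minimal standalone sketch. Plain dicts stand in for SQLAlchemy row objects (which support the same mapping-style access); the field names and the "collect ids of stale rows" logic are illustrative assumptions, and unlike the real hook this version returns the collected ids so the result is easy to inspect:

```python
def process_rows(rows):
    # Illustrative batch handler: collect the ids of rows that have no
    # recorded last_order_date (a hypothetical criterion).
    return [row['id'] for row in rows if row['last_order_date'] is None]

batch = [
    {'id': 1, 'last_order_date': '2020-01-15'},
    {'id': 2, 'last_order_date': None},
    {'id': 3, 'last_order_date': '2019-11-02'},
]
stale_ids = process_rows(batch)  # → [2]
```

In the real subclass the hook would typically issue an UPDATE or emit messages for the collected rows rather than return them.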
    run(engine: sqlalchemy.engine.interfaces.Connectable, completion_goal: datetime.timedelta, quit_early: bool = False)

        Scan table continuously.

        The table is scanned sequentially, starting from a random row.
        During the scan, process_rows() will be continuously invoked with a
        list of rows. When the end of the table is reached, the scan
        continues from the beginning, ad infinitum.

        Parameters:
            - engine – SQLAlchemy engine.
            - completion_goal – The time interval in which the whole table
              should be processed. This is merely an approximate goal; in
              reality, scans can take any amount of time.
            - quit_early – Exit after some time. This is mainly useful
              during testing.
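To get a feel for how completion_goal relates to target_beat_duration, here is a back-of-the-envelope calculation. The pacing model (one full pass split evenly into beats of the target duration) and the row count are assumptions for illustration only:

```python
from datetime import timedelta

completion_goal = timedelta(days=1)   # finish one full pass in ~24 hours
target_beat_duration = 25             # milliseconds per beat (the default)

# Number of beats available for one full pass of the table.
beats_per_pass = completion_goal.total_seconds() * 1000 / target_beat_duration

# With, say, 10 million rows, each beat must handle roughly this many rows:
rows_in_table = 10_000_000
rows_per_beat = rows_in_table / beats_per_pass
```

With these numbers there are 3,456,000 beats per pass, so each beat would process only a handful of rows; a shorter completion_goal or a larger table raises the per-beat workload accordingly.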
    table = None

        The sqlalchemy.schema.Table that will be scanned (Model.__table__
        if a declarative base is used).

        Must be defined in the subclass.
    target_beat_duration = 25

        The scanning of the table is done in a sequence of “beats”. This
        attribute determines the ideal duration, in milliseconds, of those
        beats. The value should be big enough that, on average, all the
        operations performed on the table’s rows can be completed within
        this interval. Setting this value too high may cause too many rows
        to be processed in a single beat.