
Writing Python and GAP scripts for data extraction and cleanup from sources

Open • Overdue by 11 year(s) • Due by February 12, 2015 • Last updated Dec 24, 2014

The data has two sources: (1) the HTML tables, which contain data about maximal-by-size subgroup TPP triples for almost all groups of order up to 1000, with some important exceptions, and (2) the small groups of order up to 1000 contained in the GAP small groups library (SGL). The Python scripts will extract data from the HTML tables, but the extracted data needs to be partly cleaned up by GAP scripts before it is written to CSV files. This is because the components of a subgroup TPP triple are currently stored in the HTML tables as GAP statements, not in a serialised format suitable for storage in a database. Together, these scripts will produce CSV files that store the data about groups and subgroup TPP triples in a form consistent with the database schema.
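As a rough illustration of the extraction step only, the sketch below reads the rows of an HTML table with Python's standard `html.parser` module and serialises them with the `csv` module. The names `TableRowParser`, `html_table_to_csv` and `SAMPLE_HTML`, as well as the column names and the sample row, are hypothetical and are not taken from the actual tables or the database schema. The GAP cleanup step is not shown; here the GAP statement is passed through unchanged as a quoted CSV field.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical input: the column names and the row are made up for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>order</th><th>sgl_id</th><th>triple_component</th></tr>
  <tr><td>6</td><td>1</td><td>Group([(1,2)])</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML), end="")
```

Because the GAP statement contains a comma, the `csv` writer quotes that field automatically, so the raw statement survives intact until a GAP script replaces it with a serialised form.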

0% complete

    There are no open issues in this milestone