Open lab2_data.csv
file and understand its contents. The file consists
of 11 columns separated by tab and each row is a customer with all of the products they have ordered. The columns are:
Name
column is made up of the customer FirstName
and LastName
separated by a spaceAddress
column has the customer addressCity
column has the customer cityCountry
column has the customer countryRegion
column has the customer regionAll the *
columns are linked, meaning if you split the columns on a semicolon and zip them up, you will get one product name, product category, product category description, product unit price, quantity ordered, and ordered date. Remember, each row/line in the data represents a customer and all their product orders. The product orders are separated by a semicolon.
In this lab you will parse the data file and create a normalized database with six tables. To make this lab manageable, I will lay down the steps you need to create the following tables:
Region
[RegionID] Integer not null primary key
[Region] Text not null
Country
[CountryID] integer not null Primary key
[Country] Text not null
[RegionID] integer not null foreign key to Region table
Customer
[CustomerID] integer not null Primary Key
[FirstName] Text not null
[LastName] Text not null
[Address] Text not null
[City] Text not null
[CountryID] integer not null foreign key to Country table
ProductCateogry
[ProductCategoryID] integer not null Primary Key
[ProductCategory] Text not null
[ProductCategoryDescription] Text not null
Product
[ProductID] integer not null Primary key
[ProductName] Text not null
[ProductUnitPrice] Real not null
[ProductCategoryID] integer not null foreign key to ProductCateogry table
OrderDetail
[OrderID] integer not null Primary Key
[CustomerID] integer not null foreign key to Customer table
[ProductID] integer not null foreign key to Product table
[OrderDate] integer not null
[QuantityOrdered] integer not null
Note: The create_table
function has been updated. Please study it and try to understand how it is different. You can now use it to drop a table before inserting into it. This is useful because you do not have to start from scratch. You can drop an individual table before recreating it if it exists.
Also checkout out this link to understand how executemany
works: https://www.kite.com/python/docs/sqlite3.Cursor.executemany and incorporate it into your insert_table functions. If you use execute
, your insertions will be very slow. If you have only one value to insert, the values tuple will look like
values = (('Graduate', ), ('Undergraduate',))
Always, always use the with
context to insert.
executemany
, I will take points offThe purpose of this step is to create a dictionary to look up the primary key lookup for a given region.
Tasks
Note: You will be using region_to_regionid_dict
in this step when inserting into the country table. You can access region_to_regionid_dict
by calling the function in step2 inside step3.
The purpose of this step is to create a dictionary to look up the primary key lookup for a given country.
Tasks
Note: You will be country_to_countryid_dict
in this step when inserting into the customer table. You can access country_to_countryid_dict
by calling the function in step4 inside step5.
Tasks
The purpose of this step is to create a dictionary to look up the primary key lookup for a given name (FirstName LastName)
Tasks
The purpose of this step is to create a dictionary to look up the primary key lookup for a given product category
Tasks
Note: You will be using productcategory_to_productcategoryid_dict
in this step when inserting into the product table.
The purpose of this step is to create a dictionary to look up the primary key lookup for a given product
Tasks
Note: You will be using product_to_productid_dict
and customer_to_customerid_dict
in this step when inserting into the order detail table.
import datetime.datetime
datetime.datetime.strptime(input_date, '%Y%m%d').strftime('%Y-%m-%d')