Title

Lab - Working with Python and SQLite

Objectives

  • Part 1: Create the Database
  • Part 2: Enter Data and Query the Database
  • Part 3: Automate the Process with Python </b>

Scenario/Background

In this lab you will learn how to connect to a relational database, place queries, and retrieve the data through a Jupyter notebook. You will also learn how to transfer the data from a Microsoft Excel spreadsheet to a SQLite database.

The data you are provided contains various measurements for ping, upload, and download speed made in different Local Authorities around England. At the moment, the data is stored in a .csv file, but you will create a database and populate it with the measurement.

Required Resources

  • 1 PC with Internet access
  • Raspberry Pi version 2 or higher
  • Python libraries: sqlite3 and csvkit

SQL refresh

Part 1: Create the Database

We are going to use SQLite to connect to the database. SQLite is a library that implements a self-contained transactional SQL database engine that eliminates the need for an SQL server. A simple SQLite utility called sqlite3 can be installed in the Raspberry Pi. SQLite greatly simplifies the SQL database process. We are also going to use csvkit, a suite of tools designed to convert various databases to the .csv (comma separated values) format.

Step 1: Set up the working environment.

We will need sqlite3 and csvkit to complete this lab. The commands below can be entered and executed directly on a Raspberry Pi terminal. You can get access to the terminal by clicking New > Terminal in the Home Tab in your web browser. The Home Tab is the tab that contains the list of labs you used to launch this lab.

Note: It is recommended to place the terminal tab side-by-side with this tab. This allows you to easily switch from the lab to the terminal window.

a) Install SQLite.

Below, the first command updates and synchronizes the Pi's package list with the repository server. This is to ensure the latest versions of packages are used. After the package list is up to date, the second command installs sqlite3, an SQLite package for the Pi that contains the sqlite3 utility.

Note: Since the Raspberry Pi has to contact the package respository servers to download and install the packages, these commands may take a while to finish executing.

Run the cell by either clicking the play icon to the left of the code cell or by selecting the cell pressing <SHIFT+ENTER>. Running a code cell executes the contained commands in your Raspberry Pi.

Note: In a Jupyter notebook, lines starting with an exclamation mark (!) are commands that will be executed by the Raspberry Pi's terminal.

# apt-get update
# apt-get -y install sqlite3

After the installation process finishes, install csvkit by running the code cell below:
Note: The installation may take a while.

# pip install csvkit

Now your Raspberry Pi has sqlite3 and csvkit installed and ready to be used.

Step 2: Work with SQLite.

Now that sqlite3 and csvkit are installed, we can start using them to create a database and its tables. A table is a set of information presented in a matrix of rows and columns. A database is a collection of tables. Consider the table below:

</table>

The table above contains information about eight individuals, presented in a row and column fashion.

Note: To ensure consistency in the file locations, consider working from the /home/pi/notebooks/myfiles directory. To change to that directory, use the cd command, shown below:

# cd /home/pi/notebooks/myfiles

Before we can start working with a database, use sqlite3 to create a new SQLite database named phonebook.db. To create the new database, execute the command below in the terminal window:

# sqlite3 phonebook.db
SQLite version 3.8.7.1 2016-02-06 13:59:56 Enter ".help" for usage hints. sqlite>

The sqlite3 utility displays some version information and presents its propmpt, sqlite> . This prompt indicates that the database phonebook.db was successfully created and sqlite3 is ready to execute SQL commands against it.

Create a table in the database. From the sqlite3 prompt, issue the command below to create a table named coworkers. The coworkers table has 3 columns (or fields): workid, name and title. Notice that the command also specifies the type and amount of data each field can accomodate. The workid field supports integer values. The name field supports 20 varchar characters. The title field accepts 20 varchar characters. The number field is also an integer.

sqlite> create table coworkers(workid integer,name varchar(20),title varchar(20),number integer);

The same database can contain multiple tables. From the sqlite3 prompt, use the first command below to create a second table in the phonebook.db database. The command creates a table named departments. Created similarly to the coworkers table, the departments table contains three fields: deptid, name and number.

The second command leaves the sqlite3 prompt and drops back to the Terminal.

sqlite> create table department(deptid integer,name varchar(20),number integer);
sqlite> .quit

Notice the dot "." before the quit command in the cell above. Usually, sqlite3 passes lines entered by the user on to the SQLite library for execution. However, input lines that begin with a dot are intercepted and interpreted by the sqlite3 utility itself. Also known as dot commands, these commands are often used to change the output format of queries, or to execute certain prepackaged query statements.

SQLite databases are useful because the database and all its tables are contained in one single .db file and stored on disk. From the Pi's terminal, use the ls command to list the file containing the phonebook.db database and its tables. Use the -l switch to show the long listing format to display permissions, ownership, size, and date and time of the file.

# ls -l phonebook.db
-rw-r--r-- 1 root root 3072 Feb 7 01:29 phonebook.db

Part 2: Enter Data and Query the Database

Step 1: Enter data into the database.

Enter the sqlite3 prompt again, while selecting the phonebook.db database.

# sqlite3 phonebook.db

SQLite version 3.8.7.1 2016-02-06 13:59:56 Enter ".help" for usage hints. sqlite>

Use the command below to add one entry to the coworkers table:

sqlite> insert into coworkers values(101,'Han Solo','Bounty Hunter',5556667578);

The command above inserts a row of four values in the coworkers table as follows:

  • 101 in the workid column
  • Han Solo in the name column
  • Bounty Hunter in the title column
  • 5556667578 in the number column
  • While the command above is fairly simple, it doesn't scale well. Another option is to import a .csv file directly into the database. The table structure is created based on the structure of the .csv file.

    The steps below create the .csv file to be imported into the database. In a production environemnt, .csv files will likely already exist.

    a) Select and copy the text shown below to the clipboard. Once the text is selected, right-click the selected text and choose Copy to copy the text to the clipboard.

    workid,name,title,number
    101,Han Solo,Bounty Hunter,5556667578
    102,Leia Skywalker,Princess,5556542398
    103,Luke Skywalker,Jedi,5558963267
    104,Obi-Wan Kenobi,Jedi,5558963276
    105,Anakin Skywalker,Sith Lord,5553477621
    106,Jabba The Hutt,Gangster,5556613456
    107,Greedo,Debt Collector,5552360918
    108,R2D2,Astro Droid,5555210125
    109,C3PO,Protocol Droid,5556633345

    b) From the Pi's terminal, start nano. Nano is a command line text editor and is very easy to use. Use the command below to start nano and create a new text file called coworkers.csv

    # nano coworkers.csv

    c) Paste the copied text into the nano window. Make sure to remove any empty lines from the file.

    d) When the text has been pasted to the coworkers.csv file and the empty lines have been cleared out, press CONTROL+X to exit nano.

    e) Press 'Y' (Yes) when Nano asks if you want to save the file.

    f) To check the contents of your newly created CSV file, use the cat command, as shown below:

    # cat coworkers.csv

    Alternatively, you can create the same CSV file automatically through this Jupyter notebook. To create the CSV file automatically using the Jupyter notebook, run the code cell below:

    In [ ]:
    # Code Cell 1
    !test -e coworkers.csv && rm coworkers.csv
    !echo "workid,name,title,number" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "101,Han Solo,Bounty Hunter,5556667578" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "102,Leia Skywalker,Princess,5556542398" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "103,Luke Skywalker,Jedi,5558963267" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "104,Obi-Wan Kenobi,Jedi,5558963276" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "105,Anakin Skywalker,Sith Lord,5553477621" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "106,Jabba The Hutt,Gangster,5556613456" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "107,Greedo,Debt Collector,5552360918" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "108,R2D2,Astro Droid,5555210125" >> /home/pi/notebooks/myfiles/coworkers.csv
    !echo "109,C3PO,Protocol Droid,5556633345" >> /home/pi/notebooks/myfiles/coworkers.csv
    

    Once again, use the cat command to verify the file creation and its contents.

    # cat coworkers.csv

    Step 2: Import the data into the database.

    First, delete the coworkers table from the phonebook database. Because the table structure will be defined by the structure of the .csv file and we only have one entry in the table, it's easier to delete the table altogether.

    From the terminal, enter the sqlite3 prompt while selecting the phonebook.db database:
    # sqlite3 phonebook.db SQLite version 3.8.7.1 2016-02-06 13:59:56 Enter ".help" for usage hints. sqlite>

    Use the .tables command to list all tables currently in the database:
    sqlite>.tables
    coworkers department

    Two tables, coworkers and department are currently present in the database.

    Use the drop SQL command to delete the coworkers table:
    sqlite> drop table coworkers;

    Issue the .tables command again to list all tables currently present in the database:
    sqlite>.tables
    department

    Notice that the coworkers table has been removed from the database.
    Leave the sqlite3 prompt and return to the terminal:
    sqlite>.quit

    From the terminal, use the csvsql utility to create a new table in the phonebook database and retrieve the data contained in the coworkers.csv file. csvsql is part of the csvkit package, installed earlier in this lab.

    # csvsql --db sqlite:////home/pi/notebooks/myfiles/phonebook.db --insert /home/pi/notebooks/myfiles/coworkers.csv

    Enter the sqlite3 prompt while selecting the phonebook database and issue the .tables command to list all tables in the database:
    # sqlite3 phonebook.db SQLite version 3.8.7.1 2016-02-06 13:59:56 Enter ".help" for usage hints. sqlite>
    sqlite>.tables
    coworkers department

    The coworkers table is once again in the database.

    Step 3: Query the database.

    From the terminal, enter the sqlite3 prompt while selecting the phonebook.db database:
    # sqlite3 phonebook.db SQLite version 3.8.7.1 2016-02-06 13:59:56 Enter ".help" for usage hints. sqlite>

    List all the contents of the coworkers table:

    sqlite> select * from coworkers; 101|Han Solo|Bounty Hunter|5556667578 102|Leia Skywalker|Princess|5556542398 103|Luke Skywalker|Jedi|5558963267 104|Obi-Wan Kenobi|Jedi|5558963276 105|Anakin Skywalker|Sith Lord|5553477621 106|Jabba The Hutt|Gangster|5556613456 107|Greedo|Debt Collector|5552360918 108|R2D2|Astro Droid|5555210125 109|C3PO|Protocol Droid|5556633345 sqlite>

    In the select SQL command show above, the star character (*) selects all fields in the coworkers table. To display names of the entries with the Jedi job title, use the following command:

    sqlite> select name from coworkers where title='Jedi'; Luke Skywalker Obi-Wan Kenobi sqlite>

    Can you build a query to display the names of the bounty hunters in the coworkers table?

    Part 3: Automate the Process with Python

    Use Python to connect to the database and run some queries to see what this data looks like.

    Step 1: Import the modules.

    For this step, we will need the following modules:

    sqlite3
    pandas
    pyplot

    Below are the lines of code to import the required Python modules:

    # this is the library to talk to the database
    import sqlite3
    import pandas as pd
    # this is a library for data visualization
    from matplotlib import pyplot as plt
    # this is to say that we want our plots to be part of this page, and not opened in a new window
    %matplotlib inline

    Step 2: Connect to the database.

    Before queries can be placed to the database through Python, a database connection must be established. The Python sqlite3 object has a connect() method that facilitates the connection to the database.

    Note: While the Python object and the command-line utility used so far in this lab share the same name, sqlite3, their scope of usage is different. The latter must be called from Python while the former requires a Linux terminal to execute.

    To use the sqlite3 Python object to open a connection, call the method connect of the object sqlite3. This method takes a string containing the name of the database for which the connection is needed.

    In this case, the command will be:

    sqlite3.connect('/home/pi/notebooks/myfiles/phonebook.db')

    This function returns the connection, which will be stored in a variable called conn. Storing the details of the connection in a variable makes it easy to refer back to that same connection when needed. The line of code containing the variable assigment and the connection establishement is as follows:

    conn = sqlite3.connect('/home/pi/notebooks/myfiles/phonebook.db')

    Now that the connection to the database is established and its details are stored and accessible via connvariable, we need a way to navigate through the database. In this case, we will use another method of the sqlite3 object called cursor(). Acticng as a moveable pointer inside the database, a cursor is a control structure that will enable us to navigate through tables and records. The cursor is important because it specifies what cell (table, column, and row) is to be read from or written to. To create the cursor, call the method cursor() in an established database connection. Since we stored the details of the connection to the phonebook database in the conn variable, use that variable to create a cursor, as follows:

    conn.cursor()

    Use another variable to store the details of the newly created cursor object. The line of code below stores the details of the cursor in the cur variable:

    cur = conn.cursor()

    Below is the full Python program so far:

    In [ ]:
    # Code Cell 2
    # this is the library to talk to the database
    import sqlite3
    import pandas as pd
    # this is a library for data visualisation
    from matplotlib import pyplot as plt
    # this is to say that we want our plots to be part of this page, and not opened in a new window
    %matplotlib inline
    
    # establishing the connection to the database and storing the details in conn...
    conn = sqlite3.connect('/home/pi/notebooks/myfiles/phonebook.db')
    
    # creating a cursor and storing its details in cur...
    cur = conn.cursor()
    
    Step 3: Execute a query.

    To execute queries against the database, use the execute() method. Written as a method of the cursor object, this method takes, as input, a string containing the query we are interested in. To make the program a little cleaner, we will store the query in a variable named query. The execute method call is as follows:

    query = "SELECT name FROM coworkers;"
    cur.execute(query)

    Run the cell below to place a query and store the results in cur.

    In [ ]:
    # Code Cell 3
    query = "SELECT * FROM coworkers;"
    cur.execute(query)
    

    The results of the query are stored in the cur object. To visualize the results, we must iterate through the cur object, printing out each row.

    Run the cell below to iterate through cur and print its contents:

    In [ ]:
    # Code Cell 4
    for row in cur:
        print(row)
    
    Challenge

    1) Provide the code to query the names of all the princesses.

    2) Provide the code to query the names of all the princesses and the debt collectors.

    3) Provide the code to query the names and numbers of all the Jedi?

    4) Provide the code to query the names of the droids?

    © 2017 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.

    IDNameGenderRaceForce Sensitive
    001HanMaleHumanNo
    002LeiaFemaleHumanYes
    003LukeMaleHumanYes
    004Obi-WanMaleHumanYes
    005AnakinMaleHumanYes
    006JabbaMaleHuttNo
    007GreedoMaleRodianNo
    008R2D2N/ADroidNo
    009C3PON/ADroidNo