Introducing db.py

by Greg Lamp

Learn More

Data analysis libraries for Python keep getting better and better. pandas is now on 0.15, scikit-learn continues to pick up converts, and a number of visualization libraries have started to emerge: for example, our own baby, ggplot, and others like seaborn. But there's still one subject that's practically synonymous with data analysis where there aren't any new, killer libraries: databases.

There are some great Python database libraries (SQLAlchemy instantly comes to mind), but none are focused on using Python as a means of exploring and analyzing data.

Introducing db.py

So today we're introducing db.py, a library built for doing exactly that! db.py does handy stuff like:

  • Manages/remembers your db connection strings
  • Lets you easily search through your schema
  • Allows for tab-completion for table and column names
  • Integrates with IPython, pandas, and is built for interactive programming
  • Works with most relational databases

To get started, just run:

$ pip install db.py

Connecting to your Database

First things first: you need to connect to your database. This can be much tricker than it should be. Dealing with connection strings is never fun--how many times have you forgotten your password (or even your username) for a database. db.py makes it as straight forward as possible and will always revert to "reasonable defaults". For example, the default MySQL port is 3306. If you forget to specify a port, db.py will use 3306 as a default.

>>> from db import DB
>>> db = DB(username="kermit", password="rainbowconnection", hostname="db.themuppets.com",
            dbname="muppetdb", dbtype="mysql")
Refreshing schema. Please wait...done!
>>> db
DB[mysql][db.themuppets.com]:3306 > kermit@muppetdb

db.py will also save your credentials so you don't have to paste them into every script. Now you'll never have to remember any of that connection stuff again.

>>> db.save_credentials(profile="muppets")

Open a new sesion, load your profile, and voilà!

>>> from db import DB
>>> db = DB(profile="muppets")
Refreshing schema. Please wait...done!
>>> db
DB[mysql][db.themuppets.com]:3306 > kermit@muppetdb

And by database we mean dotfile.

Exploring Tables and Columns

Ever forget the name of a column? Or even forget the names of what table you're trying to query? Me too. db.py comes with a set of functions for navigating your database's schema.

NOTE: For this post we're going to use the Chinook Database. It ships with db.py and is commonly used as an example database.

>>> from db import DemoDB # this is the Chinook DB and is used in the db.py docs/examples
>>> db = DemoDB()
Indexing schema. This will take a second...finished!
>>> db.find_table("A*")
+--------+--------------------------+
| Table  | Columns                  |
+--------+--------------------------+
| Album  | AlbumId, Title, ArtistId |
| Artist | ArtistId, Name           |
+--------+--------------------------+

Ever forget how to join your company's prod_users, prod_trans_monthly, and prod_accounts tables together? (a better question might be: have you ever remembered?)

No worries, db.py has got you covered!

>>> db.tables.Artist
+----------------------------------------------------------+
|                          Artist                          |
+----------+---------------+--------------+----------------+
| Column   | Type          | Foreign Keys | Reference Keys |
+----------+---------------+--------------+----------------+
| ArtistId | INTEGER       |              | Album.ArtistId |
| Name     | NVARCHAR(120) |              |                |
+----------+---------------+--------------+----------------+
>>> db.find_column("*Id")
+---------------+---------------+---------+
| Table         |  Column Name  | Type    |
+---------------+---------------+---------+
| Album         |    AlbumId    | INTEGER |
| Album         |    ArtistId   | INTEGER |
| Artist        |    ArtistId   | INTEGER |
| Customer      |  SupportRepId | INTEGER |
| Customer      |   CustomerId  | INTEGER |
| Employee      |   EmployeeId  | INTEGER |
| Genre         |    GenreId    | INTEGER |
| Invoice       |   InvoiceId   | INTEGER |
| Invoice       |   CustomerId  | INTEGER |
| InvoiceLine   |   InvoiceId   | INTEGER |
| InvoiceLine   |    TrackId    | INTEGER |
| InvoiceLine   | InvoiceLineId | INTEGER |
| MediaType     |  MediaTypeId  | INTEGER |
| Playlist      |   PlaylistId  | INTEGER |
| PlaylistTrack |    TrackId    | INTEGER |
| PlaylistTrack |   PlaylistId  | INTEGER |
| Track         |  MediaTypeId  | INTEGER |
| Track         |    TrackId    | INTEGER |
| Track         |    AlbumId    | INTEGER |
| Track         |    GenreId    | INTEGER |
+---------------+---------------+---------+

Tab Completion

db.py also identifies and loads schema info about all of the tables in your database and makes them attributes of the DB(). It works with IPython's type-ahead functionality, so when you're mucking around in IPython Notebook or in Terminal, you'll be able to see all of the tables and columns in your database.

You can also take a look at all of your tables by accessing the tables variable.

>>> db.tables
+---------------+----------------------------------------------------------------------------------+
| Table         | Columns                                                                          |
+---------------+----------------------------------------------------------------------------------+
| Album         | AlbumId, Title, ArtistId                                                         |
| Artist        | ArtistId, Name                                                                   |
| Customer      | CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalC |
|               | ode, Phone, Fax, Email, SupportRepId                                             |
| Employee      | EmployeeId, LastName, FirstName, Title, ReportsTo, BirthDate, HireDate, Address, |
|               |  City, State, Country, PostalCode, Phone, Fax, Email                             |
| Genre         | GenreId, Name                                                                    |
| Invoice       | InvoiceId, CustomerId, InvoiceDate, BillingAddress, BillingCity, BillingState, B |
|               | illingCountry, BillingPostalCode, Total                                          |
| InvoiceLine   | InvoiceLineId, InvoiceId, TrackId, UnitPrice, Quantity                           |
| MediaType     | MediaTypeId, Name                                                                |
| Playlist      | PlaylistId, Name                                                                 |
| PlaylistTrack | PlaylistId, TrackId                                                              |
| Track         | TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, Uni |
|               | tPrice                                                                           |
+---------------+----------------------------------------------------------------------------------+

IPython and pandas

db.py is built with popular Scientific Python tools in mind--namely pandas and IPython. We've tried to make it as friendly as possible for those of you doing interactive computing: this means friendly repr's!

>>> db.tables.Artist

Column Type Foreign Keys Reference Keys
ArtistId INTEGER Album.ArtistId
Name NVARCHAR(120)

>>> db.tables.Track

Column Type Foreign Keys Reference Keys
TrackId INTEGER InvoiceLine.TrackId, PlaylistTrack.TrackId
Name NVARCHAR(200)
AlbumId INTEGER Album.AlbumId
MediaTypeId INTEGER MediaType.MediaTypeId
GenreId INTEGER Genre.GenreId
Composer NVARCHAR(220)
Milliseconds INTEGER
Bytes INTEGER
UnitPrice NUMERIC(10,2)

>>> df = db.tables.Artist.all()
>>> df.head()

ArtistId Name
0 1 AC/DC
1 2 Accept
2 3 Aerosmith
3 4 Alanis Morissette
4 5 Alice In Chains

Querying data

Executing queries in libraries like psycopg2 can be frustrating. Managing connections, cursors, and rollbacks isn't exactly intuitive. db.py operates on a much higher level, and makes executing queries simple.

>>> db.query("select * from Artist limit 10;")
   ArtistId                  Name
0         1                 AC/DC
1         2                Accept
2         3             Aerosmith
3         4     Alanis Morissette
4         5       Alice In Chains
5         6  Antônio Carlos Jobim
6         7          Apocalyptica
7         8            Audioslave
8         9              BackBeat
9        10          Billy Cobham
>>> with open("myscript.sql", "w") as f:
...     f.write("select * from Artist;")
>>> df = db.query_from_file("myscript.sql")

Managing Credentials

You know those really obscure database hostnames and passwords that your sys admin sets up? Well instead of copying and pasting them into all of your scripts, db.py lets you save credentials so you can use them later.

>>> db = DB(username="kermit", password="rainbowconnection", hostname="dw.themuppets.com", dbtype="mysql")
>>> db.save_credentials(profile="muppetdw")
# in a new session
>>> db = DB(profile="muppetdw") # voila!

Hope you like it

We're still working on support for MS SQL. If you have ideas for features or are interested in getting the latest updates, check out the db.py github page.

Special thanks to Eli Bressert from StichFix for giving us the idea for db.py!

::...
免责声明:
当前网页内容, 由 大妈 ZoomQuiet 使用工具: ScrapBook :: Firefox Extension 人工从互联网中收集并分享;
内容版权归原作者所有;
本人对内容的有效性/合法性不承担任何强制性责任.
若有不妥, 欢迎评注提醒:

或是邮件反馈可也:
askdama[AT]googlegroups.com


订阅 substack 体验古早写作:


点击注册~> 获得 100$ 体验券: DigitalOcean Referral Badge

关注公众号, 持续获得相关各种嗯哼:
zoomquiet


自怼圈/年度番新

DU22.4
关于 ~ DebugUself with DAMA ;-)
粤ICP备18025058号-1
公安备案号: 44049002000656 ...::