Import data from external sources

PostgreSQL includes a variety of ways to import data. Here, we'll show how to import a CSV file from the internet.

For this demonstration, we're going to import batter data from the Baseball Databank, which is in CSV form. While it's easy to import the data using PostgreSQL's COPY command, we'll need to first define a table to put that data into.

We're going to add a database called "baseball," which we'll populate with some Major League Baseball statistics.

create database baseball;

Now you can switch to your new (and empty) baseball database.

\c baseball

You can copy and paste this command into your terminal.

CREATE TABLE batters (
                      id SERIAL,
                      playerid VARCHAR(9),
                      yearid INTEGER,
                      stint INTEGER,
                      teamid VARCHAR(3),
                      lgid VARCHAR(2),
                      g INTEGER,
                      ab INTEGER,
                      r INTEGER,
                      h INTEGER,
                      "2b" INTEGER,
                      "3b" INTEGER,
                      hr INTEGER,
                      rbi INTEGER,
                      sb INTEGER,
                      cs INTEGER,
                      bb INTEGER,
                      so INTEGER,
                      ibb INTEGER,
                      hbp INTEGER,
                      sh INTEGER,
                      sf INTEGER,
                      gidp INTEGER,
                      PRIMARY KEY (id)
);

Now we can populate the table from the internet using the most recent data.

\COPY batters(playerid,yearid,stint,teamid,lgid,g,ab,r,h,"2b","3b",hr,rbi,sb,cs,bb,so,ibb,hbp,sh,sf,gidp) FROM PROGRAM 'curl "https://raw.githubusercontent.com/EnterpriseDB/docs/main/product_docs/docs/biganimal/release/free_trial/detail/experiment/data/Batting.csv"' DELIMITER ',' CSV HEADER

Just to prove there's data loaded, let's look at the home run leaders for the 1998 season.

SELECT playerid, yearid, teamid,
       rank() OVER (PARTITION BY yearid ORDER BY hr desc) hr_rank,
       hr
FROM batters
WHERE yearid = 1998
ORDER BY hr_rank LIMIT 5;