This project highlights the Met’s collection of baseball cards and aims to spark a conversation about the overlap between sports and art, and between history and statistics. A set of numbers represents a person’s life and career in one way; a photo of that person represents them in another. Data kept in online tables for sports fans paints the history of America’s National Pastime in a certain light, and little pulp-paper cards from packs of cigarettes, kept in the vaults of one of our greatest art museums, cast a different light altogether. This project brings those two things together and asks the viewer to consider both the baseball player and the baseball card as an object from multiple perspectives.
Using Python and the Beautiful Soup package, I scraped a dictionary of the 310 Hall of Fame (HOF) members and, where applicable, the links to their career playing statistics from the baseball-reference website (BBR), a freely accessible source for baseball statistics and information dating back to the creation of the game.
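As a rough illustration of this step, the sketch below builds a name-to-link dictionary from a small inline HTML sample instead of the live site. The table id `hof`, the sample markup, and the function name `build_hof_dict` are assumptions made for the example; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Invented stand-in for the HOF index page; the real markup may differ.
SAMPLE_HTML = """
<table id="hof">
  <tbody>
    <tr><td><a href="/players/r/ruthba01.shtml">Babe Ruth</a></td></tr>
    <tr><td><a href="/players/c/cobbty01.shtml">Ty Cobb</a></td></tr>
    <tr><td>Effa Manley</td></tr>
  </tbody>
</table>
"""

def build_hof_dict(html, base_url="https://www.baseball-reference.com"):
    """Map each member's name to a player-stats URL, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    members = {}
    table = soup.find("table", id="hof")  # assumed table id
    if table is None:
        return members
    for row in table.find_all("tr"):
        cell = row.find("td")
        if cell is None:
            continue
        name = cell.get_text(strip=True)
        link = cell.find("a")
        href = link.get("href") if link else None
        # Only keep links that point at a player page ("if applicable").
        if href and href.startswith("/players/"):
            members[name] = base_url + href
        else:
            members[name] = None
    return members

hof = build_hof_dict(SAMPLE_HTML)
```

Members without a playing career keep a `None` link, so later steps can skip the stats lookup for them without dropping them from the list.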
The item names and some basic metadata (date, description) from the Metropolitan Museum of Art’s collection of baseball cards (6,680 items with Object Type: baseball cards and an associated image in the online collection) were also scraped into a Python dictionary. These two dictionaries were then compared to locate the items held by the Met that are associated with HOF baseball figures.
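One way to sketch the comparison is to normalize both the HOF names and the Met item titles, then look for each name as a whole phrase inside each title. The title format, the item ids, and the helper names `normalize` and `match_cards` are assumptions for illustration, not the project's actual code.

```python
import re
import unicodedata

def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def match_cards(hof_names, met_items):
    """Return {hof_name: [item_id, ...]} for titles that contain the name."""
    matches = {}
    for name in hof_names:
        # Word boundaries keep "Ty Cobb" from matching inside "Marty Cobb".
        pattern = re.compile(r"\b" + re.escape(normalize(name)) + r"\b")
        ids = [item_id for item_id, title in met_items.items()
               if pattern.search(normalize(title))]
        if ids:
            matches[name] = ids
    return matches

# Hypothetical sample data; real Met titles and ids will look different.
hof_names = ["Ty Cobb", "Honus Wagner", "Nap Lajoie"]
met_items = {
    "item-1": "Ty Cobb, Detroit, American League",
    "item-2": "Honus Wagner, Pittsburgh, National League",
    "item-3": "Marty Cobb, Example City",
}
matched = match_cards(hof_names, met_items)
```

Exact phrase matching is deliberately conservative: it avoids false positives but will miss nicknames or alternate spellings, which would need an alias table.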
Most of these people were baseball players and were elected to the HOF as such. There are, however, a few cases where someone was elected to the HOF as a manager or executive but the Met’s collection contains a baseball card from their career as a player. These exceptions are included in the final list of matched names, and their stats as a player are reflected in the final dataset. It would be possible to collect data only for those inducted as players, but in the interest of completeness, this script collects the cards and stats for everyone in the HOF who has them.
Once the final list of matching items from the Met and BBR’s HOF list is created, the data is merged into a single JSON dictionary, so that each member of the HOF is associated with their career statistics and with the image of their baseball card as held in the Met’s collection.
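The merge could look something like the sketch below, which keys the combined record on the person's name. The field names (`stats`, `cards`, `title`, `image`), the placeholder people, and the example URLs are all hypothetical.

```python
import json

def merge_records(hof_stats, matched_cards):
    """Combine stats and card metadata into one dictionary keyed by name."""
    merged = {}
    for name, cards in matched_cards.items():
        merged[name] = {
            "stats": hof_stats.get(name),  # None when no stats were scraped
            "cards": cards,
        }
    return merged

# Placeholder data with made-up values, for illustration only.
hof_stats = {"Player A": {"G": "220", "HR": "12"}}
matched_cards = {
    "Player A": [{"title": "Player A, Example City",
                  "image": "https://example.org/card-1.jpg"}],
    "Player B": [{"title": "Player B, Example City",
                  "image": "https://example.org/card-2.jpg"}],
}

merged = merge_records(hof_stats, matched_cards)
# ensure_ascii=False keeps accented names readable in the output.
payload = json.dumps(merged, indent=2, ensure_ascii=False)
```

Writing `payload` to a file gives a single JSON document that can be reloaded with `json.loads` without losing structure.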
Statistics data from BBR was scraped using a Python function I wrote for pitchers and hitters, inspired by the functionality of the br-scraper module for BBR written by Andrew Lim. Since the focus of this project is on the careers of HOF players, only the table rows reflecting career totals for standard player metrics were collected, along with League and BBWAA awards (Gold Glove/Silver Slugger, Cy Young, ROY and MVP).
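As a minimal sketch of pulling just the career-totals row, the example below assumes the totals sit in the table's `<tfoot>` and that each cell carries a `data-stat` attribute naming the metric. The sample markup, the table id `batting_standard`, and the function name `career_totals` are assumptions; this is not the author's function or br-scraper's.

```python
from bs4 import BeautifulSoup

# Invented sample table; the numbers are placeholders, not real statistics.
SAMPLE_TABLE = """
<table id="batting_standard">
  <thead>
    <tr><th data-stat="year">Year</th><th data-stat="G">G</th><th data-stat="HR">HR</th></tr>
  </thead>
  <tbody>
    <tr><th data-stat="year">1901</th><td data-stat="G">100</td><td data-stat="HR">5</td></tr>
    <tr><th data-stat="year">1902</th><td data-stat="G">120</td><td data-stat="HR">7</td></tr>
  </tbody>
  <tfoot>
    <tr><th data-stat="year">2 Yrs</th><td data-stat="G">220</td><td data-stat="HR">12</td></tr>
  </tfoot>
</table>
"""

def career_totals(html, table_id):
    """Return {metric: value} from the first footer row, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None or table.tfoot is None:
        return None
    row = table.tfoot.find("tr")
    if row is None:
        return None
    # Values are kept as strings; convert to numbers later if needed.
    return {cell["data-stat"]: cell.get_text(strip=True)
            for cell in row.find_all("td") if cell.has_attr("data-stat")}

totals = career_totals(SAMPLE_TABLE, "batting_standard")
```

Returning `None` rather than raising makes it easy to record which players still lack career stats.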
Storing this data in a JSON dictionary keeps it malleable and accessible going forward. Because the list of HOF members and the Met items are scraped directly from the websites rather than from a static copy, rerunning the scripts will update the data as more people are inducted into the HOF or if the Met’s collection of baseball cards expands.
There are a few areas where these scripts fall short. Firstly, there are instances where the Met holds a sheet of more than one player card as a single item in the collection. This is because of the way the original owner collected and stored the cards. In some cases, even though the names of the players are visible in the card image, there is no associated metadata that identifies those players, and so potential matches for this collection may be missed.
Secondly, due to some inconsistencies in how tables are named at baseball-reference, some of the players do not have associated career stats. This can be fixed by adding code to the statscraper function that searches BBR pages for the relevant tables.
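One possible shape for that fix is a lookup that tries several candidate table ids in turn. The sketch below also checks markup wrapped in HTML comments, on the assumption that some tables may be delivered that way. The ids used here and the function name `find_stats_table` are hypothetical.

```python
from bs4 import BeautifulSoup, Comment

def find_stats_table(html, candidate_ids):
    """Return the first table whose id is in candidate_ids, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # First pass: tables present in the live markup.
    for table_id in candidate_ids:
        table = soup.find("table", id=table_id)
        if table is not None:
            return table
    # Second pass (assumption): tables hidden inside HTML comments.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        for table_id in candidate_ids:
            table = inner.find("table", id=table_id)
            if table is not None:
                return table
    return None

# Hypothetical ids covering two naming variants.
CANDIDATES = ["batting_standard", "players_standard_batting"]

page_a = '<div><table id="batting_standard"><tr><td>1</td></tr></table></div>'
page_b = '<div><!-- <table id="players_standard_batting"><tr><td>2</td></tr></table> --></div>'
page_c = '<div><table id="unrelated"></table></div>'

found_a = find_stats_table(page_a, CANDIDATES)
found_b = find_stats_table(page_b, CANDIDATES)
found_c = find_stats_table(page_c, CANDIDATES)
```

Keeping the candidate ids in one list means a newly discovered naming variant only needs a one-line addition.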