Presentation Type

Workshop

Conference Track

Technology Trends: Technology Skills

Description

When library information is messy and not easily indexed by discovery layers, it can be difficult to bring the data into a format that supports search and retrieval within our own information ecosystems. Fortunately, tools are available that help us get this data into a format that can be searched in our library catalogs with minimal effort.

Using finding aids published as HTML webpages as our example, we will use the web scraping tools built into Google Sheets to harvest the data. With minimal coding skills, we will create a CSV file and convert that file into MARC records in batch, ready to be loaded into any library system.
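The abstract does not name the specific spreadsheet functions or the conversion tool used in the workshop. In Google Sheets, a harvest of this kind is usually a single formula built on IMPORTHTML or IMPORTXML. As a rough illustration of the same pipeline outside a spreadsheet, the Python sketch below pulls table rows out of an HTML page, writes them as CSV, and emits one record per row in a mnemonic MARC text format of the kind that tools such as MarcEdit can compile into binary MARC. The column names (title, url), the sample HTML, the placeholder leader, and the choice of fields 245 and 856 are all assumptions made for the example, not details from the session.

```python
import csv
import io
from html.parser import HTMLParser

# Placeholder leader (assumption): mixed materials, collection level.
PLACEHOLDER_LEADER = "=LDR  00000npc a2200000 a 4500"

# Hypothetical finding-aid listing; real pages will differ in structure.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>title</th><th>url</th></tr>"
    "<tr><td>Smith Family Papers</td><td>https://example.org/smith</td></tr>"
    "</table>"
)


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_to_csv(html):
    """Return the table rows found in `html` as CSV text (first row = header)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


def csv_to_mnemonic(csv_text):
    """Turn each CSV row into a mnemonic-format MARC record (245 and 856 only)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines = [PLACEHOLDER_LEADER]
        if row.get("title"):
            lines.append("=245  00$a" + row["title"])  # title statement
        if row.get("url"):
            lines.append("=856  42$u" + row["url"])    # link to the finding aid
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    print(csv_to_mnemonic(html_to_csv(SAMPLE_HTML)))
```

A real batch would need a fuller field mapping (dates, extent, subjects) and a leader suited to the material being described; the sketch only shows the shape of the HTML to CSV to MARC flow.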

Learning Outcomes

  • Learn about available tools for web scraping
  • Apply some easy tricks to harvest HTML data

Start Date

29-3-2019 11:15 AM

End Date

29-3-2019 12:00 PM

Persistent Identifier

https://archives.pdx.edu/ds/psu/28044

Subjects

Data mining -- Applications to library services, Automatic data collection systems, Information storage and retrieval systems, Machine-readable bibliographic data -- Data processing


HTML to MARC: Webscraping Using Googlesheets