Title of Presentation
Presentation Type
Workshop
Conference Track
Technology Trends: Technology Skills
Description
When library information is messy and not easily indexed by discovery layers, it can be difficult to bring the data into a format that supports easy search and retrieval within our own information ecosystems. Fortunately, there are tools that help us get this data into a useful format that can be searched in our library catalogs with minimal effort.
Using finding aids published as HTML web pages as our example, we will use the web scraping tools built into Google Sheets to harvest the data. With minimal coding skills, we will create a CSV file and batch-convert that file into MARC records, ready to be loaded into any library system.
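The harvest-to-CSV step described above can be sketched outside a spreadsheet as well. The workshop itself uses Google Sheets; the Python below is only an illustrative stand-in that relies on the standard library. The sample HTML, the column names, and the names `TableRows` and `html_to_csv` are all hypothetical, and a real finding aid will almost certainly be structured differently.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical finding-aid fragment (an assumption for illustration only).
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Date</th><th>Box</th></tr>
  <tr><td>Campus planning maps</td><td>1968</td><td>1</td></tr>
  <tr><td>Student newspaper clippings</td><td>1970-1975</td><td>2</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every table cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = html_to_csv(SAMPLE_HTML)
print(csv_text)
```

The resulting CSV text is the kind of file that would then be handed to a batch conversion tool to produce MARC records; that conversion step is not covered by this sketch.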
Learning Outcomes
- Learn about available tools for web scraping
- Apply some easy techniques to harvest HTML data
Start Date
29-3-2019 11:15 AM
End Date
29-3-2019 12:00 PM
Persistent Identifier
https://archives.pdx.edu/ds/psu/28044
Subjects
Data mining -- Applications to library services, Automatic data collection systems, Information storage and retrieval systems, Machine-readable bibliographic data -- Data processing
HTML to MARC: Webscraping Using Googlesheets