Presentation Type

Workshop

Conference Track

Technology Trends: Technology Skills

Description

When library information is messy and not easily indexed by discovery layers, it can be difficult to bring the data into a format that supports search and retrieval within our own information ecosystems. Fortunately, there are tools that can get our data into a useful format that can be searched in our library catalogs with minimal effort.

Using HTML finding aids as our example, we will harvest the data with the web-scraping functions built into Google Sheets. With minimal coding skills, we will create a CSV file and convert it into MARC records in batch, ready to be loaded into any library system.
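The last step, turning the spreadsheet's CSV export into MARC records in batch, can be sketched in Python with only the standard library. This is a minimal illustration, not the workflow demonstrated in the session: the column names (title, creator, dates, extent, url), the tag and indicator choices, and the leader values are all assumptions to adapt to local cataloging practice, and control fields such as 001 and 008 are left out.

```python
import csv
import io

# ISO 2709 structural characters used by MARC 21
FIELD_TERMINATOR = "\x1e"
SUBFIELD_DELIMITER = "\x1f"
RECORD_TERMINATOR = b"\x1d"


def encode_record(fields):
    """Encode data fields as one binary MARC record.

    fields: list of (tag, indicators, subfields) tuples, where indicators is a
    two-character string and subfields is a list of (code, value) pairs.
    Control fields (001-009) are not handled in this sketch.
    """
    directory, data = [], b""
    for tag, indicators, subfields in fields:
        text = indicators + "".join(
            SUBFIELD_DELIMITER + code + value for code, value in subfields
        ) + FIELD_TERMINATOR
        encoded = text.encode("utf-8")
        # Directory entry: tag (3) + field length in bytes (4) + start offset (5)
        directory.append(f"{tag}{len(encoded):04d}{len(data):05d}")
        data += encoded
    directory_bytes = ("".join(directory) + FIELD_TERMINATOR).encode("ascii")
    base_address = 24 + len(directory_bytes)  # the leader is always 24 bytes
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    # Assumed leader values: n = new, p = mixed materials, c = collection,
    # a = archival control, a = UTF-8, 22 = indicator/subfield code counts
    leader = f"{record_length:05d}npcaa22{base_address:05d}   4500"
    return leader.encode("ascii") + directory_bytes + data + RECORD_TERMINATOR


def csv_to_marc(csv_text):
    """Turn each CSV row (hypothetical columns: title, creator, dates,
    extent, url) into one MARC record and concatenate them into a batch."""
    batch = b""
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields = []
        if row.get("creator"):
            fields.append(("100", "1 ", [("a", row["creator"])]))
        title = [("a", row["title"])]
        if row.get("dates"):
            title.append(("f", row["dates"]))  # 245 $f = inclusive dates
        # Second indicator 0 assumes no nonfiling characters in the title
        fields.append(("245", "10" if row.get("creator") else "00", title))
        if row.get("extent"):
            fields.append(("300", "  ", [("a", row["extent"])]))
        if row.get("url"):
            fields.append(("856", "42", [("3", "Finding aid"), ("u", row["url"])]))
        batch += encode_record(fields)
    return batch
```

Writing the returned bytes to a file opened in binary mode ("wb") gives a .mrc batch file; it is worth running that file through a MARC validation tool before loading it into a catalog.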

Learning Outcomes

  • Learn about available tools for web scraping
  • Apply some easy tricks to harvest HTML data
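In Google Sheets, IMPORTXML(url, xpath_query) pulls the nodes matched by an XPath query, such as //h2, into cells, and IMPORTHTML(url, query, index) does the same for an HTML table or list. The sketch below mimics the IMPORTXML idea in Python so the logic is visible: it collects the text of every element with a given tag and, optionally, class, then lines the results up as CSV columns. The page structure (h2 titles, span elements with class dates) is hypothetical, and the parser only matches tag and class rather than supporting full XPath.

```python
import csv
import io
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text of every element matching a tag (and class, if given).

    Roughly what an IMPORTXML query like //h2 or //span[@class='dates']
    returns, but without full XPath support.
    """

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.depth = 0  # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # track nested elements of the same tag
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.css_class is None or self.css_class in classes):
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def scrape(html, tag, css_class=None):
    """Return the whitespace-normalized text of each matching element."""
    parser = TextByTag(tag, css_class)
    parser.feed(html)
    parser.close()
    return [" ".join(text.split()) for text in parser.results]


def to_csv(columns):
    """columns: dict of header -> list of values; rows are paired by position."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(columns.keys())
    writer.writerows(zip(*columns.values()))
    return out.getvalue()
```

As with parallel IMPORTXML columns in a sheet, zip pairs values by position, so a collection missing a date would shift every later row out of alignment; checking that the scraped columns have equal lengths before exporting catches this.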

Rights

© Copyright the author(s)

IN COPYRIGHT:
http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DISCLAIMER:
The purpose of this statement is to help the public understand how this Item may be used. When there is a (non-standard) License or contract that governs re-use of the associated Item, this statement only summarizes the effects of some of its terms. It is not a License, and should not be used to license your Work. To license your own Work, use a License offered at https://creativecommons.org/

Start Date

3-29-2019 11:15 AM

End Date

3-29-2019 12:00 PM

Persistent Identifier

https://archives.pdx.edu/ds/psu/28044

Subjects

Data mining -- Applications to library services, Automatic data collection systems, Information storage and retrieval systems, Machine-readable bibliographic data -- Data processing

HTML to MARC: Webscraping Using Googlesheets