First Advisor

Anthony Rhodes

Date of Award

Summer 2020

Document Type

Thesis

Degree Name

Bachelor of Science (B.S.) in Computer Science and University Honors

Department

Computer Science

Language

English

Subjects

Computational linguistics, Natural language processing (Computer science)

DOI

10.15760/honors.956

Abstract

CBOW and Skip Gram are two NLP techniques for producing word embedding models that are accurate and performant. They were introduced in the seminal paper by T. Mikolov et al. and have since been improved with optimizations such as negative sampling and subsampling. This paper implements a fully optimized version of these models in PyTorch and runs them through a toy sentiment/subject analysis. It is weakly observed that corpus type affects the skew of the word embeddings: fictional corpora appear better suited to sentiment analysis and non-fictional corpora to subject analysis.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/33885

Recommended Citation

Menon, Tejas, "Empirical Analysis of CBOW and Skip Gram NLP Models" (2020). University Honors Theses. Paper 934.
https://doi.org/10.15760/honors.956

Included in

Computational Linguistics Commons, Computer Sciences Commons

University Honors Theses

Empirical Analysis of CBOW and Skip Gram NLP Models
