Decoding DNA: Exploring the Impact of Tokenization on Genomic Language Models

Semester: Spring 2024


Presentation description

The language-like structure of DNA suggests that large language models (LLMs) may be able to extract meaningful insights from genomic data. Currently, there is no standard tokenization method or set of fine-tuning tasks for genomic language models. Our strategy has been to fine-tune multiple foundation models on all of their existing tasks. Additionally, we performed a preliminary investigation into whether an LLM can accurately identify the locations of prophage sequences integrated into bacterial genomes.
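For illustration only, here is a minimal sketch of why the choice of tokenization matters. The same DNA string produces different token sequences depending on whether it is split into overlapping or non-overlapping k-mers (fixed-length substrings of k bases). The function `kmer_tokenize` and its parameters are hypothetical. The presentation does not state which tokenizers were compared, and real genomic models may use other schemes, such as learned subword vocabularies.

```python
def kmer_tokenize(seq: str, k: int, stride: int) -> list[str]:
    """Split a DNA sequence into k-mers, advancing `stride` bases per step.

    Hypothetical helper for illustration:
    stride=1 yields overlapping k-mers; stride=k yields non-overlapping k-mers.
    Any trailing fragment shorter than k is dropped in this sketch.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


dna = "ATGCGTAC"
print(kmer_tokenize(dna, k=3, stride=1))  # overlapping: ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
print(kmer_tokenize(dna, k=3, stride=3))  # non-overlapping: ['ATG', 'CGT']
```

With overlapping k-mers, each base appears in up to k tokens, so sequences become longer but context is preserved at every position. Non-overlapping k-mers shorten the input, but a single-base shift in the sequence changes every token.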

Presenter Name: Anisa Habib

Presentation Type: Poster
Presentation Format: In Person
Presentation #C11
College: Engineering
School / Department: School of Computing
Research Mentor: Hari Sundar
Date | Time: Tuesday, Apr 9th | 1:00 PM