Hackathon

Submission Link: kaushikr@email.sc.edu

Abstract

Purpose

Targeted Audience

Description of Projects

This hackathon offers several projects to choose from. Each is described in the subsections below:

a.) Sports Playoff: Rank teams given incomplete data

Background: In many sports, a champion is determined by seeding teams, based on regular season records, into a final single-elimination tournament. In professional baseball, each team plays a total of 162 games, which yields a large sample of results because each team in a league plays every other team multiple times. However, in many sports, the number of contests between teams is severely limited. For example, in the Football Bowl Subdivision (FBS) of college football, there are 130 teams, with each team playing 12 games. From this data, a commission is charged with the task of determining and seeding the top 4 teams in the College Football Playoff.

Task: Develop a program that ranks the teams at the end of the season based on the regular season results. Historical win-loss records for each team's regular season and postseason games are available and may be used to develop or train the algorithm. For any given season, the ranking should be based only on the regular season results, before any postseason games are played.

Reference: www.ncaa.com
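
As one possible starting point, the sketch below ranks teams with Elo-style rating updates. Elo is a technique swapped in here for illustration, not a method prescribed by this project; the function name `elo_rank`, the defaults `k=32` and `base=1500`, and the toy season are all assumptions.

```python
from collections import defaultdict

def elo_rank(games, k=32.0, base=1500.0):
    """Rank teams from (winner, loser) results using Elo-style updates.

    games: iterable of (winner, loser) tuples in chronological order.
    k and base are illustrative defaults, not tuned values.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in games:
        # Expected score of the winner under the logistic Elo model.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)

# Hypothetical regular-season results as (winner, loser) pairs.
season = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
ranking = elo_rank(season)
top4 = [team for team, _ in ranking[:4]]
```

Because Elo depends on the order of games and ignores margin of victory, it is best treated as a baseline to compare other ranking methods against.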

b.) Reinforcement Learning with Connect Four

Background: Connect Four is a simple game similar to Tic-Tac-Toe [1]. The game has two players and takes place on a 7×6 grid. On each turn, a player drops a disk into one of the 7 columns [3]. The disk falls to the lowest available position in that column [3]. The first player to line up four disks horizontally, vertically, or diagonally wins [3].

Task: Implement reinforcement learning algorithms to solve the game of Connect Four. This project will focus on the Q-learning algorithm [2].
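
The rules and the Q-learning idea above can be sketched in a few lines. This is a minimal tabular sketch under stated assumptions: the helper names (`drop`, `legal_moves`, `q_update`, `choose_action`) and the values of `alpha`, `gamma`, and `epsilon` are illustrative, win detection is omitted, and the opponent is treated as part of the environment.

```python
import random
from collections import defaultdict

ROWS, COLS = 6, 7  # 7 columns by 6 rows; row 0 is the bottom row

def drop(board, col, player):
    """Return a new board with the player's disk in the lowest free cell of col."""
    rows = [list(r) for r in board]
    for r in range(ROWS):
        if rows[r][col] == 0:
            rows[r][col] = player
            return tuple(tuple(row) for row in rows)
    raise ValueError("column is full")

def legal_moves(board):
    """Columns whose top cell is still empty."""
    return [c for c in range(COLS) if board[ROWS - 1][c] == 0]

Q = defaultdict(float)  # maps (state, action) to an estimated value

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.95, terminal=False):
    """One tabular Q-learning step: Q += alpha * (target - Q)."""
    if terminal:
        best_next = 0.0
    else:
        best_next = max((Q[(next_state, a)] for a in legal_moves(next_state)), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy choice among the legal columns."""
    moves = legal_moves(state)
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=lambda a: Q[(state, a)])

empty = tuple(tuple(0 for _ in range(COLS)) for _ in range(ROWS))
after = drop(empty, 3, 1)
# Pretend this move ended the game with reward 1.0, purely to exercise the update.
q_update(empty, 3, reward=1.0, next_state=after, terminal=True)
```

A plain table will not scale to the full Connect Four state space, so in practice the table would likely be replaced by a function approximator.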

c.) Automated Detection of Phenophase from Herbarium Images

Background: Images of specimens from natural history collections are powerful data sources for understanding ecological change over time. However, annotating these images at scale is very time-intensive, limiting the use of data from current large-scale digitization efforts.

Task: This project will use deep learning to automatically detect phenological stages from images of plant specimens. Characterizing changes in phenology, or the timing of life history events such as flowering, is critical to understanding how organisms are responding to changing climates. In Phase I, we will use annotated images of plant specimens previously described by Lorieul et al. [4], but try to improve on their models by using a 6-stage classification system instead [5].

d.) Maize Genomics Project

Background: In this project, we will predict the performance of maize from single nucleotide polymorphism (SNP) markers from the “Genomes to Fields” Initiative (https://www.genomes2fields.org). Since 2014, more than 2500 corn hybrid varieties have been tested across 162 different environments, including locations in Arkansas [6,7].

Task:

  1. Genomic prediction: Use decision-tree-based approaches to predict maize phenotype in a particular environment with a high level of accuracy, incorporating information about environment and maize genotype while controlling for population structure.
  2. Genome-wide association study (GWAS): Identify particular SNPs associated with variation in phenotype (the observable characteristic of an organism).
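
To make the GWAS step concrete, the sketch below scores each SNP by its Pearson correlation with the phenotype and sorts by absolute value. It is a deliberately naive single-marker scan on made-up data: the SNP identifiers, the 0/1/2 allele-count coding, and the function names are assumptions, and it does not control for population structure or multiple testing.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def scan_snps(genotypes, phenotype):
    """Score every SNP against the phenotype, strongest association first.

    genotypes: dict mapping snp_id to a list of allele counts (0/1/2),
               one entry per hybrid, in the same order as phenotype.
    """
    scores = {snp: pearson(calls, phenotype) for snp, calls in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy data: six hypothetical hybrids, three hypothetical SNPs.
genotypes = {
    "snp_1": [0, 1, 2, 0, 1, 2],
    "snp_2": [2, 0, 1, 1, 0, 2],
    "snp_3": [1, 1, 1, 1, 1, 1],  # monomorphic, carries no signal
}
phenotype = [10.0, 12.1, 13.9, 9.8, 12.0, 14.2]
hits = scan_snps(genotypes, phenotype)
```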

e.) Agricultural Imaging

Background: Of the three major crops (rice, wheat, and maize), rice is by far the most important food crop for people in low- and lower-middle-income countries. Although rich and poor people alike eat rice in low-income countries, the poorest consume relatively little wheat and are therefore deeply affected by the cost and availability of rice [8].

Task: Given a dataset of 120 JPEG images of disease-infected rice leaves [9], we will explore models for predicting the type of disease present in each image. The diseases we will consider are the following:

  • Leaf smut
  • Brown spot
  • Bacterial leaf blight

f.) Natural Language Processing

Background: Natural Language Processing (NLP) is the branch of Artificial Intelligence that attempts to decipher, understand, and make sense of human language.

Task: In this project, we will investigate a corpus of data collected from social media websites. The project will begin with understanding the data and progress to developing a pipeline for sentiment analysis, text classification, and question answering.

g.) Self-Driving Car Project: Scene Understanding

Background: Scene understanding is an important component of self-driving cars. It requires the ability to segment natural images and to use the temporal information available from sequential video frames. Image segmentation is a core problem that has driven many of this decade’s AI breakthroughs.

Task: In this project, we will focus on building and training semantic segmentation neural networks to identify key objects in driving scenes such as roads, pedestrians, and other cars.

h.) Image Captioning: Image to Text

Background: Image captioning is the task of producing a descriptive sentence for a given image. It has a wide range of applications, such as automating the creation of metadata for images, assisting people with impaired vision, and supporting general-purpose robotics.

Task: In this project, we will focus on deep learning models that integrate image and text analysis. We will build neural networks to obtain a high-level understanding of images, generate text from context, and identify a network architecture and workflow for image captioning that combines visual and language-based data.

i.) Fire Detection: Embedded Machine Learning

Background: With the ever-present threat of wildfires, the need for efficient and scalable wildfire detection is increasing. Image-based fire detection has recently received growing attention due to the success of deep learning in image classification and object detection. However, it is challenging to deploy deep learning models on embedded systems with limited computing resources and energy supply.

Task: In this project, we will examine efficient learning models for fire detection and evaluate their real-time performance on embedded systems.
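
One simple way to evaluate real-time performance is to time repeated inference calls on a single frame. The harness below is a sketch only: `measure_latency` is a hypothetical helper, and `dummy_fire_detector` is a stand-in threshold rule rather than a learned model, used here so the example runs without any deep learning library.

```python
import time

def measure_latency(predict, frame, warmup=5, runs=50):
    """Return (mean seconds per call, frames per second) for predict(frame)."""
    for _ in range(warmup):
        predict(frame)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        predict(frame)
    elapsed = time.perf_counter() - start
    mean_s = elapsed / runs
    fps = 1.0 / mean_s if mean_s > 0 else float("inf")
    return mean_s, fps

def dummy_fire_detector(frame):
    """Stand-in model: flags a frame whose mean red-channel value is high."""
    red = [px[0] for row in frame for px in row]
    return sum(red) / len(red) > 150

# Hypothetical 32x32 RGB frame filled with an orange tone.
frame = [[(200, 80, 30)] * 32 for _ in range(32)]
mean_s, fps = measure_latency(dummy_fire_detector, frame)
```

On an embedded target, the same loop could wrap the deployed model's inference call, with power draw and memory use measured separately.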

j.) Parameter Estimation: Non-linear materials

Background: It is common in engineering systems to arrive at a differential equation such as ∂v(t)/∂t + k v(t) = 0, where v(t) represents some scalar quantity of interest and k is a parameter typically determined by the materials in the system. For example, a simple resistor-capacitor (RC) circuit yields such an equation, where v(t) is the voltage and k is the inverse of the time constant. The solution to this differential equation is a decaying exponential, v(t) = A e^{-kt}, where A is a constant that can be determined from the initial conditions. However, many materials are nonlinear.
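
For reference, the decaying exponential follows from separating variables when k is constant. The derivation below is a standard worked step, written with an ordinary derivative because v depends only on t:

```latex
\begin{align*}
\frac{dv}{dt} + k\,v &= 0
  &&\Longrightarrow& \frac{dv}{v} &= -k\,dt \\
\ln v &= -k\,t + C
  &&\Longrightarrow& v(t) &= A\,e^{-kt}, \qquad A = e^{C} = v(0).
\end{align*}
```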

Task: Determine a nonlinear model for the decay parameter k that preserves the idea of a material parameter while allowing it to depend nonlinearly on the applied variable v(t).
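
As an illustration of what such a model might look like, the sketch below assumes the simplest state-dependent form, k(v) = k0 + k1·v, so that dv/dt = -(k0 + k1·v)·v. This form, the function names, and the parameter values are assumptions chosen for the example, not the model the project expects. Synthetic data are generated from the closed-form solution of that equation, and k0 and k1 are recovered by regressing the local decay rate -(dv/dt)/v against v.

```python
import math

def simulate(v0, k0, k1, dt, steps):
    """Exact samples of dv/dt = -(k0 + k1*v) * v at t = 0, dt, 2*dt, ..."""
    out = []
    for i in range(steps + 1):
        decay = math.exp(-k0 * i * dt)
        out.append(k0 * v0 * decay / (k0 + k1 * v0 * (1.0 - decay)))
    return out

def fit_k_of_v(v, dt):
    """Least-squares fit of k(v) = k0 + k1*v from uniformly sampled v(t)."""
    xs, ys = [], []
    for i in range(1, len(v) - 1):
        dvdt = (v[i + 1] - v[i - 1]) / (2.0 * dt)  # central difference
        xs.append(v[i])
        ys.append(-dvdt / v[i])                    # local decay rate k(v)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    k1 = sxy / sxx
    k0 = my - k1 * mx
    return k0, k1

# Hypothetical material with k0 = 2.0 and k1 = 0.3, sampled every 0.01 time units.
samples = simulate(v0=5.0, k0=2.0, k1=0.3, dt=0.01, steps=200)
k0_hat, k1_hat = fit_k_of_v(samples, dt=0.01)
```

With measured rather than synthetic data, the finite-difference derivative amplifies noise, so smoothing or fitting the integrated solution directly may work better.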

k.) Business Analytics: Demand Forecasting

Background: Given historical sales information for an online store, we want to be able to make predictions about the amount of sales that the shops will generate in the future.

Task: You are provided with daily historical sales data. The task is to forecast the total number of products sold in every shop for the test set. Note that the list of shops and products changes slightly every month. Creating a robust model that can handle such situations is part of the challenge [10,11].
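
A naive baseline helps show the shape of the problem, including shop and product pairs that appear in the test set but not in the history. The sketch below assumes a hypothetical row layout of (month index, shop id, item id, units sold); the actual column names and layout of the dataset may differ.

```python
from collections import defaultdict

def monthly_totals(daily_rows):
    """Aggregate (month, shop, item, units) daily rows into monthly totals."""
    totals = defaultdict(float)
    for month, shop, item, units in daily_rows:
        totals[(month, shop, item)] += units
    return totals

def naive_forecast(totals, last_month, test_pairs, default=0.0):
    """Predict next month's sales as last month's total.

    Pairs with no sales record in last_month fall back to `default`.
    """
    return {pair: totals.get((last_month, *pair), default) for pair in test_pairs}

# Hypothetical daily sales rows.
rows = [
    (0, "s1", "i1", 2),
    (0, "s1", "i1", 1),
    (1, "s1", "i1", 4),
    (1, "s2", "i1", 3),
    (1, "s1", "i1", 1),
]
test_pairs = [("s1", "i1"), ("s2", "i1"), ("s3", "i9")]  # ("s3", "i9") is unseen
forecast = naive_forecast(monthly_totals(rows), 1, test_pairs)
```

Any learned model should at least beat this last-month baseline on a held-out month.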

Submission Guidelines

References

  1. Masters, J. (n.d.). History of Stacking Four in a Row Games (Connect 4™, Captain’s Mistress, Qubic). Retrieved February 24, 2021, from https://www.tradgames.org.uk/games/Four-in-a-row.htm
  2. Shyalika, C. (2019, November 16). A beginner’s guide to Q-learning. Retrieved February 24, 2021, from https://towardsdatascience.com/a-beginners-guide-to-q-learning-c3e2a30a653c
  3. Solving Connect 4: How to Build a Perfect AI. (2016, May 01). Retrieved February 25, 2021, from http://blog.gamesolver.org/solving-connect-four/01-introduction/
  4. Lorieul, Titouan, Katelin D. Pearson, Elizabeth R. Ellwood, et al. 2019. “Toward a Large-Scale and Deep Phenological Stage Annotation of Herbarium Specimens: Case Studies from Temperate, Tropical, and Equatorial Floras.” Applications in Plant Sciences 7 (3): e01233. https://doi.org/10.1002/aps3.1233.
  5. Yost, Jennifer M, Patrick W Sweeney, et al. 2018. “Digitization protocol for scoring reproductive phenology from herbarium specimens of seed plants.” Applications in Plant Sciences 6 (2). John Wiley & Sons Inc.: e1022–e1022. https://doi.org/10.1002/aps3.1022.
  6. Battey, C J, Gabrielle C Coffing, and Andrew D Kern. 2021. “Visualizing population structure with variational autoencoders.” G3 Genes|Genomes|Genetics 11 (1). https://doi.org/10.1093/g3journal/jkaa036.
  7. McFarland, Bridget A., Naser AlKhalifah, Martin Bohn, et al. 2020. “Maize Genomes to Fields (G2f): 2014-2017 Field Seasons: Genotype, Phenotype, Climatic, Soil, and Inbred Ear Image Datasets.” BMC Research Notes 13 (1): 71. https://doi.org/10.1186/s13104-020-4922-8.
  8. http://ricepedia.org/challenges
  9. https://www.kaggle.com/vbookshelf/rice-leaf-diseases
  10. https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data
  11. https://www.coursera.org/learn/competitive-data-science