Machine-Learning Based Background Rejection Software for the HAWC Gamma-Ray Observatory

Background

The High Altitude Water Cherenkov (HAWC) observatory is designed to observe high-energy gamma rays and cosmic rays. When these particles enter the atmosphere, they interact with air molecules and generate extensive air showers (EAS). HAWC measures the arrival direction, energy, and lateral distribution of the particles in each shower, and the energy and arrival direction of the primary gamma ray or cosmic ray are derived from this information. The primary interest of our group is to use HAWC to observe gamma rays, so the events generated by cosmic rays, our background, need to be removed from the data. However, the ratio of triggered gamma-ray events to cosmic-ray events is 1 to 1000, so a good background rejection algorithm is necessary to detect gamma rays. HAWC currently uses a simple linear selection filter based on the particle clustering along the lateral plane. Our studies of simulated data show that other information, such as the charge distribution and the arrival times of particles, can improve the background rejection. However, the selection filter becomes nonlinear when we add more parameters, and it becomes impractical to define an analytical formula for it. Therefore, our group applied a Boosted Decision Tree (BDT) algorithm, implemented with the TMVA package, to derive a machine-learning-based cut. Preliminary studies show that the machine learning algorithm can improve the background rejection by about 20%. We propose that an undergraduate student apply the Python-based scikit-learn machine learning tool, an industry standard, to improve the background rejection algorithm.
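To illustrate why background rejection matters at a 1 to 1000 signal-to-background ratio, the following minimal sketch estimates detection significance with the common large-count approximation S / sqrt(B). The event counts and cut efficiencies are hypothetical, chosen only for illustration, and the sketch assumes one possible reading of a 20% improvement: 20% fewer background events survive the cut at the same gamma-ray efficiency. It is not the HAWC significance calculation.

```python
import math


def significance(n_signal: float, n_background: float) -> float:
    """Approximate significance S / sqrt(B); reasonable only when B is large."""
    return n_signal / math.sqrt(n_background)


def after_cut(n_signal, n_background, signal_eff, background_eff):
    """Return the (signal, background) counts that survive a selection cut."""
    return n_signal * signal_eff, n_background * background_eff


# Hypothetical sample with the 1 to 1000 gamma-to-cosmic-ray ratio.
n_gamma, n_cosmic = 1_000.0, 1_000_000.0

# Hypothetical passing fractions for a baseline cut and an improved cut.
# The improved cut keeps the same gamma-ray fraction but 20% less background.
baseline = after_cut(n_gamma, n_cosmic, signal_eff=0.5, background_eff=0.010)
improved = after_cut(n_gamma, n_cosmic, signal_eff=0.5, background_eff=0.008)

sig_none = significance(n_gamma, n_cosmic)
sig_base = significance(*baseline)
sig_impr = significance(*improved)

print(f"no cut:        {sig_none:.2f} sigma")
print(f"baseline cut:  {sig_base:.2f} sigma")
print(f"improved cut:  {sig_impr:.2f} sigma")
print(f"relative gain: {sig_impr / sig_base:.3f}")
```

Under this approximation the gain depends only on the ratio of surviving background counts, so removing 20% of the remaining background raises the significance by a factor of 1 / sqrt(0.8), independent of the hypothetical counts chosen above.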

Student Role

We will provide a computer with the scikit-learn software package, HAWC simulated data, HAWC real data, and a few Python scripts that can be used to analyze HAWC data and reformat a few observables into a scikit-learn readable format. We expect the student to modify these scripts to reformat all the HAWC observables into a scikit-learn readable format. The student would also change the scripts to generate new parameters by nonlinearly combining HAWC observables, and use the new parameters to train the algorithm. Our experience with TMVA shows that this improves the efficiency of the machine learning algorithm.

The student would try several of the machine learning algorithms available in scikit-learn and compare their performance. The student would also test how combining observables can be used to improve the efficiency of the algorithm. The algorithm with the best performance will be used on HAWC real data. The current HAWC all-sky map has more than 32 spots with gamma-ray signals just below the detection significance threshold. Improved background rejection would increase the significance of the spots that correspond to real sources while reducing the significance of background fluctuations. Therefore, this work may lead to the discovery of new gamma-ray sources.

Basic knowledge of Python would help the student start the work, but we do not require any prior programming experience. In the past, we have worked with summer students who had no programming experience. They could analyze HAWC data after a one-week crash course.
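The comparison of algorithms and of derived parameters described in this section could look roughly like the sketch below. It is only an illustration under stated assumptions: the two observables `obs_a` and `obs_b` are synthetic placeholders rather than real HAWC quantities, the toy labels are constructed so that the classes are separated by the product of the observables, and the three scikit-learn classifiers and the ROC AUC metric are one possible choice, not the group's selected method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_events = 4000

# Two synthetic observables (placeholders, not real HAWC quantities).
obs_a = rng.normal(size=n_events)
obs_b = rng.normal(size=n_events)

# Toy truth label: 1 = gamma ray, 0 = cosmic ray. The classes are separated
# by the product of the observables, so no linear cut on (obs_a, obs_b) works.
noise = rng.normal(scale=0.3, size=n_events)
is_gamma = (obs_a * obs_b + noise > 0).astype(int)

# Feature tables in the (n_samples, n_features) layout scikit-learn expects.
raw = np.column_stack([obs_a, obs_b])
# Same table plus a derived parameter: a nonlinear combination of observables.
combined = np.column_stack([obs_a, obs_b, obs_a * obs_b])

# Factories so that every evaluation starts from a fresh, untrained model.
models = {
    "logistic regression": lambda: LogisticRegression(),
    "random forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient-boosted trees": lambda: GradientBoostingClassifier(random_state=0),
}


def evaluate(features, labels):
    """Train each model on 70% of the events and score it on the other 30%."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=0, stratify=labels
    )
    scores = {}
    for name, make_model in models.items():
        model = make_model().fit(X_train, y_train)
        # ROC AUC computed from the predicted gamma-ray probability.
        scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return scores


auc_raw = evaluate(raw, is_gamma)
auc_combined = evaluate(combined, is_gamma)
for name in models:
    print(f"{name:24s} raw AUC = {auc_raw[name]:.3f}   "
          f"with product AUC = {auc_combined[name]:.3f}")
```

In this toy setup the linear model benefits most from the derived product feature, while the tree-based models can approximate the nonlinear boundary from the raw observables alone. Whether the same pattern holds for HAWC observables would have to be checked on the simulated data.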

Outcomes

Our primary goal is to expose the student to interdisciplinary research across physics and computer science. In the field of astronomy, data samples are growing rapidly, and machine learning techniques are increasingly employed in data analysis. Therefore, learning programming skills and machine learning techniques will be an advantage for the student's future. After the 10 weeks, the student will have reasonable experience in applying industry-standard software tools to cutting-edge astrophysics research, in Python programming, and in machine learning. The student will also learn how to analyze HAWC data and how to measure quantities such as flux from the data. The student would also have the opportunity to improve writing and presentation skills and to work in a collaboration.

Anushka Udara Abeysekara
Research Assistant Professor

Physics & Astronomy
College of Science

Our group consists of three faculty members, Dr. David Kieda (Dean of the Graduate School), Dr. Wayne Springer (Graduate Director), and Dr. A. Udara Abeysekara, and four graduate students. We will provide the student with a desk in one of our graduate students' offices, which is right next to the faculty members' offices. This will allow the student to work in close collaboration with other group members. The student will work under the direct supervision of Udara, with daily conversations. Based on the student's current programming skills, Udara will provide a crash course on the HAWC observatory, Python programming, and the basics of machine learning. The student will present his/her progress in the weekly group meeting and discuss future steps with other members. The faculty members will work with the student to improve writing skills and will have the student write a summary report at the end of the 10-week internship. The student will also be asked to give an oral presentation to the HAWC collaboration, which will enhance the student's presentation skills. The student will also present his/her work at the University of Utah Undergraduate Symposium and at a national-level conference.