Coding for Astronomy
By Dillon Dong
I’m Dillon, an observational astronomer finishing up my third year as a PhD student at Caltech. I use telescopes like the Very Large Array in New Mexico and the Keck telescopes in Hawaii to study the birth and death of massive stars.
My Background
I’m a second-generation Chinese American, born and raised in San Francisco. My parents immigrated to the US in the late 80s and worked a bunch of odd jobs (delivering newspapers, selling jewelry wholesale, selling farm equipment for a German manufacturing conglomerate, and working as a personal assistant at a nonprofit, just to name a few) so that they could save up and provide me with a comfortable childhood. They also worked hard to give me a good education really early on. Through many bribes of candy, I could read most children’s books and could multiply up to 12x12 by the time I was 3 or 4.
This early and deeply ingrained foundation in reading and math made learning new things that much easier for me. Learning became fun, and as a result, I started developing this curiosity-driven approach to life. I’d spend a lot of time hanging out in the library next to my elementary school, just reading about anything that looked interesting. I didn’t really care if it was useful or if it would help me “succeed” in any way — as long as it was fun, I’d dive right in. At various points in my life, I’ve wanted to become a mathematician, a historian, a short story writer, a theoretical physicist, a veterinarian for small animals, and a long-distance hiker.
I still might do some of these things — particularly long-distance hiking. But ever since college, the field that has really caught my eye is astronomy. One of the things I find most appealing about astronomy is this idea that we’re just awash in light, particles, and gravitational waves from unimaginable numbers of astronomical sources, all the way to the edge of the observable universe. With our eyes, we can see a few thousand of the brightest nearby stars. But with some hard work in instrument design, observation, theoretical calculations, and computer programming, we can capture so much more of that information that’s raining down on us from the skies above.
Learning How To Code
I first learned to program from my favorite high school math teacher. One summer, Mr. Cohen decided to teach me and another one of his students how to code in Python. I have no idea how or why he decided to volunteer his time to do this (other than just being a really great teacher), but I’m really glad he did! Programming always seemed a bit scary and mysterious to me, but this was the perfect way for me to start.
That summer, we met up every week at a local coffee shop, and Mr. Cohen would teach us the basics of programming (functions, loops, and so on). Then over the week, we’d apply them to Project Euler problems. The next week, we’d meet up again, go over the problems we worked on, and learn some new things. I loved this! Each problem taught me something new about how to code. And as the problems got harder, I had to get more careful and resourceful in the way that I approached them. This process of trying, failing, outlining a new approach, trying again, checking Stack Overflow, trying again, debugging, and finally getting the answer has really stuck with me in both science and programming.
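For a flavor of what this looks like, here is a minimal Python sketch of Project Euler's first problem: add up every natural number below a limit that is a multiple of 3 or 5. The choice of problem and the function name `sum_of_multiples` are illustrative assumptions, not a record of what was covered that summer.

```python
def sum_of_multiples(limit):
    """Sum every natural number below `limit` that is divisible by 3 or 5."""
    total = 0
    for n in range(limit):
        if n % 3 == 0 or n % 5 == 0:
            total += n
    return total


# The small case given in the problem statement: 3 + 5 + 6 + 9
print(sum_of_multiples(10))  # 23
```

It only needs a function, a loop, and a condition, which is why it makes a good first exercise. Later problems usually punish this kind of brute force and push you toward a smarter approach.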
How I Use Computer Science Today
Right now, I’m working on the VLA Sky Survey (VLASS): a multi-epoch survey of the entire northern sky with the Very Large Array, which will detect about 2 million radio sources. Most of these sources are quiescent, changing slowly over millions or billions of years. But some of them are brief and brilliant flashes: black holes launching new jets at close to the speed of light, massive stars exploding and plowing into the dense gas that they ejected decades before collapse, and whole new populations of explosions that we might have never seen before.
To search for these explosions, I’ve been running some source extraction code on the radio images produced by VLASS. I cross-match the catalog of VLASS sources against a historical radio survey, looking for sources that have recently appeared or disappeared. To narrow down what might be producing these transient sources, I then cross-match their locations against many catalogs of known astronomical objects. Then, after some vetting to make sure that these objects are real, I do follow-up observations with optical and radio telescopes to identify what they might be. Finally, I interpret these observations using theoretical models, and hope to learn (and publish) something new about astrophysics and the universe!
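The cross-matching step can be sketched roughly as follows. This is a toy illustration, not the actual VLASS pipeline: the catalogs, source names, function names, and the 5 arcsecond match radius are all made up for the example. It flags every source in a new catalog that has no counterpart within the match radius in an older one.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def find_unmatched(new_catalog, old_catalog, radius_arcsec=5.0):
    """Return names of sources in new_catalog with no old_catalog source within the radius."""
    radius_deg = radius_arcsec / 3600.0
    unmatched = []
    for name, ra, dec in new_catalog:
        has_match = any(
            angular_separation_deg(ra, dec, old_ra, old_dec) <= radius_deg
            for _, old_ra, old_dec in old_catalog
        )
        if not has_match:
            unmatched.append(name)
    return unmatched


# Hypothetical (name, RA, Dec) entries in degrees, purely for illustration.
new_epoch = [("A", 150.0000, 2.2000), ("B", 150.1000, 2.3000), ("C", 210.5000, -5.0000)]
old_epoch = [("a", 150.0003, 2.2001), ("c", 210.5000, -5.0002)]

print(find_unmatched(new_epoch, old_epoch))  # ['B']
```

Swapping the two arguments finds sources that have disappeared instead of appeared. Comparing every pair like this is far too slow for millions of sources, so a real search would use a spatial index. Astropy's `SkyCoord.match_to_catalog_sky` is one existing tool for nearest-neighbor matching on the sky.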
Something like 50–70% of my work time is spent writing and debugging code. This time is split into 3 main categories right now: 1) developing the software pipeline for identifying explosions, 2) writing wrappers for pre-existing software packages to turn raw radio interferometer data into usable scientific images, and 3) miscellaneous tasks, such as making plots to visualize my data. In the near future, I’m hoping to improve my pipeline by automating artifact rejection with machine learning (perhaps using a convolutional neural net).
I’m really lucky to have a huge amount of data to play with (courtesy of all of the hard work put in by the staff of the National Radio Astronomy Observatory). There are lots of new discoveries to be made buried just beneath the surface of this data. The first step to making these discoveries is to ask approximately the right scientific question(s). But the next 10 steps are to play around with code until you find something new!
Some of the biggest data rates in the world will be generated by next-generation radio interferometers! The Square Kilometre Array will produce something like an exabyte per day of raw data, which is a substantial fraction of the current global internet traffic. High-performance computing and dedicated clusters of ASICs and FPGAs will be essential for handling that tidal wave of data.
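To get a feel for that number, here is a quick back-of-the-envelope conversion in Python. It takes the rough "exabyte per day" figure at face value and assumes the decimal definition of an exabyte (10^18 bytes), so treat the result as an order-of-magnitude estimate, not a specification.

```python
EXABYTE = 10**18                 # bytes, decimal (SI) definition
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

bytes_per_second = EXABYTE / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 10**12
terabits_per_second = terabytes_per_second * 8

print(f"{terabytes_per_second:.1f} TB/s")    # 11.6 TB/s
print(f"{terabits_per_second:.1f} Tbit/s")   # 92.6 Tbit/s
```

That is more than ten terabytes arriving every second, around the clock.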
Programming is one of the most flexible and useful skills that you can add to your toolkit. With just a little bit of code, you can do routine things so much more efficiently, and with a little more, you might find yourself doing things that you never thought were possible.
What tools and programming languages do I use?
- Python (NumPy and pandas)
- Various astronomy packages (Astropy, CASA, PyBDSF)
- Bash/shell scripting
- HTML/CSS
Outside of work, I like hiking, rock climbing, and cooking vegan food.