Coding for Statistics
By Tiffany Schleeter
My name is Tiffany, and I am a data scientist in Washington, D.C. Unlike other types of scientists, I don’t spend my days in a laboratory surrounded by bubbling test tubes and Petri dishes. I don’t even own a lab coat. My background is in applied statistics. I spend a lot of my time sifting through all sorts of datasets — quantitative and qualitative data, longitudinal studies, functional data, etc. I summarize, diagnose, and assess information from data using statistical methods.
Mountains of data are being generated every day for just about anything you can imagine — computers, websites, wearable devices (like Fitbits), all generate data — and much of this information is available to anyone. But data can be complex, messy, and oftentimes completely overwhelming. Part of my job is to code computer programs that allow me to make sense of large and complex sets of data.
The type of data we have and the questions we try to answer determine the computer software I will use and the statistical methods that I’ll apply. These components can vary from dataset to dataset, which means statisticians need to be fluent in several programming languages. With every degree I’ve earned, and with every job I’ve had, my programming skills have evolved and expanded.
In school, I was always good with numbers. Numbers made sense. When it was time for me to declare a major in college, I made what I thought was the most logical choice — statistics. Although I didn’t major in computer science, it was no surprise to me that my major required a number of CS courses.
C was the first programming language I learned in college. While I don’t use C anymore, it gave me the foundation I needed to learn other languages later. I also studied a language known as R in college and another called MATLAB in graduate school. While I rarely use MATLAB in my profession, I still use R for statistical modeling and for applying data science techniques, such as network theory and functional estimation. SAS is a common programming language that I have used quite a lot, both in graduate school and in several jobs that I’ve had. SAS is useful when dealing with survey-based data, longitudinal data, or large datasets. Over the years, I’ve worked with many other languages, including SQL, Python, and LaTeX.
Data is everywhere, and wherever there is data, there are statistics to be analyzed. Statistics provide insight into how things work and why, or they can help explain why something didn’t work as expected. The remarkable thing about data is that it comes from so many different sources, places, and industries, from public health and finance to defense and innovation. Statistics and computer science help us make sense of data and arm us with the tools we need to make informed decisions.
Tools and programming languages I use:
- SAS
- R
- SQL
Outside of work, Tiffany enjoys yoga, travel, and eating sushi.