SQL for CMDs

Applying SQL For Color-Magnitude Diagrams

This page is meant for students who want to dive into SQL with little background in programming and with too little time to learn the fundamentals. As such, it’s built to offer a taste of practical knowledge and skips the traditional learning curve. The intended audience is those being redirected from the Star Clusters Expedition, but the concepts taught here might be useful for other applications as well.

If you want a deeper and more structured learning experience, please use the SDSS SQL Tutorial; it teaches the nuanced principles which are important for writing your own SQL queries beyond the Star Clusters activity.

Sample Query

We’ll jump right in with a sample query. First, read through it and try to figure out what it means on your own; SQL phrases have been written in capital letters to increase the code’s readability. Then read the explanation below to see if that improves your understanding.

select p.dered_u AS u, p.dered_g AS g, p.dered_r AS r, p.dered_i AS i, p.dered AS z
FROM PhotoObj p, fGetNearbyObjEq(10.750156, 5.334219, 1.3) n
WHERE p.objID = n.objID
AND p.type = 6

Let’s work from the top. The first two rows explain which data we want and where it’s stored. p is the view we’re extracting the data from; we define it in line 2 as an abbreviation for the view PhotoObj. So when the compiler reads p.dered_u it knows to go inside PhotoObj, look for the column titled dered_u, and extract values from there. We then define an alias for each column, so this query will return a table with columns titled u, g, r, i, and z instead of using the default names.

In line 2 we also define n as the table produced by the SDSS function fGetNearbyObjEq, which has inputs of (right ascension in degrees, declination in degrees, radius in arcminutes) and outputs a list of objects which are in that circle of the sky.

Line 3 is where things get interesting, since the WHERE keyphrase allows you to put restrictions on what kind of data you receive. WHERE p.objID = n.objID means the query will output data only for objects which appear in both PhotoObj and fGetNearbyObjEq.

Line 4 continues filtering the data: we know from reading the Schema Browser that PhotoObj has a column titled “type,” which stores different values depending on what kind of object is being observed. p.type = 6 corresponds to stars, so this line tells the compiler to only return data from stars, and not galaxies or other objects.

Try the query out using the SkyServer SQL Search tool.

Graphical Representation

The image captures a (simplified) process of what the compiler does while reading through the query above

We start with the pink boxes, where the compiler figures out which objects (rows) are in both n and p. It does this by finding matching objID values. Then, in the green boxes, it finds all of the columns that we requested data from and ignores the rest; it will store these values using the column names that we defined. Finally, the blue boxes demonstrate that only the objects which have a value of 6 in the “type” column will be extracted to the final output.