Degrees by Academic Division 1985-2011

There always seems to be plenty of discussion in higher ed about shifts in student interest in the academic disciplines and divisions over the years.  The issue has probably taken on a heightened sense of urgency in the last few years with the economic situation, prompting statements about the “death” or “rebirth” of certain disciplines.  So what’s my take on it?  I could offer some lengthy tome, some 1,000-word screed on the subject, but instead…  Check out the p r e t t y  c o l o r s!

The chart above depicts the percentage of degrees awarded at Swarthmore by academic division.  Percentages are based on the number of majors, so graduates with double majors may appear in more than one division if their majors were in different divisions.  (For more info on degrees, head over to the “degrees” section of our Fact Book page).

In addition to having pretty colors, this chart also happens to be very easy to make in R.  In fact, if your data are arranged properly, which you can always do ahead of time in Excel, this chart can be created using one line of code with the ggplot2 package:

library(ggplot2)  # provides qplot()
qplot(Year, Percent, data=mydata, colour=Division, geom="line", main="Degrees by Academic Division 1985-2011")
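In case it helps, “arranged properly” here means “long” format: one row per year/division combination, with columns named Year, Percent, and Division.  Here is a hypothetical example of the shape (the division names and numbers are invented purely for illustration):

# toy data in the "long" arrangement qplot() expects
mydata<-data.frame(
  Year=rep(c(2009,2010,2011),times=2),
  Division=rep(c("Humanities","Natural Sciences"),each=3),
  Percent=c(38,36,35,30,32,33))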

If you are new to R and, like me, you hate worrying about getting the file path right when reading data into R, save your data as a .csv file and use file.choose:

mydata<-read.csv(file.choose())

You could also just highlight the data in Excel, copy it to the clipboard, and then read it into R, being sure to tell R that the data are tab-delimited (and, if you copied the column names, that the first row is a header):

mydata<-read.table(file="clipboard", sep="\t", header=TRUE)

So there you have it: an increase in pretty colors with a minimum of effort, which surely means more time for Angry Birds… er, important stuff.

Speedy PSPP

Yes, someone is using that acronym for their software.  And yes, I promise not to make any bad jokes that reference the early 90s rap song, also with an acronym.  If you’re not sure which song I am referring to, so much the better for you.

PSPP is intended as a “free replacement” for SPSS.  Since I’m not a big user of SPSS, I hadn’t paid PSPP much attention until just recently.  What made me look at PSPP a second time was wanting to quickly open a .sav file (the SPSS native file format) to look at value labels.  We have access to SPSS here at the college, but it is a networked version that can take some time to open; PSPP, on the other hand, is very light and can reside on my machine.  So I decided to give it a try and found that I can open data sets very quickly.

I was so impressed with the speed improvement that I changed the .sav file type association on my machine to PSPP.  Of course, what better way to show one’s appreciation!  Now, keep in mind that I do not use SPSS much at all, and PSPP only offers what they call a “large subset” of the capabilities of SPSS, so it may not be a suitable replacement for the SPSS overachievers out there.  You can also open .sav files in R using the read.spss function in the foreign package, but if, like me, you just want to take a quick look at a file first, PSPP lets you do that.  It also offers the opportunity to work with SPSS files at home, for those of us who aren’t going to want to purchase an SPSS license for the home computer.
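If you do go the R route, here is a minimal sketch of reading a .sav file with the foreign package (the file name is just a placeholder):

library(foreign)  # ships with R; provides read.spss()

# to.data.frame=TRUE returns a data frame;
# use.value.labels=TRUE turns SPSS value labels into R factors
dat<-read.spss("mysurvey.sav", to.data.frame=TRUE, use.value.labels=TRUE)
str(dat)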

If others have PSPP experiences to share, I’d love to hear them!

 

Mapping Student Counties

Photo by Aram Bartholl

We thought it might be interesting to create a map of the home counties of our domestic students.  Since this is something that I have seen done in R and I am always up for trying to sharpen my R programming skills, I thought I would give it a shot.

My first step was to retrieve ZIP codes for all current students from Banner.  I am able to do this using the RODBC package in R, which requires first downloading the Oracle client software and setting up an ODBC connection to Oracle.  Once this is set up, I can connect to Banner, enter my username and password, and then pass a SQL statement to Banner.  Here is the code for this step:

library(RODBC)

# open a connection to the "proddb" ODBC data source
# (this prompts for my Banner username and password)
prod<-odbcConnect("proddb")

# pull ZIP codes for active ('AS') undergraduates in term 201102
zip<-sqlQuery(prod,
paste("select ZIP1 from AS_STUDENT_ENROLLMENT_SUMMARY where TERM_CODE_KEY=201102 and STST_CODE='AS' and LEVL_CODE='UG'"))

# close the connection
odbcClose(prod)

 

This creates an R dataframe called “zip” and closes my RODBC connection to Banner.  The example that I am following uses FIPS county codes, so I will need to prep these ZIP codes for use with a FIPS lookup table by first making sure they are only five digits.  Then I import my FIPS lookup table (making sure to preserve leading zeros) and merge it with the student ZIP codes.  Once I have done this, I can get counts of students in each FIPS code.

# keep only the first five digits of each ZIP (drops ZIP+4 extensions)
zip$ZIP<-substr(zip$ZIP1,1,5)

# read the FIPS lookup as character to preserve leading zeros
fips<-read.csv("C:/R/FIPSlookup.csv",
colClasses=c("character","character"))

# attach a FIPS county code to each student ZIP
m<-merge(zip, fips, by="ZIP")

# count students per county
fipstable<-as.data.frame(table(m$fips))
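One thing worth knowing if you are following along: table() names its output columns Var1 and Freq, which is where the Var1 in the matching step below comes from.  A quick peek confirms the structure:

head(fipstable)   # Var1 = county FIPS code, Freq = number of students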

Now I can proceed with the example that I am using.  This example comes from Barry Rowlingson by way of David Smith’s “Choropleth Map Challenge” on his excellent, all-things-R blog.  I chose this method because it does not rely on merging counties by name, but instead uses FIPS codes – which we now have thanks to the steps above.

Then I use the “rgdal” package to read in a US Census shapefile (available here), prep the FIPS codes in the shapefile, match them with our student counts, and assign zeros to counties with no students:

library(rgdal)

# read the Census county boundary shapefile from the C:/Maps folder
county<-readOGR("C:/Maps","co99_d00")

# build the five-digit FIPS code from the state and county fields
county$fips<-paste(county$STATE,county$COUNTY,sep="")

# look up each county's student count from our table
m2<-match(county$fips,fipstable$Var1)
county$Freq<-fipstable$Freq[m2]

# counties with no students get zero rather than NA
county$Freq[is.na(county$Freq)]<-0

Following Rowlingson, we use the “RColorBrewer” package and his own “colourschemes” package to get the colors for our map and associate them with counts of students.  We then set up the plot region with blank axes and draw the counties:

require(RColorBrewer)
require(colourschemes)

# six sequential reds, assigned to student-count breakpoints
col<-brewer.pal(6,"Reds")
sd<-data.frame(col,values=c(0,2,4,6,8,10))
sc<-nearestScheme(sd)

# set up an empty plot region covering the continental US,
# then draw the counties shaded by student count
plot(c(-129,-61),c(21,53),type="n",axes=F,xlab="",ylab="")
plot(county,col=sc(county$Freq),add=TRUE,border="grey",lwd=0.2)
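If you would rather write the map to an image file than view it on screen, you can wrap the two plotting calls in a graphics device (the file name and dimensions below are arbitrary):

png("county_map.png", width=1200, height=800)
plot(c(-129,-61),c(21,53),type="n",axes=F,xlab="",ylab="")
plot(county,col=sc(county$Freq),add=TRUE,border="grey",lwd=0.2)
dev.off()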

Click the thumbnail below to see the resulting map:

As you can see, the map is pretty sparse, as you might expect with 1,531 students spread across 325 different counties.  This represents only a first pass at trying this, so there will be more to come, possibly a googleVis version.  If others have had success with the above approach, we would love to hear about it in the comments!
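For the curious, here is a rough sketch of what a googleVis version might look like.  As far as I know, Google’s GeoChart does not shade US counties directly, so this rolls the county counts up to states first (it assumes the fipstable created above, plus the state.fips lookup in the maps package, which covers the continental US only):

library(googleVis)
library(maps)   # for the state.fips lookup table

# roll county counts up to two-digit state FIPS codes
fipstable$stfips<-as.numeric(substr(as.character(fipstable$Var1),1,2))
bystate<-aggregate(Freq~stfips, data=fipstable, FUN=sum)

# convert state FIPS codes to the state names GeoChart understands
data(state.fips)
bystate$abb<-state.fips$abb[match(bystate$stfips, state.fips$fips)]
bystate$State<-state.name[match(bystate$abb, state.abb)]

geo<-gvisGeoChart(bystate, locationvar="State", colorvar="Freq",
  options=list(region="US", displayMode="regions", resolution="provinces"))
plot(geo)   # opens the interactive chart in a browser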

To get more info about the geographic distribution of our students (both international and domestic), check out the “enrollments” section of our Fact Book page here.

The R syntax highlighting used in this post was created using Pretty R, a tool made available by Revolution Analytics.

Planes Over Swarthmore

Photo by AV8PIX

Institutional research offices are typically known as “clearinghouses” for information on their campuses.  Well, this morning I am proud to say that with WolframAlpha’s help, we are able to start tracking yet another important higher ed metric:  planes overhead.

If you enter “planes overhead” into the WolframAlpha search box, you will see a listing of planes flying over the location of your IP address.

Searching from my office on campus, I can see 5 planes flying over Swarthmore right now, including a NetJets flight at 15,000 feet.  Maybe Roger Federer asked his pilot if he could take a closer look at the Adirondack chair!

You can read more about this feature on WolframAlpha’s Tumblr.

Hello, Internet!

Photo by dougwoods

This being the inaugural post for my half of the new blog, I should begin by talking about my approach to blogging.  Or at least what I think my approach will be – as you can see from the title of this post and the picture of the cat, I am new to the internet, or at least blogging.  Heck, I don’t even “have the facebook”!

My favorite blogs are usually those of the how-to variety, and I often enjoy them most when the blogger is learning the skill alongside the reader.  I hope to emulate this style – which shouldn’t be difficult, since I am hardly an expert in any of the tools that I use.

If I am going to share what I learn (and learn from others) in this blog, I should first introduce the tools that I use:

We use SAS quite a bit in this office.  I’ve been told by more than one person that SPSS is pretty much de rigueur in our industry (institutional research), but I prefer something that is a data management program first and a statistics program second.  I am also able to produce professional-looking tabular output to a spreadsheet or PDF very quickly with SAS by looping through procedures and items with the macro facility.  I know this can be done with other tools, R for example, but in my opinion R does not have the best options for producing tabular reports.

Having said that, I do love R and use it quite a bit, and I believe it has many other advantages.  For example, we do not have a license for SAS/GRAPH in this office, but even if we did, R still has superior visualization capabilities (IMHO).  If you are curious, or if you don’t believe me, all you need to do is visit this site.  I use the lattice package to create “small multiples” graphs on a regular basis, and I plan on doing more visualization work with ggplot2.  In addition, I use R with an ODBC connection to pull data from Banner (an Oracle database) and for other data analysis tasks that would require purchasing an entire additional module in SAS or SPSS – time series or data mining, for example.
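To give a taste of the lattice approach, here is a minimal “small multiples” sketch (the data frame is hypothetical, with one row per year/division combination and made-up numbers):

library(lattice)

# toy data: percent of degrees by division over five years
mydata<-data.frame(
  Year=rep(2007:2011,times=3),
  Division=rep(c("Humanities","Natural Sciences","Social Sciences"),each=5),
  Percent=c(36,35,37,34,33, 31,32,30,33,34, 33,33,33,33,33))

# one panel per division -- the "small multiples" layout
xyplot(Percent~Year|Division, data=mydata, type="l", layout=c(3,1))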

This has less to do with any specific piece of software or programming language, but we are also heavily involved in survey research on campus, and we are lucky enough to have the web survey tool LimeSurvey (branded SwatSurvey here) available to us.  Robin and I have been getting more proficient at using it every day.  But, in general, there is always something new to learn about the design, administration, and analysis of surveys.

So I hope to have more to share (and learn) about these tools in this space soon!  And of course I hope to harness the power of the internet to hear from others using these or similar tools along the way.