[ANN - Berkeley py4science] Talk Weds. April 27: Python capabilities on a Hadoop based cluster
A reminder of tomorrow's talk:
* April 28, 2pm: Title: Python capabilities on a Hadoop based
cluster. By Dan Starr, Astronomy, UC Berkeley. To make use of several
Hadoop clusters recently made available, I ported portions of our
Python-based project into runnable Hadoop jobs using Hadoop Streaming
and Cascading. I'll discuss tricks which helped make this possible and
give some comparisons between Yahoo's M45 cluster and an Amazon EC2
cluster using customized Cloudera AMIs. I would also like to give an
overview of Hadoop Dumbo and Python hooks for Hive.
As usual, we meet at the Redwood Center's conference room: 508-20
Evans Hall (5th floor).