- Data Science: Pandas, Scikit-learn, Gensim, NLP, NLTK, Machine Learning, Topic Modeling, Data Analysis, Writing Reports, Data Exploration, Feature Engineering, Google BigQuery
- Programming: Python, Clojure, Bash, SQL, Java, Test Driven Development (TDD), Open Source
- Amazon Web Services: EC2, RDS, S3, Workspaces, VPC
- Developer Tools: git, Bitbucket, GitHub, vim, emacs, Leiningen, tmux, pip, conda, virtualenv, Docker
- Operating Systems/Databases: Ubuntu, CentOS, Arch Linux, PostgreSQL
Consultant at Google, Taos (September 2017 – Present)
Contributing to a number of projects within a team at Google:
- Improving systems alerting for use by operations and support teams. This involves using BigQuery (SQL) and Python to analyze and model hundreds of terabytes of systems data, then using the resulting insights to guide changes to the alerts. I also help create and improve the alerting tools and infrastructure.
- Teaching a Python course to 21 members of the team.
- Creating chatbots and RESTful APIs using Python.
- Writing software to automate processes done by operations and support team members.
Consultant (January 2015 – Present)
Built software solutions to business problems for a variety of companies, including:
- Created legal discovery software that retrieved case-winning documents from a collection of over 500,000 in under 2 days, leading to the recovery of more than $4,000,000 in damages in a court case whose initial target was $250,000. This software package (BakerStreet) has since been used in subsequent cases.
- Applied Python topic modeling algorithms (Latent Dirichlet Allocation and Latent Semantic Indexing) to automatically create indexes and topics for Hebrew telephone transcripts, research content from different departments of the University of Singapore, and a variety of corporate documents.
- Managed a team and, based on input from a domain expert, defined a labeling scheme for document sets from past financial fraud cases for use in supervised machine learning.
- Constructed an economic forecast model that was used in court to estimate the monetary impact caused by defendants.
- Built development and testing environments using Docker for various projects.
- Set up AWS infrastructure (large EC2 instances, RDS (PostgreSQL) and S3) to run experiments and manage applications. Used Workspaces to enhance the security of the EDiscovery (Electronic Discovery) system.
- Developed expertise in the legal compliance requirements for building an EDiscovery product on a limited budget, a product that included novel features in addition to many of those found in competitor systems.
- Designed a domain-specific language (LISP-like Structured Query Language) for querying a large collection of documents, with PostgreSQL as the back end.
- Created a virtual file system (VFS) in PostgreSQL using its ltree data type and wrote optimized SQL queries for operations on the VFS.
- Created dashboards backed by SQL queries that gave users real-time analytics on how each user's actions were affecting the system.
- Created software manuals, documentation and guides for technical and non-technical users.
- Helped clients turn vague requirements into well-defined problems and implement sensible solutions.
Pandas Python Package for Data Analysis, Contributor (May 2017 – Present)
Improved the string representation of all Index objects. Improved the documentation, making it easier for others to contribute.
Toolz Python Package, Contributing Author (April 2016 – Present)
Wrote the code, documentation and tests for the random_sample function. Advised other potential contributors.
text2math, Creator (February 2016 – Present)
A package designed to be used for demonstrating basic Natural Language Processing (NLP) feature engineering in Python; constructed test environments using Docker.
SPEAKING AND COMMUNITY ENGAGEMENTS
- Practical Packaging For Machine Learning Solutions, BayPIGgies - Data Science Night (2017), PyTexas Conference (2017) - Suggested ways in which standard Python packaging tools can improve how we share and deploy Machine Learning projects. Presented to a total of ~300 people.
- Teaching Assistant and Volunteer, Python Regular Expressions Tutorial, PyCon (2017)
- toolz Lightning Talk, San Diego Python Users Group (2016) - Introduction to the toolz package for Python.
- text2math Talk, DFW Pythoneers Meetup (2016) - Demonstrated how to write a simple Python program for natural language feature extraction.
- Instructor (2016 – Present) - Taught programming and software development in Java and Python to adults and teenagers through classes and individual mentorship
- Volunteer poll worker, US general election (2016)
- Clojure Standard Library Series (2016 – Present) - A series of posts about the Clojure standard library
- Organizing Committee, PyData Dallas (2015) - Organized speakers and fundraising for the 2015 PyData Dallas conference
Single board computer cluster (Hadoop, Spark, MPICH), Creator
A learning cluster for experimenting with MPICH and the Hadoop ecosystem; deployed Hadoop, Apache Spark and MPICH on a cluster of ODROID-C1s running Arch Linux ARM.
Texas A&M University, Bachelor of Science in Economics (August 2011 – May 2015)