Mining Georeferenced Data
As modern technologies gradually permeate our lives, using them becomes second nature and the “real” world naturally extends to include the online one. Even as the cognitive load required to interact with state-of-the-art technologies decreases, the amount of information collected and processed in the background keeps growing. These records provide a unique view of how we interact with these systems and, through them, with each other. In this tutorial we introduce students to tools and techniques designed to harness this wealth of data, with a special emphasis on datasets that reflect human behavior, interactions and collaborations. In particular, we will cover online social networks such as Twitter and Foursquare, collaboration platforms such as Wikipedia and GitHub and, finally, how to scrape and extract information from generic webpages.
Data Mining, Web Scraping
Python 3.5, scrapy, beautifulsoup, tweepy, twitter, foursquare, github, requests, matplotlib basemap, shapefile, geopy
Dr. Bruno Gonçalves
Affiliation: New York University Center for Data Science.
Bruno Gonçalves is a Data Science Fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. He has extensive expertise in using large-scale datasets to analyze human behavior. After completing a joint PhD in Physics and MSc in Computer Science at Emory University in Atlanta, GA, in 2008, he joined the Center for Complex Networks and Systems Research at Indiana University as a Research Associate.
From September 2011 until August 2012 he was an Associate Research Scientist at the Laboratory for the Modeling of Biological and Socio-technical Systems at Northeastern University. Since 2008 he has applied Data Science and Machine Learning to the study of human behavior.
By processing and analyzing large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he has studied how both large-scale and individual human behavior can be observed in an unobtrusive and widespread manner. His main applications have been in Computational Linguistics, Information Diffusion, Behavioral Change and Epidemic Spreading.
He is the author of more than 60 publications with over 4,700 Google Scholar citations and an h-index of 28. In 2015 he was awarded the Complex Systems Society’s Junior Scientific Award for “outstanding contributions in Complex Systems Science.” He is the editor of the book Social Phenomena: From Data Analysis to Models (Springer, 2015) and the author of the forthcoming “Twitterology: The Social Science of Twitter” (Springer, 2018).
Fields of interest: Data Science, Social Networks, Human Behavior.