Have you ever wondered whether Uber's dynamic pricing model is smart enough to charge you more when your ride starts in a wealthy neighborhood? To test this hypothesis, I am going to scrape real estate listings and use Python to analyze the relationship between Uber fares and house prices.
If you have ever taken an Uber, you probably know that the cost of a ride is based on distance, wait time, and surge pricing (if an area is busier than usual, Uber charges more; thanks to AI and data, Uber knows what people are willing to pay at a given time). In other words, Uber rides to the airport from within the same zip code should cost around the same. But what if we take housing prices into account?
I scraped 6,000 sold houses in Seattle between the second half of 2018 and the first half of 2019 using Octoparse. Then I mapped each address to an Uber cost estimate.
Scrape the data using Octoparse
Step 1: Scrape list of URLs from Trulia.com
Step 2: Load the List into Octoparse
Step 3: Select extracted data fields from Octoparse
Step 4: Save and Run extraction.
Step 5: Export the file to Excel/JS
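Once the export is on disk, the listings need a little cleanup before analysis. Here is a minimal sketch of that step, assuming the file was saved as CSV with columns named `address`, `zip_code`, and `sold_price`. Those column names are hypothetical (the real ones depend on the fields selected in Octoparse), and the two sample rows are made up.

```python
import csv
import io

def parse_price(text):
    """Turn a scraped price string such as "$1,250,000" into an integer."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return int(float(cleaned))
    except ValueError:
        return None  # e.g. "N/A" or an empty cell

def load_listings(file_obj):
    """Read listings from an open CSV file (column names are assumptions)."""
    rows = []
    for row in csv.DictReader(file_obj):
        price = parse_price(row["sold_price"])
        if price is None:
            continue  # skip listings without a usable sold price
        rows.append({
            "address": row["address"].strip(),
            "zip_code": row["zip_code"].strip(),
            "sold_price": price,
        })
    return rows

# Made-up sample standing in for the real export file.
sample = io.StringIO(
    "address,zip_code,sold_price\n"
    '"123 Example St, Seattle, WA",98121,"$450,000"\n'
    '"456 Sample Ave, Seattle, WA",98122,"$1,250,000"\n'
    '"789 Missing Rd, Seattle, WA",98122,N/A\n'
)
listings = load_listings(sample)
```

With a real export, `load_listings(open("export.csv", newline=""))` would replace the in-memory sample; the file name is a placeholder.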
First, I get the address of every home sold within the past year.
Second, I geocode each address to get its GPS coordinates so that, third, I can feed the coordinates into Uber's estimate API.
Then I get back the low and high estimates of an UberX ride to Seattle-Tacoma International Airport.
Finally, I correlate the estimates with the sold price of the home, which serves as the starting point of the ride.
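The estimate lookup in the steps above can be sketched roughly as follows. This assumes Uber's v1.2 price estimates endpoint, which takes start and end coordinates and returns a `prices` list with `low_estimate` and `high_estimate` per product. Treat the URL, the field names, the airport coordinates, and the sample response as assumptions to check against the current API documentation; the sketch only builds the request URL and parses a response, it does not call the network.

```python
from urllib.parse import urlencode

# Approximate coordinates of Seattle-Tacoma International Airport (assumption).
SEATAC = (47.4502, -122.3088)
# Price-estimate endpoint of Uber's v1.2 API (assumption; verify before use).
ESTIMATE_URL = "https://api.uber.com/v1.2/estimates/price"

def build_estimate_url(start, end=SEATAC):
    """Build the request URL for a ride from `start` to `end` (lat, lng pairs)."""
    params = {
        "start_latitude": start[0],
        "start_longitude": start[1],
        "end_latitude": end[0],
        "end_longitude": end[1],
    }
    return ESTIMATE_URL + "?" + urlencode(params)

def pick_uberx(response_json):
    """Return (low, high) for the UberX product, or None if it is not listed."""
    for product in response_json.get("prices", []):
        if product.get("display_name", "").lower() == "uberx":
            return product["low_estimate"], product["high_estimate"]
    return None

# Illustrative start point and a made-up response in the assumed shape.
url = build_estimate_url((47.6145, -122.3447))
fake_response = {
    "prices": [
        {"display_name": "UberXL", "low_estimate": 48, "high_estimate": 60},
        {"display_name": "UberX", "low_estimate": 30, "high_estimate": 38},
    ]
}
uberx_range = pick_uberx(fake_response)
```

Keeping the URL building and the response parsing separate from the actual HTTP call makes both easy to check offline before spending API quota on thousands of addresses.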
I randomly pick two zip codes that are located right next to Downtown Seattle, namely 98121 (Belltown) and 98122 (Central District).
The ride from those zip codes to the airport via I-5 is around 15 miles. If we find that Uber's estimates for a $400k home and a $2M home differ hugely, we can assume that Uber does charge more in rich neighborhoods. But if the range of the estimates is relatively narrow, the hypothesis doesn't stand, because the housing price is not a factor in the cost of an Uber ride.
Housing prices range from the low $200k's to a high of $2M (a roughly tenfold spread), while the estimated cost of a ride to the airport ranges from $33 to $42 for the high estimate and from $30 to $38 for the low estimate (an $8 to $9 difference).
From the red guide lines, we can see that more expensive houses get a slightly higher cost estimate because they are located near the water, which means a longer ride on local roads before the driver reaches the I-5 freeway. But the price difference is too small to imply that riders from more expensive houses are charged more for an Uber ride.
To conclude from the data analysis, Uber's estimate does not depend on the price of the home at the starting point. Therefore, we cannot reject the null hypothesis. (In other words, the commonly accepted view that the cost of a ride is based on distance, time, and surge pricing still stands.)
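To put the two ranges side by side, here is a quick back-of-the-envelope calculation using only the endpoints quoted above. The variable names are mine; the numbers are the ones from the text, with "low $200k" rounded to $200,000.

```python
# Endpoints quoted in the text.
price_low, price_high = 200_000, 2_000_000
high_estimate_range = (33, 42)  # dollars, high estimate of the ride
low_estimate_range = (30, 38)   # dollars, low estimate of the ride

def spread(lo, hi):
    """Return the absolute and relative width of a range."""
    return hi - lo, (hi - lo) / lo

price_ratio = price_high / price_low          # how many times pricier the top home is
high_abs, high_rel = spread(*high_estimate_range)
low_abs, low_rel = spread(*low_estimate_range)

print(f"home price ratio: {price_ratio:.0f}x")
print(f"high estimate spread: ${high_abs} ({high_rel:.0%})")
print(f"low estimate spread:  ${low_abs} ({low_rel:.0%})")
```

A tenfold jump in home price lines up with a fare spread of roughly a quarter, and that spread still includes any difference in ride distance.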
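One way to check the distance explanation is to compare each home's straight-line distance to the airport with its estimate. The sketch below uses the haversine formula. Both coordinates are approximate and only illustrative, and straight-line distance is a stand-in for the actual driving route, so it will come out shorter than the roughly 15 miles via I-5.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lng) points in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

# Approximate, illustrative coordinates (assumptions, not scraped data).
SEATAC = (47.4502, -122.3088)
BELLTOWN = (47.6145, -122.3447)

straight_line = haversine_miles(BELLTOWN, SEATAC)
print(f"Belltown to Sea-Tac, straight line: {straight_line:.1f} miles")
```

If the small fare differences track this distance rather than the sold price, that supports the reading that location near the water, not wealth, drives the slightly higher estimates.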
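For readers who want a number instead of an eyeball check, the relationship can be summarized with a Pearson correlation and a least-squares slope. This is a minimal sketch in plain Python, not the exact analysis behind the conclusion above, and the five data points are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

def slope(xs, ys):
    """Least-squares slope of y on x (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# Made-up example: sold price in units of $100k, high estimate in dollars.
prices_100k = [4.0, 6.5, 9.0, 14.0, 20.0]
high_estimates = [34, 35, 35, 37, 38]

r = pearson_r(prices_100k, high_estimates)
dollars_per_100k = slope(prices_100k, high_estimates)
print(f"r = {r:.2f}, fare change per extra $100k of home price = ${dollars_per_100k:.2f}")
```

Even a high correlation can go with a tiny slope, which is why the size of the fare difference matters more here than the correlation alone.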
Author: Ashley Ng
Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that empowers companies and businesses with actionable insights. Read her blog to discover practical tips and applications of web data extraction.