Data Visualization

Putting US Tax Reform on the Map

Major income tax changes in the US proposed by Republicans have been big news recently.  The impact will be significant for nearly every taxpayer, creating winners and losers in varying degrees. The complexity of the Federal tax system and numerous variables when filing returns result in no one-size-fits-all formula to assess the impact on any given wallet.

We will start with the most basic taxation scenario, simplify it some more, and present some findings on a map like this one:


The savings are highest in certain Northeast and Mid-Atlantic states, hovering around $600 per year.  This correlates with higher median wages in the region.  In the South and Midwest, it’s less. The savings are under $300 in Mississippi, where the median annual pay of $24,000 is about $11,000 less than in the high-wage states.  But you should ask, who are the people on this map anyway?

Each one is a composite person between the ages of 18 and 65, not married with no dependents, earning the median wage for the state they live in.  They also rent their home, and they get all of their income as an employee earning a wage.  State income taxes don’t come into play at any of the median wages used here.  All players on the board take the standard deduction under both existing and proposed rules.
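To make the arithmetic behind this scenario concrete, here is a minimal sketch of a renter's savings calculation.  The deduction and bracket values are illustrative assumptions, not taken from the Kaggle notebook: the current-rules threshold is set to the $10,650 that this post treats as non-taxable, and the proposed values are one plausible reading of the proposals.  The names `tax_owed` and `savings` are hypothetical.

```python
def tax_owed(taxable_income, brackets):
    """Progressive tax: each rate applies only to income inside its band.

    `brackets` is a list of (upper_limit, rate) pairs in ascending order.
    """
    if taxable_income > brackets[-1][0]:
        # This sketch only covers the bands listed; extend them for higher pay.
        raise ValueError("income exceeds the highest bracket in this sketch")
    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if taxable_income <= lower:
            break
        tax += (min(taxable_income, upper) - lower) * rate
        lower = upper
    return tax


# ASSUMED, illustrative rules for a single filer with no dependents.
# "deduction" is the total amount of income shielded from tax.
CURRENT_RULES = {"deduction": 10_650, "brackets": [(9_525, 0.10), (38_700, 0.15)]}
PROPOSED_RULES = {"deduction": 12_000, "brackets": [(9_525, 0.10), (38_700, 0.12)]}


def savings(wage, current=CURRENT_RULES, proposed=PROPOSED_RULES):
    """Tax under current rules minus tax under proposed rules."""
    def owed(rules):
        taxable = max(0.0, wage - rules["deduction"])
        return tax_owed(taxable, rules["brackets"])
    return owed(current) - owed(proposed)


if __name__ == "__main__":
    for wage in (24_000, 35_000):
        print(wage, round(savings(wage), 2))
```

With these assumed values, a $24,000 wage saves a little under $300 and a $35,000 wage saves roughly $600, which is in line with the ranges described above.  Swapping in different rule dictionaries is all it takes to try another proposal.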

This is about as simple as it gets, other than not needing to file taxes at all.  I should also note that incomes below $10,650 were excluded from the dataset before calculating the median, since income under that amount is not taxable.  All of this data was distilled from the 2013 American Community Survey, courtesy of the US Census Bureau.  About 261,000 respondents all told.  Real wages haven’t exactly skyrocketed in recent years, so 2013 vintage income data is good enough for this purpose.
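The filtering step described above can be sketched in a few lines.  This is only an illustration of the idea, assuming the survey has already been reduced to (state, wage) pairs; the function name and the sample records are made up, and the actual notebook may be structured differently.

```python
from collections import defaultdict
from statistics import median

NON_TAXABLE_THRESHOLD = 10_650  # cutoff quoted in this post


def median_wage_by_state(records, threshold=NON_TAXABLE_THRESHOLD):
    """Return {state: median wage}, ignoring wages below the threshold."""
    wages = defaultdict(list)
    for state, wage in records:
        if wage >= threshold:
            wages[state].append(wage)
    return {state: median(values) for state, values in wages.items()}


if __name__ == "__main__":
    # Made-up sample records, not survey data.
    sample = [("MS", 8_000), ("MS", 20_000), ("MS", 24_000), ("MS", 30_000),
              ("MA", 5_000), ("MA", 35_000)]
    print(median_wage_by_state(sample))
```

Dropping the low incomes before taking the median matters: leaving them in would pull each state's median down toward wages that owe no tax under either set of rules.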


Things change when homeownership enters the picture.  For this scenario we will deduct 30% of the median income for mortgage interest and property taxes, as well as factor in state income or sales taxes (this applies to every state except Alaska).  Now the tax savings are greatly reduced under the new rules, and in fact some would pay more under the new system.  Some of this impact on homeowners can be attributed to the elimination of the state tax deduction under the proposed system.  Also, unlike for non-homeowners, the tax changes are less favorable in the high-wage Northeast and Mid-Atlantic states than elsewhere.  The loss is as much as $226 per year in Massachusetts at the median wage level.
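A rough sketch of how the homeowner's taxable income differs between the two systems is shown below.  It assumes the filer takes whichever is larger, the itemized total or the standard deduction, and that the proposed rules simply drop the state tax from the itemized total.  The function names and the dollar amounts in the example are hypothetical, and brackets and exemptions are left out on purpose.

```python
def taxable_income(wage, itemized, standard_deduction):
    """Subtract the larger of the itemized total and the standard deduction."""
    return max(0.0, wage - max(itemized, standard_deduction))


def homeowner_taxable(wage, state_tax, std_current, std_proposed,
                      housing_share=0.30):
    """Return (current, proposed) taxable income for a homeowner.

    housing_share is the fraction of wage assumed to go to mortgage
    interest and property taxes (30% in this post's scenario).
    """
    housing = housing_share * wage
    # Current rules: housing costs and state tax are both deductible.
    current = taxable_income(wage, housing + state_tax, std_current)
    # Proposed rules (as assumed here): the state tax deduction is gone.
    proposed = taxable_income(wage, housing, std_proposed)
    return current, proposed


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(homeowner_taxable(40_000, 2_000, std_current=6_500, std_proposed=12_000))
```

In this made-up case the taxable income rises under the proposed rules, because the lost state tax deduction outweighs the larger standard deduction.  That is the mechanism by which some homeowners can end up paying more.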


This was done with Python code in Kaggle, where there are also interactive versions of the above maps.  The homeowner map was generated here.  Details about the source data and how it was processed can be found in these links as well.  I am hoping this will inspire others who enjoy analyzing data to try it out with other tax scenarios.  Examples include a married couple, a household with dependent children, and different income brackets.  I believe more maps of this sort will arm taxpayers in the US with a better understanding of any new tax law proposals.


Geospatial, MongoDB

Import GeoJSON Data Into MongoDB

If you want good support for location data and maps in your database, MongoDB can be a great choice.  One of the most widely supported formats on the Web for geospatial data is GeoJSON, and MongoDB location data looks and behaves pretty much exactly like GeoJSON.  This is very cool if you want to create a map using a JavaScript library like Leaflet, or run common types of geospatial queries on your data.
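To show how close the two formats are, here is a small sketch that builds a GeoJSON point feature and a MongoDB `$near` filter as plain Python dictionaries.  The helper names and the sample coordinates are made up for illustration, and the field name `geometry` is an assumption about how the documents are stored.

```python
def point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature for a point (longitude first, then latitude)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def near_query(lon, lat, max_meters, field="geometry"):
    """Build a MongoDB filter for documents within max_meters of a point."""
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,
            }
        }
    }


if __name__ == "__main__":
    print(point_feature(-71.06, 42.36, name="example"))
    print(near_query(-71.06, 42.36, 500))
```

With PyMongo, a filter like this would be passed to `collection.find(...)`.  Note that the geometry inside the query is itself a GeoJSON object, and that a `$near` query needs a geospatial index on the queried field.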

This is all pretty awesome, until you find or generate some GeoJSON data in a file outside of MongoDB and decide you want to import it.

For some reason, the native mongoimport tool, which imports JSON just fine (well, mostly, anyway), falls to pieces when dealing with GeoJSON.  There are multiple ways to hack your way through this, as any Web search will tell you, but they generally involve opening your HUGE GeoJSON file in a text editor to remove some stuff (or using a command line tool for this), then running mongoimport on the cleaned-up file.  I went ahead and created a Python script that does the cleanup and import in one step.  The script is also fast at importing, as it leverages the bulk-write operations available in MongoDB 3.2 and later.  This is a 10x improvement over using normal inserts with the PyMongo driver and is very noticeable with large GeoJSON files.  The script tops things off at the end by creating a 2dsphere index on your collection.  Without this index, you can’t run geospatial queries.
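The general shape of such an import can be sketched as follows.  This is not the script from GitHub, just a minimal illustration that assumes the file is a GeoJSON FeatureCollection and that each feature becomes one document.  It uses PyMongo's `insert_many` in batches as a stand-in for the bulk-write approach, and the function names are hypothetical.

```python
import json
from itertools import islice


def load_features(path):
    """Read a GeoJSON file and return its list of features."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data["features"]


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_features(collection, features, batch_size=1000):
    """Insert features in batches, then index the geometry field.

    `collection` is expected to behave like a PyMongo collection.
    """
    total = 0
    for batch in batches(features, batch_size):
        # ordered=False lets the server continue past a failed document.
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    # The 2dsphere index is what enables geospatial queries.
    collection.create_index([("geometry", "2dsphere")])
    return total
```

With PyMongo installed and a local server running, `collection` could be something like `MongoClient()["geospatial"]["points"]`.  Sending documents in batches rather than one at a time is where most of the speedup comes from, since it cuts the number of round trips to the server.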

My script is freely available on GitHub, and to run it you could just do this:

python -f points.geojson -d geospatial -c points

This assumes you are running against a local instance of Mongo.  There are additional parameters for host name and user credentials, along with other things to know about the script, in the README.

So there you have it, a complete, fast, easy solution to importing your GeoJSON data into MongoDB!

In addition, if your geospatial data happens to be in shapefile format, I have a similar tool for importing that into MongoDB.