Tag Archives: books

Setting up Apache Nutch with ElasticSearch, Naval’s podcast and new books

I made some good progress on my Apache Nutch setup. I finally got Nutch to fetch and parse walmart.com. I also managed to get Nutch to store its index in ElasticSearch. You would think that setting up a basic web crawler using Apache Nutch in 2016 would be easy, a couple of hours’ worth of effort. Turns out it isn’t.

One of the issues I ran into while trying to set things up was that certain config values have to be specified across a few files for Nutch and HBase to work together correctly. You can grab these config values at https://github.com/balajiathreya/nutch-hbase-config-setup

View post on imgur.com

The above is a screenshot of my local ElasticSearch instance containing an index created by my crawler. The next step is to figure out how to get Nutch to extract and parse a specific section of the web page – particularly, the item name, price and number of items available.
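A minimal sketch of that kind of extraction, written as standalone Python rather than as a Nutch plugin. It is not how Nutch does parsing, and the class names `item-name`, `item-price` and `item-stock` as well as the sample markup are made up for illustration; a real product page will use different markup.

```python
from html.parser import HTMLParser

# Hypothetical CSS class names mapped to output fields (assumption, not real site markup).
FIELDS = {"item-name": "name", "item-price": "price", "item-stock": "stock"}


class ProductParser(HTMLParser):
    """Collects the text of elements whose class matches one of FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in FIELDS:
                self._current = FIELDS[cls]
                self.record[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        # Assumes the target elements contain only text, no nested tags.
        self._current = None


def extract_product(html):
    """Return a dict with whichever of name, price and stock were found."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.record.items()}


# Made-up sample page fragment.
SAMPLE = """
<div class="product">
  <h1 class="item-name">Stainless Steel Kettle</h1>
  <span class="item-price">$24.97</span>
  <span class="item-stock">12</span>
</div>
"""
```

Calling `extract_product(SAMPLE)` returns a dict with the three fields as strings. Inside Nutch the same idea would have to live in a parse plugin, which this sketch does not attempt.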

Finished the book I picked up a couple of weeks back, Colorless Tsukuru Tazaki. I quite liked it even though it ended with some loose ends not tied up. I gave it 3/5 stars on Goodreads. (A random thought that popped into my head while reading: I have read quite a few books by Murakami and I don’t remember any of them ever mentioning the atomic bombings, not even in a casual, off-hand way, even though the stories take place in Japan. I thought it was quite weird. Maybe Murakami has mentioned the atomic bombings in his other books that I have not read yet.)

The Rational Optimist

Picked up two new books from my to-read list. The first one is “The Rational Optimist: How Prosperity Evolves” by Matt Ridley. This non-fiction book was a recommendation in a podcast by Tim Ferriss with Naval Ravikant, the founder of AngelList. In this podcast, Naval shares his thoughts on life, habits and start-ups; once you get past the first 20-25 minutes, it gets really interesting and perceptive. My favorite moments from the podcast:

The best way to prepare for the future in 20 years is to find something you love to do. Build an independent brand around it with your name. Make creative work, so that you stay interesting, you can stay ahead of the game. Anything that is not creative society can replicate and not pay you full value over time, so it’s better always solving new problems and doing new things. Get comfortable with working in a boom/bust fashion where a couple of weeks at a time you can have a lot of work and then a couple of weeks at a time you’re on vacation.

The future will be gradual and then it will be sudden. The best way to prepare is just not to give up your independence in the first place.


At the end of the day, I think you have to work on your internal state until you are free of as many biases and conditioned responses as possible… these are extremely hard skills to build; they are not things you are gonna build by reading one book and ah ha… I don’t believe in the epiphany theory of self development… you read one book, you read a phrase and that’s it… this changes myself… you scrawl it on a paper and look at it for a long time… you make it your desktop background… life doesn’t work that way… what you kinda have to do is build skills. I think happiness is a skill, dieting is a skill… skills get built over decades with feedback loops and you keep working on it.

True happiness comes out of peace. And peace comes out of fundamentally understanding yourself. It comes from looking inside yourself.


The act of judging something separates you from that thing. Over time as you judge, judge, judge, you invariably judge people, you judge yourself. You separate yourself from everything and then you end up lonely. That feeling of disconnection, loneliness is what eventually leads to suffering. And then you struggle, you resist the world the way it is. Happiness is the absence of suffering. It comes from peace.


The most important trick to be happy is to realise that happiness is a skill that you develop and a choice that you make. You choose to be happy and then you work at it.


Individual entrepreneurial efforts often fail, but individual entrepreneurs over their careers rarely fail. As long as you can keep taking shots on goal and you keep getting back up eventually you’ll get through.


It’s only after you’re bored that you’re going to have good ideas. It’s never going to be when you’re stressed or busy or running around or rushed. Make the time. Same way with people. You need to have space in your life where you’re not booked with the people that you already know. You have to be pretty ruthless about saying no to things, about turning people down and leaving room in your life for serendipity.

This podcast became so popular that Tim and Naval met for a second time – I’m yet to listen to this one.


The second book I picked up is Dune by Frank Herbert, a popular science fiction book. I started reading it last year, but had to return it since someone had placed a hold on it and I couldn’t renew.

My fiancee visited me for the weekend and we spent some time preparing for interviews. I couldn’t help but think that the interviewing dynamics would be quite different, and interviewers would be a lot more empathetic, if they didn’t already know the solutions to the problems.

Other than that, the weekdays were quite uneventful and passed quite fast.

Weekly update – 3/14

This week was quite a busy week at work – the code I have been working on for the past 2-3 months went live and I was busy preparing for the release, smoke-testing, fixing last-minute bugs and making sure nothing was broken in production. The smoke-testing will go on for a couple more days before the code is fully rolled out. It was quite an interesting few months of work and a great way to begin this year!

My contributions involved:

  1. migrating 3 business rule engine policies (related to managing seller risk) from an old platform to a new platform.
  2. updating the new platform to process these policies (adding new datapoints and code to handle what to do with the outcomes of policies)
  3. and some clever maneuvering (at least, I think so 🙂 ) in a legacy codebase so that sellers can be seamlessly made to go through the new platform instead of the old platform.

As a result, I had little time to read “The Count of Monte Cristo” during the weekdays. I’m somewhere in the middle of this 1400-page book and I hate to not finish a book after picking it up. However, I made good progress with “Colorless Tsukuru Tazaki” – the other book I have been reading. I think I’ll probably finish it sometime in the middle of next week unless it gets too crazy at work.

Interesting reads from this week:

  1. Peter Norvig, the director of research at Google, on making errors: Link
  2. The usual H1B visa discussion you see on Hacker News every couple of months – Link – the anti-H1B sentiment on Hacker News is nothing new, but I can’t deny that I learn new information/perspective every time one of these threads crops up.
  3. StackOverflow’s developer survey – Link. I’ll save you a click – 2015 has essentially been … And referrals continue to be the number one means of getting jobs (Link) – one thing I really wish would change.
  4. Make your own bubble in ten easy steps – Link
  5. turkeypants comments on What phase are you going through on reddit – Link

I also made some progress on my side project with Apache Nutch, but not much – setting it up turned out to be more difficult than I expected. I got Nutch to fetch and parse web pages, but I’m still having issues with storing the index in ElasticSearch. It is probably a couple of hours of work – I need to spend some time reading the docs to figure out what I’m doing wrong. I’m hoping to work on it again next weekend.

Week of 3/7

I’m still trying to plough through The Count of Monte Cristo. I’m currently at page 745 – so, another 700 pages to go. I hope I wrap it up before I leave for India. I also picked up “Colorless Tsukuru Tazaki” by Haruki Murakami. I didn’t like Murakami’s last book 1Q84 that much, but I’m hoping I like this one better.

Finally completed watching House of Cards season 4. I liked season 4 a lot more than season 3 (which I felt was a damp squib). House of Cards has always been about power play. Consequently, a game of who-blinks-first between Frank and Claire Underwood was so much fun…

Played a little bit with Apache Nutch and ElasticSearch for a personal project – hopefully, I will have something to demo next weekend.

Favorite tweet(storm) of this week – an interesting argument against the common belief that multi-tasking is bad for you. If you haven’t heard of this guy before, you should definitely subscribe to his site, ribbonfarm.com, or at least read his series of posts called The Gervais Principle.