Setting up Apache Nutch with ElasticSearch, Naval’s podcast and new books

I made some good progress on my Apache Nutch setup. I finally got Nutch to fetch and parse walmart.com. I also managed to get Nutch to store its index in ElasticSearch. You would think that setting up a basic web crawler using Apache Nutch in 2016 would be easy, a couple of hours’ worth of effort. Turns out it isn’t.

One of the issues I ran into while setting things up was that certain config values have to be specified across a few files for Nutch and HBase to work together correctly. You can grab these config values at https://github.com/balajiathreya/nutch-hbase-config-setup

View post on imgur.com

The above is a screenshot of my local ElasticSearch instance containing an index created by my crawler. The next step is to figure out how to get Nutch to extract and parse specific sections of the web page – in particular, the item name, price and number of items available.
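To make that next step a bit more concrete, here is a rough sketch of the kind of extraction I have in mind. It is not Nutch code (Nutch parsing plugins are written in Java) and it is not how I have set anything up yet; it is just a standalone Python illustration of pulling three fields out of a page. The class names `item-name`, `item-price` and `item-stock` are made up for the example, and I am not assuming anything about walmart.com's real markup.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a "class" attribute value to the field it holds.
# These class names are invented for illustration; a real page will differ.
FIELD_CLASSES = {
    "item-name": "name",
    "item-price": "price",
    "item-stock": "available",
}

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ProductFieldParser(HTMLParser):
    """Collects the text inside elements whose class is in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            # Already inside a captured element: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in FIELD_CLASSES:
                self._current = FIELD_CLASSES[cls]
                self._depth = 1
                self.fields[self._current] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Left the captured element: tidy the text and stop capturing.
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_product_fields(html):
    """Return a dict of the fields found in the given HTML string."""
    parser = ProductFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Calling `extract_product_fields` on a fragment such as `<h1 class="item-name">Widget</h1>` gives back a dict with a `name` entry. Inside Nutch the same idea would presumably live in a parse plugin that adds these fields to the document before it is indexed, but I have not worked that part out yet.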

Finished the book Colorless Tsukuru Tazaki, which I picked up a couple of weeks back. I quite liked it even though it ended with some loose ends not tied up. I gave it 3/5 stars on Goodreads. (A random thought that popped into my head while reading this book – I have read quite a few books by Murakami and I don’t remember any of them ever mentioning the atomic bombings even though the stories take place in Japan – not even a casual, off-hand mention. I thought it was quite weird. Maybe Murakami has indeed mentioned the atomic bombings in his other books that I have not read yet.)

The Rational Optimist

Picked up two new books from my to-read list. The first one is “The Rational Optimist: How Prosperity Evolves” by Matt Ridley. This non-fiction book was recommended in a Tim Ferriss podcast episode with Naval Ravikant, the founder of AngelList. In this podcast, Naval shares his thoughts on life, habits and start-ups; once you get past the first 20-25 minutes, it gets really, really interesting and perceptive. My favorite moments from the podcast:

The best way to prepare for the future in 20 years is to find something you love to do. Build an independent brand around it with your name. Make creative work, so that you stay interesting, you can stay ahead of the game. Anything that is not creative society can replicate and not pay you full value over time, so it’s better always solving new problems and doing new things. Get comfortable with working in a boom/bust fashion where a couple of weeks at a time you can have a lot of work and then a couple of weeks at a time you’re on vacation.

The future will be gradual and then it will be sudden. The best way to prepare is just not to give up your independence in the first place.

At the end of the day, I think you have to work on your internal state until you are free of as many biases and conditioned responses as possible… these are extremely hard skills to build; they are not things you are gonna build by reading one book and ah ha… I don’t believe in the epiphany theory of self development… you read one book, you read a phrase and that’s it… this changes myself… you scrawl it on a paper and look at it for a long time… you make it desktop background… life doesn’t work that way… what you kinda have to do is build skills. I think happiness is a skill, dieting is a skill… skills get built over decades with feedback loop and you keep working on it.

True happiness comes out of peace. And peace comes out of fundamentally understanding yourself. It comes from looking inside yourself.

The act of judging something separates you from that thing. Over time as you judge, judge, judge, you invariably judge people, you judge yourself. You separate yourself from everything and then you end up lonely. That feeling of disconnection, loneliness is what eventually leads to suffering. And then you struggle, you resist the world the way it is. Happiness is the absence of suffering. It comes from peace.

The most important trick to be happy is to realise that happiness is a skill that you develop and a choice that you make. You choose to be happy and then you work at it.

Individual entrepreneurial efforts often fail, but individual entrepreneurs over their careers rarely fail. As long as you can keep taking shots on goal and you keep getting back up, eventually you’ll get through.

It’s only after you’re bored that you’re going to have good ideas. It’s never going to be when you’re stressed or busy or running around or rushed. Make the time. Same way with people. You need to have space in your life where you’re not booked with the people that you already know. You have to be pretty ruthless about saying no to things, about turning people down and leaving room in your life for serendipity.

This podcast became so popular that Tim and Naval met for a second time – I have yet to listen to that one.

The second book I picked up is Dune by Frank Herbert, a popular science fiction book. I started reading it last year, but had to return it since someone had placed a hold on the book and I couldn’t renew it.

My fiancee visited me for the weekend and we spent some time preparing for interviews. I couldn’t help but think that the interviewing dynamics would be quite different, and interviewers would be a lot more empathetic, if they didn’t already know the solutions to the problems.

Other than that, the weekdays were quite uneventful and passed quickly.
