e-Commerce with Hybris: 10 Million Product Catalog?

Published on February 22, 2013 (updated February 25, 2013) by Yakov Fain

Last month a prospective customer called our office.
“We know that you have a team of software developers that build e-commerce applications with Hybris software. Can you help us develop our online store?”
“Sure we can – we have solid expertise in developing e-commerce applications with Hybris software.”
“But our store will have a pretty large catalog: ten million products.”
“We haven’t had a chance to develop online stores with more than a million products. We know how to approach such a project to minimize your risk, but so far we haven’t deployed such an application in production.”

After this conversation we never heard from this person again. We know why: he was looking for a different answer, like “Sure, we did it before and will do it again!” This is not to say that creating a responsive online store with 10M products is impossible with Hybris, but we haven’t done it. In any online store, product catalogs have to be re-indexed from time to time, for example when the store adds a new line of products. How much time is required to re-index a store with 10M products in Hybris? What about importing the CSV product data and synchronizing it? We simply don’t know.
Hybris does several passes over these ImpEx files to resolve referential integrity; in fact, many passes. It may take a day and finish, or take a day and break and start over. Then either a Solr (Lucene) or an Endeca full-text search index has to be built. Then there is the task of synchronizing the staging catalog with its production version. Indexing itself is not Hybris-specific, but the import and synchronization are. Hybris has yet to show that it’s a high-performing solution and to publish the appropriate case studies.
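Why multiple passes get expensive is easier to see in a toy model. The Java sketch below is a hypothetical illustration, not the actual ImpEx algorithm: the `Row` class and the `importWithPasses` method are invented for this example. Each pass imports every row whose reference is already known and defers the rest, so rows listed in reverse dependency order need one pass per link in the chain.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of multi-pass reference resolution.
// This is NOT the actual Hybris ImpEx algorithm.
public class MultiPassImport {

    /** A hypothetical import row: its own code plus an optional reference to another row's code. */
    static final class Row {
        final String code;
        final String refCode; // null when the row references nothing

        Row(String code, String refCode) {
            this.code = code;
            this.refCode = refCode;
        }
    }

    /**
     * Scans the pending rows repeatedly. A row is "imported" (its code is added to
     * {@code imported}) once its reference is known; otherwise it is deferred to the
     * next pass. Stops when all rows are imported or a pass makes no progress
     * (a dangling reference). Returns the number of passes performed.
     */
    static int importWithPasses(List<Row> rows, Set<String> imported) {
        List<Row> pending = new ArrayList<>(rows);
        int passes = 0;
        while (!pending.isEmpty()) {
            passes++;
            List<Row> deferred = new ArrayList<>();
            for (Row row : pending) {
                if (row.refCode == null || imported.contains(row.refCode)) {
                    imported.add(row.code); // reference resolved: import the row now
                } else {
                    deferred.add(row); // target not imported yet: retry on the next pass
                }
            }
            if (deferred.size() == pending.size()) {
                break; // nothing resolved in this pass, so the remaining rows can never resolve
            }
            pending = deferred;
        }
        return passes;
    }

    public static void main(String[] args) {
        // Rows listed in reverse dependency order: P5 -> P4 -> P3 -> P2 -> P1.
        List<Row> rows = new ArrayList<>();
        int n = 5;
        for (int i = n; i >= 1; i--) {
            rows.add(new Row("P" + i, i > 1 ? "P" + (i - 1) : null));
        }
        Set<String> imported = new HashSet<>();
        int passes = importWithPasses(rows, imported);
        // prints: imported 5 rows in 5 passes
        System.out.println("imported " + imported.size() + " rows in " + passes + " passes");
    }
}
```

Every pass rescans all pending rows, so in this worst case the work grows roughly quadratically with the length of the reference chain. Sorting rows by dependency first would bring it down to a single pass; whether and how Hybris does anything like that is not covered in this post.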

In fact, if you do not provide indexing information, importing a mere half million records may never finish on a Hybris server. We do not have any metrics for the synchronization process yet, but the good news is that Hybris is built on Java servers, and careful clustering, caching, and fine-tuning of the database will produce an acceptable solution for large inventories. Wall Street applications written in Java handle huge amounts of data in a timely fashion. But Wall Street managers understand that they have to hire the right people. Large-scale e-commerce projects have the same level of complexity.

We’d be happy to set up a lab (parallel merging while loading, finding deltas, proper Java clustering, stress tests) with the appropriate hardware and create a pilot to answer these questions and optimize the process to get rid of the bottlenecks, but this prospective customer is gone.

What will happen next? I’ll tell you: the déjà vu of IT consulting. Some brave salesman from another consulting company will explain to this customer that working with large product catalogs is their bread and butter, and will win this project. Six months down the road the customer will see a lot of hours billed to the project and detailed explanations of “unforeseen circumstances”, followed by new promises.

We’ve seen this scenario several times, and it doesn’t depend on the technology in use. Today it’s e-commerce; six years ago I wrote a blog post about a similar scenario, but that time it was about redesigning a portal for a major publisher. Software developers hate working on such projects. They don’t know that the customer has these unrealistic expectations because of some over-promising salesman.

What’s the moral of this story? We value a moral dimension to consulting, which costs us dearly.

10 thoughts on “e-Commerce with Hybris: 10 Million Product Catalog?”

Saul says:

February 23, 2013 at 11:38 am

It is a rare and amassing notion nowadays. Today a lot of salesmen are nothing more than showmen. I bow my head.

Reply
Saul says:

February 23, 2013 at 11:45 am

P.S. I meant amazing 🙂

Reply
ssamayoa says:

February 25, 2013 at 3:02 pm

Time after time, our company’s sales department gets us into trouble with their “yeah, we can do that!”

And it turns out that we HAVE to do it, but at a very high cost (time and money). The worst part is that it is normally sold as a turn-key solution, with no possibility of recovering the extra cost.

I do freelance work apart from my day job and have lost a lot of time because we tech people are normally frank, and most customers don’t value such think.

Reply
1. ssamayoa says:
  
  February 25, 2013 at 3:03 pm
  
  Urggg, autocomplete!
  
  I mean thing!
  
  Reply
2. Yakov Fain says:
  
  February 25, 2013 at 3:15 pm
  
  The main goal of many enterprise managers is CYA. When they hire vendor XYZ, they want to hear (and see in writing) that XYZ can do it. It’s even nicer for them if XYZ is on their enterprise’s list of approved vendors. If the project fails six months down the road, the enterprise manager will say, “We selected an approved vendor, but they overpromised and underdelivered. They are guilty.” The project has failed, the money spent will be written off by accountants, and corporate America will keep moving on!
  
  Reply
Thomas Hertz says:

February 27, 2013 at 3:11 pm

Yakov,

Thanks for your post. I stumbled across it today, and I think I can add some more information and feedback about hybris and how it performs with large data sets.
I have been with hybris from the beginning, I was the main architect of the hybris stack, and I am a true technical person; that’s where my heart and passion are.

We have several customers in the 10M SKU range and many dozens in the 1-9M range.
I recently talked to a hybris customer who is live (and happy) with 34 million SKUs in the system, 10M of them in the online catalog, with a quite significant update rate.
I am not talking about rows in the database; I am referring to actual articles you can buy. This easily translates to 100 million DB rows (multiple languages, attributes, …).
Setting up and configuring large-scale systems of this kind is of course not an out-of-the-box, one-click installation; it truly is an enterprise setup.
But it’s still the same hybris platform code that powers the 100,000-SKU demo system on my notebook.

We’ve also done several performance tests on large-scale hardware in dedicated benchmark centers, and, when the platform is used correctly, I am still truly amazed by its performance and speed in handling large data sets.
And I am talking specifically about our ImpEx import, the catalog handling, and also how we do indexing and searching with Solr/Lucene.

Our sales/marketing people might not be happy to hear this, but to be honest, there is still a lot of potential for improving, e.g., how we do catalog synchronization, handle DB batches/transactions, utilize the Java heap, etc.
But I know how we do O/R mapping, caching, and clustering, from the depths of the Java code in our core classes to the customers using it in the field, and I can tell you that all (OK, almost all) of our more sales/marketing-flavoured statements are not exaggerated.

I am happy to take on any challenge from competitors.

Thomas Hertz, SVP Technology, hybris

Reply
1. Yakov Fain says:
  
  February 27, 2013 at 4:01 pm
  
  Thomas, thank you for your comment. As a Java enterprise architect of 15 years, I’m a strong believer in this platform. Java application servers are very capable and can handle huge amounts of data in a timely manner. I’m sure Hybris has talented, professional engineers who will address whatever technical issues you may face.
  
  The main message of my blog post was that complex tasks should be approached in a professional manner by people with the right qualifications. But I’ll be honest with you: I find the Hybris ecosystem different from what I’m used to. The main power of the Java platform is that it’s open. Hybris Software decided to live in a closed community. IMO, this hurts the product. No matter how good your engineers are, you can’t beat the collective brain of the Java community. I realize that there are some commercial secrets to protect, but not to the extent that not a single book has been published about the Hybris e-commerce solution. I was surprised to see no printed manual at a recent training I attended.
  
  We have about 20 software developers working on several Hybris projects, and we maintain a Skype chat where people help each other find technical solutions. In some tough cases I ask them, “Have you posted this question on the internal Hybris forum?” They say that unless it’s a simple question, no one will answer it there. Do you have a policy encouraging senior technical engineers to monitor and answer all the questions? Where is the best practices document to help developers (other than the one on flexible search)?
  
  I’m used to working in highly technical Wall Street organizations that develop applications processing tons of financial data. They also have some secrets to hide, but they are a lot more open than Hybris in sharing how to accomplish complex architectural goals in Java. Not to mention Goldman Sachs, which even created its own Java collections and contributed them to the community.
  
  Anyway, I like Hybris Software and wish you guys all the best no matter how you decide to run your business.
  
  Please do me a favor – I’ve asked one of our developers to post a technical question as a comment in this blog – please find the right person to answer it.
  
  Reply
  1. Thomas Hertz says:
    
    February 27, 2013 at 5:27 pm
    
    thanks Yakov.
    
    yes, we also see the knowledge scalability challenge. you’re right.
    in the past we even thought about releasing our tech stack, or parts of it, as open source.
    currently we are going with a somewhat ‘in-between’ approach, where we give selected partners access to our source code in a ‘read-only/not compilable’ way.
    
    but in the end we are a commercial company, and we certainly want to protect one of our biggest assets: our business logic/source code. I doubt you’ll get the complete IBM WebSphere stack source code. ;)
    But what I completely agree with is that we need to strengthen our documentation/trails and also the feedback from and to developers.
    
    Releasing the non-commerce-related areas where we do not see a huge differentiator for us (e.g. our caching framework) as open source, or even better, just switching to an open-source solution, is definitely on our agenda.
    
    We will release new forum software soon, we will definitely invest significantly more (people/money) in technical documentation this year, we’ll add online training and best-practice courses, and we will further establish the already existing hybris certification program.
    
    long story short: I agree with your points that we have to be more open to the community to be able to scale and spread hybris knowledge.
    
    having a lot of hybris knowledge is very valuable these days. good for those who have it, but I would rather see it spread much more widely.
    
    thanks for sharing your thoughts on that.
    
    thomas
    
    Reply
Valery says:

February 27, 2013 at 4:03 pm

Thomas,

Short question: Hybris uses double/java.lang.Double for monetary values. This is against best practices for handling financial data (example: http://epramono.blogspot.com/2005/01/double-vs-bigdecimal.html). Rounding errors accumulated over catalogs with 10M SKUs (which easily translates to 100+M orders) will be huge. Is there any explanation?
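The concern can be made concrete with a small experiment. The following Java sketch is not hybris code; the class, the method names, and the sample prices and rate are hypothetical. It adds a 0.10 price ten times as a `double` and as a `java.math.BigDecimal`, and rounds a rate calculation to cents explicitly with `RoundingMode`.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustration only: not hybris code. Prices and rates are made-up sample values.
public class MoneyRounding {

    /** Adds the same unit price n times using binary floating point (double). */
    static double sumWithDouble(double unitPrice, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += unitPrice;
        }
        return total;
    }

    /** Adds the same unit price n times using exact decimal arithmetic. */
    static BigDecimal sumWithBigDecimal(String unitPrice, int n) {
        // Build from a String, not a double, so the value is exactly the decimal written.
        BigDecimal unit = new BigDecimal(unitPrice);
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            total = total.add(unit);
        }
        return total;
    }

    /** Multiplies a price by a rate (e.g. a tax rate) and rounds explicitly to cents. */
    static BigDecimal applyRate(String price, String rate) {
        return new BigDecimal(price)
                .multiply(new BigDecimal(rate))
                .setScale(2, RoundingMode.HALF_EVEN); // banker's rounding, chosen here only as an example
    }

    public static void main(String[] args) {
        System.out.println(sumWithDouble(0.10, 10));       // prints 0.9999999999999999
        System.out.println(sumWithBigDecimal("0.10", 10)); // prints 1.00
        System.out.println(applyRate("19.99", "0.0825"));  // prints 1.65
    }
}
```

The `double` total drifts because 0.10 has no exact binary representation, and the error can keep growing with more operations; whether it becomes visible after rounding to cents depends on the calculation. Constructing `BigDecimal` values from strings (or from the database's decimal columns) avoids introducing the binary error in the first place, and `setScale` makes the rounding rule an explicit, reviewable choice.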

Reply
1. Thomas Hertz says:
  
  February 27, 2013 at 5:35 pm
  
  Valery, thanks for your question.
  
  Let me try to answer very briefly:
  
  All price data is stored with a proper datatype in the database. You are right that we currently use double/java.lang.Double in our Java interfaces.
  
  We are very sensitive about the quality of our product and will obviously provide patches if we see concrete cases of miscalculation in our price/promotion engine.
  I will make sure these reports get the right priority, as we are aware that using ‘double’ types in certain areas of our price calculation routines could lead to issues.
  We are currently not aware of problems with existing customers, though, and we can assure you that we have extensive test coverage to ensure our algorithms return correct values.
  
  We now have to balance the backwards compatibility of our APIs against the best architecture and interface here.
  We are planning a new price/promotion engine for our hybris 5 release line, which might also be backported to 4.x. This price engine will also use ‘BigDecimal’ data types for all price-related APIs.
  
  But it is important to mention, as I said before, that in all current versions the data itself is stored in the database with a proper datatype, not as a double.
  
  thanks thomas
  
  Reply