December 1: The Big (Data) Picture: NoSQL in Government Agencies - InformationWeek


The second limitation I see comes down to engineering.  Many RDBMS vendors have innovated through acquiring different technologies, but then failed to integrate them in a meaningful way.  For example, an RDBMS vendor acquiring a search company and "bolting" it onto the database, versus MarkLogic making that engineering decision up front and purposefully designing it that way.

Apprentice

I'll add my two cents, too.  There will always be a place for relational databases.  However, when you have to bring together data from multiple silos and that data or its structure changes frequently, relational database architectures struggle.  They are also difficult to use with sparse data such as social media or intelligence data.

Apprentice

Good point... I know a few of those DBA soldiers who would agree (or maybe disagree, I guess, in terms of job security)... well said! Thank you!

Apprentice

I view a few different limitations:  First off, cost.  Development time is costly, and trying to fit "round data" into a "square" database is difficult at best.  Also, there is infrastructure cost.  Proprietary hardware vs. commodity hardware like MarkLogic uses.  And the final cost, the army of DBAs required to manage.

Apprentice

I've heard of CIOs doing amazing things with big data through relational databases -- but to me it sounds really cobbled together, given the enormity of the datasets. Greg, what are your thoughts on the limitations of relational database use for big data?

 

Apprentice

It does, thanks Sarah!

Apprentice

Does that answer your question?  We have search engine indexing, structure indexing, collection indexing, security indexing, and geospatial indexing, plus triple store indexes, all of which can be used simultaneously.

Apprentice

(I have so many questions about these indexes, but I'll wait until we come to it later this week) :) 

Apprentice

And as Sara pointed out earlier, the indexes are managed in real time.  Indexes are created when a document is loaded or updated, so when that transaction commits, the system has and uses the index.

Apprentice
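
The point above about indexes being written when a document is loaded or updated can be pictured with a toy store. This is only a minimal, single-threaded Python sketch and not how MarkLogic is implemented; the `Store` class, its `commit` and `search` methods, and the sample text are all assumptions made for illustration.

```python
from collections import defaultdict


class Store:
    """Toy document store (hypothetical) that updates its word index at commit."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # word -> set of doc ids

    def commit(self, doc_id, text):
        # Drop index entries that belonged to the previous version, if any.
        old = self.docs.get(doc_id)
        if old is not None:
            for word in set(old.lower().split()):
                self.index[word].discard(doc_id)
        # Write the document and its index entries in the same call, so a
        # search issued after commit() returns already sees the new content.
        self.docs[doc_id] = text
        for word in set(text.lower().split()):
            self.index[word].add(doc_id)

    def search(self, word):
        return set(self.index.get(word.lower(), set()))


store = Store()
store.commit("r1", "suspicious vehicle report")
found_after_load = store.search("vehicle")      # searchable right after load
store.commit("r1", "suspicious package report")
found_after_update = store.search("vehicle")    # stale entry is gone after update
```

There is no separate re-indexing job in the sketch: the index changes in the same call that changes the data.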

That helps, thanks!

Apprentice

The initial project was for the FBI.  They wanted a search engine, and the application required a combination of storage and search.  As for someone's version of the truth, just think of it as MarkLogic doing everything a typical search engine does; plus, since we are a database, we also understand the structure of the data.  So you can do searches like "show me all the suspicious activity records with 'blue van' in the title only."

Apprentice
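
The "blue van in the title only" search above can be illustrated with a toy field-scoped index. This is a minimal Python sketch under stated assumptions, not MarkLogic's API: the record shape, the `FieldIndex` class, and its method names are hypothetical.

```python
from collections import defaultdict


class FieldIndex:
    """Toy inverted index keyed by (field, word). Hypothetical, for illustration."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, word) -> set of doc ids

    def add(self, doc_id, doc):
        # Index every word together with the field it appeared in.
        for field, text in doc.items():
            for word in text.lower().split():
                self.postings[(field, word)].add(doc_id)

    def search_words(self, field, phrase):
        # Return ids of documents whose `field` contains every word of `phrase`.
        # This checks co-occurrence only; a true phrase query would also
        # compare word positions.
        words = phrase.lower().split()
        if not words:
            return set()
        result = set(self.postings.get((field, words[0]), set()))
        for word in words[1:]:
            result &= self.postings.get((field, word), set())
        return result


index = FieldIndex()
index.add("sar-1", {"title": "Blue van parked near depot", "body": "Reported by a guard."})
index.add("sar-2", {"title": "Unattended bag", "body": "A blue van left the scene."})

title_hits = index.search_words("title", "blue van")  # only sar-1
body_hits = index.search_words("body", "blue van")    # only sar-2
```

Because the field name is part of the index key, restricting the search to the title costs nothing extra at query time.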

Exactly Wendy. Thanks Wendy

Apprentice

I guess my question is what's the logic behind the index? Is this someone's version of the truth?

Apprentice

Thanks Sarah... and where did he get the data? 

Apprentice

It came from our founder, Chris Lindblad, who was chief architect at Infoseek.

Apprentice

Good to know that we're going to get more into the indexing, as it sounds fascinating, but just a quick question: where did that universal index derive from originally?

Apprentice

Also important to our government customers is security, which is also an index.  We are the only NoSQL database with NIAP Common Criteria certification.  We are used extensively throughout the government.  High side and low side networks.

Apprentice

Our indexing is built into the product's kernel, so as soon as data is committed, you can find it on the next clock cycle.  There is no delay in managing search engine indexes and database indexes.  They are the same.

Apprentice

RE:  Index creation using any algorithm behind that?

The indexing algorithms are built into the product.  In MarkLogic, the indexes are built to optimize many different types of queries.  We index words and do things like language-based tokenization and stemming: identifying the words in your data based on language, then stemming each one to a root word.  For example, in English, "baking" stems to "bake," while "baker" stems to "baker."

We also index structure (document "A" has a specific parent / child element structure).

We also provide indexes for geospatial data, wildcard search, case / diacritics, and triples, among others.  We'll look at indexing in more detail later this week.

Apprentice
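
The tokenization and stemming described above can be sketched in a few lines of Python. This is an illustration only: the `STEMS` lexicon is a hypothetical stand-in for a real language-aware stemmer, and nothing here reflects MarkLogic's actual algorithms.

```python
import re
from collections import defaultdict

# Hypothetical stem lexicon; real stemmers use per-language rules and dictionaries.
STEMS = {"baking": "bake", "baked": "bake", "bakes": "bake"}


def tokenize(text):
    # Lowercase, then split on anything that is not a letter.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def stem(word):
    # Words missing from the lexicon (such as "baker") stem to themselves.
    return STEMS.get(word, word)


def build_index(docs):
    index = defaultdict(set)  # stem -> set of doc ids
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return index


docs = {
    "d1": "She is baking bread.",
    "d2": "The baker opened early.",
}
index = build_index(docs)
bake_hits = index[stem("bake")]    # matches "baking" in d1
baker_hits = index[stem("baker")]  # matches only d2
```

Indexing the stem rather than the surface word is what lets a query for "bake" find "baking" without also pulling in "baker."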

Not sure what you mean by algorithm.  Our universal index indexes every word, phrase, value, structure, geospatial index, triple, collection, security setting, etc. in a highly compressed format.  We do have algorithms you can tweak for search relevancy results, if that is what you mean.

Apprentice

I think that is the beauty of MarkLogic.  Just load it in and start looking at it.  We have an Application Builder which is a simple search application that takes less than 10 minutes to create a full web application.

Apprentice

Index creation using any algorithm behind that?

 

Apprentice

What's one suggestion you could give to a CIO who is looking at a bunch of unstructured data and doesn't know where to start?

Apprentice

Optimization can be done on the data itself, on the implementation of your search queries, or both.  MarkLogic offers some neat visualization tools such as showing what concepts occur frequently together to help uncover new aspects of your data.

Apprentice

I think the biggest thing to understand is that MarkLogic has effectively integrated search into the database.  Later this week we will cover search in more detail, but at a high level, think of it as enabling you to effectively do a wide range of full text, geospatial, and structured search across your data.

The indexes that support search are highly optimized.

Apprentice

So we have a lot of flexibility which shortens the time to develop applications.  You don't need to do as much data modeling, and certainly much less up front.  Also note, we can support schemas if you have them.

Apprentice

Is there an optimization process you can go through once you start to uncover common patterns of connections between data elements?

Author

You have several options.  You can load data first and create a data model as you go.  You can perform some ETL as data is coming in.  Because this is document/hierarchical, you can have one parent node with multiple children (and thus different names for things).  Or you can keep legacy names and relate your data using RDF triples and semantics, because MarkLogic has a triple store.

Apprentice
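
The last option above, keeping legacy names and relating them with triples, might look roughly like this. It is a toy in-memory sketch in Python, not MarkLogic's triple store or SPARQL; the `TripleStore` class, the `ex:sameConceptAs` predicate, and the field names are invented for illustration.

```python
class TripleStore:
    """Toy in-memory triple store. Hypothetical; not MarkLogic's API."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position.
        return {
            (s, p, o)
            for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        }


store = TripleStore()
# Two legacy systems use different names for the same concept.
store.add("legacy:CUST_NM", "ex:sameConceptAs", "ex:customerName")
store.add("legacy:client_name", "ex:sameConceptAs", "ex:customerName")

# Find every legacy field name that maps to the shared concept.
aliases = sorted(
    s for (s, _, _) in store.match(predicate="ex:sameConceptAs", obj="ex:customerName")
)
```

The legacy documents stay untouched; only the relationships between their names are added.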

Hello David and Wendy -- you are right that there are still some data modeling decisions you need to think about.  But what you don't have to do is anticipate every possible piece of data and structural variation that might come your way and build a schema for it, only to find that when a new source of data is introduced you have to go back and rebuild that schema.

As Sara mentions, we can do this because we index all the different structural variations that are present in your documents, and instead of using SQL, we use search to locate relevant content.

Apprentice
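
To make the idea above concrete, here is a minimal Python sketch of indexing documents with different shapes without declaring a schema first. It is an assumption-laden toy, not MarkLogic's universal index: the nested-dict documents, `walk`, and `build_index` are hypothetical.

```python
from collections import defaultdict


def walk(node, path=()):
    # Yield (path, value) pairs for every leaf in a nested dict.
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    else:
        yield path, node


def build_index(docs):
    index = defaultdict(set)  # (leaf name, value) -> set of doc ids
    for doc_id, doc in docs.items():
        for path, value in walk(doc):
            index[(path[-1], value)].add(doc_id)
    return index


# Two sources describe a person with different document shapes (made-up data).
docs = {
    "a": {"person": {"name": "Lee", "city": "Fairfax"}},
    "b": {"record": {"subject": {"name": "Lee"}, "source": "tip line"}},
}
index = build_index(docs)
hits = index[("name", "Lee")]  # finds both documents despite the different nesting
```

A third source with yet another shape would simply add more entries to the index; nothing has to be rebuilt.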

MarkLogic has a "universal index" which allows us to find data even without a data model.  However...

Apprentice

Greg, like David, I'm really stuck on this data model idea.... how does that work exactly, especially if the majority of the data is unstructured?

Apprentice

When you talk about "not having to worry about the data model," don't you still need to come up with a unified data model in the process of building applications? You may not organize it in a traditional fashion, but it seems to me you still have to organize it somehow.

Author

For Healthcare.gov, it was hard to create a test plan that simulated all user functions and load before go-live, and many infrastructure challenges added complexity.  For Fairfax, the key was finding mainframe experts who understood their custom data models and could help with the transfer to MarkLogic.

Apprentice

What's the most challenging aspect of implementing this type of solution for Healthcare.gov or Fairfax? I'm always looking for the story of challenges overcome, beyond "we installed this project and everything magically worked." That's rarely the bottom line.

Author

ldip dot fairfaxcounty dot gov  - feel free to check it out

Apprentice

@Milnaz I'm using Chrome and it's working perfectly.

Apprentice

@milnaz - If you don't see the audio bar at the top of the screen, please refresh your browser. It may take a couple tries. When you see the audio bar, if it doesn't start automatically, hit the play button. If you experience audio interruptions and are using IE, try using FF or Chrome as your browser. Many people experience issues with IE. Also, make sure your flash player is updated with the current version. 

Apprentice

Try refreshing your browser, or click "allow" if the audio is blocked by your browser. (Do you see an X in the upper right-hand corner of the browser? Select it and allow InformationWeek to run.)

Apprentice

Hi,

I don't hear any sound.

 

Apprentice

If you all have questions, I'm here to back Greg up while he is speaking.  I am a MarkLogic Solutions Engineer.

Apprentice

Hey, @David -- it's good to see you here!

Strategist

Sounds great! I'm very excited for this university class. This is one of my passions.

Apprentice

The player has popped up on my browser: If you're not seeing (and hearing) the player, then it's time to refresh your browser window.

Strategist

Listening in for InformationWeek Government

Author

@nkannedari750 - Can you hear the broadcast?

Apprentice

hi @nkannedari750, please download the slide deck above and listen to the lecture.

Apprentice

Hi all -Audio is live! If you don't see the audio bar at the top of the screen, please refresh your browser. It may take a couple tries. When you see the audio bar, if it doesn't start automatically, hit the play button. If you experience audio interruptions and are using IE, try using FF or Chrome as your browser. Many people experience issues with IE. Also, make sure your flash player is updated with the current version. Some companies block live audio streams, so if that is the case for your company, the class will be archived on this page immediately following the class and you can listen then. People don't experience any issues with the audio for the archived version.

Apprentice

How to view the webcast?

Apprentice

This should be an exciting class -- we're really looking forward to having a great class discussion to go with the lectures!

Strategist

We'd love to have your voice in the class discussion here. To take part, just type your comment or question into the "Your Post" box and then click on the "Post" button below the box. Feel free to introduce yourself before the class starts -- I think you'll find that we're a very friendly community here! 

Strategist

Hey, everyone, we're glad you could join us! When the show is scheduled to start, at 2:00 p.m. EST, an audio player should appear above the "Your Post" window. If it doesn't appear, you might need to refresh your browser until it does. If it appears but doesn't start playing, then you may need to click on the "play" button on the far left of the player.

 

Strategist

