Why we care about the SharePoint Cloud Search Service Application

Earlier this month, I was lucky enough to attend Microsoft’s Ignite 2015 conference in Chicago along with a handful of other Content and Code colleagues. When we weren’t eating delicious (but oh-so-cheesy!) deep pan pizza, we were learning about upcoming Microsoft technology changes. Although Ignite focused on Microsoft’s public clouds, a few SharePoint Server 2016 sessions were scattered throughout the conference. We have found that there is still significant demand for on-premises SharePoint expertise, so I made sure to attend those sessions.

Today’s hybrid SharePoint search challenges

A stand-out session for me introduced the forthcoming Cloud Search Service Application, which looks to be the most significant enhancement to date for the SharePoint hybrid story. It promises to overcome a major obstacle in today’s hybrid search model: there is currently no way to “merge” SharePoint Online and on-premises search results. Sure, you can “mash up” results using code (as documented by my colleague Chris O’Brien), but that doesn’t give you a single unified index and its associated benefits (a consolidated relevance/ranking model, and one index to maintain).

While the absence of a merged index is probably the most common user-impacting complaint about today’s SharePoint hybrid search model, we have found through design discussions with our larger enterprise clients that maintaining the on-premises index required for hybrid search can be equally troublesome for SharePoint administrators. Firstly, all but the smallest SharePoint 2013 Search Service Applications require four servers, assuming you need high availability. Search servers are resource hungry, and patching them can be an onerous task (even if the correct procedures are followed). Secondly, it is a fact of life that full crawls must be run regularly in an on-premises SharePoint farm – TechNet lists more than a dozen reasons for doing so. Running a full crawl against a large and/or geographically distributed corpus might take days or even weeks. While a full crawl is running, changes to content are unlikely to be reflected in search queries, which will frustrate your users. There is also that nagging feeling that one day, your on-premises search index might need to be reset, rendering solutions that are dependent on search useless for the duration of the subsequent full crawl(s). Wouldn’t it be nice if you could make this someone else’s problem?

Enter the Cloud Search Service Application

The big idea is that Microsoft look after your search index in the cloud, regardless of whether its content was indexed on-premises or in SharePoint Online (SPO). Since there will be one unified index, your users will finally get a single set of results that is genuinely ranked by relevance.

Prior to Ignite, Microsoft referred to this capability as a “Hybrid Search Crawl Appliance”, so I was slightly surprised to learn that it will be bundled into the Search Service Application. This means that you will still require an on-premises SharePoint Server 2013 or 2016 farm configured to crawl and parse on-premises content, and to support hybrid authentication flows. The footprint of a farm designed to host a Cloud Search Service Application is likely to be a few shoe sizes smaller than a farm supporting today’s hybrid search scenarios, as fewer search components are required. To be more specific, the only on-premises search component strictly required by the Cloud Search Service Application is the crawler, and you have the option of leveraging a query processing component if queries must flow “outbound” from an on-premises farm to SPO.

Note: all Search Service Application components (including the admin, analytics processing, content processing, and index components) were present within the on-premises SharePoint 2016 search topology during the Ignite demo. However, I understand that only the crawl and query processing components are actually used when a Search Service Application is “cloud enabled”, meaning that only those components would require performance and capacity planning in this context.

A slide from Microsoft Ignite illustrating how search roles are split between on-premises SharePoint and SPO when using the Cloud Search Service Application.

Other than crawling, the jobs performed by the remaining search components (content processing, ACL mapping, index building and so on) are outsourced to SharePoint Online, meaning that you no longer have to maintain an on-premises search index. A copy of parsed content is stored in SPO so that cloud-based search infrastructure changes do not require on-premises content to be re-indexed. It remains to be seen precisely how much easier a cloud-enabled Search Service Application will be to manage, but I expect some of the administration problems I’ve highlighted in this post will go away. At the very least, I think that index resets will become a thing of the past. The optimist in me thinks that some of the other reasons to run a full crawl in SharePoint 2013 today will also become non-issues (SharePoint Online runs on something that closely resembles a more recent version of whatever binaries are available on-premises, so patching my SharePoint farm shouldn’t require a full crawl. Right?).
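
We don’t have bits to play with yet, but based on the Ignite session, provisioning looks set to reuse the familiar search cmdlets with a new cloud switch. The sketch below is purely illustrative – the -CloudIndex parameter and every name in it are assumptions that may well change before release:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumed syntax: standard SSA provisioning plus a cloud switch
$appPool = Get-SPServiceApplicationPool "SearchAppPool"   # hypothetical pool
$ssa = New-SPEnterpriseSearchServiceApplication -Name "Cloud SSA" `
    -ApplicationPool $appPool -DatabaseServer "SQL01" -CloudIndex $true

# A cloud-enabled SSA would presumably report CloudIndex = True
$ssa.CloudIndex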

A slimmer on-premises SharePoint farm

During the Ignite session, Microsoft gave an example indicating that a company might need 10 on-premises SharePoint 2013 servers to support a “traditional” Search Service Application. A separate slide suggested that as few as 2 servers would support the Cloud Search Service Application:

A slide from Microsoft Ignite indicating that the Cloud Search Service Application may only require 2 SharePoint Server 2013 servers.

It was unclear whether that example included any servers required for SQL Server, and it raises other questions (will the Distributed Cache Service happily co-exist with the Cloud Search Service Application?), so I’ll wait for more detailed guidance before getting too excited.

Give me more detail!

I know many readers out there will be technical, so I’ve included the bullet point notes I took during the session below. Keep in mind that we got a pretty early view of the capability at Microsoft Ignite, so these details are likely to change between now and General Availability.

The Cloud Search Service Application:

  • Will be shipped in an update to SP2013 later this year (2015), and will be baked into SP2016.
  • Pushes indexed items into a single consolidated SPO index, instead of relying on query federation (the current SP2013 search hybrid model). This means we get a single ranked results set with refiners and search previews. About time!
  • Means that on-premises content shows up in Delve (which uses the SPO search index), albeit without the “rich” thumbnail previews that we get with SPO content.
  • Is able to crawl the same content sources that the present SP2013 Search Service Application can crawl. This is great news, as it means that content housed in older (2007 or 2010) SharePoint farms can be surfaced in SPO, along with other supported content sources such as file shares.
  • Can be consumed by a *SharePoint 2010* farm using today’s SharePoint Service Application Federation model, meaning that SPO content can be queried from a SP2010 farm! As you might expect, there are a few trade-offs and constraints here (e.g. no WAC previews, web applications must be in claims mode, and an on-premises SP2013 query processing component is needed). This means that older farms need not be islands of information in a hybrid scenario.
  • Can co-exist with other Cloud Search Service Applications to feed a single SPO tenant from multiple locations. I expect that this will be a big deal for some of our globally distributed clients.
  • Strictly respects on-premises permissions. SPO permissions do not “override” on-premises ACLs, even if you are an Office 365 Global Admin.
  • Means that there is no longer a need to have an on-premises search index (save for data residency concerns, see next bullet point).
  • Can co-exist with the current SP2013 search hybrid model (query federation) if some indexed items need to remain on-premises (e.g. for data residency reasons).
  • Will be baked into the SharePoint 2016 Search Service Application. This means we still need a “lightweight” farm to house it, complete with a SQL Server instance. I expect the update for SharePoint 2013 will work in the same way, although that wasn’t clarified.
  • Relies on the same foundational hybrid identity management bits that are needed for today’s SP2013 hybrid solutions (directory synchronisation, OAuth 2.0 trust between SharePoint on-premises and Azure ACS, etc.).
  • Relies on an on-premises Office Web Apps farm if search previews of on-premises SP2013 or SP2016 content are required (e.g. from within SPO search results pages). This is the only reason that an “inbound” (SPO -> on-premises) hybrid configuration would be required.
  • Does not “publish” on-premises content externally. We still need a user-facing publishing capability – such as the Web Application Proxy (WAP) or Azure App Proxy – for secure external publishing of SharePoint and/or Office Web Apps. The alternative is that on-premises content can be searched in SPO, but only accessed and previewed on-premises.
  • Encrypts search metadata before it is sent to SPO in batches.
  • Does not support on-premises Site Collection-scoped schema mappings, as those Site Collection objects do not exist in SPO.
  • Introduces a new managed property (IsExternalContent) that allows on-premises content to be identified in query rules, result sources, search verticals and so on (see the example below).
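
For example, a result source scoped to on-premises content only might use a query transform along the lines of “{searchTerms} IsExternalContent:true” – hypothetical KQL on my part, as the exact syntax wasn’t demonstrated at Ignite.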

As you can probably tell, we are pretty excited about getting our hands on this thing! In our experience, search is the most common reason for implementing a hybrid SharePoint infrastructure, and we are pleased to see that Microsoft are addressing the most common pain points about this workload. We will still require an on-premises SharePoint farm to achieve hybrid search, but hopefully it will meet user expectations, result in fewer sleepless nights for SharePoint administrators, and won’t break the bank.

That’s all from me, but you can watch the Ignite session for yourself over on Channel 9: “Implementing Next Generation SharePoint Hybrid Search with the Cloud Search Service Application”.

Ben Athawes

Head of SharePoint Platform

Ben leads Content and Code's SharePoint Platform practice, which focuses on the more technical aspects of SharePoint Online, SharePoint on-premises and everything in between. He has been working with SharePoint and related technologies such as SQL Server and AD FS since 2008.

IT Operating Models: aligning strategy to operations

An IT Operating Model translates strategic intent into operational capabilities. It serves as the foundation for execution and provides a clear guide for the enterprise leadership team, line managers and operational teams. A well-defined and articulated operating model is the bridge between strategy and day-to-day operations that guides the team, provides the context, and enables behaviours that will realise the strategy and vision.

IT Operating Models aren’t just reserved for large companies – regardless of size, every company should have an operating model of some kind. In some cases it might be brief or not very prescriptive, but it should still exist and be maintained to help bridge the gap between the why and the how.

IT organisations without an operating model of any shape at all run the risk that strategy won’t be realised, processes will not be optimised, and IT staff won’t be aligned to a common view of how the IT organisation should work to deliver business value.

Benefits of IT Operating Models

Defining an operating model will provide the blueprint of how to execute on an IT strategy. Without a defined operating model, IT organisations could experience the following challenges:

  • Operational inefficiencies as people expend effort in areas not aligned with the strategic plan. An environment of busy people can further mask the reality that energy is lost to work that is not important.
  • Ambiguity around accountabilities, roles and responsibilities can slow down decision making. When these aren’t clear, there is duplication of work, or worse, slippage of critical tasks.
  • Low interaction and integration between IT teams and functional areas, as it is unclear to people how they should cross these implicit boundaries. These non-standardised approaches to processes and procedures can lead to the loss of valuable organisational learning and reduced usefulness of systems and data.
  • Increased or sustained operating risks due to the absence of clear principles, roles, responsibilities and processes. Without clear guidelines, employees can unknowingly conduct their work in a manner inconsistent with internal standards or regulatory requirements.

IT Operating Models need to evolve along with the business model and strategy to guide how people produce the right results. The IT Operating Model serves as a blueprint for how resources are organised to get critical work done.

Benefits of implementing an IT Operating Model

Improved IT performance because of increased operational efficiency

When it’s clear who does what, duplication of work is diminished. This elimination of wasted effort allows time for innovation and improvements to user experience. Part and parcel with this is improved cost management, thanks to a better ability to understand processes, plan, and control the budget – all built on IT teams aligned around a single way to operate.

A well-articulated Operating Model also creates a baseline: leaders understand clearly what is done today and therefore have a starting point to improve upon tomorrow.

Better connection with users by adapting to their changing needs

As IT services are introduced or changed, organisations that adopt these services are able to meet or exceed users’ changing needs. A clear operating model provides a framework by which to continually map and manage stakeholders.

Increased process integration across functional areas reducing duplication of effort

Through standardisation, IT organisational learning can be leveraged across the IT teams. Systems and data become transparent and more useful, and IT teams can better link their piece to the rest of the puzzle.

Improved coordination and decision-making

Operating models provide improved ability to plan and sequence initiatives, as dependencies across the IT organisation are better understood. IT stakeholders are able to transparently see where weaknesses in capabilities (people, process, or technology) exist, and work together to align with the strategy.

Better ability to grow and scale quickly

When the basics are written down, they’re easier to communicate to existing and new staff, and easier to review at critical junctures as IT organisations mature and become more complex.

Improved risk management

When there is a common understanding of roles, responsibilities, goals and processes, risks can be identified and mitigated earlier and more easily. In addition, with the right governance in place, risks can be escalated.

It takes work to develop and implement an operating model…

  • Assess your current state – understanding your current state is a critical first step to developing and documenting an Operating Model. Identify and interview key stakeholders and review process documentation regularly. This will then enable you to plan for future changes.
  • Get the right people at the table – regardless of whether an Operating Model will document the current state or bring about change, getting the right people at the table, with representation from across IT functions, will expedite decisions.
  • Define your design principles (how will the IT organisation need to work together to achieve our strategic goals?) – design principles articulate the parameters for the future state, set the context, and are derived from an organisation’s strategic priorities and current state assessment. Design principles result in key statements that guide the development of the Operating Model document.
  • Shape your future state (what critical elements need to be included in the operating model?) – regardless of the degree of change anticipated, it’s important to determine the key elements that need to be included as part of the documented Operating Model. You can then give weighting and attention to the elements that are most critical to IT operations. Development of the future state should happen through a series of workshops, each focussed on a different element of the Operating Model.
  • Implement it – strong implementation includes identifying initiatives that will help achieve the new goals, assigning accountability for them, planning and executing on these plans. This needs to be implemented through dedicated change management in line with a structured communications plan that can be disseminated throughout the organisation.

Aligning Strategy to Operations in the Cloud

Join us on 17th May 2018, as we show you what it takes to overcome: operational inefficiencies; ambiguity around accountability; low interaction and integration between IT teams and functional areas; and increased or sustained operating risks.

SharePoint Migration with Metalogix – lessons learned

I just wanted to share some lessons learned from a recent SharePoint migration project we have undertaken using both the Metalogix Essentials and Content Matrix tools. In this post I will cover some potential pitfalls so that hopefully you can avoid these complications in your own migration scenarios.

Scenario:

I had a client wanting to migrate their WSS 3.0 environment, and some old BPOS sites, into SharePoint Online. While carrying out the migration of the BPOS sites and preparing for the WSS 3.0 migration, a few issues arose that needed to be rectified so that the migration of both BPOS and WSS 3.0 sites would go as smoothly as possible.

Unfortunately, there was a time lapse between the initial work being carried out in BPOS and the next phase of the migration; however, this mainly meant looking at an incremental migration instead of a full migration.

Lessons learned from the SharePoint migration project:

The following lessons were learned from the issues that were experienced during this SharePoint migration:

User mappings

The Metalogix Essentials tool can at times be intuitive: if the user UPNs are in a similar format (e.g. firstname.lastname@domain.com), it can map the users for the created/modified by attributes automatically. However, this does not always work, and the result is content being labelled with the user that is logged in while performing the migration. A migration service account can be used; however, it needs to exist in both the source and target environments to map correctly.

Essentials uses a CSV file to map old user credentials to new user credentials

This allows the correct created/modified by name to be used when content is copied. However, if a mapping file is not used when content is originally copied, and you then try to use a delta migration to update the attributes afterwards, this does not work: the tool sees the file in the target environment as up to date and so does not change it. A full migration, or a recopy with metadata using a CSV, would be required to change the values. Please note that doing a copy with metadata takes considerably longer to complete.

Getting a list of users from AD is best done using the Microsoft gallery for PowerShell scripts

Although Essentials can be used to export site users from connected environments, the reports are on a Site Collection basis, so to retrieve all users it is better to connect to AD and retrieve them in CSV format. Remember, this is only the first part of the process: you also need to get the users from Azure AD, using an export or PowerShell, before they can be mapped. The display name can normally be used with VLOOKUP formulae in Excel to match the old and new accounts, even if the format needs to be tweaked slightly.
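
For reference, here is a minimal sketch of both exports. It assumes the ActiveDirectory and MSOnline PowerShell modules are installed and that you have read access to both directories; the file paths are hypothetical:

Import-Module ActiveDirectory
Import-Module MSOnline

# Export on-premises AD users (display name plus logins) to CSV
Get-ADUser -Filter * -Properties DisplayName |
    Select-Object DisplayName, SamAccountName, UserPrincipalName |
    Export-Csv -Path "C:\Temp\AD_Users.csv" -NoTypeInformation

# Export Azure AD users to a second CSV, ready for the VLOOKUP matching step
Connect-MsolService
Get-MsolUser -All |
    Select-Object DisplayName, UserPrincipalName |
    Export-Csv -Path "C:\Temp\AzureAD_Users.csv" -NoTypeInformation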

The user mapping file for Essentials cannot be used in Content Matrix as it does not use a CSV for user mappings

This either means manually mapping all user accounts – no good in time-critical projects, or when clients are being billed – or using an XML file instead. After having to switch from Metalogix Essentials to Content Matrix when the license timed out, I had to find a way to create such an XML file.

I wrote a small PowerShell script to create the XML file from a CSV file and, with some assistance from a colleague, completed it after an issue was rectified. The script creates a “well-formed” XML file from the values that are mapped in the CSV file; Metalogix have a great article with further information.

The PowerShell script that I have created is shared below and can be copied in to a file or Windows PowerShell ISE window:


#Import user mappings from existing CSV file - change the path and filename accordingly.
#The CSV is expected to contain two columns named Column1 (source account) and Column2 (target account).
$Import = Import-Csv -Path "C:\Users\<User>\Documents\Import_CSV_Mappings.csv"

#Create well-formed XML including the CSV mappings and add to the variable
$xmlData = "<Mappings>"

foreach ($row in $Import) {
    $xmlData += "<Mapping Source='$($row.Column1)' Target='$($row.Column2)' />"
}

$xmlData += "</Mappings>"

#Output XML in variable to a file - change path and filename accordingly
$xmlData | Out-File "C:\Users\<User>\Documents\Export_XML_Mappings.xml"

Note: the above script is shared as an example and Content and Code take no responsibility for any issues that may occur in the running of this script.

Happy migrating!!

About our author

Lee Palmer

Solutions Consultant | Content and Code

Lee is a Solutions Consultant working in the Enterprise Solutions Architects team. Although his primary focus has been SharePoint – previously on-premises deployments with Microsoft, and now SharePoint Online for the past three years – he has evolved to help clients build solutions across the products in the Office 365 stack, including Microsoft Teams. Lee has helped a number of clients with migrations from older versions of SharePoint, BPOS, and file shares into SharePoint Online. He is also accredited with Metalogix and Nintex, who Content and Code work very closely with.

Are you managing your data correctly in SharePoint on-premises?

In an ideal world, information architecture and governance should be in place before any SharePoint sites are launched to end users. However, the reality is that it often happens the other way around, or governance simply doesn’t exist at all. As a result, data management becomes a challenge.

Even with governance guidelines in place, you can never guarantee that end users have the same level of understanding or follow the guidelines correctly.

Managing structured and unstructured data

Managing structured data and managing unstructured data are totally different stories. For a traditional application, managing structured data is mainly about managing databases. But for SharePoint on-premises, apart from managing all the content databases, it also involves managing a number of objects, including SharePoint sites, lists, site columns, content types, term sets, etc.

Administrators like structured data because it is easy to search and manage. End users might take a different view, however, as structure can mean more restrictions and work for them, and this can cause conflict between departments.

We all know the importance of database management, but that is a whole other topic on its own. Here we are looking at managing data strictly at the SharePoint level, and at how to turn unstructured data into structured data.

Data comes in a variety of different guises

First of all, data can come in various forms. Here is a short list of the most common:

Files / documents

It’s very easy to upload a file to SharePoint. But is it easy for your end users to find or navigate to those files?

  • Size of files

SharePoint 2013 has a default limit of 50MB per file. What do you do if a file exceeds the limit? Would you simply increase the limit, or work out a better solution?

With SharePoint 2013, you can increase the limit to a maximum of 2GB. In doing so, remember that the limit applies to the whole web application rather than to a specific document library. Another important factor to consider is the user experience of accessing a 2GB file from a SharePoint site.
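
If you do decide to raise the limit, note that it is set per web application. A minimal sketch using the SharePoint 2013 management shell – the URL and new limit are hypothetical:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Raise the maximum upload size for one web application (value is in MB)
$webApp = Get-SPWebApplication "http://intranet.contoso.com"
$webApp.MaximumFileSize = 500    # default is 50; the ceiling is 2047 (~2GB)
$webApp.Update()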

  • Type of files

There is no restriction (apart from executable files) on the type of files that can be stored in a standard document library. However, you might want to think again about where to store images or videos.
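
A good starting point is to review (and, where appropriate, extend) the blocked file extensions on each web application. A minimal sketch with a hypothetical URL:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List the file extensions currently blocked for a web application
$webApp = Get-SPWebApplication "http://intranet.contoso.com"
$webApp.BlockedFileExtensions

# Example: also block large video files from being uploaded
$webApp.BlockedFileExtensions.Add("mp4")
$webApp.Update()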

  • Contents in the documents

SharePoint cannot natively index the contents of every document type, which makes it hard for users to find the relevant documents without prior planning.

SharePoint Lists

Because content databases are managed by SharePoint and are unsupported by Microsoft once amended manually, SharePoint lists are the first option for storing structured data. As a result, lists are sometimes designed and used as database tables or views without awareness of their limitations – for example, the list view threshold and the maximum number of lookup columns supported.
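
You can check the threshold that applies to your lists on each web application. A minimal sketch with a hypothetical URL:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# The list view threshold (default 5,000 items) is a web application setting
$webApp = Get-SPWebApplication "http://intranet.contoso.com"
$webApp.MaxItemsPerThrottledOperation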

Emails / Alerts

Email is probably still the most commonly used way to share information. But it’s hard to track, as most of the data is in the email body, which is unstructured. Governance and training are required to help users understand the benefits of sharing data in other ways. Restrictions can also be implemented to stop users from sharing certain files.

SharePoint Pages

SharePoint pages are easier for SharePoint to index than documents. This is because page content is plain text and pages often follow predefined templates. Documents, however, quite often aren’t plain text, and they are more flexible and rich in content and format.

Articulating an important companywide message?

Email communications to “All Company” might be the first port of call here. But, as mentioned above, it’s hard to track complicated and unstructured data. And if you have an intranet built on SharePoint already, why not utilise that?

What if the announcement contains a fair amount of data?

You might not want your particular announcement to take up valuable real estate on your intranet home page. Normally, a link can be provided for details instead. But the question now is whether the detailed information of the announcement warrants an entire page or a single document. This depends largely on how well your SharePoint intranet site was built and how well your SharePoint search is optimised.

Generally, it’s more difficult to index the content of a document than the content of a page. So, creating a page for the announcement sounds like a better option. But do you know if your users can still find the detailed information (e.g. a new policy) once the announcement is removed from your home page?

Let’s take this up a notch

If the details of the announcement consist of a few different training materials, there are several options available to you to ensure that you are managing data in your SharePoint environment correctly.

Pages Library

Depending on the types of content, you could be better off creating multiple pages in a shared SharePoint Pages library. It is fairly easy to create a page in SharePoint if the predefined SharePoint layouts meet the requirements of your announcement.

Document Libraries

Again, depending on the types of training materials, these could be created as documents (PDF, Word or PowerPoint), then saved to a document library. Rich content requirements aside, if the materials need to be accessible offline, SharePoint pages would not be a good fit in this instance.

Document Sets

If your materials are delivered in smaller batches for end users to consume, creating Document Sets could be a good option. Document Sets are only available in SharePoint 2010 and later, but they work well for grouping a small set of documents together.
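
If you go down this route, remember that Document Sets are delivered as a site collection feature that must be activated first. A minimal sketch with a hypothetical site URL:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Activate the Document Sets site collection feature
Enable-SPFeature -Identity "DocumentSet" -Url "http://intranet.contoso.com/sites/training"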

What about restricting access to sensitive documents?

In many cases it’s important to consider that some business-critical and sensitive materials should only be accessible by certain groups of staff or departments within the business. You must then determine the best approach to ensuring that your data is effectively managed within your SharePoint environment.

Single document library

Should the materials be created in folders with unique permissions within the same document library? Although technically doable, item (folder) level permissions are generally not a great idea, as they are less intuitive and create management overhead.

Multiple document libraries

Or should they be scattered across multiple document libraries, each with unique permissions?

Do you understand the difference between the two approaches? A few key things: with multiple document libraries, you can easily set up a dedicated content type for each library (which helps with search), and you can create dedicated list views for each library to present the data.
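
To illustrate the multiple-library approach, this sketch attaches a dedicated content type to one library. The site, library, and content type names are hypothetical, and the content type is assumed to exist already:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$web = Get-SPWeb "http://intranet.contoso.com/sites/hr"
$list = $web.Lists["Policies"]

# Allow content type management on the library, then attach the content type
$list.ContentTypesEnabled = $true
$ct = $web.AvailableContentTypes["HR Policy"]
$list.ContentTypes.Add($ct) | Out-Null
$list.Update()
$web.Dispose()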

What if your announcement contains rich content?

Further to the above, what happens if the training materials you have released come with pre-created videos, documents and images? Would a few separate SharePoint pages and libraries suffice? It may not be a good idea. Too many document libraries in the same site can make it hard to navigate and find the right information.

Does a sub site sound like a better choice? It gives you a dedicated place to manage all the relevant materials, which makes it much easier for users to find them and for site owners to manage access to the data.

Enterprise Social vs Discussion Boards

It is also important to consider what happens to your data if you require feedback on your announcement and want to engage your workforce around the topic.

You have a few options here. SharePoint itself has an out-of-the-box Discussion Board feature that could work well for this purpose. Data is stored within SharePoint, although some customisation will be required to help SharePoint search crawl and understand it.

But what about the wider Office 365 product stack? If your organisation is looking to move to Office 365 in the near future, Yammer could provide the perfect platform for engaging your workforce and creating an environment that actively encourages wider discussions throughout your organisation. Data is stored in the cloud, and Yammer offers plenty of statistics features and out-of-the-box search to help manage it.

It’s always good to keep your options open with data management

It’s always good to have more options. However, without sufficient understanding and planning, those options can also work against productivity. How do you know whether your data is managed correctly and is reaching the maximum number of your end users? Remember, the easiest option might not be the best option. A health check can help you determine how much productive time has been lost to poor data management.

About our guest author

Michael Wang

Technical Account Manager | Content and Code

Michael leads the DevOps team and is responsible for our Development On-Demand service, predominantly for SharePoint in Office 365 or on-premises. Michael has worked on a vast number of DevOps projects for clients large and small, often with complex and dynamic requirements.